Monitoring
Check agent and cluster health with health endpoints, status commands, and logs
Arctic exposes its health through unauthenticated HTTP endpoints, status commands, and structured logs. Use these to watch a single agent, see the health of the whole cluster, and feed an external uptime or log-aggregation system.
Health endpoints
Every agent serves two health endpoints on the API port (8080 by default) with no authentication, so a load balancer or uptime check can poll them directly.
| Endpoint | Answers | Use for |
|---|---|---|
| GET /livez | Is the agent process up? | Liveness probes, uptime checks |
| GET /readyz | Is the agent ready to serve and route? | Readiness probes before sending traffic |
curl http://AGENT_IP:8080/livez
{"status":"ok","timestamp":"2026-01-15T10:30:00Z"}

/readyz returns the same shape plus per-check detail and is non-200 while the agent is still starting or a dependency is not ready. Point a liveness probe at /livez and a readiness probe at /readyz.
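Because /readyz is non-200 until the agent can serve, a deploy script can gate on it before sending traffic. The following is a minimal sketch, assuming curl is installed and that a ready agent answers with HTTP 200. The function name wait_ready, the attempt count, and the delay are illustrative and not part of Arctic.

```shell
# Poll an agent's /readyz until it returns HTTP 200 or the attempts run out.
# wait_ready and its defaults are illustrative, not part of Arctic.
wait_ready() (
  agent="$1"            # host or IP of the agent
  attempts="${2:-30}"   # how many polls before giving up
  delay="${3:-2}"       # seconds to wait after a failed poll
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -w prints only the status code; curl prints 000 when it cannot connect.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "http://$agent:8080/readyz") || true
    if [ "$code" = "200" ]; then
      echo "ready"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempts (last status: $code)" >&2
  exit 1
)
```

Called as wait_ready AGENT_IP 30 2, it returns 0 as soon as the agent reports ready and 1 if it never does, so it can sit in front of the step that adds the agent to a load balancer.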
Checking one agent
arctic health reports on the agent your CLI is pointed at:
arctic health # liveness
arctic health --readyz # readiness, with per-check detail

It exits non-zero (code 4) when the agent is unreachable, so it works in scripts and CI.
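In a script or CI step, the exit status can be turned into a short message. This is a sketch only: exit code 4 for an unreachable agent comes from the text above, while treating every other non-zero code as a generic failure is an assumption, and check_agent is a hypothetical helper name.

```shell
# Map the exit status of `arctic health` to a one-word result.
# Only code 4 (agent unreachable) is documented above; the catch-all
# branch for other non-zero codes is an assumption.
check_agent() {
  rc=0
  # Arguments such as --readyz are passed straight through to the CLI.
  arctic health "$@" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "healthy" ;;
    4) echo "unreachable" ;;
    *) echo "failed (exit $rc)" ;;
  esac
  return "$rc"
}
```

The function returns the original exit status, so a caller can still branch on it, for example by running check_agent --readyz and stopping the pipeline when it fails.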
Checking the cluster
arctic cluster status gives a cluster-wide view from the agent you query: each peer's last-seen time, reachability, and registry hashes, plus the local registry totals. Use it to spot a peer that is unreachable or has not converged.
arctic cluster status

Registry hashes that differ between peers mean state has not converged yet. It usually resolves on the next gossip round, or you can force one with arctic cluster sync. See Clustering for how convergence works.
Logs
The agent logs to the journal. Structured JSON (the default, LOG_FORMAT=json) is the right choice when you ship logs to an aggregator; set LOG_FORMAT=text for readable local output.
# Follow the log
journalctl -u arctic -f
# Recent entries
journalctl -u arctic -n 100 --since "1 hour ago"

Each reconciler tags its lines with a reconciler field (firewall, network, tproxy, iptun), so you can filter to one subsystem:
journalctl -u arctic | grep 'reconciler=firewall'

Raise LOG_LEVEL to debug temporarily for deeper detail; leave it at info in production.
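The grep pattern above matches key=value text. With the default JSON format the field would presumably appear as a JSON key instead, so a slightly different pattern may be needed. The sketch below assumes each JSON log line carries the field as "reconciler":"NAME"; the exact layout is not specified in this document, and filter_reconciler is a hypothetical helper name.

```shell
# Keep only log lines whose JSON "reconciler" field equals the given name.
# Reads log lines on stdin. The JSON layout is an assumption; the pattern
# tolerates optional whitespace around the colon.
filter_reconciler() {
  grep -E "\"reconciler\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

One way to use it is journalctl -u arctic -o cat | filter_reconciler firewall, where -o cat makes journalctl print only the message text. Like grep itself, the function returns a non-zero status when nothing matches.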
What to watch
- Liveness: /livez on every agent. A failure means the process is down.
- Readiness: /readyz before routing traffic to an agent.
- Peer reachability: arctic cluster status for peers stuck unreachable or with stale last-seen times.
- License state: an expiring license changes what the agent allows. Check it with arctic license status; see Licensing.
- Log errors: WARN and ERROR lines, especially repeated reconciler errors.
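The first two items can be checked across several agents in one pass. This is a minimal sketch, assuming curl is installed, the default API port 8080, and HTTP 200 from a healthy endpoint. sweep_agents is an illustrative name, not an Arctic command.

```shell
# Print one line per agent with the HTTP status of /livez and /readyz.
# Exits non-zero if any agent returned something other than 200.
# sweep_agents is an illustrative name; pass agent hosts or IPs as arguments.
sweep_agents() (
  failed=0
  for agent in "$@"; do
    # curl reports 000 when it cannot connect at all.
    live=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "http://$agent:8080/livez") || true
    ready=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "http://$agent:8080/readyz") || true
    echo "$agent livez=$live readyz=$ready"
    if [ "$live" != "200" ] || [ "$ready" != "200" ]; then
      failed=1
    fi
  done
  exit "$failed"
)
```

Run from cron or a CI job with the agent addresses as arguments, the non-zero exit status can trigger an alert while the printed lines show which agent and which endpoint failed.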
Planned: Prometheus metrics
Metric-based monitoring through a Prometheus endpoint is planned. Until then, use the health endpoints, status commands, and logs described above.
See also
- Troubleshooting - diagnosing specific failures
- CLI reference - health and cluster status in full
- Licensing - license expiry states