Tillered Docs | Monitoring

Arctic exposes its health through unauthenticated HTTP endpoints, status commands, and structured logs. Use these to watch a single agent, see the health of the whole cluster, and feed an external uptime or log-aggregation system.

Health endpoints

Every agent serves two health endpoints on the API port (8080 by default) with no authentication, so a load balancer or uptime check can poll them directly.

Endpoint	Answers	Use for
`GET /livez`	Is the agent process up?	Liveness probes, uptime checks
`GET /readyz`	Is the agent ready to serve and route?	Readiness probes before sending traffic

curl http://AGENT_IP:8080/livez

{"status":"ok","timestamp":"2026-01-15T10:30:00Z"}

/readyz returns the same shape plus per-check detail, and responds with a non-200 status while the agent is still starting or a dependency is not ready. Point a liveness probe at /livez and a readiness probe at /readyz.
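If you gate traffic from a script rather than a load balancer, a polling loop along these lines may help. It is a minimal sketch: wait_ready is a hypothetical helper, not part of Arctic, and the attempt count, delay, and 5-second timeout are assumed defaults. It relies only on the documented behaviour that /readyz is non-200 until the agent is ready.

```shell
# Sketch: wait until an agent reports ready before sending it traffic.
# wait_ready is a hypothetical helper, not an Arctic command.

# wait_ready BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once /readyz answers 200, 1 if it never does.
wait_ready() {
  local base_url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  local code
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
    # curl prints 000 when it cannot connect at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base_url/readyz" || true)
    if [ "$code" = "200" ]; then
      return 0
    fi
    echo "attempt $i/$attempts: /readyz returned $code" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_ready http://AGENT_IP:8080 && echo "agent is ready"` blocks until the agent is ready or the attempts run out.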

Checking one agent

arctic health reports on the agent your CLI is pointed at:

arctic health            # liveness
arctic health --readyz   # readiness, with per-check detail

It exits non-zero (code 4) when the agent is unreachable, so it works in scripts and CI.
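In a script or CI step you can branch on that exit status. The sketch below assumes exit 0 means healthy, which is the usual convention but is not stated above; only code 4 (agent unreachable) comes from this page, and every other non-zero code is treated as a generic failure. check_agent is a hypothetical wrapper name.

```shell
# Sketch: map the exit status of `arctic health` to a short message.
# Assumption: 0 = healthy. Documented: 4 = agent unreachable.
# Any other non-zero code is reported as a generic failure.
check_agent() {
  local status=0
  arctic health >/dev/null 2>&1 || status=$?
  case "$status" in
    0) echo "healthy" ;;
    4) echo "unreachable" ;;
    *) echo "failed (exit $status)" ;;
  esac
  # Pass the original status on so callers can still test it.
  return "$status"
}
```

Calling `check_agent || exit 1` at the start of a deploy step stops the step when the agent is not healthy.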

Checking the cluster

arctic cluster status gives a cluster-wide view from the agent you query: each peer's last-seen time, reachability, and registry hashes, plus the local registry totals. Use it to spot a peer that is unreachable or has not converged.

arctic cluster status

Registry hashes that differ between peers mean state has not converged yet. This usually resolves on the next gossip round, or you can force a round with arctic cluster sync. See Clustering for how convergence works.

Logs

The agent logs to the journal. Structured JSON (the default, LOG_FORMAT=json) is the right choice when you ship logs to an aggregator; set LOG_FORMAT=text for readable local output.

# Follow the log
journalctl -u arctic -f

# Recent entries
journalctl -u arctic -n 100 --since "1 hour ago"

Each reconciler tags its lines with a reconciler field (firewall, network, tproxy, iptun), so you can filter to one subsystem:

journalctl -u arctic | grep 'reconciler=firewall'
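The pattern above matches key=value text. With LOG_FORMAT=json the field is more likely to appear as a quoted key and value, so a filter along these lines may be needed instead. This is a sketch under the assumption that each log line is a single JSON object with a string field named reconciler; filter_reconciler is a hypothetical helper, and a JSON-aware tool would be more robust than a regular expression.

```shell
# Sketch: keep only JSON log lines whose "reconciler" field equals NAME.
# Assumes one JSON object per line and a string-valued "reconciler" key.
# filter_reconciler NAME  (reads log lines on stdin)
filter_reconciler() {
  # Allow optional spaces around the colon: "reconciler": "firewall"
  grep -E "\"reconciler\" *: *\"$1\""
}
```

Use it as `journalctl -u arctic -o cat | filter_reconciler firewall`; `-o cat` prints only the message, without the journal's timestamp and host prefix.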

Set LOG_LEVEL=debug temporarily when you need more detail; leave it at info in production.

What to watch

Liveness: /livez on every agent. A failure means the process is down.
Readiness: /readyz before routing traffic to an agent.
Peer reachability: arctic cluster status for peers stuck unreachable or with stale last-seen times.
License state: an expiring license changes what the agent allows. Check it with arctic license status; see Licensing.
Log errors: WARN and ERROR lines, especially repeated reconciler errors.
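The liveness item in the list above can be approximated with a small sweep until metrics are available. This is a sketch only: sweep_livez is a hypothetical helper, the addresses you pass it are your own, and it assumes every agent listens on the default API port 8080.

```shell
# Sketch: check /livez on several agents and summarise the result.
# sweep_livez HOST...: prints one line per agent and returns non-zero
# if any agent did not answer 200. Assumes the default port 8080.
sweep_livez() {
  local failed=0
  local host
  local code
  for host in "$@"; do
    # curl prints 000 as the status code when the connection fails.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$host:8080/livez" || true)
    if [ "$code" = "200" ]; then
      echo "$host up"
    else
      echo "$host DOWN (HTTP $code)"
      failed=1
    fi
  done
  return "$failed"
}
```

Run from cron or a CI schedule, the non-zero return status can trigger whatever alerting you already have.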

Planned: Prometheus metrics

Metric-based monitoring through a Prometheus endpoint is planned. Until then, use the health endpoints, status commands, and logs described above.
