Recovery

Break-glass access to an Arctic agent and clearing a stuck cluster lock

This guide covers the break-glass paths for regaining control of an Arctic agent when normal operator access is unavailable, and for clearing a cluster lock that is stuck.

Recovery token

Each agent generates a recovery token at startup. The token is an out-of-band credential that grants admin scope on a per-request basis, intended for operators with local root access on the host.

  • Path: /etc/arctic/recovery.token
  • Contents: 32 random bytes, base64url-encoded
  • File mode: 0600, owned by the agent's user
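
The properties listed above can be sanity-checked before relying on the token. The following is a minimal sketch, not part of the Arctic tooling: check_recovery_token is a hypothetical helper name, it assumes GNU coreutils (stat -c, base64 -d), and because this guide does not say whether the encoded token keeps its padding, it accepts both forms.

```shell
# Hypothetical helper (not shipped with Arctic): verify that a token file
# has mode 0600 and decodes to exactly 32 bytes.
# Assumes GNU coreutils for `stat -c` and `base64 -d`.
check_recovery_token() {
  token_file="$1"
  [ -r "$token_file" ] || { echo "unreadable: $token_file" >&2; return 1; }

  mode="$(stat -c '%a' "$token_file")"
  [ "$mode" = "600" ] || { echo "unexpected mode: $mode" >&2; return 1; }

  # Map base64url back to standard base64, then restore stripped padding.
  b64="$(tr -d '\n' < "$token_file" | tr '_-' '/+')"
  while [ $(( ${#b64} % 4 )) -ne 0 ]; do b64="${b64}="; done

  bytes="$(printf '%s' "$b64" | base64 -d 2>/dev/null | wc -c)"
  [ "$bytes" -eq 32 ] || { echo "unexpected length: $bytes bytes" >&2; return 1; }
  echo "ok"
}
```

Run it as a user that can read the file, for example: sudo sh -c '. ./check.sh; check_recovery_token /etc/arctic/recovery.token' (check.sh is a placeholder file name).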

The token is presented to the agent in the X-Arctic-Recovery HTTP header. Any request carrying a matching token is granted admin scope and is logged at WARN so its use can be audited. The token does not establish a session: it authenticates each request that presents it, and it stays valid until the next agent restart.
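
For clients other than the CLI, the header can be built directly from the token file. This is a rough sketch only: recovery_header is a hypothetical helper, and this guide does not document the agent's HTTP endpoints, so the URL in the usage comment is a placeholder variable rather than a real address.

```shell
# Hypothetical helper: print the recovery header line for a given token file.
# The trailing newline in the file, if any, is stripped.
recovery_header() {
  printf 'X-Arctic-Recovery: %s' "$(tr -d '\n' < "$1")"
}

# Example use with curl. AGENT_URL is a placeholder you must set yourself;
# no endpoint path is implied by this guide.
#   curl -sS -H "$(recovery_header /etc/arctic/recovery.token)" "$AGENT_URL"
```

Keep in mind that a header passed on the command line is visible in the host's process list while the command runs.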

The token rotates on every restart

A fresh token is generated on every agent startup, which invalidates the previous one. Do not cache or distribute it for long-term use; read it fresh from /etc/arctic/recovery.token each time you need it, and restart the agent after recovery work completes to retire the token you used.

When to use it

Reach for the recovery token when:

  • Operator credentials are lost. The client ID / secret for the cluster are gone and you need to authenticate to rotate them or create new credentials.
  • You need to reach an api_access: internal peer. Internal-only peers reject requests from normal clients to user-facing endpoints; the recovery token is the only way to run operator operations against them.

Passing the token to the CLI

The CLI resolves the recovery token from three sources, in order of precedence:

  1. --recovery-token <value> flag
  2. $ARCTIC_RECOVERY_TOKEN environment variable
  3. --recovery-token-file <path> flag

For example, reading the token straight off the host:

arctic --recovery-token-file /etc/arctic/recovery.token peers list

Or with the value supplied through the environment variable:

ARCTIC_RECOVERY_TOKEN="$(sudo cat /etc/arctic/recovery.token)" \
  arctic credentials rotate
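
The precedence order listed above can be pictured as a simple fall-through. The function below is an illustration of that documented order only, not the CLI's actual implementation; resolve_recovery_token and its two positional arguments are hypothetical.

```shell
# Illustration only: mirrors the documented precedence, not the CLI's code.
#   $1 = value given to --recovery-token (empty if the flag is absent)
#   $2 = path given to --recovery-token-file (empty if the flag is absent)
resolve_recovery_token() {
  flag_value="$1"
  flag_file="$2"
  if [ -n "$flag_value" ]; then
    # 1. An explicit flag value wins.
    printf '%s' "$flag_value"
  elif [ -n "${ARCTIC_RECOVERY_TOKEN:-}" ]; then
    # 2. Otherwise fall back to the environment variable.
    printf '%s' "$ARCTIC_RECOVERY_TOKEN"
  elif [ -n "$flag_file" ] && [ -r "$flag_file" ]; then
    # 3. Otherwise read the token file, dropping any trailing newline.
    tr -d '\n' < "$flag_file"
  else
    return 1
  fi
}
```

One practical consequence of this order: a stale ARCTIC_RECOVERY_TOKEN left in the shell environment takes priority over --recovery-token-file, so unset it before reading a fresh token from disk.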

Recovery can be disabled

If the token path is not writable when the agent starts, recovery is disabled for that run; the agent does not fall back to open access. Internal-only mode then becomes strict and there is no break-glass path until the path is made writable and the agent is restarted.

The security boundary for the token is filesystem permissions alone. An attacker with local root on the host can read the token, but that attacker can already do anything to the host, so the token grants no additional reach.
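
Because an unwritable token path silently removes the break-glass path for that run, it can be worth checking writability before restarting an agent. A minimal sketch follows, assuming the check is run as the agent's user; recovery_path_writable is a hypothetical helper, and how the agent itself writes the file (in place, or via a temporary file in the same directory) is not specified here, so treat the result as a hint.

```shell
# Hypothetical pre-flight check: is the token path writable by the current user?
# Run it as the agent's user; root will usually pass regardless of mode bits.
recovery_path_writable() {
  token_path="$1"
  if [ -e "$token_path" ]; then
    # Existing file: it must be writable in place.
    [ -w "$token_path" ]
  else
    # No file yet: the parent directory must allow creating it.
    [ -w "$(dirname "$token_path")" ]
  fi
}

# Example:
#   recovery_path_writable /etc/arctic/recovery.token || echo "recovery would be disabled"
```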

Clearing a stuck cluster lock

Compose apply takes a cluster-wide lock so two operators do not apply conflicting changes at once. If an apply is interrupted (for example, when the machine running it is killed mid-run), the lock can be left held. Subsequent applies then fail with a contention error that includes the lock ID.

Release the stale lock with:

arctic state unlock <lock-id>

The contention error reports the lock ID to pass here. Useful flags:

Flag                 Description
--force              Admin override on the cluster tier; bypasses the holder-id check.
--cluster-only       Release only the cluster lock, not the local file lock.
--local-only         Release only the local file lock, not the cluster lock.
--state-dir <path>   Override the .arctic/ state directory location.

The unlock operation uses the cluster.lock scope. Only run it once you are sure no other apply is genuinely in progress; releasing a live lock can let two applies collide.
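
Since releasing a live lock is the main risk, one option is to wrap the unlock in a confirmation step. The wrapper below is a sketch under assumptions: unlock_with_confirmation is a hypothetical name, and placing extra flags after the lock ID is an assumed argument order, not something this guide specifies.

```shell
# Hypothetical wrapper: make the operator retype the lock ID before release.
# Extra arguments (for example --cluster-only) are forwarded unchanged;
# their position after the lock ID is an assumption.
unlock_with_confirmation() {
  lock_id="$1"
  shift
  printf 'Retype the lock ID to confirm no apply is in progress: ' >&2
  IFS= read -r answer
  if [ "$answer" != "$lock_id" ]; then
    echo "confirmation did not match; lock left in place" >&2
    return 1
  fi
  arctic state unlock "$lock_id" "$@"
}
```

Retyping the ID taken from the contention error forces a deliberate pause and guards against pasting a lock ID from an older, unrelated failure.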
