Troubleshooting
Diagnose and resolve common Arctic connectivity, handshake, and configuration problems
This page groups the problems operators hit most often into three areas: connectivity between agents, peer handshake failures, and configuration changes that do not take effect. Each entry lists the symptoms, the commands that narrow down the cause, and the fix.
Arctic v1.4.0 runs its TProxy and IP-tunnel data planes in-process and commits
firewall rules straight to the kernel over netlink. There is no separate proxy
daemon to restart, no .nft file on disk to edit, and no kernel WireGuard
interface to inspect. The commands below reflect that: you read kernel state to
diagnose, but the agent is the only thing that writes it.
Connectivity
Agent not responding
Symptoms: curl http://AGENT_IP:8080/livez times out or is refused, and CLI
commands fail with a connection error.
Check the service first:
systemctl status arctic
If it is not running, start it and read the recent log:
systemctl start arctic
journalctl -u arctic -n 50
If the service is up, confirm the agent is listening on the API port:
ss -tlnp | grep 8080
A healthy agent shows a listener owned by the arctic process:
LISTEN 0 4096 *:8080 *:* users:(("arctic",...))
If the process is listening but the host still refuses or drops the connection, a host firewall is the usual cause. These commands inspect the host's own firewall, which is separate from the tables Arctic manages:
# nftables
nft list ruleset | grep 8080
# iptables
iptables -L INPUT -n | grep 8080
# firewalld
firewall-cmd --list-ports
Open TCP 8080, or stop the conflicting firewall (see
Prerequisites for the
recommended host setup). If another process already holds port 8080, free it or
set API_PORT to a different value before starting the agent.
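A refused connection and a timeout point at different causes: refused usually means nothing is listening or a firewall is rejecting, while a timeout usually means packets are dropped somewhere on the path. The sketch below probes /livez and reports which case you hit. It uses only the Python standard library; the function name probe_livez is hypothetical, and the only Arctic details it relies on are the /livez path and port 8080 from this page.

```python
import socket
import urllib.error
import urllib.request

# Open URLs directly, ignoring any http_proxy set in the environment,
# so the probe talks to the agent itself rather than a proxy.
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def probe_livez(host: str, port: int = 8080, timeout: float = 5.0) -> str:
    """Return 'ok', 'http-<code>', 'refused', 'timeout', or 'error: <reason>'."""
    url = f"http://{host}:{port}/livez"
    try:
        with _direct.open(url, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else f"http-{resp.status}"
    except urllib.error.HTTPError as exc:  # must precede URLError, its base class
        return f"http-{exc.code}"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused"  # nothing listening, or a firewall rejecting the connection
        if isinstance(exc.reason, socket.timeout):
            return "timeout"  # connect never completed: packets dropped on the path
        return f"error: {exc.reason}"
    except socket.timeout:
        return "timeout"  # connected, but no response arrived in time
```

For example, `probe_livez("AGENT_IP")` returning "refused" sends you back to the ss and host-firewall checks, while "timeout" points at the network path.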
Peers cannot communicate
Symptoms: handshakes fail, heartbeats do not arrive, or peers show as unhealthy.
Test the API path from one host to the other:
curl http://PEER_IP:8080/livez
The IP tunnel carries non-TCP traffic over UDP 51840. Confirm that path is open as well:
nc -u PEER_IP 51840
UDP has no handshake, so a silent nc does not prove the datagrams arrived; treat it as a rough check. If either test fails, look at the route between the hosts:
traceroute PEER_IP
mtr PEER_IP
Both TCP 8080 and UDP 51840 must be reachable in both directions. When agents sit on different networks, check for NAT in the path and verify routing between the subnets.
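If you want one scripted check for both ports, the Python sketch below tests TCP 8080 with a full connect and UDP 51840 with a single empty datagram on a connected socket. It assumes an IPv4 peer and assumes the agent silently ignores a stray empty datagram on its tunnel port. On Linux, a connected UDP socket surfaces an ICMP port-unreachable as ECONNREFUSED, which is the only definite answer UDP can give; no reply at all means "open or filtered". Function names are hypothetical.

```python
import socket

def check_tcp(host: str, port: int = 8080, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_udp(host: str, port: int = 51840, timeout: float = 2.0) -> str:
    """Best-effort UDP probe of an IPv4 host.

    'closed'        ICMP port-unreachable came back (nothing listening)
    'open|filtered' no answer; UDP cannot tell these two apart
    'replied'       some datagram came back
    'error: ...'    any other socket failure
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect() on UDP only fixes the peer address; it is what lets the
            # kernel report a later ICMP port-unreachable as ECONNREFUSED.
            sock.connect((host, port))
            sock.send(b"")  # one empty probe datagram
            sock.recv(1)
            return "replied"
        except ConnectionRefusedError:
            return "closed"
        except socket.timeout:
            return "open|filtered"
        except OSError as exc:
            return f"error: {exc}"
```

Run it from each side of the link, for example `check_tcp("PEER_IP")` and `check_udp("PEER_IP")`, since both ports must work in both directions.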
Traffic is not being routed
Symptoms: a service exists but traffic does not flow, or packets are not being picked up by the proxy.
Confirm the service and its routes:
arctic services list
arctic services get SERVICE_ID
Inspect the firewall tables the agent commits to the kernel. TCP classification lives in the arctic table; tunnel marking lives in arctic_iptun:
nft list table inet arctic
nft list table inet arctic_iptun
You should see rules matching the source and destination CIDRs of your routes. If they are missing, check the firewall reconciler's log:
journalctl -u arctic | grep 'reconciler=firewall'
Resolution: verify the route CIDRs match the traffic you expect, force a cluster sync with arctic cluster sync, and confirm the source peer of the service is the agent you are testing from.
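To check the first point of that resolution, you can test whether a concrete flow actually falls inside a route's source and destination CIDRs. This is a small sketch with Python's ipaddress module; the list of (src_cidr, dst_cidr) pairs is a hypothetical representation you would copy by hand from arctic services get, not a documented output format.

```python
import ipaddress
from typing import Iterable, List, Tuple

Route = Tuple[str, str]  # (src_cidr, dst_cidr), copied by hand from the service

def matching_routes(src_ip: str, dst_ip: str, routes: Iterable[Route]) -> List[Route]:
    """Return every route whose CIDRs cover the flow src_ip -> dst_ip."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    hits = []
    for src_cidr, dst_cidr in routes:
        # strict=False tolerates CIDRs written with host bits set, e.g. 10.0.0.5/24
        src_net = ipaddress.ip_network(src_cidr, strict=False)
        dst_net = ipaddress.ip_network(dst_cidr, strict=False)
        # `in` is simply False across address families (IPv4 flow vs IPv6 route)
        if src in src_net and dst in dst_net:
            hits.append((src_cidr, dst_cidr))
    return hits
```

An empty result means no route covers the flow, so there is nothing for the firewall rules to classify and the traffic will not enter the proxy.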
MACVLAN interface not created
Symptoms: a service sets requires_interface but no interface appears, or the
interface exists without an address.
List interfaces and addresses:
ip link show
ip addr show
A service interface is named from the service ID, truncated to the kernel's 15-character limit, so look for a device matching the start of your service ID.
Check the network reconciler's log for the reason it was skipped or failed:
journalctl -u arctic | grep 'reconciler=network'
Resolution: confirm the host has a suitable parent interface, that the agent has
CAP_NET_ADMIN (the systemd unit grants it), and that the interface name does
not collide with an existing device.
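The naming rule can be checked from Python. This sketch assumes the interface name is exactly the first 15 characters of the service ID with nothing added before or after it (the page only says the name is truncated from the service ID), and the function names are hypothetical. It reads /sys/class/net, which lists the network devices visible in the current network namespace.

```python
import os
from typing import Iterable, Optional

IFNAMSIZ = 16  # Linux interface-name buffer size, including the trailing NUL byte

def expected_ifname(service_id: str) -> str:
    """First 15 characters of the service ID (assumed naming rule)."""
    return service_id[: IFNAMSIZ - 1]

def find_service_interface(service_id: str,
                           names: Optional[Iterable[str]] = None) -> Optional[str]:
    """Return the matching device name, or None if no such device exists."""
    if names is None:
        names = os.listdir("/sys/class/net")  # one entry per network device
    want = expected_ifname(service_id)
    return want if want in set(names) else None
```

With a made-up ID, `find_service_interface("svc_01HXYZABCDEFGHJK")` looks for a device named svc_01HXYZABCDE; a None result means the network reconciler never created it.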
DNS resolution
Symptoms: agents are unreachable by hostname, or lookups fail inside tunneled traffic.
nslookup HOSTNAME
dig HOSTNAME
Resolution: verify the host's resolvers, decide whether DNS should travel through Arctic at all, and add a route for the DNS server's IP if it should.
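To know which IP a DNS route would need to cover, start from the resolvers the host actually uses. The sketch below pulls nameserver entries out of resolv.conf-format text with Python's standard library. The loopback check reflects an assumption worth stating: a local stub resolver such as systemd-resolved's 127.0.0.53 never leaves the host, so in that setup the route would be for the stub's upstream servers rather than for the stub. Function names are hypothetical.

```python
import ipaddress
from typing import List

def nameservers(resolv_conf_text: str) -> List[str]:
    """Extract nameserver addresses from resolv.conf-format text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments ('#' or ';'), and non-nameserver options.
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        try:
            servers.append(str(ipaddress.ip_address(fields[1])))
        except ValueError:
            continue  # ignore malformed entries
    return servers

def needs_host_route(server: str) -> bool:
    """Loopback resolvers stay on the host, so routing them through Arctic is pointless."""
    return not ipaddress.ip_address(server).is_loopback
```

Typical use is `nameservers(open("/etc/resolv.conf").read())`, then checking each non-loopback address against the routes of the relevant service.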
High latency
Symptoms: traffic through Arctic is slow, or round-trip times are high.
Compare a direct path against the tunneled path, and look for loss:
ping PEER_IP
mtr DESTINATION
Check whether a bandwidth limit is shaping the service:
arctic services get SERVICE_ID
Resolution: raise or remove bandwidth_limit_mbps if it is set too low, consider KCP transport on lossy links or for short, interactive flows, and rule out congestion on the underlying network.
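A quick calculation tells you whether a configured limit alone explains what you see. This sketch assumes bandwidth_limit_mbps is in megabits per second with a decimal mega (10^6), and it ignores protocol overhead, so real transfers take somewhat longer than the figure it gives. Function names are hypothetical.

```python
def transfer_seconds(size_bytes: int, limit_mbps: float) -> float:
    """Lower bound on transfer time under a cap of limit_mbps megabits per second."""
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    return size_bytes * 8 / (limit_mbps * 1_000_000)  # bytes -> bits, Mbps -> bit/s

def required_mbps(size_bytes: int, target_seconds: float) -> float:
    """Smallest limit that lets size_bytes finish within target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return size_bytes * 8 / (target_seconds * 1_000_000)
```

For example, a 100 MB (100,000,000-byte) transfer under a 10 Mbps limit needs at least 80 seconds; finishing it in 8 seconds would need a limit of at least 100 Mbps. If observed times are close to these bounds, the limit is the bottleneck rather than the network.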
Collecting debug information
When you open a support ticket, attach the output of:
arctic version
systemctl status arctic
journalctl -u arctic -n 100
ip addr show
ip route show
nft list table inet arctic
nft list table inet arctic_iptun
arctic peers list
arctic services list
Handshake failures
How a handshake works
When you add a peer, the two agents run a challenge-response handshake before they trust each other:
- The initiator sends a 32-byte random challenge.
- Both sides sign a message built from the challenge, the peer IDs, and the license ID.
- Each side verifies the other's signature against the public key it holds.
Only after this succeeds do the peers accept each other's gossip. A failed handshake is retried up to three times, after which the peer is marked unreachable until something changes. See Clustering for the full trust model.
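To make the three steps concrete, here is a Python sketch of the message side of such a challenge-response exchange. It is not Arctic's wire format: the field order, the 4-byte length prefixes, and every function name are assumptions for illustration. Because this page does not name the signature scheme, verification is passed in as a callable (`verify`). The retry helper reads "retried up to three times" as one initial attempt plus three retries, which is also an assumption.

```python
import secrets
import struct
from typing import Callable

CHALLENGE_LEN = 32  # the initiator's random challenge is 32 bytes

# (public_key, message, signature) -> True if the signature is valid
Verify = Callable[[bytes, bytes, bytes], bool]

def new_challenge() -> bytes:
    """Fresh random challenge from the OS CSPRNG."""
    return secrets.token_bytes(CHALLENGE_LEN)

def handshake_message(challenge: bytes, initiator_id: str,
                      responder_id: str, license_id: str) -> bytes:
    """Bytes both sides sign (hypothetical layout: length-prefixed fields)."""
    if len(challenge) != CHALLENGE_LEN:
        raise ValueError("challenge must be 32 bytes")
    fields = [challenge] + [s.encode("utf-8") for s in (initiator_id, responder_id, license_id)]
    # A 4-byte big-endian length before each field keeps boundaries unambiguous,
    # so ("ab", "c") and ("a", "bc") can never produce the same message.
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)

def verify_peer(challenge: bytes, initiator_id: str, responder_id: str,
                license_id: str, signature: bytes, peer_public_key: bytes,
                verify: Verify) -> bool:
    """Check the peer's signature against the public key this side already holds."""
    message = handshake_message(challenge, initiator_id, responder_id, license_id)
    return verify(peer_public_key, message, signature)

def handshake_with_retries(attempt: Callable[[], bool], retries: int = 3) -> str:
    """One attempt plus up to `retries` retries, then give up as unreachable."""
    for _ in range(1 + retries):
        if attempt():
            return "trusted"
    return "unreachable"
```

In this sketch, the fresh challenge means a signature recorded from an earlier exchange does not verify for a new one, and because the license ID is part of the signed bytes, a signature made under one license does not verify for a message that names another.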
Common errors
Connection refused
Error: handshake failed: connection refused
The TCP connection to the remote agent could not be made. Confirm the remote
agent is up (curl http://REMOTE_IP:8080/livez), that the host is reachable
(ping REMOTE_IP), and that TCP 8080 is open.
Connection timeout
Error: handshake failed: connection timeout
A path exists but the connection does not complete. Look for a firewall dropping packets, a NAT in the way, or the remote agent listening on a different interface than the one you are reaching.
License mismatch
Error: handshake failed: license mismatch
The two agents were bootstrapped with licenses that carry different customer identities, so they refuse to join the same cluster. Compare the license on each agent:
# Local agent
arctic license status
# Remote agent
arctic license status --url http://REMOTE_IP:8080
If they differ, re-bootstrap one agent with the correct license.
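With more than two agents, it is quicker to collect every agent's identity and flag the odd one out. The unauthenticated /v1/cluster/identity endpoint returns license_id (and cluster_id) as JSON, so this Python sketch fetches it from each agent and reports any agent whose value differs from the majority. Function names and the host addresses in the usage line are hypothetical.

```python
import json
import urllib.request
from collections import Counter
from typing import Dict

# Talk to agents directly, ignoring any proxy configured in the environment.
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch_identity(host: str, port: int = 8080, timeout: float = 5.0) -> dict:
    """GET /v1/cluster/identity (no authentication needed) and decode the JSON."""
    url = f"http://{host}:{port}/v1/cluster/identity"
    with _direct.open(url, timeout=timeout) as resp:
        return json.load(resp)

def odd_ones_out(identities: Dict[str, dict], field: str = "license_id") -> Dict[str, str]:
    """Map host -> value for every agent whose `field` differs from the majority."""
    values = {host: ident.get(field) for host, ident in identities.items()}
    if not values:
        return {}
    # On a tie (e.g. two agents), the value seen first is treated as the majority.
    majority, _ = Counter(values.values()).most_common(1)[0]
    return {host: v for host, v in values.items() if v != majority}
```

For example, `odd_ones_out({h: fetch_identity(h) for h in ["10.0.0.11", "10.0.0.12", "10.0.0.13"]})` returns an empty dict when every license matches; pass `"cluster_id"` as the second argument to run the same check on cluster membership.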
Invalid signature
Error: handshake failed: invalid signature
The peer's signature did not verify against the public key this agent holds for it. This usually means the peer's key was corrupted or replaced. Re-bootstrap the affected agent, and contact support if the error recurs.
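A replaced key is easier to spot if you compare short fingerprints instead of long base64 strings. The unauthenticated /v1/cluster/identity endpoint reports each agent's public_key as base64; this Python sketch turns that into the first 16 hex digits of its SHA-256 hash. The fingerprint format is arbitrary and chosen for illustration, not something Arctic itself prints, and the function name is hypothetical.

```python
import base64
import binascii
import hashlib

def key_fingerprint(public_key_b64: str, length: int = 16) -> str:
    """Short SHA-256 fingerprint of a base64-encoded public key."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(public_key_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"public_key is not valid base64: {exc}") from exc
    return hashlib.sha256(raw).hexdigest()[:length]
```

Record the fingerprint of each agent's public_key while the cluster is healthy; if it changes after an invalid signature error, the key was replaced, and if the base64 no longer decodes, it was corrupted.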
Peer already exists
Error: peer already exists in cluster
The peer is already in the cluster. List peers to confirm:
arctic peers list
If you genuinely need to re-add it, delete it first:
arctic peers delete PEER_ID --yes
Node limit exceeded
Error: handshake failed: node limit exceeded
The license caps the number of nodes and the cluster is at that cap. Check the limit:
arctic license status
Remove unused peers to free a slot, or update to a license with a higher node count.
Debugging steps
Run the failing command with debug output, or trace the HTTP exchange:
arctic peers add REMOTE_IP:8080 --debug
arctic peers add REMOTE_IP:8080 --trace
Watch the logs on both agents while the handshake runs:
# Local agent
journalctl -u arctic -f
# Remote agent
ssh user@REMOTE_IP journalctl -u arctic -f
Read the remote agent's cluster identity. This endpoint needs no authentication, which makes it a quick way to confirm what cluster the remote agent thinks it belongs to:
curl http://REMOTE_IP:8080/v1/cluster/identity
{
"peer_id": "peer_01HXYZ...",
"public_key": "base64...",
"license_id": "lic_...",
"cluster_id": "clu_01HABC...",
"version": "v1.4.0"
}
Confirm license_id matches the rest of your cluster. A handshake needs traffic
in both directions, so test the reverse path too:
# Local to remote
curl http://REMOTE_IP:8080/livez
# Remote to local
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez
Firewall requirements
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 8080 | TCP | Bidirectional | Operator API and peer handshake |
| 51840 | UDP | Bidirectional | IP tunnel (non-TCP traffic) |
If agents sit behind NAT, forward TCP 8080 (and UDP 51840, which the IP tunnel needs) to each agent, give the public address when you add the peer, and keep the mapping stable.
Recovery steps
If handshakes keep failing after the checks above, restart both agents:
systemctl restart arctic
As a last resort, re-bootstrap an agent. This drops its local database and all state stored only on that node:
systemctl stop arctic
rm /opt/tillered/arctic.db
systemctl start arctic
arctic bootstrap --url http://localhost:8080 --license-file license.json
If the problem survives a re-bootstrap, contact support.
Configuration not applied
How configuration is applied
A change you make through the API or arctic compose apply does not touch the
kernel directly. It flows like this:
- The change is written to the agent's SQLite database.
- The write fires an event, and the relevant reconciler wakes up.
- The reconciler computes the desired state and applies it: the network reconciler manages MACVLAN interfaces, the firewall reconciler commits nftables rules over netlink, and the TProxy and IP-tunnel reconcilers push fresh config into their in-process engines.
There are no generated config files and nothing to reload by hand. If applied state drifts from the database, the fix is to get the reconciler to run again, not to edit a file.
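The reason hand edits do not stick is that a reconciler derives desired state only from the database and overwrites whatever it finds. The toy Python model below shows that property using an in-memory SQLite database. The `routes` table schema and the FakeTable class are hypothetical stand-ins; the real reconcilers commit nftables rules over netlink, which this sketch does not attempt.

```python
import sqlite3
from typing import List, Tuple

Rule = Tuple[str, str]  # (src_cidr, dst_cidr), a simplified stand-in for one rule

def desired_rules(db: sqlite3.Connection) -> List[Rule]:
    """Desired state, computed purely from the database (hypothetical schema)."""
    rows = db.execute("SELECT src_cidr, dst_cidr FROM routes ORDER BY src_cidr, dst_cidr")
    return [tuple(row) for row in rows]

class FakeTable:
    """Stand-in for a kernel table that the reconciler replaces wholesale."""
    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def replace(self, rules: List[Rule]) -> None:
        self.rules = list(rules)

def reconcile_once(db: sqlite3.Connection, table: FakeTable) -> bool:
    """Make applied state match the database; return True if anything changed."""
    want = desired_rules(db)
    if table.rules == want:
        return False  # already converged: running again is a no-op
    table.replace(want)  # any drift, including hand edits, is overwritten
    return True
```

Because each pass is idempotent, running it on every event and again on a timer is safe, and the only durable way to change the outcome is to change the database.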
Symptoms
- A service was created but traffic is not routed.
- Routes were updated but the old routing still applies.
- A bandwidth limit is not taking effect.
- A requires_interface service has no interface.
Diagnosis
Force a cluster sync and give it a few seconds:
arctic cluster sync
Or call the API directly:
curl -X POST http://AGENT_IP:8080/v1/cluster/sync \
  -H "Authorization: Bearer $TOKEN"
Read the reconciler logs. Each reconciler tags its log lines with a reconciler field, so you can filter to the one you care about:
journalctl -u arctic | grep -E 'reconciler=(network|firewall|tproxy|iptun)'
Inspect the kernel state the agent should have produced:
# Firewall classification and tunnel marking
nft list table inet arctic
nft list table inet arctic_iptun
# Service interfaces
ip link show
Common issues
Firewall rules missing
Symptoms: nft list table inet arctic does not show the rules you expect.
The agent owns these tables and rewrites them on every reconcile; you do not load them yourself. Look for an error in the firewall reconciler's log, then restart the agent to force a full rebuild:
journalctl -u arctic | grep 'reconciler=firewall'
systemctl restart arctic
TProxy engine not applying config
Symptoms: TCP routing reflects old service definitions.
Check the TProxy reconciler and engine log, then restart to re-apply from a clean state:
journalctl -u arctic | grep 'reconciler=tproxy'
systemctl restart arctic
IP tunnel not applying config
Symptoms: tunnels for non-TCP traffic are not established.
The tunnel runs inside the agent, so there is no separate interface or service to inspect. Confirm UDP 51840 is reachable between the peers, then read the reconciler log:
nc -u PEER_IP 51840
journalctl -u arctic | grep 'reconciler=iptun'
Restart the agent if the log shows the engine failing to start or apply.
MACVLAN interface missing
Symptoms: a requires_interface service has no interface.
journalctl -u arctic | grep 'reconciler=network'
Confirm the parent interface exists and that the interface name does not collide with an existing device.
Applied state does not match the database
Sometimes the database holds the right data but the kernel does not reflect it. Confirm what the database actually contains:
arctic services list --json
arctic routes list --service SERVICE_ID --json
If the data is correct, the reconciler either errored or never ran. Restart the agent to force every reconciler through a full pass:
systemctl restart arctic
How long changes take
A change normally applies within a second or two: the database write fires an
event and the reconciler runs immediately. As a backstop, every core reconciler
also resyncs on a 60-second timer, so a dropped event still self-corrects within
a minute. Cluster-wide changes additionally need a gossip round to reach other
peers; arctic cluster sync forces that round instead of waiting for the next
heartbeat.
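The event-plus-timer pattern described above can be sketched in a few lines of Python: block on an event queue, run immediately when an event arrives, and fall through to a resync when the wait times out. The 60-second interval comes from this page; everything else, including the function name and the None stop sentinel, is a hypothetical illustration of the pattern rather than Arctic's code.

```python
import queue
from typing import Callable, Optional

RESYNC_INTERVAL = 60.0  # seconds; the backstop timer described above

def run_reconciler(events: "queue.Queue[Optional[str]]",
                   reconcile: Callable[[str], None],
                   interval: float = RESYNC_INTERVAL) -> None:
    """Run reconcile() on every event, or after `interval` seconds with none.

    Put None on the queue to stop the loop.
    """
    while True:
        try:
            # Wake immediately when a database write posts an event.
            reason = events.get(timeout=interval)
        except queue.Empty:
            # No event for a full interval: resync anyway, which is what lets
            # a dropped event self-correct within one interval.
            reason = "timer"
        if reason is None:
            return
        reconcile(reason)
```

Because the reconcile pass itself is idempotent, the extra timer-driven runs cost little when nothing has changed.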
Collecting debug information
journalctl -u arctic --since "10 minutes ago"
arctic services list --json
arctic routes list --service SERVICE_ID --json
nft list table inet arctic
nft list table inet arctic_iptun
systemctl status arctic
See also
- Upgrades - upgrading agents and the CLI
- Recovery - break-glass access and clearing a stuck cluster lock
- Clustering - the trust and gossip model behind handshakes