Troubleshooting
How to diagnose and resolve common Arctic issues
Connectivity issues
This section helps you diagnose and resolve connectivity problems between Arctic agents and clients.
Agent not responding
Symptoms
- curl http://AGENT_IP:8080/livez times out or fails
- CLI commands fail with "connection refused" or timeout errors
Diagnosis
1. Check agent service status
systemctl status arctic

If the service is not running:
systemctl start arctic
journalctl -u arctic -n 50

2. Check agent is listening
ss -tlnp | grep 8080

Expected output shows the agent listening:
LISTEN 0 4096 *:8080 *:* users:(("arctic",...))

3. Check firewall rules
# iptables
iptables -L INPUT -n | grep 8080
# nftables
nft list ruleset | grep 8080
# firewalld
firewall-cmd --list-ports

Ensure port 8080 is allowed.
Resolution
- Start the agent service if stopped
- Open port 8080 in the firewall
- Check for conflicting services on port 8080
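If a conflict is suspected, one way to identify the process currently bound to port 8080 (a sketch; ss and lsof availability varies by distribution):
# Show whichever process owns TCP port 8080; a PID other than the arctic agent indicates a conflict
ss -tlnp 'sport = :8080'
# Alternative if lsof is installed
lsof -iTCP:8080 -sTCP:LISTEN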
Peers cannot communicate
Symptoms
- Peer handshake fails
- Heartbeats not being received
- Peers showing as unhealthy
Diagnosis
1. Test direct connectivity
From one agent host to another:
curl http://PEER_IP:8080/livez

2. Check UDP tunnel port
The IP tunnel uses UDP port 51840:
# Test UDP connectivity
nc -u PEER_IP 51840

3. Check network path
traceroute PEER_IP
mtr PEER_IP

Look for packet loss or high latency.
Resolution
- Ensure both TCP 8080 and UDP 51840 are open between agents
- Check for NAT issues if agents are on different networks
- Verify routing between the networks
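A quick way to confirm both ports from each side is a netcat probe (a sketch; -z and -u support differs between netcat variants, and a UDP probe only reports hard errors such as ICMP port unreachable):
# Run from host A against host B, then repeat in the opposite direction
nc -z -w 3 PEER_IP 8080 && echo "TCP 8080 reachable"
nc -uz -w 3 PEER_IP 51840 && echo "UDP 51840 probe sent without error"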
Traffic not being routed
Symptoms
- Services are created but traffic does not flow
- Packets are not being captured by TProxy
Diagnosis
1. Check service configuration
arctic services list
arctic services get SERVICE_ID

Verify the service exists and has routes.
2. Check NFTables rules
nft list ruleset | grep -A 10 arctic

Verify rules exist for your routes.
3. Check agent subsystems are running
journalctl -u arctic | grep -E "(tproxymgr|iptunmgr)"

Resolution
- Trigger a config sync:
  arctic cluster sync
- Verify routes match the traffic you expect to capture
- Check that source/dest CIDRs are correct
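To double-check a CIDR, a small sanity test can help (assumes python3 is available on the host; the IP and CIDR below are placeholders, not values from your configuration):
# Prints True when the client IP falls inside the route's source CIDR
python3 -c 'import ipaddress,sys; print(ipaddress.ip_address(sys.argv[1]) in ipaddress.ip_network(sys.argv[2]))' 10.0.1.25 10.0.1.0/24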
MACVLAN interface issues
Symptoms
- Service requires interface but none is created
- Interface exists but has no IP address
Diagnosis
1. List network interfaces
ip link show
ip addr show

Look for interfaces named after service IDs (first 15 characters).
2. Check Network Manager logs
journalctl -u arctic | grep netmgr

Resolution
- Verify the host has a suitable parent interface
- Check the agent has CAP_NET_ADMIN capability
- Ensure no interface name conflicts exist
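To confirm the host has a parent interface that can carry MACVLAN children, a throwaway test interface is one option (a sketch; eth0 is a placeholder for your parent interface, and the commands need root or CAP_NET_ADMIN):
# Create, inspect, then delete a temporary MACVLAN child
ip link add mvtest0 link eth0 type macvlan mode bridge
ip link show mvtest0
ip link delete mvtest0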
DNS resolution issues
Symptoms
- Cannot reach agents by hostname
- DNS lookups fail within tunneled traffic
Diagnosis
nslookup HOSTNAME
dig HOSTNAME

Resolution
- Verify DNS servers are configured correctly
- Check if DNS traffic should be routed through Arctic
- Add routes for DNS server IPs if needed
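To separate resolver configuration problems from path problems, query the DNS server directly (a sketch; 10.0.0.53 is a placeholder for your DNS server, and resolvectl is only present on systemd-resolved hosts):
# Query the resolver directly, bypassing the local stub
dig @10.0.0.53 HOSTNAME +time=2 +tries=1
# Compare with the system resolver path
resolvectl query HOSTNAME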
High latency
Symptoms
- Traffic through Arctic is slow
- High ping times between services
Diagnosis
1. Measure baseline latency
# Direct connection
ping PEER_IP
# Through Arctic tunnel
ping DESTINATION_THROUGH_TUNNEL

2. Check for packet loss
mtr DESTINATION

3. Check bandwidth limits
arctic services get SERVICE_ID

Look for bandwidth_limit_mbps.
Resolution
- Consider KCP transport for high-latency networks
- Increase or remove bandwidth limits
- Check for network congestion
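To compare throughput on the direct path and the tunneled path, iperf3 is one option (assumes iperf3 is installed on both hosts):
# On the remote host
iperf3 -s
# On the local host: direct path first, then the tunneled path
iperf3 -c PEER_IP -t 10
iperf3 -c DESTINATION_THROUGH_TUNNEL -t 10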
Collecting debug information
When reporting issues, collect:
# Agent version
arctic version
# Agent status
systemctl status arctic
# Agent logs
journalctl -u arctic -n 100
# Network configuration
ip addr show
ip route show
nft list ruleset
# Arctic configuration
arctic peers list
arctic services list

Handshake failures
This section helps you diagnose and resolve peer handshake failures when adding agents to your cluster.
Understanding handshakes
When you add a peer, Arctic performs a handshake:
- The local agent contacts the remote agent
- Both agents exchange Ed25519 public keys
- Both verify signatures against the shared license
- On success, both store each other's peer information
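Before adding a peer, a useful pre-flight check is to compare the license_id both agents report on the unauthenticated identity endpoint shown later in this section (a sketch; assumes jq is installed):
# Both values should match before the handshake is attempted
LOCAL_LIC=$(curl -s http://localhost:8080/v1/cluster/identity | jq -r .license_id)
REMOTE_LIC=$(curl -s http://REMOTE_IP:8080/v1/cluster/identity | jq -r .license_id)
[ "$LOCAL_LIC" = "$REMOTE_LIC" ] && echo "license IDs match" || echo "mismatch: $LOCAL_LIC vs $REMOTE_LIC"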
Common error messages
Connection Refused
Error: handshake failed: connection refused

Cause: Cannot establish a TCP connection to the remote agent.
Resolution:
- Verify the remote agent is running:
  curl http://REMOTE_IP:8080/livez
- Check network connectivity:
  ping REMOTE_IP
  telnet REMOTE_IP 8080
- Verify the firewall allows port 8080
Connection Timeout
Error: handshake failed: connection timeout

Cause: A network path exists but the connection cannot complete.
Resolution:
- Check for firewall rules blocking the connection
- Verify there are no NAT issues
- Check the remote agent is listening on the expected interface
License Mismatch
Error: handshake failed: license mismatch

Cause: The agents were bootstrapped with different licenses.
Resolution:
- Check license IDs on both agents:
  # On local agent
  arctic license show
  # On remote agent
  arctic license show --url http://REMOTE_IP:8080
- If they differ, re-bootstrap one agent with the correct license
Invalid Signature
Error: handshake failed: invalid signature

Cause: The peer's signature does not verify against the license public keys.
Resolution:
- This may indicate a tampered or corrupted peer key
- Re-bootstrap the affected agent
- If persistent, contact support
Peer Already Exists
Error: peer already exists in cluster

Cause: This peer was previously added to the cluster.
Resolution:
- List existing peers:
  arctic peers list
- The peer may already be connected
- If you need to re-add it, delete it first:
  arctic peers delete PEER_ID --yes
Node Limit Exceeded
Error: handshake failed: node limit exceeded

Cause: Your license allows a maximum number of nodes, and that limit has been reached.
Resolution:
- Check your license limits:
  arctic license show
- Remove unused peers to make room
- Contact your administrator to upgrade the license
Debugging steps
1. Enable debug logging
Run the CLI with debug output:
arctic peers add REMOTE_IP:8080 --debug

Or trace HTTP requests:
arctic peers add REMOTE_IP:8080 --trace

2. Check agent logs
View logs on both agents:
# Local agent
journalctl -u arctic -f
# Remote agent (via SSH)
ssh user@REMOTE_IP journalctl -u arctic -f

3. Verify cluster identity
Check the remote agent's cluster identity (no auth required):
curl http://REMOTE_IP:8080/v1/cluster/identity

Response shows:
{
"peer_id": "01HXYZ...",
"public_key": "base64...",
"license_id": "lic_...",
"cluster_id": "01HABC..."
}

Verify license_id matches your cluster.
4. Test network both directions
Handshakes require bidirectional communication. Test from both sides:
# From local to remote
curl http://REMOTE_IP:8080/livez
# From remote to local (via SSH)
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez

Firewall requirements
Ensure these ports are open:
| Port | Protocol | Direction | Purpose |
|---|---|---|---|
| 8080 | TCP | Bidirectional | API and handshake |
| 51840 | UDP | Bidirectional | IP tunnel |
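One way to open these ports with firewalld (adapt to whichever firewall tooling the host actually uses):
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --permanent --add-port=51840/udp
firewall-cmd --reload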
NAT considerations
If agents are behind NAT:
- Use port forwarding to expose port 8080
- Specify the public address when adding peers
- Consider a VPN for consistent addressing
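An illustrative port-forwarding setup on the NAT gateway, assuming iptables is in use (eth0, 192.168.1.10, and PUBLIC_IP are placeholders, not values from your environment):
# Forward the public ports to the agent's private address
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 -j DNAT --to-destination 192.168.1.10:8080
iptables -t nat -A PREROUTING -i eth0 -p udp --dport 51840 -j DNAT --to-destination 192.168.1.10:51840
# Then add the peer using the gateway's public address
arctic peers add PUBLIC_IP:8080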
Recovery steps
If handshakes consistently fail:
- Restart agents on both sides:
  systemctl restart arctic
- Re-bootstrap if needed (loses local state):
  # Stop agent
  systemctl stop arctic
  # Remove database
  rm /opt/tillered/arctic.db
  # Start and re-bootstrap
  systemctl start arctic
  arctic bootstrap --url http://localhost:8080 --license-file license.json
- Contact support if the issue persists after trying all steps
Configuration not applied
This section helps you diagnose and resolve issues when Arctic configuration changes are not being applied to the underlying services (TProxy, IP tunnel, NFTables).
Understanding configuration flow
When you create or modify services and routes:
- Changes are stored in the Arctic database
- Subsystem managers detect the changes
- Configuration files are regenerated
- Services (TProxy, IP tunnel) reload their config
- NFTables rules are updated
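To watch a change move through this flow end to end, one possible check is to force a sync and then confirm each stage picked it up (commands and names reused from the diagnosis steps below):
# Force a sync, then confirm each stage processed the change
arctic cluster sync
journalctl -u arctic --since "1 minute ago" | grep -E "(netmgr|fwmgr|tproxymgr|iptunmgr)"
nft list ruleset | grep -A 20 "table inet arctic"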
Symptoms
- Created a service but traffic is not being routed
- Updated routes but old routing still applies
- Bandwidth limits not taking effect
- MACVLAN interface not created
Diagnosis steps
1. Force configuration sync
First, try triggering a manual sync:
arctic cluster sync

Or via the API:

curl -X POST http://AGENT_IP:8080/v1/cluster/sync \
  -H "Authorization: Bearer $TOKEN"

Wait 10-30 seconds for configuration to propagate.
2. Check subsystem status
View agent logs for subsystem activity:
journalctl -u arctic | grep -E "(netmgr|fwmgr|tproxymgr|iptunmgr)"

Look for errors or warnings from each manager.
3. Verify generated configurations
Check the configuration files were generated:
# NFTables rules
cat /etc/nftables.d/arctic.nft
# or
nft list ruleset | grep -A 20 "table inet arctic"

4. Verify services are running
Check that the agent subsystems are active:
journalctl -u arctic | grep -E "(tproxymgr|iptunmgr)"

Common issues
Config file not updated
Symptoms: Config file has old content or missing entries.
Resolution:
- Check agent logs for write errors
- Verify disk space:
  df -h /opt/tillered
- Check file permissions:
  ls -la /opt/tillered/
NFTables rules not applied
Symptoms: nft list ruleset does not show expected rules.
Resolution:
- Check if the NFTables service is running:
  systemctl status nftables
- Manually reload rules:
  nft -f /etc/nftables.d/arctic.nft
- Check for syntax errors:
  nft -c -f /etc/nftables.d/arctic.nft
TProxy not reloading
Symptoms: TProxy config updated but old tunnels still active.
Resolution:
- Restart the agent to force a full reload:
  systemctl restart arctic
- Check agent logs for TProxy errors:
  journalctl -u arctic | grep tproxymgr
IP tunnel not reloading
Symptoms: IP tunnel config updated but tunnels not established.
Resolution:
- Verify the WireGuard interface exists:
  ip link show type wireguard
- Restart the agent if needed:
  systemctl restart arctic
- Check agent logs for IP tunnel errors:
  journalctl -u arctic | grep iptunmgr
MACVLAN interface missing
Symptoms: Service requires interface but it was not created.
Resolution:
- Check Network Manager logs:
  journalctl -u arctic | grep netmgr
- Verify the parent interface exists
- Check for name conflicts with existing interfaces
Database vs config mismatch
Sometimes the database has correct data but config generation fails.
Check database state
# Services in database
arctic services list -j
# Trigger a config regeneration
arctic cluster sync

Force regeneration
Restart the agent to force full config regeneration:
systemctl restart arctic

Timing issues
Configuration changes may take up to 60 seconds to apply automatically. The subsystem managers run on periodic intervals:
- Network Manager: Every 30 seconds
- Firewall Manager: Every 30 seconds
- TProxy Manager: Every 30 seconds
- IP Tunnel Manager: Every 30 seconds
Use arctic cluster sync to trigger immediate processing.
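If you want to watch a change land without waiting for the next interval, one option is to trigger a sync and poll the ruleset (a sketch; the table name is as used elsewhere on this page):
arctic cluster sync
watch -n 5 'nft list ruleset | grep -A 10 "table inet arctic"'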
Collecting debug information
When reporting issues:
# Agent logs
journalctl -u arctic --since "10 minutes ago"
# Current state
arctic services list -j
arctic routes list --service SERVICE_ID -j
# NFTables rules
nft list ruleset
# Process status
systemctl status arctic
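Optionally, the same output can be bundled into a single file to attach to a report (a sketch; the output filename is arbitrary):
# Collect the commands above into one timestamped file
{
  arctic version
  systemctl status arctic
  journalctl -u arctic --since "10 minutes ago"
  ip addr show
  ip route show
  nft list ruleset
  arctic services list -j
} > arctic-debug-$(date +%Y%m%d-%H%M).txt 2>&1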