Troubleshooting

How to diagnose and resolve common Arctic issues

Connectivity issues

This section helps you diagnose and resolve connectivity problems between Arctic agents and clients.

Agent not responding

Symptoms

  • curl http://AGENT_IP:8080/livez times out or fails
  • CLI commands fail with "connection refused" or timeout errors

Diagnosis

1. Check agent service status

systemctl status arctic

If the service is not running:

systemctl start arctic
journalctl -u arctic -n 50

2. Check agent is listening

ss -tlnp | grep 8080

Expected output shows the agent listening:

LISTEN  0  4096  *:8080  *:*  users:(("arctic",...))
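
A plain grep for 8080 also matches unrelated ports such as 18080. The sketch below is one stricter alternative: it matches the port only at the end of the local-address column. The function name is_listening is made up for illustration, and the column position assumes the usual ss -tln layout (state, Recv-Q, Send-Q, local address, peer address).

```shell
# Hypothetical helper: succeeds if the `ss -tln` output on stdin shows a
# listener on the given port. Assumes the local address is column 4.
is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage on an agent host:
#   ss -tln | is_listening 8080 && echo "port 8080 is listening"
```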

3. Check firewall rules

# iptables
iptables -L INPUT -n | grep 8080

# nftables
nft list ruleset | grep 8080

# firewalld
firewall-cmd --list-ports

Ensure port 8080 is allowed.

Resolution

  • Start the agent service if stopped
  • Open port 8080 in the firewall
  • Check for conflicting services on port 8080

Peers cannot communicate

Symptoms

  • Peer handshake fails
  • Heartbeats not being received
  • Peers showing as unhealthy

Diagnosis

1. Test direct connectivity

From one agent host to another:

curl http://PEER_IP:8080/livez

2. Check UDP tunnel port

The IP tunnel uses UDP port 51840:

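# Test UDP connectivity (-z sends no data, -v reports the result).
# UDP is connectionless, so "succeeded" only means no ICMP error came back.
nc -vzu PEER_IP 51840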

3. Check network path

traceroute PEER_IP
mtr PEER_IP

Look for packet loss or high latency.

Resolution

  • Ensure both TCP 8080 and UDP 51840 are open between agents
  • Check for NAT issues if agents are on different networks
  • Verify routing between the networks

Traffic not being routed

Symptoms

  • Services are created but traffic does not flow
  • Packets are not being captured by TProxy

Diagnosis

1. Check service configuration

arctic services list
arctic services get SERVICE_ID

Verify the service exists and has routes.

2. Check NFTables rules

nft list ruleset | grep -A 10 arctic

Verify rules exist for your routes.

3. Check agent subsystems are running

journalctl -u arctic | grep -E "(tproxymgr|iptunmgr)"

Resolution

  • Trigger a config sync: arctic cluster sync
  • Verify routes match the traffic you expect to capture
  • Check that source/dest CIDRs are correct
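
To check whether a given address actually falls inside a route's CIDR, a small IPv4-only helper along these lines may be useful. The function names ip_to_int and ip_in_cidr are hypothetical and not part of the Arctic CLI. The sketch assumes well-formed dotted-quad input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds if address $1 lies inside CIDR $2 (for example 10.0.0.0/16).
ip_in_cidr() {
  ip=$1
  net=${2%/*}          # part before the slash
  bits=${2#*/}         # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Usage:
#   ip_in_cidr 10.0.1.5 10.0.0.0/16 && echo "address is covered by the route"
```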

MACVLAN interface issues

Symptoms

  • Service requires interface but none is created
  • Interface exists but has no IP address

Diagnosis

1. List network interfaces

ip link show
ip addr show

Look for interfaces named after service IDs. Only the first 15 characters of the ID are used, which is the maximum length Linux allows for an interface name.
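
As a rough sketch, the expected interface name can be derived from a service ID and queried directly. This assumes the name is exactly the first 15 characters of the ID with no prefix or case change, which the agent may not guarantee. The helper name iface_for_service is hypothetical.

```shell
# Hypothetical helper: derive the expected interface name from a service ID.
# Assumption: the name is exactly the first 15 characters of the ID.
iface_for_service() {
  printf '%.15s\n' "$1"
}

# Usage on an agent host (SERVICE_ID is a placeholder):
#   ip addr show "$(iface_for_service SERVICE_ID)"
```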

2. Check Network Manager logs

journalctl -u arctic | grep netmgr

Resolution

  • Verify the host has a suitable parent interface
  • Check the agent has CAP_NET_ADMIN capability
  • Ensure no interface name conflicts exist
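
One way to check the capability is to read the effective capability mask of the running agent process and test the CAP_NET_ADMIN bit. This is a sketch, not an official procedure. The helper name has_net_admin is hypothetical, and the usage lines assume the agent runs as the main process of the arctic systemd unit.

```shell
# CAP_NET_ADMIN is capability number 12 in <linux/capability.h>.
# Succeeds if that bit is set in a hex capability mask such as the
# CapEff value from /proc/PID/status.
has_net_admin() {
  [ $(( (0x$1 >> 12) & 1 )) -eq 1 ]
}

# Usage on an agent host:
#   pid=$(systemctl show -p MainPID --value arctic)
#   mask=$(awk '/^CapEff:/ { print $2 }' "/proc/$pid/status")
#   has_net_admin "$mask" && echo "CAP_NET_ADMIN present"
```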

DNS resolution issues

Symptoms

  • Cannot reach agents by hostname
  • DNS lookups fail within tunneled traffic

Diagnosis

nslookup HOSTNAME
dig HOSTNAME

Resolution

  • Verify DNS servers are configured correctly
  • Check whether DNS traffic should be routed through Arctic
  • Add routes for DNS server IPs if needed

High latency

Symptoms

  • Traffic through Arctic is slow
  • High ping times between services

Diagnosis

1. Measure baseline latency

# Direct connection
ping PEER_IP

# Through Arctic tunnel
ping DESTINATION_THROUGH_TUNNEL

2. Check for packet loss

mtr DESTINATION

3. Check bandwidth limits

arctic services get SERVICE_ID

Look for bandwidth_limit_mbps.

Resolution

  • Consider KCP transport for high-latency networks
  • Increase or remove bandwidth limits
  • Check for network congestion
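
To compare the direct and tunneled paths numerically, the average round-trip time can be pulled out of ping's summary line. The helper name avg_rtt is hypothetical. The parsing assumes the summary format printed by iputils ping on Linux or by BSD ping; other implementations may differ.

```shell
# Hypothetical helper: print the average RTT (ms) from ping's summary line.
# Handles the Linux ("rtt ...") and BSD ("round-trip ...") formats, where
# the values follow the pattern min/avg/max/deviation.
avg_rtt() {
  awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# Usage (placeholders as in the diagnosis steps):
#   direct=$(ping -c 10 -q PEER_IP | avg_rtt)
#   tunnel=$(ping -c 10 -q DESTINATION_THROUGH_TUNNEL | avg_rtt)
#   echo "direct=${direct} ms, tunnel=${tunnel} ms"
```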

Collecting debug information

When reporting issues, collect:

# Agent version
arctic version

# Agent status
systemctl status arctic

# Agent logs
journalctl -u arctic -n 100

# Network configuration
ip addr show
ip route show
nft list ruleset

# Arctic configuration
arctic peers list
arctic services list

Handshake failures

This section helps you diagnose and resolve peer handshake failures when adding agents to your cluster.

Understanding handshakes

When you add a peer, Arctic performs a handshake:

  1. The local agent contacts the remote agent
  2. Both agents exchange Ed25519 public keys
  3. Both verify signatures against the shared license
  4. On success, both store each other's peer information

Common error messages

Connection Refused

Error: handshake failed: connection refused

Cause: Cannot establish TCP connection to the remote agent.

Resolution:

  1. Verify the remote agent is running:

    curl http://REMOTE_IP:8080/livez
  2. Check network connectivity:

    ping REMOTE_IP
    telnet REMOTE_IP 8080
  3. Verify firewall allows port 8080

Connection Timeout

Error: handshake failed: connection timeout

Cause: Network path exists but connection cannot complete.

Resolution:

  1. Check for firewall rules blocking the connection
  2. Verify there are no NAT issues
  3. Check the remote agent is listening on the expected interface
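
When probing an agent with curl, the exit status helps tell a refused connection from a timeout. The sketch below maps the documented curl exit statuses 6, 7 and 28 to the likely cause. The function names are hypothetical, and the 5-second limit is an arbitrary choice.

```shell
# Map a curl exit status to the likely cause. Status 7 means curl failed
# to connect (for example, connection refused); 28 means the operation
# timed out; 6 means the host name could not be resolved.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "could not resolve host: check DNS" ;;
    7)  echo "connection refused or unreachable: check agent and port" ;;
    28) echo "connection timeout: check firewall and NAT" ;;
    *)  echo "curl failed with exit status $1" ;;
  esac
}

# Hypothetical wrapper: probe an agent's /livez endpoint with a 5 s limit.
probe_agent() {
  status=0
  curl -s -o /dev/null --max-time 5 "http://$1:8080/livez" || status=$?
  explain_curl_exit "$status"
}

# Usage:
#   probe_agent REMOTE_IP
```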

License Mismatch

Error: handshake failed: license mismatch

Cause: The agents were bootstrapped with different licenses.

Resolution:

  1. Check license IDs on both agents:

    # On local agent
    arctic license show
    
    # On remote agent
    arctic license show --url http://REMOTE_IP:8080
  2. If different, re-bootstrap one agent with the correct license

Invalid Signature

Error: handshake failed: invalid signature

Cause: The peer's signature does not verify against the license public keys.

Resolution:

  1. This may indicate a tampered or corrupted peer key
  2. Re-bootstrap the affected agent
  3. If persistent, contact support

Peer Already Exists

Error: peer already exists in cluster

Cause: This peer was previously added to the cluster.

Resolution:

  1. List existing peers:

    arctic peers list
  2. The peer may already be connected

  3. If you need to re-add, delete first:

    arctic peers delete PEER_ID --yes

Node Limit Exceeded

Error: handshake failed: node limit exceeded

Cause: Your license has a maximum number of nodes.

Resolution:

  1. Check your license limits:

    arctic license show
  2. Remove unused peers to make room

  3. Contact your administrator to upgrade the license

Debugging steps

1. Enable debug logging

Run the CLI with debug output:

arctic peers add REMOTE_IP:8080 --debug

Or trace HTTP requests:

arctic peers add REMOTE_IP:8080 --trace

2. Check agent logs

View logs on both agents:

# Local agent
journalctl -u arctic -f

# Remote agent (via SSH)
ssh user@REMOTE_IP journalctl -u arctic -f

3. Verify cluster identity

Check the remote agent's cluster identity (no auth required):

curl http://REMOTE_IP:8080/v1/cluster/identity

Response shows:

{
  "peer_id": "01HXYZ...",
  "public_key": "base64...",
  "license_id": "lic_...",
  "cluster_id": "01HABC..."
}

Verify license_id matches your cluster.
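
The comparison can be scripted by extracting license_id from the identity response on both agents. This sketch uses sed so it works without jq. The helper name license_id_of is hypothetical, and the usage lines assume the local agent answers on http://localhost:8080.

```shell
# Hypothetical helper: extract license_id from the identity JSON on stdin.
# Assumes the field is a plain string without escaped quotes.
license_id_of() {
  sed -n 's/.*"license_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: compare the local and remote agents.
#   local_id=$(curl -s http://localhost:8080/v1/cluster/identity | license_id_of)
#   remote_id=$(curl -s http://REMOTE_IP:8080/v1/cluster/identity | license_id_of)
#   if [ "$local_id" = "$remote_id" ]; then echo "licenses match"; else echo "license mismatch"; fi
```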

4. Test network both directions

Handshakes require bidirectional communication. Test from both sides:

# From local to remote
curl http://REMOTE_IP:8080/livez

# From remote to local (via SSH)
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez

Firewall requirements

Ensure these ports are open:

Port    Protocol   Direction       Purpose
8080    TCP        Bidirectional   API and handshake
51840   UDP        Bidirectional   IP tunnel
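
On hosts that use firewalld, the two ports could be opened with commands like the ones printed by the sketch below. The function only prints the commands so they can be reviewed before running them as root. The function name is hypothetical, and the default zone is assumed. Hosts using plain nftables or iptables need equivalent rules in their own rule sets.

```shell
# Print (do not run) the firewalld commands that open the Arctic ports.
# Assumption: the ports belong in the default firewalld zone.
arctic_firewalld_cmds() {
  printf 'firewall-cmd --permanent --add-port=%s\n' 8080/tcp 51840/udp
  printf 'firewall-cmd --reload\n'
}

# Usage: review the output, then apply it as root:
#   arctic_firewalld_cmds
#   arctic_firewalld_cmds | sh
```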

NAT considerations

If agents are behind NAT:

  1. Use port forwarding to expose port 8080
  2. Specify the public address when adding peers
  3. Consider a VPN for consistent addressing

Recovery steps

If handshakes consistently fail:

  1. Restart agents on both sides:

    systemctl restart arctic
  2. Re-bootstrap if needed. This deletes the agent database, so all local state is lost:

    # Stop agent
    systemctl stop arctic
    
    # Remove database
    rm /opt/tillered/arctic.db
    
    # Start and re-bootstrap
    systemctl start arctic
    arctic bootstrap --url http://localhost:8080 --license-file license.json
  3. Contact support if the issue persists after trying all steps

Configuration not applied

This section helps you diagnose and resolve issues when Arctic configuration changes are not being applied to the underlying services (TProxy, IP tunnel, NFTables).

Understanding configuration flow

When you create or modify services and routes:

  1. Changes are stored in the Arctic database
  2. Subsystem managers detect the changes
  3. Configuration files are regenerated
  4. Services (TProxy, IP tunnel) reload their config
  5. NFTables rules are updated

Symptoms

  • Created a service but traffic is not being routed
  • Updated routes but old routing still applies
  • Bandwidth limits not taking effect
  • MACVLAN interface not created

Diagnosis steps

1. Force configuration sync

First, try triggering a manual sync:

# Using the CLI
arctic cluster sync

# Or using the API directly
curl -X POST http://AGENT_IP:8080/v1/cluster/sync \
  -H "Authorization: Bearer $TOKEN"

Wait 10-30 seconds for configuration to propagate.

2. Check subsystem status

View agent logs for subsystem activity:

journalctl -u arctic | grep -E "(netmgr|fwmgr|tproxymgr|iptunmgr)"

Look for errors or warnings from each manager.

3. Verify generated configurations

Check the configuration files were generated:

# NFTables rules
cat /etc/nftables.d/arctic.nft
# or
nft list ruleset | grep -A 20 "table inet arctic"

4. Verify services are running

Check that the agent subsystems are active:

journalctl -u arctic | grep -E "(tproxymgr|iptunmgr)"

Common issues

Config file not updated

Symptoms: Config file has old content or missing entries.

Resolution:

  1. Check agent logs for write errors
  2. Verify disk space: df -h /opt/tillered
  3. Check file permissions: ls -la /opt/tillered/

NFTables rules not applied

Symptoms: nft list ruleset does not show expected rules.

Resolution:

  1. Check if NFTables service is running:

    systemctl status nftables
  2. Manually reload rules:

    nft -f /etc/nftables.d/arctic.nft
  3. Check for syntax errors:

    nft -c -f /etc/nftables.d/arctic.nft

TProxy not reloading

Symptoms: TProxy config updated but old tunnels still active.

Resolution:

  1. Restart the agent to force a full reload:

    systemctl restart arctic
  2. Check agent logs for TProxy errors:

    journalctl -u arctic | grep tproxymgr

IP tunnel not reloading

Symptoms: IP tunnel config updated but tunnels not established.

Resolution:

  1. Verify WireGuard interface exists:

    ip link show type wireguard
  2. Restart the agent if needed:

    systemctl restart arctic
  3. Check agent logs for IP tunnel errors:

    journalctl -u arctic | grep iptunmgr

MACVLAN interface missing

Symptoms: Service requires interface but it was not created.

Resolution:

  1. Check Network Manager logs:

    journalctl -u arctic | grep netmgr
  2. Verify parent interface exists

  3. Check for name conflicts with existing interfaces

Database vs config mismatch

Sometimes the database has correct data but config generation fails.

Check database state

# Services in database
arctic services list -j

# Trigger a config regeneration
arctic cluster sync

Force regeneration

Restart the agent to force full config regeneration:

systemctl restart arctic

Timing issues

Configuration changes may take up to 60 seconds to apply automatically. The subsystem managers run on periodic intervals:

  • Network Manager: Every 30 seconds
  • Firewall Manager: Every 30 seconds
  • TProxy Manager: Every 30 seconds
  • IP Tunnel Manager: Every 30 seconds

Use arctic cluster sync to trigger immediate processing.
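
Rather than sleeping for a fixed time after a sync, a script can poll until the expected state appears or a deadline passes. The helper below is a generic sketch. The name retry_until and the 60-second and 5-second values are illustrative choices, not Arctic settings.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 once the timeout is reached.
retry_until() {
  timeout_s=$1; interval_s=$2; shift 2
  elapsed=0
  while ! "$@"; do
    elapsed=$((elapsed + interval_s))
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
  done
  return 0
}

# Usage: trigger a sync, then wait up to 60 s for the Arctic table to appear.
#   arctic cluster sync
#   retry_until 60 5 sh -c 'nft list ruleset | grep -q "table inet arctic"' ||
#     echo "rules still missing after 60 s, check the agent logs"
```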

Collecting debug information

When reporting issues:

# Agent logs
journalctl -u arctic --since "10 minutes ago"

# Current state
arctic services list -j
arctic routes list --service SERVICE_ID -j

# NFTables rules
nft list ruleset

# Process status
systemctl status arctic