Tillered Docs | Troubleshooting

Connectivity issues

This section helps you diagnose and resolve connectivity problems between Arctic agents and clients.

Agent not responding

Symptoms

curl http://AGENT_IP:8080/livez times out or fails
CLI commands fail with "connection refused" or timeout errors

Diagnosis

1. Check agent service status

systemctl status arctic

If the service is not running:

systemctl start arctic
journalctl -u arctic -n 50

2. Check agent is listening

ss -tlnp | grep 8080

Expected output shows the agent listening:

LISTEN  0  4096  *:8080  *:*  users:(("arctic",...))

3. Check firewall rules

# iptables
iptables -L INPUT -n | grep 8080

# nftables
nft list ruleset | grep 8080

# firewalld
firewall-cmd --list-ports

Ensure port 8080 is allowed.

Resolution

Start the agent service if stopped
Open port 8080 in the firewall
Check for conflicting services on port 8080
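The liveness check from the symptoms list can be scripted so that curl's exit status points at the next diagnostic step. This is a minimal sketch: `livez_hint` is a hypothetical helper name, not part of the Arctic CLI, and the mapping relies on curl's documented exit codes (6: could not resolve host, 7: failed to connect, 22: HTTP error with `-f`, 28: timeout).

```shell
# Map a curl exit status to the most likely next step from this section.
# livez_hint is a hypothetical helper, not an Arctic command.
livez_hint() {
  case "$1" in
    0)  echo "agent is responding" ;;
    6)  echo "hostname did not resolve: check DNS or use the agent IP" ;;
    7)  echo "connection failed: check the service status and that port 8080 is listening" ;;
    22) echo "agent answered with an HTTP error: check the agent logs" ;;
    28) echo "timed out: check firewall rules for port 8080" ;;
    *)  echo "curl exited with status $1: check the agent logs" ;;
  esac
}

# Usage (AGENT_IP is a placeholder, as elsewhere on this page):
#   curl -fsS --max-time 5 "http://AGENT_IP:8080/livez" >/dev/null; livez_hint $?
```

A refused connection (status 7) usually means nothing is listening on the port, while a timeout (status 28) more often means a firewall is dropping packets silently.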

Peers cannot communicate

Symptoms

Peer handshake fails
Heartbeats not being received
Peers showing as unhealthy

Diagnosis

1. Test direct connectivity

From one agent host to another:

curl http://PEER_IP:8080/livez

2. Check UDP tunnel port

The IP tunnel uses UDP port 51840:

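# Test UDP connectivity (-z: send no data, -v: report the result)
# UDP is connectionless, so "succeeded" only means no ICMP rejection came back
nc -vzu PEER_IP 51840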

3. Check network path

traceroute PEER_IP
mtr PEER_IP

Look for packet loss or high latency.

Resolution

Ensure both TCP 8080 and UDP 51840 are open between agents
Check for NAT issues if agents are on different networks
Verify routing between the networks
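When several peers are involved, the direct connectivity test can be looped over all of them. The sketch below covers only the TCP 8080 liveness endpoint; it does not confirm that UDP 51840 is open. `check_peers` and the `PROBE` override are hypothetical names introduced for illustration.

```shell
# Probe each peer's liveness endpoint and report the ones that fail.
# PROBE is a hypothetical override hook so the loop can be exercised without a network.
PROBE=${PROBE:-"curl -fsS --max-time 5 -o /dev/null"}

check_peers() {
  failed=0
  for peer in "$@"; do
    # $PROBE is left unquoted on purpose so it splits into command and options.
    if $PROBE "http://$peer:8080/livez" 2>/dev/null; then
      echo "ok   $peer"
    else
      echo "FAIL $peer"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every peer answered.
  [ "$failed" -eq 0 ]
}

# Usage: check_peers PEER_IP_1 PEER_IP_2
```

Running the same loop from each agent host shows whether a failure is one-directional, which often points to a firewall or NAT rule on one side.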

Traffic not being routed

Symptoms

Services are created but traffic does not flow
Packets are not being captured by TProxy

Diagnosis

1. Check service configuration

arctic services list
arctic services get SERVICE_ID

Verify the service exists and has routes.

2. Check NFTables rules

nft list ruleset | grep -A 10 arctic

Verify rules exist for your routes.

3. Check agent subsystems are running

journalctl -u arctic | grep -E "(tproxymgr|iptunmgr)"

Resolution

Trigger a config sync: arctic cluster sync
Verify routes match the traffic you expect to capture
Check that source/dest CIDRs are correct
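To check whether a given address actually falls inside a route's source or destination CIDR, the comparison can be done in plain shell arithmetic. This is an illustrative sketch for IPv4 only; `ip_to_int` and `ip_in_cidr` are hypothetical helper names, and the shell is assumed to use 64-bit arithmetic.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR $2 (for example 10.1.0.0/16).
ip_in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Usage:
#   ip_in_cidr 10.1.2.3 10.1.0.0/16 && echo "route should capture this address"
```

If the address of the traffic you expect to capture does not match any route CIDR, the route definition is the first thing to correct before looking at the firewall rules.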

MACVLAN interface issues

Symptoms

Service requires interface but none is created
Interface exists but has no IP address

Diagnosis

1. List network interfaces

ip link show
ip addr show

Look for interfaces named after service IDs (first 15 characters).
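The 15-character cut-off matches the Linux limit on interface name length. A small helper can derive the name to search for; this sketch assumes the name is exactly the first 15 characters of the service ID with no prefix, and `expected_iface` is a hypothetical helper name.

```shell
# Derive the interface name to look for from a service ID.
# Assumption: the name is the first 15 characters of the ID, with no prefix added.
expected_iface() {
  printf '%.15s\n' "$1"
}

# Usage (SERVICE_ID is a placeholder):
#   ip link show "$(expected_iface SERVICE_ID)"
```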

2. Check the Arctic Network Manager (netmgr) logs

journalctl -u arctic | grep netmgr

Resolution

Verify the host has a suitable parent interface
Check the agent has CAP_NET_ADMIN capability
Ensure no interface name conflicts exist

DNS resolution issues

Symptoms

Cannot reach agents by hostname
DNS lookups fail within tunneled traffic

Diagnosis

nslookup HOSTNAME
dig HOSTNAME

Resolution

Verify DNS servers are configured correctly
Check if DNS traffic should be routed through Arctic
Add routes for DNS server IPs if needed

High latency

Symptoms

Traffic through Arctic is slow
High ping times between services

Diagnosis

1. Measure baseline latency

# Direct connection
ping PEER_IP

# Through Arctic tunnel
ping DESTINATION_THROUGH_TUNNEL

2. Check for packet loss

mtr DESTINATION

3. Check bandwidth limits

arctic services get SERVICE_ID

Look for the bandwidth_limit_mbps field in the output.

Resolution

Consider KCP transport for high-latency networks
Increase or remove bandwidth limits
Check for network congestion
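To put a number on the overhead, the average round-trip time from both ping runs can be extracted and subtracted. This is a sketch that parses the summary line printed by common ping implementations; `ping_avg_ms` and `tunnel_overhead_ms` are hypothetical helper names.

```shell
# Extract the average round-trip time (ms) from ping's summary line on stdin.
# Handles "rtt min/avg/max/mdev = ..." (Linux iputils) and
# "round-trip min/avg/max/stddev = ..." (BSD/macOS).
ping_avg_ms() {
  awk -F'/' '/^(rtt|round-trip) / { print $5 }'
}

# Print how many milliseconds the tunnel adds: $1 = direct avg, $2 = tunnel avg.
tunnel_overhead_ms() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.1f\n", t - d }'
}

# Usage:
#   direct=$(ping -c 10 PEER_IP | ping_avg_ms)
#   tunnel=$(ping -c 10 DESTINATION_THROUGH_TUNNEL | ping_avg_ms)
#   tunnel_overhead_ms "$direct" "$tunnel"
```

Comparing averages over at least ten probes smooths out single slow packets; a large difference that persists across runs is worth checking against the bandwidth limit and transport settings.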

Collecting debug information

When reporting issues, collect:

# Agent version
arctic version

# Agent status
systemctl status arctic

# Agent logs
journalctl -u arctic -n 100

# Network configuration
ip addr show
ip route show
nft list ruleset

# Arctic configuration
arctic peers list
arctic services list
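The commands above can be bundled into one report file to attach to a support request. The sketch below uses only the commands listed in this section; `run_section` and `collect_debug` are hypothetical helper names, and a failing or missing command is noted rather than aborting the run.

```shell
# Run one command under a labelled header; keep going if it fails or is missing.
run_section() {
  title=$1; shift
  echo "=== $title ==="
  "$@" 2>&1 || echo "(command failed or not available: $*)"
  echo
}

# Collect everything listed in this section, in the same order.
collect_debug() {
  run_section "Agent version"   arctic version
  run_section "Agent status"    systemctl status arctic
  run_section "Agent logs"      journalctl -u arctic -n 100
  run_section "Addresses"       ip addr show
  run_section "Routes"          ip route show
  run_section "nftables rules"  nft list ruleset
  run_section "Peers"           arctic peers list
  run_section "Services"        arctic services list
}

# Usage: collect_debug > arctic-debug.txt
```

Review the file before sharing it, since the ruleset and peer list can reveal internal addresses.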

Handshake failures

This section helps you diagnose and resolve peer handshake failures when adding agents to your cluster.

Understanding handshakes

When you add a peer, Arctic performs a handshake:

The local agent contacts the remote agent
Both agents exchange Ed25519 public keys
Both verify signatures against the shared license
On success, both store each other's peer information

Common error messages

Connection Refused

Error: handshake failed: connection refused

Cause: Cannot establish TCP connection to the remote agent.

Resolution:

Verify the remote agent is running:
```
curl http://REMOTE_IP:8080/livez
```
Check network connectivity:
```
ping REMOTE_IP
telnet REMOTE_IP 8080
```
Verify firewall allows port 8080

Connection Timeout

Error: handshake failed: connection timeout

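Cause: A network path to the remote host exists, but the TCP connection does not complete before the timeout. Packets are typically being dropped silently rather than rejected.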

Resolution:

Check for firewall rules blocking the connection
Verify there are no NAT issues
Check the remote agent is listening on the expected interface

License Mismatch

Error: handshake failed: license mismatch

Cause: The agents were bootstrapped with different licenses.

Resolution:

Check license IDs on both agents:

# On local agent
arctic license show

# On remote agent
arctic license show --url http://REMOTE_IP:8080

If different, re-bootstrap one agent with the correct license

Invalid Signature

Error: handshake failed: invalid signature

Cause: The peer's signature does not verify against the license public keys.

Resolution:

This may indicate a tampered or corrupted peer key
Re-bootstrap the affected agent
If persistent, contact support

Peer Already Exists

Error: peer already exists in cluster

Cause: This peer was previously added to the cluster.

Resolution:

List existing peers:
```
arctic peers list
```
The peer may already be connected
If you need to re-add, delete first:
```
arctic peers delete PEER_ID --yes
```

Node Limit Exceeded

Error: handshake failed: node limit exceeded

Cause: Your license has a maximum number of nodes.

Resolution:

Check your license limits:
```
arctic license show
```
Remove unused peers to make room
Contact your administrator to upgrade the license
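The error messages above can be matched in a script to print the first resolution step for each. This is an illustrative sketch: `handshake_hint` is a hypothetical helper, and it assumes the CLI prints the error text exactly as shown in this section.

```shell
# Map a handshake error message to the first resolution step from this section.
# handshake_hint is a hypothetical helper, not an Arctic command.
handshake_hint() {
  case "$1" in
    *"connection refused"*)  echo "check the remote agent is running and port 8080 is allowed" ;;
    *"connection timeout"*)  echo "check firewall rules, NAT, and the remote listening interface" ;;
    *"license mismatch"*)    echo "compare 'arctic license show' on both agents" ;;
    *"invalid signature"*)   echo "re-bootstrap the affected agent; contact support if it persists" ;;
    *"peer already exists"*) echo "run 'arctic peers list'; delete the peer first if it must be re-added" ;;
    *"node limit exceeded"*) echo "check license limits and remove unused peers" ;;
    *)                       echo "unrecognised error: rerun with --debug and check the agent logs" ;;
  esac
}

# Usage:
#   handshake_hint "$(arctic peers add REMOTE_IP:8080 2>&1)"
```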

Debugging steps

1. Enable debug logging

Run the CLI with debug output:

arctic peers add REMOTE_IP:8080 --debug

Or trace HTTP requests:

arctic peers add REMOTE_IP:8080 --trace

2. Check agent logs

View logs on both agents:

# Local agent
journalctl -u arctic -f

# Remote agent (via SSH)
ssh user@REMOTE_IP journalctl -u arctic -f

3. Verify cluster identity

Check the remote agent's cluster identity (no auth required):

curl http://REMOTE_IP:8080/v1/cluster/identity

Response shows:

{
  "peer_id": "01HXYZ...",
  "public_key": "base64...",
  "license_id": "lic_...",
  "cluster_id": "01HABC..."
}

Verify license_id matches your cluster.
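The comparison can be automated by extracting license_id from the identity response on each agent. This sketch uses sed so that it has no dependency on a JSON tool; `license_id_of` is a hypothetical helper name, and it assumes the response has the shape shown above.

```shell
# Pull the license_id value out of an identity response read from stdin.
# Works on both pretty-printed and single-line JSON of the shape shown above.
license_id_of() {
  sed -n 's/.*"license_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: compare the local and remote agents.
#   local_id=$(curl -s http://localhost:8080/v1/cluster/identity | license_id_of)
#   remote_id=$(curl -s http://REMOTE_IP:8080/v1/cluster/identity | license_id_of)
#   [ "$local_id" = "$remote_id" ] && echo "licenses match" || echo "license mismatch"
```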

4. Test network both directions

Handshakes require bidirectional communication. Test from both sides:

# From local to remote
curl http://REMOTE_IP:8080/livez

# From remote to local (via SSH)
ssh user@REMOTE_IP curl http://LOCAL_IP:8080/livez

Firewall requirements

Ensure these ports are open:

| Port  | Protocol | Direction     | Purpose           |
|-------|----------|---------------|-------------------|
| 8080  | TCP      | Bidirectional | API and handshake |
| 51840 | UDP      | Bidirectional | IP tunnel         |
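On hosts that use firewalld, the two ports can be opened with `firewall-cmd`. The sketch below only prints the commands so they can be reviewed first; `firewalld_open_cmds` is a hypothetical helper name, and the rules it prints allow traffic from any source, which may be broader than you want.

```shell
# Print (not run) the firewalld commands that open the two Arctic ports.
# Review the output, then run the commands as root to apply them.
firewalld_open_cmds() {
  for rule in 8080/tcp 51840/udp; do
    echo "firewall-cmd --permanent --add-port=$rule"
  done
  # Permanent rules take effect only after a reload.
  echo "firewall-cmd --reload"
}

# Usage:
#   firewalld_open_cmds          # review
#   firewalld_open_cmds | sh     # apply (as root)
```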

NAT considerations

If agents are behind NAT:

Use port forwarding to expose port 8080
Specify the public address when adding peers
Consider a VPN for consistent addressing

Recovery steps

If handshakes consistently fail:

Restart agents on both sides:
```
systemctl restart arctic
```

Re-bootstrap if needed (loses local state):

# Stop agent
systemctl stop arctic

# Remove database
rm /opt/tillered/arctic.db

# Start and re-bootstrap
systemctl start arctic
arctic bootstrap --url http://localhost:8080 --license-file license.json

Contact support if the issue persists after trying all steps
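Because re-bootstrapping discards local state, it can be safer to move the database aside instead of deleting it, so that it can be restored or sent to support. This is a sketch, not an official procedure: `backup_db` is a hypothetical helper, and the default path is the one used in the steps above.

```shell
# Move the database aside with a timestamp suffix rather than deleting it.
# backup_db is a hypothetical helper; run it only while the agent is stopped.
backup_db() {
  db=${1:-/opt/tillered/arctic.db}
  [ -f "$db" ] || { echo "no database at $db" >&2; return 1; }
  dest="$db.$(date +%Y%m%d-%H%M%S).bak"
  # Print the backup location on success so it can be recorded.
  mv -- "$db" "$dest" && echo "$dest"
}

# Usage, in place of the rm step:
#   systemctl stop arctic
#   backup_db
#   systemctl start arctic
```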

Configuration not applied

This section helps you diagnose and resolve issues when Arctic configuration changes are not being applied to the underlying services (TProxy, IP tunnel, NFTables).

Understanding configuration flow

When you create or modify services and routes:

Changes are stored in the Arctic database
Subsystem managers detect the changes
Configuration files are regenerated
Services (TProxy, IP tunnel) reload their config
NFTables rules are updated

Symptoms

Created a service but traffic is not being routed
Updated routes but old routing still applies
Bandwidth limits not taking effect
MACVLAN interface not created

Diagnosis steps

1. Force configuration sync

First, try triggering a manual sync:

# Using the CLI
arctic cluster sync

# Or using the API
curl -X POST http://AGENT_IP:8080/v1/cluster/sync \
  -H "Authorization: Bearer $TOKEN"

Wait 10-30 seconds for configuration to propagate.
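Rather than sleeping for a fixed time, a script can poll until the expected state appears or a deadline passes. The sketch below is a generic retry loop; `wait_until` is a hypothetical helper name, and the example check assumes the rules live in a table named `inet arctic`.

```shell
# Retry a command every INTERVAL seconds until it succeeds or TIMEOUT seconds pass.
# Arguments: TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
  timeout=$1; interval=$2; shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    # Give up once the deadline has been reached.
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage: wait up to 30 s for the Arctic table to show up in the ruleset.
#   arctic cluster sync
#   wait_until 30 5 sh -c 'nft list ruleset | grep -q "table inet arctic"'
```

If the loop times out, continue with the subsystem checks in the following steps.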

2. Check subsystem status

View agent logs for subsystem activity:

journalctl -u arctic | grep -E "(netmgr|fwmgr|tproxymgr|iptunmgr)"

Look for errors or warnings from each manager.
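A quick per-manager count makes it easier to see which subsystem is reporting problems. This sketch assumes each log line contains the manager name and a level such as "error" or "warn"; the actual log format may differ, and `manager_problems` is a hypothetical helper name.

```shell
# Count log lines per subsystem manager that look like errors or warnings.
# Assumption: lines mention the manager name and the word "error" or "warn".
manager_problems() {
  awk '
    BEGIN { split("netmgr fwmgr tproxymgr iptunmgr", m, " ") }
    tolower($0) ~ /error|warn/ {
      for (i = 1; i <= 4; i++) if (index($0, m[i])) n[m[i]]++
    }
    END { for (i = 1; i <= 4; i++) printf "%s %d\n", m[i], n[m[i]] }
  '
}

# Usage:
#   journalctl -u arctic --since "10 minutes ago" | manager_problems
```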

3. Verify generated configurations

Check the configuration files were generated:

# NFTables rules
cat /etc/nftables.d/arctic.nft
# or
nft list ruleset | grep -A 20 "table inet arctic"

4. Verify services are running

Check that the agent subsystems are active:

journalctl -u arctic | grep -E "(tproxymgr|iptunmgr)"

Common issues

Config file not updated

Symptoms: Config file has old content or missing entries.

Resolution:

Check agent logs for write errors
Verify disk space: df -h /opt/tillered
Check file permissions: ls -la /opt/tillered/

NFTables rules not applied

Symptoms: nft list ruleset does not show expected rules.

Resolution:

Check if NFTables service is running:
```
systemctl status nftables
```
Manually reload rules:
```
nft -f /etc/nftables.d/arctic.nft
```
Check for syntax errors:
```
nft -c -f /etc/nftables.d/arctic.nft
```

TProxy not reloading

Symptoms: TProxy config updated but old tunnels still active.

Resolution:

Restart the agent to force a full reload:
```
systemctl restart arctic
```
Check agent logs for TProxy errors:
```
journalctl -u arctic | grep tproxymgr
```

IP tunnel not reloading

Symptoms: IP tunnel config updated but tunnels not established.

Resolution:

Verify WireGuard interface exists:
```
ip link show type wireguard
```
Restart the agent if needed:
```
systemctl restart arctic
```
Check agent logs for IP tunnel errors:
```
journalctl -u arctic | grep iptunmgr
```

MACVLAN interface missing

Symptoms: Service requires interface but it was not created.

Resolution:

Check the Arctic Network Manager (netmgr) logs:
```
journalctl -u arctic | grep netmgr
```
Verify parent interface exists
Check for name conflicts with existing interfaces

Database vs config mismatch

Sometimes the database has correct data but config generation fails.

Check database state

# Services in database
arctic services list -j

# Trigger a config regeneration
arctic cluster sync

Force regeneration

Restart the agent to force full config regeneration:

systemctl restart arctic

Timing issues

Configuration changes may take up to 60 seconds to apply automatically. The subsystem managers run on periodic intervals:

Network Manager: Every 30 seconds
Firewall Manager: Every 30 seconds
TProxy Manager: Every 30 seconds
IP Tunnel Manager: Every 30 seconds

Use arctic cluster sync to trigger immediate processing.

Collecting debug information

When reporting issues:

# Agent logs
journalctl -u arctic --since "10 minutes ago"

# Current state
arctic services list -j
arctic routes list --service SERVICE_ID -j

# NFTables rules
nft list ruleset

# Process status
systemctl status arctic
