Tillered Docs

Choosing a transport

Understanding when to use the TCP or KCP transport for a service, and why

Arctic carries each service's traffic between peers over one of two transports: TCP or KCP. Both deliver a reliable, ordered stream of bytes; what differs is the conditions they are built for, so the right choice comes down to the shape of your traffic and how clean the network path is.

The Two Transports

TCP

The protocol almost everything already runs on, handled by the operating system. Steady and efficient, but it reads loss as congestion and slows down.

KCP

An ARQ protocol over UDP, tuned for low latency on lossy links, with forward error correction and other modules layered on top. It recovers from loss faster and keeps moving, trading extra bandwidth and CPU to do so.

KCP is the less familiar of the two. It was originally built for real-time, interactive traffic such as online games, where a brief stall hurts more than a little extra bandwidth. At its core it is an ARQ protocol over UDP: like TCP it retransmits lost packets, but it is tuned to resend them sooner and more selectively, winning back latency at the cost of bandwidth. On top of that base sit further modules, chief among them forward error correction, which sends redundant data so the receiver can often reconstruct a loss without waiting for a resend. The cost is bandwidth and CPU, which is why TCP stays the default on clean links.
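The sketch below shows why forward error correction saves a round trip. It uses the simplest possible code: one XOR parity packet per group of equal-length data packets. This is a stand-in, not what Arctic actually runs. kcp-go, one widely used KCP implementation, uses Reed-Solomon codes, which can repair several losses per group. The function names and the group size of three are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (assumes packets are padded)."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """Sender side: one parity packet covering a group of data packets."""
    return reduce(xor_bytes, packets)


def recover(received: list[bytes | None], parity: bytes) -> list[bytes]:
    """Receiver side: rebuild at most one missing packet (marked None).

    XOR-ing every surviving packet into the parity leaves exactly the
    missing packet, so a single loss is repaired without a resend.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity repairs only one loss per group")
    if not missing:
        return list(received)
    survivors = [p for p in received if p is not None]
    out = list(received)
    out[missing[0]] = reduce(xor_bytes, survivors, parity)
    return out


if __name__ == "__main__":
    group = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(group)
    print(recover([b"AAAA", None, b"CCCC"], parity))  # second packet rebuilt
```

With three data packets per parity packet, this scheme spends an extra third of the bandwidth to survive one loss per group. That is the same bandwidth-for-latency trade described above, on a small scale.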

Selecting kcp does not give you raw KCP. Arctic runs that core with a fixed configuration tuned for its own workloads, multiplexes many connections over a single session, and carries its own proxy framing. These modifications keep evolving to suit Arctic's needs, but the trade-offs below are inherent to KCP.

Which Transport to Use

How to choose between TCP and KCP:

  • Lossy link? Yes: use KCP. No: ask the next question.
  • Short, bursty, or interactive traffic? Yes: use KCP. No: use TCP, the default.

When in doubt, use TCP. It is the default and the right answer for most traffic.

Both transports tolerate the occasional dropped packet; every link loses one now and then, and TCP handles that fine. "Lossy" means a sustained loss rate high enough to keep tripping TCP's back-off. Wireless, cellular, satellite, long-haul international, and congested or oversubscribed links are the usual culprits.

To gauge a path, run ping or mtr between the two peers over a representative window and watch the packet loss figure. As a rough guide, TCP is usually fine below about 1% sustained loss; at a few percent or more, KCP starts to earn its overhead. Loss often varies with time of day and congestion, so measure under the conditions the service will actually run in, not just during a quiet moment.
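The measurement and the decision guide can be combined in a short Python sketch. It assumes ping's usual summary line (`N packets transmitted, M received, X% packet loss`, as printed by the common Linux and macOS implementations). It averages several runs to approximate sustained loss, then applies the guide's two questions. The function names and the `lossy_threshold_pct` parameter are hypothetical. The 1% default is only the rough guide from the text, so raise it toward a few percent if you want to stay on TCP longer.

```python
from __future__ import annotations

import re
from statistics import mean

# Summary line printed by common ping implementations, e.g.
#   "100 packets transmitted, 97 received, 3% packet loss, time 99123ms"
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_loss_pct(ping_output: str) -> float:
    """Extract the packet-loss percentage from one ping run's output."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def sustained_loss_pct(ping_outputs: list[str]) -> float:
    """Average loss over several runs spread across a representative window."""
    return mean([parse_loss_pct(out) for out in ping_outputs])


def choose_transport(loss_pct: float, short_bursty_or_interactive: bool,
                     lossy_threshold_pct: float = 1.0) -> str:
    """Apply the decision guide; returns a transport_type value."""
    if loss_pct >= lossy_threshold_pct:   # lossy link: KCP
        return "kcp"
    if short_bursty_or_interactive:       # clean link, short/interactive flows
        return "kcp"
    return "tcp"                          # the default for everything else
```

To collect the input, capture several runs at different times of day. For example, call `subprocess.run(["ping", "-c", "100", host], capture_output=True, text=True).stdout`, where `host` is a placeholder for the remote peer's address.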

Setting the Transport

Each service picks its transport with the transport_type field. In a compose cluster.yaml:

services:
  - name: backup-tunnel
    source_peer: site-a
    target_peer: site-b
    transport_type: tcp   # or "kcp"

TCP is the default. The two are different transports with different behaviour, not a tuning switch, so it is worth setting deliberately per service. You can also set it with --transport when creating a service from the CLI. See Service management for the full workflow.

How the Transports Differ

At its core this is a trade between throughput and latency: KCP gives up bandwidth efficiency to lower latency. It is not more bandwidth-efficient than TCP on a clean link, and it is not faster in every case. Pick KCP when latency and loss recovery matter most, and TCP when sustained throughput matters most.

Figure: KCP trades throughput for lower latency. Gaining one gives up the other: TCP sits at the throughput end, KCP at the low-latency end.

The table summarises how that trade plays out across each axis, with the detail below.

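|               | TCP                             | KCP                                 |
|---------------|---------------------------------|-------------------------------------|
| Best for      | Large, long, steady transfers   | Short, bursty, or interactive flows |
| Network path  | Clean links, even long-distance | Lossy links                         |
| Startup       | Slow warm-up                    | Starts fast                         |
| On packet loss| Backs off and slows down        | Recovers and keeps moving           |
| Bandwidth use | Efficient                       | Higher (adds redundancy)            |
| CPU cost      | Lower                           | Higher                              |
| Default       | Yes                             | No                                  |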

Flow Length

TCP ramps up slowly and then sustains a high rate. KCP starts moving immediately but holds a lower ceiling. So a short flow can finish before TCP ever reaches its top speed, while a long flow gives TCP time to pull ahead and stay there.

Figure: Throughput over the life of a flow (throughput against time). Where short flows end, KCP is ahead; where long flows end, TCP is ahead.

The longer and larger the transfer, the more it favours TCP. The more it is made of many small files or short connections, the more it favours KCP.

High latency sharpens this rather than changing it. TCP's warm-up is paced by round trips, so on a high-latency link it takes even longer to reach full speed, and short flows fall further behind. But high latency on its own is not a reason to pick KCP: a clean, high-latency link carrying one long transfer still favours TCP, which sustains full throughput once it has ramped. Arctic's TCP path is already tuned for distance, so reach for KCP because of loss or short, interactive flows, not because a link is simply far away.
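A toy model makes the shape of the flow-length curve concrete. It assumes TCP starts from a ten-segment initial window (about 0.117 Mbit of 1460-byte segments) and doubles that window every round trip until it reaches a ceiling. It assumes KCP runs at a fixed, lower ceiling from the first packet. The 100 Mbps and 60 Mbps ceilings are illustrative numbers, not measurements of Arctic. The model also ignores KCP's own round trips and any loss. The function names are hypothetical.

```python
def tcp_transfer_time(size_mb: float, rtt_s: float, ceiling_mbps: float,
                      init_window_mbit: float = 0.1168) -> float:
    """Toy slow start: the window doubles each round trip until the ceiling.

    init_window_mbit defaults to ten 1460-byte segments (about 0.117 Mbit).
    """
    size_mbit = size_mb * 8
    max_window = ceiling_mbps * rtt_s      # window that saturates the ceiling
    window = min(init_window_mbit, max_window)
    sent, elapsed = 0.0, 0.0
    while sent + window < size_mbit:       # whole round trips
        sent += window
        elapsed += rtt_s
        window = min(window * 2, max_window)
    # Final partial round trip at the current rate (window per RTT).
    return elapsed + (size_mbit - sent) / (window / rtt_s)


def kcp_transfer_time(size_mb: float, ceiling_mbps: float) -> float:
    """Toy KCP: full (lower) rate from the first packet, no warm-up."""
    return size_mb * 8 / ceiling_mbps


if __name__ == "__main__":
    for size_mb in (1, 1000):
        tcp = tcp_transfer_time(size_mb, rtt_s=0.1, ceiling_mbps=100)
        kcp = kcp_transfer_time(size_mb, ceiling_mbps=60)
        print(f"{size_mb:>5} MB  tcp {tcp:7.2f} s  kcp {kcp:7.2f} s")
```

With these numbers, a 1 MB transfer finishes in about 0.13 s on KCP and about 0.61 s on TCP. A 1000 MB transfer takes about 133 s on KCP and about 81 s on TCP. Doubling the RTT to 0.2 s roughly doubles TCP's time for the 1 MB transfer, which is the high-latency effect described above, while the long transfer barely changes.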

Packet Loss

TCP treats packet loss as a sign of congestion and slows down to compensate. On a noisy link it backs off again and again, so even a small loss rate can cut throughput sharply. KCP recovers lost data with less penalty and keeps moving instead of collapsing.

This is not an Arctic quirk; it is a well-known property of TCP, described by the Mathis equation. The equation bounds a loss-based TCP flow's throughput at roughly MSS / (RTT × √p), where p is the loss rate. A small amount of loss causes a disproportionately large drop in throughput. Because round-trip time sits in the denominator, the effect also gets worse with distance: the further the traffic has to travel, the harder the same loss bites. So a link that looks only mildly lossy can still cap TCP well below the bandwidth it actually has.
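The Mathis equation is short enough to compute directly. It bounds steady-state throughput at (MSS / RTT) × (C / √p), with C ≈ √(3/2) for the simple periodic-loss model. The sketch assumes classic loss-based (Reno-style) congestion control. Treat its output as an order-of-magnitude guide to how loss and distance interact, not a prediction for Arctic's tuned TCP path. The function name is hypothetical.

```python
from math import sqrt


def mathis_bound_mbps(mss_bytes: int, rtt_s: float, loss_rate: float,
                      c: float = sqrt(3 / 2)) -> float:
    """Steady-state TCP throughput bound from the Mathis equation, in Mbps.

    throughput <= (MSS / RTT) * (C / sqrt(p))
    loss_rate is a fraction (0.01 means 1% loss), not a percentage.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be a fraction in (0, 1]")
    mss_bits = mss_bytes * 8
    return (mss_bits / rtt_s) * (c / sqrt(loss_rate)) / 1e6


if __name__ == "__main__":
    for rtt_ms in (10, 100):
        bound = mathis_bound_mbps(1460, rtt_ms / 1000, 0.01)
        print(f"RTT {rtt_ms:>3} ms, 1% loss: <= {bound:.2f} Mbps")
```

With a 1460-byte MSS and 1% loss, the bound is about 14.3 Mbps at a 10 ms RTT but only about 1.43 Mbps at 100 ms. The same loss rate bites ten times harder over the longer path, however much bandwidth the link actually has. Cutting loss a hundredfold, to 0.01%, raises the 100 ms bound only tenfold, to about 14.3 Mbps.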

Figure: How TCP and KCP behave on a lossy link. At each loss, TCP backs off; KCP recovers and keeps moving.

KCP pays for this with extra bandwidth, so it is a trade, not a free win. On a genuinely bad link the trade is usually worth it.

Streaming

Match the transport to the stream:

  • Interactive, live, or short-lived streams over a lossy link: KCP. The latency and loss recovery are what matter.
  • Bulk streaming of large files over a clean link: TCP. KCP would only add overhead for no latency benefit.

CPU Cost

KCP does more work to provide its fast recovery and ordering, so it uses more CPU than TCP at the same throughput. On a busy or resource-constrained host, factor that in.

Shared Links

Beating TCP on a shared link is the easy part: a transport can win simply by refusing to back off and taking more than its share. The hard part is going faster on a lossy link without starving the TCP sessions beside it. Arctic's KCP has been modified to address this: it tries to avoid crowding out co-resident TCP rather than steamrolling it.

That work is ongoing, but you can now put a hard ceiling on a KCP service with bandwidth_limit_mbps, which caps how much it takes in either direction. This is a blunter tool than the TCP shaper: it bounds total bandwidth rather than queueing flows fairly or managing latency, but a ceiling is often enough to stop KCP crowding out the traffic beside it. Pick by what else shares the bottleneck:

  • Dedicated path, or one where nothing else needs protecting: KCP is fine. There is no co-resident traffic for it to crowd out.
  • Shared path with traffic you do not control: cap the KCP service with bandwidth_limit_mbps so it cannot exceed its share, or lean TCP if you want the shaper's fair queueing and latency control rather than a flat ceiling.
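A token bucket, a common way to enforce a rate cap, makes the difference between a flat ceiling and fair queueing concrete. This is an illustration of the general technique, not Arctic's implementation of bandwidth_limit_mbps. The class, function, and parameter names are hypothetical. Time is passed in explicitly to keep the sketch deterministic; a real limiter would read a monotonic clock.

```python
class TokenBucket:
    """Hypothetical flat rate cap: admits at most rate_mbps on average,
    plus an initial burst of up to burst_mbit. It has no notion of other
    flows, so it neither shares fairly nor manages queueing latency."""

    def __init__(self, rate_mbps: float, burst_mbit: float) -> None:
        self.rate = rate_mbps        # refill rate, megabits per second
        self.capacity = burst_mbit   # bucket size, megabits
        self.tokens = burst_mbit     # start full
        self.last = 0.0              # time of the previous call, seconds

    def allow(self, size_mbit: float, now: float) -> bool:
        """Admit a packet of size_mbit at time `now` if tokens allow."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_mbit:
            self.tokens -= size_mbit
            return True
        return False   # over the ceiling: the packet must be dropped or delayed


def admitted_mbit(bucket: TokenBucket, packet_mbit: float,
                  interval_s: float, duration_s: float) -> float:
    """Offer one packet every interval_s for duration_s; total admitted."""
    total = 0.0
    for i in range(round(duration_s / interval_s)):
        if bucket.allow(packet_mbit, i * interval_s):
            total += packet_mbit
    return total


if __name__ == "__main__":
    # 1500-byte packets (0.012 Mbit) every 0.1 ms: 120 Mbps offered.
    got = admitted_mbit(TokenBucket(rate_mbps=50, burst_mbit=1),
                        packet_mbit=0.012, interval_s=0.0001, duration_s=1.0)
    print(f"admitted {got:.2f} Mbit in 1 s")
```

This run offers 120 Mbps for one second to a bucket set to 50 Mbps with a 1 Mbit burst. The bucket admits just under 51 Mbit: the 50 Mbps rate plus the initial burst. Nothing in it looks at other flows. A flat ceiling therefore protects neighbouring traffic only by bounding the capped service's total share.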
