tcp congestion control algorithms
on this page
overview
tcp congestion control prevents network collapse by adapting sending rate to network conditions. it operates independently from flow control, using packet loss and delay as signals to adjust the congestion window (cwnd).
congestion window dynamics
Shows the different phases of congestion control and transitions between slow start, congestion avoidance, fast recovery, and timeout recovery.
slow start phase
Visualizes how the congestion window grows exponentially during slow start, doubling each round-trip time until reaching ssthresh or detecting loss.
congestion avoidance (aimd)
Shows the sawtooth pattern of congestion window evolution with linear increase during congestion avoidance and multiplicative decrease upon loss detection.
fast retransmit and recovery
Illustrates the fast retransmit triggered by three duplicate ACKs and the subsequent fast recovery phase without going through slow start.
congestion window evolution
Shows typical cwnd evolution through different phases: slow start, congestion avoidance, loss events, and recovery.
algorithm comparison
tl;dr
Algorithm | Year | Growth Function | Key Feature | Use Case |
Reno | 1990 | AIMD | Fast Recovery | General purpose |
NewReno | 1999 | AIMD | Partial ACK handling | Multiple losses |
CUBIC | 2008 | Cubic function | RTT fairness | High BDP networks |
BBR | 2016 | Model-based | Bandwidth probing | Variable networks |
Vegas | 1994 | Delay-based | RTT monitoring | Low latency |
cubic algorithm
CUBIC uses a cubic function for window growth, providing RTT-fairness and better performance in high bandwidth-delay networks.
cubic window function
W(t) = C * (t - K)³ + W_max
Where:
- t = time since last reduction
- K = ∛(W_max * β / C)
- β = multiplicative decrease factor (0.7)
- C = scaling constant (0.4)
- W_max = window size before last reduction
bbr (bottleneck bandwidth and rtt)
BBR operates by cycling through different states to probe for bandwidth and minimum RTT, maintaining optimal sending rate.
congestion signals
loss-based signals
tl;dr
Signal | Detection | Interpretation | Response |
Timeout | RTO expires | Severe congestion | cwnd = 1, slow start |
3 Dup ACKs | Fast retransmit trigger | Mild congestion | cwnd = cwnd/2, fast recovery |
ECN CE | ECN echo flag | Incipient congestion | cwnd = cwnd/2, continue |
SACK blocks | Selective ACK | Multiple losses | Retransmit gaps |
delay-based signals
RTT_min = minimum observed RTT (empty queue)
RTT_current = current RTT measurement
Queue_delay = RTT_current - RTT_min
If Queue_delay > threshold:
Reduce sending rate
Else:
Increase sending rate
implementation details
key variables
struct tcp_congestion {
uint32_t cwnd; // Congestion window
uint32_t ssthresh; // Slow start threshold
uint32_t flight_size; // Bytes in flight
uint32_t recovery_point; // Seq at fast recovery entry
uint8_t ca_state; // Congestion state
uint32_t dup_ack_count; // Duplicate ACK counter
// Algorithm-specific
union {
struct cubic_state cubic;
struct bbr_state bbr;
struct vegas_state vegas;
} algo_state;
};
linux congestion control
# View available algorithms
sysctl net.ipv4.tcp_available_congestion_control
# cubic reno bbr
# Current algorithm
sysctl net.ipv4.tcp_congestion_control
# cubic
# Change algorithm
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
# Per-socket algorithm
setsockopt(sock, IPPROTO_TCP, TCP_CONGESTION, "bbr", 4);
performance characteristics
Comparing throughput and latency characteristics of different congestion control algorithms under various network conditions.
tuning parameters
initial window (iw)
# RFC 6928 recommends IW = 10 MSS
# Linux default (segments)
sysctl net.ipv4.tcp_init_cwnd
slow start after idle
# Restart slow start after idle period
sysctl net.ipv4.tcp_slow_start_after_idle
# 1 = enabled (default), 0 = disabled
ecn (explicit congestion notification)
# ECN negotiation
sysctl net.ipv4.tcp_ecn
# 0 = disabled
# 1 = enabled when requested
# 2 = enabled by default
common issues
throughput collapse
symptoms:
- repeated timeouts
- very low throughput
- cwnd stuck at minimum
causes:
- severe packet loss
- incorrect rto calculation
- middlebox interference
unfairness
symptoms:
- one flow dominates
- starvation of competing flows
- unequal bandwidth sharing
causes:
- rtt unfairness (reno)
- aggressive algorithms
- different algorithms competing
bufferbloat
symptoms:
- high latency under load
- variable performance
- poor interactive response
solutions:
- use bbr or vegas
- enable ecn
- implement aqm at router
monitoring and debugging
# Monitor congestion metrics
ss -tin | grep -E "cwnd|ssthresh|rtt"
# Track retransmissions
netstat -s | grep -i retrans
# Watch algorithm behavior
tcpprobe -p 8888 > tcp_probe.log
# BPF trace congestion events
bpftrace -e 'kprobe:tcp_enter_loss { printf("Loss event\\n"); }'
key takeaways
- congestion control prevents network collapse through adaptive rate control
- operates independently from flow control (network vs receiver limitation)
- slow start provides rapid initial bandwidth discovery
- aimd provides fairness and stability
- modern algorithms (cubic, bbr) improve performance in challenging networks
- proper algorithm selection depends on network characteristics
references
- rfc 5681 - tcp congestion control
- rfc 6582 - newreno modification
- rfc 8312 - cubic for fast networks
- rfc 6928 - increasing initial window
- google bbr - congestion-based control
next steps
- review sliding window protocol for flow control
- explore tcp fundamentals for base concepts
- analyze connection lifecycle for debugging