tcp congestion control algorithms

published: September 28, 2025

overview

tcp congestion control prevents network collapse by adapting sending rate to network conditions. it operates independently from flow control, using packet loss and delay as signals to adjust the congestion window (cwnd).

congestion window dynamics

TCP Congestion Control State Machine
Rendering diagram...

Shows the different phases of congestion control and transitions between slow start, congestion avoidance, fast recovery, and timeout recovery.

slow start phase

Slow Start Exponential Growth
Rendering diagram...

Visualizes how the congestion window grows exponentially during slow start, doubling each round-trip time until reaching ssthresh or detecting loss.

congestion avoidance (aimd)

AIMD (Additive Increase Multiplicative Decrease)
Rendering diagram...

Shows the sawtooth pattern of congestion window evolution with linear increase during congestion avoidance and multiplicative decrease upon loss detection.

fast retransmit and recovery

Fast Retransmit and Fast Recovery Sequence
Rendering diagram...

Illustrates the fast retransmit triggered by three duplicate ACKs and the subsequent fast recovery phase without going through slow start.

congestion window evolution

Congestion Window Over Time
Rendering diagram...

Shows typical cwnd evolution through different phases: slow start, congestion avoidance, loss events, and recovery.

algorithm comparison

tl;dr

AlgorithmYearGrowth FunctionKey FeatureUse Case
Reno1990AIMDFast RecoveryGeneral purpose
NewReno1999AIMDPartial ACK handlingMultiple losses
CUBIC2008Cubic functionRTT fairnessHigh BDP networks
BBR2016Model-basedBandwidth probingVariable networks
Vegas1994Delay-basedRTT monitoringLow latency

cubic algorithm

CUBIC Window Growth Function
Rendering diagram...

CUBIC uses a cubic function for window growth, providing RTT-fairness and better performance in high bandwidth-delay networks.

cubic window function

W(t) = C * (t - K)³ + W_max

Where:
- t = time since last reduction
- K = ∛(W_max * β / C)
- β = multiplicative decrease factor (0.7)
- C = scaling constant (0.4)
- W_max = window size before last reduction

bbr (bottleneck bandwidth and rtt)

BBR State Machine
Rendering diagram...

BBR operates by cycling through different states to probe for bandwidth and minimum RTT, maintaining optimal sending rate.

congestion signals

loss-based signals

tl;dr

SignalDetectionInterpretationResponse
TimeoutRTO expiresSevere congestioncwnd = 1, slow start
3 Dup ACKsFast retransmit triggerMild congestioncwnd = cwnd/2, fast recovery
ECN CEECN echo flagIncipient congestioncwnd = cwnd/2, continue
SACK blocksSelective ACKMultiple lossesRetransmit gaps

delay-based signals

RTT_min = minimum observed RTT (empty queue)
RTT_current = current RTT measurement
Queue_delay = RTT_current - RTT_min

If Queue_delay > threshold:
    Reduce sending rate
Else:
    Increase sending rate

implementation details

key variables

struct tcp_congestion {
    uint32_t cwnd;           // Congestion window
    uint32_t ssthresh;       // Slow start threshold
    uint32_t flight_size;    // Bytes in flight
    uint32_t recovery_point; // Seq at fast recovery entry
    uint8_t  ca_state;       // Congestion state
    uint32_t dup_ack_count;  // Duplicate ACK counter
    
    // Algorithm-specific
    union {
        struct cubic_state cubic;
        struct bbr_state bbr;
        struct vegas_state vegas;
    } algo_state;
};

linux congestion control

# View available algorithms
sysctl net.ipv4.tcp_available_congestion_control
# cubic reno bbr

# Current algorithm
sysctl net.ipv4.tcp_congestion_control
# cubic

# Change algorithm
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

# Per-socket algorithm
setsockopt(sock, IPPROTO_TCP, TCP_CONGESTION, "bbr", 4);

performance characteristics

Algorithm Performance Comparison
Rendering diagram...

Comparing throughput and latency characteristics of different congestion control algorithms under various network conditions.

tuning parameters

initial window (iw)

# RFC 6928 recommends IW = 10 MSS
# Linux default (segments)
sysctl net.ipv4.tcp_init_cwnd

slow start after idle

# Restart slow start after idle period
sysctl net.ipv4.tcp_slow_start_after_idle
# 1 = enabled (default), 0 = disabled

ecn (explicit congestion notification)

# ECN negotiation
sysctl net.ipv4.tcp_ecn
# 0 = disabled
# 1 = enabled when requested
# 2 = enabled by default

common issues

throughput collapse

symptoms:

  • repeated timeouts
  • very low throughput
  • cwnd stuck at minimum

causes:

  • severe packet loss
  • incorrect rto calculation
  • middlebox interference

unfairness

symptoms:

  • one flow dominates
  • starvation of competing flows
  • unequal bandwidth sharing

causes:

  • rtt unfairness (reno)
  • aggressive algorithms
  • different algorithms competing

bufferbloat

symptoms:

  • high latency under load
  • variable performance
  • poor interactive response

solutions:

  • use bbr or vegas
  • enable ecn
  • implement aqm at router

monitoring and debugging

# Monitor congestion metrics
ss -tin | grep -E "cwnd|ssthresh|rtt"

# Track retransmissions
netstat -s | grep -i retrans

# Watch algorithm behavior
tcpprobe -p 8888 > tcp_probe.log

# BPF trace congestion events
bpftrace -e 'kprobe:tcp_enter_loss { printf("Loss event\\n"); }'

key takeaways

  • congestion control prevents network collapse through adaptive rate control
  • operates independently from flow control (network vs receiver limitation)
  • slow start provides rapid initial bandwidth discovery
  • aimd provides fairness and stability
  • modern algorithms (cubic, bbr) improve performance in challenging networks
  • proper algorithm selection depends on network characteristics

references

  • rfc 5681 - tcp congestion control
  • rfc 6582 - newreno modification
  • rfc 8312 - cubic for fast networks
  • rfc 6928 - increasing initial window
  • google bbr - congestion-based control

next steps

on this page