in Hacker School

Hacker School Day 33: TCP Congestion Control, Part 1

It’s getting hard to keep track of the days! This week I decided to investigate different TCP Congestion Control algorithms in an attempt to create my own dummy version. Linux lets you select from a list of many different shipped algorithms. You can check your currently selected algorithm under /proc:

cat /proc/sys/net/ipv4/tcp_congestion_control

The default for recent kernels is TCP Cubic, which is a less aggressive version of TCP BIC.

TCP Congestion Control involves a few intertwined pieces, and each algorithm can implement these parts in slightly (or relatively significant) ways:

  • slow start: When a connection begins, the sender is in slow start. The initial congestion window (cwnd) defines how many unacknowledged packets/bytes can be in flight at a given time. Slow start will increase the size of this window until either: 1) The slow start threshold is reached, 2) The size of the receiver’s window is reached, or 3) loss occurs. The goal here is to get the sender up to the point where it can efficiently utilize available bandwidth, without overwhelming the receiver or incurring loss.
  • congestion avoidance:  after exiting slow start, congestion avoidance attempts to increase the cwnd little-by-little until loss occurs. Loss is detected by multiple ACKs to the same segment. The goal here is to still send as many packets as efficiently as possible, but to also react to congestion events by reducing the amount of unacknowledged packets in flight.
  • Fast retransmit: normally, detecting a lost segment is based on a timeout. Fast transmit means “If I see multiple ACKs for the same segment, assume it has been lost and send it again”. This is a quicker detection mechanism than waiting for the timeout.
  • Fast recovery: fast recovery can occur after fast retransmit, where the sender returns to the congestion avoidance stage, rather than falling back down to slow start.

There’s a lot of points I’ve hand-waved over for brevity’s sake. Reading the linked Wikipedia articles helps flesh out some of the hand waving. For example, “multiple ACKs” in fast retransmit is usually 3 duplicate ACKs. The real help is having a copy of TCP/IP Illustrated, Volume 1. It really helps solidify some of what’s happening, complete with diagrams of packets in flight while experiencing loss. Additionally, reading the papers behind a given algorithm and its corresponding source code can be critical in understanding the nuances involved. You can read about Cubic specifically here and see its Linux implementation here.

The above list is generalized partitioning of what’s involved in a TCP congestion algorithm, and like I mentioned, each algorithm handles these tasks differently. Primarily, each option handles congestion avoidance with their algorithm. However, any of these variables can be changed. TCP Cubic, for example, provides an advanced formula that bases the congestion window on the time between congestion events. Additionally, it implements HyStart, a different mechanism for slow start that attempts to predict a safe exit point for slow start based on the delay between ACKs. The point is that each of these options perform differently for different workloads.

Linux has many TCP Congestion Control algorithms. My goal was to create a dummy kernel module and switch to that, and compare its performance against the others. The code for these algorithms are self-contained and relatively small, around 300-500 lines of C code, including plenty of comments.

It turns out, comparing congestion control algorithms is harder than I thought. That’s part two.

Write a Comment

Comment