TCP Congestion Control with a Misbehaving Receiver
Stefan Savage, Neal Cardwell, David Wetherall, and Tom Anderson Department of Computer Science and Engineering University of Washington, Seattle
Abstract col modifications, a faulty or malicious receiver can at most cause the sender to transmit data at a slower rate than it otherwise would, In this paper, we explore the operation of TCP congestion control thus harming only itself. Because our work has serious practical when the receiver can misbehave, as might occur with a greedy ramifications for an Internet that depends on trust to avoid conges- Web client. We first demonstrate that there are simple attacks that tion collapse, we also describe backwards-compatible mechanisms allow a misbehaving receiver to drive a standard TCP sender ar- that can be implemented at the sender to mitigate the effects of un- bitrarily fast, without losing end-to-end reliability. These attacks trusted receivers. are widely applicable because they stem from the sender behavior As far as we are aware, the division of trust between sender specified in RFC 2581 rather than implementation bugs. We then and receiver has not been studied previously in the context of con- show that it is possible to modify TCP to eliminate this undesir- gestion control. While end-to-end congestion control protocols as- able behavior entirely, without requiring assumptions of any kind sume that both sender and receiver behave correctly, in many en- about receiver behavior. This is a strong result: with our solution vironments the interests of sender and receiver may differ consid- a receiver can only reduce the data transfer rate by misbehaving, erably – creating significant incentives to violate this “good faith” thereby eliminating the incentive to do so. doctrine. For example, in many wide-area data retrieval applica- tions, such as Web browsing, the sender's interest is to provide 1 Introduction uniform service to all clients requesting data, while the interest of each client receiver is to maximize its own data throughput. Tra- End-to-end congestion control mechanisms, such as those used in ditional cryptographic security mechanisms, such as embodied in TCP, are the primary means used for sharing scarce bandwidth re- IPSEC [KA98], can provide guarantees of authentication and con- sources in the Internet. These mechanisms implicitly rely on both fidentiality, but they can not prevent a receiver from violating TCP's endpoints to cooperate in determining the proper rate at which to congestion control specification and consequently undermining the send data. Obviously, if the sending endpoint misbehaves, and does fairness and stability provided therein. not obey the appropriate congestion control algorithms, then it may The potential congestion resulting from aggressive senders send data more quickly than well-behaved hosts – possibly forc- has received significant attention from the networking commu- ing competing traffic to be delayed or discarded. Less obviously, a nity [She94, FF99, RHE99, VRC98], and has produced proposals misbehaving receiver can achieve the same result. for per-flow bandwidth reservation [ZDE+ 93] and mechanisms to While the possibility of such attacks has been hinted at pre- detect and limit “unfriendly” flows in the network [FF99]. These viously [APS99, PAD + 99], we believe the ease of exploiting this solutions, if workable, would solve the more general problem of vulnerability and the potential impact are not fully appreciated. We unconstrained data transmission and would make the issue of trust note that the population of receivers is extremely large (all Inter- in end-to-end congestion control less pressing. However, given that net users) and has both the incentive (faster Web surfing) and the it is unlikely that such mechanisms will be widely deployed in the opportunity (open source operating systems) to exploit this vulner- near term, we feel it is still prudent to consider the potential im- ability. pact of untrusted receivers on the congestion control mechanisms In this paper, we explore the impact that a misbehaving receiver in today's Internet. can have on TCP congestion control. We present two kinds of re- sults. First, we identify several vulnerabilities that can be exploited 2 Vulnerabilities by a malicious receiver to defeat TCP congestion control. This can be done in a manner that does not break end-to-end reliability se- By systematically considering sequences of message exchanges, mantics and that relies only on the standard behavior of correctly we have been able to identify several vulnerabilities that allow mis- implemented TCP senders. Tests against live Web servers using a behaving receivers to control the sending rate of unmodified, con- modified TCP implementation that we produced for this purpose, forming TCP senders. This section describes these vulnerabilities “TCP Daytona”, confirm that common TCP implementations pos- and techniques for exploiting them. In addition to denial-of-service sess these vulnerabilities. Second, we show that it is possible to attacks, these techniques can be used to enhance the performance modify the design of TCP to eliminate this behavior – without re- of the attacker's TCP sessions at the expense of behaving clients. quiring that the receiver be trusted in any manner. With our proto-
This work was funded by generous grants from NSF (CCR 94-53532), DARPA (F30602-98-1-0205), USENIX, the National Library of Medicine, Cisco, Fuji and In- tel. Correspondence concerning this paper may be sent to [email protected]. 2.1 TCP review Sender Receiver While a detailed description of TCP's error and congestion con- Data trol mechanisms is beyond the scope of this paper, we describe the 1:1461 rudiments of their behavior below to allow those unfamiliar with TCP to understand the vulnerabilities explained later. For simplic- 7 ACK 48 ity, we consider TCP without the Selective Acknowledgment op- RTT 3 tion (SACK) [MMFR96], although the vulnerabilities we describe ACK 97 61 also exist when SACK is used. ACK 14 TCP is a connection-oriented, reliable, ordered, byte-stream Da protocol with explicit flow control. A sending host divides the data ta 146 1:2921 stream into individual segments, each of which is no longer than the Da ta 292 Sender Maximum Segment Size (SMSS) determined during con- 1:4381 Da nection establishment. Each segment is labeled with explicit se- ta 438 1:5841 quence numbers to guarantee ordering and reliability. When a host Da ta 584 receives an in-sequence segment it sends a cumulative acknowl- 1:7301 edgment (ACK) in return, notifying the sender that all of the data preceding that segment's sequence number has been received and can be retired from the sender's retransmission buffers. If an out- of-sequence segment is received, then the receiver acknowledges the next contiguous sequence number that was expected. If out- Figure 1: Sample time line for a ACK division attack. The sender be- standing data is not acknowledged for a period of time, the sender gins with cwnd=1, which is incremented for each of the three valid ACKs will timeout and retransmit the unacknowledged segments. received. After one round-trip time, cwnd=4, instead of the expected value TCP uses several algorithms for congestion control, most no- of cwnd=2. tably slow start and congestion avoidance [Jac88, Ste94, APS99]. Each of these algorithms controls the sending rate by manipulating This attack is demonstrated in Figure 1 with a time line. Here, a congestion window (cwnd) that limits the number of outstanding each message exchanged between sender and receiver is shown as unacknowledged bytes that are allowed at any time. When a con- a labeled arrow, with time proceeding down the page. The labels nection starts, the slow start algorithm is used to quickly increase indicate the type of message, data or acknowledgment, and the se- cwnd to reach the bottleneck capacity. When the sender infers that quence space consumed. In this example we can see that each ac- a segment has been lost it interprets this has an implicit signal of knowledgment is valid, in that it covers data that was sent and pre- network overload and decreases cwnd quickly. After roughly ap- viously unacknowledged. This leads the TCP sender to grow the proximating the bottleneck capacity, TCP switches to the conges- congestion window at a rate that is M times faster than usual. The tion avoidance algorithm which increases the value of cwnd more receiver can control this rate of growth by dividing the segment slowly to probe for additional bandwidth that may become avail- at arbitrary points – up to one acknowledgment per byte received able. (when M = N). At this limit, a sender with a 1460 byte SMSS could We now describe three attacks on this congestion control pro- theoretically be coerced into reaching a congestion window in ex- cedure that exploit a sender's vulnerability to non-conforming re- cess of the normal TCP sequence space (4GB) in only four round- ceiver behavior. trip times! 1 Moreover, while high rates of additional acknowledg- ment traffic may increase congestion on the path to the sender, the 2.2 ACK division penalty to the receiver is negligible since the cumulative nature of TCP uses a byte granularity error control protocol and consequently acknowledgments inherently tolerates any losses that may occur. each TCP segment is described by sequence number and acknowl- edgment fields that refer to byte offsets within a TCP data stream. 2.3 DupACK spoofing However, TCP's congestion control algorithm is implicitly defined TCP uses two algorithms, fast retransmit and fast recovery, to miti- in terms of segments rather than bytes. For example, the most re- gate the effects of packet loss. The fast retransmit algorithm detects cent specification of TCP's congestion control behavior, RFC 2581, loss by observing three duplicate acknowledgments and it immedi- states: ately retransmits what appears to be the missing segment. How- During slow start, TCP increments cwnd by at most ever, the receipt of a duplicate ACK also suggests that segments SMSS bytes for each ACK received that acknowledges are leaving the network. The fast recovery algorithm employs this new data. information as follows (again quoted from RFC 2581): ... During congestion avoidance, cwnd is incremented by 1 Set cwnd to ssthresh plus 3*SMSS. This artificially “in- full-sized segment per round-trip time (RTT). flates” the congestion window by the number of seg- ments (three) that have left the network and which the The incongruence between the byte granularity of error control receiver has buffered. and the segment granularity (or more precisely, SMSS granularity) .. of congestion control leads to the following vulnerability: For each additional duplicate ACK received, increment Attack 1: cwnd by SMSS. This artificially inflates the congestion Upon receiving a data segment containing N bytes, the window in order to reflect the additional segment that receiver divides the resulting acknowledgment into M, has left the network.
where M N, separate acknowledgments – each cov- 1Of course the practical transmission rate is ultimately limited by other factors such ering one of M distinct pieces of the received data seg- as sender buffering, receiver buffering and network bandwidth. ment. Sender Receiver Sender Receiver Data 1:1461 Da ta 1:1461 1 ACK RTT K 1 1461 RTT AC ACK K 1 1 AC ACK 292 ACK 1 Da ta 1461:2 K 1 921 AC Data 2 921:4381 Data Data 43 1:146 81:5841 Da 1 Data ta 14 5841:73 61:292 01 Data 2 1 921:43 Data 81 4381: Da 5841 ta 584 1:7301
Figure 2: Sample time line for a DupACK spoofing attack. The receiver Figure 3: Sample time line for optimistic ACKing attack. The ACK for forges multiple duplicate ACKs for sequence number 1. This causes the the second segment is sent before the segment itself is received, leading sender to retransmit the first segment and send a new segment for each the receiver to grow cwnd more quickly than otherwise. At the end of this additional forged duplicate ACK. example, cwnd=3, rather than the expected value of cwnd=2.
There are two problems with this approach. First, it assumes gestion avoidance), sender-receiver pairs with shorter round-trip that each segment that has left the network is full sized – again an times will transfer data more quickly. unfortunate interaction of byte granularity error control and seg- However, the protocol does not use any mechanism to enforce ment granularity congestion control. Second, and more important, its assumption. Consequently, it is possible for a receiver to emu- because TCP requires that duplicate ACKs be exact duplicates, late a shorter round-trip time by sending ACKs optimistically for there is no way to ascertain which data segment they were sent in data it has not yet received: response to. Consequently, it is impossible to differentiate a “valid” duplicate ACK, from a forged, or “spoofed”, duplicate ACK. For Attack 3: the same reason, the sender cannot distinguish ACKs that are acci- Upon receiving a data segment, the receiver sends a dentally duplicated by the network itself from those generated by a stream of acknowledgments anticipating data that will receiver [APS99]. In essence, duplicate ACKs are a signal that can be sent by the sender. be used by the receiver to force the sender to transmit new segments into the network as follows: This technique is demonstrated in Figure 3. Note that while it is easy for the receiver to anticipate the correct sequence numbers Attack 2: to use in each acknowledgment (since senders generally send full- Upon receiving a data segment, the receiver sends a sized segments), this accuracy is not necessary. As long as the long stream of acknowledgments for the last sequence receiver acknowledges new data the sender will transmit additional number received (at the start of a connection this would segments. Moreover, if an ACK arrives for data that has not yet be for the SYN segment). been sent, this is generally ignored by the sending TCP – allowing a sender to be arbitrarily aggressive in its generation of optimistic Figure 2 shows a time line for this technique. The first four ACKs. ACKs for the same sequence number cause the sender to retransmit Unlike the previous attacks, this technique does not necessarily the first segment. However, cwnd is now set to its initial value preserve end-to-end reliability semantics – if data from the sender plus 3*SMSS, and increased by SMSS for each additional duplicate is lost it may be unrecoverable since it has already been acknowl- ACK, for a total of 4 segments (as per the fast recovery algorithm). edged. However, new features in protocols such as HTTP-1.1 al- Since duplicate ACKs are indistinguishable, the receiver does not low receivers to request particular byte-ranges within a data ob- need to wait for new data to send additional acknowledgments. As ject [FGM+ 99]. This suggests a strategy in which data is gath- a result, the sender will return data at a rate directly proportional ered on one connection and lost segments are then collected se- to the rate at which the receiver sends acknowledgments. After a lectively with application-layer retransmissions on another. Opti- period, the sender will timeout. However, this can easily be avoided mistic ACKing could be used to ramp the transfer rate up to the if the receiver acknowledges the missing segment and enters fast bottleneck rate immediately, and then hold it there by sending ac- retransmit again for a new, later, segment. knowledgments in spite of losses. This ability of the receiver to conceal losses is extremely dangerous because it eliminates the 2.4 Optimistic ACKing only congestion signal available to the sender. A malicious at- tacker could conceal all losses and therefore lead a sender to in- Implicit in TCP's algorithms is the assumption that the time be- crease cwnd indefinitely – possibly overwhelming the network with tween a data segment being sent and an acknowledgment for that useless packets. segment returning is at least one round-trip time. Since TCP's con- gestion window growth is a function of round-trip time (an expo- nential function during slow start and a linear function doing con- 60000 60000
50000 50000
40000 40000
30000 30000
20000 20000
Sequence number (Bytes) Data Segments Sequence number (Bytes) Data Segments 10000 ACKs 10000 ACKs Data Segments (normal) Data Segments (normal) ACKs (normal) ACKs (normal) 0 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Time (sec) Time (sec)
Figure 4: The TCP Daytona ACK division attack convinces the TCP Figure 5: The TCP Daytona DupACK spoofing attack, like the ACK divi- sender to send all but the first few segments of a document in a single burst. sion attack, convinces the TCP sender to send all but the first few segments of a document in a single burst.
3 Implementation experience 60000 To exploit the vulnerabilities described above, we made three mod- ifications to the TCP subsystem of Linux 2.2.10. This resulting 50000 TCP implementation, which we refer to facetiously as “TCP Day- tona”, provides extremely high performance at the expense of its 40000 competitors. We demonstrate these abilities with time sequence plots of packet traces for both normal and modified receiver TCP's. 30000 Needless to say, our implementation is intentionally not “stable”, and would likely lead to congestion collapse if it were widely de- 20000
ployed. Sequence number (Bytes) Data Segments 10000 ACKs Data Segments (normal) 3.1 ACK division ACKs (normal) 0 The TCP Daytona ACK division algorithm adds 24 lines of code 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 that divide each new outgoing ACK into many ACKs for smaller Time (sec) extents of the sequence space. Half of the new code is dedicated to ensuring that the number of outgoing ACKs is no more than Figure 6: The TCP Daytona optimistic ACK attack, by sending a stream should be needed to coerce a sender in slow start to saturate our of early ACKs, convinces the TCP sender to send data much earlier than it test machine's 100Mbps Ethernet interface. normally would. Figure 4 shows client-side TCP sequence number plots of our test machine making an HTTP request for the index.html ob- ject from cnn.com, with and without our ACK division attack en- on a normal transfer. The many duplicate ACKs that the receiver abled. This figure spans the entire transaction, beginning with the sends at around 140ms cause the sender to enter fast recovery and TCP handshake that starts at 0ms and ends at around 70ms, when transmit the rest of the data, which arrives at around 210ms. Were the HTTP request is sent. The first HTTP data from the server ar- there more data, the flurry of duplicate ACKs sent at 210ms-230ms rives at around 140ms. would elicit another burst from the sender. Since there is no more This figure shows that, when this attack is enabled, the many new data, the sender simply fills in the hole it perceives; this seg- small ACKs sent around 140ms convince the Web server to un- ment arrives at around 290ms. This figure illustrates how the Du- leash the entire remainder of the document in a single burst; this pACK spoofing attack can achieve performance essentially equiva- data arrives exactly one round-trip time later. By contrast, with the lent to the ACK division attack – namely, both attacks can convince normal TCP implementation, the server spreads out the data over the sender to empty its entire send buffer in a single burst. the next four round-trip times. In general, as this figure suggests, this attack can convince a TCP sender to send all of its data in a 3.3 Optimistic ACKing single burst. The TCP Daytona implementation of optimistic ACKing consists of 45 lines of code. Because acknowledging data that has not ar- 3.2 DupACK spoofing rived is a fundamentally tricky business, we chose a very simple The TCP Daytona DupACK spoofing attack is implemented by 11 implementation as a proof of concept. When a TCP connection lines of code that cause the receiver to send sufficient duplicate for an HTTP or FTP client receives its first data, we set a timer ACKs such that the sender (re-)enters fast recovery and fills the to expire every 10ms. Any interval would do, but we chose 10ms receiver's advertised flow control window each round-trip time. because it is the smallest interval that Linux 2.2.10 supports on the Figure 5 shows another client-side plot of the same HTTP re- Intel PC platform. Whenever this periodic timer expires, or a new quest, this time with the DupACK spoofing attack superimposed data segment arrives, our receiver sends a new optimistic ACK for one MSS beyond the previous optimistic ACK. Figure 6 shows our optimistic ACK algorithm in action trans- Principle 1. Every message should say what it means: the inter- ferring the same index.html, again with a normal transfer su- pretation of the message should depend only on its content. perimposed. Note that after the first few data segments arrive at around 140ms, the receiver sends a steady stream of ACKs, where Principle 2. The conditions for a message to be acted upon should each ACK is sent about 10ms-70ms before the corresponding data be clearly set out so that someone reviewing a design may see arrives! The result is that the data transfer employing optimistic whether they are acceptable or not. ACKs completes in approximately half the normal transfer time. Principle 3. If the identity of a principal is essential to the mean- Though this is a modest gain relative to the other attacks, a more ing of a message, it is prudent to mention the principal's name bold optimistic ACKing scheme could achieve far greater through- explicitly in the message. put by acknowledging data at a more rapid pace. 4.2 ACK division 3.4 Applicability This vulnerability arises from an ambiguity about how ACKs In order to verify that common TCP implementations have these should be interpreted – a violation of the second principle. TCP's vulnerabilities, we tested each attack against a set of nine Web error-control allows an ACK to specify an arbitrary byte offset in servers running a diverse array of popular server operating systems. the sequence space while the congestion control specification as- To determine the operating system running on each Web server, sumes that an ACK covers an entire segment. nmap we used , a tool that identifies TCP implementations based There are two obvious solutions: either modify the congestion on the “fingerprint” of their characteristic response to TCP seg- control mechanisms to operate at byte granularity or guarantee that ments whose correct response is under-specified [Vas]. To further segment-level granularity is always respected. The first solution decrease the possibility of misidentifying an operating system, in is virtually identical to the “byte counting” modifications to TCP all but three cases we were able to use Web servers that were op- discussed in [All98, All99]. If cwnd is not incremented by a full nmap erated by OS vendors and that confirmed were running the SMSS, but only proportional to the amount of data acknowledged, high-end server operating system from that vendor. then ACK division attacks will have no effect. The second, perhaps Table 1 shows which TCP implementations are vulnerable to simpler, solution is to only increment cwnd by one SMSS when each attack. The attacks are all widely applicable, with three ex- a valid ACK arrives that covers the entire data segment sent. As ceptions. First, Linux 2.2 is not vulnerable to the ACK division mentioned earlier, this technique is employed in the latest versions attack because it increases its congestion window only if at least of Linux (2.2.x) at the time of this writing. one whole previously-unacknowledged segment is acknowledged. Second, Linux 2.0 refuses to count duplicate acknowledgments un- til cwnd is greater than three. Consequently, the DupACK attack 4.3 DupACK spoofing will fail if initiated on connection startup. Finally, Windows NT During fast recovery and fast retransmit, TCP's design violates the appears to have a bug that causes it to rarely, if ever, enter fast re- first principle – the meaning of a duplicate ACK is implicit, depen- covery. This bug renders NT immune to attacks that rely on extra dent on previous context, and consequently difficult to verify. duplicate acknowledgments. TCP assumes that all duplicate ACKs are sent in response to unique and distinct segments. This assumption is unenforceable 4 Solutions without some mechanism for identifying the data segment that led to the generation of each duplicate ACK. The traditional method for As demonstrated in the previous section, TCP's current specifica- guaranteeing association is to employ a nonce [Sch96]. We present tion has several vulnerabilities that allow a misbehaving receiver a simple version of such a nonce protocol below (we will extend it to control the sender's transmission rate. While it is impossible shortly): to force a receiver to behave correctly, it is both possible and de- sirable to remove its incentive to misbehave. That is, we wish to Singular Nonce: ensure that a misbehaving receiver can not obtain data faster than We introduce two new fields into the TCP packet format: a behaving one. In this section we describe simple modifications Nonce and Nonce reply. For each segment, the sender to the TCP protocol that, without changing the nature of conges- fills the Nonce field with a unique random number gen- tion control, allow the verification of what has historically been an erated when the segment is sent. When a receiver gen- implicit contract between sender and the receiver – that each ac- erates an ACK in response to a data segment, it echoes knowledgment faithfully and unambiguously reflects data that has the nonce value by writing it into the Nonce Reply field. been successfully transferred to the receiver. The sender can then arrange to only inflate cwnd in response to duplicate ACKs whose Nonce Reply value corresponds to a data 4.1 Designing robust protocols segment previously sent and not yet acknowledged. We believe TCP's vulnerabilities arise from a combination of un- We note that the singular nonce, as we have described it so stated assumptions, casual specification and a pragmatic need to far, is similar to the Timestamps option [JBB92], with two impor- develop congestion control mechanisms that are backward compat- tant differences. First, the Nonce field preserves association for ible with previous TCP implementations. In retrospect, if the con- duplicate ACKs, while the Timestamps option does not (prefer- tract between sender and receiver had been defined explicitly these ring instead to reuse the previous timestamp value). Second, and vulnerabilities would have been obvious. more important, because Timestamps is a option, a receiver has the We are inspired by Abadi and Needham's paper, Prudent En- choice to not participate in its use. We cannot rely on misbehaving gineering Practice for Cryptographic Protocols, which presents clients to voluntarily participate in their own policing. For the same a set of design rules that are surprisingly germane to this prob- reason, we cannot rely on other TCP options, such as proposed ex- lem [AN96]. In particular we reprint their first three principles be- tensions to SACK [FMM+ 99], to eliminate this vulnerability. low: ACK Division DupACK Spoofing Optimistic Acks Solaris 2.6 Y Y Y Linux 2.0 Y Y(N) Y Linux 2.2 N Y Y Windows NT4/95 Y N Y FreeBSD 3.0 Y Y Y DIGITAL Unix 4.0 Y Y Y IRIX 6.x Y Y Y HP-UX 10.20 Y Y Y AIX 4.2 Y Y Y
Table 1: For each TCP Daytona attack, we denote with a “Y” those operating systems that we found to be vulnerable. Most operating systems we tested were vulnerable to all the Daytona attacks.
Unfortunately, our fix requires the modification of clients and servers and the addition of a TCP field. While it is the only com- Sender Receiver plete solution we have discovered, there are sender-only heuristics Data 1:146 which can mitigate, although not eliminate, the impact of the Du- 1 (27) pACK spoofing attack in a purely backward compatible manner. In ) 61 (27 particular, the sender can maintain a count of outstanding segments ACK 14 sent above the missing segment. For each duplicate acknowledg- Data ment this count is decremented and when it reaches zero any ad- 1461:2 D 921 (6 ditional duplicate acknowledgments are ignored. This simple fix ata 29 2) 21:438
appears to limit the number of segments wrongly sent to contain no 1 (36)