Computer Networks
Lecture 31: TCP Congestion Control

What is Congestion?
What gives rise to congestion? Resource contention: offered load is greater than system capacity
• too much data for the network to handle
• how is it different from flow control?

Why is Congestion Bad?
Causes of congestion:
• packets arrive faster than a router can forward them
• routers queue packets that they cannot serve immediately
Why is congestion bad?
• if queue overflows, packets are dropped
• queued packets experience delay

Consequences of Congestion
If queueing delay > RTO, sender retransmits packets, adding to congestion
Dropped packets also lead to more retransmissions
• when a packet is dropped, "upstream" capacity already spent on the packet was wasted
If unchecked, could result in congestion collapse
• increase in load results in a decrease in useful work done
[figure: packets cross 10 Mbps links through router B; a transmitted packet is delayed, queued packets wait, and arriving packets are dropped (lost) if the buffer overflows]

Approaches to Congestion
Free for all
• many dropped (and retransmitted) packets
• can cause congestion collapse
• the long suffering wins
Paid-for service
• pre-arrange allocations
• requires negotiation before sending packets
• requires a pricing and payment model
• don't drop packets of the high-bidders
• only those who can pay get good service

Dealing with Congestion
Dynamic adjustment (TCP)
• every sender infers the level of congestion
• each adapts its sending rate "for the greater good"
What is "the greater good" (performance objective)?
• maximizing total throughput, even if some users suffer more?
• fairness? (what's fair?)
Constraints:
• decentralized control
• unlike routing, no local reaction at routers (beyond buffering and dropping)
• long feedback time
• dynamic network condition: connections come and go

What is the Performance Objective?
System capacity: load vs. throughput:
• congestion avoidance: operate system at "knee" capacity
• congestion control: drive system to near "cliff" capacity
To avoid or prevent congestion, sender must know system capacity and operate below it
[figure: throughput vs. offered load, with the "knee" and "cliff" marked; congestion collapse: an increase in load that results in a decrease in useful work done and an increase in response time]
Jain et al.

Sender Behavior
How does sender detect congestion?
• explicit feedback from the network?
• implicit feedback: inferred from network performance?
How should the sender adapt?
• explicit sending rate computed by the network?
• sender coordinates with receiver?
• sender reacts locally?
How do senders discover system capacity and control congestion?
• detect congestion
• slow down transmission
How fast should new TCP senders send? What does the sender see? What can the sender change?

How Routers Handle Packets
Congestion happens at router links
Simple resource scheduling: FIFO queue and drop-tail
Queue scheduling: manages access to bandwidth
• first in first out: packets transmitted in the order they arrive
Drop policy: manages access to buffer space
• drop tail: if queue is full, drop the incoming packet

How it Looks to the Sender
Packet delay
• packet experiences high delay
• packet gets dropped along the way
How does TCP sender learn of these?
• delay: round-trip time estimate (RTT)
• loss: retransmission timeout (RTO), duplicate acknowledgments
How do RTT and RTO translate to system capacity?
• how to detect "knee" capacity?
• how to know if system has "gone off the cliff"?
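How the sender turns delay measurements into a concrete timer is the usual smoothed-RTT/RTO calculation. A minimal sketch in C, assuming the standard Jacobson & Karels style gains (alpha = 1/8, beta = 1/4, RTO = SRTT + 4·RTTVAR); the names and the 1-second floor are illustrative, not taken from the lecture:

```c
/* Hedged sketch: smoothed RTT estimate and RTO, roughly in the style of
 * Jacobson & Karels.  Names (rtt_estimator, srtt, rttvar) are illustrative. */
#include <stdio.h>

struct rtt_estimator {
    double srtt;    /* smoothed RTT (seconds) */
    double rttvar;  /* RTT variation */
    double rto;     /* retransmission timeout */
    int    first;   /* 1 until the first sample arrives */
};

static void rtt_sample(struct rtt_estimator *e, double r)
{
    if (e->first) {                      /* first measurement */
        e->srtt   = r;
        e->rttvar = r / 2.0;
        e->first  = 0;
    } else {
        double err = r - e->srtt;
        e->rttvar += 0.25 * ((err < 0 ? -err : err) - e->rttvar); /* beta = 1/4 */
        e->srtt   += 0.125 * err;                                 /* alpha = 1/8 */
    }
    e->rto = e->srtt + 4.0 * e->rttvar;  /* RTO = SRTT + 4*RTTVAR */
    if (e->rto < 1.0) e->rto = 1.0;      /* conventional 1 s floor, an assumption here */
}

int main(void)
{
    struct rtt_estimator e = { .first = 1 };
    double samples[] = { 0.100, 0.120, 0.300, 0.110 };  /* made-up RTT samples */
    for (int i = 0; i < 4; i++) {
        rtt_sample(&e, samples[i]);
        printf("sample %.3f  srtt %.3f  rto %.3f\n", samples[i], e.srtt, e.rto);
    }
    return 0;
}
```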

[Rexford]

What can Sender Do?
Upon detecting congestion (packet loss)
• decrease sending rate
• slow down transmission
But, what if congestion abated?
• suppose some connections ended transmission and
• there is more bandwidth available
• would be a shame to stay at a low sending rate
Upon not detecting congestion
• increase sending rate, a little at a time
• and see if packets are successfully delivered
Both good and bad
• pro: obviates the need for explicit feedback from the network
• con: under-shooting and over-shooting cliff capacity

Discovering System Capacity
What TCP sender does:
• probe for point right before cliff ("pipe size")
• slow down transmission on detecting cliff (congestion)
• fast probing initially, up to a threshold ("slow start")
• slower probing after threshold is reached ("linear increase")
Why not start by sending a large amount of data and slow down only upon congestion?
[figure: TCP Tahoe congestion window (cwnd) over time; packet dropped at the cliff]

[Rexford]

Self-Clocking TCP
TCP uses cumulative ACK for flow control and retransmission and congestion control
TCP follows a so-called "Law of Packet Conservation":
do not inject a new packet into the network until a resident departs (ACK received)
Since packet transmission is timed by receipt of ACK, TCP is said to be self-clocking
[figure: sender and receiver exchanging segments and ACKs]
[Stevens]

TCP Congestion Control
Sender maintains a congestion window (cwnd)
• to account for the maximum number of bytes in transit
• i.e., number of bytes still awaiting acknowledgments
Sender's send window (wnd) is
wnd = MIN(rwnd, floor(cwnd))
• rwnd: receiver's advertised window
• initially set cwnd to 1 MSS, never drop below 1 MSS
• increase cwnd if there's no congestion (by how much?)
  • exponential increase up to ssthresh (initially 64 KB)
  • linear increase afterwards
• on congestion, decrease cwnd (by how much?)
• always struggling to find the right transmission rate, just to the left of cliff
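A minimal sketch of this window bookkeeping, assuming cwnd and ssthresh are kept in MSS units as floating-point values (so the fractional increase used later works); the struct and function names are illustrative, not from the lecture:

```c
/* Sketch: the sender may transmit only while outstanding data stays
 * below wnd = MIN(rwnd, floor(cwnd)).  All windows in MSS units. */
#include <stdint.h>
#include <math.h>

struct tcp_cc {
    double   cwnd;      /* congestion window */
    double   ssthresh;  /* slow-start threshold */
    uint32_t rwnd;      /* receiver's advertised window */
};

/* effective send window */
static uint32_t send_window(const struct tcp_cc *cc)
{
    uint32_t cw = (uint32_t)floor(cc->cwnd);
    return cc->rwnd < cw ? cc->rwnd : cw;
}

/* packet conservation: a new segment may go out only if it fits in wnd */
static int can_send(const struct tcp_cc *cc, uint32_t in_flight)
{
    return in_flight < send_window(cc);
}
```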

TCP Slow-Start
When connection begins, increase rate exponentially until first loss event:
• double cwnd every RTT (or: increase cwnd by 1 for every returned ACK)
⇒ really, fast start, but from a low base, vs. starting with a whole receiver window's worth of data as TCP originally did, without congestion control
[figure: Host A sends one segment, then two segments, then four segments, one RTT apart]

Increasing cwnd
Probing the "pipe-size" (system capacity) in two phases:
1. slow-start: exponential increase
   while (cwnd <= ssthresh) { cwnd += 1 } for every returned ACK
   OR: cwnd *= 2 for every cwnd-full of ACKs
2. congestion avoidance: linear increase
   while (cwnd > ssthresh) { cwnd += 1/floor(cwnd) } for every returned ACK
   OR: cwnd += 1 for every cwnd-full of ACKs
Jacobson & Karels
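The two per-ACK forms above amount to a single update on each non-duplicate ACK. A sketch, reusing struct tcp_cc (and math.h) from the earlier window sketch; this is an illustration, not the lecture's code:

```c
/* Sketch: cwnd growth on each returning non-duplicate ACK.
 * Slow start below ssthresh, linear increase above it. */
static void on_ack_grow(struct tcp_cc *cc)
{
    if (cc->cwnd <= cc->ssthresh)
        cc->cwnd += 1.0;                     /* exponential: +1 MSS per ACK */
    else
        cc->cwnd += 1.0 / floor(cc->cwnd);   /* linear: roughly +1 MSS per RTT */
}
```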

TCP Slow Start Example
[figure: cwnd growth vs. time during slow start]

Dealing with Congestion
Once congestion is detected,
• how should the sender reduce its transmission rate?
• how does the sender recover from congestion?

Goals of congestion control:
1. Efficiency: resources are fully utilized
2. Fairness: if k TCP connections share the same bottleneck link of bandwidth R, each connection should get an average rate of R/k
[figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R; pipe full]
Stevens

Goals of Congestion Control
3. Responsiveness: fast convergence, quick adaptation to current capacity
4. Smoothness: little oscillation
• a larger change-step increases responsiveness but decreases smoothness
5. Distributed control: no (explicit) coordination between nodes
Fairness index over n allocations x_i: F(x) = (Σ x_i)² / (n Σ x_i²)
• F is bounded between 0 and 1; equal allocations give 1, and if only k of n users share the resource equally, F = k/n
Guideline for congestion control (as in routing): be skeptical of good news, react fast to bad news
[Chiu & Jain, Fig. 3: responsiveness and smoothness. The embedded excerpt also covers the fairness index's properties, the distributedness requirement (only binary overloaded/underloaded feedback is assumed), and convergence to an equilibrium that oscillates around the optimal state.]

Adapting to Congestion
By how much should cwnd (w) be changed? Limiting ourselves to only linear adjustments:
• increase when there's no congestion: w' = b_i·w + a_i
• decrease upon congestion: w' = b_d·w + a_d
Alternatives for the coefficients:
1. Additive increase, additive decrease: a_i > 0, a_d < 0, b_i = b_d = 1
2. Additive increase, multiplicative decrease: a_i > 0, b_i = 1, a_d = 0, 0 < b_d < 1
3. Multiplicative increase, additive decrease: a_i = 0, b_i > 1, a_d < 0, b_d = 1
4. Multiplicative increase, multiplicative decrease: b_i > 1, 0 < b_d < 1, a_i = a_d = 0
Chiu & Jain
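The general linear control and the four coefficient choices can be written down directly. A sketch, with example coefficient values (step 1, factor 1/2 or 2) standing in for the generic a's and b's; names are illustrative:

```c
/* Sketch: generic linear window adjustment  w' = b*w + a,
 * with separate (a, b) pairs for increase and decrease. */
struct linear_ctrl {
    double ai, bi;   /* used when no congestion is signaled */
    double ad, bd;   /* used when congestion is signaled    */
};

static double adjust(const struct linear_ctrl *c, double w, int congested)
{
    return congested ? c->bd * w + c->ad
                     : c->bi * w + c->ai;
}

/* The four alternatives from Chiu & Jain, with illustrative coefficients: */
static const struct linear_ctrl AIAD = { .ai = 1.0, .bi = 1.0, .ad = -1.0, .bd = 1.0 };
static const struct linear_ctrl AIMD = { .ai = 1.0, .bi = 1.0, .ad =  0.0, .bd = 0.5 };
static const struct linear_ctrl MIAD = { .ai = 0.0, .bi = 2.0, .ad = -1.0, .bd = 1.0 };
static const struct linear_ctrl MIMD = { .ai = 0.0, .bi = 2.0, .ad =  0.0, .bd = 0.5 };
```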

Resource Allocation
View resource allocation as a trajectory through an n-dimensional vector space, one dimension per user
A 2-user allocation trajectory:
• x1, x2: the two users' allocations
• Efficiency Line: x1 + x2 = Σ x_i = R
  • below this line, system is under-loaded
  • above, overloaded
• Fairness Line: x1 = x2
• Optimal Point: efficient and fair
• Goal of congestion control: to operate at optimal point

Additive/Multiplicative Factors
• Additive factor: adding the same amount to both users' allocations moves an allocation along a 45º line
• Multiplicative factor: multiplying both users' allocations by the same factor moves an allocation on a line through the origin (the "equi-fairness," or rather, "equi-unfairness" line)
  • the slope of this line, not any position on it, determines fairness
[Chiu & Jain, Fig. 4: vector representation of a two-user case; for two users the fairness index is (x1 + x2)² / 2(x1² + x2²), so all points on a line through the origin have the same fairness]
Chiu & Jain
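To see this trajectory numerically, here is a small 2-user simulation. It is a sketch under simple assumptions (capacity R = 10, binary overload feedback, additive step 1, multiplicative factor 1/2, an unequal starting point); it prints the allocations and Jain's fairness index each step:

```c
/* Sketch: two users under additive-increase/multiplicative-decrease,
 * with binary feedback "overloaded iff x1 + x2 > R".  Fairness rises
 * every increase/decrease cycle, as in Chiu & Jain's Fig. 5. */
#include <stdio.h>

#define R 10.0

static double fairness(double x1, double x2)      /* Jain's index, n = 2 */
{
    double s = x1 + x2;
    return (s * s) / (2.0 * (x1 * x1 + x2 * x2));
}

int main(void)
{
    double x1 = 1.0, x2 = 5.0;                    /* unequal starting allocations */
    for (int t = 0; t < 20; t++) {
        printf("t=%2d  x1=%5.2f  x2=%5.2f  F=%.3f\n", t, x1, x2, fairness(x1, x2));
        if (x1 + x2 > R) { x1 *= 0.5; x2 *= 0.5; }    /* multiplicative decrease */
        else             { x1 += 1.0; x2 += 1.0; }    /* additive increase */
    }
    return 0;
}
```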


AIMD
It can be shown that only AIMD takes the system near the optimal point
• Additive Increase, Multiplicative Decrease: system converges to an equilibrium near the Optimal Point
• Additive Increase, Additive Decrease: system converges to efficiency, but not to fairness
[Chiu & Jain, Fig. 5: additive increase/multiplicative decrease alternates 45º increases with multiplicative moves toward the origin, gaining fairness every cycle and converging to an oscillation around the optimal point; Fig. 6: additive increase/additive decrease keeps moving back and forth along a 45º line through the starting point and does not converge to fairness]
Chiu & Jain

TCP Congestion Recovery
Once congestion is detected,
• by how much should sender decrease cwnd?
• how does sender recover from congestion?
• which packet(s) to retransmit?
• how to increase cwnd again?
First, reduce the exponential increase threshold: ssthresh = cwnd/2

TCP Tahoe
TCP Tahoe:
• retransmit using Go-Back-N
• reset cwnd = 1
• restart slow-start
[figure: TCP Tahoe congestion window (cwnd) over time; after a packet is dropped, cwnd collapses to 1 and slow-start restarts]
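A sketch of Tahoe's reaction when loss is detected, reusing struct tcp_cc from the earlier window sketch. The Go-Back-N retransmission is left as a comment, and the 2 MSS floor on ssthresh is a common convention rather than something stated in the slides:

```c
/* Sketch: TCP Tahoe's response to a detected loss (RTO or 3 dupACKs):
 * halve the threshold, collapse cwnd to 1 MSS, and restart slow start. */
static void tahoe_on_loss(struct tcp_cc *cc)
{
    cc->ssthresh = cc->cwnd / 2.0;   /* remember half the old window */
    if (cc->ssthresh < 2.0)
        cc->ssthresh = 2.0;          /* conventional floor of 2 MSS (assumption) */
    cc->cwnd = 1.0;                  /* back to 1 MSS */
    /* then: retransmit from the lost segment onward (Go-Back-N)
       and grow cwnd again via slow start */
}
```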


Fast Retransmit

Motivation: waiting for RTO is too slow
TCP Tahoe also does fast retransmit:
• with cumulative ACK, receipt of packets following a lost packet causes duplicate ACKs to be returned
• interpret 3 duplicate ACKs as an implicit NAK
• retransmit upon receiving 3 dupACKs, i.e., on receipt of the 4th ACK with the same seq#, retransmit the segment
• why 3 dupACKs? why not 2 or 4?
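In code, fast retransmit is just a counter over ACKs that repeat the same cumulative sequence number. A sketch with illustrative names; sequence-number wraparound is ignored and the retransmission itself is delegated to a callback:

```c
/* Sketch: count duplicate ACKs and trigger fast retransmit on the third
 * (i.e., the 4th ACK carrying the same cumulative seq#). */
#include <stdint.h>

struct dup_detector {
    uint32_t last_ack;   /* highest cumulative ACK seen so far */
    int      dup_count;  /* ACKs repeating last_ack */
};

static void on_ack_seqno(struct dup_detector *d, uint32_t ack,
                         void (*fast_retransmit)(uint32_t seq))
{
    if (ack == d->last_ack) {
        if (++d->dup_count == 3)
            fast_retransmit(ack);    /* resend the segment starting at 'ack' */
    } else if (ack > d->last_ack) {  /* new data ACKed: reset the counter */
        d->last_ack  = ack;
        d->dup_count = 0;
    }
}
```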

Fast Retransmit Example
With fast retransmit, TCP can retransmit after 1 RTT instead of waiting for RTO
[figure: sent segments and ACKed seq# vs. time (secs); the sender's wnd advances until 3 dupACKs arrive, and the lost segment is retransmitted on the 4th dupACK]
[Hoe]

TCP Tahoe Recovers Slowly
cwnd re-opening and retransmission of lost packets regulated by returning ACKs
• a duplicate ACK doesn't grow cwnd, so TCP Tahoe must wait at least 1 RTT for the fast-retransmitted packet to cause a non-duplicate ACK to be returned
• if RTT is large, Tahoe re-grows cwnd very slowly
[figure: Tahoe's cwnd stalls for about 1 RTT after a fast retransmit before re-opening]
[Hoe]

TCP Reno and Fast Recovery
TCP Reno does fast recovery:
• current value of cwnd is the estimated system (pipe) capacity
• after congestion is detected, want to continue transmitting at half the estimated capacity
How?
• each returning ACK signals that an outstanding packet has left the network
• don't send any new packet until half of the expected number of ACKs have returned

Fast Recovery
1. on congestion, retransmit lost segment, set ssthresh = cwnd/2
2. remember highest seq# sent, snd_high; and remember current cwnd, let's call it pipe
3. decrease cwnd by half
4. increment cwnd for every returning dupACK, incl. the 3 used for fast retransmit
5. send new packets (above snd_high) only when cwnd > pipe
6. exit fast-recovery when a non-dup ACK is received
7. set cwnd = ssthresh + 1 and resume linear increase
cwnd: number of bytes unACKed
[figure: cwnd drops from pipe to cwnd/2, inflates on returning dupACKs, then continues from ssthresh+1]
[Hoe]

Summary: TCP Congestion Control
• When cwnd is below ssthresh, sender is in slow-start phase, window grows exponentially
• When cwnd is above ssthresh, sender is in congestion-avoidance phase, window grows linearly
• When 3 dupACKs are received, ssthresh is set to cwnd/2 and cwnd is set to the new ssthresh
• If more dupACKs return, do fast recovery
• Else when RTO occurs, set ssthresh to cwnd/2 and set cwnd to 1 MSS

TCP Congestion Control Examples
TCP keeps track of outstanding bytes by two variables:
1. snd_una: lowest unACKed seq#, i.e., snd_una records the seq# associated with the last ACK
2. snd_next: seq# to be sent next
Amount of outstanding bytes:
• pipe = snd_next - snd_una
Scenario:
• 1 byte/pkt
• receiver R takes 1 transmit time to return an ACK
• sender S sends out the next packet immediately upon receiving an ACK
• rwnd = ∞
• cwnd = 21, in linear increase mode
• pipe = 21

Factors in TCP Performance
• RTT estimate
• RTO computation
• sender's sliding window (wnd)
• receiver's window (rwnd)
• congestion window (cwnd)
• slow-start threshold (ssthresh)
• fast retransmit
• fast recovery
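The outstanding-byte bookkeeping used in these examples is a single subtraction. A sketch with illustrative function names:

```c
/* Sketch: outstanding bytes and the send test used in the examples. */
#include <stdint.h>

static uint32_t pipe_bytes(uint32_t snd_next, uint32_t snd_una)
{
    return snd_next - snd_una;                    /* bytes sent but not yet ACKed */
}

static int may_send(uint32_t snd_next, uint32_t snd_una, uint32_t wnd)
{
    return pipe_bytes(snd_next, snd_una) < wnd;   /* wnd = MIN(rwnd, floor(cwnd)) */
}
```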

TCP Variants

Summary of TCP Variants
Original TCP:
• loss recovery depends on RTO
TCP Tahoe:
• slow-start and linear increase
• interprets 3 dupACKs as loss signal, but restarts slow-start after fast retransmit
TCP Reno:
• fast recovery, i.e., consumes half the returning dupACKs before transmitting one new packet for each additional returning dupACK
• on receiving a non-dupACK, resumes linear increase from half of the old cwnd value
TCP New Reno:
• implements a fast recovery phase whereby a partial ACK, a non-dupACK that is < snd_high (seq# sent before detection of loss), doesn't take TCP out of fast recovery; instead the next lost segment is retransmitted
• only a non-dupACK that is ≥ snd_high takes TCP out of fast recovery: resets cwnd to ssthresh+1 and resumes linear increase
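To tie Reno's fast recovery and New Reno's partial-ACK refinement together, a sketch reusing struct tcp_cc from the earlier window sketch; snd_high and pipe follow the Fast Recovery steps above, while the retransmit callback and the surrounding send loop are assumptions of this illustration, not part of the slides:

```c
/* Sketch: Reno fast retransmit + fast recovery, with New Reno's
 * partial-ACK handling.  Windows counted in MSS units. */
#include <stdint.h>

struct fast_recovery {
    int      active;     /* 1 while in fast recovery */
    double   pipe;       /* cwnd at the time loss was detected */
    uint32_t snd_high;   /* highest seq# sent before the loss */
};

/* 3rd dupACK, steps 1-3: retransmit, halve ssthresh and cwnd, remember state */
static void on_three_dupacks(struct tcp_cc *cc, struct fast_recovery *fr,
                             uint32_t snd_next, uint32_t lost_seq,
                             void (*retransmit)(uint32_t seq))
{
    retransmit(lost_seq);                 /* step 1: resend the lost segment */
    cc->ssthresh = cc->cwnd / 2.0;
    fr->snd_high = snd_next;              /* step 2 */
    fr->pipe     = cc->cwnd;
    cc->cwnd     = cc->cwnd / 2.0;        /* step 3 */
    fr->active   = 1;
}

/* step 4: every further dupACK means one segment has left the network;
 * step 5 (not shown): new data may be sent only while cwnd > pipe */
static void on_dupack_in_recovery(struct tcp_cc *cc)
{
    cc->cwnd += 1.0;
}

/* non-duplicate ACK: Reno exits immediately (steps 6-7); New Reno exits
 * only on a full ACK (>= snd_high) and retransmits the next hole on a
 * partial ACK (< snd_high). */
static void on_new_ack(struct tcp_cc *cc, struct fast_recovery *fr,
                       uint32_t ack, int new_reno,
                       void (*retransmit)(uint32_t seq))
{
    if (!fr->active)
        return;
    if (new_reno && ack < fr->snd_high) { /* partial ACK: stay in recovery */
        retransmit(ack);                  /* resend the next missing segment */
        return;
    }
    fr->active = 0;                       /* step 6: leave fast recovery */
    cc->cwnd = cc->ssthresh + 1.0;        /* step 7: resume linear increase */
}
```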