TCP Congestion Control
15-744: Computer Networking (Lecture 4)

Assigned Reading
• [FJ93] Random Early Detection Gateways for Congestion Avoidance
• [TFRC] Equation-Based Congestion Control for Unicast Applications (first 2 sections)


Introduction to TCP: Key Things You Should Know Already

• Communication abstraction:
  • Reliable
  • Ordered
  • Point-to-point
  • Byte-stream
  • Full duplex
  • Flow and congestion controlled
• Protocol implemented entirely at the ends
  • Fate sharing
• Sliding window with cumulative acks
  • Ack field contains last in-order packet received
  • Duplicate acks sent when out-of-order packet received
• Port numbers
• TCP/UDP checksum
• Sliding window flow control
• Sequence numbers
• TCP connection setup
• TCP reliability
  • Timeout
  • Data-driven
• Chiu & Jain analysis of linear congestion control


Overview
• TCP congestion control
• TFRC
• TCP and queues
• Queuing disciplines
• RED

TCP Congestion Control
• Changes to TCP motivated by ARPANET congestion collapse
• Basic principles
  • AIMD
  • Packet conservation
  • Reaching steady state quickly
  • ACK clocking


AIMD
• Distributed, fair and efficient
• Packet loss is seen as a sign of congestion and results in a multiplicative rate decrease
  • Factor of 2
• TCP periodically probes for available bandwidth by increasing its rate
[Figure: sending rate over time — additive probing with multiplicative cuts]

Implementation Issue
• Operating system timers are very coarse – how to pace packets out smoothly?
• Implemented using a congestion window that limits how much data can be in the network
• TCP also keeps track of how much data is in transit
  • Data can only be sent when the amount of outstanding data is less than the congestion window
  • The amount of outstanding data is increased on a “send” and decreased on “ack”
  • (last sent – last acked) < congestion window
• Window limited by both congestion and buffering
  • Sender’s maximum window = min(advertised window, cwnd)
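The window-limit check described above can be sketched in a few lines (an illustrative sketch; the variable names are hypothetical, not taken from any real TCP implementation):

```python
def can_send(last_sent, last_acked, advertised_window, cwnd):
    """Sending is allowed only while the outstanding (unacked) data is
    smaller than the effective window: min(advertised window, cwnd)."""
    outstanding = last_sent - last_acked      # bytes currently in transit
    return outstanding < min(advertised_window, cwnd)
```

For example, with 6000 bytes outstanding and an 8000-byte effective window, `can_send` returns True; shrink the advertised window to 4000 and it returns False.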

Congestion Avoidance
• If loss occurs when cwnd = W
  • Network can handle 0.5W ~ W segments
  • Set cwnd to 0.5W (multiplicative decrease)
• Upon receiving ACK
  • Increase cwnd by (1 packet)/cwnd
    • What is 1 packet? → 1 MSS worth of bytes
    • After cwnd packets have passed by → approximately an increase of 1 MSS
• Implements AIMD

Congestion Avoidance Sequence Plot
[Figure: sequence number vs. time for packets and acks]
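In code, the AIMD update amounts to two one-line rules (a minimal sketch, with cwnd measured in MSS units):

```python
def on_ack(cwnd):
    # Additive increase: +1/cwnd per ACK, i.e. about +1 MSS per RTT
    return cwnd + 1.0 / cwnd

def on_loss(cwnd):
    # Multiplicative decrease: cut the window in half
    return cwnd / 2.0
```

Starting from cwnd = 10, a full window of 10 ACKs raises cwnd to roughly 11, while a single loss drops it back to about 5.5.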


Congestion Avoidance Behavior
[Figure: congestion window vs. time — packet loss plus timeout cuts the congestion window, which then grabs back bandwidth]

Packet Conservation
• At equilibrium, inject a packet into the network only when one is removed
• Sliding window and not rate controlled
• But still need to avoid sending bursts of packets → would overflow links
  • Need to carefully pace out packets
  • Helps provide stability
• Need to eliminate spurious retransmissions
  • Accurate RTO estimation
  • Better loss recovery techniques (e.g. fast retransmit)


TCP Packet Pacing
• Congestion window helps to “pace” the transmission of data packets
• In steady state, a packet is sent when an ack is received
  • Data transmission remains smooth, once it is smooth
  • Self-clocking behavior
[Figure: sender/receiver pipe with packet spacings Pb, Pr and ack spacings As, Ar, Ab]

Reaching Steady State
• Doing AIMD is fine in steady state but slow…
• How does TCP know what is a good initial rate to start with?
  • Should work both for a CDPD link (10s of Kbps or less) and for supercomputer links (10 Gbps and growing)
• Quick initial phase to help get up to speed (called slow start)


Slow Start Packet Pacing
• How do we get this clocking behavior to start?
  • Initialize cwnd = 1
  • Upon receipt of every ack, cwnd = cwnd + 1
• Implications
  • Window actually increases to W in RTT * log2(W)
  • Can overshoot window and cause packet loss

Slow Start Example
[Figure: one packet sent in RTT 0; two in RTT 1 (packets 2–3); four in RTT 2 (packets 4–7); eight in RTT 3 (packets 8–15) — cwnd doubles every RTT]
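Since a full window of ACKs returns each round trip, +1 per ACK doubles the window every RTT, which is where the log2(W) figure comes from. A small sketch of that growth:

```python
def rtts_to_reach(target_w):
    """With cwnd starting at 1 and +1 per ACK, each RTT a full window of
    ACKs returns, so cwnd doubles every RTT until it reaches the target."""
    cwnd, rtts = 1, 0
    while cwnd < target_w:
        cwnd *= 2          # one round trip of ACKs doubles the window
        rtts += 1
    return rtts

print(rtts_to_reach(64))   # → 6, i.e. log2(64) round trips
```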


Slow Start Sequence Plot
[Figure: sequence number vs. time for packets and acks — the slope doubles each RTT]

Return to Slow Start
• If a packet is lost we lose our self-clocking as well
  • Need to implement slow start and congestion avoidance together
• When timeout occurs, set ssthresh to 0.5W
  • If cwnd < ssthresh, use slow start
  • Else use congestion avoidance
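The ssthresh rule above combines the two modes into one per-ACK update. A minimal sketch (cwnd and ssthresh in MSS units; names are illustrative):

```python
def on_ack(cwnd, ssthresh):
    """Per-ACK window update: slow start below ssthresh,
    congestion avoidance at or above it."""
    if cwnd < ssthresh:
        return cwnd + 1.0        # slow start: doubles cwnd each RTT
    return cwnd + 1.0 / cwnd     # congestion avoidance: ~ +1 MSS per RTT

def on_timeout(cwnd):
    """Timeout: remember half the window in ssthresh, restart from 1."""
    return 1.0, cwnd / 2.0       # (new cwnd, new ssthresh)
```

After a timeout at cwnd = 16, the sender restarts at cwnd = 1 and grows exponentially up to ssthresh = 8, then linearly beyond it.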


TCP Saw Tooth Behavior
[Figure: congestion window sawtooth — initial slow start, then fast retransmit and recovery; timeouts may still occur, followed by slow start to pace packets]

Questions
• Current loss rates – 10% in paper
• Uniform reaction to congestion – can different nodes do different things?
  • TCP friendliness, GAIMD, etc.
• Can we use queuing delay as an indicator?
  • TCP Vegas
• What about non-linear controls?
  • Binomial congestion control


Overview
• TCP congestion control
• TFRC
  • See Hugo Pinto’s slides (slides here are optional)
• TCP and queues
• Queuing disciplines
• RED

Changing Workloads
• New applications are changing the way TCP is used
• 1980’s Internet
  • Telnet & FTP → long-lived flows
  • Well-behaved end hosts
  • Homogeneous end host capabilities
  • Simple symmetric routing
• 2000’s Internet
  • Web & more Web → large number of short xfers
  • Wild west – everyone is playing games to get bandwidth
  • Cell phones and toasters on the Internet
  • Policy routing
• How to accommodate new applications?


TCP Friendliness
• What does it mean to be TCP friendly?
  • TCP is not going away
  • Any new congestion control must compete with TCP flows
    • Should not clobber TCP flows and grab bulk of link
    • Should also be able to hold its own, i.e. grab its fair share, or it will never become popular
• How is this quantified/shown?
  • Has evolved into evaluating loss/throughput behavior
  • If it shows 1/sqrt(p) behavior it is OK
  • But is this really true?

TCP Friendly Rate Control (TFRC)
• Equation 1 – real TCP response
  • 1st term corresponds to simple derivation
  • 2nd term corresponds to more complicated timeout behavior
    • Critical in situations with > 5% loss rates → where timeouts occur frequently
• Key parameters
  • RTO
  • RTT
  • Loss rate
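Equation 1 is the TCP response function used by the TFRC paper; a sketch of it in code, with `b` packets acknowledged per ACK and `t_rto` defaulting to 4·RTT as these slides suggest (the parameter names are illustrative):

```python
from math import sqrt

def tcp_throughput(s, rtt, p, t_rto=None, b=1):
    """Approximate TCP throughput (bytes/sec) for segment size s,
    round-trip time rtt (seconds), and loss event rate p.
    First term: steady-state AIMD behavior (the 1/sqrt(p) regime).
    Second term: retransmission timeouts, dominant at high loss rates."""
    if t_rto is None:
        t_rto = 4 * rtt                           # RTO = 4 * RTT rule of thumb
    aimd_term = rtt * sqrt(2 * b * p / 3)
    timeout_term = t_rto * 3 * sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)
    return s / (aimd_term + timeout_term)
```

At low loss rates the square-root term dominates, giving the 1/sqrt(p) behavior mentioned above; past roughly 5% loss the timeout term takes over, which is why the simple derivation alone is not enough.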


RTO/RTT Estimation
• RTO not used to perform retransmissions
  • Used to model TCP’s extremely slow transmission rate in this mode
  • Only important when loss rate is high
  • Accuracy is not as critical
• Different TCPs have different RTO calculations
  • Clock granularity critical → 500 ms typical; 100 ms, 200 ms, 1 s also common
• RTO = 4 * RTT is close enough for reasonable operation
• EWMA RTT
  • RTT_{n+1} = (1 − α) * RTT_n + α * RTT_sample

Loss Estimation
• Loss event rate vs. loss rate
• Desired characteristics
  • Should work well at a steady loss rate
  • Should weight recent samples more
  • Should increase only with a new loss
  • Should decrease only with a long period without loss
• Possible choices
  • Dynamic window – loss rate over last X packets
  • EWMA of interval between losses
  • Weighted average of last n intervals
    • Last n/2 have equal weight
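The EWMA RTT update is one line; the sketch below uses the classic TCP gain of 1/8 for α, since the slide leaves it unspecified:

```python
def ewma_rtt(rtt_est, sample, alpha=0.125):
    """RTT_{n+1} = (1 - alpha) * RTT_n + alpha * RTT_sample.
    A small alpha weights history heavily; a large one chases recent samples."""
    return (1 - alpha) * rtt_est + alpha * sample

est = 100.0                     # current estimate, ms
est = ewma_rtt(est, 120.0)      # one 120 ms sample moves the estimate to 102.5
```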


Loss Estimation
• Dynamic windows have many flaws
• Difficult to choose weight for EWMA
• Solution: WMA
  • Choose a simple linear decrease in weight for the last n/2 samples in the weighted average
• What about the last interval?
  • Include it when it actually increases the WMA value
• What if there is a long period of no losses?
  • Special case (history discounting) when current interval > 2 * avg

Slow Start
• Used in TCP to get a rough estimate of the network and establish the ack clock
  • Don’t need it for the ack clock here
• TCP ensures that overshoot is not > 2x
  • Rate-based protocols have no such limitation – why?
• TFRC slow start
  • New rate set to min(2 * sent, 2 * recvd)
  • Ends with first loss report → rate set to ½ current rate
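The WMA over loss intervals can be sketched as follows (illustrative; for n = 8 the weight formula below reproduces the TFRC paper's weights 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2, most recent interval first):

```python
def avg_loss_interval(intervals):
    """Weighted average of the last n loss intervals (most recent first):
    the newest n/2 samples get weight 1, then weights fall off linearly."""
    n = len(intervals)
    weights = [1.0 if i < n / 2 else 1.0 - (i - n / 2 + 1) / (n / 2 + 1)
               for i in range(n)]
    return sum(w * x for w, x in zip(weights, intervals)) / sum(weights)
```

Because recent intervals carry more weight, a fresh loss pulls the average down quickly, while old intervals fade out gradually rather than falling off a cliff.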


Congestion Avoidance
• Loss interval increases in order to increase rate
  • Primarily due to the transmission of new packets in the current interval
  • History discounting increases the interval by removing old intervals
• 0.14 packets per RTT without history discounting
• 0.22 packets per RTT with discounting
  • Much slower increase than TCP
• Decrease is also slower
  • 4 – 8 RTTs to halve speed

Overview
• TCP congestion control
• TFRC
• TCP and queues
• Queuing disciplines
• RED


TCP Congestion Control
• Rule for adjusting W
  • If an ACK is received: W ← W + 1/W
  • If a packet is lost: W ← W/2
• Only W packets may be outstanding
[Figure: source–dest pipe; window size oscillates between Wmax/2 and Wmax over time]

TCP Performance
• Can TCP saturate a link?
• Congestion control
  • Increase utilization until… link becomes congested
  • React by decreasing window by 50%
  • Window is proportional to rate * RTT
• Doesn’t this mean that the network oscillates between 50% and 100% utilization?
  • Average utilization = 75%??
  • No… this is *not* right!


TCP Performance
• In the real world, queues play an important role
• Window is proportional to rate * RTT
  • But RTT changes as well as the window
• Window to fill links = propagation RTT * bottleneck bandwidth
• If window is larger, packets sit in queue on bottleneck link

Summary: Unbuffered Link
[Figure: sawtooth of W over time against the minimum window for full utilization]
• The router can’t fully utilize the link
  • If the window is too small, the link is not full
  • If the link is full, the next window increase causes a drop
  • With no buffer it still achieves 75% utilization
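The 75% figure follows from averaging the sawtooth: the window ramps linearly from W/2 to W, so its time-average is 3W/4. A quick numeric check (illustrative numbers only):

```python
# Window ramps linearly from W/2 to W over one sawtooth cycle; the link
# is exactly full at W, so utilization is the time-average of window/W.
W = 100.0
samples = [W / 2 + (W / 2) * t / 1000 for t in range(1001)]
utilization = sum(samples) / len(samples) / W
print(round(utilization, 2))   # → 0.75
```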


TCP Performance
• If we have a large router queue → can get 100% utilization
  • But router queues can cause large delays
• How big does the queue need to be?
  • Windows vary from W → W/2
  • Must make sure that the link is always full
  • W/2 > RTT * BW
  • W = RTT * BW + Qsize
  • Therefore, Qsize > RTT * BW
  • Ensures 100% utilization
  • Delay? Varies between RTT and 2 * RTT

Summary: Buffered Link
[Figure: sawtooth of W over time; the buffer sits above the minimum window for full utilization]
• With sufficient buffering we achieve full link utilization
  • The window is always above the critical threshold
  • Buffer absorbs changes in window size
• Buffer size = height of TCP sawtooth
• Minimum buffer size needed is 2T*C
  • This is the origin of the rule-of-thumb
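Plugging illustrative numbers into the rule-of-thumb (the link parameters below are hypothetical, not from the lecture):

```python
# Rule-of-thumb buffer sizing: Qsize = RTT * bottleneck bandwidth (2T*C).
rtt = 0.1                      # 100 ms round-trip time (assumed)
bandwidth = 1e9                # 1 Gbps bottleneck link (assumed)
qsize_bytes = rtt * bandwidth / 8
print(qsize_bytes / 1e6)       # → 12.5 MB of buffering
```

This also shows why the rule-of-thumb hurts on fast long-haul links: the required buffer grows linearly with both RTT and line rate.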


Overview
• TCP congestion control
• TFRC
• TCP and queues
• Queuing disciplines
• RED

Queuing Disciplines
• Each router must implement some queuing discipline
• Queuing allocates both bandwidth and buffer space:
  • Bandwidth: which packet to serve (transmit) next
  • Buffer space: which packet to drop next (when required)
• Queuing also affects latency


Packet Drop Dimensions
• Aggregation: from per-connection state to a single class, with class-based queuing in between
• Drop position: head, tail, or random location
• Drop time: early drop vs. overflow drop

Typical Internet Queuing
• FIFO + drop-tail
  • Simplest choice
  • Used widely in the Internet
• FIFO (first-in-first-out)
  • Implies single class of traffic
• Drop-tail
  • Arriving packets get dropped when queue is full, regardless of flow or importance
• Important distinction:
  • FIFO: scheduling discipline
  • Drop-tail: drop policy


FIFO + Drop-tail Problems
• Leaves responsibility of congestion control to edges (e.g., TCP)
• Does not separate between different flows
• No policing: send more packets → get more service
• Synchronization: end hosts react to the same events

Active Queue Management
• Design active router queue management to aid congestion control
• Why?
  • Routers can distinguish between propagation and persistent queuing delays
  • Routers can decide on transient congestion, based on workload


Active Queue Designs
• Modify both router and hosts
  • DECbit – congestion bit in packet header
• Modify router, hosts use TCP
  • Fair queuing
    • Per-connection buffer allocation
  • RED (Random Early Detection)
    • Drop packet or set bit in packet header as soon as congestion is starting

Overview
• TCP congestion control
• TFRC
• TCP and queues
• Queuing disciplines
• RED


Internet Problems
• Full queues
  • Routers are forced to have large queues to maintain high utilization
  • TCP detects congestion from loss
  • Forces network to have long-standing queues in steady state
• Lock-out problem
  • Drop-tail routers treat bursty traffic poorly
  • Traffic gets synchronized easily → allows a few flows to monopolize the queue space

Design Objectives
• Keep throughput high and delay low
• Accommodate bursts
• Queue size should reflect ability to accept bursts rather than steady-state queuing
• Improve TCP performance with minimal hardware changes


Lock-out Problem
• Random drop
  • Packet arriving when queue is full causes some random packet to be dropped
• Drop front
  • On full queue, drop packet at head of queue
• Random drop and drop front solve the lock-out problem but not the full-queues problem

Full Queues Problem
• Drop packets before queue becomes full (early drop)
• Intuition: notify senders of incipient congestion
• Example: early random drop (ERD):
  • If qlen > drop level, drop each new packet with fixed probability p
  • Does not control misbehaving users


Random Early Detection (RED)
• Detect incipient congestion, allow bursts
• Keep power (throughput/delay) high
  • Keep average queue size low
  • Assume hosts respond to lost packets
• Avoid window synchronization
  • Randomly mark packets
• Avoid bias against bursty traffic
• Some protection against ill-behaved users

RED Algorithm
• Maintain running average of queue length
• If avgq < minth do nothing
  • Low queuing, send packets through
• If avgq > maxth, drop packet
  • Protection from misbehaving sources
• Else mark packet in a manner proportional to queue length
  • Notify sources of incipient congestion


RED Operation
[Figure: queue with min and max thresholds; P(drop) rises linearly from 0 at minth to maxP at maxth, then jumps to 1.0]

RED Algorithm
• Maintain running average of queue length
• Byte mode vs. packet mode – why?
• For each packet arrival
  • Calculate average queue size (avg)
  • If minth ≤ avgq < maxth
    • Calculate probability Pa
    • With probability Pa, mark the arriving packet
  • Else if maxth ≤ avgq
    • Mark the arriving packet
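The per-packet decision can be sketched as follows (a sketch following [FJ93]; `count` is the number of packets since the last mark, and the `Pa = Pb / (1 - count*Pb)` step spreads marks out evenly rather than in bursts):

```python
import random

def red_mark(avgq, minth, maxth, maxp, count):
    """RED marking decision for one arriving packet.
    Returns (mark, new_count)."""
    if avgq < minth:
        return False, -1                      # below minth: never mark
    if avgq >= maxth:
        return True, 0                        # at/above maxth: always mark
    pb = maxp * (avgq - minth) / (maxth - minth)   # linear in avg queue size
    pa = pb / (1 - count * pb) if count * pb < 1 else 1.0
    if random.random() < pa:
        return True, 0                        # marked: reset the count
    return False, count + 1
```

Byte mode would additionally scale `pb` by packet size, so that large packets are more likely to be marked; the sketch above is packet mode only.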


Queue Estimation
• Standard EWMA: avgq = (1 − wq) * avgq + wq * qlen
  • Special fix for idle periods – why?
• Upper bound on wq depends on minth
  • Want to ignore transient congestion
  • Can calculate the queue average if a burst arrives
  • Set wq such that a certain burst size does not exceed minth
• Lower bound on wq to detect congestion relatively quickly
• Typical wq = 0.002

Thresholds
• minth determined by the utilization requirement
  • Tradeoff between queuing delay and utilization
• Relationship between maxth and minth
  • Want to ensure that feedback has enough time to make a difference in load
  • Depends on average queue increase in one RTT
  • Paper suggests a ratio of 2
  • Current rule of thumb is a factor of 3


Packet Marking
• maxp is reflective of typical loss rates
  • Paper uses 0.02
  • 0.1 is a more realistic value
• If the network needs marking of 20-30% then you need to buy a better link!
• Gentle variant of RED (recommended)
  • Vary drop rate from maxp to 1 as avgq varies from maxth to 2 * maxth
  • More robust to setting of maxth and maxp

Coming Up
• Friday lecture: Fair Queuing
  • Read WFQ paper
  • First two sections of XCP paper
• Next week: taking a look at routers
  • Will fix readings

