CS 268: Lecture 6 (TCP Congestion Control)

Ion Stoica
February 6, 2006

TCP Header

Header layout (bit offsets 0, 4, 10, 16, 31):
- Source port, Destination port
- Sequence number
- Acknowledgement
- HdrLen, Flags, Advertised window
- Checksum, Urgent pointer
- Options (variable)

Sequence number, acknowledgement, and advertised window – used by sliding-window based flow control

Flags:
- SYN, FIN – establishing/terminating a TCP connection
- ACK – set when Acknowledgement field is valid
- URG – urgent data; Urgent Pointer says where non-urgent data starts
- PUSH – don’t wait to fill segment
- RESET – abort connection

Today’s Lecture

Basics of Transport

Basics of Congestion Control

Comments on Congestion Control

TCP Header (Cont)

Checksum – 1’s complement and is computed over
- TCP header
- TCP data
- Pseudo-header (from IP header)
  • Note: breaks the layering!

Pseudo-header fields: Source address, Destination address, 0, Protocol (TCP), TCP Segment length
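The checksum coverage listed above can be sketched as follows. This is a minimal illustration for IPv4, not a production implementation; the function names `ones_complement_sum` and `tcp_checksum` are made up for this example, and the segment is assumed to carry a zeroed checksum field when the checksum is first computed.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words, folding each carry back in."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """1's complement checksum over pseudo-header + TCP header + TCP data."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),  # source address
        socket.inet_aton(dst_ip),  # destination address
        0,                         # zero byte
        socket.IPPROTO_TCP,        # protocol (TCP)
        len(segment),              # TCP segment length
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF
```

A receiver can run the same function over a segment that already contains its checksum; a result of 0 means the segment passed the check.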

Duties of Transport

Demultiplexing:
- IP header points to protocol
- Transport header needs to demultiplex further
  • UDP: port
  • TCP: source and destination address/port
- Well known ports and ephemeral ports

Data reliability (if desired):
- UDP: checksum, but no data recovery
- TCP: checksum and data recovery

TCP Connection Establishment

Three-way handshake
- Goal: agree on a set of parameters: the start sequence number for each side

Client (initiator) to Server: SYN, SeqNum = x
Server to Client: SYN and ACK, SeqNum = y and Ack = x + 1
Client to Server: ACK, Ack = y + 1
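The sequence-number bookkeeping of the three-way handshake can be sketched as below. The function name and message layout are hypothetical, and drawing the initial sequence numbers uniformly at random is an assumption for illustration only; real stacks choose ISNs by other schemes.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits wide

def three_way_handshake(rng: random.Random) -> list[tuple[str, str, dict]]:
    """Return the three handshake messages as (sender, flags, fields)."""
    x = rng.randrange(SEQ_SPACE)  # client's initial sequence number (assumed random)
    y = rng.randrange(SEQ_SPACE)  # server's initial sequence number (assumed random)
    return [
        ("client", "SYN", {"seq": x}),
        ("server", "SYN+ACK", {"seq": y, "ack": (x + 1) % SEQ_SPACE}),
        ("client", "ACK", {"ack": (y + 1) % SEQ_SPACE}),
    ]
```

Each side acknowledges the other's start sequence number plus one, which is how both ends confirm they agree on the parameters.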

TCP Issues

Connection confusion:
- ISNs can’t always be the same

Source spoofing:
- Need to make sure ISNs are random

SYN floods:
- SYN cookies

State management with many connections
- Server-stateless TCP (NSDI 05)

Observations

Throughput is ~ (w/RTT)

Sender has to buffer all unacknowledged packets, because they may require retransmission

Receiver may be able to accept out-of-order packets, but only up to its buffer limits
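The relation Throughput ~ w/RTT can be made concrete with a small calculation. This is a rough sketch that ignores losses and header overhead; the function name and the example numbers (10 packets, 1500 bytes, 100 ms) are assumptions chosen for illustration.

```python
def window_throughput_bps(window_packets: float, packet_bytes: int, rtt_seconds: float) -> float:
    """At most one window of data is delivered per round-trip time."""
    return window_packets * packet_bytes * 8 / rtt_seconds

# Example: a 10-packet window of 1500-byte packets over a 100 ms RTT
# gives roughly 1.2 Mbit/s; doubling the RTT halves it.
rate = window_throughput_bps(10, 1500, 0.1)
```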

TCP Flow Control

Make sure receiving end can handle data

Negotiated end-to-end, with no regard to network

Ends must ensure that no more than W packets are in flight
- Receiver ACKs packets
- When sender gets an ACK, it knows packet has arrived

What Should the Receiver ACK?

1. ACK every packet, giving its sequence number
2. Use negative ACKs (NACKs), indicating which packet did not arrive
3. Use cumulative ACK, where an ACK for number n implies ACKs for all k < n
4. Use selective ACKs (SACKs), indicating those that did arrive, even if not in order
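Options 3 and 4 can be compared with a short sketch. It assumes packets are numbered from 0 and that a cumulative ACK carries the next expected packet number; the function names are made up for this example and do not mirror the TCP SACK option format.

```python
def cumulative_ack(received: set[int]) -> int:
    """Return n such that every packet k < n has arrived."""
    n = 0
    while n in received:
        n += 1
    return n

def sack_blocks(received: set[int]) -> list[tuple[int, int]]:
    """Inclusive ranges of packets that arrived beyond the cumulative ACK point."""
    n = cumulative_ack(received)
    blocks: list[tuple[int, int]] = []
    for k in sorted(p for p in received if p > n):
        if blocks and k == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], k)  # extend the current run
        else:
            blocks.append((k, k))            # start a new run
    return blocks

def acks_for_arrivals(arrivals: list[int]) -> list[int]:
    """Cumulative ACK sent after each arrival, in arrival order."""
    received: set[int] = set()
    acks = []
    for packet in arrivals:
        received.add(packet)
        acks.append(cumulative_ack(received))
    return acks
```

For arrivals 0, 1, 3, 4, 2 the cumulative ACKs are 1, 2, 2, 2, 5: the repeated 2 shows that packet 2 is missing, while a SACK at that point would additionally report the block (3, 4).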

Sliding Window

[Figure: sliding-window diagram; labels not recoverable]

Error Recovery

Must retransmit packets that were dropped

To do this efficiently
- Keep transmitting whenever possible
- Detect dropped packets and retransmit quickly

Requires:
- Timeouts (with good timers)
- Other hints that packets were dropped

Timer

Use exponential averaging:

[Equations: three exponential-averaging update rules; symbols not recoverable]

Question: Why not set timeout to average delay?

[Notes 1–3: text not recoverable]

Dangers of Increasing Load

Knee – point after which
- Throughput increases very slowly
- Delay increases fast

Cliff – point after which
- Throughput starts to decrease very fast to zero (congestion collapse)
- Delay approaches infinity

In an M/M/1 queue
- Delay = 1/(1 – utilization)

[Figure: Throughput vs. Load and Delay vs. Load, marking the knee, the cliff, packet loss, and congestion collapse]
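For the Timer slide, a minimal sketch of timeout estimation by exponential averaging is given below. It assumes the standard form from RFC 6298 (gains 1/8 and 1/4, timeout = smoothed RTT plus four times the smoothed deviation), which may differ in notation from the slide; the class and attribute names are illustrative, and the RFC's clock-granularity and minimum-timeout rules are left out.

```python
class RttEstimator:
    """Smoothed RTT and deviation, updated by exponential averaging."""

    def __init__(self, first_sample: float, alpha: float = 1 / 8, beta: float = 1 / 4):
        self.alpha = alpha               # gain for the smoothed RTT (assumed, per RFC 6298)
        self.beta = beta                 # gain for the deviation (assumed, per RFC 6298)
        self.srtt = first_sample         # smoothed round-trip time
        self.rttvar = first_sample / 2   # smoothed mean deviation

    def update(self, sample: float) -> None:
        # Deviation is updated first, using the previous smoothed RTT.
        self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
        self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample

    @property
    def timeout(self) -> float:
        # Average plus a safety margin proportional to the variability.
        return self.srtt + 4 * self.rttvar
```

The margin term is one answer to the slide's question: a timeout equal to the average delay would fire for every packet whose round trip is merely above average, causing needless retransmissions.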

Hints

When should I suspect a packet was dropped?

When I receive several duplicate ACKs
- Receiver sends an ACK whenever a packet arrives
- ACK indicates seq. no. of last consecutively received packet
- Duplicate ACKs indicate missing packet

Cong. Control vs. Cong. Avoidance

Congestion control goal
- Stay left of cliff

Congestion avoidance goal
- Stay left of knee

[Figure: Throughput vs. Load, marking the knee, the cliff, and congestion collapse]

TCP Congestion Control

Can the network handle the rate of data?

Determined end-to-end, but TCP is making guesses about the state of the network

Two papers:
- Good science vs great engineering

Control System Model [CJ89]

[Figure: User 1, User 2, ..., User n send at rates x1, x2, ..., xn; the network sums the rates and feeds back a binary signal y indicating whether Σxi > Xgoal]

Simple, yet powerful model

Explicit binary signal of congestion

Possible Choices

xi(t+1) = aI + bI xi(t)   (increase)
xi(t+1) = aD + bD xi(t)   (decrease)

Multiplicative increase, additive decrease
- aI=0, bI>1, aD<0, bD=1

Additive increase, additive decrease
- aI>0, bI=1, aD<0, bD=1

Multiplicative increase, multiplicative decrease
- aI=0, bI>1, aD=0, 0<bD<1

Additive increase, multiplicative decrease
- aI>0, bI=1, aD=0, 0<bD<1

Multiplicative Increase, Multiplicative Decrease

Converges to stable cycle, but is not fair

[Figure: phase plot of User 1: x1 vs. User 2: x2 with the fairness line; trajectory (x1h,x2h), (bDx1h,bDx2h), (bIbDx1h,bIbDx2h)]

Multiplicative Increase, Additive Decrease

Fixed point at x1h = x2h = bI aD / (1 − bI)

Fixed point is unstable!

[Figure: phase plot of User 1: x1 vs. User 2: x2 with the fairness line and efficiency line; trajectory (x1h,x2h), (x1h+aD,x2h+aD), (bI(x1h+aD),bI(x2h+aD))]

Additive Increase, Multiplicative Decrease

Converges to stable and fair cycle

[Figure: phase plot of User 1: x1 vs. User 2: x2 with the fairness line and efficiency line; trajectory (x1h,x2h), (bDx1h,bDx2h), (bDx1h+aI,bDx2h+aI)]
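The behavior of the four increase/decrease policies can be checked numerically with a two-user simulation of the linear control rule. This is a rough sketch under stated assumptions: both users see the same binary signal at the same time, rates are clamped at zero, and the capacity, starting rates, and coefficient values are arbitrary illustrative choices. The fairness measure is Jain's index for two users, which is 1 when the rates are equal and 0.5 when one user has everything.

```python
def simulate(a_i, b_i, a_d, b_d, start=(1.0, 9.0), capacity=10.0, steps=200):
    """Apply x(t+1) = a + b*x(t) to both users, driven by binary feedback."""
    x1, x2 = start
    for _ in range(steps):
        if x1 + x2 > capacity:  # congestion signalled: both decrease
            x1, x2 = a_d + b_d * x1, a_d + b_d * x2
        else:                   # no congestion: both increase
            x1, x2 = a_i + b_i * x1, a_i + b_i * x2
        x1, x2 = max(x1, 0.0), max(x2, 0.0)  # assumption: rates cannot go negative
    return x1, x2

def fairness(x1: float, x2: float) -> float:
    """Jain's fairness index for two users."""
    return (x1 + x2) ** 2 / (2 * (x1 ** 2 + x2 ** 2))

# Illustrative coefficients (a_i, b_i, a_d, b_d) for each policy.
POLICIES = {
    "MIAD": (0.0, 1.1, -1.0, 1.0),
    "AIAD": (1.0, 1.0, -1.0, 1.0),
    "MIMD": (0.0, 1.1, 0.0, 0.5),
    "AIMD": (1.0, 1.0, 0.0, 0.5),
}
results = {name: fairness(*simulate(*coeffs)) for name, coeffs in POLICIES.items()}
```

With these values only AIMD ends near a fairness of 1: each multiplicative decrease halves the gap between the two users, while additive increase leaves the gap unchanged. MIMD and AIAD keep the initial unfairness, and MIAD drifts away from the fair point.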

Additive Increase, Additive Decrease

Reaches stable cycle, but does not converge to fairness

[Figure: phase plot of User 1: x1 vs. User 2: x2 with the fairness line and efficiency line; trajectory (x1h,x2h), (x1h+aD,x2h+aD), (x1h+aD+aI,x2h+aD+aI)]

Modeling

Critical to understanding complex systems
- [CJ89] model relevant after 15 years, 10^6 increase of […], 1000x increase in number of users

Criteria for good models
- Two conflicting goals: reality and simplicity
- Realistic, complex model → too hard to understand, too limited in applicability
- Unrealistic, simple model → can be misleading

TCP Congestion Control

[CJ89] provides theoretical basis for basic congestion avoidance mechanism

Must turn this into real protocol

Slow Start Example

The congestion window size grows very rapidly

TCP slows down the increase of cwnd when cwnd >= ssthresh

[Figure: sender/receiver timeline. cwnd = 1: segment 1, ACK 2. cwnd = 2: segments 2 and 3, ACK 4. cwnd = 4: segments 4 to 7, ACK 8. cwnd = 8.]

TCP Congestion Control

Maintains three variables:
- cwnd: congestion window
- flow_win: flow window; receiver advertised window
- ssthresh: threshold size (used to update cwnd)

For sending, use: win = min(flow_win, cwnd)

Congestion Avoidance

Slow down “Slow Start”

ssthresh is lower-bound guess about location of knee

If cwnd > ssthresh then each time a segment is acknowledged increment cwnd by 1/cwnd (cwnd += 1/cwnd).

So cwnd is increased by one only if all segments in the window have been acknowledged.
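The per-ACK rule of congestion avoidance can be sketched as follows. The function names are made up for this example, and cwnd is kept as a floating-point number of segments, which is an assumption about representation rather than how any particular stack stores it.

```python
def send_window(flow_win: float, cwnd: float) -> float:
    """Amount the sender may have in flight: win = min(flow_win, cwnd)."""
    return min(flow_win, cwnd)

def ack_in_congestion_avoidance(cwnd: float) -> float:
    """Each acknowledged segment adds 1/cwnd to the congestion window."""
    return cwnd + 1.0 / cwnd

def cwnd_after_one_window(cwnd: float) -> float:
    """Apply the per-ACK rule once for every segment in the current window."""
    for _ in range(int(cwnd)):
        cwnd = ack_in_congestion_avoidance(cwnd)
    return cwnd
```

Starting from cwnd = 8, eight ACKs raise the window to just under 9, because the increment 1/cwnd shrinks slightly as cwnd grows. That is the sense in which the window grows by about one segment per round trip.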

Slow Start/Congestion Avoidance

Goal: reach knee quickly

Upon starting (or restarting):
- Set cwnd = 1
- Each time a segment is acknowledged increment cwnd by one (cwnd++).

Slow Start is not actually slow
- cwnd increases exponentially

TCP: Slow Start Example

Assume that ssthresh = 8

[Figure: cwnd (in segments) vs. roundtrip times; cwnd = 1, 2, 4, 8 in successive round trips, then cwnd = 9 and cwnd = 10 after reaching ssthresh]
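The slow start example with ssthresh = 8 can be reproduced with a per-round-trip model. This is a simplification: it treats a whole window of ACKs as one step (doubling in slow start, adding one in congestion avoidance) and caps the doubling at ssthresh, both of which are assumptions made to keep the sketch short. The function name is hypothetical.

```python
def cwnd_per_rtt(ssthresh: int, rounds: int, cwnd: int = 1) -> list[int]:
    """Congestion window at the start of each round trip."""
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start: one increment per ACK doubles cwnd
        else:
            cwnd += 1                       # congestion avoidance: about +1 per round trip
        trace.append(cwnd)
    return trace

print(cwnd_per_rtt(ssthresh=8, rounds=5))  # [1, 2, 4, 8, 9, 10]
```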

Putting Everything Together: TCP Pseudocode

Initially:
    cwnd = 1;
    ssthresh = infinite;
New ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Congestion Avoidance */
        cwnd = cwnd + 1/cwnd;
Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;

while (next < unack + win)
    transmit next packet;

where win = min(cwnd, flow_win);

[Figure: sequence-number line (seq #) showing unack, next, and the window win]

Fast Recovery

After a fast-retransmit set cwnd to cwnd/2
- i.e., don’t reset cwnd to 1

But when RTO expires still do cwnd = 1

Fast Retransmit and Fast Recovery
- Implemented by TCP Reno
- Most widely used version of TCP today

Lesson: avoid RTOs at all costs!
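The TCP pseudocode can be turned into a small runnable model. The sketch below counts in whole segments and keeps only the window logic; the class name `TcpSender` and its method names are made up for this example, and resetting `next` to `unack` on a timeout (resending from the oldest unacknowledged segment) is an assumption, since the pseudocode does not say what is retransmitted.

```python
import math

class TcpSender:
    """Window bookkeeping from the pseudocode: slow start, congestion avoidance, timeout."""

    def __init__(self, flow_win: float = math.inf):
        self.cwnd = 1.0
        self.ssthresh = math.inf
        self.flow_win = flow_win
        self.unack = 0  # oldest unacknowledged sequence number
        self.next = 0   # next sequence number to transmit

    @property
    def win(self) -> float:
        return min(self.cwnd, self.flow_win)

    def sendable(self) -> list[int]:
        """Transmit while next < unack + win; return the sequence numbers sent."""
        sent = []
        while self.next < self.unack + self.win:
            sent.append(self.next)
            self.next += 1
        return sent

    def on_new_ack(self, ack: int) -> None:
        self.unack = ack
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2     # multiplicative decrease
        self.cwnd = 1.0
        self.next = self.unack            # assumption: resend from the oldest unACKed segment
```

A short run shows the window opening: the first call to `sendable()` returns one segment, and after its ACK the next call returns two.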

The big picture

[Figure: cwnd vs. Time; Slow Start, Congestion Avoidance, Timeout, then Slow Start and Congestion Avoidance again]

Fast Retransmit and Fast Recovery

[Figure: cwnd vs. Time; Slow Start followed by Congestion Avoidance]

Retransmit after 3 duplicated acks
- prevent expensive timeouts

No need to slow start again

At steady state, cwnd oscillates around the optimal window size.

Fast Retransmit

Don’t wait for window to drain

Resend a segment after 3 duplicate ACKs

[Figure: sender/receiver timeline. cwnd = 1: segment 1, ACK 2. cwnd = 2: segments 2 and 3, ACK 3, ACK 4. cwnd = 4: segments 4 to 7, each later arrival answered with ACK 4 (3 duplicate ACKs).]

Engineering vs Science in CC

Great engineering built useful protocol:
- TCP Reno, etc.

Good science by CJ and others
- Basis for understanding why it works so well
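Fast retransmit and the window adjustment that follows can be sketched as below. The sketch assumes cumulative ACKs that name the next expected segment, as in the Fast Retransmit figure, and the usual Reno behavior of halving cwnd after a fast retransmit while still dropping to 1 after a timeout. The function names are hypothetical, and the temporary window inflation used during recovery by real implementations is omitted.

```python
DUP_ACK_THRESHOLD = 3  # resend after this many duplicate ACKs

def fast_retransmits(acks: list[int]) -> list[int]:
    """Segments resent while processing a stream of cumulative ACKs."""
    resent = []
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == DUP_ACK_THRESHOLD:
                resent.append(ack)  # ACK n repeated: segment n is the missing one
        else:
            last_ack, duplicates = ack, 0
    return resent

def window_after_loss(cwnd: float, timed_out: bool) -> tuple[float, float]:
    """Return (cwnd, ssthresh) after a loss is detected."""
    ssthresh = cwnd / 2
    if timed_out:
        return 1.0, ssthresh   # RTO expired: back to slow start
    return ssthresh, ssthresh  # fast recovery: halve, skip slow start
```

For the ACK stream 2, 3, 4, 4, 4, 4 the first ACK 4 is the original and the next three are duplicates, so segment 4 is resent without waiting for a timeout.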

Behavior of TCP

Are packets smoothly paced?
- NO! Ack-compression

Are long-lived flows nicely interleaved?
- NO!

How does throughput depend on drop rate?

Tput ~ 1/sqrt(d)

TCP: Cooperation and Compatibility

TCP assumes all flows employ TCP-like congestion control
- TCP-friendly or TCP-compatible

Selfish flows: can get all the bandwidth they like

If new congestion control schemes are developed, they must be TCP-friendly
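The scaling Tput ~ 1/sqrt(d) can be illustrated with the common steady-state approximation for an AIMD sawtooth, throughput ≈ C / (RTT · sqrt(d)) packets per second. The constant C = sqrt(3/2) comes from the idealized periodic-loss model and is an assumption here, as is the function name; only the proportionality appears on the slide.

```python
import math

def tcp_throughput_pps(drop_rate: float, rtt_seconds: float, c: float = math.sqrt(1.5)) -> float:
    """Approximate steady-state throughput in packets per second."""
    return c / (rtt_seconds * math.sqrt(drop_rate))

# Quadrupling the drop rate halves the throughput.
ratio_drop = tcp_throughput_pps(0.04, 0.1) / tcp_throughput_pps(0.01, 0.1)
# Doubling the RTT also halves it.
ratio_rtt = tcp_throughput_pps(0.01, 0.2) / tcp_throughput_pps(0.01, 0.1)
```

The second ratio shows why two flows that see the same drop rate but have different round-trip times do not get equal shares.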

Extensions to TCP

Selective acknowledgements: TCP SACK

Explicit congestion notification: ECN

Delay-based congestion avoidance: TCP Vegas

Discriminating between congestion losses and other losses: cross-layer signaling and guesses

Randomized drops (RED) and other router mechanisms

Issues with TCP

Fairness:
- Throughput depends on RTT

High speeds:
- to sustain 10 Gbps, packet losses can occur at most once every 90 minutes!

Short flows:
- How to set initial cwnd properly

What about flows that want congestion control, but don’t want reliable delivery?
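The high-speed figure can be checked with a back-of-the-envelope calculation. The sketch assumes 1500-byte packets, a 100 ms round-trip time, and the idealized AIMD relation W = sqrt(3/2)/sqrt(d) between average window W and drop rate d; none of these values are stated on the slide, and the function name is made up for this example.

```python
def seconds_between_losses(rate_bps: float, rtt_seconds: float, packet_bytes: int) -> float:
    """Longest tolerable loss frequency for AIMD to sustain a given rate."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    window = packets_per_second * rtt_seconds  # average window needed, in packets
    drop_rate = 1.5 / window ** 2              # invert W = sqrt(3/2) / sqrt(d)
    packets_between_losses = 1 / drop_rate
    return packets_between_losses / packets_per_second

# Assumed values: 10 Gbps, 100 ms RTT, 1500-byte packets.
minutes = seconds_between_losses(10e9, 0.1, 1500) / 60
```

Under these assumptions the window must average about 83,000 packets and the result is a little over 90 minutes between losses, which is in line with the slide's figure.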