TCP Header
0 4 10 16 31 Source port Destination port Sequence number Acknowledgement CS 268: Lecture 6 HdrLen Flags Advertised window (TCP Congestion Control) Checksum Urgent pointer Options (variable)
¡ Sequence number, acknowledgement, and advertised window – used by Ion Stoica sliding-window based flow control February 6, 2006 ¡ Flags: - SYN, FIN – establishing/terminating a TCP connection - ACK – set when Acknowledgement field is valid - URG – urgent data; Urgent Pointer says where non-urgent data starts - PUSH – don’t wait to fill segment - RESET – abort connection
4
Today’s Lecture TCP Header (Cont)
¢ Checksum – 1’s complement and is computed over
- TCP header
Basics of Transport - TCP data - Pseudo-header (from IP header) • Note: breaks the layering! Basics of Congestion Control Source address Destination address Comments on Congestion Control 0 Protocol (TCP) TCP Segment length
2 5
Duties of Transport TCP Connection Establishment
Demultiplexing: Three-way handshake - IP header points to protocol - Goal: agree on a set of parameters: the start sequence - Transport header needs demultiplex further number for each side • UDP: port Client (initiator) Server • TCP: source and destination address/port SYN, Se - Well known ports and ephemeral ports qNum = x
x + 1 and Ack = um = y CK, SeqN Data reliability (if desired): SYN and A - UDP: checksum, but no data recovery ACK, A - TCP: checksum and data recovery ck = y + 1
3 6
1
TCP Issues Observations
Connection confusion: Throughput is ~ (w/RTT) - ISNs can’t always be the same
Sender has to buffer all unacknowledged packets,
Source spoofing: because they may require retransmission - Need to make sure ISNs are random
Receiver may be able to accept out-of-order packets, SYN floods: but only up to its buffer limits - SYN cookies
State management with many connections - Server-stateless TCP (NSDI 05)
7 10
TCP Flow Control What Should the Receiver ACK?
Make sure receiving end can handle data 1. ACK every packet, giving its sequence number
Negotiated end-to-end, with no regard to network 2. Use negative ACKs (NACKs), indicating which packet did not arrive
Ends must ensure that no more than W packets are in flight 3. Use cumulative ACK, where an ACK for number - Receiver ACKs packets n implies ACKS for all k < n - When sender gets an ACK, it knows packet has arrived 4. Use selective ACKs (SACKs), indicating those that did arrive, even if not in order
8 11
Sliding Window Error Recovery
Must retransmit packets that were dropped
¡ ¢
To do this efficiently £ - Keep transmitting whenever possible
¤ - Detect dropped packets and retransmit quickly
¥
¦ Requires:
¤ - Timeouts (with good timers)
- Other hints that packet were dropped
¥
¦
§©¨ © "! ¨ # $%§&¨ ' )(* +& ,- . © "! ¨ # $ 9 12
2 Timer Algorithm Dangers of Increasing Load
packet
knee cliff
t loss Use exponential averaging: Knee – point after which u p
- Throughput increases very slow h
¢¡¤£¦¥¨§ © ¡¤£¦¥ ¡© ¥ ¡¤£¦¥ g u
- Delay increases fast o congestion
¡¤£¦¥¨§ © ¡¤£¤¥¨ ¡© ¥¤ ¦¡ ¡¤£¦¥ ¡¤£¦¥¥ r
h collapse
"!$#% &¡¤£¦¥¨§' ¢¡¤£¦¥¨)(*+¡¤£¦¥ T
Cliff – point after which - Throughput starts to decrease very fast to zero (congestion Load
Question: Why not set timeout to average delay? y
collapse) a l - Delay approaches infinity e
D
, -
¨ ( 43 5 $ +5)6 798 ( ( ! 5 ¨6© ( ¨¢5 : & 5
.0/21 In an M/M/1 queue
=6 43 : ¨08* ( : © ?>
;)/2< - Delay = 1/(1 – utilization)
@)/2A 3 :. B8 (C5 & #-¨ +ED ¨5 5 ( + ,
13 Load 16
Hints Cong. Control vs. Cong. Avoidance
When should I suspect a packet was dropped? Congestion control goal
- Stay left of cliff
When I receive several duplicate ACKs Congestion avoidance goal - Receiver sends an ACK whenever a packet arrives - Stay left of knee knee cliff - ACK indicates seq. no. of last received consecutively t u
received packet p h
- Duplicate ACKs indicates missing packet g u congestion o r
h collapse T
Load
14 17
TCP Congestion Control Control System Model [CJ89]
User 1 Can the network handle the rate of data? x1
x2 Determined end-to-end, but TCP is making User 2 Σ Σx >X guesses about the state of the network i goal
xn User n y Two papers: - Good science vs great engineering
¢ Simple, yet powerful model
¢ Explicit binary signal of congestion
15 18
3 Multiplicative Increase, Possible Choices Multiplicative Decrease
¢ fairness ¥ ¤ a + b x (t) increase line
+ = I I i
£ ¡ x (t 1) Converges to (x ,x ) i + 1h 2h aD bD xi (t)decrease stable cycle, (bIbDx1h,
¢ but is not fair b b x ) Multiplicative increase, additive decrease I D 2h
- aI=0, bI>1, aD<0, bD=1 2 x
¢ :
Additive increase, additive decrease 2
r
- a >0, b =1, a <0, b =1 e (bdx1h,bdx2h)
I I D D s U ¢ Multiplicative increase, multiplicative decrease
- aI=0, bI>1, aD=0, 0 ¢ Additive increase, multiplicative decrease - a >0, b =1, a =0, 0 User 1: x1 19 22 Multiplicative Increase, Additive Increase, Additive Decrease Multiplicative Decrease fairness fairness line line Fixed point at (bI(x1h+aD), bI(x2h+aD)) Converges to (x1h,x2h) stable and (bDx1h+aI, b x +a ) bI aD fair cycle D 2h I x = x = (x1h,x2h) 1h 2h 1− b I 2 2 x x : : 2 2 r r e (x1h+aD,x2h+aD) e (bDx1h,bDx2h) Fixed point is s s unstable! U U efficiency efficiency line line User 1: x1 User 1: x1 20 23 Additive Increase, Additive Decrease Modeling fairness (x1h+aD+aI), line Reaches x2h+aD+aI)) Critical to understanding complex systems stable cycle, - [CJ89] model relevant after 15 years, 106 increase of but does not bandwidth, 1000x increase in number of users (x1h,x2h) converge to 2 x fairness : Criteria for good models 2 r e (x1h+aD,x2h+aD) - Two conflicting goals: reality and simplicity s U - Realistic, complex model → too hard to understand, too limited in applicability - Unrealistic, simple model → can be misleading efficiency line User 1: x1 21 24 4 TCP Congestion Control Slow Start Example ¢ [CJ89] provides theoretical basis for basic The congestion segment 1 window size grows cwnd = 1 congestion avoidance mechanism very rapidly ACK 2 cwnd = 2 segment 2 segment 3 Must turn this into real protocol ACK 4 cwnd = 4 ¢ TCP slows down segment 4 segment 5 the increase of segment 6 cwnd when segment 7 cwnd >= ssthresh ACK8 cwnd = 8 25 28 TCP Congestion Control Congestion Avoidance Maintains three variables: ¢ Slow down “Slow Start” - cwnd: congestion window - flow_win: flow window; receiver advertised window ¢ ssthresh is lower-bound guess about location of knee - Ssthresh: threshold size (used to update cwnd) - ¢ If cwnd > ssthresh then each time a segment is acknowledged For sending, use: win = min(flow_win, cwnd) increment cwnd by 1/cwnd (cwnd += 1/cwnd). ¢ So cwnd is increased by one only if all segments have been acknowledged. 26 29 Slow Start/Congestion Avoidance TCP: Slow Start Example cwnd = 1 Goal: reach knee quickly Assume that cwnd = 2 ssthresh = 8 cwnd = 4 Upon starting (or restarting): 14 12 - Set cwnd =1 10 cwnd = 8 ) s - Each time a segment is acknowledged increment t n 8 cwnd by one (cwnd++). e m ssthresh g 6 e s n i ( 4 Slow Start is not actually slow d n 2 cwnd = 9 w - cwnd increases exponentially C 0 0 2 4 6 t= t= t= t= Roundtrip times cwnd = 10 27 30 5 Putting Everything Together: TCP Pseudocode Fast Recovery Initially: cwnd = 1; while (next < unack + win) After a fast-retransmit set cwnd to ssthresh/2 ssthresh = infinite; transmit next packet; - i.e., don’t reset cwnd to 1 New ack received: if (cwnd < ssthresh) /* Slow Start*/ where win = min(cwnd, But when RTO expires still do cwnd = 1 cwnd = cwnd + 1; flow_win); else /* Congestion Avoidance */ Fast Retransmit and Fast Recovery cwnd = cwnd + 1/cwnd; seq # unack next - Implemented by TCP Reno Timeout: - Most widely used version of TCP today /* Multiplicative decrease */ ssthresh = cwnd/2; win cwnd = 1; Lesson: avoid RTOs at all costs! 31 34 The big picture Fast Retransmit and Fast Recovery cwnd cwnd Congestion Avoidance Slow Start Timeout Congestion Avoidance Time Slow Start Retransmit after 3 duplicated acks - prevent expensive timeouts No need to slow start again Time At steady state, cwnd oscillates around the optimal window size. 32 35 Fast Retransmit Engineering vs Science in CC ¢ Don’t wait for window to cwnd = 1 segment 1 Great engineering built useful protocol: drain ACK 2 - TCP Reno, etc. cwnd = 2 segment 2 ¢ Resend a segment after 3 segment 3 duplicate ACKs ACK 3 Good science by CJ and others ACK 4 cwnd = 4 segment 4 - Basis for understanding why it works so well segment 5 segment 6 ACK 4 segment 7 ACK 4 3 duplicate ACK 4 ACKs 33 36 6 Behavior of TCP TCP: Cooperation and Compatibility Are packets smoothly paced? TCP assumes all flows employ TCP-like - NO! Ack-compression congestion control - TCP-friendly or TCP-compatible Are long-lived flows nicely interleaved? - NO! Selfish flows: can get all the bandwidth they like How does throughput depend on drop rate? If new congestion control algorithms are developed, they must be TCP-friendly Tput ~ 1/sqrt(d) 37 40 Extensions to TCP Selective acknowledgements: TCP SACK Explicit congestion notification: ECN Delay-based congestion avoidance: TCP Vegas Discriminating between congestion losses and other losses: cross-layer signaling and guesses Randomized drops (RED) and other router mechanisms 38 Issues with TCP Fairness: - Throughput depends on RTT High speeds: - to reach 10gbps, packet losses occur every 90 minutes! Short flows: - How to set initial cwnd properly What about flows that want congestion control, but don’t want reliable delivery? 39 7