Internet Transport Protocols UDP and TCP
Dr. T. Znati Computer Science Department
Outline
• Transport Layer Review
• UDP Protocol: UDP Characteristics, UDP Functionalities
• TCP Protocol: TCP Characteristics, Connection Management, TCP Flow and Congestion Control
Design Issues TRANSPORT LAYER
Transport Layer Services and Protocols
• Transport layer provides a logical connection between application processes running on different hosts
• Transport Layer Services
  • Connection Management (if connection-oriented)
  • Multiplexing and De-Multiplexing
  • Data Segmentation and Reassembly
  • Error, Flow and Congestion Control
Transport Layer Concepts
[Figure: two hosts, each with Application, Transport, IP, Network Access, and Physical layers, connected across the Internet through a router (IP, NA, PHY). A logical connection links the two Transport layers.]
Internet Transport Layer Protocols
The IP suite offers two transport protocols
• User Datagram Protocol (UDP)
  • Connectionless protocol
  • “Best Effort” service
  • Unreliable, unordered datagram delivery
  • No error or flow control
• Transmission Control Protocol (TCP)
  • Connection-oriented protocol
  • Reliable, ordered delivery of byte stream
  • Error, flow and congestion control
• No delay guarantees and no bandwidth guarantees
User Datagram Protocol UDP
UDP Characteristics
• UDP is a connectionless datagram service.
  • No need to establish a connection prior to data transfer
  • Datagrams may be generated and transmitted at any time.
  • UDP datagrams are self-contained.
• UDP is unreliable:
  • No acknowledgements for reliable delivery of data.
  • The checksum is optional; when used, it covers both the header and the data.
  • Contains no mechanism to detect missing or out-of-sequence datagrams.
  • No mechanism for automatic retransmission.
  • No mechanism for flow control: the sender can over-run the receiver.
UDP Service
• UDP provides unreliable connectionless delivery service using IP to transport datagrams
• UDP does not enhance the “best effort” service provided by IP
• UDP provides “ports” to distinguish among multiple destinations within a host
• Ports are used to multiplex and demultiplex applications’ traffic
UDP Operation
[Figure: applications A1, A2, B1, and B2 each attach through a socket to UDP, which runs in the OS on top of IP.]
UDP uses port number to demultiplex packets
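User Datagram Protocol
UDP header fields: UDP Source Port | UDP Destination Port | UDP Message Length | UDP Checksum, followed by Data
[Figure: encapsulation. The UDP header and data are carried inside an IP datagram (IP Header + UDP Header + Data), which is carried inside a physical data frame between the Physical Header and the Physical Trailer.]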
UDP Checksum
UDP source port is optional (set to 0 if not used) UDP checksum is optional (set to 0 if not used) Using UDP checksum option is useful, however, since IP checksum does not cover the data portion of the datagram It provides the only way to ensure that data has arrived intact and should be used Also, the only way to verify the UDP header
UDP Checksum
• The UDP checksum covers more information than is present in the UDP datagram
• A pseudo-header is prepended to the UDP datagram, and a checksum over the entire object is computed
• The pseudo-header contains source and destination IP addresses, IP protocol type (code 17 for UDP), and UDP datagram length (the pseudo-header not included)
• It lets the receiver verify that the datagram has reached the proper destination
• The checksum is computed as a one’s complement sum (sum modulo $2^{16}-1$) of all 16-bit words of the pseudo-header, the UDP header, and the data (with the checksum field set to 0), followed by taking the one’s complement of the result
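The checksum procedure above can be sketched in a few lines of Python. This is a minimal illustration for IPv4, not a reference implementation: the function names `ones_complement_sum` and `udp_checksum` and the sample addresses are assumptions made for the example.

```python
import socket
import struct

UDP_PROTOCOL = 17  # IP protocol type for UDP


def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words, folding each carry back into the sum."""
    if len(data) % 2:
        data += b"\x00"  # zero padding to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """Checksum over pseudo-header + UDP header (checksum = 0) + data."""
    length = 8 + len(payload)  # the UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, UDP_PROTOCOL, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF  # 0 is reserved for "checksum not used"


print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 1234, 53, b"hello")))
```

A receiver can run the same sum over the pseudo-header and the received datagram, checksum included; an undamaged datagram sums to 0xFFFF.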
Appropriate Uses of UDP
Inward data collection Outward data dissemination Request-response Real-time applications Streaming of real-time audio and video. On-time delivery is important Minimum overhead
Transmission Control Protocol TCP
Transmission Control Protocol
• TCP provides a reliable virtual circuit service
• TCP guarantees ordered delivery of a stream of data without loss or duplication, despite unreliable network packet delivery service
• TCP assumes little about the underlying communication system
• TCP can be used with a variety of packet delivery services
  • Dial-up telephone lines
  • Local and wide area networks
  • Low and high-speed long haul networks
TCP Interface Characteristics
Virtual Circuit connection TCP is full-duplex TCP is stream-oriented protocol Data is viewed as a stream of bytes TCP provides an unstructured stream TCP does not mark boundaries between streams Application’s responsibility to understand stream contents Buffered transfer TCP collects enough data to fill a reasonably large datagram before transmission TCP provides a push mechanism to force transfer, if needed
TCP supports a “stream of bytes” service
[Figure: a byte stream flowing from Sender to Receiver.]
Stream Service is emulated using TCP “Segments”
Segment sent when:
• Segment is full (MSS bytes),
• Segment not full, but times out, or
• “Pushed” by application.
MSS = Maximum Segment Size
[Figure: Sender passing TCP Data segments to Receiver.]
TCP MSS
TCP segments are the messages that carry data between TCP sender and receiver TCP must decide how many bytes to put into each message that it sends. The current size of a TCP segment, CSS, is determined by two factors W – The size of the receiver’s window (bytes) Maximum Segment Size (MSS) – A ceiling on TCP segment size, never to be exceeded CSS = minimum(MSS, W)
TCP Interface Characteristics
• TCP specifications describe generally how applications use TCP, but do not dictate details of an interface
• There have been numerous implementations of TCP: TCP Reno, TCP Tahoe, TCP Vegas, SACK TCP, ….
Connection Establishment – Three-way Handshake
• Server (passive): listening for connection requests
• Client: sends data unit with SYN bit set and seq# = x (SYN x)
• Server: receives data unit; sends data unit with SYN bit set and seq# = y, acknowledging x+1 (ACK x+1 / SYN y)
• Client: receives data unit; sends data unit acknowledging y+1 (ACK y+1)
• Server: receives data unit
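The sequence and acknowledgement numbers exchanged in the handshake can be traced with a small Python sketch. The function name `three_way_handshake` and the sample initial sequence numbers are assumptions for illustration; the sequence number x+1 on the final ACK follows standard TCP behavior and does not appear in the figure.

```python
def three_way_handshake(x: int, y: int):
    """Return the (flags, seq, ack) triples of the three handshake segments."""
    syn = ("SYN", x, None)           # client -> server, initial sequence number x
    syn_ack = ("SYN+ACK", y, x + 1)  # server -> client, acknowledges x + 1
    ack = ("ACK", x + 1, y + 1)      # client -> server, acknowledges y + 1
    return [syn, syn_ack, ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```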
Connection Termination
• TCP connections are terminated with “graceful close”
• Graceful close ensures that the connection is terminated only after all data has been received
• One side issues a Close request, and is not permitted to transmit data after sending it
• The other side acknowledges it, but can send data until it sends a Close request too
• After the second Close request is acknowledged, the connection is closed
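Connection Termination
[Figure: termination exchange. One side sends FIN x and transmits no further data. The other side replies ACK x+1 and may still send data; its last segment has sequence number y and length k. It then sends FIN y+k, which is acknowledged with ACK y+k+1.]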
TCP Segment Format
Header layout, one 32-bit row per line (bits 0 to 31):
Source Port (bits 0-15) | Destination Port (bits 16-31)
Sequence Number
Acknowledgement Number
Data Offset | Reserved | URG ACK PSH RST SYN FIN | Window
Checksum | Urgent pointer
Options | Padding
Data
TCP Segment Fields
• Source and Destination Port Number: 16-bit end-point identifiers
• Sequence Number: 32-bit number of the first data octet in this segment, except when the SYN flag is set
  • When the SYN flag is set, the sequence number identifies the “Initial Sequence Number” (ISN) and the first data octet is ISN+1
• Acknowledgement Number: 32-bit number of the next octet the receiver expects
• Data Offset (Header Length): 4-bit number of 32-bit words in the header (including options)
TCP Flags
• URG: segment contains urgent data; the urgent pointer points to the last octet of urgent data
• ACK: acknowledgement field is significant
• PSH: segment is sent with the push function
• RST: reset the connection (closes a half-open connection)
• SYN: synchronize sequence numbers (used during open request)
• FIN: no more data from sender (used during close request)
TCP Segment Fields
• Window: 16-bit number of unacknowledged octets that the sender is allowed to transmit
  • The 16-bit field limits the window size to $2^{16}-1$ (65,535) octets, unless the “window scale” option is used
• Checksum: 16-bit one’s complement of the one’s complement sum of all 16-bit words of the segment and a pseudo-header (with the checksum field set to zero)
  • The pseudo-header contains the sender’s and receiver’s IP addresses, the protocol type (code 6 for TCP), and the segment length field
  • Zero padding is added to the segment to make its length a multiple of 16 bits
  • The pseudo-header and the padding are not transmitted
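To make the field layout concrete, the following Python sketch decodes the fixed 20-byte part of a TCP header with the standard `struct` module. The function name `parse_tcp_header`, the dictionary keys, and the sample header values are assumptions chosen for the example; options are not parsed.

```python
import struct

# Flag bit positions within the low 6 bits of the offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # data offset counts 32-bit words
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "window": window, "checksum": checksum, "urgent_ptr": urgent,
    }


# Hypothetical SYN+ACK header: data offset 5 (20 bytes), flags SYN|ACK.
example = struct.pack("!HHIIHHHH", 80, 50000, 300, 101,
                      (5 << 12) | 0x12, 8192, 0, 0)
print(parse_tcp_header(example))
```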
Pseudo-header
Source address (32 bits)
Destination address (32 bits)
00000000 | Protocol (00000110) | TCP segment length
Out-of-Band Data
• Out-of-band data is needed to handle abort or program interrupt signals
• These “signals” should not wait for the octets already in the TCP stream to be transmitted
• Telnet uses this feature to send “interrupt” commands
• TCP uses the URGENT feature to accommodate out-of-band data
Out-of-Band Data
[Figure: out-of-band data passes from the sending user process to the receiving user process across the network, separately from the regular data held in the send buffer and the receive buffer.]
Application is required to check on the “Out-of-Band” data stream before processing regular data stream
Out-of-Band Data
• TCP notifies the associated application of the beginning and end of urgent mode
• How TCP informs the application depends on the operating system
• Urgent data is detected by the URG flag being set and retrieved using the urgent pointer
• Unfortunately, TCP does not denote the beginning of the urgent data in the segment
• It is up to the application to decide where the urgent data starts
TCP Options
Originally, one option, Maximum Segment Size was defined The 16-bit option may be used only in the initial connection request segment If this option is not used, any segment size is allowed Later, two other options have gained widespread acceptance Window scale factor Timestamps
TCP Options
End of options
Kind = 0 (8 bits)
No operation : padding
Kind = 1 (8 bits)
TCP Options
Maximum Segment Size
Kind = 2 (8 bits) | Length = 4 (8 bits) | Maximum Segment Size (16 bits)
Window Scale Factor
Kind = 3 (8 bits) | Length = 3 (8 bits) | Shift Count (8 bits)
Window Scale Option – Purpose
Window Scale Option
• The option increases the TCP receive window above its maximum value of 65,535 bytes
• A much better alternative than changing the TCP header to accommodate the need for larger windows
• When the window scale option is used, the value of the window field is multiplied by $2^F$, where F, between 0 and 14, is a scale factor specified in the initial connection request segment
• A value F = 14 results in a maximum window size of over $10^9$ bytes ($65535 \times 2^{14}$)
• Internally, TCP maintains a “real” window size of 32 bits
• The option can only appear in a SYN segment; thus, the scale factor is agreed upon and fixed in each direction at connection setup
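The scaling arithmetic can be checked with a short Python sketch. The function name `scaled_window` is an assumption for illustration; the shift-count limit of 14 follows the window scale option as described above.

```python
def scaled_window(window_field: int, shift_count: int) -> int:
    """Return the real window size for a 16-bit field and a scale factor F."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= 14:
        raise ValueError("shift count must not exceed 14")
    return window_field << shift_count  # same as window_field * 2**F


print(scaled_window(65535, 14))  # largest window the option can express
```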
TCP Options
Timestamp – Used for 2 distinct mechanisms
• RTT Measurement (RTTM) – To track the RTT for data in order to identify changes in latency that may require ack timer adjustment
• Protect Against Wrapped Sequences (PAWS) – Used in high speed networks, to detect delayed packets when they occur within the current transmission windows
Kind = 8 (8 bits) | Length = 10 (8 bits) | Timestamp Value (32 bits, shown as 16-bit MSB and LSB halves) | Timestamp Echo Reply (32 bits, shown as 16-bit MSB and LSB halves)
This option can be used throughout the TCP connection
TCP FLOW CONTROL
TCP Data Flow Control
TCP uses a variable size window protocol to regulate the flow control Its unique feature is that it decouples acknowledgements of received data units from the granting of a permission to send additional data Each acknowledgement contains a window advertisement Increased window advertisement allows sender to proceed with sending octets not acknowledged yet Decreased window advertisement causes the sender to stop at the boundary of the window
TCP Sliding Window
[Figure: the sender's byte stream divided into data ACK'd, outstanding un-ACK'd data, data OK to send, and data not OK to send yet. The window size spans the outstanding data and the data OK to send.]
• Window is meaningful to the sender
• Current window size is “advertised” by receiver
• Usually 4k – 8k Bytes at connection set-up
Acknowledgements and Retransmission
TCP uses a cumulative acknowledgement scheme ACK reports on the number of octets so far accumulated and specifies the next octet expected Advantages ACKs are easy to generate unambiguously Lost ACKs do not necessarily force a retransmission Disadvantages Receiver does not report on the segments successfully received, if one intermediate segment is lost Estimating timeout becomes challenging
TCP Window Mechanisms
TCP uses a credit allocation scheme The sender includes the sequence number of the first octet in the segment data field The receiver acknowledges the incoming segment with a message of the form (ACK = i, WIN = j ) All octets up to i-1 are acknowledged Permission is granted to send an additional window of j octets starting at i through i+j-1
TCP Credit Allocation Scheme
The credit allocation scheme is flexible Assume last message issued by receiver carries ACK = i and WIN = j To increase the credit by an amount k > j, when no additional data have arrived, the receiver issues ACK = i and WIN = k To acknowledge an incoming segment containing m octets of data (m < j) without granting additional credits, the receiver issues ACK = i+m and WIN = j-m The receiver can issue cumulative acks, and is not required to issue an ack immediately after receiving a segment
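The two receiver actions above (acknowledging data without new credit, and raising the credit without new data) can be sketched as follows. The class name `CreditReceiver` and its method names are hypothetical, and the numbers in the usage lines are chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CreditReceiver:
    ack: int  # i: next octet expected
    win: int  # j: octets the sender may still send, starting at i

    def accept(self, m: int) -> Tuple[int, int]:
        """Acknowledge m arriving octets without granting additional credit."""
        if not 0 <= m <= self.win:
            raise ValueError("segment exceeds the granted credit")
        self.ack += m   # ACK = i + m
        self.win -= m   # WIN = j - m
        return self.ack, self.win

    def grant(self, k: int) -> Tuple[int, int]:
        """Raise the credit to k octets when no additional data has arrived."""
        self.win = k    # ACK stays at i, WIN = k
        return self.ack, self.win


receiver = CreditReceiver(ack=1001, win=1400)
print(receiver.accept(600))   # 600 octets arrive, no new credit
print(receiver.grant(2400))   # credit raised without new data
```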
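TCP Window Control
[Figure: credit example between Sender and Receiver. Both sides start with the window 1001 to 2400. The sender transmits SN 1001, LEN 600, leaving 1601 to 2400. The receiver returns ACK 1601, WIN 800 (window 1601 to 2400), then ACK 1601, WIN 2400, which extends the window to 1601 to 4000.]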
Effect of Window Size
[Figure: effect of window size; only the labels D and W survive.]
TCP Actual Throughput
TCP performance can be affected by several complicating factors: In most cases, a number of TCP connections are multiplexed over the same network interface, so each connection is only allocated a fraction of bandwidth This reduces R, therefore S as well TCP connections usually involve hopping across multiple networks Router delay becomes the biggest contributor to D Actual data rate may be reduced below R if a bottleneck router exists along the path If any segment is lost and must be retransmitted, throughput is reduced
TCP Sliding Window Fundamental Questions
• How much data can a TCP sender have outstanding in the network?
• How much data should TCP retransmit when an error occurs? Just selectively repeat the missing data?
• How does the TCP sender avoid over-running the receiver’s buffers?
TCP uses Flow and Congestion Control to deal with these issues
Transmission Control Protocol FLOW CONTROL
TCP Flow Control
Original window size is negotiated during connection establishment Receiver throttles sender by advertising a window size no larger than the amount of data it can buffer. To avoid buffer overflow at the receiver side, the following invariant is always true:
LastByteRcvd - LastByteRead <= MaxRcvBuffer
TCP Flow Control
TCP receiver advertises the following window:
AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
It is the amount of free space available in the receiver’s buffer.
TCP Flow Control Window
The TCP sender must adhere to AdvertisedWindow from the receiver such that:
LastByteSent – LastByteAcked <= AdvertisedWindow
This defines an Effective Window:
EffectiveWindow = AdvertisedWindow – (LastByteSent - LastByteAcked)
TCP Flow Control
Sender Flow Control Rules: The EffectiveWindow must be > 0 for sender to send more data. LastByteWritten – LastByteAcked <= MaxSendBuffer When equality is reached, the send buffer is full!! TCP sender must block the sending application.
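The window formulas above translate directly into code. The following Python sketch uses hypothetical function names and byte counters that mirror the variables in the formulas; the numbers in the usage lines are illustrative.

```python
def advertised_window(max_rcv_buffer: int, last_byte_rcvd: int,
                      last_byte_read: int) -> int:
    """Free space left in the receiver's buffer."""
    return max_rcv_buffer - (last_byte_rcvd - last_byte_read)


def effective_window(advertised: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    """How many more bytes the sender may put in flight."""
    return advertised - (last_byte_sent - last_byte_acked)


def sender_must_block(last_byte_written: int, last_byte_acked: int,
                      max_send_buffer: int) -> bool:
    """True when the send buffer is full and the application must wait."""
    return last_byte_written - last_byte_acked >= max_send_buffer


adv = advertised_window(max_rcv_buffer=8192, last_byte_rcvd=5000,
                        last_byte_read=3000)
print(adv, effective_window(adv, last_byte_sent=9000, last_byte_acked=5000))
```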
Transmission Control Protocol CONGESTION CONTROL
TCP Congestion Control
TCP sliding window provides an adequate mechanism to control the data flow between the sender and the receiver TCP sender still needs to perform congestion control to ensure that the network can handle the window agreed upon by the sender and the receiver TCP relies on ACKs to control congestion If the ACK is not received on time, TCP retransmits the segment An absence of an ACK signals congestion in the network
TCP Window Control
• TCP does not rely on an explicit negative acknowledgement to detect errors
• TCP relies exclusively on positive ACKs, and on retransmissions when an ACK does not arrive within a given period of time
• Retransmission TimeOut (RTO) should be on the order of RTT (Round-Trip Time)
• RTT and its statistics, however, vary considerably as the Internet load changes
• Thus timers play a major role in TCP congestion control
TCP Timeout Timers
Two strategies can be used to set timers’ value:
• Static timer values
• Adaptive timer values
[Figure: sender/receiver timeline of Data and ACK exchanges, showing the round-trip time (RTT), the estimated RTT, and the Retransmission TimeOut (RTO) as the estimated RTT plus a guard band.]
Timer Values
The timeout timer should be set to a value somewhat longer than the RTT Two strategies can be used to set the timer value: Static timer value Adaptive timer value
Static Value Timers
A fixed value timer, possibly based on the Internet typical behavior, is used for RTO value The strategy suffers from the inability to respond to Internet changing conditions A high value leads to sluggish service and unnecessary long time of response to a lost packet A low value leads to a positive feedback condition that causes more retransmissions (some unnecessary) in a case of congestion
Adaptive Timers – Smoothed Averages
An estimated value of RTT, SRTT, is computed as a smoothed average:
$$SRTT(k+1) = \alpha \cdot SRTT(k) + (1-\alpha) \cdot RTT(k+1)$$
• The smaller the value of $\alpha$, the greater the weight of more recent observations
• With small values of $\alpha$, the average quickly reflects a rapid change in the observed quantity
• A disadvantage: a brief surge in the observed value followed by a return to the average causes strong fluctuations
• $\alpha = 0.9$ is recommended
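A short Python sketch shows how the smoothing weight affects the estimate. The function name `smooth_rtt` and the RTT samples (in milliseconds) are made up for illustration.

```python
def smooth_rtt(srtt: float, rtt_sample: float, alpha: float = 0.9) -> float:
    """One step of the exponentially smoothed RTT average."""
    return alpha * srtt + (1 - alpha) * rtt_sample


samples = [100, 100, 300, 100, 100]  # hypothetical RTTs with one brief surge
for alpha in (0.9, 0.5):
    srtt = 100.0
    trace = []
    for rtt in samples:
        srtt = smooth_rtt(srtt, rtt, alpha)
        trace.append(round(srtt, 1))
    print(alpha, trace)
```

With the smaller weight the estimate jumps much further in response to the single surge, which is the fluctuation problem noted above.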
Adaptive Timers – Timeout Value
Given an SRTT value, what should be the RTO value?
Use a constant value $\Delta$:
$$RTO(k+1) = SRTT(k+1) + \Delta$$
• $\Delta$ is not proportional to SRTT
• If SRTT(k+1) is large, $\Delta$ may become relatively insignificant; as a consequence, fluctuations in the RTT cause unnecessary retransmissions
• If SRTT(k+1) is small, $\Delta$ may become relatively large; as a consequence, the mechanism depends solely on the value of $\Delta$, not reacting to changing conditions
Adaptive Timers – Internet Original Scheme
TCP Congestion Control
The rate at which a TCP entity can send data is determined by the rate of incoming ACKs to previous segments (with granted credit) This rate is determined by the “bottleneck” in the round trip path between the source and the destination
TCP Congestion Control
Congestion control in the Internet is complex IP is a connectionless, stateless protocol No provision for determining congestion, let alone control it TCP provides only end-to-end flow control and relies on implicit feedback to deduce the presence of congestion ICMP quench messages are too crude to provide any effective control No cooperative, distributed algorithm exists between different TCP entities On the contrary, TCP entities may be selfish and do not cooperate to maintain some level of control
TCP Congestion Control
The only tool is the sliding-window flow control and error control mechanisms Not enough, in a dynamic environment A number of clever techniques have been developed and added to control congestion in the Internet Mechanisms for congestion detection Mechanisms for congestion avoidance Mechanisms for congestion recovery
TCP Flow and Congestion Control
TCP is self-clocking: returning of ACKs functions as pacing signal After an initial burst, the sender’s segment rate matches the arrival rate of ACKs Sender rate equals the rate of the slowest link on the path This way TCP senses the bottleneck and regulates its rate accordingly
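TCP Flow and Congestion Control: Network Bottleneck
• Pb is the minimum segment spacing on the slowest link
• Pr is the segment spacing at the receiver
• As is the ACK spacing at the sender
• Ab is the ACK spacing at the bottleneck
• Pb = Pr = Ab = As
[Figure: data segments travel from sender to receiver through the bottleneck link with spacing Pb, then Pr; ACKs return with spacing Ab, then As.]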
TCP Flow and Congestion Control
• A problem arises because the sender does not know where the bottleneck is: the network or the receiver
• If ACKs arrive relatively slowly due to network congestion, then the sender must transmit segments at a rate slower than the ACKs
• On the other hand, if the slow pace is due to the receiver, this pace should dictate the transmission policy
• Thus, the TCP sliding window must be enhanced to factor in network congestion
Improving TCP Congestion Effectiveness
• Three techniques have been suggested for retransmission timer management
  • RTT Variance Estimation
  • Exponential Backoff
  • Karn’s Algorithm
• Four techniques have been suggested for window management
  • Slow Start
  • Dynamic Window Sizing on Congestion
  • Fast Retransmission
  • Fast Recovery
RTT Variance Estimation: Factors of Influence
RTT exhibits relatively high variance due to three sources
• A low data rate on the TCP connection relative to the propagation time makes the transmission delay relatively important in the RTT estimation
  • Thus, high variation in datagram sizes induces high variation in RTT
  • The SRTT estimator can be heavily influenced by characteristics of the data that have nothing to do with the network
• Abrupt changes in the Internet load and condition
• TCP receiver may use cumulative ACKs
RTO Scheme Revisited
TCP original scheme suggests computing RTO as follows:
$$RTO(k+1) = \beta \cdot SRTT(k+1); \quad \beta = 2$$
If RTT has low variance, it results in unnecessary high values of RTO In unstable environments, it may be inadequate to protect against retransmissions
TCP Timeout Estimates
• There will be some (unknown) distribution of RTTs. We are trying to estimate an RTO to minimize the probability of a false timeout.
• Router queues grow when there is more traffic, until they become unstable. As load grows, variance of delay grows rapidly.
[Figure: left panel, probability distribution of RTT with its mean and variance marked. Right panel, average queueing delay versus load (amount of traffic arriving to the router); variance grows rapidly with load.]
RTT Variance Estimates
• A more effective approach is to estimate the RTT standard deviation
• However, this is too costly, as it involves square and square root calculations
• A less expensive alternative is the mean deviation:
$$MDEV(X) = E\big[\,|X - E[X]|\,\big]$$
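Dynamic Estimate of RTT Variability
The algorithm to dynamically estimate the variability in estimating RTT uses an exponential smoothing technique:
$$SRTT(k+1) = \alpha \cdot SRTT(k) + (1-\alpha) \cdot RTT(k+1)$$
$$SERR(k+1) = RTT(k+1) - SRTT(k)$$
$$SDEV(k+1) = \alpha' \cdot SDEV(k) + (1-\alpha') \cdot |SERR(k+1)|$$
$$RTO(k+1) = SRTT(k+1) + \beta \cdot SDEV(k+1)$$
$$\alpha = \tfrac{7}{8}; \quad \alpha' = \tfrac{3}{4}; \quad \beta = 4$$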
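The estimator can be sketched as a small Python class. It assumes the usual reading of the scheme: smoothing weights of 7/8 and 3/4, a deviation multiplier of 4, and an error term measured against the previous SRTT. The class name `RttEstimator`, the seeding of SRTT and SDEV through the constructor, and the sample values are assumptions for illustration.

```python
class RttEstimator:
    ALPHA = 7 / 8      # weight of the previous SRTT
    ALPHA_DEV = 3 / 4  # weight of the previous SDEV
    BETA = 4           # multiplier on the smoothed deviation

    def __init__(self, srtt: float, sdev: float):
        self.srtt = srtt
        self.sdev = sdev

    def update(self, rtt_sample: float) -> float:
        """Fold one RTT sample into SRTT and SDEV, then return the new RTO."""
        serr = rtt_sample - self.srtt  # error against the previous estimate
        self.srtt = self.ALPHA * self.srtt + (1 - self.ALPHA) * rtt_sample
        self.sdev = self.ALPHA_DEV * self.sdev + (1 - self.ALPHA_DEV) * abs(serr)
        return self.srtt + self.BETA * self.sdev


estimator = RttEstimator(srtt=100.0, sdev=10.0)  # hypothetical starting state, ms
print(estimator.update(140.0))
```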
Round Trip Time ESTIMATION UNDER CONGESTION
TCP Congestion Control
• RTT variance estimation is efficient in detecting data loss
• It may not be sufficient in the case of retransmission
• Two other factors must be taken into consideration when managing retransmission timers in the case of congestion:
  • What RTO value should be used on a retransmitted segment (i.e., when a timeout occurs)? Exponential RTO Backoff
  • What RTT samples should be used as input to compute SRTT in the case of retransmission? Karn’s Algorithm
Exponential RTO Backoff
• Timeouts are usually due to congestion
• Maintaining the same RTO value for retransmission is ill advised
  • It may cause sustained congestion, because all active TCP connections develop a similar pattern of behavior
  • All sources wait for their local RTO time and retransmit again
• A more efficient scheme must increase RTO values to give the Internet more time to clear the congestion
• Exponential RTO Backoff scheme: $RTO = q \times RTO$, typically with q = 2
Retransmission and Timeouts
TCP Source cannot distinguish between these two cases
[Figure: two Host A / Host B timelines, each containing a retransmission. In both cases, pairing the ACK with the wrong transmission produces a wrong RTT sample.]
Feeding the resulting RTT into the RTO algorithm may lead to false measurements and oscillations
RTT Samples – Karn’s Algorithm
Karn’s algorithm uses the following rules
• Do NOT use the measured RTT for a retransmitted segment to update SRTT and SDEV
• Calculate the backoff RTO when a retransmission occurs: $RTO = q \times RTO$ (typically q = 2)
• Use the backoff RTO value for succeeding segments until an acknowledgement arrives for a segment that has not been retransmitted
• When an acknowledgement is received for an unretransmitted segment, use the normal algorithm to update SRTT and SDEV and compute future RTO values
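The rules above can be combined with exponential backoff in a short Python sketch. The class name `KarnTimer`, its method names, and the smoothing constants (7/8, 3/4, and 4, taken from the usual variance-based estimator) are assumptions for illustration.

```python
class KarnTimer:
    ALPHA = 7 / 8
    ALPHA_DEV = 3 / 4
    BETA = 4

    def __init__(self, srtt: float, sdev: float, q: float = 2):
        self.srtt, self.sdev, self.q = srtt, sdev, q
        self.rto = srtt + self.BETA * sdev

    def on_timeout(self) -> float:
        """A retransmission occurs: back off the RTO exponentially."""
        self.rto *= self.q
        return self.rto

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> float:
        """Process an ACK; samples from retransmitted segments are ignored."""
        if was_retransmitted:
            return self.rto  # ambiguous sample: keep the backed-off RTO
        serr = rtt_sample - self.srtt
        self.srtt = self.ALPHA * self.srtt + (1 - self.ALPHA) * rtt_sample
        self.sdev = self.ALPHA_DEV * self.sdev + (1 - self.ALPHA_DEV) * abs(serr)
        self.rto = self.srtt + self.BETA * self.sdev  # normal algorithm resumes
        return self.rto


timer = KarnTimer(srtt=100.0, sdev=10.0)  # hypothetical starting state, ms
print(timer.on_timeout())
print(timer.on_ack(500.0, was_retransmitted=True))
print(timer.on_ack(140.0, was_retransmitted=False))
```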
TCP Congestion Control
Armed with an efficient RTO estimation mechanism, use lack of acknowledgements as congestion indication and maintain a congestion window to estimate congestion in the network CongestionWindow :: a variable maintained by the TCP source for each connection. TCP is modified such that the maximum number of bytes of unacknowledged data allowed is the minimum of CongestionWindow and AdvertisedWindow
Additive Increase, Multiplicative Decrease (AIMD)
MaxWindow :: min (CongestionWindow , AdvertisedWindow) EffectiveWindow = MaxWindow – (LastByteSent –LastByteAcked)
CongestionWindow (cwnd) is set based on the perceived level of congestion. The Host receives implicit (packet drop) or explicit (packet mark) indications of internal congestion.
Additive Increase
Additive Increase is a reaction to perceived available capacity. Linear increase basic idea For each “cwnd’s worth” of packets sent, increase cwnd by 1 segment. In practice, cwnd is incremented fractionally for each arriving ACK.
increment = MSS x (MSS /cwnd) cwnd = cwnd + increment
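The per-ACK increment can be simulated to confirm that one window's worth of ACKs grows cwnd by roughly one segment. The function name `additive_increase`, the MSS of 1460 bytes, and the starting window of 10 segments are assumptions for illustration.

```python
def additive_increase(cwnd: float, mss: int) -> float:
    """Return cwnd (in bytes) after one ACK in the linear-increase phase."""
    return cwnd + mss * (mss / cwnd)


MSS = 1460           # assumed segment size in bytes
cwnd = 10 * MSS      # assumed starting window of 10 segments
for _ in range(10):  # one window's worth of ACKs, roughly one RTT
    cwnd = additive_increase(cwnd, MSS)
print(round(cwnd / MSS, 2))  # window size in segments after that RTT
```

The growth comes out slightly under one full segment because each increment is computed against a cwnd that has already grown.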
[Figure: Additive Increase. Source and Destination timeline in which cwnd is incremented by 1 segment for each RTT.]
Multiplicative Decrease
• The key assumption of Multiplicative Decrease is that a dropped packet and the resultant timeout are due to congestion at a router or a switch.
• TCP reacts to a timeout by halving cwnd.
• Although cwnd is defined in bytes, the literature often discusses congestion control in terms of segments.
• cwnd is not allowed to fall below the size of a single packet.
AIMD Analysis
• AIMD is a necessary condition for TCP congestion control to be stable.
• Because the simple Congestion Control strategy involves timeouts that cause retransmissions, it is important that hosts have an accurate timeout mechanism.
• Timeouts are set as a function of the average RTT and the standard deviation of RTT.
• However, TCP hosts only sample round-trip time once per RTT using a coarse-grained clock.
Typical TCP Behavior
[Figure: congestion window versus time (seconds, 1.0 to 10.0), vertical axis from 10 to 70, showing the “Sawtooth” pattern.]
TCP Congestion Control WINDOW MANAGEMENT
Window Management Techniques
A number of strategies have been suggested to effectively manage the sending window Slow Start Dynamic Window Sizing on Congestion Fast Retransmission Fast Recovery These techniques are incorporated in all modern implementations of TCP
Slow Start
TCP is self-clocking and paces itself approximately to the rate of the bottleneck At the initialization phase, no such pacing is available for guidance Sending at full window speed is ill-advised as it may cause large packet drops Some means of gradually expanding the initial window until pacing takes over is needed “Slow Start” is the procedure recommended for initial window expansion until pacing takes over
Phases of Congestion Control
*The variable ss_thresh can be thought of as an estimate of the level below which congestion is not expected.
TCP Slow Start
• The goal of TCP Slow Start is to discover roughly the proper sending rate quickly
• Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
  • Initialize cwnd = 1
  • Each time a segment is acknowledged, increment cwnd by one (cwnd++)
  • Continue until the threshold, ss_thresh, is reached
• If cwnd > ss_thresh, increase linearly (Additive Increase)
TCP Slow Start
• Each ACK adds 2 “credits”: this gives the sender the permission to send two segments
• Slow start increases rate exponentially fast
• Rate is doubled every RTT!
[Figure: sender/receiver timeline with cwnd = 1, 2, 4, 8 over successive round trips.]
Dynamic Window Sizing on Congestion
Slow Start enables the sender to quickly determine a reasonable window size for the connection What happens if packet loss (timeout) occurs? This could be a sign of congestion, but it is not clear how serious the congestion is Would resetting cwnd = 1 and restarting slow-start be enough? This may not be conservative enough Need a more aggressive response
Congestion Avoidance: Additive Increase
When a timeout occurs, use these rules:
• Set a slow-start threshold, ss_thresh, equal to half of cwnd: ss_thresh = cwnd / 2
• Set cwnd = 1 and perform slow start until cwnd = ss_thresh, increasing cwnd by 1 for every ACK received
• For cwnd ≥ ss_thresh, cwnd is increased by 1 for each RTT
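The rules above can be traced per round trip with a short Python sketch that counts in whole segments. The function names `on_timeout` and `cwnd_trace` are hypothetical, integer halving is an assumption of the sketch, and growth is modelled once per RTT rather than once per ACK.

```python
def on_timeout(cwnd: int):
    """Return (cwnd, ss_thresh) in segments after a timeout."""
    ss_thresh = max(cwnd // 2, 1)  # integer halving is an assumption here
    return 1, ss_thresh


def cwnd_trace(cwnd: int, ss_thresh: int, rtts: int):
    """List cwnd at the start of each RTT, starting from the given state."""
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd < ss_thresh:
            # Slow start: +1 per ACK doubles cwnd each RTT, capped at ss_thresh.
            cwnd = min(2 * cwnd, ss_thresh)
        else:
            cwnd += 1  # congestion avoidance: +1 segment per RTT
        trace.append(cwnd)
    return trace


cwnd, ss_thresh = on_timeout(12)
print(cwnd_trace(cwnd, ss_thresh, 5))
```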
Slow Start – Packet Loss
• Initial values: ss_thresh = 8, cwnd = 1
• Loss after transmission 7, when cwnd is currently 12
• Set ss_thresh = cwnd/2
• Set cwnd = 1
TCP Improvement FAST RETRANSMIT FAST RECOVERY
Acknowledgments in TCP
• Receiver sends ACK to sender
• ACK is used for flow control, error control, and congestion control
• ACK number sent is the next sequence number expected
• Delayed ACK: TCP receiver normally delays transmission of an ACK (for about 200ms). Why?
• ACKs are not delayed when packets are received out of sequence. Why?
[Figure: the sender transmits 1K segments with SeqNo=0, 1024, 2048, and 3072; the receiver returns AckNo=1024 and AckNo=2048. One segment is lost, and the receiver repeats AckNo=2048.]
Fast Retransmission Policy – TCP Tahoe
• When a segment is lost, original TCP waits for an ACK that is not coming and eventually times out.
• It is often the case that some of the segments sent after the lost segment arrive at the receiver.
• For each such segment received, the receiver sends a duplicate ACK, notifying the sender that the receiver is waiting for the missing segment.
• TCP Tahoe interprets duplicate ACKs as an indication that a segment was lost.
Fast Retransmit
• If three or more duplicate ACKs are received in a row, the TCP sender considers the segment as lost.
• Then TCP performs a retransmission of the missing segment, without waiting for a timeout to happen.
• Fast Retransmit repairs a single segment loss
[Figure: the sender transmits 1K segments SeqNo=0, 1024, 2048, and 3072; the receiver returns AckNo=1024 followed by duplicate AckNo=1024 messages; the sender retransmits SeqNo=1024 and continues with SeqNo=4096.]
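Duplicate-ACK counting can be sketched in a few lines of Python. The class name `DupAckDetector` is hypothetical, and the sketch signals a retransmission exactly once, on the third duplicate of the same acknowledgement number.

```python
class DupAckDetector:
    THRESHOLD = 3  # duplicates needed to trigger fast retransmit

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True when this ACK should trigger a fast retransmit."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.THRESHOLD
        self.last_ack = ack_no  # new data acknowledged: reset the counter
        self.dup_count = 0
        return False


detector = DupAckDetector()
for ack in (1024, 1024, 1024, 1024, 2048):
    print(ack, detector.on_ack(ack))
```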
Inferring Non-Loss from ACKs
Duplicate ACKs not only signify that a packet is lost but also provide hints that segments are leaving the network Segments, beyond the lost segment, which triggered the transmission of ACKs Therefore, advancing the sliding window does not increase the number of segments stored in the network The load in the network queues does not increase
Fast Recovery
Fast recovery algorithm governs the transmission of new data until a non-duplicate ACK arrives. Fast recovery avoids slow start after a fast retransmit Intuition is that duplicate ACKs indicate that data is getting through The fast retransmit and fast recovery algorithms are usually implemented together
Fast Recovery
After three duplicate ACKs:
• Retransmit the “lost packet”
• Set ss_thresh = cwnd/2
• Set cwnd = ss_thresh + 3
• Enter congestion avoidance
• Increment cwnd by one for each additional duplicate ACK
When an ACK arrives that acknowledges “new data” (AckNo=2048):
• Set cwnd = ss_thresh
• Enter congestion avoidance
[Figure: sender with cwnd=12 transmits 1K segments SeqNo=0 through 4096; SeqNo=1024 is lost, duplicate AckNo=1024 messages arrive, the segment is retransmitted, and AckNo=2048 acknowledges new data.]
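The window adjustments during fast recovery can be sketched as a small state machine, counting in whole segments. The class name `RenoSender` and its method names are hypothetical. One stated assumption: the sketch inflates the window to ss_thresh + 3 after the third duplicate ACK, the convention in RFC 5681.

```python
class RenoSender:
    def __init__(self, cwnd: int, ss_thresh: int):
        self.cwnd = cwnd
        self.ss_thresh = ss_thresh
        self.dup_acks = 0
        self.in_recovery = False

    def on_dup_ack(self) -> bool:
        """Handle a duplicate ACK; return True if a retransmit is triggered."""
        if self.in_recovery:
            self.cwnd += 1  # each extra duplicate means a segment left the network
            return False
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ss_thresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ss_thresh + 3  # assumed inflation rule (RFC 5681)
            self.in_recovery = True
            return True
        return False

    def on_new_ack(self) -> None:
        """Handle an ACK for new data: deflate the window and leave recovery."""
        if self.in_recovery:
            self.cwnd = self.ss_thresh
            self.in_recovery = False
        self.dup_acks = 0


sender = RenoSender(cwnd=12, ss_thresh=20)  # hypothetical starting state
for _ in range(3):
    retransmit = sender.on_dup_ack()
print(retransmit, sender.cwnd, sender.ss_thresh)
sender.on_new_ack()
print(sender.cwnd)
```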
Fast Recovery
[Figure: cwnd versus time, showing the initial cwnd and ss_thresh, the slow start and congestion avoidance phases, two fast retransmits each followed by a new ACK, and a timeout.]
Summary of TCP Behavior
TCP Variation: Tahoe
• Response to 3 dupACKs: do fast retransmit, enter slow start
• Response to Partial ACK of Fast Retransmission: ++cwnd
• Response to “full” ACK of Fast Retransmission: ++cwnd
TCP Variation: Reno
• Response to 3 dupACKs: do fast retransmit, enter fast recovery
• Response to Partial ACK of Fast Retransmission: exit fast recovery, deflate window, enter congestion avoidance
• Response to “full” ACK of Fast Retransmission: exit fast recovery, deflate window, enter congestion avoidance
TCP Variation: NewReno
• Response to 3 dupACKs: do fast retransmit, enter modified fast recovery
• Response to Partial ACK of Fast Retransmission: fast retransmit and deflate window, remain in modified fast recovery
• Response to “full” ACK of Fast Retransmission: exit modified fast recovery, deflate window, enter congestion avoidance
Notes
• When entering slow start: if the connection is new, ss_thresh = arbitrarily large value and cwnd = 1; else ss_thresh = max(flight size/2, 2*MSS) and cwnd = 1.
• When entering either fast recovery or modified fast recovery: ss_thresh = max(flight size/2, 2*MSS) and cwnd = ss_thresh.
• In congestion avoidance: cwnd += 1*MSS per RTT
• In slow start: ++cwnd on new ACK
Summary
Transmission Control Protocol Discussion Segment format and functionality TCP Flow Control TCP Congestion Control AIMD strategy