
Internet Transport Protocols UDP and TCP

Dr. T. Znati Computer Science Department

Outline

Review  UDP Protocol  UDP Characteristics  UDP Functionalities  TCP Protocol  TCP Characteristics  Connection Management  TCP Flow and Congestion Control

Design Issues TRANSPORT LAYER

Transport Layer Services and Protocols  Transport layer provides a logical connection between application processes running on different hosts  Transport Layer Services  Connection Management  If connection-oriented  Multiplexing and De-Multiplexing  Data Segmentation and Reassembly  Error, Flow and Congestion Control

Transport Layer Concepts

[Figure: protocol stacks (Application, Transport, IP, Network Access, Physical) on two hosts, joined through a router (IP, Network Access, Physical) across the Internet; a logical connection links the two transport layers]

Internet Transport Layer Protocols

The IP suite offers two transport protocols
• User Datagram Protocol (UDP)
  • Connectionless protocol
  • “Best Effort” service
  • Unreliable, unordered datagram delivery
  • No error or flow control
• Transmission Control Protocol (TCP)
  • Connection-oriented protocol
  • Reliable, ordered delivery of a byte stream
  • Error, flow and congestion control
• No delay guarantees and no bandwidth guarantees

User Datagram Protocol UDP

UDP Characteristics UDP is a connectionless datagram service.  No need to establish a connection prior to data transfer  Datagrams may be generated and transmitted at any time. UDP datagrams are self-contained. UDP is unreliable:  No acknowledgements for reliable delivery of data.  Checksums cover the header, and only optionally cover the data.  Contains no mechanism to detect missing or out of sequence datagrams.  No mechanism for automatic retransmission.  No mechanism for flow control  Sender can over-run the receiver.

UDP Service

 UDP provides unreliable connectionless delivery service using IP to transport datagrams  UDP does not enhance the “best effort” service provided by IP  UDP provides “ports” to distinguish among multiple destinations within a host  Ports are used to multiplex and demultiplex applications’ traffic

UDP Operation

[Figure: applications A1, A2, B1 and B2 each attach through a socket to UDP in the OS, which runs over IP]

UDP uses port number to demultiplex packets

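User Datagram Protocol

[UDP header layout, 32 bits per row]
UDP Source Port | UDP Destination Port
UDP Message Length | UDP Checksum
Data

[Figure: encapsulation. The UDP header and data are carried as the data of an IP datagram (IP header, UDP header, data), which is in turn carried in a physical data frame between the physical header and the physical trailer]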

UDP Checksum

 UDP source port is optional (set to 0 if not used)  UDP checksum is optional (set to 0 if not used)  Using UDP checksum option is useful, however, since IP checksum does not cover the data portion of the datagram  It provides the only way to ensure that data has arrived intact and should be used  Also, the only way to verify the UDP header

UDP Checksum

• The UDP checksum covers more information than is present in the UDP datagram
• A pseudo-header is prepended to the UDP datagram, and a checksum over the entire object is computed
• The pseudo-header contains the source and destination IP addresses, the IP protocol type (code 17 for UDP), and the UDP datagram length (the pseudo-header not included)
• It lets the receiver check that the datagram has reached the proper destination
• The checksum is computed as a one’s complement sum (sum modulo 2^16 - 1) of all 16-bit words of the pseudo-header, the UDP header and the data (with the checksum field set to 0), followed by taking the one’s complement of the result
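The checksum procedure above can be sketched in Python as follows. This is a minimal illustration for IPv4 only; the function names (`ones_complement_sum`, `udp_checksum`, `build_udp_datagram`, `verify_udp_checksum`) are hypothetical and not taken from any library.

```python
import socket
import struct

UDP_PROTOCOL = 17  # IP protocol type for UDP, as stated in the slide


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words (odd length is zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: addresses, zero byte, protocol, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL, udp_length))


def udp_checksum(src_ip: str, dst_ip: str, datagram: bytes) -> int:
    """Checksum over pseudo-header + UDP header + data (checksum field must be 0)."""
    covered = pseudo_header(src_ip, dst_ip, len(datagram)) + datagram
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    return checksum or 0xFFFF  # 0 means "checksum not used", so send 0xFFFF instead


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)  # 8-byte UDP header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = udp_checksum(src_ip, dst_ip, header + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def verify_udp_checksum(src_ip: str, dst_ip: str, datagram: bytes) -> bool:
    """A correct datagram sums to all ones when the checksum field is included."""
    covered = pseudo_header(src_ip, dst_ip, len(datagram)) + datagram
    return ones_complement_sum(covered) == 0xFFFF
```

Because the pseudo-header is part of the sum, a datagram delivered to the wrong IP address fails verification even if its bytes are intact.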

Appropriate Uses of UDP

 Inward data collection  Outward data dissemination  Request-response  Real-time applications  Streaming of real-time audio and video.  On-time delivery is important  Minimum overhead

Transmission Control Protocol TCP

Transmission Control Protocol

 TCP provides a reliable service  TCP guarantees ordered delivery of a stream of data without loss or duplication, despite an unreliable network packet delivery service  TCP assumes little about the underlying communication system  TCP can be used with a variety of packet delivery services  Dial-up telephone lines  Local and wide area networks  Low and high-speed long haul networks

TCP Interface Characteristics

 Virtual Circuit connection  TCP is full-duplex  TCP is a stream-oriented protocol  Data is viewed as a stream of bytes  TCP provides an unstructured stream  TCP does not mark record boundaries within the stream  Application’s responsibility to understand stream contents  Buffered transfer  TCP collects enough data to fill a reasonably large datagram before transmission  TCP provides a push mechanism to force transfer, if needed

TCP supports a “stream of bytes” service

[Figure: a byte stream flowing from the sender to the receiver]

Stream Service is emulated using TCP “Segments”

• Segment sent when:
  • Segment is full (MSS bytes),
  • Segment not full, but times out, or
  • “Pushed” by application.

[Figure: TCP data segments of up to MSS bytes travelling from the sender to the receiver]

TCP MSS

 TCP segments are the messages that carry data between TCP sender and receiver  TCP must decide how many bytes to put into each message that it sends.  The current size of a TCP segment, CSS, is determined by two factors  W – The size of the receiver’s window (bytes)  Maximum Segment Size (MSS) – A ceiling on TCP segment size, never to be exceeded  CSS = minimum(MSS, W)

TCP Interface Characteristics

 TCP specifications describe generally how applications use TCP, but do not dictate details of an interface  There have been numerous implementations of TCP  TCP Reno, TCP Tahoe, TCP Vegas, SACK TCP, ….

Connection Establishment – Three-way Handshake

[Figure: message exchange between the client and the passive server, which is listening for connection requests]
1. Client: send data unit with SYN bit set and seq# = x (SYN x). Server: receive data unit.
2. Server: send data unit with SYN bit set and seq# = y, acknowledge x+1 (ACK x+1 / SYN y). Client: receive data unit.
3. Client: send data unit, acknowledge y+1 (ACK y+1). Server: receive data unit.
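The sequence and acknowledgement numbers exchanged in the handshake can be traced with a small Python sketch. The `Segment` type and `three_way_handshake` function are hypothetical names used only for illustration; no real sockets are involved.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_no: int = 0


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of the handshake for the given initial sequence numbers."""
    # 1. Client -> server: SYN with seq# = x
    syn = Segment(syn=True, seq=client_isn)
    # 2. Server -> client: SYN with seq# = y, acknowledging x + 1
    syn_ack = Segment(syn=True, ack=True, seq=server_isn,
                      ack_no=(syn.seq + 1) % SEQ_SPACE)
    # 3. Client -> server: acknowledges y + 1
    final_ack = Segment(ack=True, seq=syn_ack.ack_no,
                        ack_no=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, final_ack]
```

For example, `three_way_handshake(100, 300)` yields SYN 100, then ACK 101 / SYN 300, then ACK 301.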

Connection Termination

 TCP connections are terminated with “graceful close”  Graceful close ensures that the connection is terminated after all data has been received  One side issues Close request, and is not permitted to transmit data after sending it  The other side acknowledges it, but can send data until it sends Close request too  After the second Close request is acknowledged, the connection is closed

Connection Termination

[Figure: connection termination exchange, in order]
• FIN x: no data transmission from this side afterwards
• ACK x+1 / Data y: last segment sent by the receiver
  • Sequence number: y
  • Length: k
• ACK y+k+1
• FIN y+k
• ACK y+k+1

TCP Segment Format

[Header layout, 32 bits per row (bits 0 to 31)]
Source Port | Destination Port
Sequence Number
Acknowledgement Number
Data Offset | Reserved | URG ACK PSH RST SYN FIN | Window
Checksum | Urgent pointer
Options | Padding
Data

TCP Segment Fields

 Source and Destination Port Number  16-bit – end-point identifiers  Sequence Number  32-bit – number of the first data octet in this segment except for when the SYN flag is set  When the SYN flag is set, sequence number identifies “the Initial Sequence Number” and the first data octet is ISN+1  Acknowledgement Number  32-bit – number of the next octet expected to receive  Data Offset  4-bit – number of 32-bit words in the header (including options)  Header Length

TCP Flags  URG: segment contains urgent data:  urgent pointer points to the last octet of urgent data  ACK: acknowledgement field is significant  PSH: segment is sent with the push function  RST: reset the connection  Closes a half-open connection  SYN: synchronize sequence numbers  Used during open request  FIN: no more data from sender  Used during close request

TCP Segment Fields

• Window – 16-bit number of unacknowledged octets that the sender is allowed to transmit
  • The 16-bit field limits the window size to 2^16 - 1 = 65,535 octets, unless the “window scale” option is used
• Checksum – 16-bit one’s complement of a one’s complement sum of all 16-bit words of the segment and a pseudo-header (the checksum field is set to zero)
  • The pseudo-header contains the sender’s and receiver’s IP addresses, the protocol type (code 6 for TCP), and the segment length field
  • A zero padding is added to the segment to make its length a multiple of 16 bits
  • The pseudo-header and the padding are not transmitted
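To make the field layout concrete, the following Python sketch unpacks the fixed 20-byte part of a TCP header. It assumes the classic layout described in these slides (4-bit data offset, six flag bits); `TcpHeader` and `parse_tcp_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Flag names ordered from the least significant bit upward
FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int      # header length in 32-bit words
    flags: frozenset
    window: int
    checksum: int
    urgent_pointer: int
    options: bytes


def parse_tcp_header(segment: bytes) -> TcpHeader:
    if len(segment) < 20:
        raise ValueError("segment shorter than the minimum TCP header")
    (src, dst, seq, ack, offset_and_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_and_flags >> 12          # top 4 bits
    header_len = data_offset * 4                  # convert words to bytes
    if data_offset < 5 or len(segment) < header_len:
        raise ValueError("invalid data offset")
    flags = frozenset(name for bit, name in enumerate(FLAG_NAMES)
                      if offset_and_flags & (1 << bit))
    return TcpHeader(src, dst, seq, ack, data_offset, flags,
                     window, checksum, urgent, segment[20:header_len])
```

Anything between byte 20 and `data_offset * 4` is returned as raw options; the payload starts right after.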

Pseudo-header

[Layout, 32 bits per row]
Source address (32 bits)
Destination address (32 bits)
00000000 | Protocol (00000110) | TCP segment length

Out-of-Band Data

 Out-of-band data is needed to handle abort or program interrupt signals  These “signals” should not wait for the octets already in the TCP stream to be transmitted  Telnet uses this feature to send “interrupt” commands  TCP uses the URGENT feature to accommodate out-of-band data

Out-of-Band Data

[Figure: a sending user process and a receiving user process connected across the network through a send buffer and a receive buffer; out-of-band data travels alongside the regular stream on both sides]

Application is required to check on the “Out-of-Band” data stream before processing regular data stream

Out-of-Band Data

 TCP notifies associated application of the beginning and end of urgent mode  How TCP informs the application depends on the operating system  Urgent data is detected by the URG flag set and retrieved by the urgent pointer  Unfortunately, TCP does not denote the beginning of the urgent data in the segment  It’s up to the application to decide where the urgent data starts

TCP Options

 Originally, one option, Maximum Segment Size was defined  The 16-bit option may be used only in the initial connection request segment  If this option is not used, any segment size is allowed  Later, two other options have gained widespread acceptance  Window scale factor  Timestamps

TCP Options

 End of options

Kind = 0 (8 bits)

 No operation : padding

Kind = 1 (8 bits)

TCP Options

 Maximum Segment Size

Kind = 2 Length = 4 Maximum Segment Size (8 bits) (8 bits) (16 bits)

 Window Scale Factor

Kind = 3 Length = 3 Shift Count (8 bits) (8 bits) (8 bits)

Window Scale Option – Purpose

Window Scale Option

• The option increases the TCP receive window above its maximum value of 65,535 bytes
• A much better alternative than changing the TCP header to accommodate the need for larger windows
• When the window scale option is used, the value of the window field is multiplied by 2^F, where F, between 0 and 14, is a scale factor specified in the initial connection request segment
• A value F = 14 results in a maximum window size of over 10^9 bytes (65535 × 2^14)
• Internally, TCP maintains a “real” window size of 32 bits
• The option can only appear in a SYN segment
  • Thus, the scale factor is agreed upon and fixed in each direction at connection setup
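The scaling arithmetic is a single left shift. The sketch below assumes shift counts from 0 to 14 (the limit given in RFC 7323); `scaled_window` is a hypothetical helper name.

```python
MAX_SHIFT_COUNT = 14  # largest scale factor F permitted


def scaled_window(window_field: int, shift_count: int) -> int:
    """Real window size in bytes: the 16-bit field multiplied by 2**F."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= MAX_SHIFT_COUNT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift_count
```

With the maximum field value and F = 14 this gives 65535 × 16384 = 1,073,725,440 bytes, which is the "over 10^9 bytes" figure quoted above.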

TCP Options

• Timestamp – Used for 2 distinct mechanisms
  • RTT Measurement (RTTM) – To track the RTT for data in order to identify changes in latency that may require ACK timer adjustment
  • Protection Against Wrapped Sequences (PAWS) – Used in high-speed networks, to detect delayed packets when they occur within the current transmission window

Kind = 8 (8 bits) | Length = 10 (8 bits) | Timestamp Value (32 bits) | Timestamp Echo Reply (32 bits)

This option can be used throughout the TCP connection
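The option layouts listed in these slides (kinds 0, 1, 2, 3 and 8) can be walked with a short Python loop. This is a sketch only: `parse_tcp_options` is a hypothetical name, and option kinds not covered in the slides are skipped using their length byte.

```python
import struct


def parse_tcp_options(options: bytes) -> dict:
    """Decode the option kinds described in the slides from a raw options field."""
    parsed = {}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of options
            break
        if kind == 1:            # No operation (padding), a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]  # counts the kind and length bytes too
        if length < 2 or i + length > len(options):
            raise ValueError("bad option length")
        value = options[i + 2:i + length]
        if kind == 2 and length == 4:       # Maximum Segment Size
            parsed["mss"] = struct.unpack("!H", value)[0]
        elif kind == 3 and length == 3:     # Window Scale Factor
            parsed["window_scale"] = value[0]
        elif kind == 8 and length == 10:    # Timestamp value and echo reply
            parsed["timestamp"] = struct.unpack("!II", value)
        i += length
    return parsed
```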

TCP FLOW CONTROL

TCP Data Flow Control

 TCP uses a variable size window protocol to regulate the flow control  Its unique feature is that it decouples acknowledgements of received data units from the granting of a permission to send additional data  Each acknowledgement contains a window advertisement  Increased window advertisement allows sender to proceed with sending octets not acknowledged yet  Decreased window advertisement causes the sender to stop at the boundary of the window

TCP Sliding Window

[Figure: the sender’s byte stream divided into data ACK’d, outstanding un-ACK’d data, data OK to send, and data not OK to send yet; the window size spans the outstanding data and the data OK to send]

• Window is meaningful to the sender
• Current window size is “advertised” by receiver
  • Usually 4 KB to 8 KB at connection set-up

Acknowledgements and Retransmission

 TCP uses a cumulative acknowledgement scheme  ACK reports on the number of octets so far accumulated and specifies the next octet expected  Advantages  ACKs are easy to generate unambiguously  Lost ACKs do not necessarily force a retransmission  Disadvantages  Receiver does not report on the segments successfully received, if one intermediate segment is lost  Estimating timeout becomes challenging

TCP Window Mechanisms

 TCP uses a credit allocation scheme  The sender includes the sequence number of the first octet in the segment data field  The receiver acknowledges the incoming segment with a message of the form (ACK = i, WIN = j )  All octets up to i-1 are acknowledged  Permission is granted to send an additional window of j octets starting at i through i+j-1

TCP Credit Allocation Scheme

• The credit allocation scheme is flexible
• Assume the last message issued by the receiver carries ACK = i and WIN = j
  • To increase the credit to an amount k (k > j) when no additional data have arrived, the receiver issues ACK = i and WIN = k
  • To acknowledge an incoming segment containing m octets of data (m < j) without granting additional credits, the receiver issues ACK = i+m and WIN = j-m
• The receiver can issue cumulative ACKs, and is not required to issue an ACK immediately after receiving a segment

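TCP Window Control

[Figure: window control example between sender and receiver]
• Initially, both sides hold a window covering octets 1001 through 2400
• Sender transmits SN = 1001, LEN = 600; octets 1601 through 2400 remain available
• Receiver returns ACK = 1601, WIN = 800: the sender’s window covers 1601 through 2400
• Receiver returns ACK = 1601, WIN = 2400: the sender’s window is extended to cover 1601 through 4000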
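The credit allocation rules, applied to the numbers in the window control example (window 1001 to 2400, a 600-octet segment, then extra credit), can be sketched as follows. `CreditReceiver` and its method names are hypothetical; only in-order segments are modelled.

```python
class CreditReceiver:
    """Receiver side of the (ACK = i, WIN = j) credit allocation scheme."""

    def __init__(self, next_expected: int, window: int):
        self.next_expected = next_expected  # i: next octet expected
        self.window = window                # j: octets the sender may still send

    def advertise(self) -> tuple:
        return (self.next_expected, self.window)

    def last_permitted_octet(self) -> int:
        # Permission covers octets i through i + j - 1
        return self.next_expected + self.window - 1

    def accept(self, seq: int, length: int) -> tuple:
        """Acknowledge m in-order octets without granting additional credit."""
        if seq != self.next_expected or length > self.window:
            raise ValueError("segment is out of order or exceeds the window")
        self.next_expected += length   # ACK = i + m
        self.window -= length          # WIN = j - m
        return self.advertise()

    def grant(self, extra: int) -> tuple:
        """Raise the credit while no new data has arrived (same ACK, larger WIN)."""
        self.window += extra
        return self.advertise()
```

Starting from `CreditReceiver(1001, 1400)`, accepting the segment SN = 1001, LEN = 600 yields (ACK = 1601, WIN = 800), and granting 1600 more octets yields (ACK = 1601, WIN = 2400), so the last permitted octet moves from 2400 to 4000.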

Effect of Window Size

[Figure: effect of window size W and delay D]

TCP Actual Throughput

 TCP performance can be affected by several complicating factors:  In most cases, a number of TCP connections are multiplexed over the same network interface, so each connection is only allocated a fraction of bandwidth  This reduces R, therefore S as well  TCP connections usually involve hopping across multiple networks  Router delay becomes the biggest contributor to D  Actual data rate may be reduced below R if a bottleneck router exists along the path  If any segment is lost and must be retransmitted, throughput is reduced

TCP Sliding Window Fundamental Questions

 How much data can a TCP sender have outstanding in the network?  How much data should TCP retransmit when an error occurs?  Just selectively repeat the missing data?  How does the TCP sender avoid overrunning the receiver’s buffers?  TCP uses Flow and Congestion Control to deal with these issues

Transmission Control Protocol FLOW CONTROL

TCP Flow Control

 Original window size is negotiated during connection establishment  Receiver throttles sender by advertising a window size no larger than the amount of data it can buffer.  To avoid buffer overflow at the receiver side, the following invariant is always true:

LastByteRcvd - LastByteRead <= MaxRcvBuffer

TCP Flow Control

 TCP receiver advertises the following window:

AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)

 It is the amount of free space available in the receiver’s buffer.

TCP Flow Control Window

 The TCP sender must adhere to AdvertisedWindow from the receiver such that:

LastByteSent - LastByteAcked <= AdvertisedWindow

 This defines an Effective Window:

EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)

TCP Flow Control

• Sender Flow Control Rules:
  • The EffectiveWindow must be > 0 for the sender to send more data.
  • LastByteWritten - LastByteAcked <= MaxSendBuffer
  • When equality is reached, the send buffer is full!
  • TCP sender must block the sending application.
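The window formulas from the flow control slides translate directly into code. The sketch below keeps the slides' variable names in snake_case; the `ReceiverState` and `SenderState` classes are hypothetical containers introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class ReceiverState:
    max_rcv_buffer: int
    last_byte_rcvd: int
    last_byte_read: int

    def advertised_window(self) -> int:
        # Free space left in the receive buffer
        return self.max_rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)


@dataclass
class SenderState:
    max_send_buffer: int
    last_byte_written: int
    last_byte_sent: int
    last_byte_acked: int

    def effective_window(self, advertised_window: int) -> int:
        # Advertised window minus data already in flight, never negative
        in_flight = self.last_byte_sent - self.last_byte_acked
        return max(0, advertised_window - in_flight)

    def can_send(self, advertised_window: int) -> bool:
        return self.effective_window(advertised_window) > 0

    def must_block_application(self) -> bool:
        # The send buffer is full once unacknowledged written data reaches the limit
        return self.last_byte_written - self.last_byte_acked >= self.max_send_buffer
```

For instance, a receiver with a 4096-byte buffer holding 2000 unread bytes advertises 2096 bytes; a sender with 500 bytes in flight then has an effective window of 1596 bytes.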

Transmission Control Protocol CONGESTION CONTROL

TCP Congestion Control

 TCP sliding window provides an adequate mechanism to control the data flow between the sender and the receiver  TCP sender still needs to perform congestion control to ensure that the network can handle the window agreed upon by the sender and the receiver  TCP relies on ACKs to control congestion  If the ACK is not received on time, TCP retransmits the segment  An absence of an ACK signals congestion in the network

TCP Window Control

 TCP does not rely on an explicit negative acknowledgement to detect errors  TCP relies exclusively on positive ACKs and retransmissions when an ACK does not arrive within a given period of time  Retransmission TimeOut (RTO) should be on the order of RTT (Round-Trip Time)  RTT and its statistics, however, vary considerably as the Internet load changes  Thus timers play a major role in TCP congestion control

TCP Timeout Timers

[Figure: sender/receiver timeline of Data and ACK exchanges; the Retransmission TimeOut (RTO) spans the estimated round-trip time (RTT) plus a guard band]

Timer Values

 The timeout timer should be set to a value somewhat longer than the RTT  Two strategies can be used to set the timer value:  Static timer value  Adaptive timer value

Static Value Timers

• A fixed value timer, possibly based on typical Internet behavior, is used for the RTO value
• The strategy suffers from the inability to respond to changing Internet conditions
  • A high value leads to sluggish service and an unnecessarily long response time to a lost packet
  • A low value leads to a positive feedback condition that causes more retransmissions (some unnecessary) in a case of congestion

Adaptive Timers – Smoothed Averages

• An estimated value of RTT, SRTT, is computed as a smoothed average:

$SRTT(k+1) = \alpha \cdot SRTT(k) + (1 - \alpha) \cdot RTT(k+1)$

• The smaller the value of α, the greater the weight of more recent observations
• With small values of α, the average quickly reflects a rapid change in the observed quantity
  • A disadvantage: a brief surge in the observed value followed by a return to the average causes strong fluctuations
• α = 0.9 is recommended
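The effect of the smoothing factor can be seen by running the average over a sequence of samples. A minimal sketch, assuming the first sample seeds the estimate (an assumption, since the slide does not say how SRTT is initialised); `smoothed_rtt` and `srtt_series` are hypothetical names.

```python
def smoothed_rtt(srtt: float, rtt_sample: float, alpha: float = 0.9) -> float:
    """One step of SRTT(k+1) = alpha * SRTT(k) + (1 - alpha) * RTT(k+1)."""
    return alpha * srtt + (1.0 - alpha) * rtt_sample


def srtt_series(samples, alpha: float = 0.9) -> list:
    """SRTT after each sample; the first sample is used as the initial estimate."""
    estimates = []
    srtt = None
    for sample in samples:
        srtt = sample if srtt is None else smoothed_rtt(srtt, sample, alpha)
        estimates.append(srtt)
    return estimates
```

Feeding the same samples with `alpha=0.9` and `alpha=0.5` shows the trade-off: the smaller factor follows a jump in RTT within a few samples, but also swings more on a single outlier.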

Adaptive Timers – Timeout Value

• Given an SRTT value, what should be the RTO value?
• Use a constant value:

$RTO(k+1) = SRTT(k+1) + \Delta$

• Δ is not proportional to SRTT
  • If SRTT(k+1) is large, Δ may become relatively insignificant
    • As a consequence, fluctuations in the RTT cause unnecessary retransmissions
  • If SRTT(k+1) is small, Δ may become relatively large
    • As a consequence, the mechanism depends solely on the Δ value, not reacting to changing conditions

Adaptive Timers – Internet Original Scheme 

TCP Congestion Control

 The rate at which a TCP entity can send data is determined by the rate of incoming ACKs to previous segments (with granted credit)  This rate is determined by the “bottleneck” in the round trip path between the source and the destination

TCP Congestion Control

 Congestion control in the Internet is complex  IP is a connectionless, stateless protocol  No provision for determining congestion, let alone control it  TCP provides only end-to-end flow control and relies on implicit feedback to deduce the presence of congestion  ICMP quench messages are too crude to provide any effective control  No cooperative, distributed algorithm exists between different TCP entities  On the contrary, TCP entities may be selfish and do not cooperate to maintain some level of control

TCP Congestion Control

 The only tool is the sliding-window flow control and error control mechanisms  Not enough, in a dynamic environment  A number of clever techniques have been developed and added to control congestion in the Internet  Mechanisms for congestion detection  Mechanisms for congestion avoidance  Mechanisms for congestion recovery

TCP Flow and Congestion Control

 TCP is self-clocking: returning of ACKs functions as pacing signal  After an initial burst, the sender’s segment rate matches the arrival rate of ACKs  Sender rate equals the rate of the slowest link on the path  This way TCP senses the bottleneck and regulates its rate accordingly

TCP Flow and Congestion Control

[Figure: data segments pass through a network bottleneck on the way to the receiver, and ACKs return to the sender]
• Pb is the minimum segment spacing on the slowest link
• Pr is the segment spacing at the receiver
• As is the ACK spacing at the sender
• Ab is the ACK spacing at the bottleneck

Pb = Pr = Ab = As

TCP Flow and Congestion Control

 A problem is caused by the fact that the sender does not know where the bottleneck is: the network or the receiver  If ACKs arrive relatively slowly due to network congestion, then the sender must transmit segments at a rate slower than the ACKs  On the other hand, if the slow pace is due to the receiver, this pace should dictate the transmission policy  Thus, TCP sliding window must be enhanced to factor in the network congestion

Improving TCP Congestion Effectiveness  Three techniques have been suggested for retransmission timer management  RTT Variance Estimation  Exponential Backoff  Karn’s Algorithm  Four techniques have been suggested for window management  Slow Start  Dynamic Window Sizing on Congestion  Fast Retransmission  Fast Recovery

RTT Variance Estimation Factors of Influence  RTT exhibits relatively high variance due to three sources  A low data rate on the TCP connection relative to the propagation time makes the transmission delay relatively important in the RTT estimation  Thus, high variation in datagram sizes induces high variation in RTT  SRTT estimator can be heavily influenced by characteristics of the data that have nothing to do with the network  Abrupt changes in the Internet load and condition  TCP receiver may use cumulative ACKs

RTO Scheme Revisited

• The original TCP scheme suggests computing RTO as follows:

$RTO(k+1) = \beta \cdot SRTT(k+1); \quad \beta = 2$

• If RTT has low variance, it results in unnecessarily high values of RTO
• In unstable environments, it may be inadequate to protect against retransmissions

TCP Timeout Estimates

• There will be some (unknown) distribution of RTTs.
• We are trying to estimate an RTO to minimize the probability of a false timeout.
• Router queues grow when there is more traffic, until they become unstable.
• As load grows, variance of delay grows rapidly.

[Figure: probability distribution of RTT, marking the mean and the variance]
[Figure: average queueing delay versus load (amount of traffic arriving to the router); variance grows rapidly with load]

RTT Variance Estimates

• A more effective approach is to estimate the RTT standard deviation
  • However, it is too costly as it involves square and square root calculations
• A less expensive alternative is the mean deviation:

$MDEV(X) = E[\,|X - E[X]|\,]$

Dynamic Estimate of RTT Variability

• The algorithm to dynamically estimate the variability of RTT uses an exponential smoothing technique:

$SRTT(k+1) = \alpha \cdot SRTT(k) + (1 - \alpha) \cdot RTT(k+1)$
$SERR(k+1) = RTT(k+1) - SRTT(k)$
$SDEV(k+1) = \alpha' \cdot SDEV(k) + (1 - \alpha') \cdot |SERR(k+1)|$
$RTO(k+1) = SRTT(k+1) + \beta \cdot SDEV(k+1)$
$\alpha = 7/8; \quad \alpha' = 3/4; \quad \beta = 4$
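The smoothed estimates above can be sketched as a small Python class, using the constants 7/8, 3/4 and 4 from the slide. Two points are assumptions rather than slide content: the error term is taken against the previous SRTT, and the first sample seeds SRTT with SDEV set to half of it (the initialisation used in RFC 6298). `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    """Exponentially smoothed RTT mean and deviation, producing an RTO."""

    def __init__(self, first_sample: float, alpha: float = 7 / 8,
                 alpha_dev: float = 3 / 4, beta: float = 4.0):
        self.alpha = alpha          # weight of the old SRTT
        self.alpha_dev = alpha_dev  # weight of the old SDEV
        self.beta = beta            # deviation multiplier in the RTO
        self.srtt = first_sample
        self.sdev = first_sample / 2  # assumption: initial deviation guess

    @property
    def rto(self) -> float:
        return self.srtt + self.beta * self.sdev

    def update(self, rtt_sample: float) -> float:
        """Fold one RTT measurement into SRTT and SDEV, then return the new RTO."""
        serr = rtt_sample - self.srtt  # error against the previous estimate
        self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt_sample
        self.sdev = self.alpha_dev * self.sdev + (1 - self.alpha_dev) * abs(serr)
        return self.rto
```

With stable samples SDEV decays, so the RTO tightens toward SRTT; a single large sample raises SDEV and widens the RTO quickly, which is the behaviour the fixed multiplier of the original scheme lacks.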

Round Trip Time ESTIMATION UNDER CONGESTION

TCP Congestion Control

 RTT variance estimation is efficient in detecting data loss  It may not be sufficient in a case of retransmission  Two other factors must be taken into consideration when managing retransmission timers in a case of congestion:  What RTO value should be used on a retransmitted segment (i.e., when timeout occurs)?  Exponential RTO Backoff  What RTT samples should be used as input to compute SRTT in a case of retransmission?  Karn’s Algorithm

Exponential RTO Backoff

• Timeouts are usually due to congestion
• Maintaining the same RTO value for retransmission is ill advised
  • May cause sustained congestion due to the development of a similar pattern of behavior of all active TCP connections
  • All sources wait for local RTO time and retransmit again
• A more efficient scheme must try to increase RTO values to give the Internet more time to clear the congestion
• Exponential RTO Backoff scheme
  • RTO = q × RTO
  • Typically q = 2

Retransmission and Timeouts

• TCP Source cannot distinguish between these two cases

[Figure: two exchanges between Host A and Host B, each involving a retransmission and each yielding a wrong RTT sample]

• Feeding the resulting RTT into the RTO algorithm may lead to false measurements and oscillations

RTT Samples – Karn’s Algorithm

• Karn’s algorithm uses the following rules
  • Do NOT use the measured RTT for a retransmitted segment to update SRTT and SDEV
  • Calculate the backoff RTO when a retransmission occurs
    • RTO = q × RTO (typically q = 2)
  • Use the backoff RTO value for succeeding segments until an acknowledgement arrives for a segment that has not been retransmitted
  • When an acknowledgement is received for a segment that was not retransmitted, use the normal algorithm to update SRTT and SDEV and compute future RTO values
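Karn's rules combine naturally with exponential backoff. The sketch below is one possible arrangement, not a reference implementation: `KarnRtoManager` is a hypothetical name, the "normal algorithm" is passed in as a callable, and the `max_rto` cap is an added assumption that the slides do not mention.

```python
class KarnRtoManager:
    """Tracks the RTO, applying backoff on retransmission and Karn's sampling rule."""

    def __init__(self, initial_rto: float, compute_rto, q: float = 2.0,
                 max_rto: float = 60.0):
        self.rto = initial_rto
        self._compute_rto = compute_rto  # normal algorithm: RTT sample -> new RTO
        self.q = q                       # backoff multiplier, typically 2
        self.max_rto = max_rto           # assumption: upper bound on the backoff

    def on_timeout(self) -> float:
        """A retransmission occurs: RTO = q * RTO."""
        self.rto = min(self.rto * self.q, self.max_rto)
        return self.rto

    def on_ack(self, rtt_sample: float, was_retransmitted: bool) -> float:
        """Process an ACK; samples for retransmitted segments are ambiguous."""
        if was_retransmitted:
            return self.rto  # keep the backed-off RTO, do not update the estimate
        self.rto = self._compute_rto(rtt_sample)
        return self.rto
```

A usage sketch with a stand-in estimator: `KarnRtoManager(1.0, compute_rto=lambda rtt: 2 * rtt)` backs off to 2.0 and then 4.0 on two timeouts, stays at 4.0 when the retransmitted segment is acknowledged, and returns to the computed value once a segment that was sent only once is acknowledged.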

TCP Congestion Control

 Armed with an efficient RTO estimation mechanism, use lack of acknowledgements as congestion indication and maintain a congestion window to estimate congestion in the network  CongestionWindow :: a variable maintained by the TCP source for each connection.  TCP is modified such that the maximum number of bytes of unacknowledged data allowed is the minimum of CongestionWindow and AdvertisedWindow

Additive Increase, Multiplicative Decrease (AIMD)

MaxWindow = min(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)

 CongestionWindow (cwnd) is set based on the perceived level of congestion.  The Host receives implicit (packet drop) or explicit (packet mark) indications of internal congestion.

Additive Increase

 Additive Increase is a reaction to perceived available capacity.  Linear increase basic idea  For each “cwnd’s worth” of packets sent, increase cwnd by 1 segment.  In practice, cwnd is incremented fractionally for each arriving ACK.

increment = MSS x (MSS /cwnd) cwnd = cwnd + increment

[Figure: additive increase between source and destination; cwnd is incremented by 1 segment for each RTT]

Multiplicative Decrease

• The key assumption of Multiplicative Decrease is that a dropped packet and the resultant timeout are due to congestion at a router or a switch.
• TCP reacts to a timeout by halving cwnd.
• Although cwnd is defined in bytes, the literature often discusses congestion control in terms of segments.
• cwnd is not allowed to fall below the size of a single packet
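The two AIMD rules can be written as a pair of Python functions working in bytes, using the per-ACK increment MSS × (MSS / cwnd) given earlier. The function names are hypothetical.

```python
def additive_increase(cwnd: float, mss: int) -> float:
    """Per-ACK growth: about one MSS per round trip once a full window is ACKed."""
    increment = mss * (mss / cwnd)
    return cwnd + increment


def multiplicative_decrease(cwnd: float, mss: int) -> float:
    """On timeout, halve cwnd but never go below one segment."""
    return max(cwnd / 2, mss)
```

With `mss = 1000` and `cwnd = 4000`, each ACK adds 250 bytes, so the four ACKs of one round trip add roughly one segment; a timeout at 4250 bytes drops the window to 2125 bytes.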

AIMD Analysis

• AIMD is a necessary condition for TCP congestion control to be stable.
• Because the simple congestion control strategy involves timeouts that cause retransmissions, it is important that hosts have an accurate timeout mechanism.
• Timeouts are set as a function of the average RTT and the standard deviation of RTT
• However, TCP hosts only sample round-trip time once per RTT using a coarse-grained clock.

Typical TCP Behavior

[Figure: congestion window (vertical axis, 10 to 70) versus time (horizontal axis, 1.0 to 10.0 seconds), showing the “sawtooth” pattern]

TCP Congestion Control WINDOW MANAGEMENT

Window Management Techniques

 A number of strategies have been suggested to effectively manage the sending window  Slow Start  Dynamic Window Sizing on Congestion  Fast Retransmission  Fast Recovery  These techniques are incorporated in all modern implementations of TCP

Slow Start

 TCP is self-clocking and paces itself approximately to the rate of the bottleneck  At the initialization phase, no such pacing is available for guidance  Sending at full window speed is ill-advised as it may cause large packet drops  Some means of gradually expanding the initial window until pacing takes over is needed  “Slow Start” is the procedure recommended for initial window expansion until pacing takes over

Phases of Congestion Control

*The variable ss_thresh can be thought of as an estimate of the level below which congestion is not expected.

TCP Slow Start  The goal of TCP Slow Start is to discover roughly the proper sending rate quickly  Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:  Initialize cwnd = 1  Each time a segment is acknowledged, increment cwnd by one (cwnd++).  Continue until the threshold, ss_thresh, is reached  If cwnd > ss_thresh, increase linearly  Additive Increase

TCP Slow Start

• Each ACK adds 2 “credits”
  • This gives the sender the permission to send two segments
• Slow start increases rate exponentially fast
  • Rate is doubled every RTT!

[Figure: cwnd = 1, 2, 4, 8 over successive round trips]

Dynamic Window Sizing on Congestion

 Slow Start enables the sender to quickly determine a reasonable window size for the connection  What happens if packet loss (timeout) occurs?  This could be a sign of congestion, but it is not clear how serious the congestion is  Would resetting cwnd = 1 and restarting slow-start be enough?  This may not be conservative enough  Need a more aggressive response

Congestion Avoidance: Additive Increase

• When a timeout occurs, use these rules:
  • Set a slow-start threshold, ss_thresh, equal to half of cwnd
    • ss_thresh = cwnd / 2
  • Set cwnd = 1 and perform slow start until cwnd = ss_thresh
    • Increase cwnd by 1 for every ACK received
  • For cwnd ≥ ss_thresh, cwnd is increased by 1 for each RTT

Slow Start – Packet Loss

• Initial values
  • ss_thresh = 8
  • cwnd = 1
• Loss after transmission 7
  • cwnd currently 12
  • Set ss_thresh = cwnd/2
  • Set cwnd = 1
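The packet loss example can be reproduced with a per-round-trip simulation. The sketch below counts cwnd in segments and treats each loop iteration as one RTT, which is a simplification of the per-ACK rules; `tahoe_trace` is a hypothetical name, and the floor of 2 on ss_thresh is an assumption taken from the summary rule max(flight size/2, 2*MSS).

```python
def tahoe_trace(rounds: int, ss_thresh: int, loss_rounds=()) -> list:
    """Return (cwnd, ss_thresh) at the start of each round trip, in segments."""
    cwnd = 1
    trace = []
    for rnd in range(rounds):
        trace.append((cwnd, ss_thresh))
        if rnd in loss_rounds:
            # Timeout: remember half the window, restart slow start
            ss_thresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ss_thresh:
            # Slow start: the window doubles every RTT, up to the threshold
            cwnd = min(cwnd * 2, ss_thresh)
        else:
            # Congestion avoidance: one extra segment per RTT
            cwnd += 1
    return trace
```

Calling `tahoe_trace(13, 8, loss_rounds={7})` gives the window sequence 1, 2, 4, 8, 9, 10, 11, 12, then 1, 2, 4, 6, 7: the loss at cwnd = 12 sets ss_thresh to 6, and slow start hands over to linear growth at that point.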

TCP Improvement FAST RETRANSMIT FAST RECOVERY

Acknowledgments in TCP

• Receiver sends ACK to sender
  • ACK is used for flow control, error control, and congestion control
  • ACK number sent is the next sequence number expected
• Delayed ACK: TCP receiver normally delays transmission of an ACK (for about 200ms)
  • Why?
• ACKs are not delayed when packets are received out of sequence
  • Why?

[Figure: 1K segments with SeqNo=0, 1024, 2048 and 3072 and the replies AckNo=1024 and AckNo=2048; after a lost segment the receiver answers with AckNo=2048 again]

Fast Retransmission Policy – TCP Tahoe

 When a segment is lost, original TCP waits for an ACK that is not coming and eventually times out.  It is often the case that some of the segments sent after the lost segment arrive at the receiver.  For each such segment received, the receiver sends a duplicate ACK, notifying the sender that the receiver is waiting for the missing segment.  TCP Tahoe interprets duplicate ACKs as an indication that a segment was lost.

Fast Retransmit

• If three or more duplicate ACKs are received in a row, the TCP sender considers the segment as lost.
• Then TCP performs a retransmission of the missing segment, without waiting for a timeout to happen.
• Fast Retransmit repairs a single segment loss

[Figure: 1K segments with SeqNo=0, 1024, 2048, 3072 and 4096 are sent; the receiver returns AckNo=1024 followed by duplicate AckNo=1024 replies, and the sender retransmits the segment with SeqNo=1024]

Inferring Non-Loss from ACKs

 Duplicate ACKs not only signify that a packet is lost but also provide hints that segments are leaving the network  Segments, beyond the lost segment, which triggered the transmission of ACKs  Therefore, advancing the sliding window does not increase the number of segments stored in the network  The load in the network queues does not increase

Fast Recovery

 Fast recovery algorithm governs the transmission of new data until a non-duplicate ACK arrives.  Fast recovery avoids slow start after a fast retransmit  Intuition is that duplicate ACKs indicate that data is getting through  The fast retransmit and fast recovery algorithms are usually implemented together

Fast Recovery

• After three duplicate ACKs, set:
  • Retransmit “lost packet”
  • ss_thresh = cwnd/2
  • cwnd = ss_thresh + 3
  • Enter congestion avoidance
• Increment cwnd by one for each additional duplicate ACK
• When an ACK arrives that acknowledges “new data” (AckNo=2048), set:
  • cwnd = ss_thresh
  • Enter congestion avoidance

[Figure: timeline of 1K segments (SeqNo=0 through 4096) and duplicate ACKs (AckNo=1024), annotated with cwnd and ss_thresh before and after the new ACK (AckNo=2048)]
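The duplicate ACK handling can be sketched as a small state machine. This follows the Reno rule from RFC 5681 in which the window is set to ss_thresh + 3 when fast recovery starts; that choice, the floor of 2 segments, and the `RenoSender` name are assumptions for the sketch. Window growth on ordinary new ACKs is left out to keep the example focused.

```python
class RenoSender:
    """Fast retransmit and fast recovery, with cwnd counted in segments."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, cwnd: int, ss_thresh: int):
        self.cwnd = cwnd
        self.ss_thresh = ss_thresh
        self.last_ack = None
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmitted = []  # sequence numbers resent by fast retransmit

    def on_ack(self, ack_no: int) -> None:
        if ack_no == self.last_ack:
            self.dup_acks += 1
            if self.in_fast_recovery:
                self.cwnd += 1  # each extra duplicate means a segment left the network
            elif self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(ack_no)       # resend the missing segment
                self.ss_thresh = max(self.cwnd // 2, 2)
                self.cwnd = self.ss_thresh + 3          # inflate by the three duplicates
                self.in_fast_recovery = True
            return
        # ACK for new data
        if self.in_fast_recovery:
            self.cwnd = self.ss_thresh  # deflate and continue in congestion avoidance
            self.in_fast_recovery = False
        self.last_ack = ack_no
        self.dup_acks = 0
```

Starting from cwnd = 12, an ACK for 1024 followed by three duplicates triggers a retransmission of 1024 with ss_thresh = 6 and cwnd = 9; the later ACK for 2048 deflates cwnd to 6 without going back to slow start.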

Fast Recovery

[Figure: cwnd versus time across slow start and congestion avoidance, starting from the initial cwnd, with ss_thresh marked and events labelled fast-retransmit, new ACK, and timeout]

Summary of TCP Behavior

TCP Variation | Response to 3 dupACKs | Response to Partial ACK of Fast Retransmission | Response to “full” ACK of Fast Retransmission
Tahoe | Do fast retransmit, enter slow start | ++cwnd | ++cwnd
Reno | Do fast retransmit, enter fast recovery | Exit fast recovery, deflate window, enter congestion avoidance | Exit fast recovery, deflate window, enter congestion avoidance
NewReno | Do fast retransmit, enter modified fast recovery | Fast retransmit and deflate window, remain in modified fast recovery | Exit modified fast recovery, deflate window, enter congestion avoidance

• When entering slow start:
  • if the connection is new: ss_thresh = arbitrarily large value, cwnd = 1
  • else: ss_thresh = max(flight size/2, 2*MSS), cwnd = 1
• In slow start: ++cwnd on new ACK
• When entering either fast recovery or modified fast recovery: ss_thresh = max(flight size/2, 2*MSS), cwnd = ss_thresh
• In congestion avoidance: cwnd += 1*MSS per RTT

Summary

 Transmission Control Protocol Discussion  Segment format and functionality  TCP Flow Control  TCP Congestion Control  AIMD strategy
