
Chapter 6

• Transport Service • Elements of Transport Protocols • Congestion Control • Internet Protocols – UDP • Internet Protocols – TCP • Performance Issues • Delay-Tolerant Networking

Revised: August 2011

The Transport Layer

Responsible for delivering data across networks with the desired reliability or quality. (Layer stack: Application, Transport, Network, Link, Physical.)

Transport Service

Data transport from a process on the source machine to a process on the destination machine, with a desired level of reliability that is independent of the physical networks actually being used.
− The transport entity can be physically located in the operating system kernel, in a NIC, in a library package bound into network applications, or in a separate user process.
− Transport code runs entirely on the end users’ computers, in direct contrast to the network layer, which runs mostly on routers.
• Services Provided to the Upper Layer »
• Transport Service Primitives »
• Berkeley Sockets »
• Socket Example: Internet File Server »

Services Provided to the Upper Layers (1)

Transport layer adds reliability to the network layer
• Offers connectionless (e.g., UDP) and connection-oriented (e.g., TCP) service to applications

Services Provided to the Upper Layers (2)

New term: “segments” – the transport layer’s “packets”. The transport layer sends segments inside (network-layer) packets, which in turn travel inside (link-layer) frames.


Transport Layer Protocols in the TCP/IP Suite:
1. UDP – connectionless
2. TCP – connection-oriented, reliable service
3. SCTP (Stream Control Transmission Protocol) – emerging; tailored for multimedia and wireless environments
4. SST (Structured Stream Transport) – experimental; finer-grained, TCP-like streams
5. DCCP (Datagram Congestion Controlled Protocol) – UDP with congestion control

Transport Service Primitives (1)

Service primitives for TCP:
• The transport layer makes it possible for the (TCP) transport service to be more reliable than the underlying network.
− TCP hides the imperfections of the network layer.
− The transport layer provides services to the application layer.
• Greater reliability is accomplished entirely without any dependence (or reliance) upon underlying network primitives.
TCP’s primitives that applications might call to transport data for a simple connection-oriented service:
• Client calls CONNECT, SEND, RECEIVE, DISCONNECT
• Server calls LISTEN, RECEIVE, SEND, DISCONNECT


Transport Service Primitives (2)

State diagram for a simple connection-oriented service

Solid lines (right) show client state sequence

Dashed lines (left) show server state sequence

Transitions in italics are due to segment arrivals.

Berkeley Sockets

Very widely used primitives, started with TCP on Berkeley UNIX
• Notion of “sockets” as transport endpoints
• Like the simple set of primitives, plus SOCKET, BIND, and ACCEPT

Socket Example – Internet File Server (1)

Client code (see textbook page 504) . . .

Get server’s IP address

Make a socket

Try to connect . . .

Socket Example – Internet File Server (2)

Client code (cont.) . . .

Write data (equivalent to send)

Loop reading (equivalent to receive) until no more data; exit implicitly calls close
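The textbook’s client code (page 504) is not reproduced in these notes. As a stand-in, here is a minimal C sketch of the same call sequence (look up the server’s address, make a socket, connect, send the file name, read until EOF). The port number, buffer size, and error handling are illustrative assumptions, not the book’s exact code.

/* Minimal TCP file-request client sketch (not the textbook's exact code).
 * Usage: ./client <server-host> <file-name>
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

#define SERVER_PORT "12345"   /* illustrative port; the real service port may differ */
#define BUF_SIZE    4096

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s host file\n", argv[0]); return 1; }

    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */

    if (getaddrinfo(argv[1], SERVER_PORT, &hints, &res) != 0) {  /* get server's IP address */
        fprintf(stderr, "getaddrinfo failed\n"); return 1;
    }

    int s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);  /* make a socket */
    if (s < 0) { perror("socket"); return 1; }

    if (connect(s, res->ai_addr, res->ai_addrlen) < 0) {   /* try to connect */
        perror("connect"); return 1;
    }
    freeaddrinfo(res);

    write(s, argv[2], strlen(argv[2]) + 1);   /* send the file name (equivalent to SEND) */

    char buf[BUF_SIZE];
    ssize_t n;
    while ((n = read(s, buf, sizeof buf)) > 0)   /* receive until the server closes */
        fwrite(buf, 1, (size_t)n, stdout);

    return 0;    /* exit implicitly closes the socket */
}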

Socket Example – Internet File Server (3)

Server code (see textbook page 505) . . .

Make a socket

Assign address

Prepare for incoming connections . . .

Socket Example – Internet File Server (4)

Server code (continued) . . .

Block waiting for the next connection

Read (receive) request and treat as file name

Write (send) all file data

Done, so close this connection
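Likewise, the textbook’s server code (page 505) is not reproduced here; the C sketch below follows the same outline (socket, bind, listen, accept, read the request as a file name, send the file, close). The port number and buffer sizes are assumptions for the sketch.

/* Minimal TCP file server sketch (not the textbook's exact code). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define SERVER_PORT 12345   /* illustrative port */
#define QUEUE_SIZE  10
#define BUF_SIZE    4096

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);          /* make a socket */
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(SERVER_PORT);

    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {  /* assign address */
        perror("bind"); return 1;
    }
    listen(s, QUEUE_SIZE);                             /* prepare for incoming connections */

    for (;;) {
        int sa = accept(s, NULL, NULL);                /* block waiting for the next connection */
        if (sa < 0) continue;

        char name[BUF_SIZE];
        ssize_t n = read(sa, name, sizeof name - 1);   /* read request, treat as file name */
        if (n > 0) {
            name[n] = '\0';
            int fd = open(name, O_RDONLY);
            if (fd >= 0) {
                char buf[BUF_SIZE];
                ssize_t k;
                while ((k = read(fd, buf, sizeof buf)) > 0)  /* write (send) all file data */
                    write(sa, buf, (size_t)k);
                close(fd);
            }
        }
        close(sa);                                     /* done, so close this connection */
    }
}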

Elements of Transport Protocols

• Addressing »
• Connection establishment »
• Connection release »
• Error control and flow control »
• Multiplexing »
• Crash recovery »

Addressing

• Transport layer adds TSAPs
• Multiple clients and servers can run on a host with a single network (IP) address
• TSAPs are ports for TCP/UDP

This slide explains how the OSI protocols “address” services on adjacent layers of the OSI reference model.

Instructor’s Slide Addition: Addressing
• Textbook describes Internet port addresses in terms of OSI’s “TSAP” concept.
• Your instructor believes that this is confusing for those not familiar with OSI, even though these OSI concepts are helpful for a formal orientation.
• This instructor-inserted slide seeks to summarize pages 509-512 and 554 of the textbook in purely Internet terms, which he believes is much simpler.

• OSI’s TSAPs and the Internet’s ports both provide addresses for application-layer processes so that the transport layer can deliver packets to the appropriate application-layer processing entity (e.g., a process server local to that device, such as Unix’s inetd).
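To relate ports to the socket API: the following is a hedged sketch (standard Berkeley-sockets behaviour, not anything specific to the textbook) of how a process either binds a specific well-known or registered port or lets the OS pick one from the dynamic range described just below.

/* Sketch: letting the OS pick an ephemeral (dynamic-range) port vs. binding a
 * specific well-known port. Binding ports below 1024 normally requires
 * privilege on Unix-like systems.
 */
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_ANY);
    a.sin_port = htons(0);              /* 0 = "any free port": OS picks from the dynamic range */

    if (bind(s, (struct sockaddr *)&a, sizeof a) == 0) {
        socklen_t len = sizeof a;
        getsockname(s, (struct sockaddr *)&a, &len);
        printf("OS assigned ephemeral port %u\n", (unsigned)ntohs(a.sin_port));
    }
    /* A well-known service would instead use, e.g., a.sin_port = htons(80); that
     * bind() fails with EACCES unless the process has the needed privilege. */
    return 0;
}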

https://www.ietf.org/assignments/service-names-port-numbers/service-names-port-numbers.xml
• Security firewalls also extensively use port references and definitions to identify specific applications.
• Three types of ports:
1. Well-known ports: 0 – 1023
2. Registered ports: 1024 – 49151
» Ports 0 through 49151 are formally registered worldwide through IANA (see the URL above for current registration results)
3. Dynamic ports: 49152 – 65535
» These ports are not registered through IANA but rather use a locally scoped port ID system such as portmapper (see textbook page 510). An example is Sun Microsystems’ ONC RPC protocol system, which is a popular approach for remote procedure calls (see pages 543-546).

Another Instructor Slide Insertion
• Tanenbaum is a European and therefore apparently thinks that connection-oriented systems are “normal” and connectionless systems “abnormal”, “unusual”, or “bad” – they didn’t exist within OSI. Regardless, he introduces and describes transport-layer concepts solely in terms of the connection-oriented variant.

• The next 11 slides are true for the Internet’s TCP. • They are NOT true for the Internet’s UDP, which provides a connectionless service to the application layer.

• Textbook doesn’t really explain this distinction until UDP and TCP are subsequently introduced late in the chapter circa page 541

• This reflects the textbook’s origins: earlier editions were written in a time of protocol diversity and discussed data communications in terms of the OSI Reference Model and OSI protocols. These foundational concepts still remain important to understand for protocol development and research.

Connection Establishment (1)

TCP is a connection-oriented protocol. It therefore provides services common to other connection-oriented approaches (e.g., circuit switching).
Key problem is to ensure reliability even though packets may be lost, corrupted, delayed, and duplicated
• Don’t treat an old or duplicate packet as new
• (Use ARQ and checksums for loss/corruption)

Approach:
• Don’t reuse sequence numbers within twice the MSL (Maximum Segment Lifetime), i.e., within 2T = 240 sec
• Three-way handshake for establishing connections

Connection Establishment (2)

Requirement: need a practical and foolproof way to reject delayed duplicate segments to ensure that no session confusion can happen.
Use a sequence number space large enough that it will not wrap, even when sending at full rate
• Clock (high bits) advances & keeps state over crash

Figure notes: the sequence number must not wrap within T seconds, and must not climb too slowly for too long.

Connection Establishment (3)

Three-way handshake provides a foolproof way to ensure that an initial CONNECTION_REQUEST sequence number is not a duplicate of a recent connection (since seq numbers are not remembered across different connections at a destination).

Three-way handshake used for initial packet • Since no state from previous connection • Both hosts contribute fresh seq. numbers • CR = Connection Request

Connection Establishment (4)

Three-way handshake protects against odd cases:
a) Duplicate CR: the spurious ACK does not connect
b) Duplicate CR and DATA: same as (a), plus the DATA will be rejected (wrong ACK)
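A toy illustration (names and numbers are invented here, not from the textbook) of why the rejection works: the server accepts DATA only if it acknowledges the fresh sequence number the server just chose, so segments from an old incarnation carry the wrong ACK.

/* Toy simulation of three-way-handshake duplicate rejection. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

struct segment { uint32_t seq; uint32_t ack; };

/* Server side: on CR(seq = x) it picks a fresh y and replies ACK(seq = y, ack = x). */
static uint32_t server_on_cr(uint32_t x)
{
    uint32_t y = (uint32_t)rand();               /* fresh sequence number for this attempt */
    printf("server: CR(seq=%u) -> ACK(seq=%u, ack=%u)\n",
           (unsigned)x, (unsigned)y, (unsigned)x);
    return y;
}

/* Server side: DATA is valid only if it acknowledges the fresh y. */
static int server_on_data(struct segment d, uint32_t y)
{
    return d.ack == y;                           /* wrong ACK => delayed duplicate, reject */
}

int main(void)
{
    uint32_t x_old = 17;                         /* seq used by an old, dead connection   */
    uint32_t y = server_on_cr(x_old);            /* a delayed duplicate CR arrives         */

    struct segment old_data = { .seq = x_old + 1, .ack = 99 };  /* carries the old ack, not y */
    printf("old DATA %s\n",
           server_on_data(old_data, y) ? "accepted" : "rejected (wrong ACK)");
    return 0;
}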


Connection Release (1)

Key problem is to ensure reliability while releasing the connection.
Asymmetric release (when one side breaks the connection) is abrupt and may lose data.


Connection Release (2)

Symmetric release (both sides agree to release) can’t be handled solely by the transport layer
• The two-army problem shows the pitfall of reaching agreement
− Blue is more powerful than white, but blue can win only if its two parts synchronize their attack. Their only communication path is through the white army’s territory, where the messenger may be captured; if the other part never receives the message, the part that attacks alone is left vulnerable.


Connection Release (3)

Normal release sequence, initiated by transport user on Host 1

• “Normal” in this context means “without lost packets” • DR=Disconnect Request • Both DRs are ACKed by the other side

Connection Release (4)

Error cases are handled with timer and retransmission

(a) Final ACK lost: Host 2 times out. (b) Lost DR causes retransmissions. (c) Extreme case: many lost DRs cause both hosts to time out.

Error Control and Flow Control (1)

Error control is ensuring that the data is delivered with the desired level of reliability, usually meaning that all of the data is delivered without any errors. Flow control is keeping a fast transmitter from overrunning a slow receiver.
The foundation for TCP error control is a sliding window (from the link layer) with checksums and retransmissions.
Flow control manages buffering at the sender and receiver
• Issue is that data goes to/from the network and applications at different times
• Window tells the sender the available buffering at the receiver
− Buffers are needed at both the sender and the receiver
− A host may have many connections, each of which is treated separately, so it may need a substantial amount of buffering for the sliding windows.
• Makes a variable-size sliding window
− Need to dynamically adjust buffer allocations as connections open and close and traffic patterns change.

Error Control and Flow Control (2)

Different buffer strategies trade efficiency / complexity

a) Chained fixed-size buffers
b) Chained variable-size buffers

c) One large circular buffer
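As a concrete illustration of strategy (c), here is a minimal single-connection circular byte buffer in C. The size and function names are assumptions of this sketch; a real transport entity would add locking and tie the free space to the advertised window.

/* Minimal single-connection circular byte buffer, as in strategy (c). */
#include <stdio.h>
#include <stddef.h>

#define RING_SIZE 8192

struct ring {
    unsigned char data[RING_SIZE];
    size_t head;   /* next byte to read  */
    size_t tail;   /* next byte to write */
    size_t used;   /* bytes currently buffered */
};

/* Store up to len bytes; returns how many fit (the rest would be flow-controlled). */
static size_t ring_put(struct ring *r, const unsigned char *src, size_t len)
{
    size_t n = 0;
    while (n < len && r->used < RING_SIZE) {
        r->data[r->tail] = src[n++];
        r->tail = (r->tail + 1) % RING_SIZE;
        r->used++;
    }
    return n;
}

/* Remove up to len bytes (e.g., handed to the application); returns how many were copied. */
static size_t ring_get(struct ring *r, unsigned char *dst, size_t len)
{
    size_t n = 0;
    while (n < len && r->used > 0) {
        dst[n++] = r->data[r->head];
        r->head = (r->head + 1) % RING_SIZE;
        r->used--;
    }
    return n;
}

int main(void)
{
    struct ring r = {0};
    unsigned char out[4];
    ring_put(&r, (const unsigned char *)"hello", 5);
    size_t n = ring_get(&r, out, sizeof out);
    printf("read %zu bytes: %.*s\n", n, (int)n, out);
    return 0;
}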

Error Control and Flow Control (3)

Example of how dynamic window management might work in a datagram network with 4-bit sequence numbers. A and B are communicating hosts. Flow control example: A’s data is limited by B’s buffer. (In the figure, “…” means a lost packet.)

(The figure’s table of A–B exchanges, with 4-bit sequence numbers, is not reproduced here.) In line 16 of that example, B allocated more buffers, but the allocation segment (TPDU) was lost. To prevent deadlocks like this, each host should periodically send control segments giving the ack and buffer status on each connection.

Multiplexing

Multiplexing = sharing several conversations over the same connection
Kinds of transport / network sharing that can occur:
• Multiplexing: connections share a network address
• Inverse multiplexing: addresses share a connection

(Figure: multiplexing vs. inverse multiplexing.)
Question: If this example is for an IPv4 network, how many network interface cards (NICs) are on that computer? How about an IPv6 network?

Crash Recovery

The network and transport layers automatically handle network and router crashes. The issue here is how the transport layer recovers from host (end-system) crashes. We want clients to continue to be able to work if the server quickly reboots.
Rule of thumb: recovery from a layer N crash can only be done by layer N + 1, because only the higher layer retains enough status information to reconstruct where it was before the problem occurred.
The application needs to help in recovering from a crash
• Transport can fail since A(ck) / W(rite) are not atomic. Note: C = Crash

All 8 possible combinations of client and server strategies are shown here. For each strategy there is some sequence of events causing the protocol to fail.

Congestion Control

Two layers are responsible for congestion control: − Transport layer, controls the offered load [here] » Internet relies heavily on the transport layer for congestion control with specific algorithms built into TCP and other protocols. » Question: Is UDP among those “and other protocols”? − Network layer, experiences congestion and signals it using ECN [previous chapter]

• Desirable bandwidth allocation » • Regulating the sending rate » • Wireless issues »

Desirable Bandwidth Allocation (1)

Efficient use of bandwidth gives high goodput, low delay
• The metric here is power = load / delay – it rises with load until delay starts to climb rapidly

Goodput rises more slowly than load when congestion sets in; delay begins to rise sharply when congestion sets in.

Question: What packet behavior naturally happens at the Network Layer when congestion appears that the “desired response” above is trying to correct?

Desirable Bandwidth Allocation (2)

This slide and the next one address the issue of how to divide bandwidth between two different transport senders in order to resolve congestion.
Definition: An allocation is max-min fair if the bandwidth given to one flow cannot be increased without decreasing the bandwidth given to another flow with an allocation that is no larger.
Fair use gives bandwidth to all flows (no starvation)
• Max-min fairness gives equal shares of the bottleneck (one way to compute such an allocation is sketched below)
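The sketch referred to above uses the usual “progressive filling” idea for a single shared link: keep handing each unsatisfied flow an equal share of whatever capacity remains. The demands and capacity are made-up numbers for illustration.

/* Progressive-filling sketch of a max-min fair allocation on one shared link. */
#include <stdio.h>

#define NFLOWS 4

int main(void)
{
    double capacity = 10.0;                       /* link capacity (e.g., Mbps) */
    double demand[NFLOWS] = {1.0, 2.0, 4.0, 8.0}; /* what each flow would like   */
    double alloc[NFLOWS]  = {0};
    int satisfied[NFLOWS] = {0};

    int remaining = NFLOWS;
    while (remaining > 0 && capacity > 1e-9) {
        double share = capacity / remaining;      /* equal share of what is left */
        int progressed = 0;
        for (int i = 0; i < NFLOWS; i++) {
            if (satisfied[i]) continue;
            double want = demand[i] - alloc[i];
            double give = want < share ? want : share;
            alloc[i] += give;
            capacity -= give;
            if (alloc[i] >= demand[i] - 1e-9) { satisfied[i] = 1; remaining--; progressed = 1; }
        }
        if (!progressed) break;                   /* everyone got a full equal share: done */
    }
    for (int i = 0; i < NFLOWS; i++)
        printf("flow %d: demand %.1f -> allocation %.2f\n", i, demand[i], alloc[i]);
    return 0;
}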

Question: Does one always want max-min fair use for all traffic? (Figure: flows sharing a bottleneck link.)

Desirable Bandwidth Allocation (3)

Max-min fair example: we want bandwidth levels to converge quickly when traffic patterns change.
Flow 1 slows (its rate decreases) quickly when flow 2 starts; flow 1 speeds up quickly when flow 2 stops.

Example where bandwidth allocation changes over time and converges quickly

Regulating the Sending Rate (1)

Sender may need to slow down for different reasons:
1. Flow control, when the receiver is not fast enough [right]
− Previously discussed: flow control with a variable-sized window does this.
2. Congestion control, when the network is not fast enough [next slide]
A fast network feeding a low-capacity receiver → flow control is needed

Regulating the Sending Rate (2)

Our focus is dealing with this problem – congestion

• The way that a transport protocol should regulate the sending rate depends on the form of the feedback returned by the network (see next slide).
• Feedback can be explicit or implicit
• Feedback can be precise or imprecise
A slow network feeding a high-capacity receiver → congestion control is needed

Regulating the Sending Rate (3)

Different congestion signals the network may use to tell the transport endpoint to slow down (or speed up)

XCP – eXplicit Congestion Protocol
ECN – Explicit Congestion Notification
Feedback example: packet loss (the IP layer does not retransmit lost packets) → the transport layer slows down the sending rate for that flow

Regulating the Sending Rate (4)

This slide and the next report the results of Chiu and Jain’s 1989 study, which contrasts responding to congestion feedback with additive adjustments (±1 Mbps) versus multiplicative adjustments (±10%).
If two flows competing for the bandwidth of a single link increase / decrease their bandwidth in the same way when the network signals free/busy, they will not converge to a fair allocation.
+/– constant

+/– percentage
A congestion signal is sent to both flows whenever the sum of their use crosses the efficiency line.

Regulating the Sending Rate (5)

Pure additive or multiplicative responses do not converge to fairness. The AIMD (Additive Increase Multiplicative Decrease) control law does converge to a fair and efficient point!
• AIMD – additively increase bandwidth allocations, but then multiplicatively decrease them when congestion is signaled.

• TCP uses AIMD for this reason (see slides 60 - 66)

(Axes: user 1’s bandwidth horizontally, user 2’s bandwidth vertically.)
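A toy simulation of the Chiu–Jain argument above: two flows that apply AIMD to a shared link converge toward the fair, efficient point regardless of where they start. Capacity, step sizes, and starting rates are made-up numbers.

/* Two AIMD flows sharing one link: additive increase when uncongested,
 * multiplicative decrease when the link is overloaded. The ratio tends to 1.
 */
#include <stdio.h>

int main(void)
{
    double capacity = 100.0;        /* shared link capacity */
    double x1 = 10.0, x2 = 80.0;    /* deliberately unfair starting rates */

    for (int t = 0; t < 40; t++) {
        if (x1 + x2 > capacity) {   /* congestion signal delivered to both flows */
            x1 *= 0.5;              /* multiplicative decrease */
            x2 *= 0.5;
        } else {
            x1 += 1.0;              /* additive increase */
            x2 += 1.0;
        }
        if (t % 5 == 0)
            printf("t=%2d  x1=%6.2f  x2=%6.2f  ratio=%.2f\n", t, x1, x2, x1 / x2);
    }
    return 0;
}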

Wireless Issues

Wireless links lose packets due to transmission errors
• Do not want to confuse this loss with congestion
• Or the connection will run slowly over wireless links!
Strategy:
• Wireless links use ARQ, which masks errors
− Address the causes of packet drops at the link layer to reduce the false congestion signals reaching TCP from the IP layer (the IP layer does not see most drops)

Internet Protocols – UDP

• Introduction to UDP » • Remote Procedure Call » • Real-Time Transport »

Introduction to UDP (1)

UDP is a connectionless protocol for the transport layer.
UDP (User Datagram Protocol) is a shim over IP
• Header has ports (TSAPs), length, and checksum.

UDP has an optional checksum for extra reliability.

UDP does NOT do flow control, congestion control, or retransmission based upon receipt of a bad segment. If any of these services are needed, then the application will need to do them for itself.

Introduction to UDP (2)

Checksum covers the UDP segment and an IP pseudo-header
• Fields that change in the network are zeroed out
• Provides an end-to-end delivery check that helps detect mis-delivered packets.

This IP pseudo-header is added to the UDP checksum calculation. TCP uses this same pseudo-header for its checksum calculation.
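A hedged C sketch of the checksum computation just described: the Internet 16-bit one’s-complement sum taken over the IPv4 pseudo-header followed by the UDP header and data (as in RFC 768). Function names are mine; the checksum field is assumed to be zero in the input segment, and the result would be stored big-endian in the header.

/* UDP checksum over the IPv4 pseudo-header plus the UDP segment. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) { acc += (uint32_t)((p[0] << 8) | p[1]); p += 2; len -= 2; }
    if (len) acc += (uint32_t)(p[0] << 8);        /* pad an odd final byte with zero */
    return acc;
}

/* src/dst are IPv4 addresses in network byte order; seg points at the UDP
 * header + data with the checksum field set to zero; seg_len is its length. */
uint16_t udp_checksum(uint32_t src, uint32_t dst, const uint8_t *seg, size_t seg_len)
{
    uint8_t pseudo[12];
    memcpy(pseudo, &src, 4);                      /* source IP address      */
    memcpy(pseudo + 4, &dst, 4);                  /* destination IP address */
    pseudo[8] = 0;                                /* zero byte              */
    pseudo[9] = 17;                               /* protocol = UDP         */
    pseudo[10] = (uint8_t)(seg_len >> 8);         /* UDP length, high byte  */
    pseudo[11] = (uint8_t)(seg_len & 0xff);       /* UDP length, low byte   */

    uint32_t acc = sum16(pseudo, sizeof pseudo, 0);
    acc = sum16(seg, seg_len, acc);
    while (acc >> 16) acc = (acc & 0xffff) + (acc >> 16);   /* fold carries */
    uint16_t result = (uint16_t)~acc;
    return result ? result : 0xffff;              /* a computed 0 is sent as all ones */
}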

RPC (Remote Procedure Call)

RPC connects applications over the network with the familiar abstraction of procedure calls
• Stubs package parameters/results into a message
• UDP with retransmissions is a low-latency transport
• Sun’s ONC (Open Network Computing) is a very popular RPC-based product suite (e.g., NFS (Network File System), port mapper, …)
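Since UDP itself never retransmits, the “UDP with retransmissions” above has to come from the RPC layer. Below is a minimal sketch of that pattern using standard socket calls; the server address, port, and request format are placeholders, not ONC RPC’s real wire format.

/* RPC-style request/response over UDP with a simple timeout-and-retransmit loop. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/time.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5555);                 /* placeholder RPC port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };  /* 1 s receive timeout */
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    const char request[] = "READ file.txt";     /* placeholder marshalled call */
    char reply[2048];

    for (int attempt = 0; attempt < 3; attempt++) {      /* retransmit up to 3 times */
        sendto(s, request, sizeof request, 0, (struct sockaddr *)&srv, sizeof srv);
        ssize_t n = recvfrom(s, reply, sizeof reply, 0, NULL, NULL);
        if (n >= 0) {                                    /* got a reply */
            printf("reply: %.*s\n", (int)n, reply);
            break;
        }
        fprintf(stderr, "attempt %d timed out, retransmitting\n", attempt + 1);
    }
    close(s);
    return 0;
}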

Instructor Slide Addition: UDP RPC Example

• The Network File System (NFS) is one of the Sun ONC RPC (Open Network Computing Remote Procedure Calls) series of protocols. Like other ONC RPC protocols, it was designed to be locally used over LANs and operate with a minimum of latency, in this case to transparently enable distributed file operations as if they were a local-computer only file system. NFS therefore natively uses UDP as its transport layer protocol.

• In the early 1990s NFS also became widely deployed within Boeing’s factories. This proved to be problematic for Boeing because its factories were such an unusually high-EMI (electromagnetic interference) environment that its wired V2 protocols experienced such high error rates that NFS transmissions frequently timed out (i.e., failed).

• Boeing management tasked the instructor to represent Boeing to Sun Microsystems to resolve this issue. The instructor proposed to Sun that NFS be optionally configurable to also run over TCP protocols. Thus, most deployments would continue to be run over UDP but Boeing’s factories could be optionally configured to use TCP instead.

• Sun was willing to implement this proposed change into their products, which is why NFS today can also be optionally configured to use TCP transports. UDP remains the default.

• As you should expect by this point of the course, NFS deployments using TCP cannot directly interoperate with NFS deployments using UDP. Therefore, servers used in both environments need to be configured to support both transport variants of NFS (dual stacks).

Real-Time Transport (1)

RTP (Real-time Transport Protocol) provides support for sending real-time multimedia over UDP (e.g., voice, video, …) • Often implemented as part of the application

Real-Time Protocol (2)

RTP header contains fields to describe the type of media and synchronize it across multiple streams • RTCP sister protocol helps with management tasks

Real-Time Protocol (3)

Buffer at receiver is used to delay packets and absorb jitter so that streaming media is played out smoothly

(Figure: a constant-rate source, variable-rate delivery through the network, and constant-rate playout from the receiver’s buffer; packet 8’s network delay is too large for the buffer to help.)

Real-Time Protocol (4)

High jitter, or more variation in delay, requires a larger play-out buffer to avoid play-out misses
• Propagation delay does not affect buffer size (i.e., the issue affecting human perception of streaming media is jitter, not latency)


Question: If propagation delay does not affect buffer size, what does? (One answer is sketched below.)
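The sketch below shows one common answer: the buffer is sized from the observed jitter. This toy C program estimates a smoothed delay and a smoothed jitter from made-up samples and places the playout point a few deviations beyond the mean; the smoothing constants are illustrative, not mandated by RTP.

/* Jitter-driven playout point: EWMA of delay and of |deviation|. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double delays_ms[] = {40, 42, 39, 55, 41, 70, 43, 40};   /* made-up one-way delays */
    int n = sizeof delays_ms / sizeof delays_ms[0];

    double d = delays_ms[0];   /* smoothed delay estimate */
    double v = 0.0;            /* smoothed jitter         */

    for (int i = 1; i < n; i++) {
        double dev = fabs(delays_ms[i] - d);
        d = 0.875 * d + 0.125 * delays_ms[i];   /* EWMA, like TCP's SRTT */
        v = 0.75 * v + 0.25 * dev;
        double playout = d + 4.0 * v;           /* play packets this long after they were sent */
        printf("sample %d: delay=%.1f  jitter=%.1f  playout point=%.1f ms\n",
               i, d, v, playout);
    }
    return 0;
}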

Instructor Slide Addition: RTCP

• The Real-Time Control Protocol (RTCP) is a companion protocol to RTP that is also defined in RFC 3550.
• RTCP handles feedback, synchronization, and user interfaces for real-time operations.
• The instructor was tasked by Microsoft to work within the IETF to develop RTCP to flexibly enable standards-based user control for the Microsoft NetShow product’s distributed multimedia Internet streaming operations: streaming fast forward, pause, stop, rewind, skip to a predetermined point, etc. It can change resolutions and perform a wide variety of control tasks on the streamed multimedia.

Internet Protocols – TCP

The Transmission Control Protocol (TCP) is the Internet’s most popular connection-oriented transport-layer protocol. TCP only accepts user data streams from local application processes. All TCP connections are full duplex and point-to-point.
• The TCP service model »
• The TCP segment header »
• TCP connection establishment »
• TCP connection state modeling »
• TCP sliding window »
• TCP timer management »
• TCP congestion control »

Question: Can TCP be used to convey multicast or broadcast traffic?

The TCP Service Model (1)

TCP provides applications with a reliable byte stream between processes; it is the workhorse of the Internet • Port Registry: https://www.ietf.org/assignments/service-names- port-numbers/service-names-port-numbers.xml • Popular servers run on well-known ports

Ports are used by all transport protocols as well as by security firewalls.

The TCP Service Model (2)

Applications using TCP see only the byte stream [right] and not the segments [left], which are sent as separate IP packets

(a) Four segments, each with 512 bytes of data and carried in an IP packet. (b) The 2048 bytes of data delivered to the application in a single READ call.

A TCP connection is a byte stream, not a message stream. TCP breaks the application’s data stream into pieces not exceeding 64 KB – in practice often 1460 data bytes or less in order to fit into a single Ethernet frame. Further fragmentation may be done at the IP Layer.

The TCP Segment Header

TCP header includes addressing (ports), sliding window (seq. / ack. number), flow control (window), error control (checksum) and more. See textbook pages 557 – 560.
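For reference, a sketch of the fixed 20-byte TCP header as a C struct. The field names are mine, all multi-byte fields are big-endian on the wire, and the flag bits sit in the low bits of the combined offset/flags word; this is a reading aid, not a parser you would use as-is.

/* Fixed part of the TCP header (options may follow). */
#include <stdint.h>

struct tcp_header {
    uint16_t src_port;       /* sending TSAP (port)                          */
    uint16_t dst_port;       /* receiving TSAP (port)                        */
    uint32_t seq;            /* sequence number of the first data byte       */
    uint32_t ack;            /* next byte expected (valid if ACK flag set)   */
    uint16_t offset_flags;   /* 4-bit header length + reserved bits + CWR,
                                ECE, URG, ACK, PSH, RST, SYN, FIN flag bits  */
    uint16_t window;         /* flow-control window (bytes receiver accepts) */
    uint16_t checksum;       /* covers header, data, and pseudo-header       */
    uint16_t urgent_ptr;     /* offset of urgent data (rarely used)          */
    /* options (e.g., MSS, window scale, SACK permitted, timestamps) follow  */
};

/* Convenience masks for the flag bits under this sketch's layout. */
#define TCP_FLAG_FIN 0x0001
#define TCP_FLAG_SYN 0x0002
#define TCP_FLAG_RST 0x0004
#define TCP_FLAG_PSH 0x0008
#define TCP_FLAG_ACK 0x0010
#define TCP_FLAG_URG 0x0020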

TCP Connection Establishment

TCP is a connection-oriented protocol. This means that it has an explicit connection setup, which is implemented as a three-way handshake.
TCP sets up connections with the three-way handshake
• Release is symmetric, also as described before
• Note the SYN flood attack (DoS) vulnerability described on page 561 of the textbook (one mitigation: SYN cookies, which let the listening process verify the returning sequence number without storing it)

(a) Normal case. (b) Simultaneous connect.

TCP Connection State Modeling (1)

The TCP connection finite state machine has more states than our simple example from earlier.

TCP Connection State Modeling (2)

Solid line is the normal path for a client.

Dashed line is the normal path for a server.

Light lines are unusual events.

Transitions are labeled by the cause and action, separated by a slash.

TCP Sliding Window (1)

TCP adds flow control to the sliding window as explained previously (page 522+)

• i.e., TCP window management decouples (by sending both ACK and WIN) the issues of error control and flow control.
• ACK + WIN is the sender’s transmission limit
− WIN indicates the receiver’s current buffer availability

Example of TCP window management where a receiver has a 4096-byte buffer

TCP Sliding Window (2) – a Detail

Need to add special cases to avoid unwanted behavior
• E.g., silly window syndrome [e.g., one byte of data, shown below]
• Solution: the receiver should not send a window update until it can handle the maximum advertised segment size

Silly window syndrome: the receiver application reads single bytes, so the sender always sends one-byte segments
Question: Are there popular applications that usually send only one byte of data?

TCP Timer Management – a Detail

TCP estimates retransmit timer values from segment RTTs
• The timeout value is sensitive to the variance in RTT as well as to the smoothed RTT
− Tracks both the average and the variance (for the Internet case)
• Timeout is set to the average plus 4 x the variance (see the sketch below)
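The sketch referred to above: Jacobson-style RTT estimation in C, keeping a smoothed RTT and a smoothed mean deviation and setting the timeout to SRTT + 4 × RTTVAR. The gains (1/8 and 1/4) are the customary choices; the RTT samples are made up.

/* Retransmission-timeout estimator: RTO = SRTT + 4 * RTTVAR. */
#include <stdio.h>
#include <math.h>

static double srtt = 0.0, rttvar = 0.0;
static int first_sample = 1;

double rto_update(double rtt_ms)          /* call with each new RTT measurement */
{
    if (first_sample) {
        srtt = rtt_ms;
        rttvar = rtt_ms / 2.0;
        first_sample = 0;
    } else {
        double err = rtt_ms - srtt;
        srtt += 0.125 * err;                     /* SRTT   = 7/8 SRTT   + 1/8 RTT   */
        rttvar += 0.25 * (fabs(err) - rttvar);   /* RTTVAR = 3/4 RTTVAR + 1/4 |err| */
    }
    return srtt + 4.0 * rttvar;           /* timeout = average + 4 x variance term */
}

int main(void)
{
    double samples[] = {100, 120, 90, 300, 110, 105};   /* made-up RTTs in ms */
    for (int i = 0; i < 6; i++)
        printf("RTT=%.0f ms -> RTO=%.1f ms\n", samples[i], rto_update(samples[i]));
    return 0;
}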

LAN case – small, regular RTT; Internet case – large, varied RTT
Detail: there are actually 4 TCP timers – see pages 570-571

TCP Congestion Control (1)

TCP uses AIMD (Additive Increase Multiplicative Decrease) with a loss signal to control congestion
• AIMD was previously covered on page 537 of the textbook; slide 38
• Review: AIMD – additively increase bandwidth allocations, but then multiplicatively decrease them when congestion is signaled
− Implemented as a congestion window (cwnd) for the number of segments that may be in the network
• However, TCP actually uses several mechanisms that work together for congestion control:

Name | Mechanism | Purpose
ACK clock | Congestion window (cwnd) | Smooth out packet bursts
Slow start | Double cwnd each RTT | Rapidly increase send rate to reach roughly the right level
Additive increase | Increase cwnd by 1 packet each RTT | Slowly increase send rate to probe at about the right level
Fast retransmit / recovery | Resend lost packet after 3 duplicate ACKs; send new packet for each new ACK | Recover from a lost packet without stopping the ACK clock

TCP Congestion Control (2)

Congestion window (cwnd) controls the sending rate
− cwnd = number of bytes the sender may have in the network at any given time
» Question: how can the sender know its cwnd?
− Segment sending rate = cwnd / RTT
− The window can stop the sender quickly, due either to congestion or to flow control
» The effective window is the smaller of what the sender and the receiver each think is the situation
− TCP congestion control leverages the 4 techniques listed on the previous slide:
1. The ACK clock (regular receipt of ACKs) paces traffic and smooths out sender bursts

ACKs pace new segments into the network and smooth out bursts

TCP Congestion Control (3)

2. Slow start grows the congestion window exponentially
• Doubles every RTT while keeping the ACK clock going
• Mechanism to rapidly increase the send rate to quickly find the approximate maximum level

Increment cwnd for each new ACK

CWND = congestion window

TCP Congestion Control (4)

To keep slow start under control, the sender keeps a threshold for the connection called the slow start threshold. Once the threshold is crossed, TCP switches from slow start to additive increase.

3. Additive increase grows cwnd slowly (e.g., TCP Tahoe)
• Adds 1 packet to cwnd every RTT
• Conforms to the ACK clock rate

TCP Congestion Control (5)

Slow start followed by additive increase (TCP Tahoe) at the slow start threshold
− The threshold is half of the cwnd (congestion window) at the time of the previous data loss
− The connection spends much of its time with a reasonable cwnd when slow start and additive increase are combined.
• Issue: we have to start over after a packet loss, because by the time the loss is detected (e.g., 3 duplicate ACK numbers in the ACKs received) the ACK clock has stopped.

Loss causes timeout; ACK clock has stopped so slow-start again
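A toy trace (in segments, with invented numbers) of the Tahoe-style behaviour on this and the previous slides: slow start doubles cwnd up to the slow-start threshold, additive increase then adds one segment per RTT, and a simulated timeout halves the threshold and restarts slow start.

/* Tahoe-style cwnd trace: slow start, additive increase, restart on loss. */
#include <stdio.h>

int main(void)
{
    int cwnd = 1, ssthresh = 32;

    for (int rtt = 0; rtt < 30; rtt++) {
        printf("RTT %2d: cwnd=%2d ssthresh=%2d\n", rtt, cwnd, ssthresh);

        int loss = (rtt == 14);            /* pretend a timeout happens at RTT 14    */
        if (loss) {
            ssthresh = cwnd / 2;           /* remember half the window at the loss   */
            cwnd = 1;                      /* ACK clock stopped: slow start again    */
        } else if (cwnd < ssthresh) {
            cwnd *= 2;                     /* slow start: double each RTT            */
            if (cwnd > ssthresh) cwnd = ssthresh;   /* don't overshoot the threshold */
        } else {
            cwnd += 1;                     /* additive increase: +1 segment per RTT  */
        }
    }
    return 0;
}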

TCP Congestion Control (6)

However, with fast recovery, we get the classic sawtooth (TCP Reno) − Retransmit lost packet after 3 duplicate ACKs − New packet for each dup. ACK until loss is repaired

The ACK clock doesn’t stop, so no need to slow-start

TCP Congestion Control (7)

Much of TCP’s complexity comes from inferring, from a stream of duplicate ACKs, which packets have arrived and which have been lost. SACK fixes this.
SACK (Selective ACKnowledgements) extends ACKs with a vector describing received segments, and hence losses
• Selectively ACKs up to three ranges of bytes for arrived segments
• Allows for more accurate retransmissions / recovery

No way for us to efficiently know that 2 and 5 were lost with only ACKs

End

Chapter 6

CN5E by Tanenbaum & Wetherall, © Pearson Education-Prentice Hall and D. Wetherall, 2011