<<

The : Tutorial and Survey SAMI IREN and PAUL D. AMER University of Delaware AND PHILLIP T. CONRAD Temple University

Transport layer protocols provide for end-to-end communication between two or more hosts. This paper presents a tutorial on transport layer concepts and terminology, and a survey of transport layer services and protocols. The transport layer protocol TCP is used as a reference point, and compared and contrasted with nineteen other protocols designed over the past two decades. The service and protocol features of twelve of the most important protocols are summarized in both text and tables.

Categories and Subject Descriptors: C.2.0 [Computer-Communication Networks]: General—Data communications; Open System Interconnection Reference Model (OSI); C.2.1 [Computer-Communication Networks]: and Design—Network communications; Packet-switching networks; Store and forward networks; C.2.2 [Computer-Communication Networks]: Network Protocols; Protocol architecture (OSI model); C.2.5 [Computer- Communication Networks]: Local and Wide-Area Networks General Terms: Networks Additional Key Words and Phrases: Congestion control, , transport protocol, transport service, TCP/IP

1. INTRODUCTION work of routers, bridges, and communi- cation links that moves information be- In the OSI 7-layer Reference Model, the tween hosts. A good transport layer transport layer is the lowest layer that service (or simply, transport service) al- operates on an end-to-end basis be- lows applications to use a standard set tween two or more communicating of primitives and run on a variety of hosts. This layer lies at the boundary networks without worrying about differ- between these hosts and an - ent network interfaces and reliabilities.

This work was supported, in part, by the National Science Foundation (NCR 9314056), the U.S. Army Research Office (DAAL04-94-G-0093), and the Adv Telecomm/Info Dist’n Research Program (ATIRP) Consortium sponsored by ARL under Fed Lab Program, Cooperative Agreement DAAL01-96-2-0002. Authors’ addresses: S. Iren and P. D. Amer, Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716; P. T. Conrad, Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 2000 ACM 0360-0300/99/1200–0360 $5.00

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 361

CONTENTS (TCP) — and then contrast TCP with many alternative designs. 1. Introduction 2. Transport Layer Concepts and Terminology Section 2 introduces the basic con- 2.1 Introduction to TCP cepts and terminology of the transport 2.2 General Role of the Transport Layer layer through a simple example illus- 2.3 Terminology: SDUs, PDUs, and the like trating a TCP connection. Section 3 sur- 2.4 Example TCP Connection, Step-by-Step 2.5 What this example shows. . . and does not show veys the range of different services that 3. Transport Service can be provided by a transport layer. 3.1 CO-message vs. CO-byte vs. CL Similarly, Section 4 surveys the range 3.2 Reliability of protocol designs that provide these 3.3 Blocking vs. Non-Blocking 3.4 vs. Unicast services. (The important distinction be- 3.5 Priority vs. No-priority tween service and protocol is a major 3.6 Security vs. No-security theme throughout this paper.) Section 5 3.7 Status-reporting vs. No-status-reporting briefly surveys nine widely imple- 3.8 Quality-of-service vs. No-quality-of-service 4. Transport Protocol Features mented transport protocols other than 4.1 Connection-oriented vs. Connectionless TCP (UDP, TP0, TP4, SNA-APPN, 4.2 Transaction-oriented DECnet-NSP, ATM, XTP, T/TCP and 4.3 CO Protocol Features RTP) and two others that, although not 4.4 Error Control: Sequence Numbers, Acks, and Retransmissions widely implemented, have been particu- 4.5 Flow/Congestion Control larly influential (VMTP and NETBLT). 4.6 /Demultiplexing This section also includes briefer de- 4.7 Splitting/Recombining scriptions of eight experimental proto- 4.8 Concatenation/Separation cols that appear in the research litera- 4.9 Blocking/Unblocking 4.10 Segmentation/Reassembly ture (Delta-t, MSP, SNR, DTP, k-XP, 5. Transport Protocol Examples TRUMP, POC, and TPϩϩ). Section 6 5.1 UDP concludes the paper with an overview of 5.2 TP4 past, current, and future trends that 5.3 TP0 5.4 NETBLT have influenced transport layer design 5.5 VMTP including the impact of wireless net- 5.6 T/TCP works. This section also presents a few 5.7 RTP of the debates concerning transport pro- 5.8 APPN (SNA) 5.9 NSP (DECnet) tocol design. As an appendix, tables are 5.10 XTP provided summarizing TCP and eleven 5.11 SSCOP/AAL5 (ATM) of the transport protocols discussed in 5.12 Miscellaneous Transport Protocols Section 5. Similar tables for the experi- 6. Future Directions and Conclusion 6.1 Impacts of Trends and New Technologies mental protocols are omitted for reasons 6.2 Wireless Networks of space, but are available on the au- 6.3 Debates thors’ Web site: www.eecis.udel.edu/ 6.4 Final Observations ˜amer/PEL/survey/. APPENDIX This survey concentrates on unicast service and protocols — that is, commu- nication between exactly two hosts (or Essentially, the transport layer isolates two host processes). Multicast protocols applications from the technology, de- [Armstrong et al. 1992; Bormann et al. sign, and idiosyncracies of the network. 1994; Braudes and Zabele 1993; Deer- Dozens of transport protocols have ing 1989; Floyd et al. 1995; McCanne et been developed or proposed over the last al. 1996; Smith and Koifman 1996] pro- two decades. To put this research in vide communication among n Ն 2 perspective, we focus first on the fea- hosts. Multicast represents an impor- tures of probably the most well-known tant research area currently undergoing transport protocol — namely the Inter- significant change and development, net’s Transmission Control Protocol and is worthy of a separate survey.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 362 • S. Iren et al.

A previous study surveying eight years of thought about transport proto- transport protocols can be found in Do- col design. eringer et al. [1990]. A final reason that TCP provides a good starting point for study, is that the 2. TRANSPORT LAYER CONCEPTS AND history of research and development on TERMINOLOGY TCP can be traced in publicly available documents. Ongoing research and de- From an application programmer’s per- velopment of transport protocols, partic- spective, the transport layer provides ularly TCP, is the focus of two working interprocess communication between groups of the Internet Society. The two processes that most often are run- end2end working group of the Internet ning on different hosts. This section in- Research Task Force (IRTF, www.irtf. troduces some basic transport layer con- org) discusses ongoing long-term re- cepts and terminology through an search on transport protocols in general example: a simple document retrieval (including TCP), while the tcp-impl over the World Wide Web (herein Web) group of the Internet Engineering Task utilizing the TCP transport protocol. Force (IETF, www.ietf.org) focuses on short-term TCP implementation issues. 2.1 Introduction to TCP Both groups maintain active mailing lists where ideas are discussed and de- Although we provide a broad survey of bated openly. The work of these groups the transport layer, the service and pro- can be found in journal articles, confer- tocol features of TCP are used through- ence proceedings, and documents out this paper as a point of reference. known as Internet Drafts and Requests Over the last two decades the Inter- for Comments (RFCs). RFCs contain not net protocol suite (also called the only all the Internet Standards, but also TCP/IP protocol suite) has come to be other information of historical and tech- the most ubiquitous form of computer nical interest. It is much more difficult networking. Hence, the most widely for researchers to obtain similar infor- used transport protocols today are TCP mation concerning proprietary proto- and its companion transport protocol, cols. the User Protocol (UDP). A few other protocols are widely used, 2.2 General Role of the Transport Layer mainly because of their connection to the proprietary protocol suites of partic- To illustrate the role that the transport ular vendors. Examples include the layer plays in a familiar application, the transport protocols from IBM’s SNA, remainder of Section 2 examines the and Digital’s DECnet. However, the role of TCP in a simple interaction over success of the Internet has led nearly all the Web. vendors in the direction of TCP/IP as The Web is an example of a client/ the future of networking. server application. A human interacts The Internet’s marked success would with a Web browser (client) running on not alone be sufficient justification for a “local” machine. The Web browser organizing a survey around a single pro- communicates with a server on some tocol. Also important is that TCP pro- “remote” machine. The Web uses an ap- vides examples of many significant is- plication layer protocol called the Hy- sues that arise in transport protocol pertext Transfer Protocol (HTTP) [Bern- design. The design choices made in TCP ers-Lee et al. 1996]. HTTP is a simple have been the subject of extensive re- request/response protocol. Suppose, for view, experimentation, and large-scale example, that you have a personal com- experience, involving the best research- puter with Internet access, and you ers and practitioners in the field. TCP wish to retrieve the page “http://www. represents the culmination of many eecis.udel.edu/research.html”

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 363 from the University of Delaware Web function for closing a connection. Note site. In the simplest case,1the client that while TCP is connection-oriented, sends a request containing the filename not all transport services establish a of the desired Web page (“GET /re- connection before data is sent. Connec- search.html”) to a server (“www. tion-oriented and connectionless ser- eecis.udel.edu”), and the server vices and protocols are discussed in Sec- sends back a response consisting of the tions 3.1, 3.2.5 and 4.1. contents of that file. This communication takes place over 2.3 Terminology: SDUs, PDUs, and the like a complex internetwork of computers that is constantly changing in terms of One difficulty in summarizing any topic both technology and topology. A connec- is the wide range of terms used for tion between two particular hosts may similar concepts. Throughout this pa- involve such diverse technologies as per, we use a simplified communication , Token Ring, X.25, ATM, PPP, model (Figure 1) that employs some OSI SONET, just to name a few. However, a terminology. At the top layer, a user programmer writing a Web client or sender (e.g., a Web client) has some server does not want to be concerned messages to communicate to the user with the details of how communication receiver (e.g., a Web server). These so- takes place between client and server. called application entities use the ser- The programmer simply wants to send vice of the transport layer. Communica- and receive messages in a way that does tion between peer entities consists of an not change as the underlying network exchange of Protocol Data Units changes. This is the function of the (PDUs). Application peers communicate transport layer: to provide an abstrac- using Application PDUs (APDUs), while tion of interprocess communication that transport peers communicate using is independent of the underlying net- Transport PDUs (TPDUs), etc. In our work. Web example, the first APDU is the HTTP uses TCP as the transport request “GET /research.html” sent layer. The programmer writing code for from the client (application entity) to an HTTP client or server would access the server (its peer application entity). TCP’s service through function calls The Web server will respond with an that comprise that transport layer’s Ap- APDU containing the entire text of the plication Program Interface (API). At a file “research.html”. minimum, a transport layer API pro- Many transport and application pro- vides functions to send and receive mes- tocols are bidirectional; that is, both sages; for example, the Berkeley Sockets sides can send and receive data simulta- API provides functions called write() neously. However, it is frequently use- and read() (for more details, see ful to focus on one direction while re- Stevens [1998]). maining aware that the other direction Because TCP is connection-oriented, is also operational. As Figure 1 shows, the Berkeley Sockets API also provides each application entity can assume both a connect() function for setting up a the role of sender and receiver; for the connection between the local and remote APDU “GET /research.html”, the cli- processes. It also provides a close() ent is the user sender and the server is the user receiver (as shown by more 1To simplify the discussion, we will assume HTTP prominent labels). When the APDU con- version 0.9 and a document containing only hyper- taining the contents of the file “re- text: no inline images, applets, etc. This avoids search.html” is sent, user sender and discussion of HTTP 1.0 headers, persistent con- user receiver reverse roles (as indicated nections as in HTTP 1.1, (which complicate the issue of how and when the connection is closed,) by the dotted line boxes, and the lighter and the necessity for multiple connections where italicized labels). inline images are involved. The term transport entity refers to the

ACM Computing Surveys, Vol. 31, No. 4, December 1999 364 • S. Iren et al.

Host A Host B APDUs User Sender User Receiver Application (Receiver) (Sender) Entities Application (e.g. web client) Application (e.g. web server) TSAP Submit Deliver User/Transport Interface TPDUs Transport Transport Sender Transport Receiver Entities (Receiver) (Sender) NSAP Transmit/Send Arrive/Receive Transport/Network Interface

Network

Figure 1. Transport service. hardware and/or software within a on its way from the Web client to the given host that implements a particular Web server. When the user sender sub- transport service and protocol. Again, mits the request APDU to the transport even when the protocol is bidirectional, sender, that APDU becomes a TSDU. we focus on one direction for purposes of The transport sender adds its own clarity. In this model, the user sender header information to the TSDU, to con- submits a chunk of user data (i.e., a struct a TPDU that it can send to the Transport Service Data Unit (TSDU), or transport receiver. TPDUs exchanged informally, a message)tothetransport by the transport entities are encapsu- sender. The transport sender transmits lated (i.e., contained) in NPDUs which or sends this data to the transport re- are exchanged between the network en- ceiver over a network which may provide tities, as illustrated in Figure 2. The different levels of reliability. The trans- routes NPDUs between port receiver receives the data that ar- the local and remote network entities rives from the network and delivers it to over intermediate links. When an the user receiver. Note that even when NPDU arrives, the network layer entity one transport entity assumes the role of processes the NPDU header and passes sender and the other assumes the role the payload of the NPDU to a transport of receiver, we use solid lines to show layer entity. The transport entity either TPDUs flowing in both directions. This passes the payload of the TPDU to the illustrates that TPDUs may flow in both transport user if it is user data, or pro- directions even when user data flows cesses the payload itself if it is a control only from sender to receiver. TPDUs TPDU. from receiver to sender are examples of In the previous paragraph we de- control TPDUs, which are exchanged scribe a single APDU becoming a single between transport entities for connec- TSDU, being encapsulated in a single tion management. When the flow of TPDU, which in turn becomes a single user data is bidirectional, control and NSDU encapsulated in a single NPDU. data information can be piggybacked,as This is the simplest case, and one that discussed in Section 4. Control TPDUs is likely to occur for a small APDU such may flow in both directions between as the HTTP request in our example. sender and receiver, even in the absence However, there are many other possibil- of user data. ities for the relationships between AP- Figure 2 shows the terminology we DUs, TSDUs, TPDUs, NSDUs, and NP- use to describe what happens to the DUs, as described in Step (5) of Section request APDU “GET /research.html” 2, and also in Sections 3 and 4. as it passes through the various layers Figure 2 also shows some of the ter-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 365

Application (or Session) APDU (e.g., "GET /research.html") Layer

TSDU Transport TPDU Layer header TPDU (e.g., TCP segment)

NSDU Network NPDU header Layer NPDU (e.g., IP Datagram)

Figure 2. Relationship between Service Data Units (SDUs) and Protocol Data Units (PDUs).

minology commonly used in the Internet ber that will be used for the con- protocol suite to identify PDUs at vari- nection (80, in this case). Port ous layers. TCP’s PDUs are called seg- numbers differentiate between ments, while IP’s NPDUs are called multiple connection points on the . However, the Internet com- same machine; for example, be- munity’s terminology can be confusing. tween servers for Web, File Trans- For example, the term packet is often fer Protocol (FTP) and (re- used informally to refer to both IP Data- mote terminal) requests. grams (NPDUs) and TCP segments (TP- —“www.eecis.udel.edu” indicates DUs). The term datagram can refer ei- the host name to which a TCP ther to IP’s NPDUs, or to the TPDUs of connection is established. This is the . To avoid converted to the IP address such confusion, we make use of some 128.175.2.17 by the Domain OSI terms throughout the paper. Al- Name System (which is outside though some may criticize OSI termi- the scope of this discussion). The nology as being cumbersome, it often combination of IP address and TCP has the advantage of being more pre- port number (128.175.2.17, 80) cise. represents what OSI calls a TSAP address. A TSAP (Transport Ser- 2.4 Example TCP Connection, Step-by- vice Access Point) address identi- Step fies one endpoint of a communica- Now let us consider the Web document tion channel between a process on retrieval example, one step at a time. a local machine, and a process on This discussion omits many details; the a remote machine. /research.html purpose here is to provide a general —“ ” is the file you understanding of the transport layer in are requesting; the Web client the context of a well-known application. uses this to form the HTTP request (APDU) “GET /research.html” (1) You select “http://www.eecis. that must be sent to the Web udel.edu/research.html” with server via TCP. your Web client. —http indicates the application (2) The Web client starts by making a layer protocol to be used; more connection request to the transport importantly from a transport entity at (128.175.2.17, 80) by layer perspective, it also implic- calling the connect() function. itly indicates the TCP port num- This causes the local-client TCP en-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 366 • S. Iren et al.

tity (or simply, the “local TCP”) to then sends these TSDUs to the local initiate a 3-way-handshake with the TCP in multiple TPDUs, utilizing remote-server TCP entity (or sim- the service of the underlying net- ply, the “remote TCP”). TPDUs are work layer. exchanged between the TCP entities One interesting aspect of TCP is to ensure reliable connection estab- that the sizes of the TSDUs submit- lishment, and to establish initial se- ted by the transport entities may quence numbers to be used by the bear little or no relationship to the transport layer (see Figure 3, and sizes of the TPDUs actually ex- Section 4). If the 3-way-handshake changed by TCP. TCP treats the fails, TCP notifies the application by flow of data from sender to receiver returning an error code as the result as a byte-stream and segments it in of the connect() function; other- whatever manner seems to make wise a success code is returned. In best use of the underlying network this way, TCP provides what OSI service, without regard to TSDU calls confirmation of the connect re- boundaries. It delivers data in the quest. same fashion; thus for TCP, the boundaries between APDUs, sub- (3) Once the connect request is con- mitted TSDUs, TPDUs, and deliv- firmed, the Web client submits a ered TSDUs may all be different. request to send data (in this case Other transport services/protocols the APDU “GET /research.html”). have a message orientation, mean- The local TCP then sends this data ing that they preserve the bound- — most likely, in a single TPDU. aries of TSDUs (see Section 3). This TPDU (i.e., TCP segment) con- tains the service user’s data (the (6) Because the network layer (IP in TSDU), plus a transport layer this case) can lose or reorder NS- header, containing (among other DUs, TCP must detect and recover things) the negotiated initial se- from network errors. As the remote quence number for the data. The TCP sends the TPDUs containing purpose of this sequence number is the successive portions of the file discussed in Step (2). “research.html”, it includes a se- (4) When the remote TCP receives the quence number in each TPDU corre- TPDU, the data “GET /re- sponding to the sequence number of search.html” is buffered. It is de- the first byte in that TPDU relative livered when the Web server does a to the entire flow of data from re- read(). In OSI terminology, this mote TCP to local TCP. The remote delivery is known as a data indica- TCP also places a copy of the data in tion. The remote TCP also sends each TPDU sent into a buffer, and back an acknowledgment (ACK) con- sets a timer. If this timer expires trol TPDU to the local TCP. Ac- before the remote TCP receives a knowledgments inform the trans- matching ACK TPDU from the local port sender about TPDU arrivals TCP, the data in the TPDU will be (see Section 4). retransmitted in a new TPDU. In TCP, the data are tracked by indi- (5) The Web server then responds with vidual byte-stream sequence num- the contents of “research.html”. bers, and TPDUs retransmitted may This file may be too large to be or may not correspond exactly to the efficiently submitted to TCP in one original TPDUs. The remote TCP write() call, i.e., one TSDU. If so, also places a checksum in the TPDU the Web server divides this APDU header to detect bit errors intro- into multiple write() calls, i.e., duced by noise or software errors in multiple TSDUs. The remote TCP the network, and physical

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 367

layers. Such error control is a major change of a disconnect TPDU and function of the transport layer as ACK. This procedure and other discussed in Section 4. methods of connection termination are discussed in Section 4.3.4. (7) As TPDUs are received by the local TCP, any TPDUs with checksum er- 2.5 What this example shows. . . and does rors are discarded. The sequence not show numbers of the incoming TPDUs are checked to ensure that no pieces of From this example, we see that TCP the byte-stream are missing or have provides a connection-oriented (CO) arrived out-of-order. Any out-of-or- byte-stream, no-loss, no-duplicates, or- der pieces will be buffered and re- dered transport service, and that its ordered. As TPDUs arrive, the local protocol is designed to operate on top of TCP responds to the remote TCP a lossy , such as IP with ACK TPDUs. If ACKs are lost, datagram service [Postel 1981]. the remote TCP may send a TPDU This example also shows a few as- which duplicates data sent in an pects of communication between the earlier TPDU. The local TCP also user layer and the transport layer, and discards any duplicate data that ar- between peer entities. It illustrates one rives. By detecting and recovering example of connection establishment, from loss, reordering, duplication reliable data exchange, and connection and noise, the local and remote TCP termination. However, many aspects of entities cooperate to ensure reliable this example were not discussed, such delivery of data. As pieces of the as flow control and congestion avoid- byte-stream arrive correctly, these ance, piggybacking of ACKs, and round- will be buffered until the Web client trip-time estimation, to name just a few. requests them by doing read() Also, consider that in the case of a calls. Each read() results in deliv- Web browser using TCP, the transport ery of a TSDU to the Web client. service is reliable, which means (among other things) that the transport service (8) In general, a TCP connection is bidi- may not lose any TSDUs. This guaran- rectional, and either side may ini- tee comes at the expense of potentially tiate the closing of the connection. increased delay, since reliable delivery However, in first-generation Web may require more TPDUs to be ex- systems, the server initiates the changed. Other transport services (for close by calling the close() func- example, those used for audio or video tion after it sends the last piece of transmission) do not provide reliability, the file requested. In OSI terminol- since for some applications lower delay ogy, this is called a disconnect re- is more important than receiving every quest. TCP handles disconnect re- TSDU. quests with a 4-way-handshake Many variations exist both in the type procedure (shown in Figure 4) to of service provided by the transport ensure graceful termination without layer, and in the design of transport loss of data. The remote TCP re- protocols. These variations are what sponds to the server’s disconnect re- make the transport layer so interesting! quest by sending a disconnect TPDU Sections 3 and 4 provide an overview of and waiting for an ACK TPDU. The the most important of these variations. local TCP signals the server’s dis- connect request to the client by re- 3. TRANSPORT SERVICE turning “end-of-file” as the result of a client read() request. The client This section classifies the typical service then responds with its own close() provided by the transport layer (see Ta- request; this causes another ex- ble I). A transport service abstracts a

ACM Computing Surveys, Vol. 31, No. 4, December 1999 368 • S. Iren et al.

Table I. Transport Service Features Once a connection is established, the CO-message vs. CO-byte vs. CL user sender provides the transport No-loss vs. Uncontrolled-loss vs. Controlled-loss sender with the data to be transmitted No-duplicates vs. Maybe-duplicates (T-Data). At the end of the data trans- Ordered vs. Unordered vs. Partially-ordered Data-integrity vs. No-data-integrity vs. Partial- fer, the user sender explicitly asks the data-integrity transport sender to terminate the con- Blocking vs. Non-blocking nection (T-Disconnect). Multicast vs. Unicast CO service itself has two variations: Priority vs. No-priority message-oriented and byte-stream.In Security vs. No-security Status-reporting vs. No-status-reporting the former, the user sender’s messages Quality-of-service vs. No-quality-of-service (or what OSI calls service data units (SDUs)) have a specified maximum size, and message boundaries are preserved. When two 1K messages are submitted set of functions that is provided to a by a user sender, they are delivered to higher layer. A protocol, on the other the user receiver as the same two dis- hand, refers to the details of how a tinct 1K messages, never for example as transport sender and a transport re- one 2K message or four .5K messages. ceiver cooperate to provide that service. TP4 is an example that provides mes- In defining a service, we treat the trans- sage-oriented service. In byte-stream port layer as a black box. In Section 4, service as provided by TCP, the flow of we look at internal details of that black data from user sender to user receiver is box. viewed as an unstructured sequence of One of the valuable contributions of bytes that flow in a FIFO manner. the OSI Reference Model is its emphasis There is no concept of message (or mes- in distinguishing between service and sage boundary or maximum message protocol. This distinction is not well dif- size) for the user sender. Data submit- ferentiated in the TCP/IP protocol suite ted by a user sender is appended to the where RFCs intermix service and proto- end of the byte-stream, and the user col concepts, frequently in a confusing receiver reads from the head of the manner. As will be seen in Section 4, byte-stream. there are often several different protocol An advantage of byte-stream service mechanisms that can provide a given is that it gives the transport sender service. flexibility to divide TSDUs in ways that make better use of the underlying net- 3.1 CO-message vs. CO-byte vs. CL work service. However, byte-stream ser- Transport services can be divided into vice has been criticized because it does two types: connection-oriented and con- not deliver data to an application in nectionless. A connection-oriented (CO) meaningful units. A message-oriented service provides for the establishment, service better preserves the semantics maintenance, and termination of a logi- of each APDU down through the lower cal connection between transport users. layers for so-called Application Level The transport user generally performs Framing [Clark and Tennenhouse 1990] three distinct phases of operation: con- (see Section 6). One research team notes nection establishment, data transfer, that adaptive applications are much and connection termination. When a easier to develop if the unit of process- user sender wants to transmit some ing (i.e., APDU) is strongly tied to the data to a remote user receiver by using unit of control (i.e., TPDU) [Dabbous a connection-oriented service, the user and Diot 1995]. sender first explicitly asks the transport A connectionless (CL) service provides sender to open a connection to the re- only one phase of operation: data trans- mote transport receiver (T-Connect). fer. There are no T-Connect and T-Dis-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 369 connect primitives exchanged (either ex- ple, UDP provides an uncontrolled-loss plicitly or implicitly) between a user service. Any data submitted to UDP by sender and the transport sender. When a user sender may fail to be delivered to the user sender has some data, it simply the user receiver without the user submits the data to the transport sender being notified. Analogously, the sender. As with CO-message service, in default service of the United States a CL service the user sender submits Postal Service is an uncontrolled-loss messages, and message boundaries are service. A user sender may not be noti- maintained. Messages submitted to a fied if a letter is not delivered to the CL service as in UDP often are referred user receiver. to as datagrams. A controlled-loss service lies in be- tween no-loss and uncontrolled-loss. 3.2 Reliability Loss may occur, but there is control over the degree of loss. Several varia- Unfortunately, the terms reliable and tions of a controlled-loss service are pos- unreliable mean different things to dif- sible. For example, a user sender might ferent people. To avoid ambiguity in our request different loss levels for different definition, unless otherwise stated, a messages. The k-XP protocol [Amer et service is reliable if and only if it is all al. 1997; Marasli et al. 1996] supports of the following (as defined below): no- three controlled-loss service classes: loss, no-duplicates, ordered, and data- integrity. If even one feature is lacking, ● reliable messages will be retransmit- the service is unreliable. ted (if needed) until successfully de- 3.2.1 No-loss vs. Uncontrolled-loss vs. livered to the user receiver; Controlled-loss. There are three levels ● of delivery guarantee. A no-loss (or at- partially reliable messages will be re- least-once delivery) service guarantees transmitted (if needed) at most k that for all data submitted to the trans- times and then dropped if unsuccess- port sender by the user sender, one of ful, where k is a user sender parame- two results will occur. Either (1) the ter; and 2 data is delivered to the user receiver, ● or (2) the user sender is notified that unreliable messages will be transmit- some data may not have been delivered. ted only once. It never occurs that a user sender be- In the TRUMP protocol [Golden 1997], a lieves some data was delivered to the user sender assigns a time limit to each user receiver when in fact it was not. message. The transport layer does its For example, TCP provides a no-loss best to deliver the message before the service using disconnection as a means time limit; if unsuccessful, the message of notifying a user sender that data may is discarded. not have been delivered.3 An uncontrolled-loss (or best-effort) service does not provide the above as- 3.2.2 No-duplicates vs. Maybe-dupli- surance for any of the data. For exam- cates.Ano-duplicates (or at-most-once delivery) service (e.g., TCP) guarantees that all data submitted to the transport 2Depending on the data-integrity service defined sender will be delivered to the user re- in Section 3, delivered data may or may not con- ceiver at most once. tain bit errors. A maybe-duplicates service (e.g., 3Strictly speaking, for the most common TCP API, UDP) does not provide the above guar- namely the Berkeley Sockets API, this is only true antee. Efforts by the protocol may or if the application specifies the SOLINGER socket option; by default, it is possible for a TCP may not be made to avoid delivering connection to lose data without notifying the appli- duplicates, but delivery of duplicates cation during connection teardown [Stevens 1998]. may occur nevertheless.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 370 • S. Iren et al.

3.2.3 Ordered vs. Unordered vs. Par- ceptable as a means of achieving higher tially-ordered.Anordered service (e.g., . A real-time multimedia ap- TCP) preserves the user sender’s sub- plication might request the transport mission order of data when delivering it receiver to tolerate certain levels of bit to the user receiver. It never occurs that errors when errored data could be use- a user sender submits two pieces of ful to the user receiver [Han and Mess- data, first A, then B, and A is delivered erschmitt 1996]. For example, a Web after B is delivered. browser retrieving an image might pre- An unordered service (e.g., UDP) does fer to progressively display a partially not provide the guarantee above. While damaged image while waiting for the the service may try to preserve the or- retransmission and delivery of the cor- der of data when delivering it to the rect data (assuming the decompression user receiver, no guarantee of preserva- algorithm can process damaged image tion-of-order is made. data.) A partially-ordered service guaran- 3.2.5 Remarks on Reliability and CO tees to deliver pieces of data in one of a vs. CL. Note that all four aspects of set of permitted orders as predefined by reliability — loss, duplicates, order, and a partial order relation agreed upon by data-integrity — are orthogonal; they the user sender and user receiver. Par- are independent functions. For example, tially-ordered service is useful for appli- ordered service does not imply no-loss cations such as multimedia communica- service; some portion of the data might tion or distributed databases where get lost while the data that is delivered delivery of some objects is constrained is guaranteed to be in order. by others having been delivered, yet A wide range of understanding (i.e., independent of certain other objects ar- confusion) exists in the relationship be- riving [Connolly et al. 1994]. POC is an tween a service being CO or CL and example protocol providing partially-or- whether or not it is reliable. These two dered service [Conrad et al. 1996]. services are orthogonal, yet so many 3.2.4 Data-integrity vs. No-data-in- people presume a CO service is reliable. tegrity vs. Partial-data-integrity.Ada- Why? In the past when the number of ta-integrity service ensures with high available services and protocols was probability that all data bits delivered smaller, and applications were fewer to a user receiver are identical to those and less diverse in their service needs, originally submitted. The actual proba- reasoning about connections and reli- bility achieved in practice depends on ability was simple: the strength of the error detection method. Good discussions of error detec- Whereas: TCP service is CO and TCP tion methods can be found in Lin and service is reliable, Costello [1982], and McNamara [1998]. TCP uses a 16-bit checksum for data- Whereas: TP4 service is CO and TP4 integrity; its goodness is evaluated in service is reliable, Partridge et al. [1995]. A no-data-integrity service does not Whereas: X.25 service is CO and X.25 provide any guarantees regarding bit service is reliable, errors. Any number of bit errors may Therefore: CO service ϵ reliable ser- occur in the delivered data. UDP may or vice. may not be configured to use a check- sum; in the latter case, UDP provides a Whereas: UDP service is CL and no-data-integrity service. UDP service is unreliable, A partial-data-integrity service allows a controlled amount of bit errors in de- Therefore: CL service ϵ unreliable livered data. This service may be ac- service.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 371

We emphasize that these equiva- receiver(s). Applications such as real- lences do not hold! It is quite reason- time news and information broadcast- able to design a transport layer provid- ing, teleconferencing, and remote col- ing CO service that is not reliable (e.g., laboration can take advantage of a provides controlled-loss rather than no- multicast service to reach multiple des- loss service, or unordered rather than tinations. Depending on the reliability ordered service), say to support best- of the service, delivery to each user re- effort compressed-image transmission ceiver may be no-loss vs. uncontrolled- [Amer et al. 1999; Iren and Amer 2000; loss vs. controlled-loss, no-duplicates vs. Iren 1999]. Similarly, a transport layer maybe-duplicates, etc. The use of a mul- can provide a CL service that is reliable ticast service influences many of the if an underlying network layer is reli- protocol features to be discussed in Sec- able. tion 4. A unicast service limits the data sub- 3.3 Blocking vs. Non-Blocking mitted by a user sender to be delivered to exactly one user receiver. A blocking service ensures that the (As stated earlier, although the tables transport layer is not overwhelmed with in the appendix indicate which protocols incoming data. In this case, a user surveyed offer multicast service, this sender submits data to the transport paper does not survey or discuss multi- sender and then waits for the transport cast protocols.) sender to signal that the user sender can resume processing. A blocking ser- 3.5 Priority vs. No-priority vice provides flow control between user sender and transport sender. A priority service enables a user sender A non-blocking service allows the user to indicate the relative importance of sender to submit data and continue pro- various messages. A user of priority ser- cessing without awaiting the transport vice may expect higher priority data to sender’s ok. A non-blocking service does be delivered sooner, if possible, than not take into account the transport lay- lower priority data, even at the expense er’s buffering capabilities, or the rate at of slowing down data submitted earlier. which the user receiver consumes data. When combined with uncontrolled-loss The user sender submits data at any or controlled-loss service, a priority ser- rate it chooses, possibly resulting in ei- vice may drop lower priority data when ther data loss or notification to the user necessary, thereby allowing the delivery sender that data cannot be submitted of higher priority data with smaller de- right now. lay and/or higher probability. Priority Of the many transport protocols sur- also may be passed down to a network veyed, all could potentially provide service if it has a notion of priority, such blocking or non-blocking service de- as IP’s type of service field. pending on their API; one can therefore In OSI, priority service is provided in argue that this service feature is an the form of expedited data and the pri- implementation decision, not an inher- ority QoS parameter (see Section 3). ent feature of a given transport layer. Expedited data is an entirely separate (Note that the term “blocking” as used data flow that is not subject to normal here has no relationship to its use in data flow control. Section 4.9.) A no-priority service does not allow a user sender to differentiate the impor- 3.4 Multicast vs. Unicast tance of classes of data. Despite common misconceptions to A multicast service enables a user the contrary, TCP’s urgent data service sender to submit data, a copy of which is neither a priority service nor an expe- will be delivered to one or more user dited data service. The Berkeley Sock-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 372 • S. Iren et al. ets API provides access to this service although some discussion is underway via what it calls out-of-band data. How- to include security features in a future ever, this is misleading, since data sent generation. in urgent mode is not expedited, sent out-of-band, or sent with higher prior- 3.7 Status-reporting vs. No-status- ity. Rather, TCP urgent mode is a ser- reporting vice by which the user sender (i.e., an application) marks some portion of the A status-reporting service allows a user byte-stream as needing special treat- sender to obtain specific information ment by the user receiver. Thus, signal- about the transport entity or its connec- ing the presence of urgent data and tions. This information may allow the marking its position in the data stream user sender to make better use of the are the only aspects that distinguish the available service. Some information delivery of urgent data from the deliv- that a status reporting service might ery of all other TCP data; for all other provide are: performance characteristics purposes, urgent data is treated identi- of a connection (e.g., throughput, mean cally to the rest of the TCP byte-stream. delay), addresses (network, transport), The user receiver must read every byte current timer values, or the state of the of data exactly in the order it was sub- protocol machine supporting a connec- mitted regardless of whether or not ur- tion. gent mode is used. A no-status-reporting service does not provide any information about the 3.6 Security vs. No-security transport entity and its connections. A distinction can be made between A security service (e.g., TP4) provides status-reporting and QoS monitoring as, one or more security functions such as: for example, in applications based on authentication, access control, confiden- the Real-time Transport Protocol (RTP). tiality, and integrity [ISO 1989]. Au- Although QoS monitoring provides some thentication is the verification of user of the information listed in the above sender’s and user receiver’s identities. definition (delay, throughput), it pro- Access control checks a user’s permis- vides no information on the protocol’s sion status before allowing the use of internal structure such as state infor- different resources. Confidentiality mation, timer values, etc. Status-report- guarantees that only the intended user ing service provides explicit interactions receiver(s) can decode and understand between a user and its transport entity. the user sender’s data; no other user In QoS monitoring, there are no such receiver can understand the data’s con- 4 explicit interactions. tent. Finally, integrity detects (and Of the protocols surveyed, only TCP sometimes recovers from) any modifica- provides anything resembling status-re- tion, insertion, deletion, or replay of porting service. When TCP’s Socket De- transport sender’s data. Secure routing bug Option is enabled, the kernel main- also has been noted as a security service tains for future analysis a trace record [Stallings 1997]. This service guaran- of what happens on a connection tees that while going from user sender [Stevens 1994]. Status-reporting is a to user receiver, data will only pass valuable service that designers should through secure links and secure inter- consider incorporating into their proto- mediate routers. col implementations. A no-security service does not provide any of the above security functions. TCP currently provides no-security-service, 3.8 Quality-of-service vs. No-quality-of- service

4See Section 3.2.4 for a slightly different usage of A transport layer that provides Quality integrity. of Service (QoS) allows a user sender to

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 373 specify the quality of transmission ser- a user sender and being delivered to vice desired. The protocol then presum- the user receiver. ably optimizes network resources to ● provide the requested QoS. Since the Residual Error Rate is the ratio of underlying network places limits on the incorrect, lost, and duplicate TSDUs service that can be provided by the to the total number of TSDUs that transport protocol, a user sender should were sent. recognize two facts: (1) depending on ● Transfer Failure Probability is the ra- the underlying network’s capabilities, a tio of total transfer failures to total transport protocol will have varying de- transfer samples observed during a grees of success in providing the re- performance measurement. quested QoS, and (2) there is a trade-off among QoS parameters such as reliabil- ● Connection Release Delay is the max- ity, delay, throughput, and cost of ser- imum acceptable time between a vice [Marasli et al. 1996]. transport user initiating release of a Transport QoS is a broad and compli- connection and the actual release at cated issue. No universally accepted list the peer transport service user. of parameters exists, and how the trans- ● Connection Release Failure Probabil- port layer should behave for a desired ity is the fraction of connection re- QoS under different circumstances is lease attempts that did not complete unclear. The ISO Transport Service within the agreed upon connection re- [ITU-T 1995c] defines a number of pos- lease delay interval. sible performance QoS parameters that are negotiated during connection estab- ● Protection is used by the user sender lishment. User senders can specify (sus- to specify interest in having the tained) target, acceptable, and mini- transport protocol provide protection mum values for various service against unauthorized third parties parameters. The transport protocol ex- reading or modifying the transmitted amines these parameters, and deter- data. mines whether it can provide the re- quired service; this depends in part on ● Priority allows a user sender to spec- the available network service. ISO spec- ify the relative importance of trans- ifies eleven QoS parameters: port connections. In case of conges- tion or the need to recover resources, ● Connection Establishment Delay is lower-priority connections are de- the maximum acceptable time be- graded or terminated before the high- tween a transport connection being er-priority ones. requested and its confirmation being ● Resilience is the probability that the received by the user sender. transport protocol itself will sponta- ● neously terminate a connection due to Connection Establishment Failure internal or network problems. Probability is the probability a con- nection cannot be established within A transport layer that provides No- the maximum connection establish- Quality-of-Service (No-QoS) does not al- ment delay time due to network or low a user sender to specify desired internal problems. quality of transmission service. The original definitions of TP0 and ● Throughput is the number of bytes of TP4 support QoS service, however, only user sender data transferred per unit recently have the lower layers of net- time over some time interval. works matured sufficiently for the com- munity to consider transport layer QoS ● Transit Delay is the elapsed time be- handling. The ATM environment sup- tween a message being submitted by ports only two QoS parameters: (sus-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 374 • S. Iren et al. tained) target, acceptable, and mini- ing an initial value for the sequence mum throughput and transit delay number can be hazardous, particularly [ITU-T 1995a]. Within the Internet when two transport entities close and IETF community, much work is ongoing immediately reopen a connection (see to create an integrated services (INT- Section 4.4.7). SERV) QoS control framework so that A transport protocol is CO (e.g., TCP) the TCP/IP suite can provide more than if state information is maintained be- its current no-QoS service. While not tween transport entities. Typically, a itself a transport protocol, RSVP [Bra- CO protocol has three phases: connec- den et al. 1997] sits on top of IP. RSVP tion establishment, data transfer, and allows a host to request specific quali- connection termination. In the connec- ties of service from the network. A re- tion establishment phase, state infor- quest may refer to a single path in the mation is initialized either explicitly by case of unicast or multiple paths in the exchanging control TPDUs, or implicitly case of multicast. General QoS parame- with the arrival of first data TPDU. ters supported by the INTSERV frame- During a connection’s lifetime (i.e., data work are defined in Shenker and Wro- transfer phase), state information is up- clawski [1997]. Requests result in dated to reflect the reception of data reserved resources by individual net- and control PDUs (e.g., ACKs). When work elements (i.e., subnets, IP routers) both transport entities agree to termi- along the path. Thus far use of RSVP nate the connection, they discard the has been defined to provide two kinds of state information. Note the distinction Internet QoS: controlled load [Wroclaw- between a CO service and a CO proto- ski 1997] and guaranteed [Shenker et col. A CO service entails a three phase al. 1997]. operation to establish a logical connec- tion between transport users. A CO pro- 4. TRANSPORT PROTOCOL FEATURES tocol entails a three phase operation to maintain state information between The previous section describes general transport entities. features of transport layer service. This If no state information is maintained section focuses on internal transport at the transport sender and receiver, layer mechanisms that provide this ser- the protocol is CL. A CL protocol is vice. It is important to understand that based on individually self-contained almost every service can be accom- units of communication often called plished by several different protocols. datagrams that are exchanged indepen- dently without reference to each other 4.1 Connection-oriented vs. or to any shared state information. Connectionless Each datagram contains all of the infor- mation that the receiving transport en- Not only can a service be classified CO tity needs to interpret it. or CL, so too can a protocol that pro- vides either service. The distinction de- 4.2 Transaction-oriented pends on the establishment and mainte- nance of state information, a record of Transaction-oriented protocols attempt characteristics and events related to the to optimize the case where a user communication between the transport sender wishes to communicate a single sender and receiver. Perhaps the most APDU (called a request) to a user re- important piece of state information is ceiver, who then normally responds the sequence number that identifies a with a single APDU (called a response). TPDU. Consistent viewpoints of the se- Such a request/response pair is called a quence numbers used by transport transaction. sender and transport receiver are re- A transaction-oriented transport pro- quired for reliable data transfer. Choos- tocol attempts to provide reliable ser-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 375 vice for transactions with as few TPDUs 4.3.2 Unidirectional vs. Bidirectional. as possible, ideally only one for the re- In a unidirectional connection (e.g., quest and one for the response. This is NETBLT, VMTP) data flows only in one done by trying to minimize overhead for direction at a time (half duplex). That connection establishment. Transactions is, while one transport entity is sending share the following characteristics [Bra- data, the other may not send data in the den 1992a]: an asymmetrical model (i.e., reverse direction. Each transport entity client and server), simplex data trans- can assume the role of transport sender fer, short duration, low delay, few data or transport receiver, but not both si- TPDUs, message orientation, and the multaneously. need for no-duplicates service. Exam- In a bidirectional connection (e.g., ples of transaction-oriented protocols in- TCP), both transport entities simulta- clude VMTP and T/TCP, a backwards neously assume the roles of transport compatible TCP extension. sender and transport receiver, thereby allowing full duplex data flow. 4.3 CO Protocol Features 4.3.3 Connection Establishment. The following subsections refer only to Three modes of connection establish- CO protocols. ment are shown in Figure 3. The cost 4.3.1 Signaling. Signaling is the ex- associated with connection establish- change of control (i.e., state) informa- ment can be amortized over a connec- tion between transport entities for man- tion’s lifetime. For short-lived connec- aging a connection.5 Signaling is used to tions which result from transaction- establish and terminate a connection oriented applications, this cost can be and to exchange communication system significant. Therefore, protocols for parameters. Signaling can be accom- short-lived connections have been de- plished in-band, where control informa- signed using a timer-based connection tion and user data are multiplexed on establishment and termination mecha- the same connection, or out-of-band, nism. For longer connections and when where separate connections are used. avoiding false connections is important, In-band signaling (e.g., TCP) is gener- 2-way or 3-way handshake mechanisms ally more suitable for applications that are needed. A handshake involves the require short-lived connections (e.g., explicit exchange of control TPDUs. transaction-oriented communications). In implicit connect, the connection is The extra overhead incurred for manag- open as soon as the first TPDU is sent ing two bands is avoided, and small or received. The transport sender starts amounts of user data can be transmit- transmitting data without any explicit ted during the signaling operation. connection verification by the transport Out-of-band signaling (e.g., TPϩϩ)is receiver. Further data TPDUs can be desirable for high-speed communication sent without receiving an ACK from the systems. It avoids the overhead of sepa- transport receiver. Implicit connections rating signaling information from user can provide reliable service only if the data by doing so below the transport protocol definition guarantees that de- layer. Out-of-band signaling can sup- layed TPDUs from any previously port transporting more than one kind of closed connection cannot cause a false data over a connection, thereby facili- open. T/TCP [Braden 1994] achieves tating operations involving third par- this via connection-unique identifiers ties, for example, a server validating and timer-based connection termina- security information, or billing. tion. In 2-way-handshake connect (e.g., 5While in some cases data may be exchanged, APPN, SSCOP/AAL5), a CR-TPDU signaling generally refers to exchanging control (Connection Request) and a CC-TPDU info. (Connection Confirm) are exchanged to

ACM Computing Surveys, Vol. 31, No. 4, December 1999 376 • S. Iren et al.

Host A Host B Host A Host B Host A Host B

TPDU CR-TPDU CR-TPDU

CC-TPDU CC-TPDU

ACK-CC-TPDU

Implicit 2-way-handshake 3-way-handshake

Point in time at which connection is said to be established for that host.

Figure 3. Three modes of connection establishment. establish the connection. The transport 4.3.4 Connection Termination. Four sender may transmit data in the CR- modes of connection termination are TPDU, but sending additional data is shown in Figure 4. prohibited until the CC-TPDU is re- With implicit disconnect (e.g., VMTP), ceived. During this handshake, QoS pa- when a transport entity does not hear rameters such as buffer size, burst size, from its peer for a certain time period, burst rate, etc., can be negotiated. the entity terminates the connection. When the underlying network service Typically, implicit disconnection is used provides a small degree of loss, a 2-way- with implicit connection establishment. handshake mechanism may be good In abortive disconnect, when a trans- enough to establish new connections port entity must close the connection without significant risk of false connec- abnormally due to an error condition, it tions. simply sends an abort-TPDU and termi- In 3-way-handshake connect (e.g., nates the connection. The entity does TCP), the transport sender sends a CR- not wait for a response. Thus TPDUs in TPDU to the transport receiver, which transit in either direction may be lost. responds with a CC-TPDU. The proce- In 2-way-handshake disconnect (e.g., dure is completed with an ACK-CC- NETBLT), a transport entity sends a TPDU (ACK for Connection Confirm). DR-TPDU (Disconnect Request) to its Normally, no user data is carried on peer and receives a DC-TPDU (Discon- these connection establishment TP- nect Confirm) in return. If a connection DUs.6 If the underlying network service is unidirectional (see Section 4), the provides an unacceptable degree of loss, transport sender usually initiates con- a 3-way-handshake is needed to prevent nection termination, and before discard- false connections that might result from ing its state information verifies the delayed TPDUs. Because TCP was de- reception of all TPDUs by the transport signed specifically for use over an unre- receiver. In a bidirectional connection, a liable network service, TCP uses a 2-way-handshake can only verify recep- 3-way handshake. tion of TPDUs in one direction. TPDUs in transit in the reverse direction may be lost. 6TCP allows user data to be sent on a CR-TPDU. Finally, in 4(3)-way-handshake dis- However, that data cannot be delivered to the connect (e.g., TCP), two 2-way-hand- user until the 3-way-handshake is completed. Furthermore, outside of T/TCP compliant imple- shakes are used, one for each direction mentations, TCP implementations typically do not of data flow. The transport entity that take advantage of this feature [Stevens 1996]. closes its sending flow sends a DR-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 377

Host A Host B Host A Host B Host A Host B Host A Host B

DR-TPDU Abort-TPDU DR-TPDU DC-TPDU

No data DC-TPDU DR-TPDU flow

DC-TPDU

Implicit Abortive 2-way-handshake 4(3)-way-handshake

Point in time at which connection is said to be terminated for that host. Figure 4. Four modes of connection termination.

TPDU to its peer entity (in TCP the FIN error control [Bae et al. 1991; flag is set in its last TPDU). This dis- Bhargava et al. 1988]. There are two connect request is then acknowledged phases in transport layer error control: by the transport receiver as soon as all error detection, and error reporting and preceding TPDUs have been received. recovery. The connection is terminated when data flows in both directions are closed. The 4.4.1 Error Detection. Error detec- number of control TPDUs exchanged tion identifies lost, misordered, dupli- can be reduced to three if the first DC- cated, and corrupted TPDUs. Sequence TPDU also functions as a DR-TPDU for numbers help uncover the first three the reverse direction. problems. Corrupted data is discovered Connections may be terminated ei- by means of (1) length fields that allow ther gracefully or ungracefully.Inan a transport receiver to detect missing or ungraceful termination as in TP0 or extra data, and (2) redundant informa- TP4,7some data in transit may be lost. tion in the form of error detecting codes In a graceful close, no data in transit is (EDC). Cyclic redundancy checks and lost since a connection is closed only other forms of checksums are the most after all data have arrived at their des- common examples of EDCs. An EDC tination. For bidirectional connections, can verify the header/trailer, the data, only a 4(3)-way-handshake can guaran- or both. TCP and AAL5 contain a single tee graceful close. checksum on both. XTP performs one checksum on the header and allows for 4.4 Error Control: Sequence Numbers, an optional second checksum on the Acks, and Retransmissions data. Separate EDCs for header and data Error control is a combination of tech- are recommended for multimedia appli- niques to guard against loss or damage cations [La Porta and Schwartz 1992]. of user data and control information. CL In voice applications, where corrupted network protocols such as IP perform no data is tolerable but retransmissions error control on user data. Even CO are less advantageous due to delay con- networks are being designed with light- straints, only a header EDC needs to be weight virtual circuits that provide no used. In image transfer using both error control [McAuley 1990; Parulkar EDCs, corrupted data may be used tem- and Turner 1989]. Recent studies have porarily until a corrected version is ob- shown that for realistic high-speed net- tained via retransmission. works with low error rates, transport layer error control is more efficient than 4.4.2 Error Reporting and Recovery. Error reporting is a mechanism where 7The OSI provides the graceful the transport receiver explicitly informs close. the transport sender about errors that

ACM Computing Surveys, Vol. 31, No. 4, December 1999 378 • S. Iren et al. have been detected. Error recovery is a sion or a subsequent retransmission mechanism used by both transport [Karn and Partridge 1987]. Jacobson sender and receiver to recover from er- [1988]refined TCP’s retransmission rors whether or not they are explicitly timeout calculation by using a low-pass reported. filter to estimate the mean, and mean Error reporting and recovery are tra- deviation of the RTT. RTT estimation ditionally accomplished using timers, has been a key area of research in TCP; sequence numbers, and acknowledg- an informative summary appears in ments. Sequence numbers are assigned Stevens [1994]. to TPDUs (or associated with the byte- stream). The transport receiver informs 4.4.3 Piggybacking. When a TPDU the transport sender via ACKs about arrives at a transport receiver, instead TPDU arrivals so that the sender can of immediately returning an ACK as a retransmit those that are missing. separate control TPDU, some protocols A positive ACK, or PACK, contains (including TCP) artificially delay re- sequence number information about turning an ACK hoping the user re- those TPDUs that have been received. A ceiver will soon submit its next message negative ACK, or NACK, often known as to be sent as part of the reverse direc- a selective reject, explicitly identifies tion data flow. When this occurs, the TPDUs that have not been received.8 ACK is piggyback-ed as header informa- Protocols that use ACKs are known as tion on the reverse direction data Positive Ack with Retransmission (PAR) TPDU. This is an example of transport or (ARQ) layer concatenation (see Section 4). schemes. Upon receipt of an ACK, the 4.4.4 Cumulative vs. Selective Ac- transport sender updates its state infor- knowledgment. A common type of mation, discards buffered TPDUs that PACK is the cumulative PACK. Each are acknowledged, and retransmits any cumulative PACK carries a sequence TPDUs that are not acknowledged. A number indicating that all TPDUs with protocol that only use PACKs has no lower9 sequence numbers have been re- error reporting mechanism. ceived. With cumulative PACKs, even if If a transport sender does not receive PACK is lost, the arrival of a later an ACK within a reasonable timeout i PACK such that j Ͼ i, positively ac- period, it may assume something has j gone wrong and retransmits unacknowl- knowledges all TPDUs up to TPDUj. edged TPDU(s). With delays within The later cumulative PACK incorpo- rates the information of the previously most underlying networks being vari- 10 able and dynamic, accurate calculation lost one. of a reasonable timeout value within the A disadvantage of cumulative PACKs transport layer is a most difficult prob- occurs when TPDUi is lost and lem. However, accurate round-trip time TPDUiϩ1, TPDUiϩ2, ..., arrive intact (RTT) estimation is essential [Zhang and are buffered. Each PACK following 1986]. The basic idea is to keep an RTT the loss can acknowledge only TPDUs estimate by measuring the time be- up to and including TPDUiϪ1. Thus, the tween sending a TPDU and receiving its transport sender may not receive timely ACK. Research has shown the impor- feedback on the success of TPDUs sent tance of not updating this estimate for retransmitted TPDUs, since it is impos- sible to determine whether the received 9 ACK is for the original TPDU transmis- This is TCP’s definition; some protocols use “low- er or equal”. 10Most implementations allow sequence number 8For clarity in this paper, PACK refers only to a wraparound. For simplicity, we assume that j Ͼ i positive ACK, and ACK refers to either a PACK or implies that TPDUj is later in the data flow than NACK. TPDUi.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 379

A more aggressive transport sender after TPDUi, and retransmit them un- necessarily. retransmits TPDUi and all TPDUs al- The second PACK type in its most ready sent after TPDUi. This Go- basic form is the selective PACK, which Back-N approach is fairly simple but acknowledges exactly one TPDU. With decreases channel utilization by poten- selective PACKs, a transport receiver tially retransmitting correctly-received informs the transport sender about each TPDUs. These unnecessary retransmis- TPDU that arrives successfully, so the sions then add to . transport sender can more likely re- The protocol SMART combines good fea- transmit only those TPDUs that have tures of selective repeat and Go-Back-N actually been lost. Selective PACKs are by having the transport receiver return used by NETBLT and VMTP. both a cumulative PACK, and a selec- A block PACK is a variation of selec- tive PACK for the TPDU that most re- tive PACK, where blocks of individual cently arrived [Keshav and Morgan TPDUs are selectively acknowledged 1997]. This combined ACK information [Brown et al. 1989]. For example, an helps the transport sender avoid unnec- XTP ACK is actually a series of block essary retransmissions. Further work PACKs. If TPDUs {1, 2, 4, 5, 6, 9, 10} on innovative retransmission strategies arrive and {3, 7, 8} are missing, then the for use over wireless networks can be block PACK would look like 1-2;4-6;9- found in Balakrishnan et al. [1996] (see 10. Section 6). Both selective repeat and In part due to the influence of innova- Go-Back-N also apply to tive acknowledgment schemes in proto- communication, and are thoroughly an- cols such as XTP, NETBLT, and VMTP, alyzed in Stallings [1997], Tanenbaum TCP with selective acknowledgment [1996], and Walrand [1991]. (called SACK) recently has been pro- posed. SACK combines selective (block) 4.4.6 Sender-dependent vs. Sender-in- and cumulative PACKs within TCP dependent Acknowledgment. In some [Braden and Jacobson 1988; Floyd 1996; protocols, transport receiver ACKs are Mathis et al. 1996]. One simulation generated in response to explicit or im- study shows the strength of TCP imple- plicit events initiated by the transport mentations with vs. without selective sender. These ACKs are called sender- PACKs [Fall and Floyd 1996]. dependent [Doeringer et al. 1990; Thai et al. 1994]. In XTP, no ACKs are gen- 4.4.5 Retransmission Strategies. erated until the transport sender explic- When a transport sender transmits itly instructs the transport receiver to TPDUi and determines later that it generate one. Some transaction-ori- may not have arrived at the transport ented protocols such as VMTP try to receiver, two retransmission strategies minimize the number of control TPDUs are possible. (This determination can exchanged by using the response as an result when the transport sender does implicit PACK for a transmitted re- not receive a PACK within a predeter- quest, and a new request as an implicit mined timeout period, or when it re- PACK for the previous response. ceives back-to-back cumulative PACKs Sender-independent ACKs are those that are identical.) A conservative ap- generated by a transport receiver inde- proach has the transport sender re- pendent of actions by the transport transmit selectively only TPDUi and sender. SNR/ESNR [Doshi et al. 1993; wait for a PACK with sequence number Netravali et al. 1990] uses an error re- larger than previous PACKs. This selec- porting and recovery scheme called Pe- tive repeat approach is used in SNR riodic State Exchange, where complete [Doshi et al. 1993], and avoids retrans- state information is exchanged periodi- mitting correctly received TPDUs [Feld- cally between a transport sender and a meier and Biersack 1990]. transport receiver based on timers, not

ACM Computing Surveys, Vol. 31, No. 4, December 1999 380 • S. Iren et al. the arrival or loss of TPDUs. If a control maintain the state information. A sec- TPDU containing state information is ond approach uses a unique connection lost, no immediate steps are taken. identifier for each connection. Both ap- Eventually, after another period the proaches work fine, unless the system complete state information containing loses the sequence number or connec- information in the lost control TPDU tion identifier information, say due to a will be exchanged. This approach sim- system crash. A third approach requires plifies the error recovery mechanism a minimum amount of time before a and reduces the number of error recov- connection between the same two trans- ery timers. With parallel processors, port entities can be reestablished, and one processor at the receiver can be enforces a maximum lifetime of any dedicated to processing state informa- TPDU to be less than that minimum. tion and inserting control TPDUs while This so-called TIME-WAIT approach is a second focuses on processing TPDUs. used in TCP; it has the disadvantage of introducing artificial delays between 4.4.7 Duplicate Detection. Sequence consecutive connections. numbers are also the basic mechanism for duplicate detection. If an ACK is lost 4.4.8 Forward Error Correction. Be- and as a result one or more TPDUs are cause each retransmission requires at retransmitted, duplicate TPDUs can ar- least one round-trip time, PAR and rive at the transport receiver. In most ARQ schemes may operate poorly for CO protocols, the transport receiver applications that have tight latency con- maintains state information about pre- straints. As an alternative, Forward Er- viously received TPDUs. (A less com- ror Correction (FEC) can be used for low mon approach to duplicate detection latency error control, with or without uses synchronized clocks instead of PAR/ARQ. In TPϩϩ, a transport sender state information [Liskov et al. 1990]). can transmit each TSDU as k data TP- When a duplicate TPDU is received, the DUs and an additional h redundant par- state information allows it to be de- ity TPDUs. Unless the network loses tected. In CL protocols, where no state more than h of the hϩk TPDUs sent, information is maintained, duplicate de- the transport receiver can reconstruct tection is not possible and the typical the original k data TPDUs. offered service is maybe-loss, maybe-du- plicates. In CL protocols that use ACKs 4.5 Flow/Congestion Control and retransmissions, the offered service is limited to no-loss, maybe-duplicates The terms flow control and congestion service. control create confusion, since different When a duplicate is received after a authors approach these subjects from connection close and subsequent recon- different perspectives. In this section nection between the same two transport we try to make a useful distinction be- entities, duplicate detection becomes a tween the terms while recognizing that delicate problem. In this case, the dupli- overlap exists. cate may accidentally be considered First, we define transport layer flow original data for the second connection. control as any scheme by which the Several solutions exist for this delayed transport sender limits the rate at duplicate problem. First, each transport which data is sent over the network. entity can remember the last sequence The goals of flow control may include number used for each terminated con- one or both of the following: (1) prevent- nection (i.e., maintain state even after ing a transport sender from sending disconnection). If a connection is re-es- data for which there is no available tablished, the next sequence number is buffer space at the transport receiver, used. A design problem involves decid- or (2) preventing too much traffic in the ing how long each transport entity must underlying network. Flow control for (2)

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 381 also is called congestion control,orcon- provided in tandem with error control, gestion avoidance. Congestion is essen- by using sequence numbers and win- tially a network layer problem, and doz- dowing techniques. Some authors, how- ens of schemes are discussed and ever, claim that error control and flow classified in Yang and Reddy [1995]. control have different goals and should However, congestion often is addressed be separated in high-speed network pro- by transport layer flow control (also tocol design [Feldmeier 1990; La Porta known as end-point flow control). and Schwartz 1992; Lundy and Tipici Techniques used to implement both 1994]. goals are often tightly integrated, as is Two techniques may be used, either the case in TCP. Regardless of which alone or together, to avoid network con- goal is being pursued, the flow of data is gestion and overflowing receiver buff- being controlled. Therefore, some au- ers. These are: (1) window flow control thors use “flow control” to refer to proto- and (2) rate control. col features that address both goals, In window (or sliding-window) flow thus blurring the distinction between control, the transport sender continues flow and congestion control [Stevens sending new data as long as there re- 1994, p. 310]. Bertsekas and Gallager mains space in the sending window. In [1992] state that flow and congestion general, this window may be fixed or control overlap so much that it is better variable in size. In fixed size window to treat them as a single problem; how- control, ACKs from the transport re- ever, he discusses flow control as a ma- ceiver are used to advance the transport jor network layer function. Other au- sender’s window. thors [Stallings 1997; Tanenbaum 1996; In variable size window control, also Walrand 1991] clearly separate conges- called a credit scheme, ACKs are de- tion control from flow control. Feld- coupled from flow control. A TPDU may meier states that congestion control can be acknowledged without granting the be viewed as a generalized n-dimen- transport sender additional credit to sional version of flow control, and sug- gests that flow control for all layers send new TPDUs, and additional credit should be performed at the lowest layer can be given without acknowledging a that requires flow control [Feldmeier TPDU. The transport receiver adopts a 1993a]. In this paper, we use flow con- policy concerning the amount of data it trol as the overall term for such tech- permits the transport sender to trans- niques, but emphasize that (1) flow con- mit, and advertises the size of this win- trol is only one example of many dow in TPDUs that flow from receiver to possible congestion control techniques, sender. and (2) congestion control is only one of Early experience with TCP showed a the two motivations for flow control. problem, known as the Silly Window We now present a discussion of gen- Syndrome, that can afflict protocols us- eral flow control techniques, and a dis- ing the credit scheme. The silly window cussion of how these techniques combat syndrome can start either because the network congestion. transport receiver advertises a very small window, or the transport sender 4.5.1 General Flow Control Tech- sends a very small TPDU. Clark [1982] niques. Transport layer flow control is describes how this can degenerate into a more complex than network or data-link vicious cycle where the average TPDU layer flow control. This is because stor- size ends up being much smaller than age and forwarding at intermediate the optimal case, and throughput suf- routers causes highly variable delays fers as a result. TCP includes mecha- that are generally longer than actual nisms in both the transport sender and transmission time [Stallings 1997]. transport receiver to avoid the condi- Transport layer flow control usually is tions that lead to this problem.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 382 • S. Iren et al.

Credit schemes may be described as trol and rate control together as in XTP conservative or aggressive (optimistic). and NETBLT will achieve better perfor- The conservative approach, as used in mance than implementing only one of TCP, only allows new TPDUs up to the them [Clark et al. 1987; Doeringer et al. transport receiver’s available buffer 1990; La Porta and Schwartz 1992; space. This approach has the disadvan- Sanders and Weaver 1990]. tage that it may limit a transport con- One innovative approach imple- nection’s throughput in long delay or mented in TPϩϩ involves backpressure, large bandwidth-product situations. where the application, transport, and The aggressive approach attempts to in- network layers all interact. If a user crease throughput by allowing a trans- receiver reads from the transport re- port receiver to optimistically grant ceiver at a rate slower than the trans- credit for space it does not have [Stall- port sender is sending, the transport ings 1997]. A disadvantage of the ag- receiver’s buffers eventually will fill up. gressive approach is its potential for The transport receiver explicitly trig- buffer overflow at the transport receiver gers congestion control procedures (hence, wasted bandwidth) unless the within the network. The network sender credit-granting mechanism is carefully in turn refuses additional TPDUs from timed [Clark 1982]. the transport sender. The transport Rate control uses timers at the trans- sender may then exercise backpressure port sender to limit data transmission on the sending application by refusing [Cheriton 1988; Clark et al. 1987; to accept any more new APDUs, until Weaver 1994]. Either a transport notice arrives that the transport receiv- sender can be assigned (1) a burst size er’s buffers have started to empty and interval (or burst rate), or (2) an [Stevens 1994]. This method will not interpacket delay time. In (1), a trans- work if multiple transport connections port sender transmits a burst of TPDUs are multiplexed on a single network at its maximum rate, then waits for the connection [Stallings 1997]. specified burst interval before transmit- ting the next burst (e.g., NETBLT and XTP). This is often modeled as a token- 4.5.2 Flow Control Techniques for Ad- bucket scheme. In (2), a transport dressing Congestion. Congestion con- sender transmits data as long as it has trol schemes have two primary objec- credit available, but artificially pauses tives: fairness, and optimality. between each TPDU according to the Fairness involves sharing resources interpacket delay (e.g., VMTP [Cheriton among users (individual data flows) and Williamson 1989]). This is often fairly. Since transport entities lack modeled as a leaky-bucket scheme. Case knowledge about general network re- (2) improves performance because sources, the bottleneck(s) itself is at a spreading TPDU transmissions helps better position to enforce fairness. The avoid network congestion and receiver transport layer’s role is limited to en- overflow. However, interpacket delay al- suring that all transport senders coop- gorithms are more difficult to imple- erate. For example, TCP’s congestion ment because of the timers needed. control technique assumes that all TCP Rate control schemes involve low net- entities will “back-off” in a similar man- work overhead since they do not ex- ner in the presence of congestion. Un- change control TPDUs except to occa- fortunately, it only works in a coopera- sionally adjust the rate in response to a tive environment. Today’s Internet is significant change in the network or the experiencing congestion difficulties be- transport receiver. They also do not af- cause of many non-TCP-conformant fect the way data ACKs are managed transport entities acting selfishly. Mod- [Doeringer et al. 1990]. Some authors ifications to Random Early Dropping argue that implementing window con- (RED) algorithms have been proposed to

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 383 counter these selfish TCP implementa- tected. The second principle is that tions [Floyd and Jacobson 1993]. when loss is encountered, indicating po- Optimality involves ensuring that all tential congestion, the transport sender routers operate at optimal levels. Trans- reduces its sending rate significantly, port entities have a role to play by lim- and then uses an additive increase per iting the traffic sent over any router RTT in an attempt to find an optimal that becomes a bottleneck. A router can sending rate. A goal of TCP is that monitor itself and send explicit access network bandwidth is shared fairly control feedback to the traffic sources when all TCP connections abide by both [Prue and Postel 1987; Ramakrishnan principles. Details of these principles and Jain 1988]. Alternatively, transport have been refined over various imple- entities can use implicit access control mentations of TCP, which go by the and detect potential bottlenecks from names Tahoe, Reno, and Vegas [Ahn et timeouts and delays derived from the al. 1995; Fall and Floyd 1996; Jacobson ACKs received [Ahn et al. 1995; Jain 1988]. 1989]. A third method is to use packet- Using a control-theoretic approach, pair flow control which also has the congestion control schemes can be advantage of ensuring that well-be- viewed as a control policy to achieve haved users are protected from ill-be- prescribed goals (e.g., minimize round- haved users because of the use of round- trip delay). These schemes are divided robin schedulers [Keshav 1991; 2000]. into open loop (e.g., rate control and Explicit access control has been im- implicit access control schemes), where plemented in XTP and TPϩϩ. XTP im- control decisions by the transport plements both end-to-end flow control sender do not depend on feedback from and explicit access congestion control. the congested sites, and closed loop Initially, default parameters control the (e.g., explicit access control schemes), transport sender’s transmission rate. where control decisions do depend on However, during the exchange of control such feedback. With networks having information, the network can modify higher speeds and higher bandwidth- the flow control parameters to control delay products, open loop schemes have congestion. Furthermore, network rout- been proposed as being more effective. ers can explicitly send control PDUs to Many current admission control transport entities to control the conges- schemes, which can be called regulation tion. TPϩϩ uses the previously men- schemes, are open loop. tioned backpressure both to avoid over- whelming a slow receiver and to control 4.6 Multiplexing/Demultiplexing congestion. When the network layer provides no Multiplexing, as shown in Figure 5(a), support for explicit access control, the supports several transport layer connec- transport protocol may operate on the tions using a single network layer asso- assumption that any TPDU loss indi- ciation. Multiplexing maps several user/ cates network congestion — a reason- transport interface points (what ISO able assumption when the underlying calls TSAPs) onto a single transport/ network links are highly reliable. TCP’s network interface point (NSAP). An as- approach to congestion avoidance incor- sociation is a in the case porates two important design princi- of a CO network. In the case of a CL ples. The first principle is slow-start, network, an association is a pair of net- where new connections do not initially work addresses (source and destina- send at the full available bandwidth, tion). When the underlying network is but rather first send one TPDU per CL (as is the case for TCP), multiplex- RTT, then two, then four, etc., (expo- ing/demultiplexing as provided by TCP’s nential increase per RTT), until the full port numbers is necessary to serve mul- bandwidth is reached, or a loss is de- tiple transport users.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 384 • S. Iren et al.

TSAPs NSAP TSAP NSAPs Host A Host B Host A Host B Application (Session) ...... Transport ......

Network

Flow of NPDUs Flows of NPDUs

(a) (b)

Figure 5. (a) Multiplexing/demultiplexing(b) Splitting/recombining.

Multiplexing uses network layer re- ted to the network layer. The transport sources more efficiently by reducing the receiver has to separate the concate- network layer’s context-state informa- nated TPDUs. This mechanism saves tion. Additionally, it can provide primi- network bandwidth at the expense of tive stream synchronization [Feldmeier extra processing time at the transport 1990]. For example, video and audio sender and receiver. Considering that of a movie can be multiplexed the transport receiver is often the bot- to maintain “lip sync,” in which case the tleneck in protocol processing and that multiplexed streams are handled as a available bandwidth is increasing, some single stream in a single, uniform man- authors argue that concatenation/sepa- ner. This method works only if all ration and splitting/recombining mecha- streams require the same network layer nisms are becoming less important QoS. Multiplexing’s advantages come at [Thai et al. 1994]. the expense of having to demultiplex TP- DUs at the transport receiver and to en- sure fair sharing of resources among the 4.9 Blocking/Unblocking multiplexed connections. Demultiplexing Blocking combines several TSDUs into requires a look-up operation to identify a single TPDU (see Figure 7(a)), thereby the intended receiver and possibly extra reducing the number of transport layer process switching and data copying. encapsulations and the number of NS- DUs submitted to the network layer. The 4.7 Splitting/Recombining receiving transport entity has to unblock Splitting (or downward-multiplexing or deblock the blocked TSDUs before de- [Walrand 1991]) as shown in Figure 5(b) livering them to transport service user. is the opposite of multiplexing. In this Blocking/unblocking is intended to case, several network layer associations save network bandwidth. The effect on support a single transport layer connec- processing time varies with the particu- tion. Splitting provides additional resil- lar implementation. A protocol that has ience against a network failure and po- to do both blocking and unblocking, say tentially increases throughput by to preserve TSDU boundaries, might letting a fast processor output data over add processing time. However, TCP’s several slower network connections form of blocking, described by the Nagle [Strayer and Weaver 1988]. Algorithm [Nagle 1984], actually may reduce processing time at the transport 4.8 Concatenation/Separation receiver, since it reduces the number of TCP headers which must be processed. Concatenation, as shown in Figure 6, TCP’s transport receiver unblocks, but combines several TPDUs into one net- does not preserve TSDU boundaries. work service data unit (NSDU), thus This is consistent with TCP’s CO-byte reducing the number of NSDUs submit- service.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 385

Sender Receiver

Transport TPDU... TPDU TPDU ... TPDU Layer

Network NSDU NSDU Layer

Figure 6. Concatenation and separation.

Sender Receiver Sender Receiver

TSDU... TSDU TSDU... TSDU TSDU TSDU

Transport TPDU... TPDU TPDU... TPDU Layer TPDU TPDU

(a) (b) Figure 7. (a) Blocking/unblocking (b) Segmentation/reassembly.

Although blocking and concatenation Mogul 1987]. These implementations are similar (both permit grouping of then use transport layer segmentation PDUs), they may serve different pur- (and/or blocking) with this MTU in poses. As one subtle difference, concate- mind, thereby reducing11 the risk for nation permits the transport layer to network layer segmentation [Stevens group control (e.g., ACK) TPDUs with 1994, p. 340]. data TPDUs that were derived from TS- DUs. Piggybacking ACKs and data is an 5. TRANSPORT PROTOCOL EXAMPLES example of concatenation. Blocking only This section now summarizes eleven combines TSDUs (i.e., transport layer widely implemented or particularly in- data). Segmentation and concatenation fluential transport protocols other than may occur, but blocking, as defined TCP. Each one’s features are described within the ISO Reference Model, in CL in terms of the service and protocol fea- protocols is not permitted [ITU-T 1994b]. tures discussed in general in Sections 3 4.10 Segmentation/Reassembly and 4, respectively. The protocol de- scriptions are summarized from the in- Often the network service imposes a dicated references and from direct cor- maximum permitted NSDU size. In respondence with those people most such cases, a larger TSDU is segmented responsible for each protocol’s design into smaller TPDUs by the transport and development. Together these proto- sender as shown in Figure 7(b). The cols are summarized in the Appendix in transport receiver reassembles the TSDU Tables II–IV. Each of these tables are before delivering it to the user receiver. divided into three parts: (a) general fea- TCP segmentation is based on negoti- tures, (b) service features, and (c) proto- ating a during col features. connection establishment, or in more The last part of the section provides a recent implementations, on a Path Max- brief statement about eight experimen- imum Transmission Unit (MTU) discov- tal transport protocols that appear in ery mechanism. To avoid the overhead the literature. associated with network layer fragmen- tation, many TCP implementations de- termine the largest size IP datagram 11Due to dynamic route changes, the MTU can that can be transmitted end-to-end change with TCP; thus IP fragmentation can be without IP fragmentation [Kent and avoided but not prevented.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 386 • S. Iren et al.

5.1 UDP signed to operate on top of a CO reliable network service that also provides end- User Datagram Protocol (UDP) (see Ta- to-end flow control. TP0 does not even ble II) provides CL, uncontrolled-loss, provide its own disconnection proce- maybe-duplicates, unordered service [Postel 1980]. UDP is basically an inter- dures; when the underlying network face to IP, adding little more than TSAP connection closes, TP0 closes with it. demultiplexing (port numbers) and op- One interesting use of TP0 is that it can tional data integrity service. By not pro- be employed to create an OSI transport viding reliability service, UDP’s over- service on TCP’s reliable byte-stream head is significantly less than that of service, enabling OSI applications to TCP. Although UDP includes an optional run over TCP/IP networks [Rose and checksum, there is no provision for error Cass 1987]. reporting; incoming TPDUs with check- sum errors are discarded, and valid ones are passed to the user receiver. 5.4 NETBLT Network Block Transfer (NETBLT) (see 5.2 TP4 Table III) was developed at MIT for The ISO Transport Protocol Class 4 (or high throughput bulk data transfer TP4) (see Table II) was designed for the [Clark et al. 1987]. It is optimized to same reasons as TCP. It provides a sim- operate efficiently over long-delay links. ilar CO, reliable service over a CL unre- NETBLT was designed originally to op- liable network. TP4 relies on the same erate on top of IP, but can operate on mechanisms as TCP with the following top of any network protocol that pro- differences. First, TP4 provides a CO- vides a similar CL unreliable network message service rather than CO-byte. service. Data exchange is realized via Therefore, sequence numbers enumer- unidirectional connections. The unit of ate TPDUs rather than bytes. Next, se- transmission is a buffer, several of quence numbers are not initiated from a which can be concurrently active to clock counter as in TCP [Stevens 1994], keep data flowing at a constant rate. but rather start from 0. A destination Connection is established via a 2-way- reference number is used to distinguish handshake during which buffer, TPDU between connections. This number is and burst sizes are negotiated. Flow similar to the destination port number control is accomplished using buffers in TCP, but here the reference number (transport-user-level control) and rate maps onto the port number and can be control (transport-protocol-level con- chosen randomly or sequentially [Bert- trol). Either transport user of a connec- sekas and Gallager 1992]. Another im- tion can limit the flow of data by not portant difference is that (at least in providing a buffer. Additionally, NET- theory) a set of QoS parameters (see BLT uses burst size and burst rate pa- Section 3) can be negotiated for a TP4 rameters to accomplish rate control. connection. Other differences between NETBLT uses selective retransmission TCP and TP4 are discussed in Piscitello for error recovery. After a transport and Chapin [1993]. sender has transmitted a whole buffer, it waits for a control TPDU from the 5.3 TP0 transport receiver. This TPDU can be a The ISO Transport Protocol Class 0 (or RESEND, indicating lost TPDUs, or an TP0) (see Table II) was designed as a OK, acknowledging the whole buffer. A minimum transport protocol providing GO allows the transmission of another only those functions necessary to estab- buffer. Instead of waiting after each lish a connection, transfer data, and buffer, a multiple buffering mechanism report protocol errors. TP0 was de- can be used [Dupuy et al. 1992].

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 387

5.5 VMTP efficiently performed as a single incar- nation of a TCP connection. It intro- The Versatile Message Transaction Pro- duces two major improvements over tocol (VMTP) (see Table III) was de- TCP. First, after an initial transaction signed at Stanford University to provide is handled using a 3-way-handshake high performance communication ser- connection, subsequent transactions vice for distributed operating systems, streamline connection establishment including file access, remote procedure through the use of a 32-bit incarnation calls (RPC), real-time datagrams and number, called a “connection count” multicast [Cheriton and Williamson (CC) carried in each TPDU. T/TCP uses 1989]. VMTP is a request-response pro- the monotonically increasing CC values tocol that uses timer-based connection in initial CR-TPDUs to bypass the management to provide communication 3-way-handshake, using a mechanism between network-visible entities. Each called TCP Accelerated Open. With this entity has a 64-bit identifier that is mechanism, a transport entity needs to unique, stable, and independent of host- cache a small amount of state for each address. The latter property allows enti- remote peer entity. The second improve- ties to be migrated and handled inde- ment is that T/TCP shortens the delay pendent of network layer addressing, in the TIME-WAIT12 state. T/TCP de- facilitating process migration and mo- fines three new TCP options, each of bile and multihomed hosts [Cheriton which carries one 32-bit CC value. and Williamson 1989]. Each request These options accelerate connection setup (and response) is identified by a trans- for transactions. T/TCP includes all nor- action identifier. In the common case, a mal TCP semantics, and operates exactly client increments its transaction identi- as TCP for all features other than connec- fier and sends a request to a single tion establishment and termination. server; the server sends back a response with the same (Client, Transaction) 5.7 RTP identifier. A response implicitly ac- knowledges the request, and each new Real-time Transport Protocol (RTP) (see request implicitly acknowledges the last Table III) was designed for real-time response sent to this client by the multiparticipant multimedia applica- server. Multicast is realized by sending tions [Schulzrinne 1996; Schulzrinne et to a group of servers. Datagram support al. 1996]. Even though RTP is called a is provided by indicating in the request transport protocol by its designers, this that no response is expected. Addition- sometimes creates confusion, because ally, VMTP provides a streaming mode RTP by itself does not provide a com- in which an entity can issue a stream of plete transport service. RTP TPDUs requests, receiving the responses back must be encapsulated within the TP- asynchronously [Williamson and Cheri- DUs of another transport protocol that ton 1989]. Flow control is achieved by a provides framing, checksums, and end- rate control scheme borrowed from to-end delivery, such as UDP. The main NETBLT with negotiated interpacket transport layer functions performed by delay time. 12When a TCP entity performs an active close and sends the final ACK, that entity must remain in 5.6 T/TCP the TIME-WAIT state for twice the Maximum Segment Lifetime (MSL), the maximum time any Transaction TCP (T/TCP) (see Table III) TPDU can exist in the network before being dis- is a backwards-compatible extension of carded. This allows time for the other TCP entity TCP that provides efficient transaction- to send and resend its final ACK in case the first copy is lost. This closing scheme prevents TPDUs oriented service in addition to CO ser- of a closed connection from appearing in a subse- vice [Braden 1992a; 1994]. The goal of quent connection between the same pair of TCP T/TCP is to allow each transaction to be entities [Braden 1992b].

ACM Computing Surveys, Vol. 31, No. 4, December 1999 388 • S. Iren et al.

RTP are to provide timestamps and se- terminals. The growing popularity of quence numbers for TSDUs. These LAN-based systems led to SNA version timestamps and sequence numbers may 2, a dynamic peer-to-peer architecture be used by an application written on top called Advanced Peer-to-Peer Network- of RTP to provide error detection, rese- ing (APPN) [Chiong 1996; Dorling et al. quencing of out-of-order data, and/or er- 1997]. Version 3, called High Perfor- ror recovery. However, it should be em- mance Routing (HPR), is a small but phasized that RTP itself does not powerful extension to APPN [Dorling provide any form of error detection or 1997; Freeman 1995]. HPR includes error control. Furthermore, in addition APPN-based high-performance routing, to transport layer functions, RTP also which addresses emerging high-speed incorporates func- digital transmissions. Although SNA is tions; through the use of so-called RTP a proprietary architecture, its structure profiles, RTP provides a means for the has moved closer to TCP/IP with each application to identify the format of data revision. Today, SNA exists mostly for (i.e., whether the data is audio or video, legacy reasons. SNA and TCP/IP are the what compression method is used, etc.). most widely used protocols in the enter- RTP has no notion of a connection; it prise networking arena [Chiong 1996]. may operate over either CO or CL ser- APPN does not map nicely onto ISO’s vice. Framing and segmentation must modular layering since it does not de- be done by the underlying transport fine an explicit transport protocol per layer. RTP is typically implemented as se. However, end-to-end transport layer part of the application and not in the functions such as end-to-end connection operating system kernel; it is a good management are performed as part of example of Application Level Framing APPN. Its CO service is built on top of a and Integrated Layer Processing [Clark reliable network service. Therefore, and Tennenhouse 1990]. It consists of many functions related to error control two parts: data and control. Continuous that are commonly handled at the media data such as audio and video is transport layer are not required in AP- carried in RTP data TPDUs. Control PN’s transport layer. Unlike applica- information is carried in RTCP (RTP tions using TCP, SNA applications do Control Protocol) TPDUs. Control TP- not directly interface to the transport DUs are multicast periodically to the layer. Instead, applications are based on same multicast group as data TPDUs. a “Logical Unit” concept which contains The functions of RTCP can be summa- the services of all three layers above rized as QoS monitoring, intermedia network layer. An interesting feature of synchronization, identification, and ses- APPN is its Class of Service(COS)-based sion size estimation/scaling. The control route selection where the route can be traffic load is scaled with the data traf- chosen based on the associated cost fac- fic load so that it makes up a certain tor. The overall cost for each route de- percentage of the data rate (5%). pends on various factors such as the RTP and RTCP were designed under speed of the link, cost per connection, the auspices of the Internet Engineer- cost per transaction, propagation delay, ing Task Force (IETF). and other user-configurable items such as security. For service and protocol de- 5.8 APPN (SNA) tails of APPN, see Table IV. HPR, an extension to APPN, en- The first version of IBM’s proprietary hances data routing performance by de- Systems Network Architecture (SNA) creasing intermediate router process- (see Table IV) was implemented in 1974 ing. The main components of HPR are in a hierarchical, host-centric manner: Automatic Network Routing (ANR) and the intelligence and control resided at Rapid Transport Protocol (RTP) [Chiong the host, which was connected to dumb 1996; Dorling et al. 1997]. Unlike

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 389

APPN, RTP is an explicit transport pro- a specific protocol at connection estab- tocol which provides a CO-message, re- lishment. liable service. Some of the key features DECnet was widely used throughout of RTP include bidirectional data trans- the 1980’s, and remains an important fer, end-to-end connections without in- legacy protocol for many organizations. termediate session switching, end-to- DECnet was one of the two target proto- end flow and congestion control based col suites for the original implementa- on adaptive rate-based (ARB) control, tion of the popular X11 windowing sys- and automatic switching of sessions tem (the other being TCP/IP). without service disruption. 5.10 XTP 5.9 NSP (DECnet) The Xpress Transport Protocol’s design DECnet (see Table IV) is a proprietary (XTP Version 4.013) (see Table IV) was protocol suite from Digital Equipment coordinated within the XTP Forum to Corporation. Early versions (1974-1982) support a variety of applications such as were limited in functions, providing multimedia distribution and distributed only point-to-point links, or networks of applications over WANs as well as Յ 255 processors. The first version ca- LANs [Strayer et al. 1992]. Originally pable of building networks of thousands XTP was designed to be implemented in of nodes was DECnet Phase IV in 1982 VLSI; hence it has a 64-bit alignment, a (two years earlier than the OSI Refer- fixed-size header, and fields likely to ence Model was standardized). Its layer control a TPDU’s initial processing lo- corresponding most closely to the OSI cated early in the header. However, no Transport layer was called the “End hardware implementations were ever Communication Layer”; it consisted of a built. XTP combines classic functions of proprietary protocol called the Network TCP, UDP, and TP4, and adds new ser- Services Protocol (NSP). vices such as transport multicast, multi- NSP provides CO-message and reli- cast group management, priorities, rate able service. It supports segmentation and burst control, and selectable error and reassembly, and has provisions for and flow control mechanisms. XTP can flow control and congestion control operate on top of network protocols such [Robertson 1996]. Two data channels as IP or ISO CLNP, data link protocols are provided: a “Normal-Data” channel such as 802.2, or directly on top of the and an “Other-Data” channel which car- AAL of ATM. XTP simply requires fram- ries expedited data and messages re- ing and end-to-end delivery from the lated to flow control. DEC engineers underlying service. One of XTP’s most claim that OSI’s TP4 “is essentially an important features is the orthogonality enhancement and refinement of Digi- it provides between communication par- tal’s proprietary NSP protocol” [Martin adigm, error control and flow control. and Leben 1992]. Indeed, there are An XTP user can choose any communi- many similarities between TP4 and cation paradigm (CO, CL, or transac- NSP, such as their mechanisms for han- tion-oriented) and whether or not to en- dling expedited data. able error control and/or flow control. In the early 1990s, DECnet Phase V XTP uses both window-based and rate- was introduced to integrate OSI proto- based flow control. cols with the earlier proprietary proto- cols. Phase V supports a variant of TP4, plus TP0 and TP2 and NSP for back- wards compatibility. A session level ser- 13XTP version 3.6 (the Xpress Transfer Protocol) vice known as “tower matching” re- was a transport and network layer protocol com- solves which protocol(s) are available bined. XTP 4.0 performs only transport layer func- for a particular connection, and selects tions.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 390 • S. Iren et al.

5.11 SSCOP/AAL5 (ATM) authors cite SSCOP as potentially appli- cable as a general purpose end-to-end In the ATM environment, several trans- reliable transport protocol in the ATM port protocols have been and are being environment [Henderson 1995]. For developed; they are referred to as ATM these reasons, we select SSCOP/AAL5 Adaptation Layers (AALs).14 These as the ATM transport protocol to survey AALs have been specified over time to in Table IV. handle different traffic classes [Stall- SSCOP incorporates a number of im- ings 1998] - CO constant bit rate (CBR) portant design principles of other high- requiring synchronization, CO variable speed transport protocols (e.g., SNR bit rate (VBR) requiring synchroniza- [Lundy and Tipici 1994]) including the tion, CO-VBR not needing synchroniza- use of complete state exchange between tion, CL-VBR not requiring synchroni- transport sender and receiver to reduce zation, and most recently available bit reliance on timers. SSCOP’s main draw- rate (ABR) where bandwidth and timing back for general usage is that it is de- requirements are defined by the user in signed to run over an underlying ATM an environment where an ATM back- service that provides in-order PDU de- bone provides the same quality of ser- livery. The SSCOP receiver assumes vice as found in a LAN [Black 1995]. that every out-of-order PDU indicates a Although 5 transport protocols loss (as opposed to possible misordering) (AAL1-5) have been proposed for these and immediately returns a NACK traffic classes, AAL5 which supports (called a USTAT) which acts as a selec- virtually all data applications has tive reject (see Section 4). Over an unor- greatest potential for marketplace suc- dered service, SSCOP would still cor- cess. None of the AALs including AAL5 rectly provide its advertised reliable provide reliable service (although with service; it is simply unclear whether minor changes AAL5 could).15 Another SSCOP would be efficient in an environ- protocol that does provide end-to-end ment where the underlying service mi- reliability is the Service Specific Connec- sorders PDUs. One author indicates tion-Oriented Protocol (SSCOP) [ITU-T that SSCOP could be modified to be a 1994a] which can run on top of AAL5. more general robust transport protocol SSCOP was initially standardized to capable of running above an unreliable, provide for reliable communication of connectionless service [Henderson 1995]. control signals [ITU-T 1995b], not data transfer. It has since been approved for reliable data transfer with I.365.3 5.12 Miscellaneous Transport Protocols [ITU-T 1995a] specifying it as the basis The following are brief descriptions of for providing ISO’s connection-oriented eight experimental transport protocols transport service [ITU-T 1995c]. Some that influenced transport protocol de- velopment. While many of the ideas con- 14Some authors note that since none of the AALs tained in these protocols are innovative provide reliable service, then “It is not really clear whether or not ATM has a transport layer.” and useful, for one reason or another, [Tanenbaum 1996]. We argue that reliability or they have not themselves been success- the lack thereof is not the issue; since ATM vir- ful in the marketplace. This may be the tual circuits/paths are a concatenation of links result of: (1) the marketing problem of between store-and-forward NPDU (cell) switches, introducing new transport protocols into then by definition, the AAL protocols above ATM provide end-to-end transport service. an existing infrastructure, (2) the pri- 15Because of AAL’s lack of reliable service and the marily university nature of the protocol popularity of TCP/IP, there exists a without added industrial support, or (3) with TCP over IP over AAL over ATM. This stack the failure of an infrastructure for essentially puts a transport layer over a network layer over a transport layer over a network layer which the protocol has been specifically to provide reliable data transfer in a mixed designed. TCP/IP - ATM environment [Cole et al. 1996]. Although these protocols were surveyed

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 391 as were TCP and the previous eleven, DTP, a slightly modified version of space limitations prevent presenting TCP, was developed in the early 1990s them in detail. The tables that summa- [Sanghi and Agrawala 1993]. Some key rize these protocols are available at: www. features that distinguish DTP from TCP eecis.udel.edu/˜amer/PEL/survey/. are its use of a send-time control Delta-t was designed in the late 1970s scheme for flow control, selective as well for reliable stream and transaction-ori- as cumulative PACKs, and TPDU-based ented communications on top of a best sequence numbers. effort network such as IP or ISO CLNP Partial Order Connection (POC) was [Watson 1989]. Its main contribution is in recently designed to offer a middle connection management: it achieves haz- ground between the ordered/no-loss ser- ard-free connection management without vice provided by TCP, and the unor- exchanging explicit control TPDUs. dered/uncontrolled-loss service provided SNR was designed in the late 1980s for by UDP [Amer et al. 1994]. The main a high speed data communications net- innovation of POC is the introduction of work [Doshi et al. 1993; Lundy and Tipici partially-ordered/controlled-loss service 1994]. Its key ideas are periodic exchange [Conrad Ph.D dissertation; 1996; Ma- of state information between the trans- rasli 1997; Marasli et al. 1997]. port sender and the transport receiver, Two recent transport protocols, the reduction of protocol overhead by making k-Transmit Protocol (k-XP) and the decisions about blocks of TPDUs instead Timed-Reliable Unordered Message Pro- of individual ones (packet blocking), and tocol (TRUMP), provide CO-message, parallel implementation of the protocol on unordered, controlled-loss service. k-XP a dedicated front-end communications was implemented as an application soft- processor. SNR uses three modes of oper- ware library that provides several major ation: one for virtual networks, one for enhancements to the basic service pro- real-time applications over reliable net- vided by UDP, such as connection man- works, and one for large file transfers. agement and controlled-loss delivery The MultiStream Protocol (MSP), a [Amer et al. 1997]. TRUMP is a varia- feature-rich, highly flexible CO trans- tion of k-XP that allows applications to port protocol designed in the early specify an expiration time for each 1990s, supports applications that re- TSDU [Golden 1997]. quire different types of service for dif- ferent portions of their traffic (e.g., mul- timedia). Transport users can 6. FUTURE DIRECTIONS AND CONCLUSION dynamically change modes of operation In this section, we first summarize how during the life of a connection without several recent trends and technological loss of data [La Porta and Schwartz developments have impacted transport 1993a; 1993b]. MSP defines seven traf- layer design. We then cover a particular fic streams, each of which is defined by trend in more detail; namely, the im- a set of common protocol functions. pact of wireless networking on the These functions specify the order in transport layer. Next, we enumerate which TPDUs will be accepted and some of the debates concerning trans- passed to the user receiver. port layer design. We conclude with a TPϩϩ was developed in the early few final observations. 1990s. It is intended for a heterogeneous internetwork with a large bandwidth-de- 6.1 Impacts of Trends and New lay product [Biersack et al. 1992; Feld- Technologies meier 1993b]. It uses backpressure for flow and congestion control, and is de- Several trends have influenced the de- signed to carry three major application sign of transport protocols over the last classes: constrained latency service, two decades. transactions, and bulk data transfer. Faster satellite links and gigabit net-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 392 • S. Iren et al. works have resulted in networks with the request/response nature of Web in- larger end-to-end bandwidth-delay teractions is a poor match for TCP’s products. These so-called “Long Fat byte-stream [Heidemann 1997]. Even Networks” allow increased amounts of before the introduction of the Web, in- data in transit at any given moment. terest in request/response (transaction) This required extensions to TCP, for protocols was reflected in the develop- example, to prevent wrapping of se- ment of VMTP and T/TCP. quence numbers [Borman et al. 1992]. The delay and jitter sensitivity of au- Fiber optics has improved the quality dio and video based applications re- of communication links, thus shifting quired transport protocols providing the major source of network errors —at something other than reliable service. least for wired networks— from line bit This has led to increased usage of UDP, errors to NPDU losses due to conges- which lacks congestion control features, tion. This fact is exploited by TCP’s resulting in increased Internet conges- Congestion Avoidance and Fast Re- tion. This has created a need for proto- transmit and Recovery algorithms cols that incorporate TCP-compatible which were made necessary because of congestion control, yet provide services the exponential growth in the use of the appropriate for audio/video transmis- Internet — even before the Web made sion (e.g., maybe-loss or controlled-loss). the Internet a household word. Conges- Audio/video transmission also required tion avoidance continues to be an active synchronization as enabled by RTP research area for the TCP community timestamps. [Brakmo et al. 1994; Jacobson 1988; Finally, distributed applications in- Stevens 1997]. volving one-to-many, many-to-one, and Higher speed links and fiber optics many-to-many communication have re- have caused some designers of so-called quired a new multicast paradigm of light-weight transport protocols to shift communication. Unicast protocols such their design goals from minimizing as XTP, VMTP, TPϩϩ, and UDP have transmission costs (at the expense of been extended to support multicast. processing power) to minimizing pro- cessing requirements (at the expense of 6.2 Wireless Networks transmission bandwidth) [Doeringer et al. 1990]. On the other hand, the prolif- Regarding the future of the transport eration of relatively low speed point-to- layer, we note the influence of the rapid point (PPP) and wireless links has re- growth of wireless networks. Although sulted in a varying and complex transport protocols are supposed to be interconnection of low, medium, and independent of the underlying network- high speed links with varying degrees of ing technology, in practice, TCP is tai- loss. This has introduced a new compli- lored to perform in networks where cation into the design of flow control links are tethered. As more wireless and error control algorithms. links are deployed, it becomes more New applications such as transaction likely that the path between transport processing, audio/video transmission, and sender and transport receiver will not the Web have resulted in new and widely be fully wired. Designing a transport varying network service demands. protocol that performs well over a heter- Over the years, TCP has been opti- ogeneous network is difficult because mized for a particular mix of applica- wireless networks have two inherent tions: that is, bulk transfer (e.g., File differences that require, if not their own Transfer Protocol (FTP) and Simple specific transport protocols, at least spe- Mail Transfer Protocol (SMTP)) and re- cific variations of existing ones. mote terminal traffic (e.g., telnet and The first difference is the major cause rlogin). The introduction of the Web has for TPDU gaps at the transport receiver. created new performance challenges, as In wired networks missing, TPDUs are

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 393 primarily due to network congestion as power due to limited power supply. routers discard NPDUs. In wireless net- Therefore, the transport protocols used works, gaps are most likely due to bit on these devices should be less complex errors and hand-off problems. On de- to allow efficient battery usage. This tecting a gap, TCP’s transport receiver approach is taken in Mobile-TCP [Haas responds by invoking congestion control and Agrawal 1997]. It uses the same and avoidance algorithms to slow the splitting approach used in I-TCP, how- transport sender and thereby reduce the ever, instead of identical transport enti- network load. For wireless networks, a ties on both ends of the wireless link, better response is to retransmit lost TP- Mobile-TCP employs an asymmetri- DUs as soon as possible [Balakrishnan cally-based protocol design which re- et al. 1996]. duces the computation and communica- Recent studies have concentrated on tion overhead on the mobile host’s alleviating the effects of non-congestion- transport entity. Because the wireless related losses on TCP performance over segment is known to be a single-hop wireless and other high-loss networks connection, several transport functions [Bakre and Badrinath 1997; Balakrish- can be either simplified or eliminated. nan et al. 1995; Yavatkar and Bhagwat 1994]. These studies follow two funda- 6.3 Debates mentally different approaches. One ap- proach hides any noncongestion-related In reviewing the last two decades of loss from the transport sender by mak- transport protocol research, we first ing the lossy link appear as a higher note there seems to have been two quality link with reduced effective schools of thought on the direction of bandwidth. As a result, the losses that transport protocol design: (1) the “hard- are seen by the transport sender are ware-oriented” school, and (2) the “ap- mostly due to congestion. Some exam- plication-oriented” school. ples include: (1) reliable link layer pro- The hardware-oriented school feels tocols (AIRMAIL [Ayanoglu et al. that transport protocols should be sepa- 1995]), (2) splitting a transport connec- rated from the operating system as tion into two connections (I-TCP [Bakre much as possible, and implementations and Badrinath 1997]), and (3) TCP- should be moved into VLSI, parallel, or aware link layer schemes (snoop [Bal- special purpose processors [Feldmeier akrishnan et al. 1995]). 1993c; Haas 1990; La Porta and The second approach makes the Schwartz 1992; Strayer and Weaver transport sender aware that wireless 1988]. This school claims that the heavy hops exist and does not invoke conges- usage of timers, interrupts, and memory tion control for TPDU losses. Selective read/writes degrades the performance of ACK proposals such as TCP SACK [Ma- a transport protocol, and thus special this et al. 1996] and SMART [Keshav purpose architectures are necessary. and Morgan 1997] can be considered They would also point out that as net- examples of this approach. One study work rates reach the Gigabit/sec range, shows that a reliable link layer protocol it will be more difficult to process TP- with some knowledge of TCP provides DUs in real time. Therefore TPDU for- 10-30% higher throughput than a link mats should be carefully chosen to allow layer protocol without that knowledge parallel processing and to avoid TPDU [Balakrishnan et al. 1996]. Further- field limitations for sequence number- more, selective ACKs and explicit loss ing, window size, etc. This school of notifications result in significant perfor- thought is reflected in the design of mance improvements. protocols such as MSP and XTP. The second inherent difference is that By contrast, the application-oriented wireless devices are constrained in both school prefers moving some transport their computing and communication layer functions out of the operating sys-

ACM Computing Surveys, Vol. 31, No. 4, December 1999 394 • S. Iren et al. tem into the user application, thereby mization. The lightweight protocol integrating the upper layer end-to-end school of thought argues that since processing, and achieving a faster appli- high-speed networks offer extremely cation processing pipeline. This school low error rates, protocols should be would claim that the bottleneck is actu- “success-oriented” and optimized for ally in operations such as checksums maximum throughput [Doeringer et al. and presentation layer conversion, all of 1990; Haas 1990; La Porta and which can be done more efficiently if Schwartz 1992]. Others emphasize the they are integrated with the copying of increased deployment of wireless net- data into application user space. This works, which are more prone to bit er- school of thought is reflected in the con- rors; they argue that protocols should be cepts of Application Level Framing designed to operate in both environ- (ALF) and Integrated Layer Processing ments efficiently. The SMART retrans- (ILP) [Clark and Tennenhouse 1990]. mission strategy [Keshav and Morgan ALF states that “the application should 1997] is a good example of the latter break the data into suitable aggregates, approach. and the lower layers should preserve these frame boundaries as they process 6.4 Final Observations the data”. ILP is an implementation concept which “permits the implemen- Several of the experimental protocols sur- tor the option of performing all (data) veyed in this paper present beneficial manipulation steps in one or two inte- ideas for the overall improvement of grated processing loops”. Ongoing re- transport protocols. However, most of search shows that performance gains these schemes are not widely used. This can be obtained by using ALF and ILP may have more to do with the difficulty of [Ahlgren et al. 1996; Braun 1995; Braun introducing new transport protocols than and Diot 1995; 1996; Diot and Gagnon with the merits of the ideas themselves. 1999; Diot et al. 1995]. ALF concepts In the 1980’s, the primary battle was are reflected in the design of protocols between TCP and ISO TP0/4—or, more such as RTP, POC, and TRUMP. broadly, between the Internet (TCP/IP) A second debate, orthogonal to the de- suite and OSI suite in general. Never- bate about where transport services ending hallway discussions debated should be implemented, is the debate which would win. Today the victor is about functionality. Older transport ser- TCP/IP, due to a combination of “tim- vices tend to focus on a single type of ing”, “technology”, “implementations,” service. Some recent protocol designers and “politics” [Tanenbaum 1996]. Some- advocate high degrees of flexibility, or- times the usefulness of transport proto- thogonality, and user-configurability to col research is questioned—even re- meet the requirements of various applica- search to improve TCP—because the tions. These designers feel that: (1) a sin- market base of the current TCP version gle transport protocol offer different com- is so large that inertia prevents new munication paradigms, and different good ideas from propagating into the levels of reliability and flow control, and user community. However, the success- the transport user should be able to ful incorporation of SACK [Mathis choose among them, and (2) protocol de- 1996], TCP over high-speed networks sign should include multiple functions to [Borman 1992], and T/TCP [Braden support multicast, real-time data, syn- 1994] into implementations clearly chronization, security, user-defined QoS, shows that transport research is useful. etc., that will meet the requirements of What should be clear is that for any today’s more complex user applications. new idea to have influence in practice in This is reflected in the designs of XTP, the short term, it must be interoperable MSP and POC, for example. with elements of the existing TCP/IP A third debate concerns protocol opti- protocol suite.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 395

APPENDIX

Table II. (a) General features of TCP, UDP, TP4, and TP0 General Features TCP (Internet) UDP (Internet) TP4 (OSI) TP0 (OSI) Well suited for Stream transport Applications Message-oriented Message-oriented over unreliable where normal transport over transport over networks case is operation unreliable reliable networks over LAN with networks low loss rates (e.g., NFS, SNMP) Developed for General purpose General purpose General purpose Minimum reliable data datagram reliable data function protocol communication transport with communication over reliable CO over unreliable minimum of over unreliable networks networks protocol networks mechanism Features for special Nagle algorithm1 None Oriented for applications for interactive data teletext; OSI-TCP gateway2 Underlying service IP IP Unreliable Reliable CO network network (e.g., X.25) Has been implemented in Software Software Software Software 1Since interactive data TPDUs often carry 1 byte of user data, overhead of 40-byte header (20 for TCP; 20 for IP) adds to congestion in WANs. Therefore, Nagle Algorithm restricts a TCP connection to having at most one outstanding unacknowledged small TPDU. No additional small TPDUs can be sent until the ACK is received. 2TP0 can be used to create OSI transport service on top of TCP’s reliable byte-stream service, enabling OSI apps to run over TCP/IP [Rose and Cass 1987].

Table II. (b) Service features of TCP, UDP, TP4, and TP0 Service Features TCP (Internet) UDP (Internet) TP4 (OSI) TP0 (OSI) CO-byte vs. CO-message vs. CL CO-byte CL CO-message CO-message Reliability No-loss vs. Uncontrolled- No-loss Uncontrolled- No-loss No-loss3 loss vs. Controlled-loss loss No-duplicates vs. Maybe- No-duplicates Maybe- No-duplicates No-duplicates3 duplicates duplicates Ordered vs. Unordered vs. Ordered Unordered Ordered Ordered3 Partially-ordered Data-integrity vs. No- Data-integrity Data-integrity Data-integrity Data- data-integrity vs. Partial- and No-data- integrity3 data-integrity integrity4 Multicast vs Unicast Unicast Unicast5 Unicast Unicast Priority vs. No-priority No-priority Priority Priority Priority Security vs. No-security No-security No-security Security Security6 3Assumed to be part of the underlying network service. 4UDP provides an optional checksum on the header and user data. 5UDP can provide multicast service with IP support. 6Protection is a QoS parameter that can be specified as protection against passive monitoring, passive modification, replay, addition, or deletion.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 396 • S. Iren et al.

Table II. (c) Protocol features of TCP, UDP, TP4, and TP0 Protocol Features TCP (Internet) UDP (Internet) TP4 (OSI) TP0 (OSI) CO vs CL Protocol CO CL CO CO Transaction-oriented No No1 No No CO Protocol In-band vs. Out-of-band In-band N/A In-band In-band Signaling Unidirectional vs. Bidirectional N/A Bidirectional Bidirectional Bidirectional conn Conn establishment 3-way N/A 3-way 3-way User data in conn Not N/A Permitted3 Not permitted Establishment permitted2 Conn termination 4-way N/A 3-way N/A4 Acknowledgments Piggybacking Yes N/A Yes N/A5 Cumulative vs. Selective Cumulative6 N/A Selective N/A5 Sender-dependent vs. Sender- N/A Sender- N/A5 Sender-independent independent dependent and optional Sender- independent Error Control Error detection Sequence no; Optional Sequence no; No Checksum on checksum on Length field; header and header and Optional data data checksum on header and data Error reporting No No No No Error recovery PAR No PAR No Flow/Congestion Control End-to-end flow control Window No Window Backpressure4 Window allocation Byte-oriented N/A TPDU-oriented N/A Rate control parameters N/A N/A N/A N/A Congestion control Implicit No Implicit access No access control control TPDU Format TPDU numbering Byte-oriented No TPDU-oriented No Min TPDU header/trailer 20-byte header 8-byte header 5-byte header 3-byte header Multiplexing/Demultiplexing Yes Yes Yes No Splitting/Recombining No No Yes No Concatenation/Separation Yes7 No Yes No Blocking/Deblocking Yes8 No Yes Yes Segmentation/Reassembly Yes No Yes Yes 1But frequently used for transaction based applications. 2While data may be sent in TCP connection opening TPDU, it cannot be delivered to user receiver until 3-way-handshake is complete. 3Not Ͼ32 octets. This data (e.g., password) may help decide if connection should be established, thereby allowing estab to depend on user data. 4Assumed to be part of the underlying network service. 5There are no ACKs in TP0. 6A version of TCP which combines selective ACKs and cumulative ACKs has been specified [Mathis et al. 1996]. 7Data and ACK TPDU is concatenated in the case of piggybacking. 8However, boundaries are not preserved between transport sender and transport receiver.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 397

Table III. (a) General features of NETBLT, VMTP, T/TCP, and RTP General Features NETBLT (Internet) VMTP (Internet) T/TCP (Internet) RTP (Internet) Well suited for Long-delay paths Transactions Transactions Real-time data Developed for Bulk data Remote Improving TCP Real-time, multi- transfer Procedure Calls, performance for participant multicast, real- transaction multimedia time processing conferences communication Features for special User controlled Naming 3-way can be Integrated layer applications window-based mechanism for bypassed; processing; QoS flow control process Shorter delay in feedback; Can be migration, mobile TIME-WAIT tailored to hosts; state specific Conditional applications; message delivery Mixing, for real-time translating via application; SSRC fields Optimized for page-level network file access; Call forwarding Underlying service IP Datagram IP UDP,1 AALS network Has been implemented in Software Software Software Software 1RTP may be used with other suitable underlying network or transport protocols.

Table III. (b) Service features of NETBLT, VMTP, T/TCP, and RTP NETBLT Service Features VMTP (Internet) T/TCP (Internet) RTP (Internet) (Internet) CO-byte vs. CO-message vs. CL CO-message2 CL CO-byte and CL CL3 Reliability No-loss vs. Uncontrolled- No-loss No-loss No-loss Uncontrolled- loss vs. Controlled-loss loss No-duplicates vs. Maybe- No-duplicates No-duplicates No-duplicates No-duplicates duplicates Ordered vs. Unordered vs. Ordered Ordered Ordered Unordered Partially-ordered Data-integrity vs. No- Data-integrity Data-integrity Data-integrity No-data- data-integrity vs. Partial- integrity data-integrity Multicast and Unicast Unicast Multicast and Unicast Unicast4 Unicast Priority vs. No-priority No-priority Priority No-priority No-interrupt/ Priority Security vs. No-security No-security Security5 No-security Security 2In this case message is a “buffer” as explained in section 5.4. 3T/TCP provides CO or CL service. For CO service, transport user has to issue the open, send, and close primitives as in TCP. For CL service, only the sendto primitive is required. 4RTP can provide multicast service if available in lower layer protocols. 5VMTP’s security feature was designed but never implemented.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 398 • S. Iren et al.

Table III. (c) Protocol features of NETBLT, VMTP, T/TCP, and RTP Protocol Features NETBLT (Internet) VMTP (Internet) T/TCP (Internet) RTP (Internet) CO vs CL protocol CO CO CO CL1 Transaction-oriented No Yes Yes No CO protocol In-band vs. Out-of-band In-band In-band In-band N/A signaling Unidirectional vs. Unidirectional Unidirectional Bidirectional N/A Bidirectional conn Conn establishment 2-way Implicit 3-way and N/A Implicit User data in conn Not permitted Permitted Permitted N/A Establishment Conn termination 2-way Implicit 4-way N/A Acknowledgments Piggybacking N/A No2 Yes N/A Cumulative vs. Selective Cumulative Selective Cumulative N/A Sender-dependent vs. Sender- Sender- Sender- N/A Sender-independent dependent dependent dependent Error Control Error detection Sequence No; Transaction ID; Sequence No; Sequence No Length field; Length field; Checksum on Header Checksum on header and checksum; header and data Optional data data checksum Error reporting Selective reject Selective reject No Special reports3 Error recovery ARQ with ARQ with PAR No selective selective retransmission retransmission Flow/Congestion Control End-to-end flow control Window and Rate Window No Rate Window scheme Buffer-oriented N/A Byte-oriented N/A Rate control parameters Burst size and Interpacket N/A N/A rate delay Congestion control No Implicit access Implicit access No4 control control TPDU Format TPDU numbering TPDU-oriented TPDU-oriented Byte-oriented TPDU-oriented Min TPDU 24-byte header 64-byte header 24-byte header 12-byte header header/trailer Multiplexing/Demultiplexing Yes Yes Yes Yes5 Splitting/Recombining No No No No Concatenation/Separation No No No No Blocking/Deblocking No No Yes No Segmentation/Reassembly Yes Yes Yes No 1RTP has no notion of a connection; however RTCP provides out-of-band signaling. 2Normally a response implicitly acks request; each new request implicitly acks last response rec’d. Thus VMTP provides implicit but not explicit piggybacked ACK info. 3Sender and Receiver reports are exchanged via RTCP, a control protocol that monitors data delivery and provides minimal control and id functions. 4Application may perform congestion control by using the QoS feedback provided by the RTCP control messages. 5Visa synchronization source identifier (SSRC) field. ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 399

Table IV. (a) General features of SNA/APPN, DECnet/NSP, XTP, and ATM/SSCOP/AAL5 General Features APPN (SNA) NSP (DECnet) XTP SSCOP/AAL5 (ATM) Well suited for Distributed Message oriented Reliable High speed systems over unreliable transport reliable data and networks; dist’d multicast; Real- control signaling systems time datagrams; Fast connection setup Developed for Diverse General purpose Single protocol ATM networks platforms, reliable data separates policy topologies, and comm over from mechanism; apps into single unreliable nets VLSI network implementation Features for special Class of Service Expedited data Out-of-band block applications selection1 data2; PACKs/NACKs Orthogonality; Selectable flow and error control Underlying service APPN network DECnet layer 3 Any network ATM layer (virtual protocol layer3 or directly circuit) over LLC, MAC, ATM AALs Has been Software and implemented in Software Software4 Hardware 1COS-based routing is provided by APPN network layer. APPN transport takes advantage by using different service classes. 2Useful for passing control info about state of transport user processes, or passing semantic info about data, or for event sequencing via timestamps. 3Most of all XTP implementations operate over IP. 4XTP was designed for but never implemented in VLSI.

Table IV. (b) Service features of SNA/APPN, DECnet/NSP, XTP, and ATM/SSCOP/AAL5 Service Features APPN (SNA) NSP (DECnet) XTP SSCOP/AAL5 (ATM) CO-byte vs. CO-message vs. CO-message CO-message CL and CO- CO-message CL byte Reliability No-loss vs.Uncontrolled- No-loss5 No-loss No-loss and No-loss loss vs. Controlled-loss Uncontrolled- loss No-duplicates vs. Maybe- No-duplicates5 No-duplicates No-duplicates No-duplicates duplicates and Duplicates Ordered vs. Unordered Ordered5 Ordered Ordered and Ordered vs. Partially-ordered Unordered Data-integrity vs. No- Data- No-data- Data-integrity Data-integrity6 data-integrity vs. Partial- integrity5 integrity and No-data- data-integrity integrity Multicast and Unicast Unicast Unicast Multicast and Unicast Unicast Priority vs No-priority Priority Priority Priority Priority7 Security vs No-security Security No-security No-security No-security 5Provided by the underlying network. 6Assumed to be provided by underlying service. SSCOP itself has no checksum. 7I.365.3 allows optionally expedited data using SSCOP unassured data delivery.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 400 • S. Iren et al.

Table IV. (c) Protocol features of SNA/APPN, DECnet/NSP, XTP, and ATM/SSCOP/AAL5 Protocol Features APPN (SNA) NSP (DECnet) XTP SSCOP/AAL5 (ATM) CO vs. CL protocol CO CO CO CO Transaction-oriented Yes No Yes No CO protocol In-band vs. Out-of-band Out-of-band In-band In-band N/A1 signaling Unidirectional vs. Uni and Bidirectional Bidirectional Bidirectional Bidirectional conn Bidirectional Conn establishment 2-way Implicit 2-way User data in conn No Yes Permitted Yes2 establishment Conn termination 2-way 4-way3 2-way Acknowledgments Piggybacking N/A4 Yes No No Cumulative vs Selective N/A4 Both Selective Both5 Sender-dependent vs. N/A4 Sender- Sender- Sender-independent independent independent6 Error control Error detection No7 Sequence No Sequence No; Sequence No; Length field; Checksum on Header header and data checksum; Optional data checksum Error reporting No7 Selective Selective reject8 reject Error recovery No7 PAR ARQ with ARQ with Go-Back-N selective repeat (optional sel retrans) Flow/Congestion control End-to-end flow control N/A9 Window Window and Window-credit Rate Window scheme N/A9 TPDU(segment) Byte- TPDU-oriented -oriented oriented Rate control parameters N/A9 N/A Burst size N/A and rate Congestion control N/A9 Explicit Explicit No10 access control access control TPDU format TPDU numbering TPDU-oriented TPDU-oriented Byte-oriented TPDU-oriented Min TPDU header/trailer 9-byte header 32-byte header Multiplexing/Demultiplexing No No Yes No Splitting/Recombining No No No No Concatenation/Separation No Yes11 No No Blocking/Deblocking No No No No Segmentation/Reassembly Yes Yes Yes Yes 1See Section 5.11. 2But no guarantee of delivery. 3XTP also employs 2-way handshake and abortive close modes of connection termination. 4APPN does not need ACKs because it assumes a fully reliable network service. 5STAT is full map of missing/ACKed PDUs. 6Sender periodically polls receiver. 7Responsibility of the underlying network. 8Transport receiver sends one USTAT (NACK) which provides selective reject info for every missing PDU. 9APPN does not perform flow/congestion control at transport layer; relies on network layer’s “adaptive pacing”. 10Local flow control backoff is discussed, but not specified/standardized. 11Data and ACK TPDU is concatenated in the case of piggybacking.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 401

ACKNOWLEDGMENTS of indirect TCP. IEEE Trans. Comput. 46,3, 260–278. The authors wish to thank several indi- BALAKRISHNAN, H., PADMANABHAN,V.N.,SESHAN, viduals: R. Marasli and E. Golden who S., AND KATZ, R. H. 1996. A comparison of contributed to an early draft of this mechanisms for improving TCP performance over wireless links. SIGCOMM Comput. paper; a number of protocol experts who Commun. Rev. 26, 4, 256–269. reviewed individual service/protocol BALAKRISHNAN, H., SESHAN, S., AND KATZ,R. summaries and their respective table H. 1995. Improving reliable transport and entries, including A. Agrawala, A. Bhar- handoff performance in cellular wireless net- gava, R. Case, L. Chapin, B. Dempsey, works. Wireless Networks 1, 4, 469–481. D. Feldmeier, T. La Porta, D. Piscitello, BERNERS-LEE, T., FIELDING, R., AND FRYSTYK, H. 1996. Hypertext transfer protocol-- K. Sabnani, H. Schulzrinne, D. Sanghi, HTTP/1.0. RFC 1945. T. Strayer, E. Tremblay, R. Watson, A. BERTSEKAS,D.AND GALLAGER, R. 1992. Data Weaver, C. Williamson; M. Taube and Networks. 2nd ed. Prentice-Hall, Inc., Up- the anonymous reviewers who made per Saddle River, NJ. valuable suggestions. BHARGAVA, A., KUROSE,J.F.,TOWSLEY, D., AND VANLEEMPUT, G. 1991. Performance com- parison of error control schemes in high-speed computer communication networks. In REFERENCES Broadband Switching: Architectures, Proto- cols, Design, and Analysis, C. Chas, V. K. AHLGREN, B., BJORKMAN, M., AND GUNNINGBERG, Konangi, and M. Sreetharan, Eds. IEEE P. 1996. Integrated layer processing can be Computer Society Press, Los Alamitos, CA, hazardous to your performance. In Proceed- 416–426. ings of the Conference on Protocols for High- BIERSACK, E., COTTON, C., FELDMEIER, D., MCAU- Speed Networks, V (France, Oct.), W. Dabbous LEY, A., AND SINCOSKIE, W. 1992. Gigabit and C. Diot, Eds. IFIP, 167–181. networking research at Bellcore. IEEE Net- AHN,J.S.,DANZIG,P.B.,LIU, Z., AND YAN, work 6, 2 (Mar.), 42–48. L. 1995. Evaluation of TCP Vegas: Emula- BLACK, U. 1995. ATM Foundation for Broad- tion and experiment. SIGCOMM Comput. band Networks. Prentice-Hall series in ad- Commun. Rev. 25, 4 (Oct.), 185–205. vanced communications technologies. Pren- AMER,P.D.,CHASSOT, C., CONNOLLY,T.J.,DIAZ, tice-Hall, Inc., Upper Saddle River, NJ. M., AND CONRAD, P. 1994. Partial-order BORMAN, D., BRADEN, B., AND JACOBSON,V. transport service for multimedia and other 1992. TCP extensions for high perfor- applications. IEEE/ACM Trans. Netw. 2,5 mance. RFC 1323. (Oct. 1994), 440–456. BORMANN, C., OTT, J., GEHRCKE, H., AND KERSCHAT, AMER, P., CONRAD, P., GOLDEN, E., IREN, S., AND T. 1994. MTP-2: towards achieving the CARO, A. 1997. Partially ordered, partially s.e.r.o. properties for multicast transport. In reliable transport service for multimedia ap- Proceedings of the International Conference on plications. In Proceedings of the Conference Computer Communications Networks (San on Advanced /Informa- Francisco, CA, Sept.), tion Distribution Research Program (College BRADEN, R. 1992a. Extending TCP for transac- Park, MD, Jan.). tions -- Concepts. RFC 1379. AMER,P.D.,IREN, S., SEZEN, G., CONRAD, P., BRADEN, R. 1992b. TIME-WAIT assassination TAUBE, M., AND CARO, A. 1999. Network- conscious GIF image transmission. Comput. hazards in TCP. RFC 1337. Netw. J. 31, 7 (Apr.), 693–708. BRADEN, R. 1994. T/TCP -- TCP extensions for ARMSTRONG, S., FREIER, A., AND MARZULLO, transactions functional specification. RFC K. 1992. Multicast transport protocol. 1644. RFC 1301. BRADEN,R.AND JACOBSON, V. 1988. TCP exten- AYANOGLU, E., PAUL, S., LAPORTA,T.F.,SABNANI, sions for long-delay paths. RFC 1072. K. K., AND GITLIN, R. D. 1995. AIRMAIL: a BRADEN, R., ZHANG, L., BERSON, S., HERZOG, S., link-layer protocol for wireless networks. AND JAMIN, S. 1997. Resource reservation Wireless Networks 1, 1 (Feb. 1995), 47–60. protocol (RSVP) -- Version 1 functional speci- BAE, J., SUDA, T., AND WATANABE, N. 1991. fication. RFC 2205. Evaluation of the effects of protocol process- BRAKMO, L. S., O’MALLEY,S.W.,AND PETERSON,L. ing overhead in error recovery schemes for a L. 1994. TCP Vegas: New techniques for high-speed packet switched network: link-by- congestion detection and avoidance. SIG- link versus edge-to-edge schemes. IEEE J. COMM Comput. Commun. Rev. 24, 4 (Oct.), Sel. Areas Commun. 9 (Dec.), 1496–1509. 24–35. BAKRE,A.V.AND BADRINATH, B. R. 1997. BRAUDES,R.AND ZABELE, S. 1993. Require- Implementation and performance evaluation ments for multicast protocols. RFC 1458.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 402 • S. Iren et al.

BRAUN, T. 1995. Limitations and implementa- DIOT, C., HUITEMA, C., AND TURLETTI,T. tion experiences of integrated layer processing. 1995. Multimedia applications should be In Proceedings of the Conference on GISI, adaptive. In Proceedings of the Workshop on Springer-Verlag, New York, 149–156. HPCS (Mystic, CT, Aug.), IFIP. BRAUN,T.AND DIOT, C. 1995. Protocol imple- DOERINGER, W., DYKEMAN, D., KAISERWERTH, M., mentation using integrated layer processing. MEISTER, B., RUDIN, H., AND WILLIAMSON,R. SIGCOMM Comput. Commun. Rev. 25,4 1990. A survey of lightweight transport pro- (Oct.), 151–161. tocols for high speed networks. IEEE Trans. BRAUN,T.AND DIOT, C. 1996. Automated code Commun. 38, 11 (Nov.), 2025–2039. generation for integrated layer processing. DORLING, B., LENHARD, P., LENNON, P., AND USKOK- In Proceedings of the Conference on Protocols OVIC, V. 1997. Inside APPN and HPR: The for High-Speed Networks, V (France, Oct.), W. Essential Guide to the New SNA. Prentice- Dabbous and C. Diot, Eds. IFIP, 182–197. Hall, New York, NY. BROWN,G.M.,GOUDA,M.G.,AND MILLER,R.E. DOSHI, B., JOHRI, P., NETRAVALI, A., AND SABNANI, 1989. Block acknowledgement: redesigning K. 1993. Error and flow control perfor- the window protocol. SIGCOMM Comput. mance of a high speed protocol. IEEE Trans. Commun. Rev. 19, 4 (Sept.), 128–135. Commun. 41, 5 (May). CHERITON, D. 1988. VMTP: Versatile message DUPUY, S., TAWBI, W., AND HORLAIT,E. transaction protocol: Protocol specification. 1992. Protocols for high-speed multimedia RFC 1045. communications networks. Comput. Com- CHERITON,D.AND WILLIAMSON, C 1989. VMTP mun. 15, 6 (July/Aug. 1992), 349–358. as the transport layer for high performance FALL,K.AND FLOYD, S. 1996. Simulation-based distributed systems. IEEE Commun. Mag. comparisons of Tahoe, Reno and SACK TCP. 27, 6 (June), 37–44. SIGCOMM Comput. Commun. Rev. 26,3, CHIONG, J. 1996. SNA Interconnections: Bridg- 5–21. ing and Routing SNA in Hierarchical, Peer, FELDMEIER, C. C. 1990. Multiplexing issues in and High-Speed Networks. McGraw-Hill se- communication system design. SIGCOMM ries on computer communications. McGraw- Comput. Commun. Rev. 20, 4 (Sept.), 209– Hill, Inc., New York, NY. 219. CLARK, D. 1982. Window and acknowlegement FELDMEIER, D. 1993a. A framework of architec- strategy in TCP. RFC 813. tural concepts for high-speed communications CLARK, D., LAMBERT, M., AND ZHANG,L. systems. IEEE J. Sel. Areas Commun. 11,4 1987. NETBLT: A bulk data transfer proto- (May), 480–488. col. RFC 998. FELDMEIER, D. 1993b. An overview of the CLARK,D.D.AND TENNENHOUSE, D. L. 1990. TPϩϩ transport protocol project. In High Architectural considerations for a new gener- ation of protocols. SIGCOMM Comput. Com- Performance Networks--Frontiers and Experi- mun. Rev. 20, 4 (Sept.), 200–208. ence, A. Tantawy, Ed. Kluwer Academic Publishers, Hingham, MA, 157–176. COLE, R., SHUR, D., AND VILLAMIZAR, C. 1996. IP over ATM: A framework document. RFC FELDMEIER, D. 1993c. A survey of high perfor- 1932. mance protocol implementation techniques. In High Performance Networks---Technology CONNOLLY, T., AMER, P., AND CONRAD, P. 1994. An extension to TCP: Partial order ser- and Protocols, A. Tantawy, Ed. Kluwer Ac- vice. RFC 1693. ademic Publishers, Hingham, MA, 29–50. CONRAD, P. 2000. Partial order and partial reli- FELDMEIER,D.AND BIERSACK, E. 1990. ability transport service innovations in a mul- Comparison of error control protocols for high timedia application context. (In pro- bandwidth-delay product networks. In Pro- gress). Ph.D. Dissertation. University of ceedings of the Conference on Protocols for Delaware, Newark, DE. High-Speed Networks, II (Palo Alto, CA, CONRAD, P., GOLDEN, E., AMER, P., AND MARASLI,R. Nov.), M. Johnson, Ed. North-Holland Pub- 1996. A multimedia document retrieval sys- lishing Co., Amsterdam, The Netherlands, tem using partially-ordered/partially-reliable 271–295. transport service. In Proceedings of the Con- FLOYD, S. 1996. Issues of TCP with SACK. ference on Multimedia Computing and Net- Tech. Rep.. Lawrence Livermore National working (San Jose, CA, Jan.). Laboratory, Livermore, CA. DABBOUS,W.S.AND DIOT, C. 1997. High-perfor- FLOYD,S.AND JACOBSON, V. 1993. Random mance protocol architecture. Comput. Netw. early detection gateways for congestion avoid- ISDN Syst. 29, 7, 735–744. ance. IEEE/ACM Trans. Netw. 1, 4 (Aug. DEERING, S. 1989. Host extensions for IP multi- 1993), 397–413. casting. RFC 1112. FLOYD, S., JACOBSON, V., MCCANNE, S., LIU, C.-G., DIOT,C.AND GAGNON, F. 1999. Impact of out-of- AND ZHANG, L. 1995. A reliable multicast sequence processing on the performance of framework for light-weight sessions and ap- data transmission. Comput. Netw. J. 31,5 plication level framing. SIGCOMM Comput. (Mar.), 475–492. Commun. Rev. 25, 4 (Oct.), 342–356.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 403

FREEMAN, R. L. 1995. Practical Data Communi- JACOBSON, V. 1988. Congestion avoidance and cations. Wiley series in telecommunications control. SIGCOMM Comput. Commun. Rev. and signal processing. Wiley-Interscience, 18, 4 (Aug. 1988), 314–329. New York, NY. JAIN, R. 1989. A delay-based approach for con- GOLDEN, E. 1997. TRUMP: Timed-reliability gestion avoidance in interconnected heteroge- unordered message protocol. Master’s The- neous computer networks. SIGCOMM Com- sis. University of Delaware, Newark, DE. put. Commun. Rev. 19, 5 (Oct. 1989), 56– HAAS, Z. 1999. A communication architecture 71.delay.ps|. for high-speed networking. In Proceedings of KARN,P.AND PARTRIDGE, C. 1987. Improving the Conference on IEEE INFOCOM 2000 (San round-trip time estimates in reliable trans- Francisco, CA), IEEE Computer Society port protocols. SIGCOMM Comput. Com- Press, Los Alamitos, CA, 433–441. mun. Rev. 17, 5 (Aug. 1987), 2–7. HAAS,Z.AND AGRAWAL, P. 1997. Mobile-TCP: KENT,C.A.AND MOGUL, J. C. 1987. An asymmetric transport protocol design for Fragmentation considered harmful. SIG- mobile systems. In Proceedings of the IEEE COMM Comput. Commun. Rev. 17, 5 (Aug. Conference on Computer Communications, 1987), 390–401. IEEE Press, Piscataway, NJ. KESHAV, S. 1991. A control-theoretic approach HAN,R.AND MESSERSCHMITT, D. 1996. to flow control. SIGCOMM Comput. Com- Asymptotically reliable transport of multime- mun. Rev. 21, 4 (Sept. 1991), 3–15. dia/graphics over wireless channels. In Pro- KESHAV, S. 2000. Packet-pair flow control. (To ceedings of the Conference on Multimedia appear). IEEE/ACM Trans. Netw. 8. Computing and Networks (San Jose, KESHAV,S.AND MORGAN, S. 1997. SMART re- CA), IEEE Computer Society Press, Los transmission: Performance with overload and Alamitos, CA, 99–110. random losses. In Proceedings of on IEEE HEIDEMANN, J. 1997. Performance interactions INFOCOM 1997 (Kobe City, Japan, between P-HTTP and TCP implementations. Apr.), IEEE Computer Society Press, Los SIGCOMM Comput. Commun. Rev. 27,2,65– Alamitos, CA. 73. LA PORTA,T.F.AND SCHWARTZ, M. 1992. HENDERSON, T. R. 1995. Design principles and Architectures, features, and implementation performance analysis of SSCOP: a new ATM of high-speed transport protocols. In Com- adaptation layer protocol. SIGCOMM Com- puter Communications: Architectures, Proto- put. Commun. Rev. 25, 2 (Apr. 1995), 47–59. cols, and Standards, W. Stallings, Ed. IEEE IREN, S. 1999. Network-conscious image com- Press, Piscataway, NJ, 217–225. pression. Ph.D. Dissertation. University of LA PORTA,T.F.AND SCHWARTZ, M. 1993a. The Delaware, Newark, DE. multistream protocol: A highly flexible high- IREN,S.AND AMER, D. 2000. SPIHT-NC: Net- speed transport protocol. IEEE J. Sel. Areas work-conscious zerotree encoding. In Pro- Commun. 11, 4 (May), 519–530. ceedings of the on 2000 IEEE Data Compres- LA PORTA,T.F.AND SCHWARTZ, M. 1993b. sion Conference (Snowbird, Ut., Mar.), IEEE Performance analysis of MSP: feature-rich Computer Society Press, Los Alamitos, CA. high-speed transport protocol. IEEE/ACM ISO. 1989. International standard 7498-2. In- Trans. Netw. 1, 6 (Dec. 1993), 740–753. formation processing systems, open systems LIN,S.AND COSTELLO, D. 1982. Error Control interconnection, basic reference model, part 2: Coding. Prentice-Hall, New York, NY. Security architecture. International Stan- LISKOV, B., SHRIRA, L., AND WROCLAWSKI,J. dards Organization. 1990. Efficient at-most-once messages based ITU-T. 1994a. Recommendation Q.2110. B-ISDN on synchronized clocks. SIGCOMM Comput. ATM adaptation layer-service specific connec- Commun. Rev. 20, 4 (Sept.), 41–49. tion oriented protocol (SSCOP). LUNDY,G.M.AND TIPICI, H. A. 1994. ITU-T. 1994b. Recommendation X.200. Infor- Specification and analysis of the SNR high- mation technology, open systems interconnec- speed transport protocol. IEEE/ACM Trans. tion: Basic reference model. Netw. 2, 5 (Oct. 1994), 483–496. ITU-T. 1995a. Recommendation I.365.3. B-ISDN MARASLI, R. 1997. Performance analysis of par- ATM adaptation layersublayers: Service-spe- tially ordered and partially reliable transport cific coordination function to provide the con- services. Ph.D. Dissertation. University of nection-oriented transport service (SSCF- Delaware, Newark, DE. COTS). MARASLI, R., AMER, P., AND CONRAD, P. 1996. ITU-T. 1995b. Recommendation Q.2144. B-ISDN Retransmission-based partially reliable ser- signalling ATM adaptationlayer (SAAL)-layer vices: An analytic model. In Proceedings of management for the SAAL at the network on IEEE INFOCOM 1996 (San Francisco, Ca- node interface (NNI). lif., Mar.), IEEE Computer Society Press, ITU-T. 1995c. Recommendation X.214. Infor- Los Alamitos, CA. mation technology, open systems interconnec- MARASLI, R., AMER,P.D.,AND CONRAD,P.T. tion, transport service definition. 1997. An analytic study of partially ordered

ACM Computing Surveys, Vol. 31, No. 4, December 1999 404 • S. Iren et al.

transport services. Comput. Netw. ISDN puter Networks, Architecture and Applications Syst. 29, 6, 675–699. (NETWORKS ’92, Trivandrum, India, Oct. MARTIN,J.AND LEBEN, J. 1992. DECnet Phase 28–29), S. V. Raghavan, G. V. Bochmann, and V: An OSI Implementation. Prentice-Hall, G. Pujolle, Eds. Elsevier Sci. Pub. B. V., Inc., Upper Saddle River, NJ. Amsterdam, The Netherlands, 171–180. MATHIS, M., MAHDAVI, J., FLOYD, S., AND ROMANOW, SCHULZRINNE, H. 1996. RTP profile for audio A. 1996. TCP selective acknowledgment and video conferences with minimal control. options. RFC 2018. RFC 1890. MCAULEY, D. 1990. Protocol design for high SCHULZRINNE, H., CASNER, S., FREDERICK, R., AND speed networks. Ph.D. Dissertation. Uni- JACOBSON, V. 1996. RTP: A transport pro- versity of Cambridge, Cambridge, UK. tocol for real-time applications. RFC 1889. MCCANNE, S., JACOBSON, V., AND VETTERLI,M. SHENKER, S., PARTRIDGE, C., AND GUERIN,R. 1996. Receiver-driven layered multi- 1997. Specification of guaranteed quality of cast. SIGCOMM Comput. Commun. Rev. 26, service. RFC 2212. 4, 117–130. SHENKER,S.AND WROCLAWSKI, J. 1997. General MCNAMARA, J. E. 1988. Technical Aspects of characterization parameters for integrated . 3rd ed. Digital service network elements. RFC 2215. Press, Newton, MA. SMITH,W.AND KOIFMAN, A. 1996. A distributed NAGLE, J. 1995. Congestion control in IP/TCP interactive simulation intranet using RAMP, internetworks. SIGCOMM Comput. Com- a reliable adaptive multicast protocol. In mun. Rev. 25, 1 (Jan. 1995), 61–65. Proceedings of the 14th Workshop on Stan- NETRAVALI, A., ROOME, W., AND SABNANI,K. dards for the Interoperability of Distributed 1990. Design and implementation of a high Simulations (Orlando, FL, Mar.). speed transport protocol. IEEE Trans. Com- STALLINGS, W. 1997. Data and Computer mun. 38, 11, 2010–2024. Communications. 5th ed. Prentice-Hall, PARTRIDGE, C., HUGHES, J., AND STONE,J. Inc., Upper Saddle River, NJ. 1995. Performance of checksums and CRCs STALLINGS, W. 1998. High-Speed Networks: over real data. SIGCOMM Comput. Com- TCP/IP and ATM Design Principles. Pren- mun. Rev. 25, 4 (Oct.), 68–76. tice-Hall, Inc., Upper Saddle River, NJ. PARULKAR,G.AND TURNER, J. 1989. Towards a STEVENS, W. R. 1994. TCP/IP Illustrated: The framework for high speed communication in a Protocols (Vol. 1). Addison-Wesley Longman heterogeneous networking environment. In Publ. Co., Inc., Reading, MA. Proceedings of on IEEE INFOCOM 1989 (Ot- STEVENS, W. R. 1996. TCP/IP Illustrated: TCP tawa, Ont., Canada, Apr.), IEEE Computer for Transactions, HTTP, NNTP, and the Society Press, Los Alamitos, CA, 655–667. Domain Protocols (Vol. 3). Addison-Wesley PISCITELLO,D.AND CHAPIN, A. 1993. Open Sys- tems Networking, TCP/IP and OSI. 1st Professional Computing Series. Addison- ed. Addison-Wesley, Reading, MA. Wesley Publishing Co., Inc., Redwood City, CA. POSTEL, J. 1980. User datagram protocol. RFC 768. STEVENS, W. 1997. TCP slow start, congestion avoidance, fast retransmit, and fast recovery POSTEL, J. 1981. Transmission control protocol. RFC 793. algorithms. RFC 2001. TEVENS PRUE,W.AND POSTEL, J. 1987. Something a S , W. R. 1990. UNIX Network Program- host could do with source quench: The source ming. Prentice-Hall, Inc., Upper Saddle quench introduced delay (SQuID). RFC River, NJ. 1016. STRAYER,W.T.,DEMPSEY,B.J.,AND WEAVER,A.C. RAMAKRISHNAN,K.K.AND JAIN, R. 1988. A bi- 1992. XTP: The Xpress Transfer Protocol. nary feedback scheme for congestion avoid- Addison-Wesley Publishing Co., Inc., Red- ance in computer networks with a connection- wood City, CA. less network layer. SIGCOMM Comput. STRAYER,T.AND WEAVER, A. 1988. Evaluation Commun. Rev. 18, 4 (Aug. 1988), 303–313. of transport protocols for real-time communi- ROBERTSON, D. 1996. Accessing Transport Net- cations. Tech. Rep. TR-88-18. Department works: MPTN and AnyNet Solutions. of Computer Science, University of Virginia, McGraw-Hill series on computer communica- Charlottesville, VA. tions. McGraw-Hill, Inc., Hightstown, NJ. TANENBAUM, A. S. 1996. Computer Networks. ROSE,M.AND CASS, D. 1987. ISO transport ser- 3rd. ed. Prentice-Hall, Inc., Upper Saddle vice on top of the TCP, version: 3. RFC 1006. River, NJ. SANDERS,R.AND WEAVER, A. 1990. The Xpress THAI, K., CHASSOT, C., FDIDA, S., DIAZ, M., AND transfer protocol (XTP): A tutorial. SIG- DIAZ, M. 1994. Transport layer for coopera- COMM Comput. Commun. Rev. 20, 5 (Oct.), tive multimedia applications. Tech. Rep. 67–80. 94196. Editions du CNRS, Paris, France. SANGHI,D.AND AGRAWALA, A. K. 1993. DTP: An WALRAND, J. 1991. Communication Networks: A efficient transport protocol. In Proceedings First Course. Richard D. Irwin, Inc., Burr of the IFIP TC6 Working Conference on Com- Ridge, IL.

ACM Computing Surveys, Vol. 31, No. 4, December 1999 Transport Layer • 405

WATSON, R. 1989. The Delta-T transport proto- YANG,C.AND REDDY, A. 1995. A taxonomy for col: Features and experience. In Proceedings congestion control algorithms in packet of the 14th Conference on Local Computer switching networks. IEEE Network, 42–48. Networks (Minneapolis, MN, Oct. 1989), YAVATKAR,R.AND BHAGWAT, N. 1994. WEAVER, A. 1994. The Xpress transfer protocol. Improving end-to-end performance of TCP Comput. Commun. 17, 1 (Jan.), 46–52. over mobile internetworks. In Proceedings of WILLIAMSON,C.AND CHERITON, D. 1989. An the Workshop on Mobile Computing Systems overview of the VMTP transport protocol. In and Applications (Mobile ’94, Dec.). Proceedings of the 14th Conference on Local ZHANG, L 1986. Why TCP timers don’t work well. Computer Networks (Minneapolis, MN, Oct. In Proceedings of the ACM Conference on 1989), Communications Architecture and Protocols WROCLAWSKI, J. 1997. Specification of the con- (SIGCOMM ’86, Stowe, VT, Aug. 5–7), W. trolled-load network element service. RFC Kosinsky, J. J. Garcia-Luna, and F. F. Kuo, 2211. Eds. ACM Press, New York, NY, 397–405.

Received: May 1997; revised: October 1998; accepted: November 1998

ACM Computing Surveys, Vol. 31, No. 4, December 1999