University of California, Berkeley Fall 2000

Modular TCP

Yunfei Deng1, Kenneth Cheung2, Daniil Khidekel3
Professor Jean Walrand
EE228a Communication Networks

Abstract

Since their introduction, the TCP/IP protocols have achieved great success in computer networks. As the Internet evolves ever faster, however, a growing variety of network conditions has emerged in which the current TCP/IP protocols perform poorly. In this paper, we describe the new challenges faced by modern transport protocols and applications. We address these issues, following the principle of Application Level Framing (ALF), by introducing a new TCP-compatible transport protocol, Modular TCP. Modular TCP aims to be an application-controlled, connection-based, flexible transport layer protocol. We present the design of Modular TCP in detail, considering the semantics of reliability and ordering and what needs to change in TCP's flow control and congestion control. We also discuss implementation issues for this experimental protocol and our planned work. The chief contributions of this project are the design of a modular transport protocol based on TCP and an attempt to demonstrate its flexibility and performance gain in an experimental implementation.

1 Yunfei Deng, [email protected], for EE228a
2 Kenneth Cheung, [email protected], for CS262a
3 Daniil Khidekel, [email protected], for CS262a

1. Introduction

Modular TCP is an extension of standard TCP [RFC 793, TCP] that we introduce to meet the demanding requirements of applications in today's Internet environment. The Internet community has made enormous efforts to improve the performance and functionality of the existing TCP/IP protocol suite [Floyd]. More often, though, researchers design and experiment with entirely new transport protocols to keep up with the development of communication networks [WebTP]. We take a conservative approach and build on existing TCP by adding support for the Application Level Framing mechanism and its related semantics. Modular TCP is designed to be application-oriented with fine-grained control, and it leverages existing standards and ongoing research to ensure incremental deployment and high performance.

Modular TCP is potentially a major project, and this report introduces only its initial stage. Section 2 illustrates how the Internet's development, from its early history to recent years, has affected network protocols. Section 3 describes the Internet community's efforts to improve transport protocols and the problems that a next-generation transport protocol design must consider. Section 4 presents Modular TCP, from the conceptual design to the details of connection semantics on the sender side and the receiver side. We survey the acknowledgement schemes used for congestion control in transport protocols and propose the use of Selective Acknowledgement and Explicit Congestion Notification. Section 5 discusses some issues in our planned experimental implementation of Modular TCP. Section 6 concludes the report and presents the plan for further work on this project.

2. Challenges in the New Internet Era

Although computer networks have a history of less than 30 years, their development has accelerated greatly. With the beginning of the 21st century we have entered a new Internet era, in which network environments differ entirely from those of the original networks. These changes have a great impact on what is required of the infrastructure of our networking protocols.

In this new Internet era, the number of network hosts and users has grown by more than seven orders of magnitude. According to the MIDS Internet Growth report [MIDS], the number of online users worldwide has reached 377 million, and the number of Internet hosts with IP addresses has reached 36,739,000. Network traffic has increased correspondingly. This growth rests on the wide deployment of high-speed Internet backbones, on the many kinds of Internet connections available to users, and on the fast-developing personal computer industry. Because networks are so widely distributed, the conditions and usage of Internet devices vary greatly. For example, some people have broadband Internet access (cable modem or DSL) with speeds up to 1 Mbps, while many are still stuck with slow telephone modems at only 56 Kbps or even 33.6 Kbps. Wireless communication networks also bring people into a completely new network environment: a wireless network has a much larger loss ratio than an ordinary wired network. These differences in network conditions require transport protocols that adapt easily.

In this evolution of information, people have created and used many kinds of content that did not exist in the early days of the Internet. Content delivered on the Internet today is multimedia, including audio, video, and even 3D virtual worlds. Different types of media place different requirements on the quality of transport, such as reliability, timing, ordering, and integrity. Users may also have different preferences for the delivery of different content. Transport protocols need to adapt to these needs as well.

All this variance in conditions and requirements calls for transport protocols that present a simple, uniform, and flexible network interface to application developers. But the existing protocols, mainly TCP and UDP over IP, were developed in much simpler environments, so they fail to meet these needs or meet them inefficiently.

In this section we present three example applications to illustrate the situation. The first example is a simple FTP program. FTP running on top of TCP has to use the stream data interface that TCP provides, even though FTP has much simpler semantics and a fixed correlation between the pieces of transferred data. A good transport protocol should be able to exploit FTP's intrinsically weak ordering constraints when transferring files: out-of-order packets can be delivered directly to FTP if FTP knows where in the file to put them. Avoiding the re-transmission of out-of-order packets, together with possible optimization of kernel operations, saves download time and puts less traffic on the network.

The second example is a stock quote program that reports the latest stock prices to day traders. Because day traders want the latest quote, the stock quote program does not trigger re-transmission when packets are lost or corrupted; instead, it transmits the new stock prices if they are available.

The third example is an online video player. Users ask for smooth playback with minimum delay and maximum resolution under the conditions of the current connection. A lost packet in a frame may prevent the entire frame from being delivered to the video player in time, so it is better for the video player to ignore the delayed frame and continue with the data available. The delayed frame might still be useful for playing later frames because of the coding mechanism, so the video player may keep it in the buffer, or drop it once it is no longer useful. This algorithm works only if the transport protocol knows that frames exist and delivers them to the player as units. The sketch in [Fig. 1.] shows the processing of video frames.

[Fig. 1. Video frame processing. A video server and a video player, each running a transport protocol over IP, communicate across the network. Legend: normal frame, delayed frame, normal packet, lost packet; frames at the player are playing, buffered, or dropped.]

3. Transport Protocol Design

As described in the last section, TCP/IP as designed in the early days of the Internet cannot meet the requirements created by the development of communication networks and applications. The Internet community has kept improving the design and implementation of TCP/IP. These efforts include improvements to TCP's flow control and congestion control [Stevens] and TCP for transactions [RFC 1379, 1644, T/TCP]. But these TCP-based improvements still keep the original TCP's concept of reliable, in-order delivery of a data stream. Applications that need unreliable data transfer cannot use TCP as their transport protocol, so many of them are built on top of UDP.

Applications can use UDP directly, but UDP provides few features, and it is unreasonable to expect application developers to implement every function they need that UDP lacks. Many protocols have therefore been designed on top of UDP, each providing features specific to certain kinds of applications, such as RTP, RSTP, and SRDP, as shown in [Fig. 2]. While these protocols solve some of the problems faced by application developers, they make the network worse for, or unfair to, TCP applications. The reason is that most of these protocols do not consider cooperation between protocols or do not implement congestion control in the transport layer. When congestion happens, TCP detects it and backs off, while UDP-based protocols are unaware of the congestion and ignore it, or react to it late. As more and more Internet applications are built on those protocols, the Internet is in danger of becoming unstable.

[Fig. 2. Transport protocol design space. Labels in the figure: Congestion Control, Extended Reliability, TCP, MTCP, transport layer protocols built on UDP (RSTP etc.), UDP, Transport Layer, IP, Network.]

Aware of this situation, researchers at Berkeley presented a new transport protocol, WebTP, motivated by the increasing popularity of Web-based applications and by a user-centric design principle. The WebTP project supports fine-grained, application-specific control as well as congestion control, though it aims at broader goals such as single-user and multi-user satisfaction optimization with QoS guarantees [WebTP].

Since most of these transport protocols were designed from scratch, their implementation and adoption may make it hard for applications to use them fully. TCP was created much earlier, but it has been greatly improved since then. TCP implementations are widely supported and have been tuned for high performance. TCP also has the desired features of flow control and congestion control. The natural way to reuse TCP's design and implementations is to extend TCP to meet the new requirements, and we designed Modular TCP on this idea. The IETF summarized the transport layer requirements arising from the new development of the Internet in [IETF RUTS]. Modular TCP is designed to satisfy most of these requirements; since the steadily improved TCP already satisfies some of them, designing Modular TCP is easier. Overall, Modular TCP is mainly designed to add these features to TCP:

• support for application level framing
• visibility into network conditions
• control over reliability
• the ability to supersede previous application messages
• transport handled at a 'frame' granularity (record marking)
• per-message priority control
• congestion control (extended to all types of ADU)

4. Modular TCP design

4.1 Overview of Modular TCP

Modular TCP supports the principle of Application Level Framing (ALF) [Clark90]. The idea is that applications hand data to the transport layer protocol as Application Data Units (ADUs). The fundamental characteristic of an ADU is that it can be processed out of order with respect to other ADUs. This rule permits ADU boundaries to take the place of packet boundaries for end-to-end error detection or correction and for other manipulation operations such as encryption and presentation processing. ADUs also let an application transfer different ADUs with different transport quality requirements, such as reliability, ordering, and timing.

Modular TCP is designed to support four types of ADUs, distinguished by their reliability and ordering requirements. Together with a mixture of these types, a Modular TCP connection permits five levels of ADU flow:

• Reliable, in-order delivery
• Reliable, unordered delivery
• Unreliable, in-order delivery
• Unreliable, unordered delivery
• Mixture delivery

The pure requirements of the first four levels are simple and easy to keep consistent. Mixture delivery is designed for applications with multiple delivery requirements in the same connection. It would be simpler for such an application to use multiple connections with different delivery requirements. But limited mixture delivery is good for performance when an application asks for unreliable flows most of the time and needs a reliable flow only for some special ADUs, such as ADUs that carry important synchronization or timing information. The ordering constraints for mixture delivery require additional semantics to be defined.

Modular TCP connections are established only when both the sender and the receiver agree to use Modular TCP semantics. This is achieved by defining a Modular TCP permit option in the TCP header. The proposed format is:

Modular TCP permit option: {[option kind = 200], [option length = 4], [ADU policy], [ADU max size]}.
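As an illustration of the proposed format, the option can be packed and parsed as follows. This is a minimal sketch, not the report's implementation: the report fixes only the option kind (200) and length (4), so the one-byte width of the ADU policy and ADU max size fields, the meaning of their values, and the function names encode_permit and decode_permit are all assumptions made here.

```python
import struct

MTCP_PERMIT_KIND = 200  # option kind proposed in the report
MTCP_PERMIT_LEN = 4     # option length proposed in the report


def encode_permit(adu_policy: int, adu_max_size: int) -> bytes:
    """Pack the option as four single-byte fields (an assumed layout)."""
    return struct.pack("!BBBB", MTCP_PERMIT_KIND, MTCP_PERMIT_LEN,
                       adu_policy, adu_max_size)


def decode_permit(option: bytes):
    """Return (adu_policy, adu_max_size) or raise ValueError."""
    if len(option) != MTCP_PERMIT_LEN:
        raise ValueError("wrong option size")
    kind, length, policy, max_size = struct.unpack("!BBBB", option)
    if kind != MTCP_PERMIT_KIND or length != MTCP_PERMIT_LEN:
        raise ValueError("not a Modular TCP permit option")
    return policy, max_size
```

With only one byte for the maximum ADU size, the value would have to be expressed in some larger unit; the report does not specify one.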

During the connection, both sender and receiver need to track all the ADU information, including sequence numbers, sizes, reliability requirements, ordering requirements, and other possible options such as priority. These data are stored in an ADU header. It would be possible to carry them in another TCP header option, defined in the same way as the Modular TCP permit option. But we argue that such an option is not flexible and would limit the option space available to other TCP options, such as the SACK option that Modular TCP uses.

There are many changes to TCP to make it support ALF, but the basic connection flow is clear and simple. It is shown in [Fig. 3.] and discussed step by step in the following sub-sections.
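The fields listed above could be grouped into a structure such as the one below. The report does not define a wire layout for the ADU header, so the field widths, the flag bits, and the name AduHeader are hypothetical and serve only to make the idea concrete.

```python
import struct
from dataclasses import dataclass

FLAG_RELIABLE = 0x01  # hypothetical flag bit
FLAG_ORDERED = 0x02   # hypothetical flag bit
_LAYOUT = "!IIBB"     # assumed layout: sequence, size, flags, priority


@dataclass(frozen=True)
class AduHeader:
    sequence: int      # position of this ADU in the flow
    size: int          # ADU length in bytes
    reliable: bool     # must lost data be re-transmitted?
    ordered: bool      # must delivery respect ADU order?
    priority: int = 0  # optional per-ADU priority

    def pack(self) -> bytes:
        flags = (FLAG_RELIABLE if self.reliable else 0) \
              | (FLAG_ORDERED if self.ordered else 0)
        return struct.pack(_LAYOUT, self.sequence, self.size,
                           flags, self.priority)

    @classmethod
    def unpack(cls, data: bytes) -> "AduHeader":
        n = struct.calcsize(_LAYOUT)
        sequence, size, flags, priority = struct.unpack(_LAYOUT, data[:n])
        return cls(sequence, size, bool(flags & FLAG_RELIABLE),
                   bool(flags & FLAG_ORDERED), priority)
```

The two booleans together select one of the four pure delivery levels; a mixture flow would simply carry headers with differing flags.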

4.2 Sender of the connection

The sender side of a Modular TCP connection keeps most of TCP's flow control and congestion control. Sometimes its job is easier, since unreliable packets do not require re-transmission.

On receiving an ADU from the application, the sender fragments the ADU if needed and places the packets in the sending queue. The sender still keeps a sliding window to determine the flow of packets it sends. Once a data packet has been sent, an unreliable packet's data can be discarded since it will not be needed again, but a reliable packet's data must stay queued for re-transmission in case the packet is lost or corrupted. The sender also needs to keep records of outstanding unreliable ADUs, which it compares with acknowledgements for flow control and congestion control. When it receives an acknowledgement from the receiver, which should be a SACK, the sender checks the acknowledged contiguous blocks of data and marks them. If the SACK then indicates that the receiver is waiting for a lost reliable packet, the sender re-transmits that packet and also adjusts its congestion control parameters. The sender also starts re-transmission when a reliable packet times out; Fast Retransmit helps in this case. Acknowledgements for unreliable packets are recorded so that, once a whole unreliable ADU has been dropped at the receiver, the sender can advance the cumulative sequence number past the lost unreliable packets.
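The sender behaviour described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the planned implementation: the class and method names are hypothetical, each acknowledgement is treated as a complete list of half-open (left, right) byte ranges, and the sliding window, timers, and congestion control adjustments are left out.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int       # byte offset of the first payload byte
    data: bytes
    reliable: bool


class Sender:
    def __init__(self, mss: int):
        self.mss = mss
        self.next_seq = 0
        self.retransmit_queue = {}        # seq -> Packet (reliable only)
        self.outstanding_unreliable = {}  # seq -> length (record only)

    def send_adu(self, adu: bytes, reliable: bool):
        """Fragment an ADU into packets of at most mss bytes."""
        packets = []
        for off in range(0, len(adu), self.mss):
            p = Packet(self.next_seq, adu[off:off + self.mss], reliable)
            self.next_seq += len(p.data)
            if reliable:
                self.retransmit_queue[p.seq] = p  # keep data for re-sending
            else:
                self.outstanding_unreliable[p.seq] = len(p.data)
            packets.append(p)
        return packets

    def on_sack(self, blocks):
        """Mark acknowledged data; return reliable packets to re-transmit."""
        def acked(seq, length):
            return any(l <= seq and seq + length <= r for l, r in blocks)

        for s in [s for s, p in self.retransmit_queue.items()
                  if acked(s, len(p.data))]:
            del self.retransmit_queue[s]
        for s in [s for s, n in self.outstanding_unreliable.items()
                  if acked(s, n)]:
            del self.outstanding_unreliable[s]
        highest = max((r for _, r in blocks), default=0)
        # A reliable packet below the highest acknowledged byte is a hole.
        return [p for s, p in sorted(self.retransmit_queue.items())
                if s < highest]
```

Note that only reliable packets ever appear in the returned list; an unacknowledged unreliable packet stays in the record until the sender decides to skip past it.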

4.3 Receiver of the connection

The receiver side of a Modular TCP connection is more complex than in the original TCP, since it must deal with all the kinds of delivery requirements mentioned in the overview and must also acknowledge received packets correctly, giving the data sender timely information about flow status and congestion signals.

On receiving an incoming packet, the data receiver first performs the normal checksum calculation on the packet passed up from IP. If the packet is not corrupted, it is passed along to the received data queue manager and acknowledged appropriately. If the packet is corrupted, it is dropped and also acknowledged.

[Fig. 3. Overview of Modular TCP. A sender and a receiver, each running MTCP over IP, are connected by the network. Legend: ADU, partial ADU, packet, corrupted packet, lost packet, re-transmitted packet, packet queue, ACK, lost ACK, SACK list.]

Modular TCP enables unreliable data flows, which usually would not need to be acknowledged. Here, however, acknowledgements are still needed to carry information about flow status and congestion signals. Modular TCP therefore uses Selective Acknowledgements (SACK) [RFC 2018, SACK] for all kinds of ADU flows. SACK is discussed in detail later.

The queue manager receives the data and checks it against the queued data blocks to find any complete ADU that can be reassembled. If a complete ADU is available and it is either in order or belongs to a flow that permits out-of-order delivery, the queue manager dequeues the data and passes it to the application. The application reads the data through the ordinary socket interface and processes the ADU in its own way. It is also possible to design an asynchronous system call for reading ADUs. If the available ADU is out of order and out-of-order delivery is not permitted, it has to be buffered in the ADU queue at the receiver. The queue manager also updates its statistics and advertises the optimal window size in the acknowledgements. If an unreliable packet arrives out of order or late, the queue manager must check the queue for the packet's ADU, which may be discarded when buffer space is low. If its ADU has been discarded, the packet must be discarded as well.
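The delivery rules above can be illustrated with a small model of the queue manager. The sketch makes several assumptions that the report does not state: ADUs are numbered consecutively from 0, each packet carries its ADU number, fragment index, and fragment count, a connection uses a single pure delivery level (no mixture), and duplicate packets in an unordered flow are not filtered. The name QueueManager and its methods are hypothetical.

```python
class QueueManager:
    def __init__(self, ordered: bool):
        self.ordered = ordered
        self.partial = {}       # adu_id -> {fragment index: bytes}
        self.complete = {}      # adu_id -> payload waiting for its turn
        self.discarded = set()  # ADUs dropped, e.g. when buffer is low
        self.next_id = 0        # next ADU an in-order flow may deliver

    def on_packet(self, adu_id, index, total, data):
        """Queue one fragment; return ADUs ready for the application."""
        if adu_id in self.discarded:
            return []           # late packet of a discarded ADU
        if self.ordered and adu_id < self.next_id:
            return []           # ADU already delivered
        frags = self.partial.setdefault(adu_id, {})
        frags[index] = data
        if len(frags) < total:
            return []           # ADU still partial
        payload = b"".join(frags[i] for i in range(total))
        del self.partial[adu_id]
        if not self.ordered:
            return [payload]    # out-of-order delivery is permitted
        self.complete[adu_id] = payload
        return self._drain()

    def discard(self, adu_id):
        """Drop an unreliable ADU; this may unblock later ADUs."""
        self.partial.pop(adu_id, None)
        self.discarded.add(adu_id)
        return self._drain() if self.ordered else []

    def _drain(self):
        ready = []
        while True:
            if self.next_id in self.discarded:
                self.next_id += 1
            elif self.next_id in self.complete:
                ready.append(self.complete.pop(self.next_id))
                self.next_id += 1
            else:
                return ready
```

In an ordered flow a complete ADU waits in the buffer until every earlier ADU has been delivered or discarded, which is the case the text describes for unreliable, in-order delivery.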

4.4 Flow control and Congestion control

Modular TCP aims to be compatible with the original TCP, and its main goal is to extend flow control and congestion control to all types of service. TCP's flow control and congestion control therefore need to change to accommodate Modular TCP flows.

The flow control and congestion control management module stays basically the same as in TCP. What we change is the acknowledgement scheme, so that the data receiver can inform the data sender of any change in window size and of congestion signals in a timely manner, regardless of the ADU type. We employ the Selective Acknowledgement scheme in Modular TCP so that the data sender can detect congestion in time and apply congestion control to ease the network load. The new ECN mechanism also looks promising as a congestion indicator. Both are discussed in detail in the following sub-sections.

4.4.1 SACK

Selective Acknowledgement (SACK) was proposed for TCP by Mathis, Mahdavi, Floyd, and Romanow [RFC 2018, SACK] to improve on TCP's original simple positive cumulative acknowledgement scheme, in which received segments that are not at the left edge of the receive window are not acknowledged. The SACK option lets the data receiver report that non-contiguous blocks of data have been received even though some blocks before them were lost or corrupted. The data sender can then use this information to selectively re-transmit only the data that is needed. SACK adds a SACK-permitted option to the TCP header, used only in SYN packets, for negotiation during connection setup. In a SACK-enabled connection, the data receiver fills the SACK TCP option with a list of blocks of contiguous sequence space occupied by data that has been received and queued within the window. A SACK option with n blocks takes 8*n+2 bytes. Since TCP option space is limited to 40 bytes, the option can carry at most 4 blocks, and usually only 3 because other options such as Timestamp share the space.
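The size arithmetic and the block list can be made concrete with a short sketch. The option kind (5), the 8*n+2 length, and the 40-byte option space come from RFC 2018; the helper names are ours, and the sketch ignores sequence number wrap-around and the RFC's rule that the most recently received block is reported first.

```python
import struct

SACK_KIND = 5          # option kind assigned to SACK in RFC 2018
TCP_OPTION_SPACE = 40  # bytes available for all TCP options


def sack_option_length(n_blocks: int) -> int:
    return 8 * n_blocks + 2


def max_sack_blocks(other_option_bytes: int = 0) -> int:
    """Blocks that fit once other options have taken their share."""
    return max(0, (TCP_OPTION_SPACE - other_option_bytes - 2) // 8)


def sack_blocks(received):
    """Merge half-open (left, right) ranges into contiguous blocks."""
    blocks = []
    for left, right in sorted(received):
        if blocks and left <= blocks[-1][1]:
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], right))
        else:
            blocks.append((left, right))
    return blocks


def encode_sack(blocks) -> bytes:
    """Kind, length, then one pair of 32-bit edges per block."""
    body = b"".join(struct.pack("!II", l, r) for l, r in blocks)
    return struct.pack("!BB", SACK_KIND,
                       sack_option_length(len(blocks))) + body
```

For example, max_sack_blocks() gives 4, while max_sack_blocks(12) gives 3 when the Timestamp option and its padding occupy 12 bytes.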

For normal TCP connections SACK is only advisory, since some implementations do not support the SACK option. Modular TCP is designed to make full use of the SACK option, so it checks each TCP connection request and uses Modular TCP semantics only when both the Modular TCP permit option and the SACK-permitted option are present.
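The negotiation check could look like the sketch below, which scans the raw option bytes of the two SYN segments. The kind values 0 (End of Option List), 1 (No-Operation), and 4 (SACK-permitted) are standard; kind 200 is the value proposed in this report. The function names and the decision to fall back silently to plain TCP are assumptions.

```python
SACK_PERMITTED_KIND = 4  # RFC 2018
MTCP_PERMIT_KIND = 200   # proposed in this report


def option_kinds(options: bytes):
    """Return the set of option kinds found in a TCP options field."""
    kinds, i = set(), 0
    while i < len(options):
        kind = options[i]
        if kind == 0:   # End of Option List
            break
        if kind == 1:   # No-Operation, a single byte
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            raise ValueError("malformed TCP option")
        kinds.add(kind)
        i += options[i + 1]  # skip over kind, length, and data
    return kinds


def use_modular_tcp(local_syn_options: bytes, peer_syn_options: bytes) -> bool:
    """Both ends must offer both options; otherwise use plain TCP."""
    needed = {SACK_PERMITTED_KIND, MTCP_PERMIT_KIND}
    return (needed <= option_kinds(local_syn_options)
            and needed <= option_kinds(peer_syn_options))
```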

4.4.2 Explicit Congestion Notification

TCP's existing congestion monitoring and control algorithms are based on the notion that the network is a black box [Jacobson90]. TCP probes the network state by gradually increasing the load, enlarging the window of packets outstanding in the network until the network becomes congested and a packet loss is indicated by the acknowledgements. This method is appropriate only for pure best-effort data carried by TCP over connections with a low loss ratio and low latency, because TCP's congestion management algorithms rely on the Fast Retransmit and Fast Recovery techniques to minimize the impact of losses on throughput. With the introduction of wireless networks, which have a high loss ratio, TCP needs to adapt to these cases.

TCP probes network bandwidth by gradually increasing the window size until it experiences a dropped packet, which causes the queues at the bottleneck router to build up. The bottleneck router then has to drop packets, which might belong to a loss-sensitive or delay-sensitive connection such as Web browsing or an interactive process. If the router can detect congestion as it processes the flow and send congestion warning signals to the senders of the participating connections, each sender can decrease its congestion window to avoid the congestion that would otherwise soon occur. This is the idea of ECN (Explicit Congestion Notification) [RFC 2481, ECN].
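The difference between probing until loss and reacting to an early warning can be seen in a toy model. This is only an illustration under strong assumptions: a single sender, a window that grows by one packet per round trip and halves on a congestion signal, a router that drops everything above a fixed capacity, and an optional marking threshold standing in for ECN. None of the names or numbers come from the report or from RFC 2481.

```python
def simulate(capacity, rounds, mark_threshold=None):
    """Return (window after each round, total packets dropped)."""
    cwnd, dropped, trace = 1, 0, []
    for _ in range(rounds):
        if cwnd > capacity:
            # Queue overflow: the router drops the excess packets and
            # the sender learns of congestion only through the loss.
            dropped += cwnd - capacity
            cwnd = max(1, cwnd // 2)
        elif mark_threshold is not None and cwnd > mark_threshold:
            # Incipient congestion: the router marks instead of dropping
            # and the sender backs off before any packet is lost.
            cwnd = max(1, cwnd // 2)
        else:
            cwnd += 1  # keep probing for more bandwidth
        trace.append(cwnd)
    return trace, dropped
```

With a capacity of 4 packets, loss-based probing repeatedly overshoots and loses a packet, while a marking threshold of 3 keeps the window below the capacity and loses none.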

ECN proposes an ECN field of two bits in the IP header, used to indicate incipient congestion, so that routers can sometimes mark ECN-enabled packets rather than drop them. ECN needs support from the transport protocols of the end systems: negotiation during connection setup to determine whether both ends are ECN-capable, an ECN-Echo flag in the TCP header to inform the sender that a packet marked Congestion Experienced (CE) has been received, and a Congestion Window Reduced (CWR) flag to inform the receiver that the congestion window has been reduced. Modular TCP needs to support this promising mechanism, though it has not been widely deployed and is implemented by only a few TCP/IP stacks.

5. Experimental Implementation

During the design of Modular TCP, we also looked at implementation issues. To be a high-performance transport protocol, Modular TCP has to be closely integrated with the lower Internet Protocol and even the network layer, though it is best designed modularly. According to the principle of Integrated Layer Processing introduced in [Clark90], it is important to distinguish between the architecture of a protocol suite and the engineering of a specific end system. The key architectural principle should be flexible decomposition: the deferral of engineering decisions to the implementation and the avoidance of inessential constraints. Careful consideration of the implementation during the design phase is therefore important.

We decided that the experimental implementation of Modular TCP should be developed on Linux. The Linux TCP/IP protocol suite is one of the existing TCP/IP implementations with the most advanced features. For example, Linux has supported Selective Acknowledgement (SACK) since kernel 2.1.19, and it supports Explicit Congestion Notification (ECN) in kernel 2.4.0-test10. More importantly, Linux TCP/IP is open source. A vast amount of material about Linux is available online, and there are many kernel developers, which should make the implementation easier. This is also good for the adoption and evolution of the Modular TCP suite on the Internet.

The Linux TCP/IP implementation achieves high performance and throughput thanks to its quality, though this also means it is harder to change. It employs the Integrated Layer Processing principle and also follows the recommendations in TCP Control Block Interdependence [RFC 2140]. It offers applications the generic BSD socket interface, which Modular TCP should keep. BSD sockets are higher-level abstractions of INET sockets. The key common data structure is the socket buffer, or sk_buff, which maintains the strict layering of protocols without wasting time copying parameters and payloads back and forth. Modular TCP would add parameters to the TCP control block to support ALF and ADUs in TCP. We estimate that Modular TCP needs to add about one third to the protocol stack code.

6. Conclusion

In this project, we presented Modular TCP, an extension to TCP for the new Internet environment. We analyzed the challenges and requirements posed by network applications now and in the near future, and described the features demanded of transport layer protocols. Based on these requirements, we presented the design of Modular TCP, with dedicated sub-sections on congestion control, the main TCP feature that we want to keep and make work efficiently in all cases. We discussed a congestion control solution based on Selective Acknowledgement and expect Explicit Congestion Notification to be of great help, though it is still a subject of research. We also discussed an experimental implementation on Linux. Overall, we are excited about the introduction of Modular TCP and will continue to work on this project. Future plans include simulations with the network simulator ns [ns] to study the correctness and performance of the designed protocol and to settle several engineering decisions. An experimental implementation of Modular TCP on Linux is also planned.

References

[MIDS] MIDS Internet Growth, http://www.mids.org/growth/internet/index.html

[Clark90] David D. Clark, David L. Tennenhouse, Architectural Considerations for a New Generation of Protocols, Lab. for Computer Science, M.I.T.

[IETF RUTS] IETF, Requirements for Unicast Transport/Sessions bof, Dec. 1998. http://www.ietf.cnri.reston.va.us/proceedings/98dec/43rd-ietf-98dec-142.html

[Floyd] Sally Floyd, website, http://www.aciri.org/floyd/

[Rizzo96] Luigi Rizzo, TCP re-transmission on very lossy networks, Jan 1996

[Stevens] W. Richard Stevens, TCP/IP Illustrated, I, II, III

[WebTP] Ye Xia, Hoi-Sheung Wilson So, Venkat Anantharam, Steven McCanne, David Tse, Jean Walrand and Pravin Varaiya, The WebTP Architecture and Algorithms, Memorandum No. UCB/ERL M00/53, Electronics Research Laboratory, University of California, Berkeley, Jan. 15, 2000

[Jacobson90] V. Jacobson, Modified TCP Congestion Avoidance Algorithm, Message to end2end-interest mailing list, April 1990 ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt

[ns] The Network Simulator - ns-2, http://www-mash.cs.berkeley.edu/ns

[RFC 793, TCP] ISI, Transmission Control Protocol, RFC 793, IETF, 1981

[RFC 2001] W. Stevens, TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, RFC 2001, IETF, January 1997

[RFC 2018, SACK] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, TCP Selective Acknowledgment Options, RFC 2018, IETF, Oct. 1996.

[SACKTCP] Sally Floyd, Jamshid Mahdavi, Matt Mathis, Matthew Podolsky, and Allyn Romanow, An Extension to the Selective Acknowledgement (SACK) Option for TCP, Internet Draft, August 1999

[RFC 1379, T/TCP] Braden, R. T., Extending TCP for Transactions -- Concepts, RFC 1379, IETF, Nov 1992

[RFC 1644, T/TCP] Braden, R. T., T/TCP -- TCP Extensions for Transactions Functional Specification, RFC 1644, IETF, July 1994

[RFC 2481, ECN] Ramakrishnan, K.K., and Floyd, S., A Proposal to add Explicit Congestion Notification (ECN) to IP, RFC 2481, January 1999

[RFC 2140] J. Touch, TCP Control Block Interdependence, RFC 2140, IETF, April 1997