Experimental Evaluation of SunOS IPC and TCP/IP Protocol Implementation

Christos Papadopoulos ([email protected], (314) 935-4163) and Gurudatta M. Parulkar ([email protected], (314) 935-4621)
Computer and Communications Research Center, Department of Computer Science, Washington University, St. Louis, MO 63130-4899

Abstract

Progress in the field of high speed networking and distributed applications has led to a debate in the research community on the suitability of existing protocols such as TCP/IP for emerging applications over high-speed networks. Protocols have to operate in a complex environment comprised of various operating systems, host architectures, and a rapidly growing and evolving internet of several heterogeneous subnetworks. Thus, evaluation of protocols is definitely a challenging task that cannot be achieved by studying protocols in isolation. This paper presents results of a study which attempts to characterize the performance of the SunOS Inter-Process Communication (IPC) and TCP/IP protocols for distributed, high-bandwidth applications.

This work was supported in part by the National Science Foundation and an industrial consortium of Bellcore, BNR, DEC, Italtel SIT, NEC, NTT, and SynOptics.

1. Introduction

Considerable research and development efforts are being spent in the design and deployment of high speed networks. These efforts suggest that networks supporting data rates of a few hundred Mbps will become available soon. Target applications for these networks include distributed computing involving remote visualization and collaborative multimedia, medical imaging, teleconferencing, video distribution, and other demanding applications. Progress in high speed networking suggests that the raw data rates will be available to support such applications. However, these applications require not only high speed networks but also carefully engineered end-to-end protocols implemented efficiently within the constraints of various operating systems and host architectures. There has been considerable debate in the research community regarding the suitability of existing protocols such as TCP/IP [3,9,10] for emerging applications over high speed networks. One group of researchers believes that existing protocols such as TCP/IP are suitable and can be adopted for use in high speed environments [4,7]. Another group claims that the TCP/IP protocols are complex and their control mechanisms are not suitable for high speed networks and applications [1,2,12,13]. It is important, however, to note that both groups agree that efficient protocol implementation and appropriate operating system support are essential.

Inter-Process Communication (IPC), which includes protocols, is quite complex. The underlying communication substrate, for example, is a constantly evolving internet of many heterogeneous networks with varying capabilities and performance. Moreover, protocols often interact with operating systems, which are also complex and have additional interacting components such as memory management, interrupt processing, and process scheduling. Furthermore, the performance of protocol implementations may be affected by the underlying host architecture. Thus, evaluation of IPC models and protocols such as TCP/IP is definitely a challenging task and cannot be achieved by studying protocols in isolation. This paper presents the results of a study aimed at characterizing, by systematic measurement, the performance of the TCP/IP protocols in the existing IPC implementation in SunOS for high bandwidth applications. The components studied include the control mechanisms (such as flow and error control), per-packet processing, buffer management, and interaction with the operating system.

The software examined is the SunOS 4.0.3 IPC using BSD stream sockets on top of TCP/IP. SunOS 4.0.3 is based on 4.3 BSD Unix (for further details on the BSD Unix IPC implementation, see [11]). The hardware consisted of two Sun Sparcstation 1 workstations connected to the same Ethernet segment via the AMD Am7990 LANCE controller. Occasionally two Sparcstation 2 workstations running SunOS 4.1 were also used in the experiments. However, only the SunOS 4.0.3 IPC could be studied in depth, due to the lack of source code for SunOS 4.1.

Figure 1: Data distribution into Mbufs (labels recovered from the original figure: Application Buffer, Socket Queue, TCP/IP header template, Network Queue, Ethernet; mbuf with header, cluster mbuf with data, page with data; physical copy, logical copy).

Figure 2: Queue and Probe Locations.

2. Unix Inter-Process Communication (IPC)

In 4.3 BSD Unix, IPC is organized into three layers. The first layer, the socket layer, is the IPC interface to applications and supports different types of sockets, each type providing different communication semantics (for example, STREAM or DATAGRAM sockets). The second layer is the protocol layer, which contains the protocols supporting the different types of sockets. These protocols are grouped in domains; for example, the TCP and IP protocols are part of the Internet domain. The third layer is the network interface layer, which contains the hardware device drivers (for example, the Ethernet device driver). Each socket has bounded send and receive buffers associated with it, which are used to hold data for transmission to, or data received from, another process. These buffers reside in kernel space, and for flexibility and performance reasons they are not contiguous. The special requirements of interprocess communication and network protocols call for fast allocation and deallocation of both fixed and variable size memory blocks. Therefore, IPC in 4.3 BSD uses a memory management scheme based on data structures called MBUFs (Memory BUFfers). Mbufs are fixed size memory blocks 128 bytes long that can store up to 112 bytes of data. A variant, called a cluster mbuf, is also available, in which data is stored externally in a 1024-byte page associated with the mbuf. All data to be transmitted is copied from user space into mbufs, which are linked together if necessary to form a chain. All further protocol processing is performed on mbufs.
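To make the arrangement above concrete, the following C fragment sketches how a send request might be copied into a chain of cluster mbufs. The sizes follow the text (128-byte mbufs holding up to 112 bytes of data; cluster mbufs referencing an external 1024-byte page), but the structure layout and the helper copy_to_mbuf_chain() are illustrative assumptions, not the actual 4.3 BSD sys/mbuf.h definitions.

```c
/*
 * Minimal sketch of the mbuf arrangement described above; NOT the real
 * 4.3 BSD layout. Names and the helper below are illustrative only.
 */
#include <stdlib.h>
#include <string.h>

#define MLEN    112                   /* data bytes in a small (128-byte) mbuf */
#define CLBYTES 1024                  /* size of a cluster page                */

struct mbuf {
    struct mbuf *m_next;              /* next mbuf in this chain               */
    int          m_len;               /* amount of data in this mbuf           */
    char        *m_data;              /* points at m_dat or at a cluster page  */
    char         m_dat[MLEN];         /* internal storage for small mbufs      */
};

/* Copy a user send request into a chain of cluster mbufs, roughly what the
 * socket layer does before queueing data in the socket buffer (Figure 1). */
static struct mbuf *copy_to_mbuf_chain(const char *ubuf, size_t len)
{
    struct mbuf *head = NULL, **tailp = &head;

    while (len > 0) {
        size_t chunk = len > CLBYTES ? CLBYTES : len;
        struct mbuf *m = calloc(1, sizeof(*m));

        if (m == NULL)
            break;                            /* return whatever was built     */
        m->m_data = malloc(CLBYTES);          /* the external "cluster" page   */
        if (m->m_data == NULL) {
            free(m);
            break;
        }
        memcpy(m->m_data, ubuf, chunk);       /* the physical copy in Figure 1 */
        m->m_len = (int)chunk;
        *tailp = m;                           /* append to the chain           */
        tailp  = &m->m_next;
        ubuf  += chunk;
        len   -= chunk;
    }
    return head;
}
```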

2.1 Queueing Model

Figure 1 shows the data distribution into mbufs at the sending side. In the beginning, data resides in contiguous buffer space in the application's address space. The data is then copied into mbufs in response to a user send request, and queued in the socket buffer until the protocol is ready to transmit it. In the case of TCP, or any other protocol providing reliable delivery, data is queued in this buffer until acknowledged. The protocol retrieves data equal to the size of the available buffering minus the unacknowledged data, breaks the data into packets, and after adding the appropriate headers passes the packets to the network interface for transmission. Packets in the network interface are queued until the driver is able to transmit them. On the receiving side, after packets arrive at the network interface, the Ethernet header is stripped and the remaining packet is appended to the protocol receive queue. The protocol, notified via a software interrupt, wakes up and processes packets, passing them to higher layer protocols if necessary. The top layer protocol, after processing the packets, appends them to the receive socket queue and wakes up the application. The four queueing points identified above are depicted in Figure 2. A more comprehensive discussion of IPC and the queueing model can be found in [8].

3. Probe Design

To monitor the activity of each queue, probes were inserted in the SunOS Unix network code at the locations shown in Figure 2. A probe is a small code segment placed at a strategic location in the kernel. Each probe, when activated, records a timestamp and the amount of data passing through its checkpoint. It also fills in a field to identify itself. The information recorded is minimal, but can nevertheless provide valuable insight into queue behavior. For example, the timestamp differentials can provide queue delay, queue length, arrival rate and departure rate. The data length can provide throughput measurements, and help identify where and how data is fragmented into packets.
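The paper does not list the probe code itself; the sketch below shows the kind of record and logging routine the description implies (a timestamp, a probe identifier, and a data length appended to a statically allocated circular list). All names here, such as probe_rec and probe_log_event(), are hypothetical, and a kernel implementation would use a kernel clock source rather than gettimeofday().

```c
/*
 * Hypothetical probe record and logging routine, kept as small as possible
 * since it runs on every packet. A user program reads probe_log and resets
 * probe_head at the end of each experiment, as described in Section 3.1.
 */
#include <sys/time.h>

#define NRECORDS 8192                  /* capacity of the circular log */

struct probe_rec {
    struct timeval pr_time;            /* when data passed the checkpoint */
    int            pr_id;              /* which of the 8 probes fired     */
    int            pr_len;             /* bytes seen at the checkpoint    */
};

static struct probe_rec probe_log[NRECORDS];
static int probe_head;                 /* next slot to fill */

static void probe_log_event(int id, int len)
{
    struct probe_rec *r = &probe_log[probe_head];

    probe_head = (probe_head + 1) % NRECORDS;
    gettimeofday(&r->pr_time, (struct timezone *)0);
    r->pr_id  = id;
    r->pr_len = len;
}
```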
At the sending side, probes 1 and 2 monitor the sending activity at the socket layer. The records produced by probe 2 alone show how data is broken up into transmission requests. The records produced by probes 1 and 2 together can be used to plot queue delay and queue length graphs for the socket queue. Probe 3 monitors the rate at which packets reach the interface queue. This rate is essentially the rate at which packets are processed by the protocol. Probe 3 also monitors the windowing and congestion avoidance mechanisms by recording the bursts of packets produced by the protocol, and the protocol processing delay can be obtained from the inter-packet gap during a burst. The interface queue delay is given by the difference in timestamps between probes 3 and 4. The queue length can also be obtained from probes 3 and 4, as the difference in the number of packets logged by each probe.

At the receiving side, probes 5 and 6 monitor the IP queue, measuring queue delay and length. The rate at which packets are removed from this queue is a measure of how fast the protocol layer can process packets. The rate at which packets arrive from the Ethernet, which is strongly dependent on the Ethernet load, can be determined using probe 5. The difference in timestamps between probes 6 and 7 gives the protocol processing delay. Probes 7 and 8 monitor the socket queue delay and length. Packets arrive in this queue after the protocol has processed them and depart from the queue as they are copied to the application space.

The probes can also monitor acknowledgments. If the data flow is in one direction, the packets received by the sender of data will be acknowledgments. Each application has all 8 probes running, monitoring both incoming and outgoing packets. Therefore, in one-way communication, probes at the lower layers record acknowledgments.
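As a rough illustration of how such records become the queue metrics just mentioned, the sketch below pairs the k-th packet logged at a queue's input probe with the k-th packet at its output probe (for example probes 3 and 4 for the network interface queue) to compute per-packet queue delay, and uses the difference in packet counts as the instantaneous queue length. The record layout matches the earlier sketch; everything here is illustrative, not the post-processing tool actually used in the study.

```c
/* Hypothetical post-processing of extracted probe records. */
#include <stdio.h>
#include <sys/time.h>

#define MAXPKTS 4096

struct probe_rec {
    struct timeval pr_time;
    int            pr_id;
    int            pr_len;
};

static long usec_diff(struct timeval from, struct timeval to)
{
    return (to.tv_sec - from.tv_sec) * 1000000L + (to.tv_usec - from.tv_usec);
}

static void queue_stats(const struct probe_rec *log, int nrec,
                        int in_id, int out_id)
{
    struct timeval arrived[MAXPKTS];
    int i, nin = 0, nout = 0;

    for (i = 0; i < nrec; i++) {
        if (log[i].pr_id == in_id && nin < MAXPKTS) {
            arrived[nin++] = log[i].pr_time;       /* packet entered the queue */
        } else if (log[i].pr_id == out_id && nout < nin) {
            printf("pkt %d: delay %ld us, length %d pkts\n",
                   nout, usec_diff(arrived[nout], log[i].pr_time),
                   nin - nout - 1);                /* packets still queued     */
            nout++;
        }
    }
}
```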
3.1 Probe Overhead

An issue of vital importance is that the probes incur as little overhead as possible. There are three sources of potential overhead: (1) probe execution time; (2) the recording of measurements; and (3) probe activation (i.e., deciding when to log).

To address the first source of overhead, any kind of processing in the probes beyond logging a few important parameters was precluded. To address the second source, probes store their records in kernel virtual space in a static circular list of linked records, initialized during system start-up. The records are accessed via global head and tail pointers, and a user program extracts the data and resets the list pointers at the end of each experiment. The third source of overhead, probe activation, was addressed by introducing a new socket option that the socket and protocol layer probes can readily examine. For the network layer probes, a bit in the “tos” (type of service) field of the IP header of outgoing packets was set. The network layer can access this field easily because the IP header is always contained in the first mbuf of the chain, either handed down by the protocol or up by the driver. This is a non-standard approach, and it does violate layering, but it has the advantage of incurring minimum overhead and is straightforward to incorporate.

4. Performance of IPC

This section presents some experiments aimed at characterizing the performance of various components of IPC, including the underlying TCP/IP protocols. To minimize interference from users and the network, the experiments were performed with one user on the machines (but still in multi-user mode), and the Ethernet was monitored using either Sun's traffic or xenetload, tools capable of displaying the Ethernet utilization. With the aid of these tools it was ensured that background Ethernet traffic was low (less than 5%) during the experiments. Moreover, the experiments were repeated several times in order to reduce the degree of randomness inherent in experiments of this nature.

4.1 Experiment 1: Throughput

This experiment has two parts. In the first, the effect of the socket buffer size on throughput is measured by setting up a unidirectional connection and sending a large amount of data (about 10 Mbytes or more) over the Ethernet. The results show that throughput increases as the socket buffers are increased, with rates up to 7.5 Mbps achievable when the buffers are set to their maximum size (approximately 51 kilobytes). This suggests that it is beneficial to increase the default socket buffer size (which is 4 kilobytes) for applications running on the Ethernet and, by extrapolation, for applications that will run on faster networks in the future. The largest increase in throughput was achieved by quadrupling the socket buffers (to 16 kilobytes), followed by smaller improvements as the buffer got larger.
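From the application side, the buffer sizes studied here are requested with the standard SO_SNDBUF and SO_RCVBUF socket options. The fragment below is a minimal sketch; the 51 KB value is simply the maximum reported for SunOS 4.0.3 in the text, not a portable constant, and the function name is an assumption.

```c
/* Request larger socket buffers before connecting, as in Experiment 1. */
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

int make_bulk_socket(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    int bufsize = 51 * 1024;             /* instead of the 4 KB default */

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char *)&bufsize, sizeof(bufsize)) < 0)
        perror("SO_SNDBUF");
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&bufsize, sizeof(bufsize)) < 0)
        perror("SO_RCVBUF");
    return s;                            /* caller connects or binds as usual */
}
```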
In the second part of the experiment, the results of the first part are compared to the results obtained when the two processes exchanging data reside on the same machine. This setup bypasses the network layer completely, with packets from the protocol output looped back to the protocol input queue. Despite the fact that a single machine handles both transmission and reception, the best observed local IPC throughput was close to 9 Mbps, suggesting that the mechanism is capable of exceeding the Ethernet rate. However, as the first part showed, the best observed throughput across the Ethernet was only 7.5 Mbps, which is only 75% of the theoretical Ethernet throughput. The bottleneck was traced to the network driver, which could not sustain rates higher than 7.5 Mbps. The actual results of this experiment can be found in [8].

Increasing the socket buffer size can be beneficial in other ways too: the minimum of the socket buffer sizes is the maximum window TCP can use. Therefore, with the default size the window cannot exceed 4096 bytes (which corresponds to 4 packets). On the receiving end the protocol can receive only four packets with every window, which leads to a kind of stop-and-wait protocol (one in which the transmitter sends a packet of data and waits for an acknowledgment before sending the next packet) with four segments. Another potential problem is that the use of small buffers leads to a significant increase in acknowledgment traffic, since at least one acknowledgment for every window is required. The number of acknowledgments generated with 4K buffers was measured using the probes, and was found to be roughly double the number with maximum size buffers. Thus, more protocol processing is required to process the extra acknowledgments, possibly contributing to the lower performance obtained with the default socket buffer size. In light of the above observations, all subsequent experiments were performed with the maximum allowable socket buffer size.

4.2 Experiment 2: Investigating IPC with Probes

For this experiment the probing mechanism developed above is used to get a better understanding of the behavior of IPC. The experiment has two parts: the first consists of setting up a connection and sending data as fast as possible in one direction; the data size used was 256 kilobytes. In the second part, a unidirectional connection is set up again, but now packets are sent one at a time in order to isolate and measure the packet delay at each layer.

For the first part, the graphs presented were selected to show patterns that were consistent across the measurements. Only experiments during which there were no retransmissions were considered, in order to study the protocols at their best. The results are presented in four sets of graphs: the first is a packet trace of the connection; the second is the length of the four queues vs. elapsed time; the third is the delay in the queues vs. elapsed time; and the last is the protocol processing delay.

The graphs in Figure 3 show the packet traces as given by probes 3, 4, 7 and 8. Probes 3 and 4 are on the sending side: the first monitors the protocol output to the network interface and the second monitors the network driver output to the Ethernet. Probes 7 and 8 are on the receiving side and monitor the receiving socket input and output respectively. The graphs show elapsed time on the x-axis and the packet number on the y-axis. The packet number can also be thought of as a measure of the packet sequence number, since for this experiment all packets carried 1 kilobyte of data. Each packet is represented by a vertical bar to provide a visual indication of packet activity. For the sender the clock starts ticking when the first SYN packet is transmitted, and for the receiver when this SYN packet is received. The offset introduced is less than 1 ms and does not affect the results in a significant manner.

Figure 3: Packet Trace (panels: Protocol Output (probe 3), Network Interface Output (probe 4), Input to Socket Buffer (probe 7), Output to Application (probe 8); kilobytes of data vs. elapsed time in seconds).

Examining the first graph (protocol output) in the set reveals the transmission burstiness introduced by the windowing mechanism. The blank areas in the graph represent periods when the protocol is idle awaiting acknowledgment of outstanding data. The dark areas represent periods when the protocol is sending out a new window of data (the dark areas contain a much higher concentration of packets). The last two isolated packets are the FIN and FIN_ACK packets that close the connection. The second graph (interface output) is a trace of packets flowing to the Ethernet. It is immediately apparent that there is a degree of “smoothing” of the flow as packets move from the protocol out to the network. Even though the protocol produces bursts, by the time packets reach the Ethernet these bursts are dampened considerably, resulting in a much lower rate of packets on the Ethernet. The reason is that the protocol is able to produce packets faster than the network can transmit them, which leads to queueing at the interface queue.

On the receiving side, the trace of packets into the receive socket buffer appears similar to the trace produced by the sender, meaning that there is no significant queueing delay introduced by the network or the IP queue. This also indicates that the receive side of the protocol is able to process packets at the rate they arrive, which is not surprising since this rate was reduced by the network (to approximately a packet every millisecond). This lower rate is possibly a key reason why the IP queue remains small. The output of the receive socket appears similar to the input to the socket buffer. Note, however, that there is a noticeable idle period, which seems to propagate back to the other traces. The source of this is explained after examining the queue length graphs, next.
The set of graphs in Figure 4 shows the queue length at the various queues as a function of time (note the difference in scale of the y-axis). The queue length is shown as the number of bytes in the queue instead of packets for the sake of consistency, since in the socket layer at the sending side there is no real division of data into packets; data is fragmented into packets at the protocol layer. The sending socket queue length is shown to decrease monotonically as expected, but not at a constant rate, since the draining of the queue is controlled by the reception of acknowledgments. Note that the point at which the queue appears to become empty is the point where the last chunk of data is copied into the socket buffer, not when all the data was actually transmitted; the data remain in the socket buffer until acknowledged. The interface queue shows the packet generation by the protocol. The burstiness in packet generation is immediately apparent and manifests itself as peaks in the queue length. The queue build-up suggested by the packet trace graphs is clearly visible. Also, the queue length increases with every new burst, indicating that each new burst contains more packets than the previous one. This is attributed to the slow start strategy of the congestion control mechanism [5]. The third graph shows the IP receive queue, and confirms the earlier hypothesis that the receive side of the protocol can comfortably process packets at the rate delivered by the network. As the graph shows, the queue practically never builds up, each packet being removed and processed before the next one arrives. The graph for the receive socket queue shows some queueing activity. It appears that there are durations over which data accumulates in the socket buffer, followed by a complete drain of the buffer. During that time the transmitter does not send any data, since no acknowledgment is sent by the receiver. This queue build-up leads to gaps in packet transmission, as witnessed earlier in the packet trace plots. These gaps are a result of the receiver delaying the acknowledgment until data is removed from the socket buffer.

Figure 4: Queue Length during congestion window opening (panels: Sending Socket Queue, Receiving Socket Queue, Interface Queue, IP Queue; queue length in kilobytes vs. elapsed time in seconds).
The next set of graphs, in Figure 5, shows the packet delay at the four queues. The x-axis shows the elapsed time and the y-axis the delay. Each packet is again represented as a vertical bar: the position of the bar on the x-axis shows the time the packet entered the queue, and the height of the bar shows the delay the packet experienced in the queue. As mentioned earlier, there are no packets in the first queue, therefore the first (socket queue) graph simply shows the time it takes for the data in each write call to pass from the application layer down to the protocol layer. For this experiment there is a single write call, so there is only one entry in the graph. The second graph shows the network interface queue delay. Packets entering an empty queue at the beginning of a window burst experience a very small delay, while subsequent packets are queued and thus delayed. The delays range from a few hundred microseconds to several milliseconds. The third graph shows the IP queue delay, which is a measure of how fast the protocol layer can remove and process packets from its input queue. The delay of packets appears very small, with the majority of packets remaining in the queue for about 100 microseconds. Most of the packets are removed from the queue well before the 1 ms interval that it takes for the next packet to arrive. The fourth graph shows the delay at the receive socket queue. Here the delays are much higher. This delay includes the time to copy data from mbufs to user space. Since the receiving process is constantly trying to read new data, this graph is an indication of how often the operating system allows the socket receive process, which copies data to user space, to run. The fact that there are occasions where data accumulates in the buffer shows that the process runs at a frequency lower than the data arrival rate.

Figure 5: Queue Delay during congestion window opening (panels: Sending Socket Queue, Receiving Socket Queue, Interface Queue, IP Queue; delay in milliseconds vs. elapsed time in seconds).

The two histograms in Figure 6 show the protocol processing delay at the send and receive side respectively. For the sending side the delay is taken to be the packet interspersing during a window burst. The reason is that, since the protocol fragments transmission requests larger than the maximum protocol segment, there are no well-defined boundaries as to where the protocol processing for a packet begins and where it completes. Therefore, packet interspersing was deemed more fair to the protocol (the gaps introduced by the windowing mechanism were removed). This problem does not appear at the receiving end, where the protocol receives and forwards distinct packets to the socket layer. The x-axis in the graphs is divided into bins 50 microseconds wide, and the y-axis shows the number of packets in each bin. Examining the graphs shows that the protocol delays are different at the two ends: at the sending end the processing delay divides the packets into two distinct groups, with the first, larger group requiring about 250 to 400 microseconds to process and the second about 500 to 650 microseconds. At the receiving end the packet delay is more evenly distributed, and processing takes about 250 to 300 microseconds for most packets, with some taking between 300 and 400. Thus, the receiving side is able to process packets faster than the sending side. The average delays for protocol processing are 370 microseconds for the sending side and 260 microseconds for the receiving side. Thus the theoretical maximum TCP/IP can attain on Sparcstation 1's, when the CPU is devoted to communication with no data copying (but including the checksum) and without the network driver overhead, is estimated to be about 22 Mbps.

Figure 6: Protocol Processing Delay Histograms (Sending Side and Receiving Side; number of packets per 50 µs bin vs. delay in µs).

In the second part of the experiment the delay experienced by a single packet moving through the IPC layers is isolated and measured. A problem with the previous measurements is that the individual contribution of each layer to the overall delay is hard to assess, because the layers have to share the CPU for their processing. Moreover, asynchronous interrupts due to packet reception or transmission may steal CPU cycles from higher layers. To isolate layer processing, consecutive packets are sent after introducing some artificial delay between them. The idea is to give the IPC mechanism enough time to send the current packet and receive an acknowledgment before it is supplied with the next.
The experiment was set up as follows: the pair of Sparcstation 1's was used again for unidirectional communication. The sending application was modified to send one 1024-byte packet and then sleep for one second to allow the packet to reach the destination and the acknowledgment to come back. To ensure that the packet was sent immediately, the TCP_NODELAY option was set. Moreover, both the sending and receiving socket buffers were set to 1 kilobyte, thus effectively reducing TCP to a stop-and-wait protocol.
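A sender along these lines could look like the sketch below. TCP_NODELAY and the socket buffer options are the standard BSD interfaces named in the text; the connection setup, the packet count, and the function name are assumptions made for illustration (the 1 KB receive buffer would be set on the receiving socket in the same way).

```c
/* Sketch of the modified sender in the second part of Experiment 2. */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <unistd.h>

void send_single_packets(int s, int npackets)   /* s: connected TCP socket */
{
    char buf[1024];
    int  on = 1;
    int  bufsize = 1024;

    memset(buf, 0, sizeof(buf));
    setsockopt(s, SOL_SOCKET,  SO_SNDBUF,   (char *)&bufsize, sizeof(bufsize));
    setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char *)&on,      sizeof(on));

    while (npackets-- > 0) {
        write(s, buf, sizeof(buf));   /* one 1024-byte packet                    */
        sleep(1);                     /* let the ACK return before the next send */
    }
}
```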

The results of this experiment are summarized in Table 1, and are compared to the results obtained in the previous experiment, when 256 kilobytes of data were transmitted. The send socket queue delay shows that a delay of about 280 µs is experienced by each packet before it reaches the protocol. This includes the data copy, which was measured to be about 130 µs; therefore, the socket layer processing takes about 150 µs. This includes locking the buffer, masking network device interrupts, performing various checks, allocating plain and cluster mbufs, and calling the protocol. The delay on the sending side has increased by about 73 µs (443 vs. 370 µs). This is due to the fact that the work required before calling the TCP output function with a transmission request is replicated with each new packet. Earlier, the TCP output function was called with a large transmission request and entered a loop, forming packets and transmitting them until the data was exhausted. The interface queue delay is very small: it takes about 40 µs for a packet to be removed from the interface queue by the Ethernet driver when the latter is idle. The IP queue delay is also quite small, and has not changed significantly from the earlier experiment, when 256 kilobytes of data were transferred. The protocol layer processing at the receiving end has not changed much from the result obtained in the first part: in this part the delay is 253 µs, while in part 1 it was 260 µs. The receive socket queue delay is about 412 µs, out of which 130 µs is due to the data copy. This means that the per-packet overhead is about 292 µs, which is about double the overhead at the sending side. The main source of this appears to be the application wake-up delay.

Table 1: Delay in IPC Components

  Location                           Average delay,        Average delay,
                                     stop-and-wait         continuous
                                     transfer (µs)         transfer (µs)
  Send socket (with copy)            280                   (a)
  Protocol, sending side             443                   370
  Interface queue                    40                    -
  IP queue                           113                   124
  Protocol, receiving side           253                   260
  Receive socket queue (with copy)   412                   (a)
  Byte-copy delay (b) (1024 bytes)   130                   130

  (a) varies with data size
  (b) measured using custom software

4.3 Experiment 3: Effect of Queueing on the Protocol

The previous experiments have provided insight into the queueing behavior of the IPC mechanism. They established that the TCP/IP layer in SunOS 4.0.3 running on a Sparcstation 1 has the potential to exceed the Ethernet rate. As a result, the queue length graphs show that noticeable queueing exists at the sending side (at the network interface queue), which limits the rate achieved by the protocol. At the receiving end, queueing at the IP queue is low compared to the other queues, because the packet arrival rate is reduced by the Ethernet. However, queueing is observed at the socket layer, showing that the packet arrival rate is higher than the rate at which packets are processed by IPC. In this experiment the effect of the receive socket queueing on the protocol was investigated.

Briefly, the actions taken at the socket layer for receiving data are as follows: the receive function enters a loop traversing the mbuf chain and copying data to the application space. The loop exits when either there is no more data left or the application request is satisfied. At the end of the copy-loop the protocol user-request function is called so that protocol-specific actions, such as sending an acknowledgment, can take place. Thus any delay introduced at the socket queue will delay protocol actions. TCP sends acknowledgments during normal operation when the user removes data from the socket buffer. During normal data transfer (no retransmissions) an acknowledgment is sent if the socket buffer is emptied after removing at least 2 maximum segments, or whenever a window update would advance the sender's window by at least 35 percent (as per tcp_output() of SunOS; see [8] for details). A sketch of this receive path follows.
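The sketch is deliberately abstract: it is not the SunOS soreceive() source, only a restatement of the copy loop and of the fact that the protocol's user-request routine (where TCP may generate an acknowledgment) runs only after the copy loop finishes. All names and types are placeholders.

```c
/* Abstract restatement of the socket-layer receive path described above. */
#include <string.h>

struct mbuf {
    struct mbuf *m_next;
    int          m_len;
    char        *m_data;
};

/* Stand-in for the protocol user-request call ("receiver consumed data");
 * tcp_output() would run from here, so delaying this call delays the ACK. */
static void protocol_rcvd(void)
{
}

static int socket_receive(struct mbuf **rcvq, char *ubuf, int want)
{
    int copied = 0;

    /* Copy-loop: exit when no data is left or the request is satisfied. */
    while (*rcvq != NULL && copied + (*rcvq)->m_len <= want) {
        struct mbuf *m = *rcvq;

        memcpy(ubuf + copied, m->m_data, m->m_len);
        copied += m->m_len;
        *rcvq = m->m_next;              /* the mbuf would be freed here */
    }

    protocol_rcvd();
    return copied;
}
```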
The TCP delayed acknowledgment mechanism may also generate acknowledgments. This mechanism works as follows: upon reception of a segment, TCP sets the flag TF_DELACK in the transmission control block for that connection. Every 200 ms a timer process runs, checking these flags for all connections. For each connection that has the TF_DELACK flag set, the timer routine changes it to TF_ACKNOW and the TCP output function is called to send an acknowledgment. TCP uses this mechanism in an effort to minimize both the network traffic and the sender's processing of acknowledgments. The mechanism achieves this by delaying the acknowledgment of received data in the hope that the receiver will reply to the sender soon and the acknowledgment can be piggybacked onto the reply segment back to the sender.

To summarize, if there are no retransmissions, TCP may acknowledge data in two ways: (1) whenever enough data is removed from the receive socket buffer and the TCP output function is called, and (2) when the delayed-ack timer process runs.
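The following fragment restates that delayed-acknowledgment logic. The flag names TF_DELACK and TF_ACKNOW follow the text; the control-block structure, the list traversal, and the stub output routine are simplifications for illustration, not the SunOS timer code.

```c
/* Simplified restatement of the 200 ms delayed-ACK timer described above. */
#define TF_DELACK 0x01     /* an ACK is owed but may be delayed      */
#define TF_ACKNOW 0x02     /* send an ACK on the next output attempt */

struct tcpcb {             /* minimal stand-in for the TCP control block */
    int           t_flags;
    struct tcpcb *t_next;  /* list of all open connections               */
};

static void tcp_output_stub(struct tcpcb *tp)
{
    /* would build and send a segment; with TF_ACKNOW set it carries an ACK */
    tp->t_flags &= ~TF_ACKNOW;
}

/* Called every 200 ms: promote delayed ACKs and send them. */
static void delack_timer(struct tcpcb *connections)
{
    struct tcpcb *tp;

    for (tp = connections; tp != NULL; tp = tp->t_next) {
        if (tp->t_flags & TF_DELACK) {
            tp->t_flags &= ~TF_DELACK;
            tp->t_flags |= TF_ACKNOW;
            tcp_output_stub(tp);
        }
    }
}
```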

To verify the above, the following experiment was performed: a unidirectional connection was set up again, but in order to isolate the protocol from the Ethernet the connection was made between processes on a single machine (i.e., using local communication). The data size sent was set to 1 Mbyte, and the probes were used to monitor the receive socket queue and the generated acknowledgments. The result is shown in Figure 7, which plots the queue length vs. elapsed time. The expansion of the congestion window can be seen very clearly with each new burst: each window burst is exactly one segment larger than the previous one. Moreover, the socket queue grows by exactly the window size and then drops down to zero, at which point an acknowledgment is generated and a new window of data comes in shortly after. The peaks that appear in the graph are caused by the delayed acknowledgment timer process that runs every 200 ms. The effect of this timer is to send an acknowledgment before the data in the receive buffer is completely removed, causing more data to arrive before the buffer is emptied. However, during data transfer between different machines more acknowledgments would be generated, because the receive buffer becomes empty more often, since the receive process is scheduled more often and has more CPU cycles to copy the data.

Figure 7: Receive socket queue length and acknowledgment generation (data in receive socket buffer (KB) vs. elapsed time (sec); acknowledgment generation is marked).

The experiment shows how processing at the socket layer may slow down TCP by affecting the acknowledgment generation. For a bulk data transfer most acknowledgments are generated after the receive buffer becomes empty. If data arrives in fast bursts, or if the receiving machine is slow or loaded, data may accumulate in the socket buffer. If a window mechanism is employed for flow control, as in the case of TCP, the whole window burst may accumulate in the buffer. The window will not be acknowledged until the data is passed to the application. Until this happens, TCP will be idle awaiting the reception of new data, which will come only after the receive buffer is empty and the acknowledgment has gone back to the sender. So there may be gaps during data transfer equal to the time to copy out the buffer plus one round trip delay for the acknowledgment and the new data to arrive.

4.4 Experiment 4: Effect of Background Load

The previous experiments studied the IPC mechanism when the machines were solely devoted to communication. However, it would be more useful to know how extra load present on the machine would affect the IPC mechanism. In a distributed environment (especially with single-processor machines), computation and communication will affect each other. This experiment investigates the behavior of the various queues when the host machines are loaded with artificial load. The experiment is aimed at determining which parts of the IPC mechanism are affected, and how, when additional load is present on the machine. The results reported include the effect on throughput and graphs depicting how the various queues are affected by the extra workload during data transfer.

The experiment was performed by setting up a unidirectional connection, running the workload, and sending data from one Sparcstation to another. The main requirement for the artificial workload was that it be CPU intensive; the assumption is that if a task needs to be distributed, it will most likely be CPU intensive. The chosen workload consists of a program that forks a specified number of processes that calculate prime numbers, along the lines of the sketch below. For the purposes of this experiment the artificial workload level is defined to be the number of prime-number processes running at the same time.
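The exact workload program is not given in the paper; a minimal equivalent that forks the requested number of CPU-bound prime-number processes might look like this.

```c
/* Minimal stand-in for the artificial workload described above. */
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

static int is_prime(unsigned long n)
{
    unsigned long d;

    if (n < 2)
        return 0;
    for (d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

int main(int argc, char **argv)
{
    int level = (argc > 1) ? atoi(argv[1]) : 1;   /* the "workload level" */
    int i;

    for (i = 0; i < level; i++) {
        if (fork() == 0) {
            volatile unsigned long primes = 0;    /* child: pure CPU work */
            unsigned long n;

            for (n = 2; ; n++)
                if (is_prime(n))
                    primes++;
        }
    }
    while (wait((int *)0) > 0)    /* parent just waits; children never exit */
        ;
    return 0;
}
```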
Figure 8 shows the effect of the artificial workload on communication throughput when the workload is run on the server machine. The experiment was performed by first starting the artificial workload and then immediately performing a data transfer of 5 Mbytes. The graph shows that the throughput drops sharply as the number of background processes increases, which suggests that IPC requires a significant amount of CPU cycles. Although not actually measured, the effect on computation also appears significant. Thus when computation and communication compete for CPU cycles they both suffer, leading to poor overall performance.

Figure 8: Throughput vs. server background load (throughput in Mbps vs. workload level).

Figure 9 shows the internal queues during data transfer with a loaded server. For these graphs the data size was reduced to 1 Mbyte to keep the size of the graphs reasonable. The workload level is 4, meaning that there were four prime-number processes running. The x-axis shows the elapsed time and the y-axis the queue length. Examining the receive socket queue plot shows that data arriving at the queue is delayed significantly. The bottleneck is clearly shown to be the application scheduling: the receiving process has to compete for the CPU with the workload processes, so it is forced to read the data at a lower rate. The extra load seems to affect the IP queue also. In previous experiments, with no extra computation load, data would not accumulate in the IP queue. The accumulation, however, is still very small compared to the other queues. The interface and socket queues on the sending side reflect the effect introduced by the delay in acknowledging data by the receiver. Although not reported here, the effect of loading the client was similar to the effect of loading the server [8].

Figure 9: Queue length vs. server background load (panels: Send Socket Queue, Receive Socket Queue, Network Send Queue, IP Receive Queue; length in kilobytes vs. elapsed time in seconds).

5. Conclusions

In general, the performance of SunOS 4.0.3 IPC on the Sparcstation 1 over the Ethernet is very good. Experiment 1 has shown that average throughput rates of up to 7.5 Mbps are achievable. Further investigation has shown that this rate is not limited by the TCP/IP performance, but by the driver performance. Therefore, a better driver implementation will increase throughput further.

Some improvements to the default IPC are still possible. Experiment 1 has shown that increasing the default socket buffer size from 4 to 16 kilobytes or more leads to a significant improvement in throughput for large data transfers. Note, however, that increasing the default socket buffer size necessitates an increase in the limit on the network interface queue size. The queue, shown in Figure 5, holds outgoing packets awaiting transmission. Currently its size is limited to 50 packets, which with the default buffers guarantees no overflow for 12 simultaneous connections (12 connections with a maximum window of 4 packets). If the default socket buffers are increased to 16 kilobytes, the number of such connections drops to 3.

The performance of IPC for local processes is estimated to be about 43% of the theoretical limit imposed by the memory bandwidth. Copying 1 kilobyte takes about 130 microseconds, meaning that memory-to-memory copy runs at about 63 Mbps. Local IPC requires two copies and two checksums. Assuming that the checksum takes about half the cycles of a memory copy (only reading the data is required), the maximum theoretical limit of local IPC is 63/3 = 21 Mbps. The measured local IPC performance was close to 9 Mbps, which is about 43% of this limit.
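Spelling out the arithmetic behind the local-IPC estimate above (all figures are taken from the text):

```latex
% copy rate implied by a 130 us copy of 1 kilobyte
\[
\text{copy rate} \approx \frac{1024 \times 8 \text{ bits}}{130\ \mu\text{s}} \approx 63 \text{ Mbps}
\]
% two copies plus two checksums, each checksum costing about half a copy
\[
\text{local IPC limit} \approx \frac{63}{2 + 2 \times \tfrac{1}{2}} = \frac{63}{3} = 21 \text{ Mbps},
\qquad
\frac{9 \text{ Mbps measured}}{21 \text{ Mbps}} \approx 43\%.
\]
```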
When processes run on different machines, the sending side of IPC is able to process packets faster than the Ethernet rate, which leads to queueing at the network interface queue. At the receiving side, the network and protocol layers are able to process packets as they arrive. At the socket layer, however, packets are queued before being delivered to the application. In both local and remote communication, the performance of TCP/IP is determined by protocol processing and the checksum. Ignoring the data copy, this was measured to be about 22 Mbps and is limited by the sending side.

While transferring large amounts of data, TCP relies on the socket layer for a notification that the application has received the data before sending acknowledgments. The socket layer notifies TCP after all data has been removed from the socket buffer. This introduces a delay in receiving new data equal to the time to copy the data to application space plus one round trip time for the acknowledgment to be sent and the new data to arrive. This is a small problem in the current implementation, where the socket queues are small, but may be significant in future high bandwidth-delay networks where queues may be very large. Experiments 3 and 4 show that queueing at the receive socket queue can grow to fill the socket buffer, especially if the machine is loaded. The resulting reduction in acknowledgments could degrade the performance of TCP, especially during congestion window opening or after a packet loss. The situation will be exacerbated if the TCP large windows extension is implemented: the congestion window opens only after receiving acknowledgments, which may not be generated fast enough to open the window quickly, resulting in the slow-start mechanism becoming too slow.

The interaction of computation and communication is significant. Experiment 4 has shown that additional load on the machine dramatically affects communication. The throughput graphs show a sharp decrease in throughput as CPU contention increases, which means that IPC requires a major portion of the CPU cycles to sustain high Ethernet utilization. Even though machines with higher computational power are expected to become available in the future, network speeds are expected to scale even faster. Moreover, some of these cycles will not be easy to eliminate. For example, TCP calculates the checksum twice for every packet, once at the sending and once at the receiving end of the protocol. The fact that in TCP the checksum resides in the header and not at the end of the packet means that this time consuming operation cannot be performed by hardware that calculates the checksum and appends it to the packet as data is sent to, or arrives from, the network. Conversely, the presence of communication affects computation by stealing CPU cycles from it. The policy of favoring short processes adopted by the Unix scheduling mechanism allows the communication processes to run at the expense of computation. This scheduling policy makes communication and computation performance unpredictable in the Unix environment.

Some areas of weakness have already been identified in existing TCP for high speed networks, and extensions to TCP have been proposed to address them [6]. The extensions are: (1) the use of larger windows (greater than TCP's current maximum of 65536 bytes) to fill a high speed pipe end-to-end; (2) the ability to send selective acknowledgments to avoid retransmission of the entire window; and (3) the inclusion of timestamps for better round trip time estimation [7]. We and other groups are currently evaluating these options. However, one may question the suitability of TCP's point-to-point, 100% reliable byte stream interface for multi-participant collaborative applications with multimedia streams. Additionally, if more and more underlying networks support statistical reservations for high bandwidth real time applications, TCP's pessimistic congestion control will have to be reconsidered.
resides in the header and not at the end of the packer means 1981. [10] J. Postel, “Transmission Control Protocol”, USC Inform. that this time consuming operation cannot be performed Sci. Inst., Rep. RFC 793, Sept. 1981. using hardware that calculates the checksum and appends [11] Leffler, Samuel J., McKusick, Marshall K., Karels, Michael it to the packet as data is sent to, or arrives from the net- J., and Quarterman, John S., The Design and Implementation work. Conversely, the presence of communication affects of the 4.3 BSD Unix Operating System, Addison-Wesley computation by stealing CPU cycles from computation. Publishing Company, Inc., Redding, Massachusetts, 1989. [12] Sterbenz, J.P.G., Parulkar, G.M., “Axon: A High Speed The policy of favoring short processes adopted by the Unix Communication Architecture for Distributed Applications,” scheduling mechanism allows the communication pro- Proceedings of the Ninth Annual Joint Conference of the cesses to run at the expense of computation. This schedul- IEEE Computer and Communications Societies (INFO- ing policy makes communication and computation COM’90), June 1990, pages 415-425. performance unpredictable in the Unix environment. [13] Sterbenz, J.P.G., Parulkar, G.M., “Axon: Application-Ori- ented Lightweight Transport Protocol Design,” Tenth Inter- national Conference on Computer Communication Some areas of weakness have already been identified (ICCC’90), Narosa Publishing House, India, Nov. 1990, pp with existing TCP for high speed networks, and extensions 379-387.