msocket: Multiple Stack Support for the Berkeley Socket API
Renzo Davoli, Computer Science Department, University of Bologna - Italy, [email protected]
Michael Goldweber, Dept. of Mathematics and Computer Science, Xavier University - USA, [email protected]
ABSTRACT
The de facto standard for network programming, the Berkeley socket API, supports several protocol families. Unfortunately, it has a significant limitation: it allows only a single implementation for each supported protocol family. Hence, using Berkeley sockets, it is impossible to access multiple distinct networking stacks for the same protocol, e.g. multiple TCP/IP stacks. This paper defines msocket, an extension to the Berkeley socket API which overcomes this limitation. msocket has been implemented as a feature of the View-OS project. Finally, we illustrate the utility and effectiveness of our extended API by providing some examples of its use.

Categories and Subject Descriptors
D.4.4 [Operating Systems]: Communication Management—Network communication; C.2.2 [Computer-Communication Networks]: Network Protocols—Applications

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'12 March 25-29, 2012, Riva del Garda, Italy. Copyright 2011 ACM 978-1-4503-0857-1/12/03 ...$10.00.

1. INTRODUCTION
The Berkeley socket API is the de facto standard for network programming. A fundamental concept of the Berkeley API is that network communication endpoints are represented by file descriptors. Virtually all of the API's functions use file descriptors to identify individual communication endpoints. The API function that creates a new endpoint descriptor is socket. The syntax of the Berkeley socket call is:

    int socket(int domain, int type, int protocol);

where domain specifies the communication domain, i.e. the protocol family to use (e.g. PF_INET for IPv4, PF_INET6 for IPv6, or PF_IRDA for IrDA); type indicates the communication semantics (e.g. stream or datagram); and the final argument, protocol, which is protocol-family specific, selects the protocol to use when the same semantics can be provided by different protocols. In addition to socket, there is also socketpair, which is usually defined only for the PF_UNIX protocol family. Our discussion of socketpair can be found in Section 3.

The original API was designed to support a wide range of domains, services and protocols. However, it supports at most one implementation for each domain/type/protocol assignment. For example:

    fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

defines a socket for a TCP stream connection. Observe that if multiple distinct TCP/IP stacks are available, there is no way to specify which stack to use for the communication stream.

The core idea of this proposal is to augment the socket API with the following additional function:

    int msocket(char *stack, int domain, int type, int protocol);

In addition to the socket parameters, msocket allows one to specify, via a new first argument, which stack to use.

The remainder of the paper is organized as follows: Section 2 provides some examples of applications motivating the need for having and accessing multiple networking stacks. Section 3 details the msocket API definition and its binary compatibility with existing applications. In Section 4 we present the current proof-of-concept implementation of msocket in View-OS. Finally, in Sections 5 and 6 we discuss related work, our conclusions and future directions.

2. APPLICATION DOMAINS FOR THE MSOCKET API
The following (non-exhaustive) list describes various application domains where the availability of multiple networking stacks either outright permits or simply eases the implementation of useful networking services.

Experimental networking stacks running on remote machines: Using one stack both as the target of the experiment and as the control channel perturbs the results, and can leave the remote machine unreachable whenever the experimental stack malfunctions.

Support of different network requirements: For example, a stack used for LAN-based communication may not require overly large buffers to support TCP's sliding window protocol, while TCP streams on satellite channels require large sliding windows.
Being able to support several stacks means that individual stacks can be algorithmically parameterized for specific purposes.

Define permissions on stacks: Different users may be granted differing networking services, e.g. differing QoS levels, access to various IP addresses, or the routing of network traffic along different network paths. Currently, one must define specific filters (e.g. iptables on GNU/Linux) to provide differing network services to different users. Using the msocket API, there can be several available stacks, each with its own interfaces, IP addresses, routing services, and access permissions. Since network stacks are UNIX-special files defined in the standard file system, access permissions are defined and controlled in the usual manner (i.e. chmod or ACLs). Hence the msocket API, using the security model inherited from the file system, allows a system administrator to control which users/groups have access to which stacks.

Network sandboxing: This is a special case of the previous example. If a user lacks permission to access/use any networking stack, she has no way to generate any network traffic.

Define different domains of protection/levels of security: A user may need to use differing network accesses simultaneously. For example, a user may need to concurrently use a VPN and her local network. She might use the VPN to access sensitive company data and the local network to read some domotics (home automation) parameters of her room/home. With a single stack this is only possible by defining filtering rules (strictly an administrative procedure, not meant for user configuration). Alternatively, if there are two stacks, one can be dedicated to the VPN and the other to the local network. The setup can be analogous to the familiar GUI print dialog used to select which printer handles a word processor's print request. Similarly, a user may want to define several networking tunnels and compare how a geolocated web service provides different answers depending on the location of the client.

Transparent use of compatible implementations: New protocols may provide compatible services sharing the addressing scheme of existing ones. An example of this is the Socket Direct Protocol (SDP) [7], which provides the same service as TCP. Similarly, applications can use the Reliable Datagram Service (RDS) instead of UDP. Via the msocket API, existing applications can choose such compatible services instead of the "standard" ones by specifying the appropriate networking stack from the set of available stacks.

Implementation of an "Internet of Threads" [4]: In the original design of IP, the addressable nodes were the networking adapters. While this model is still dominant, nodes today are also virtual interfaces, virtual machines, or one out of many addresses assigned to an adapter to support the splitting of services (e.g. to migrate services between nodes of a high-availability cluster). In an Internet of Threads (IoTh), processes, threads, or sets of processes can be addressable Internet "nodes." Any implementation of IoTh requires support for multiple stacks, and an API for using them: msocket.

3. MSOCKET DEFINITION
msocket has the following syntax:

    int msocket(char *stack, int domain, int type, int protocol);

In UNIX systems, stack is the pathname of a UNIX-special file. In non-UNIX systems, the same interface could be used to identify a kernel object. Since the Berkeley socket API has also been adopted by non-UNIX operating systems, our proposed msocket extension should be easily portable to other environments.

Backward compatibility is provided by the definition of default stack(s). Each process has a default stack defined for each protocol family. If the stack argument is NULL, the socket is created using the default stack for the protocol family of the domain argument. We redefine the socket system call in terms of msocket as follows:

    int socket(int domain, int type, int protocol) {
        return msocket(NULL, domain, type, protocol);
    }

A file descriptor created via a call to socket will use the default stack defined for the specified address family. Default stack definitions are inherited across process creation (fork(2)) and program execution (execve(2)).

The definition of msocket should appear natural to UNIX programmers, as it extends socket via a leading pathname argument. The choice to add msocket as a new system call means the syntax of socket is unmodified, thus ensuring binary compatibility for existing applications. An alternative approach would redefine socket to take a variable number of arguments. Not only are variable-parameter system calls rare (since they lead to less readable code), but whenever a system call requires a pathname as an argument, it is virtually always the first argument. We believe that a well designed API should be informed by the most common cases when deciding on parameter order, so that the usage of the API appears both natural and familiar to programmers.

While it is natural to define a networking stack as a UNIX-special file, stacks unfortunately cannot be classified using any of the existing categories of UNIX-special files (block, character, FIFO, etc.). We therefore propose a new UNIX-special file type for stack files, as defined for st_mode in stat(2):

    #define S_IFSTACK 0160000

Furthermore, our proposed stack UNIX-special files can only be used by msocket (e.g. they cannot be opened with open(2)).

There are two main reasons to use msocket instead of open. The first is technical: a stack typically provides several different protocols/services. For example, a TCP/IP stack provides not only the datagram and stream services, but also netlink (PF_NETLINK) services for configuration and possibly direct access to the underlying network (PF_PACKET). Using open would necessitate the definition of several special files (one per family/protocol), or the definition of ioctl tags to support service configuration. This is related to the second reason for using msocket instead of open.

```
[1]$ um_add_service umnet
umnet init
[2]$ mount -t umnetlwipv6 none /dev/net/s1
[3]$ mount -t umnetlwipv6 -o tp0=tapx none /dev/net/s2
[4]$ mstack /dev/net/s1 ip addr
1: lo0:
```
6. CONCLUSIONS AND FUTURE DEVELOPMENTS
We have introduced an extension of the Berkeley socket API for the support of multiple stacks. The core ideas of this proposal are:
• the msocket system call,
• the naming of the network stack via a UNIX-special file (mapped onto the file system), and
• the backward compatibility for all applications using socket given by the concept of default stacks.
We have provided a proof-of-concept implementation using View-OS partial virtual machines, both to show the effectiveness of our approach and to give an idea of the wide range of applications for msocket. Further investigation is needed to make the msocket API as effective as possible. In particular, additional investigation is needed regarding our overloading of the msocket call to define process-specific default networking stacks; perhaps an additional system call would be more appropriate. Finally, other user-space utilities can be investigated and designed in conjunction with msocket, including GUI dialogs that let graphical programs select the stack or stacks to work on.
7. REFERENCES
[1] Cisco OpenFabrics Enterprise Distribution InfiniBand Host Drivers User Guide for Linux. Technical report OL-10778-01, Cisco, 2006.
[2] E. W. Biederman. ns: Introduce the setns syscall. https://lkml.org/lkml/2011/5/6/411, 2011.
[3] D. Hildebrand. An architectural overview of QNX. In Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures, pages 113-126, 1992.
[4] R. Davoli. Internet of threads. Communication at the Conferenza GARR 2011 (in Italian), http://www.garr.it/a/conf11/, 2011.
[5] P. Emelyanov. net: Implement socketat. http://lwn.net/Articles/407615/, 2010.
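```c
#include <netdb.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* This is a TCP forwarding program.
   usage: prog stack1 address1 port1 stack2 address2 port2
   e.g.   prog /dev/net/1 192.168.100.1 1111 /dev/net/2 192.168.102.2 2222
   All TCP connections to 192.168.100.1 port 1111 on stack /dev/net/1
   get forwarded to 192.168.102.2 port 2222 using stack /dev/net/2.

   Error control has been omitted (... comments) to avoid details
   outside the scope of this explanation. */

/* msocket is not in the standard C library: it is the extension
   defined in Section 3 (provided here by View-OS). */
int msocket(char *stack, int domain, int type, int protocol);

int main(int argc, char *argv[])
{
    int sockin, sockinc;
    int rv;
    struct addrinfo *addr1, *addr2;

    /* argc/argv consistency check ... */
    /* getaddrinfo returns 0 on success, a nonzero EAI_* code on failure */
    if (getaddrinfo(argv[2], argv[3], NULL, &addr1) != 0 ||
        getaddrinfo(argv[5], argv[6], NULL, &addr2) != 0)
        exit(EXIT_FAILURE);

    /* listening socket on the first stack */
    sockin = msocket(argv[1], addr1->ai_family, SOCK_STREAM, IPPROTO_TCP);
    /* ... */
    rv = bind(sockin, addr1->ai_addr, addr1->ai_addrlen);
    /* ... */
    rv = listen(sockin, 5);
    /* ... */
    for (;;) {
        int sockout;
        sockinc = accept(sockin, NULL, NULL);
        /* ... */
        /* outgoing connection on the second stack */
        sockout = msocket(argv[4], addr2->ai_family, SOCK_STREAM, IPPROTO_TCP);
        /* ... */
        if (connect(sockout, addr2->ai_addr, addr2->ai_addrlen) >= 0) {
            char buf[BUFSIZ];
            struct pollfd pfd[] = {{sockinc, POLLIN, 0}, {sockout, POLLIN, 0}};
            /* copy data in both directions until either side closes */
            for (;;) {
                ssize_t n;
                poll(pfd, 2, -1);
                /* ... */
                if (pfd[0].revents & (POLLIN | POLLHUP)) {
                    if ((n = read(sockinc, buf, BUFSIZ)) <= 0)
                        break;
                    write(sockout, buf, n);
                }
                if (pfd[1].revents & (POLLIN | POLLHUP)) {
                    if ((n = read(sockout, buf, BUFSIZ)) <= 0)
                        break;
                    write(sockinc, buf, n);
                }
            }
        }
        /* else ... */
        close(sockout); /* closed on both paths, so a failed connect does not leak it */
        close(sockinc);
    }
}
```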
Figure 2: An example of msocket usage: A TCP forwarder across different stacks
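The inner loop of Figure 2 is a generic bidirectional relay. It never asks which stack a descriptor belongs to. Since msocket returns an ordinary file descriptor, read, write and poll work on it unchanged. The sketch below factors that loop into a standalone function and assumes only POSIX poll(2). The names relay and write_all are our own and do not come from any library. Unlike the figure, this version retries short writes. It also reacts to POLLHUP and POLLERR, so a hung-up peer cannot make the loop spin.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes all n bytes of buf to fd, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copies data in both directions between descriptors a and b until either
   side reaches end-of-file or an error occurs. Does not close a or b. */
static void relay(int a, int b)
{
    char buf[4096];
    struct pollfd pfd[2] = { { a, POLLIN, 0 }, { b, POLLIN, 0 } };

    for (;;) {
        if (poll(pfd, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return;
        }
        for (int i = 0; i < 2; i++) {
            /* POLLHUP/POLLERR may arrive without POLLIN; the read then
               returns 0 or -1, which ends the relay instead of spinning. */
            if (pfd[i].revents & (POLLIN | POLLHUP | POLLERR)) {
                int from = pfd[i].fd, to = pfd[1 - i].fd;
                ssize_t n = read(from, buf, sizeof buf);
                if (n <= 0 || write_all(to, buf, (size_t)n) < 0)
                    return;
            }
        }
    }
}
```

In the forwarder, the whole inner for (;;) loop could become a single call, relay(sockinc, sockout). Because relay takes plain descriptors, it can also be exercised with socketpair(2) on a system that lacks msocket.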