Trickle: a Userland Bandwidth Shaper for Unix-Like Systems
Total Page:16
File Type:pdf, Size:1020Kb
Trickle: A Userland Bandwidth Shaper for Unix-like Systems Marius A. Eriksen∗ Google, Inc. [email protected] Abstract traffic management. More likely, the need for band- width shaping is largely ad-hoc, to be employed when As with any finite resource, it is often necessary to ap- and where it is needed. For example, ply policies to the shared usage of network resources. Existing solutions typically implement this by employ- • bulk transfers may adversely impact an interactive ing traffic management in edge routers. However, users session and the two should receive differentiated of smaller networks regularly find themselves in need services, or of nothing more than ad-hoc rate limiting. Such net- works are typically unmanaged, with no network admin- • bulk transfers may need to be prioritized. istrator(s) to manage complicated traffic management Furthermore, such users may not have administrative ac- schemes. Trickle bridges this gap by providing a simple cess to their operating system(s) or network infrastruc- and portable solution to rate limit the TCP connections ture in order to apply traditional bandwidth shaping tech- of a given process or group of processes. niques. Trickle takes advantage of the Unix dynamic loader’s Some operating systems provide the ability to shape preloading functionality to interposition itself in front of traffic of local origin (these are usually extensions to the the BSD socket API provided by the system’s libc. Run- router functionality provided by the OS). This function- ning entirely in user space, shapes network traffic by de- ality is usually embedded directly in the network stack laying and truncating socket I/Os without requiring ad- and resides in the operating system kernel. Network traf- ministrator privileges. Instances of Trickle can coop- fic is not associated with the local processes responsible erate, even across networks allowing for the specifica- for generating the traffic. Rather, other criteria such as tion of global rate limiting policies. Due to the preva- IP, TCP or UDP protocols and destination IP addresses lence of BSD sockets and dynamic loaders, Trickle enjoys are used in classifying network traffic for shaping. These the benefit of portability accross a multitude of Unix-like policies are typically global to the host (thus applying to platforms. all users on it). Since these policies are mandatory and global, it is the task of the system administrator to man- 1 Introduction age the traffic policies. These are the many burdens that become evident if one Bandwidth shaping is traditionally employed monolith- would like to employ bandwidth shaping in an ad-hoc ically as part of network infrastructure or in the local manner. While there have been a few attempts to add operating system kernel which works well for provid- voluntary bandwidth shaping capabilities to the afore- ing traffic management to large networks. Such solutions mentioned in-kernel shapers[25], there is still a lack of typically require dedicated administration and privileged a viable implementation and there is no use of collabo- access levels to network routers or the local operating ration between multiple hosts. These solutions are also system. non-portable and there is a lack of any standard applica- Unmanaged network environments without any set tion or user interfaces. bandwidth usage policies (for example home and small We would like to be able to empower any unprivi- office networks) typically do not necessitate mandatory leged user to employ rate limiting on a case-by-case ba- sis, without the need for special kernel support. Trickle ∗Work done by the author while at the University of Michigan. addresses precisely this scenario: Voluntary ad-hoc rate USENIX Association FREENIX Track: 2005 USENIX Annual Technical Conference 61 limiting without the use of a network wide policy. the link editor maps the specified libraries into memory Trickle is a portable solution to rate limiting and it runs and resolves all external symbols. The existence of any entirely in user space. Instances of Trickle may col- unresolved symbols at this stage results in a run time er- laborate with each other to enforce a network wide rate ror. limiting policy, or they may run independently. Trickle To load load its middleware into memory, Trickle only works properly with applications utilizing the BSD uses a feature of link editors in Unix-like systems called socket layer with TCP connections. We do not feel this preloading. Preloading allows the user to specify a list of is a serious restriction: Recent measurements attribute shared objects that are to be loaded together the shared li- TCP to be responsible for over 90% of the volume of braries. The link editor will first try to resolve symbols to traffic in one major provider’s backbone[16]. The major- the preload objects (in order), thus selectively bypassing ity of non-TCP traffic is DNS (UDP) – which is rarely symbols provided by the shared libraries specified by the desirable to shape anyway. program. Trickle uses preloading to provide an alterna- We strive to maintain a few sensible design criteria for tive version of the BSD socket API, and thus socket calls Trickle: are now handled by Trickle. This feature has been used chiefly for program and systems diagnostics; for exam- • Semantic transparency: Trickle should never ple, to match malloc to free calls, one would provide change the behavior or correctness of the process an alternative of these functions via a preload library that it is shaping (Other than the data transfer rates). has the additional matching functionality. In practice, this feature is used by listing the libraries • Portability: Trickle should be extraordinarily portable, working with any Unix-like operating sys- to preload in an environment variable. Preloading does tem that has shared library and preloading support. not work for set-UID or set-GID binaries for security rea- sons: A user could perform privilege elevation or arbi- • Simplicity: No need for excessively expressive poli- trary code execution by specifying a preload object that cies that confuse users. Don’t add features that will defines some functionality that is known to be used by be used by only 1 in 20 users. No setup cost, a user the target application. should be able to immediately make use of Trickle We are interested in interpositioning Trickle in be- after examining just the command line options (and tween the shaped process and the socket implementa- there should be very few command line options). tion provided by the system. Another way to look at it, is that Trickle acts as a proxy between the two. We The remainder of this paper is organized as follows: need some way to call the procedures Trickle is proxy- Section 2 describes the linking and preloading features ing. The link editor provides this functionality through of modern Unix-like systems. Section 3 provides a high- an API that allows a program to load an arbitrary shared level overview of Trickle. Section 4 discusses the details object to resolve any symbol contained therein. The API of Trickle’s scheduler. In section 5 we discuss related is very simple: Given a string representation of the sym- work. Finally, section 9 concludes. bol to resolve, a pointer to the location of that symbol is returned. A common use of this feature is to provide 2 Linking and (Pre)Loading plug-in functionality wherein plugins are shared objects and may be loaded and unloaded dynamically. Dynamic linking and loading have been widely used in Figure 1 illustrates Trickle’s interpositioning. Unix-like environments for more than a decade. Dy- namic linking and loading allow an application to refer to an external symbol which does not need to be resolved 3 How Trickle Works to an address in memory until the runtime of the particu- lar binary. The canonical use of this capability has been We describe a generic rate limiting scheme defining a to implement shared libraries. Shared libraries allow an black-box scheduler. We then look at the practical as- operating system to share one copy of commonly used pects of how Trickle interpositions its middleware in or- code among any number of processes. We refer to re- der to intercept socket calls. Finally we discuss how mul- solving these references to external objects as link edit- tiple instances of Trickle collaborate to limit their aggre- ing, and to unresolved external symbols simply as exter- gate bandwidth usage. nal symbols. After compilation, at link time, the linker specifies a 3.1 A Simple Rate Limiting Scheme list of libraries that are needed to resolve all external symbols. This list is then embedded in the final exe- A process utilizing BSD sockets may perform its own cutable file. At load time (before program execution), rate limiting. For upstream limiting, the application can 62 FREENIX Track: 2005 USENIX Annual Technical Conference USENIX Association do this by simply limiting the rate of data that is writ- application ten to a socket. Similarly, for downstream limiting, an recv() send() ioctl() printf() application may limit the rate of data it reads from a libtrickle.so socket. However, the reason why this works is not imme- delaysmooth() diately obvious. When the application neglects to read libc.so write() some data from a socket, its socket receive buffers fill up. This in turn will cause the receiving TCP to advertise a OS kernel smaller receiver window (rwnd), creating back pressure sys_send() sys_ioctl() on the underlying TCP connection thus limiting its data sys_recv() sys_write() flow. Eventually this “trickle-down” effect achieves end- to-end rate limiting. Depending on buffering in all layers Figure 1: libtrickle.so is preloaded in the of the network stack, this effect may take some time to application’s address space, calls to recv() and propagate.