Lazy Asynchronous I/O For Event-Driven Servers

Khaled Elmeleegy, Anupam Chanda, and Alan L. Cox
Department of Computer Science
Rice University, Houston, Texas 77005, USA
{kdiaa,anupamc,alc}@cs.rice.edu

Willy Zwaenepoel
School of Computer and Communication Sciences
EPFL, Lausanne, Switzerland
willy.zwaenepoel@epfl.ch

Abstract

We introduce Lazy Asynchronous I/O (LAIO), a new asynchronous I/O interface that is well suited to event-driven programming. LAIO is general in the sense that it applies to all blocking I/O operations. Furthermore, it is lazy in the sense that it creates a continuation only when an operation actually blocks, and it notifies the application only when a blocked operation completes in its entirety. These features make programming high-performance, event-driven servers using LAIO considerably easier than with previous interfaces.

We describe a user-level implementation of LAIO, relying only on kernel support for scheduler activations, a facility present in many Unix-like systems.

We compare the performance of web servers implemented using LAIO to the performance obtained with previous interfaces. For workloads with an appreciable amount of disk I/O, LAIO performs substantially better than the alternatives, because it avoids blocking entirely. In one such case, the peak throughput with LAIO is 24% higher than the next best alternative. For in-memory workloads it performs comparably.

1 Introduction

We introduce Lazy Asynchronous I/O (LAIO), a new asynchronous I/O interface that is well suited to event-driven programs, in particular high-performance servers.

To achieve the best possible performance, an event-driven server must avoid blocking on any type of operation, from I/O to resource allocation. Thus, in an event-driven server, the use of asynchronous or non-blocking I/O is a practical necessity. Asynchronous and non-blocking I/O support in present Unix-like systems is, however, limited in its generality. Non-blocking I/O can be performed on network connections, but not on disk files. POSIX asynchronous I/O (AIO) [11] can be performed on disk files, but only supports reading and writing. Many widely-used operations that require disk access as a part of their implementation, such as opening a file or determining its size, have no asynchronous equivalents.

In principle, this problem could be addressed by changes to the operating system. Such changes would affect the operating system's interface as well as its implementation. In practice, the scope of such changes has impeded such a solution. As a consequence, developers faced with this problem have either (1) abandoned an event-driven architecture entirely for a multithreaded or multiprocess architecture, (2) accepted that some operations can block and the effect thereof on performance, or (3) simulated asynchronous I/O at user-level by submitting blocking operations to a queue that is serviced by a pool of threads.

The tradeoff between multithreaded and event-driven servers has received considerable attention [5, 6, 7, 13]. Event-driven servers exhibit certain advantages including greater control over scheduling, lower overhead for maintaining state, and lower overhead for synchronization. Recently, von Behren et al. [12] have argued that compiler support and non-preemption can enable multithreaded servers to achieve performance comparable to event-driven servers, but such compiler support is not generally available. Surprisingly, the second option does appear in practice. The thttpd web server [8] is a notable example, but as we show in Section 9, performance suffers as a result of blocking.
The asymmetric multiprocess event-driven (AMPED) architecture that was employed by the Flash web server is representative of the third category [7]. In essence, it is a hybrid architecture that consists of an event-driven core augmented by helper processes. Flash performs all non-blocking operations in an event-driven fashion and all potentially blocking operations are dispatched to helper processes.

Unlike non-blocking I/O and AIO, LAIO is general. LAIO offers a non-blocking counterpart for each blocking system call, thereby avoiding the tradeoffs and the programming difficulties present with previous asynchronous I/O systems.

In addition to its generality, LAIO offers lazy continuation creation. If a potentially blocking system call completes without blocking, no continuation is created, avoiding the implementation cost of its creation and the programming complexity of dealing with it. Lazy continuation creation distinguishes LAIO from AIO, in which executing an AIO primitive always creates a continuation, regardless of whether the call blocks or not. Furthermore, the LAIO API provides an event notification when a blocked call is completed in its entirety. This feature distinguishes LAIO from non-blocking I/O, in which a return from a non-blocking call may indicate partial completion and in which the application may need to maintain state related to that partial completion.

Our implementation of LAIO resides entirely in a user-level library, without modification to the operating system's kernel. It requires support in the kernel for scheduler activations [3] to deliver upcalls to the LAIO library when an I/O operation blocks or unblocks.

The contributions of this paper are three-fold. First, we introduce a new asynchronous I/O interface, which is easier to program and performs better than previous I/O interfaces for workloads with an appreciable amount of disk I/O. Second, we document the ease of programming by comparing LAIO to non-blocking I/O, AIO and AMPED. For LAIO, non-blocking I/O, and AMPED, we quantify this comparison by counting the lines of affected code in the Flash web server [7]. Third, we evaluate the performance of LAIO by comparing the performance of two web servers, thttpd [8] and Flash. We show that LAIO reduces blocking compared to non-blocking I/O and asynchronous I/O and incurs less overhead than AMPED.

The remainder of this paper is organized as follows. Section 2 describes the LAIO API. Section 3 provides an example of using LAIO. Section 4 discusses other I/O APIs and their use in event-driven programs, and compares them with LAIO. Section 5 describes our LAIO implementation. Section 6 describes our experimental environment. Section 7 characterizes the performance of our implementation using a set of microbenchmarks. Section 8 describes the web servers and the workloads used in our macrobenchmarks. Section 9 describes the experimental results obtained with these macrobenchmarks. Section 10 discusses related work. Section 11 concludes this paper.
2 The LAIO API

The LAIO API consists of three functions: laio_syscall(), laio_gethandle(), and laio_poll().

laio_syscall() has the same signature as syscall(), a standard function for performing indirect system calls. The first parameter identifies the desired system call. Symbolic names for this parameter, representing all system calls, are defined in a standard header file. The rest of the parameters correspond to the parameters of the system call being performed. If the system call completes without blocking, the behavior of laio_syscall() is indistinguishable from that of syscall(). If, however, the system call is unable to complete without blocking, laio_syscall() returns -1, setting the global variable errno to EINPROGRESS. Henceforth, we refer to this case as "a background laio_syscall()". In such cases, any input parameters that are passed by reference, such as a buffer being written to a file, must not be modified by the caller until the background laio_syscall() completes.

laio_gethandle() returns an opaque handle for the purpose of identifying a background laio_syscall(). Specifically, this handle identifies the most recent laio_syscall() by the calling thread that reported EINPROGRESS. If the most recent laio_syscall() completed without blocking, laio_gethandle() returns NULL. laio_gethandle() is expected to appear shortly after the return of a laio_syscall() with return value -1 and with errno equal to EINPROGRESS.

laio_poll() waits for the completion of background laio_syscall() operations. When one or more such operations have completed before a caller-specified timeout expires, laio_poll() returns a set of LAIO completion objects, one per completed background laio_syscall(). A completion object consists of a handle, a return value, and a possible error code. The handle identifies a background laio_syscall(), as returned by laio_gethandle(). The return value and possible error code are determined by the particular system call that was performed by the background laio_syscall().
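To make the interface concrete, the declarations below sketch one plausible C rendering of these three functions. The type names (laio_handle_t, struct laio_completion), the timeout type, and the completion field names are assumptions inferred from the description above and from the usage in Figures 1 and 2; they are illustrative, not the library's actual header.

/* Sketch of the LAIO interface; names and types are assumed, not taken
 * from the actual LAIO header. */
#include <time.h>

typedef void *laio_handle_t;            /* opaque handle for a background operation */

struct laio_completion {
    laio_handle_t laio_handle;          /* identifies the background laio_syscall() */
    long          laio_return_value;    /* return value of the completed system call */
    int           laio_errno;           /* error code, or 0 on success */
};

/* Same signature as syscall(); returns -1 and sets errno to EINPROGRESS
 * if the call could not complete without blocking. */
long laio_syscall(int number, ...);

/* Handle of the most recent laio_syscall() by the calling thread that
 * reported EINPROGRESS; NULL if it completed without blocking. */
laio_handle_t laio_gethandle(void);

/* Wait up to the given timeout for background operations to complete and
 * fill in up to len completion objects; returns their number, or -1. */
int laio_poll(struct laio_completion *completions, int len,
              const struct timespec *timeout);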
3 An LAIO Example

We present an example demonstrating how to write an event-driven program using LAIO. In this example, we also use libevent [9], a general-purpose event notification library. In general, such libraries provide a unifying abstraction for the various mechanisms that apply to different types of events. We have extended libevent to support completion of a background laio_syscall() as an event type. LAIO can, however, be used in isolation or with other event notification libraries.

Three functions from the libevent interface appear in this and subsequent examples: event_add(), event_del(), and event_set(). All of these functions work with an event object that has three principal attributes: the object being monitored, the desired state of that object, and a handler that is called when this desired state is achieved. For example, an event might designate a handler that is called when a socket has data available to be read. An event is initialized using event_set(). For the programmer's convenience, event_set() also accepts as its final argument an opaque pointer that is passed to the handler. event_add() enables monitoring for an initialized event. Unless an event is initialized as persistent, monitoring is disabled when the event occurs. event_del() disables monitoring for events that are either persistent or have not occurred.

In an event-driven program, event monitoring is performed by an infinite event loop. For each occurrence of an event, the event loop dispatches the corresponding event handler. Figure 1 shows the outline of the event loop in a program using LAIO. laio_poll() is used for event monitoring. It returns a set of LAIO completion objects, one for each completed background laio_syscall(). For each LAIO completion object, the event loop locates the corresponding event object (code not shown in the figure). It then invokes the continuation function stored in that event object with the arguments returned in the LAIO completion object.

Figure 2 presents an event handler that writes data to a network socket. The write operation is initiated using laio_syscall(). If the write does not block, the number of bytes written is returned. The execution continues to client_write_complete(), and the event handler returns to the event loop. In the more interesting case in which the write blocks, -1 is returned and errno is set to EINPROGRESS. The program calls laio_gethandle() to get the handle associated with the background laio_syscall() operation. It initializes an event object, and associates the continuation function client_write_complete() with that LAIO handle. The event handler then returns to the event loop. The continuation function is invoked by the event loop after the background laio_syscall() completes (see Figure 1).

4 Comparison to Other I/O APIs

We describe non-blocking I/O and AIO and explain their usage in event-driven programming. We compare the LAIO API with these other I/O APIs.

4.1 Non-blocking I/O

With non-blocking I/O the programmer declares a socket as non-blocking, and then performs the normal I/O system calls on it. If the operation does not block, the operation returns normally. Otherwise, the operation returns when some part of the operation has completed. The amount of work completed is indicated by the return value.

Figures 3 and 4 illustrate how the event loop and the event handler presented in Figures 1 and 2 are programmed using non-blocking I/O instead of LAIO.

The event loop is conceptually the same as with LAIO. For non-blocking I/O, polling in the event loop can be done using any standard event monitoring mechanism like select(), poll(), or kevent().

The event handler clearly illustrates the consequences of the fact that, to avoid blocking, the system call may return with the operation only partially completed. In this case the return value is different from the number of bytes provided as an argument to the system call. The application needs to remember how much of the data has been written.

for (;;) {
    ...
    /* poll for completed LAIO operations; laioc_array is an array of LAIO
     * completion objects; it is an output parameter */
    if ((ncompleted = laio_poll(laioc_array, laioc_array_len, timeout)) == -1)
        /* handle error */
    for (i = 0; i < ncompleted; i++) {
        ret_val = laioc_array[i].laio_return_value;
        err_val = laioc_array[i].laio_errno;
        /* find the event object for laioc_array[i].laio_handle */
        eventp->ev_func(eventp->ev_arg /* == clientp */, ret_val, err_val);
        /* disable eventp; completions are one-time events */
    }
    ...

Figure 1: Event loop using LAIO

client_write(struct client *clientp)
{
    ...
    /* initiate the operation; returns immediately */
    ret_val = laio_syscall(SYS_write, clientp->socket, clientp->buffer,
        clientp->bytes_to_write);
    if (ret_val == -1) {
        if (errno == EINPROGRESS) {
            /* instruct event loop to call client_write_complete() upon completion
             * of this LAIO operation; clientp is passed to client_write_complete() */
            event_set(&clientp->event, laio_gethandle(), EV_LAIO_COMPLETED,
                client_write_complete, clientp);
            event_add(&clientp->event, NULL);
            return; /* to the event loop */
        } else {
            /* client_write_complete() handles errors */
            err_val = errno;
        }
    } else
        err_val = 0; /* completed without blocking */
    client_write_complete(clientp, ret_val, err_val);
    ...

Figure 2: Event handler using LAIO
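Figures 1 and 2 assume that the continuation passed to event_set() takes the client, the return value, and the error code reported in the completion object. The outline below is a hypothetical example of such a continuation; client_write_complete() is never shown in the paper, and the error handling here is only a placeholder.

/* Hypothetical continuation for Figures 1 and 2; not code from the paper. */
void
client_write_complete(struct client *clientp, long ret_val, int err_val)
{
    if (err_val != 0) {
        /* the write failed; tear down this client's connection */
        close(clientp->socket);
        return;
    }
    /* ret_val bytes were written; proceed with the next step for this
     * client, e.g., prepare the next response buffer */
}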

When the operation unblocks, an event is generated, and the event handler is called again, with the remaining number of bytes to be written. Several such write system calls may be required to complete the operation. Hence, the event is defined as persistent and deleted by the event handler only when the operation has completed.

4.2 AIO

The AIO API provides asynchronous counterparts, aio_read() and aio_write(), to the standard read and write system calls. In addition, aio_suspend() allows the program to wait for AIO events, and aio_return() and aio_error() provide the return and the error values of asynchronously executed calls. An AIO control block is used to identify asynchronous operations.

Figures 5 and 6 show the outline of the event loop and an event handler, respectively, using AIO. The event loop is very similar to LAIO, except that two separate calls, to aio_error() and to aio_return(), are necessary to obtain the error and return values.

The more important differences are in the event handler in Figure 6. First, an AIO control block describing the operation is initialized. This control block includes information such as the descriptor on which the operation is performed, the location of the buffer, its size and the completion notification mechanism. Then, the operation is initiated.

for (;;) {
    ...
    /* poll for fds that are ready to read and/or write; pfd_array is an array of
     * pollfd objects listing blocked fds; it is an input and output parameter */
    if ((nready = poll(pfd_array, pfd_array_len, timeout)) == -1)
        /* handle error */
    for (i = 0; nready > 0 && i < pfd_array_len; i++) {
        if (pfd_array[i].revents & (POLLIN | POLLOUT)) {
            if (pfd_array[i].revents & POLLIN) { /* ready to read */
                /* find the read event object for pfd_array[i].fd */
                eventp->ev_func(eventp->ev_arg /* == clientp */);
            }
            if (pfd_array[i].revents & POLLOUT) { /* ready to write */
                /* find the write event object for pfd_array[i].fd */
                eventp->ev_func(eventp->ev_arg /* == clientp */);
            }
            nready--;
        }
    }
    ...

Figure 3: Event loop using non-blocking I/O

In the absence of errors, the event handler returns to the event loop afterwards. The control block is used as a handle to identify the asynchronous operation. In most implementations, the application determines that the operation has finished through polling or an asynchronous event notification mechanism, such as signals. Figure 5 illustrates polling using the aio_suspend() operation.

4.3 Comparison

The three key distinguishing features of LAIO are:

Generality  LAIO works for all operations, e.g., file and socket operations.

Laziness  LAIO creates a continuation only if the operation blocks.

Completion notification  LAIO notifies the application when the event completes, not at some intermediate stage.

A principal difference between LAIO on one hand and non-blocking I/O and AIO on the other hand is that LAIO allows any system call to be executed asynchronously. Non-blocking I/O only supports sockets, but does not support file operations. AIO only supports basic I/O operations, like reading and writing. Common operations that include I/O as part of their implementation, such as opening a file or determining its size, are not supported.

Furthermore, asynchronous I/O systems can be distinguished along two dimensions: whether or not they create continuations if the operation does not block, and whether they provide a completion or a partial completion notification. LAIO creates continuations lazily and provides a completion notification. The examples in Sections 3, 4.1 and 4.2 clearly show the benefits of this combination in terms of programmability. Non-blocking I/O provides partial completion notification, requiring the application to maintain state about non-blocking calls in progress and to issue multiple I/O calls (see Figure 4). AIO creates continuations eagerly, requiring extra programming effort and extra system calls even if the operation does not block (see Figures 5 and 6).

5 An LAIO Implementation

LAIO is implemented as a user-level library. The LAIO library maintains a pool of free LAIO handles. Each handle is an opaque pointer. The library passes an LAIO handle to the current running thread through an opaque pointer field in the thread structure. This is done via the KSE interface [10] before making any laio_syscall(). The library remembers the current handle in a variable current_handle. The library maintains another variable background_handle which is initialized to NULL before any laio_syscall().

laio_syscall() is a wrapper around any system call. It saves the current thread's context and enables upcalls to be delivered. It then invokes the system call. If the operation does not block, it returns immediately to the application with the corresponding return value and upcalls are disabled. If the operation blocks, an upcall is generated by the kernel. The kernel creates a new thread and delivers the upcall on this new thread. At this point the thread associated with the handle current_handle has blocked. This handle is remembered in background_handle which now corresponds to the background laio_syscall(). Since the current running thread has now changed, current_handle is set to a new handle from the pool of free LAIO handles. This new value of current_handle is associated with the current running thread via the KSE interface. laio_gethandle() returns the handle corresponding to the most recent background laio_syscall() which, as explained above, is saved in background_handle. The upcall handler then steals the blocked thread's stack using the context previously saved by laio_syscall(). Running on the blocked thread's stack, the upcall handler returns from laio_syscall() with the return value set to -1 and the errno set to EINPROGRESS.

Unblocking of a background laio_syscall() generates another upcall. The upcall returns the handle corresponding to the unblocked thread via an opaque pointer field within the thread structure. The library adds this handle to a list of handles corresponding to background laio_syscall()s that have completed and frees the thread. The application calls laio_poll() to retrieve this list. After laio_poll() retrieves a handle it is returned to the pool of free LAIO handles.

We rely on scheduler activations [3] to provide upcalls for the implementation of the LAIO library. Many operating systems support scheduler activations, including FreeBSD [10], NetBSD [14], Solaris, and Tru64.
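The pseudocode below summarizes this control flow. It is a simplified sketch, assuming hypothetical helper routines (handle_alloc(), set_thread_handle(), save_context(), resume_on_stack(), and the upcall enable/disable calls) that stand in for the KSE-based machinery; the library's real internals are not shown in the paper.

/* Simplified sketch of laio_syscall() and the blocking upcall; all helper
 * names are hypothetical stand-ins for the KSE-based mechanisms. */
static laio_handle_t current_handle;
static laio_handle_t background_handle;

long
laio_syscall(int number, ...)
{
    long ret;

    current_handle = handle_alloc();        /* taken from the pool of free handles */
    set_thread_handle(current_handle);      /* stored in the thread structure (KSE) */
    background_handle = NULL;
    save_context(current_handle);           /* where to resume if the call blocks */
    enable_upcalls();
    ret = syscall(number /* , remaining arguments elided */);
    disable_upcalls();
    return (ret);
}

/* Upcall delivered on a new thread when the call above blocks in the kernel. */
void
blocked_upcall(void)
{
    background_handle = current_handle;     /* the blocked call is now in the background */
    current_handle = handle_alloc();
    set_thread_handle(current_handle);
    /* steal the blocked thread's stack and make laio_syscall() return
     * -1 with errno set to EINPROGRESS */
    resume_on_stack(background_handle, -1, EINPROGRESS);
}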
client_write(struct client *clientp)
{
    ...
    /* assume that the one-time operations, enabling non-blocking I/O and
     * initializing the state of progress, have been performed elsewhere. */
    ...
    /* attempt the operation; returns immediately */
    ret_val = write(clientp->socket, &clientp->buffer[clientp->bytes_written],
        clientp->bytes_remaining);
    if (ret_val == clientp->bytes_remaining) {
        /* this write has completed */
        err_val = 0;
    } else if (ret_val > 0) { /* and implicitly less than bytes_remaining */
        if (clientp->bytes_written == 0) {
            /* instruct event loop to call client_write whenever clientp->socket
             * is ready to write; clientp is passed to client_write() */
            event_set(&clientp->event, clientp->socket, EV_PERSIST | EV_WRITE,
                client_write, clientp);
            event_add(&clientp->event, NULL);
        }
        /* update the state of progress */
        clientp->bytes_written += ret_val;
        clientp->bytes_remaining -= ret_val;
        return; /* to the event loop */
    } else if (ret_val == -1 && errno != EAGAIN) {
        /* client_write_complete() handles errors */
        err_val = errno;
    }
    if (clientp->bytes_written != 0) {
        /* instruct libevent that calls are no longer needed */
        event_del(&clientp->event);
    }
    client_write_complete(clientp, ret_val, err_val);
    ...

Figure 4: Event handler using non-blocking I/O

6 Experimental Environment

All of our machines have a 2.4GHz Intel Xeon processor, 2GB of memory, and a single 7200RPM ATA hard drive. They are connected by a gigabit Ethernet switch. They run FreeBSD 5.2-CURRENT, which supports KSE, FreeBSD's scheduler activations implementation.

for (;;) {
    ...
    /* poll for completed AIO operations; aiocbp_array is an array of pointers
     * to the unfinished aiocbs; it is an input parameter */
    if (aio_suspend(aiocbp_array, aiocbp_array_len, timeout) == -1)
        /* handle error */
    for (i = 0; i < aiocbp_array_len; i++) {
        err_val = aio_error(aiocbp_array[i]);
        if (err_val == 0) {
            /* this aiocbp has completed */
            ret_val = aio_return(aiocbp_array[i]);
            /* find the event object for this aiocbp */
            eventp->ev_func(eventp->ev_arg /* == clientp */, ret_val, err_val);
            /* disable eventp; completions are one-time events */
        } else if (err_val == EINPROGRESS) {
            /* this aiocbp has not completed */
            continue;
        } else
            /* handle error */
    }
    ...

Figure 5: Event loop using AIO

7 Microbenchmarks

In order to compare the cost of performing I/O using LAIO, non-blocking I/O, and AIO, we implemented a set of microbenchmarks. These microbenchmarks measured the cost of 100,000 iterations of reading a single byte from a pipe under various conditions. For AIO, the microbenchmarks include calls to aio_error() and aio_return() in order to obtain the read's error and return values, respectively. We used a pipe so that irrelevant factors, such as disk access latency, did not affect our measurements. Furthermore, the low overhead of I/O through pipes would emphasize the differences between the three mechanisms. In one case, when the read occurs a byte is already present in the pipe, ready to be read. In the other case, the byte is not written into the pipe until the reader has performed either the laio_syscall() or the aio_read(). In this case, we did not measure the cost of a non-blocking read because the read would immediately return EAGAIN.

As would be expected, when the byte is already present in the pipe before the read, non-blocking I/O performed the best. LAIO was somewhat slower than non-blocking I/O, and AIO was slower than both non-blocking I/O and LAIO. In the other case, when the byte was not present in the pipe before the read, we found that LAIO was slower than AIO.

In these microbenchmarks, only a single byte was read at a time. Increasing the number of bytes read at a time did not change the ordering among LAIO, non-blocking I/O, and AIO as to which performed best.

8 Macrobenchmarks

We use web servers as our benchmark applications. We make two sets of comparisons. First, we compare the performance of single-threaded event-driven servers using different I/O libraries, in particular using non-blocking I/O, AIO and LAIO. Second, we compare the performance of an event-driven server augmented with helper processes to the performance of a single-threaded event-driven server using LAIO.

We use two existing servers as the basis for our experiments, thttpd [8] and Flash [7]. We modify these servers in various ways to obtain the desired experimental results. We first describe thttpd and Flash. We then document the changes that we have made to these servers. We conclude with a description of the workloads used in the experiments.

8.1 thttpd

thttpd has a conventional single-threaded event-driven architecture. It uses non-blocking network I/O and blocks on disk I/O. All sockets are configured in non-blocking mode. An event is received in the event loop when a socket becomes

client_write(struct client *clientp)
{
    ...
    /* initialize the control block */
    aiocbp->aio_fildes = clientp->socket;
    aiocbp->aio_buf = clientp->buffer;
    aiocbp->aio_nbytes = clientp->bytes_to_write;
    aiocbp->aio_sigevent.sigev_notify = SIGEV_NONE; /* do nothing; event loop
                                                     * polls for completion */
    /* initiate the operation; returns immediately */
    if (aio_write(aiocbp) == -1) {
        /* client_write_complete() handles errors */
        client_write_complete(clientp, -1, errno);
    } else {
        /* instruct event loop to call client_write_complete() upon completion
         * of the AIO operation; clientp is passed to client_write_complete() */
        event_set(&clientp->event, aiocbp, EV_AIO_COMPLETED,
            client_write_complete, clientp);
        event_add(&clientp->event, NULL);
        return; /* to the event loop */
    }
    ...

Figure 6: Event handler using AIO ready for reading or writing, and the correspond- file() to write the file to the corresponding ing event handler is invoked. From these event socket. If sendfile() blocks in the kernel on handlers thttpd makes calls to operations that may a disk I/O, it returns a special errno. The event- block on disk I/O, which may in turn cause the driven core catches this errno, and instructs a entire server to block. helper process to issue explicit I/O to bring the We use the following terminology to refer to file data into memory. After the event-driven core the different versions of the servers. A version receives the helper process’s response, indicat- is identified by a triple server-network-disk. For ing that the file data is in memory, it re-issues instance, thttpd-NB-B is the thttpd server using the sendfile(). Flash thus performs I/O in a non-blocking network I/O and blocking disk I/O, lazy manner, analogous to LAIO. It calls send- as described in the previous paragraph. file() expecting it not to block, but if it blocks on disk I/O, the sendfile() call returns with a special error code, which is used to initiate I/O 8.2 Flash via helper processes.

Flash employs the asymmetric multiprocess event driven (AMPED) architecture. It uses non- 8.3 Introducing LAIO blocking I/O for networking, and helper pro- cesses for operations that may block on disk We modify thttpd to use the LAIO API, both for I/O. All potentially blocking operations like file its network and disk operations. Blocking sock- open or read are handled by the helper processes. ets are used for networking. All blocking sys- The event-driven core dispatches work to the tem calls are invoked via laio syscall(). helper processes through a form of non-blocking Continuation functions are defined to handle RPC [2]: after the event-driven core sends a re- laio syscall() invocations that block and quest to a helper process, it registers an event han- finish asynchronously. This version is called dler for execution upon receipt of the response. thttpd-LAIO-LAIO. Using the notation introduced in Section 8.1, we We first modify Flash to get a conventional refer to this server as Flash-NB-AMPED. single-threaded version. This version has no We use the most recent version of Flash, which helper threads. Instead, all helper functions are uses sendfile(). The event-driven core reads called directly by the main thread. Non-blocking the requested URL after accepting a connection. sockets are used as before to perform network The corresponding file is opened by a helper pro- I/O. We refer to this version as Flash-NB-B. cess. Then, the event-driven core uses send- We then modify Flash-NB-B to use LAIO. All sockets are configured as blocking, and The trace file associated with each workload potentially blocking operations are issued via contains a set of request sequences. A sequence laio syscall(). This version is referred to consist of one or more requests. Each sequence as Flash-LAIO-LAIO. begins with a connection setup and ends with a connection teardown. Requests in a sequence are sent one at a time, the response is read com- 8.4 Additional Versions for Compar- pletely, and then the next request is sent. In Flash, ison which supports persistent HTTP connections, all requests in a sequence are sent over a persistent We further modify Flash-NB-B to use AIO for connection. For thttpd, which does not support file reads. This version of Flash uses sockets in persistent connections, a new connection is set up a non-blocking mode. Since there is no AIO API before each request and torn down afterwards. for stat() and open(), it may block on those We use a program that simulates concurrent operations. We call this version of Flash Flash- clients sending requests to the web server. The NB-AIO. number of concurrent clients can be varied. Each We also modify Flash-NB-B to use LAIO for simulated client plays the request sequences in file I/O only. We call this version of Flash Flash- the trace to the server. The program terminates NB-LAIO. when the trace is exhausted, and reports overall Finally, we modify Flash-NB-AMPED to use throughput and response time. kernel-level threads instead of processes to im- For each experiment we show a cold cache and plement the helpers. Thus, the event-driven core a warm cache case. In the cold cache case, the and the helper threads share a single address cache is empty when the experiment is started. space, reducing the cost of context switches be- In the warm cache case, the cache is warmed up tween them. We call this version of Flash Flash- by running the entire experiment once before any NB-AMTED. measurements are collected.
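As a rough illustration of the client program described above, the fragment below replays one request sequence over non-persistent connections and drains each response before issuing the next request. It is a minimal sketch under stated assumptions (a fixed server address, HTTP/1.0-style requests, no timing bookkeeping or error recovery); the actual load generator used in the experiments is not published with the paper.

/* Minimal sketch of one simulated client replaying a request sequence;
 * server address, request format, and buffer sizes are assumptions. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
play_sequence(const char *server_ip, int port, char **urls, int nurls)
{
    char req[1024], buf[8192];
    struct sockaddr_in sin;
    ssize_t n;
    int i, s;

    for (i = 0; i < nurls; i++) {
        /* thttpd does not support persistent connections, so a new
         * connection is set up before each request and torn down after it */
        s = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(port);
        sin.sin_addr.s_addr = inet_addr(server_ip);
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
            close(s);
            return;
        }
        /* send one request, then read the response completely before
         * sending the next request */
        snprintf(req, sizeof(req), "GET %s HTTP/1.0\r\n\r\n", urls[i]);
        write(s, req, strlen(req));
        while ((n = read(s, buf, sizeof(buf))) > 0)
            ;   /* a real client would also record bytes and response time */
        close(s);
    }
}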

8.5 Summary of Versions

The versions considered in the rest of the paper are summarized in Table 1.

8.6 Workloads

We use two trace-based web workloads for our evaluation. These workloads are obtained from the web servers at Rice University (Rice workload) and the University of California at Berkeley (Berkeley workload).

Table 2 shows the characteristics of the two workloads. The Rice and Berkeley workloads contain 245,820 and 3,184,540 requests, respectively. The columns Small, Medium, and Large indicate the percentage of the total number of bytes transferred in small files (size less than or equal to 8 Kilobytes), medium-sized files (size in between 8 Kilobytes and 256 Kilobytes), and large files (size greater than 256 Kilobytes). For the purposes of our experiments the most important characteristic is the working set of the workload: 1.1 Gigabytes for the Rice workload and 6.4 Gigabytes for the Berkeley workload. With the server machine we are using, which has 2 Gigabytes of main memory, the Rice workload fits in memory, while the Berkeley workload does not.

9 Results

First, we compare single-threaded event-driven servers using different I/O libraries. Then, we compare LAIO to the Flash AMPED architecture with user-level helper processes.

9.1 Single-Threaded Event-Driven Servers

9.1.1 LAIO vs. Non-blocking I/O

Figure 7(a) shows the throughput for the Berkeley workload for thttpd-NB-B, thttpd-LAIO-LAIO, Flash-NB-B and Flash-LAIO-LAIO in both the cold and warm cache cases. thttpd-LAIO-LAIO achieves between 12% and 38% higher throughput than thttpd-NB-B, depending on the number of clients. Flash-LAIO-LAIO achieves between 5% and 108% higher throughput than Flash-NB-B. Figure 7(b) shows the response times for the four servers under the Berkeley workload. In both the cold and warm cache cases, thttpd-LAIO-LAIO and Flash-LAIO-LAIO achieve lower response times than thttpd-NB-B and Flash-NB-B, respectively. The Berkeley workload is too large to fit in memory. Thus, thttpd-NB-B and Flash-NB-B frequently

Server           | Threaded              | Blocking Operations            | Comments
thttpd-NB-B      | Single                | disk I/O                       | stock version, conventional event-driven
thttpd-LAIO-LAIO | Single                | normal LAIO                    |
Flash-NB-AMPED   | Process-based Helpers |                                | stock version, multiple address spaces
Flash-NB-B       | Single                | disk I/O                       | conventional event-driven
Flash-LAIO-LAIO  | Single                | normal LAIO                    |
Flash-NB-AIO     | Single                | disk I/O other than read/write |
Flash-NB-LAIO    | Single                |                                |
Flash-NB-AMTED   | Thread-based Helpers  |                                | single, shared address space

Table 1: Different versions of the servers

Web Workload | No. of requests | Small (<= 8 KB) | Medium (8 KB to 256 KB) | Large (> 256 KB) | Total footprint
Rice         | 245,820         | 5.5%            | 20.2%                   | 74.3%            | 1.1 Gigabytes
Berkeley     | 3,184,540       | 8.2%            | 33.2%                   | 58.6%            | 6.4 Gigabytes

Table 2: Web trace characteristics block on disk I/O regardless of whether the cache 9.1.2 Additional Comparisons is cold or warm, while thttpd-LAIO-LAIO and Flash LAIO-LAIO do not. One may wonder whether even better perfor- mance results could be obtained by using non- Figure 8(a) shows the throughput for the Rice blocking I/O for the network and LAIO for file workload for the same servers in both the cold I/O. Figures 9(a) and 9(b) show the through- and warm cache cases. For the cold cache case, put results for Flash-LAIO-LAIO and Flash-NB- thttpd-LAIO-LAIO achieves between 9% and LAIO, for the Berkeley and the Rice workloads, 36% higher throughput than thttpd-NB-B, de- respectively. The results for thttpd and for the re- pending on the number of clients. Flash-LAIO- sponse times follow the same general trend, so LAIO achieves between 12% and 38% higher from now on we only show throughput results throughput than Flash-NB-B. No gain is achieved for Flash. The throughputs of Flash-NB-LAIO in the warm cache case. Figure 8(b) shows the re- and Flash-LAIO-LAIO are very close in all cases. sponse times. Similarly, thttpd-LAIO-LAIO and Given that no performance improvements result Flash-LAIO-LAIO only achieve lower response from using non-blocking I/O for networking, and times than thttpd-NB-B and Flash-NB-B, respec- given the simpler programming model offered by tively, in the cold cache case. For this work- LAIO, we argue that it is better to use LAIO uni- load, which fits in memory, LAIO shows gains formly for all I/O. in throughput and response time only in the cold In our final experiment we measure the differ- cache case, stemming from compulsory misses, ence between using LAIO uniformly for all I/O which block in the absence of LAIO. In the warm versus using non-blocking sockets for network cache case, there is no disk I/O, and therefore no I/O and AIO for disk operations. Figures 10(a) improvement as a result of using LAIO. and 10(b) show the throughput of Flash-NB-AIO We conclude that LAIO substantially improves and Flash-LAIO-LAIO for the Berkeley and Rice the performance of conventional single-threaded workloads, respectively. For the Berkeley work- servers using non-blocking I/O. As explained in load, Flash-LAIO-LAIO dominates Flash-NB- Section 4.1, LAIO is also easier to program than AIO after 128 clients, achieving its largest im- non-blocking I/O. provement of 34% at 512 clients. This is because Workload:Berkeley Workload:Berkeley 300 2500

Figure 7: Results for the Berkeley workload with single-threaded event-driven web servers. The two panels plot (a) throughput (Mb/s) and (b) response time (ms) against the number of clients for Flash-NB-B, Flash-LAIO-LAIO, thttpd-NB-B, and thttpd-LAIO-LAIO, each under warm and cold caches.

Figure 8: Results for the Rice workload with single-threaded event-driven web servers. The two panels plot (a) throughput (Mb/s) and (b) response time (ms) against the number of clients for Flash-NB-B, Flash-LAIO-LAIO, thttpd-NB-B, and thttpd-LAIO-LAIO, each under warm and cold caches.

AIO does not support asynchronous versions of tures from the view point of programming com- open() and stat(). For the Rice workload, plexity. Next, we compare the performance of there is little difference between the two for either these two architectures. We use the Flash web the cold or warm cache cases. In the cold cache server for these results. case, the small number of files in the Rice work- load results in relatively few of the operations that 9.2.1 Programming Complexity of AMPED AIO does not support. In the warm cache case, vs. LAIO the working set is in memory, so there is no disk I/O. We compare the programming complexity of the We conclude that the unified programming AMPED architecture to the LAIO API. We use model offered by LAIO is preferable over mixed lines of code as a quantitative measure of pro- models combining non-blocking sockets and gramming complexity. AIO or LAIO, both in terms of ease of program- By using LAIO we avoid the coding complexi- ming and in terms of performance. ties of using helper processes and communicating between the main process and helper processes. 9.2 AMPED vs. Single-Threaded We also avoid the state maintenance for incom- with LAIO plete write operations on sockets. As a result the programming effort is much reduced while us- We compare the AMPED architecture with the ing LAIO. This is illustrated in Table 3 which event-driven architecture using LAIO for all I/O shows the lines of code associated with the differ- operations. First, we contrast these two architec- ent components of Flash for Flash-NB-AMPED Workload:Berkeley Workload:Rice 300 800

Figure 9: Results for Flash using non-blocking sockets and LAIO for disk operations. The two panels plot throughput (Mb/s) against the number of clients for (a) the Berkeley workload and (b) the Rice workload, comparing Flash-NB-LAIO and Flash-LAIO-LAIO under warm and cold caches.

Figure 10: Results for Flash using non-blocking sockets and AIO for disk operations. The two panels plot throughput (Mb/s) against the number of clients for (a) the Berkeley workload and (b) the Rice workload, comparing Flash-NB-AIO and Flash-LAIO-LAIO under warm and cold caches.

and Flash-LAIO-LAIO. and blank lines, Flash-NB-AMPED has about Flash-NB-AMPED has two helper processes, 8,860 lines of code in total. Flash-LAIO-LAIO the read helper and the name conversion helper. has about 8,020 lines of code. So in Flash-LAIO- The read helper issues a read() on the file LAIO we have reduced the code size by almost from disk, while the name conversion helper does 9.5%. stat() on the file and path name and checks permissions. The read helper accounts for 550 9.2.2 Performance Comparison lines of code in Flash-NB-AMPED. In Flash- LAIO-LAIO the read helper code results in less Figure 11(a) shows the performance of Flash- than 100 lines of code. The name conversion NB-AMPED, Flash-NB-AMTED and Flash- helper required more than 600 lines of code in LAIO-LAIO for the Berkeley workload under Flash-NB-AMPED. The name conversion func- cold and warm cache conditions. Flash-LAIO- tion requires less than 400 lines of code in Flash- LAIO outperforms both Flash-NB-AMPED and LAIO-LAIO. Eliminating the partial-write state Flash-NB-AMTED by about 10% to 40%. There maintenance results in a saving of 70 lines of is no noticeable performance difference between code in Flash-LAIO-LAIO. Flash-NB-AMPED and Flash-NB-AMTED. The Considering the three components of Flash Berkeley workload does not fit in memory and where Flash-NB-AMPED differs from Flash- the CPU is not the bottleneck for any of these LAIO-LAIO, Flash-NB-AMPED has more than servers. The better performance of Flash-LAIO- three times the number of lines of code as LAIO comes from its better disk utilization than Flash-LAIO-LAIO. Excluding comment lines Flash-NB-AMPED or Flash-NB-AMTED. We Component Flash-NB-AMPED Flash-LAIO-LAIO File read 550 15 Name conversion 610 375 Partial-write state maintenance 70 0 Total code size 8860 8020

Table 3: Lines of code count for different versions of Flash

Figure 11: Results for Flash using the AMPED architecture. The two panels plot throughput (Mb/s) against the number of clients for (a) the Berkeley workload and (b) the Rice workload, comparing Flash-NB-AMPED, Flash-NB-AMTED, and Flash-LAIO-LAIO under warm and cold caches.

validate this by measuring the disk I/O statis- workload fits in memory. Hence, there is no disk tics during a run of Flash-LAIO-LAIO and Flash- I/O with a warm cache and the three servers per- NB-AMPED under warm cache conditions. In form the same. There is disk I/O starting from a particular, we measure total disk I/O (in bytes cold cache, but the amount is much smaller than transferred) and average disk transfer rate (in for the Berkeley workload. Flash-NB-AMPED bytes transferred per second). Flash-LAIO-LAIO and Flash-NB-AMTED perform almost the same transfers about 15.6 Gigabytes of data from the because the CPU is not the bottleneck in the cold disk, which is about 15.7% lower than Flash- cache case and no I/O is required by the helpers NB-AMPED which transfers about 18.5 Giga- in the warm cache case. bytes. Flash-LAIO-LAIO achieves a disk trans- We conclude that an event-driven server using fer rate of 5.64 Mbytes/sec, which is about 5.8% LAIO outperforms a server using the AMPED (or higher than Flash-NB-AMPED which achieves AMTED) architecture when the workload does a disk transfer rate of about 5.33 Mbytes/sec. not fit in memory, and matches the performance Thus, Flash-LAIO-LAIO utilizes the disk more of the latter when the workload fits in memory. efficiently which translates into its higher per- formance. The disk I/O statistics for Flash- NB-AMTED are similar to that of Flash-NB- 10 Related Work AMPED. Since the CPU was not the bottleneck, Some prior work has been done in this area. the context switching overhead (between address Other than AIO, the Windows NT [4] and spaces) in Flash-NB-AMPED does not degrade VAX/VMS [1] operating systems have provided its performance compared to Flash-NB-AMTED, asynchronous I/O. and as such both these servers have almost the In Windows NT, the application can start an same performance. I/O operation then do other work while the de- Figure 11(b) shows the performance of Flash- vice completes the operation. When the device NB-AMPED, Flash-NB-AMTED and Flash- finishes transferring the data, it interrupts the ap- LAIO-LAIO for the Rice workload under cold plication’s calling thread and copies the result to and warm cache conditions. Performance of its address space. The kernel uses a Windows these three servers is almost the same. The Rice NT asynchronous notification mechanism called asynchronous procedure call (APC) to notify the event-driven version, Flash-NB-AIO, using non- application’s thread of the I/O operation’s com- blocking I/O for networking and AIO for disk. pletion. Under workloads with no disk activity, there was Like NT, VAX/VMS allows for a process to re- little difference in throughput among the servers. quest that it gets interrupted when an event oc- curs, such as an I/O completion event. The inter- References rupt mechanism used is called an asynchronous system trap (AST), which provides a transfer of [1] VAX Software Handbook. Digital Equipment Coropora- tion, 1981. control to a user-specified routine that handles the event. [2] D. . Anderson, J. S. Chase, S. Gadde, A. J. Gallatin, K. G. Yocum, and M. J. Feeley. Cheating the I/O Bottle- Similar to AIO, asynchronous notifications in neck: Network Storage with Trapeze/Myrinet. In Pro- both VAX/VMS and Windows NT are limited to ceedings of the USENIX Annual Technical Conference a few events, mainly I/O operations and timers. (NO 98), pages 143–154, June 1998. This is not broad enough to support any system [3] T. E. Anderson, B. N. Bershad, E. D. Lazowska, and call like in LAIO. 
Also asynchronous I/O in ei- H. M. Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ther Windows NT or VAX/VMS is not lazy. ACM Transactions on Computer Systems, 10(1):53–79, February 1992. 11 Conclusions [4] H. Custer. Inside Windows NT. Microsoft Press, 1993. [5] J. R. Larus and M. Parkes. Using Cohort Scheduling to We have introduced Lazy Asynchronous I/O Enhance Server Performance. In 2002 USENIX Annual (LAIO), a new asynchronous I/O interface that Technical Conference, pages 103–114, June 2002. is well suited to event-driven programming, par- [6] J. K. Ousterhout. Why Threads Are A Bad Idea (for most purposes). Presentation at the 1996 USENIX An- ticularly the implementation of high-performance nual Technical Conference, Jan. 1996. servers. LAIO is general in that it supports all [7] V. S. Pai, P. Druschel, and W. Zwaenepoel. Flash: An system calls, and lazy in the sense that it only efficient and portable Web server. In Proceedings of the creates a continuation if the operation actually USENIX 1999 Annual Technical Conference, 1999. blocks. In addition, it provides notification of [8] J. Poskanzer. thttpd - tiny/turbo/throttling http server. completion rather than partial completion. Version 2.24 is available from the author’s web site, LAIO overcomes the limitations of previous http://www.acme.com/software/thttpd/, Oct. 2003. I/O mechanisms, both in terms of ease of pro- [9] N. Provos. Libevent - An Event Notification Library. gramming and performance. We have demon- Version 0.7c is available from the author’s web site,

http://www.monkey.org/ provos/libevent/, Oct. 2003. strated this claim by comparing LAIO to non- Libevent is also included in recent releases of the blocking I/O, AIO and AMPED. NetBSD and OpenBSD operating systems. By means of an example we have shown the [10] The FreeBSD Project. FreeBSD KSE Project. At programming advantages of LAIO over the al- http://www.freebsd.org/kse/. ternatives. Furthermore, we have quantified the [11] The Open Group Base Specifications Issue 6 comparison between LAIO, non-blocking I/O, (IEEE Std 1003.1, 2003 Edition). Available from and AMPED by counting the affected lines of http://www.opengroup.org/onlinepubs/007904975/, 2003. code within the Flash web server. Flash-LAIO- [12] R. von Behren, J. Condit, F. Zhou, G. Necula, and LAIO, a version of Flash using LAIO for both E. Brewer. Capriccio: Scalable Threads for Internet Ser- networking and disk, has 9.5% fewer lines of vices. In Proceedings of the 19th ACM Symposium on code than Flash-NB-AMPED, the stock version Operating Systems Principles, Oct. 2003. of Flash using a combination of non-blocking I/O [13] M. Welsh, D. E. Culler, and E. A. Brewer. SEDA: for networking and AMPED for disk. An Architecture for Well-Conditioned, Scalable Inter- We have experimented with two web servers, net Services. In Symposium on Operating Systems Prin- ciples, pages 230–243, Oct. 2001. thttpd and Flash, to quantify the performance ad- [14] N. Williams. An Implementation of Scheduler Activa- vantages of LAIO. We have shown that for work- tions on the NetBSD Operating System. In Proceedings loads which cause disk activity LAIO outper- of the FREENIX Track: 2002 USENIX Annual Techni- forms all alternatives, because it avoids block- cal Conference, pages 99–108, June 2002. ing in all circumstances and because it has low overhead in the absence of blocking. For one such workload, Flash-LAIO-LAIO achieved a peak throughput that was 25% higher than Flash- NB-AMPED and 24% higher than the next best