Comparing and Evaluating epoll, select, and poll Event Mechanisms

Louay Gammo, Tim Brecht, Amol Shukla, and David Pariag
University of Waterloo
{lgammo,brecht,ashukla,db2pariag}@cs.uwaterloo.ca

Abstract

This paper uses a high-performance, event-driven, HTTP server (the µserver) to compare the performance of the select, poll, and epoll event mechanisms. We subject the µserver to a variety of workloads that allow us to expose the relative strengths and weaknesses of each event mechanism.

Interestingly, initial results show that the select and poll event mechanisms perform comparably to the epoll event mechanism in the absence of idle connections. Profiling data shows a significant amount of time spent in executing a large number of epoll_ctl system calls. As a result, we examine a variety of techniques for reducing epoll_ctl overhead, including edge-triggered notification and introducing a new system call (epoll_ctlv) that aggregates several epoll_ctl calls into a single call. Our experiments indicate that although these techniques are successful at reducing epoll_ctl overhead, they only improve performance slightly.

1 Introduction

The Internet is expanding in size, number of users, and in volume of content, thus it is imperative to be able to support these changes with faster and more efficient HTTP servers. A common problem in HTTP server scalability is how to ensure that the server handles a large number of connections simultaneously without degrading performance. An event-driven approach is often implemented in high-performance network servers [14] to multiplex a large number of concurrent connections over a few server processes. In event-driven servers it is important that the server focuses on connections that can be serviced without blocking its main process. An event dispatch mechanism such as select is used to determine the connections on which forward progress can be made without invoking a blocking system call. Many different event dispatch mechanisms have been used and studied in the context of network applications. These mechanisms range from select, poll, and /dev/poll, to RT signals and epoll [2, 3, 15, 6, 18, 10, 12, 4].

The epoll event mechanism [18, 10, 12] is designed to scale to larger numbers of connections than select and poll. One of the problems with select and poll is that in a single call they must both inform the kernel of all of the events of interest and obtain new events. This can result in large overheads, particularly in environments with large numbers of connections and relatively few new events occurring. In a fashion similar to that described by Banga et al. [3], epoll separates mechanisms for obtaining events (epoll_wait) from those used to declare and control interest in events (epoll_ctl).

Further reductions in the number of generated events can be obtained by using edge-triggered epoll semantics. In this mode events are only provided when there is a change in the state of the socket descriptor of interest. For compatibility with the semantics offered by select and poll, epoll also provides level-triggered event mechanisms.

To compare the performance of epoll with select and poll, we use the µserver [4, 7] web server. The µserver facilitates comparative analysis of different event dispatch mechanisms within the same code base through command-line parameters. Recently, a highly tuned version of the single process event driven µserver using select has shown promising results that rival the performance of the in-kernel TUX web server [4].

Interestingly, in this paper, we found that for some of the workloads considered select and poll perform as well as or slightly better than epoll. One such result is shown in Figure 1. This motivated further investigation with the goal of obtaining a better understanding of epoll's behaviour. In this paper, we describe our experience in trying to determine how to best use epoll, and examine techniques designed to improve its performance.

The rest of the paper is organized as follows: In Section 2 we summarize some existing work that led to the development of epoll as a scalable replacement for select. In Section 3 we describe the techniques we have tried to improve epoll's performance. In Section 4 we describe our experimental methodology, including the workloads used in the evaluation. In Section 5 we describe and analyze the results of our experiments. In Section 6 we summarize our findings and outline some ideas for future work.

2 Background and Related Work

Event-notification mechanisms have a long history in operating systems research and development, and have been a central issue in many performance studies. These studies have sought to improve mechanisms and interfaces for obtaining information about the state of socket and file descriptors from the operating system [2, 1, 3, 13, 15, 6, 18, 10, 12]. Some of these studies have developed improvements to select, poll and sigwaitinfo by reducing the amount of data copied between the application and kernel. Other studies have reduced the number of events delivered by the kernel, for example, the signal-per-fd scheme proposed by Chandra et al. [6]. Much of the aforementioned work is tracked and discussed on the web site "The C10K Problem" [8].

Early work by Banga and Mogul [2] found that despite performing well under laboratory conditions, popular event-driven servers performed poorly under real-world conditions. They demonstrated that the discrepancy is due to the inability of the select system call to scale to the large number of simultaneous connections that are found in WAN environments.

Subsequent work by Banga et al. [3] sought to improve on select's performance by (among other things) separating the declaration of interest in events from the retrieval of events on that interest set. Event mechanisms like select and poll have traditionally combined these tasks into a single system call. However, this amalgamation requires the server to re-declare its interest set every time it wishes to retrieve events, since the kernel does not remember the interest sets from previous calls. This results in unnecessary data copying between the application and the kernel.
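To make this separation concrete, the following sketch (ours, not code from the µserver) shows the basic level-triggered epoll pattern on Linux: each descriptor is registered once with epoll_ctl, and epoll_wait then returns only the descriptors that are ready, whereas select and poll must be handed the entire interest set on every call. Error handling and the actual connection processing are omitted.

    #include <sys/epoll.h>

    /* Minimal level-triggered epoll skeleton: interest is declared once per
     * descriptor (epoll_ctl) and events are retrieved separately (epoll_wait),
     * unlike select/poll, which re-pass the whole interest set on every call. */
    #define MAX_EVENTS 64

    void event_loop(int listen_fd)
    {
        struct epoll_event ev, events[MAX_EVENTS];
        int epfd = epoll_create(256);   /* size argument is only a hint */

        ev.events = EPOLLIN;            /* level-triggered by default */
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    /* accept the new connection and EPOLL_CTL_ADD it */
                } else {
                    /* read/write on fd; use EPOLL_CTL_MOD or EPOLL_CTL_DEL
                     * as the connection's state changes */
                }
            }
        }
    }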

The /dev/poll mechanism was adapted from Sun Solaris to Linux by Provos et al. [15], and improved on poll's performance by introducing a new interface that separated the declaration of interest in events from retrieval. Their /dev/poll mechanism further reduced data copying by using a shared memory region to return events to the application.

The kqueue event mechanism [9] addressed many of the deficiencies of select and poll for FreeBSD systems. In addition to separating the declaration of interest from retrieval, kqueue allows an application to retrieve events from a variety of sources including file/socket descriptors, signals, AIO completions, file system changes, and changes in process state.

The epoll event mechanism [18, 10, 12] investigated in this paper also separates the declaration of interest in events from their retrieval. The epoll_create system call instructs the kernel to create an event data structure that can be used to track events on a number of descriptors. Thereafter, the epoll_ctl call is used to modify interest sets, while the epoll_wait call is used to retrieve events.

Another drawback of select and poll is that they perform work that depends on the size of the interest set, rather than the number of events returned. This leads to poor performance when the interest set is much larger than the active set. The epoll mechanisms avoid this pitfall and provide performance that is largely independent of the size of the interest set.

3 Improving epoll Performance

Figure 1 in Section 5 shows the throughput obtained when using the µserver with the select, poll, and level-triggered epoll (epoll-LT) mechanisms. In this graph the x-axis shows increasing request rates and the y-axis shows the reply rate as measured by the clients that are inducing the load. This graph shows results for the one-byte workload. These results demonstrate that the µserver with level-triggered epoll does not perform as well as select under conditions that stress the event mechanisms. This led us to more closely examine these results. Using gprof, we observed that epoll_ctl was responsible for a large percentage of the run-time. As can be seen in Table 1 in Section 5, over 16% of the time is spent in epoll_ctl. The gprof output also indicates (not shown in the table) that epoll_ctl was being called a large number of times because it is called for every state change for each socket descriptor. We examine several approaches designed to reduce the number of epoll_ctl calls. These are outlined in the following paragraphs.

The first method uses epoll in an edge-triggered fashion, which requires the µserver to keep track of the current state of the socket descriptor. This is required because with the edge-triggered semantics, events are only received for transitions on the socket descriptor state. For example, once the server reads data from a socket, it needs to keep track of whether or not that socket is still readable, or if it will get another event from epoll_wait indicating that the socket is readable. Similar state information is maintained by the server regarding whether or not the socket can be written. This method is referred to in our graphs and the rest of the paper as epoll-ET.
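The state tracking that epoll-ET requires can be illustrated with the following sketch (our illustration, not the µserver's code): a socket is registered with the EPOLLET flag, and on each notification the server must drain it until read returns EAGAIN, remembering on its own that the socket is no longer readable until the next transition.

    #include <sys/epoll.h>
    #include <unistd.h>
    #include <errno.h>

    /* Register a non-blocking socket for edge-triggered notification. */
    static void add_edge_triggered(int epfd, int fd)
    {
        struct epoll_event ev;
        ev.events = EPOLLIN | EPOLLOUT | EPOLLET;  /* notify only on transitions */
        ev.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }

    /* On an EPOLLIN notification the server must consume everything available. */
    static int drain_reads(int fd, char *buf, size_t len)
    {
        for (;;) {
            ssize_t n = read(fd, buf, len);
            if (n > 0) {
                /* process the n bytes in buf, then keep reading */
                continue;
            }
            if (n == -1 && errno == EAGAIN)
                return 0;   /* drained: mark socket "not readable" until the next event */
            return -1;      /* EOF (n == 0) or a real error: caller should close the socket */
        }
    }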

The second method, which we refer to as epoll2, simply calls epoll_ctl twice per socket descriptor. The first call registers with the kernel that the server is interested in read and write events on the socket. The second call occurs when the socket is closed, and is used to tell epoll that we are no longer interested in events on that socket. All events are handled in a level-triggered fashion. Although this approach will reduce the number of epoll_ctl calls, it does have potential disadvantages.

One disadvantage of the epoll2 method is that because many of the sockets will continue to be readable or writable, epoll_wait will return sooner, possibly with events that are currently not of interest to the server. For example, if the server is waiting for a read event on a socket it will not be interested in the fact that the socket is writable until later. Another disadvantage is that these calls return sooner, with fewer events being returned per call, resulting in a larger number of calls. Lastly, because many of the events will not be of interest to the server, the server is required to spend a bit of time to determine if it is or is not interested in each event and in discarding events that are not of interest.

The third method uses a new system call named epoll_ctlv. This system call is designed to reduce the overhead of multiple epoll_ctl system calls by aggregating several calls to epoll_ctl into one call to epoll_ctlv. This is achieved by passing an array of epoll event structures to epoll_ctlv, which then calls epoll_ctl for each element of the array. Events are generated in level-triggered fashion. This method is referred to in the figures and the remainder of the paper as epoll-ctlv.

We use epoll_ctlv to add socket descriptors to the interest set, and for modifying the interest sets for existing socket descriptors. However, removal of socket descriptors from the interest set is done by explicitly calling epoll_ctl just before the descriptor is closed. We do not aggregate deletion operations because by the time epoll_ctlv is invoked, the µserver has closed the descriptor and the epoll_ctl invoked on that descriptor will fail.

The µserver does not attempt to batch the closing of descriptors because it can run out of available file descriptors. Hence, the epoll-ctlv method uses both the epoll_ctlv and the epoll_ctl system calls. Alternatively, we could rely on the close system call to remove the socket descriptor from the interest set (and we did try this). However, this increases the time spent by the µserver in close, and does not alter performance. We verified this empirically and decided to explicitly call epoll_ctl to perform the deletion of descriptors from the epoll interest set.
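Since epoll_ctlv is an experimental system call added for these experiments and is not part of the standard kernel interface, its exact signature is not reproduced here. The following user-level sketch (hypothetical names and fields, ours) only illustrates the batching idea described above: a single call applies an array of interest-set updates, which an in-kernel epoll_ctlv would handle with one user-kernel crossing instead of one per update.

    #include <sys/epoll.h>

    /* Hypothetical batched-update record: the name and fields are ours, chosen
     * to mirror the description above, not the actual epoll_ctlv interface. */
    struct ctl_op {
        int op;                       /* EPOLL_CTL_ADD or EPOLL_CTL_MOD */
        int fd;
        struct epoll_event event;
    };

    /* Apply a whole array of interest-set updates; returns the number that
     * succeeded. An in-kernel implementation would do the same work without
     * crossing the user-kernel boundary once per element. */
    static int apply_ctl_batch(int epfd, struct ctl_op *ops, int nops)
    {
        int done = 0;
        for (int i = 0; i < nops; i++)
            if (epoll_ctl(epfd, ops[i].op, ops[i].fd, &ops[i].event) == 0)
                done++;
        return done;
    }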

4 Experimental Environment

The experimental environment consists of a single server and eight clients. The server contains dual 2.4 GHz Xeon processors, 1 GB of RAM, a 10,000 rpm SCSI disk, and two one-Gigabit Ethernet cards. The clients are identical to the server with the exception of their disks, which are EIDE. The server and clients are connected with a 24-port Gigabit switch. To avoid network bottlenecks, the first four clients communicate with the server's first Ethernet card, while the remaining four use a different IP address linked to the second Ethernet card. The server machine runs a slightly modified version of the 2.6.5 Linux kernel in uniprocessor mode.

4.1 Workloads

This section describes the workloads that we used to evaluate performance of the µserver with the different event notification mechanisms. In all experiments, we generate HTTP loads using httperf [11], an open-loop workload generator that uses connection timeouts to generate loads that can exceed the capacity of the server.

Our first workload is based on the widely used SPECweb99 benchmarking suite [17]. We use httperf in conjunction with a SPECweb99 file set and synthetic HTTP traces. Our traces have been carefully generated to recreate the file classes, access patterns, and number of requests issued per (HTTP 1.1) connection that are used in the static portion of SPECweb99. The file set and server caches are sized so that the entire file set fits in the server's cache. This ensures that differences in cache hit rates do not affect performance.

Our second workload is called the one-byte workload. In this workload, the clients repeatedly request the same one byte file from the server's cache. We believe that this workload stresses the event dispatch mechanism by minimizing the amount of work that needs to be done by the server in completing a particular request. By reducing the effect of system calls such as read and write, this workload isolates the differences due to the event dispatch mechanisms.

To study the scalability of the event dispatch mechanisms as the number of socket descriptors (connections) is increased, we use idleconn, a program that comes as part of the httperf suite. This program maintains a steady number of idle connections to the server (in addition to the active connections maintained by httperf). If any of these connections are closed, idleconn immediately re-establishes them.

We first examine the behaviour of the event dispatch mechanisms without any idle connections to study scenarios where all of the connections present in a server are active. We then pre-load the server with a number of idle connections and run the experiments. The idle connections are used to increase the number of simultaneous connections in order to simulate a WAN environment. In this paper we present experiments using 10,000 idle connections; our findings with other numbers of idle connections were similar and are not presented here.

4.2 Server Configuration

For all of our experiments, the µserver is run with the same set of configuration parameters except for the event dispatch mechanism. The µserver is configured to use sendfile to take advantage of zero-copy socket I/O while writing replies. We use TCP_CORK in conjunction with sendfile. The same server options are used for all experiments even though the use of TCP_CORK and sendfile may not provide benefits for the one-byte workload when compared with simply using writev.
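The TCP_CORK and sendfile combination mentioned above typically follows the pattern sketched below (a minimal illustration with error handling omitted; the µserver's actual code may differ): the socket is corked, the HTTP headers are written, the file body is sent with zero-copy sendfile, and uncorking flushes the coalesced segments.

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/sendfile.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Send an HTTP header followed by a file body, letting TCP_CORK coalesce
     * the header and the start of the body into full-sized segments. */
    static void send_reply(int sock, const char *hdr, size_t hdr_len,
                           int file_fd, off_t file_len)
    {
        int on = 1, off = 0;
        off_t offset = 0;

        setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));   /* cork */
        write(sock, hdr, hdr_len);                                  /* headers */
        sendfile(sock, file_fd, &offset, file_len);                 /* zero-copy body */
        setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)); /* uncork: flush */
    }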
4.3 Experimental Methodology

We measure the throughput of the µserver using different event dispatch mechanisms. In our graphs, each data point is the result of a two minute experiment. Trial and error revealed that two minutes is sufficient for the server to achieve a stable state of operation. A two minute delay is used between consecutive experiments, which allows the TIME_WAIT state on all sockets to be cleared before the subsequent run. All non-essential services are terminated prior to running any experiment.

5 Experimental Results

In this section we first compare the throughput achieved when using level-triggered epoll with that observed when using select and poll under both the one-byte and SPECweb99-like workloads with no idle connections. We then examine the effectiveness of the different methods described for reducing the number of epoll_ctl calls under these same workloads. This is followed by a comparison of the performance of the event dispatch mechanisms when the server is pre-loaded with 10,000 idle connections. Finally, we describe the results of experiments in which we tune the accept strategy used in conjunction with epoll-LT and epoll-ctlv to further improve their performance.

We initially ran the one byte and the SPECweb99-like workloads to compare the performance of the select, poll and level-triggered epoll mechanisms. As shown in Figure 1 and Figure 2, for both of these workloads select and poll perform as well as epoll-LT. It is important to note that because there are no idle connections for these experiments the number of socket descriptors tracked by each mechanism is not very high. As expected, the gap between epoll-LT and select is more pronounced for the one byte workload because it places more stress on the event dispatch mechanism.

We tried to improve the performance of the server by exploring different techniques for using epoll as described in Section 3. The effect of these techniques on the one-byte workload is shown in Figure 3. The graphs in this figure show that for this workload the techniques used to reduce the number of epoll_ctl calls do not provide significant benefits when compared with their level-triggered counterpart (epoll-LT). Additionally, the performance of select and poll is equal to or slightly better than that of each of the epoll techniques. Note that we omit the line for poll from Figures 3 and 4 because it is nearly identical to the select line.

Figure 1: µserver performance on one byte workload using select, poll, and epoll-LT

Figure 2: µserver performance on SPECweb99-like workload using select, poll, and epoll-LT

Figure 3: µserver performance on one byte workload with no idle connections (curves: select, epoll-LT, epoll-ET, epoll-ctlv, epoll2)

We further analyze the results from Figure 3 by profiling the µserver using gprof at the request rate of 22,000 requests per second. Table 1 shows the percentage of time spent in system calls (rows) under the various event dispatch methods (columns). The output for system calls and µserver functions which do not contribute significantly to the total run-time is left out of the table for clarity.

                 select  epoll-LT  epoll-ctlv  epoll2  epoll-ET    poll
    read          21.51     20.95       21.41   20.08     22.19   20.97
    close         14.90     14.05       14.90   13.02     14.14   14.79
    select        13.33         -           -       -         -       -
    poll              -         -           -       -         -   13.32
    epoll_ctl         -     16.34        5.98   10.27     11.06       -
    epoll_wait        -      7.15        6.01   12.56      6.52       -
    epoll_ctlv        -         -        9.28       -         -       -
    setsockopt    11.17      9.13        9.13    7.57      9.08   10.68
    accept        10.08      9.51        9.76    9.05      9.30   10.20
    write          5.98      5.06        5.10    4.13      5.31    5.70
    fcntl          3.66      3.34        3.37    3.14      3.34    3.61
    sendfile       3.43      2.70        2.71    3.00      3.91    3.43

Table 1: gprof profile data for the µserver under the one-byte workload at 22,000 requests/sec

If we compare the select and poll columns we see that they have a similar breakdown including spending about 13% of their time indicating to the kernel events of interest and obtaining events. In contrast the epoll-LT, epoll-ctlv, and epoll2 approaches spend about 21–23% of their time on their equivalent functions (epoll_ctl, epoll_ctlv and epoll_wait). Despite these extra overheads the throughputs obtained using the epoll techniques compare favourably with those obtained using select and poll. We note that when using select and poll the application requires extra manipulation, copying, and event scanning code that is not required in the epoll case (and does not appear in the gprof data).

The results in Table 1 also show that the overhead due to epoll_ctl calls is reduced in epoll-ctlv, epoll2 and epoll-ET, when compared with epoll-LT. However, in each case these improvements are offset by increased costs in other portions of the code. The epoll2 technique spends twice as much time in epoll_wait when compared with epoll-LT. With epoll2 the number of calls to epoll_wait is significantly higher, the average number of descriptors returned is lower, and only a very small proportion of the calls (less than 1%) return events that need to be acted upon by the server. On the other hand, when compared with epoll-LT the epoll2 technique spends about 6% less time on epoll_ctl calls so the total amount of time spent dealing with events is comparable with that of epoll-LT. Despite the significant epoll_wait overheads epoll2 performance compares favourably with the other methods on this workload.

Using the epoll-ctlv technique, gprof indicates that epoll_ctlv and epoll_ctl combine for a total of 1,949,404 calls compared with 3,947,769 epoll_ctl calls when using epoll-LT. While epoll-ctlv helps to reduce the number of user-kernel boundary crossings, the net result is no better than epoll-LT. The amount of time taken by epoll-ctlv in epoll_ctlv and epoll_ctl system calls is about the same (around 16%) as that spent by level-triggered epoll in invoking epoll_ctl.

When comparing the percentage of time epoll-LT and epoll-ET spend in epoll_ctl we see that it has been reduced using epoll-ET from 16% to 11%. Although the epoll_ctl time has been reduced it does not result in an appreciable improvement in throughput. We also note that about 2% of the run-time (which is not shown in the table) is also spent in the epoll-ET case checking and tracking the state of the request (i.e., whether the server should be reading or writing) and the state of the socket (i.e., whether it is readable or writable). We expect that this can be reduced but that it wouldn't noticeably impact performance.

Results for the SPECweb99-like workload are shown in Figure 4. Here the graph shows that all techniques produce very similar results with a very slight performance advantage going to epoll-ET after the saturation point is reached.

Figure 4: µserver performance on SPECweb99-like workload with no idle connections (curves: select, epoll-LT, epoll-ET, epoll2)

5.1 Results With Idle Connections

We now compare the performance of the event mechanisms with 10,000 idle connections. The idle connections are intended to simulate the presence of larger numbers of simultaneous connections (as might occur in a WAN environment). Thus, the event dispatch mechanism has to keep track of a large number of descriptors even though only a very small portion of them are active.

By comparing results in Figures 3 and 5 one can see that the performance of select and poll degrades by up to 79% when the 10,000 idle connections are added. The performance of epoll2 with idle connections suffers similarly to select and poll. In this case, epoll2 suffers from the overheads incurred by making a large number of epoll_wait calls, the vast majority of which return events that are not of current interest to the server. Throughput with level-triggered epoll is slightly reduced with the addition of the idle connections, while edge-triggered epoll is not impacted.

Figure 5: µserver performance on one byte workload and 10,000 idle connections (curves: select, poll, epoll-ET, epoll-LT, epoll2)

The results for the SPECweb99-like workload with 10,000 idle connections are shown in Figure 6. In this case each of the event mechanisms is impacted in a manner similar to that in which they are impacted by idle connections in the one-byte workload case.

Figure 6: µserver performance on SPECweb99-like workload and 10,000 idle connections (curves: select, poll, epoll-ET, epoll-LT, epoll2)

5.2 Tuning Accept Strategy for epoll

The µserver's accept strategy has been tuned for use with select. The µserver includes a parameter that controls the number of connections that are accepted consecutively. We call this parameter the accept-limit. Parameter values range from one to infinity (Inf). A value of one limits the server to accepting at most one connection when notified of a pending connection request, while Inf causes the server to consecutively accept all currently pending connections.

To this point we have used the accept strategy that was shown to be effective for select by Brecht et al. [4] (i.e., accept-limit is Inf). In order to verify whether the same strategy performs well with the epoll-based methods, we explored their performance under different accept strategies.
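For illustration, an accept-limit of this kind can be applied with a loop along the following lines (our sketch, not the µserver's code): on a notification for the listening socket the server accepts consecutively until the limit is reached or no connections remain pending.

    #include <sys/socket.h>
    #include <limits.h>

    /* Accept up to 'accept_limit' pending connections on one notification;
     * passing INT_MAX approximates the "Inf" setting. Returns the number
     * of connections accepted. */
    static int accept_batch(int listen_fd, int accept_limit)
    {
        int accepted = 0;
        while (accepted < accept_limit) {
            int conn = accept(listen_fd, NULL, NULL);
            if (conn < 0) {
                /* EAGAIN/EWOULDBLOCK means nothing more is pending; other
                 * errors are left to the caller to handle */
                break;
            }
            /* make conn non-blocking and register it with the event mechanism */
            accepted++;
        }
        return accepted;
    }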

Figure 7 examines the performance of level-triggered epoll after the accept-limit has been tuned for the one-byte workload (other values were explored but only the best values are shown). Level-triggered epoll with an accept-limit of 10 shows a marked improvement over the previous accept-limit of Inf, and now matches the performance of select on this workload. The accept-limit of 10 also improves peak throughput for the epoll-ctlv model by 7%. This gap widens to 32% at 21,000 requests/sec. In fact the best accept strategy for epoll-ctlv fares slightly better than the best accept strategy for select.

Figure 7: µserver performance on one byte workload with different accept strategies and no idle connections (curves: select accept=Inf, epoll-LT accept=Inf, epoll-LT accept=10, epoll-ctlv accept=Inf, epoll-ctlv accept=10)

Varying the accept-limit did not improve the performance of the edge-triggered epoll technique under this workload and it is not shown in the graph. However, we believe that the effects of the accept strategy on the various epoll techniques warrant further study, as the efficacy of the strategy may be workload dependent.

6 Discussion

In this paper we use a high-performance event-driven HTTP server, the µserver, to compare and evaluate the performance of the select, poll, and epoll event mechanisms. Interestingly, we observe that under some of the workloads examined the throughput obtained using select and poll is as good as or slightly better than that obtained with epoll. While these workloads may not utilize representative numbers of simultaneous connections, they do stress the event mechanisms being tested.

Our results also show that a main source of overhead when using level-triggered epoll is the large number of epoll_ctl calls. We explore techniques which significantly reduce the number of epoll_ctl calls, including the use of edge-triggered events and a system call, epoll_ctlv, which allows the µserver to aggregate large numbers of epoll_ctl calls into a single system call. While these techniques are successful in reducing the number of epoll_ctl calls, they do not appear to provide appreciable improvements in performance.

As expected, the introduction of idle connections results in dramatic performance degradation when using select and poll, while not noticeably impacting the performance when using epoll. Although it is not clear that the use of idle connections to simulate larger numbers of connections is representative of real workloads, we find that the addition of idle connections does not significantly alter the performance of the edge-triggered and level-triggered epoll mechanisms. The edge-triggered epoll mechanism performs best, with the level-triggered epoll mechanism offering performance that is very close to edge-triggered.

In the future we plan to re-evaluate some of the mechanisms explored in this paper under more representative workloads that include wide area network conditions. The problem with the technique of using idle connections is that the idle connections simply inflate the number of connections without doing any useful work. We plan to explore tools similar to Dummynet [16] and NIST Net [5] in order to more accurately simulate traffic delays, packet loss, and other wide area network traffic characteristics, and to re-examine the performance of Internet servers using different event dispatch mechanisms and a wider variety of workloads.

7 Acknowledgments

We gratefully acknowledge Hewlett Packard, the Ontario Research and Development Challenge Fund, and the Natural Sciences and Engineering Research Council of Canada for financial support for this project.

References

[1] G. Banga, P. Druschel, and J.C. Mogul. Resource containers: A new facility for resource management in server systems. In Operating Systems Design and Implementation, pages 45–58, 1999.

[2] G. Banga and J.C. Mogul. Scalable kernel performance for Internet servers under realistic loads. In Proceedings of the 1998 USENIX Annual Technical Conference, New Orleans, LA, 1998.

[3] G. Banga, J.C. Mogul, and P. Druschel. A scalable and explicit event delivery mechanism for UNIX. In Proceedings of the 1999 USENIX Annual Technical Conference, Monterey, CA, June 1999.

[4] Tim Brecht, David Pariag, and Louay Gammo. accept()able strategies for improving web server performance. In Proceedings of the 2004 USENIX Annual Technical Conference (to appear), June 2004.

[5] M. Carson and D. Santay. NIST Net – a Linux-based network emulation tool. Computer Communication Review, to appear.

[6] A. Chandra and D. Mosberger. Scalability of Linux event-dispatch mechanisms. In Proceedings of the 2001 USENIX Annual Technical Conference, Boston, 2001.

[7] HP Labs. The userver home page, 2004. Available at http://hpl.hp.com/research/linux/userver.

[8] Dan Kegel. The C10K problem, 2004. Available at http://www.kegel.com/c10k.html.

[9] Jonathon Lemon. Kqueue—a generic and scalable event notification facility. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, 2001.

[10] Davide Libenzi. Improving (network) I/O performance. Available at http://www.xmailserver.org/linux-patches/nio-improve.html.

[11] D. Mosberger and T. Jin. httperf: A tool for measuring web server performance. In The First Workshop on Internet Server Performance, pages 59–67, Madison, WI, June 1998.

[12] Shailabh Nagar, Paul Larson, Hanna Linder, and David Stevens. epoll scalability web page. Available at http://lse.sourceforge.net/epoll/index.html.

[13] M. Ostrowski. A mechanism for scalable event notification and delivery in Linux. Master’s thesis, Department of Computer Science, University of Waterloo, November 2000.

[14] Vivek S. Pai, Peter Druschel, and Willy Zwaenepoel. Flash: An efficient and portable Web server. In Proceedings of the USENIX 1999 Annual Technical Conference, Monterey, CA, June 1999. http://citeseer.nj.nec.com/article/pai99flash.html.

[15] N. Provos and C. Lever. Scalable network I/O in Linux. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, June 2000.

[16] Luigi Rizzo. Dummynet: a simple approach to the evaluation of network protocols. ACM Computer Communication Review, 27(1):31–41, 1997. http://citeseer.ist.psu.edu/rizzo97dummynet.html.

[17] Standard Performance Evaluation Corporation. SPECweb99 Benchmark, 1999. Available at http://www.specbench.org/osg/web99.

[18] David Weekly. /dev/epoll – a highspeed Linux kernel patch. Available at http://epoll.hackerdojo.com.