Analyzing a Decade of Linux System Calls
Total Page:16
File Type:pdf, Size:1020Kb
Noname manuscript No. (will be inserted by the editor) Analyzing a Decade of Linux System Calls Mojtaba Bagherzadeh Nafiseh Kahani · · Cor-Paul Bezemer Ahmed E. Hassan · · Juergen Dingel James R. Cordy · Received: date / Accepted: date Abstract Over the past 25 years, thousands of developers have contributed more than 18 million lines of code (LOC) to the Linux kernel. As the Linux kernel forms the central part of various operating systems that are used by mil- lions of users, the kernel must be continuously adapted to changing demands and expectations of these users. The Linux kernel provides its services to an application through system calls. The set of all system calls combined forms the essential Application Programming Interface (API) through which an application interacts with the kernel. In this paper, we conduct an empirical study of the 8,770 changes that were made to Linux system calls during the last decade (i.e., from April 2005 to December 2014) In particular, we study the size of the changes, and we manually identify the type of changes and bug fixes that were made. Our analysis provides an overview of the evolution of the Linux system calls over the last decade. We find that there was a considerable amount of technical debt in the kernel, that was addressed by adding a number of sibling calls (i.e., 26% of all system calls). In addition, we find that by far, the ptrace() and signal handling system calls are the most difficult to maintain and fix. Our study can be used by developers who want to improve the design and ensure the successful evolution of their own kernel APIs. 1 Introduction Since its introduction in 1991, the Linux kernel has evolved into a project that plays a central role in the computing industry. In addition to its usage School of Computing, Queen’s University, Kingston, Ontario Email: {mojtaba, kahani, bezemer, ahmed, dingel, cordy}@cs.queensu.ca 2 MojtabaBagherzadehetal. on desktop and server systems, the Linux kernel forms the foundation of the Android operating system that is used on almost 1.5 billion mobile devices [21]. Over the past 25 years, thousands of developers have contributed more than 18 million lines of code (LOC) to the Linux kernel. The kernel source code and its development process have been thoroughly analyzed by software engineering researchers (e.g., [1, 23, 24, 30, 31, 40, 42, 43, 48, 50–52, 60, 61]). As the Linux kernel forms the central part of various operating systems, it must be continuously adapted to fulfill the changing demands and expectations of users [35]. As a result, many changes to the kernel are driven by changing or increasing demands from the users of the operating systems that use the kernel, or by hardware evolution and innovation. Analysis of the evolution of the Linux kernel can provide us with a window into how the demands of both operating system users and the computing industry have evolved. The Linux kernel provides its services to an application through system calls.AllsystemcallscombinedformtheessentialApplicationProgramming Interface (API) through which an application interacts with the kernel. Even the simplest Linux application uses system calls to fulfill its goals. For example, the ls command exercises 20 system calls more than 100 times to list the contents of a directory. Studying the evolution of the API of a system can lead to valuable insights for developers of the APIs of other systems. For example, Bogart et al. [4] interviewed developers of the R, Eclipse and npm ecosystems to understand the practices that are followed by each ecosystem for breaking an API, and found that each ecosystem has their own way of handling and communicating breaking API changes. Such knowledge can be leveraged by API developers to make decisions about the way in which their own system handles breaking API changes. In this paper, we conduct an empirical study on the 8, 770 changes that were made to the Linux system calls during the last decade, to sketch an overview of the changing landscape of the Linux kernel API. We are the first, to the best of our knowledge, to focus on system calls rather than on the Linux kernel as a whole. The main contributions of our study are: 1. An overview of the evolution of the Linux system calls over the last decade in terms of the size and type of changes that were made to the system calls. 2. Astudyofthetypeofbugfixesthatweremadetothesystemcallsover the last decade. Our study can be used by developers who want to improve the design and ensure the successful evolution of their own kernel APIs. The outline of the rest of this paper is as follows. Section 2 gives background information about system calls. Section 3 discusses related work. Section 4 presents the methodology of our empirical study. Section 5, 6 and 7 discuss the results of our empirical study. Section 8 discusses the implications of our results. Section 9 discusses threats to the validity of our study. Section 10 concludes the paper. AnalyzingaDecadeofLinuxSystemCalls 3 Fig. 1: The sequence of a system call. 2 System Calls The kernel provides low-level services, e.g., network or file system-related, which need to be executed in kernel mode. In addition, the kernel enforces an isolated execution environment for each process. Therefore, it is necessary to continually make a context switch back-and-forth between the kernel and user mode. System calls are the primary method through which user space processes call kernel services. Figure 1 outlines the (simplified) execution sequence of a system call. In this paper, we consider a system call as the combination of the handler and service in the kernel mode. While it is possible to invoke a system call directly from a process, in most cases the call goes through a wrapper function (step 1) in the C standard library (i.e, glibc [20]). Thus we focus on explaining the sequence for library calls here. The glibc wrapper function traps the kernel into kernel mode and invokes the system call handler (step 2). The system call handler is a kernel function that retrieves the system call parameters from the appropriate registers and calls the required kernel services (step 3). Finally, the required kernel services are executed and the result is returned to the application (steps 4-6). There are two ways to trap into kernel mode, which we briefly discuss below. 2.1 The Old-Fashioned Way Every available Linux system call has a unique identifier number [32]. In the old-fashioned way (i.e., before Linux kernel 2.5), the glibc wrapper function copies the identifier number of the system call that it wraps into the %eax reg- ister and copies the parameters into the other registers. The wrapper function then sends an interrupt, which causes the kernel to switch into kernel mode and read the %eax register to identify the appropriate service that must be called (step 2). 4 MojtabaBagherzadehetal. 2.2 The Modern Way Hayward reported that the old-fashioned way of executing system calls was slow on Intel Pentium 4 processors [26]. To solve this problem, an alternative way of executing system calls was added to the kernel. The alternative way uses Intel’s SYSENTER and SYSEXIT (or AMD’s SYSCALL and SYSRET)instruc- tions [7]. These instructions allow fast entry and exit to and from the kernel without the use of expensive interrupts. In the remainder of the paper, we present our empirical study of the system calls over the last decade. 3 Related work In this section, we discuss prior related work. In particular, we first discuss related work on API evolution, then we discuss related work on evolution of the Linux kernel. 3.1 API Evolution There have been many studies on API evolution. In this section, we give an overview of the most important prior work. The main contribution of our work in comparison to prior work on API evolution is that we are the first, to the best of our knowledge, to deliver an in-depth study of the evolution of the kernel API of an operating system with a very long maintenance history. 3.1.1 Refactoring in APIs Dig and Johnson [12, 13] studied breaking changes in an API. They confirmed that refactoring plays an important role in API evolution. In particular, they found that more than 80% of breaking API changes are refactorings. Their conclusion is that many of these refactorings can and should be automated. In a large-scale study, Xavier et al. [68] showed that almost 28% of the API changes are breaking, which emphasizes the need for automation of API refactoring. Several tools were proposed to automate API evolution (e.g., [27, 53, 69]). For example, the CatchUp! tool [27] uses the existing refactoring support of modern IDEs to record and replay API evolution. The Diff-CatchUp tool [69] uses differences between APIs to automatically suggest plausible replacements in the code that uses the API. 3.1.2 The effect of API evolution on developers Robbes et al. [55] studied how developers in the Pharo ecosystem react to deprecation in the API. Hora et al. [28] later extended Robbes et al.’s study by studying the reaction of developers to all types of API changes in the Pharo AnalyzingaDecadeofLinuxSystemCalls 5 ecosystem. Hora et al. found that developers often do not react to API changes that are not because of deprecation. One of the main reasons is that developers are not notified of such API changes, while the use of a deprecated method yields a warning message. In addition, both Robbes et al. and Hora et al.