Improving the Quality of Error-Handling Code in Systems Software Using Function-Local Information Suman Saha

Improving the Quality of Error-Handling Code in Systems Software using Function-Local Information Suman Saha To cite this version: Suman Saha. Improving the Quality of Error-Handling Code in Systems Software using Function- Local Information. Programming Languages [cs.PL]. Université Pierre et Marie Curie - Paris VI, 2013. English. tel-00937807 HAL Id: tel-00937807 https://tel.archives-ouvertes.fr/tel-00937807 Submitted on 28 Jan 2014 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THÈSE DE DOCTORAT DE l’UNIVERSITÉ PIERRE ET MARIE CURIE Spécialité Informatique École doctorale Informatique, Télécommunications et Électronique (Paris) Suman SAHA Pour obtenir le grade de DOCTEUR de l’UNIVERSITÉ PIERRE ET MARIE CURIE Sujet de la thèse : Improving the Quality of Error-Handling Code in Systems Software using Function-Local Information soutenue le 25 mars 2013 M. Gilles MULLER Directeur de thèse Mme. Julia LAWALL Directeur de thèse Mme. Sandrine BLAZY Rapporteur M. Laurent REVÉILLÈRE Rapporteur M. Olaf SPINCZYK Examinateur M. Yannis SMARAGDAKIS Examinateur M. Fabrice KORDON Examinateur ii Acknowledgements First and foremost, I would like to express my gratitude to my both advisors Gilles Muller and Julia Lawall, for their essential guidances and insight throughout my graduate career. Their supports and encouragements throughout the thesis have been very helpful, and without our many fruitful discussions the result would surely not have been as good. I would also like to thank my thesis jury. Both of the thesis reporters, Mme. Sandrine Blazy and M. Laurent Revéillère have spent their valuable time to improve the thesis quality. I am really thankful to them. My sincere and warm thanks go to all my colleagues at Regal group in LIP6 Lab for their inspirations and helpful comments during my thesis. I especially thank to Gaël Thomas, Jean-Pierre Lozi, Brice Berna and Lokesh Gidra for devoting some of their precious time to help me at the some points of my work. Special thanks to my friend, Mahfuza Farooque for encouraging and supporting me at every stage during my studies. Needless to say, my family deserve a great deal of credit for my development. I thank my parents and younger brother for all their love and supports. Finally, I would like to thank God for giving me ability to do this work. Abstract Adequate error-handling code is essential to the reliability of any systems software. On an error, such code is responsible for releasing acquired resources to restore the system to a viable state. Omitting such operations leads not only to memory leaks, but also to system crashes and deadlocks. The C language does not provide any abstractions for exception handling or other forms of error handling, leaving programmers to devise their own conventions for detecting and handling errors. The Linux coding style guidelines suggest placing error handling code at the end of each function, where it can be reached by gotos whenever an error is detected. This coding style has the advantage of putting all of the error-handling code in one place, which eases understanding and maintenance, and reduces code duplication. Nevertheless, this coding style is not always applied. In the first part of the thesis, we propose an automatic program transformation that transforms error-handling code into this style. We have implemented this algorithm as a tool and have applied this tool to five directories (drivers, fs, net, arch, and sound) in Linux 3.6 kernel source code as well as to five widely used open-source systems software projects: PostgreSQL, Apache, Wine, Python, and PHP. This tool successfully converts 22% of the conditionals containing state-restoring error-handling code that have the scope to merge code into one, from the basic strategy to the goto-based strategy. Even when error handling code is structured according to the Linux coding style guidelines, the management of the releasing of allocated resources remains a continual problem in ensuring the ro- bustness of systems software. Finding such faults is very challenging due to the difficulty of system- atically reproducing system errors and the diversity of system resources and their associated resource release operations. To address these issues, over 10 years of research has focused on macroscopic approaches that globally scan a code base for common resource-release operations. Such approaches are notorious for their high rates of false positives, while at the same time, in practice, they leave many faults undetected. In the second part of the thesis, we propose a novel microscopic approach to finding resource- release faults in systems software, taking into account such software’s diversity of resource types and resource-release operations. Rather than generalizing from the results of a complete scan of the source code, our approach achieves precision and scalability by focusing on the error-handling code of each function. Using a tool, Hector, that we have developed based on this approach, we have found 485 faults in 19 different C systems software projects, including Linux, Python, and Apache, with a false positive rate of 23%, well below the 30% that has been reported to be acceptable to developers. Some of these faults are exploitable by an unprivileged malicious user, making it possible to crash the entire system. vii Contents 1 Introduction 1 1.1 Refactoring Programming Code . .2 1.2 Improving the Quality of Error-Handling Code . .4 1.3 Outline of the thesis . .7 2 Background 9 2.1 Bugs in Systems Software . .9 2.2 Terminology . 10 2.3 Considered Software . 11 2.4 Error handling in Systems Software . 14 2.5 State of the Art . 18 2.5.1 Refactoring C Code . 18 2.5.2 Finding Faults in Source Code . 19 2.5.3 Improving Error-handling Code . 27 2.6 Survey on Faults in Linux . 30 3 Improving Structure of Error Handling Code 33 3.1 Summary . 33 3.2 Motivation and Background . 34 3.2.1 Motivating Examples . 34 3.2.2 Analysis . 36 3.3 Transformation Algorithm . 39 3.3.1 Identifying Error-Handling Code (step 1) . 39 3.3.2 Partition (step 2a) . 46 3.3.3 Filtering (step 2b) . 48 3.3.4 Classification and transformation (step 3) . 48 3.4 Evaluation . 52 3.4.1 An example from the Linux kernel. 52 3.4.2 The impact of filtering. 54 3.4.3 Branch classification. 56 3.4.4 Branch transformation. 58 3.4.5 Code sharing. 58 3.5 Conclusion . 59 4 Finding Faults in Error Handling Code 61 viii Contents 4.1 Summary . 62 4.2 Motivation . 62 4.2.1 Linux resource-release omission faults . 62 4.3 Systems error-handling code . 65 4.3.1 Amount of code containing error-handling code . 65 4.3.2 Role of code containing error-handling code . 66 4.3.3 Kinds of errors encountered . 66 4.4 Algorithm . 67 4.5 Implementation . 70 4.5.1 Preprocessing phase . 70 4.5.2 Instantiation of the algorithm . 71 4.5.3 Formal Description of the Algorithm . 73 4.6 Ranking reports . 76 4.7 Experimenting with Hector . 77 4.7.1 Found faults . 77 4.7.2 Comparison to specification mining. 78 4.7.3 Comparison to faults fixed in Linux. 79 4.7.4 Impact of the detected faults . 80 4.7.5 False positives . 82 4.7.6 False negatives . 84 4.7.7 The benefits of the analysis features . 84 4.7.8 Scalability . 86 4.7.9 Threats to validity . 87 4.8 Conclusion . 87 5 Conclusion and Future Work 89 5.1 Conclusion . 89 5.2 Limitations and Future Work . 90 5.2.1 Relax the need for exemplars . 90 5.2.2 Other memory related bugs . 90 5.2.3 Fixing bugs . 90 5.2.4 Finding shared variables . 92 5.2.5 Bugs in web applications . 92 5.3 Summary of Contributions . 94 107 1 Chapter 1 Introduction Contents 1.1 Refactoring Programming Code . .2 1.2 Improving the Quality of Error-Handling Code . .4 1.3 Outline of the thesis . .7 Any computing system may encounter errors, such as inappropriate requests from supported applications, or unexpected behavior from malfunctioning or misconfigured hardware. If the system’s software, such as its operating system, programming-language runtime, or web server, does not recover from these errors correctly, they may lead to more serious failures such as a crash or a vulner- ability to an attack by a malicious user. Therefore, correct error recovery is essential when a system supports long-running or critical services. Indeed, the ability to recover from errors has long been viewed as a cornerstone of system reliability [55], and much of systems code is concerned with error detection and handling. For example, 48% of Linux 2.6.34 driver code is found in functions that handle at least one error.1 Systems code is written in C, which unlike more modern programming languages such as Java, does not provide any specific abstractions for resource management or error-handling code. Error handling code is responsible for detecting the failure of an operation, releasing allocated resources to restore the system to a consistent state, and returning an appropriate error indicator to the calling function.

Improving the Quality of Error-Handling Code in Systems Software Using Function-Local Information Suman Saha

Assignment 6: Subtyping and Bidirectional Typing

(901133) Instructor: Eng

Semantic Patches for Java Program Transformation

Coccinelle: Reducing the Barriers to Modularization in a Large C Code Base

Explicitly Implicifying Explicit Constructors

Inferring Semantic Patches for the Linux Kernel

Automating Patching of Vulnerable Open-Source Software Versions in Application Binaries

The Cool Reference Manual∗

Generic Immutability and Nullity Types for an Imperative Object-Oriented Programming Language with ﬂexible Initialization

C DEFINES and C++ TEMPLATES Professor Ken Birman

Class Notes on Type Inference 2018 Edition Chuck Liang Hofstra University Computer Science

C Programming Tutorial