Refactoring of Preprocessor Directives in the #Ifdef Hell

1 Discipline Matters: Refactoring of Preprocessor Directives in the #ifdef Hell Flávio Medeiros, Márcio Ribeiro, Rohit Gheyi, Sven Apel, Christian Kästner, Bruno Ferreira, Luiz Carvalho, and Baldoino Fonseca Abstract—The C preprocessor is used in many C projects to support variability and portability. However, researchers and practitioners criticize the C preprocessor because of its negative effect on code understanding and maintainability and its error proneness. More importantly, the use of the preprocessor hinders the development of tool support that is standard in other languages, such as automated refactoring. Developers aggravate these problems when using the preprocessor in undisciplined ways (e.g., conditional blocks that do not align with the syntactic structure of the code). In this article, we proposed a catalogue of refactorings and we evaluated the number of application possibilities of the refactorings in practice, the opinion of developers about the usefulness of the refactorings, and whether the refactorings preserve behavior. Overall, we found 5670 application possibilities for the refactorings in 63 real-world C projects. In addition, we performed an online survey among 246 developers, and we submitted 28 patches to convert undisciplined directives into disciplined ones. According to our results, 63% of developers prefer to use the refactored (i.e., disciplined) version of the code instead of the original code with undisciplined preprocessor usage. To verify that the refactorings are indeed behavior preserving, we applied them to more than 36 thousand programs generated automatically using a model of a subset of the C language, running the same test cases in the original and refactored programs. Furthermore, we applied the refactorings to three real-world projects: BusyBox, OpenSSL, and SQLite. This way, we detected and fixed a few behavioral changes, 62% caused by unspecified behavior in the C programming language. Index Terms—Configurable Systems, Preprocessors, and Refactoring F 1 INTRODUCTION [5], [14], [26]. We call preprocessor usage that does not respect the syntactic structure of the source code The C preprocessor is a language-independent tool for undisciplined (e.g., wrapping a single bracket without its lightweight meta-programming that provides no percep- corresponding closing one). tible form of modularity [44]. The preprocessor is used in To resolve undisciplined preprocessor usage, in a pre- many projects written in C. Developers use preprocessor vious work, we proposed a catalog of refactorings (and directives, such as #ifdef and #endif, to mark blocks we developed an Eclipse plug-in to automate the refactor- of source code as optional or conditional, with the pur- ings) [32]. The catalog contains 14 refactorings to resolve pose of tailoring software systems to different hardware undisciplined directives divided into 4 categories: single platforms, operating systems, and application scenarios. statements, conditions, wrappers, and comma-separated Researchers and practitioners have been criticizing the elements. In this article, we report on a mixed-method C preprocessor because of its negative effect on code evaluation [8], [9] of our refactorings to answer the understanding and maintainability and its error prone- following research questions: ness [10], [5], [26], [31], [30], [33]. More importantly, the preprocessor hinders the development of tool support • RQ1: What is the number of possibilities to apply that is standard in other languages, such as automated the refactorings in practice? refactoring [49], [23], [15], [29], [21], [48]. Developers • RQ2: What opinion do developers have on our aggravate these problems when using the preprocessor catalog of refactorings in practice? in undisciplined ways, for example, using preprocessor • RQ3: Do the refactorings of the catalog preserve directives that split up a statement or expression [10], program behavior? To answer RQ1 (number of possibilities), we ana- • F. Medeiros is affiliated with the Federal Institute of Alagoas, AL, lyzed 63 open source C projects searching for opportuni- 57020-600, and with the Department of Computing and Systems, Federal ties to apply the refactorings in practice. We considered University of Campina Grande, PB, 58429-900, Brazil. R. Gheyi is C projects of different sizes and from various domains, affiliated with the Department of Computing and Systems, Federal University of Campina Grande, PB, 58429-900, Brazil. M. Ribeiro, B. such as games, operating systems, Web servers, and Ferreira, Luiz Carvalho, and B. Fonseca are affiliated with the Computing database systems. Overall, we found 5670 application Institute, Federal University of Alagoas, Maceió, AL, 57072-900, Brazil. possibilities, in 59 out of the 63 open-source projects C. Kästner is affiliated with the Institute for Software Research, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, (one refactoring possibility for every 3704 lines of code), USA. S. Apel is affiliated with the Department of Informatics and showing that our refactorings are indeed relevant to real- Mathematics, University of Passau, 94032, Germany. world C projects. 2 We answer RQ2 (opinion of developers) with two This work builds upon our prior work on studying different evaluations. First, we analyzed data from an #ifdef directives and possible refactorings. Specifically, online survey among 246 developers. We asked them we interviewed 40 developers and performed a survey about their preferences showing pairs of behaviorally with 202 participants in our previous study to learn equivalent code snippets: (1) the original code from a about the opinion of developers regarding undisciplined real-world project with undisciplined preprocessor us- directives and its common problems, such as code un- age, and (2) a disciplined version of the original code derstanding, maintainability, and error proneness [30]. snippet, created by applying one of our refactorings. By studying 12 C real-world projects, we identified Second, we selected a subset of application possibilities patterns of undisciplined preprocessor directives and in the studied subject systems and submitted patches to we proposed a preliminary version of our catalogue of the respective project converting undisciplined directives refactorings to remove these patterns of undisciplined into disciplined ones to get feedback from developers. directives [32]. In this preliminary version, we evaluated The results of our survey reveal that 63% of developers the refactorings showing that they were syntactically prefer to use the refactored code (i.e., the disciplined corrected, that is, we did not introduce syntax errors version of the code) instead of using the preprocessor with our refactorings. This article significantly expands in undisciplined ways. Likewise, we received positive on prior work by implementing and evaluating the feedback from developers when submitting patches to refactorings with multiple studies. We complement our resolve undisciplined directives. previous studies by submitting patches to remove undis- To answer RQ3 (behavior preservation), we created a ciplined preprocessor directives and by performing a model of a subset of the C language and a corresponding totally new survey with 246 developers. Furthermore, code generator, based on Alloy [19], to automatically we performed studies to verify behavior preservation in generate programs with application possibilities for our our refactorings using automatically generated programs refactorings, along with corresponding test cases. In ad- and three real-world systems. A replication package for dition, we applied our refactorings in BusyBox, OpenSSL, our current study is available at the project’s Web site.1 and SQLite, three real-world projects with test cases available. For all subjects (Alloy-based and real-world), 2 UNDISCIPLINED PREPROCESSOR USAGE we ran the test cases before and after applying our refac- The lexical operation mode, which allows introducing torings to check behavior preservation. We found and undisciplined directives at arbitrary tokens, is one of fixed some behavioral changes in one refactoring of our the most criticized aspects of the C preprocessor [10], catalog, and a few behavioral changes in the refactoring [5], [14], [26], [30], [31], [33]. Undisciplined preprocessor implementations. Most behavioral changes (62%) are re- directives do not respect the syntactic structure of the lated to unspecified behavior in the C language, though. source code (e.g., wrapping a single bracket without its We modified one refactoring that introduced behavioral correspondent closing one) [26]. For instance, Figure1 (a) changes and updated another refactoring with feedback presents part of the source code of Vim including pre- from developers. We found no behavioral changes when processor directives that we consider as undisciplined. running the test cases before and after applying the In this case, the #ifdefs wrap only part of expressions. refactorings to the three real-world projects. In Figure1 (b), we present an alternative, disciplined In summary, the main contributions of this article are: version of the code, in which the preprocessor directives • We present a mixed-method evaluation of our cata- surround entire statements only. log and implementation of refactorings to remove In previous work, we gathered evidence that devel- undisciplined preprocessor directives. The results opers do not recommend using undisciplined direc- provide

Refactoring of Preprocessor Directives in the #Ifdef Hell

Thriving in a Crowded and Changing World: C++ 2006–2020

Red Hat Enterprise Linux 8 Security Hardening

Defining the Undefinedness of C

Web Browser Access to Cryptographic Hardware

Automatic Detection of Unspecified Expression Evaluation in Freertos Programs

Lenient Execution of C on a JVM How I Learned to Stop Worrying and Execute the Code

Bring Your Own Token to Replace Traditional Smart Cards

C/C++ Coding Standard Recommendations for IEC 61508

High Integrity C++ Version

Security in Programming Languages Security in Programming Languages

The Java™ Language Specification Java SE 7 Edition

Using Cryptographic Hardware to Secure Applications