Typechef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines

UNIVERSITA` DEGLI STUDI DI CATANIA SCUOLA SUPERIORE DI CATANIA Paolo G. Giarrusso TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines DIPLOMA DI LICENZA Relatori: Chiar.mo Prof. Klaus Ostermann Chiar.mo Prof. Ing. Giuseppe Pappalardo Correlatore: Dr. Christian Kastner¨ ANNO ACCADEMICO 2010/2011 A mia madre Rosalia Abstract Analyzing static correctness of source code of software product lines implemented through conditional compilation in C is extremely difficult. Standard C parsers process only the output of the C preprocessor, which represents only a single variant of the product line, and thus contains no more any information about variability; analyzing a whole product line seems to require parsing C code without first preprocessing it, a task for which no general algorithm is known to date. Because of this, a long line of research explored heuristics yielding approximate results (Gar- rido and Johnson, 2005, 2003; Padioleau, 2009) and restrictions of preprocessor usage which allow easier parsing (Baxter and Mehlich, 2001; Adams et al., 2009; McCloskey and Brewer, 2005; Kästner et al., 2009). Therefore, checking if all variants are syntactically and type-correct nowadays essentially requires checking each variant in isolation, an impossible effort for com- plex feature models, because the number of variants to check is exponential in the number of features. In this thesis we discuss TypeChef, an on-going development effort which will be for the first time able to analyze together all variants of a software product line. Contents List of figures iv I Introduction1 1 Introduction2 II Background8 2 The C preprocessor9 2.1 CPP syntax and semantics......................9 2.1.1 Macro definition and expansion............... 10 2.1.2 Conditional compilation................... 12 2.1.3 Other constructs....................... 12 2.2 Comprehensibility of unpreprocessed code.............. 13 i CONTENTS III TypeChef 17 3 A design for a partial preprocessor 18 3.1 Requirements............................. 18 3.1.1 PPC as a partial evaluator.................. 21 3.2 Design................................. 32 3.2.1 Conditional compilation................... 32 3.2.2 The macro table........................ 34 3.2.3 Macro references....................... 35 4 Boolean formula manipulation 39 4.1 Motivation............................... 39 4.1.1 The need for simplification.................. 40 4.1.2 Existing approaches..................... 41 4.2 Preliminaries............................. 42 4.2.1 Conversion to conjunction normal form........... 45 4.2.2 Formula renaming...................... 46 4.3 The design space of in-memory representations........... 48 4.4 Formula representation........................ 52 4.4.1 Simplification......................... 54 4.4.2 Visiting a DAG and formula renaming............ 58 4.4.3 An exponential example................... 61 5 Conclusion and future works 63 ii CONTENTS 5.1 Parsing................................ 63 5.1.1 Token stream representation................. 64 5.1.2 Typechecking......................... 65 5.2 Related work............................. 65 5.3 Future works............................. 66 5.4 Conclusion.............................. 68 Acknowledgements 69 Ringraziamenti 70 Bibliography 77 iii List of Figures 2.2.1 Interaction of preprocessor facilities................. 16 3.1.1 Why Eq. (3.1.1) is unsatisfiable – example 1............. 25 3.1.2 Why Eq. (3.1.1) is unsatisfiable – example 2............. 27 4.4.1 Standard simplification rules..................... 55 4.4.2 Advanced simplification rules.................... 56 iv Part I Introduction 1 1 Introduction Many software systems need to support optional or alternative requirements in different usage scenarios. Developers achieve this through various solutions, including settings configurable at runtime. However, often for performance reasons, one wants to generate a product containing only the features of interest, which are combined together at compile-time, excluding the others. Therefore, modern software systems are often developed as software product lines (SPLs), which allow building customised versions of a software by selecting which features to enable, subject to constraints like dependencies or incompatibilities between features, en- coded by a feature model through propositional formulas. Feature selections are also called software configurations. Software product lines are often built by using conditional compilation, which is available for various languages, supported either by the language toolchain itself or through external tools. The C language integrates a preprocessing phase before parsing and compilation, performed by a separate component called the C preprocessor (henceforth CPP). Among the various possibilities, it allows conditionally including or excluding code fragments depending on which macros are defined; 2 CHAPTER 1. Introduction one can thus encode the selection of desired features by defining the corresponding macros for the preprocessing phase, and the preprocessor will therefore generate the appropriate program variant, which the compiler proper will subsequently analyze. Variability is thus lost after the preprocessing step. Checking, among others, syntax and type errors for a software product line would certainly be useful to developers, which nowadays cannot guarantee even the absence of syntax errors in all variants. Using static analysis tools for bug checking can be a further step towards correctness of a software. Additional useful questions about variability include: “Which are the possible expansions of a given macro in a given context?”, “What code might be generated by this macro usage?”, “In which variants is this code fragment included?”, “Is this code type-correct under all variants?”, “Is this variable initialized in all variants?”. Both humans (Favre, 1997; Spencer and Collyer, 1992) and tools (Hu et al., 2000; Latendresse, 2004; Baxter and Mehlich, 2001) face serious difficulties in answering these questions correctly in non-trivial scenarios. Performing such analysis at once on all variants is desirable. In many product lines (such as the Linux kernel), hundreds or thousands of features can be enabled or disabled, and for n features up to 2n variants can exist1, so that checking each one in isolation would be impractical. We need thus to analyse source code before preprocessing eliminates all variability-related information. However, devising a parser for unpreprocessed code yielding correct results in all cases is widely regarded as 1The number can be less if the feature model restricts which variants are valid, but is still typically exponential in practice. 3 CHAPTER 1. Introduction impossible, due to several problems in the design of CPP: • CPP directive direct not only conditional compilation, but also facilities for file inclusion (through #include) and macro (through #define and #undef), which interact together. For instance, conditional compilation allows macro to have alternative definitions; in turn, the result of a conditional compilation directive can depend on the expansion of macros, which in turn might have been conditionally defined, and so on. • Conditional compilation can express compile-time variability, by selecting different code depending on which features are requested, or dually on the facilities offered by the target machine; however, it is equally used to imple- ment include guards, a coding pattern which prevents a C header file from being included multiple times and bringing in scope the same declarations, possibly causing compile-time errors. Tools need to distinguish these uses, even if they are implemented through the same constructs. • Finally, and most problematically, CPP provides lexical, not syntactic macros (Brabrand and Schwartzbach, 2002), and is mostly oblivious to the language, to the point that it is often used as-is in combination with other languages. Lexical macros manipulate token streams, while syntactic macros are inte- grated into the parser and work at a higher-level, manipulating and producing abstract syntax trees. On one side, this means that conditional compilation di- rectives and macros can appear in place of any token combination and across non-terminals, as we discuss later in Sec. 5.1; thus, a conventional context- 4 CHAPTER 1. Introduction free grammar would need to accept them at each position of each production. Furthermore, CPP gives no guarantee that produced source code will be syntactically correct, while the design of syntactic macros allow ensuring guar- antees. Source code analysis would thus be easier if variability were implemented through either syntactic macros or other more disciplined solutions, such as compile-time if statements as offered by D2, feature-oriented programming (Apel and Kästner, 2009), tool-driven feature mapping (Kästner et al., 2008), and so on. However, the large amounts of software product lines implemented using CPP, not only in C, has motivated various research efforts to overcome the technical challenges. Many efforts impose restrictions on the usage of the preprocessor; however, a previous study (Liebig et al., 2011) showed that a significant percentage of CPP annotations are of the undisciplined category and thus are quite problematic to process. Our key conclusion is that it is possible to perform just a part of preprocessing, such that on the one hand variability information is preserved, while on the other hand the resulting language

Load more