Modeling C Preprocessor Metaprograms Using Purely Functional Languages

Proceedings of the 9th International Conference on Applied Informatics Eger, Hungary, January 29–February 1, 2014. Vol. 2. pp. 85–92 doi: 10.14794/ICAI.9.2014.2.85 Modeling C preprocessor metaprograms using purely functional languages Máté Karácsony Department of Programming Languages and Compilers, Faculty of Informatics, Eötvös Loránd University, Budapest [email protected] Abstract The preprocessor of the C language provides a convenient base to define relatively complex source-to-source transformations. However, writing and understanding these macros is usually fairly difficult. Lack of typing, statelessness, and uncommon syntax are the main reasons of this difficulty. To support the development of preprocessor metaprograms, a model for each of the macros can be implemented with a function in a purely functional language. If only a well-defined, restricted set of the language is used, then a working functional implementation can be translated back into a preprocessor metaprogram. This translation leads to easily recognizable patterns in the resulting code, thus makes its maintenance and understanding easier. Furthermore, testing and prototyping may need less effort using an initial implementation in a functional language. Keywords: Preprocessor metaprogramming, functional programming MSC: 68N15 Programming languages 1. Introduction With the preprocessor macros we can express advanced metaprograms. These can be used to describe source-to-source transformations to extend the capabilities of the host language itself, for example with serialization or compile-time reflection. Another possible usage of these metaprograms is calculation of complex static data based on the actual compilation options. While most C developers are familiar with the macro language, its structure and semantics is generally very different from most imperative languages, thus some of these differences are making it difficult to understand and develop metaprograms. During macro expansions, the preprocessor does not allow to access any global, mutable state information. When every macro has a single definition which does 85 86 M. Karácsony not change during preprocessing1, referential transparency of macro invocations is ensured. These properties are making the macro system very similar to a purely functional language. Since the preprocessor manipulates token streams, macros can be considered typeless. This shortcoming can easily lead to mistakes when metaprograms con- taining nested invocation of function-like macros are modified, especially when these macros are simulating data structures. Only a very few number of semantic checks are done on the metaprograms during processing2. Although expansions never directly result in failures, resulting source code is not guaranteed to be free of syntactic or semantic errors. Because the locations of these errors are often point- ing into the preprocessed, not into the original source code, in complicated cases it could be non-trivial to found the original cause in the metaprogram implementation. By utilizing the similarity of preprocessor metaprogramming and functional programming, we can use a purely functional language to ease the development and maintenance. This paper is organized as follows. First, related results will be introduced in the next subsection. Then in section 2, the basic steps of preprocessor metaprogram modeling will be presented, along with detailed explanations in each subsection. Finally, section 3 concludes the paper. Related work The preprocessor subset of Boost library [6] provides a large number of utility macros. These include integer arithmetic operators, container data types, control structures like conditionals and loops. Metaprogramming in general is closely related to functional programming. Boost also includes a separate library for C++ template metaprogramming using functional constructs [1]. In [7], it is also shown that a functional language can be embedded into C++ by template metaprogramming. 2. Modeling preprocessor metaprograms The properties of macros enable to implement them as functions in purely functional languages. This provides higher abstraction level by explicitly giving types to macro arguments and return values. Using data structures like tuples and lists also supports this. Testing and correction of errors also needs less effort this way. It is possible by the far more precise error reporting facilities of functional languages’ compilers. 2.1. Method The steps can be summarized as follows. 1For example, by redefinition using #undef and #define 2Such as avoiding recursive macro expansions Modeling C preprocessor metaprograms using purely functional languages 87 1. Define the transformation. How macros will be invoked by the user and what is the structure of the generated (expanded) source code. 2. Create the functional model. Use a restricted subset of a purely functional language. Test the functional implementation. 3. Translate the model into macros. Start with the macro-based represen- tation of data types, then rewrite every other function as a macro. 2.2. Defining transformation These steps will be presented on a simple program transformation, which extends the C language with a type class deriving mechanism similar to the one supported by the Haskell language [5]. This enables us to automatically generate code for specific behaviors of our custom data types, like equivalence checking or comparison. Haskell is also used to present the functional model of this example in later sections. The following example shows the usage of the envisioned macro system: STRUCT( point , FIELD( int , x ) , FIELD( int , y ) , DERIVING(EQ, ORD, SHOW) ) which defines a custom data structure with two integer fields, and generates the code needed to support field-wise equality check, ordering comparison and conversion to string. The desired code after macro expansions is the following: typedef struct { int x; int y; } point; bool point_equals(point a , point b) { // EQ ∗ ∗ return int_equals(&a >x , &b >x ) − − && int_equals(&a >y , &b >y ) ; − − } int point_compare(point a , point b) { // ORD ∗ ∗ i n t r ; if (0 != (r = int_compare(&a >x , &b >x)) return r; − − if (0 != (r = int_compare(&a >y , &b >y)) return r; − − return r ; } void point_show(char c s t r , point a ) { // SHOW ∗ ∗ sprintf(cstr , "\t" "x" ": "); int_show(cstr , &a >x ) ; − sprintf(cstr , "\t" "y" ": "); int_show(cstr , &a >y ) ; − } 88 M. Karácsony The resulting code contains a standard C structure definition, and in this case, three additional function definitions. These are triggered by the arguments of DERIV ING macro - each enumerated value corresponds to a specific function. The structure fields are also described with a special F IELD macro, because generated code refers structure field types and names independently. The variadic3 ST RUCT macro will generate all the resulting code, using data produced by the other macros. Note that generated code depends on the definitions of T _equals, T _compare and T _show functions for each field type T . 2.3. Functional model We can model each macro with a function. For the sake of clarity, every function that represents a macro will be marked with a prime symbol. s t r u c t 0 :: StructName [ F i e l d ] Deriving Code → → → s t r u c t 0 name fields deriving = mkStructDef ++ mkFunctions where ... The first function takes a structure name, an arbitrary number of field descrip- tions and a deriving clause, and finally returns the generated code. It creates a structure definition based on its name and fields, and the corresponding function to each value in the deriving clause. While macros are generating tokens, this function basically composes a string. Typing is made to be more meaningful with type aliases: type Code = String type StructName = String Since the two remaining macros, F IELD and DERIV ING are representing data, their implementation will be detailed in the next section. 2.4. Translation of data types Containers The Boost library already supports four types of basic containers: tuples, lists, sequences and arrays [6]. These containers can be considered as polymorphic, because the macro system does not restrict the element types. It is possible to create heterogeneous lists and embed any container into another. Boost provides a large set of operations to build and manipulate these data structures, for example splitting a list into a head element and a tail list. For the sake of preprocessing performance, sequences are preferred over lists4. 3A macro with variable number of arguments, available since C99[2] 4However, an empty list can not be represented with a sequence Modeling C preprocessor metaprograms using purely functional languages 89 Algebraic data types and constructors Using previous containers we can simulate algebraic data types, which generally have the following form in Haskell [5]: data T = Cons1 T p11 . T p1n | ... | Consm T pm1 . T pmk where Consx denotes the name of a constructor and T pxy is the type of one of its parameters. As every constructor can be represented as a function with the appropriate arity [3, 4], they can be implemented as a function-like macro with the same number of arguments. #d e f i n e T_Cons1 (p11, . , p1n ) (T_Cons1 )( p11 )...(p1n ) ... #d e f i n e T_Consm (pm1, . , pmk ) (T_Cons1 )( pm1 )...(pmk ) Each constructor creates a sequence, where the first item is the name of the constructor function itself that can be used as a tag later. This makes pattern matching possible, which is explained in the next subsection. The rest of the sequence is simply holding the actual arguments provided to the call of the constructor. If constructor names are unique for all types, which is often a requirement in functional languages, the type name prefix of constructor macros can be omitted. In our example, structure field declarations have a type and a name: type FieldType = String type FieldName = String data Field = Field FieldType FieldName f i e l d 0 :: FieldType FieldName F i e l d → → f i e l d 0 = F i e l d Again, type aliases are used to give more meaning to string types. Function corresponding to macro F IELD is a type constructor.

Load more