What Macros Are and How to Write Correct Ones
Total Page:16
File Type:pdf, Size:1020Kb
What Macros Are and How to Write Correct Ones Brian Goslinga University of Minnesota, Morris 600 E. 4th Street Morris, Minnesota 56267 [email protected] ABSTRACT such as Java and Scheme. There arise situations in most Macros are a powerful programming construct found in some high-level languages where code falls short of this ideal by programming languages. Macros can be thought of a way having redundancy and complexity not inherent to the prob- to define an abbreviation for some source code by providing lem being solved. Common abstraction mechanisms, such as a program that will take the abbreviated source code and functions and objects, are often unable to solve the problem return the unabbreviated version of it. In essence, macros [6]. Languages like Scheme provide macros to handle these enable the extension of the programming language by the situations. By using macros when appropriate, code can be programmer, thus allowing for compact programs. declarative, and thus easier to read, write, and understand. Although very powerful, macros can introduce subtle and Macros can be thought of as providing an abbreviation for hard to find errors in programs if the macro is not written common source code in a program. In English, an abbrevi- correctly. Some programming languages provide a hygienic ation (UMM) only makes sense if one knows how to expand macro system to ensure that macros written in that pro- it into what it stands for (University of Minnesota, Morris). gramming language do not result in these sort of errors. Likewise, macros undergo macro expansion to derive their In a programming language with a hygienic macro system, meaning. Just as abbreviations are text-to-text transforma- macros become a reliable part of the language. tions, macros are source-to-source transformations [2]. Macros in C, Common Lisp, and Scheme will be explored, As an example, one can define additional control-flow con- focusing on issues relating to the correctness of macros in structs to a language using macros. unless is such a con- each language. To demonstrate macros solving a problem, a struct, and is equivalent to an if with a negated conditional. Project Euler problem will be solved with the help of macros. To add support for unless, we just write a macro that will Finally, some current work relating to hygienic macros in the turn expressions of the form unless <condition>: <code> Racket programming language will be discussed. into if not <condition>: <code>. The compiler can com- pile unless isEmpty(file): readData(file) by expand- ing the macro into if not isEmpty(file): readData(file). Categories and Subject Descriptors unless cannot be defined as a function because a function D.3.3 [Programming Languages]: Language Constructs would have its arguments evaluated before being called, and and Features so it would try to read data from the file even if it was empty. This behavior is called call-by-value evaluation order, and is Keywords used by the vast majority of programming languages. This also shows that macros can do things that functions cannot. Macros, hygienic macros, macro hygiene, programming lan- Macros create specialized sub-languages, allowing for the guage design succinct expression of otherwise verbose concepts in the pro- gramming language. Using macros, the programmer can 1. INTRODUCTION create, encapsulate, and reuse entire sub-languages in their programming language [3]. 1.1 Macros Code that is declarative states what it does, and not how it 1.2 What Can Go Wrong? does it. Declarative code is easier to read, write, and under- stand because the intent is not obscured by implementation When a macro is used, it produces source code to be used details. This is the underlying idea in high-level languages in place of the macro call. The macro must be well written or else the source code produced may not interact correctly with the environment in which it was called. For instance, it may introduce a variable whose name clashes with another variable, or it might cause some user code to be unintention- ally evaluated multiple times. These can lead to errors that are hard to track down because their cause is obscured by This work is licensed under the Creative Commons Attribution- the macro. The first situation only occurs if the program- Noncommercial-Share Alike 3.0 United States License. To view a copy mer happens to use a particular variable name, and so can of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Fran- be very hard to detect. A macro that does not suffer from cisco, California, 94105, USA. these flaws is called hygienic. Hygienic macros preserve the (define f #include <stdio.h> (lambda (x) (let ((y (* x 3))) #define SQ1(x) x*x (if (< y 10) #define SQ2(x) ((x)*(x)) (* y y))))) int main(int argc, char* argv[]) { Figure 1: A function in Scheme printf("%d\n", SQ1(2+3)); printf("%d\n", SQ2(2+3)); int a = 4; behavior of the program no matter where they are used in printf("%d\n", SQ2(++a)); the program. return 0; A language with a hygienic macro system provides hy- } gienic macro expansion, a method of maintaining the rela- tionship between a variable's name and its value, respect- Figure 2: Macro usage in C ing the structure of expressions in the process [2]. One does not need a hygienic macro system to write hygienic macros. However, a hygienic macro system provides guar- #define SQ1(x) x*x defines SQ1 to be a macro of one argu- antees about hygiene that a non-hygienic macro system can- ment x. Upon macro expansion, the argument to the macro not, and so greatly improves the reliability of macros. replaces x in x*x to get the macro expansion. For instance, SQ1(5) would macro expand to 5*5. 1.3 Overview of Lisp-like languages SQ1 is the naive way to write a squaring macro: just mul- Multiple Lisp-like languages are used in this paper. The tiply the input by itself. However, the first printf prints 11 syntax of Lisp-like languages are very similar to each other, instead of the expected result 25. The reason for this is that so this section will focus specifically on Scheme. Scheme is a SQ1(2+3) expands to 2+3*2+3, which results in 11. functional programming language, meaning that programs SQ2 introduces parentheses to fix the problem. SQ2(2+3) are based on a composition of functions. In Scheme, all expands to ((2+3)*(2+3)), so second printf prints 25 as source code is comprised of lists. A list in Scheme is of the desired. Parentheses allow C macros to acheive some level form (a b c), with the empty list being (). When a list, of hygiene [2]. say (f a b c), is evaluated, one of three things happen: if Unfortunately, SQ2 is still incorrect because the argument f is a special form (such as if), the list is evaluated using of the macro is evaluated twice. This was not a problem in special rules. If f is a macro, the macro produces a form the above example because 2+3 does not have side-effects, to be used in its place. If neither of these are true, then f but consider the third printf call. Here SQ2(++a) expands is evaluated as a function, and the remainder of the list is to ((++a)*(++a)). To allow extra optimization, this expres- evaluated and passed to the function as arguments. sion is undefined in C [1]: An anonymous function is introduced in Scheme by lambda. lambda takes a list of argument names, followed by some body code. Each argument name will be bound to the cor- Between the previous and next sequence point responding argument (the first to the first name, the second an object shall have its stored value modified at to the second, etc.) when executing the body. most once by the evaluation of an expression. Local variables are introduced using let. let takes a list of variable-value pairs and some body code. The variables This means that the expression ((++a)*(++a)) invokes un- from the variable-value pairs assume the values from the defined behavior because the value of a is modified twice in pairs when executing the body. the expression. When the code is compiled with the com- Figure 1 provides an example of defining a function in piler GCC, the compiled code increments a, then increments Scheme. This code defines f to be a function of one argu- a again, and then multiplies a by a to produce the result 36. ment x, introducing y to be three times x, and returns y if y When the compiler Clang compiles this code, the compiled is less than 10, otherwise it returns nothing. It is important code increments a, copies a to tmp, increments a, and then to note that the second argument to if, in this case (* y multiplies a by tmp to get 30. Because the expression is y), is only evaluated when the condition evaluates to true. undefined, both results are valid. Thus because of the un- hygienic nature of the macro, the value of SQ2(++a) depends on the compiler used. 2. MACROS IN C This macro could be fixed by storing the value of the argu- In an imperative programming language, programs are ment to a local variable, and then squaring that (the equiv- structured as a sequence of commands that update memory.