General Constant Expressions for System Programming Languages

General Constant Expressions for System Programming Languages Gabriel Dos Reis Bjarne Stroustrup Texas A&M University Texas A&M University [email protected] [email protected] Abstract time to minimize startup costs and memory consumption. Main- Most mainstream system programming languages provide sup- stream system programming languages, such as C and C++ do not port for builtin types, and extension mechanisms through user- have standard, reliable, and systematic language features for ex- defined types. They also come with a notion of constant expressions pressing that. At best, values that must be known before run time whereby some expressions (such as array bounds) can be evaluated can only be expressed in an impoverished subset of a language and at compile time. However, they require constant expressions to be must rely on non-standard conventions selectively implemented by written in an impoverished language with minimal support from the compilers. type system; this is tedious and error-prone. This paper presents a We developed a general and formal model for compile time framework for generalizing the notion of constant expressions in evaluation that can be applied to provide compile-time evalua- modern system programming languages. It extends compile time tion for a wide variety of programming languages. For space con- evaluation to functions and variables of user-defined types, thereby straints, we will limit this paper to an informal presentation of that including formerly ad hoc notions of Read Only Memory (ROM) framework. Our examples will be in C++ and some of our specific objects into a general and type safe framework. It allows a program- language-technical design decisions reflect our experience with the mer to specify that an operation must be evaluated at compile time. C++ language, implementation, use, and the C++ standards pro- Furthermore, it provides more direct support for key meta program- cess. ming and generative programming techniques. The framework is Consider a simple a jump table: formalized as an extension of underlying type system with a bind- typedef void (*handler_t)(); ing time analysis. It was designed to meet real-world requirements. extern void handler1(); In particular, key design decisions relate to balancing experssive extern void handler2(); power to implementability in industrial compilers and teachability. const handler_t jumpTable[] = { It has been implemented for C++ in the GNU Compiler Collection, &handler1, &hanlder2 and is part of the next ISO C++ standard. }; Categories and Subject Descriptors D.3 [Programming Lan- Obviously we could place jumpTable in ROM — all the infor- guages]: Language Constructs and Features mation to do so is available. Indeed, some (but not all) production compilers recognize such constructs and generate appropriate static General Terms Languages, Standardization initialization. However, relying on conventions leads to brittle and non-portable code, pushing programmers towards proprietary lan- 1. Introduction guages and language extensions. Modern high level programming languages typically feature a va- The usual meaning of const object is an unchanging datum, a riety of abstraction mechanisms to support popular programming value. However, that interpretation is insufficient to ensure that the methodologies: object-based programming, object-oriented pro- object is initialized in such a way it is placed in ROM. Consider: gramming, functional programming, generic programming, etc. double mileToKm(double x) { return 1.609344 * x; } For system programming, however, the abstraction support is in- complete. A distinctive trait of system programming is the ability to const double marks[] = { describe data known at program translation time. Such data are usu- mileToKm(2.3), mileToKm(0.76) ally denoted by so called constant expressions (C, C++, Java, etc.) }; or static expressions (e.g. Ada). Furthermore, for embedded systems we typically need to describe data that should reside in Read This defines the array object marks as unchanging, but there is no Only Memory (ROM). It is also common to need tables with non- guarantee that the values are computed at compile time. Indeed trivial structure that ideally are computed at compile time or link since mileToKm() is a function, this will generate (redundant) dynamic (run-time) initialization, that unfortunately ensures that marks never ends up in ROM. ISO C++ [1] does not provide a systematic and reliable way for the programmer to require that a particular object must be evaluated at compile time or link time. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed A workaround for this “permissive” const semantics is for pro- for profit or commercial advantage and that copies bear this notice and the full citation grammers to use preprocessor macros, designed for token substitu- on the first page. To copy otherwise, to republish, to post on servers or to redistribute tions, rather than general and type safe language mechanisms. For to lists, requires prior specific permission and/or a fee. example the function mileToKm may be replaced with the macro SAC ’10 March 22–26, 2010, Sierre, Switzerland Copyright c 2010 ACM 978-1-60558-638-0/10/03. $5.00 #define MILE_TO_KM(M) ((M) * 1.609344) However, preprocessor macros are brittle abstraction tools, fraught Given those two types our example shrinks to: with type, namespace, and scope problems. For example, assume constexpr LengthInKM marks[] = { that two collaborating teams want to make sure that units of mea- LengthInMile(2.3), LengthInMile(0.76) surement are not confused (potentially leading to catastrophic }; crashes). An obvious solution would be to adopt a “typeful programming” discipline using distinct classes/types to represent val- Obviously the representations of both classes LengthInKM and ues of differing units. For instance, they may provide LengthInKM LengthInMile are “sufficiently simple” for the compiler to “un- and LengthInMile types so that the marks initialization could be derstand” — each is represented by a single value of the built-in written: type double. All member functions are defined with the keyword constexpr to ensure that the compiler checks that it can evaluate const LengthInKM marks[ ] = { their bodies at compile time. Obviously, these functions are “suffi- LengthInKM(MILE_TO_KM(2.3)), ciently simple” for compile time evaluation. LengthInKM(MILE_TO_KM(0.76)) We use two objects of type LengthInMile to initialize an array }; marks of objects of type LengthInKM. Because marks is defined That looks good, but the use of the class LengthInKM with a with the keyword constexpr the construction and conversions are constructor brings back the problem with ROMability avoided all performed at compile time. The compiler will issue an error if through the use of the macro MILE_TO_KM instead of the func- the initialization of a constexpr object is dynamic. The notion of tion mileToKm. Again, this may go unnoticed since the resulting constant expression is no longer limited to a handful builtin types (dynamic initialization) code is correct from the C++ language and operations, but also applies to user-defined types and functions. point of view. What we see here is a failure to support user-defined Critically, the rules for compilation and evaluation are simple for type abstractions at the same level as builtin types. In reality, the both compilers and users. problem is worse because it forces programmers to avoid key facil- The rest of this paper is structured as follows. Section 2 introduces ities of the C++ standard library and of embedded systems support the notion of literal types, constexpr functions, and discusses some libraries. of the pitfalls that need to be avoided. Then x3 considers recur- The problems illustrated above are not specific to C or C++. They sive constexpr function. Section 4 consider extensions to functions are shared by all mainstream languages used for system program- taking parameters by reference; then x5 considers interaction with ming. Almost invariably, they define the notion of constant expres- object-oriented facilities (e.g. inheritance, virtual functions). Fi- sion or static expression as an expression of builtin types (e.g. in- nally we highlight related works in x6 and conclude in x8. Fine tegers only) using only a restricted list of builtin operators (i.e. no points and technical aspects of the design are made precise in a de- functions) and builtin constants. This paper proposes a methodol- tailed static semantics following a natural semantics style [3] pre- ogy to develop the notion of general constant expression applicable sentation contained in a much longer technical report. to most modern system programming languages. The methodol- ogy has been instantiated for C++, implemented in a version of the 2. Constant Expressions GNU Compiler Collection, and has been accepted for the C++0x We start with the notion of a constant expression as it has appeared draft standard. It introduces two notions: in C++ from its inception. From that, we proceed through a se- • literal types: A literal type is one which is “sufficiently simple” quence of generalizations. so that the compiler knows its layout at compile time (see x2.2). In some contexts — such as array bounds — an expression is re- • constexpr functions: A constexpr function is one which is “suf- quired to evaluate to a constant. A constant expression is either ficiently simple” so that it delivers a constant expression when a literal, a relational expression the sub-expressions of which are int called with arguments that are constant values (see x2.1). constant expressions, an arithmetic expression of type whose sub-expressions are constant expressions, or a conditional expres- The “length example” can be written more clearly, type safe, and sion whose sub-expressions are all constant expressions. The name without added run-time or space overheads in C++0x. We start by of global variables defined with the const keyword and initialized defining a type LenghtInKM: with constant expressions also evaluates to a constant expression.

Load more