A mechanized semantics for C++ object construction and destruction, with applications to resource management Tahina Ramananandro, Gabriel dos Reis, Xavier Leroy

To cite this version:

Tahina Ramananandro, Gabriel dos Reis, Xavier Leroy. A mechanized semantics for C++ object construction and destruction, with applications to resource management. POPL ’12 - 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM, Jan 2012, Philadel- phia, PA, United States. pp.521-532, ￿10.1145/2103656.2103718￿. ￿hal-00674663￿

HAL Id: hal-00674663 https://hal.inria.fr/hal-00674663 Submitted on 27 Feb 2012

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Mechanized Semantics for C++ Object Construction and Destruction, with Applications to Resource Management

Tahina Ramananandro Gabriel Dos Reis ∗ Xavier Leroy † INRIA Paris-Rocquencourt Texas A&M University INRIA Paris-Rocquencourt [email protected] [email protected] [email protected]

Abstract oriented programming. Many languages also provides some sup- We present a formal operational semantics and its Coq mechaniza- port for clean-up of objects at the end of their lifetime, e.g. finaliz- tion for the C++ object model, featuring object construction and ers executed asynchronously at loosely-specified times or dispose destruction, shared and repeated , and virtual statements allowing a programmer to explicitly request clean-up. function call dispatch. These are key C++ language features for Because it is enforced by the programming language itself and high-level system programming, in particular for predictable and not by coding conventions, the object construction and destruction reliable resource management. This paper is the first to present a protocol provides strong guarantees that can be leveraged by good formal mechanized account of the metatheory of construction and programming practices. This is the case for Resource Acquisition Is destruction in C++, and applications to popular programming tech- Initialization (RAII), a popular C++ programming idiom for safely niques such as “resource acquisition is initialization.” We also re- managing system resources such as file descriptors. With RAII, port on irregularities and apparent contradictions in the ISO C++03 resource acquisition is always performed in constructors, and re- and C++11 standards. source release in the corresponding destructors. Since C++ guar- antees that every execution of a constructor is eventually matched Categories and Subject Descriptors D.1.5 [Programming by an execution of a destructor, RAII minimizes the risk of leaking techniques]: Object-oriented Programming; D.2.0 [Software resources. Engineering]: General—Standards; D.2.2 [Software Engineer- As practically useful as they are, constructors and destructors ing]: Design Tools and Techniques—Object-oriented design raise delicate issues for the programmers, for static analysis tools, methods; D.2.4 [Software Engineering]: Software/Program and even for language designers. While an object is being con- Verification—Correctness proofs; D.3.3 [Programming structed or destructed, it is not in a fully operational state. This Languages]: Language Constructs and Features—Classes and raises the question of what operations constructor/destructor code objects, Inheritance; F.3.3 [Logics and meanings of programs]: should be allowed to perform on this object, and what semantics to Studies of program constructs—Object-oriented constructs give to these operations. A typical example of this dilemma is vir- tual method dispatch: in Java and C#, unrestricted virtual method General Terms Languages, Verification calls are allowed in constructors, with the same semantics as on fully constructed objects; in contrast, C++ takes object construc- 1. Introduction tion states into account when resolving virtual function calls, in a way that provides stronger soundness guarantees but complicates One of the earliest decisions for C++ (in 1979) was to provide lan- semantics and compilation. Section 2 reviews the main features of guage support for the construction and destruction of objects [15]: C++ object construction and destruction. programmer-supplied code fragments called constructors are au- This paper reports on a formal, mechanized semantics for con- tomatically executed when a new object is created and before it struction and destruction in the C++ object model. This model is is made available to the rest of the program; these constructors especially rich: it features multiple inheritance (both shared and re- can execute arbitrary code sequences and are intended to estab- peated), fields of structure and structure array types, as well as sub- lish the execution environment for operations by initializing fields tle interactions between construction states, virtual function calls, and maintaining class-level invariants. Symmetrically, the language and dynamic casts. Our semantics, fully mechanized in the Coq also supports destructors, which are executed before an object is proof assistant, is, to the best of our knowledge, the first to cap- destroyed at precisely-defined times. Such general constructors are ture faithfully these interactions as intended by the C++ standard. now available in most programming languages supporting object- This semantics, described in section 3, is the first contribution of

∗ the paper. Partially supported by NSF grants CCF-1035058 and OISE-1043084. The second contribution, in section 4, is the specification and † Partially supported by ANR grants Arpege` U3CAT and INS Verasco. formal verification of several expected properties of the construc- tion/destruction protocol, ranging from common-sense properties such as “any subobject is constructed, or destroyed, at most once” to the first precise characterization of the properties that make the Permission to make digital or hard copies of all or part of this work for personal or RAII programming idiom work in practice. We also study the evo- classroom use is granted without fee provided that copies are not made or distributed lutions of the dynamic types of objects throughout their life cycle for profit or commercial advantage and that copies bear this notice and the full citation and formally justify the program points where compilers must gen- on the first page. To copy otherwise, to republish, to post on servers or to redistribute erate code to update dynamic types (v-tables). to lists, requires prior specific permission and/or a fee. As a third contribution, this work exposes some irregularities POPL’12, January 25–27, 2012, Philadelphia, PA, USA. Copyright c 2012 ACM 978-1-4503-1083-3/12/01. . . $10.00 and apparent contradictions in the ISO C++03 and C++11 stan- dards, summarized in section 5. These have been reported to the } ISO C++ committee; some were resolved in C++11, others will be private: in future revisions. We finish the paper with a few words about ver- vector features; ified compilation (§6), a discussion of related work (§7) and future }; work (§8), and concluding remarks (§9). Construction of a FeaturedLocus object entails constructing its All results presented in this paper were mechanized using Locus base-class subobject, followed by copy-initialization of its the Coq proof assistant [1]. The Coq development is available features field from the v parameter. If, in the Locus constructor, at http://gallium.inria.fr/˜tramanan/cxx/. By lack of the call to the virtual function show resolved to the final overrider space, some aspects of the semantics are only sketched in this FeaturedLocus::show (as in Java or C#), it would access the yet- paper. A full description can be found in the first author’s Ph.D. to-be-constructed field FeaturedLocus::features – an undesirable thesis [12, part 2]. and unpleasant outcome. Consequently, Principle II argues for re- solving the call (in the constructor) to an overrider not from a de- 2. Object construction and destruction in C++ rived class; here it is Locus::show. A constructor is a special function that turns “raw memory” into Finally, Principle III supports predictability and reliability. For an object. Its purpose is to create an environment where class in- instance, if an object acquires N resources through the construction variants hold, and where the class’s member functions will operate of its subobjects, in general one would want to release them in the [2, 15, 16]. Conversely, at the end of the useful life of an object, an- reverse order of acquisition, or else a deadlock might ensue. other special function — called destructor — releases any acquired Turning these general principles (and their implications) into resources and turns the object back to mere raw memory. The time executable specification in the face of object-oriented language fea- span between the completion of a constructor and the start of a de- tures such as multiple inheritance (both repeated and shared), late structor is the lifetime of the constructed object. The C++ language binding function dispatch (virtual function calls), type queries, etc. rules were designed to guarantee that class invariants set up by a is an interesting challenge. On the other hand, their fulfillment constructor hold through the entire lifetime of an object. For exam- leads to interesting soundness results and, more importantly, reli- ple, a complete object’s dynamic type does not change during its able software development techniques for the programmer. lifetime. Initialization versus assignment There is a difference between C++’s object construction and destruction mechanism is rooted initialization and “declaration followed by a first-time assignment”. in a few design principles: And it is not a stylistic distinction. Base-class subobjects and non- I. Every object and each of its subobjects (if any) is constructed static data members are constructed before the proper body of exactly once. the constructor is executed. Initializers for subobjects are speci- II. No object construction should rely on parts yet to be con- fied in a comma-separated list before the opening brace of a con- structed, and no object destruction should rely on parts already structor body (see the definition of the constructors for Locus and destroyed. FeaturedLocus in the previous example.) In particular, there is no III. Every object is destroyed in the exact reverse order it was other way to initialize base-class subobjects, const or reference constructed. nonstatic data members. A base-class subobject, or a nonstatic data member, not mentioned in the member-initializer list is implicitly These principles, independently of implementation considerations, initialized with a default constructor, i.e. a constructor that can be enable support for reliable, modular, large-scale software develop- called with no argument. Consequently, the language rules imply ment. that all subobjects are constructed by the time the first statement of Principle I supports the idea that a resource acquired by an the constructor body (if not empty) is executed. The constructor for object (say a file lock) is predictably acquired exactly once. FeaturedLocus could have been written as Principle II embodies a notion of locality and scoping. While the constructor is creating the computation environment for member FeaturedLocus(int a, int b, const vector& v) functions, very few class guarantees can be made. It is desirable : Locus(a,b) { features = v; } to localize the complexity and/or convolution needed to correctly In this variation, the Locus base-class subobject is initialized; the call a function in a few specific places before an object begins its nonstatic data member features is default constructed, and finally lifetime. Furthermore, the author of a class should not have to worry the constructor body assigns the value of v to features. In C++, about which data members of possible derived classes are correctly assignments of objects of class types are generally implemented by initialized during the execution of a constructor. On the other hand, user-defined functions. These functions must assume that the left she should be able to rely on base-class invariants. Consider the hand side of the assignment is an object already constructed and following example: in a valid state. In particular, vector’s assignment function must struct Locus { carefully check for self-assignment to avoid premature memory re- Locus(int a, int b) : x(a), y(b) { show(); } lease; it must properly release the resources the object was manag- virtual void show() const { ing before taking hold on the new ones. By contrast, during copy- cout << x << ’ ’ << y << endl; construction, the object being constructed has not acquired any re- } sources yet; so the implementation code of the copy-constructor is protected: usually far simpler than that for assignment. int x; int y; Initialization order Principle III has an interesting implication on }; the construction order of the constituents of an object. Observe that there is exactly one destructor per class, and that there can be as struct FeaturedLocus : Locus { FeaturedLocus(int a, int b, const vector& v) many constructors (per class) as deemed necessary by the program- : Locus(a, b), features(v) { } mer. Therefore, for a given class, subobjects in all constructors must void show() const { be initialized in the same order — for that order must be the reverse cout << x << ’ ’ << y destruction order of the (unique) destructor. Any well-defined order << ’ ’ << features << endl; can do. However, for simplicity, a natural choice is the declaration complete object or Here, the default constructor Locus explicitly constructs the data most-derived subobject inheritance subobject members x and y with a default value (zero) so that in the com- plete object declaration “Locus origin;”, the origin object has its coordinates initialized to zero. Would the semantics be the same were x and y not explicitly listed in the member-initializer list? The answer is no: objects of builtin type with no explicit initializers direct and direct nonstatic direct nonstatic are constructed but, left with indeterminate values. This irregular- indirect nonvirtual data nonvirtual data ity was introduced as a way to satisfy the widespread practice in C virtual bases members bases members where objects are given first-time values long after their declara- bases tions. This irregularity, and others related to destructors for builtin types, surely complicate the C++ semantics rules as well as formal- Figure 1. A recursive tree representation of the subobjects of a ization. However, they are not fundamental to the object model. In class, such that a depth-first left-to-right traversal yields the subob- fact, the C++ standards committee is considering unifying initial- ject construction order. ization and lifetime rules in a post-C++11 standard. We hope this work will help in that endeavor. order: initialization of all base-class subobjects in the (left-to-right) 3. Formal operational semantics order of declaration in the class definition; followed by initializa- tion of all nonstatic data members in their order of declarations in 3.1 Syntax of the core language the class definition. We focus on a core language of objects that features the main as- This construction order is simple, predictable, and reflects the pects of the C++ object model: construction and destruction; multi- compositional aspect of “class growth”. However, shared inheri- ple inheritance (both virtual and non-virtual), virtual functions, and tance (virtual base classes) poses a problem for the principle of nonstatic data members (a.k.a. “fields”) of scalar, array or complete “base before derived” and “strict order of declaration construction”. non-abstract object types. To simplify the presentation, we formal- Indeed, consider the following hierarchy of IO stream classes: ize only fields that are arrays of object types, viewing a field of struct istream : virtual basic_ios { ... }; object type T as a one-element array of type “T [1]”. struct ostream : virtual basic_ios { ... }; The core language is a language of statements operating over struct iostream : istream, ostream { ... }; variables in 3-address style. We assume that typechecking was performed, static overloading was resolved, and expressions were where the class basic_ios maintains states and implements opera- decomposed into elementary statements. The syntax of statements tions common to both input and output streams; the classes istream follows. and ostream implement input and output stream functionalities. The class iostream combines both input and output facilities. If an op,... : Op Builtin operations iostream object was to be constructed according to the declaration var,... : Var Variables order outlined above, the corresponding basic_ios base-class sub- B,C,... : Class Classes object would be constructed twice: once when the istream base- fname : Field Field names class subobject is constructed, and a second time when the ostream mname : Method Method names base-class subobject is constructed. This problem is resolved by ex- Stmt ::= skip Do nothing | var ′ := op(var ∗) Builtin operation ecuting constructors for virtual bases (in declaration order) before ′ | var := var->C fname Field read all other constructors. More generally, C++ retains the following ′ | var->C fname := var Scalar field write almost-declaration order construction for a complete object, also ′ depicted on Figure 1: | var := &var[var index]C Array cell access | var ′ := 1. virtual base-class subobjects are constructed in a depth-first dynamic casthBiC (var) Dynamic cast ′ ∗ left-to-right traversal of the inheritance graph; | var ->C mname(var ) Virtual function call 2. direct non-virtual base-class subobjects are constructed in dec- | Stmt 1; Stmt 2 Statement sequence laration order; | return Function return 3. nonstatic data members are constructed in declaration order. |{C var[var count]= Block-scoped object {ObjInit ∗}; Stmt} Builtin object types The C++ programming language was de- signed with two competing ideals: a sound high-level object model The Coq formalization also features simple control structures with equal support for user-defined types and builtin types, and a (if/then/else, infinite loops with early exits), as well as static casts. near-total compatibility with C for system programming. The re- We leave these extra features out in this paper to concentrate on sulting tension has made the general model of object construction the essential features of the language: accesses to fields; virtual and destruction grow barnacles. Indeed, C lacks the notion of object function calls; dynamic casts; and block-scoped objects. The construction so central to C++’s object model. But, in reality, both statement {C x[N]= {ObjInit ∗}; Stmt} allocates an array of N languages agree on the semantics of programs that read from unini- objects of type C, constructs each of its elements according to the tialized objects: undefined behavior, except in very limited cases list of initializers ObjInit ∗, binds the array to variable x, executes where a read access is done as “raw memory”. In C, object initial- the block body Stmt, destructs each element of the array x, and ization is usually expressed in terms of storage acquisition followed finally deallocates the whole array. by assignment; C++ distinguishes assignment from construction. Initializers are syntactically presented in C++ as constructor To achieve the same ultimate semantics and to support generic pro- calls with expressions as arguments. Since our language lacks ex- gramming, C++ extended the notion of constructor (and destructor) pressions, we model initializers as a statement (which evaluates the to builtin types. A builtin data member can be explicitly initialized expression arguments into temporary variables) followed by an in- with default value using the default construction syntax: vocation of a constructor for object fields or the initialization of a scalar field. struct Locus { Locus() : x(), y() { } }; ∗ I ObjInit ::= Stmt; C(var ) Class object initializer B virtual base of C B −h(h,l)i→ A FieldInit ::= (Stmt; fname(var)) Scalar field initializer I | fname{ObjInit ∗} Structure field C −h(Shared,l)i→ A initializers (one for An object is said to be a most-derived object if, and only if, it each array cell) is not a base-class subobject of any other object. A most-derived Init ::= ObjInit | FieldInit object of type C is designated by the trivial inheritance path (Repeated,C :: ǫ). Arguments to functions and constructors can be scalar values, i.e. I integers, or floating point data, or pointers to objects. We do not Inheritance paths compose naturally: if C −h(h,l)i→ B and I I model passing objects by value: this raises delicate issues with the B −h(h′,l′)i→ A, we have C −h(h,l)@(h′,l′)i→ A. construction and destruction of temporary objects, discussed in §8. Here, @ is the cast to base operator, defined as Constr ::= C(var ∗): Init ∗{Stmt} Constructor follows: for a cast through non-virtual inheritance, (h,l)@(Repeated,B :: l′)=(h,l−q l′); whereas through virtual Destr ::= ∼ C(){Stmt} Destructor ′ ′ MethodDef ::= virtual void Virtual function inheritance, we have (h,l)@(Shared,l ) = (Shared,l ). (We q mname(var ∗){Stmt} (method) write − for list concatenation.) FieldDef ::= scalar fname; Data member In the presence of fields that are structures or arrays of struc- | struct C[size] fname; (field) tures, the notions of paths and subobjects must be extended to al- Base ::= B | virtual B low selecting elements of arrays of structures. Ramananandro et ClassDef ::= struct C : Base∗ al. [13] define array paths as lists of (i,σ,F ) triples, where i is {FieldDef ∗ MethodDef ∗ an array index, σ an inheritance path, and F the name of a struc- ∗ ture array field. An array path goes from an array of n structures Constr Destr} Class definition ′ ′ Program ::= ClassDef ∗; of type C to an array of n structures of type C , which we write A main(){Stmt} Program C[n] −hαi→ C′[n′] and define by the inference rules below. An CI A complete program consists of a hierarchy of classes. Each class auxiliary predicate C[n] −hi,σi→ A captures the selection of the definition contains a list of virtual and non-virtual base classes, a i-th element of an array followed by the extraction of a base-class list of data members (a.k.a. fields), definitions for virtual functions subobject σ of type A. (a.k.a. methods), one or several constructors, and one destructor. I Each constructor consists of a parameter list, a sequence of ini- 0 ≤ iC f ,L, Env, B), K, G) destruction protocol, starting with the simpler of the two, namely → (Codepoint(skip ,L, Env ′, B), K, G) destruction. Destruction starts when the body of a block has re- duced to skip or return: Likewise, for an assignment to a scalar field, we have: (st = skip ∧ L = ǫ) ∨ st = return G.LocType(ℓ)=(C,n) Env(var)= π G ⊢ π : C f =(fid, (Sc,t)) ∈F(C) ′ G.ConstrStateF (π,f)= Constructed (Codepoint(st,L, Env, (ℓ,L ) :: B), K, G) ′ ′ → (DestrArray(ℓ,L,n − 1,C), Env(var )= res G = G[FieldValue(π,f) ← res] ′ ′ Kcontinue(st, Env,L , B) :: K, G) (Codepoint(var->C f := var ,L, Env, B), K, G) → (Codepoint(skip ,L, Env, B), K, G′) The DestrArray execution state requests the destruction of all ele- Note the side condition preventing assignment to a scalar field that ments of the structure array C[n] at ℓ, starting with the last element is not Constructed. For reads, this conditions is not necessary: (n − 1). Eventually, we reach a state where no more elements re- Theorem 4 below shows that if a field has a value, then it is main to be destructed, in which case we effectively exit from the Constructed. block. Dynamic types Virtual function calls and dynamic casts both (DestrArray(ℓ,α, −1,C), involve the notion of dynamic type of a subobject, which the C++ Kcontinue(st, Env,L, B) :: K, G) standard defines as its most-derived object during the lifetime → (Codepoint(st, L, Env, B), K, G) of the latter. However, the standard also permits virtual function calls and dynamic casts during the execution of the constructor When the destruction of a most-derived object (i.e. a structure body, or field initializers, or destructor body corresponding array cell) is requested, we first enter the body of the associated to the subobject. To unify these two cases, we introduce the destructor in a variable environment that binds this to the object. notion of generalized dynamic type. We define the predicate ′ ′′ 0 ≤ i ∼ C(){stmt} gDynType(ℓ,α,i,σ,B,σ ,σ ), meaning that the base-class ∅ ′ π =(ℓ, (α,i, (Repeated,C :: ǫ))) Env = [this ← π] subobject σ of static type B is the generalized dynamic type of ′ G = G[ConstrState(π) ← StartedDestructing] the subobject (ℓ, (α,i,σ)) with σ = σ′@σ′′. (DestrArray(ℓ,α,i,C), K, G)

A CI → (Codepoint(stmt,ǫ, Env,ǫ), G.LocType(ℓ)=(D,n) D[n] −hαi→ C[m] −h(i,σ)i→ B Kdestr(π) :: Kdestrcell(ℓ,α,i,C) :: K, G′) G.ConstrState(ℓ, (α,i, (Repeated,C :: ǫ))) = Constructed G ⊢ gDynType(ℓ,α,i,σ,C, (Repeated,C :: ǫ),σ) When the destructor body returns, the fields of the subobject have to be destructed, in reverse declaration order. G.LocType(ℓ)=(D,n) A CI π =(ℓ, (α,i, (h,l))) last(l)= C L = rev(F(C)) D[n] −hαi→ C[m] −h(i,σ )i→ C ◦ ◦ (Codepoint(return, Stmt ∗, Env,ǫ), Kdestr(π) :: K, G) G.ConstrState(ℓ, (α,i,σ )) = c ◦ → (Destr(π, Fields,L), K, G) c = BasesConstructed ∨ c = StartedDestructing ′ I ′ C◦ −hσ i→ B σ = σ◦@σ Destructing a scalar field erases its value and changes its construc- ′ tion state to Destructed, before proceeding with the remaining G ⊢ gDynType(ℓ,α,i,σ,C ,σ ,σ ) ◦ ◦ fields. The semantics of virtual function calls is similar to that given by Wasserrab et al. [18], except that we use the generalized f =(fid, (Sc,t)) ′ F dynamic type instead of the most-derived object. If C◦ is G = G[FieldValue(π,f) ←⊥][ConstrState (π,f) ← Destructed] the generalized dynamic type of subobject σ′, the predicate ′ ′′ ′′ ′′ ′′ (Destr(π, Fields,f :: L), K, G) VFDispatch(C◦,σ ,f,B ,σ ) determines the subobject σ : B → (Destr(π, Fields,L), K, G′) of C◦ containing the definition of virtual function f that must be invoked, following the same algorithm as in [18]: Destructing a structure array field changes its construction state 1. Determine the static resolving subobject σ declaring f. to StartedDestructing, then requests the destruction of the array, f starting from its last cell, and remembering the remaining fields 2. Choose the final overrider for the method. The final overrider through Kdestrother in the continuation. ′′ is the inheritance subobject σ between C◦ and σf , nearest to C◦, and declaring f. π =(ℓ, (α,i,σ)) f =(fid, (St, (C,n))) α′ = α−q (i,σ,f) :: ǫ The transition rule for a virtual function call is, then, G′ = G[ConstrStateF (π,f) ← StartedDestructing] Env(var)=(ℓ, (α,i,σ)) ′ (Destr(π, Fields,f :: L), K, G) G ⊢ gDynType(ℓ,α,i,σ,C◦,σ◦,σ ) ′ ′ ′′ ′′ → (DestrArray(ℓ,α ,n − 1,C), VFDispatch(C◦,σ ,f,B ,σ ) ′ ′′ Kdestrother(π, Fields,f,L) :: K, G ) B .f = f(varg 1,..., varg n){body} ∀j, Env(var j )= vj ′ ∅ ′′ Env = [varg 1 ← v1] ... [varg n ← vn][this ← (ℓ, (α,i,σ◦@σ ))] Then, once all cells have been destructed, the field enters the De- (Codepoint(var->B f(var 1 ... var n),L, Env, B), K, G) structed state, and we proceed with the destruction of the remain- → (Codepoint(body,ǫ, Env ′,ǫ), Kretcall(var,L, Env, B) :: K, G) ing fields. Destructed, and we proceed with the destruction of the preceding G′ = G[ConstrStateF (π,f) ← Destructed] structure array cells. (DestrArray(ℓ′,α′, −1,C), Kdestrother(π, Fields,f,L) :: K, G) π =(ℓ, (α,i, (h,l))) → (Destr(π, Fields,L), K, G′) last(l)= C G′ = G[ConstrState(π) ← Destructed] Eventually, all fields have been destructed. The subobject then (Destr(π, Bases(Virtual),ǫ), K, G) ′ changes its construction state to DestructingBases. At this point, → (DestrArray(ℓ,α,i − 1,C), K, G ) no virtual function call nor dynamic casts may be used on this subobject. The destruction of the direct non-virtual bases starts, in Construction To execute a block statement, we allocate memory reverse declaration order. for the structure array declared in the block, then start the construc- π =(ℓ, (α,i, (h,l))) last(l)= C L = rev(DNV(C)) tion protocol by requesting the construction of the first element of ′ this array: G = G[ConstrState(π) ← DestructingBases] (Destr(π, Fields,ǫ), K, G) ℓ 6∈ dom(G.LocType) G′ = G[LocType(ℓ) ← (C,n)] → (Destr(π, Bases(DirectNonVirtual),L), K, G′) Env ′ = Env[c ← Ptr(ℓ,ǫ, (Repeated,C :: ǫ))] Destructing a (virtual or direct non-virtual) base B of π enters its (Codepoint({C c[n]= {ι}; st}, St, Env, Bl), K, G) destructor, remembering the other bases through Kdestrother. → (ConstrArray(ℓ,ǫ,n, 0,C,ι, Env ′), Kcontinue st, Env ′, St, Bl , ′ ∼ B(){stmt} ( ) :: K G ) ′ ′ π = AddBase(π,β,B) Env = ∅[this ← π ] The construction protocol implements the necessary traversal of all ′ ′ G = G[ConstrState(π ) ← StartedDestructing] subobjects of c using the same “small-stepping” style, based on (Destr(π, Bases(β),B :: L), K, G) continuations, that was illustrated above in the case of destruction. → (Codepoint(stmt,ǫ, Env,ǫ), There are two main differences. First, the construction order is Kdestr(π′) :: Kdestrother(π, Bases(β),B,L) :: K, G′) the exact opposite of the destruction order: instead of enumerating array cells by decreasing indices, fields and non-virtual bases in Eventually, we reach a state where all direct non-virtual bases of π reverse declaration order and virtual bases in reverse VO order, have been destructed. There are two cases to consider, depending we enumerate array cells by increasing indices, fields and non- on the top of the continuation stack. In the first case, the continu- ′ virtual bases in declaration order and virtual bases in VO order. ation stack starts with a Kdestrother(π , Bases(β),B,L), indicat- Second and more importantly, we need to execute the initializers ing that π is not a most-derived object. In this case, there is no need that compute initial values for scalar fields and the arguments to to destruct the virtual part of π (this will be done later by the en- be passed to constructors, adding a number of transition rules. We closing most-derived object) and we are done with the subobject π: refer the reader to [12, chapter 9] for full details. its construction state becomes Destructed, and we proceed with the destruction of the remaining bases of π′. G′ = G[ConstrState(π) ← Destructed] 4. Properties of construction and destruction (Destr(π, Bases(DirectNonVirtual),ǫ), In this section, we state (and sometimes outline the proofs of) sev- Kdestrother(π′, Bases(β),B,L) :: K, G) eral semantic properties of interest for construction and destruction. → (Destr(π′, Bases(β),L), K, G′) Some are technical consequences of the definition of our semantics, but others capture higher-level properties that C++ programmers The second case is when the continuation stack starts with a Kde- often rely on, such as the RAII principle. strcell. Then, π must be a most-derived object, and its direct and indirect virtual bases need to be destructed. According to the C++ standard, virtual bases must be destructed in reverse inheritance 4.1 Run-time invariant graph order. Consider the following recursive function: Our transition rules and grammars for execution states are lax in VO that they put very few constraints on the general shapes of states. list({Repeated, Shared}×C) → list(C) However, in transition sequences starting from the initial state, the VO(ǫ) = ǫ reachable states are a small subset of all possible states and enjoy q ′ VO((Repeated,B) :: q) = VO(D(B))− VO(q) many low-level properties that are essential to prove the high-level ′ VO((Shared,B) :: q) = VO(D(B))−q (B :: VO(q)) theorems in this section. We gathered about 20 such properties in where D(B) is the list of direct bases of B, in declaration order, one invariant INV and proved that this invariant is satisfied by the tagged with Repeated(for non-virtual bases) or Shared(for virtual initial state and preserved by the transition rules. This is the largest bases), and −q ′ is list concatenation with elimination of duplicates part of the Coq mechanization, totaling about 15000 lines of Coq ′ and taking about 2 hours to re-check the proof. (l1−q l2 =def l1−q (l2 \ l1)). It is easy to see that the list L = VO(D(C)) contains all the direct and indirect virtual bases of C, A detailed explanation of the invariant is given in [12, sec- exactly once each, and contains no other classes. Moreover, the tion 10.1]. Here, we just show the two most interesting conse- order in which virtual bases appear in list L is consistent with the quences of the invariant, which relate the construction states of cer- A B tain subobjects. inheritance graph order. In particular, if and are virtual bases ′ C A B A B Let p,p be two subobjects of the same complete object. We of such that is a virtual base of , then appears before ′ L say that p is a direct subobject of p if, either, (1) p is a direct non- in . ′ ′ L′ = rev(VO(D(C))) virtual base of p ; (2) or p is a most-derived object and p is a virtual base of p′; (3) or p is a field subobject of p′ (that is, p is a cell of a (Destr(π, Bases(DirectNonVirtual),ǫ), structure array field of p′). Kdestrcell(ℓ,α,i,C) :: K, G) ′ → (Destr(π, Bases(Virtual),L ), K, G) Lemma 1 (Vertical relations on construction states). Assume that Finally, when all virtual bases have been destructed, the subobject p is a direct subobject of p′. The construction states of p and p′ in under consideration (necessarily a most-derived object) becomes any execution state satisfying INV are related as follows: If p′ is. . . Then p is. . . A similar theorem holds for destruction. Here, we use Unconstructed Unconstructed unchanged the notion of trivial destructor from the C++ standard: StartedConstructing Unconstructed if p is field subobject of p′ a class C has a trivial destructor if (1) its destructor is just return; BasesConstructed Constructed if p is a base subobject of p′ (2) all virtual bases and direct non-virtual bases of C have trivial Constructed Constructed destructors; (3) for each structure array field f of C, if f has type StartedDestructing Constructed if p is a base subobject of p′ B, then B has a trivial destructor. DestructingBases Destructed if p is a field subobject of p′ Destructed Destructed 4.3 Safety of field accesses and virtual function calls The rule for reading the contents of a scalar field puts no precondi- Let p′ be a subobject of static type C, and p ,p be two direct 1 2 tion on the initialization state of the field. We can, however, show subobjects of p′. We say that p occurs before p if, either, (1) p′ 1 2 that this rule gets stuck whenever the field is not in the Constructed is a most-derived object and p ,p are two virtual bases of p′ in 1 2 state. inheritance graph order; (2) p1 and p2 are two direct non-virtual bases of p in declaration order; (3) p1 and p2 are two cells of the Theorem 4. A scalar field has a well-defined value only if it is same array field, in the order of their indexes within the array; (4) Constructed. p1 and p2 are two cells of two different fields in declaration order; ′ (5) p is a most-derived object and p1 is a virtual base, and p2 is a Proof. Follows from INV and the fact that setting the value of a ′ direct non-virtual base of p or a cell of an array field; (6) p1 is a scalar field only occurs in two cases: either on an ordinary scalar ′ direct non-virtual base of p and p2 is a cell of an array field. field, which is forbidden if the field is not Constructed, or on scalar field initialization, which turns the field construction state Lemma 2 (Horizontal relations on construction states). Assume to Constructed. Moreover, our semantics erases the value of the that p occurs before p . The construction states of p and p in 1 2 1 2 scalar field when destructing it. any execution state satisfying INV are related as follows: If p is. . . Then p is. . . 1 2 ++ Unconstructed Like C itself, our semantics allows fields to have no initializ- StartedConstructing Unconstructed ers and therefore remain without a defined value after construction. BasesConstructed However, we also proved that if we remove the transition rule al- lowing fields without initializers to proceed, a scalar field has a Constructed in an arbitrary state well-defined value if and only if it is Constructed. StartedDestructing Virtual functions have no initialization state per se. However, DestructingBases Destructed the C++ semantics for virtual function calls provides a very useful Destructed guarantee concerning the construction state of the this subobject: In the remainder of this section, we only consider execution Theorem 5. Whenever a virtual function is called, the subob- states that can be reached from the initial state and therefore satisfy ject bound to its this parameter is in state BasesConstructed or the invariant INV . Constructed or StartedDestructing, and all its base-class subob- 4.2 Progress jects are in state Constructed. To check that our transition rules make sense and that no rule Proof. Consider a call to a virtual function f on a subobject (ℓ,p). is missing, we prove a “progress” property of construction and Since this call does not go wrong, the generalized dynamic type destruction: once construction of a complete object starts, it always po of this subobject is defined. By definition of generalized dy- eventually reaches a point where the object is fully constructed, namic types, (ℓ,po) is in state BasesConstructed or Constructed without getting stuck in the middle; and likewise for destruction. or StartedDestructing. Invariant INV then guarantees that all This result is false in general: since constructors, initializers and base-class subobjects of (ℓ,po) are Constructed (from Lemma 1). destructors can perform arbitrary computation, they can get stuck The final overrider for function f being either (ℓ,po) or one of its on e.g. an attempt to assign an unconstructed scalar field. However, base-class subobjects, the result follows. we can prove the expected result if we restrict ourselves to nearly trivial constructors and trivial destructors. In other words, a virtual function can always safely assume We say that a class C has a nearly trivial constructor if (1) C that bases have been properly constructed. It cannot, however, has a default constructor with no arguments and return as its body; assume that fields of its defining class are in the constructed state, (2) initializers for bases and fields just call the default constructors, since initializers for these fields can call the virtual function. As without any other computations; (3) scalar fields are initialized with discussed in §2 and by Qi and Myers [11], Java does not provide constants; (4) for each structure array field f of C, if f has type a similar guarantee; through inheritance and method overriding, it B, then B has a nearly trivial constructor. This notion extends is possible for a method to be invoked before its super classes have the concept of trivial constructor from the C++ standard, e.g. by completed their initialization. allowing virtual bases and virtual functions. 4.4 Evolution of construction states Theorem 3 (Construction progress). Let (P, K, G) be an execution state where P = Codepoint({C c[n]; st} = ι,...), i.e. we are Lemma 6. If s → s′ is a transition step of our small-step op- about to execute a block statement. Assume that C is a class having erational semantics, and if the construction state of the subobject a nearly trivial constructor and that the initializer list ι calls the (ℓ,p) is c in s and c′ 6= c in s′, then c′ = S(c) and any other default constructor for every array cell. Then, there exists a state subobject (ℓ′,p′) 6=(ℓ,p) keeps its construction state unchanged. (P′, K′, G′) such that Proof. By case analysis on the transition rules, with the help of ∗ ′ ′ ′ • (P, K, G) → (P , K , G ) invariant INV . • P′ = Codepoint(st,...), i.e. construction has terminated and the block body is about to be executed; Corollary 7. Any subobject is never constructed or destructed ′ • in state G , all cells of the array c are in the Constructed state. more than once. In particular, any virtual base subobject is constructed or de- Proof. At state s2, p1 is Constructed but p2 is not. Hence, the structed at most once, despite being potentially reachable through lifetime of p1 cannot be included in that of p2. By theorem 9, the several inheritance paths. Also, if an object goes from one construc- lifetime of p2 is therefore included in that of p1. Since p1 is no tion state to another, then it must go through all construction states longer Constructed at state s5, so is p2 at state s5. Since at most in between: one subobject changes construction state during a given transition ∗ (Lemma 6), p 6= p and p is no longer Constructed at state s . Theorem 8 (Intermediate values theorem). If s s′, then, for any 1 2 2 4 → The result then follows from the the intermediate values theorem subobject ℓ,p and for any construction state c such that: ( ) (Theorem 8). ConstrStates(ℓ,p) ≤ c< S(c) ≤ ConstrStates′ (ℓ,p) Subobjects of different complete objects In the full C++ lan- ∗ ∗ ′ there exist “changing states” s1,s2 such that s → s1 → s2 → s guage, including dynamic allocation, the lifetimes of two subob- and ConstrStates1 (ℓ,p)= c and ConstrStates2 (ℓ,p)= S(c). jects of different complete objects are, in general, unrelated: operations can be interleaved arbitrarily. In our core (Here, for a state s =(P, K, G), we write ConstrStates(π) for G.ConstrState(π).) language, complete objects can only be created by block statements and therefore follow a stack discipline. 4.5 Object lifetimes Theorem 14. The lifetimes of two subobjects (ℓ1,p1) and (ℓ2,p2) An important design principle of C++ is that destruction is per- of different complete objects ℓ1 6= ℓ2 are either disjoint or included formed in the exact reverse order of construction. We now for- in one another. malize and prove this property, along with more general properties This result follows from Theorem 9 and the stronger property about the relative lifetimes of two subobjects. below, which shows that throughout the execution of a block, the Two subobjects of the same complete object Let ℓ be a complete construction states of subobjects of already-allocated complete ob- object of type C[n]. jects do not change: ∗ Theorem 9. If p1 and p2 are two subobjects of the complete Lemma 15. Let s1 → s2 → s3 → s4 be the execution of a object ℓ, either the lifetime of p1 is included in that of p2, or the block: the transition s1 → s2 enters the block and allocates a lifetime of p2 is included in that of p1. fresh complete object ℓ; the transition s3 → s4 exits this block. For ′ all complete objects ℓ already allocated in state s1, the allocation To prove this theorem, we need to characterize the construction ′ ′ states of its subobjects (ℓ ,p ) are identical in s1 and s4. order between subobjects. We write p1 C[n] p2 to say that p1 occurs before p2 in the depth-first, left-to-right traversal of the 4.6 RAII: Resource Acquisition Is Initialization construction tree in Figure 1. (See [12, chapter 10] for a formal definition in terms of the “direct subobject” and “occurs before” RAII is a programming discipline where precious program re- relations of §4.1.) The two crucial properties of this construction sources (such as file descriptors) are systematically encapsulated order are the following: in classes, all acquisitions of such resources are performed within constructors of the corresponding class, and all releases of such Lemma 10 (The construction order is total). If C[n] −hp1i→ B1 resources are performed within destructors of the corresponding and C[n] −hp2i→ B2, then either p1 C[n] p2 or p2 C[n] p1. class. We cannot prove a general result guaranteeing the proper encapsulation of resources in classes: this is a matter of program Lemma 11 (Construction order and lifetimes). For any reach- verification. We can, however, prove that in a terminating program able execution state s and any complete object ℓ of type C[n], every construction of a subobject is correctly matched by a destruc- if p  p and ConstrStates(ℓ,p ) = Constructed, then 1 C[n] 2 2 tion. ConstrStates(ℓ,p1) = Constructed. In other words, if p1 C[n] p2, then the lifetime of p2 is included in the lifetime of p1. Theorem 16 (RAII). Consider a partial program execution ∗ Theorem 9 then follows directly from the two lemmas above. sinit → s1 → s2 where the transition s1 → s2 marks the end of a block statement that allocated the complete object ℓ. Using Lemma 11 and the definition of C[n], we have the follow- ing stronger special case: Then, between the initial state sinit and s1, the following events occurred, in this order: Theorem 12. If p is a subobject of p , then the lifetime of p is 2 1 1 • Every subobject of ℓ was constructed exactly once. included in that of p2. • Every subobject of ℓ was destructed exactly once, in reverse As a corollary of Theorem 9, it follows that if p1 is constructed order of construction. before p2, then p2 is destructed before p1. Theorem 13 (Destruction in reverse order of construction). Let ℓ Proof. The invariant INV implies that, at state s1, all cells of the structure array ℓ are Destructed, and so are all of their subobjects. be a complete object of type C[n] and p1,p2 be two subobjects of ℓ. Consider a subobject (ℓ,p). Its state is Unconstructed in the initial Consider the reduction sequence below where p1 is constructed before p , then later p is destructed: state and Destructed in s1. Therefore, by the intermediate value 2 1 theorem (Theorem 8), it must have entered state Constructed at ∗ ∗ s0 → s1 → s2 → s3 → s4 → s5 some point, then left state Constructed at some later point, mean- (¬c1) (c1) (¬c2) (c2) (c1) (¬c1) ing that (ℓ,p) has been initialized once, then destructed once. The claim on construction and destruction order follows from Theo- (We write c below a state s to denote that p is Constructed in i i rem 13. state s, and ¬ci to mean that pi is not Constructed.) Then, there ex- ′ ′ ist intermediate states s3,s4 such that p2 stops being Constructed 4.7 Generalized dynamic types between these two states: ∗ ∗ ′ ′ ∗ We finish this section with interesting properties of generalized s0 → s1 → s2 → s3 → s3 → s4 → s4 → s5 dynamic types, which were introduced in §3.4 to give semantics (¬c1) (c1) (¬c2) (c2) (c2) (¬c2) (c1) (¬c1) to virtual function calls and dynamic casts. A B1 B2 C B2 B1 A 5. Impact on the C++ language specification A: BC CS SD DB The development of the formal semantics reported in this paper B1 C B1 uncovered several issues with the C++03 standard. They were re- B1: ported to the ISO C++ committee. Some of them were fixed in time BC CS SD DB B2 C B2 for the C++11 standards; others will be addressed in future revi- sions. B2: BC CS SD DB C Virtual functions calls during object construction and destruction C: The formulation of ISO C++03 [6] of the behavior of the abstract StartedConstructing BC CS SD DB Destructed machine when a virtual function is called during construction (or destruction) was unclear and did not clearly support the original BC = BasesConstructed; CS = Constructed; SD = StartedDestructing; DB = DestructingBases intent (correctly modeled in this paper). This was reported to the ISO C++ committee as CWG issue number 1202. The formulation Figure 2. Evolution of the dynamic types of an instance of C was clarified in time for adoption in C++11. and of its subobjects in the example of §4.7. Dynamic types are Conflicting description of end of object lifetime Although the undefined at points where the “signal” is low, and defined and language formally defined in this paper does not allow explicit equal to the indicated type when the “signal” is high. Thick vertical management of object lifetime, coming up with a simple, coherent, transitions denote points where a compiled implementation must unified, and faithful description of object lifetime led us to discover update pointers to v-tables. conflicts and unintended semantics in the ISO C++ documents (both C++03 and C++11.) The C++11 standard [7] has a conflicting description of the effect of calling a destructor. Two paragraphs Theorem 17. The generalized dynamic type of a subobject, if (3.8p1 and 3.8p4) claim that an object’s lifetime ends when a defined, is unique. call to a non-trivial destructor occurs or its storage is reused or Proof. The result is trivial if the most-derived object is released. However, another paragraph (14.2p4) claims that once a Constructed, since it is then equal to the generalized dynamic destructor is invoked (its triviality notwithstanding), “the object no type. Otherwise, we observe that the most-derived object can have longer exists”. This issue will be resolved after C++11 is published. at most one base class subobject in state BasesConstructed or We expect that for consistency, the resolution will not consider StartedDestructing (as a consequence of invariant INV ), and “triviality” of the destructor to determine when an object’s lifetime conclude. ends. Effect of calling destructor for builtin types ++ A subtle aspect of generalized dynamic types is that they do C allows an ex- p->˜T() T p not continuously exist: as the construction or destruction of a most- pression of the form where is a non-class type name and T derived object proceeds, the generalized dynamic type of one of is an expression that points to a object. This form was introduced its base-class subobjects alternates between defined and undefined. to support function templates explicitly managing object lifetime Consider the following program: without forcing the author to distinguish between builtin types and class types. It was also intended to bring uniformity. However, the structA {virtual voidf();}; formulation of the behavior of the abstract machine appears to indi- struct B1: virtual A {}; cate that the following program fragment has a well-defined mean- struct B2: virtual A {virtual void f ();}; ing, for any scalar type T. struct C: B1, B2 {} T f(T x) { Figure 2 shows the evolution of the dynamic type of an instance of T t = x; C and of its subobjects while this instance undergoes construction t.˜T(); then destruction. For example, during the construction of base return t; B2, the subobject B1 is already Constructed, but its generalized } dynamic type is undefined (the constructor of B1 has returned but C This is in a clear conflict with the case where T is a class type. is not yet constructed), and calling f on B1 has undefined behavior. Indeed, for every class type T, that function leads to an undefined Theorem 18. Let (ℓ, (α,i,σ)) be a subobject and (ℓ, (α,i,σ◦)) behavior, as it attempts to destroy the t twice: once be its most derived object (i.e. σ◦ =(Repeated,C :: ǫ)). Consider explicitly through the call to the destructor and a second time im- a transition s → s′ where the construction state of (ℓ, (α,i,σ)) plicitly when the function exits. Finally, it appears that the program changes. Then, the generalized dynamic types for all subobjects in is ill-formed (and a diagnostic is required) when T is an array type. the program evolve as shown in Table 1. Lifetime of array objects The C++ standards explicitly indicate In a compiled implementation, dynamic types are materialized that the lifetime of an array object starts as soon as storage for it as extra fields in the in-memory representations of objects, these has been allocated, regardless of when the lifetime of its elements fields containing (in general) pointers to v-tables. When the gen- starts. In general, a complete object’s lifetime starts after all its eralized dynamic type of a subobject is undefined, its dynamic subobjects have been constructed. This irregularity appears to be type field can contain any value. However, when the generalized a hand-over from C in its formulation. It will be reconsidered after dynamic type is defined, the dynamic type field must contain a publication of C++11 along with a more unified treatment of the pointer to the corresponding v-table. Theorem 18, therefore, pin- lifetime of objects of builtin types. points exactly the program points where the compiled code must update dynamic type fields: when all bases of a subobject are con- structed, and just before the construction of fields begins, dynamic 6. Application to verified compilation type fields must be updated for the subobject in question and all To strengthen confidence in the semantics presented here and stress of its bases; likewise, when the subobject undergoes destruction, at its usability, the first author developed and proved correct a simple the point where the destructor is entered. compiler that translates the core language presented in this paper When the subobject goes from to then the gen. dynamic type of goes from to (ℓ, (α,i,σ)) Unconstructed StartedConstructing (ℓ, (α,i,σ′)) Undef. Undef. (ℓ, (α,i,σ@σ′′)) Undef. σ (ℓ, (α,i,σ)) StartedConstructing BasesConstructed (ℓ, (α,i,σ′)) not a base of σ Undef. Undef. (ℓ, (α,i,σ@σ′′)) σ Undef. (ℓ, (α,i,σ)) with σ =6 σ◦ BasesConstructed Constructed (ℓ, (α,i,σ′)) not a base of σ Undef. Undef. ′ (ℓ, (α,i,σ◦)) BasesConstructed Constructed (ℓ, (α,i,σ )) σ◦ σ◦ ′ (ℓ, (α,i,σ◦)) Constructed StartedDestructing (ℓ, (α,i,σ )) σ◦ σ◦ (ℓ, (α,i,σ@σ′′)) Undef. σ (ℓ, (α,i,σ)) with σ =6 σ◦ Constructed StartedDestructing (ℓ, (α,i,σ′)) not a base of σ Undef. Undef. (ℓ, (α,i,σ@σ′′)) σ Undef. (ℓ, α, i, σ) StartedDestructing DestructingBases (ℓ, (α,i,σ′)) not a base of σ Undef. Undef. (ℓ, (α,i,σ)) DestructingBases Destructed (ℓ, (α,i,σ′)) Undef. Undef. (ℓ′, (α′,i′,σ′)) with (ℓ, (α,i,σ)) Any Any Does not change (ℓ, α, i) =(6 ℓ′,α′,i′)

Table 1. Evolutions of generalized dynamic types when the state of a subobject (ℓ, (α,i,σ)), of most-derived object σ◦, changes. to a simple, non-object-oriented intermediate language similar to struct D : B { CompCert’s Cminor [8]. This verification is described in [12, chap- virtual int f() { return 42; } ter 11]. The compiler proceeds in two passes, using as intermediate }; language the CoreC++ language of Wasserrab et al. [18] extended int main () { D d; return d.b->f(); }; with a setDynType operation that explicitly modifies the dynamic In B’s constructor, the initialization b(this) saves in b a pointer type of an object. whose dynamic type is B (correctly, at that time). However, after The first translation pass is broadly similar to that outlined in completion of D’s constructor, the dynamic type of *b is not updated Wasserrab’s thesis [17], turning constructors, initializers and de- to D, causing the wrong f virtual function to be dispatched in main. structors into non-virtual function calls. However, setDynType op- The CoreC++ semantics of Wasserrab, Nipkow, Snelting and erations are inserted at the program points characterized by Theo- Tip [18] is a major starting point for our work. It does not cover ob- rem 18 to implement the proper semantics for virtual function calls ject construction and destruction. Wasserrab’s Ph.D. thesis [17] de- and dynamic casts. Following a popular optimization, two versions scribes this semantics and its Isabelle/HOL formalization in greater of every constructor are generated, one for the “most derived” case, details, as well as a static, unverified translation of construction and the other for the “inheritance subobject” case. The second transla- destruction into non-virtual function calls. This translation fails to tion pass extends our earlier work on object layout [13]. correctly implement C++’s semantics for virtual function calls: in The proofs of semantic preservation are standard arguments by the translated program, the dynamic type of a subobject is always forward simulation diagrams [8, 3.7]. The proofs are rather big § its most derived object, regardless of construction state. (about 8000 lines of Coq for each pass) because there are many We are not aware of any other formal semantics for C++ that cases to consider, along with complex invariants relating execution addresses construction and destruction. states, but present no major conceptual difficulties. Type systems for safe object initialization Several research 7. Related work projects develop type systems for object-oriented languages that enforce safety guarantees about object initialization, such as the Formal semantics for C++ Norrish [10] describes a formal se- fact that well-typed programs never read fields of an uninitialized mantics for C++ that was mechanized in HOL. His semantics de- object. Fahndrich¨ and Xia [3] introduce a type system to remove scribes a much larger subset of C++ than ours, including in partic- useless field initializations to null while still ensuring safe object ular expressions with side effects and partially-specified evaluation initialization. Qi and Myers [11] introduce a more general type order, exceptions, free store (new and delete), and temporary ob- system to precisely and statically determine, at each program jects. Construction and destruction are modeled by “dynamic trans- point, which fields may be read or not. Hubert et al. [5] formalize lation”: the reduction rule for the construction of an object produces a type system for safe Java object initialization using the Coq “on the fly” a statement containing the necessary invocations of proof assistant. Their type system shares with our operational initializers and constructors for fields and bases; this statement is, semantics the use of construction states for objects, but in their then, reduced normally. As a way to state the semantics, Norrish’s case, construction states are lifted to the type level and maintained approach is arguably simpler than our “small-stepping” of the con- at compile-time. This enables their system to statically check struction/destruction protocol. However, we suspect that Norrish’s contracts over methods that constrain the construction states of approach would make it more difficult to prove meta-properties some of the arguments, for instance. about construction states and lifetimes, in the style of §4. Addition- Our work makes no attempt at providing static typing safety ally, we suspect that Norrish’s semantics does not correctly capture properties beyond the few offered by C++. Instead, we aim at the interaction between construction, destruction, and virtual func- a precise description of the dynamic semantics of construction tion calls. Consider: and destruction in the presence of multiple inheritance. The type struct B { systems mentioned above are based on the Java and C# single- B* b; inheritance object models. Moreover, they only deal with object virtual int f() { return 18; } initialization, without object destruction or finalization. Indeed, B() : b(this) { } in Java, object finalization is weakly specified: although the Java }; language specification [4] requires objects to be finalized, it does not describe more precisely when object finalization should occur. traversal which could perhaps be generalized and abstracted over. C# offers a destruction mechanism, namely object disposal, but it Finally, it would be interesting to exploit this semantics in the con- must be called explicitly by the programmer. text of type systems and other static analyses that verify stronger safety properties about initialization. 8. Extending the semantics Our semantics presents the core features of C++ object construc- References tion and destruction as faithfully as possible towards the Standard. [1] The Coq proof assistant, 1999–2012. URL http://coq.inria.fr. However, it can be extended in a number of directions. [2] M. A. Ellis and B. Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley, 1990. Manual We anticipate no difficulties with supporting free store objects, i.e. the new and delete operators. [3] M. Fahndrich¨ and S. Xia. Establishing object invariants with delayed Only slight modifications to the operational semantics appear nec- types. In 22nd conf. on Object-Oriented Programming Systems and Applications (OOPSLA’07), pages 337–350. ACM, 2007. essary. Of course, Theorem 14 would be invalidated, since objects no longer follow a stack discipline. However, it appears very dif- [4] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language ficult to formalize more general , al- Specification. Addison Wesley, 3rd edition edition, 2005. lowing for instance explicit destructor calls and the use of place- [5] L. Hubert, T. Jensen, V. Monfort, and D. Pichardie. Enforcing secure ment operators such as “new(p) C” to construct an object at a given object initialization in Java. In Computer Security – ESORICS 2010, memory location. volume 6345 of Lecture Notes in Computer Science, pages 101–115. Springer, 2010. Temporary objects of class types Expression evaluation, passing [6] International Standard ISO/IEC 14882:2003. Programming Lan- arguments by value to functions, or returning results by value en- guages — C++. International Organization for Standards, 2003. tail construction and destruction of temporary objects. While the [7] International Standard ISO/IEC 14882:2011. Programming Lan- lifetime of these objects are well-defined by the C++ standard guages — C++. International Organization for Standards, 2011. (and they follow a first-constructed-last-destroyed discipline), their [8] X. Leroy. A formally verified compiler back-end. Journal of Auto- storage durations are not specified. In fact, we found inconsisten- mated Reasoning, 43(4):363–446, 2009. cies between the definitions of storage duration as specified by the [9] Lockheed Martin. Joint Strike Fighter Air Vehicle C++ Coding Stan- C++11 standards and the C99 standards. Accounting for temporary dards for the System Development and Demonstration Program, 2005. objects beyond scalar values returned by functions or passed as ar- URL http://www.research.att.com/˜bs/JSF-AV-rules.pdf. guments will require nontrivial extensions to our semantics. [10] M. Norrish. A formal semantics for C++. Technical report, NICTA, Copy constructor elision More challenging is to give semantics 2008. to functions that return values of class types. They are objects [11] X. Qi and A. C. Myers. Masked types for sound object initialization. In constructed in the callee but destroyed in the caller at the end of the 36th symp. Principles of Programming Languages (POPL’09), pages full expression containing the call. For decades, C++ has allowed 53–65. ACM, 2009. elision of copy constructors (even if they have observable behavior) [12] T. Ramananandro. Mechanized Formal Semantics and Verified Com- that would normally copy a local object to the (temporary) return pilation for C++ Objects. PhD thesis, Universite´ Paris Diderot, Jan. value, provided certain non-aliasing conditions (easy to check) are 2012. met. These program transformations are necessary in practice to [13] T. Ramananandro, G. Dos Reis, and X. Leroy. Formal verification of obtain good performance but are not semantics-preserving in the object layout for C++ multiple inheritance. In 38th symp. Principles traditional sense. The same caveat applies for the C++11 notion of of Programming Languages (POPL’11), pages 67–80. ACM, 2011. “move constructors”. [14] J. G. Rossie and D. P. Friedman. An algebraic semantics of subobjects. In 10th conf. on Object-Oriented Programming, Systems, Languages, Exceptions It is easy to capture a key aspect of C++ exceptions in and Applications (OOPSLA’95), pages 187–199. ACM, 1995. our semantics: block-scoped objects are properly destroyed when [15] B. Stroustrup. Classes: An abstract data type facility for the C lan- an exception is thrown. Exception objects are temporary objects guage. SIGPLAN Not., 17:42–51, January 1982. with lifetime dynamically controlled by exception handlers acting [16] B. Stroustrup. The design and evolution of C++. ACM Press/Addison- much like function invocations with the exception object as argu- Wesley Publishing Co., New York, NY, USA, 1994. ment. C++ allows catching exceptions by value, so our observations for function call argument by value apply here. Furthermore, the se- [17] D. Wasserrab. From Formal Semantics to Verified Slicing – A Modular Framework with Applications in Language Based Security. PhD mantics of constructors need to be altered to invoke destructors for thesis, Karlsruher Institut fur¨ Technologie, Fakultat¨ fur¨ Informatik, completely constructed subobjects in presence of exception. Sim- Oct. 2010. ilarly, the semantics for destructors should be altered to include [18] D. Wasserrab, T. Nipkow, G. Snelting, and F. Tip. An operational program abortion if a subobject destructor raises an exception. Fi- semantics and type safety proof for multiple inheritance in C++. In nally, interaction between construction of dynamic objects and ex- 21st conf. on Object-Oriented Programming, Systems, Languages, and ceptions (both of which reflect related design decisions) should be Applications (OOPSLA’06), pages 345–362. ACM, 2006. investigated. These features are at the basis of generalized RAII techniques for dynamically controlled resources; popular examples include smart pointers such as unique_ptr and smart_ptr.

9. Concluding remarks We hope that this work sheds light on the precise semantics of con- struction and destruction in C++ and what they actually guaran- tee to the working programmer. Several features remain to be ad- dressed, but the subset that was formalized is already quite realistic and similar to recommended subsets for critical embedded systems [9]. The semantics implements a pattern for “small-stepping” a tree