An Operational Semantics and Type Safety Proof for Multiple Inheritance in C++

An Operational Semantics and Type Safety Proof for Multiple Inheritance in C++ Daniel Wasserrab Tobias Nipkow Gregor Snelting Frank Tip Universitat¨ Passau Technische Universitat¨ Universitat¨ Passau IBM T.J. Watson Research [email protected] Munchen¨ [email protected] Center passau.de [email protected] passau.de [email protected] Abstract terms of implementation-level constructs such as v-tables. We are We present an operational semantics and type safety proof for only aware of a few formal treatments—and of no operational multiple inheritance in C++. The semantics models the behavior semantics—for C++-like languages with shared and repeated mul- of method calls, field accesses, and two forms of casts in C++ tiple inheritance. The subobject model by Rossie and Friedman class hierarchies exactly, and the type safety proof was formalized [20], upon which our work is based, formalizes the object model of and machine-checked in Isabelle/HOL. Our semantics enables one, C++. Rossie and Friedman defined the behavior of method calls and for the first time, to understand the behavior of operations on C++ member access using this model, but their definitions do not follow class hierarchies without referring to implementation-level artifacts C++ behavior precisely, they do not consider the behavior of casts, such as virtual function tables. Moreover, it can—as the semantics and they do not provide an operational semantics. In 1996, Rossie, is executable—act as a reference for compilers, and it can form Friedman, and Wand [21] stated that “In fact, a provably-safe static the basis for more advanced correctness proofs of, e.g., automated type system [. ] is an open problem”, and to our knowledge this program transformations. The paper presents the semantics and problem has remained open until today. type safety proof, and a discussion of the many subtleties that we The CoreC++ language studied in this paper features all the encountered in modeling the intricate multiple inheritance model essential elements of the C++ multiple inheritance model (while of C++. omitting many features not relevant to operations involving class hierarchies). The semantics of CoreC++ were designed to mirror those of C++ to the maximum extent possible. In previous versions 1. Introduction of the semantics [36], we explored a number of variations, and we We present a operational semantics and type safety proof for the will briefly discuss these in §8. multiple inheritance model of C++ in all its complexity, includ- Our interest in formalizing the semantics of multiple inheritance ing both repeated and shared (virtual) inheritance. This semantics was motivated by previous work by two of the present authors on: enables one—for the first time!—to fully understand and express (i) restructuring class hierarchies in order to reduce object size at the behavior of operations such as method calls, field accesses, and run-time [33], (ii) composition of class hierarchies in the context of casts in C++ programs without referring to compiler data structures an approach for aspect-orientation [24], and (iii) refactoring class such as virtual function tables (v-tables) [27]. hierarchies in order to improve their design [25, 23], In each of Type safety is a language property which can be summarized by these projects, class hierarchies are generated, multiple inheritance the famous slogan “Well-typed programs cannot go wrong” [13]. may arise naturally, and additional program transformations are Cardelli’s definition of type safety [6] demands that no untrapped then used to replace multiple inheritance by a combination of single errors may occur (although controlled exceptions are allowed). The inheritance and delegation. We plan to exploit our formal semantics type safety property that we prove is the fact that the execution of for a correctness proof of these transformations. a well-typed, terminating program will deliver a result of the ex- In summary, this paper makes the following contributions: pected type, or end with an exception. The semantics and proof are formalized and machine-checked using the Isabelle/HOL theorem prover [14]. • We present a formal semantics and machine-checked type One of the main sources of complexity in C++ is a complex safety proof for multiple inheritance in C++. This enables one, form of multiple inheritance, in which a combination of shared for the first time, to understand and express the behavior of (“virtual”) and repeated (“nonvirtual”) inheritance is permitted. Be- operations involving C++ class hierarchies without referring to cause of this complexity, the behavior of operations on C++ class compiler data structures. hierarchies has traditionally been defined informally [28], and in • We discuss some subtle ambiguities concerning the behavior of member access and method calls in C++ that were uncovered in the course of designing the semantics. • By formalizing the complex behavior of C++ multiple inheritance, we extend the applicability of formal semantics and the- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed orem prover technology to a new level of complexity. for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Thus the message to language semanticists is that the much Copyright c ACM [to be supplied]. $5.00. maligned C++ system of multiple inheritance contains a perfectly sound core. 1 2. Multiple inheritance defining a subobject type as a pair (C, D) would be insufficient, 2.1 An intuitive introduction to subobjects because, as we have seen in Fig. 1, a C-object may contain multiple D-components in the presence of repeated multiple inheritance. C++ features both nonvirtual (or repeated) and virtual (or shared) Therefore, a subobject is identified by a pair [C, Cs], where C de- multiple inheritance. The difference between the two flavors of notes the type of the “complete object”, and where the path Cs inheritance is subtle, and only arises in situations where a class consists of a sequence of class names C1 ····· Cn that encodes Y indirectly inherits from the same class X via more than one the transitive inheritance relation between C1 and Cn. There are path in the hierarchy. In such cases, Y will contain one or multiple two cases here: For repeated subobjects we have that C1 = C, and X-“subobjects”, depending on the kind of inheritance that is used. for shared subobjects, we have that C1 is the least derived (most More precisely, if only shared inheritance is used, Y will contain general) shared base class of C that contains Cn. This scheme is a single, shared X-subobject, and if only repeated inheritance is sufficient because shared subobjects are unique within an object used, the number of X-subobjects in Y is equal to N, where N (i.e. there can be at most one shared subobject of type S within is the number of distinct paths from X to Y in the hierarchy. any object). More formally, for a given class C, the set of its sub- If a combination of shared and repeated inheritance is used, the objects, along with a containment ordering on these subobjects, is number of X-subobjects in an Y -object will be between 1 and inductively defined as follows: N (a more precise discussion follows). C++ hierarchies with only single inheritance (the distinction between repeated and shared (i) [C, C] is the subobject that represents the “full” C-object. inheritance is irrelevant in this case) are semantically equivalent (ii) if S1 = [C, Cs.X] is a subobject for class C where Cs is any to Java class hierarchies. sequence of class names, and X shared-inherits from Y , then Fig. 1(a) shows a small C++ class hierarchy. In these and subse- S2 = [C, Y ] is a subobject for class C that is accessible from quent figures, a solid arrow from class C to class D denotes the fact S1 through a pointer. that D repeated-inherits from C, and a dashed arrow from class C (iii) if S1 = [C, Cs.X] is a subobject for class C where Cs is any to class D denotes the fact that D shared-inherits from C. Here, sequence of class names, and X repeated-inherits from Y , then and in subsequent examples, all methods are assumed to be vir- S2 = [C, Cs.X.Y ] is a subobject for class C that is directly tual (i.e. dynamically dispatched), and all classes and inheritance contained within subobject S1. relations are assumed to be public. In Fig. 1(a), all inheritance is repeated. Since class Bottom Fig. 1(c) and Fig. 2(c) show subobject graphs for the class hi- repeated-inherits from classes Left and Right, a Bottom- erarchies of Fig. 1 and Fig. 2, respectively. Here, an arrow from 0 0 object has one subobject of each of the types Left and Right. As subobject S to subobject S indicates that S is directly contained 0 Left and Right each repeated-inherit from Top, (sub)objects of in S or that S has a pointer leading to S . For a given subobject these types contain distinct subobjects of type Top. Hence, for the S = [C, Cs.D], we call C the dynamic class of subobject S and D C++ hierarchy of Fig. 1(a), an object of type Bottom contains two the static class of subobject S. Associated with each subobject are distinct subobjects of type Top. Fig. 1(b) shows the layout used the members that occur in its static class.

Load more