Classes = Objects + Data Abstraction

Kathleen Fisher* and John C. Mitchell†
Computer Science Department, Stanford University, Stanford, CA 94305
kfisher,[email protected]
January 11, 1996

Abstract

We describe a type-theoretic foundation for object systems that include "interface types" and "implementation types," in the process accounting for access controls such as C++ private, protected and public levels of visibility. Our approach begins with a basic object calculus that provides a notion of object, method lookup, and object extension (an object-based form of inheritance). In this calculus, the type of an object gives an interface, as a set of methods (public member functions) and their types, but does not imply any implementation properties such as the presence or layout of any hidden internal data. We extend the core object calculus with a higher-order data abstraction mechanism that allows us to declare supertypes of an abstract type and a list of methods guaranteed not to be present. This results in a flexible framework for studying and improving practical programming languages where the type of an object gives certain implementation guarantees, such as would be needed to statically determine the offset of a function in a method lookup table or to safely implement binary operations without exposing the internal representation of objects. We prove type soundness for the entire language using operational semantics and an analysis of typing derivations. Two insights that are immediate consequences of our analysis are the identification of an anomaly associated with C++ private virtual functions and a principled, type-theoretic explanation (for the first time, as far as we know) of the link between subtyping and inheritance in C++, Eiffel and related languages.
1 Introduction

In theoretical studies of object systems, such as [AC94b, Bru93, FHM94, PT94] and the earlier papers appearing in [GM94], types are viewed as interfaces to objects. This means that the type of an object lists the operations on the object, generally as method names and return types, but does not restrict its implementation. As a result, objects of the same type may have arbitrarily different internal representations. In contrast, the type of an object in common practical object-oriented languages such as Eiffel [Mey92] and C++ [Str86, ES90] may impose some implementation constraints. In particular, although the "private" internal data of an object is not accessible outside the member functions of the class, all objects of the same class must have all of the private internal data listed in the class declaration.

In this paper, we present a type-theoretic framework that incorporates both forms of type. We explain the basic principles by extending a core object calculus with a standard higher-order data abstraction mechanism as in [MP88, CW85]. We also discuss a special-purpose syntax that is closer to common practice and that eliminates a few minor syntactic inconveniences in our specific use of standard abstract data type declarations.

* Supported in part by a Fannie and John Hertz Foundation Fellowship and NSF Grant CCR-9303099.
† Supported in part by NSF Grant CCR-9303099 and the TRW Foundation.

From a programming point of view, "interface" types are often more flexible than types that constrain the implementation of objects. With this form of type, we could define a single type of matrix objects, for example, then represent dense matrices with one form of object and sparse matrices with another. If the type only gives the interface of an object, then both matrix representations could have the same type and therefore be used interchangeably in any program.
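The matrix example can be approximated in C++ by an abstract base class that declares only pure virtual operations. The following is a minimal sketch under our own assumptions: the names Matrix, DenseMatrix, SparseMatrix and trace are hypothetical, and a C++ abstract class is only a rough stand-in for an interface type in the calculus. The point it shows is that trace is written once against the interface and accepts either representation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical interface type: fixes the operations, not the representation.
class Matrix {
public:
    virtual ~Matrix() = default;
    virtual std::size_t size() const = 0;  // n for an n x n matrix
    virtual double at(std::size_t i, std::size_t j) const = 0;
};

// Dense representation: all n*n entries stored in row-major order.
class DenseMatrix : public Matrix {
public:
    explicit DenseMatrix(std::size_t n) : n_(n), data_(n * n, 0.0) {}
    void set(std::size_t i, std::size_t j, double v) { data_[i * n_ + j] = v; }
    std::size_t size() const override { return n_; }
    double at(std::size_t i, std::size_t j) const override { return data_[i * n_ + j]; }

private:
    std::size_t n_;
    std::vector<double> data_;
};

// Sparse representation: only explicitly set entries are stored.
class SparseMatrix : public Matrix {
public:
    explicit SparseMatrix(std::size_t n) : n_(n) {}
    void set(std::size_t i, std::size_t j, double v) { entries_[{i, j}] = v; }
    std::size_t size() const override { return n_; }
    double at(std::size_t i, std::size_t j) const override {
        auto it = entries_.find({i, j});
        return it == entries_.end() ? 0.0 : it->second;  // missing entries are zero
    }

private:
    std::size_t n_;
    std::map<std::pair<std::size_t, std::size_t>, double> entries_;
};

// A library operation that depends only on the interface.
double trace(const Matrix& m) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m.size(); ++i) sum += m.at(i, i);
    return sum;
}
```

For instance, a 2 x 2 DenseMatrix and a 1000 x 1000 SparseMatrix with the diagonal entries 1.0 and 2.0 both give a trace of 3.0, and neither class needs to know the other exists.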
This kind of flexibility is particularly useful when we write library operations on matrices without assuming any particular implementation. Such library functions may be written using a standard interface type, without concern for how matrices might be implemented in later or earlier development of a software system.

Types that restrict the implementations of objects are also important. If we know that all point objects inherit a specific representation of x and y coordinates, for example, then a program may be optimized to take advantage of this static guarantee. The usual implementations of C++, for example, use type information to statically calculate the offset of member data relative to the starting address of the object. A similar calculation is used to find the offset of virtual member functions in the v-table at compile time; see [ES90, Section 10.7c]. Such optimizations are not possible in an untyped language such as Smalltalk [GR83] and would not be possible in a typed language where objects of a single type could have arbitrarily dissimilar implementations.

A second, more methodological reason that programmers may be interested in implementation types is that there are greater guarantees of behavioral similarity across subtype hierarchies. More specifically, traditional type systems generally give useful information about the signature (domain and range) of operations. This is a very weak form of specification and, in many programming situations, it is desirable to have more detailed guarantees. While behavioral specifications are difficult to manipulate effectively, we have a crude but useful approximation when types fix part of the implementation of an object. To return to points, for example, if we know that all subtypes of point share a common implementation of a basic function like move, then the type system, in effect, guarantees a behavioral property of points.
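Both guarantees for points can be sketched in C++. In this illustration the classes Point and ColorPoint and the helper sum_coords are hypothetical. Because every Point carries the private members x_ and y_, code compiled against Point can, in the usual implementations, reach them at offsets fixed by the class declaration, and color is dispatched through a v-table slot chosen at compile time. Making move non-virtual is one C++ way, among others, of ensuring that every subclass shares Point's implementation of move.

```cpp
#include <cassert>

// Hypothetical point class whose representation is fixed by the class.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    virtual ~Point() = default;

    // Non-virtual: a call through a Point& always runs this code, even if a
    // subclass declares its own move (which would hide, not override, it).
    void move(int dx, int dy) {
        x_ += dx;
        y_ += dy;
    }

    int x() const { return x_; }
    int y() const { return y_; }

    // Virtual: subclasses may redefine it; calls go through the v-table.
    virtual int color() const { return 0; }

private:
    int x_;
    int y_;
};

// Subclass adds data and redefines color, but inherits x_, y_ and move.
class ColorPoint : public Point {
public:
    ColorPoint(int x, int y, int c) : Point(x, y), c_(c) {}
    int color() const override { return c_; }

private:
    int c_;
};

// Library code written against Point works for every subclass; the inlined
// accessors typically compile to loads at offsets fixed by Point's layout.
int sum_coords(const Point& p) { return p.x() + p.y(); }
```

A ColorPoint viewed as a Point moves exactly as a plain Point does, while color still dispatches to the subclass.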
Such a guarantee may be achieved in our framework if move is private or if we add the straightforward capability of restricting redefinition of protected or public methods.

A more subtle reason to use types that restrict the implementations of objects has to do with the implementation of binary operations. In an object-oriented context, a binary operation on type A is realized as a member function that requires another A object as a parameter. In a language where all objects of type A share some common representation, it is possible for an A member function to safely access part of the private internal representation of another A object. A simple example of this arises with set objects that have only a membership test and a union operation in their public interfaces. With interface types, some objects of type set might be represented internally using bit-vectors, while others might use linked lists. In this case, there is no type-safe way to implement union, since no single operation will access both a bit-vector and a linked list correctly. With only interface types, it is necessary to extend the public interface of both kinds of sets to make this operation possible. In contrast, if the type of an object conveys implementation information, then a less flexible but type-safe implementation of set union is possible. In this case, all set objects would have one representation and a union operation could be implemented by taking advantage of this uniformity.

This paper presents a provably sound type system, based on accepted type-theoretic principles, that allows us to write both "interface types" and "implementation types".
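The set union example discussed above can be made concrete in C++, where access control is per class rather than per object, so a member function may read the private data of its argument. This is a sketch under stated assumptions: the class SmallSet, its 64-element universe, and the insert method (added only so that sets can be built) are hypothetical. Union is safe here because every SmallSet is guaranteed to have the same bit-vector representation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical implementation type: every SmallSet is a bit-vector over {0, ..., 63}.
class SmallSet {
public:
    static constexpr std::size_t kUniverse = 64;

    // Assumed constructor-like operation; std::bitset::set throws
    // std::out_of_range if n >= kUniverse.
    void insert(std::size_t n) { bits_.set(n); }

    // Membership test; elements outside the universe are never members.
    bool contains(std::size_t n) const { return n < kUniverse && bits_.test(n); }

    // Binary operation: reads other.bits_, which is private. This is legal
    // because access is checked per class, and safe because every SmallSet
    // has exactly this representation.
    SmallSet unite(const SmallSet& other) const {
        SmallSet result;
        result.bits_ = bits_ | other.bits_;
        return result;
    }

private:
    std::bitset<kUniverse> bits_;
};
```

If SmallSet were instead an abstract interface with separate bit-vector and linked-list subclasses, unite(const Set&) could not read its argument's representation, and some extra public operation, such as element enumeration, would have to be added to both kinds of set.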
Our type system is relatively simple in outline, since it may be viewed as a straightforward combination of basic constructs that have been studied previously. However, a number of issues involving subtype assertions about abstract types, extensions to abstract types, covariance and contravariance of methods, and the absence of methods make the exact details of the system relatively subtle.

In summary, the paper has three main points: (i) the general view that classes correspond to abstract data types, (ii) a precise formulation of a higher-order abstract type mechanism and a flexible underlying object calculus that together make it possible to establish a correspondence between C++-style classes and abstract data types whose representations are objects, and (iii) a proof of type soundness for the type system that arises from this analysis. While there is a folkloric belief that C++ and Eiffel classes provide a form of data abstraction, we believe that this is the first technical account suggesting a precise correspondence between class constructs and a standard non-object-oriented form of data abstraction.

Our presentation of classes as abstract data types requires a number of operations on objects. Specifically, the underlying object calculus without data abstraction must provide a basic form of object that allows us to invoke methods of an object, extend objects with new methods, and replace existing methods. Moreover, in order to capture the form of subtyping present in typed object-oriented languages, we must have at least the usual form of subtyping between object types.
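The three object operations listed above (method invocation, extension with a new method, and replacement of an existing method) can be illustrated with a small, dynamically checked C++ model. This is only an illustration under our own assumptions: the class Obj, its string-keyed method table, and the runtime checks are hypothetical, and subtyping is not modeled at all. In the typed calculus, the conditions checked here at run time (the method must be present to be invoked or replaced, and absent to be added) are intended to be enforced statically. Methods receive the object itself as self, so a replaced method is seen by the other methods that call it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy object: a table from method names to functions of self.
class Obj {
public:
    using Method = std::function<int(const Obj& self)>;

    // Invocation: look the method up by name and apply it to this object.
    int invoke(const std::string& name) const {
        auto it = methods_.find(name);
        if (it == methods_.end()) throw std::runtime_error("message not understood: " + name);
        return it->second(*this);
    }

    // Extension: a new object with one more method; the method must be absent.
    Obj extend(const std::string& name, Method m) const {
        if (methods_.count(name) != 0) throw std::logic_error("extend: already present: " + name);
        Obj copy = *this;
        copy.methods_[name] = std::move(m);
        return copy;
    }

    // Replacement: a new object with one method redefined; it must be present.
    Obj replace(const std::string& name, Method m) const {
        if (methods_.count(name) == 0) throw std::logic_error("replace: absent: " + name);
        Obj copy = *this;
        copy.methods_[name] = std::move(m);
        return copy;
    }

private:
    std::map<std::string, Method> methods_;
};
```

For example, extending an empty object with x (returning 3) and double_x (returning twice self's x) gives an object whose double_x is 6; replacing x by a method returning 10 yields a new object whose double_x is 20, while the original object is unchanged.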
