Classes = Objects + Data Abstraction
Kathleen Fisher* and John C. Mitchell†
Computer Science Department, Stanford University, Stanford, CA 94305
fkfisher,[email protected]
January 11, 1996
Abstract
We describe a type-theoretic foundation for object systems that include "interface types" and
"implementation types," in the process accounting for access controls such as C++ private,
protected and public levels of visibility. Our approach begins with a basic object calculus
that provides a notion of object, method lookup, and object extension (an object-based form
of inheritance). In this calculus, the type of an object gives an interface, as a set of methods
(public member functions) and their types, but does not imply any implementation properties
such as the presence or layout of any hidden internal data. We extend the core object calculus
with a higher-order form of data abstraction mechanism that allows us to declare supertypes
of an abstract type and a list of methods guaranteed not to be present. This results in a
flexible framework for studying and improving practical programming languages where the type
of an object gives certain implementation guarantees, such as would be needed to statically
determine the offset of a function in a method lookup table or safely implement binary operations
without exposing the internal representation of objects. We prove type soundness for the entire
language using operational semantics and an analysis of typing derivations. Two insights that
are immediate consequences of our analysis are the identification of an anomaly associated with
C++ private virtual functions and a principled, type-theoretic explanation (for the first time,
as far as we know) of the link between subtyping and inheritance in C++, Eiffel and related
languages.
1 Introduction
In theoretical studies of object systems, such as [AC94b, Bru93, FHM94, PT94] and the earlier
papers appearing in [GM94], types are viewed as interfaces to objects. This means that the type
of an object lists the operations on the object, generally as method names and return types, but
does not restrict its implementation. As a result, objects of the same type may have arbitrarily
different internal representations. In contrast, the type of an object in common practical
object-oriented languages such as Eiffel [Mey92] and C++ [Str86, ES90] may impose some implementation
constraints. In particular, although the "private" internal data of an object is not accessible
outside the member functions of the class, all objects of the same class must have all of the private
internal data listed in the class declaration. In this paper, we present a type-theoretic framework
that incorporates both forms of type. We explain the basic principles by extending a core object
calculus with a standard higher-order data abstraction mechanism as in [MP88, CW85]. We also
discuss a special-purpose syntax that is closer to common practice and that eliminates a few minor
syntactic inconveniences in our specific use of standard abstract data type declarations.
* Supported in part by a Fannie and John Hertz Foundation Fellowship and NSF Grant CCR-9303099.
† Supported in part by NSF Grant CCR-9303099 and the TRW Foundation.
From a programming point of view, "interface" types are often more flexible than types that
constrain the implementation of objects. With this form of type, we could define a single type
of matrix objects, for example, then represent dense matrices with one form of object and sparse
matrices with another. If the type only gives the interface of an object, then both matrix
representations could have the same type and therefore be used interchangeably in any program. This kind
of flexibility is particularly useful when we write library operations on matrices without assuming
any particular implementation. Such library functions may be written using a standard interface
type, without concern for how matrices might be implemented in later (or earlier) development of
a software system.
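As an informal illustration of the matrix example, the following Python sketch uses a structural
interface to stand in for an "interface type." The names Matrix, DenseMatrix, SparseMatrix, and
trace are hypothetical and chosen only for this sketch; Python checks such interfaces far more
loosely than the type system developed in this paper.

```python
from typing import Dict, List, Protocol, Tuple


class Matrix(Protocol):
    """Interface type: lists the operations only, says nothing about layout."""

    def rows(self) -> int: ...
    def cols(self) -> int: ...
    def get(self, i: int, j: int) -> float: ...


class DenseMatrix:
    """Stores every entry in a list of rows."""

    def __init__(self, data: List[List[float]]) -> None:
        self._data = data

    def rows(self) -> int:
        return len(self._data)

    def cols(self) -> int:
        return len(self._data[0]) if self._data else 0

    def get(self, i: int, j: int) -> float:
        return self._data[i][j]


class SparseMatrix:
    """Stores only the non-zero entries, keyed by (row, column)."""

    def __init__(self, rows: int, cols: int,
                 entries: Dict[Tuple[int, int], float]) -> None:
        self._rows = rows
        self._cols = cols
        self._entries = entries

    def rows(self) -> int:
        return self._rows

    def cols(self) -> int:
        return self._cols

    def get(self, i: int, j: int) -> float:
        return self._entries.get((i, j), 0.0)


def trace(m: Matrix) -> float:
    """Library operation written against the interface alone."""
    return sum(m.get(i, i) for i in range(min(m.rows(), m.cols())))
```

Both representations can be passed to trace because the function relies only on the three
operations named in the interface.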
Types that restrict the implementations of objects are also important. If we know that all
point objects inherit a specific representation of x and y coordinates, for example, then a program
may be optimized to take advantage of this static guarantee. The usual implementations of C++,
for example, use type information to statically calculate the offset of member data relative to the
starting address of the object. A similar calculation is used to find the offset of virtual member
functions in the v-table at compile time; see [ES90, Section 10.7c]. Such optimizations are not
possible in an untyped language such as Smalltalk [GR83] and would not be possible in a typed
language where objects of a single type could have arbitrarily dissimilar implementations.
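The idea of a layout fixed by the type can be imitated, as a rough analogy only, with Python's
ctypes module, whose structure types assign each field a fixed byte offset. The class names and
the helper offset_of are hypothetical; a C++ compiler computes such offsets at compile time,
whereas ctypes computes them when the class is created.

```python
import ctypes


class Point(ctypes.Structure):
    # Fixed layout: every Point stores x and y at known offsets.
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]


class ColorPoint(Point):
    # A derived layout appends its own field after the inherited ones.
    _fields_ = [("c", ctypes.c_int)]


def offset_of(struct_type, field):
    """Byte offset of `field` from the start of any object of `struct_type`."""
    return getattr(struct_type, field).offset
```

Because ColorPoint keeps the inherited fields where Point put them, code that reads x at the
Point offset also works on a ColorPoint.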
A second, more methodological reason that programmers may be interested in implementation
types is that there are greater guarantees of behavioral similarity across subtype hierarchies. More
specifically, traditional type systems generally give useful information about the signature (or
domain and range) of operations. This is a very weak form of specification and, in many programming
situations, it is desirable to have more detailed guarantees. While behavioral specifications are
difficult to manipulate effectively, we have a crude but useful approximation when types fix part of
the implementation of an object. To return to points, for example, if we know that all subtypes of
point share a common implementation of a basic function like move, then the type system, in effect,
guarantees a behavioral property of points. This may be achieved in our framework if move is
private or if we add the straightforward capability of restricting redefinition of protected or public
methods.
A more subtle reason to use types that restrict the implementations of objects has to do with
the implementation of binary operations. In an object-oriented context, a binary operation on type
A is realized as a member function that requires another A object as a parameter. In a language
where all objects of type A share some common representation, it is possible for an A member
function to safely access part of the private internal representation of another A object. A simple
example of this arises with set objects that have only a membership test and a union operation
in their public interfaces. With interface types, some objects of type set might be represented
internally using bit-vectors, while others might use linked lists. In this case, there is no type-safe
way to implement union, since no single operation will access both a bit-vector and a linked list
correctly. With only interface types, it is necessary to extend the public interface of both kinds of
sets to make this operation possible. In contrast, if the type of an object conveys implementation
information, then a less flexible but type-safe implementation of set union is possible. In this case,
all set objects would have one representation and a union operation could be implemented by taking
advantage of this uniformity.
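A minimal Python sketch of the second alternative follows, assuming every set uses a bit-vector
representation. The class name BitSet and the field _bits are hypothetical, and the leading
underscore is only a naming convention: Python does not enforce the privacy that the type system
of this paper is meant to guarantee.

```python
class BitSet:
    """Sets of small non-negative integers; every instance is a bit-vector."""

    def __init__(self, elements=()):
        self._bits = 0  # "private" representation shared by all BitSet objects
        for e in elements:
            self._bits |= 1 << e

    def member(self, e):
        """Public membership test."""
        return ((self._bits >> e) & 1) == 1

    def union(self, other):
        """Public binary operation.

        Reads the private field of another BitSet, which is safe only
        because all BitSet objects share the same representation.
        """
        result = BitSet()
        result._bits = self._bits | other._bits
        return result
```

If some sets were linked lists instead, union could not read other._bits, and the public
interface would have to grow (for example with an element enumeration) to make union expressible.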
This paper presents a provably sound type system, based on accepted type-theoretic principles,
that allows us to write both "interface types" and "implementation types". The system is relatively
simple in outline, since it may be viewed as a straightforward combination of basic constructs that
have been studied previously. However, there are a number of issues (involving subtype assertions
about abstract types, extensions to abstract types, covariance and contravariance of methods, and
the absence of methods) that make the exact details of the system relatively subtle. In summary, the
paper has three main points: (i) the general view that classes correspond to abstract data types, (ii)
a precise formulation of a higher-order abstract type mechanism and a flexible underlying object
calculus that together make it possible to establish a correspondence between C++-style classes and
abstract data types whose representations are objects, (iii) a proof of type soundness for the type
system that arises from this analysis. While there is a folkloric belief that C++ and Eiffel classes
provide a form of data abstraction, we believe that this is the first technical account suggesting a
precise correspondence between class constructs and a standard non-object-oriented form of data
abstraction.
Our presentation of classes as abstract data types requires a number of operations on objects.
Specifically, the underlying object calculus (without data abstraction) must provide a basic form of
object that allows us to invoke methods of an object, extend objects with new methods, and replace
existing methods. Moreover, in order to capture the form of subtyping present in typed
object-oriented languages, we must have at least the usual form of subtyping between object types. This
much was already provided by our previous object calculus, presented in [FHM94, FM95]; similar
primitives are also provided in alternative approaches such as [AC94b, AC94a].
In adding a data abstraction mechanism, we must incorporate subtype specifications, negative
information (absence of methods), and variance information in abstract type declarations. This is
so that abstract types are extensible in essentially the same way as their underlying representations.
Therefore, we extend our basic calculus from [FHM94, FM95] with more detailed static information
about type variables and (since most operations are done on functions from types to so-called
rows) row variables. While the intuitive decomposition of classes into data abstraction and object
primitives is essentially straightforward, the need to maintain detailed information about subtyping
properties and extensibility of abstract types leads to certain technical complications and subtleties
that had to be overcome in developing this analysis.
A detail for readers of [FHM94] is that the "class" types of that paper were interface types,
not classes in the sense of a form of object type that includes implementation information. We
therefore renamed them "pro" (for prototype) types in [FM95] and continue that terminology here.
2 Overview by Example
2.1 Protection levels
Each class in an object-oriented program has two kinds of external clients: sections of the program
that use objects created from the class and classes that derive new classes from the original. Since
the methods of a class may refer to each other, the class also has an internal "client," namely
itself. We therefore associate three interfaces with each class. Using C++ terminology, these may
be distinguished as follows:
  Private methods are only accessible within the implementation of the class,
  Protected methods are only accessible in the implementation of the class and derived classes,
  Public methods may be accessible, through objects of the class, in any module of the program.
One goal of this paper is to show how we can associate a different type with each interface and
use essentially standard type-theoretic constructs to restrict visibility in each part of a program
appropriately. In doing so, we pay particular attention to the fact that although the private or
protected methods may not be accessible in certain contexts, it is important for the type system
to guarantee their existence.
2.2 Point and color point classes
Using C++-like syntax, we might declare classes of points and colored points as shown below. For
simplicity, we use only one-dimensional points.
class Point
  private    x : int;
  protected  setX : int -> Point;
  public     getX : int;
             mv : int -> Point;
             newPoint : int -> Point;
end;
class ColorPoint : Point
  private    c : color;
  protected  setC : color -> ColorPoint;
  public     getC : color;
             newColorPoint : color -> int -> ColorPoint
end;
Intuitively, the set methods are used to assign to private x-coordinate or color data, and get
methods are used to read the values of the data. The move method mv changes the x-coordinate of a
point or colored point. These classes reflect a common idiom of C++ programming, where the basic
data fields are kept private so that the class implementor may change the internal representation
without invalidating client code. Protected methods make it possible for derived classes to change
the values of private data, without providing the same capability outside of derived classes.
In C++ and in the pseudo-code above, each class contains a special function called a constructor
for the class. Here the constructor is distinguished by the syntactic form newClassname. Since all
objects of a class are created by calling a class constructor, it is important to be able to call the
constructor function before any objects have been created. Therefore, unlike the other functions
listed in the class declarations, the class constructor is not a method; it does not belong to objects
of the class. Constructors are included in class declarations primarily as a syntactic convenience.
2.3 Interface type expressions
In translating the pseudo-code above into a more precise form, we write a type expression for
each interface of each class, resulting in six distinct but related types. In hopes that this practice
will provide useful mnemonics, we follow a systematic naming convention where, for class A, we
call the type expressions for the public, protected, and private interfaces $A_{pub}$, $A_{prot}$, and
$A_{priv}$, respectively.
Although each interface is essentially a list of method names and their types, it is necessary
to use a type function instead of a type for each interface. The reason is that the type associated
with the objects instantiated from a given class is recursively defined; this type is a fixed-point of
a type function. Point-wise subtyping between such type functions is the critical relation between
interfaces for type-checking inheritance.
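For Point, this methodology gives us the following type functions, using the form
$\langle\!\langle \ldots \rangle\!\rangle$ for object interface types:
\[
\begin{array}{lcl}
Point_{pub} & \stackrel{def}{=} & \lambda t.\, \langle\!\langle getX : int,\ mv : int \to t \rangle\!\rangle \\
Point_{prot} & \stackrel{def}{=} & \lambda t.\, \langle\!\langle setX : int \to t,\ getX : int,\ mv : int \to t \rangle\!\rangle \\
Point_{priv} & \stackrel{def}{=} & \lambda t.\, \langle\!\langle x : int,\ setX : int \to t,\ getX : int,\ mv : int \to t \rangle\!\rangle
\end{array}
\]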
These interface functions are formed from the class declaration for Point by replacing occurrences
of Point by a type variable t and lambda-abstracting to obtain a type function. We will use row
variables to range over such type functions, which map types to finite lists of method/type pairs.
The analogous interfaces for ColorPoint are written using a free row variable p, which will be
bound to the abstract type-function for points in the scope where the ColorPoint interfaces will
appear:
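\[
\begin{array}{lcl}
ColorPoint_{pub} & \stackrel{def}{=} & \lambda t.\, \langle\!\langle p\,t \mid getC : Color \rangle\!\rangle \\
ColorPoint_{prot} & \stackrel{def}{=} & \lambda t.\, \langle\!\langle p\,t \mid setC : Color \to t,\ getC : Color \rangle\!\rangle \\
ColorPoint_{priv} & \stackrel{def}{=} & \lambda t.\, \langle\!\langle p\,t \mid c : Color,\ setC : Color \to t,\ getC : Color \rangle\!\rangle
\end{array}
\]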
Since the supertyping bounds on identifier p will give all of the relevant (protected or public)
Point methods, there is no need to list the methods inherited from Point. Consequently these
ColorPoint interfaces list exactly the same methods as our pseudo-code ColorPoint class. Since
the variable p will be "existentially bound" in an abstract data type declaration, the occurrence
of p in each expression will guarantee that all the private methods of Point objects are present in
every ColorPoint object, without exposing any other information about private Point methods.
So-called "kind" information in the declaration of p will guarantee that methods named c, setC,
and getC are not present, making it type-safe to extend p objects with these new methods.
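To make the idea of interfaces as type functions concrete, the sketch below models a row as a
Python dictionary from method names to types and an interface as a function from a type t to
a row. The encoding of types as strings and tuples, and the helper is_width_subrow, are
assumptions of this sketch; the check covers only the presence of methods at identical types and
ignores the kind and variance information described in the text.

```python
def point_pub(t):
    """Public Point interface applied to the type variable t."""
    return {"getX": "int", "mv": ("->", "int", t)}


def point_prot(t):
    """Protected Point interface: the public methods plus setX."""
    return {"setX": ("->", "int", t), **point_pub(t)}


def point_priv(t):
    """Private Point interface: the protected methods plus the field x."""
    return {"x": "int", **point_prot(t)}


def color_point_pub(p):
    """Public ColorPoint interface, written against a row function p
    that stands for whatever view of points is in scope."""
    return lambda t: {**p(t), "getC": "Color"}


def is_width_subrow(sub, sup, t="t"):
    """Point-wise check at the symbolic argument t: every method of
    sup(t) occurs in sub(t) with exactly the same type."""
    sub_row, sup_row = sub(t), sup(t)
    return all(m in sub_row and sub_row[m] == ty for m, ty in sup_row.items())
```

Under this encoding the private interface refines the protected one, which in turn refines the
public one, and the ColorPoint interface is obtained from whichever point interface is bound to p.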
2.4 Implementations
A class implementation specifies an object layout and set of method bodies (code for the methods of
the objects). In our approach, the object layout will be given by a type expression and the method
bodies will be part of the constructor function. Following general principles of data abstraction,
the method bodies may rely on aspects of the representation that are hidden from other parts of
the program.
We use a subtype-bounded form of data abstraction based on existential types [MP88, CW85].
Using this formalism, the implementation of points will be given by a pair with subtype-bounded
existential type of the form
\[
\{|\, p <: \cdots :: \kappa = P_{priv},\ ConsImp_{p} \,|\}
\]
consisting of the private interface $P_{priv}$ for points and a constructor function $ConsImp_{p}$ that might
use an initial value for the x-coordinate, for example, to return a new point object. Our framework
allows any number of constructor functions, or other "non-virtual" operations to be provided here.
However, for simplicity, we discuss only the special case of one constructor per class. For the
moment, we leave the supertype bound, indicated by the ellipsis ($\cdots$) above, unspecified since we
later discuss two separate approaches, a minimal one in which we give the protected interface here
(later restricting the program view to the public interface) and a more special-purpose approach in
which both the protected and public interfaces are specified in the class implementation. The "kind"
$\kappa$ will indicate that we are declaring an abstract type function, list methods that are guaranteed
not to be present in the implementation of points, and describe the variance of point objects. The
variance information is needed to infer subtyping relationships for object types that contain row
variable p.
The implementation of ColorPoint similarly has the form
\[
\{|\, cp <: \cdots :: \kappa' = CP_{priv},\ ConsImp_{cp} \,|\}
\]
where the constructor $ConsImp_{cp}$ first invokes the Point class constructor newPoint, then extends
the resulting prototype with the new methods c, setC, and getC. Definitions of the Point and
ColorPoint class constructors are given in Section 3.4.
\[
\begin{array}{l}
\mathbf{Abstype}\ p <:_w P_{prot} :: T \to^{+} (\{c, getC, setC\};\ \emptyset) \\
\quad \mathbf{with}\ newPoint : Int \to \mathbf{pro}\ u \bullet p\,u \\
\quad \mathbf{is}\ \{|\, p <:_w P_{prot} :: T \to^{+} (\{c, getC, setC\};\ \emptyset) = P_{priv},\ Imp_{p} \,|\} \\
\mathbf{in} \\
\mathbf{Abstype}\ cp <:_w CP_{prot} :: T \to^{+} (\emptyset;\ \emptyset) \\
\quad \mathbf{with}\ newColorPoint : Int \to Color \to \mathbf{pro}\ u \bullet cp\,u \\
\quad \mathbf{is}\ \{|\, cp <:_w CP_{prot} :: T \to^{+} (\emptyset;\ \emptyset) = CP_{priv},\ Imp_{cp} \,|\} \\
\mathbf{in} \\
\mathbf{Abstype}\ p <:_w P_{pub} :: T \to^{+} (\emptyset;\ \emptyset) \\
\quad \mathbf{with}\ newPoint : Int \to \mathbf{obj}\ t \bullet p\,t \\
\quad \mathbf{is}\ \{|\, p <:_w P_{pub} :: T \to^{+} (\emptyset;\ \emptyset) = p,\ newPoint \,|\} \\
\mathbf{in} \\
\mathbf{Abstype}\ cp <:_w CP_{pub} :: T \to^{+} (\emptyset;\ \emptyset) \\
\quad \mathbf{with}\ newColorPoint : Int \to Color \to \mathbf{obj}\ t \bullet cp\,t \\
\quad \mathbf{is}\ \{|\, cp <:_w CP_{pub} :: T \to^{+} (\emptyset;\ \emptyset) = cp,\ newColorPoint \,|\} \\
\mathbf{in} \\
\langle Program \rangle
\end{array}
\]
Figure 1: Nested abstract data types for points and colored point classes.
2.5 Class hierarchies as nested abstract types
Our basic view of classes and implementation types is that the class-based pseudo-code in Section 2.2
may be regarded as sugar for the sequence of nested abstract data type (Abstype) declarations
given in Figure 1. Since it is syntactically awkward to use two declarations per class, one giving the
protected interface and the other the public interface, we discuss alternate syntactic presentations
in Section 3.5.
In order, the four abstract type declarations give the protected view of Point, the protected
view of ColorPoint, the public view of Point, and the public view of ColorPoint. The nesting
structure allows the implementation of ColorPoint to refer to the protected view of Point and
allows the program to refer to public views of both classes.
The two inner declarations hide the protected view of a class by redeclaring the same type name
and constructor and exposing the public view with a different type, as discussed below. Since the
implementation of the public view, in each case, is exactly the same as the implementation of the
protected view, hiding the protected methods is the only function of the two inner declarations.
We admit that reusing bound variables is a bit of a "hack," but this is a relatively minor issue that
can be solved by departing from the simple block-structured scoping mechanism we use here.
One distinction between the protected and public views is that the constructors for the protected
view return an object with a pro type while the public view constructors return objects with obj
types. These two types from our underlying object calculus allow different sets of operations on
objects. Specifically, new methods may be added to an object with a pro type and existing methods
can be overridden. If an object has an obj type, on the other hand, the only operation is to invoke
a method of the object. Since we have different sets of operations, there are different subtyping
rules: the subtyping relation is relatively rich between obj types, while the only supertypes of a
pro type are object types. Intuitively, this means that the methods of an object may be modified
or extended when it is used as a prototype, but not once it is "promoted" to a member of an obj
type. While the distinction between these two kinds of types is convenient for our purposes here, it
was originally devised as a mechanism for obtaining nontrivial subtyping in the presence of object
inheritance primitives (object extension and method redefinition).
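The difference between the two views can be mimicked dynamically in Python, as a loose sketch
only: the paper's distinction is enforced statically by types, whereas here it is enforced by
which operations a wrapper class exposes. The class names Obj and Pro and the method promote
are hypothetical.

```python
class Obj:
    """obj view: the only available operation is sending a message."""

    def __init__(self, methods):
        self._methods = dict(methods)

    def send(self, name):
        # A method body receives the object itself as its argument.
        return self._methods[name](self)


class Pro(Obj):
    """pro view: message send plus extension and overriding."""

    def extend(self, name, body):
        if name in self._methods:
            raise ValueError("method already present: " + name)
        return Pro({**self._methods, name: body})

    def override(self, name, body):
        if name not in self._methods:
            raise ValueError("no such method: " + name)
        return Pro({**self._methods, name: body})

    def promote(self):
        """Forget the extension operations, keeping the same methods."""
        return Obj(self._methods)
```

A protected-view constructor would return a Pro value that derived classes may extend, while a
public-view constructor would return the promoted Obj value to ordinary clients.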
3 Object calculus and type system
3.1 The calculus
The expressions of our core calculus are untyped lambda terms, including variable $x$, application
$e_1 e_2$ and lambda abstraction $\lambda x.\, e$, extended with four object forms:
\[
\begin{array}{ll}
\langle\rangle & \mbox{the empty prototype or object} \\
e \Leftarrow m & \mbox{send message $m$ to prototype or object $e$} \\
\langle e_1 \leftarrow\!+\ m = e_2 \rangle & \mbox{extend prototype $e_1$ with new method $m$ having body $e_2$} \\
\langle e_1 \leftarrow m = e_2 \rangle & \mbox{replace prototype $e_1$'s method body for $m$ by $e_2$}
\end{array}
\]
These expressions are the same as in [FHM94, FM95]. However, the type system used in the
present paper is more detailed. We extend this core calculus with two encapsulation primitives:
\[
\begin{array}{l}
\mathbf{Abstype}\ r <:_w R :: \kappa\ \mathbf{with}\ x : \sigma\ \mathbf{is}\ e_1\ \mathbf{in}\ e_2 \\
\{|\, r <:_w R :: \kappa = R',\ e \,|\}
\end{array}
\]
The first is used to provide limited access to implementation $e_1$ in client $e_2$. Type expression
$r <:_w R :: \kappa$ and assumption $x : \sigma$ provide the interface for this access. The type system will
require expression $e_1$ to have the form $\{|\, r <:_w R :: \kappa = R',\ e \,|\}$, which is the implementation of
the abstraction. The components of these expressions will be explained in more detail after we
introduce the type system.
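3.2 Operational semantics
The operational semantics include $\beta$-reduction for evaluating function applications and a ($\Leftarrow$) rule
for evaluating message sends:
\[
\langle e_1 \leftarrow\!\circ\ m = e_2 \rangle \Leftarrow m \ \stackrel{eval}{\longrightarrow}\ e_2\, \langle e_1 \leftarrow\!\circ\ m = e_2 \rangle
\]
where $\leftarrow\!\circ$ may be either $\leftarrow\!+$ or $\leftarrow$. We also need various bookkeeping rules to access methods
defined within $e_1$. These bookkeeping rules appear in Appendix A; they are explained in full in
[FHM94]. The reduction rule (Abstype) for abstract data type declarations is:
\[
\mathbf{Abstype}\ r <:_w R :: \kappa\ \mathbf{with}\ x : \sigma\ \mathbf{is}\ \{|\, r <:_w R :: \kappa = R',\ e_1 \,|\}\ \mathbf{in}\ e_2
\ \stackrel{eval}{\longrightarrow}\ [R'/r,\ e_1/x]\, e_2
\]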
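The message-send rule can be exercised with a small Python sketch in which an object is a
dictionary from method names to bodies and each body is a function of the receiving object. This
is a simplification made for illustration: the dictionary lookup stands in for the bookkeeping
rules of Appendix A, and the function names extend, override, send, and new_point are
hypothetical rather than the constructor definitions of Section 3.4.

```python
EMPTY = {}  # the empty prototype


def extend(obj, name, body):
    """Extension: add a method that must not already be present."""
    if name in obj:
        raise KeyError("method already present: " + name)
    return {**obj, name: body}


def override(obj, name, body):
    """Override: replace the body of an existing method."""
    if name not in obj:
        raise KeyError("no such method: " + name)
    return {**obj, name: body}


def send(obj, name):
    """Message send: apply the method body to the whole object."""
    return obj[name](obj)


def new_point(x0):
    """Build a one-dimensional point prototype with x, getX, setX, mv."""
    p = extend(EMPTY, "x", lambda self: x0)
    p = extend(p, "getX", lambda self: send(self, "x"))
    p = extend(p, "setX",
               lambda self: lambda v: override(self, "x", lambda _self: v))
    p = extend(p, "mv",
               lambda self: lambda dx: send(self, "setX")(send(self, "getX") + dx))
    return p
```

Because send passes the receiver to the method body, a method such as mv defined for points
keeps working on any object later built by extending the point prototype.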
3.3 Static Type System
The type expressions, given in Appendix B, include type variables, function types, pro types, obj
types, and existential types. To reduce the complexity of the type system, these types are divided
into two categories. The unquantified monotypes are indicated using metavariables $\tau, \tau', \tau_1, \ldots$.
The quantified types, indicated using metavariables $\sigma, \sigma', \sigma_1, \ldots$, may contain existential quantifiers.
A row is a finite list of (method name, type) pairs. Row expressions appear as subexpressions
of type expressions, with rows and types distinguished by kinds. Intuitively, the elements of kind
$(\{\vec{m}\};\ V)$ are the rows that do not include the method names in $\{\vec{m}\}$ and whose free type
variables appear with variance indicated in variance set $V$. We keep track of the absence of
methods in order to guarantee that methods are not multiply defined. The variance information,
which tells whether a variable appears monotonically, antimonotonically, or neither, is necessary
for subtyping judgements involving types of the form $\mathbf{pro}\ t \bullet R$ or $\mathbf{obj}\ t \bullet R$ since $R$ may contain
row variables. Type functions, which arise as row expressions with kind $T \to^{a} (\{\vec{m}\};\ V)$, are used
to infer a form of higher-order polymorphism for method bodies and to provide type interfaces to
encapsulated implementations. The annotation $a$ indicates the variance of the abstracted variable.
To avoid unnecessary repetition in our presentation, we use the meta-variable probj for either
obj or pro. Intuitively, the elements of type $\mathbf{probj}\ t \bullet \langle\!\langle m_1 : \tau_1, \ldots, m_k : \tau_k \rangle\!\rangle$ are objects $e$ such that
for $1 \le i \le k$, the result of $e \Leftarrow m_i$ is a value of type $\tau_i$. However, since the bound type variable $t$
may appear free in $\tau_i$, the type of $e \Leftarrow m_i$ is actually the result of replacing each free occurrence of
$t$ in $\tau_i$ by $\mathbf{probj}\ t \bullet \langle\!\langle m_1 : \tau_1, \ldots, m_k : \tau_k \rangle\!\rangle$. Because of this substitution, $\mathbf{probj}\ t \bullet \langle\!\langle \ldots \rangle\!\rangle$ is effectively
a special form of recursively-defined type.
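The substitution just described can be traced on a small example with the following Python sketch.
The encoding is an assumption of the sketch: an object type is a triple of a tag ("obj" or "pro"),
the bound variable name, and a dictionary row, and function types are tuples headed by "->". The
helper names are hypothetical, and nested object types that rebind the same variable are not handled.

```python
def substitute(ty, var, replacement):
    """Replace each occurrence of the type variable `var` in `ty`."""
    if ty == var:
        return replacement
    if isinstance(ty, tuple):
        return tuple(substitute(part, var, replacement) for part in ty)
    return ty


def send_type(obj_type, method):
    """Type of sending `method` to an object of type `obj_type`:
    the listed method type with the bound variable unfolded once."""
    _tag, bound_var, row = obj_type
    return substitute(row[method], bound_var, obj_type)


# Public point type: obj t . << getX : int, mv : int -> t >>
POINT = ("obj", "t", {"getX": "int", "mv": ("->", "int", "t")})
```

With this encoding, sending getX yields the type int, while sending mv yields a function from
int back to the whole point type, which is the sense in which the type is recursively defined.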
The typing rule for sending a message to an ob ject is