Ralph E. Johnson University of Illinois at Urbana-Champaign
Authors' address, telephone, and e-mail:
Department of Computer and Information Sciences, E301 CSE, Gainesville, FL 32611, (904) 392-1507, graver@cis.ufl.edu
Department of Computer Science, 1304 W. Springfield Ave., Urbana, IL 61801, (217) 244-0093, [email protected]

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1990 ACM 089791-343-4/90/0001/0136 $1.50

Abstract

This paper describes a type system for Smalltalk that is type-safe, that allows most Smalltalk programs to be type-checked, and that can be used as the basis of an optimizing compiler.

1 Introduction

There has been a lot of interest recently in type systems for object-oriented programming languages [CW85, DT88]. Since Smalltalk was one of the earliest object-oriented languages, it is not surprising that there have been several attempts to provide a type system for it [Sua81, BI82]. Unfortunately, none of the attempts have been completely successful [Joh86]. In particular, none of the proposed type systems are both type-safe and capable of type-checking most common Smalltalk programs. Smalltalk violates many of the assumptions on which most object-oriented type systems are based, and a successful type system for Smalltalk is necessarily different from those for other languages.

We have designed a new type system for Smalltalk. The biggest difference between our type system and others is that most type systems for object-oriented programming languages equate classes with types and subclassing with subtyping [Sua81, BI82, SCB*86, Str86, Mey88]. In our type system, types are based on classes (i.e. each class defines a type or a family of types) but subclassing has no relationship to subtyping. This is because Smalltalk classes inherit implementation and not specification.

Our type system uses discriminated union types and signature types to describe inclusion polymorphism and describes functional polymorphism (sometimes called parametric polymorphism, see [DT88]) using bounded universal quantification. It has the following features:

• automatic case analysis of union types,
• effective treatment of side-effects by basing the definition of the subtype relation on equality of parameterized types, and
• type-safety, i.e. a variable's value always conforms to its type.

Because the type system uses class information, it can be used for optimization. It has been implemented as part of the TS (Typed Smalltalk) optimizing compiler [JGZ88].

2 Background

Smalltalk [GR83] is a pure object-oriented programming language in that everything, from integers to text windows to the execution state, is an object. In particular, classes are objects. Since everything is an object, the only operation needed in Smalltalk is message sending. Smalltalk uses run-time type-checking. Every message send is dynamically bound to an implementation depending on the class of the receiver. This unified view of the universe makes Smalltalk a compact yet powerful language.

Smalltalk is extremely extensible. Most operations are built out of a small set of powerful primitives. For example, control structures are all implemented in terms of blocks, which are the Smalltalk equivalent of closures. Smalltalk's equivalent to if-then-else is the ifTrue:ifFalse: message, which is sent to a boolean object with two block arguments. If the receiver is an instance of class True then the "true block" is evaluated. Similarly, if the receiver is an instance of class False then the "false block" is evaluated. Looping is implemented in a similar way using recursion. These primitives can be used to implement case statements, generators, exceptions, and coroutines [Deu81].

Smalltalk has been used mostly for prototyping and exploratory development. It is ideal for these purposes because Smalltalk programs are easy to change and reuse, and Smalltalk comes with a powerful programming environment and large library of generally useful components. However, it has not been used as much for production programming. This is partly because it is hard to use for multi-person projects, partly because it is hard to deliver application programs without the large programming environment, and partly because it is not as efficient as languages like C. In spite of these problems, the development and maintenance of Smalltalk programs is so easy that more and more companies are developing applications with it.

A type system can help solve Smalltalk's problems. Type information makes programs easier to understand and can be used to automatically check interface consistency, which makes Smalltalk better suited for multi-person projects. However, our main motivation for type-checking Smalltalk is to provide information needed by an optimizing compiler. Smalltalk methods (procedures) tend to be very small, making aggressive inline substitution necessary to achieve good performance. Type information is required to bind methods at compile-time. Although some type information can be acquired by dataflow analysis of individual methods [CUL89], explicit type declarations produce better results and also make programs more reliable and easier to understand. From this point of view, the type system has been quite successful, because the TS compiler can make Smalltalk programs run nearly as fast as C programs.

It is important that a type system for Smalltalk not hurt Smalltalk's good qualities. In particular, it must not hinder the rapid-prototyping style often used by Smalltalk programmers. Exploratory programmers try to build programs as quickly as possible, making localized changes instead of global reorganization. They often use explicit run-time checks for particular classes (evidence that code is in the wrong class) and create new subclasses to reuse code rather than to organize abstraction (evidence that a new abstract class is needed) [JF88]. Although conscientious programmers remove these improprieties from finished programs, we wanted a type system that would allow them, since they are often important intermediate steps in the development process. Thus, a static type system for Smalltalk must be as flexible as the traditional dynamic type-checking.

In an untyped object-oriented language like Smalltalk, classes inherit only the implementation of their superclasses. In contrast, most type systems for object-oriented programming languages require classes to inherit the specification of their superclasses [BI82, Car84, CW85, SCB*86, Str86, Mey88], not just the implementation [Sua81, Joh86]. A class specification can be an explicit signature [BHJL86], the implicit signature implied by the name of a class [SCB*86], or a signature combined with method pre- and post-conditions, class invariants, etc. [Mey88]. Inheriting specification means that the type of methods in subclasses must be subtypes of their specifications in superclasses.

Since Smalltalk is untyped, only implementation is inherited. This makes retrofitting a type system difficult. Since specification inheritance is a logical organization, many parts of the Smalltalk inheritance hierarchy conform to specification inheritance, but there is nothing in the language that requires or enforces this. Thus, it is common to find classes in the Smalltalk-80 class library that inherit implementation and ignore specification.

Dictionary is a good example of a class that inherits its implementation but not specification; it is a subclass of Set. A Dictionary is a keyed lookup table that is implemented as a hash table of <key, value> pairs. Dictionary inherits the hash table implementation from Set, but applications using Sets would behave quite differently if given Dictionaries instead.

Abstract classes, a commonly used Smalltalk programming technique, provide other examples where classes inherit implementation but not specification. Consider the method definitions shown in Figure 1. These (and other) classes are used to track changes made to the system. The abstract class Change defines the values method in terms of the parameters method. The implementation of the values method is inherited by the subclasses of Change. The result of sending the values message depends on the result of sending the parameters message, which is different for each class in Figure 1. Hence, the specification of the values method is also different for each of the classes.

    class Change:
        values
            ↑Array with: self class with: self parameters
        parameters
            self subclassResponsibility

    class ClassRelatedChange:
        parameters
            ↑className

    class MethodChange:
        parameters
            ↑Array with: className with: selector

    class OtherChange:
        parameters
            ↑self text

Figure 1: Change and its subclasses.

A class-based type system with explicit union types works for an implementation inheritance hierarchy and greatly simplifies optimization, but is not very flexible when it comes to adding new classes to the system. To regain some of the flexibility we incorporate type-inference and signatures. This gives us both flexibility and the ability to optimize.

A type system suitable for an optimizing Smalltalk compiler must also have parameterized types. It is difficult to imagine a (non-trivial) Smalltalk application that does not use Collections. The compiler must be able to associate the type of objects being added to a Collection with the type of objects being removed from it. Without parameterized types, all Collections are reduced to homogeneous groups of generic objects. Furthermore, due to the potentially imperative side-effects of any operation, a special imperative definition of subtyping for parameterized types is used (see Section 3.1).

The type system must also be able to describe the functional and inclusion polymorphism exhibited by many Smalltalk methods. Functional polymorphism can be described, in the usual manner, with (implicit) bounded universal quantification of type variables. Inclusion polymorphism can be described with explicit union types or signatures.

For a detailed discussion of these issues see [Gra89].

3 Types

The abstract and concrete syntax for type expressions is shown in Figure 2 (abstract syntax on the left and concrete syntax on the right). In the concrete syntax grammar terminals are underlined, (something)* represents zero or more repetitions of the something, and (something)+ represents one or more repetitions of the something. Each of the type forms will be described in detail in the following sections.

We use the abstract syntax in inference rules and the concrete syntax in examples. Type expressions are denoted by $s$, $t$, and $u$. Type variables are denoted by $p$ (abstract syntax) or by P (concrete syntax). Lists (tuples) of types are denoted by $\bar{t} = \langle t_1, t_2, \ldots, t_n \rangle$. The empty list is denoted by $\langle\rangle$ and $t \bullet \bar{t}$ is the list $\langle t, t_1, t_2, \ldots, t_n \rangle$. Similar notation is used for lists of type variables. $C$ denotes a class name and $m$ denotes a message selector.

Strictly speaking, a type variable is not a type; it is a place holder (or representative) for a type. We assume that all free type variables are (implicitly) universally quantified at an appropriate level. For example, the local type variables of a method type are assumed to be universally quantified over the type of the method; the class type parameters of a class are assumed to be universally quantified over all type definitions in the class. We also assume that all type variables are unique (i.e. have unique names) and are implicitly renamed whenever a type containing type variables is conceptually instantiated. For example, the local type variables of a method type are renamed each time the method type is "used" at a call site. We do use explicit range declarations for type variables to achieve bounded universal quantification (see Sections 3.5 and 3.6). Given these assumptions, we treat type variables as types.

3.1 Subtyping

The intuitive meaning of the subtype relation $\sqsubseteq$ is that if $s \sqsubseteq t$ then everything of type $s$ is also of type $t$. The subtyping relation for our type system can be formalized as a set of type inference rules (as in [CW85]). $\mathcal{C}$ is a set of inclusion constraints for type variables. The notation $\mathcal{C}.p \sqsubseteq t$ means to extend the set $\mathcal{C}$ with the constraint that the type variable $p$ is a subtype of the type $t$. The notation $\mathcal{C} \vdash s \sqsubseteq t$ means that from $\mathcal{C}$ we can infer $s \sqsubseteq t$. A horizontal bar is logical implication: if we can infer what is above it then we can infer what is below it. The following are basic inference rules.

$$\mathcal{C} \vdash t \sqsubseteq \mathit{Anything}$$
    (object type)       $t ::= C(\bar{t})$                        Type ::= ClassName (of: Type)*
    (block type)        $\mid\ \bar{t} \to t$                     | Block (of: Type)* returns: Type
    (union type)        $\mid\ t + t$                             | Type + Type
    (signature type)    $\mid\ \langle (m : \bar{t} \to t)^* \rangle$    | <(MessageSelector (of: Type)* returns: Type)*>
                        $\mid\ (t)$                               | (Type)
    (type variable)     $\mid\ p$                                 | P
                        $\mid\ \mathit{Anything}$                 | Anything
                        $\mid\ \mathit{Nothing}$                  | Nothing

Figure 2: The abstract and concrete syntax of types.
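The type forms of Figure 2 and the inclusion rules of Sections 3.1 through 3.4 can be sketched as a small checker. The following Python model is an illustration only, not the TS compiler's implementation: the class names (`ObjectType`, `BlockType`, `UnionType`) and the functions `included` and `equal` are hypothetical, type variables and signature types are left out, and equality of types is taken to be mutual inclusion, following the antisymmetry rule.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Anything:
    """Top type: every type is included in Anything."""


@dataclass(frozen=True)
class Nothing:
    """Bottom type: Nothing is included in every type."""


@dataclass(frozen=True)
class ObjectType:
    """ClassName (of: Type)*, e.g. Array of: Character."""
    class_name: str
    params: Tuple[object, ...] = ()


@dataclass(frozen=True)
class BlockType:
    """Block (of: Type)* returns: Type."""
    args: Tuple[object, ...]
    returns: object


@dataclass(frozen=True)
class UnionType:
    """Type + Type, read as 'or'."""
    left: object
    right: object


def equal(s, t) -> bool:
    """Types are equal when each is included in the other."""
    return included(s, t) and included(t, s)


def included(s, t) -> bool:
    """Return True if s is a subtype of t under the sketched rules."""
    if isinstance(t, Anything) or isinstance(s, Nothing):
        return True
    if isinstance(s, UnionType):
        # A union is included in t only if each element is included in t.
        return included(s.left, t) and included(s.right, t)
    if isinstance(t, UnionType):
        # A non-union type is included in a union if one element includes it.
        return included(s, t.left) or included(s, t.right)
    if isinstance(s, ObjectType) and isinstance(t, ObjectType):
        # Object types: same class and equal parameters (no subclass rule).
        return (s.class_name == t.class_name
                and len(s.params) == len(t.params)
                and all(equal(a, b) for a, b in zip(s.params, t.params)))
    if isinstance(s, BlockType) and isinstance(t, BlockType):
        # Blocks: argument types are antimonotonic, return type is monotonic.
        return (len(s.args) == len(t.args)
                and all(included(b, a) for a, b in zip(s.args, t.args))
                and included(s.returns, t.returns))
    return False


character = ObjectType("Character")
small_integer = ObjectType("SmallInteger")
array_of_small = ObjectType("Array", (small_integer,))
array_of_mixed = ObjectType("Array", (UnionType(small_integer, character),))

print(included(character, UnionType(character, small_integer)))
print(included(array_of_small, array_of_mixed))
```

Under these assumptions the first query succeeds and the second fails, because an Array of SmallIntegers and an Array of (SmallInteger + Character) have unequal parameters even though one parameter is included in the other.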
$$\mathcal{C} \vdash \mathit{Nothing} \sqsubseteq t$$

$$\mathcal{C}.p \sqsubseteq t \vdash p \sqsubseteq t$$

$$\mathcal{C}.t \sqsubseteq p \vdash t \sqsubseteq p$$

$$\mathcal{C} \vdash p \sqsubseteq p$$

The following rules are used for the type parameter lists of object types (see Section 3.2) and for the argument type lists for block types (see Section 3.4).

$$\mathcal{C} \vdash \langle\rangle \sqsubseteq \langle\rangle$$

Subtyping is reflexive, transitive, and antisymmetric.

$$\mathcal{C} \vdash t \sqsubseteq t$$

$$\frac{\mathcal{C} \vdash s \sqsubseteq t \qquad \mathcal{C} \vdash t \sqsubseteq u}{\mathcal{C} \vdash s \sqsubseteq u}$$

$$\frac{\mathcal{C} \vdash s \sqsubseteq t \qquad \mathcal{C} \vdash t \sqsubseteq s}{\mathcal{C} \vdash s = t}$$

3.2 Object Types

Some variables contain only one kind of object. For example, all Characters¹ have an instance variable that contains a SmallInteger ASCII value. Other variables can contain different kinds of objects. For example, a Set can contain SmallIntegers, Characters, Arrays, Sets, and so on. Each class has a set of zero or more type variables called class type parameters that can be used in type expressions to specify the types of instance variables. These are the only type variables that can appear in type declarations for instance variables.

¹ The phrase "a SomeClass" refers to an instance of class SomeClass. The phrase "class SomeClass" refers to the class itself.

An object type is a class name together with a possibly empty list of types. The type list of an object type provides values for the class type parameters defined for the corresponding class. Thus, the size of the type list that may accompany a class name is fixed by the corresponding class definition. For example, the classes SmallInteger and Character have zero class type parameters, so SmallInteger and Character are valid object types. The classes Array and Set each have one class type parameter, so Array of: SmallInteger and Set of: (Array of: Character) are valid object types.

Due to the imperative nature of Smalltalk [GR83, Joh86] and because of the antimonotonic ordering of function types [DT88] we define subtyping for object types as equality, i.e. one object type is included in another if and only if they are equal (Cardelli also takes this approach for "updatable" objects in [CarSS]). (Recall that $\mathcal{C}$ is a set of inclusion constraints and $C$ is a class name.)

3.3 Union Types

The type system defined so far cannot describe an Array containing both Characters and SmallIntegers. In Smalltalk, a variable can contain many different kinds of objects over its lifetime. Thus, a type can be a nonempty "set" of object types (or other types). An example of a union type is

    (Array of: SmallInteger) + (Array of: Character)
        + (Array of: (SmallInteger + Character)).

Here, the plus operator is read as "or," so a variable of the above type could contain an Array whose elements are all SmallIntegers, all Characters, or a mixture of the two.

Our type system does not use the subclass relation on classes to induce the subtype relation on types. Instead, type inclusion is based on discriminated union types. An object type $t$ is included in a union type $u$ only if $t$ is included in one of $u$'s elements.

$$\frac{\mathcal{C} \vdash s \sqsubseteq t}{\mathcal{C} \vdash s \sqsubseteq t + u}$$

A union type $u$ is included in a union type $u'$ only if each element of $u$ is included in $u'$.

$$\frac{\mathcal{C} \vdash s \sqsubseteq u \qquad \mathcal{C} \vdash t \sqsubseteq u}{\mathcal{C} \vdash s + t \sqsubseteq u}$$

Some examples are:

    Character $\sqsubseteq$ Character + SmallInteger
    Array of: SmallInteger $\sqsubseteq$ (Array of: SmallInteger) + (Array of: Character)
    Array of: SmallInteger $\not\sqsubseteq$ Array of: (SmallInteger + Character)

Note that our type system does not reflect class inheritance. Even though Integer is a subclass of Number, Integer is not a subtype of Number. However, we can specify the type of all subclasses of Number by listing them, i.e. Integer + Float + Fraction.²

² Common union types such as Integer + Float + Fraction and True + False are presently abbreviated using the global variables NumberType and BooleanType, respectively. Another approach is to use the object type of an abstract class (classes such as Number, Collection, and Boolean that provide an implementation template for subclasses but have no instances of their own) to automatically denote the union of the types of all its non-abstract subclasses.

3.4 Block Types

Blocks (function abstractions) are treated differently from other objects. A block type consists of the name Block, a possibly empty list of types for block arguments, and a return type. For example,

    Block of: (Array of: Character) of: SmallInteger returns: Character

represents the type of a block with two arguments, and Block returns: SmallInteger represents a block with no arguments. Block types differ from object types in several ways. Unlike object types, there is no class Block. Types beginning with the name Block can have different sized argument type lists.

A block type $u$ is included in a block type $u'$ only if the return type of $u$ is included in the return type of $u'$ and if each argument type of $u'$ is included in the corresponding argument type of $u$ (this is the standard antimonotonic subtype relation for function types [MS82]).

$$\frac{\mathcal{C} \vdash \bar{s} \sqsubseteq \bar{s}' \qquad \mathcal{C} \vdash t \sqsubseteq t'}{\mathcal{C} \vdash \bar{s}' \to t \sqsubseteq \bar{s} \to t'}$$

For example,

    Block returns: Character $\sqsubseteq$ Block returns: (Character + SmallInteger)

    Block of: (Float + SmallInteger) returns: SmallInteger
        $\sqsubseteq$ Block of: Float returns: (Character + SmallInteger).

3.5 Typed Class Definitions

All instance variables and class variables must have their types declared. Each class defines a set of type variables called class type parameters, which may be used to specify the types of instance and class variables.

For example, Figure 3 shows the definition of OrderedCollection, which is a subclass of SequenceableCollection. Note that the type of each instance variable is declared when the variable is declared and that angle brackets are used to set off type expressions from the regular Smalltalk code.

    SequenceableCollection subclass: #OrderedCollection
        instanceVariables: 'firstIndex lastIndex '
        classVariables: ''
        typeParameters: 'ElementType'
        poolDictionaries: ''
        category: 'Sequenceable-Collections'

Figure 3: Class definition for OrderedCollection.

A class definition defines an "object type" in which the type list is the list of class type parameters (appearing in the same order as in the class definition). The type defined by the above class definition would be

    OrderedCollection of: ElementType.

The scope of the class type parameter ElementType is limited to the method and variable definitions in class OrderedCollection (the same scope as a class variable). Class type parameters are inherited by subclasses, just like instance and class variables. Any use of the OrderedCollection type outside of this scope
must "instantiate" its class type parameter to a type constant (like Character) or to a locally known type variable. Thus, the above type actually defines a family of object types.

The range (upper bound for type instantiation) of a class type parameter may be restricted by including an optional range type declaration. A class type parameter may only be instantiated to a type that is a subtype of its range type. The range declaration of a class type parameter is syntactically identical to the type declaration for a variable. In the above example, if we wished to restrict the elements of OrderedCollections to be Characters, the typeParameters: field would be declared as 'ElementType <Character>'. If the range type declaration of a class type parameter is omitted it is assumed to be Anything.

3.6 Typed Method Definitions

The types of method arguments, temporary variables, and block arguments can be explicitly declared or inferred by the TS type-inference mechanism [Gra89]. The only difference between a typed method and an untyped method is that the typed method has type declarations. Type declarations must precede any part of the method definition. A typed method template is shown in Figure 4. The arguments for the fields arguments:, temporaries:, and blockArguments: are lists of identifier declaration pairs. The arguments for the receiverType: and returnType: fields are single type declarations. The arguments for the localTypeVariables: field are capitalized identifiers. The range of a type variable can be restricted by using the range: modifier. For example, a local type variable P declared with the range Integer + Character can be associated with any of the types Integer, Character, or Integer + Character. Local type variables are instantiated during type-checking to types determined by the types of actual arguments at a call site. Local type variables will usually, though not necessarily, correspond to the class type parameters of some class.

    { localTypeVariables: < P2 >
      receiverType:
      arguments: arg1 arg2
      temporaries: temp
      blockArguments: blkArg
      returnType: }

    message selector and argument names
        "comment stating purpose of message"
        | temporary variable names |
        statements

Figure 4: Template for typed methods.

The do: method for SequenceableCollection is shown in Figure 5 as an example of a typed method definition. Notice that certain fields of the type declarations have been omitted. The syntax rules for type declarations are fairly liberal. Fields can occur in any order. All fields except receiverType: and returnType: can have multiple occurrences. The smallest type declaration for a method is "{}."

Self represents the type of the pseudo-variable self (and super). Unfortunately, the class of self is different in each class that inherits a method, so Self must differ, too. Thus, each method is type checked for each class that inherits it by replacing Self with the type appropriate for the inheriting class. In the absence of a receiverType: declaration (which is over 99% of the time), Self defaults to the object type for the class in which the method is being compiled. Otherwise, Self is replaced with the type given in the receiverType: declaration. A legal receiverType: must be a subtype of the default value of Self.

Returning to the example of Figure 5, SequenceableCollection has one class type parameter called ElementType, so the type substituted for Self will be SequenceableCollection of: ElementType. The argument aBlock is a block that takes one argument of type ElementType and returns an object of unknown type P. When do: is invoked in another method definition both ElementType and P will be associated with actual receiver and argument types. The temporary variables index and length have values dependent on the size of Collections. In Smalltalk implementations with 30+ bit SmallIntegers, the possible size of any Collection is well within the range of SmallIntegers.³ There is no explicit return statement in the method so it implicitly returns self.

³ Smalltalk defines infinite precision integer arithmetic using the classes LargeNegativeInteger, SmallInteger, and LargePositiveInteger (the three subclasses of class Integer). SmallIntegers are essentially machine integers; instances of the Large-Integer classes are more complex.

Type declarations and typed methods introduce the notions of message and method types. A message type consists of a message selector, a receiver type, a list of argument types, and a return type. A method type consists of a message type and a set of constraints on the type variables used in the message
    { localTypeVariables:
      arguments: aBlock
      temporaries: index length
      returnType: }

    do: aBlock
        "Evaluate aBlock for each of the receiver's elements."
        | index length |
        index ← 0.
        length ← self size.
        [(index ← index + 1) <= length]
            whileTrue: [aBlock value: (self at: index)]

Figure 5: The do: method for SequenceableCollection.
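The association of ElementType and P with actual receiver and argument types at a call site of do: can be illustrated with a simple one-way matcher. This is a hedged sketch and not the TS type-checking algorithm: the tuple encoding of types and the function `bind` are hypothetical, union types and range constraints are ignored, and Self is assumed to have already been replaced by the object type of the inheriting class OrderedCollection.

```python
# Hypothetical encoding of types as nested tuples:
#   ("var", name)                    a type variable such as ElementType or P
#   ("obj", class_name, (params..))  an object type such as Array of: Character
#   ("block", (args..), returns)     a block type


def bind(formal, actual, env):
    """Match a declared type against an actual type, recording in env the
    type each type variable is associated with. Raises TypeError on mismatch."""
    kind = formal[0]
    if kind == "var":
        name = formal[1]
        if name in env and env[name] != actual:
            raise TypeError(f"{name} is already associated with {env[name]}")
        env[name] = actual
        return env
    if kind != actual[0]:
        raise TypeError(f"cannot match {formal} against {actual}")
    if kind == "obj":
        _, f_name, f_params = formal
        _, a_name, a_params = actual
        if f_name != a_name or len(f_params) != len(a_params):
            raise TypeError(f"cannot match {formal} against {actual}")
        for f, a in zip(f_params, a_params):
            bind(f, a, env)
        return env
    if kind == "block":
        _, f_args, f_ret = formal
        _, a_args, a_ret = actual
        if len(f_args) != len(a_args):
            raise TypeError("block argument lists differ in size")
        for f, a in zip(f_args, a_args):
            bind(f, a, env)
        return bind(f_ret, a_ret, env)
    raise TypeError(f"unknown type form {formal}")


CHARACTER = ("obj", "Character", ())
SMALL_INTEGER = ("obj", "SmallInteger", ())

# Declared types of do: as inherited by OrderedCollection (assumed).
receiver_formal = ("obj", "OrderedCollection", (("var", "ElementType"),))
block_formal = ("block", (("var", "ElementType"),), ("var", "P"))

# Actual types at a hypothetical call site.
receiver_actual = ("obj", "OrderedCollection", (CHARACTER,))
block_actual = ("block", (CHARACTER,), SMALL_INTEGER)

env = {}
bind(receiver_formal, receiver_actual, env)
bind(block_formal, block_actual, env)
print(env)
```

In this sketch the receiver associates ElementType with Character, and the block argument then associates P with SmallInteger; a block whose argument type was not Character would be rejected because ElementType is already associated with a different type.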
type. A method type denotes the type of a method relative to a specific class of receivers. Therefore, Self should already have been expanded and should never appear in a method type. The type of the above method is denoted by

    do: ↑

Message types use the same notation.

    { receiverType:
      arguments: anInteger
      returnType: }

    new: anInteger
        "Answer a new instance of size anInteger."
        ↑(super new: (anInteger max: 1)) setTally

Figure 6: The new: method in Set class.

Local range constraints on type variables are shown enclosed in braces following the return type of a method type. For example,