Adding Genericity to the Java Programming Language
Making the future safe for the past: Adding Genericity to the Java™ Programming Language

Gilad Bracha, Sun Microsystems
Martin Odersky, University of South Australia
David Stoutamire, Sun Microsystems
Philip Wadler, Bell Labs, Lucent Technologies

To appear in the 13th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’98), Vancouver, BC, Canada, October, 1998.

Abstract

We present GJ, a design that extends the Java programming language with generic types and methods. These are both explained and implemented by translation into the unextended language. The translation closely mimics the way generics are emulated by programmers: it erases all type parameters, maps type variables to their bounds, and inserts casts where needed. Some subtleties of the translation are caused by the handling of overriding.

GJ increases expressiveness and safety: code utilizing generic libraries is no longer buried under a plethora of casts, and the corresponding casts inserted by the translation are guaranteed to not fail.

GJ is designed to be fully backwards compatible with the current Java language, which simplifies the transition from non-generic to generic programming. In particular, one can retrofit existing library classes with generic interfaces without changing their code.

An implementation of GJ has been written in GJ, and is freely available on the web.

1 Introduction

Generic types are so important that even a language that lacks them may be designed to simulate them. Some object-oriented languages are designed to support subtypes directly, and to support generics by the idiom of replacing variable types by the top of the type hierarchy. For instance, a collection with elements of any type is represented by a collection with elements of type Object.

This approach is exemplified by the Java programming language [GLS96]. Generics are represented by this idiom throughout the standard Java libraries, including vectors, hash tables, and enumerations. As the Java Development Kit (JDK) has evolved, generics have played an increasing role. JDK 1.1 introduced an observer pattern that depends on generics, as do the collection classes introduced in JDK 1.2. Oberon also relies on the generic idiom, and dynamically typed languages such as Smalltalk [GR83] use this idiom implicitly.

Nonetheless, generics may merit direct support. Designing a language with direct support for subtyping and generics is straightforward. Examples include Modula 3, Ada 95, Eiffel, and Sather. Adding generics to an existing language is almost routine. We proposed adding generics to the Java programming language in Pizza [OW97], and we know of four other proposals [AFM97, MBL97, TT98, CS98]. Clemens Szyperski proposed adding generics to Oberon [RS97]. Strongtalk [BG93] layers a type system with generic types on top of Smalltalk.

The generic legacy problem

However, few proposals tackle the generic legacy problem: when direct support for generics is added to a language that supports them via the generic idiom, what happens to legacy code that exploits this idiom?

Pizza is backward compatible with the Java programming language, in that every legal program of the latter is also legal in the former. However, this compatibility is of little help when it comes to generics. For example, JDK 1.2 contains an extensive library of collection classes based on the generic idiom. It is straightforward to rewrite this library to use generics directly, replacing the legacy type Collection by the parametric type Collection<A>. However, in Pizza these two types are incompatible, so one must rewrite all legacy code, or write adaptor code to convert between legacy and parametric types. Code bloat may result from references to both the legacy and parametric versions of the library. Note the problem is not merely with the size of legacy libraries (which may be small), but with managing the upgrade from the legacy types to parametric types (which can be a major headache if references to legacy types are dispersed over a large body of code). If legacy libraries or code are available only in binary rather than source, then these problems are compounded.

GJ

Here we propose GJ, a superset of the Java programming language that provides direct support for generics. GJ compiles into Java virtual machine (JVM) byte codes, and can be executed on any Java compliant browser. In these respects GJ is like Pizza, but GJ differs in that it also tackles the generic legacy problem.

GJ contains a novel language feature, raw types, to capture the correspondence between generic and legacy types, and a retrofitting mechanism to allow generic types to be imposed on legacy code. A parametric type Collection<A> may be passed wherever the corresponding raw type Collection is expected. The raw type and parametric type have the same representation, so no adaptor code is required. Further, retrofitting allows one to attribute the existing collection class library with parametric types, so one only requires one version of the library; an added plus is that new code will run in any JDK 1.2 compliant browser against the built-in collection class library. Raw types and retrofitting apply even if libraries or code are available only as binary class files, and no source is available. Combined, these techniques greatly ease the task of upgrading from legacy code to generics.

The semantics of GJ is given by a translation back into the Java programming language. The translation erases type parameters, replaces type variables by their bounding type (typically Object), adds casts, and inserts bridge methods so that overriding works properly. The resulting program is pretty much what you would write in the unextended language using the generic idiom. In pathological cases, the translation requires bridge methods that can only be encoded directly in JVM byte codes. Thus GJ extends the expressive power of the Java programming language, while remaining compatible with the JVM.

GJ comes with a cast-iron guarantee: no cast inserted by the compiler will ever fail. (Caveat: this guarantee is void if the compiler generates an 'unchecked' warning, which may occur if legacy and parametric code is mixed without benefit of retrofitting.) Furthermore, since GJ compiles into the JVM, all safety and security properties of the Java platform are preserved. (Reassurance: this second guarantee holds even in the presence of unchecked warnings.)

Security

One may contrast two styles of implementing generics, homogeneous and heterogeneous. The homogeneous style, exemplified by the generic idiom, replaces occurrences of the type parameter by the type Object. The heterogeneous style, exemplified by C++ and Ada, makes one copy of the class for each instantiation of the type parameter. The GJ and Pizza compilers implement the homogeneous translation, while Agesen, Freund, and Mitchell [AFM97] propose having the class loader implement the heterogeneous translation. Other proposals utilize a mixture of homogeneous and heterogeneous techniques [CS98].

As observed by Agesen, Freund, and Mitchell, the heterogeneous translation provides tighter security guarantees than the homogeneous. For example, under the homogeneous translation a method expecting a collection of secure channels may be passed a collection of any kind of object, perhaps leading to a security breach. To minimize this problem, GJ always inserts bridge methods when subclassing a generic class, so the user may ensure security simply by declaring suitable specialized subclasses.

The homogeneous translation also enjoys some advantages over the heterogeneous. Surprisingly, with the security model of the Java virtual machine, the heterogeneous translation makes it impossible to form some sensible type instantiations. (This problem is entirely obvious, but only in retrospect.) GJ and other languages based on the homogeneous translation do not suffer from this difficulty.

Type inference

While type systems for subtyping and for generics are well understood, how to combine the two remains a topic for active research. In particular, it can be difficult to infer instantiations for the type arguments to generic methods.

GJ uses a novel algorithm for this purpose, which combines two desirable (and at first blush contradictory) properties: it is local, in that the type of an expression depends only on the types of its subexpressions, and not on the context in which it occurs; and it works for empty, in that inference produces best types even for values like the empty list that have many possible types. Further, the inference algorithm supports subsumption, in that if an expression has a type, then it may be regarded as having any supertype of that type.

In contrast, the algorithm used in Pizza is non-local and does not support subsumption (although it does work for empty), while the algorithm used in Strongtalk [BG93] does not work for empty (although it is local and supports subsumption), and algorithms for constraint-based type inference [AW93, EST95] are non-local (although they work for empty and support subsumption). Pizza uses a variant of the Hindley-Milner algorithm [Mil78], which we regard as non-local since the type of a term may depend on its context through unification.

Raw types and retrofitting

Raw types serve two purposes in GJ: they support interfacing with legacy code, and they support writing code in those few situations (like the definition of an equality method) [...]

[...] the relative strengths of parametric and virtual types appears elsewhere [BOW98]. It may be possible to merge virtual and parametric types [BOW98, TT98], but it is not clear whether the benefits of the merger outweigh the increase in complexity.

Status

An implementation of GJ is freely available on the web [GJ98a].
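
To make the translation described in the introduction more concrete (erasure of type parameters, inserted casts, bridge methods, and a parametric type passed where a raw type is expected), here is a minimal sketch. It is written in present-day Java syntax and relies on the behavior of a current Java compiler rather than the GJ compiler itself, which is an assumption. All names (ErasureSketch, Box, StringBox, firstLegacy, firstGeneric, bridgeCount) are hypothetical and chosen only for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; modern Java stands in for GJ (an assumption).
class ErasureSketch {

    // Generic idiom: the element type is Object, so the programmer writes the cast.
    static String firstLegacy(List legacy) {
        return (String) legacy.get(0);
    }

    // Generic style: no cast in the source; the compiler inserts one after erasure.
    static String firstGeneric(List<String> xs) {
        return xs.get(0);
    }

    // A generic interface whose type variable A erases to its bound, Object.
    interface Box<A> {
        A get();
    }

    // Implements get() at a more specific type, so a bridge method is needed.
    static class StringBox implements Box<String> {
        public String get() {
            return "hello";
        }
    }

    // Counts compiler-generated bridge methods declared directly in a class.
    static int bridgeCount(Class<?> c) {
        int n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (m.isBridge()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> xs = new ArrayList<>();
        xs.add("gj");
        // A parametric List<String> is accepted where the raw type List is expected.
        System.out.println(firstLegacy(xs));
        System.out.println(firstGeneric(xs));
        // List each declared method of StringBox and whether it is a bridge.
        for (Method m : StringBox.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName()
                    + " " + m.getName() + "() bridge=" + m.isBridge());
        }
    }
}
```

With a typical Java compiler one would expect StringBox to declare two get methods: the one written in the source, returning String, and a compiler-generated bridge returning Object that forwards to it. The order in which reflection lists them is unspecified, so the sketch counts bridges rather than relying on position.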