Advances in Programming Languages: Generics, interoperability and implementation

Stephen Gilmore The University of Edinburgh

February 1, 2007

Understanding generic code

Generic Java extends Java with generic type variables, which allow us to define generic classes and generic methods. Generic C# extends C# similarly. When adding genericity to an object-oriented programming language it is important to do so in a way which does not invalidate existing code. Legacy code (which does not use the generic features of the language) should continue to operate as before, and there should be a smooth upgrade path from legacy code to generic code.

In Generic Java (Java 1.5) generics are compiled out by the Java compiler. Type variables are replaced by Object, and downcasts are inserted automatically by the compiler instead of being coded manually by the programmer. The implications are that generic code is never slower than the equivalent non-generic code (but there is no reason to think that it will be faster), while being more reliable, because type errors which would otherwise surface as failed casts at run-time are caught at compile time.

The way-ahead-of-time optimising compiler gcj will be extended to deal with Java generics when GCC-4.3 is released. The gcj compiler will include the Eclipse Compiler for Java, ecj, as its front-end. This extended gcj could produce faster code by eliminating casts. This example shows use of SUN’s javac.

    /* File: programs/java/Cell.java */
    /* A generic, updateable cell */
    class Cell<A> {
        A value;
        Cell(A value) { this.value = value; }
        A get() { return value; }
        void set (A value) { this.value = value; }
    }

The above example is a simple generic cell which holds values. We now compare that generic Java class with its compiled bytecode representation. We can inspect the Java bytecode produced by the compiler by using the Java disassembler javap. We use the command

    javap -c Cell

When compiled to Java bytecode we see that the cell is operating on objects of class java.lang.Object.

UG4 Advances in Programming Languages, 2005/2006

    Compiled from "Cell.java"
    class Cell extends java.lang.Object{
    java.lang.Object value;

    Cell(java.lang.Object);
      Code:
       0: aload_0
       1: invokespecial #1; //Method java/lang/Object."<init>":()V
       4: aload_0
       5: aload_1
       6: putfield #2; //Field value:Ljava/lang/Object;
       9: return

    java.lang.Object get();
      Code:
       0: aload_0
       1: getfield #2; //Field value:Ljava/lang/Object;
       4: areturn

    void set(java.lang.Object);
      Code:
       0: aload_0
       1: aload_1
       2: putfield #2; //Field value:Ljava/lang/Object;
       5: return

    }

That was an example of how generic classes are defined. How are they used?

    /* File: programs/java/CellTest.java */
    class CellTest {
        public static void main (String[] args) {
            Cell<String> sc1 = new Cell<String>("First");
            String s1 = sc1.get(); /* No cast needed */
            System.out.println(s1);
        }
    }

As usual with generic classes, the cast is not written by the application developer but is inserted automatically by the compiler. It appears as the checkcast instruction at offset 14 in the disassembled main method.

    Compiled from "CellTest.java"
    class CellTest extends java.lang.Object{
    CellTest();
      Code:
       0: aload_0
       1: invokespecial #1; //Method java/lang/Object."<init>":()V
       4: return

    public static void main(java.lang.String[]);
      Code:
       0: new #2; //class Cell
       3: dup
       4: ldc #3; //String First
       6: invokespecial #4; //Method Cell."<init>":(Ljava/lang/Object;)V
       9: astore_1
       10: aload_1
       11: invokevirtual #5; //Method Cell.get:()Ljava/lang/Object;
       14: checkcast #6; //class String
       17: astore_2
       18: getstatic #7; //Field java/lang/System.out:Ljava/io/PrintStream;
       21: aload_2
       22: invokevirtual #8; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
       25: return

    }

Perhaps the overriding motivation for implementing generics in Java in this way is that the Java Virtual Machine should be unaffected. From the JVM's point of view the bytecode which it executes looks just as though it came from a (typesafe) object-polymorphic method. Because many manufacturers have implemented Java Virtual Machines, it was a sensible design constraint to contain this change to the Java language within the compiler.

Unfortunately SUN’s Java 1.5 compiler does not fulfil the intended design, because it compiles only for the 1.5 virtual machine and not for earlier ones. This largely undermines the approach of compilation by erasure: since a new virtual machine is needed in any case, generics could have been supported in the JVM. A separate tool, Retroweaver, can convert Java 1.5 and higher class files to Java 1.4, 1.3 or 1.2, but depending on a third-party tool to repair the compiled class files is not attractive to most developers. We demonstrate how the problem is experienced by users trying to run modern bytecode on older JVMs.

    [scap]stg: export ALT=/etc/alternatives
    [scap]stg: alias java6c=$ALT/java_sdk_1.6.0/bin/javac
    [scap]stg: java6c -version
    javac 1.6.0
    [scap]stg: java6c GenericStack.java
    [scap]stg: alias java4=$ALT/jre_1.4.2/bin/java
    [beetgreens]stg: java4 -version
    java version "1.4.2"
    gij (GNU libgcj) version 4.1.1 20060525 (Red Hat 4.1.1-1)

    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    [scap]stg: java4 GenericStack
    Exception in thread "main" java.lang.ClassFormatError: GenericStack (unrecognized class file version)
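The "unrecognized class file version" error arises because every class file begins with a version stamp which the virtual machine checks before loading the class. As a minimal sketch of where that stamp lives, the following program reads the major version number from the first eight bytes of a class file. The class name ClassFileVersion and the method majorVersion are illustrative only, and the header bytes in main are constructed by hand rather than read from a real GenericStack.class.

```java
import java.nio.ByteBuffer;

/* A sketch, not part of the original programs: names here are illustrative. */
class ClassFileVersion {

    /* A class file starts with a 4-byte magic number (0xCAFEBABE), followed by
       a 2-byte minor version and a 2-byte major version, all big-endian. */
    static int majorVersion(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile);   // big-endian by default
        if (buf.getInt(0) != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return buf.getShort(6) & 0xFFFF;               // read as an unsigned 16-bit value
    }

    public static void main(String[] args) {
        /* Hand-built header with version 50.0, the version javac 1.6 writes by default */
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50
        };
        System.out.println("major version " + majorVersion(header));
    }
}
```

A 1.4.2 virtual machine accepts class files up to major version 48, so a file stamped 50 is rejected before any of its bytecode is examined, regardless of whether that bytecode uses any new instructions.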

An alternative is to ask the Java compiler to generate Java 1.4 compliant bytecode.

    [scap]stg: java6c GenericStack.java
    [scap]stg: file GenericStack.class
    GenericStack.class: compiled Java class data, version 50.0
    [scap]stg: java6c -target jsr14 GenericStack.java
    [scap]stg: file GenericStack.class
    GenericStack.class: compiled Java class data, version 48.0
    [scap]stg: java4 GenericStack
    c b a

Unfortunately it transpires that this use of the target flag is not recommended by the SUN compiler developers.

I should not mention the javac flag "-target jsr14", which causes javac to generate maximally-1.4-compatible code with the new language features enabled. That flag is useful for bootstrapping and testing javac, but you should not depend on it. If it doesn’t work as you want and you report a bug about it, you can expect the flag to be removed.

Neal Gafter, SUN lead Java compiler developer

One complication is that, because of separate compilation, class files must carry generic type variable information. This is added to the metadata in the class file, specifically as a new “Signature” attribute for classes, methods and fields.

Generics in C# are supported down into the Common Language Runtime. Their implementation therefore does not have the impact on typing which erasure does, and it can improve the performance of compiled code by removing casts and unnecessary “boxing” (compilation by erasure does not give any performance improvement).
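The two views of a generic class described above, the erased one which the bytecode operates on and the generic one kept as metadata, can both be observed through reflection. The following is a minimal sketch under that assumption: the wrapper class ErasureDemo and its method names are illustrative, and the cell class is nested only to keep the example in one file.

```java
import java.lang.reflect.Field;

/* A sketch, not part of the original programs: names here are illustrative. */
class ErasureDemo {

    /* The generic cell from the notes, nested so the sketch is self-contained. */
    static class Cell<A> {
        A value;
        Cell(A value) { this.value = value; }
        A get() { return value; }
        void set(A value) { this.value = value; }
    }

    /* Both instantiations share one run-time class: the type argument is erased. */
    static boolean sameRuntimeClass() {
        Cell<String> s = new Cell<String>("First");
        Cell<Integer> i = new Cell<Integer>(1);
        return s.getClass() == i.getClass();
    }

    /* The erased type of the field, which is what the bytecode manipulates. */
    static Class<?> erasedFieldType() {
        return valueField().getType();
    }

    /* The declared generic type of the field, recovered from class-file metadata. */
    static String genericFieldType() {
        return valueField().getGenericType().toString();
    }

    private static Field valueField() {
        try {
            return Cell.class.getDeclaredField("value");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());    // prints true
        System.out.println(erasedFieldType());     // prints class java.lang.Object
        System.out.println(genericFieldType());    // prints A
    }
}
```

The type variable A survives only as descriptive metadata. Nothing in the executed bytecode depends on it, which is why a Cell<String> and a Cell<Integer> cannot be told apart at run-time.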

Polymorphic classes in Objective Caml

Generic Java allows developers to parameterise classes by a generic class variable. The parameter must be a reference type, not one of the built-in ground types (such as int, boolean, float or double). Polymorphic classes in Objective Caml are parameterised by a generic type variable. Classes can be parametric over a reference type or one of the ground types (such as int, bool or float). We now see how we would re-implement the simple generic cells from our Java example in Objective Caml.

    (* File: programs/caml/Cell.ml *)
    class ['a] cell (value_init : 'a) =
      object
        val mutable value = value_init
        method get = value
        method set (new_value) = value <- new_value
      end ;;

This class definition is parameterised by a type variable 'a, and by the parameter to the constructor of the class. Caml does not support overloading, so a class has only a single constructor, unlike Java. The cell class is used in the Ocaml top-level loop thus.

    # let c = new cell(3);; (* create a new cell *)
    val c : int cell = <obj>

    # c#get ;; (* invoke the get method *)
    - : int = 3

    # c#set(4) ;; (* invoke the set method *)
    - : unit = ()

    # c#get;; (* invoke the get method again *)
    - : int = 4

Subclassing and specialisation

If we found that we used cells of strings almost exclusively in our application, it might be worthwhile to subclass the cell class and specialise it to a cell of strings. This can be done in a very similar fashion in Ocaml and Generic Java.

    (* File: programs/caml/StringCell.ml *)
    class stringCell (s : string) =
      object
        inherit [string] cell s
      end ;;

    /* File: programs/java/StringCell.java */
    class StringCell extends Cell<String> {
        StringCell(String s) { super(s); }
    }
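As a small illustration of what the specialisation buys, the sketch below uses a string cell without writing any type argument at the point of use, and then passes the same object where a cell of strings is expected. The wrapper class StringCellDemo and the method demo are illustrative names, and the two cell classes are nested only so that the example compiles as a single file.

```java
/* A sketch, not part of the original programs: names here are illustrative. */
class StringCellDemo {

    /* The generic cell from the notes, nested so the sketch is self-contained. */
    static class Cell<A> {
        A value;
        Cell(A value) { this.value = value; }
        A get() { return value; }
        void set(A value) { this.value = value; }
    }

    /* The specialised subclass: the type argument is fixed once, in the extends clause. */
    static class StringCell extends Cell<String> {
        StringCell(String s) { super(s); }
    }

    static String demo() {
        StringCell sc = new StringCell("First");
        String before = sc.get();   // statically a String, no cast written
        Cell<String> c = sc;        // a StringCell is usable as a Cell<String>
        c.set("Second");            // updates the same object
        return before + " " + sc.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints First Second
    }
}
```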

Bounded polymorphic methods

So far, we have used Java’s generic methods to define methods which are parameterised on a generic class variable. We know that in Java’s object polymorphism we can define methods which are parameterised on a specific class (and all of its subclasses). We can combine these to define methods where the generic class parameter is bounded above by a specific superclass. For example, class C<A extends B> defines a generic class C whose parameter (A) is a subclass of B.

As an example we will define a class of sorted lists, whose elements must be comparable. Before we progress to defining such a class we first implement a non-generic version, discover a problem with it, and then progress to defining a generic version. The non-generic version is called OrderedList and the generic version is called SortedList.

    /* File: programs/java/OrderedList.java */
    /* A non-generic version */
    public class OrderedList {

        /* Fields to hold the head of the list and the tail */
        public Comparable head;
        public OrderedList tail;

        /* A constructor for sorted lists */
        public OrderedList(Comparable head, OrderedList tail){
            this.head = head;
            this.tail = tail;
        }

        /* Insert a new element, maintaining ordering */
        public void insert (Comparable e) {
            OrderedList l = this;
            while(l.tail != null && l.head.compareTo(e) < 0) {
                l = l.tail;
            }
            if (l.head.compareTo(e) >= 0) {
                l.tail = new OrderedList(l.head, l.tail);
                l.head = e;
            } else {
                l.tail = new OrderedList(e, l.tail);
            }
        }
    }

The non-generic version of this class has the problem that ordered lists are inherently heterogeneous, as OO collections typically are. Thus it is possible to construct a list all of whose elements are comparable, but just not with each other. A list containing an integer, a string, a double and a float would be one such example. The classes Integer, String, Double and Float all implement java.lang.Comparable, but they are not pairwise comparable.

    /* File: programs/java/OrderedListTest.java */
    public class OrderedListTest {
        public static void main (String[] args) {
            OrderedList s = new OrderedList(new Integer(3), null);
            s.insert("a string");
            s.insert(new Double(3.0d));
            s.insert(new Float(3.1415));
            for( ; s != null ; s = s.tail) {
                System.out.println(s.head);
            }
        }
    }

This program compiles without warnings but fails at run-time with a class cast exception.

    [slim]stg: java OrderedListTest
    Exception in thread "main" java.lang.ClassCastException
            at java.lang.Integer.compareTo(Integer.java:955)
            at OrderedList.insert(OrderedList.java:19)
            at OrderedListTest.main(OrderedListTest.java:6)

The compareTo method throws this exception when it is applied to an object of an inappropriate type. This problem does not arise with generic classes, because an attempt to insert an element of the wrong type is rejected by the compiler.
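The failure can be reproduced in isolation, without the list. The following minimal sketch calls compareTo through the raw Comparable interface, as OrderedList.insert does, and reports what happens. The class name RawCompareDemo and the method rawCompare are illustrative, and the exception is caught and turned into a string purely so that both outcomes can be printed.

```java
/* A sketch, not part of the original programs: names here are illustrative. */
class RawCompareDemo {

    /* Compare through the raw Comparable interface, as OrderedList.insert does.
       The failure is reported as a string instead of letting the exception escape. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String rawCompare(Comparable a, Object b) {
        try {
            return "result " + a.compareTo(b);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        /* Integer against Integer: a negative result, since 3 is less than 5 */
        System.out.println(rawCompare(Integer.valueOf(3), Integer.valueOf(5)));
        /* Integer against String: prints ClassCastException */
        System.out.println(rawCompare(Integer.valueOf(3), "a string"));
    }
}
```

Written against the generic interface instead, the expression Integer.valueOf(3).compareTo("a string") does not compile, because the compareTo method of a Comparable<Integer> accepts only an Integer.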

Comparable compared

Non-generic Java defines a Comparable interface with only a single method:

    public abstract int compareTo(Object o)

In generic Java this interface is replaced by a generic interface, Comparable<T>, whose compareTo method takes a T rather than an Object. Thus, if we want to have a class E whose instances are comparable to other instances of class E we have

    E extends Comparable<E>

In our example we want a list of such elements.

    /* File: programs/java/SortedList.java */
    /* A bounded polymorphic sorted list class with
       generic class variable "E" */
    public class SortedList<E extends Comparable<E>> {

        /* Fields to hold the head of the list and the tail */
        public E head;
        public SortedList<E> tail;

        /* A constructor for sorted lists */
        public SortedList(E head, SortedList<E> tail){
            this.head = head;
            this.tail = tail;
        }

        /* Insert a new element, maintaining ordering */
        public void insert (E e) {
            SortedList<E> l = this;
            while(l.tail != null && l.head.compareTo(e) < 0) {
                l = l.tail;
            }
            if (l.head.compareTo(e) >= 0) {
                l.tail = new SortedList<E>(l.head, l.tail);
                l.head = e;
            } else {
                l.tail = new SortedList<E>(e, l.tail);
            }
        }
    }

This class is used as in the example below.

    /* File: programs/java/SortedListTest.java */
    public class SortedListTest {
        public static void main (String[] args) {
            SortedList<Integer> s = new SortedList<Integer>(new Integer(3), null);
            s.insert(new Integer(1));
            s.insert(new Integer(5));
            s.insert(new Integer(7));
            s.insert(new Integer(2));
            for( ; s != null ; s = s.tail) {
                System.out.println(s.head);
            } // prints 1, 2, 3, 5, 7
        }
    }

Java 1.5’s autoboxing feature allows the constructor invocations to be omitted in the source file. The compiler automatically inserts the conversion from int to Integer, as a call to Integer.valueOf.

    /* File: programs/java/SortedListTest3.java */
    public class SortedListTest3 {
        public static void main (String[] args) {
            SortedList<Integer> s = new SortedList<Integer>(3, null);
            s.insert(1); // autoboxing inserts
            s.insert(5); // the omitted
            s.insert(7); // conversions from
            s.insert(2); // int to Integer.
            for( ; s != null ; s = s.tail) {
                System.out.println(s.head);
            } // prints 1, 2, 3, 5, 7
        }
    }
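The bound used on the SortedList class can equally be placed on a single generic method, which is the form the section title refers to. The following is a minimal sketch of that idea rather than anything from the original programs: the class BoundedMethodDemo and the method max are illustrative names, and max assumes that it is given a non-empty list.

```java
import java.util.Arrays;
import java.util.List;

/* A sketch, not part of the original programs: names here are illustrative. */
class BoundedMethodDemo {

    /* A generic method with a bounded type variable: T must be comparable
       with itself, so compareTo can be called without any cast.
       Assumes the list is non-empty. */
    static <T extends Comparable<T>> T max(List<T> items) {
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        /* Autoboxing turns the int literals into Integer objects */
        System.out.println(max(Arrays.asList(3, 1, 5, 7, 2)));    // prints 7
        System.out.println(max(Arrays.asList("b", "c", "a")));    // prints c
    }
}
```

The type argument is inferred separately at each call, so the one method serves lists of integers and lists of strings, while a list whose element type has no suitable compareTo method is rejected at compile time.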

Sorted lists in Objective Caml

The generic sorted list class exercises the expressiveness of an object-oriented language quite thoroughly. Classes must be parameterised by a class variable (or a type variable) but this must support a comparison operator (such as the compareTo method).

We now revisit this example in Objective Caml. The first step is to define the abstract (non-instantiable) class for comparable objects. Abstract classes are labelled virtual in Objective Caml.

    (* File: programs/caml/Comparable.ml *)
    class virtual comparable =
      object(self : 'a)
        method virtual compareTo : 'a -> int
      end ;;

This example introduces some new notation. The first is the labelling of classes and methods as “virtual” (meaning the same as “abstract” in Java). The second is the reference of the object to itself (as self, or any other identifier, whereas we would use the reserved word this in Java). The third is the use of the type variable 'a to force the parameter of compareTo to be the same as the type of the object itself.

Constraints and open types

Where Java uses inheritance in requiring E to extend Comparable<E>, Caml takes a different approach in generalising the comparable type to an open type, #comparable, and then constraining the type parameter of the class to that.

Caml infers the types of expressions, even expressions which use objects and method invocation. Consider the following function.

    let lessThan(a,b) = a#compareTo(b) < 0;;

Caml infers the following type for this function.

    lessThan : < compareTo : 'a -> int; .. > * 'a -> bool

Here the double dots (“..”) in the type of the left operand indicate that the type remains open. We know that these objects have a compareTo method but we do not know all of their other methods.

This is the definition of sorted lists in Caml.

    (* File: programs/caml/SortedList.ml *)
    class ['a] sortedList (initial_value : 'a) =
      object
        val mutable elements = [initial_value]
        constraint 'a = #comparable
        method insert e =
          let rec ins e = function
              [] -> [e]
            | (head::tail) ->
                if head#compareTo(e) >= 0
                then e::(head::tail)
                else head::(ins e tail)
          in elements <- ins e elements
        method getElements = elements
      end;;

Now that we have sorted lists we define a class whose instances can be stored in these lists. We define a small integer class similar to Integer in Java. Our class has an intValue method to obtain the int value stored in an integer object. Our class implements compareTo as it is implemented in java.lang.Integer.

    (* File: programs/caml/integer.ml *)
    class integer (initialValue : int) =
      object(self)
        inherit comparable
        val i = initialValue
        method intValue = i
        method compareTo anotherInteger =
          let anotherVal = anotherInteger#intValue in
          let selfVal = self#intValue in
          if (selfVal < anotherVal) then -1
          else if (selfVal = anotherVal) then 0
          else 1
      end;;

Finally we can apply this definition to a small collection of arbitrarily-chosen integers, as before.

    (* File : programs/caml/sortedlisttest.ml *)
    let i = new integer(3);;
    let s = new sortedList(i);;
    s#insert(new integer(1));;
    s#insert(new integer(5));;
    s#insert(new integer(7));;
    s#insert(new integer(2));;
    let l = s#getElements;;

    List.iter (fun i -> print_int(i#intValue); print_newline()) l;;

Summary

• Polymorphic classes are implemented differently in Java and Objective Caml, but both have the capability of defining bounded polymorphic classes and ensuring type safety in operation.

• The implementation decisions made in the Java 1.5 compiler have been criticised by some.

• We took a sorted generic list class as our example and implemented this in both of these languages.