Advances in Programming Languages: Generics, Interoperability and Implementation
Stephen Gilmore
The University of Edinburgh
February 1, 2007

Understanding generic code

Generic Java extends Java with generic type variables which allow us to define generic classes and generic methods. Generic C# extends C# similarly. When adding genericity to an object-oriented programming language it is important to do this in a way which does not invalidate existing code. Legacy code (which does not use the generic features of the language) should continue to operate as before and there should be a smooth upgrade path from legacy code to generic code.

In Generic Java (Java 1.5) generics are compiled out by the Java compiler. Type variables are replaced by Object and downcasts are inserted automatically by the compiler instead of being coded manually by the programmer. The implications of this are that generic code is never slower than non-generic code (but there is no reason to think that it will be faster) while being more reliable than non-generic code.

The way-ahead-of-time optimising compiler gcj will be extended to deal with Java generics when GCC-4.3 is released. The gcj compiler will include the Eclipse Compiler for Java, ecj, as its front-end. This extended gcj could produce faster code by eliminating casts. This example shows use of SUN’s javac.

    /* File: programs/java/Cell.java */
    /* A generic, updateable cell */
    class Cell<A> {
        A value;
        Cell(A value) { this.value = value; }
        A get() { return value; }
        void set (A value) { this.value = value; }
    }

The above example is a simple generic cell which holds values. We now compare that generic Java class with its compiled bytecode representation. We can inspect the Java bytecode produced by the compiler by using the Java disassembler javap. We use the command

    javap -c Cell

When compiled to Java bytecode we see that the cell is operating on objects of class java.lang.Object.
UG4 Advances in Programming Languages — 2005/2006

    Compiled from "Cell.java"
    class Cell extends java.lang.Object{
    java.lang.Object value;

    Cell(java.lang.Object);
      Code:
       0: aload_0
       1: invokespecial #1; //Method java/lang/Object."<init>":()V
       4: aload_0
       5: aload_1
       6: putfield #2; //Field value:Ljava/lang/Object;
       9: return

    java.lang.Object get();
      Code:
       0: aload_0
       1: getfield #2; //Field value:Ljava/lang/Object;
       4: areturn

    void set(java.lang.Object);
      Code:
       0: aload_0
       1: aload_1
       2: putfield #2; //Field value:Ljava/lang/Object;
       5: return

    }

That was an example of how generic classes are defined. How are they used?

    /* File: programs/java/CellTest.java */
    class CellTest {
        public static void main (String[] args) {
            Cell<String> sc1 = new Cell<String>("First");
            String s1 = sc1.get(); /* No cast needed */
            System.out.println(s1);
        }
    }

As usual with generic classes, a cast is not written by the application developer, but it is automatically inserted by the compiler.

    Compiled from "CellTest.java"
    class CellTest extends java.lang.Object{
    CellTest();
      Code:
       0: aload_0
       1: invokespecial #1; //Method java/lang/Object."<init>":()V
       4: return

    public static void main(java.lang.String[]);
      Code:
       0: new #2; //class Cell
       3: dup
       4: ldc #3; //String First
       6: invokespecial #4; //Method Cell."<init>":(Ljava/lang/Object;)V
       9: astore_1
       10: aload_1
       11: invokevirtual #5; //Method Cell.get:()Ljava/lang/Object;
       14: checkcast #6; //class String
       17: astore_2
       18: getstatic #7; //Field java/lang/System.out:Ljava/io/PrintStream;
       21: aload_2
       22: invokevirtual #8; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
       25: return

    }

Perhaps the overriding motivation for taking this approach to implementing generics in Java is that the Java Virtual Machine should be unaffected by this approach. From the JVM point of view the bytecode which it executes would look just like it came from a (typesafe) object polymorphic method.
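The effect of erasure can be imitated by hand. The following is a minimal sketch, not taken from the lecture's sources: ObjectCell, ErasureSketch, firstValue and sameRuntimeClass are hypothetical names introduced only for illustration. It writes the Object-based cell and the downcast explicitly, which is roughly what the compiler generates for Cell<String>, and it checks that two differently instantiated generic cells share one runtime class.

```java
/* A generic cell, as in the text. */
class Cell<A> {
    A value;
    Cell(A value) { this.value = value; }
    A get() { return value; }
    void set(A value) { this.value = value; }
}

/* Hypothetical hand-erased counterpart: every A has become Object. */
class ObjectCell {
    Object value;
    ObjectCell(Object value) { this.value = value; }
    Object get() { return value; }
    void set(Object value) { this.value = value; }
}

/* Hypothetical demo class, not part of the lecture's code. */
class ErasureSketch {
    /* The cast written here by hand plays the role of the checkcast
       instruction which the compiler inserts in the generic version. */
    static String firstValue() {
        ObjectCell sc1 = new ObjectCell("First");
        return (String) sc1.get();
    }

    /* After erasure there is only one class Cell at run time,
       whatever the type argument was in the source. */
    static boolean sameRuntimeClass() {
        Cell<String> sc = new Cell<String>("First");
        Cell<Integer> ic = new Cell<Integer>(1);
        return sc.getClass() == ic.getClass();
    }

    public static void main(String[] args) {
        System.out.println(firstValue());
        System.out.println(sameRuntimeClass());
    }
}
```

The difference from the generic version is that nothing stops a careless programmer from writing the wrong cast against ObjectCell, whereas the compiler-inserted cast for Cell<String> is backed by the static type check.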
Because many manufacturers have implemented Java Virtual Machines it was a sensible design constraint to contain this change to the Java language within the compiler. Unfortunately SUN’s Java 1.5 compiler does not correctly fulfil the intended design because it compiles only for the 1.5 virtual machine, not for earlier ones. This largely undermines the approach of compilation by erasure and means that generics could have been supported in the JVM. A separate tool, Retroweaver, can convert Java 1.5 and higher class files to Java 1.4, 1.3 or 1.2, but depending on a third-party tool to repair the compiled class files is not attractive to most developers.

We demonstrate how the problem is experienced by users trying to run modern bytecode on older JVMs.

    [scap]stg: export ALT=/etc/alternatives
    [scap]stg: alias java6c=$ALT/java_sdk_1.6.0/bin/javac
    [scap]stg: java6c -version
    javac 1.6.0
    [scap]stg: java6c GenericStack.java
    [scap]stg: alias java4=$ALT/jre_1.4.2/bin/java
    [beetgreens]stg: java4 -version
    java version "1.4.2"
    gij (GNU libgcj) version 4.1.1 20060525 (Red Hat 4.1.1-1)
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    [scap]stg: java4 GenericStack
    Exception in thread "main" java.lang.ClassFormatError: GenericStack (unrecognized class file version)

An alternative is to ask the Java compiler to generate Java 1.4 compliant bytecode.

    [scap]stg: java6c GenericStack.java
    [scap]stg: file GenericStack.class
    GenericStack.class: compiled Java class data, version 50.0
    [scap]stg: java6c -target jsr14 GenericStack.java
    [scap]stg: file GenericStack.class
    GenericStack.class: compiled Java class data, version 48.0
    [scap]stg: java4 GenericStack
    c b a

Unfortunately it transpires that this use of the target flag is not recommended by the SUN compiler developers.
    I should not mention the javac flag "-target jsr14", which causes javac to
    generate maximally-1.4-compatible code with the new language features
    enabled. That flag is useful for bootstrapping and testing javac, but you
    should not depend on it. If it doesn’t work as you want and you report a
    bug about it, you can expect the flag to be removed.

        Neal Gafter, SUN lead Java compiler developer

One complication is that, because of separate compilation, it is necessary for class files to carry generic type variable information. This is added to the metadata in the class file, specifically as a new “Signature” attribute for classes, methods and fields.

Generics in C# are supported down into the Common Language Runtime. Their implementation therefore does not have the impact on typing which erasure does, and it can improve the performance of compiled code by removing casts and unnecessary “boxing” (compilation by erasure does not give any performance improvement).

Polymorphic classes in Objective Caml

Generic Java allows developers to parameterise classes by a generic class variable; the parameter must be a reference type, not one of the built-in ground types (such as int, boolean, float or double). Polymorphic classes in Objective Caml are parameterised by a generic type variable. Classes can be parametric over a reference type or one of the ground types (such as int, bool or float).

We now see how we would re-implement the simple generic cells from our Java example in Objective Caml.

    (* File: programs/caml/Cell.ml *)
    class ['a] cell (value_init : 'a) =
      object
        val mutable value = value_init
        method get = value
        method set (new_value) = value <- new_value
      end ;;

This class definition is parameterised by a type variable 'a, and by the parameter to the constructor of the class. Caml does not support overloading so a class has only a single constructor, unlike Java. The cell class is used in the OCaml top-level loop thus.
    # let c = new cell(3);; (* create a new cell *)
    val c : int cell = <obj>
    # c#get ;; (* invoke the get method *)
    - : int = 3
    # c#set(4) ;; (* invoke the set method *)
    - : unit = ()
    # c#get;; (* invoke the get method again *)
    - : int = 4

Subclassing and specialisation

If we found that we used cells of strings almost exclusively in our application it might be worthwhile to subclass the cell class and specialise it to a cell of strings. This can be done in a very similar fashion in OCaml and Generic Java.

    (* File: programs/caml/StringCell.ml *)
    class stringCell (s : string) =
      object
        inherit [string] cell s
      end ;;

    /* File: programs/java/StringCell.java */
    class StringCell extends Cell<String> {
        StringCell(String s) { super(s); }
    }

Bounded polymorphic methods

So far, we have used Java’s generic methods to define methods which are parameterised on a generic class variable. We know that in Java’s object polymorphism we can define methods which are parameterised on a specific class (and all of its subclasses). We can combine these to define methods where the generic class parameter is bounded above by a specific superclass. For example, class C<A extends B> defines a generic class C whose parameter (A) is a subclass of B.

As an example we will define a class of sorted lists, whose elements must be comparable. Before we progress to defining such a class we first implement a non-generic version, discover a problem with it, and then progress to defining a generic version. The non-generic version is called OrderedList and the generic version is called SortedList.
    /* File: programs/java/OrderedList.java */
    /* A non-generic version */
    public class OrderedList {
        /* Fields to hold the head of the list and the tail */
        public Comparable head;
        public OrderedList tail;

        /* A constructor for sorted lists */
        public OrderedList(Comparable head, OrderedList tail){
            this.head = head;
            this.tail = tail;
        }

        /* Insert a new element, maintaining ordering */
        public void insert (Comparable e) {
            OrderedList l = this;
            while(l.tail != null && l.head.compareTo(e) < 0) {
                l = l.tail;
            }
            if (l.head.compareTo(e) >= 0) {
                l.tail = new OrderedList(l.head, l.tail);
                l.head = e;
            } else {
                l.tail = new OrderedList(e, l.tail);
            }
        }
    }

The non-generic version of this class has the problem that ordered lists are inherently heterogeneous, as OO collections typically are.
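The heterogeneity problem can be made concrete with a small sketch. The OrderedList below is a compact copy of the class above; HeterogeneousDemo and mixedInsertFails are hypothetical names added for illustration and are not part of the lecture's code. The sketch assumes a list that already holds strings and then inserts an Integer: the compiler accepts this because both are Comparable, and the mistake only shows up at run time when a String is asked to compare itself with an Integer.

```java
/* Compact copy of the non-generic OrderedList from the text. */
class OrderedList {
    Comparable head;
    OrderedList tail;

    OrderedList(Comparable head, OrderedList tail) {
        this.head = head;
        this.tail = tail;
    }

    /* Insert a new element, maintaining ordering. */
    @SuppressWarnings("unchecked")
    void insert(Comparable e) {
        OrderedList l = this;
        while (l.tail != null && l.head.compareTo(e) < 0) {
            l = l.tail;
        }
        if (l.head.compareTo(e) >= 0) {
            l.tail = new OrderedList(l.head, l.tail);
            l.head = e;
        } else {
            l.tail = new OrderedList(e, l.tail);
        }
    }
}

/* Hypothetical demo class, not part of the lecture's code. */
class HeterogeneousDemo {
    /* Returns true if mixing element types fails at run time. */
    static boolean mixedInsertFails() {
        OrderedList l = new OrderedList("b", null);
        l.insert("a");                     /* fine: strings compare with strings */
        try {
            l.insert(Integer.valueOf(3));  /* accepted by the compiler */
            return false;
        } catch (ClassCastException ex) {
            return true;                   /* String.compareTo rejects the Integer */
        }
    }

    public static void main(String[] args) {
        System.out.println(mixedInsertFails());
    }
}
```

Nothing in the type of OrderedList records which kind of Comparable it holds, so the error cannot be reported at compile time. This is the gap which a bounded generic parameter is meant to close.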