<<

JaCIL: A CLI to JVM Technical Report

Almann T. Goo [email protected] Rochester Institute of Technology Department of Computer Science

July 25, 2006 Abstract

Both the .NET Framework via the Common Language Infrastructure (CLI) [15] and the Java Virtual Machine (JVM) [14] provide a managed, virtual exe- cution environment for running high-level virtual . As these two platforms become ubiquitous, it is of pragmatic and academic interest to have the capability of running CLI code on the JVM and vice-versa. This report describes JaCIL (pronounced “jackal”), a project that implements a CLI to JVM class file byte-code compiler framework. Also pro- vided by JaCIL, is a user-land tool interfacing this framework, and a run- time to support translated code. Contents

1 Introduction 11

1.1 The Java Virtual Machine ...... 11

1.2 The Common Language Infrastructure ...... 12

1.3 Related Work ...... 13

1.3.1 Previous Translation Efforts and Platform Comparisons 13

1.3.2 IKVM.NET Java Virtual Machine ...... 13

1.3.3 Visual MainWin for J2EE ...... 14

1.3.4 Retroweaver ...... 14

1.3.5 The and DotGNU Projects ...... 14

2 JaCIL Software Design 16

2.1 Functional Specification ...... 16

2.1.1 JaCIL Compiler Framework ...... 16

2.1.2 JaCIL Run-time Library ...... 17

2.2 Technical Design and Architecture ...... 17

2.2.1 Compiler Framework Overview ...... 18

2.2.2 The Compiler API ...... 18

2.2.3 The Context API ...... 21

2.2.3.1 Standard Context API Implementation .... 23

1 2.2.4 The Translator API ...... 23

2.2.4.1 Standard Translator API Implementation .. 26

2.2.5 The Utility API ...... 28

2.2.6 The jacilc Application ...... 28

3 CLI Feature Specification 30

3.1 Assemblies ...... 30

3.1.1 Assembly Translation Limitations ...... 31

3.2 Assembly Entry Points ...... 31

3.2.1 Entry Point Translation Limitations ...... 31

3.3 Modules ...... 32

3.4 Types ...... 32

3.4.1 Type Translation Limitations ...... 33

3.4.1.1 Name Mangling ...... 33

3.4.1.2 Array Support ...... 33

3.4.1.3 Generics ...... 34

3.4.1.4 Inner Classes ...... 34

3.4.1.5 Enumerations ...... 34

3.4.1.6 Delegates ...... 34

3.5 Managed Pointers ...... 34

3.5.1 Managed Pointer Translation Limitations ...... 35

3.6 Value Types ...... 35

3.6.1 Value Type Translation Limitations ...... 35

3.7 Type Definitions ...... 35

3.7.1 Value Type Definitions ...... 37

3.7.2 Type Definition Translation Limitations ...... 37

2 3.8 Field Definitions ...... 37

3.8.1 Field Definition Translation Limitations ...... 37

3.9 Method Definitions ...... 39

3.9.1 Method Name Mangling ...... 39

3.9.2 Local Variables ...... 41

3.9.3 Support ...... 42

3.9.4 Constructors ...... 42

3.9.5 Method Definition Translation Limitations ...... 43

3.10 JaCIL Implementation of Standard CLI API ...... 44

3.10.1 System.Object Implementation ...... 44

3.10.2 System.Type Implementation ...... 45

3.10.3 Other Classes ...... 45

3.11 Instruction Set Support ...... 45

3.11.1 Non-operation Instructions ...... 45

3.11.2 Stack Instructions ...... 48

3.11.2.1 Stack Duplication of Value Types ...... 48

3.11.3 Local Variable and Argument Instructions ...... 48

3.11.3.1 Implicit Conversions ...... 49

3.11.3.2 Loading and Storing to Boxed Local Variables 49

3.11.3.3 Value Type Loading ...... 50

3.11.4 Constant Loading Instructions ...... 50

3.11.5 Arithmetic Instructions ...... 50

3.11.5.1 Arithmetic Instruction Translation Limitations 52

3.11.6 Numeric Conversion Instructions ...... 53

3.11.7 Branch and Comparison Instructions ...... 53

3.11.7.1 Unconditional Branches ...... 55

3 3.11.7.2 The switch Instruction ...... 55

3.11.7.3 Comparison and Other Conditional Branches 55

3.11.8 Method Invocation and Return Instructions ...... 57

3.11.8.1 Method Invocation on Value Types ...... 57

3.11.8.2 Method Invocation Translation Limitations . 58

3.11.9 Static and Instance Field Access Instructions ...... 58

3.11.9.1 Field Access on Value Types ...... 59

3.11.10Object Creation and Initialization ...... 59

3.11.11Vector Instructions ...... 60

3.11.11.1Vector Element Access ...... 60

3.11.11.2Vector Creation ...... 62

3.11.11.3Vector Length ...... 62

3.11.12Exception Instructions ...... 62

3.11.12.1Exception Throwing ...... 62

3.11.12.2Leaving Protected Blocks ...... 62

3.11.12.3Leaving Finally Blocks ...... 63

3.11.13Indirection Instructions ...... 63

3.11.14Address Load Instructions ...... 64

3.11.14.1Local Variable Address Load ...... 64

3.11.14.2Static and Instance Field Address Load .... 65

3.11.14.3Array Element Address Load ...... 65

3.11.15Value Type and Reference Type Instructions ...... 65

3.11.15.1Boxing and Unboxing ...... 65

3.11.15.2The initobj instruction ...... 66

3.11.15.3Reference Type Checking and Conversion ... 66

3.12 Translation Examples ...... 67

4 3.12.1 Basic Operations ...... 67

3.12.2 Exceptions ...... 67

3.12.3 Value Types and Managed Pointers ...... 68

4 JaCIL User Guide 77

4.1 Building JaCIL ...... 77

4.2 Using the Compiler Tool ...... 78

4.3 Running Compiled Code ...... 80

5 Performance and Benchmarking 82

5.1 Blowfish Cipher ...... 83

5.2 MD5 Message Digest ...... 84

5.3 Line Drawing Algorithm with Value Types ...... 84

5.4 Bresenham’s Line Drawing Algorithm with Managed Pointers 85

6 Conclusions 86

6.1 Lessons Learned ...... 86

6.2 Future Work ...... 87

6.2.1 Missing Compiler Features ...... 87

6.2.2 CLI Standard API ...... 87

6.2.3 JaCIL on the JVM ...... 87

6.2.4 CLI Implementation ...... 87

5 List of Figures

2.1 Compiler API class relationships ...... 20

2.2 Context API interface hierarchy ...... 22

2.3 Cross-section of the Context API interface relationships .... 23

2.4 Standard Context API control flow ...... 24

2.5 Cross-section of Translator API interface relationships .... 25

2.6 IInstructionTranslator creation flow in standard factory implementation ...... 27

2.7 jacilc control flow ...... 29

3.1 Value type translation memory diagram for local variables .. 36

3.2 Sample # method definition with basic operations ...... 69

3.3 CIL code compiled from source in figure 3.2 ...... 70

3.4 Java byte-code generated by JaCIL from the code in figure 3.3 71

3.5 Sample C# method definition with a catch handler ...... 72

3.6 CIL code compiled from source in figure 3.5 ...... 73

3.7 Java byte-code generated by JaCIL from the code in figure 3.6 74

3.8 Sample C# method definition with value type operations ... 75

3.9 CIL code compiled from source in figure 3.8 ...... 75

3.10 Java byte-code generated by JaCIL from the code in figure 3.9 76

6 4.1 jacilc.exe output with -? switch ...... 79

4.2 Using jacilc.exe to compile an assembly to a JAR file ... 80

4.3 Running translated code ...... 80

4.4 Sample C# “Hello World” type program ...... 81

7 List of Tables

2.1 JaCIL Compiler Framework namespaces ...... 18

2.2 ICompiler primary methods ...... 19

2.3 ITypeTranslator translation methods ...... 26

2.4 JaCIL Compiler utility primary classes ...... 28

3.1 JaCIL built-in type mapping ...... 32

3.2 JaCIL managed pointer type mapping ...... 33

3.3 Supported CLI type attributes and their JVM translations .. 36

3.4 Supported CLI field attributes and their JVM translations .. 38

3.5 Supported CLI method attributes and their JVM translations 40

3.6 Supported CLI implementation attributes and their JVM trans- lations ...... 41

3.7 java.lang.Object method overrides in System.Object .. 44

3.8 JaCIL supported instructions add through ldelem.u1 .... 46

3.9 JaCIL supported instructions ldelem.u2 through xor .... 47

3.10 CIL Local variable and argument load instructions ...... 48

3.11 JVM Local variable load instructions ...... 49

3.12 CIL Local variable and argument store instructions ...... 49

3.13 JVM Local variable store instructions ...... 49

3.14 CIL field access instruction translation semantics ...... 50

8 3.15 JVM numeric instruction prefixes ...... 51

3.16 JaCIL translations for CIL arithmetic operations on integers and floating point values ...... 52

3.17 CIL Arithmetic Instructions for integer types only ...... 52

3.18 CIL conversion instruction target types ...... 53

3.19 CIL conversion instruction target JVM stack types ...... 54

3.20 Additional operations for small integer conversion ...... 54

3.21 Translation semantics for brtrue and brfalse ...... 55

3.22 Translation semantics for CIL comparison branch instructions 56

3.23 CIL supported comparison instructions and their branch coun- terparts ...... 57

3.24 CIL ret translation semantics ...... 58

3.25 CIL field access instruction translation semantics ...... 58

3.26 Supported CIL vector load instruction variants ...... 60

3.27 JVM array element access instructions ...... 61

3.28 Supported CIL vector store instruction variants ...... 61

3.29 JaCIL base managed pointer types in the JaCIL.Runtime.- Pointers package ...... 63

3.30 Supported CIL indirect load instruction variants ...... 64

3.31 Supported CIL indirect store instruction variants ...... 64

4.1 JaCIL NAnt build targets ...... 78

4.2 jacilc.exe command-line options ...... 79

5.1 Benchmarking systems used in JaCIL benchmarking ..... 83

5.2 Blowfish benchmarking results ...... 83

5.3 MD5 benchmarking results ...... 84

5.4 Value type line drawing benchmarking results ...... 84

9 5.5 Bresenham’s line drawing benchmarking results ...... 85

10 Chapter 1

Introduction

The concept of a Virtual Machine (VM) is far from a new concept, but un- doubtedly with the advent of Sun’s Java Platform it has become far more popular as application code can be compiled to a platform independent im- age and executed on any machine that has a Java Virtual Machine (JVM) implementation [11]. has since introduced its own competing .NET Framework platform that has similar aims.

With the influence that a major company like Microsoft has, it would seem destined that its platform shall also rise in popularity, creating two sets of applications and libraries that exist on one platform or the other. Even though both platforms have strikingly similar features [11], they are not binary compatible with one another. Thus, “platform independent” code from one execution environment is unavailable to the other.

Since each platform itself can be viewed as another machine, albeit much more high-level than a physical machine, it would seem reasonable then, that an implementation of either platform could be created on the other. This has in fact been done in the IKVM.NET [10] project, which implements the JVM on the CLI–this is discussed in section 1.3.2. With that said, the author knows of no such open implementation for other direction–a full CLI implementation on the JVM.

1.1 The Java Virtual Machine

As many have stated, the JVM was developed as an execution environ- ment for the Java Programming Language [11, 18, 12]. The JVM is an abstract stack-based machine with some of its major characteristics noted

11 below: [14]

• Direct operations for primitive integers and floating point values. • A notion of a reference which is an address, but no primitives to ma- nipulate the value of such an address (e.g. taking the address of a local variable or incrementing such an address). • A run-time heap, shared by all program threads, where all objects are allocated; complex types are never allocated in a stack frame. • A private per-thread stack to store frames of method executions, con- taining an activation’s operand stack and local variables.

JVM class files, the binary image of a loadable component (class), contain various that define the class, its fields, and its methods. This structure also defines symbolic information accessed by byte-code instruc- tions in method definitions [14].

1.2 The Common Language Infrastructure

The CLI, from a high level, has much in common with the description given of the JVM in section 1.1. It too, has a stack-based architecture, but has features that make it more “friendly” for various programming language paradigms to target [13]. Given its high-level similarities, some of the com- pelling differences are highlighted below: [8, 15, 11]

• The CLI instruction set tends to be more generalized for operations such as arithmetic–most equivalent JVM instructions have the type information embedded in them. • The CLI supports user defined value types that can be allocated di- rectly on the stack or directly within the memory of another object. • The concept of a managed pointer, which is like a JVM reference address, but with additional instruction support to manipulate such pointers. More specifically, the CLI provides instructions that take an address of a local variable, argument, field, or method, and instruc- tions to perform indirection on such a pointer. • Support for tail calls. The CLI provides support for invoking a method and discarding the stack frame of the caller. This is essential for lan- guages that depend on tail recursion as a looping mechanism (e.g. Scheme).

12 CLI assemblies are the binary image of a loadable component in the CLI. Unlike their Java equivalent, they can consist of numerous types (classes) and contain global metadata as well as metadata for each class [8]. In fact it is not difficult to draw the parallel between Java Archive (JAR) files and CLI assemblies [15]. CLI assemblies, much like their Java analog contain similar types of metadata.

1.3 Related Work

The following subsections describe related work and relevance to the JaCIL project.

1.3.1 Previous Translation Efforts and Platform Com- parisons

Perhaps the most prominent work in translating from the CLI to the JVM comes from Shiel and Bayley who have actually implemented a rudimen- tary translator from the CLI to the JVM [17]. The intent of their work was to discuss how the semantics of the CLI compare with the JVM and to use this study as a basis for formalizing CLI semantics [17].

These surveys provided some good insight as to various approaches that could be taken when translating CLI semantics. Shiel and Bayley’s high- level descriptions of translating certain CLI semantics were considered when implementing such semantics in JaCIL.

1.3.2 IKVM.NET Java Virtual Machine

IKVM.NET is an open source implementation of the JVM on the CLI. It provides a VM implementation consisting of a JVM to CLI JIT compiler, Java Standard API (via GNU Classpath), and support tools like an AOT compiler and an analog of the java command [10].

IKVM.NET, which is essentially the reverse implementation of the JaCIL project, served a very important role for this project. Besides serving as inspiration for the project itself, leveraging the IKVM.NET run-time in the JaCIL Compiler allowed JaCIL access to practically any Java API in addi- tion to the APIs already available in the CLI. In particular, the ObjectWeb ASM Framework (ASM) [6] is used via the IKVM.NET to generate JVM class files.

13 1.3.3 Visual MainWin for J2EE

Visual MainWin for J2EE is a commercial product that consists of an AOT byte-code compiler from CLI assemblies to JVM class files, a run-time layer, and an implementation of the .NET Framework Standard API (via the Mono Project) [7]. It is in essence, a commercial offering of what JaCIL is aiming to provide.

Unfortunately, not much can be leveraged from this product because it is a closed, proprietary system. Also, the target audience of this product seems to be Java 2 Enterprise Edition (J2EE) integration from Visual Studio [7], so it is not clear to the author how general purpose the product aims to be.

It should be noted that Mainsoft, the producer of this software, does make open source contributions to the Mono Project (see 1.3.5), specifically to Mono’s class libraries [7]. Such contributions could be leveraged in a future effort to use JaCIL to translate or port such APIs.

1.3.4 Retroweaver

This project provides a byte-code converter for Java 1.5 byte-code to Java 1.4 compatible byte-code. This project also provides a thin compatibility run-time emulating semantics not natively present on legacy JVM imple- mentations [1]. Retroweaver is similar to the JaCIL project in that its goals parallel this project–the key difference, of course, is that it is a JVM-to-JVM translation system.

1.3.5 The Mono and DotGNU Projects

Both the Mono [5] and the DotGNU [3] projects are open source projects providing an implementation of the .NET Framework including the CLI and the .NET Framework Standard API.

These projects point out implementations of the CLI other than that of Mi- crosoft and could be leveraged in future directions of this project. Specif- ically, having a working CLI to JVM byte-code compiler would enable the CLI byte-code portions of the libraries to be converted to JVM byte-code thus eliminating the need to directly implement a considerable portion of the CLI standard API in Java. This would leave only the “native” portions of the CLI standard API to port to Java–the end result being nearly a com- plete CLI standard API implementation in Java.

This is in fact the strategy the IKVM.NET JVM implementation employs

14 by using a statically JVM to CLI byte-code compiled version of the GNU Classpath Java Standard API implementation [10, 4].

Furthermore, the author used Mono 1.1.13 as the principal development environment for implementing the compiler and other code targeting the CLI for the project.

15 Chapter 2

JaCIL Software Design

This chapter describes the architecture and design of the JaCIL project.

2.1 Functional Specification

The JaCIL Project consists of a two major components: the JaCIL Compiler Framework and the JaCIL Run-time Library.

2.1.1 JaCIL Compiler Framework

The compiler framework consists of the byte-code compiler to translate CLI compiled assemblies into equivalent JVM class files. This framework trans- lates types (e.g. classes), fields, methods, and appropriate metadata de- fined in a CLI assembly into the equivalent JVM analog. This system also translates program code written in Common Intermediate Language (CIL) instructions–the byte-code language of the CLI–that comprises the code within method definitions.

The compiler framework provides a public CLI API for working with, ex- tending, and embedding the compiler. Public interfaces are provided by the framework’s API to allow a developer to embed the compiler. The frame- work also provides the concept of compiler profiles which allow a developer to extend or change the behavior of the compiler (this is discussed in detail in section 2.2)

A front-end application, jacilc, using the standard implementation and

16 profile of the framework is provided to be used as a compiler tool much like Java’s javac. This tool provides a command-line interface allowing usage of the framework by an end-user.

The JaCIL Compiler Framework does not leverage new features introduced in Microsoft’s .NET Framework 2.0. As such, it is known to compile and run on Mono 1.1.13 or later, Microsoft’s .NET Framework 1.1, and Microsoft’s .NET Framework 2.0.

Code translated by the compiler framework is designed to run on the Java 1.5 virtual machine–lower versions are not supported and are known not to be able to execute JaCIL translated code.

Chapter 3 provides the specification of how CLI semantics are translated by the standard implementation (i.e. standard profiles) of the framework. Furthermore, the binary distribution of JaCIL contains the complete frame- work API specification.

2.1.2 JaCIL Run-time Library

The run-time library is a Java “hosting” library that is used to provide the support routines needed to run translated classes on the JVM. This library is designed to target the Java 1.5 virtual machine.

The run-time library provides two major sets of support classes. The first is a subset of the classes found in the CLI standard API’s System namespace. The functional scope of this subset is limited to an implementation suitable for basic translation (e.g. a System.Object implementation is provided because all CLI types are derived from it). The second is a JaCIL only namespace, JaCIL.Runtime, which provides support classes to emulate semantics that do not have a direct JVM equivalent.

Chapter 3 specifies to what extent the standard CLI API is supported and the classes defined to emulate CLI features in the run-time library. Also, the binary distribution of JaCIL contains the complete run-time library API specification.

2.2 Technical Design and Architecture

The following subsections describe in detail the design of the JaCIL Com- piler Framework. Since the JaCIL Run-time Library is inextricably linked to the translation semantics, this system is discussed in further detail in chapter 3.

17 2.2.1 Compiler Framework Overview

The compiler framework is a single CLI assembly. The namespaces that are used to logically separate sub-components of the framework are listed in table 2.1.

Namespace Description JaCIL.Compiler Top-level API and system exceptions JaCIL.Compiler.Instruction Instruction and analysis API JaCIL.Compiler.Context The compiler context API interfaces JaCIL.Compiler.Translator The compiler translator API interfaces JaCIL.Compiler.Standard Standard implementation of the .Context context API JaCIL.Compiler.Standard Standard implementation of the .Translator translator API JaCIL.Compiler.Standard Instruction-level implementation of the .Translator.Instruction translator API JaCIL.Compiler.Util Compiler utility API

Table 2.1: JaCIL Compiler Framework namespaces

For reading the binary format of CLI assemblies, the compiler framework depends on the Mono Cecil Project [9]. To emit the binary format of the JVM, the Java based ObjectWeb ASM Framework [6] is used in conjunction with IKVM.NET [10].

2.2.2 The Compiler API

The JaCIL.Compiler namespace provides high-level interfaces to the JaCIL Compiler. The root compiler interface, ICompiler, defines two public meth- ods to invoke the compilation process (listed in table 2.2). This API also defines all exception classes used throughout the framework, as well as a compiler specific data structure and collection interface. The data struc- ture, ClassInfo, provides a data structure consisting of an ObjectWeb ASM ClassWriter and a string name for a generated class–this is struc- ture is provided because the ClassWriter class does not store the class name. The collection interface, IClassInfoSet, represents an aggrega- tion of ClassInfo objects and has a standard implementation (i.e. ClassInfoSet).

Also provided in this API is the ICompilerReferences interface. This interface defines functionality to look up information about CLI types that may or may not be within direct context of the compiler instance (e.g. types being used by an assembly, but not being compiled by a compiler instance).

18 Method Description CompileToClassWriter() Invokes the compiler on the named assemblies and returns the re- sulting compiled classes into IClassInfoSet data structure. CompileToJar() Invokes the compiler on the named assemblies and generates a JAR file directly.

Table 2.2: ICompiler primary methods

This interface is intended to provide additional information during the com- pilation process that is not otherwise available from a CLI assembly being translated (e.g. querying if a referenced type is an interface or a class). This functionality is similar to passing references to a C# compiler or using the class path when invoking the Java compiler. The standard implementation, CompilerReferences, allows a user to specify both assembly objects from the standard CLI reflection API and assembly definition objects from the Mono Cecil Framework. This implementation also allows for automatically using core, system references: the mscorlib and IKVM.GNU.Classpath (e.g. the Java standard API provided by IKVM.NET) assemblies.

The ICompiler interface serves as the top-level context for an entire com- piler session. It provides, to lower parts of the framework, the ICompiler- References that are to be used to during the compilation process, a mes- saging object for user output, the JVM class file version to generate, and the context profile (section 2.2.3) and translator profile (section 2.2.4) that drive the behavior of the compiler. An ICompiler implementation loads the CLI assemblies to be compiled and creates and delegates to contexts representing them (see section 2.2.3). The standard compiler implementa- tion, Compiler, uses the standard context and translator profiles, but can be overridden with user-defined instances of such profiles.

Figure 2.1 depicts the class relationship in this API.

19 1 org.objectweb.asm.ClassWriter ClassInfo 0..*

<> IClassInfoSet

ClassInfoSet <> ICompiler

<> 1 JaCIL.Context.IContextProfile Compiler

<> 1 JaCIL.Translator.ITranslatorProfile

<> 1 ICompilerReferences 0..* Mono.Cecil.AssemblyDefinition

0..* System.Reflection.Assembly CompilerReferences

Figure 2.1: Compiler API class relationships

20 2.2.3 The Context API

The JaCIL.Compiler.Context namespace provides the interfaces that abstract the logical structure of a CLI assembly. The main purpose of these interfaces is to provide a JaCIL specific abstraction over the Mono Cecil representation of an assembly. Implementations of the Context API also provide analysis and information for the compilation process, and delegate to JaCIL.Compiler.Translator interface implementations as needed to execute the actual translation.

There is a context for each logical aspect of a CLI assembly supported by the compiler. For example, a type definition in an assembly is represented by the ITypeContext interface. For each context type, there is a factory in- terface that defines the protocol for creating instances–for ITypeContext, this is ITypeContextFactory. Since there are numerous contexts that can be produced and a factory exists for each, a top-level context profile defines how to retrieve factories–this is the IContextProfile interface. This multi-tiered design allows for maximal flexibility and point customiza- tion. For instance, a developer could define a custom IContextProfile implementation that defined a completely different scheme for producing factories and instances or simply extend the standard ContextProfile and overload a single property to provide a new factory for generating ITypeContext instances. Figures 2.2 and 2.3 illustrate an example cross section of the interface relationships for the Context API.

Each context interface defines a TraverseXXX() method, where XXX is dependent on each context interface. These methods are invoked in an im- plementation specific manner during the compilation process. It is during traversal, that a context implementation creates a translator object for its context and delegates to it to generate the actual JVM translation for that context (see section 2.2.4 for details on the translator API).

One part of this API that does not follow the 1-to-1 relationship of CLI assembly construct to context object is the IInstructionAnalyzer inter- face. This interface is responsible for doing most of the analysis on the CLI instruction stream. The JaCIL.Compiler.Instruction namespace pro- vides instruction level primitives to support this analysis. This namespace includes classes to support operand stack analysis, branch source/target analysis, and a higher level abstraction of the CLI instruction stream. An IInstructionAnalyzer implementation specifies one method, Generate- InstructionStream(), that is required to perform such analysis.

21 <> IInstructionAnalyzer

<> <> IAssemblyContext IContext

<> IModuleContext

<> <> IInstructionContext ITypeContext

<> IMemberContext

<> <> IFieldContext IMethodContext

Figure 2.2: Context API interface hierarchy

22 <> IContextProfile

1 1 <> <> ITypeContextFactory IMethodContextFactory

Creates 0..* Creates 0..* <> <> ITypeContext IMethodContext

Figure 2.3: Cross-section of the Context API interface relationships

2.2.3.1 Standard Context API Implementation

JaCIL’s standard implementation of the Context API is defined in the JaCIL.- Compiler.Standard.Context namespace. This implementation provides the standard compilation flow for the compiler. This flow is depicted in fig- ure 2.4.

Because instruction translation from the CLI to the JVM is not a direct 1- to-1 process, this implementation also provides special implementations of IInstructionContext that are used to represent synthetic instructions that are particular to JVM translation. An example of this is the NewDup- InstructionContext, which is used for translating object creation op- codes in the CLI.

2.2.4 The Translator API

This API is represented by the JaCIL.Compiler.Translator namespace. Interfaces defined in this namespace represent the components of the trans- lator that generate JVM byte-code with ObjectWeb ASM Framework classes.

The structure of this API parallels the Context API very closely. Where

23 Create and Delegate to Instruction Translator Traverse Assembly Context

All Modules Translated Traverse to Create and Delegate to Instruction Context Assembly Translator in Instruction Stream

Untranslated Modules Remain

Untranslated Instructions Remain All Types Translated Create and Traverse to Module Context

Create and Delegate to Method Translator

Instructions Translated Create and Delegate to Create and Module Translator Delegate to Instruction Analyzer Untranslated Types Remain All Methods Translated

Create and Traverse to Create and Traverse to Type Context Method Context

Untranslated Methods Remain

All Fields Translated

Create and Delegate to Create and Delegate to Field Translator Type Translator

Untranslated Fields Remain

Create and Traverse to Field Context

Figure 2.4: Standard Context API control flow

24 the Context API provides the analysis and abstraction of the CLI assem- bly to be translated, the Translator API defines how each context is to be converted to JVM semantics.

For each Context API interface, IXXXContext (e.g. IMethodContext), there is an equivalent IXXXTranslator (e.g. IMethodTranslator) inter- face. This follows similarly for factory interfaces and the overall translator profile, ITranslatorProfile. Very similar to the Context API, figure 2.5 represents an example cross-section of this relationship.

<> ITranslatorProfile

1 1 <> <> IAssemblyTranslatorFactory IModuleTranslatorFactory

Creates 0..* Creates 0..* <> <> IAssemblyTranslator IModuleTranslator

Figure 2.5: Cross-section of Translator API interface relationships

Each translator interface provides a set of TranslateYYY() methods spe- cific to each translator. These methods are invoked in an implementa- tion specific manner during the compilation process from the translator’s context. As an example, table 2.3 shows the translation methods for the ITypeTranslator interface.

There is one interface that has no Context API analog, IEmitter. This interface represents general byte-code generation routines, and is intended to be context interface independent. There are numerous unrelated trans- lation points within the standard implementation that generate similar streams of JVM instructions. Instead of duplicating the code to emit these streams at each point, the standard implementation defines implementa- tions of IEmitter to centralize the related generation code.

25 Method Description TranslateExceptionTable() Generates the JVM exception table for the method being translated. TranslateInstructionStream() Rearranges the instruction stream for any JVM specific reasons. TranslateMethodDefinition() Defines the JVM method and re- turns the appropriate ClassInfo object for instructions to be emitted to. TranslateMethodSupportClasses() Emits any JVM classes that are needed to support the translation of the method.

Table 2.3: ITypeTranslator translation methods

2.2.4.1 Standard Translator API Implementation

The standard implementation of the Translator API is contained within two namespaces: JaCIL.Compiler.Standard.Translator and JaCIL.Compiler.- Standard.Translator.Instruction. The individual IInstruction- Translator and IEmitter implementations are contained within the lat- ter, while the rest of the standard implementation is contained within the former.

The IInstructionTranslatorFactory implementation, Instruction- TranslatorFactory, is probably of most interest within the standard implementation. This class is responsible for creating IInstruction- Translator objects based on the instruction represented by the IInstruction- Context being translated. Within this implementation, a switch is exe- cuted over the op-code of the instruction to generate a translator instance for. Instead of constructing the standard IInstructionTranslator from the switch target, a virtual, protected method that proxies the object cre- ation is invoked. The rationale for this approach is that a custom implemen- tation that wants to leverage most of the standard translation semantics need only override these proxy methods to instantiate custom implementa- tions of IInstructionTranslator. Figure 2.6 illustrates this.

26 Create Appropriate Instruction Translator

Invoke Appropriate Factory "Create" Method Create Instruction Translator for Instruction Translator for Instruction Context

Supported Instruction

Throw Translation Switch on Op-code Exception

Unsupported Instruction

Since the "gray" action is a virtual method, a custom implementation can simply override it, this way the switch block does not need to be re-implemented.

Figure 2.6: IInstructionTranslator creation flow in standard factory implementation

27 2.2.5 The Utility API

The JaCIL.Compiler.Util namespace contains classes that provide com- mon operations that are used by other parts of the framework and are not necessarily tied to the standard implementation. Table 2.4 describes briefly the major classes in this namespace.

Utility Class Description Mangler Provides the standard name man- gling from CLI names to JVM names (e.g. the mapping between the CLI constructor name, .ctor, and the JVM equivalent, ). TypeMapper Provides an abstraction on top of Mangler for mapping various CLI type information to JVM equiva- lents (e.g. mapping the CLI type name System.Int32 as JVM de- scriptor I). TypeMetrics Encapsulates various information that comprises a CLI type for use in translating into equivalent seman- tics on the JVM. Used by various parts of the compiler to preserve the semantics of the CLI primi- tives, some of which have no direct JVM equivalent.

Table 2.4: JaCIL Compiler utility primary classes

2.2.6 The jacilc Application

The front-end application for the compiler framework is a CLI assembly with a simple entry point. The command line arguments are parsed with the Mono.GetOptions library and based on user provided arguments, an instance of JaCIL.Compiler.Compiler is constructed, and CompileToJar() is invoked on the instance to run the translation process. Figure 2.7 shows a high-level control flow for the compiler tool.

Classes comprising the jacilc.exe CLI assembly are defined in the names- pace JaCIL.Compiler.Tool.

28 Parse Options

Well Formed Options Invalid Options Create Compiler Display References Error Message

Create Standard Compiler Instance

Invoke Compiler to Generate JAR

Figure 2.7: jacilc control flow

29 Chapter 3

CLI Feature Specification

The following sections describe the translation semantics of the JaCIL Com- piler. This chapter defines the extent of the CLI that can be translated, and the limitations (i.e. caveats) of such translations.

The following sections follow closely in structural order to topics listed in Partitions II and III of ECMA-335, Common Language Infrastructure (CLI) Specification [8].

3.1 Assemblies

A single deployable unit in the CLI is an assembly. The JaCIL project sup- ports this structure only as far as an assembly is a collection of modules (see section 3.3). With the exception of a user defined program entry point (see section 3.2), all metadata defined at the assembly level is not translated by the compiler, and is silently ignored.

The CLI also allows the notion top-level global fields and methods within an assembly. That is, fields and methods that do not have an enclosing class [8]. Although to the this feature gives the impression of global fields and methods, in reality these are members of a class named that is defined in a module contained within the assembly.

30 3.1.1 Assembly Translation Limitations

JaCIL does not make any special provisions for the special type, , and it is translated verbatim as the JVM type . Technically, the class generated is legal in Java 1.5 and can in fact be used, however the limitation of this is that classes that use the class will invariably run into a name clash if more than one translated assembly is used together under the same class loader.

A potential solution to this limitation is the mangling the name of to be unique per assembly. This format could be similar to how entry points are translated (see section 3.2), and be in the form CLI.AssemblyName.- Module. For this to work properly, the method mapping semantics in the compiler would also have to map references to to this mangled name.

3.2 Assembly Entry Points

The CLI defines a program entry point at the assembly level. Unlike the JVM, where any class can be run as an entry point provided it defines a “main” method of the proper form, the CLI uses metadata in the assembly to define the entry point for that assembly. As a result, a CLI assembly may have zero or one entry points. A CLI entry point method must be static, return nothing or a 32-bit integer, and have either no parameters or accept a single string array as its parameter. There is no restrictions in the CLI as to accessibility of the method or the name of the method.

When translating this feature, JaCIL synthesizes a class named CLI.- AssemblyName.Main containing a JVM entry point of the proper form.

3.2.1 Entry Point Translation Limitations

Currently, the system only supports translating CLI entry points that are publicly accessible outside of the class it is defined in. JaCIL will not emit an error or warning when translating entry points for classes that do not meet this criteria, but the resulting code will not run. A possible solution to this issue would be to always translate a method marked as the entry point public so it is accessible from the synthesized JVM entry point.

Also, for entry points that return an integer, this is currently discarded by the JVM translated entry point.

31 3.3 Modules

Each CLI assembly is comprised of one or more modules. A CLI module is considered to be a compilation unit comprising a collection of types (i.e. classes). JaCIL supports only this aspect of modules explicitly. As with assemblies, all module-level metadata is not directly translated by JaCIL.

3.4 Types

Table 3.1 defines the built-in CLI types that are translatable with JaCIL. Within this table, the JVM type indicates the operative storage type used to store values of the CLI type (e.g. in JVM method and field descriptors).

CLI Type JVM Type Description System.Boolean boolean The boolean type. System.Char char A 16-bit Unicode character. System.Single float A 32-bit floating point number. System.Double double A 64-bit floating point number. System.Byte byte An unsigned 8-bit integer. System.SByte byte A signed 8-bit integer. System.UInt16 short An unsigned 16-bit integer. System.Int16 short A signed 16-bit integer. System.UInt32 int An unsigned 32-bit integer. System.Int32 int A signed 32-bit integer. System.UInt64 long An unsigned 64-bit integer. System.Int64 long A signed 64-bit integer. System.Object java.lang.Object An object type. System.String java.lang.String A string type. System.ValueType System.ValueType A valuetype type. Type[] JVM Type[] A vector type. Type & See table 3.2 A managed pointer type.

Table 3.1: JaCIL built-in type mapping

The storage type mappings for managed pointers are defined in table 3.2. The JVM mapped classes for managed pointers are defined as abstract classes in the JaCIL Run-time Library.

All type names are converted to the equivalent native JVM representation. A type name will have any “.” characters converted to “/” characters.

Section 3.7 discusses in detail the translation semantics for type definitions.

32 CLI Managed Pointer Type JVM Type System.Boolean & JaCIL.Runtime.Pointers.BoolPtr System.Char & JaCIL.Runtime.Pointers.CharPtr System.Single & JaCIL.Runtime.Pointers.Float32Ptr System.Double & JaCIL.Runtime.Pointers.Float64Ptr System.Byte & JaCIL.Runtime.Pointers.UInt8Ptr System.SByte & JaCIL.Runtime.Pointers.Int8Ptr System.UInt16 & JaCIL.Runtime.Pointers.UInt16Ptr System.Int16 & JaCIL.Runtime.Pointers.Int16Ptr System.UInt32 & JaCIL.Runtime.Pointers.UInt32Ptr System.Int32 & JaCIL.Runtime.Pointers.Int32Ptr System.UInt64 & JaCIL.Runtime.Pointers.UInt64Ptr Other Type & JaCIL.Runtime.Pointers.ObjectPtr

Table 3.2: JaCIL managed pointer type mapping

3.4.1 Type Translation Limitations

The following subsections describe current limitations with regard to the general translation of types.

3.4.1.1 Name Mangling

Types with names that are legal in the CLI, but are not legal in the JVM are translated verbatim and are thus become invalid class files when compiled with JaCIL. This can be mitigated in the future by defining better name mangling semantics.

3.4.1.2 Array Support

JaCIL only supports zero-based index, single dimensional arrays–called vectors in the CLI. Implicitly, this means that jagged arrays (vectors of vec- tors) are supported. CLI multi-dimensional arrays or arrays with a non- zero lower bound are not supported. The JaCIL Compiler will not emit an error when translating such constructs, since there are no special instruc- tions for multi-dimensional arrays, but the class references emitted for such arrays do not exist in the run-time.

A possible solution for this could be to utilize a custom class loader on the run-time that generates the multi-dimensional arrays as needed by trans- lated code. This approach has the benefit of requiring no modification to

33 the compiler, but adds complexity to the run-time requirements of trans- lated code.

3.4.1.3 Generics

JaCIL does not support CLI generics. One potential approach, though po- tentially a problematic one, is to support generics in the same way that the JVM supports it–by type erasure.

3.4.1.4 Inner Classes

JaCIL does not support inner class definitions. Since the JVM supports this semantic, future support for this should be a straightforward task.

3.4.1.5 Enumerations

JaCIL does not support user-defined enumerations. Since current versions of the JVM supports this semantic, future support for this should be a straightforward task.

3.4.1.6 Delegates

JaCIL does not support user-defined delegates. From a CLI point of view, a delegate is a class definition that encapsulates a method pointer and an optional object instance. This object instance is used to bind the this pa- rameter when used on instance methods. In order to support delegates in the future, JaCIL will have to define a translation semantic for the ldftn and ldvirtftn instructions. In addition to supporting these instructions, JaCIL will have to synthesize the body of a delegate class’s invoke method, as such code is “provided by the run-time” [8].

3.5 Managed Pointers

JaCIL supports CLI managed pointers, which can point to local variables, arguments, fields, and array elements. Managed pointers are also distin- guished from unmanaged pointers in the CLI in that the operations allowed on them are severely restricted–arithmetic, for instance, is only allowed on managed pointers to array elements [8].

34 The extent that JaCIL supports managed pointers are limited to their use in method definitions (see section 3.9), address instructions (see section 3.11.14), and indirection instructions (see section 3.11.13).

3.5.1 Managed Pointer Translation Limitations

JaCIL does not support the limited arithmetic support for managed point- ers that has been previously mentioned. This support could be potentially added as an API change to the pointer classes defined in the JaCIL Run- time Library package JaCIL.Runtime.Pointers.

3.6 Value Types

JaCIL supports CLI value types, which are types whose memory is allocated directly (e.g. on the operand stack, local variable register, or user-defined type field).

JaCIL can translate value type definitions (see section 3.7.1), usage of value types in method definitions (see section 3.9), field definitions that are value types (see section 3.8), and value type semantics for instructions that oper- ate on them (see section 3.11).

3.6.1 Value Type Translation Limitations

Due to no native equivalent of value types in the JVM, JaCIL translates value types that are not primitive types as heap objects. This adds consid- erable overhead when using value types, as translated code clones the heap object when a copy is needed. Figure 3.1 illustrates the memory model for value types with local variables.

3.7 Type Definitions

JaCIL supports the translation of user-defined classes (i.e. non-interface reference types), interfaces, and value types. From the CLI perspective, all user-defined types are classes, it is manner in which they are declared in the CLI assembly that define whether they are a normal class, interface, or value type. Since all defined types in the JVM are also classes, JaCIL translates a class definition in the CLI as a JVM class definition.

35 Value Types as Local Variables in CLI Value Types as Local Variables in JVM Translation Local Variables Object Heap

Local 0 Local 1 Heap Object Overhead Heap Object Overhead

Value Type Value Type Value Type Value Type

Local Variables

Local 0 Local 1

Reference Reference

Figure 3.1: Value type translation memory diagram for local variables

Table 3.3 lists the CLI type attributes that are supported, and the JVM class attributes that they are translated to by JaCIL.

CLI Attribute JVM Attribute CLI Semantic abstract ACC ABSTRACT The defined type is abstract. interface ACC INTERFACE The defined type is an inter- face. sealed ACC FINAL The defined type cannot be de- rived. public ACC PUBLIC The defined type is accessible outside of the assembly. private ACC PUBLIC The defined type is only acces- sible within the assembly. (see section 3.7.2)

Table 3.3: Supported CLI type attributes and their JVM translations

For translated types, the CLI base class (i.e. superclass) defined for a type definition is translated directly as the JVM class’s superclass. The one ex- ception to this is interface definitions. JVM interfaces must always extend java.lang.Object, and CLI interfaces always extend System.Object, thus JaCIL translates such classes to conform to JVM requirements.

See section 3.8 for details on translation of field definitions, and section 3.9 for details on translation of method definitions.

36 3.7.1 Value Type Definitions

A value type is defined in the CLI as a type definition whose base class is System.ValueType. Such a type is translated directly to the JVM by JaCIL in addition to defining a public, parameterless constructor. This con- structor is generated because CLI value types are not ever null (as they are allocated in place) and are always “zero initialized.” Thus translated code can initialize such types before user code that uses such types is executed.

3.7.2 Type Definition Translation Limitations

JaCIL translates the type access modifier, private, as ACC PUBLIC in the translated JVM type. The rationale for this behavior is that the JVM has no equivalent access modifier. Besides public, a top-level class in the JVM can be declared as package protected. This is potentially too restrictive, as types in different namespaces could access each other even if declared as private; translating this scenario with package protected would not allow such types to access the other.

Furthermore, besides the properties listed in table 3.3, the definition of fields, and the definition of methods, no other metadata in a type definition is translated by JaCIL.

3.8 Field Definitions

All fields in a type definition are translated directly as JVM fields for that translated type. Table 3.4 lists the CLI field attributes that are supported and the JVM field attributes that they are translated to by JaCIL.

When translating literal fields, JaCIL also assigns the value of that field as the initial value of the translated field.

3.8.1 Field Definition Translation Limitations

JaCIL translates any field accessibility attributes that rely on assembly or compilation unit accessibility as ACC PUBLIC in the JVM. The rationale for this follows similarly to section 3.7.2. The JVM has no accessibility level that is more course grained than package protected, so, in order to retain accessibility across namespaces, public accessibility is used in the translation.

37 CLI Attribute JVM Attribute CLI Semantic initonly ACC FINAL The defined field cannot be modified after initialization. literal ACC FINAL The defined field is a con- ACC STATIC stant literal. static ACC STATIC The defined field is static. public ACC PUBLIC The defined field is publicly accessible. private ACC PRIVATE The defined field is accessi- ble only to the type. family ACC PROTECTED The defined field is accessi- ble only to the type and its derived types. (See section 3.8.1) assembly ACC PUBLIC The defined field is accessi- ble by the assembly. (See section 3.8.1) famorassem ACC PUBLIC The defined is accessible by the assembly or a derived type. (See section 3.8.1) famandassem ACC PUBLIC The defined field is accessi- ble only to a derived type in the assembly. (See section 3.8.1) compilercontrolled ACC PUBLIC The defined field is acces- sible only to a compilation unit (e.g. a module). (See section 3.8.1)

Table 3.4: Supported CLI field attributes and their JVM translations

38 Besides the properties listed in table 3.4, no other metadata for fields is translated by JaCIL.

JaCIL also does not make provisions for field names that are valid in the CLI but are not valid in the JVM. In such cases, JaCIL will generate in- valid classes. In the future, a name mangling solution could mitigate this limitation.

3.9 Method Definitions

All methods in a type definition are translated directly as JVM methods for that translated type. Table 3.5 lists the CLI method attributes that are supported, and the JVM method attributes that they are translated to by JaCIL. Furthermore, if a method does not have the virtual attribute and is not an instance constructor or class constructor, it is marked with the ACC FINAL attribute in its JVM translation.

The CLI defines calling conventions that are defined for method declara- tions. JaCIL only supports the instance modifier, which indicates that the method is an instance method and has an implied this argument. The only CLI calling kind that the system is required to support is the default one.

For methods that are not attributed as abstract, the implementation at- tributes defined in table 3.6 are supported by JaCIL. Specifically, JaCIL can only translate methods that have both the cil and managed implementa- tion attributes.

Methods that provide executable code defined in the Common Intermediate Language (CIL) instruction set of the CLI as part of their implementation are translated to JVM byte-code instructions in the code attribute of the translated method. Sections 3.9.2 and 3.9.3 describe additional translation semantics for method implementations that are translated by JaCIL. Sec- tion 3.11 describes the extent of the of the CIL instruction set that can be translated by JaCIL.

3.9.1 Method Name Mangling

JaCIL translates method names directly in the general case. The excep- tions to this are instance constructors and class constructors (i.e. static ini- tializers).

For type definitions that are not value types, the CLI instance constructor

39 CLI Attribute JVM Attribute CLI Semantic abstract ACC ABSTRACT The defined method is ab- stract. final ACC FINAL The defined virtual method cannot be overridden in a derived type. virtual None The defined method is vir- tual. (See section 3.9.5) static ACC STATIC The defined method is static. public ACC PUBLIC The defined method is pub- licly accessible. private ACC PRIVATE The defined method is acces- sible only to the type. family ACC PROTECTED The defined method is acces- sible only to the type and its derived types. (See section 3.9.5) assembly ACC PUBLIC The defined method is acces- sible by the assembly. (See section 3.9.5) famorassem ACC PUBLIC The defined method is acces- sible by the assembly or a derived type. (See section 3.9.5) famandassem ACC PUBLIC The defined method is acces- sible only to a derived type in the assembly. (See section 3.9.5) compilercontrolled ACC PUBLIC The defined method is acces- sible only to a compilation unit (e.g. a module). (See section 3.9.5)

Table 3.5: Supported CLI method attributes and their JVM translations

40 CLI Attribute JVM Attribute CLI Semantic cil Not Applicable The method is implemented using CIL byte-code. managed Not Applicable The method should be man- aged by the CLI. synchronized ACC SYNCHRONIZED The method should be syn- chronized

Table 3.6: Supported CLI implementation attributes and their JVM trans- lations name, .ctor, is translated as the JVM instance constructor name . For value type definitions, the CLI instance constructor is translated as a method named $$ctor. The reason for the difference with value types is that a parameterless constructor is always created (see section 3.7.1) and constructors are invoked like normal methods for such types in CIL code.

In all cases, the CLI class constructor name, .cctor, is always translated as the JVM static initializer name, .

3.9.2 Local Variables

Local variables usable within a method implementation are separated into two sets of “registers,” one set for arguments and the other for local vari- ables. Each argument or local used by a method consumes a register from one of these sets. In contrast, the JVM has only one set of “registers” for both arguments and local variables. In the JVM, the arguments come be- fore regular local variables when referencing this set of “registers.” To fur- ther complicate matters, if a local or argument is a 64-bit integer or 64-bit floating point variable, two local variable “registers” are consumed [14].

JaCIL translates CLI local variables to the JVM in a straightforward man- ner. Each argument is assigned to the equivalent JVM local variable index, using two “registers” when appropriate.

If a local variable is ever addressed with an address load instruction (see section 3.11.14), then instead of directly using the register for the variable, the index is used to store a reference to a managed pointer object (see ta- ble 3.2). In this case, the local variable or argument is stored within a box pointer object that is defined by the JaCIL Run-time Library and code is inserted in the beginning of the method to instantiate this box pointer. A box pointer is merely a class that extends a pointer type with an imple- mentation that encapsulates a value of the associated pointer type. For

41 each pointer type in table 3.2, a concrete subclass of the type is defined by the JaCIL Run-time Library. These types are prefixed with Box (e.g. for Int32Ptr, there is BoxInt32Ptr).

In cases where a local variable is a user-defined value type, code is also in- serted in the beginning of the method to instantiate the variable using the parameterless constructor. As discussed in 3.7.1, there is always a param- eterless constructor for all translated value types–initializing them in this way ensures that value type local variables are never null.

3.9.3 Exception Handling Support

The CLI exception handling system consists of the notion of protected blocks and an associated handler for each block. Each protected block can have one and only one handler associated with it [8]. JaCIL only supports two CLI handler types: catch and finally.

JaCIL translates all supported protected blocks as entries in the translated method’s exception table. The catch handler for such entries differ between the type of CLI handler being translated.

For catch handlers, code is inserted before the user-defined code of the han- dler to store the exception object into an unused local variable. This extra local variable is used only for implementing the semantics for the rethrow instruction (see 3.11.12).

With finally handlers, JaCIL inserts code to store the exception object, “jumps subroutine” to the user defined code for the handler, and then re- throws the exception that was originally caught. This is very similar to how Java finally blocks are compiled by the Java Compiler.

See also, section 3.11.12 for details on CIL instructions that work with ex- ceptions.

3.9.4 Constructors

Besides the name mangling semantics described in section 3.9.1. JaCIL has additional translation rules when translating instance and static construc- tors. For any type that contains instance fields that are value types, code is inserted to instantiate these fields in any constructor definitions that call the super constructor. This is done in addition to the behavior specified for local variables of value types described in section 3.9.2. For static fields of user-defined value types, the same type of code generation is inserted in the translated class constructor that is defined for the type. If static fields that

42 are user-defined value types exist without a class constructor being defined for the type definition, one is synthesized by the JaCIL compiler.

3.9.5 Method Definition Translation Limitations

JaCIL translates any method accessibility attributes that rely on assembly or compilation unit accessibility as ACC PUBLIC in the JVM. The rationale for this follows section 3.8.1 identically.

The CLI supports the concept of virtual and non-virtual instance meth- ods. In the JVM, all instance methods are virtual and JaCIL translates all instance methods as such. This limitation could potentially be avoided by translating non-virtual methods as static methods with an explicit this argument. This solution, however, would also have to incorporate name mangling to avoid name clashes with already established static methods with the same name and signature.

Besides the translation semantics for the properties defined for method definitions in section 3.9, no other metadata for methods is translated by JaCIL. This limitation also implies native and unmanaged methods are not supported by JaCIL.

There are instances where method signatures can name clash. JaCIL trans- lates unsigned types as the JVM signed types of the same size (see table 3.1) and pointers to user-defined types are translated as JaCIL.Runtime.- Pointers.ObjectPtr for method and field signatures. As a result, if two methods with the same name are defined as taking arguments that differ only in signed parameters or managed pointers to user-defined types, there will be a method signature clash on the translated JVM methods. JaCIL also does not make provisions for method names that are valid in the CLI but are not valid in the JVM. In such cases, JaCIL will generate invalid classes. In the future, a name mangling solution could mitigate these limi- tations.

JaCIL does not support the fault and filter exception handler types defined by the CLI. Both of these handler types have similar semantics to finally and catch and should be straightforward to translate in the future.

JaCIL does not support translation of methods that contain more code than the JVM permits in a single method. If a method to be translated generates more JVM byte-code than is allowed by the JVM, the class file generated will be invalid.

43 3.10 JaCIL Implementation of Standard CLI API

The JaCIL Run-time Library provides a subset of the standard CLI API. This implementation is not meant to be comprehensive, and is implemented to the extent that basic CLI translations can be supported on the JVM. For complete details, please refer to the API specification provided in the binary distribution of JaCIL.

The following subsections describe notable aspects of this implementation with regard to JVM and CLI inter-operation.

3.10.1 System.Object Implementation

The core object in the CLI is System.Object. As such, an implementation of this class is paramount for enabling any translation to work. This class is implemented in JaCIL as a subclass of java.lang.RuntimeException. The rationale for this is that all objects in the CLI can be thrown as an exception, and also because all exceptions in the CLI are unchecked. This implementation also implements java.lang.Cloneable. All objects in the CLI can be member-wise cloned by a standard method in System.- Object, implementing this interface allows this to be done by any type in the CLI hierarchy.

To assist in inter-operation with Java code, System.Object provides stan- dard Java object method overrides that delegate to the System.Object equivalent. This way, Java code invoking standard Java object methods on a JaCIL translated class instance will call the appropriate standard CLI object method. From the CLI code point of view, the standard Java object methods are invisible, and as such, are unaffected by this. These overrides are listed in table 3.7.

Overridden JVM Method System.Object Method Delegated To toString ToString equals Equals finalize Finalize hashCode HashCode clone MemberwiseClone

Table 3.7: java.lang.Object method overrides in System.Object

The JaCIL implementation of System.Object also provides an implemen- tation of MemberwiseClone that uses reflection to properly clone fields

44 that are value types. This is implemented in this way to preserve value type semantics, as such types are translated as JVM references by JaCIL.

3.10.2 System.Type Implementation

This class provides information about a type in the CLI. This is very sim- ilar to the standard Java class, java.lang.Class, and in fact, JaCIL’s implementation wraps java.lang.Class. The current implementation only provides a GetName() method, which is consistent with the CLI Ker- nel Profile [8].

3.10.3 Other Classes

The JaCIL Run-time Library also provides very minimal implementations of System.Exception, System.ArithmeticException, System.Console, and System.ValueType. The implementations are mostly place holders such that translated classes will run on the JVM. Since most, if not all, of the public interface of such classes is not implemented, calls to them by translated code will fail at run-time.

3.11 Instruction Set Support

The following subsections describe the Common Intermediate Language (CIL) instruction set supported by JaCIL for executable code in the CLI. Tables 3.8 and 3.9 list the instructions that are supported by JaCIL in al- phabetical order by mnemonic.

Within the subset of the CIL instruction set that is supported by JaCIL, only correct, verifiable uses of such instructions are supported. JaCIL as- sumes that the CIL being translated is being used in such a manner. Also, the CLI has special instructions, called prefixes, that are modifiers for in- structions that follow them. JaCIL does not support any of these instruc- tions.

3.11.1 Non-operation Instructions

The CIL instructions nop and break do not actually perform any action when executed. As such, these are translated directly as the JVM instruc- tion NOP.

45 0x58 add 0x5f and 0x3b beq 0x2e beq.s 0x3c bge 0x2f bge.s 0x3d bgt 0x30 bgt.s 0x3e ble 0x31 ble.s 0x3f blt 0x32 blt.s 0x40 bne.un 0x33 bne.un.s 0x8c box 0x38 br 0x2b br.s 0x01 break 0x39 brfalse 0x2c brfalse.s 0x3a brtrue 0x2d brtrue.s 0x28 call 0x6f callvirt 0x74 castclass 0xfe01 ceq 0xfe02 cgt 0xc3 ckfinite 0xfe04 clt 0xd3 conv.i 0x67 conv.i1 0x68 conv.i2 0x69 conv.i4 0x6a conv.i8 0x6b conv.r4 0x6c conv.r8 0xe0 conv.u 0xd2 conv.u1 0xd1 conv.u2 0x6d conv.u4 0x6e conv.u8 0x5b div 0x25 dup 0xdc endfinally 0xfe15 initobj 0x75 isinst 0xfe09 ldarg 0x02 ldarg.0 0x03 ldarg.1 0x04 ldarg.2 0x05 ldarg.3 0x0e ldarg.s 0xfe0a ldarga 0x0f ldarga.s 0x20 ldc.i4 0x16 ldc.i4.0 0x17 ldc.i4.1 0x18 ldc.i4.2 0x19 ldc.i4.3 0x1a ldc.i4.4 0x1b ldc.i4.5 0x1c ldc.i4.6 0x1d ldc.i4.7 0x1e ldc.i4.8 0x15 ldc.i4.m1 0x1f ldc.i4.s 0x21 ldc.i8 0x22 ldc.r4 0x23 ldc.r8 0xa3 ldelem.any 0x90 ldelem.i1 0x92 ldelem.i2 0x94 ldelem.i4 0x96 ldelem.i8 0x98 ldelem.r4 0x99 ldelem.r8 0x9a ldelem.ref 0x91 ldelem.u1

Table 3.8: JaCIL supported instructions add through ldelem.u1

46 0x93 ldelem.u2 0x8f ldelema 0x7b ldfld 0x7c ldflda 0x46 ldind.i1 0x48 ldind.i2 0x4a ldind.i4 0x4c ldind.i8 0x4e ldind.r4 0x4f ldind.r8 0x50 ldind.ref 0x47 ldind.u1 0x49 ldind.u2 0x8e ldlen 0xfe0c ldloc 0x06 ldloc.0 0x07 ldloc.1 0x08 ldloc.2 0x09 ldloc.3 0x11 ldloc.s 0xfe0d ldloca 0x12 ldloca.s 0x14 ldnull 0x71 ldobj 0x7e ldsfld 0x7f ldsflda 0x72 ldstr 0xdd leave 0xde leave.s 0x5a mul 0x65 neg 0x8d newarr 0x73 newobj 0x00 nop 0x66 not 0x60 or 0x26 pop 0x5d rem 0x2a ret 0xfe1a rethrow 0x62 shl 0x63 shr 0x64 shr.un 0xfe0b starg 0x10 starg.s 0xa4 stelem.any 0x9c stelem.i1 0x9d stelem.i2 0x9e stelem.i4 0x9f stelem.i8 0xa0 stelem.r4 0xa1 stelem.r8 0xa2 stelem.ref 0x7d stfld 0x52 stind.i1 0x53 stind.i2 0x54 stind.i4 0x55 stind.i8 0x56 stind.r4 0x57 stind.r8 0x51 stind.ref 0xfe0e stloc 0x0a stloc.0 0x0b stloc.1 0x0c stloc.2 0x0d stloc.3 0x13 stloc.s 0x81 stobj 0x80 stsfld 0x59 sub 0x45 switch 0x7a throw 0x79 unbox 0xa5 unbox.any 0x61 xor

Table 3.9: JaCIL supported instructions ldelem.u2 through xor

47 3.11.2 Stack Instructions

The operand stack can be directly manipulated by the CIL instructions dup, which duplicates the topmost item on the stack, and pop which discards the top-most item of the stack. In the case that the item on the top of the stack is not a 64-bit integer or 64-bit floating point number, this is translated as the JVM instructions DUP and POP respectively. Otherwise, the JVM instructions DUP2 and POP2 are emitted.

Static data flow analysis is used to determine which variant is appropriate.

3.11.2.1 Stack Duplication of Value Types

Since user-defined value types are represented as heap object references in JaCIL’s translation, when such references are duplicated on the stack, a copy operation must be done to preserve value type semantics. In this case, JaCIL generates code to invoke the clone() method, followed by a type cast to the type that was cloned.

3.11.3 Local Variable and Argument Instructions

The CLI provides a family of instructions for loading and storing local vari- ables and arguments. Table 3.10 lists the instructions that load a value from a local variable or argument onto the operand stack. Table 3.12 lists the instructions that store a value from the operand stack into a local vari- able or argument.

Load instructions are translated to a JVM local variable load instruction. The possible translations are listed in table 3.11. Unlike the CLI, JVM load instructions are prefixed with the type of local variable being loaded. Static analysis determines which JVM variant is required.

Store instructions operate similarly. JaCIL will translate a CIL store in- struction to a JVM instruction that is one of the instructions listed in table 3.13. As with load instructions, which JVM variant is emitted is determined by static analysis.

ldarg ldarg.0 ldarg.1 ldarg.2 ldarg.3 ldarg.s ldloc ldloc.0 ldloc.1 ldloc.2 ldloc.3 ldloc.s

Table 3.10: CIL Local variable and argument load instructions

48 ILOAD ILOAD 0 ILOAD 1 ILOAD 2 ILOAD 3 LLOAD LLOAD 0 ILOAD 1 LLOAD 2 ILOAD 3 FLOAD FLOAD 0 ILOAD 1 FLOAD 2 ILOAD 3 DLOAD DLOAD 0 ILOAD 1 DLOAD 2 ILOAD 3 ALOAD ALOAD 0 ILOAD 1 ALOAD 2 ILOAD 3

Table 3.11: JVM Local variable load instructions

starg starg.0 starg.1 starg.2 starg.3 starg.s stloc stloc.0 stloc.1 stloc.2 stloc.3 stloc.s

Table 3.12: CIL Local variable and argument store instructions

ISTORE ISTORE 0 ISTORE 1 ISTORE 2 ISTORE 3 LSTORE LSTORE 0 ISTORE 1 LSTORE 2 ISTORE 3 FSTORE FSTORE 0 ISTORE 1 FSTORE 2 ISTORE 3 DSTORE DSTORE 0 ISTORE 1 DSTORE 2 ISTORE 3 ASTORE ASTORE 0 ISTORE 1 ASTORE 2 ISTORE 3

Table 3.13: JVM Local variable store instructions

3.11.3.1 Implicit Conversions

In addition to the straight translation of a load instruction itself, additional instructions may be emitted after the load instruction. This happens when loading local variables that are unsigned integers that are narrower than 32 bits and not of the System.Char type (the JVM has native support for unsigned 16-bit character types). Since these narrow, unsigned types are translated as their signed counterparts, they will be sign extended when loaded onto the operand stack in the JVM. In order to retain the unsigned semantics, code is generated to truncate the value loaded onto the stack to the appropriate width of the variable being loaded.

3.11.3.2 Loading and Storing to Boxed Local Variables

As discussed in section 3.9.2, in the case where an addressing operation is ever done on a local variable, a box managed pointer object is used to repre- sent the storage space of the local variable. When a load or store operation is performed on local variables or arguments of this sort, the reference to the box pointer is loaded and an indirection operation is performed on it. This behavior is described in further detail in section 3.11.13.

49 3.11.3.3 Value Type Loading

Since JaCIL emulates user-defined value types with JVM heap objects, a clone operation is performed when loading a type of this sort from a local variable. This is done as specified in section 3.11.2.1.

3.11.4 Constant Loading Instructions

The CLI provides various instructions for loading constants onto the operand stack. Table 3.14 illustrates the translation semantics for these instruc- tions. As the table shows, only the LDC, LDC W, and LDC2 W instructions are used in the JVM translations for numeric and string constant load. This is clearly not the most efficient translation and future versions of JaCIL could be optimized to produce constant load with more efficient instructions like ICONST 0, BIPUSH, etc.

CIL Instruction JVM Operation ldc.i4 LDC or LDC W ldc.i4.0 LDC or LDC W ldc.i4.1 LDC or LDC W ldc.i4.2 LDC or LDC W ldc.i4.3 LDC or LDC W ldc.i4.4 LDC or LDC W ldc.i4.5 LDC or LDC W ldc.i4.6 LDC or LDC W ldc.i4.7 LDC or LDC W ldc.i4.8 LDC or LDC W ldc.i4.m1 LDC or LDC W ldc.i4.s LDC or LDC W ldc.i8 LDC2 W ldc.r4 LDC or LDC W ldc.r8 LDC2 W ldstr LDC or LDC W ldnull ACONST NULL

Table 3.14: CIL field access instruction translation semantics

3.11.5 Arithmetic Instructions

The CIL provides a set of instructions for performing binary and unary arithmetic operations. With the exception of the CIL not instruction, JaCIL

50 does not translate unary operations any differently from binary operations. Both binary and unary instructions on the JVM expect the same stack operand state as their CIL equivalent so JaCIL can translate arithmetic in an almost one-to-one manner.

Table 3.16 indicates the CIL instructions that operate on both integer and floating point types. As with local variable instructions (section 3.11.3), JVM arithmetic operations are prefixed depending on the type being oper- ated on. The prefix in this table, denoted by T, is one of the prefixes listed in table 3.15. Static data flow analysis by JaCIL selects the appropriate instruction prefix.

CIL arithmetic instructions are “polymorphic” with the operand stack types. That is, for a given arithmetic instruction, it can be used regardless of the stack type or types being operated on–provided it is legal to use that op- eration on the type of the operand or operands in question. The CLI also restricts the mixing of types with binary operations; for these operations (except shifting), the operands must be of the same stack type. Since the CLI operand stack does not track signed or unsigned integers differently, there are instruction variants for operations that have different behavior for signed and unsigned interpretation (e.g. div and div.un). JaCIL cur- rently does not support any of these variants.

Table 3.17 indicates the CIL instructions that operate only on integer types. The CLI specification dictates that these instructions only operate on in- teger types so no extra provisions are made by JaCIL to accommodate them [8].

JVM Instruction Prefix Meaning I 32-bit integer L 64-bit integer F 32-bit floating point D 64-bit floating point

Table 3.15: JVM numeric instruction prefixes

As noted previously, the CIL not operation is not translated directly as the other arithmetic instructions are. This is because the JVM does not have a native instruction for this operation. In this case, JaCIL emits code to push a constant of −1 for the operating type (either 32-bit or 64-bit integer), and the JVM I XOR or L XOR is emitted depending on the operating type.

Additionally, the CIL ckfinite instruction is supported by JaCIL. Unlike normal arithmetic operations, the ckfinite instruction simply returns its operand to the stack if the value, which must be floating point, is finite. In the case that it is not a finite value, an exception is thrown. JaCIL

51 CIL Instruction JVM Translation add T ADD sub T SUB mul T MUL div T DIV rem T MUL neg T NEG

Table 3.16: JaCIL translations for CIL arithmetic operations on integers and floating point values

CIL Instruction JVM Translation and T AND or T OR xor T XOR shl T SHL shr T SHR shrun T USHR not T XOR with −1

Table 3.17: CIL Arithmetic Instructions for integer types only emulates this instruction with a static call to the JaCIL.Runtime.Math.- CheckFinite() method in the JaCIL Run-time Library.

3.11.5.1 Arithmetic Instruction Translation Limitations

Unlike the JVM, the CLI tracks all floating point values on the operand stack as a single type, F, that must “have precision and range greater than or equal to the nominal type” [8]. As a result, it is correct and verifiable to push a 32-bit float, followed by a 64-bit float and perform any floating point binary arithmetic operation. This usage is currently not supported, as the JVM tracks floating point types with more granularity, and cannot operate on two differently sized floating point types in its arithmetic instructions. Future development could mitigate this limitation by emitting a JVM F2D instruction to normalize the smaller floating point operand.

52 3.11.6 Numeric Conversion Instructions

The CLI provides instructions that are used for converting numeric types to other numeric types. The instructions, listed in table 3.18, indicate only the resulting type of the conversion. The JVM also provides similar conversion instructions, but both the source type and target type of the conversion is encoded into the instruction.

CIL Instruction JVM Target Value conv.i 32-bit signed integer conv.i1 8-bit signed integer conv.i2 16-bit signed integer conv.i4 32-bit signed integer conv.i8 64-bit signed integer conv.u 32-bit unsigned integer conv.u1 8-bit unsigned integer conv.u2 16-bit unsigned integer conv.u4 32-bit unsigned integer conv.u8 64-bit unsigned integer conv.r4 32-bit floating point conv.r8 64-bit floating point

Table 3.18: CIL conversion instruction target types

When translating these instructions, JaCIL performs static analysis of the operand being converted. If the stack type (as translated to the JVM) of the operand is different from the stack type that is to be produced by conver- sion, a JVM X2Y instruction is emitted, where X and Y is one of the prefixes listed in table 3.15. X is the source type being converted from and Y is the target type being converted to.

Furthermore, if the type being converted to is an integer value that is nar- rower than 32 bits, and additional conversion operation is emitted. These are listed in table 3.20

Additionally, when conv.u8 operates on a 32-bit integer, JaCIL emits a LAND with 0xFFFFFFFF to eliminate the sign extension of the previously emitted I2L instruction.

3.11.7 Branch and Comparison Instructions

The CIL instruction set contains a variety of comparison and branching instructions. The following subsections describe the translation semantics

53 CIL Instruction JVM Target Stack Type conv.i 32-bit integer conv.i1 32-bit integer conv.i2 32-bit integer conv.i4 32-bit integer conv.i8 64-bit integer conv.u 32-bit integer conv.u1 32-bit integer conv.u2 32-bit integer conv.u4 32-bit integer conv.u8 64-bit integer conv.r4 32-bit floating point conv.r8 64-bit floating point

Table 3.19: CIL conversion instruction target JVM stack types

CIL Instruction JVM Operation conv.i1 I2B conv.i2 I2S conv.u1 IAND with 0xFF conv.u2 I2C

Table 3.20: Additional operations for small integer conversion

54 for the subset of these instructions that are supported by JaCIL.

3.11.7.1 Unconditional Branches

The CIL instruction set provides unconditional branching operations with the br and br.s instructions. JaCIL translates instances of these instruc- tions as GOTO and GOTO W as appropriate.

3.11.7.2 The switch Instruction

The switch instruction in the CIL instruction set is the translated directly as the JVM instruction TABLESWITCH by JaCIL. The switch instruction, unlike its JVM counterpart, always provides a jump table that is indexed starting from zero.

Note that the C# switch construct does not necessarily emit this instruc- tion. The C# syntax allows for more arbitrary uses of switch, which in- cludes switching on string literals. In such cases, as well as cases that have sparse values for case labels, it is likely that a C# compiler will emit con- ditional branch instructions to implement the switch construct.

3.11.7.3 Comparison and Other Conditional Branches

Within the CIL instruction set, there are two instructions that branch on a comparison to an integral zero or null reference. The brtrue instruc- tion branches if its operand is not zero or null, and the brfalse instruc- tion branches if its operand is zero or null. To translate this to the JVM, JaCIL performs data flow analysis to determine the operating type. Table 3.21 illustrates the translation JaCIL performs for brtrue and brfalse depending on this analysis.

CIL Instruction Operand Stack Type JVM Operation brtrue 32-bit integer IFNE brtrue 64-bit integer LCMP with 0, IFEQ brtrue Object reference IFNONNULL brfalse 32-bit integer IFEQ brfalse 64-bit integer LCMP with 0, IFNE brfalse Object reference IFNULL

Table 3.21: Translation semantics for brtrue and brfalse

55 The CIL instruction set also provides binary comparison branching instruc- tions. These instructions branch to their intended target if the comparison operation on their operands is true. Since JVM comparison and conditional branching operations operate on specific stack types, static analysis deter- mines the operating type. Table 3.22 describes the translation semantics for the supported binary comparison branching operations that depend on the operating type of the operands.

As with some arithmetic operations (see section 3.11.5), comparison and conditional branching operations have unsigned variants. JaCIL does not currently implement translation for these instructions.

CIL Instruction Operand Stack Type JVM Operation beq 32-bit integer IF ICMPEQ beq 64-bit integer LCMP, IFEQ beq 32-bit floating point FCMPL, IFEQ beq 64-bit floating point DCMPL, IFEQ beq Object reference IF ACMPEQ bne 32-bit integer IF ICMPNE bne 64-bit integer LCMP, IFNE bne 32-bit floating point FCMPL, IFNE bne 64-bit floating point DCMPL, IFNE bne Object reference IF ACMPNE blt 32-bit integer IF ICMPLT blt 64-bit integer LCMP, IFLT blt 32-bit floating point FCMPG, IFLT blt 64-bit floating point DCMPG, IFLT ble 32-bit integer IF ICMPLE ble 64-bit integer LCMP, IFLE ble 32-bit floating point FCMPG, IFLE ble 64-bit floating point DCMPG, IFLE bgt 32-bit integer IF ICMPGT bgt 64-bit integer LCMP, IFGT bgt 32-bit floating point FCMPG, IFGT bgt 64-bit floating point DCMPG, IFGT bge 32-bit integer IF ICMPGE bge 64-bit integer LCMP, IFGE bge 32-bit floating point FCMPL, IFGE bge 64-bit floating point DCMPL, IFGE

Table 3.22: Translation semantics for CIL comparison branch instructions

In addition to CIL instructions that branch on comparison, the CLI also provides instructions for performing a comparison and pushing the result of the comparison on to the stack as an integer: 0 on failure, 1 on success. The

56 translation of these instructions by JaCIL are translated like the branch instructions in table 3.22. The principal difference is that, in the success case, the branch target is generated code that pushes an integer constant of 1 onto the stack and continues to the next logical instruction. Furthermore, in the failure case, the fall-through code pushes an integer constant of 0 and unconditionally branches to the next logical instruction. Table 3.23 shows the comparison operations supported, and the branch operation analog for the purposes of translations by JaCIL.

Compare Instruction Branch Instruction Equivalent ceq beq clt blt cgt bgt

Table 3.23: CIL supported comparison instructions and their branch coun- terparts

3.11.8 Method Invocation and Return Instructions

Method invocation in the CLI is done through the CIL instructions, call, a static or direct method call, and callvirt, a virtual method call. When the call instruction is used to invoke a static method, the INVOKESTATIC instruction is generated by JaCIL. For instance methods, JaCIL translates this as a call to INVOKESPECIAL, if its usage is valid by JVM verification rules, otherwise, the INVOKEVIRTUAL instruction is emitted. The callvirt instruction is translated as either the INVOKEVIRTUAL instruction, for in- vocation on normal class types, or the INVOKEINTERFACE instruction, for invocation on interface types.

Method return is provided by the CIL instruction ret. Again, in this case, the JVM has variants of its method return instructions depending on the type being returned. Table 3.24 lists the translation semantics for this in- struction.

3.11.8.1 Method Invocation on Value Types

Method invocation on value types is supported by the CLI by passing the this argument as a managed pointer to the value type. Since JaCIL trans- lates value types as JVM heap objects, in the case that the this argument is a managed pointer, JaCIL inserts an indirection operation (see section 3.11.13) at the point after the managed pointer is loaded onto the stack. If data flow analysis determines that this is not possible (e.g. the pointer

57 Method Return Type JVM Operation Void return RETURN 32-bit integer or narrower IRETURN 64-bit integer LRETURN 32-bit floating point FRETURN 64-bit floating point DRETURN User-defined type ARETURN Managed pointer ARETURN

Table 3.24: CIL ret translation semantics pushed is potentially used in a different way), JaCIL inserts code before the method invocation to pop all arguments into unused local variables, ex- tract the reference from the pointer, and push the arguments back onto the stack.

3.11.8.2 Method Invocation Translation Limitations

JaCIL does not properly support all cases where non-virtual instance method invocation is used with the call instruction. This is because the use of INVOKESPECIAL is severely restricted in the JVM. As mentioned in section 3.9.5 a name mangling approach may be a solution to this limitation in the future.

3.11.9 Static and Instance Field Access Instructions

For loading and storing static and instance fields, the CIL instruction set provides instructions that have a direct JVM counterpart. These instruc- tions and their JVM translations are given in table 3.25.

CIL Instruction JVM Operation ldfld GETFIELD stfld PUTFIELD ldsfld GETSTATIC stsfld PUTSTATIC

Table 3.25: CIL field access instruction translation semantics

In addition to translating these instructions directly, an implicit conver- sion operation is done with the same semantics as defined for local variable

58 loading (see section 3.11.3.1). Likewise, when loading fields that are user- defined value types a cloning operation is performed exactly as is done for local variable loading (see section 3.11.3.3).

3.11.9.1 Field Access on Value Types

As with method invocation on value types (see section 3.11.8.1), field ac- cess on value types use a managed pointer as the this argument. In the case where this occurs for a ldfld instruction, an indirection operation is performed, as described by section 3.11.13, before the GETFIELD is emit- ted. For instances where stfld is used, JaCIL emits stack manipulation instructions to get the this pointer to the top of the stack, instructions to perform the indirection on the pointer, stack manipulation instructions to rearrange the stack to a state suitable for storing to the field, and then finally the PUTFIELD instruction.

3.11.10 Object Creation and Initialization

For user-defined reference types, the CIL instruction set provides the newobj instruction. This instruction creates and initializes new objects on the man- aged heap, and returns a reference to this object on the top of the operand stack. The newobj instruction is used much in the same way as the call or callvirt instruction, except that a constructor method is always specified as the immediate operand.

This instruction cannot be translated directly to the JVM, because the JVM lacks a single instruction that both allocates and initializes an object refer- ence. At the site of the newobj instruction, JaCIL emits an INVOKESPECIAL instruction to call the constructor; this takes care of initialization. To trans- late the allocation, JaCIL performs analysis to find a common point in the instruction stream before the first parameter is pushed onto the operand stack. If such a point exists, and all code paths from that point flow to the newobj instruction, then a NEW and DUP is emitted at that point to per- form the allocation. Otherwise, before the INVOKESPECIAL, instructions are emitted to pop all arguments to the constructor to unused local vari- ables; this is followed by a NEW/DUP combination, and a sequence of opera- tions to push back the arguments.

In C# code, there is no way to use newobj in such a way that the less efficient translation is done. It is still however possible to create a verifiable CIL instruction stream that does cause the pop/push translation to occur.

Since a new reference is generated when using newobj, there is no need for

59 a cloning operation when it is used on user-defined value types.

3.11.11 Vector Instructions

The CIL instruction set, like the JVM instruction set, contains instructions that operate on single-dimension, zero-indexed arrays, called vectors in the CLI specification [8].

3.11.11.1 Vector Element Access

JaCIL supports vector element loading instructions, detailed in table 3.26. When translating a vector element load, static analysis is used by JaCIL to determine the type of vector that is being accessed. Depending on the underlying JVM array type, the load instruction is emitted from table 3.27. Furthermore, if the target type loaded is an unsigned type that is narrower that 32 bits, additional conversion is done to eliminate the sign extension (see section 3.11.3.1).

CIL Instruction Target Load Type ldelem.any Underlying element type ldelem.i1 Signed 8-bit integer ldelem.i2 Signed 16-bit integer ldelem.i4 Signed 32-bit integer ldelem.i8 Signed 64-bit integer ldelem.r4 32-bit floating point ldelem.r8 64-bit floating point ldelem.ref Object reference ldelem.u1 Unsigned 8-bit integer ldelem.u2 Unsigned 16-bit integer

Table 3.26: Supported CIL vector load instruction variants

Similarly, vector element store instructions that are supported by JaCIL are listed in table 3.28. As with load operations, static analysis determines the JVM array store instruction to use from table 3.27.

With vector element loading of user-defined value types, a cloning operation is performed. The translation semantics for this is exactly the same as with local variable loading (see section 3.11.3.3 for details).

60 JVM Array Type Load Instruction Store Instruction boolean[] BALOAD BASTORE byte[] BALOAD BASTORE char[] CALOAD CASTORE short[] SALOAD SASTORE int[] IALOAD IASTORE long[] LALOAD LASTORE float[] FALOAD FASTORE double[] DALOAD DASTORE user-defined type[] AALOAD AASTORE

Table 3.27: JVM array element access instructions

CIL Instruction Target Store Type stelem.any Underlying element type stelem.i1 Any 8-bit integer stelem.i2 Any 16-bit integer stelem.i4 Any 32-bit integer stelem.i8 Any 64-bit integer stelem.r4 32-bit floating point stelem.r8 64-bit floating point stelem.ref Object reference

Table 3.28: Supported CIL vector store instruction variants

61 3.11.11.2 Vector Creation

The CLI provides the newarr instruction to create new vectors on the heap. If the type of the array being created is a primitive type, than the equivalent JVM NEWARRAY instruction is emitted, if the type of array is a user-defined type, ANEWARRAY is emitted.

Value type semantics dictate that an array of value types are zero initial- ized. Since user-defined value types are translated as references in the JVM, when creating an array of user-defined value types, JaCIL generates a code sequence after the array creation instruction to loop through the array, and create instances of the value type to store in each element.

3.11.11.3 Vector Length

The CIL instruction, ldlen, which returns the number of elements a vector object contains, is directly translated to the JVM instruction ARRAYLENGTH.

3.11.12 Exception Instructions

Besides the semantics described in section 3.9.3 with regard to overall ex- ception handling semantics, the following subsections describe the opera- tions of the CIL instruction set that is supported by JaCIL.

3.11.12.1 Exception Throwing

JaCIL supports the CIL exception throwing instructions, throw and rethrow. The throw instruction is translated directly as the JVM instruction ATHROW. When rethrow is translated, JaCIL inserts a local variable load for the ex- ception object that is stored in the beginning of the handler (see section 3.9.3) before emitting ATHROW.

3.11.12.2 Leaving Protected Blocks

The CLI specifies that the only way to transfer control from within a pro- tected block to outside of one is by the leave and leave.s instructions. These instructions function as unconditional branches with the added se- mantic that any protected blocks with finally handlers that are being ex- ited are executed. JaCIL translates these instructions directly as GOTO and

62 GOTO W. In addition to this, for every protected block with a finally han- dler being exited, JaCIL emits a JVM subroutine jump, JSR, to the handler. This is done in order from the innermost finally block to the outermost fi- nally block being exited.

3.11.12.3 Leaving Finally Blocks

The CIL instruction, endfinally, is the only instruction used for leaving finally blocks in the CLI. Since JaCIL translates finally blocks as JVM sub- routines, JaCIL translates this instruction as a JVM RET.

3.11.13 Indirection Instructions

The CIL instruction set defines a set of instructions to perform indirection on managed pointers. These instructions are used to store and load the value pointed to by such a pointer. Though table 3.2 indicates the JVM types that represent pointers for storage purposes, there are five base, ab- stract pointer classes that correspond to JVM stack types. All other pointer types are either one of these or inherit from one them. Table 3.29 lists these base pointer types.

JaCIL Managed Pointer Type JVM Stack Type Int32Ptr 32-bit integer Int64Ptr 64-bit integer Float32Ptr 32-bit floating point Float64Ptr 64-bit floating point ObjectPtr Object reference

Table 3.29: JaCIL base managed pointer types in the JaCIL.Runtime.- Pointers package

Instruction set support for indirect loading of values from managed pointers is listed in table 3.30. Since JaCIL translates managed pointers as pointer objects, the indirect load operation is translated as a virtual method call to the GetValue() method of the pointer type being loaded from. As with local variables (see 3.11.3.1), conversion code is also emitted for narrow, non-native, unsigned integer loads. As with other instructions that load user-defined value types to the stack (see section 3.11.3.3), JaCIL emits a clone operation when loading user-defined value types from managed point- ers.

Indirect store operations of the CLI are supported similarly to indirect load. Table 3.31 lists the indirect store operations supported by JaCIL.

63 CIL Instruction Target Load Type ldind.i1 Signed 8-bit integer ldind.i2 Signed 16-bit integer ldind.i4 Signed 32-bit integer ldind.i8 Signed 64-bit integer ldind.r4 32-bit floating point ldind.r8 64-bit floating point ldind.ref Object reference ldobj Object reference ldind.u1 Unsigned 8-bit integer ldind.u2 Unsigned 16-bit integer

Table 3.30: Supported CIL indirect load instruction variants

Similarly to indirect load, this is translated as a virtual method call to the SetValue() method of the associated pointer type.

CIL Instruction Target Store Type stind.i1 Any 8-bit integer stind.i2 Any 16-bit integer stind.i4 Any 32-bit integer stind.i8 Any 64-bit integer stind.r4 32-bit floating point stind.r8 64-bit floating point stind.ref Object reference stobj Object reference

Table 3.31: Supported CIL indirect store instruction variants

3.11.14 Address Load Instructions

The CLI provides three types of CIL instructions for loading an address as a managed pointer. JaCIL support for these instructions is described in the subsections below.

3.11.14.1 Local Variable Address Load

In the CLI a local variable or argument can be addressed by the ldloc.a and ldarg.a instructions. As stated in section 3.9.2, if this instruction appears within the executable code of a method, the local variable index

64 assigned in the JVM translation is used to store the box pointer object. Because of this behavior, this instruction is translated as simply a JVM variable load on the translated local variable index.

3.11.14.2 Static and Instance Field Address Load

The CIL instruction set contains two instructions for addressing instance and static fields: ldfld.a and ldsfld.a respectively. In both of these cases, JaCIL emits code to construct a field pointer object. For every pointer type in table 3.2, there exists a class in JaCIL.Runtime.Pointers that is used to represent a pointer to a field of that type. These types are prefixed with Field (e.g. for Int32Ptr, there is FieldInt32Ptr).

The JaCIL Run-time Library implements these pointer classes by using Java standard reflection capabilities.

3.11.14.3 Array Element Address Load

The CLI also provides an instruction to take the address of an array ele- ment, ldelem.a. This instruction is translated by emitting code to create an instance of an array element pointer object. As with the other concrete pointer implementations, for every pointer type in table 3.2, there exists a class in JaCIL.Runtime.Pointers that is used to represent a pointer to an element of that type. These types are prefixed with ArrayElement (e.g. for Int32Ptr, there is ArrayElementInt32Ptr).

The JaCIL Run-time Library implements these pointer classes by storing a reference to the array object whose element is being pointed to, and the index of the element being pointed at.

3.11.15 Value Type and Reference Type Instructions

The following subsections discuss the translation semantics for supported CIL instructions that operate strictly on reference types or value types.

3.11.15.1 Boxing and Unboxing

To provide interoperability between value types and reference types in the CLI, the box, unbox, and unbox.any instructions are provided.

65 No code is emitted for the box instruction if a user-defined value type is being boxed. This is because JaCIL translates user-defined value types as JVM reference types, so these references are already the “box.” However, when used with primitive types, which are value types by the CLI spec- ification, they are translated differently. Primitive types are boxed with standard Java API wrapper object for the type being boxed. For example, if a 32-bit integer was being boxed, the result of the operation would be an instance of java.lang.Integer.

When translating the unbox instruction, the result is a managed pointer to the value type. If the type being unboxed is a primitive type, code to in- voke the appropriate wrapper method is emitted to retrieve the value of the primitive type. In any case, JaCIL then emits code to create a box pointer object to wrap the value as a managed pointer. Note that this behavior is in fact legal by the CLI specification and will affect code that makes the as- sumption that the managed pointer points to the value within the original boxed object [8].

The translation for unbox.any is more straightforward than unbox. This instruction is required to return the value to the operand stack, not a pointer to it. If the boxed type is a primitive, it is unwrapped as per the translation of unbox. If the boxed type is a user-defined value type, it is simply cloned (to maintain the copy semantic). For regular reference types, this operation is a non-operation as specified in the CLI specification [8].

3.11.15.2 The initobj instruction

The CIL instruction, initobj is provided to zero initialize value types. The operand for this instruction is a managed pointer to the address of the value type to be initialized. Since JaCIL translates value types as JVM reference types, initobj is translated by constructing a new instance of the value type and performing an indirect store (see section 3.11.13) to the managed pointer destination.

3.11.15.3 Reference Type Checking and Conversion

The CIL instruction, castclass, attempts to convert one type to another. This is translated directly by JaCIL as the JVM instruction, CHECKCAST.

In addition the CLI defines the isinst instruction that tests for type con- formance. This instruction returns its operand on success, and returns a null reference on failure. JaCIL translates this by emitting the INSTANCEOF instruction. Since this instruction returns 0 on failure and 1 on success, ad- ditionally, JaCIL also emits a conditional branch to code that re-pushes the

66 operand on success and pushes a null constant on failure.

3.12 Translation Examples

To help illustrate the semantics defined in previous sections, the following subsections depict examples of JaCIL byte-code translations. The examples assume that the reader has a basic understanding of C#, the CIL instruc- tion set, and the JVM byte-code instruction set.

3.12.1 Basic Operations

Figure 3.2 shows a C# source code definition of a method that has a variety of basic operations. This code is compiled to the CIL code listed in figure 3.3. Figure 3.4 shows the javap output from the JaCIL translation of the CIL code.

Note that CIL local variable and argument load/store instructions are trans- lated as the equivalent JVM load/store instructions. The mapping between the two sets of “registers” for local variables and arguments is evident as stloc.0 is translated as ASTORE 4. The static analysis that JaCIL per- forms is also apparent as the correct selection of the JVM local variable load/store instruction is made (e.g. using ILOAD 0 for ldloc.0).

This example also contains instances of instructions that are polymorphic. All uses of the add instruction, for instance, are on 32-bit integers, which JaCIL correctly translates as the JVM instruction IADD. Furthermore, the conditional branch instruction, blt, is used to implement the for loop. JaCIL translates this usage as the IF ICMPLT instruction which branches conditionally on an 32-bit integer comparison.

This example also demonstrates object creation and initialization. The CIL instruction, newobj, performs both, object creation and initialization. JaCIL appropriately translates the initialization in place as a INVOKESPECIAL on the constructor. Also, JaCIL inserts the JVM NEW and DUP instructions to properly create the object. As is shown, JaCIL inserts these instructions before the arguments of the constructor are pushed onto the stack.

3.12.2 Exceptions

Figure 3.5 shows a C# source code definition of a method that has a simple catch and finally block. This source is compiled to the CIL code listed in

67 figure 3.6. Figure 3.7 shows the javap output from the JaCIL translation of the CIL code.

In this example, JaCIL directly translates the protected blocks of code as JVM exception table entries. The throw at IL 000d is also translated di- rectly as ATHROW.

Structurally speaking, the catch handler is translated almost directly to its JVM counterpart. The finally block, on the other hand, is translated as “catch any” entry in the JVM exception table, with a jump to the subroutine at offset 44, followed by a re-throwing of the exception. The finally block subroutine is also jumped to by the translation of the leave instructions shown at offsets 15 and 30 in figure 3.7.

3.12.3 Value Types and Managed Pointers

Figure 3.8 shows a C# source code definition of a method that has value type operations involving boxing. The type Point is a user-defined value type, and IPoint is an interface that Point implements. This source is compiled to the CIL code listed in figure 3.9. Figure 3.10 shows the javap output from the JaCIL translation of the CIL code.

The copy semantics of value type loading is shown in the translation of ldarg.0. In addition to the ALOAD 0 instruction, JaCIL emits invocation to clone() and a CHECKCAST.

The unbox operation is translated to the instructions from offset 25 through 31. In order to produce a managed pointer to the value type, the reference is copied into a new box pointer object. Offsets 34 through 43 show the translation of the ldobj which performs indirection on this pointer.

Also, annotated is the translation of box. This is a non-operation in this example, so no instructions are emitted by JaCIL.

68 public static Foobar[] CreateIntoArray( int size, int x, int y, int z ) { Foobar[] f = new Foobar[ size ]; for ( int i = 0; i < size; i += 1 ) { f[ i ] = new Foobar( new Foobaz( x + i, y + i, z + i ) ); } return f; }

Figure 3.2: Sample C# method definition with basic operations

69 IL_0000: ldarg.0 IL_0001: newarr Foobar IL_0006: stloc.0 IL_0007: ldc.i4.0 IL_0008: stloc.1 IL_0009: br IL_0028

IL_000e: ldloc.0 IL_000f: ldloc.1 IL_0010: ldarg.1 IL_0011: ldloc.1 IL_0012: add IL_0013: ldarg.2 IL_0014: ldloc.1 IL_0015: add IL_0016: ldarg.3 IL_0017: ldloc.1 IL_0018: add IL_0019: newobj instance void class Foobaz::.ctor(int32, int32, int32) IL_001e: newobj instance void class Foobar::.ctor(class Foobaz) IL_0023: stelem.ref IL_0024: ldloc.1 IL_0025: ldc.i4.1 IL_0026: add IL_0027: stloc.1 IL_0028: ldloc.1 IL_0029: ldarg.0 IL_002a: blt IL_000e

IL_002f: ldloc.0 IL_0030: ret

Figure 3.3: CIL code compiled from source in figure 3.2

70 0: iload_0 1: anewarray #2; //class Foobar 4: astore 4 6: ldc #28; //int 0 8: istore 5 10: goto 51 13: aload 4 15: iload 5 17: new #2; //class Foobar 20: dup 21: new #18; //class Foobaz 24: dup 25: iload_1 26: iload 5 28: iadd 29: iload_2 30: iload 5 32: iadd 33: iload_3 34: iload 5 36: iadd 37: invokespecial #31; //Method Foobaz."":(III)V 40: invokespecial #33; //Method "":(LFoobaz;)V 43: aastore 44: iload 5 46: ldc #34; //int 1 48: iadd 49: istore 5 51: iload 5 53: iload_0 54: if_icmplt 13 57: aload 4 59: areturn

Figure 3.4: Java byte-code generated by JaCIL from the code in figure 3.3

71 public void DoException( bool err ) { int a = 0; try { if ( err ) { throw new Exception(); } } catch ( Exception e ) { a++; } finally { a++; } }

Figure 3.5: Sample C# method definition with a catch handler

72 IL_0000: ldc.i4.0 IL_0001: stloc.0 .try { // 1 .try { // 0 IL_0002: ldarg.1 IL_0003: brfalse IL_000e

IL_0008: newobj instance void class [mscorlib]System.Exception::.ctor() IL_000d: throw IL_000e: leave IL_0022

} // end .try 0 catch [mscorlib]System.Exception { // 0 IL_0013: stloc.1 IL_0014: ldloc.0 IL_0015: ldc.i4.1 IL_0016: add IL_0017: stloc.0 IL_0018: leave IL_0022

} // end handler 0 } // end .try 1 finally { // 1 IL_001d: ldloc.0 IL_001e: ldc.i4.1 IL_001f: add IL_0020: stloc.0 IL_0021: endfinally } // end handler 1 IL_0022: ret

Figure 3.6: CIL code compiled from source in figure 3.5

73 0: ldc #13; //int 0 2: istore_2 3: iload_1 4: ifeq 15 7: new #12; //class System/Exception 10: dup 11: invokespecial #14; //Method System/Exception."":()V 14: athrow 15: jsr 44 18: goto 53 21: dup 22: astore 4 24: astore_3 25: iload_2 26: ldc #15; //int 1 28: iadd 29: istore_2 30: jsr 44 33: goto 53 36: astore 5 38: jsr 44 41: aload 5 43: athrow 44: astore 6 46: iload_2 47: ldc #15; //int 1 49: iadd 50: istore_2 51: ret 6 53: return Exception table: from to target type 3 21 21 Class System/Exception

3 36 36 any

Figure 3.7: Java byte-code generated by JaCIL from the code in figure 3.6

74 public static Point UseIPoint( Point p ) { IPoint i = p; i.SetX( 30 ); i.SetY( 40 ); return ( Point ) i; }

Figure 3.8: Sample C# method definition with value type operations

IL_0000: ldarg.0 IL_0001: box Point IL_0006: stloc.0 IL_0007: ldloc.0 IL_0008: ldc.i4.s 0x1e IL_000a: callvirt instance void class IPoint::SetX(int32) IL_000f: ldloc.0 IL_0010: ldc.i4.s 0x28 IL_0012: callvirt instance void class IPoint::SetY(int32) IL_0017: ldloc.0 IL_0018: unbox Point IL_001d: ldobj Point IL_0022: ret

Figure 3.9: CIL code compiled from source in figure 3.8

75 0: aload_0 1: invokevirtual #38; //Method System/Object.clone:()Ljava/lang/Object; 4: checkcast #8; //class Point (No actual operation for box) 7: astore_1 8: aload_1 9: ldc #57; //int 30 11: invokeinterface #60, 2; //InterfaceMethod IPoint.SetX:(I)V 16: aload_1 17: ldc #61; //int 40 19: invokeinterface #64, 2; //InterfaceMethod IPoint.SetY:(I)V 24: aload_1 25: new #23; //class JaCIL/Runtime/Pointers/BoxObjectPtr 28: dup_x1 29: dup_x1 30: pop 31: invokespecial #26; //Method JaCIL/Runtime/Pointers/BoxObjectPtr."": (Ljava/lang/Object;)V 34: invokevirtual #35; //Method JaCIL/Runtime/Pointers/ObjectPtr.GetValue: ()Ljava/lang/Object; 37: checkcast #8; //class Point 40: invokevirtual #38; //Method System/Object.clone:()Ljava/lang/Object; 43: checkcast #8; //class Point 46: areturn

Figure 3.10: Java byte-code generated by JaCIL from the code in figure 3.9

76 Chapter 4

JaCIL User Guide

This chapter provides end-user documentation for building the system, us- ing the compiler tool and running compiled code.

The following sections assume that contents of the binary distribution of JaCIL is unpacked in a folder named jacil.

4.1 Building JaCIL

In order to build JaCIL from source, the following software packages are required.

• A Common Language Infrastructure Implementation. – Microsoft .NET Framework 1.1 or higher. – Mono 1.1.13 or higher. • A Java Developer Kit Implementation. – Sun JDK 1.5 or higher. – IBM Java SDK 1.5 or higher. • NAnt 0.85 RC3 or higher. • Apache Ant 1.6.5 or higher.

It is assumed that these tools are installed correctly and that the nant and ant executables are within the search path for your shell.

77 Once these prerequisites are satisfied, from the root of the JaCIL source distribution, the nant tool can be invoked to run the build process. The ant tool is never invoked directly by the user; nant invokes it automatically. Table 4.1 lists the primary build targets that a user would be interested in. The most straightforward way to build the JaCIL system is done by invoking nant by itself.

NAnt Target Description build Builds the JaCIL system. This will cre- ate a build directory that will contain the directory layout of the binary distri- bution. Only the compiler framework, the run-time library, and dependencies are built and copied to this directory structure. doc Invokes build and generates the em- bedded API specification documentation in the JaCIL source code. This documen- tation is placed in the build/doc direc- tory. test Invokes build and compiles and runs the JaCIL test suite. examples Invokes build and compiles exam- ples, including benchmarking applica- tions into the build/examples direc- tory. dist Invokes build, doc, test, and examples. This target then pack- ages the build directory in to binary distribution zip files placed in the dist directory.

Table 4.1: JaCIL NAnt build targets

4.2 Using the Compiler Tool

The command-line tool, jacilc.exe, provides a simple interface to the JaCIL Compiler API. Table 4.2 lists the available options for the tool and Figure 4.1 shows an example run of the tool with the help switch.

Besides the options available, the tool takes as arguments, the assembly images that are to be compiled by the system. These images are compiled to a JAR file named by the -o option. Figure 4.2 demonstrates compiling

78 Option Meaning -? Shows the help text and exits. -o file Output compiled JAR to file named file. -out:file Output compiled JAR to file named file. -v level Set verbosity level of compiler messages; 0 is silent and 3 is very high. -verbose:level Set verbosity level of compiler messages; 0 is silent and 3 is very high.

Table 4.2: jacilc.exe command-line options

$ ./jacil/bin/jacilc.exe -? JaCIL Byte-code Compiler 0.7.0.0 - Copyright (C) 2006 Almann T. Goo CLI (.NET) Assembly to Java Class File Compiler

Usage: jacilc [options] ASMFILE... Translates the given ASMFILE(s) to Java byte-code. Options: -? -help Show this help list -help2 Show an additional help list -o -out:file The output JAR file (required) -usage Show usage syntax and exit -v -verbose:level Verbose level (Default is 1, 0 is quiet, 3 is high)

ASMFILE(s) must be valid CLI images (assemblies).

Figure 4.1: jacilc.exe output with -? switch

79 an assembly into a single JAR file.

$ ./jacil/bin/jacilc.exe -o Hello.jar Hello.exe Warning: is assembly private, translating as public

Figure 4.2: Using jacilc.exe to compile an assembly to a JAR file

4.3 Running Compiled Code

Once compiled, a JAR file produced by jacilc.exe can be loaded by the JVM. In order to use such a JAR file, the class path must have the trans- lated JAR in addition to jacil-rt.jar specified. This file contains the implementation of the JaCIL Run-time API.

If any of the assemblies compiled into a JAR file from jacilc.exe con- tains an entry point, this is translated as a Java entry point in a class called CLI.AssemblyName.Main, where AssemblyName is the name of the assembly containing an entry point. Figure 4.3 illustrates the usage of the java command running such an entry point from an assembly named Hello (see figure 4.4 for the C# code listing used for this example).

$ java -cp jacil/lib/jacil-rt.jar:Hello.jar \ CLI.Hello.Main Bill Hello Bill

Figure 4.3: Running translated code

Compiled JAR files from jacilc.exe can also be used as libraries that can be invoked from native Java code. Such uses only require the Java class path to be set with the compiled JAR files and the jacil-rt.jar library. In order to compile code that uses CLI methods and classes, the jacilc.exe compiled JAR file must be in the class path of javac. If any of the JaCIL Run-time API is used from user-defined Java sources, jacil-rt.jar must also be included in this class path.

80 using System; public class Hello { public static void Main( String[] args ) { if ( args.Length != 1 ) { Console.WriteLine( "Usage: Hello name" ); return; }

Console.Write( "Hello " ); Console.WriteLine( args[ 0 ] ); } }

Figure 4.4: Sample C# “Hello World” type program

81 Chapter 5

Performance and Benchmarking

Although the focus of the project was functionality, it is of interest to at least examine generally, the performance of JaCIL compiled code. The following sections present the results of the following benchmarking tests written in C#:

• An implementation of the Blowfish Cipher.

• An implementation of the MD5 Message Digest Algorithm.

• An implementation of a line drawing algorithm using value types.

• An implementation of Bresenham’s Algorithm–an integer line draw- ing algorithm using managed pointers.

These tests where chosen, because it was thought that they represented a good set of “real world” examples. Also, these programs are not dependent on any standard APIs for the benchmarked portions–eliminating compar- isons between efficiency of standard API implementations.

All implementations consist of a main driver program, which instantiates and runs a “kernel” object that contains the algorithm being tested. The number of iterations is controllable from the driver program. Furthermore, each “kernel” object has a non-operation mode where executing the algo- rithm becomes a non-operation. This non-operation mode is provided so that the benchmarking trials can be executed with the algorithm internals turned “on” and “off” so that overhead can be accounted for.

82 The systems used for benchmarking are given by table 5.1. These are the hardware, OS, and execution environments that have been benchmarked. Note that when the benchmark is using a Java Virtual Machine, JaCIL code is being benchmarked. In all other cases, the original Common Language Infrastructure code is being benchmarked.

ID Hardware VM 1 Athlon X2 3800+, -64 2.6.15 Mono 1.1.13.6 4GB RAM 2 Athlon X2 3800+, x86-64 Linux 2.6.15 Sun Java 1.5.0 06- 4GB RAM b05 64-bit Server VM

Table 5.1: Benchmarking systems used in JaCIL benchmarking

All benchmark results are given as the average, effective times in seconds. All result times are taken as the average of 10 runs. The effective time is calculated by taking the difference of the result times running the appli- cation normally and result times running the application in non-operation mode.

5.1 Blowfish Cipher

This benchmark application implements the Blowfish cipher, a symmetric block cipher [16]. The implementation is a straightforward implementation of the algorithm. The driver program for this benchmark re-encrypts a dummy 64-bit block of data for a user-defined amount of iterations.

ID # of Blocks Wall Time User Time System Time 1 1000000 0.517 0.516 0.000 2 1000000 0.205 0.223 0.002 1 10000000 5.152 5.137 0.004 2 10000000 2.053 2.039 0.000

Table 5.2: Blowfish benchmarking results

From the results of the benchmark trial runs in table 5.2, surprisingly the Linux Sun JVM translation outperforms the original code on Linux Mono. The author suspects that Sun’s Hotspot JIT compiler is probably more ma- ture than the one used in the younger Mono Project. Having a more efficient JIT compiler between the different VM implementations could account for this performance gap.

83 5.2 MD5 Message Digest

This benchmark application implements the MD5 message digest algorithm as specified by RFC 1321 [2]. As with the Blowfish implementation, this implementation is a straightforward one based on the description of the al- gorithm in the RFC. The driver program for this benchmark hashes a user defined number of 512-bit blocks of zeroed data. ID # of Blocks Wall Time User Time System Time 1 100000 0.208 0.203 0.005 2 100000 0.345 0.201 0.068 1 1000000 2.027 2.014 0.005 2 1000000 2.062 1.861 0.217

Table 5.3: MD5 benchmarking results

Unlike the Blowfish benchmark, the MD5 results in table 5.3 show a much closer performance gap. It seems evident that the translation performance is roughly equivalent to the original code on the platforms being compared.

5.3 Line Drawing Algorithm with Value Types

This benchmark application uses CLI value types to represent a point. The “kernel” is an implementation of a simple floating point based line render- ing algorithm that uses these value types within its computations. The driver program for this benchmark renders an array of points for a line from a given start point to a given end point. The user also specifies the number of iterations that the line is to be rendered for. ID # of Renderings Wall Time User Time System Time 1 100 0.011 0.015 0.007 2 100 2.436 3.054 0.222 1 1000 0.059 0.050 0.005 2 1000 21.859 29.595 0.682

Table 5.4: Value type line drawing benchmarking results

The results in table 5.4 are based on rendering a line from (10, 10) to (2000, 2000) for a repeated number of iterations. In this case, JaCIL’s emulation of value types using the heap severely cripples performance by a large factor. Ar- guably, the heap allocations and the possible garbage collection operations are causing most of the performance loss.

84 5.4 Bresenham’s Line Drawing Algorithm with Managed Pointers

This benchmark is similar to the previous one, but employs Bresenham’s algorithm, an integer based line rendering algorithm. This algorithm was chosen as it makes for a case where managed pointers can be used in a practical way. The “kernel” of this benchmark generates an array of inte- gers, with pairs of elements representing rendered points instead of using an array of value types. The driver program for this benchmark renders an array of points for a line from a given start point to a given end point. The user also specifies the number of iterations that the line is to be rendered for.

ID # of Renderings Wall Time User Time System Time 1 1000 0.036 0.033 0.000 2 1000 0.059 0.058 0.011 1 100000 2.937 2.815 0.107 2 100000 2.806 2.667 0.112

Table 5.5: Bresenham’s line drawing benchmarking results

The results in table 5.5 are based on rendering a line from (10, 10) to (2000, 2000) for a repeated number of iterations. As with the MD5 benchmark, perfor- mance is equivalent on the platforms being compared. This would suggest that the Sun JVM is able to in-line the method calls used by the pointer objects in the JaCIL translation.

85 Chapter 6

Conclusions

In its current state, JaCIL is able to translate a sizable subset of the Com- mon Language Infrastructure. With its current feature set, it is already possible to write Java programs in the C# language. In a sense, JaCIL has started to open up support for CLI languages to be compiled for the JVM much in the same way as the IKVM.NET project has done for JVM lan- guages on the CLI. It is clearly evident to the author after undertaking this endeavor, that the feasibility of having a full CLI implementation on the JVM is a realistic one.

6.1 Lessons Learned

Developing JaCIL has given the author a very strong understanding of the low-level aspects of both the CLI and the JVM. Having written the trans- lation semantics, one can look past the superficial similarities in both sys- tems and really understand their differences. Furthermore, working at this level, in the author’s opinion, makes a person a far better user of the higher level languages used for developing applications targeting such platforms.

The CLI, being a newer system than the JVM, has clearly a richer pro- gramming model–with pointers and value types in particular. While this may not make the CLI more powerful than the JVM, it is certainly more expressive than the JVM. The author had many days without writing any program code, spent just conceptualizing how certain CLI semantics would be translated into the JVM programming model.

86 6.2 Future Work

The JaCIL project is just the first step on a long path of having a full Com- mon Language Infrastructure implementation for the JVM. Some examples of future directions follow in the subsections below.

6.2.1 Missing Compiler Features

As noted throughout chapter 3, there are numerous limitations that exist that are fixable. Furthermore, there are about 50 instructions that are unsigned or overflow checked variants of already supported instructions, these should be straightforward to implement.

6.2.2 CLI Standard API

The CLI’s standard API is quite a complex and large specification, in or- der to run arbitrary CLI code in Java, an implementation of the standard API would need to exist on the JVM. While the JaCIL Run-time Library provides a skeleton implementation, developing the rest of the API in Java would be a sizable task. Instead, when JaCIL becomes mature enough, it may be possible to leverage a project like Mono, that already has a CLI standard API implementation, to translate the API to the JVM. Even using such an implementation, the porting effort would still be a sizable one.

6.2.3 JaCIL on the JVM

Once a CLI standard API implementation exists on the JVM and JaCIL’s feature set becomes mature, it would be of interest to see if JaCIL can compile itself. This would have the effect of bringing the JaCIL Compiler Framework to the JVM. This is an important step, because such a compiler could be used to load assemblies at run-time on the JVM–a JIT compiler.

6.2.4 CLI Implementation

Once the standard API and a JIT compiler exists for the CLI on the JVM, the next step would be to implement all of the remaining run-time seman- tics, like security, and application domains. This is the ultimate goal for JaCIL’s future work, and such an undertaking, like all steps before it, will be a considerable one.

87 Bibliography

[1] Retroweaver. http://retroweaver.sourceforge.net/. Re- trieved May, 2006. [2] RFC 1321. http://rfc.net/rfc1321.html. Retrieved August, 2006. [3] The DotGNU Project. http://dotgnu.org/. Retrieved May, 2006. [4] The GNU Classpath Project. http://www.gnu.org/software/ classpath/. Retrieved May, 2006. [5] The Mono Project. http://www.mono-project.com/. Retrieved May, 2006. [6] The ObjectWeb ASM Framework. http://asm.objectweb.net/. Retrieved May, 2006. [7] Visual MainWin for J2EE. http://dev.mainsoft.com/Default. aspx?tabid=130. Retrieved May, 2006. [8] Standard ECMA-335, Common Language Infrastructure. http://www.ecma-international.org/publications/ standards/Ecma-335.htm, 2005. [9] J. Evain. The Mono Cecil Project. http://www.mono-project.com/ Cecil. Retrieved May, 2006. [10] J. Frijters. IKVM.NET Home Page. http://www.ikvm.net/. Re- trieved May, 2006. [11] K. Gough. Stacking them up: A comparison of virtual machines. In Australasian Computer Systems Architecture Conference, Goldcoast, Queensland Australia, pages 52–59. IEEE Computer Society Press, 2000. [12] K. J. Gough. Parameter Passing for the Java Virtual Machine. In Computer Science Conference, 2000. ACSC 2000. 23rd Australasian, pages 81–87, 2000.

88 [13] J. Hamilton. Language integration in the . SIGPLAN Not., 38(2):19–28, 2003.

[14] T. Lindholm and F. Yellin. The JavaTMVirtual Machine Specification. Addison-Wesley, 2nd edition, 1999.

[15] E. Meijer and J. Gough. Technical Overview of the Common Language Runtime. http://research.microsoft.com/∼emeijer/Papers/ CLR.pdf, 2000.

[16] B. Schneier. Blowfish. http://www.schneier.com/blowfish. html. Retrieved August, 2006.

[17] S. Shiel and I. Bayley. A Translation-Facilitated Comparison Between the Common Language Runtime and the Java Virtual Machine. In Proceedings of the First Workshop on Semantics, Verification, Analysis and Transformation (BYTECODE 2005), pages 35–52, June 2005.

[18] J. Singer. JVM versus CLR: A Comparative Study. In Proceedings of the 2nd International Conference on Principles of Programming in Java, pages 167–169. Computer Science Press, 2003.

89