Computer Science Technical Report Compiling Java to SUIF

Compiling Java to SUIF: Incorporating Support for Object-Oriented Languages

Sumith Mathew, Eric Dahlman and Sandeep Gupta
Computer Science Department
Colorado State University, Fort Collins, CO 80523
{mathew, dahlman, gupta}@cs.colostate.edu

July 1997
Technical Report CS-97-114

Computer Science Department, Colorado State University, Fort Collins, CO 80523-1873
Phone: (970) 491-5792 Fax: (970) 491-2466 WWW: http://www.cs.colostate.edu

Abstract

A primary objective in the SUIF compiler design has been to develop an infrastructure for research in a variety of compiler topics including optimizations on object-oriented languages. However, the task of optimizing object-oriented languages requires that high-level object constructs be visible in SUIF. Java is a statically-typed, object-oriented and interpreted language that has the same requirements of high-performance as procedural languages. We have developed a Java front-end to SUIF, j2s, for the purpose of carrying out optimizations. This paper discusses the compilation issues in translating Java to SUIF and draws upon those experiences in proposing the solutions to incorporating object-oriented support in SUIF as well as the extensions that need to be made to support exceptions and threads.

1 Introduction

Object-oriented languages with their features of inheritance and abstraction equip programmers with the tools to write reusable programs quickly and easily. However, the use of these features results in code that is quite different in structure from procedural code.
Object-oriented programs tend to have smaller method sizes, data-dependent control flow due to dynamic dispatch and a large number of potential aliases [ZCG94]. Heavy use of the feature of dynamic dispatch and the fact that methods tend to be small in size impose performance penalties when the programs are compiled using traditional intraprocedural techniques [ASU86].

Optimization techniques [Dea96, DDG+96, Ple96, Cha92] have been proposed which have a direct benefit of eliminating the overhead of dynamic dispatch by statically determining the receiver of a method call. Moreover, greater indirect benefits are obtained by enabling inlining and hence further intraprocedural optimizations. However, in order to evaluate the relative costs and benefits of these optimization techniques, it is necessary to have a common framework on which these optimizations are performed. Also, the framework needs to be language-independent so that programs of different object-oriented languages can get comparable treatment.

An intermediate representation can act as the interface between different language front-ends and the optimizing back-end. SUIF [Gro94] is a compiler infrastructure that allows easy experimentation and incremental optimization through multiple passes. The feature of extensibility [Smi96] makes it possible for SUIF to support many user-defined constructs. Also, SUIF has built-in support for interprocedural optimizations [Pan96] and parallelization. We propose to implement an optimizer for object-oriented languages on the SUIF representation of the program.

Footnote: This project is supported by Faculty Grant 168597 from Colorado State University.

A necessary requirement of the intermediate representation is that it must allow the different front-ends to describe the structure of the source program in such a way that optimizations may be performed.
Object-oriented languages use constructs such as method calls, field accesses and object instantiations, which are not found in procedural languages. To perform static binding of dynamically-dispatched calls, it is necessary that these constructs be easily identifiable in the intermediate representation. However, the current version of SUIF (1.x) does not have any front-ends for object-oriented languages and does not support object-oriented constructs. Therefore, a major endeavor of this project has been to identify the extensions, if any, that need to be made to SUIF to support these features.

This paper deals with the issues in compiling Java, a statically-typed and object-oriented language, to SUIF. As Java is an interpreted and dynamic language, we have had to make some changes to the compilation model. Java programs are compiled to bytecode before being translated to the SUIF representation. We draw upon our experiences in building the translator, j2s, to show how SUIF can be used to support object-oriented features. We also propose solutions for supporting exceptions and threads in SUIF.

The rest of this paper is organized as follows. Section 2 deals with the issues in translating bytecode to SUIF in detail. It describes the instruction format, data types and object-oriented constructs that are supported by the Java Virtual Machine. In translating Java to SUIF we have had to deal with the issue of supporting object-oriented features in SUIF. Section 3 outlines the features that can be supported in the existing version of SUIF as well as the extensions that need to be made. The issues of exception handling and thread libraries are also discussed because they are present in most object-oriented languages. The conclusions and future work are described in Section 4.
2 Translating Bytecode to SUIF

Translating bytecode to SUIF involves handling the different data types and bytecode instructions without any alteration in the semantics of the program. The bytecode is ultimately to be compiled to native code. Currently, the compilation model requires that all the bytecode files be available at compile time, as it does not support dynamic linking.

The current version of the translator compiles each bytecode .class file into a SUIF file. Therefore, programs with multiple .class files will have a corresponding number of SUIF files. Each of the SUIF files has one file set entry per fileset. In a later pass the multiple file set entries are linked and placed in a single fileset for the purposes of interprocedural analysis.

2.1 Compilation Model

The Java to SUIF translator, j2s, is one step in the compilation framework for Java programs, which is outlined in Figure 1. Java source code is first compiled to bytecode before it is translated into SUIF. There are several reasons for choosing bytecode as the source language. As bytecodes do not have to be parsed, unlike source code, a simple reader is sufficient to read in the information from a bytecode file. Further, bytecodes provide a sufficiently high level of representation of the constructs used in the language. This is important for the purposes of optimization, where all the object-oriented constructs from the source program need to be retained.

We are currently considering three alternatives for the execution strategy. The approaches can be classified as those that interface with the Java Virtual Machine and those that take a stand-alone approach. In either case all the component bytecode files are required to be available at compile time. Dynamic loading is not currently supported.
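To illustrate why a simple reader suffices for bytecode files, the following is a minimal sketch that reads only the fixed-size header of a .class file. It is not taken from j2s: the class name `ClassFileHeader` and its fields are hypothetical, and the only facts assumed are the standard class file layout (a four-byte magic number 0xCAFEBABE followed by two-byte minor version, major version and constant pool count, all big-endian).

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of j2s or SUIF.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the leading fields of a class file held in memory.
    static ClassFileHeader read(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer in = ByteBuffer.wrap(classBytes);
        if (in.getInt() != 0xCAFEBABE) {            // u4 magic
            throw new IllegalArgumentException("not a Java class file");
        }
        int minor = in.getShort() & 0xFFFF;         // u2 minor_version
        int major = in.getShort() & 0xFFFF;         // u2 major_version
        int cpCount = in.getShort() & 0xFFFF;       // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

No grammar or parser is involved: every field has a fixed width or an explicit length, so the rest of the file (constant pool, fields, methods, code attributes) can be read in the same sequential fashion.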
There are two ways of interfacing compiled code with Sun's Java Virtual Machine: using the Native Method Interface or the JIT (Just-In-Time) API [Yel96]. The stand-alone approach would require a run-time environment that duplicates many of the services that are provided by the Virtual Machine.

[Figure 1: Compilation Framework. Stages shown: Java Source, JavaC, Bytecode, j2s, SUIF, Optimizer, C Code, Native Method / Fat-Class File.]

2.2 Converting Stack-Based Instructions to Three Address Code

A Java Virtual Machine instruction [LY97] consists of a one-byte opcode specifying the operation, followed by zero or more operands supplying the arguments. The bytecode instructions are stack-based instructions. The translation procedure converts them into SUIF expression trees in three-address form. The inner loop of the translation procedure is given in Figure 2.

    do {
        read the opcode
        read the arguments from stack and bytecode array
        create appropriate SUIF objects for arguments
        create appropriate SUIF instruction object
        if (no more instructions in expression tree)
            emit instruction
        else
            place address of instruction on SUIF stack
    } while (more instructions available)

Figure 2: Converting Stack-Based Instructions to Tree Expressions

Each expression tree is labeled with the number of the instruction in the Java bytecode using annotations to the mrk instructions. These are used in determining the targets of jump instructions. The bytecode instructions for jumps represent the target as an offset within the given method. We do not currently support jumps for exception handling using the catch and finally clauses.

Patching in the correct SUIF instruction as the jump target requires two passes. The first pass annotates the branch instructions with the number of the Java bytecode instruction. The second pass patches in the destination operand of the SUIF jump instructions.
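The two-pass patching of jump targets can be sketched as follows. This is only an illustration under stated assumptions, not the j2s implementation: `JumpPatcher` and `Instr` are hypothetical stand-ins for the SUIF instruction objects, each emitted instruction is assumed to already carry the bytecode index recorded in its mrk annotation, and jump targets are assumed to have been converted to absolute bytecode indices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the types below are not the SUIF API.
final class JumpPatcher {
    static final class Instr {
        final String op;
        final int bytecodeIndex;   // bytecode instruction number from the mrk annotation
        final int targetBytecode;  // for jumps: bytecode index of the target, otherwise -1
        int targetInstr = -1;      // filled in by patch(): position in the emitted list

        Instr(String op, int bytecodeIndex, int targetBytecode) {
            this.op = op;
            this.bytecodeIndex = bytecodeIndex;
            this.targetBytecode = targetBytecode;
        }
    }

    static void patch(List<Instr> code) {
        // Pass 1: record where each annotated bytecode index landed in the output.
        Map<Integer, Integer> positionOf = new HashMap<>();
        for (int i = 0; i < code.size(); i++) {
            positionOf.putIfAbsent(code.get(i).bytecodeIndex, i);
        }
        // Pass 2: resolve every jump to the position of its target.
        for (Instr instr : code) {
            if (instr.targetBytecode < 0) {
                continue; // not a jump
            }
            Integer position = positionOf.get(instr.targetBytecode);
            if (position == null) {
                throw new IllegalStateException(
                    "no expression tree starts at bytecode index " + instr.targetBytecode);
            }
            instr.targetInstr = position;
        }
    }
}
```

Two passes are needed because a forward jump refers to an instruction that has not yet been emitted when the jump itself is translated; only after the whole method has been translated is the full mapping from bytecode indices to emitted instructions known.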
Java Source:

    long a, b;
    a = 2;
    b = a * 3;

Java Bytecode:

     0  ldc2_w #4 <Long 2>
     3  lstore_1
     4  lload_1
     5  ldc2_w #6 <Long 3>
     8  mul
     9  lstore_3
    10  return

SUIF representation:

     1: mrk ["line": 0 "ClassA.bc"]
     2: str main.lv1 = e1
     3: e1: ldc t:g5 (i.64) 2
     4: mrk ["line": 8 "ClassA.bc"]
     5: str main.lv2 = e1
     6: e1: mul t:g5 (i.64) e2, e3
     7: e2: lod t:g5 (i.64) main.lv1
     8: e3: ldc t:g5 (i.64) 3
     9: mrk ["line": 10 "ClassA.bc"]
    10: ret <nullop>

Figure 3: Example of a Simple Expression Tree

Figure 3 lists an example output from the translation procedure that converts stack-based code to an expression tree.
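The conversion from stack-based code to expression trees can be sketched in a few lines of Java. This is a simplified illustration, not the j2s code: the class `StackToTree`, the textual mnemonics ("ldc", "load", "store", "mul", "return") and the string rendering of trees are all assumptions made for the example, whereas the actual translator builds SUIF instruction objects. As in Figure 2, operands are kept on a translation-time stack, and a tree is emitted once an instruction consumes its operands without producing a value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch using simplified mnemonics; not the j2s implementation.
final class StackToTree {
    static List<String> translate(String[] ops) {
        Deque<String> stack = new ArrayDeque<>();   // partially built expression trees
        List<String> emitted = new ArrayList<>();   // completed trees, in program order
        for (String op : ops) {
            String[] parts = op.split(" ");
            switch (parts[0]) {
                case "ldc":    // push a constant
                    stack.push(parts[1]);
                    break;
                case "load":   // push the value of a local variable
                    stack.push("lv" + parts[1]);
                    break;
                case "mul": {  // combine the two topmost subtrees into one
                    String right = stack.pop();
                    String left = stack.pop();
                    stack.push("(" + left + " * " + right + ")");
                    break;
                }
                case "store":  // the tree is complete: emit it as an assignment
                    emitted.add("lv" + parts[1] + " = " + stack.pop());
                    break;
                case "return":
                    emitted.add("ret");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported op: " + op);
            }
        }
        return emitted;
    }
}
```

Feeding it the simplified equivalent of the bytecode in Figure 3, `{"ldc 2", "store 1", "load 1", "ldc 3", "mul", "store 3", "return"}`, yields the three trees `lv1 = 2`, `lv3 = (lv1 * 3)` and `ret`, which mirror the str, mul and ret instructions of the SUIF listing.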