<<

CMSC 430 Introduction to Spring 2015

Intermediate Representations and Formats Introduction

Front end Source AST/IR code Lexer Parser Types

IR2 IRn IRn .s

Middle end Back end

■ Front end — syntax recognition, semantic , produces first AST/IR ■ Middle end — transforms IR into equivalent IRs that are more efficient and/or closer to final IR ■ Back end — translates final IR into or code

2 Three-address code • Classic IR used in many compilers (or, at least, textbooks) • Core statements have one of the following forms ■ x = y op z binary operation ■ x = op y unary operation ■ x = y copy statement ! • Example: ! = 2 * y z = x + 2 * y; z = x + t ! ■ Need to introduce temporarily variables to hold intermediate computations ■ Notice: closer to 3 Control Flow in Three-Address Code • How to represent control flow in IRs? ■ l: statement labeled statement ■ goto l unconditional jump ■ if x rop y goto l conditional jump (rop = relational op)

! • Example

t = x + 2 if (x + 2 > 5) if t > 5 goto l1 y = 2; y = 3 else goto l2 y = 3; l1: y = 2 x++; l2: x = x + 1

4 Looping in Three-Address Code • Similar to conditionals

! x = 10 x = 10; l1: if (x == 0) goto l2 while (x != 0) { ! a = a * 2 a = a * 2; x = x + 1 x++; ! goto l1 } l2: y = 20 y! = 20;

! ■ The line labeled l1 is called the loop header, i.e., it’s the target of the backward branch at the bottom of the loop ■ Notice same code generated for

for (x = 10; x != 0; x++) a = a * 2; y = 20;

5 Basic Blocks • A block is a sequence of three-addr code with ■ (a) no jumps from it except the last statement ■ () no jumps into the middle of the basic block ! • A control flow graph (CFG) is a graphical representation of the basic blocks of a three- address program ■ Nodes are basic blocks ■ Edges represent jump from one basic block to another - Conditional branches identify true/false cases either by convention (e.g., all left branches true, all right branches false) or by labeling edges with true/false condition ■ Compiler may or may not create explicit CFG structure

6 Example

1. a = 1 2. b = 10 1. a = 1 2. b = 10 3. = a + b 3. c = a + b 4. = a - b 4. d = a - b 5. if (d < 10) goto 9 5. d < 10 6. e = c + d 7. d = c + d 8. goto 3 6. e = c + d 9. e = c - d 9. e = c - d 7. d = c + d 10. e < 5 10. if (e < 5) goto 3 11. a = a + 1

11. a = a + 1

7 Levels of Abstraction • Key design feature of IRs: what level of abstraction to represent ■ if x rop y goto l with explicit relation, OR ■ t = x rop y; if t goto l only booleans in guard ■ Which is preferable, under what circumstances? ! • Representation of arrays ■ x = y[z] high-level, OR ■ t = y + 4*z; x = *t; low-level (ptr arith) ■ Which is preferable, under what circumstances?

8 Levels of Abstraction (cont’d) • Function calls? ■ Should there be a function call instruction, or should the be made explicit? - Former is easier to work with, latter may enable some low-level optimizations, e.g.,passing parameters in registers • Virtual method dispatch? ■ Same as above • Object construction ■ Distinguished “new” call that invokes constructor, or separate object allocation and initialization?

9 Virtual • An IR has a semantics • Can interpret it using a virtual machine ■ virutal machine ■ Lua virtual machine ■ “Virtual” just means implemented in , rather than hardware, but even hardware uses some interpretation - E.g., x86 processor has complex instruction set that’s internally interpreted into much simpler form • Tradeoffs?

10 (JVM) • JVM memory model ■ Stack (function call frames, with local variables) ■ Heap (dynamically allocated memory, garbage collected) ■ Constants • Bytecode files contain ■ Constant pool (shared constant ) ■ Set of classes with fields and methods - Methods contain instructions in language - Use javap -c to disassemble Java programs so you can look at their bytecode

11 JVM Semantics

• Documented in the form of a 500 page, English language book ■ http://java.sun.com/docs/books/ jvms/ • Many concerns ■ Binary format of bytecode files - Including constant pool ■ Description of model (running individual instructions) ■ Java bytecode verifier ■ Thread model

12 JVM Design Goals • Type- and memory-safe language ■ Mobile code—need safety and security • Small file size ■ Constant pool to share constants ■ Each instruction is a (only 256 possible instructions) • Good performance • Good match to Java

13 JVM • From the JVM book: ■ Virtual Machine Start-up ■ Loading ■ Linking: Verification, Preparation, and Resolution ■ Initialization ■ Detailed Initialization Procedure ■ Creation of New Class Instances ■ Finalization of Class Instances ■ Unloading of Classes and Interfaces ■ Virtual Machine Exit

14 JVM Instruction Set • Stack-based language ■ All instructions take operands from the stack • Categories of instructions ■ Load and store (e.g. aload_0,istore) ■ Arithmetic and logic (e.g. ladd,fcmpl) ■ (e.g. i2b,d2i) ■ Object creation and manipulation (new,putfield) ■ Operand stack management (e.g. swap,dup2) ■ Control transfer (e.g. ifeq,goto) ■ Method invocation and return (e.g. invokespecial,areturn) ! - (from http://en.wikipedia.org/wiki/Java_bytecode)

15 Example

class A { public static void main(void) { System.out.println(“Hello, world!”); } }

• Try compiling with javac, look at result using javap -c • Things to look for: ■ Various instructions; references to classes, methods, and fields; exceptions; type information • Things to think about: ■ File size really compact (Java → J)? Mapping onto machine instructions; performance; amount of abstraction in instructions 16 Dalvik Virtual Machine • Alternative target for Java • Developed by Google for Android phones ■ Register-, rather than stack-, based ■ Designed to be even more compact • .dex (Dalvik) files are part of apk’s that are installed on phones (apks are zip files, essentially) ■ All classes must be joined together in one big .dex file, contrast with Java where each class separate ■ .dex produced from .class files

17 Compiling to .dex

.class files .dex file • Many .class files ⇒ one .dex file Header Constant pool 1

Class 1 Class info 1 Constant pool • Enables more (a) Source Code Data 1 sharing Class definition 1 Source for this and several of the following slides:: Class definition 2 Octeau, Enck, and McDaniel. The ded . Constant pool 2 Networking and Security Research Center Tech Report NAS-TR-0140-2010, The Pennsylvania State University. May 2011. http://siis.cse.psu.edu/ded/ Class 2 Class info 2 papers/NAS-TR-0140-2010.pdf Class definition n Data 2 (b) Java (stack) bytecode Data

Constant pool n Class n Class info n Data n (c) Dalvik (register) bytecode

Figure 3: Register vs. Stack 18 Figure 2: dx compilation of Java classes to a DVM application (simplified view). Constant pool structure - Java applications necessarily repli- cate elements in constant pools within the multiple .class files, e.g., referrer and referent method names. The dx com- initions, and the data segment. A constant pool describes, piler attempts to reduce application size by eliminating much not surprisingly, the constants used by a class. This includes of this replication. Dalvik uses a single large constant pool among other items references to other classes, method names that all classes simultaneously reference. Additionally, dx and numerical constants. The class definitions consist in eliminates some constants by inlining their values directly the basic information such as access flags and class names. into the bytecode. In practice, pool elements for integers, The data element contains the method code executed by the long integers, and single and double precision floating-point target VM, as well as other information related to methods constants simply disappear as bytecode constants during the (e.g., number of DVM registers used, local variable table and transformation process. operand stack sizes) and to class and instance variables. Control flow Structure - Programmatic control flow ele- Register architecture - The DVM is register-based, whereas ments such as loops, switch statements and exception han- existing JVMs are stack-based. Java bytecode can assign dlers are structured very di↵erently by Dalvik and Java byte- local variables to a local variable table before pushing them code. To simplify, Java bytecode structure loosely mirrors onto an operand stack for manipulation by opcodes, but it the source code, whereas Dalvik bytecode does not. The can also just work on the stack without explicitly storing restructuring is likely performed to increase performance, variables in the table. Dalvik bytecode assigns local vari- reduce code size, or address changes in the way the under- ables to any of the 216 available registers. The Dalvik op- lying architecture handles variable types and registers. codes directly manipulate registers, rather than accessing elements on a program stack. For example, Figure 3 shows Ambiguous typing of primitive types - Java bytecode vari- how a typical operation (“add” function Figure 3(a)) is im- able assignments distinguish between integer (int) and single- plemented in Java (stack) versus Dalvik (register) virtual precision floating-point (float) constants and between long machine. The Java bytecode shown in Figure 3(b) pushes integer (long) and double-precision floating-point (double) the local variables a and b onto the operand stack using the constants. However, Dalvik constant assignments (int/float iload , and returns the result from the stack via the and long/double) use the same opcodes for integers and ireturn opcode. By comparison, the Dalvik bytecode shown floats, e.g., the opcodes are untyped beyond specifying pre- in Figure 3(c) simply references the registers directly. cision. This complicates decompilation of Dalvik bytecode because the variable type is not indicated by its declaration. Instruction set - The Dalvik bytecode instruction set is sub- Thus, the decompilation process must observe a variable cre- stantially di↵erent than that of Java: 218 opcodes vs. 200, ation and inspect its subsequent use to infer its type to cre- respectively. The nature of the opcodes is very di↵erent: ate accurate correct Java bytecode and constant pools. This for example, Java has tens of opcodes dedicated to push- is a specific instance of a broad class of widely-studied type ing and pulling elements between the stack and local vari- inference problems. Note that an incorrect inference (and able table. Obviously the Dalvik bytecode instruction set thus type) may result in incorrect behavior (and analysis) does not require any such opcodes. Moreover, as illustrated of the decompiled program. in the example in Figure 3, Dalvik instructions tend to be longer than Java instructions as they include the source and Null references - The Dalvik bytecode does not specify a destination registers (when needed). As a result, Dalvik ap- null type, instead opting to use a zero value constant. Thus, plications require fewer instructions. Applications encoded constant zero values present in the Dalvik bytecode have in Dalvik bytecode have on average 30% fewer instructions ambiguous typing that must be recovered. The decompila- than in Java, but have a 35% larger code size () [17]. tion process must recover the null type by inspecting the This increased code size has limited impact on performance, variable’s use in situ.Ifthenull type is not correctly re- as the DVM reads instructions by units of two bytes. covered, the resulting bytecode can have illegal integer zero

3 Dalvik is Register-Based

.class files .dex file

Header Constant pool 1

Class 1 Class info 1 Constant pool (a) Source Code Data 1 Class definition 1 Class definition 2 Constant pool 2 Class 2 Class info 2 Class definition n Data 2 (b) Java (stack) bytecode Data

Constant pool n Class n Class info n Data n (c) Dalvik (register) bytecode

Figure 3: Register vs. Stack Opcodes

Figure 2: dx compilation of Java classes to a DVM 19 application (simplified view). Constant pool structure - Java applications necessarily repli- cate elements in constant pools within the multiple .class files, e.g., referrer and referent method names. The dx com- initions, and the data segment. A constant pool describes, piler attempts to reduce application size by eliminating much not surprisingly, the constants used by a class. This includes of this replication. Dalvik uses a single large constant pool among other items references to other classes, method names that all classes simultaneously reference. Additionally, dx and numerical constants. The class definitions consist in eliminates some constants by inlining their values directly the basic information such as access flags and class names. into the bytecode. In practice, pool elements for integers, The data element contains the method code executed by the long integers, and single and double precision floating-point target VM, as well as other information related to methods constants simply disappear as bytecode constants during the (e.g., number of DVM registers used, local variable table and transformation process. operand stack sizes) and to class and instance variables. Control flow Structure - Programmatic control flow ele- Register architecture - The DVM is register-based, whereas ments such as loops, switch statements and exception han- existing JVMs are stack-based. Java bytecode can assign dlers are structured very di↵erently by Dalvik and Java byte- local variables to a local variable table before pushing them code. To simplify, Java bytecode structure loosely mirrors onto an operand stack for manipulation by opcodes, but it the source code, whereas Dalvik bytecode does not. The can also just work on the stack without explicitly storing restructuring is likely performed to increase performance, variables in the table. Dalvik bytecode assigns local vari- reduce code size, or address changes in the way the under- ables to any of the 216 available registers. The Dalvik op- lying architecture handles variable types and registers. codes directly manipulate registers, rather than accessing elements on a program stack. For example, Figure 3 shows Ambiguous typing of primitive types - Java bytecode vari- how a typical operation (“add” function Figure 3(a)) is im- able assignments distinguish between integer (int) and single- plemented in Java (stack) versus Dalvik (register) virtual precision floating-point (float) constants and between long machine. The Java bytecode shown in Figure 3(b) pushes integer (long) and double-precision floating-point (double) the local variables a and b onto the operand stack using the constants. However, Dalvik constant assignments (int/float iload opcode, and returns the result from the stack via the and long/double) use the same opcodes for integers and ireturn opcode. By comparison, the Dalvik bytecode shown floats, e.g., the opcodes are untyped beyond specifying pre- in Figure 3(c) simply references the registers directly. cision. This complicates decompilation of Dalvik bytecode because the variable type is not indicated by its declaration. Instruction set - The Dalvik bytecode instruction set is sub- Thus, the decompilation process must observe a variable cre- stantially di↵erent than that of Java: 218 opcodes vs. 200, ation and inspect its subsequent use to infer its type to cre- respectively. The nature of the opcodes is very di↵erent: ate accurate correct Java bytecode and constant pools. This for example, Java has tens of opcodes dedicated to push- is a specific instance of a broad class of widely-studied type ing and pulling elements between the stack and local vari- inference problems. Note that an incorrect inference (and able table. Obviously the Dalvik bytecode instruction set thus type) may result in incorrect behavior (and analysis) does not require any such opcodes. Moreover, as illustrated of the decompiled program. in the example in Figure 3, Dalvik instructions tend to be longer than Java instructions as they include the source and Null references - The Dalvik bytecode does not specify a destination registers (when needed). As a result, Dalvik ap- null type, instead opting to use a zero value constant. Thus, plications require fewer instructions. Applications encoded constant zero values present in the Dalvik bytecode have in Dalvik bytecode have on average 30% fewer instructions ambiguous typing that must be recovered. The decompila- than in Java, but have a 35% larger code size (bytes) [17]. tion process must recover the null type by inspecting the This increased code size has limited impact on performance, variable’s use in situ.Ifthenull type is not correctly re- as the DVM reads instructions by units of two bytes. covered, the resulting bytecode can have illegal integer zero

3 JVM Levels of Indirection

CONSTANT_Utf8_info tag = 1 length CONSTANT_Class_info bytes CONSTANT_Methodref_info tag = 7 CONSTANT_Utf8_info tag = 10 name_index tag = 1 class_index CONSTANT_NameAndType_info length name_and_type_index tag = 11 bytes name_index CONSTANT_Utf8_info descriptor_index tag = 1 length bytes

(a) Java constant pool entry

string_id_item string_data_off string_data_item type_id_item string_id_item utf16_size 20 descriptor_idx string_data_off data method_id_item proto_id_item type_id_item string_data_item string_data_item class_idx shorty_idx descriptor_idx utf16_size utf16_size proto_idx return_type_idx data data type_list string_data_item name_idx paramaters_off string_id_item type_id_item string_id_item size utf16_size string_data_off descriptor_idx string_data_off string_id_item list data string_data_off type_item string_data_item type_idx utf16_size data

(b) Dalvik constant pool entry

Figure 5: Constant pool entry structure for a method reference

Table 1: Example Dalvik to Java bytecode translation rules Dalvik Bytecode Instructions Java Bytecode Instructions Description add-int d0,s0,s1 iload s0 Integer addition iload s10 iadd istore d0 invoke-virtual s0,...,sk,mr iload s0 Virtual method invocation with move-result d0 ... assigned return value. iload sl0 invokevirtual mr0 istore d0 for a .class file. Constants include references to classes, more indirection. ded ignores Dalvik-specific entries such as methods, and instance variables. ded traverses the byte- “shorty” strings used as simplified method descriptors. code for each method in a class, noting such references. ded also identifies all constant primitives specified in the Dalvik 4.3 Method Code bytecode. Here, ded notes the types determined during type The final stage of the retargeting process is the translation inference, described in Section 4.1. of the method code. This is a two stage process, as shown Once ded identifies the constants required by a class, it in Figure 4. First, we preprocess the bytecode to reorganize adds them to the target .class file. For primitive type structures that cannot be directly retargeted. Second, we constants, new entries are created. For class, method, and linearly translate DVM bytecode to the JVM. instance variable references, we must populate the Java con- The preprocessing phase considers multidimensional ar- stant pool entry based on information available in the Dalvik rays. Both Dalvik and Java use blocks of bytecode instruc- constant pool. The two constant pool entries di↵er in com- tions to create multidimensional arrays; however, the in- plexity. Specifically, Dalvik constant pool entries use signif- structions have di↵erent semantics and layout. ded reorders icantly more references to reduce memory overhead. and annotates the bytecode with array size and type infor- Figure 5 depicts the method entry constant in both Java mation to allow linear instruction translation. and Dalvik formats. Other constant pool entry types have The bytecode translation linearly processes each Dalvik similar structures. Each box is a . Index instruction. First, ded maps each referenced register to entries (denoted as “idx” for the Dalvik format) are pointers a Java local variable table index. Second, ded performs to a data structure. The Java method constant pool entry, an instruction translation for each encountered Dalvik in- Figure 5(a), provides three strings: 1) the class name, 2) the struction. As Dalvik bytecode is more compact and takes method name, and 3) a descriptor string representing the more arguments, one Dalvik instruction frequently expands argument and return types. The Dalvik method constant to multiple Java instructions. Third, ded patches the rela- pool entry, Figure 5(b), also contains these strings, but uses tive o↵sets used for branches based on preprocessing anno-

5 CONSTANT_Utf8_info tag = 1 length CONSTANT_Class_info bytes CONSTANT_Methodref_info tag = 7 CONSTANT_Utf8_info tag = 10 name_index tag = 1 class_index CONSTANT_NameAndType_info length name_and_type_index tag = 11 bytes name_index CONSTANT_Utf8_info descriptor_index tag = 1 length bytes

CONSTANT_Utf8_info tag = 1 (a) Java constant pool entry Dalvik Levelslength of Indirection CONSTANT_Class_info bytes CONSTANT_Methodref_info tag = 7 CONSTANT_Utf8_infostring_id_item tag = 10 name_index tag = 1 string_data_off string_data_item class_index CONSTANT_NameAndType_info type_id_itemlength string_id_item utf16_size name_and_type_index tag = 11 descriptor_idxbytes string_data_off data name_index method_id_item proto_id_itemCONSTANT_Utf8_info string_data_item string_data_item descriptor_index type_id_item class_idx shorty_idxtag = 1 descriptor_idx utf16_size utf16_size proto_idx return_type_idxlength data data bytes type_list string_data_item name_idx paramaters_off string_id_item type_id_item string_id_item size utf16_size string_data_off descriptor_idx string_data_off string_id_item list data (a) Java constant pool entrystring_data_off type_item string_data_item type_idx utf16_size data string_id_item string_data_off string_data_item type_id_item string_id_item utf16_size (b) Dalvik(similarthesefor edges) constant pool entry descriptor_idx string_data_off data method_id_item proto_id_item type_id_item string_data_item string_data_item class_idx shorty_idx descriptor_idx utf16_size utf16_sizeFigure 5: Constant pool entry structure for a method reference proto_idx return_type_idx data data type_list string_data_item name_idx paramaters_off string_id_item type_id_item string_id_item size utf16_size string_data_off descriptor_idx string_data_off string_id_item list Table 1: Example Dalvikdata to Java bytecode translation rules string_data_off type_item string_data_item type_idxDalvik Bytecode Instructions Java Bytecode Instructions Description utf16_size data add-int d0,s0,s1 iload s0 21 Integer addition iload s10 (b) Dalvik constant pool entry iadd istore d0 Figure 5: Constant pool entryinvoke-virtual structure fors0,...,s a methodk,mr reference iload s0 Virtual method invocation with move-result d0 ... assigned return value. iload sl0 Table 1: Example Dalvik to Java bytecode translation rules invokevirtual mr0 istore d0 Dalvik Bytecode Instructions Java Bytecode Instructions Description 0 add-int d0,s0,s1 iload s0 Integer addition iload s10 iaddfor a .class file. Constants include references to classes, more indirection. ded ignores Dalvik-specific entries such as methods, and instance variables. ded traverses the byte- “shorty” strings used as simplified method descriptors. istore d0 code for each0 method in a class, noting such references. ded invoke-virtual s ,...,s ,m iload s0 Virtual method invocation with 0 k also identifies0 all constant primitives specified in the Dalvik 4.3 Method Code Retargeting move-result d0 ... assigned return value. bytecode. Here, ded notes the types determined during type The final stage of the retargeting process is the translation iload s0 inference,l described in Section 4.1. of the method code. This is a two stage process, as shown invokevirtual m0 Once ded identifiesr the constants required by a class, it in Figure 4. First, we preprocess the bytecode to reorganize istore d0 adds them0 to the target .class file. For primitive type structures that cannot be directly retargeted. Second, we constants, new entries are created. For class, method, and linearly translate DVM bytecode to the JVM. instance variable references, we must populate the Java con- The preprocessing phase considers multidimensional ar- for a .class file. Constants include references tostant classes, pool entrymore based indirection. on information ded ignores available Dalvik-specific in the Dalvik entriesrays. such Both as Dalvik and Java use blocks of bytecode instruc- methods, and instance variables. ded traverses theconstant byte- pool.“shorty” The two strings constant used pool as entries simplified di↵er method in com- descriptors.tions to create multidimensional arrays; however, the in- code for each method in a class, noting such references.plexity. ded Specifically, Dalvik constant pool entries use signif- structions have di↵erent semantics and layout. ded reorders also identifies all constant primitives specified in theicantly Dalvik more references4.3 Method to reduce Code memory Retargeting overhead. and annotates the bytecode with array size and type infor- bytecode. Here, ded notes the types determined duringFigure type 5 depictsThe the final method stage entry of the constant retargeting in both process Java is the translationmation to allow linear instruction translation. inference, described in Section 4.1. and Dalvik formats.of the method Other constant code. This pool is entry a two types stage have process, asThe shown bytecode translation linearly processes each Dalvik Once ded identifies the constants required by a class,similar it structures.in Figure Each 4. box First, is we a data preprocess structure. the bytecode Index to reorganizeinstruction. First, ded maps each referenced register to adds them to the target .class file. For primitiveentries type (denotedstructures as “idx” for that the cannot Dalvik be format) directly are retargeted. pointers Second,a Java we local variable table index. Second, ded performs constants, new entries are created. For class, method,to a and data structure.linearly The translate Java method DVM bytecode constant to pool the entry, JVM. an instruction translation for each encountered Dalvik in- instance variable references, we must populate the JavaFigure con- 5(a), providesThe preprocessingthree strings: 1) phase the class considers name, multidimensional 2) the struction. ar- As Dalvik bytecode is more compact and takes method name, and 3) a descriptor string representing the stant pool entry based on information available in the Dalvik rays. Both Dalvik and Java use blocks of bytecodemore instruc- arguments, one Dalvik instruction frequently expands argument and return types. The Dalvik method constant constant pool. The two constant pool entries di↵er in com- tions to create multidimensional arrays; however,to the multiple in- Java instructions. Third, ded patches the rela- pool entry, Figure 5(b), also contains these strings, but uses plexity. Specifically, Dalvik constant pool entries use signif- structions have di↵erent semantics and layout. dedtive reorders o↵sets used for branches based on preprocessing anno- icantly more references to reduce memory overhead. and annotates the bytecode with array size and type infor- Figure 5 depicts the method entry constant in both Java mation to allow linear instruction translation. 5 and Dalvik formats. Other constant pool entry types have The bytecode translation linearly processes each Dalvik similar structures. Each box is a data structure. Index instruction. First, ded maps each referenced register to entries (denoted as “idx” for the Dalvik format) are pointers a Java local variable table index. Second, ded performs to a data structure. The Java method constant pool entry, an instruction translation for each encountered Dalvik in- Figure 5(a), provides three strings: 1) the class name, 2) the struction. As Dalvik bytecode is more compact and takes method name, and 3) a descriptor string representing the more arguments, one Dalvik instruction frequently expands argument and return types. The Dalvik method constant to multiple Java instructions. Third, ded patches the rela- pool entry, Figure 5(b), also contains these strings, but uses tive o↵sets used for branches based on preprocessing anno-

5 Discussion • Why did Google invent its own VM? ■ Licensing fees? (C.f. current lawsuit between Oracle and Google) ■ Performance? ■ Code size? ■ Anything else?

22 Just-in-time Compilation (JIT) • Virtual machine that compiles some bytecode all the way to machine code for improved performance ■ Begin interpreting IR ■ Find performance critical sections ■ Compile those to native code ■ Jump to native code for those regions • Tradeoffs? ■ Compilation time becomes part of execution time

23 Trace-Based JIT • Recently popular idea for Javascript interpreters ■ JS hard to compile efficiently, because of large distance between its semantics and machine semantics - Many unknowns sabotage optimizations, e.g., in e.m(...), what method will be called? • Idea: find a critical (often used) trace of a section of the program’s execution, and compile that ■ Jump into the compiled code when hit beginning of trace ■ Need to be able to back out in case conditions for taking trace are not actually met

24