Bytecode Manipulation Techniques for Dynamic Applications for the Java Virtual Machine

Eugene Kuleshov, Terracotta
Tim Eck, Terracotta
Tom Ware, Oracle Corporation
Charles Nutter, Sun Microsystems, Inc.

TS-1326

2007 JavaOneSM Conference | Session TS-1326

Goal

Bytecode manipulation isn’t difficult and it is very cool

Understand how dynamic frameworks for the Java™ platform do their job and how these ideas can be used in other applications.


Agenda

• Java Virtual Machine, Bytecode and ASM Framework
• Lazy Attributes in Java Persistence API (JPA) (TopLink)
• Terracotta DSO
• Ruby to Java Compiler in JRuby
• Summary


Java Virtual Machine (JVM™)

• Proven platform for running reliable and high-performance applications
• Built for a statically typed language
• Class-loading architecture and reflection API enable dynamic code
• Many frameworks need more
  • Introduce additional logic into the existing code
  • Increase performance
  • Non-Java programming languages

The terms “Java Virtual Machine” and “JVM” mean a Virtual Machine for the Java™ platform.

The Class File Format

• Constant Pool
  • Field and method names, type descriptors
  • String literals and other constants
• Attributes
• Fields
• Methods
  • Code (ordered list of instructions)
  • Debug information (line numbers, local variable names)
  • Exceptions
• User defined attributes

Source: The Java Virtual Machine Specification
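To make the layout above concrete, here is a minimal sketch that decodes only the fixed header in front of the constant pool: the magic number, the two version fields, and the constant pool count. The class name `ClassFileHeader` and the hand-written sample bytes are illustrative assumptions; the sketch uses only the JDK and does not parse the constant pool entries themselves.

```java
import java.nio.ByteBuffer;

/** Decodes the fixed-size header that precedes the constant pool entries. */
final class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
        this.magic = magic;
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        int magic = buf.getInt();                 // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;      // u2 minor_version
        int major = buf.getShort() & 0xFFFF;      // u2 major_version
        int cpCount = buf.getShort() & 0xFFFF;    // u2 constant_pool_count (entries plus one)
        return new ClassFileHeader(magic, minor, major, cpCount);
    }

    public static void main(String[] args) {
        // Hand-written header: magic, minor 0, major 52 (Java 8), constant_pool_count 16.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52, 0, 16};
        ClassFileHeader h = parse(header);
        System.out.println("major=" + h.majorVersion + " constantPoolCount=" + h.constantPoolCount);
    }
}
```

Everything after these ten bytes is variable-length, which is where most of the bookkeeping effort of editing a class file by hand comes from.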

Class File Modification Problems

• Lots of serialization and deserialization details
• Constant pool management
  • Missing or unused constants
  • Managing constant pool indexes/references
• Jump offsets
  • Inserting or removing instructions from the method
• Computation of stack size and StackMapTable
  • Requires a control flow analysis

ASM Bytecode Framework

• Goal: Dynamic class generation and modification
  • Very small and very fast tool
  • Tool primarily adapted for simple transformations
  • Complete control over the produced classes is not needed
• Approach
  • Use the Visitor pattern without using an in-memory object model
  • Hide the (de)serialization and constant pool management details
  • Represent jump offsets by Label marker objects
  • Checker and ASMifier tools help with the method code
  • Automatic computation of the max stack size and StackMapTable

Source: ASM project—http://asm.objectweb.org/
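The idea of representing jump offsets by label marker objects can be illustrated with a toy code builder. This is a sketch written for this purpose, not ASM's implementation: `LabelDemo`, its `Label` class, and the two-pass layout are assumptions, and only two real opcodes (NOP and GOTO) are used. The point is that callers never compute an offset, so inserting an instruction does not break existing jumps.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Toy code builder: jump targets are Label objects, offsets are computed at the end. */
final class LabelDemo {
    static final int NOP = 0x00;   // one-byte instruction
    static final int GOTO = 0xA7;  // opcode followed by a signed 16-bit branch offset

    /** Marks a position in the instruction stream; resolved during layout. */
    static final class Label {
        int offset = -1;
    }

    /** Either an instruction (opcode >= 0) or a label position (opcode == -1). */
    private static final class Item {
        final int opcode;
        final Label label;

        Item(int opcode, Label label) {
            this.opcode = opcode;
            this.label = label;
        }
    }

    private final List<Item> items = new ArrayList<>();

    LabelDemo nop() { items.add(new Item(NOP, null)); return this; }
    LabelDemo jump(Label target) { items.add(new Item(GOTO, target)); return this; }
    LabelDemo mark(Label here) { items.add(new Item(-1, here)); return this; }

    byte[] toByteArray() {
        // Pass 1: walk the items and assign a byte offset to every marked label.
        int pc = 0;
        for (Item item : items) {
            if (item.opcode == -1) {
                item.label.offset = pc;
            } else {
                pc += (item.opcode == GOTO) ? 3 : 1;
            }
        }
        // Pass 2: emit bytes; a branch offset is relative to the GOTO opcode itself.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        pc = 0;
        for (Item item : items) {
            if (item.opcode == -1) {
                continue;
            }
            out.write(item.opcode);
            if (item.opcode == GOTO) {
                if (item.label.offset < 0) {
                    throw new IllegalStateException("jump to a label that was never marked");
                }
                int delta = item.label.offset - pc;
                out.write(delta >> 8);
                out.write(delta);
                pc += 3;
            } else {
                pc += 1;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Label end = new Label();
        byte[] code = new LabelDemo().jump(end).nop().nop().mark(end).nop().toByteArray();
        System.out.println("branch offset = " + code[2]);
    }
}
```

Adding a third `nop()` before `mark(end)` changes the emitted offset without any change to the `jump(end)` call, which is the bookkeeping a label abstraction takes over.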

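ASM Bytecode Framework: Main idea

[Diagram: a chain of three components. ClassReader (accepts a ClassVisitor) → ClassAdapter (implements ClassVisitor) → ClassWriter (implements ClassVisitor).]

• ClassReader takes a serialized class (byte array); accept(v) replays it as visitor calls
• ClassAdapter forwards the calls to the next visitor and may change them on the way
• ClassWriter receives the calls; toByteArray() returns the serialized class (byte array)

Calls shown in the diagram (ClassReader → ClassAdapter, then ClassAdapter → ClassWriter):

• visit("C", "Object", …) is forwarded as visit("C", "Object", …)
• visitField("i", "byte", …) is forwarded as visitField("i", "byte", …), and visitField("_i", "int", …) is added
• visitFieldInsn(GETFIELD, "i") is replaced by visitMethodInsn(INVOKEVIRTUAL, "_getI")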

Source: ASM project—http://asm.objectweb.org/
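The reader, adapter, writer chain can be sketched without the real library. The interface and classes below are hypothetical, heavily reduced stand-ins (a single `visitField` event instead of the full visitor API), meant only to show how an adapter sits between an event producer and an event consumer and adds a field on the way through.

```java
import java.util.ArrayList;
import java.util.List;

final class VisitorChainDemo {
    /** Hypothetical, much-reduced stand-in for a class visitor interface. */
    interface FieldVisitor {
        void visitField(String name, String type);
    }

    /** Event producer: replays stored field declarations as visitor calls. */
    static final class Reader {
        private final String[][] fields;

        Reader(String[][] fields) {
            this.fields = fields;
        }

        void accept(FieldVisitor visitor) {
            for (String[] field : fields) {
                visitor.visitField(field[0], field[1]);
            }
        }
    }

    /** Event filter: forwards every field and adds an int shadow field named "_" + name. */
    static final class ShadowFieldAdapter implements FieldVisitor {
        private final FieldVisitor next;

        ShadowFieldAdapter(FieldVisitor next) {
            this.next = next;
        }

        @Override
        public void visitField(String name, String type) {
            next.visitField(name, type);
            next.visitField("_" + name, "int");
        }
    }

    /** Event consumer: records the declarations it receives. */
    static final class Writer implements FieldVisitor {
        final List<String> declarations = new ArrayList<>();

        @Override
        public void visitField(String name, String type) {
            declarations.add(type + " " + name);
        }
    }

    static List<String> transform(String[][] fields) {
        Writer writer = new Writer();
        new Reader(fields).accept(new ShadowFieldAdapter(writer));
        return writer.declarations;
    }

    public static void main(String[] args) {
        System.out.println(transform(new String[][] {{"i", "byte"}})); // prints [byte i, int _i]
    }
}
```

Because the adapter and the writer share one interface, adapters can be stacked in any number, and no in-memory model of the whole class is ever built.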

ASM Bytecode Framework: Example

// read the class, pass it through the adapter, and write the result
ClassReader cr = new ClassReader(bytecode);
ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_MAXS);
FooClassAdapter cv = new FooClassAdapter(cw);
cr.accept(cv, 0);

// load the new class
final byte[] bytes = cw.toByteArray();
Class newClass = new ClassLoader(parent) {
    Class c = defineClass(name, bytes, 0, bytes.length);
}.c;

Source: ASM project—http://asm.objectweb.org/

ASM Bytecode Framework: Framework Organization

• Core (only 36 KB)
  • Generate classes
  • Basic transformations
• Tree and Analysis
  • In-memory representation and analysis algorithms
• Commons
  • Renaming, sort local variables, inline subroutines (JSR/RET instructions), calculate serialVersionUID, advice adapter, etc.
• Util
  • Checker, decompiler/tracer and ASMifier utils
• XML
  • XSLT-based transformations and querying

Source: ASM project—http://asm.objectweb.org/


TopLink JPA

• Java Persistence API (JPA)
  • Provides a standard API for Object/Relational mapping
  • Most common use: Mapping Java objects to relational databases
• TopLink is an advanced object mapping library
  • Open source JPA API reference implementation: TopLink Essentials (GlassFish™ project)
  • Open source: EclipseLink, the Java Persistence Platform Project
  • Oracle TopLink

JPA API Lazy Loading

[Diagram: Employee has a 1-to-1 relationship "address" to Address and a 1-to-many relationship "managed" back to Employee.]

• Loading Employee could result in address and managed employees being loaded
  • Potentially loads a large amount of data
• JPA API allows these relationships to be LAZY
  • Lazy relationships are fetched only when needed
• Weaving is used to make 1-to-1 relationships LAZY (address)

JPA API Lazy Loading

• JPA API allows both field access and property access
• For field access, we replace any access to the field with a call to a method we add
• For property access, we weave the getter and setter methods to add some additional code
• The weaving code inserts a proxy object called a ValueHolder to represent the relationship

[Diagram: Employee → (address_vh) → ValueHolder → (value) → Address]
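The role of the proxy object can be sketched in plain Java. The class below is a hypothetical illustration, not TopLink's ValueHolder: the name `LazyValueHolder`, the `Supplier`-based loader, and the `isLoaded()` method are assumptions chosen to show the idea that the related object is fetched on first access and cached afterwards.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder: fetches the related object on first access only. */
final class LazyValueHolder<T> {
    private Supplier<T> loader;   // dropped once the value is known
    private T value;
    private boolean loaded;

    LazyValueHolder(Supplier<T> loader) {
        this.loader = loader;
    }

    T getValue() {
        if (!loaded) {
            value = loader.get();  // stands in for the database read
            loaded = true;
            loader = null;
        }
        return value;
    }

    void setValue(T newValue) {
        // An explicit set makes loading unnecessary.
        value = newValue;
        loaded = true;
        loader = null;
    }

    boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        LazyValueHolder<String> address = new LazyValueHolder<>(() -> {
            System.out.println("loading address...");
            return "1 Main St";
        });
        System.out.println(address.isLoaded());  // nothing fetched yet
        System.out.println(address.getValue());  // triggers the load
        System.out.println(address.getValue());  // served from the holder
    }
}
```

This sketch is not thread-safe; a real persistence provider also has to deal with sessions, transactions, and detached objects.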

Employee.java

/**
 * A simple Employee class using field access
 */
@Entity
public class Employee {
    ...
    @OneToOne(fetch=LAZY)
    private Address address;

    public Address getAddress() {
        return address;
    }

    public void setAddress(Address address) {
        this.address = address;
    }
}

Employee.java woven

/**
 * Employee class using field access after weaving
 */
@Entity
public class Employee {
    ...
    @OneToOne(fetch=LAZY)
    private Address address;
    @Transient
    private ValueHolder _toplink_address_vh;

    public Address getAddress() {
        return _toplink_getaddress();
    }

    public void setAddress(Address address) {
        _toplink_setaddress(address);
    }

    // added through weaving
    public Address _toplink_getaddress() {
        address = (Address) _toplink_address_vh.getValue();
        return address;
    }

    // added through weaving
    public void _toplink_setaddress(Address address) {
        _toplink_address_vh.setValue(address);
        this.address = address;
    }
}

Replacing a Variable Reference

// this method is called by the visitFieldInsn() callback
public void weaveAttributesIfRequired(int opcode, String owner,
        String name, String desc) {
    ...
    if (opcode == GETFIELD && attributeDetails != null) {
        cv.visitMethodInsn(
            INVOKEVIRTUAL,
            tcw.classDetails.getClassName(),
            "_toplink_get" + name,
            "()L" + attributeDetails.getReferenceClassType().getDescriptor()
        );
    } else {
        super.visitFieldInsn(opcode, owner, name, desc);
    }
}
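The last argument of the woven call is a JVM method descriptor, a string that encodes the parameter and return types. As a small aid for reading such strings, the sketch below uses the JDK's `java.lang.invoke.MethodType` to print the descriptors of a no-argument getter and a one-argument setter. The class `DescriptorDemo` and its helper names are illustrative assumptions and are not part of the TopLink weaving code.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    /** Descriptor of a no-argument method returning the given type. */
    static String getterDescriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    /** Descriptor of a void method taking one argument of the given type. */
    static String setterDescriptor(Class<?> parameterType) {
        return MethodType.methodType(void.class, parameterType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(getterDescriptor(String.class)); // prints ()Ljava/lang/String;
        System.out.println(setterDescriptor(String.class)); // prints (Ljava/lang/String;)V
        System.out.println(setterDescriptor(int.class));    // prints (I)V
    }
}
```

Object types appear as `L`, the internal class name with slashes, and a closing semicolon; a descriptor that gets any of these wrong usually only fails when the woven class is verified or the call is linked.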

Bytecode Transformation in TopLink

• Benefits/suggestions
  • Allows us to add features that can be used in a more intuitive way
  • Combination of the ‘ASMifier’ and your favourite decompiler makes it fairly easy to prototype weaving code
  • Strong need for very well commented code
• Challenges
  • Designing to avoid unintended side effects
  • Do not benefit from some compiler features (e.g., primitive wrapping)
• Additional uses
  • Optimizing change set calculation
  • Fetch groups
  • Read-only validation


Terracotta DSO: Distributed Shared Objects

• What is DSO?
  • Object distribution and thread coordination across VMs
  • Open source
• It’s just Java code
  • No APIs → plain objects
  • Existing language threading primitives: synchronized, wait() / notify()
• Why ASM for DSO?
  • Fast/small
  • Actively maintained and supported
  • Widely adopted, open source

Basic Terracotta Concepts

• DSO “root” objects
  • Root objects are the top most object nodes of a distributed object graph
  • Roots are bound to fields in your classes
  • Objects referenced by the graph starting from a root become distributed

class MyAppType {
    Map m; // root field
}

Terracotta ASM Use: Field/Array Operations

• PUTFIELD / AASTORE
  • Object state mutations are recorded and broadcast to other VMs that contain the same object
• GETFIELD / AALOAD
  • Allow lazy loading of portions of the object graph
  • Record access frequency for use in eviction policy

Terracotta ASM Use: Field Operations

class Foo {
    private Bar bar;

    public Bar get() {
        return bar;
    }

    public void set(Bar bar) {
        this.bar = bar;
    }
}

Terracotta ASM Use: Field Operations

class Foo {
    private Bar bar;

    public Bar get() {
        if (isShared() && bar == null)
            bar = ManagerUtil.resolveReference(this, "Foo.bar");
        return bar;
    }

    public void set(Bar bar) {
        if (isShared())
            ManagerUtil.fieldChanged(this, "Foo.bar", bar);
        this.bar = bar;
    }
}
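The effect of the added setter code can be tried out with a self-contained simulation. Everything here is a hypothetical stand-in: `ChangeLog` plays the role of the runtime that receives field events, and a null log models an object that is not shared. It is not Terracotta's `ManagerUtil`, and nothing is actually sent to another VM.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedFieldDemo {
    /** Hypothetical stand-in for the runtime that records field mutations. */
    static final class ChangeLog {
        final List<String> events = new ArrayList<>();

        void fieldChanged(Object owner, String field, Object value) {
            events.add(field + "=" + value);
        }
    }

    /** Hand-written equivalent of a class whose setter has been instrumented. */
    static final class Foo {
        private final ChangeLog log;   // null means this instance is not shared
        private String bar;

        Foo(ChangeLog log) {
            this.log = log;
        }

        boolean isShared() {
            return log != null;
        }

        String get() {
            return bar;
        }

        void set(String bar) {
            if (isShared()) {
                log.fieldChanged(this, "Foo.bar", bar);  // record before the local write
            }
            this.bar = bar;
        }
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        Foo shared = new Foo(log);
        shared.set("a");
        shared.set("b");
        new Foo(null).set("ignored");    // unshared instance: nothing recorded
        System.out.println(log.events);  // prints [Foo.bar=a, Foo.bar=b]
    }
}
```

The unshared path costs one extra check per write, which is why the instrumented code tests `isShared()` first.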

Terracotta ASM Use: Distributed Object Monitors

• MONITORENTER / MONITOREXIT

synchronized(obj) {   // MONITORENTER
    ...
}                     // MONITOREXIT

• If a distributed object instance is synchronized on, the monitor can be configured to be cluster wide
• Thread safe data structures can thus be made cluster safe

Terracotta ASM Use: Interesting methods

• java.lang.Object: wait(), notify(), notifyAll()
  • When invoked on distributed objects, thread signals can be used across VMs
  • If you know how to use these methods, you already know how to use them in DSO
• Logical object actions
  • For data structures like HashMap that depend on local hashCode() values
• Distributed Method Invocation
  • DMI can be used to create event style notifications using your own methods


JRuby: Ruby to Java Compiler

• Ruby is a very dynamic language
  • Many methods/types generated at runtime
  • Interpretation remains important
  • …but code is largely static
• JVM implementation optimizes simpler code better
  • Interpreter is not simple
  • New code spat out every few seconds is not simple
• Compilation allows JVM implementation to work its magic
  • Compiled code is “bare metal”
  • Eliminate overhead of repeated interpreter logic

JRuby: Ruby to Java Compiler

• Mixed-mode execution
  • Code potentially generated with every operation
  • Don’t compile everything
• AOT (ahead-of-time) mode
  • Dump out for atomic pieces
  • Straight-through run of a script file
  • Methods defined during that run
• Single class file for single .rb file
  • One Java method per Ruby method plus run-through
  • Simple distribution

JRuby: Ruby to Java Compiler

• JIT (just-in-time) compilation
  • Generated code is used either once or many times
  • Interpret at first
  • Count usages, compile when appropriate
  • Very much like JVM implementation operation
• Lower overhead than “compile always”
  • No continuous compilation cost
  • No continuous class-gen/class-load cost
• Fall back on pure interpretation where compile won’t work
  • Incremental development
  • 50% of code compiles? No problem, interpret the rest
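The "interpret at first, count usages, compile when appropriate" policy with a fallback to interpretation can be sketched as follows. This is an illustration under stated assumptions, not JRuby's implementation: `HotMethod`, the call threshold, and the convention that the compiler returns null when it cannot handle the code are all invented for the example, and the "compiled" body is just another Java lambda.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

/** A method body that starts out interpreted and is swapped for compiled code once hot. */
final class HotMethod {
    private final IntUnaryOperator interpreted;
    private final Supplier<IntUnaryOperator> compiler; // returns null if the code cannot be compiled
    private final int threshold;
    private IntUnaryOperator compiled;
    private boolean compileFailed;
    private int calls;

    HotMethod(IntUnaryOperator interpreted, Supplier<IntUnaryOperator> compiler, int threshold) {
        this.interpreted = interpreted;
        this.compiler = compiler;
        this.threshold = threshold;
    }

    int call(int arg) {
        if (compiled == null && !compileFailed && ++calls >= threshold) {
            compiled = compiler.get();           // one-time compilation attempt
            compileFailed = (compiled == null);  // on failure, keep interpreting for good
        }
        return (compiled != null ? compiled : interpreted).applyAsInt(arg);
    }

    boolean isCompiled() {
        return compiled != null;
    }

    public static void main(String[] args) {
        HotMethod twice = new HotMethod(x -> x * 2, () -> x -> x << 1, 3);
        for (int i = 1; i <= 4; i++) {
            System.out.println(twice.call(i) + " compiled=" + twice.isCompiled());
        }
    }
}
```

Code that runs once never pays the compilation cost, and code the compiler rejects keeps working through the interpreter. The sketch is single-threaded; a real runtime would need to publish the compiled body safely.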

JRuby Compiler: Code walkthrough and perf tests

For More Information

• ASM project: http://asm.objectweb.org/
• TopLink JPA: http://www.otn.oracle.com/jpa/
• Terracotta DSO: http://terracotta.org/
• JRuby: http://www.jruby.org/

Q&A

Eugene Kuleshov, Terracotta
Tim Eck, Terracotta
Tom Ware, Oracle Corporation
Charles Nutter, Sun Microsystems, Inc.
