John Pampuch – Director, VM Technology – Client Software Group – VM Optimizations for Language Designers

Exact File Name 9/24/08 Page 1 There once was a time...

Java


Oh wait...

Java™

Copyright Sun Microsystems, Inc 2008: 3 / 32

But really, what we meant was... Java™

Fortunately, clearer minds prevail

Bex Script WebL Funnel JESS Jickle iScript Modula-2 Lisp Zigzag Simkin CAL JavaScript Correlate Nice JudoScript Simkin Drools Basic Eiffel Luck Icon Groovy v-language Tea Prolog Mini Pascal Scala Tcl PLAN Hojo Rexx JavaFX Script foo FScript Tiger Anvil Oberon E Smalltalk JHCR Logo Yassl Tiger JRuby Ada G Clojure Scheme Sather Phobos Processing Dawn TermWare Sleep LLP Pnuts BeanShell Forth C# PHP Piccola SALSA Yoix ObjectScript

So what good is it?
• Java Performance – Isn't that an oxymoron?
• A little history
  > 1996
    > Desktop systems: 60-100 MHz
    > Workstations: 120-167 MHz (I don't have a clue on servers)
    > Java: no JIT!
  > Vs. today
    > Desktop systems: 2-core, 1.0-3.0 GHz (roughly 30-90 times)
    > Server systems: 4-socket, 8-core, 8 HW threads/core, 1+ GHz per core (easily 100 times improvement)
    > Java: JIT and more... 10 to 30 times improvement

Which means...
• Java as we perceive it is 300 to 3000 times faster than when it started
• Today, Java is also measured to be 50% to 100+% the speed of C and C++ *
• Sometimes, it is 50 to 250 times faster than your favorite *
• Or more! *
• Pace of improvement remains high
• But, we demand a lot more too

* Source: http://shootout.alioth.debian.org

To be fair...
• There is that footprint and startup time issue
• But, we're working on that!
• Near-term tactical effort to reduce internal class coupling
• Longer term strategies for further improvement

Now, let's change topics a bit

So what is it like to be part of the HotSpot Team?

The Rules of the Road
• Rules are simple:
  > Make it better (faster, smaller, etc.)
  > Anything you want to do, as long as no one can tell
  > You can't break working applications!
  > And, don't forget:
    > 100s of thousands of tests
    > Millions of applications
    > 100s of millions of users
  > Kind of like doing taxes
  > PRT/JPRT etc.
  > Security, Compatibility, Compliance

Results are extraordinary
• Thanks to the hard work of many engineers
  > At least 500 man-years at Sun, so far
  > Plus some close collaboration with Intel, and others
• And some friendly competition between Sun, IBM, HP, SAP, Azul, and BEA/Oracle, to name a few
• Improvements range from VM to core libraries to GUI components
• Benefits can reach beyond the Java language
  > At least from the VM
  > But sometimes from the work as well

A Primer on Optimization
• Two basic strategies:
  > Do more faster – Run, don't walk
  > Do less – procrastinator's creed!
    > Don't do today what you can put off to tomorrow
    > Don't do tomorrow what you can put off altogether
• Is this free?
  > Overhead
  > Degenerate cases
  > Response time
  > Footprint
  > Startup time

HotSpot: Some of the Optimization Samples
• Inlining
• Loop unrolling
• Constant folding
• Escape analysis
• String/Memory optimizations
• Vectorization
• Library improvements
• Processor specific optimizations

• Java gets faster through a whole range of approaches.
• The biggest gains, as usual, come from improvements in the fundamental algorithms, both in the libraries and the VM.
• Better register allocation
• Smarter numerics
• Better collection class implementations

Key here: Use the core classes when you can... they keep getting better, and so will your app!

But another subtle message: if you are implementing a new language, consider using the JVM as your foundation... and again, use its core classes when you can. You'll reap the benefits.

Practical Examples of Optimization
• Start with a Java language source example
• But instead of compiling to bytecodes, etc.
• I'll show the impact of optimization as if it were done directly on the source.
• The same principles would apply to any language that is compiled to bytecodes
• And, to be fair, many of these techniques are used in static compilers as well, but some can't be!

Inlining
• Simple case: bring the implementation of small methods (and sometimes not so small) into a containing method
• Do more faster!
• The Java language (and others) encourages small methods like accessors
• With a good JIT, there is no penalty
• But wait, there's more:
  > HotSpot will try to inline virtual methods (type profiling)
  > Inlining will enhance other optimizations (coming up!)

Virtual methods seem quite difficult.

But as with many optimizations, HotSpot can work out the specific use case and determine that a generalization made in a library isn't required in the actual usage.

For example, a class might use a virtual method to reach back into the application using the class.

In the general case, there might be many different choices, and inlining won't work. But in the common case, there might be just a single class, and thus a single implementation, which does permit inlining.

But... hey, what about a case where you don't know? Class not loaded (yet) for example.

That's OK, HotSpot will detect that case, and 'uncompile' (and then recompile) to get it right.

Inlining example (excerpts)

Before

    private final int count [] = { 12, -5, 19, 22, -6 };

    private int count (int n) {
        return count [n];
    }

    private int sum () {
        int sum = 0;

        for (int n = 0; n < count.length; n++) {
            sum += count(n);
        }
        return sum;
    }

After

    private final int count [] = { 12, -5, 19, 22, -6 };

    /* inline, this goes away
    private int count (int n) {
        return count [n];
    }
    */

    private int sum () {
        int sum = 0;

        // subtle change
        for (int n = 0; n < count.length; n++) {
            sum += count[n];
        }
        return sum;
    }
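The notes before this example describe HotSpot inlining a virtual call when only one implementation shows up, and backing out if that stops being true. A minimal source-level sketch of that idea follows. The `Shape` interface and `Square` class are hypothetical names made up for illustration, and the explicit type check only stands in for the guard a JIT would emit in compiled code; this is not how HotSpot is implemented.

```java
// Hypothetical types for illustration; they are not from the talk.
interface Shape {
    int area();
}

final class Square implements Shape {
    private final int side;

    Square(int side) {
        this.side = side;
    }

    int side() {
        return side;
    }

    @Override
    public int area() {
        return side * side;
    }
}

final class GuardedInlining {
    // What the programmer writes: one virtual call per element.
    static int totalArea(Shape[] shapes) {
        int total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    // Roughly the effect of profile-guided inlining when only Square has
    // been seen at the call site: a cheap type check guards the inlined
    // body, and anything else takes the ordinary virtual call.
    static int totalAreaAsIfInlined(Shape[] shapes) {
        int total = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {
                int side = ((Square) s).side();
                total += side * side;   // inlined body of Square.area()
            } else {
                total += s.area();      // fallback; a JIT would deoptimize here
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3) };
        System.out.println(totalArea(shapes));            // prints 13
        System.out.println(totalAreaAsIfInlined(shapes)); // prints 13
    }
}
```

Both methods return the same value for any input; the second just makes the common case cheaper, which is the whole bargain of the optimization.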

Loop unrolling
• Some loops aren't worth implementing as a loop
• Not much fun writing that out though
• And, again, who knows, when you are writing a general purpose library?
• Another case of do more faster!
• Similar example as before

Loop unrolling example (excerpts)

Before

    private final int count [] = { 12, -5, 19, 22, -6 };

    private int sum () {
        int sum = 0;

        for (int n = 0; n < count.length; n++) {
            sum += count[n];
        }
        return sum;
    }

After

    private final int count [] = { 12, -5, 19, 22, -6 };

    private int sum () {
        int sum = 0;

        // subtle change
        int n = 0;
        sum += count[n++];
        sum += count[n++];
        sum += count[n++];
        sum += count[n++];
        sum += count[n++];

        return sum;
    }

This is a bit of an oversimplification; most cases don't look like this.

But in this case, since the loop bounds are constant, it is easy to flatten it out completely.

But what might occur is that the loop might be flattened a bit... instead of doing one add per loop, we might do 4 or 8.
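A sketch of what partial unrolling by four could look like if it were written by hand at the source level. The class and method names are made up for illustration, and a real JIT performs this on its internal representation rather than on source. The second loop picks up the elements left over when the length is not a multiple of four.

```java
final class PartialUnroll {
    // Straightforward loop: one add and one loop test per element.
    static int sum(int[] values) {
        int sum = 0;
        for (int n = 0; n < values.length; n++) {
            sum += values[n];
        }
        return sum;
    }

    // Unrolled by 4: four adds per loop test.
    static int sumUnrolledBy4(int[] values) {
        int sum = 0;
        int n = 0;
        // Main loop runs only while at least 4 elements remain.
        int limit = values.length - 3;
        for (; n < limit; n += 4) {
            sum += values[n];
            sum += values[n + 1];
            sum += values[n + 2];
            sum += values[n + 3];
        }
        // Leftover loop for the 0 to 3 trailing elements.
        for (; n < values.length; n++) {
            sum += values[n];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] count = { 12, -5, 19, 22, -6 };
        System.out.println(sum(count));            // prints 42
        System.out.println(sumUnrolledBy4(count)); // prints 42
    }
}
```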

It's a bit more complicated then, because the last, and maybe the first, iteration needs to be specialized. But HotSpot handles all of that automatically.

Constant folding
• This is a pretty well known optimization
• Ancient even, by code generation standards
• But a JIT has an edge: there are more constants to work with!

Constant folding example

Before

    private final int count [] = { 12, -5, 19, 22, -6 };

    private int sum () {
        int sum = 0;
        int i = 0;

        sum += count[i++];
        sum += count[i++];
        sum += count[i++];
        sum += count[i++];
        sum += count[i++];

        return sum;
    }

This is another bit of an oversimplification; most cases don't look like this.

Now that we have flattened out the loop, a lot of things start to look like constants.

First, the array indexes are effectively constant.

How much would you pay for this code?

After

    private int sum () {
        return 42;
    }

OK, so this is probably a bit of an oversimplification. Not a very realistic example!
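To make the jump from the unrolled code to `return 42` less abrupt, here is a sketch of the intermediate stages written out as separate methods. The class and method names are invented for illustration. One assumption is worth stating loudly: the third stage is only valid if the compiler can prove the array contents never change, and in Java the elements of an array held in a `final` field are still mutable, so a real JIT may well stop at the second stage.

```java
final class FoldingStages {
    private static final int[] COUNT = { 12, -5, 19, 22, -6 };

    // Stage 1: fully unrolled, index still tracked in a variable.
    static int sumUnrolled() {
        int sum = 0;
        int i = 0;
        sum += COUNT[i++];
        sum += COUNT[i++];
        sum += COUNT[i++];
        sum += COUNT[i++];
        sum += COUNT[i++];
        return sum;
    }

    // Stage 2: the index values are known, so they fold to constants.
    static int sumConstantIndexes() {
        return COUNT[0] + COUNT[1] + COUNT[2] + COUNT[3] + COUNT[4];
    }

    // Stage 3 (assumes the elements are provably never modified):
    // the array loads are replaced by the values themselves.
    static int sumConstantElements() {
        return 12 + (-5) + 19 + 22 + (-6);
    }

    // Stage 4: the arithmetic on constants is done at compile time.
    static int sumFolded() {
        return 42;
    }
}
```

All four methods return 42; each stage simply leaves less work for run time.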

Here, we're just folding together the whole result. What's left? Not much.

A real example would rarely be this simple. But with escape analysis, inlining and a host of other methods, run-time constants are all over the place, and lots of the code can be substantially simplified.

And again, let's repeat: the JIT will try to uncover the special cases buried in general-case implementations, so you don't need to!

Escape analysis
• Not really an optimization technique per se
• But a tool to find other optimizations
• In summary: find objects that don't escape their local scope
• Why? Objects known to be local in scope can be handled much more simply
  > Avoid expensive operations like synchronization (do less)
  > Allocate on stack (run faster and do less)
  > Allocate only what you use (do less)
    > AKA object explosion
  > And others

Escape Analysis example

Before

    import java.awt.Point;
    ...

    public Point moveLeft (Point p) {
        Point l = new Point (10,0);
        // Point.getX()/getY() return double, so cast back to int
        Point r =
            new Point (
                (int) (p.getX() + l.getX()),
                (int) (p.getY() + l.getY()));
        return r;
    }

After

    import java.awt.Point;
    ...

    public Point moveLeft (Point p) {
        int x = 10;
        int y = 0;
        Point r =
            new Point (
                (int) (p.getX() + x),
                (int) (p.getY() + y));
        return r;
    }

In this instance, we're showing that escape analysis can help us use simpler code paths to obtain the same results.

Sure, you wouldn't really write code this way, since there is a built-in 'move' method in Point, but you get the 'Point'.

Note, 'y' could just go away (well, so could 'x') but one of the wins to Escape Analysis is that when objects 'explode', you can get rid of unused parts.
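The escape analysis slide also lists avoiding synchronization as a win. A sketch of that case, with invented class and method names: `StringBuffer` methods are synchronized, but when the buffer never leaves the method, no other thread can ever see its lock, so a JIT is free to skip the locking. The second method shows the source-level equivalent of the result using the unsynchronized `StringBuilder`. Whether a given VM actually elides these locks is an assumption here, not something this sketch demonstrates.

```java
final class LockElision {
    // Every append() and toString() on StringBuffer takes the buffer's
    // lock, yet 'sb' is purely local: it does not escape this method.
    static String joinWithBuffer(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    // Same logic with no synchronization at all, which is roughly what
    // is left once the locks on a non-escaping object are removed.
    static String joinWithBuilder(String a, String b) {
        StringBuilder sb = new StringBuilder();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuffer("Hot", "Spot"));  // prints HotSpot
        System.out.println(joinWithBuilder("Hot", "Spot")); // prints HotSpot
    }
}
```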

Again, quite an oversimplification, but the same ideas can be applied to more complicated uses too.

Synchronization Improvements
• Biased Locking
  > Assume a lock generally associates with a single thread
  > Until proven otherwise
  > Save time in the common case
• Spin Locks
  > Instead of giving up, try again to acquire a lock
  > Usually cheaper than a context switch
• Lock Coarsening
  > Locks are expensive, do less of them

Except on single-hardware-thread systems, synchronization is quite expensive. The net of it is that it forces you to circumvent all of the benefits of the caching and instruction pipelining that most modern processors have.

An operation that might take a handful of machine cycles to execute without synchronization might take 50 or 100 with it.

Lock Coarsening example

Before

    public Point moveNW (Point p) {
        synchronized (p) {
            p.x += 10;
        }
        synchronized (p) {
            p.y += 10;
        }
        return p;
    }

After

    public Point moveNW (Point p) {
        synchronized (p) {
            p.x += 10;
            p.y += 10;
        }
        return p;
    }

OK, first of all, we're taking liberties here... we can't access the package private members of Point.

But imagine this method was part of Point...

Granted, no one would code something like this intentionally.

But again, consider making multiple calls into a library. Each of the methods called might be synchronized. This is especially true of accessors.

Remember too, we're pretty aggressive about inlining.
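A sketch of the library case, using `java.util.Vector` because its methods are synchronized on the vector itself. The class and method names are made up for illustration. The second method is a hand-written approximation of what coarsening achieves once the calls are inlined: one outer acquisition instead of three separate ones. Note that the hand-written version also makes the three adds atomic as a group, which is a visible difference a JIT must be careful about.

```java
import java.util.Vector;

final class CoarseningSketch {
    // Three library calls; each acquires and releases the Vector's lock.
    static void addAll(Vector<Integer> v, int a, int b, int c) {
        v.add(a);
        v.add(b);
        v.add(c);
    }

    // Hand-coarsened: the lock is taken once around all three calls.
    // The inner calls still lock the same monitor, but that is a
    // re-entrant acquisition by the thread that already holds it.
    static void addAllCoarsened(Vector<Integer> v, int a, int b, int c) {
        synchronized (v) {
            v.add(a);
            v.add(b);
            v.add(c);
        }
    }

    public static void main(String[] args) {
        Vector<Integer> v = new Vector<>();
        addAllCoarsened(v, 1, 2, 3);
        System.out.println(v); // prints [1, 2, 3]
    }
}
```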

Now this is a good point to mention that this can't always be used. A real-time VM implementation really can't (and shouldn't) do many optimizations, because they are barriers to predictability.

Library improvements
• Aside from improvements in the VM

Processor specific optimizations
• Every processor design has its tricks, e.g. 8086: XOR AL, AL vs. the more obvious MOV AL, 0
• Even within a processor family
• Some are subtle/minor, like prefetch distances
• Others include new instructions
• With a JIT like HotSpot, the dynamically generated code can fully use processor features without conditional compilation, specialized binaries, etc.
• Special instructions to improve VM performance
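One way to picture the "new instructions" point from the Java side: counting set bits. As far as I know, HotSpot can compile the core-library call `Integer.bitCount` down to a population-count instruction on processors that have one, while a hand-written loop stays a loop; treat that as an assumption about a particular VM and CPU rather than a guarantee. The class and method names below are invented for illustration.

```java
final class BitCounting {
    // Portable hand-written version: clears the lowest set bit on each
    // pass, so it loops once per set bit.
    static int bitCountLoop(int x) {
        int n = 0;
        while (x != 0) {
            x &= x - 1;
            n++;
        }
        return n;
    }

    // Core-library version: same result, and the VM is free to use the
    // best instruction sequence for the processor it finds itself on.
    static int bitCountLibrary(int x) {
        return Integer.bitCount(x);
    }

    public static void main(String[] args) {
        System.out.println(bitCountLoop(0xFF));    // prints 8
        System.out.println(bitCountLibrary(0xFF)); // prints 8
    }
}
```

The same source runs unchanged on every processor; any specialization happens when the code is compiled at run time.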

One more thing... Options
• There are many open source and commercial Java VMs
• To pick on one (HotSpot)
  > 1 interpreter
  > 2 compilers
  > 3 OOP models (32 bit, 64 bit and compressed 64 bit)
  > 4 (soon to be 5) Garbage Collectors
  > About 937000 other run-time switches
  > But, don't use them unless you need them
• And, a real-time variant

How do I stay ahead?
• Use the core classes when you can
  > They keep getting better
  > The VM sometimes can do special treatment
  > Core classes are pretty reliable and performant
• Multi-platform comes at much lower cost (not free!)
  > OS and processor specific optimizations too
  > Don't forget to test your targets!
• You'll get plenty of benefit, and more to come!
  > HotSpot tends to stay on top of latest processor features

Summary
• Let the JIT do what it does best
• Finally... don't try this at home
  > The examples are contrived
  > JITs detect hot spots by executing code many times
  > It compiles (and sometimes recompiles) only after many executions
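Because compilation only kicks in after many executions, timing a method once tells you very little. A rough sketch of how one might watch the effect: time several batches of calls and compare the early batches with the later ones. The names are invented for illustration, the numbers will vary from run to run and machine to machine, and this is nowhere near a rigorous benchmark. Running with the HotSpot flag `-XX:+PrintCompilation` shows when methods actually get compiled.

```java
final class WarmUp {
    // Written after timing so the JIT cannot discard the work as unused.
    static volatile long sink;

    static int sum(int[] values) {
        int sum = 0;
        for (int n = 0; n < values.length; n++) {
            sum += values[n];
        }
        return sum;
    }

    // Calls sum() 'calls' times and returns the elapsed nanoseconds.
    static long timeBatchNanos(int[] values, int calls) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            total += sum(values);
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    public static void main(String[] args) {
        int[] count = { 12, -5, 19, 22, -6 };
        // Early batches typically include interpretation and compilation
        // time; later ones run whatever code the JIT settled on.
        for (int batch = 0; batch < 5; batch++) {
            long nanos = timeBatchNanos(count, 100_000);
            System.out.println("batch " + batch + ": " + nanos + " ns");
        }
    }
}
```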

Reference

• http://weblogs.java.net/blog/kohsuke/archive/2008/03/deep_dive_into.html
• http://java.sun.com/javase/6/docs/technotes/guides/vm/index.html
• http://en.wikipedia.org/wiki/Java_performance
• http://spec.org/
• http://www.idiom.com/~zilla/Computer/javaCbenchmark.html
• http://www.ibm.com/developerworks/java/library/j-jtp09275.html
• http://shootout.alioth.debian.org
• http://performance.netbeans.org/howto/jvmswitches/index.html

Questions?

John Pampuch – [email protected]