Understanding How High-Performance Java Virtual Machines Work, Through Maxine VM


Jim Huang (黃敬群) <[email protected]>, Aug 2, 2013 / Taipei International Convention Center

OpenJDK vs. Dalvik/ART
Jim Huang (黃敬群) <[email protected]>, Nov 15, 2014 / Academia Sinica

What We Will Learn
• How a dynamic compiler such as HotSpot or Dalvik/ART works
• The common optimization techniques in virtual machines
• Performance-specific issues

What We Won't
• JVM tuning
• JNI, GC, invokedynamic
• Production tweaking
• Android programming, sorry

Heritage of Languages
[Diagram: JavaScript draws on Scheme (function closures), Self (prototype-based OO), and Java (C-like syntax, built-in objects), …]

Heritage of Virtual Machine
[Diagram of VM lineage: Self VM (Self), Strongtalk VM (Smalltalk), HotSpot VM (Java), CLDC-HI (Java), V8 (JavaScript)]

JIT
• Just-In-Time compilation
• Compiled when needed
  – Maybe immediately before execution
  – ...or when we decide it's important
  – ...or never?

Mixed-Mode
• Interpreted
  – Bytecode-walking
  – Artificial stack machine
• Compiled
  – Direct native operations
  – Native register machine

Profiling
• Gather data about code while interpreting
  – Invariants (types, constants, nulls)
  – Statistics (branches, calls)
• Use that information to optimize
  – Educated guess
  – Guess can be wrong...

Runtime Statistics

Golden Rule of Optimization
• Don't do unnecessary work.

Optimizations
• Method inlining
• Loop unrolling
• Lock coarsening/eliding
• Dead code elimination
• Duplicate code elimination
• Escape analysis

Inlining
• Combine caller and callee into one unit
  – e.g. based on profile
  – Perhaps with a guard/test
• Optimize as a whole
  – More code means better visibility

Inlining

    int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);
        }
        return accum;
    }

    int add(int a, int b) {
        return a + b;
    }

Inlining

    int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);   // only one target is ever seen
        }
        return accum;
    }

    int add(int a, int b) {
        return a + b;
    }

Inlining

    int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = accum + i;       // don't bother making the call
        }
        return accum;
    }

Loop unrolling
• Works for small, constant loops
• Avoid tests, branching
• Allow inlining a single call as many

Loop unrolling

    private static final String[] options = { "yes", "no", "maybe" };

    public void looper() {
        // Small loop, constant stride, constant size
        for (String option : options) {
            process(option);
        }
    }

Loop unrolling

    private static final String[] options = { "yes", "no", "maybe" };

    public void looper() {
        // Unrolled!
        process(options[0]);
        process(options[1]);
        process(options[2]);
    }

Lock Coarsening

    public void needsLocks() {
        for (String option : options) {
            process(option);         // repeatedly locking
        }
    }

    private synchronized String process(String option) {
        // some wacky thread-unsafe code
    }

Lock Coarsening

    public void needsLocks() {
        synchronized (this) {        // lock once
            for (String option : options) {
                // some wacky thread-unsafe code
            }
        }
    }

Lock Eliding

    public void overCautious() {
        List l = new ArrayList();
        synchronized (l) {           // synchronizes on a new object
            for (String option : options) {
                l.add(process(option));
            }
        }
    }

But we know it never escapes this thread...
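The two lock coarsening slides above leave the loop body as a placeholder. As a minimal runnable sketch of the same idea, the class below performs the transformation by hand: `fineGrained()` takes the lock once per iteration through a synchronized method, while `coarsened()` takes it once around the whole loop with the method body inlined. The class name, the `results` list, and the `"!"` suffix are all hypothetical, chosen only to make the effect observable; a JIT would apply this internally rather than in source code.

```java
import java.util.ArrayList;
import java.util.List;

final class LockCoarseningSketch {
    private static final String[] OPTIONS = { "yes", "no", "maybe" };

    // Thread-unsafe state that must only be touched while holding the lock on `this`.
    private final List<String> results = new ArrayList<>();

    // Fine-grained version: the monitor is acquired and released on every call.
    private synchronized void process(String option) {
        results.add(option + "!");
    }

    void fineGrained() {
        for (String option : OPTIONS) {
            process(option);
        }
    }

    // Hand-coarsened version: one acquire/release around the whole loop,
    // with the body of process() inlined.
    void coarsened() {
        synchronized (this) {
            for (String option : OPTIONS) {
                results.add(option + "!");
            }
        }
    }

    synchronized List<String> snapshot() {
        return new ArrayList<>(results);
    }

    static List<String> runFineGrained() {
        LockCoarseningSketch s = new LockCoarseningSketch();
        s.fineGrained();
        return s.snapshot();
    }

    static List<String> runCoarsened() {
        LockCoarseningSketch s = new LockCoarseningSketch();
        s.coarsened();
        return s.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(runFineGrained());
        System.out.println(runCoarsened());
    }
}
```

Both variants produce the same list; the difference is only in how many times the monitor is entered, which is the cost the optimization removes.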
Lock Eliding

    public void overCautious() {
        List l = new ArrayList();
        for (String option : options) {
            l.add( /* process()'s code */ );   // no need to lock
        }
    }

Escape Analysis

    private static class Foo {
        public final String a;
        public final String b;

        Foo(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

Escape Analysis

    public void bar() {
        Foo f = new Foo("Hello", "JVM");
        baz(f);
    }

    public void baz(Foo f) {
        System.out.print(f.a);       // same object all the way through
        System.out.print(", ");
        quux(f);
    }

    public void quux(Foo f) {
        System.out.print(f.b);       // never "escapes" these methods
        System.out.println('!');
    }

Escape Analysis

    // Pseudocode for what the JIT effectively produces
    public secret awesome inlinedBarBazQuux() {
        System.out.print("Hello");
        System.out.print(", ");
        System.out.print("JVM");
        System.out.println('!');
    }

Don't bother allocating the Foo object.

Escape Analysis
• A bit tweaky on HotSpot
  – All paths must inline
  – No external view of object

Performance Pitfall
• Memory accesses
  – By far the biggest expense
• Calls
  – Memory reference + branch kills pipeline
  – Call stack, register juggling costs
• Locks

Performance Pitfall (again)
• Each CPU maintains a memory cache
• Caches may be out of sync
  – If it doesn't matter, no problem
  – If it does matter, threads disagree!
• Volatile forces synchronization of cache
  – Across cores and to main memory

HotSpot
• Client mode (C1) inlines, less aggressive
  – Fewer opportunities to optimize
• Server mode (C2) inlines aggressively
  – Based on richer runtime profiling

Tiered
• Increasing tiers of interpreter, C1, and C2
• Level 0 = Interpreter
• Level 1-3 = C1
• Level 4 = C2

HotSpot Client Compiler

C2 Compiler
• Profile to find "hot spots"
  – Call sites
  – Branch statistics
  – Profile until 10k calls

From Interpreter to Compiler
[Spectrum shown on the slide: pure interpreter, simple compiler, optimizing compiler]
Pure interpreter
• Source-level interpreter
• Tree-traversal interpreter
• Bytecode interpreter
  – switch-threading
  – indirect-threading
  – token-threading
  – direct-threading
  – subroutine-threading
  – inline-threading
  – context-threading
  – …
Compiler
• Base compiler
• Static optimizing compiler
  – Based on code patterns
• Dynamic optimizing compiler
  – Based on hardware and operating system
  – Based on execution frequency
  – Based on type feedback (類型反射)
  – Based on whole-program analysis
  – …

Runtime overhead: Interpreter
[Diagram: the interpreter loop cycles through fetch, execute, and dispatch]

Java source

    public class TOSDemo {
        public static int test() {
            int i = 31;
            int j = 42;
            int x = i + j;
            return x;
        }
    }

JVM bytecode (produced by javac)

    iload_0
    iload_1
    iadd
    istore_2

How is it executed?

Instruction traces

Interpreted by the Sun JDK 1.0.2 JVM [x86 instruction sequence]

    ;; iload_0
    mov ecx,dword ptr ss:[esp+1C]
    mov edx,dword ptr ss:[esp+14]
    add dword ptr ss:[esp+14],4
    mov eax,dword ptr ds:[ecx]
    inc dword ptr ss:[esp+10]
    mov dword ptr ds:[edx+10],eax
    jmp javai.1000E182
    mov eax,dword ptr ss:[esp+10]
    mov bl,byte ptr ds:[eax]
    xor eax,eax
    mov al,bl
    cmp eax,0FF
    ja short javai.1000E130
    jmp dword ptr ds:[eax*4+10011B54]
    ;; iload_1
    mov ecx,dword ptr ss:[esp+1C]
    mov edx,dword ptr ss:[esp+14]
    add dword ptr ss:[esp+14],4
    mov eax,dword ptr ds:[ecx+4]
    inc dword ptr ss:[esp+10]
    mov dword ptr ds:[edx+10],eax
    jmp javai.1000E182
    mov eax,dword ptr ss:[esp+10]
    mov bl,byte ptr ds:[eax]
    xor eax,eax
    mov al,bl
    cmp eax,0FF
    ja short javai.1000E130
    jmp dword ptr ds:[eax*4+10011B54]
    ;; iadd
    mov ecx,dword ptr ss:[esp+14]
    inc dword ptr ss:[esp+10]
    sub dword ptr ss:[esp+14],4
    mov edx,dword ptr ds:[ecx+C]
    add dword ptr ds:[ecx+8],edx
    jmp javai.1000E182
    mov eax,dword ptr ss:[esp+10]
    mov bl,byte ptr ds:[eax]
    xor eax,eax
    mov al,bl
    cmp eax,0FF
    ja short javai.1000E130
    jmp dword ptr ds:[eax*4+10011B54]
    ;; istore_2
    mov eax,dword ptr ss:[esp+14]
    mov ecx,dword ptr ss:[esp+1C]
    sub dword ptr ss:[esp+14],4
    mov edx,dword ptr ds:[eax+C]
    inc dword ptr ss:[esp+10]
    mov dword ptr ds:[ecx+8],edx
    jmp javai.1000E182
    mov eax,dword ptr ss:[esp+10]
    mov bl,byte ptr ds:[eax]
    xor eax,eax
    mov al,bl
    cmp eax,0FF
    ja short javai.1000E130
    jmp dword ptr ds:[eax*4+10011B54]

Interpreted by the Sun JDK 1.1.8 JVM [x86 instruction sequence]

    ;; iload_0
    movzx eax,byte ptr ds:[esi+1]
    mov ebx,dword ptr ss:[ebp]
    inc esi
    jmp dword ptr ds:[eax*4+1003FBD4]
    ;; iload_1
    movzx eax,byte ptr ds:[esi+1]
    mov ecx,dword ptr ss:[ebp+4]
    inc esi
    jmp dword ptr ds:[eax*4+1003FFD4]
    ;; iadd
    add ebx,ecx
    movzx eax,byte ptr ds:[esi+1]
    inc esi
    jmp dword ptr ds:[eax*4+1003FBD4]
    ;; istore_2
    movzx eax,byte ptr ds:[esi+1]
    mov dword ptr ss:[ebp+8],ebx
    inc esi
    jmp dword ptr ds:[eax*4+1003F7D4]

Interpreted by Sun JDK 6 HotSpot [x86 instruction sequence]

    ;; iload_0
    mov eax,dword ptr ds:[edi]
    movzx ebx,byte ptr ds:[esi+1]
    inc esi
    jmp dword ptr ds:[ebx*4+6DB188C8]
    ;; iload_1
    push eax
    mov eax,dword ptr ds:[edi-4]
    movzx ebx,byte ptr ds:[esi+1]
    inc esi
    jmp dword ptr ds:[ebx*4+6DB188C8]
    ;; iadd
    pop edx
    add eax,edx
    movzx ebx,byte ptr ds:[esi+1]
    inc esi
    jmp dword ptr ds:[ebx*4+6DB188C8]
    ;; istore_2
    mov dword ptr ds:[edi-8],eax
    movzx ebx,byte ptr ds:[esi+1]
    inc esi
    jmp dword ptr ds:[ebx*4+6DB19CC8]

Summary: OpenJDK

Introduction to Dalvik VM
[Figure: K. Yaghmour, Embedded Android, 1st edition, Chapter 2, Figure 2-1]

Dalvik VM in a nutshell
• The core of Android applications
  – All fancy Android applications are run by it
• Register-based process virtual machine
  – Think of running a Java application
• Intermediate language = Dalvik bytecode
• Executable file = Dalvik Executable (DEX)
  – A Java class converted by the "dx" tool
  – Reduces redundancy in variables

Dalvik VM: Bytecode
• Register-based, 32 bits
• Instruction fetch unit: 16 bits
• Bytecode is stored in binary form
• Constant pools
  – String, Type, Field, Method, Class
• Human-readable syntax and mnemonics
• Instruction suffixes: -wide (64-bit opcodes), -char, -boolean, -short, -byte, -int, -long, -float, -object, -string, -class, -void

Dalvik is Register-based
• const/4 stores 1 into register 0
• add-int/lit8 sums the value in register 0 (1) with the literal 2 and stores the result into register 1, namely "foo"
• Fewer dispatches generally mean the interpreter spends less time reading code and more time running it

Dalvik Bytecode: Human syntax
• Example: move-wide/from16 vAA,vBBBB
  – Opcode "move": move a register's value
  – "wide" is the name suffix
    • It operates on wide (64-bit) data.
  – "from16" is the opcode suffix
    • The source is a 16-bit register reference.
  – "vAA" is the destination register
    • v0 – v255
  – "vBBBB" is the source register
    • v0 – v65535

Dalvik Registers
• Consider the for loop shown on the slide: in Java bytecode it is not legal to simply push a number onto the stack inside a loop.
  – To be able to map stack slots to hardware registers, the stack height must be the same at the start and end of a loop, unlike true stack-based languages such as Forth.
• The irony is that, in the end, normal JVMs convert to the same form as Dalvik anyway.
  – For instance, the Java HotSpot 6 client JIT.
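The claim that a register-based design needs fewer dispatches than a stack-based one can be illustrated with two toy interpreters. This is only a sketch under stated assumptions: the opcodes `LOAD`, `ADD`, `STORE`, `ADD3`, and `HALT` and their encodings are invented for this example and are not the real JVM or Dalvik instruction formats. Both programs compute `x = i + j` with `i = 31` and `j = 42`, as in `TOSDemo`, and each interpreter counts how many times it goes through its dispatch switch.

```java
final class DispatchCountDemo {
    // Hypothetical opcodes; not the real JVM or Dalvik encodings.
    static final int LOAD = 0;   // stack style: push locals[operand]
    static final int ADD = 1;    // stack style: pop two values, push their sum
    static final int STORE = 2;  // stack style: pop into locals[operand]
    static final int ADD3 = 3;   // register style: regs[dst] = regs[a] + regs[b]
    static final int HALT = 4;   // stop (not counted as a dispatch)

    /** Runs a stack-style program; returns {locals[2], dispatch count}. */
    static int[] runStack(int[] code, int[] locals) {
        int[] stack = new int[8];
        int sp = 0, pc = 0, dispatches = 0, op;
        while ((op = code[pc++]) != HALT) {
            dispatches++;
            switch (op) {
                case LOAD:
                    stack[sp++] = locals[code[pc++]];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case STORE:
                    locals[code[pc++]] = stack[--sp];
                    break;
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
        return new int[] { locals[2], dispatches };
    }

    /** Runs a register-style program; returns {regs[2], dispatch count}. */
    static int[] runRegister(int[] code, int[] regs) {
        int pc = 0, dispatches = 0, op;
        while ((op = code[pc++]) != HALT) {
            dispatches++;
            switch (op) {
                case ADD3: {
                    int dst = code[pc++];
                    int a = code[pc++];
                    int b = code[pc++];
                    regs[dst] = regs[a] + regs[b];
                    break;
                }
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
        return new int[] { regs[2], dispatches };
    }

    // x = i + j as four stack instructions (compare iload_0, iload_1, iadd, istore_2).
    static int[] stackDemo() {
        int[] code = { LOAD, 0, LOAD, 1, ADD, STORE, 2, HALT };
        return runStack(code, new int[] { 31, 42, 0 });
    }

    // The same computation as a single three-address instruction.
    static int[] registerDemo() {
        int[] code = { ADD3, 2, 0, 1, HALT };
        return runRegister(code, new int[] { 31, 42, 0 });
    }

    public static void main(String[] args) {
        int[] s = stackDemo();
        int[] r = registerDemo();
        System.out.println("stack:    result=" + s[0] + " dispatches=" + s[1]);
        System.out.println("register: result=" + r[0] + " dispatches=" + r[1]);
    }
}
```

In this sketch the stack program goes through the dispatch switch four times and the register program once, while both produce 73. The register instruction is wider (four code units instead of one or two), which is the usual trade-off: fewer, larger instructions in exchange for less dispatch overhead.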
