Techniques for Real-System Characterization of Virtual Machine Energy and Power Behavior

Gilberto Contreras, Margaret Martonosi

Department of Electrical Engineering, Princeton University

1 Why Study Power in Java Systems?

• The Java platform has been adopted in a wide variety of devices
• Java servers demand performance; embedded devices require low power
• Performance is important, but power, energy, and thermal issues are equally important
• How do we study and characterize these requirements in a multi-layer platform?

2 Power/Performance Design Issues

Java Application

Java Virtual Machine

Operating System

Hardware

3 Power/Performance Design Issues

Java Application

Java Virtual Machine (Garbage Collection, Class Loader, Runtime Compiler, Execution Engine)

Operating System

Hardware

• How do the various software layers affect the power/performance characteristics of the hardware?
• Where should time be invested when designing power-aware and/or thermally aware Java virtual machines?

4 Outline

• Approaches for Energy/Performance Characterization of Java virtual machines
• Methodology
  ◦ Breaking the JVM into sub-components
  ◦ Hardware-based power/performance characterization of JVM sub-components
• Results
  ◦ Jikes & Kaffe on Pentium M
  ◦ Kaffe on Intel XScale
• Conclusions

5 Power & Performance Analysis of Java

• Simulation Approach
  √ Flexible: easy to model non-existent hardware
  x Simulators may lack comprehensiveness and accuracy
  x Thermal studies require tens-of-seconds granularity
    ◦ Accurate simulators are too slow

• Hardware Approach
  √ Able to capture full-system characteristics and effects
  √ Data gathering is comparable to hardware speeds
  x Only applicable to existing hardware

6 Hardware-based Characterization

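[Figure: measurement setup. The virtual machine tags the code region being executed with a component ID (Class Loader: 0001, Garbage Collector: 0010, Scheduler: 0100, Execution Engine: 1000). DAQ channels CH0 to CH3 track the current code region ID, while the CH+ and CH- inputs carry the power measurements taken from the hardware (CPU, memory).]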
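The component IDs above suggest a simple attribution scheme: each DAQ sample pairs a component ID with a power reading, and the energy of a component is the sum of power times the sampling interval over its samples. The following is a minimal sketch under stated assumptions, not the authors' tooling: the fixed sampling interval, the (ID bits, watts) sample format, the function name, and all numbers are hypothetical.

```python
from collections import defaultdict

# Component IDs as listed on the slide.
COMPONENT_IDS = {
    0b0001: "class_loader",
    0b0010: "garbage_collector",
    0b0100: "scheduler",
    0b1000: "execution_engine",
}


def energy_per_component(samples, sample_interval_s):
    """Sum power * dt for each component ID seen in the sample stream.

    samples: iterable of (id_bits, power_watts) pairs (hypothetical format).
    sample_interval_s: assumed fixed time between consecutive samples.
    Returns a dict mapping component name to energy in joules.
    """
    energy = defaultdict(float)
    for id_bits, power_watts in samples:
        name = COMPONENT_IDS.get(id_bits, "unknown")
        energy[name] += power_watts * sample_interval_s
    return dict(energy)


# Made-up samples: two in the execution engine, one in the garbage collector.
example = [(0b1000, 12.0), (0b1000, 14.0), (0b0010, 8.0)]
result = energy_per_component(example, sample_interval_s=0.5)
```

Dividing each entry of `result` by the total would give the kind of per-component energy distribution shown in the results slides.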

7 Two Virtual Machines

                        Jikes RVM                          Kaffe JVM
Design goal             High performance                   Flexibility and portability
Architecture            High-end processors                High-end to embedded support
Garbage collection      Multiple collectors                Mark-and-sweep
Runtime optimizations   Runtime compiler with different    Just-in-time compiler
                        optimization levels

8 Two Platforms

                        Pentium M (P6)                       Intel XScale
Platform type           High-performance mobile computers    High-end handheld devices
Configuration           1.6 GHz, 512 MB RAM                  400 MHz, 32 MB RAM
Theoretical max power   31 W                                 1.4 W
JVM used                Jikes RVM, Kaffe                     Kaffe

9 Outline

• Approaches for Energy/Performance Characterization of Java virtual machines
• Methodology
  ◦ Breaking the JVM into sub-components
  ◦ Hardware-based power/performance characterization of JVM sub-components
• Results
  ◦ Jikes & Kaffe on Pentium M
  ◦ Kaffe on Intel XScale
• Conclusions

10 Jikes Energy Distribution on P6

[Figure: stacked bars of energy usage (0% to 100%) by component (app, gc, cl, base, opt_comp) for Jikes with the SemiSpace garbage collector, across heap sizes (MB) for the benchmarks db, fop, jess, jack, javac, and compress.]

11 Jikes Energy Distribution on P6

[Figure: same stacked-bar chart as slide 10 (Jikes with the SemiSpace garbage collector), annotated with the callouts below.]

• JVM: up to 60% of the total energy
• GC: on average, 37% of the total energy for SPECjvm98

12 Jikes Energy-Delay Product on P6

[Figure: energy-delay product (EDP, J*sec, axis from 0 to 3500) of Jikes with the SemiSpace, MarkSweep, GenMS, and GenCopy collectors, across heap sizes (MB) per benchmark (db, jack, jess, javac, compress).]

• Jikes: heap size has a significant impact on energy efficiency
• EDP decreases across heap sizes because application execution time decreases
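Energy-delay product is energy multiplied by execution time. If average power stays roughly constant, energy itself scales with time, so a shorter run lowers EDP through both factors. A minimal sketch with made-up numbers (the heap labels, energies, and times are illustrative assumptions, not measurements from the talk):

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP in joule-seconds; lower is better."""
    return energy_joules * delay_seconds


# Hypothetical runs of one benchmark at two heap sizes: (energy in J, time in s).
# Both runs assume the same 12 W average power.
runs = {
    "small_heap": (120.0, 10.0),
    "large_heap": (96.0, 8.0),
}

edp = {name: energy_delay_product(e, t) for name, (e, t) in runs.items()}

# At constant average power, a 20% shorter run scales EDP by 0.8 * 0.8.
ratio = edp["large_heap"] / edp["small_heap"]
```

Here `edp["small_heap"]` is 1200 J*sec and `edp["large_heap"]` is 768 J*sec, which shows why a modest reduction in execution time produces a larger reduction in EDP.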

13 Jikes Power Consumption on P6

[Figure: average power (Watts, axis from 0 to 18) by component (app, gc, cl) for Jikes with the GenCopy garbage collector, across heap sizes (MB) for the benchmarks db, jess, jack, javac, fop, and compress.]

• Average power for the JVM varies little across heap sizes
• The garbage collector is a high energy consumer but draws low power
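The garbage collector observation above follows from energy being average power multiplied by time: a component that runs for a long time can take a large share of energy while drawing less power than the others. A small sketch with invented numbers (the per-component powers and times are assumptions for illustration only):

```python
# Hypothetical per-component (average power in watts, time in seconds).
components = {
    "app": (14.0, 10.0),
    "gc": (10.0, 8.0),
    "cl": (12.0, 1.0),
}

# Energy in joules is average power multiplied by time spent in the component.
energy = {name: power * time for name, (power, time) in components.items()}
total = sum(energy.values())
share = {name: e / total for name, e in energy.items()}

# Component with the lowest average power draw.
lowest_power = min(components, key=lambda name: components[name][0])
```

In this made-up case `gc` has the lowest average power yet accounts for roughly a third of the total energy, because it runs for most of the execution.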

14 Jikes Peak Power on P6

[Figure: peak power (Watts, axis from 0 to 20) by component (app, gc, cl) for Jikes, across heap sizes (MB) for the benchmarks db, jess, jack, javac, fop, and compress.]

• The execution engine has the highest peak power

15 Jikes versus Kaffe: Energy Distribution on P6

[Figure: side-by-side stacked bars of energy distribution (0% to 100%) on P6. Jikes components: app, gc, cl, base_comp, opt_comp. Kaffe components: app, gc, cl, jit. Bars span heap sizes for each benchmark; the axis tick and benchmark labels are garbled in extraction.]

• Kaffe: high application energy is caused by long execution times
• Kaffe: on average, 8% of total energy goes to the virtual machine

16 Kaffe Across Platforms

[Figure: side-by-side stacked bars of Kaffe energy distribution (0% to 100%) by component (app, gc, cl, jit) on Pentium M and on Intel XScale, across heap sizes for each benchmark; the axis tick and benchmark labels are garbled in extraction.]

• XScale: no classes are included in the binary
• XScale: GC represents only 6% of the total energy consumed

17 Conclusions

• Methodology
  ◦ The complexity of the virtual machine calls for a more in-depth power/energy analysis
  ◦ Hardware-based characterization of the virtual machine's sub-components allows long execution times and trustworthy measurements
• Lessons learned
  ◦ On both platforms, JVM energy overhead is considerable
  ◦ Jikes: the GC draws low power but is a high energy consumer (37% of total energy on average)
  ◦ For Kaffe on XScale, the class loader becomes a high energy consumer (18% for the measured benchmarks)

18 Thank you!
