X. Zhang, "Application-Specific Benchmarking"

Application-Specific Benchmarking A thesis presented by Xiaolan Zhang to The Division of Engineering and Applied Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Computer Science Harvard University Cambridge, Massachusetts May, 2001 Copyright © 2001 by Xiaolan Zhang All rights reserved Abstract This thesis introduces a novel approach to performance evaluation, called application-specific benchmarking, and presents techniques for designing and constructing meaningful benchmarks. A traditional benchmark usually includes a fixed set of programs that are run on different systems to produce a single figure of merit, which is then used to rank system performance. This approach often overlooks the relevance between the benchmark programs and the real applications they are supposed to represent. When the behaviors of the benchmark programs do not match those of the intended application, the benchmark scores are uninformative, and sometimes can be misleading. Furthermore, with the rapid pace of application development, it is impractical to create a new standard benchmark whenever a new “killer” application emerges. The application-specific benchmarking approach incorporates characteristics of the application of interest into the benchmarking process, yielding performance metrics that reflect the expected behavior of a particular application across a range of different platforms. It also allows benchmarks to evolve with applications and consequently the benchmarks are always up-to-date. This thesis applies the application-specific benchmarking methodology to a variety of domains covering Java Virtual Machines, garbage collection, and operating system. The result is a collection of benchmark suites that comprise the HBench framework, including HBench:Java, HBench:JGC, and HBench:OS. iii This thesis demonstrates HBench’s superiority in predicting application performance over a more conventional benchmarking approach. It is shown that HBench:Java is able to correctly predict the rank of running times of three commercial applications on a variety of Java Virtual Machine implementations. In the realm of garbage collection, the predicted garbage collection times for a stop-the-world, mark- sweep garbage collector closely match the actual times. In the domain of operating system, it is demonstrated that HBench:OS can be used to analyze performance bottlenecks and to predict performance gains resulting from a common optimization. iv Acknowledgments During my years at Harvard I am fortunate enough to have worked with a group of brilliant scholars. It is their enthusiasm towards research that inspires me; it is their encouragement that drives me to achieve the best I can. I owe them a great deal of gratitude. Among them is my advisor Margo Seltzer, to whom I am eternally grateful for her guidance and support, especially during the last two years when I had to work remotely due to the two-body problem. Her visionary view of systems research has proved invaluable to my thesis research. Her wonderful personality together with her high standards for research makes her an ideal advisor. To me, she is more than an advisor – she is also a role model. Brad Chen advised me for the first three years of my study. His creative thinking has had tremendous impact on me. What I learned from him has been very helpful in my research. Mike Smith provided sound advice in my earlier projects. Mike is also one of the nicest people I know. This is evidenced by the ping-pong table he donated to the division, which dramatically improved the quality of life for many graduate students. Professor Michael Rabin showed me the beauty and elegance of randomized algorithms. His pursuit for simplicity has profound influence on my approach to problems. Jim Waldo has the rare combination of both academic strength and a strong sense of business. I am grateful for the countless help and advice I received from him. v During my summer internships I have come to know a few great researchers who helped shape my thesis research. Bill Weihl’s group at Digital’s System Research Center taught me many things about architecture. Lance Berc helped me find the Java applications examined in this thesis. In the summer I spent with Steve Heller’s group at Sun Microsystems Lab East, I learned about garbage collection (GC). That experience greatly assisted my research on benchmarking garbage collectors. Dave Detlefs and Ole Agesen know more about GC than anybody else I can think of. Their insights and guidance have helped me develop a deeper understanding about GC. I also wish to express gratitude to my fellow students and colleagues who have made this journey more enjoyable. The VINO group members deserve my thanks for their assistance in many aspects. I have enjoyed many long interesting conversations with Keith Smith; I could not thank Kostas Magoutis enough for many rides to the airport and for numerous lunch discussions on research ideas; Most of my research was conducted on machines “borrowed” from Dan Ellard; Dave Sullivan and Dave Holland have also helped in various ways. Thanks to the HUBE group, most notably Cliff Young, Zheng Wang, and Glen Holloway, for their support in my earlier projects. My daily life would be less pleasant without the company of my other fellow students, Rebecca Hwa, Racqell Hill, Xianfeng Gu, Chris Small, Yaz Endo, Chris Stein and Dave Krinsky. Many thanks to Qingxia Tong for her friendship and for hosting me during the last few weeks of my stay in Boston. I am also grateful to Chris Lindig for our weekly ping-pong game, which has been my only physical exercise for the recent years. Finally, Elizabeth Pennell and Mollie Goldbarg also helped in proofreading my papers. vi This thesis is dedicated to my family, my husband Zhibo Zhang, my parents, Ruhuai Zhang and Huizhen He. Without their continuous love and support, none of this would have been possible. vii Table of Contents 1. Introduction......................................................................................................... 1 1.1 How Benchmark Results Can be Misleading – An Example .................. 3 1.2 Thesis Contributions ................................................................................. 4 1.3 Thesis Outline............................................................................................ 5 2. Background and Related Work .......................................................................... 7 2.1 Computer Benchmarks in General............................................................ 7 2.1.1 Types of Benchmarks ................................................................... 7 2.1.2 Examples of Current Standard Benchmarks ................................ 8 2.2 Non-Traditional Approaches to Benchmarking ....................................... 9 2.3 Performance Evaluation of Java Virtual Machines................................ 11 2.4 Garbage Collector Performance Evaluation........................................... 13 2.5 Operating System Benchmarking ........................................................... 15 2.6 Performance Prediction Using Queueing Models.................................. 16 2.7 Conclusions............................................................................................. 17 3. Application-Specific Benchmarking: the HBench Approach ......................... 19 3.1 The HBench Approach............................................................................ 19 3.1.1 The Vector-Based Methodology ................................................ 20 3.1.2 The Trace-Based Methodology.................................................. 24 3.1.3 The Hybrid Methodology........................................................... 25 3.2 Related Approaches................................................................................. 25 3.3 Conclusions............................................................................................. 27 4. HBench:Java: An Application-Specific Benchmark for JVMs ...................... 28 4.1 Identifying Primitive Operations ............................................................ 28 4.1.1 JVM Overview............................................................................ 28 4.1.2 First Attempt............................................................................... 29 4.1.3 A Higher Level Approach .......................................................... 31 4.2 HBench:Java Implementation................................................................. 32 4.2.1 Profiler......................................................................................... 32 4.2.2 Microbenchmarks ....................................................................... 33 4.2.3 JVM Support for Profiling and Microbenchmarking ................ 35 4.3 Experimental Results............................................................................... 37 4.3.1 Experimental Setup..................................................................... 37 4.3.2 Results......................................................................................... 39 4.4 Discussion................................................................................................ 45 4.5 Summary.................................................................................................. 47 viii 5. Evaluating Garbage Collector Performance with HBench:JGC ..................... 48 5.1 Introduction ............................................................................................48

Load more