Michael Kuperberg

Quantifying and Predicting the Influence of Execution Platform on Software Component Performance

The Karlsruhe Series on Software Design and Quality, Volume 5

Chair Software Design and Quality, Faculty of Computer Science, Karlsruhe Institute of Technology
and
Software Engineering Division, Research Center for Information Technology (FZI), Karlsruhe

Editor: Prof. Dr. Ralf Reussner

Quantifying and Predicting the Influence of Execution Platform on Software Component Performance

by Michael Kuperberg

Dissertation, Karlsruher Institut für Technologie, Fakultät für Informatik
Date of the oral examination: 4 November 2010
Reviewers: Prof. Dr. Ralf Reussner, Prof. Dr. Walter F. Tichy
Imprint

Karlsruher Institut für Technologie (KIT)
KIT Scientific Publishing
Straße am Forum 2
D-76131 Karlsruhe
www.ksp.kit.edu

KIT – University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association

This publication is available on the Internet under the following Creative Commons licence:
http://creativecommons.org/licenses/by-nc-nd/3.0/de/

KIT Scientific Publishing 2011
Print on Demand
ISSN 1867-0067
ISBN 978-3-86644-741-7

Abstract
Software engineering is concerned with the cost-efficient construction of applications which behave as specified, are well-designed and of high quality. Among software quality attributes, performance is one of the most prominent and well-studied. Performance evaluation is concerned with explaining, predicting and preventing long waiting times, overloaded bottleneck resources and other performance problems. However, performance remains hard to evaluate because it depends not only on the software implementation, but also on several other factors such as the workload and the execution platform on which the software runs. The execution platform comprises hardware resources (CPU, networks, hard disks) and software resources (operating system, middleware). In former approaches, the influence of the execution platform was a hard-wired part of the model, not an adjustable parameter. This meant that to answer sizing and relocation questions, a performance model had to be recreated and quantified for each candidate execution platform.

The resulting challenge addressed by this thesis is to devise an effective approach for quantifying and predicting the influence of the execution platform on software performance, using Model-Based Performance Evaluation (MBPE) at the level of the software architecture. The primary targeted benefit is a decrease of the effort needed for performance prediction, since answering sizing and relocation questions no longer requires the deployment and measurement of the considered application on every candidate execution platform.

The application of MBPE starts at design time, since delaying performance evaluation until the implementation of the software is not desirable: refactoring costs increase with the degree of completeness and deployment. To model the artefacts of the software application, MBPE builds upon the well-studied concept of software components and their required and provided services as exchangeable building blocks which facilitate recomposition and reuse. In most MBPE approaches, the atomic behaviour actions of components carry timing values. On the basis of these timing values, an analysis of the overall application behaviour (e.g. prediction of response times) is then performed.

Unfortunately, such timing values are platform-specific, and the resulting architectural model is also platform-specific. Therefore, the model needs to be rebuilt for each considered execution platform and for each usage profile. Additionally, the durations of atomic component actions often amount to just a few nanoseconds, and measuring such fine-granular actions is challenging because conventional timer methods are too coarse for them.

The contribution of this thesis is a novel approach to quantify and to predict both platform-independent and platform-dependent resource demands on the basis of performance models. Using automated benchmarking of the execution platform, the approach is able to make precise, platform-specific performance predictions on the basis of these models, without manual effort. By separating the performance evaluation of the application from the performance evaluation of the execution platform, the effort to consider different platforms (e.g. for relocation or sizing scenarios) is significantly decreased, since it is no longer necessary to deploy the application on each candidate platform. To select the timer methods used in measurements, this thesis introduces a novel platform-independent algorithm which quantifies timer quality metrics (e.g. accuracy and overhead); a minimal sketch of such a measurement follows the contribution list below.

Building on the Palladio Component Model (PCM) and its tooling, the implementation of the approach provides a convenient user interface and a validated theoretical foundation. The resource demands are parametrised over the usage (workload) of the considered components, and are expressed as annotations in the PCM-based behaviour model of the component. To integrate the presented approach into the PCM, new metamodel concepts have been introduced into the PCM, and corresponding tooling has been added. The enhanced PCM workbench allows for automated creation of PCM model instances from black-box bytecode components, and also includes concepts and tools to convert benchmarking results into PCM resource models.

The presented approach focuses on applications that run as platform-independent bytecode on bytecode-executing virtual machines (BEVMs) such as the Java VM. It accounts for dynamic and static optimisations performed in modern BEVMs, e.g. just-in-time compilation (JIT) and inlining. To translate the platform-independent resource demands into timing values, this thesis introduces a benchmark suite for BEVMs. This benchmark suite addresses both fine-granular bytecode instructions (e.g. integer addition or array initialisation) and platform API methods provided by the BEVM's base libraries, e.g. by the Java Platform API. Unlike existing approaches, the contribution of this thesis
• does not require modification or instrumentation of the execution platform
• quantifies the performance speedups of the execution platform (e.g. just-in-time compilation) and reflects them during performance prediction
• deals with API and library methods in an atomic way, providing method-level benchmarking results which are more intuitive than per-instruction timings
• provides more detailed per-invocation performance results than conventional profilers, and supports stochastic distributions of performance values, which are more realistic and richer in information than conventional average or median metrics
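As a minimal sketch of the timer quality measurement announced above, the following Java probe estimates two of the quality metrics for System.nanoTime(): its accuracy (approximated here as the smallest positive difference between consecutive readings) and its invocation cost (approximated as the mean difference between back-to-back readings). This is a deliberately naive probe, not the clustering-based algorithm developed in Chapter 3; the class name and the sample count are illustrative choices.

    public class TimerQualityProbe {
        public static void main(String[] args) {
            final int samples = 1_000_000;
            long minPositiveDiff = Long.MAX_VALUE; // candidate for the timer's accuracy
            long diffSum = 0;                      // accumulates per-call costs
            for (int i = 0; i < samples; i++) {
                long t1 = System.nanoTime();
                long t2 = System.nanoTime();       // back-to-back invocation
                long diff = t2 - t1;
                diffSum += diff;
                if (diff > 0 && diff < minPositiveDiff) {
                    minPositiveDiff = diff;
                }
            }
            System.out.println("Estimated accuracy (ns): "
                    + (minPositiveDiff == Long.MAX_VALUE ? "n/a" : minPositiveDiff));
            System.out.println("Estimated mean invocation cost (ns): "
                    + diffSum / (double) samples);
        }
    }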
An extensive validation of the performance prediction capabilities offered by the new approach was performed on a number of Java applications, such as the widely used SPECjvm2008, SPECjvm98, SPECjbb2005 and Linpack benchmarks. The validation demonstrated the prediction accuracy of bytecode-based cross-platform performance prediction, and showed that it yields significantly better results than prediction based on CPU cycles. The validation used one execution platform as a basis to obtain platform-independent resource demands, and predicted the performance of the application on other execution platforms (which were significantly different from the basis platform) without deploying and benchmarking the application on them. The validation also addressed individual parts of the presented approach: the precision and the overhead of the resource demand quantification were studied, and the heuristics-based approach for automated method benchmarking was evaluated w.r.t. its effectiveness, coverage and the precision of the benchmarking results. A large comparison of timer methods on the basis of quality attributes was performed on several Java and .NET platforms.
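The cross-platform prediction idea validated above can be condensed into a short sketch: platform-independent resource demands (instruction and method counts) obtained on one platform are combined with per-operation durations benchmarked on the target platform. The following Java fragment only illustrates this combination; the operation names, counts and durations are invented, and the calibration factor computed in Chapter 6 is omitted.

    import java.util.Map;

    public class CrossPlatformPrediction {
        // Sum over all operations of (count) x (benchmarked duration of that
        // operation on the target platform, in nanoseconds).
        static double predictNanos(Map<String, Long> demandCounts,
                                   Map<String, Double> benchmarkNanos) {
            double total = 0.0;
            for (Map.Entry<String, Long> e : demandCounts.entrySet()) {
                Double perOp = benchmarkNanos.get(e.getKey());
                if (perOp != null) {
                    total += e.getValue() * perOp;
                }
            }
            return total;
        }

        public static void main(String[] args) {
            // Hypothetical demands of one service invocation ...
            Map<String, Long> demand = Map.of("IADD", 1_200_000L, "ArrayList.add", 5_000L);
            // ... and hypothetical benchmark results of the target platform.
            Map<String, Double> bench = Map.of("IADD", 0.35, "ArrayList.add", 12.0);
            System.out.printf("Predicted duration: %.0f ns%n", predictNanos(demand, bench));
        }
    }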
Zusammenfassung
Software engineering is concerned with the cost-effective construction of high-quality software applications whose behaviour follows a given specification and which are based on a purposeful design. Among the quality attributes of software, performance plays a central role and is accordingly the subject of intensive research. The research field of performance analysis deals with measuring, modelling and predicting performance in order to explain and prevent performance problems such as overloaded resources.

Performance analysis offers numerous challenges and open research questions, since the performance of an application depends in complex ways on factors such as its implementation, its workload and its execution platform. The execution platform comprises hardware resources such as the CPU and the hard disk, but also software resources such as the operating system or the middleware. In earlier modelling approaches, the influence of the execution platform was contained as a constant, fixed factor, so that the model had to be rebuilt repeatedly for comparisons between execution platforms or for resource sizing questions.

From this follows the challenge addressed in this doctoral thesis: to develop an effective approach for predicting the influence of the execution platform on software performance. The approach to be developed is to manage without installing and measuring the analysed application on each of the considered execution platforms. It is to be used as part of model-based performance analysis at the level of the software architecture, i.e. at a time when only individual subcomponents of the application are available. The benefit of the new approach is that less time and money needs to be spent on model-based performance predictions in sizing and relocation scenarios.

The application of model-based performance prediction starts as early as design time, since delaying performance analyses until the implementation phase means that fixing the uncovered performance problems becomes ever more expensive the further the implementation has progressed. The applications are modelled using software components, which, as exchangeable and independently deployable units with interface-based communication, enable a structured design and a non-monolithic implementation. Components that are already implemented at design time are analysed with the described approach; for components that are not yet implemented, estimates and performance targets (e.g. from service level agreements) are used. In most component-based approaches, performance is modelled by annotating timing values to elements of behavioural models, which are subsequently evaluated by an analytical or simulation-based approach. The significant drawback of using timing values for performance modelling, however, is their platform-specific nature, which renders the resulting model platform-specific as well. The model therefore has to be duplicated and re-annotated for each considered execution platform.

To make matters worse, the durations of component services often lie in the nanosecond range and cannot be measured accurately with the available library methods for time measurement, since these are too coarse-grained for this purpose.

The scientific contribution of this doctoral thesis is a new model-based approach for measuring and predicting platform-independent resource demands and platform-specific execution times of software components. The presented approach targets applications that are available as bytecode and can thus be executed across platforms by virtual machines (VMs, e.g. the Java VM). The approach accounts for the static and dynamic optimisations employed in modern VMs, such as the compilation of bytecode to machine code at runtime (just-in-time compilation, "JIT") or the inlining of methods.

By automating the individual steps as far as possible (and above all by automatically benchmarking the execution platform), the approach is able to minimise the manual effort required for performance prediction. The benchmarking of the virtual machine covers both the fine-granular bytecode instructions (e.g. addition or array usage) and the library methods of the platform API. By separating the performance of the application from the performance of the execution platform, the effort for considering different platforms in sizing and relocation scenarios also decreases: it is no longer necessary to install and measure the application on each of the considered platforms.

For selecting the library methods used for time measurement, this thesis develops a new platform-independent approach which quantifies the quality attributes of these methods and, through a new aggregating metric, facilitates the comparison between them.

The implementation of the approach extends the Palladio Component Model (PCM) and can thus be used for performance predictions via the PCM tools. To make the newly introduced platform-independent resource demands usable in PCM models, the PCM metamodel and the corresponding model transformations were extended. In addition, tools were developed for generating model instances from the resource usage of components and from the benchmarking results of execution platforms. In contrast to existing approaches, the contribution of this thesis is characterised by the following properties:
• The execution platform needs to be neither instrumented nor modified.
• The performance gains due to runtime optimisations of the execution platform (e.g. JIT) are quantified and taken into account during performance prediction.
• Library methods such as those of the Java Platform API are treated as atomic units during the benchmarking phase and are not split into bytecode instructions, since their performance is easier to handle at method level and more intuitive for users.
• While profilers report measured timing values as averages or medians, the presented approach supports stochastic distributions of bytecode-based resource usage values and thus carries more information.
An extensive validation of the new performance prediction method examines the quality of the prediction results using widely used benchmarks such as SPECjvm2008, SPECjbb2005 and Linpack. The validation demonstrates the accuracy of the predictions and the superiority of the presented method over the approach previously used in the PCM, which is based on counting CPU cycles. The validation uses one execution platform as the basis for quantifying platform-independent resource demands and then predicts the performance of the considered applications on other execution platforms, without installing and measuring these applications there.

The validation also covers the individual constituents of the approach, i.e. the determination of the bytecode-oriented resource demands as well as the application-independent benchmarking of the virtual machines. The procedure developed in this dissertation for quantifying the quality attributes of timer methods is applied to numerous methods on Java and .NET, and the results are compared using the newly introduced metric.

Acknowledgements
This thesis has one author but many people to thank – colleagues and collaborators, students and staff, family and friends.

First and foremost, the vision and wisdom of my parents Valentina and Ilya have inspired me for many years, and I've learned a lot more from them than can fit into any PhD thesis. Their unconditional love and support, but also their fair and pointed criticism, provided me with a framework for which I am endlessly grateful. Therefore, I dedicate this thesis to them.

Prof. Ralf Reussner has been a great PhD advisor, research group leader and a wonderful person to work with. Ralf has provided me with an environment to explore, to invent and to publish, and he has supported my work in every manner. I'm especially grateful for his trust and his patience at the beginning of my work, and for providing me with numerous opportunities to teach. Ralf has managed to bring good mood, a sense of belonging together and a common vision to a team of people with different backgrounds and individual research interests. Despite his increasingly tight schedule, Ralf always found time for advising me, and his critical reviews helped to shape this thesis.

Prof. Walter F. Tichy has provided helpful feedback even before the thesis writing phase, and his comments during IPD seminars and during oral examinations have given me several useful insights. I'm very grateful for his involvement as the second advisor of my thesis, and for his suggestions on how to improve it.

Over the years, many members of the Software Design and Quality research group (SDQ) have scrutinized my work, reviewed my publications, and given a lot of much-appreciated advice on my research, presentations and implementations. Klaus Krogmann has been a great officemate, a demanding co-author of papers and an engaged reviewer of this thesis – and we also had a lot of fun for over four years! Steffen Becker, Heiko Koziolek and Jens Happe gave me useful advice and provided much-valued reviews at the beginning of my PhD research. Samuel Kounev has inspired me with his enthusiasm, and invited me to participate in the visionary work of the SPEC Research Group. Thomas Goldschmidt reviewed this thesis and gave me useful advice on its readability. Erik Burger, Jörg Henß, Heinz Herrmann, Elena Kienhöfer, Anne Koziolek and many, many others at the KIT and at the FZI supported me in various organizational, technical and scientific matters.

Advising students was a great and rewarding part of my PhD work, and particular gratitude goes to Martin Krogmann for his work on bytecode instrumentation, to Fouad Omri for his work on benchmarking, as well as to Michael Hauck, Sebastian Bauer, and all others who influenced the work described in this thesis.

My family and relatives, spread over countries and continents, kept reminding me that there is life outside of Eclipse and TexMakerX, and their constant inquiries about my progress and the applicability of my work were an additional inspiration. During my PhD work and the writing of this thesis, many friends had to cut back on our joint hobbies and interests. Now that their waiting is over, I look forward to all the other activities and dreams which we had put in hibernation mode.
Karlsruhe, November 2010 Michael Kuperberg
Contents
1 Introduction
  1.1 Motivation
  1.2 Problem Statement and Scientific Challenges
  1.3 Shortcomings of Existing Solutions
  1.4 Thesis Approach
  1.5 Contributions
  1.6 Validation
  1.7 Thesis Organisation

2 Foundations and State-of-the-Art
  2.1 Software Performance
  2.2 Performance Evaluation, Engineering, Optimisation, Modelling and Prediction
    2.2.1 Model-based Performance Prediction
    2.2.2 Software Performance Engineering
  2.3 Benchmarking and Performance Measuring
    2.3.1 Benchmark Types
    2.3.2 Overview of Benchmarks
    2.3.3 Summary
  2.4 An Overview of Timer Methods, Timers and Counters
    2.4.1 Hardware Performance Counters and Monitors
    2.4.2 Software-Provided Performance Indicators
    2.4.3 Timer Methods
    2.4.4 Summary
  2.5 Middleware, Virtual Machines and Bytecode
  2.6 Just-in-Time Compilation
  2.7 Bytecode Engineering
  2.8 Instrumentation
  2.9 Ahead-Of-Time Compilation (AOT)
  2.10 Workload Quantification, Resource Demand Quantification and Profiling
  2.11 Software Components and their Performance
    2.11.1 Component Basics
    2.11.2 Component Modelling
    2.11.3 Component Performance Modelling
  2.12 Platform-independent Resource Demands
  2.13 Palladio Component Model
    2.13.1 Component Modelling
    2.13.2 Execution Platform and System Usage Modelling
  2.14 Quantitative Impact of JVM Optimizations

3 Evaluating and Selecting Methods for Time Measurement
  3.1 Issues and Challenges with Obtaining Timing Values for Performance Analysis
  3.2 Foundations of Timer Methods
    3.2.1 Quality Properties for Counters, Timers and Timer Methods
    3.2.2 The Influence of Quantisation, Accuracy and Method Invocation Costs on Measured Timing Values
    3.2.3 The Effects of Rounding and Truncating
  3.3 Quantifying Accuracy and Invocation Cost of Timing Methods
    3.3.1 A Naive Approach to Estimating Timer Invocation Costs
    3.3.2 Using Clustering for Quantifying Accuracy and Invocation Cost
    3.3.3 Timer Method Invocation in Detail
  3.4 Analysing Units, Monotonicity and Stability
    3.4.1 Quantifying Units of Counters and Timers
    3.4.2 Analysing Monotonicity during Concurrent Access to Timing Methods
    3.4.3 Analysing Stability of a Timer
  3.5 Computing the Maximum Measurable Time Interval and the Epochs
    3.5.1 Foundations
    3.5.2 Impact of Overflow on Timer Methods with High Precision
    3.5.3 Impact of Overflow on Measuring Time Intervals
    3.5.4 Computing the Last and Next Epochs
  3.6 A Unified Quality Metric for Timer Methods
    3.6.1 Accounting for Different CPU Processing Speeds
    3.6.2 Factors Contributing to the Unified Timer Quality Metric
    3.6.3 Designing the Unified Timer Quality Metric
    3.6.4 Choice of the Exponents for the Unified Timer Quality Metric
  3.7 Summary
4 Quantifying Resource Demands for Performance Prediction
  4.1 Timing Values versus Resource Demands
    4.1.1 Effects of Preemption on Response Time Measurements
    4.1.2 Addressing Preemption during Time Measurements
    4.1.3 Resource Demands
  4.2 Requirements for Resource Demand Usage in the PCM
  4.3 Using Java Bytecode for Resource Demand Quantification
    4.3.1 Foundations of Java Bytecode
    4.3.2 Black-box Java Bytecode
    4.3.3 Bytecode Instructions with Special Roles and Properties
    4.3.4 Parameters of Bytecode Instructions
    4.3.5 Methods in Bytecode and Java Platform API
    4.3.6 Native Methods in Java Bytecode
    4.3.7 Static Methods in Java Bytecode
    4.3.8 Working with Calling Context Trees
    4.3.9 Considering Subtrees of Calling Context Trees
    4.3.10 Usage of Passive Resources from Java Bytecode
    4.3.11 Bytecode Instruction Equivalence Classes
  4.4 Using Transparent Application Instrumentation for Bytecode Counting
    4.4.1 Requirements for the Instrumentation Process
    4.4.2 Evaluating and Storing Counting Results
    4.4.3 Analysis of Bytecode Invariants and Basic Blocks
    4.4.4 Inserting Bytecode Infrastructure for Runtime Counting
    4.4.5 Quantifying the Impact of the Instrumentation
    4.4.6 Recording Calling Context Details
    4.4.7 Reporting and Aggregating Counting Results
  4.5 Assumptions and Limitations
  4.6 Summary
5 Benchmarking the JVM Operations for Performance Prediction
  5.1 Challenges of Translating Resource Demands into Timing Values
  5.2 Bytecode Instruction Benchmarking
    5.2.1 Unsuitability of Source Code for Bytecode Instruction Benchmarking
    5.2.2 Unsuitability of Kernel Collections for Bytecode Instruction Benchmarking
    5.2.3 Attempting to Measure Bytecode Instructions using Bytecode Engineering
    5.2.4 Attempting to Create Bytecode Benchmarks Randomly
    5.2.5 Preconditions and Postconditions of Bytecode Instructions
    5.2.6 Bytecode Benchmarking Scenarios
    5.2.7 Overview of Scenario-driven Automated Bytecode Benchmarking
  5.3 Method and API benchmarking
    5.3.1 Scientific Challenges
    5.3.2 Foundations
    5.3.3 Overview of the APIBENCHJ Framework
    5.3.4 Satisfying Preconditions using Heuristics
    5.3.5 Heuristic Exception Handler
    5.3.6 Generating and Executing Microbenchmarks

6 Performance Prediction and PCM Integration
  6.1 Computing the Predicted Execution Duration
    6.1.1 Selecting the Input for Prediction Calibration
    6.1.2 Computing the Calibration Factor
  6.2 Integration into the Palladio Component Model
    6.2.1 Existing Resource Demand Modelling in the PCM
    6.2.2 Bytecode-based Performance Prediction: Unsuitability of existing PCM Resource Modelling
    6.2.3 Scenarios and Requirements for Extending the PCM Metamodel
    6.2.4 Extensions of the PCM Metamodel
    6.2.5 Modelling the JVM and the Bytecode Components
    6.2.6 Representing JVM Instructions and Methods as Resource Services
    6.2.7 Expressing the Platform-specific Nature of JVM Benchmarking Results
    6.2.8 Modelling the Calibration Factor
  6.3 Summary

7 Validation
  7.1 Bytecode-based Performance Prediction
    7.1.1 Validation Overview
    7.1.2 Subjects and Scenarios for the Validation
    7.1.3 Performance Prediction: Goals, Questions and Metrics
    7.1.4 Performance Prediction: Results of Validation
    7.1.5 Resource Demand Quantification: Goals, Questions and Metrics for Validation
    7.1.6 Resource Demand Quantification: Validation Results
    7.1.7 Execution Platform Benchmarking: Goals, Questions and Metrics for Validation
    7.1.8 Execution Platform Benchmarking: Validation Results
    7.1.9 Summary and Discussion
  7.2 Timer Evaluation
    7.2.1 Stability and Monotonicity
    7.2.2 Units: Computing and Verifying
    7.2.3 Accuracy, Invocation Cost and Invocation Cost Spread
    7.2.4 Effect of Just-in-Time compilation on Timer Methods
    7.2.5 Epochs and Maximum Measurable Time Intervals
    7.2.6 Unified Timer Quality Metric
    7.2.7 Summary and Discussion

8 Related Work
  8.1 Timer Methods
  8.2 Runtime Counting of Executed Bytecode Instructions and Method Invocations
  8.3 JVM Benchmarking
  8.4 Performance Prediction
    8.4.1 Component-based Performance Prediction and Engineering
    8.4.2 Bytecode-based Performance Prediction
    8.4.3 Cross-platform Performance Prediction
  8.5 Resource and Execution Platform Modelling in Component Metamodels

9 Conclusion
  9.1 Summary
  9.2 Future Work
    9.2.1 Bytecode-based Resource Demand Quantification
    9.2.2 Benchmarking of the Java Virtual Machine
    9.2.3 Timer Methods and Performance Indicators
    9.2.4 Resource Modelling and Palladio Component Model

A Appendix
  A.1 Performance Equivalence Classes of Java Bytecode Instructions

B List of Figures

C List of Tables

D Listings

Bibliography
Chapter 1. Introduction
This chapter motivates the work pursued in this thesis, sets the context and the preconditions for the research that is performed, and states the problems that the thesis addresses. The shortcomings of existing approaches are presented to support the focus of the thesis, and to make the targeted field of research more precise. After formulating the resulting scientific challenges and goals, the contributions of the thesis are summarised and the validation of the developed approaches is sketched. Finally, the organisation of the thesis is explained.
1.1. Motivation
Software engineering is concerned with the efficient and systematic development and evolution of software applications, following customer requirements and existing best practices. In addition to functional requirements, which target the results of the application execution, non-functional requirements such as performance or reliability are of substantial importance to the software users. Non-functional requirements and software properties describe the quality of the software, and how effective the software is in performing its tasks.

Software performance has been a major concern and a field of intense research, with scientific publications on it appearing in 1969 [1, 2] and possibly even earlier. Yet as software and the underlying hardware have grown and become increasingly complex and concurrent, performance has remained a focal point for researchers and engineers. Performance problems and associated costs have received public attention [3, 4, 5, 6], and have led to significant expenses [7] to correct the underlying issues in the design and implementation of the concerned software products. To provide approaches for dealing with these challenges, performance engineering [8] has established itself as a subfield of software engineering.

However, when facing budgetary and time constraints in projects, practitioners often deal with performance only at the end of software development projects, which means that the "fix it later" approach is followed. But this delay is problematic, since performance flaws are often caused by the architecture and the overall design of an application, in addition to performance-unconscious implementation. Attempting to solve the problem by replacing the originally planned execution platform with one having higher performance causes additional costs, and is ineffective when the software does not scale, as exemplified in [6]. In such cases, the correction of performance issues requires architecture-level changes, which turn out to be very expensive since the completed implementation has to be corrected as well.

Consequently, design-time analysis and prediction of software performance is required to address potential performance issues as early as possible. As the implementation progresses, performance predictions can be compared to measurements, allowing timely corrective actions if need arises. To allow design-time prediction of software performance, several architecture-level approaches (e.g. [9, 10, 11, 12], see [5, 13] for an overview) have emerged and continue to flourish. However, design-time performance analysis is challenging, since no measurable implementation but only an architectural view exists at that time. Making performance analysis a part of design-time activities that happen anyway is particularly practical and promises effort savings through synergies. When an explicit software architecture is being modelled, its artefacts are static as well as dynamic models, which serve as a blueprint during later development. Enriching these models with performance information is especially attractive when the model can be executed (e.g. by simulation), since the model execution can then provide a performance prediction.

Rather than developing applications as large, monolithic blocks, decomposition into smaller entities has established itself as a maintainability "best practice". The prevalent kind of entities in architectural models are software components [14] and their connectors. Software components encapsulate design decisions and interact with other components over interfaces, while exposing their functionality as services; a minimal sketch of this idiom follows below. Examples of well-known and popular implementations of the software components paradigm are Enterprise Java Beans [15] and the Component Object Model (COM [16]). At the same time, many advanced software component metamodels (i.e. formal descriptions of components, their roles and properties) have been developed in academia, as surveyed in [17].

Among existing component metamodels targeting business software applications, the Palladio Component Model (PCM [9]) has particularly extensive support for performance predictions. It explicitly parametrises the dynamic performance model of a component over the four performance-influencing factors which are shown in Figure 1.1. These factors are the usage profile [18], the component implementation, external components (addressed over required interfaces) and the execution platform.
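The following minimal Java sketch illustrates the component idiom referred to above: a component exposes its functionality through a provided interface and reaches an external component only through a required interface, which keeps it exchangeable at assembly time. All interface and class names are invented for illustration and are not taken from EJB, COM or the PCM.

    // Provided interface: the service the component offers to its clients.
    interface BillingService {
        double computeInvoiceTotal(int customerId);
    }

    // Required interface: functionality the component needs from an
    // external component, bound at assembly time.
    interface TaxRateProvider {
        double taxRateFor(int customerId);
    }

    // The component implementation depends only on the interfaces above,
    // so any conforming external component can be plugged in.
    class BillingComponent implements BillingService {
        private final TaxRateProvider taxRates;

        BillingComponent(TaxRateProvider taxRates) {
            this.taxRates = taxRates;
        }

        @Override
        public double computeInvoiceTotal(int customerId) {
            double net = 100.0; // placeholder for the component's own logic
            return net * (1.0 + taxRates.taxRateFor(customerId));
        }
    }

    class ComponentDemo {
        public static void main(String[] args) {
            // Assembly: satisfy the required interface with a concrete provider.
            BillingService billing = new BillingComponent(customerId -> 0.19);
            System.out.println(billing.computeInvoiceTotal(42)); // prints 119.0
        }
    }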