Performance Analysis and Optimization of the Java Memory System Carl Stephen Lebsack Iowa State University
Total Page:16
File Type:pdf, Size:1020Kb
Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations 2008 Performance analysis and optimization of the Java memory system Carl Stephen Lebsack Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Computer Sciences Commons Recommended Citation Lebsack, Carl Stephen, "Performance analysis and optimization of the Java memory system" (2008). Retrospective Theses and Dissertations. 15792. https://lib.dr.iastate.edu/rtd/15792 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Performance analysis and optimization of the Java memory system by Carl Stephen Lebsack A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Computer Engineering Program of Study Committee: J. Morris Chang, Major Professor Arun K. Somani Akhilesh Tyagi Zhao Zhang Donna S. Kienzler Iowa State University Ames, Iowa 2008 Copyright c Carl Stephen Lebsack, 2008. All rights reserved. 3307127 3307127 2008 ii DEDICATION I would first like to dedicate this thesis to God by whom all things were made and in whom they are held together. Only by His grace and mercy through the gift of His Son, Jesus Christ, was I afforded the opportunity to live and work and play. I would also like to dedicate this thesis to my parents, Stephen and Debra Lebsack, whose support was pivotal in the completion of this work. I want to thank my siblings, Mary and her husband, Aaron Hudlemeyer, David and Laurel, and the rest of my family for their unwavering belief in my ability to complete PhD study and their continual love and support during my unbelief. I also thank Mary for her help in proofing and editing the thesis text. I, am, terrible, with, commas. Laurel deserves an extra thanks for enduring the front lines of battle while living with me during her own transition into college. I would also like to thank my fiancee, Tegwin Taylor, who, although she came along near the end of my study, was extremely uplifting and provided significant motivation to finish at long last. Thanks also goes out to the rest of my friends and family for their continued encouragement and prayers. I could not have done this alone. iii TABLE OF CONTENTS LIST OF TABLES ................................... vii LIST OF FIGURES .................................. viii CHAPTER 1 Introduction ............................. 1 CHAPTER 2 Constructing a Memory Hierarchy Simulation Environment 7 2.1 Abstract ........................................ 7 2.2 Introduction ...................................... 7 2.3 Environment Construction .............................. 9 2.3.1 Components ................................. 9 2.3.2 Integration .................................. 11 2.3.3 Cluster Deployment ............................. 13 2.4 Experimental Measurement ............................. 15 2.4.1 Object Accesses ............................... 15 2.4.2 Scratchpad Modeling ............................. 17 2.5 Related Work ..................................... 18 2.5.1 bochs ..................................... 19 2.5.2 Dynamic Simple Scalar ........................... 19 2.5.3 Simics ..................................... 19 2.5.4 Vertical Profiling ............................... 20 2.6 Conclusion ...................................... 20 CHAPTER 3 Access Density: A Locality Metric for Objects ........ 21 3.1 Abstract ........................................ 21 iv 3.2 Introduction ...................................... 21 3.3 Theory ......................................... 24 3.4 Experimental Setup ................................. 28 3.4.1 Calculating Access Density ......................... 28 3.4.2 Measuring Memory Traffic .......................... 30 3.5 Results ......................................... 31 3.5.1 Access Density Calculations ......................... 31 3.5.2 Access Density Analysis ........................... 38 3.5.3 Memory Traffic Measurements ....................... 39 3.6 Related Work ..................................... 42 3.7 Conclusion ...................................... 45 CHAPTER 4 Using Scratchpad to Exploit Object Locality in Java ..... 46 4.1 Abstract ........................................ 46 4.2 Introduction ...................................... 46 4.3 Experimental Setup ................................. 49 4.3.1 Experiments ................................. 50 4.4 Results ......................................... 51 4.4.1 Generational Garbage Collection Results .................. 51 4.4.2 Memory Traffic Results ........................... 55 4.5 Related Work ..................................... 61 4.6 Conclusions ...................................... 62 CHAPTER 5 Cache Prefetching and Replacement for Java Allocation .. 63 5.1 Abstract ........................................ 63 5.2 Introduction ...................................... 64 5.3 Cache Management Techniques ........................... 67 5.3.1 Hardware Prefetching ............................ 67 5.3.2 Software Prefetching ............................. 68 5.3.3 Cache Replacement Policy .......................... 69 v 5.4 Experimental Framework .............................. 73 5.5 Experimental Evaluation ............................... 76 5.5.1 Hardware Prefetching for Java Applications ................ 76 5.5.2 Software Prefetching on Allocation for Java Applications ........ 77 5.5.3 Biased Cache Replacement for Java Applications ............. 79 5.5.4 Memory Read Traffic ............................. 81 5.5.5 Prefetch Accuracy .............................. 83 5.5.6 Prefetch Coverage .............................. 83 5.5.7 Bandwidth Aware Performance Model ................... 85 5.6 Related Work ..................................... 89 5.7 Conclusions ...................................... 91 CHAPTER 6 Real Machine Performance Evaluation of Java and the Mem- ory System ..................................... 93 6.1 Abstract ........................................ 93 6.2 Introduction ...................................... 94 6.3 Background ...................................... 96 6.4 Experimental Framework and Methodology .................... 98 6.4.1 Page Coloring Details ............................ 99 6.4.2 Paging and Swap Space ........................... 100 6.4.3 Java Environment .............................. 100 6.4.4 Measurement Methodology ......................... 101 6.4.5 Metrics .................................... 102 6.5 Experimental Results ................................. 103 6.5.1 Java minimum heap values ......................... 103 6.5.2 Java user and system time .......................... 104 6.5.3 Java runtime sensitivity heap size ...................... 105 6.5.4 Java runtime sensitivity to nursery size .................. 106 6.5.5 Java runtime sensitivity to cache performance ............... 110 vi 6.5.6 Java runtime sensitivity to L2 cache size .................. 114 6.6 Related Work ..................................... 118 6.7 Conclusions ...................................... 119 Appendix: Java runtime sensitivity to nursery L2 cache partitioning ......... 120 CHAPTER 7 Conclusion .............................. 128 BIBLIOGRAPHY ................................... 130 ACKNOWLEDGEMENTS .............................. 137 vii LIST OF TABLES Table 3.1 Object parameters and access density ................... 27 Table 4.1 Reduced memory traffic using scratchpad (Bytes) ............ 59 Table 5.1 Memory parameters for select processors ................. 75 Table 6.1 Minimum heap values on JDK 1.6 for DaCapo and SPECjvm98 bench- marks .................................... 104 Table 6.2 Breakdown of user and system time for DaCapo and SPECjvm98 bench- marks .................................... 105 Table 6.3 Runtime increase at 2X heap over 5.5X heap with a half-heap nursery for DaCapo and SPECjvm98 benchmarks ................ 106 Table 6.4 Runtime increase at 2X heap with optimal nursery over 5.5X heap with a half-heap nursery for DaCapo and SPECjvm98 benchmarks ..... 109 Table 6.5 Linear correlation between cache misses and runtime for DaCapo and SPECjvm98 ................................. 112 Table 6.6 Predicted runtime with no cache misses, best achievable runtime and ratio for DaCapo and SPECjvm98 .................... 113 Table 6.7 Cache size sensitivity for DaCapo and SPECjvm98 ........... 117 Table 6.8 Best and worst partitioning runtime ratios for DaCapo and SPECjvm98 126 viii LIST OF FIGURES Figure 1.1 Java System ................................. 2 Figure 1.2 Full Java System .............................. 2 Figure 2.1 Simulation environment components ................... 9 Figure 2.2 Mirrored Java heap in simulator ...................... 16 Figure 2.3 Memory hierarchy with cache and scratchpad .............. 18 Figure 3.1 Object arrangements in scratchpad .................... 26 Figure 3.2 Access density for javac ........................... 32 Figure 3.3 Average and weighted access density vs. object size and vs. object lifetime for javac .............................. 32 Figure 3.4 Follow size trend - Average and weighted access density vs. object size - Small objects have best locality ....................