Towards Enabling Better Understanding and Performance for Managed Languages

UNIVERSITY OF CALIFORNIA
Santa Barbara

Towards Enabling Better Understanding and Performance for Managed Languages

UCSB Tech Report #2012-16, July 2012

A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science

by

Nagy Mostafa

Committee in Charge:
Professor Chandra Krintz, Chair
Professor Tim Sherwood
Professor Tevfik Bultan

June 2012

The Dissertation of Nagy Mostafa is approved:

Professor Tim Sherwood
Professor Tevfik Bultan
Professor Chandra Krintz, Committee Chairperson

April 2012

Copyright © 2012 by Nagy Mostafa

To my parents, my wife, my sister, and all the brave men and women of Tahrir Square.

Acknowledgements

I am indebted to all the people who contributed in some way to the progress of this work. I am deeply grateful to Prof. Chandra Krintz for all the support, help, guidance and encouragement that she has provided during the entire process. Chandra’s positive attitude and confidence have always been a motivating factor throughout this work. I would also like to thank Prof. Tim Sherwood and Prof. Tevfik Bultan for serving on my Ph.D. committee and for all the insightful feedback and discussions.

I am deeply grateful to my parents, my wife, my sister and my friends for their continuous support and love. Special thanks to my friends Haytham and Hassan for their encouragement.

Finally, my deepest gratitude to the courageous men and women of Tahrir Square, Egypt, for their rise against injustice and tyranny. To all the martyrs and injured I dedicate this work. You have made us all proud.

Curriculum Vitæ

Nagy Mostafa

Education

2012 Doctor of Philosophy in Computer Science, University of California, Santa Barbara.
2011 Master of Science in Computer Science, University of California, Santa Barbara.
2006 Master of Science in Computer Science, Alexandria University, Egypt.
2003 Bachelor of Science in Computer Science, Alexandria University, Egypt.

Experience

2011 – Present Performance Engineer, Intel Corp.
2011 Summer Internship, Intel Corp.
2007 Summer Internship, Citrix Online.
2007 – 2011 Research Assistant, University of California, Santa Barbara.
2003 – 2006 Teaching Assistant, Alexandria University, Egypt.

Publications

N. Mostafa, C. Krintz, C. Cascaval, D. Edelsohn, P. Nagpurkar, P. Wu. Understanding the Potential of Interpreter-based Optimizations for Python. UCSB Technical Report #2010-14, 2010.

N. Mostafa and C. Krintz. Tracking Performance Across Software Revisions. Proceedings of the 7th International Conference on Principles and Practice of Programming in Java (PPPJ), pp. 162-171, 2009.

N. Chohan, C. Bunch, S. Pang, C. Krintz, N. Mostafa, S. Soman, and R. Wolski. AppScale: Scalable and Open AppEngine Application Development and Deployment. International Conference on Cloud Computing (CloudComp’09), 2009.

N. Mostafa. An FPGA-based Chip-Multiprocessor Model. Unpublished Master’s Thesis, Alexandria University, Egypt, 2006.

Abstract

Nagy Mostafa

Computer systems today are ubiquitous and come in a variety of forms. On the low end, there are widely used, resource-constrained, battery-powered handheld devices such as smart phones, tablets and netbooks. On the high end, there are powerful, highly parallel multi-core devices such as desktops, workstations and servers. However, the diversity of these devices and the heterogeneity of their platforms complicate software development. Developers must possess knowledge of a variety of architectures, platforms and languages in order to build, optimize, tune, and deploy software efficiently. Fortunately, there are means to facilitate software development across platforms.
Revision Control (RC) systems enable concurrent collaboration between many developers with different language and platform expertise. Additionally, automated software deployment is made easier via cross-platform software repositories that provide updates, fixes and patches. Finally, advances in managed programming languages (e.g. Java, C#, Python, Ruby, JavaScript) and their managed runtime environments (MREs) simplify portable software development by abstracting the details of the target platforms.

Despite their benefits, these remedies make it harder to understand software behavior and to extract performance. First, RC systems allow source code changes to be made in isolation with no regard to how different modifications interact to affect behavior and performance. Second, the ease of software deployment leads to different versions of the software being used by millions of users over diverse platforms, making it difficult to reason about how the application will be used “in the wild”. Third, MREs abstract the hardware and hinder performance understanding, and advanced MREs usually have high startup cost and footprint. They are also complex to build and maintain, particularly for Dynamic Scripting Languages (DSLs).

In this dissertation, we investigate the question of whether we can devise novel profile analysis and collection techniques to address the above drawbacks and enable better understanding and improved performance of managed languages. We answer this question by exploring novel solutions that exploit the use of modern collaboration technologies, open source managed runtime systems, and popular software distribution mechanisms. Our techniques include a performance-aware RC system for Java programs, interpreter-based optimizations, and a remote compilation framework for DSLs. We describe each of our techniques in detail and present empirical evidence of its efficacy and potential.
Contents

Acknowledgements
Curriculum Vitæ
Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Addressing the Challenges of Complex Software Development
  1.2 Dissertation Contributions

2 Performance-Aware Revision Control
  2.1 Introduction
  2.2 PARCS
    2.2.1 Performance Profiling and Representation
    2.2.2 Identifying Topological Differences
  2.3 PARCS Implementation
    2.3.1 CCT Collection
    2.3.2 Method-level Bytecode Comparison
    2.3.3 Incremental Topological Comparison
    2.3.4 Identifying Weight Differences
    2.3.5 Attributing Differences
  2.4 Usage Example: FindBugs
  2.5 Experimental Evaluation
    2.5.1 Bytecode Comparison
    2.5.2 Topological Difference
  2.6 Related Work
  2.7 Conclusion

3 Dynamic Scripting Languages
  3.1 Introduction
  3.2 Dynamic Features
    3.2.1 Dynamic Typing
    3.2.2 Dynamic Objects
    3.2.3 Meta-Programming
  3.3 Implementations
    3.3.1 Interpretation
    3.3.2 Just-in-time Compilation
    3.3.3 Ahead-of-time compilation
  3.4 Optimizations
    3.4.1 Specialization
    3.4.2 Caching
    3.4.3 Inlining

4 Understanding the Efficacy of Interpreter Dispatch Optimization on Modern Architectures for Dynamic Scripting Languages
  4.1 Introduction
  4.2 Background
    4.2.1 Interpreters
    4.2.2 Dispatch optimizations
    4.2.3 CPython VM
  4.3 Methodology
    4.3.1 Benchmarks
    4.3.2 Virtual Machine Implementations
    4.3.3 Profiling Tools
  4.4 Python VM Characteristics
    4.4.1 VM characteristics
  4.5 Efficacy of Dispatch Loop Optimization
    4.5.1 Impact of the Language Runtime and Benchmark Characteristics
    4.5.2 Analysis Across Architectures
  4.6 Related Work
  4.7 Conclusion

5 Potential of Interpreter-based Optimizations for Python
  5.1 Introduction
  5.2 Methodology
  5.3 Performance Analysis
  5.4 Optimizations
    5.4.1 Attributes Caching
    5.4.2 Load/Store Elimination
    5.4.3 Inlining
  5.5 Related Work
  5.6 Conclusions

6 The Remote Compilation Framework: A Sweetspot Between Interpretation and Dynamic Compilation
  6.1 Introduction
  6.2 Remote Compilation Framework
  6.3 RCF Profiling
    6.3.1 Calling Context Tree Profiles
    6.3.2 Multi-Input Remote Profiling
