Enabling Richer Insight Into Runtime Executions of Systems Karthik Swaminathan Nagaraj Purdue University
Total Page:16
File Type:pdf, Size:1020Kb
Purdue University Purdue e-Pubs Open Access Dissertations Theses and Dissertations Fall 2013 Enabling Richer Insight Into Runtime Executions Of Systems Karthik Swaminathan Nagaraj Purdue University Follow this and additional works at: https://docs.lib.purdue.edu/open_access_dissertations Part of the Computer Sciences Commons Recommended Citation Nagaraj, Karthik Swaminathan, "Enabling Richer Insight Into Runtime Executions Of Systems" (2013). Open Access Dissertations. 96. https://docs.lib.purdue.edu/open_access_dissertations/96 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. Graduate School ETD Form 9 (Revised 12/07) PURDUE UNIVERSITY GRADUATE SCHOOL Thesis/Dissertation Acceptance This is to certify that the thesis/dissertation prepared KARTHIK SWAMINATHAN NAGARAJ By ! Entitled! ENABLING RICHER INSIGHT INTO RUNTIME EXECUTIONS OF SYSTEMS Doctor of Philosophy For the degree of Is approved by the final examining committee: CHARLES E. KILLIAN PATRICK T. EUGSTER Chair JENNIFER L. NEVILLE DONGYAN XU RAMANA R. KOMPELLA To the best of my knowledge and as understood by the student in the Research Integrity and Copyright Disclaimer (Graduate School Form 20), this thesis/dissertation adheres to the provisions of Purdue University’s “Policy on Integrity in Research” and the use of copyrighted material. CHARLES E. KILLIAN Approved by Major Professor(s): ____________________________________ ____________________________________JENNIFER L. NEVILLE SUNIL PRABHAKAR 10/10/2013 Approved by: Head of the Graduate Program Date ENABLING RICHER INSIGHT INTO RUNTIME EXECUTIONS OF SYSTEMS A Dissertation Submitted to the Faculty of Purdue University by Karthik Swaminathan Nagaraj In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy December 2013 Purdue University West Lafayette, Indiana ii To my wife Ashwathi, my parents Nagaraj & Rajeswari, and sister Narmadha. iii ACKNOWLEDGMENTS There have been many people who have immensely supported me over the last five years, and it would be erroneous on my part to not mention them. To begin with, I thank my wife, Ashwathi, for all her love, encouragement, support and belief in me through this long journey. My parents and sister constantly motivated and pushed me toward scholastic excellence, for which I’ll always be grateful. My advisors Chip Killian and Jennifer Neville provided abundant encouragement and guidance, along with spirited discussions and hands-on experience, while persuad- ing me to attend many conferences. Their recommendations never failed to bring a different perspective to my work. Working with them came with abundant freedom, making for a wonderful PhD experience with little stress. Their unceasing motivation kept my enthusiasm high to bring this dissertation to a successful completion. I am also thankful to Ramana Kompella, Dongyan Xu and Patrick Eugster for their insights, suggestions and support. It would be remiss of me to not thank my collaborators at Microsoft Research – John (JD) Douceur and Bryan Parno who exposed me to a remarkably different style of research and a great work atmosphere, during my summer internship. I had the good fortune to work with great individuals in my lab on variousprojects, and I appreciate the assistance from KC Sivaramakrishnan, Lukasz Ziarek, Hyojeong Lee, Sunghwan Yoo, Hitesh Khandelwal, Jayaram KR and Bo Sang. TheCSsoftware and hardware facilities have been particularly patient and helpful in managing our server machines, allowing me to focus on my research. iv TABLE OF CONTENTS Page LIST OF TABLES ................................ vi LIST OF FIGURES ............................... vii ABBREVIATIONS ................................ viii ABSTRACT ................................... ix 1INTRODUCTION.............................. 1 1.1 Problems with Managing and Maintaining Systems Software .... 2 1.1.1 Development Problems ..................... 4 1.1.2 Maintenance Problems ..................... 4 1.2 Instrumentation: Record Runtime State ............... 5 1.3 Facilitating Large Scale Analysis Using Statistics and Machine Learning 6 1.4 Thesis Statement ............................ 7 1.5 Contributions .............................. 7 1.5.1 Distalyzer: Diagnosing Performance Problems in Systems Soft- ware ............................... 8 1.5.2 PerfDetect: Tracking Performance Changes of Systems in Code Repositories ........................... 9 1.6 Road Map ................................ 9 2DISTALYZER:DIAGNOSINGPERFORMANCEPROBLEMSINSYS- TEMS SOFTWARE ............................. 10 2.1 Instrumentation ............................. 12 2.2 Design .................................. 14 2.2.1 Feature Creation ........................ 16 2.2.2 Predictive Modeling ...................... 20 2.2.3 Descriptive Modeling ...................... 22 2.2.4 Attention Focusing ....................... 25 2.3 Implementation ............................. 27 2.3.1 Processing Text Log Messages ................. 27 2.3.2 Distalyzer ........................... 28 2.4 Case Studies ............................... 30 2.4.1 TritonSort ............................ 30 2.4.2 HBase .............................. 33 2.4.3 Transmission .......................... 36 2.5 Related Work .............................. 41 v Page 2.6 Practical Implications .......................... 43 2.7 Summary ................................ 44 3TRACKINGPERFORMANCECHANGESOFSYSTEMSINCODEREPOS- ITORIES ................................... 46 3.1 Designing PerfDetect ........................ 49 3.1.1 Performance Metrics Features ................. 52 3.1.2 Windows of Large Software Changes ............. 54 3.1.3 Change Detection by Trend Estimation ............ 55 3.1.4 Queue Extra Experiments ................... 56 3.2 Measurement & Implementation .................... 57 3.2.1 Transmission .......................... 59 3.2.2 V8 ................................ 60 3.2.3 Hadoop ............................. 61 3.3 Experiences ............................... 62 3.3.1 Quantitative Evaluation .................... 62 3.3.2 Practical Experiences ...................... 66 3.3.3 Transmission .......................... 66 3.3.4 V8 ................................ 71 3.3.5 Hadoop ............................. 74 3.4 Related Work .............................. 75 3.5 Summary ................................ 77 4SUMMARY.................................. 78 4.1 Contributions .............................. 78 4.2 Future Work ............................... 81 LIST OF REFERENCES ............................ 84 VITA ....................................... 91 vi LIST OF TABLES Table Page 2.1 Event and State feature types extracted from the logs .......... 17 2.2 Summary of performance issues diagnosed using Distalyzer ..... 29 2.3 Distalyzer diagnosis: Reduction in the number of features for each di- agnosed problem .............................. 30 3.1 Summary of software and benchmarks used in our measurement and eval- uation .................................... 58 3.2 Descriptions of the important performance changes observed in Transmis- sion ..................................... 68 3.3 Descriptions of the important performance changes observed in Chrome V8 and Hadoop ............................... 72 vii LIST OF FIGURES Figure Page 1.1 Performance management of systems software through automated detec- tion and diagnosis of degradations ..................... 8 1.2 Framework for automated analysis of runtime instrumentation to aid in system management ............................ 9 2.1 Four-step log comparison process in Distalyzer leading up to a visual interface ................................... 15 2.2 Example Dependency Network (DN). ................... 24 2.3 Algorithm for feature scoring of dependency networks, in attention focusing 26 2.4 Distalyzer logging API ......................... 28 2.5 TritonSort dependency graphs indicating the root cause of the slow runtime 31 2.6 DN for unmodified HBase events ..................... 34 2.7 DN for HBase after fixing lookups ..................... 34 2.8 Dependency graphs for unmodified Transmission ............ 38 2.9 Dependency graphs for BitTorrent after fixing the NAT problem .... 40 3.1 Performance measurement study of Transmission and V8 software reposi- tories ..................................... 50 3.2 Design of PerfDetect for detecting deviations of performance metrics from their expected behavior ........................ 51 3.3 Comparisons of prediction accuracy of PerfDetect to other methods 64 3.4 Box-plot of Transmission file download performance indicating three per- formance issues ............................... 67 3.5 Three performance problems identified in the V8 JavaScript engineacross two benchmarks ............................... 73 3.6 Performance improvement on the Hadoop TeraSort benchmark ..... 75 viii ABBREVIATIONS API Application Programming Interface CDF Cumulative Distribution Function CPD Conditional Probability Distribution DN Dependency Network IP Internet Protocol SLA Service Level Agreement TCP Transport Control Protocol UDP User Datagram Protocol ix ABSTRACT Nagaraj, Karthik Swaminathan Ph.D., Purdue University, December 2013. Enabling Richer Insight into Runtime Executions of Systems. Major Professors: Charles E. Killian and Jennifer L. Neville. Systems software of very large scales are being heavily used today in various im- portant scenarios such as online retail, banking, content services,websearchand