Improving the Performance of Database-Centric
Total Page:16
File Type:pdf, Size:1020Kb
IMPROVING THE PERFORMANCE OF DATABASE-CENTRIC APPLICATIONS THROUGH PROGRAM ANALYSIS by TSE-HSUN CHEN A thesis submitted to the School of Computing in conformity with the requirements for the degree of Doctor of Philosophy Queen’s University Kingston, Ontario, Canada September 2016 Copyright © Tse-Hsun Chen, 2016 Abstract ODERN software applications are becoming more dependent on database management systems (DBMSs). DBMSs are usually used as black M boxes by software developers. For example, Object-Relational Map- ping (ORM) is one of the most popular database abstraction approaches that devel- opers use nowadays. Using ORM, objects in Object-Oriented languages are mapped to records in the database, and object manipulations are automatically translated to SQL queries. As a result of such conceptual abstraction, developers do not need deep knowledge of databases; however, all too often this abstraction leads to in- efficient and incorrect database access code. Thus, this thesis proposes a series of approaches to improve the performance of database-centric software applications that are implemented using ORM. Our approaches focus on troubleshooting and detecting inefficient (i.e., performance problems) database accesses in the source code, and we rank the detected problems based on their severity. We first con- duct an empirical study on the maintenance of ORM code in both open source and industrial applications. We find that ORM performance-related configurations are i rarely tuned in practice, and there is a need for tools that can help improve/tune the performance of ORM-based applications. Thus, we propose approaches along two dimensions to help developers improve the performance of ORM-based appli- cations: 1) helping developers write more performant ORM code; and 2) helping developers configure ORM configurations. To provide tooling support to developers, we first propose static analysis ap- proaches to detect performance anti-patterns in the source code. We automatically rank the detected anti-pattern instances according to their performance impacts. Our study finds that by resolving the detected anti-patterns, the application per- formance can be improved by 34% on average. We then discuss our experience and lessons learned when integrating our anti-pattern detection tool into industrial practice. We hope our experience can help improve the industrial adoption of future research tools. However, as static analysis approaches are prone to false positives and lack runtime information, we also propose dynamic analysis approaches to fur- ther help developers improve the performance of their database access code. We propose automated approaches to detect redundant data access anti-patterns in the database access code, and our study finds that resolving such redundant data access anti-patterns can improve application performance by an average of 17%. Finally, we propose an automated approach to tune performance-related ORM configura- tions using both static and dynamic analysis. Our study shows that our approach can help improve application throughput by 27–138%. Through our case studies on real-world applications, we show that all of our proposed approaches can provide valuable support to developers and help improve application performance significantly. ii Acknowledgments First of all, I would like to thank my supervisor Dr. Ahmed E. Hassan for his contin- uous guidance and support throughout my graduate study. You are the best mentor and supervisor that one can ask for. I am extremely lucky that I have the chance to work under your supervision. Thank you very much for encouraging me to pursue the research topics that I love. Your great guidance has helped me grow not only as a researcher, but also a thinker. Your advice has become my mottos and is priceless to my future career. A sincere appreciation to my supervisory and examination committee members, Dr. Patrick Martin, Dr. Juergen Dingel, Dr. Hossam Hassanein, and Dr. Mark Grechanik for their continued critique and guidance. Many thanks to my examiners for their valuable and insightful feedback on my work. I am very honored to have the chance to work and collaborate with the brightest researchers during my Ph.D. career. I would like to thank all of my labmates and collaborators, Dr. Weiyi Shang, Dr. Zhen Mining Jiang, Dr. Meiyappan Nagappan, Heng Li, Dr. Emad Shihab, Dr. Stephen Thomas, Dr. Hadi Hemmati, and Dr. iii Michael Godfrey for all the fruitful discussions and collaborations. I would like to thank BlackBerry and especially the members of the Performance Engineering team. I cloud not find my thesis topic without the industrial environ- ment and thoughtful feedback provided by the BlackBerry team. I would like to thank Mohamed Nasser and Parminder Flora for their continuous support through- out my Ph.D. career. A special thanks to my family, especially my father, mother, and brother. Thank you for all the sacrifices that you have made for me during my study. Your precious and invaluable love have been my greatest support in my life. At the end, I would like to express my greatest appreciation to my beloved wife Jinqiu Yang. Jinqiu, thank you very much for your understanding and continuous support. I dedicate this thesis to you. iv Dedication To my parents, my family, and my special lady, Jinqiu Yang. v Related Publications In all chapters and related publications of the thesis, my contributions are: draft- ing the initial research idea; researching background knowledge and related work; implementing the tools; conducting experiments; and writing and polishing the writing. My co-authors supported me in refining the initial ideas, pointing me to missing related work, providing feedback on earlier drafts, and polishing the writ- ing. Earlier versions of the work in the thesis were published as listed below: 1. Improving the Quality of Large-Scale Database-Centric Software Systems by An- alyzing Database Access Code (Chapter1) Tse-Hsun Chen, 31st International Conference on Data Engineering (ICDE), PhD Symposium, 2015. Seoul, Korea. Pages 245–249. 2. An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems (Chapter4) Tse-Hsun Chen, Weiyi Shang, Jinqiu Yang, Ahmed E. Hassan, Michael W. God- frey, Mohamed Nasser, and Parminder Flora, 13th International Conference on vi Mining Software Repositories (MSR), 2016. Austin, Texas. Pages 165–176. 3. Detecting Performance Anti-patterns for Applications Developed using Object- relational Mapping (Chapter5) Tse-Hsun Chen, Weiyi Shang, Zhen Ming Jiang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora, 36th International Conference on Software En- gineering (ICSE), 2014. Hyderabad, India. Pages 1001–1012. 4. Detecting Problems in Database Access Code of Large Scale Systems - An Indus- trial Experience Report (Chapter6) Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Par- minder Flora, 38th International Conference on Software Engineering, Soft- ware Engineering in Practice Track (ICSE-SEIP), 2016. Austin, Texas. Pages 71–80. 5. Finding and Evaluating the Performance Impact of Redundant Data Access for Applications that are Developed Using Object-Relational Mapping Frameworks (Chapter7) Tse-Hsun Chen, Weiyi Shang, Zhen Ming Jiang, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora, IEEE Transactions on Software Engineering (TSE), 2016. In Press. 6. CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate- based Database-centric Web Applications (Chapter8) Tse-Hsun Chen, Weiyi Shang, Ahmed E. Hassan, Mohamed Nasser, and Par- minder Flora, 24th ACM SIGSOFT International Symposium on the Founda- tions of Software Engineering (FSE), 2016. Seattle, WA. Accepted. vii Table of Contents Abstracti Acknowledgments iii Dedicationv Related Publications vi List of Tables xi List of Figures xiii Chapter 1: Introduction............................ 1 1.1 Thesis Statement............................. 2 1.2 Thesis Overview.............................. 3 1.3 Thesis Contributions ........................... 8 Chapter 2: Literature Review......................... 9 2.1 Paper Selection Process.......................... 11 2.2 Program Analysis............................. 11 2.3 Code Transformation........................... 15 2.4 Domain specific languages and APIs................... 17 2.5 Chapter Summary............................. 17 Chapter 3: Background about Object-Relational Mapping . 18 3.1 Background on ORM Frameworks.................... 19 3.2 Translating Objects to SQL Queries ................... 19 3.3 Caching .................................. 22 3.4 Chapter Summary............................. 25 Chapter 4: Maintenance Activities of ORM Code.............. 26 4.1 Introduction................................ 27 viii 4.2 The Main Contributions of this Chapter................. 29 4.3 Related Work ............................... 30 4.4 Preliminary Study............................. 32 4.5 Case Study Results ............................ 34 4.6 Highlights and Implications of our findings............... 52 4.7 Threats to Validity ............................ 54 4.8 Chapter Summary............................. 56 Chapter 5: Statically Detecting ORM Performance Anti-patterns . 58 5.1 Introduction................................ 59 5.2 The Main Contributions of this Chapter................. 60 5.3 Motivating Examples........................... 62 5.4 Our Framework.............................. 64 5.5 Case Study................................. 72 5.6 Discussion................................