Performance Regression Detection in DevOps

Jinfu Chen

A Thesis

in

The Department

of

Computer Science and Engineering

Presented in Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy (Software Engineering) at

Concordia University

Montréal, Québec, Canada

November 2020

© Jinfu Chen, 2020

CONCORDIA UNIVERSITY

School of Graduate Studies

This is to certify that the thesis prepared

By: Jinfu Chen

Entitled: Performance Regression Detection in DevOps

and submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Software Engineering) complies with the regulations of this University and meets the accepted standards with respect to originality and quality.

Signed by the Final Examining Committee:

Chair Dr. Youmin Zhang

External Examiner Dr. Andy Zaidman

Examiner Dr. Nikolaos Tsantalis

Examiner Dr. Jinqiu Yang

Examiner Dr. Yan Liu

Supervisor Dr. Weiyi Shang

Approved by Dr. Leila Kosseim, Graduate Program Director

November 9, 2020

Dr. Mourad Debbabi, Dean

Gina Cody School of Engineering and Computer Science

Abstract

Performance Regression Detection in DevOps

Jinfu Chen, Ph.D.

Concordia University, 2020

Performance is an important aspect of software quality. The goals of performance are typically defined by setting upper and lower bounds for the response time and throughput of a system and for physical-level measurements such as CPU, memory, and I/O. To meet such performance goals, several performance-related activities are needed in development (Dev) and operations (Ops). Large software system failures are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression. Although not all performance regressions are bugs, they often have a direct impact on users' experience of the system. The process of detecting performance regressions in development and operations faces two challenges. First, the detection of performance regressions is conducted after the fact, i.e., after the system is built and deployed in the field or in dedicated performance testing environments. Large amounts of resources are required to detect, locate, understand, and fix performance regressions at such a late stage in the development cycle. Second, even if we can detect a performance regression, it is extremely hard to fix because other changes are applied to the system after the introduction of the regression.

These challenges call for further in-depth analyses of performance regressions. In this thesis, to prevent performance regressions from slipping into operations, we first perform an exploratory study on the source code changes that introduce performance regressions, in order to understand the root causes of performance regressions at the source code level. Second, we propose an approach that automatically predicts whether a test would manifest performance regressions in a code commit.

Most performance issues are related to configurations. Therefore, third, we propose an approach that predicts whether a configuration option manifests a performance variation issue. Finally, to assist practitioners in analyzing system performance with operational data, we propose an approach to recovering field-representative workloads that can be used to detect performance regressions.

Related Publications

The following publications are related to the materials presented in this thesis:

• Jinfu Chen and Weiyi Shang. An Exploratory Study of Performance Regression Introducing Code Changes. In 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017, Shanghai, China, September 17-22, 2017. 341-352.

• Jinfu Chen, Weiyi Shang, Ahmed E. Hassan, Yong Wang, and Jiangbin Lin. An Experience Report of Generating Load Tests Using Log-Recovered Workloads at Varying Granularities of User Behaviour. In 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019. IEEE, 669-681.

• Jinfu Chen. Performance Regression Detection in DevOps. In the 42nd International Conference on Software Engineering Doctoral Symposium (ICSE DS). 2020.

• Jinfu Chen, Weiyi Shang, and Emad Shihab. PerfJIT: Test-level Just-in-time Prediction for Performance Regression Introducing Commits. IEEE Transactions on Software Engineering, 2020.

• Jinfu Chen, Mohammed Sayagh, Heng Li, Weiyi Shang, and Bram Adams. An Empirical Study on the Inconsistent Options Performance Variation. Empirical Software Engineering, 2020. [Under Review]

The following publications are not directly related to the material presented in this thesis but were conducted as parallel work to the research presented in this thesis.

• Yi Zeng, Jinfu Chen, Weiyi Shang, and Tse-Hsun Chen. Studying the characteristics of logging practices in mobile apps: a case study on F-Droid. Empirical Software Engineering, pp. 1-41, 2019.

• Zishuo Ding, Jinfu Chen, and Weiyi Shang. Towards the Use of the Readily Available Tests from the Release Pipeline as Performance Tests. Are We There Yet? In the 42nd International Conference on Software Engineering (ICSE). 2020. ACM SIGSOFT Distinguished Paper Award.

• Lizhi Liao, Jinfu Chen, Heng Li, Yi Zeng, Weiyi Shang, Jianmei Guo, Catalin Sporea, Andrei Toma, and Sarah Sajedi. Using black-box performance models to detect performance regressions under varying workloads: an empirical study. Empirical Software Engineering, 1-31, 2020.

• Lizhi Liao, Jinfu Chen, Heng Li, Yi Zeng, Weiyi Shang, Catalin Sporea, Andrei Toma, and Sarah Sajedi. Locating Performance Regression Root Causes in the Field for Web-based Systems. IEEE Transactions on Software Engineering, 2020. [Under Review]

Statement of Originality

I, Jinfu Chen, hereby declare that I am the sole author of this thesis. All ideas and inventions attributed to others have been properly referenced. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Dedications

To my family.

Acknowledgments

First, I would like to express my greatest gratitude to my supervisor Dr. Weiyi (Ian) Shang for all his guidance and unconditional support. This thesis would have been impossible to complete without your aid and support. Ian, thank you for giving me the opportunity to tackle this challenge and for believing in me. You were there every time I hesitated to take the next step, and you guided me to success. I have learned a great deal from you, both as a researcher and beyond.

I would also like to show my sincere gratitude to my examination committee members, Dr. Nikolaos Tsantalis, Dr. Emad Shihab, Dr. Jinqiu Yang, Dr. Yan Liu, and Dr. Andy Zaidman, for taking their precious time to consider my work and offer insightful comments, support, and guidance. Many thanks to my examiners for their valuable feedback.

I extend my gratitude to the professors and collaborators with whom I worked the closest for my degree and research: Dr. Tse-Hsun (Peter) Chen, Dr. Nikolaos Tsantalis, Dr. Emad Shihab, Dr. Jinqiu Yang, Dr. Heng Li, Dr. Ahmed E. Hassan, Dr. Mohammed Sayagh, Dr. Bram Adams, and Dr. Jianmei Guo. It was a delight to follow your teachings and guidance.

An immense thank you to all the members of the Software Engineering and System Engineering (SENSE) lab: Zishuo Ding, Kundi Yao, Guilherme Bicalho de Padua, Maxime Lamothe, Mehran Hassani, Muhammad Moiz Arif, Armin Najafi, Zhenhao Li, Joy Zeng, Hetong Dai, Sophia Quach, Lizhi Liao, Haonan Zhang, Roger Xia, and Nian Liu. I would also like to thank the members of our peer labs DAS and SPEAR: Dr. Rabe Abdalkareem, Dr. Diego Elias Costa, Suhaib Mujahid, Giancarlo Sierra Monge, Sultan Wehaibi, Xiaowei Chen, Ahmad Abdellatif, Olivier Noury, Zi Peng, Steven Locke, and An Ran Chen. I am glad to not only call you my colleagues but my friends.

While pursuing this Ph.D. degree, I received the constant support of my closest friends outside academia. Thank you for always being there. Last but not least, I would like to thank my family for their unconditional support. A special thank you to my wife, Yuanhui. Despite the distance that sets us apart, I never felt alone on this journey with you by my side. You are the light that keeps me going.

Contents

List of Figures xvi

List of Tables xviii

1 Introduction 1

1.1 Introduction ...... 1

1.2 Research hypothesis ...... 3

1.3 Thesis Overview ...... 4

1.3.1 Chapter 2: Background and Literature Review ...... 4

1.3.2 Chapter 3: What are the prevalence and root-causes of performance regressions introducing code changes? ...... 4

1.3.3 Chapter 4: Can we predict tests that manifest performance regressions at

the commit level? ...... 5

1.3.4 Chapter 5: Can we predict whether a configuration option manifests performance variation? ...... 5

1.3.5 Chapter 6: Can we generate load tests using log-recovered workloads at

varying granularities of user behavior? ...... 6

1.4 Thesis Contributions ...... 6

1.5 Thesis organization ...... 7

2 Background and Literature Review 8

2.1 Background ...... 8

2.1.1 DevOps ...... 8

2.1.2 Software performance ...... 9

2.1.3 Performance regression ...... 9

2.1.4 Performance testing ...... 9

2.2 Literature review ...... 10

2.2.1 Paper selection ...... 10

2.2.2 Empirical studies on software performance ...... 11

2.2.3 Performance regression detection ...... 13

2.2.4 Analyzing performance testing results to detect performance regression . . 15

2.2.5 Performance regression prediction ...... 16

2.2.6 Performance of configuration ...... 18

2.2.7 Summary ...... 20

3 What are the prevalence and root-causes of performance regressions introducing code

changes? 21

3.1 Introduction ...... 22

3.2 Case Study Setup ...... 24

3.2.1 Subject systems ...... 24

3.2.2 Identifying performance regression introducing changes ...... 25

3.3 Case Study Result ...... 30

3.4 Threats to Validity ...... 41

3.4.1 External Validity ...... 41

3.4.2 Internal Validity ...... 41

3.4.3 Construct Validity ...... 42

3.5 Related Work ...... 43

3.5.1 Performance regression detection ...... 44

3.5.2 Empirical studies on performance ...... 45

3.6 Conclusion ...... 47

4 Can we predict tests that manifest performance regressions at the commit level? 49

4.1 Introduction ...... 50

4.2 Related Work ...... 53

4.2.1 Software defect prediction ...... 53

4.2.2 Empirical studies on software performance ...... 55

4.2.3 Test case prioritization ...... 56

4.3 Approach ...... 57

4.3.1 Extracting metrics ...... 57

4.3.2 Data preprocessing ...... 60

4.3.3 Building classifiers and predicting performance-regression-prone tests . . . 61

4.3.4 Exercising tests and updating classifiers ...... 62

4.4 Evaluation Setup ...... 62

4.4.1 Subject systems ...... 62

4.4.2 Extracting performance-regressions-tests in each commit ...... 62

4.4.3 Preliminary study ...... 66

4.5 Evaluation Results ...... 68

4.6 Discussion ...... 81

4.6.1 Traditional metrics are more important than performance-related metrics . 81

4.6.2 Our approach out-performs Perphecy ...... 82

4.6.3 Limitation of our approach and future work ...... 82

4.6.4 Generalizability of our study ...... 83

4.7 Threats to Validity ...... 84

4.8 Conclusion ...... 85

5 Can we predict whether a configuration option manifests performance variation? 87

5.1 Introduction ...... 88

5.2 Background ...... 91

5.3 Data Collection ...... 92

5.3.1 Subject Systems ...... 92

5.3.2 Data Gathering ...... 93

5.4 Preliminary Study: Quantifying the Prevalence of IoPV and the Challenge of Identifying IoPV ...... 98

5.5 Predicting IoPV Problems ...... 105

5.6 Threats to Validity ...... 116

5.7 Related Work ...... 118

5.7.1 Software Configuration ...... 118

5.7.2 Software Performance ...... 120

5.8 Conclusion ...... 122

6 Can we generate load tests using log-recovered workloads at varying granularities of

user behavior? 123

6.1 Introduction ...... 124

6.2 Background and related work ...... 127

6.2.1 Recovering workload ...... 127

6.2.2 Software logs analysis ...... 128

6.3 Our approaches to recovering a workload for load testing ...... 129

6.3.1 Extracting user actions ...... 129

6.3.2 Enriching user actions with context ...... 131

6.3.3 Identifying frequent action sequences ...... 131

6.3.4 Grouping similar frequent action sequences ...... 134

6.3.5 Grouping users into clusters ...... 135

6.3.6 Generating load tests ...... 136

6.4 Case study setup ...... 137

6.4.1 Subject systems ...... 138

6.4.2 Data collection ...... 138

6.4.3 Preliminary analysis: clustering tendency ...... 140

6.5 Case study results ...... 140

6.6 Discussion ...... 146

6.6.1 Detecting unseen workload ...... 146

6.6.2 Sensitivity analysis ...... 149

6.7 Challenges and lessons learned from the industrial evaluation of our approaches. . . 150

6.7.1 Domain knowledge is crucial for the successful transfer of research to practice. . . 150

6.7.2 Team support is crucial for the successful transfer of research to practice. . 151

6.7.3 Coping with the large scale industrial data ...... 153

6.8 Threats to validity ...... 153

6.9 Conclusions ...... 154

7 Summary, Contributions, and Future Work 156

7.1 Summary ...... 156

7.2 Thesis contributions ...... 157

7.3 Future Work ...... 158

7.3.1 Performance data repository ...... 158

7.3.2 More empirical studies on the impact of software activities on performance 158

7.3.3 Domain-specific language for performance data analysis ...... 158

7.3.4 Reduce the length of performance testing ...... 159

7.3.5 The usage of unit test to detect performance regression in the load test . . . 159

Bibliography 163

List of Figures

Figure 2.1 An example of performance regression testing ...... 10

Figure 3.1 An overview of our approach that identifies performance regression introducing changes...... 26

Figure 3.2 Performance regression and improvement measured using response time for

each commit from Hadoop and RxJava. The commits are ordered chronologically

from left to right...... 33

Figure 3.3 Number of performance regression introducing commits associated with

different issue types...... 37

Figure 4.1 An overview of our approach...... 63

Figure 4.2 Distribution of the relative difference between performance metrics by running tests only once in each commit...... 67

Figure 4.3 Cumulative distribution function of the optimal model, CAW and NCAW

models in the performance counter of Response time in Hadoop...... 73

Figure 5.1 The definition of IoPV and how different is it from the traditional way of comparing the performance of two values for the same configuration option: (a) Approaches that do not consider the historical evaluation, (b) An option with an inconsistent performance variation (a-b), (c) An option with a consistent performance variation (a-b), and (d) An option with an inconsistent performance variation (a-b). V1 and V2 are two different values of the same configuration option. C1 and C2 are two revisions. A smaller performance metric value (e.g., CPU usage) indicates a better performance...... 89

Figure 5.2 An overview of our approach to collect data...... 93

Figure 5.3 The automatically obtained threshold for splitting option variation into IoPV

and non-IoPV groups for Hadoop...... 97

Figure 5.4 The automatically obtained threshold for splitting option variation into IoPV

and non-IoPV groups for Cassandra...... 98

Figure 5.5 Percentage of IoPV for each commit of Hadoop...... 99

Figure 5.6 Percentage of IoPV for each commit of Cassandra...... 100

Figure 5.7 Pairwise Jaccard distance between the triplets of the

studied commits of the Hadoop system. The x-axis and y-axis show the studied

commits, ordered chronologically from left to right on the x-axis and bottom to top

on the y-axis. Each cell of the Figure refers to the Jaccard distance of any pair of

commits: the darker the color is, the larger the distance is...... 103

Figure 5.8 Pairwise Jaccard distance between the triplets of the

studied commits of the Cassandra system. The x-axis and y-axis show the studied

commits, ordered chronologically from left to right on the x-axis and bottom to top

on the y-axis. Each cell of the Figure refers to the Jaccard distance of any pair of

commits: the darker the color is, the larger the distance is...... 103

Figure 5.9 AUC of RF for Hadoop when only keeping one dimension of metrics. . . . 116

Figure 5.10 AUC of RF for Cassandra when only keeping one dimension of metrics. . . 116

Figure 5.11 AUC of RF for Hadoop when removing one dimension of metrics...... 117

Figure 5.12 AUC of RF for Cassandra when removing one dimension of metrics. . . . . 117

Figure 6.1 An overview of our workload recovery process...... 132

Figure 6.2 Cumulative density plot of the number of user actions from the original

workload and the generated load tests for the Receive action in Apache James. . . . 142

List of Tables

Table 3.1 Overview of our subject systems...... 25

Table 3.2 Results of identifying performance regression introducing changes...... 32

Table 3.3 Pearson correlation between effect sizes measured using different performance metrics...... 34

Table 3.4 Number of tests with different root-causes of performance regressions with

the number of avoidable or reducible ones in brackets...... 39

Table 3.5 Comparing this work with the four prior studies (Huang, Ma, Shen, & Zhou,

2014; G. Jin, Song, Shi, Scherpelz, & Lu, 2012; Zaman, Adams, & Hassan, 2012,

2011) ...... 46

Table 4.1 Summary of extracted metrics. The symbol † indicates metrics that represent

multiple metrics...... 59

Table 4.2 Overview of our subject systems...... 63

Table 4.3 Average increase of mean values of each performance metric by comparing

commits without and with regressions...... 66

Table 4.4 The number and percentage of identified performance-regression-prone tests

w.r.t different performance metrics...... 66

Table 4.5 Results of using our approach to predict performance regressions with different performance metrics, comparing with Perphecy. Bold values highlight the best predictors...... 70

Table 4.6 Summary of non-cost-aware and cost-aware models. The coverage values are

calculated when spending 5% of the total cost...... 72

Table 4.7 The average needed performance testing time (in minutes) for each commit. . 75

Table 4.8 Results of using our approach and Perphecy to detect the introduction of

real-life performance issues...... 78

Table 4.9 Average rank of the top important metrics in our classifiers. The up/down

arrows indicate whether the relationship is positive/negative...... 80

Table 5.1 Our definition of configuration, option, and value ...... 91

Table 5.2 Our studied dataset...... 93

Table 5.3 Number of CTO with no regression under the default option value but with

regression under other option values. Medium (large) means the effect size Cliff's

delta of performance regression is medium (large)...... 101

Table 5.4 Number of CTO with improvement under the default option value but with

regression under other option values...... 101

Table 5.5 Number of CTO with regression under the default option value and non-regression/improvement

under other values...... 102

Table 5.6 Example of Jaccard similarity between each pair of commits. The two com-

mits share two of the five unique triplets. Thus the Jaccard

similarity is 2/5 = 0.4...... 102

Table 5.7 Number of unique commits, tests, options with IoPV problems...... 105

Table 5.8 Overview of our selected metrics...... 107

Table 5.9 Hadoop’s results of using different models to predict whether configuration

options cause the manifesting of performance regressions. The best results for each

performance metric and each model are highlighted in italic. The best results for

each performance metric across different models are highlighted in bold-italic. . . 111

Table 5.10 Cassandra’s results of using different models to predict whether configuration

options cause the manifesting of performance regression. The best results for each

performance metric and each model are highlighted in italic. The best results for

each performance metric across different models are highlighted in bold-italic. . . 112

Table 5.11 The results (p-values) of using the Mann-Whitney U test to statistically compare the AUC of RF with the complete set of metrics vs. with a subset of metrics. . 115

Table 6.1 Our running example of execution log lines with extracted user actions and context values...... 130

Table 6.2 Frequency of actions for users in our running example for the Action workloads. 130

Table 6.3 Frequency of enriched actions (with context) for users in our running example

for the ActionContext workloads...... 131

Table 6.4 Frequency of frequent action sequences in our running example for the ActionSequence workloads...... 134

Table 6.5 Result of frequent action sequences after transformation based on distance

matrix for ActionSequence workload...... 135

Table 6.6 Overview of our subject systems...... 138

Table 6.7 Comparing the throughput between the original workload and the generated

workloads...... 143

Table 6.8 Comparing CPU usage between original workload and the load tests generated by our workload approaches...... 144

Table 6.9 Number of user clusters for each of our workload approaches...... 146

Table 6.10 Results of precision and recall in detecting injected unseen workloads. . . . . 148

Table 6.11 Comparing the results of choosing different threshold values and clustering

algorithm for Apache James...... 150

Table .1 Detailed results of predicting performance regression with different performance counters. Improvements are calculated by comparing with a random classifier. Bold values highlight the best predictors...... 161

Table .2 Average rank of the important metrics in our classifiers ...... 162

Chapter 1

Introduction

1.1 Introduction

The rise of large-scale software systems (e.g., Amazon.com and Google Gmail) has had an impact on people's daily lives, from mobile device users to space station operators. The increasing importance and complexity of such systems make their quality a critical, yet extremely difficult, issue to address. Failures in such systems are more often associated with performance issues than with feature bugs (Dean & Barroso, 2013; Simic & Conklin, 2012; Weyuker & Vokolos, 2000).

Therefore, performance is an important aspect of software quality.

The goals of performance are typically defined by setting upper and lower bounds for the response time and throughput of a system and for physical-level measurements such as CPU, memory, and I/O.

In order to ensure that the performance of a system meets its requirements, several performance assurance activities are required during development (Dev) and operations (Ops) in the release cycle of large software systems. DevOps is a set of software practices to smooth the release and deployment of a software system between development and operations Hüttermann (2012). One of the goals of performance assurance activities in DevOps is to detect performance regressions, i.e., cases in which the performance of the same feature in the system is worse than before Brunnert et al. (2015). Response time degradation and increased CPU usage are typical examples of performance regressions. These performance regressions may directly affect the user experience, increase the resource cost of the system, and lead to reputational repercussions. Therefore, detecting and resolving performance regressions are important tasks, even though the system's performance may still meet its requirements.

For example, Mozilla has a performance regression policy that requires performance regressions to be reported and resolved as bugs Joel (2017).

Due to the importance of performance regressions, extensive prior research has proposed automated techniques to detect performance regressions (Heger, Happe, & Farahbod, 2013; Luo, Poshyvanyk, & Grechanik, 2016; T. H. D. Nguyen, Nagappan, Hassan, Nasser, & Flora, 2014; Shang, Hassan, Nasser, & Flora, 2015). However, challenges still exist in the practice of performance regression detection in DevOps.

• No existing performance regression data. There is no existing data of performance regression introducing code changes.

• Performance regression detection is too late. Existing performance regression detection is applied after the system is built and deployed in the field. Large amounts of resources are required to detect, locate, and fix performance regressions at such a late stage in the development cycle.

• Large volume of operational data. The increasing volume of operational data is beyond the capacity of traditional human-driven practices. It is very time-consuming and error-prone to analyze such large-scale operational logs to understand and detect performance regressions.

In this thesis, we propose several techniques to detect performance regressions in the context of DevOps J. Chen (2020). To collect performance regression data, we propose a statistically rigorous approach that identifies performance regression introducing code changes J. Chen and Shang (2017). In order to detect performance regressions as early as possible, we propose an approach to predicting whether a test would manifest performance regressions in a code commit J. Chen, Shang, and Shihab (2020). We also propose an approach that predicts whether a configuration option manifests a performance variation issue. To assist performance regression detection in operations, we propose to use workloads recovered from field execution logs to detect performance regressions J. Chen, Shang, Hassan, Wang, and Lin (2019). The goal of DevOps is to combine software development and operations to shorten the delivery cycle, and our approaches can be used to detect performance regressions in the loop between Dev and Ops. In particular, our performance regression prediction during development can help release and deploy software earlier, and the results of performance regression detection during development can assist operators in monitoring specific performance metrics. For example, if most of the performance regressions manifest in the CPU metric, then operators can focus on monitoring the system's CPU usage. In addition, our approach uses operational data to recover field-representative workloads, so that, during development, practitioners can design better load tests based on such recovered workloads. In this way, operational data assists performance regression detection.

1.2 Research hypothesis

The practices of DevOps generate two sources of data about software: development data and field data. Development data, generated during software development, includes a large amount of historical information about software development. A typical example of development data is source code changes. Field data, generated during operations, consists of available and valuable information about how the system operates. System execution logs and performance metrics are two kinds of typical field data. In this dissertation, we propose automated approaches to detecting performance regressions by mining large amounts of development and field data. Our research hypothesis is as follows:

Software development historical repositories and field operation logs, which are valuable and readily available resources, can be used to detect performance regression in the context of DevOps.

We will validate our research hypothesis based on two aspects:

• Frequent performance assurance during development: We start off by examining the prevalence and root-causes of performance regression introducing code changes. We then propose an approach that automatically predicts whether a test would manifest performance regressions given a code commit. Afterwards, we propose an approach to predicting whether a configuration option manifests a performance variation issue.

• Assisting performance assurance with operational data: We study the use of workloads recovered from field execution logs to detect performance regressions.

Therefore, this thesis intends to 1) detect or predict performance regression introducing source code changes at an early stage during development, and 2) assist operators in analyzing huge volumes of field execution logs to detect system performance regressions during operations.

1.3 Thesis Overview

In particular, we divide the main content of this thesis into two parts of research questions: one on frequent performance assurance during development, and another on assisting performance assurance with operational data. The first part consists of Chapter 3, Chapter 4, and Chapter 5. The second part consists of Chapter 6.

1.3.1 Chapter 2: Background and Literature Review

This chapter first presents an overview of the relevant background to our research. We then present the related works of our research, including empirical studies on software performance, performance regression detection, recovering workloads, performance load testing, performance regression prediction, and the performance of configuration. We conclude Chapter 2 by mentioning the limitations of our literature review and recapping our findings from it.

1.3.2 Chapter 3: What are the prevalence and root-causes of performance regressions introducing code changes?

Software changes during development, i.e., source code changes, may cause performance regressions. If performance regressions could be detected as early as possible, i.e., if developers were notified during development whether a code change introduces a performance regression, the amount of required resources would be significantly reduced. Prior research has conducted empirical studies on performance bugs (Huang et al., 2014; G. Jin et al., 2012; Zaman et al., 2012, 2011), using the performance bugs reported in issue reports (such as JIRA issues). However, there may exist many more performance issues, such as performance regressions, that are not reported as JIRA issues. Moreover, there is no existing data of performance regression introducing code changes. Therefore, in this chapter, we start off by examining how prevalent performance regression introducing changes are and what the root-causes of performance regressions are.

1.3.3 Chapter 4: Can we predict tests that manifest performance regressions at the commit level?

Although automated techniques have been proposed to detect performance regressions (Heger et al., 2013; Luo et al., 2016; T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2014; Shang et al., 2015), challenges in the practice of performance regression detection still exist. First, performance regression detection remains a task that is conducted after the system is developed and built, as almost the last step in the release cycle. Therefore, fixing performance regressions at such a late stage in the development cycle is difficult and sometimes impossible. Second, a significant amount of effort is required to locate the root cause of a performance regression after detection. To overcome these challenges, we propose an approach that automatically predicts whether a test would manifest performance regressions given a code commit.

1.3.4 Chapter 5: Can we predict whether a configuration option manifests performance variation?

Large systems tend to be highly configurable, which allows users to change the behaviour of these systems by simply altering the values of certain configuration options Sayagh, Kerzazi, Adams, and Petrillo (2018). However, such flexibility comes with a cost. In fact, such software systems suffer throughout their evolution from what we refer to as “Inconsistent Options Performance Variation” (IoPV). An IoPV indicates, for a given commit, that a performance regression or improvement is inconsistent when altering the values of the same option. For instance, a new change might not suffer from any performance regression under the default configuration (i.e., when all the options are set to their default values), while altering one option's value manifests a hidden regression. Similarly, when developers improve the performance of their systems, a performance regression might be manifested under a subset of the existing configurations. Unfortunately, such hidden regressions are harmful as they can go unseen into the production environment. Therefore, in this chapter, we first quantify how consistent a performance regression or improvement is among the values of an option. We then build a model to predict whether a configuration option manifests a performance variation.

1.3.5 Chapter 6: Can we generate load tests using log-recovered workloads at varying granularities of user behavior?

Activities of the operations team generate a large amount of operational data, i.e., thousands of performance counters and large-scale execution logs. Such operational data can be used to assist in performance regression detection: if a performance regression remains hidden during development, such valuable operational data can be used to assist in performance assurance. However, operations generate huge volumes of execution logs, and it is time-consuming and error-prone for performance analysts to analyze such large-scale logs. One of the goals of analyzing such execution logs is to recover a workload to support load testing a system. Prior research often captures the workload as the frequency of user actions (Cohen et al., 2005; Shang et al., 2015; Syer, Shang, Jiang, & Hassan, 2017). However, there exists much valuable information in the context and sequences of user actions, and such richer information would ensure that the load tests that leverage such workloads are more field-representative. In this chapter, we study the use of system execution logs when recovering workloads for use in the load testing of large-scale systems.

1.4 Thesis Contributions

The main contributions of this thesis are the following:

• We propose a statistically rigorous approach to identifying performance regression introducing code changes. Further research can adopt our methodology in studying performance regressions (Chapter 3).

• We find six root-causes of performance regressions that are introduced by code changes. 12.5% of the manually examined regressions can be avoided or their performance impact may be reduced (Chapter 3).

• We propose an approach that can predict performance-regression-prone tests at the commit level. Our approach can provide accurate prediction results and save testing time, easing the adoption of the approach in practice (Chapter 4).

• Our findings highlight the importance of considering different configurations when performing performance regression detection, and show that leveraging predictive models can mitigate the difficulty of exhaustively considering all configurations of a system during such a process (Chapter 5).

• We introduce an approach for recovering workloads using user contextual information and frequent action sequences from system execution logs. Our approach has been adopted in practice to assist in the testing and operation of an ultra-large-scale industrial software system (Chapter 6).

1.5 Thesis organization

The rest of this thesis is organized as follows. Chapter 2 provides the background and related work of performance regression detection in DevOps. Motivated by the prior research, Chapter 3 studies the prevalence of performance regressions and identifies the root-causes of performance regression introducing code changes. Chapter 4 predicts whether a test would manifest performance regressions given a code commit. Chapter 5 builds a model to predict whether a configuration option manifests performance variation. Chapter 6 studies the use of system execution logs to recover field-representative workloads to detect performance regressions. Finally, Chapter 7 concludes this thesis and presents our future work.

Chapter 2

Background and Literature Review

In this chapter, we first present an overview of the relevant background to our research, including DevOps, software performance, performance regression, and performance testing. We then present the related works of our thesis, including empirical studies on software performance, performance regression detection, performance regression prediction, and the performance of configuration.

2.1 Background

2.1.1 DevOps

DevOps is a set of software practices to smooth the release and deployment of a software system between development and operations Hüttermann (2012). The integration of Dev and Ops intends to allow a more flexible reaction to changes, i.e., new features or bug fixes Brunnert et al. (2015). Such changes involve development and operation teams, which forces the Dev and Ops teams to work closer together. From a technical perspective, the tasks within the process of DevOps include code development, code review, continuous building, continuous testing, application pre-deployment, application release, and application performance monitoring Wikipedia (2016). To this end, Continuous Integration (CI), Continuous Delivery (CD), and Continuous Deployment (CDE) Shahin, Babar, and Zhu (2017) have been introduced.

2.1.2 Software performance

Software performance as a discipline was born at the beginning of the 1990s due to the increasing complexity of software systems Cortellessa, Di Marco, and Inverardi (2011). Software performance is an indicator of how well a software system or component meets its objectives for timeliness Smith and Williams (2002). Timeliness includes two factors: the processing time spent by the software (e.g., CPU processing time), and the waiting time spent by the software while waiting to access system resources. Performance is an important aspect of software quality. In fact, large software system failures are often due to performance issues rather than functional bugs Weyuker and Vokolos (2000). The goals of performance are typically defined by setting upper and lower bounds for the response time and throughput of a system and for physical-level measurements such as CPU, memory, and I/O.
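As an illustration of such performance goals, the minimal Python sketch below checks measured metrics against pre-defined upper and lower bounds; the metric names and threshold values are illustrative assumptions, not values from any of the studied systems.

```python
# Minimal sketch: checking measured performance metrics against
# pre-defined goals (illustrative thresholds, not from the thesis).

# Each goal is (lower_bound, upper_bound); None means "no bound".
goals = {
    "response_time_ms": (None, 200.0),   # at most 200 ms
    "throughput_rps":   (500.0, None),   # at least 500 requests/s
    "cpu_percent":      (None, 80.0),    # at most 80% CPU
    "memory_mb":        (None, 4096.0),  # at most 4 GB
}

measured = {
    "response_time_ms": 180.0,
    "throughput_rps":   620.0,
    "cpu_percent":      85.0,
    "memory_mb":        3100.0,
}

def check_goals(measured, goals):
    """Return the list of metrics that violate their bounds."""
    violations = []
    for metric, (lower, upper) in goals.items():
        value = measured[metric]
        if lower is not None and value < lower:
            violations.append((metric, value, f"below lower bound {lower}"))
        if upper is not None and value > upper:
            violations.append((metric, value, f"above upper bound {upper}"))
    return violations

if __name__ == "__main__":
    for metric, value, reason in check_goals(measured, goals):
        print(f"Performance goal violated: {metric}={value} ({reason})")
```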

2.1.3 Performance regression

A software performance regression occurs when the software system performs more slowly or uses more resources than before Shang et al. (2015). If the new version has worse performance than the old version, then there exists a performance regression. Performance regression is a typical example of a performance issue. Such regressions may compromise the user experience, increase the operating cost of the system, and cause field failures. Such regressions may also have financial impacts. Page load slowdown is a typical example of a performance regression: a report about Amazon shows that a one-second page slowdown may cause a $1.6 billion loss in revenue Kit (2010).

2.1.4 Performance testing

Performance testing is the process of measuring system performance-related metrics Z. M. Jiang and Hassan (2015). Such performance metrics include response time, throughput, and resource utilization. A typical example of performance testing is shown in Figure 2.1. In practice, developers set up the testing environment and define the workload. Then the same requests are sent to exercise the system, and the performance metrics are collected. Finally, operators compare the performance metrics of the two versions and identify performance regressions.

Figure 2.1: An example of performance regression testing (the same pre-defined workload is exercised against Version 1 and Version 2 in a testing environment, and the collected performance metrics are compared).
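To make the comparison step concrete, the sketch below compares response-time samples collected from two versions under the same pre-defined workload, using the Mann-Whitney U test and Cliff's delta (the statistical tools referenced later in this thesis); the sample values and the 0.05 significance threshold are illustrative assumptions.

```python
# Sketch: comparing performance metrics of two versions to flag a regression.
# The sample data and the 0.05 significance threshold are illustrative assumptions.
from scipy.stats import mannwhitneyu

# Response times (ms) collected by running the same pre-defined workload
# against version 1 and version 2 of the system.
v1 = [101, 98, 103, 99, 102, 100, 97, 104, 101, 99]
v2 = [118, 121, 117, 123, 119, 120, 122, 116, 124, 118]

def cliffs_delta(a, b):
    """Effect size: fraction of pairs where b > a minus pairs where b < a."""
    greater = sum(1 for x in a for y in b if y > x)
    smaller = sum(1 for x in a for y in b if y < x)
    return (greater - smaller) / (len(a) * len(b))

_, p_value = mannwhitneyu(v1, v2, alternative="two-sided")
delta = cliffs_delta(v1, v2)

# Flag a regression when the difference is statistically significant and
# version 2 is slower (positive delta for response time).
if p_value < 0.05 and delta > 0:
    print(f"Performance regression detected (p={p_value:.4f}, delta={delta:.2f})")
else:
    print("No performance regression detected")
```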

One of the goals of performance assurance activities in DevOps is to detect performance regressions, i.e., cases in which the performance of the same feature in the system is worse than before. Response time degradation and increased CPU usage are typical examples of performance regressions. There exists rich software quality research that examines the impact of code changes on software quality during development and operations (Heger et al., 2013; Luo et al., 2016; T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2014; Shang et al., 2015), yet a majority of prior findings do not use performance regression as a sign of software quality degradation.

2.2 Literature review

In this section, we perform a literature review to study state-of-the-art research related to software performance.

2.2.1 Paper selection

Software performance regression detection involves many research areas. We divide the related works into three aspects: understanding software performance bugs, understanding performance regression detection, and the performance of configuration.

First, we wish to understand software performance bugs. Therefore, we select papers that conduct empirical studies on software performance bugs. Second, we wish to study the techniques used to detect performance regressions. On one hand, some researchers believe that the most reliable approach to detect performance regressions is performance testing, which consists of three steps: designing a proper workload, executing performance tests, and analyzing performance testing results. On the other hand, some researchers consider that statistical defect prediction techniques can be used to detect performance regressions without the need to execute costly performance tests. Therefore, to cover performance regression detection techniques, we categorize the related works into recovering a proper workload, performance load testing, analyzing performance testing results, and software defect prediction. Finally, all too often a software system's performance regression is caused by configuration options. Therefore, we select papers that relate to understanding the performance of configuration.

2.2.2 Empirical studies on software performance

Empirical studies have been conducted to study performance issues (Han, Yu, & Lo, 2018; Huang et al., 2014; G. Jin et al., 2012; Leitner & Bezemer, 2017; Nistor, Jiang, & Tan, 2013; Zaman et al., 2012, 2011). To understand the difference between performance and non-performance bugs, Zaman et al. (2012) conduct a qualitative study on 400 performance and non-performance bug reports across four dimensions (impact on the stakeholder, context of the bug, bug fix, and bug fix validation) in two open-source web browsers, Mozilla Firefox and Google Chrome. The authors find that performance bugs are more likely to be due to regressions and are more likely to block a release than non-performance bugs. Performance bugs are also more difficult to solve and need more collaboration than non-performance bugs. However, not all regressions are unexpected, since a performance regression may not be a performance bug and the performance may still meet the requirements even though it is worse than in the previous version. This paper reveals that developers may need to spend more time fixing performance bugs than non-performance bugs.

G. Jin et al. (2012) study 109 real-world performance bugs from five software projects (Apache, Chrome, GCC, Mozilla, and MySQL) to provide guidance for software engineers on bug avoidance, performance testing, bug detection, and bug fixing. This paper discovers three common root-cause patterns: uncoordinated functions, skippable functions, and synchronization issues. The authors also find that performance bugs are often introduced by workload mismatches and API misunderstandings: due to the diversity and complexity of software workloads, performance bugs are often introduced when developers misunderstand the workload. The study calls for in-depth research on performance diagnosis, performance testing, and performance bug detection.

Nistor et al. (2013) conduct a comprehensive study to compare performance and non-performance bugs to understand how performance bugs are discovered, reported, and fixed. The authors find that fixing performance bugs is as likely to introduce new functional bugs as fixing non-performance bugs, and that fixing performance bugs is more difficult than fixing non-performance bugs. Many performance bugs are found through code reasoning and profiling. Such results call for in-depth support to address performance bugs.

Leitner and Bezemer (2017) conduct a study on 111 open-source Java-based projects from GitHub and use a combination of quantitative and qualitative research methods to investigate their performance tests. The authors find that developers have not yet understood how to conduct performance tests and often mix performance tests into the functional test suite, and that there is a lack of standard guidelines for conducting performance tests in an easy and powerful way. This paper argues that future performance testing should rely on a more flexible testing framework that supports low-friction testing or combines functional and performance testing.

Han et al. (2018) study 300 bug reports from three large open-source projects. The authors find that most of the performance bugs occur for specific combinations of data inputs and configurations. They also propose a framework named PerfLearning to extract such data inputs and configurations from bug reports to generate test frames. Huang et al. (2014) study real-world performance issues and propose an approach to improve the efficiency of performance regression testing by leveraging a static analysis technique to estimate the risk of a given commit introducing a performance regression.

The vast amount of research on software performance bugs signifies its importance and motivates our work. Prior studies on performance are typically based either on a limited set of performance issue reports or on releases of the software. However, the limited number of issue reports and releases hides the prevalence of performance regressions. In this thesis, we evaluate performance at the commit level. Therefore, we are able to identify more performance regressions and to observe the prevalence of performance regression introducing changes in DevOps.

2.2.3 Performance regression detection

In order to conduct performance testing, developers first set up the testing environment and pre-define the workload. Second, the same requests are sent to exercise the system and the performance metrics are collected. Finally, operators compare the performance metrics of the two versions and identify performance regressions. Therefore, in this subsection, we categorize the related works into three aspects: recovering workloads, performance load testing, and analyzing performance testing results to detect performance regressions.

Recovering Workload

Workload recovery is an essential part of the performance assurance of large-scale systems. Prior research proposes approaches for recovering workloads to assist in the design of load tests Vögele, van Hoorn, Schulz, Hasselbring, and Krcmar (2018), validating whether load tests are as field-representative as production Syer et al. (2017), optimizing system performance (Summers, Brecht, Eager, & Gutarin, 2016; H. Xi et al., 2011), and detecting system performance issues (Cohen et al., 2005; Hassan, Martin, Flora, Mansfield, & Dietz, 2008; Shang et al., 2015; Syer et al., 2017). All the above prior work illustrates the value and importance of recovering representative workloads.

Prior approaches for recovering and replaying workloads can be categorized along the granularity of the captured user actions. One may choose a coarse granularity by recovering only the type of workload from a system, or, at the other extreme, consider each individual user and replay their individual workloads one by one. One may anonymize all high-level user behaviours and only consider physical metrics such as CPU (Cohen et al., 2005; Shang et al., 2015), I/O (Busch et al., 2015; Haghdoost, He, Fredin, & Du, 2017; Seo et al., 2014; Yadwadkar, Bhattacharyya, Gopinath, Niranjan, & Susarla, 2010) and other system resources Cortez et al. (2017). Alternatively, one may choose a finer granularity by building complex models such as Hidden Markov Models Yadwadkar et al. (2010) to capture the details of each user. A pilot study by Cohen et al. (2005) demonstrates that grouping workloads into a smaller number of clusters outperforms having one unified workload. Intuitively, recovering a workload at either too fine or too coarse a granularity is not desirable: a too coarse-grained approach may miss important characteristics of user behaviour, leading to a non-representative workload, while a too fine-grained approach may lead to a workload that is costly to replay and maintain.

To achieve an optimal granularity of user behaviour, prior research often chooses event- or action-driven approaches for workload recovery (Hassan et al., 2008; Summers et al., 2016; Syer et al., 2017; Vögele et al., 2018; H. Xi et al., 2011). However, there exists extensive research on execution log analysis that demonstrates the value of considering contextual information and sequences of actions for various software engineering tasks (Barik, DeLine, Drucker, & Fisher, 2016; Beschastnikh, Brun, Ernst, & Krishnamurthy, 2014; Beschastnikh, Brun, Schneider, Sloan, & Ernst, 2011; Fu et al., 2012; S. He et al., 2018; Z. M. Jiang, Hassan, Hamann, & Flora, 2008b, 2009; Lin, Zhang, Lou, Zhang, & Chen, 2016; Oliner, Ganapathi, & Xu, 2012; Shang et al., 2013). Such extensive usage of contextual information and user action sequences in log analysis motivates our research to leverage similar information to recover richer workloads from execution logs for generating load tests. In this thesis, our focus is primarily on exploring whether research approaches work in practice. In particular, compared to prior research, our work uses more valuable contextual and action sequence information from execution logs to recover workloads.
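As a simplified illustration of recovering a workload from execution logs, the sketch below counts user actions per user and groups users with similar behaviour into clusters; the log format, the example log lines, and the number of clusters are assumptions made for illustration and do not reproduce the approach proposed in Chapter 6.

```python
# Sketch: recovering a coarse workload from execution logs by counting
# user actions and clustering users with similar behaviour.
# The log format "timestamp user action" and k=2 are illustrative assumptions.
from collections import Counter, defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

log_lines = [
    "2020-11-01T10:00:01 user1 Login",
    "2020-11-01T10:00:05 user1 Browse",
    "2020-11-01T10:00:09 user1 Browse",
    "2020-11-01T10:00:02 user2 Login",
    "2020-11-01T10:00:06 user2 Purchase",
    "2020-11-01T10:00:03 user3 Login",
    "2020-11-01T10:00:07 user3 Browse",
]

# 1. Extract the frequency of each action per user.
actions_per_user = defaultdict(Counter)
for line in log_lines:
    _, user, action = line.split()
    actions_per_user[user][action] += 1

# 2. Turn the per-user action counts into numeric feature vectors.
users = sorted(actions_per_user)
vectorizer = DictVectorizer(sparse=False)
features = vectorizer.fit_transform([actions_per_user[u] for u in users])

# 3. Group users with similar workloads into a small number of clusters,
#    which can then drive the user groups of a generated load test.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for user, label in zip(users, labels):
    print(user, dict(actions_per_user[user]), "-> cluster", label)
```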

Performance Load Testing

Load testing is a type of performance testing that intends to measure a system's performance under a volume of users performing transactions. Due to the importance of load testing, a large amount of previous research on load testing has been conducted. Z. M. Jiang and Hassan (2015) survey 196 research papers to understand three aspects of load testing: designing a load test, executing a load test, and analyzing load test data. The authors summarize the current techniques for the three phases and propose open research problems for load testing. T. Chen et al. (2017) present an industrial experience report on load testing in large-scale systems. This paper presents the challenges of load testing in practice and proposes general guidelines for improving the effectiveness of load test design, execution, and analysis.

There exists a volume of previous research studying load testing to help performance analysts detect and tackle performance-related problems. Jiang et al. (Z. M. Jiang, 2010; Z. M. Jiang et al., 2008b, 2009) conduct a series of studies on the use of load testing to automatically identify a system's functional or performance problems. Z. M. Jiang et al. (2008b) propose an approach to discover normal system behavior and reveal functional anomalies by mining execution logs generated during load testing. Z. M. Jiang et al. (2009) also propose an approach to automatically generate a report that detects and ranks potential performance problems by using system execution logs. In their approach, the authors first conduct log abstraction, extract sequences, and summarize performance; they then identify performance problems using statistical analysis on the performance summarization data. Gao and Jiang (2017) evaluate the impact of environment variations on the results of load testing. The authors conduct their case studies on three open-source systems and find that environment variation has an impact on a system's performance. In addition, the authors propose ensemble performance models to predict system performance in realistic runs. Trubiani, Bran, van Hoorn, Avritzer, and Knoche (2018) present an approach to automatically detect two performance anti-patterns, circuitous treasure hunt (CTH) and extensive processing (EP), using load testing and profiling data. In addition, the authors apply software refactoring to resolve those two performance anti-patterns.

Performance load testing is complex and time-consuming: it involves designing a proper workload, executing or replaying a load test, and analyzing the results of the load test. In this thesis, we focus on generating load tests using log-recovered workloads based on field logs.

2.2.4 Analyzing performance testing results to detect performance regression

Extensive prior research has proposed automated techniques to detect performance regressions. There are two categories of performance regression detection approaches: measurement-based and model-based.

Measurement-based approaches measure performance metrics and compare them between two consecutive versions of a system to detect performance regressions. Nguyen et al. (T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2011, 2014) conduct a series of studies on performance regressions. They apply control charts to analyze performance counters across test runs to detect performance regressions automatically, constructing the control chart by setting upper and lower bounds on the performance counters. Malik, Hemmati, and Hassan (2013) propose four approaches to automatically select a subset of performance counters from load tests to assist in detecting performance deviations. The authors evaluate their approaches on both industrial and open-source systems and find that a supervised approach named WRAPPER is the most efficient. Foo et al. (2010) propose an approach that compares test results to performance metrics derived from performance regression testing repositories to detect potential performance regressions.
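The sketch below illustrates the control-chart idea in a minimal form, assuming a baseline run that defines the control limits; the three-sigma limits and the 10% violation threshold are illustrative assumptions rather than parameters from the cited studies.

```python
# Sketch: control-chart-style detection of a performance regression.
# Control limits come from a baseline run; the new run is flagged when too
# many of its samples fall outside those limits (thresholds are assumptions).
import statistics

baseline_cpu = [40.2, 41.0, 39.8, 40.5, 41.3, 40.1, 39.9, 40.7, 40.4, 41.1]
new_run_cpu  = [44.9, 45.3, 44.1, 46.0, 45.5, 44.8, 46.2, 45.1, 44.7, 45.9]

mean = statistics.mean(baseline_cpu)
stdev = statistics.stdev(baseline_cpu)
lower, upper = mean - 3 * stdev, mean + 3 * stdev  # control limits

violations = [x for x in new_run_cpu if x < lower or x > upper]
violation_ratio = len(violations) / len(new_run_cpu)

print(f"Control limits: [{lower:.2f}, {upper:.2f}]")
if violation_ratio > 0.10:  # assumed threshold for raising an alarm
    print(f"Possible regression: {violation_ratio:.0%} of samples out of control")
else:
    print("Counter values remain within the control limits")
```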

Model-based approaches build a model to detect performance regressions. Cohen et al. (2005) show that it is ineffective and insufficient to index and identify performance problems with simple records of raw system metrics. Therefore, the authors present an approach to capture signatures representing system states from a running system and cluster such signatures to detect recurrent or similar performance problems. Bodík, Goldszmidt, and Fox (2008) employ logistic regression with L1 regularization to construct signatures, improving on Cohen et al.'s work. Foo et al. (2015) propose an approach that builds ensembles of models to detect performance regressions in heterogeneous environments; the authors find that ensembles of association rules can achieve high precision and recall in the detection of performance regressions. Xiong, Pu, Zhu, and Griffith (2013) propose a model-driven framework to diagnose application performance. Such a framework uses linear regression to build a prediction model that automatically diagnoses system performance in a cloud environment and leads to the root cause of performance problems.
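The sketch below illustrates the signature style of model-based detection in a minimal form: a logistic regression with L1 regularization is fit to labeled snapshots of performance counters, and the counters with non-zero coefficients form the signature. The counter names, data, and regularization strength are illustrative assumptions, not details of the cited approaches.

```python
# Sketch: a signature-style, model-based detector in the spirit of
# Cohen et al. and Bodik et al. The counter values and labels are
# synthetic illustrations, not data from the cited studies.
import numpy as np
from sklearn.linear_model import LogisticRegression

counters = ["cpu", "memory", "io_wait", "net_in"]

# Each row is a snapshot of system counters; label 1 = performance problem.
X = np.array([
    [35, 2000, 1.0, 120],
    [38, 2100, 1.2, 130],
    [36, 2050, 0.9, 125],
    [80, 2080, 6.5, 128],
    [85, 2120, 7.1, 131],
    [82, 2060, 6.8, 127],
])
y = np.array([0, 0, 0, 1, 1, 1])

# L1 regularization drives irrelevant counters' coefficients to zero,
# so the remaining non-zero counters form the problem "signature".
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X, y)

signature = [c for c, w in zip(counters, model.coef_[0]) if abs(w) > 1e-6]
print("Counters in the learned signature:", signature)

# A new snapshot can then be checked against the learned model.
print("Problem state predicted:", bool(model.predict([[83, 2090, 6.9, 129]])[0]))
```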

Prior research on the detection of performance regressions is all designed to be conducted after the system is built and deployed. Large amounts of resources are required to detect, locate, understand and fix performance regressions at such a late stage in the development cycle. In this thesis, we detect performance regressions at the commit level.

2.2.5 Performance regression prediction

To reduce the resources and time spent on performance regression testing, statistical just-in-time prediction models can be used to predict performance regressions. However, there is a lack of research that studies performance regression prediction. Typically, prior software defect prediction research focuses on functional bugs rather than performance bugs. We infer that general software defect prediction techniques are useful for predicting performance regressions. Therefore, in this subsection, we select papers that relate to software defect prediction.

Mockus and Weiss (2000) are the first one to use metrics that describes the characteristics of software changes to build a linear regression model to predict the probability of failure after code changes. Kamei et al. (Kamei et al., 2016; Kamei & Shihab, 2016; Kamei et al., 2013) conduct a

series of studies on commit-level defect prediction. Kamei et al. (2013) extract 14 change metrics

from six open-source systems and five commercial systems and build a logistic regression model to

predict whether the commit would introduce software defects. The prediction model built is able

to predict defect-inducing changes with an accuracy of 0.68 and a recall of 0.64. Their findings

also show that diffusion and purpose of the code changes play important roles in defect-inducing

changes.
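To illustrate this family of commit-level (just-in-time) prediction models, the sketch below trains a logistic regression classifier on a handful of change metrics. The features, labels, and data are synthetic placeholders rather than the 14 metrics or the datasets used by Kamei et al.; the sketch only shows the general modelling workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic change metrics per commit: lines added, lines deleted,
# number of modified files, developer experience (placeholders).
X = rng.random((500, 4))
# Synthetic labels: 1 = defect-inducing commit, 0 = clean commit.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.2, 500) > 0.9).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predicted = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predicted),
      "recall:", recall_score(y_test, predicted))
```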

Kamei and Shihab (2016) present an overview in the research field of defect prediction, which

intends to introduce and help researchers and practitioners understand previous studies on defect

prediction, and highlight important challenges for future research. To build the prediction model,

researchers need to extract and select a list of metrics. F. Zhang, Mockus, Keivanloo, and Zou (2014)

use code metrics, process metrics, and context factors to build a universal defect prediction model

for a large set of systems. Zhang et al. find that adding context factors can assist in the prediction

of software defects by the universal models. Shivaji, Whitehead, Akella, and Kim (2013) observe that

the more features a prediction model learns, the worse its prediction performance can become.

Hence, Shivaji et al. perform multiple feature selection algorithms to reduce the metrics that predict

software defects. P. He, Li, Liu, Chen, and Ma (2015) study the feasibility of defect-predictions

built with simplified metrics in different scenarios, and offer suggestions on choosing datasets and

metrics.

There exist various kinds of prediction models to predict software defects. Tourani and Adams

(2016) build logistic regression models to study the impact of human discussion metrics on commit-level

predicting models. The result shows that there exists a strong correlation between human discussion

metrics and defect-prone commits. Tsakiltsidis, Miranskyy, and Mazzawi (2016) use four machine

learning algorithms to build classifiers to predict performance bugs. The study finds that the most satisfactory model is based on logistic regression with all attributes added. In order to find out whether unsupervised models perform better than supervised models in effort-aware just-in-time defect prediction, Yang et al. (2016) use 14 code-change-based metrics to build simple unsupervised and supervised models to predict software defects. The results show that many simple unsupervised models perform better than the state-of-the-art supervised models in effort-aware commit-level defect prediction.

Compared to the above papers, we build classifiers to predict tests that manifest performance regressions at the commit level. Since our study is to predict performance regression, we extract traditional metrics that are widely used in prior studies (Kamei et al., 2013; Mockus & Weiss, 2000) and also include performance-related metrics in our classifiers.

2.2.6 Performance of configuration

A large body of research efforts has been conducted on software configuration, which mainly focuses on understanding configuration problems, preventing configuration errors, and debugging configuration errors. Few research efforts consider the performance aspect of software configuration.

Understanding configuration

Configuration makes a software system complex Sayagh et al. (2018), which leads to configuration errors that are severe, common, and hard to debug Yin et al. (2011). For instance,

D. Jin, Qu, Cohen, and Robinson (2014) find that configuration options add more complexity to the development and testing of highly configurable software systems. Han and Yu (2016) find that configuration options are responsible for 59% of the performance bugs. Gousios, Karakoidas, and Spinellis (2006) observe that the configuration of the garbage collectors has an impact on the performance of server applications. Furthermore, Sayagh et al. (Sayagh & Adams, 2015; Sayagh,

Kerzazi, & Adams, 2017) found that the impact of a configuration option can spread to multiple layers of an architectural stack.

A second line of research proposed and evaluated different approaches to identify misconfigured

configuration options. Dong et al. (Dong, Andrzejak, & Shao, 2015; Dong, Ghanavati, & Andrzejak,

2013) leverage the slicing technique to identify the misconfigured option for a given error message

or exception. Rabkin and Katz (2011) leverage a data flow analysis technique to identify for each

option, which source code lines it might impact. Attariyan and Flinn (2010) combine dynamic

control and data flow analysis to identify misconfigured options. Zhang et al. (S. Zhang, 2013;

S. Zhang & Ernst, 2014) compared the trace of a correct execution against the trace of an incorrect

execution to identify culprit options. A prior systematic literature review (Sayagh et al., 2018) and the work of Tianyin and Yuanyuan (2015) and of Andrzejak, Friedrich, and Wotawa (2018) provide further details about the existing configuration debugging approaches.

The Performance of Configurations

Another line of research considers the identification of the optimal configurations for a software

system and the debugging of performance errors that are caused by configuration options. Attariyan,

Chow, and Flinn (2012) propose an approach based on dynamic taint analysis technique to identify

the option that causes a performance error. Siegmund, Grebhahn, Apel, and Kastner (2015) build

mathematical models that describe the impact of a configuration on software performance based on

each option's value. Raghavachari, Reimer, and Johnson (2003) propose an iterative approach to identify an optimal configuration in terms of performance. Their approach selects a first configuration for a J2EE web application and iteratively compares its performance against alternative configurations until the optimal configuration is found. Similarly, Diao, Hellerstein, Parekh, and Bigus (2003) propose

an approach that automatically adjusts the values of existing configuration options at run-time to

optimize the CPU and memory usage objectives.

Li, Chen, Hassan, Nasser, and Flora (2018) leverage performance monitoring data and execution

logs to dynamically optimize the values of performance-related configuration options according

to varying workloads in the field. J. Guo, Czarnecki, Apel, Siegmund, and Wasowski (2013)

leverage non-linear regression to suggest an optimal configuration. However, collecting a large

amount of data for training a model that predicts the performance of a configuration is expensive. Therefore, Sarkar, Guo, Siegmund, Apel, and Czarnecki (2015) evaluate the progressive

and projective sampling to train a model that predicts the performance of configuration. For their

initial training sample, they consider data on which each option is enabled at least once. Other efforts identified the optimal configuration options in terms of performance by leveraging existing optimization approaches, i.e., iterative search Lengauer and Mössenböck (2014), multi-objective optimization Singh, Bezemer, Shang, and Hassan (2016), and smart hill climbing B. Xi, Liu,

Raghavachari, Xia, and Zhang (2004).

2.2.7 Summary

In this chapter, we present a literature review on state-of-the-art performance regression research. From our literature review, we find that performance is an important aspect of software quality. However, there is a lack of performance testing, and few developers create and maintain performance tests in practice. Performance testing is more complicated than functional testing due to the high number of confounding factors, such as the hardware platform, algorithm design, data structures, program configuration, database access, and the scale of the workload. Moreover, performance testing is time-consuming and costly. Motivated by the findings of our literature review, we first study the prevalence and root-causes of performance regressions (Chapter

3). We then propose an approach to predict performance regressions at the commit level (Chapter

4). Next, we study how configuration impacts performance regressions (Chapter 5). Finally, we propose an approach to recover field-representative workloads (Chapter 6).

Chapter 3

What are the prevalence and root-causes of performance regressions introducing code changes?

Performance is an important aspect of software quality. In fact, large software systems fail- ures are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression. Examples of performance regressions are response time degradation and increased resource utilization. Although performance regressions are not all bugs, they often have a direct impact on users’ experience of the system. Due to the possible large impact of performance regressions, prior research proposes various automated approaches that detect performance regressions. However, the detection of performance regressions is conducted after the fact, i.e., after the system is built and deployed in the field or dedicated performance testing environments. On the other hand, there exists rich software quality research that examines the impact of code changes on software quality; while a majority of prior findings do not use performance regression as a sign of software quality degradation. In this study, we perform an exploratory study on the source code changes that introduce performance regressions. We conduct a statistically rigorous performance evaluation on 1,126 commits from ten releases of Hadoop and

135 commits from five releases of RxJava. In particular, we repetitively run tests and performance

micro-benchmarks for each commit while measuring response time, CPU usage, Memory usage and I/O traffic. We identify performance regressions in each test or performance micro-benchmark if there exists statistically significant degradation with medium or large effect sizes, in any performance metric. We find that performance regressions widely exist during the development of both subject systems. By manually examining the issue reports that are associated with the identified performance regression introducing commits, we find that the majority of the performance regressions are introduced while fixing other bugs. In addition, we identify six root-causes of performance regressions. 12.5% of the examined performance regressions can be avoided or their impact may be reduced during development. Our findings highlight the need for performance assurance activities during development. Developers should address avoidable performance regressions and be aware of the impact of unavoidable performance regressions.

3.1 Introduction

The rise of large-scale software systems (e.g., Amazon.com and Google Gmail) has had an impact on people's daily lives, from mobile device users to space station operators. The increasing importance and complexity of such systems make their quality a critical, yet extremely difficult issue to address. Failures in such systems are more often associated with performance issues, rather than with feature bugs Weyuker and Vokolos (2000). Therefore, performance assurance activities are an essential step in the release cycle of large software systems.

Performance assurance activities aim to identify and eliminate performance regressions in each newly released version. A software system is considered to have performance regressions when the performance of the system (or a certain feature of the system) is worse than before. Examples of performance regressions are response time degradation and increased resource utilization. Such regressions may compromise the user experience, increase the operating cost of the system, and cause field failures. We note that performance regression may not be performance bugs, since the performance may still meet the requirement, even though the performance is worse than the previous version. However, detecting performance regressions is an important task since such regressions may have a direct impact on the user experience of the software system, leading to significant

financial and reputational repercussions. For example, Mozilla has a strict policy on performance regressions Joel (2017), which clearly states that unnoticed and unresolved performance regressions are not allowed.

Due to the importance of performance regression, extensive prior research has proposed auto- mated techniques to detect performance regressions (Heger et al., 2013; Luo et al., 2016; T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2014; Shang et al., 2015). However, detecting performance regressions remains a task that is conducted after the fact, i.e., after the system is built and deployed in the field or dedicated performance testing environments. Large amounts of resources are required to detect, locate, understand and fix performance regressions at such a late stage in the development cycle; while the amount of required resources would be significantly reduced if developers were notified whether a code change introduces performance regressions during development.

On the other hand, prior software quality research typically focuses on functional bugs rather than performance issues. For example, post-release bugs are often used as code quality measurement and are modeled by statistical modeling techniques in order to understand the relationship between different software engineering activities and code quality Hassan (2009). In addition, bug prediction techniques are proposed to prioritize software quality assurance efforts (N. Nagappan & Ball, 2005;

N. Nagappan, Ball, & Zeller, 2006; Zimmermann, Premraj, & Zeller, 2007) and assess the risk of code changes Kamei et al. (2013). However, performance regressions are rarely targeted in spite of their importance.

In this work, we perform an exploratory study on the performance regression introducing code changes. We conduct a statistically rigorous performance evaluation on 1,126 commits from ten releases of Hadoop and 135 commits from five releases of RxJava. In particular, the performance evaluation of each code commit with impacted tests or performance micro-benchmarks is repeated

30 times independently. In total, the performance evaluation lasts for over 2,000 hours. With the performance evaluation results, we identify performance regression introducing changes with statistical tests. To the best of our knowledge, our work is the first that extensively evaluates and studies performance at the commit level.

By examining the identified performance regression introducing changes, we find that performance regression introducing changes are prevalent during software development. The identified

performance regressions are often associated with a complex syndrome, i.e., multiple performance metrics have performance regression. Interestingly, we find that performance regression introducing changes also improve performance at the same time. Such results show that developers may not be aware of the existence of performance regressions, even when they are trying to improve performance. By studying the context and root-causes of performance regression introducing changes, we find that performance regressions are mostly introduced while fixing other functional bugs. Our manual examination on the performance regression introducing code changes identifies six code level root-causes of performance regressions, where some root-causes (such as skippable functions) can be avoided. In particular, we find that 12.5% of the examined performance regressions in the two subject systems can be avoided or their impact on performance may be reduced during development.

Our study results shed light on the characteristic of performance regression introducing changes and suggest the lack of awareness and systematic performance regression testing in practice. Based on our findings, performance-aware change impact analysis and designing inexpensive performance tests may help practitioners better mitigate the prevalent performance regression that is introduced during software development.

The rest of this work is organized as follows: Section 3.2 presents our subject systems and

our approach to identifying performance regression introducing code changes. Section 3.3 presents

the results of our case study. Section 3.4 presents threats to the validity of our study. Section 3.5

presents prior research that relates to this work. Finally, Section 3.6 concludes this work.

3.2 Case Study Setup

In this section, firstly we discuss our subject systems and our experimental environment. Then

we present our approach to identifying performance regression introducing changes.

3.2.1 Subject systems

We choose two open-source projects, Hadoop and RxJava as the subject systems of our case

study. Hadoop White (2009) is a distributed system infrastructure developed by the Apache Foundation. Hadoop performs data processing in a reliable, efficient, highly fault-tolerant, low-cost, and scalable manner. We choose Hadoop since its performance is of high concern and it has been studied in prior research on mining performance data Syer et al. (2017). RxJava is a library for

composing asynchronous and event-based programs by using observable sequences, and it includes JMH benchmarks. RxJava is a Java VM implementation of Reactive Extensions.

RxJava provides a slew of performance micro-benchmarks, making it an appropriate subject for our

study. We choose the most recent releases of the two subject systems. The overview of the two

subject systems is shown in Table 3.1.

Table 3.1: Overview of our subject systems.

Subjects   Version   Total lines of code (K)   # files   # tests
Hadoop     2.6.0     1,496                     6,086     1,664
           2.6.1     1,504                     6,117     1,679
           2.6.2     1,505                     6,117     1,679
           2.6.3     1,506                     6,120     1,681
           2.6.4     1,508                     6,124     1,683
           2.6.5     1,510                     6,127     1,685
           2.7.0     1,552                     6,413     1,771
           2.7.1     1,556                     6,423     1,775
           2.7.2     1,562                     6,434     1,784
           2.7.3     1,568                     6,439     1,786
RxJava     2.0.0     164                       1,107     76
           2.0.1     242                       1,513     76
           2.0.2     243                       1,524     76
           2.0.3     244                       1,524     76
           2.0.4     244                       1,526     76

3.2.2 Identifying performance regression introducing changes

In this subsection, we present our approach to identifying performance regression introducing changes. In general, we extract every commit from the version control repositories (Git) of our subject systems and identify impacted test cases of each commit. Afterward, we chronologically evaluate the performance of each commit using either the related test cases (for Hadoop) or performance micro-benchmarks (for RxJava). Finally, we perform statistical analysis on the performance evaluation results to identify performance regression introducing changes. The overview of our approach is shown in Figure 3.1.

[Figure: a flow chart of our approach. (1) Commits are filtered; (2) the impacted tests or performance micro-benchmarks are identified; (3) for each test or performance micro-benchmark, its performance is evaluated; and (4) statistical analyses are performed on the performance evaluation results. A test or performance micro-benchmark is flagged as having a performance regression when any of the metrics (response time, CPU usage, memory usage, I/O read, I/O write) degrades with a medium or large effect size.]

Figure 3.1: An overview of our approach that identifies performance regression introducing changes.

Filtering commits

As the first step of our approach, we start off by filtering commits in order to focus on commits that are more likely to introduce performance regressions. In particular, we use the git log command to list all the files that are changed in each commit. We only extract the commits that have source code changes, i.e., changes to .java files.
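A minimal sketch of this filtering step is shown below. It assumes a local Git clone and simply keeps the commits whose changed files include at least one .java file; the helper name and parsing are hypothetical, not the exact tooling used in the study.

```python
import subprocess

def commits_touching_java(repo_path):
    """Return hashes of commits that change at least one .java file."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:@%H"],
        capture_output=True, text=True, check=True).stdout
    selected = []
    for block in log.split("@")[1:]:          # one block per commit
        lines = block.strip().splitlines()
        sha, changed_files = lines[0], lines[1:]
        if any(f.endswith(".java") for f in changed_files):
            selected.append(sha)
    return selected
```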

When we extract the commits related to a task, we find that there may exist multiple commits that are made to accomplish one task, making some of the commits temporary. We would like to avoid considering the performance regressions that are introduced in such temporary commits.

Since Hadoop uses JIRA as their issue tracking system and RxJava uses the internal issue tracking system in Github, we use the issue id that is mentioned in each commit message to identify the task of each commit. If multiple commits are associated with the same issue, we only consider the snapshot of the source code after the last commit.
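The grouping of commits by issue id can be sketched as follows. The regular expression for JIRA keys (e.g., HADOOP-11252) and GitHub issue references (e.g., #4987), as well as the helper name, are illustrative assumptions.

```python
import re

# Illustrative patterns: JIRA keys such as HADOOP-11252 and GitHub issue
# references such as #4987; the real projects may use other formats too.
ISSUE_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b|#\d+")

def last_commit_per_issue(commits):
    """commits: iterable of (sha, message) in chronological order.
    Keep only the last commit of each issue; commits without an
    identifiable issue id are kept individually."""
    latest = {}
    for sha, message in commits:
        match = ISSUE_ID.search(message)
        key = match.group(0) if match else sha
        latest[key] = sha          # later commits overwrite earlier ones
    return list(latest.values())
```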

Preparing impacted tests

Identifying impacted tests. In order to evaluate the performance of each code commit, we use the tests and performance micro-benchmarks that are readily available in the source code of our subject systems. As mature software projects, each subject system contains a large number of test cases.

For example, Hadoop release 2.7.3 contains 1,786 test cases in total. Exercising all test cases may cause two issues to our performance evaluation: 1) the test cases that are not impacted by the code change would dilute the performance impact from the code changes and introduce noise in the performance evaluation and 2) the large amounts of un-impacted test cases would require extra resources for performance evaluation (e.g., much longer test running time).

Therefore, in this step, we leverage a heuristic to identify impacted tests for each commit. In particular, we find that Hadoop test cases follow a naming convention in which the name of the test

file contains the name of the source code file being tested. For example, a test file named

TestFSNamesystem.java tests the functionality of FSNamesystem.java. Hence, for each changed source code file in a commit, we automatically identify the test files. If multiple commits are associated with one issue, we consider all the tests that are impacted by these commits, but will later only evaluate performance on the last one of these commits.
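A sketch of this naming-convention heuristic is shown below. The directory layout and helper name are hypothetical; the heuristic simply prepends Test to the changed source file name and checks whether such a test file exists.

```python
import os

def impacted_tests(changed_files, test_root):
    """Map each changed Foo.java to TestFoo.java when such a test exists."""
    existing = {name for _, _, files in os.walk(test_root) for name in files}
    tests = set()
    for path in changed_files:
        base = os.path.basename(path)
        if base.endswith(".java") and not base.startswith("Test"):
            candidate = "Test" + base
            if candidate in existing:
                tests.add(candidate)
    return sorted(tests)

# e.g. impacted_tests(["hdfs/server/namenode/FSNamesystem.java"], "src/test")
# could return ["TestFSNamesystem.java"].
```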

Dealing with changed tests. Some commits may change source code and test code at the same time. Such changed test cases will bias the performance evaluation if substantial testing logic is added, removed, or modified in the test cases. In order to minimize the impact of changed test cases on the performance evaluation, we opt to use the test code before the code change, since the new version of the test cases may include new features of the system, which are not the major concern of performance regression. However, in the cases where the old test cases cannot compile or fail, we use the new test cases, since the failure of the compilation or the tests indicates that the old feature may be outdated. Finally, if both the new and old test cases fail or cannot compile, we do not include the test in the performance evaluation. In total, we have 132 tests with 106 commits that use the new tests to evaluate performance and 21 tests with 19 commits not included in our performance evaluation. There exist only six commits that are not included because all of their tests are either un-compilable or failed.

Leveraging micro-benchmarks for RxJava. Fortunately, RxJava provides a slew of micro-benchmarks with the goal of easing performance evaluation. The micro-benchmarks are used to evaluate the performance of different small units in RxJava. We opt to run all 76 micro-benchmarks from RxJava. In the rest of this work, we also refer to these micro-benchmarks as test cases to ease the description of our results.

Evaluating performance

In this step, we exercise the prepared test cases and the performance micro-benchmarks to evaluate performance of each considered commit. Our performance evaluation environment is based on Azure node type Standard F8s (8 cores, 16 GB memory). We do not create a cluster of machines to evaluate the performance for Hadoop since the exercised tests all run on local machines. We do not opt to build Hadoop, deploy on a cluster and run Hadoop jobs on the cluster to evaluate performance for the following reasons: 1) not all commits can be built into a deployable version of Hadoop; 2) running Hadoop jobs to evaluate performance may not cover the changed code in each commit; 3) it is challenging to locate the cause of performance regressions from deployed

Hadoop on clusters. In addition, prior research Singh et al. (2016) has leveraged repeated local tests to evaluate performance.

In order to generate statistically rigorous performance results, we adopt the practice of repetitive measurements T.-H. Chen, Shang, Hassan, Nasser, and Flora (2016) to evaluate performance. In particular, each test or performance micro-benchmark is executed 30 times independently. We collect both domain level and physical level performance metrics during the tests. We measure the response time of each test case as a domain level performance metric. A shorter response time indicates better performance from the users’ perspective. Sometimes performance regressions may not cause an impact on response time but rather cause a higher resource utilization. The high resource utilization, may not directly impact user experience, however, it may cause extra cost when deploying, operating and maintaining the system, with lower scalability and reliability. For example, systems that are deployed on cloud providers (like Microsoft Azure) may need to choose virtual machines with higher specification for higher resource utilization. Moreover, a software release with higher memory usage is more prone to crashes from memory leaks. Therefore, physical level performance metrics are also an important measurement for performance regressions. We use

a performance monitoring library named psutil (2017) to monitor physical level performance

metrics, i.e., the CPU usage, Memory usage, I/O read, and I/O write of the software, during the test.
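For illustration, the following sketch samples the physical metrics of a running test process with psutil. The sampling interval and the helper name are assumptions for this example, and per-process I/O counters are not available on every platform.

```python
import psutil

def sample_process(pid, interval=1.0):
    """Collect CPU, memory and I/O samples of a process until it exits."""
    proc = psutil.Process(pid)
    samples = []
    while proc.is_running():
        try:
            io = proc.io_counters()      # not available on macOS
            samples.append({
                "cpu_percent": proc.cpu_percent(interval=interval),
                "memory_rss_bytes": proc.memory_info().rss,
                "io_read_bytes": io.read_bytes,
                "io_write_bytes": io.write_bytes,
            })
        except psutil.NoSuchProcess:
            break
    return samples
```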

Statistical analyses on performance evaluation

Statistical tests have been used in prior research and in practice to detect whether performance

metric values from two tests reveal performance regressions Alghmadi, Syer, Shang, and Hassan

(2016). After having the performance evaluation results, we perform statistical analyses to determine

the existence and the magnitude of performance regression in a statistically rigorous manner. We

use Student’s t-test to examine if there exists a statistically significant difference (i.e., p-value <

0.05) between the means of the performance metrics. A p-value < 0.05 means that the difference is

likely not by chance. A t-test assumes that the population distribution is normally distributed. Our

performance measures should be approximately normally distributed given the sample size is large

enough according to the central limit theorem T.-H. Chen et al. (2014).

T-test would only tell us if the differences of the mean between the performance metrics from

two commits are statistically significant. On the other hand, effect sizes quantify such differences.

Researchers have shown that reporting only the statistical significance may lead to erroneous results

(i.e., if the sample size is very large, the p-value can be small even if the difference is trivial). We use

Cohen's d to quantify the effects Becker (2000). Cohen's d measures the effect size statistically and has been used in prior engineering studies (Kitchenham et al., 2002). Cohen's d is defined as:

$$\text{Cohen's } d = \frac{mean(x_1) - mean(x_2)}{s}$$

where $mean(x_1)$ and $mean(x_2)$ are the means of the two populations, and $s$ is the pooled standard deviation Hartung, Knapp, and Sinha (2011).

$$\text{effect size} = \begin{cases} \text{trivial} & \text{if } \textit{Cohen's } d \le 0.2 \\ \text{small} & \text{if } 0.2 < \textit{Cohen's } d \le 0.5 \\ \text{medium} & \text{if } 0.5 < \textit{Cohen's } d \le 0.8 \\ \text{large} & \text{if } 0.8 < \textit{Cohen's } d \end{cases}$$

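The statistical check can be sketched as follows, combining Student's t-test with Cohen's d computed from the pooled standard deviation. The input values are toy numbers, not measurements from our evaluation.

```python
import numpy as np
from scipy import stats

def cohens_d(x1, x2):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                         (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / pooled_sd

def shows_regression(old_runs, new_runs, alpha=0.05):
    """Flag a regression when the difference is significant (p < alpha)
    and the effect size is at least medium (|d| > 0.5)."""
    _, p_value = stats.ttest_ind(old_runs, new_runs)
    return p_value < alpha and abs(cohens_d(old_runs, new_runs)) > 0.5

old_times = [1.21, 1.19, 1.22, 1.20, 1.23, 1.21]  # response times (s), toy data
new_times = [1.35, 1.33, 1.36, 1.34, 1.32, 1.37]
print(shows_regression(old_times, new_times))      # True for this toy data
```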
3.3 Case Study Result

In this section, we perform an exploratory study on the extracted performance regressions from our subject systems (Hadoop and RxJava). Our study aims to answer two research questions. For each research question, we present the motivation of the question, the approach that we use to answer the question, the results, and a discussion of the results.

RQ1: How prevalent are performance regression introducing changes?

Motivation

Prior research has conducted empirical studies on performance bugs (Huang et al., 2014; G. Jin

et al., 2012; Zaman et al., 2012, 2011), using the reported performance bugs in issue reports (like

JIRA issues). However, there may exist many more performance issues, such as performance

regressions, that are not reported as JIRA issues. On the other hand, we evaluate performance

on each code commit instead of depending on JIRA issues. Intuitively, we may uncover instances

of performance regressions that are not reported, and hence cannot be investigated using

the approach of prior studies. Therefore, in this research question, we start off by examining how

prevalent the detected performance regression introducing changes are.

Approach

With the approach presented in Section 3.2, we obtain the results of performance evaluation of

the impacted tests in every commit of our subject systems. Since one commit may impact multiple

tests, there may exist both performance regression and performance improvement within a single commit. However,

from users’ perspective, having worse performance in one feature may result in bad experiences,

despite the performance improvement in other features. Therefore, we examine the existence of

performance regression by each impacted test separately, instead of considering a commit as a

whole.

We use both domain level performance metrics, i.e., response time, and physical level perfor-

mance metrics, i.e., CPU usage, Memory usage, I/O read and I/O write, as measurements of per-

formance regressions. As explained in Section 3.2, we examine whether a commit would cause any

test case to complete with a statistically significantly longer response time, or to utilize statistically significantly more resources. To ensure the identified performance regressions are not negligible, we only consider a test having a performance regression if the effect size is medium or large.

In order to understand whether each performance metric can provide complementary information to others, we also calculate the Pearson correlation between the effect sizes of performance regressions calculated using different metrics. Therefore, we would understand whether we can use a smaller set of metrics to identify performance regressions.
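A minimal sketch of this correlation analysis is given below; the effect-size values are toy data and the metric names are abbreviated.

```python
import numpy as np

# Toy Cohen's d values of the same tests, computed on different metrics.
effect_sizes = {
    "response_time": [0.9, 0.1, 1.2, 0.3, 0.7],
    "cpu":           [0.2, 0.8, 0.4, 1.1, 0.6],
    "memory":        [0.3, 0.9, 0.5, 1.0, 0.7],
}

names = list(effect_sizes)
for i, m1 in enumerate(names):
    for m2 in names[i + 1:]:
        r = np.corrcoef(effect_sizes[m1], effect_sizes[m2])[0, 1]
        print(f"Pearson correlation ({m1}, {m2}) = {r:.2f}")
```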

Results

Performance regressions are not rare instances and are often with large effects. We find 243 and 91 commits that contain at least one test with performance regression in at least one performance metric for Hadoop and RxJava, respectively. In particular, we find 93 commits from Hadoop and

91 commits from RxJava that contain at least one test case with performance regression in response time. In fact, some commits may have multiple test cases that demonstrate statistically significantly slower response time as performance regression. In a total of 1,270 executed tests from Hadoop and

7,600 executed tests from RxJava, 129 and 1,410 have statistically significantly slower response time with medium or large effect sizes, respectively. The performance regressions on response time may have a direct impact on users’ experiences, making these regressions a higher priority to be examined by developers. When examining the effect sizes of the detected performance regressions, we find that there exist more performance-regression-prone tests with large effect sizes than medium

(see Table 3.2). Such results imply that developers should not ignore these performance regressions since they may have a large impact on system performance. In addition, we detect more tests with performance regressions in CPU and Memory usage than in other performance metrics. Since

CPU and Memory usage both have a large impact on the capacity of the software systems, these regressions may impact the reliability or financial cost of the software system.

Physical performance metrics are important complementary indicators of performance regressions. We use the four physical performance metrics, i.e., CPU usage, Memory usage, I/O read and I/O write to measure performance regression. We find that with the physical performance metrics, we can identify more commits and tests with performance regressions that are not identified

Table 3.2: Results of identifying performance regression introducing changes.

Number of commits that have at least one test with performance regressions in different metrics.

         Any metric   Response time   CPU   Memory   I/O read   I/O write
Hadoop   243          93              175   138      90         82
RxJava   91           91              91    91       91         90

Total number of tests with performance regressions in different metrics (counts of tests with large / medium effect sizes per metric).

         Total executed   Any      Response time    CPU              Memory           I/O read         I/O write
         tests            metric   large   medium   large   medium   large   medium   large   medium   large   medium
Hadoop   1,270            338      87      42       202     97       167     74       75      28       75      17
RxJava   7,600            3,100    745     665      659     487      919     489      657     449      38      0

with response time. In fact, we find a low correlation between the effect sizes calculated with response time and the physical metrics (see Table 3.3). On the other hand, the effect sizes calculated with the physical metrics may have medium or large correlations with each other. We believe that these correlations are often due to the nature of the software itself. For instance, database accessing software may have its CPU usage highly correlated with I/O. A prior study has also reported that different performance counters often have a high correlation between each other Shang et al. (2015).

Discussions

Performance regressions often have complex syndromes. One approach in practice of resolving a performance regression is to examine the syndrome of the regression, i.e., which physical performance metrics contain performance regression, with the consideration of code changes.

However, we find that such an approach can be inefficient due to the complexity of performance regressions. In our case study results, we find that 154 commits and 203 tests from Hadoop, and 91 commits and 777 tests from RxJava, have multiple physical performance metrics associated with performance regressions. Moreover, there even exists one test in one commit of Hadoop and one test in three commits of RxJava that have performance regressions in all four physical metrics.

Resolving these performance regressions may be challenging and time-consuming.

Performance regression and performance improvement co-exist. We find that although many commits contain performance regressions, most of these commits also have performance

[Figure: two bar charts, (a) RxJava and (b) Hadoop. For each commit, the bars above the axis show the number of micro-benchmarks or test cases with a performance regression and the bars below the axis show the number with a performance improvement; medium and large effect sizes are distinguished.]

Figure 3.2: Performance regression and improvement measured using response time for each commit from Hadoop and RxJava. The commits are ordered chronologically from left to right.

Table 3.3: Pearson correlation between effect sizes measured using different performance metrics.

Hadoop          Response time   CPU     Memory   I/O read   I/O write
Response time   1               -0.02   0.04     0.01       0.06
CPU             -0.02           1       0.60     0.32       -0.02
Memory          0.04            0.60    1        0.27       0.04
I/O read        0.01            0.32    0.27     1          0.06
I/O write       0.06            -0.02   0.04     0.06       1

RxJava          Response time   CPU     Memory   I/O read   I/O write
Response time   1               -0.01   0.01     0.01       -0.01
CPU             -0.01           1       0.39     0.46       0.06
Memory          0.01            0.39    1        0.57       -0.26
I/O read        0.01            0.46    0.57     1          0.06
I/O write       -0.01           0.06    -0.26    0.06       1

improvements at the same time. Figure 3.2 shows the number of tests that have performance regression and improvement for each commit measured using response time1. In Hadoop, 70 commits

contain both performance improvement and regression in different tests. Among these commits, 82

tests executed with these commits have performance regression while 117 tests have performance

improvement. In RxJava, all 91 performance-regression-prone commits have performance improve-

ment. Among the tests executed for these commits, 1,410 tests have performance regression and

1,545 tests have performance improvement. It may be the case that such performance regressions

are side-effect of performance improvement activities. However, our results suggest the possible

trade-off between improving performance and introducing performance regression at the same time.

Therefore, in-depth analysis on these code changes is needed to further understand the context and

reason for introducing performance regressions (see RQ2). In addition, our results illustrate the need

for performance regression testing in order to increase the awareness of introducing performance

regressions during other development activities.

1Due to limited space, we do not show such results for other metrics. Those results will be shared with our data online. https://jinfuchen.github.io/icsmeData

Finding: We find that performance regression introducing changes are a prevalent

phenomenon with complex syndromes, yet lacking in-depth understanding

from the current and prior research. Moreover, these code changes mostly

introduce performance regressions while improving performance at the same

time.

Actionable implication: The findings suggest the need for more frequent

performance assurance activities (like performance testing) in practice.

RQ2: What are the root-causes of performance regressions?

Motivation

In RQ1, we find that there exist prevalent performance regressions that are introduced by code changes. If we can understand what causes the introduction of these performance regressions, we may provide guidance or automated tooling support for developers to prevent the regressions during code change.

Approach

We follow two steps in our approach to discover the reasons for introducing performance regressions. First, we investigate the high-level context when these performance regressions are introduced. We use the issue id in a commit message to identify the issue report (JIRA issue report for Hadoop and Github issue report for RxJava) that is associated with a performance regression introducing code change. We use the type of issues as the context (fixing a bug or developing new features) that are related to the performance regression introducing changes. Sometimes, in Hadoop, an issue may be labeled as “subtask”. We manually look for the related issue field in the issue report for the issue type. However, there still exist ”subtask” issues for which we cannot identify an issue type.

The information in issue reports is about the entire commit rather than the code change that impacts the tests with regression. Therefore, in the second step, we would like to know the code

level root-causes (e.g., which code change) of the performance regressions in each identified test from RQ1. As shown in Table 3.2, there exist 338 and 3,100 tests that have at least one metric with performance regression, in Hadoop and RxJava, respectively. For Hadoop, we manually examine all

the code changes that are associated with the corresponding tests where performance regression are

identified. For RxJava, we take a statistically significant random sample (95% confidence level and

5% confidence interval) from the 3,100 tests from RxJava. Our random sample consists of 342 tests

in total. We follow an iterative approach to identify the root causes of how the code changes introduce performance regressions, until we cannot find any new reasons. Based on our manual study, we

also try to examine whether the introduced performance regression can be avoided or whether the

performance impact from these regressions may be reduced.

Results

A majority of the performance regressions are introduced with bug fixing, rarely with

new features. Figure 3.3 presents the number of performance regression introducing commits that are associated with different issue types. We find that 176 out of 243 (72%) of the commits from

Hadoop and 48 out of 91 commits (53%) from RxJava are associated with issue type bug. Such results show that these performance regressions are often introduced during bug fixing tasks. By manually examining all these issues, we find that all such issues are fixing functional bugs. For example, issue HADOOP-11252 2 is associated with type bug. The goal of the issue is to control the timeout of the RPC client. While performing functional bug fixing, developers may introduce performance issues at the same time.

We only observe three commits that are associated with new features. We think the reason is that when adding new features, developers typically create a new test, or modify an existing test to include the new feature. However, in our approach to identifying performance regressions (see Section 3.2), we ensure that the exact same test cases are used, prioritizing the old tests, to eliminate the chance of detecting new features as performance regressions.

Performance regressions may be introduced by tasks with typically low impact. Another interesting finding is that performance regressions can also be introduced during tasks that are not

2https://issues.apache.org/jira/browse/HADOOP-11252

[Figure: a horizontal bar chart of the number of performance regression introducing commits per issue type. RxJava issue types: test, enhancement, documentation, cleanup, bug, new feature. Hadoop issue types: sub-task, test, improvement, bug. The x-axis is the number of issues (0 to 200).]

Figure 3.3: Number of performance regression introducing commits associated with different issue types.

generally considered with high impact. For example, 12 commits are associated with issue type cleanup and 6 commits are associated with issue type documentation. We manually investigate these commits and find that, all too often, developers label the issue as documentation or cleanup while committing code changes for other small tasks (like bug fixes) on the side. In RxJava, issue

#4987 3 is labeled as documentation, but developers fix additional input source problems in this issue. Issue #4706 4 is labeled as cleanup but fixes minor mistakes in operators. Such small tasks may introduce unexpected performance regressions. Therefore, developers should not ignore commits that are labeled with low-impact tasks when evaluating performance regressions.

We identify six code level root-causes of introducing performance regressions. Based on our manual analysis of all commits that are identified with performance regressions, we discover six code level root-causes. The distribution of each root-cause is presented in Table 3.4.

Changing function calls. The regression may be introduced by changed function calls in the source code. For example, in class Mover.java of commit #c927f938 5 in Hadoop, developers added

3https://github.com/ReactiveX/RxJava/pull/4987 4https://github.com/ReactiveX/RxJava/pull/4706 5https://github.com/apache/hadoop/commit/c927f938fecf587071d1d07a8077ecf3ab42238a

an API call shuffle of Collections inside a loop, leading to the performance regression. In particular, we identify four patterns of changing function calls that may introduce performance regression: 1) new functionality, 2) external impact, 3) changing algorithm and 4) skippable function. In particular,

a skippable function is a function call whose results are not used. The performance regressions that are introduced by changing algorithm and skippable function should be avoided or resolved by developers. On the other hand, some code changes that introduce new functionality and depend on external resources may not be avoidable. In total, among the 334 identified regressions in this root-cause, we find 36 of them are avoidable. Developers should still be aware of the un-avoidable regressions and consider possible alternatives if they have a large impact on users.
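The example above is Java code in Hadoop; the hypothetical Python sketch below only illustrates two of these patterns in a language-neutral way: a costly call repeated inside a loop although its result does not depend on the loop, and a skippable call whose result is never used.

```python
def expensive_call(data):
    return sorted(data)              # stands in for any costly API call

def handle(request, ordered):
    pass                             # placeholder for the real work

def repeated_call_in_loop(requests, data):
    # Pattern: the costly call is re-evaluated on every iteration even though
    # its result does not depend on the loop variable; it could be hoisted out.
    for request in requests:
        ordered = expensive_call(data)
        handle(request, ordered)

def skippable_call(requests, data):
    # Pattern: the result of the call is never used, so the call is skippable.
    expensive_call(data)
    for request in requests:
        handle(request, data)
```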

Changing parameters. The regression may be introduced by changing parameters. For example, in commit #81a445ed 6 in Hadoop, the developer added a new parameter conf into the function doPreUpgrade in file FSImage.java. The parameter conf contains a large number of variables whose initialization can cause memory overhead when calling the function doPreUpgrade. Making it worse, the function is called inside a loop, leading to a large performance regression. We identify three patterns of changing parameters that may introduce performance regressions: 1) using a more expensive parameter, 2) changing a condition parameter and 3) changing a configuration parameter.

In particular, using a more expensive parameter is the case where a parameter is more expensive than before. If all the data in the new parameter is indeed needed, developers may not be able to resolve this regression; on the other hand, if developers can identify unneeded data in the parameter, the performance impact from the identified regression may be reduced. Similarly, changing a condition parameter and changing a configuration parameter may both be due to the need of other code changes. Among the 100 identified regressions in this root-cause, we find 18 of them are avoidable or reducible.

Changing conditions. Changing conditions can change the code that is executed and may cause more operations to eventually be executed by the software, leading to performance regressions. For example, in class AccessControlList.java of commit #3c1b25b5 7 in Hadoop, developers changed an else condition to else if, so every time the function isEmpty inside the condition has to execute. More

6https://github.com/apache/hadoop/commit/81a445edf81f42c90a05d764dfebfadfafad622b 7https://github.com/apache/hadoop/commit/3c1b25b5fb49b8e9c8c9e1b3367b8bb7e609356d

operations inside the else if condition are then executed, causing additional overhead. We identified

13 avoidable or reducible regressions out of the 85 regressions in this root-cause.

Having extra loops. Changing loops may significantly slow down performance. For example, in commit #94a9a5d0 8 in Hadoop, developers added a for loop into the file LeafQueue.java, which adds an item to a queue repetitively, while such an action can be done as a batch process. If the functionality of a loop can be achieved by a batch process, developers may consider implementing the batch process to minimize the regression. Such solutions can make this regression avoidable or may reduce the performance impact from the loop. All the identified regressions in this root-cause are un-avoidable.
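The LeafQueue.java change described above is Java code; the hypothetical sketch below shows the repeated-add versus batch-add pattern in Python purely to illustrate the idea.

```python
from collections import deque

items = list(range(10_000))

# Regression pattern: items are added one by one inside a loop.
queue_one_by_one = deque()
for item in items:
    queue_one_by_one.append(item)

# Batch alternative: a single bulk operation avoids the per-item call overhead.
queue_batched = deque()
queue_batched.extend(items)
```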

Using more expensive variables. Some variables are more expensive to hold in memory and need more resources to access or operate on. For example, it is recommended to use local variables and to avoid static variables. If the keyword static is used to declare a variable, the lifetime of the variable will be longer, costing more memory. Developers should avoid performance regressions that are introduced by unnecessarily expensive variables in the code. We find 9 out of 48 regressions avoidable or reducible in this root-cause.

Introducing locks and synchronization. Locks are expensive operations for software performance. Introducing locks and synchronization can suspend threads waiting on a lock until it is released, causing performance degradation in response time. For example, in commit #2946621a 9 in Hadoop, developers added a synchronized operation to lock a block in class MetricsSourceAdapter.java, in order to protect the shared resources used by the two functions inside the block. The synchronized operation introduces a performance regression to the software. Indeed, it is often necessary to have locks in the source code to protect shared resources. On the other hand, developers should always lock only the necessary resources to minimize performance regressions. All the identified regressions in this root-cause are un-avoidable.

Table 3.4: Number of tests with different root-causes of performance regressions with the number of avoidable or reducible ones in brackets.

Root-cause (pattern)                               Hadoop     RxJava
Changing function call: new functionality          125 (11)   119 (6)
Changing function call: external impact            32 (6)     18 (3)
Changing function call: changing algorithm         15 (2)     19 (2)
Changing function call: skippable function         6 (6)      0
Changing parameters: expensive parameter           12 (4)     22 (3)
Changing parameters: condition parameter           15 (2)     21 (3)
Changing parameters: configuration parameter       22 (5)     8 (1)
Using more expensive variables                     22 (5)     26 (4)
Having extra loops                                 6 (0)      8 (0)
Changing conditions                                37 (8)     48 (5)
Introducing locks and synchronization              6 (0)      0
Others                                             40 (5)     53 (4)

8https://github.com/apache/hadoop/commit/94a9a5d01464e6004f69054bdc308945d40db8e6 9https://github.com/apache/hadoop/commit/2946621a531bd71e2408951ab72ecaf5f9fea3f0

Discussion

A considerable amount of performance regressions are avoidable. Based on our manual study on the root-causes of performance regressions, even though only a few regressions are due to having new features, many regressions cannot be avoided. For example, if new function calls or more data is needed to fix a bug, such performance regressions may not be avoidable. However, there also exist performance regressions that should be entirely avoided by developers, such as having skippable functions. Such performance regressions may eventually become performance bugs. Nevertheless, developers should still be aware of the impact from un-avoidable performance regressions, and minimize the impact on users' experiences by either providing more hardware resources or considering alternative solutions. Among the total 680 manually examined performance regressions, we find 85 of them are avoidable or their impact may be reduced. With more frequent and thorough performance assurance activities, the impact from these performance regressions can be minimized.

Functional bugs and performance regressions. We find that performance regressions are mostly introduced during fixing functional bugs. Such findings may be due to two reasons. First of all, performance regression testing is not enforced during development. After developers fix a functional bug, there is no systematic mechanism to prevent the introduction of performance regression at the same time. Although thorough performance regression testing can help avoid such performance regressions, these tests are often resource-intensive. Designing inexpensive perfor- mance testing can help developers better avoid performance regressions in practice. Second, the performance regressions can be ignored due to the pressure or trade-off of fixing the functional bug.

Developers may consider the high importance of fixing a functional bug while choosing to sacrifice performance for the ease of bug fixes. It is important to make such choices wisely based on the impact of functional bugs and performance regressions. In-depth user studies and field data analysis may help developers minimize the impact from these choices, and avoid performance regressions to some extent.

Finding: We find that the majority of performance regressions are introduced

with fixing functional bugs, while surprisingly, tasks that are typically

considered with low impact also may introduce performance regression. In

addition, we identify six root-causes of performance regressions, some of which

are avoidable.

Actionable implication: Developers should always be aware of the possible

performance regression from their code change, in order to address avoidable

regressions or minimize the impact from un-avoidable regressions.

3.4 Threats to Validity

3.4.1 External Validity

Generalizing our results. In our case study, we only focus on fifteen releases from two open-source systems, i.e., Hadoop and RxJava. Both of the subject systems are mainly written

in Java. Some of the findings might not be generalizable to other systems or other

programming languages. Future studies may consider more releases from more systems and even

different programming languages (such as C#, C++).

3.4.2 Internal Validity

Subjective bias of manual analysis. The manual analysis of root-causes of performance regression is subjective by definition, and it is very difficult, if not impossible, to ensure the correctness

of all the inferred root-causes. We classified the root-causes into six categories; however, there may

be different categorizations. Combining our manual analysis with controlled user studies on these

performance regressions can further address this threat.

Causality between code changes and performance regressions. By manually examining the

code changes in each commit, we identify the root-causes of each performance regression. However,

the performance regression may not be caused by the particular code change but by unknown factors. Furthermore, the performance regression may not be introduced by one change to the source code but by a combination of confounding factors. In order to address this threat, future work can leverage more sophisticated causality analysis based on code mutation to confirm the root cause of the performance regression.

Selection of performance metrics. Our approach requires performance metrics to measure performance. In particular, we pick one commonly used domain level and four commonly used physical level performance metrics based on the nature of the subject systems. There exist a large number of other performance metrics. However, practitioners may require system-specific expertise to select an appropriate set of performance metrics that are important to their specific software. Future work can include more performance metrics based on the characteristics of the subject systems.

3.4.3 Construct Validity

Monitoring performance of subject systems. Our study is based on the ability to accurately monitor the performance of our subject systems. This is based on the assumption that the performance monitoring library, i.e., psutil, can successfully and accurately provide performance metrics.

This monitoring library is widely used in performance engineering research (Ahmed, Bezemer,

Chen, Hassan, & Shang, 2016; T.-H. Chen, Shang, Hassan, et al., 2016). To further validate our

findings, other performance monitoring platforms (such as PerfMon Enbody (1999)) can be used.

Noise in performance monitoring results. There always exists noise when monitoring performance Mytkowicz, Diwan, Hauswirth, and Sweeney (2009). For example, the CPU usage of the same software under the same load can be different in two executions. In order to minimize such noise, for each test or performance micro-benchmark, we repeat the execution 30 times independently. Then we use a statistically rigorous approach to measuring performance regressions. Further studies may opt to increase the number of repeated executions to further minimize the threat based on their time and resource budget.

Issue report types. We depend on the types of issues that are associated with each performance regression introducing commit. The issue report type may not be entirely accurate. For example, developers include extra code changes in issue reports with type documentation. Firehouse-style

user studies Murphy-Hill, Zimmermann, Bird, and Nagappan (2013) can be adopted to better

understand the context of performance regression introducing changes.

The effectiveness of the tests. In our case study, we leverage test cases and performance

micro-benchmarks to evaluate the performance of each commit. In particular, for Hadoop our

heuristic of identifying impacted tests is based on naming conventions between source code files and test files. In addition, we also rely on the readily available performance micro-benchmarks in RxJava. Our heuristic and the performance micro-benchmarks both may not cover all the performance impacts from code changes. However, the goal of our study is not to detect all performance regressions in the history of our subject systems, but rather to collect a sample of performance regression introducing commits for our further investigation. Future work may consider using more

sophisticated analysis to identify the impacted tests Qusef, Bavota, Oliveto, De Lucia, and Binkley

(2014) or manually adapting the tests to address this threat. Moreover, although conducting systematic, long-lasting performance tests may minimize this threat, the long running time of these tests (often more than eight hours) makes them almost impossible to run for every commit. It is still an open research challenge to design inexpensive yet representative performance tests, and our case study signifies the importance of a breakthrough in such a research area.

In-house performance evaluation. We evaluate the performance of our subject systems with our in-house performance evaluation environment. Although we minimize the noise in the environment to avoid bias, such an environment is not exactly the same as the in-field environment of the users. There is a threat that the performance regressions identified in our case study may not be noticeable in the field. To minimize the threat, we only consider the performance regressions that have non-trivial (turning out to be mostly large in our experiment) effect sizes. In addition, with the advancement of DevOps, more operational data will become available for future mining software repository research. Research based on field data from real users can address this threat.

3.5 Related Work

In this section, we present prior work that is related to this study.

3.5.1 Performance regression detection

A great amount of research has been proposed to detect performance regressions. Ad hoc analysis selects a limited number of target performance counters (e.g., CPU and memory) and performs simple analysis to compare the target counters. Heger et al. (2013) present an approach to support software engineers with root cause analysis of performance problems. Their approach combines the concepts of regression testing, bisection and call tree analysis to conduct performance regression root cause analysis as early as possible.

Pair-wise analysis compares and analyzes the performance metrics between two consecutive versions of a system to detect problems. Nguyen et al. (T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2011, 2014) conduct a series of studies on performance regressions. Nguyen et al. propose an approach to detect performance regressions by using a statistical process control technique called control charts. They construct control charts and apply them to detect performance regressions and examine the violation ratio of the same performance counter. Malik et al. (2013) propose approaches that combine one supervised and three unsupervised algorithms to help performance regression detection. They employ a feature selection method named Principal Component Analysis (PCA) to reduce the dimensionality of the observed performance counter set and validate their approach through a large case study on a real-world industrial software system Malik et al. (2010).

Model-based analysis builds a limited number of detection models for a set of target performance counters (e.g., CPU and memory) and leverages the models to detect performance regressions.

Xiong et al. (2013) propose a model-driven framework to diagnose application performance in the cloud without manual intervention. The framework contains three modules: a sensor module, a model building module, and a model updating module. It can automatically detect workload changes in the cloud environment and lead to the root cause of the performance problem. Cohen, Chase, Goldszmidt, Kelly, and Symons (2004) propose an approach that builds a promising class of probabilistic models (Tree-Augmented Bayesian Networks or TANs) to correlate system-level counters and the system's average-case response time. Cohen et al. (2005) present that performance counters can successfully be used to construct statistical models for system faults and compact signatures of distinct operational problems. Bodík et al. (2008) employ logistic regression with L1 regularization models to construct signatures to improve Cohen et al.'s work.

Multi-model based analysis builds multiple models from performance counters and uses the models to detect performance regressions. Foo et al. (2010) propose an approach to detect potential performance regressions using association rules. They utilize data mining to extract performance signatures by capturing metrics and employ association rule techniques to collect correlations that are frequently observed in the historical data. Changes to the association rules are then used to detect performance anomalies. M. Jiang, Munawar, Reidemeister, and Ward (2009) present two diagnosis algorithms, RatioScore and SigScore, to locate faulty components based on component dependencies. They identify the strength of relationships between metric pairs by utilizing an information-theoretic measure and track the system state based on in-cluster entropy. A significant change in the in-cluster entropy is considered a sign of a performance fault. Shang et al. (2015) propose an approach that first clusters performance metrics based on correlation. Each cluster of metrics is used to build a statistical model to detect performance regressions.

The vast amount of research on performance regression detection signifies its importance and motivates our work. Prior approaches for detecting performance regressions are all designed to be conducted after the system is built and deployed. In this work, we explore performance regressions at the commit level, i.e., when they are introduced.

3.5.2 Empirical studies on performance

Empirical studies are conducted in order to study performance issues (Huang et al., 2014; G. Jin et al., 2012; Zaman et al., 2012, 2011). G. Jin et al. (2012) study 109 real-world performance issues that are reported from five open-source software systems. Based on the studied 109 performance bugs, G. Jin et al. (2012) develop an automated tool to detect performance issues. Zaman et al. (2012, 2011) conduct both qualitative and quantitative studies on performance issues. They find that developers and users face problems in reproducing performance bugs, and that more time is spent on discussing performance bugs than other kinds of bugs. Huang et al. (2014) study real-world performance issues and, based on their findings, propose an approach called performance risk analysis (PRA) to improve the efficiency of performance regression testing.

Table 3.5: Comparing this work with the four prior studies (Huang et al., 2014; G. Jin et al., 2012; Zaman et al., 2012, 2011)

                 G. Jin et al. (2012)          Huang et al. (2014)               Zaman et al. (2012)    This work
Data source      Issue reports                 Issue reports                     Issue reports          Code commits
Granularity      Patch                         Performance issue                 Performance issue      Performance micro-benchmarks or impacted tests
#instances       109                           100                               100                    3,438 (680 for manual study)
Main approach    Dynamic rule-based checker    Static analysis & risk modeling   N/A                    Repeated measurement

Table 3.5 summarizes the comparison between this work and the four prior studies (Huang et al., 2014; G. Jin et al., 2012; Zaman et al., 2012, 2011). In particular, prior studies only study a

small number (around 100) of performance issues, often due to the low number of such issues being reported. On the contrary, since our study does not depend on the existence of performance issue reports, we observe a much higher prevalence of performance regressions that are introduced during development. Therefore, our results suggest that maintaining the performance of software may be more challenging. Without an efficient feedback channel from the end users, developers may overlook possible performance regressions. Our findings suggest the importance of the awareness of possible performance regressions introduced during development.

Luo et al. (2016) propose a recommendation system, called PerfImpact, to automatically identify code changes that may potentially be responsible for performance regressions between two releases. Their approach searches for input values that expose performance regressions and compares execution traces between two releases of a software system to identify problematic code changes. Hindle (2015) presents a general methodology to measure the impact of different software metrics on power consumption and studies the effect of software changes on power consumption regressions. Hasan et al. (2016) create energy profiles as a performance measurement for different Java collection classes. They find that energy consumption can differ largely depending on the operation. Lim et al. (2014) use performance metrics as a symptom of performance issues and leverage historical data to build a Hidden Markov Random Field clustering model. Such a model has been used to detect both reoccurring and unknown performance issues.

Prior studies on performance are typically based on either limited performance issue reports or releases of the software. However, the limited number of issue reports and releases hides the prevalence of performance regressions. In our work, we evaluate performance at the commit level. Therefore, we are able to identify more performance regressions and to observe the prevalence of performance regression introducing changes in development.

3.6 Conclusion

In this work, we conduct an empirical study on performance regression introducing changes in two open-source software systems, Hadoop and RxJava. We evaluate the performance of every commit by executing impacted tests or performance micro-benchmarks. By comparing performance metrics that are measured during the tests or performance micro-benchmarks, we identify and study performance regressions introduced by each commit. In particular, this work makes the following contributions:

• To the best of our knowledge, our work is one of the first to evaluate and to study performance

regressions at the commit level.

• We propose a statistically rigorous approach to identifying performance regression introducing code changes. Further research can adopt our methodology in studying performance regressions.

• We find that performance regressions widely exist, and often are introduced after bug fixing.

• We find six root-causes of performance regressions that are introduced by code changes.

12.5% of the manually examined regressions can be avoided or their performance impact

may be reduced.

Our findings call for the need for frequent performance assurance activities (like performance testing) during software development, especially after fixing bugs. Although such activities are often conducted before a release Z. M. Jiang and Hassan (2015), developers may find them challenging since many performance issues may be introduced during the release cycle. In addition, developers should resolve performance regressions that are avoidable. For the performance regressions that cannot be avoided, developers should evaluate and be aware of their impact on users. If there exists

a large impact on users, strategies, such as allocating more computing resources, may be considered.

Finally, in-depth user studies and automated analysis of the performance impact of code changes are future directions of this work.

Chapter 4

Can we predict tests that manifest performance regressions at the commit level?

Performance issues may compromise user experience, increase resource costs, and cause

field failures. One of the most prevalent performance issues is performance regression. Due to the importance and challenges in performance regression detection, prior research proposes various automated approaches that detect performance regressions. However, the performance regression detection is conducted after the system is built and deployed. Hence, large amounts of resources are still required to locate and fix performance regressions. In this work, we propose an approach that automatically predicts whether a test would manifest performance regressions given a code commit. In particular, we extract both traditional metrics and performance-related metrics from the code changes that are associated with each test. For each commit, we build random forest classifiers that are trained from all prior commits to predict in this commit whether each test would manifest performance regression. We conduct case studies on three open-source systems (Hadoop,

Cassandra, and OpenJPA). Our results show that our approach can predict tests that manifest performance regressions in a commit with high AUC values (on average 0.86). Our approach can drastically reduce the testing time needed to detect performance regressions. In addition, we find

that our approach could be used to detect the introduction of six out of nine real-life performance issues from the subject systems during our studied period. Finally, we find that traditional metrics that are associated with size and code change histories are the most important factors in our models.

Our approach and the study results can be leveraged by practitioners to effectively cope with performance regressions in a timely and proactive manner.

4.1 Introduction

Performance assurance activities are an essential step in the development cycle of large software systems Weyuker and Vokolos (2000). One of the goals of performance assurance activities is to detect performance regressions, i.e., the performance of the same feature in the system becoming worse than before. Response time degradation and increased CPU usage are typical examples of performance regressions. These performance regressions may directly affect the user experience, increase the resource costs of the system and lead to reputational repercussions. Therefore, detecting and resolving performance regressions is an important task even though the system's performance may meet its requirements. For example, Mozilla has a performance regression policy that requires performance regressions to be reported and resolved as bugs Joel (2017).

Although automated techniques are proposed to detect performance regressions (Heger et al.,

2013; Luo et al., 2016; T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2014; Shang et al., 2015), challenges in the practice of performance regression detection still exist. First of all, performance regression detection remains a task that is conducted after the system is developed and built, as almost the last step in the release cycle. Therefore, fixing performance regressions at such a late stage in the development cycle is difficult and sometimes impossible. Second, a significant amount of effort is required to locate the root cause of the performance regression after detection.

In order to detect performance regressions, we propose an approach to identifying performance regressions at the code commit level J. Chen and Shang (2017). In particular, various performance

regressions are detected by running all the readily available tests for the software systems. Such

knowledge may assist in addressing the aforementioned challenges. In particular, when developers are notified about the introduction of a performance regression right after the code commit, they may address such a regression in a timely manner, avoiding other code changes depending on the performance-regression-introducing commit. More importantly, such prediction would make it easier for developers to locate the root cause of the performance regression, and further address it.

Taking a real-life example in Hadoop, issue YARN-4862¹ is a performance issue with major priority. The performance regression was introduced in a code commit² that was performed half a year before the reporting date of the issue. However, if, immediately after the code commit, the developer had been notified that a performance regression might have been introduced into the source code that is executed by the test TestResourceTrackerService, this major issue might not have been hidden for such a long time.

Unfortunately, running all tests to detect performance regressions is an extremely time-consuming task, especially for every commit. From our study results, rigorously running all tests that are impacted by the code changes to detect performance regressions takes hours per commit on average (cf. Table 4.7). Recent work by de Oliveira, Fischmeister, Diwan, Hauswirth, and Sweeney (2017) proposes an approach named Perphecy, which aims to select performance benchmark suites that may manifest performance regressions. While shown to be effective, Perphecy depends on a short list of boolean indicators whose thresholds are tuned from prior executions. Due to the observation in the study that only a very small number of performance benchmarks in the commits contain performance regressions, the paper proposes a conservative approach for the tuning, i.e., detecting more benchmarks to achieve higher recall de Oliveira et al. (2017). However, our prior study shows that when examining performance regressions at the test level, a rather large number of performance regressions can be detected. Using conservatively tuned thresholds on such a large number of performance regressions at the test level may lead to a large number of false-positive results (as also shown in our results in RQ1 and RQ4). A high false-positive rate may lead to a large number of tests that need to be run, making the process still a time-consuming task.

Therefore, in this work, we present an automated approach that builds just-in-time prediction models to predict the tests that manifest performance regressions with a given code commit. We

¹ https://issues.apache.org/jira/browse/YARN-4862
² https://github.com/apache/hadoop/commit/528b809d. A test in this commit is successfully predicted by our approach (cf. Section 4.5, RQ4) to manifest a performance regression with slower response time.

call such tests performance-regression-prone tests. In other words, our approach predicts whether there is a regression in a particular test of a particular commit (not any test in a commit). To build the prediction model, we first identify performance-regression-prone tests by evaluating all the tests that are impacted by the code changes in prior commits and measuring performance (i.e., response time, CPU and memory usage, I/O read and I/O write) by running each test repetitively. Afterwards, we build five classifiers, one for each performance metric, modeling whether a test would manifest a performance regression. With such prediction models, after developers commit their code changes, our approach can automatically predict which tests manifest performance regressions for each particular performance metric. Developers can prioritize their performance assurance activities by only running the predicted performance-regression-prone tests to address the performance regressions.

For the tests that are executed and shown to have regression, the developers may check the covered and changed source code of each test to start their investigation. The newly executed performance evaluation results are then included in the training data to update the classifiers in order to predict the performance-regression-prone tests in the next new commit.

To evaluate our approach, we conduct case studies on three open-source systems³, namely

Hadoop, Cassandra and OpenJPA. In particular, we aim to answer five research questions:

RQ1 How well can we predict performance-regression-prone tests?

Our random forest classifiers can provide accurate predictions, outperforming logistic regression, support vector machines and XGBoost, with an average AUC of 0.85, 0.87 and 0.88 for Hadoop, Cassandra and OpenJPA, respectively. Our approach also outperforms the state-of-the-art approach Perphecy in all the studied subjects.

RQ2 How cost-effective is the prioritization of performance-regression-prone tests?

We consider the time spent for each test as the cost and build cost-aware models for performance-regression-prone tests. Our models provide highly cost-effective prioritization that is close to the optimal prioritization. By spending only 5% of the cost of executing

all tests, our approach can help detect up to 65% of the performance-regression-prone tests.

RQ3 How much testing time can our approach save?

³ https://users.encs.concordia.ca/~fu chen/data

Our prediction drastically reduces the testing time needed compared with running only the

tests that are impacted by the code changes in a commit (up to 97%).

RQ4 Can our approach detect the introduction of real-life performance issues?

Our approach can be used to detect six out of nine real-life performance issues that are found

to be introduced during the studied period of our evaluation. The predicted tests by our

approach can cover the changed source code that introduces the performance issues.

RQ5 What are the most important factors in determining performance-regression-prone tests?

We find that performance-related metrics are not important factors of performance-regression-prone

tests, while traditional metrics that are associated with history and size of the code changes

are the most important factors.

Based on our case study results, developers can adopt our approach in practice to provide an accurate prediction on evaluating the performance impact of their code changes in a timely and proactive manner.

The rest of this work is organized as follows: Section 4.2 presents prior research related to

this work. Section 4.3 presents our approach to predicting performance-regression-prone tests.

Section 4.4 presents our subject systems and how to extract performance-regression-prone tests

as our ground truth. Section 4.5 presents the results of our case studies. Section 4.6 discusses our learned lessons. Section 4.7 presents threats to the validity of our study. Finally, Section 4.8 concludes this work.

4.2 Related Work

In this section, we present the related prior research in three aspects: 1) software defect prediction, 2) empirical studies on software performance, and 3) test case prioritization.

4.2.1 Software defect prediction

Mockus and Weiss (Mockus & Weiss, 2000) are the first ones to use metrics that describes the characteristics of software changes to build a linear regression model to predict the probability of

53 failure after code changes. Kamei et al. (Kamei et al., 2016; Kamei & Shihab, 2016; Kamei et al.,

2013) conduct a series of studies on commit-level defect prediction. Kamei et al. (2013) extract

14 change metrics from six open-source systems and five commercial systems and build a logistic

regression model to predict whether the commit would introduce software defects. The prediction

model built is able to predict defect-inducing changes with an accuracy of 0.68 and a recall of 0.64.

Their findings also show that diffusion and purpose of the code changes play important roles in

defect-inducing changes.

Kamei and Shihab (2016) present an overview of the research field of defect prediction. To build

the prediction model, researchers need to extract and select a list of metrics. F. Zhang et al. (2014)

use code metrics, process metrics, and context factors to build a universal defect prediction model

for a large set of systems. Zhang et al. find that adding context factors can assist in the prediction of

software defects with the universal models. Shivaji et al. (2013) observe that the more features a prediction model uses, the worse its prediction performance may become. Hence, Shivaji et al. apply multiple feature selection algorithms to reduce the metrics used to predict software defects. P. He et al.

(2015) study the feasibility of defect-predictions built with simplified metrics in different scenarios and offer suggestions on choosing datasets and metrics.

There exist various kinds of prediction models to predict software defects. Tourani and Adams

(2016) build logistic regression models to study the impact of human discussion metrics on commit-level prediction models. The results show that there exists a strong correlation between human discussion metrics and defect-prone commits. Tsakiltsidis et al. (2016) use four machine learning algorithms to build classifiers to predict performance bugs. The study finds that the best-performing model is based on logistic regression with all attributes added. Yang et al. (2016) use 14 code-based metrics to build simple unsupervised and supervised models to predict software defects. The results show that many simple unsupervised models perform better than supervised models in effort-aware commit-level defect prediction.

Compared to the above papers, we build classifiers to predict tests that manifest performance regressions. We extract traditional metrics that are widely used in prior studies (Kamei et al., 2013;

Mockus & Weiss, 2000) and also include performance-related metrics in our classifiers.

4.2.2 Empirical studies on software performance

Various empirical studies are conducted to study performance issues. G. Jin et al. (2012) study

109 real-world performance bugs that are derived from five software systems to learn guidance for software practitioners. The study shows that developers need tool support to fix similar performance issues automatically. The study calls for in-depth research on performance diagnosis, performance testing, and performance issue detection. Zaman et al. (2012, 2011) conduct both qualitative and quantitative studies on 400 performance and non-performance issues. Zaman et al. find that developers spend more time fixing performance issues than non-performance issues. This study also advocates the importance of identifying the root cause of performance issues. Han et al. (2018) study 300 bug reports from three large open-source projects. The authors find that performance bugs require specific input parameters to be exposed.

Huang et al. (2014) study real-world performance issues and propose a lightweight approach named performance risk analysis (PRA) to improve performance regression testing efficiency. The analysis statically evaluates the risk of introducing performance regression. Pradel, Huggler, and

Gross (2014) present an automatic approach named SpeedGun to detect performance regressions in concurrent classes. The approach generates multi-threaded performance tests to test and report the performance differences between two versions of concurrent classes.

Tarvo and Reiss (2012) build models to predict the performance of multi-threaded or individual programs using computer simulation. The authors collect program information using static and dynamic analyses, and simulate each thread using a probabilistic call graph. Luo et al.

(2016) propose an approach to mine performance-regression-inducing code changes. The paper implements a tool named PerfImpact to find the inputs that lead to performance regressions and rank the code changes that may be closely responsible for performance regressions. de Oliveira et al. (2017) present a lightweight tool named Perphecy to detect which commit will cause a performance regression on which benchmark. Leitner and Bezemer (2017) aim to understand the current state of

the art of performance testing. They conduct a study on 111 open-source Java-based systems from

GitHub to investigate the use of performance tests across five perspectives. Leitner et al. find that

there is a lack of standard guidelines to conduct performance tests in an easy and powerful way.

The paper argues that future performance testing should implement a flexible testing framework to support low-friction testing.

Prior research on performance is typically based on either limited issue reports or releases of the software. However, performance regressions may not be defects, and our work is the first (to the best of our knowledge) to predict performance-regression-prone tests at the commit level. In addition, in this work, our extracted performance-related metrics are built on top of the findings from the prior studies on performance issues.

4.2.3 Test case prioritization

The goal of Test Case Prioritization (TCP) is to select the most appropriate tests to execute and enable faster feedback Laali, Liu, Hamilton, Spichkova, and Schmidt (2016). Various prior research has been proposed to improve TCP (Di Nardo, Alshahwan, Briand, & Labiche, 2015;

Kazmi, Jawawi, Mohamad, & Ghani, 2017; Mostafa, Wang, & Xie, 2017; Saha, Zhang, Khurshid,

& Perry, 2015; Wang, Nam, & Tan, 2017). Wang et al. (2017) present a quality-aware technique

namely QTEP to prioritize tests by giving more weight to fault-prone source code. This paper shows

that QTEP can improve existing coverage-based TCP techniques. Mostafa et al. (2017) propose an approach to prioritize performance test cases for collection-intensive software. The approach profiles

software using a dynamic call graph to build a performance model and analyzes performance impact

introduced by code changes. To address the coverage profiling overhead, Saha et al. (2015) present

an information retrieval approach named REPiR based on program changes, e.g., identifier names

and comments in the code.

Historical information produced by the tests can also be used to improve test case prioritization (Anderson, Salem, & Do, 2015; Kim & Porter, 2002; Najafi, Shang, & Rigby, 2019; Noor &

Hemmati, 2017; Zhu, Shihab, & Rigby, 2018). Kim and Porter (2002) build models based on the test execution history to prioritize tests. The prioritization models assign a selection probability to each test case in test suite. Anderson et al. (2015) investigate the test historical information including test case pass, fail information, code change information. The authors use such historical metrics to build a model to predict future test case failures. Noor and Hemmati (2017) use a logistic regression model to predict the failing test cases for tests prioritization based on a set of test quality metrics,

e.g., change-related metrics. Zhu et al. (2018) propose an approach, CODYNAQ, that uses the historical co-failure distributions among tests to re-prioritize tests after each test run. Najafi et al. (2019) customize and combine multiple test selection and prioritization techniques in a large industrial system based on test execution history information. The authors find that simple approaches are often shown to be effective in an industrial setting.

Prior research on test case prioritization typically focuses on validating functional bugs rather than performance issues. Our work aims to prioritize tests for a cost-effective detection of performance regressions.

4.3 Approach

In this section, we present our approach to predicting performance-regression-prone tests in each commit. In other words, when developers commit their code changes, our approach predicts which tests are likely to manifest performance regressions. Developers can prioritize executing the tests that are predicted to have performance regressions. The overview of our approach is shown in

Figure 4.1.

4.3.1 Extracting metrics

To build classifiers, we first extract metrics from the Git repositories of our subject systems.

Prior research on commit-level defect prediction extracts metrics that describe characteristics of the commit. On the other hand, our approach predicts the performance-regression-prone tests in each commit. Hence, our extracted metrics are at the test-level, i.e., the metrics measure the code changes from a commit associated with each test. However, not all the code changes in a commit are associated with all the tests.

For example, there may exist two tests, i.e., Test A and Test B, and a commit has a total code churn of 100 lines of code. Our code churn metric for Test A and Test B for this commit is not simply 100. We check the amount of code churn that is covered by executing Test A (e.g., 60 lines of code) and the amount of code churn that is covered by executing Test B (e.g., 50 lines of code).

We note that these two pieces of code churn can overlap, since some code churn may be covered by

both Test A and B. Then for this commit, the code churn metric for Test A is 60 and the code churn metric for Test B is 50.
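The following Python sketch illustrates this computation under assumed inputs; the test and method names, the per-method churn values and the mapping structure are hypothetical, not the thesis tooling.

# Sketch: derive the per-test code churn metric from a commit's per-method churn
# and a method-to-test coverage mapping (illustrative names and numbers only).

churn_per_method = {          # changed lines per method in the commit (sums to 100)
    "Foo.shared": 10,
    "Foo.onlyA": 50,
    "Bar.onlyB": 40,
}

methods_per_test = {          # methods executed by each test (from instrumentation)
    "TestA": {"Foo.shared", "Foo.onlyA"},
    "TestB": {"Foo.shared", "Bar.onlyB"},
}

def churn_for_test(test):
    """Code churn covered by one test: sum the churn of the changed methods it executes."""
    return sum(churn_per_method.get(m, 0) for m in methods_per_test[test])

print(churn_for_test("TestA"))  # 60
print(churn_for_test("TestB"))  # 50 (the 10 shared lines are counted for both tests)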

Therefore, we first need to identify the code changes in each commit that are associated with each test. We leverage the mapping between code and tests that is created in Section 4.4.2 to identify the code changes in each commit. Due to the resources needed to generate such mappings, we only generate these mappings once per release. The overview of our extracted metrics is shown in

Table 4.1.

Traditional metrics. Prior studies on commit-level defect prediction leverage various metrics to predict the risk of a code commit (Kamei et al., 2013; Mockus & Weiss, 2000). Similarly, we extract

16 traditional metrics from six dimensions, i.e., size of the change, complexity of the changed files, diffusion, development history, developers’ experiences and the purpose of the commit. The details of the metrics are shown in Table 4.1.

Performance-related metrics. We aim to predict performance-regression-prone tests for each commit. Hence, based on findings from prior research on performance issues and performance regressions (Alam et al., 2017; J. Chen & Shang, 2017; Costa et al., 2017; Huang et al., 2014;

G. Jin et al., 2012; L. Song & Lu, 2017), we extract metrics that describe the code changes in a commit that may relate to performance. For example, adding synchronization into the source code is one of the root-causes of performance regressions (J. Chen & Shang, 2017; G. Jin et al., 2012).

Considering the findings from prior research, we extract seven groups of performance-related metrics.

The rationale of extracting each type of entity is shown in Table 4.1.

All the performance-related metrics are calculated only from the code changes that are associated with each test in a commit. To automatically analyze the code changes, we use srcML (2017) to convert the source code to XML files representing their abstract syntax trees. We use lxml (2005) to compare the two XML trees to extract the added, deleted and changed code entities. In particular, the metrics are calculated in the following aspects.

Basic code entity change. We count the added and deleted code entities including final, static, try, catch, throw, throws, finally, break, continue, label. Afterwards, we generate two metrics for each type of code entity, i.e., one metric for added entity and one metric for deleted entity. If the code entity can contain expressions, we also generate a metric for the changes to the code entity.

Table 4.1: Summary of extracted metrics. The symbol † indicates metrics that represent multiple metrics.

Diffusion:
• NS (Number of modified subsystems; Numerical): The more subsystems are changed, the higher the risk the change may be.
• ND (Number of modified directories; Numerical): Changing more directories is more likely to introduce performance regressions (Mockus & Weiss, 2000).
• NF (Number of modified files; Numerical): Changing many source files is more likely to cause performance regressions (N. Nagappan et al., 2006).
• NM (Number of modified methods; Numerical): Changes altering many methods are more likely to introduce performance regressions (Zimmermann et al., 2007).
• Entropy (Distribution of modified code across files; Numerical): Scattered changes are more likely to be performance-regression-prone (Hassan, 2009).

Size:
• LA (Lines of code added in all the tested methods; Numerical): The more lines of code added, the higher the risk that the program will suffer from performance regressions (N. Nagappan & Ball, 2005).
• LD (Lines of code deleted in all the tested methods; Numerical): The more lines of code deleted, the higher the risk that a performance regression is introduced (N. Nagappan & Ball, 2005).
• LT (Lines of code before the change in all the tested methods; Numerical): Modifying a large method is more likely to introduce performance regressions because a large method is more complex (Koru, Zhang, Emam, & Liu, 2009; Zimmermann et al., 2007).
• SOL (Lines of code before the change in all the tested classes; Numerical): A large class is more complex than a small class and may be more performance-regression-prone (Koru et al., 2009).

Purpose:
• FN (Whether the code commit fixes a bug; Boolean): Bug fixing changes are more likely to introduce performance regressions (J. Chen & Shang, 2017; P. J. Guo, Zimmermann, Nagappan, & Murphy, 2010).

History:
• NDEV (Number of developers that changed the modified files; Numerical): Changes involving many developers are more likely to cause regressions due to the differences between developers (Matsumoto, Kamei, Monden, Matsumoto, & Nakamura, 2010).
• AGE (The average time interval between the last and the current change; Numerical): More recent and frequent changes are more likely to introduce performance regressions (Graves, Karr, Marron, & Siy, 2000).

Experience:
• EXP (Developer experience; Numerical): Senior programmers may introduce more stable code changes than less experienced developers (Mockus & Weiss, 2000).
• REXP (Recent developer experience; Numerical): Recent developers are more familiar with the program, so they are less likely to introduce performance regressions (Mockus & Weiss, 2000).

Complexity:
• MCC (McCabe cyclomatic complexity; Numerical): A program with higher complexity is more likely to suffer from performance regressions (Hassan, 2009).
• FanIn (Number of calling subprograms; Numerical): Large calling subprograms will amplify the regression if there exists a performance regression in the called program (N. Nagappan et al., 2006).

Performance-related metrics:
• †Basic code changes (Number of added or deleted basic code entities at the method level, e.g., final add and final del for the final entity; Numerical): Changes to basic code entities may directly increase the complexity of the code and introduce performance regressions.
• †Syn (Number of added or deleted synchronization expressions at the method level; Numerical): Synchronizations are expensive actions for software performance (Alam, Liu, Zeng, & Muzahid, 2017).
• †Condition (Number of added or deleted condition statements at the method level; Numerical): Changing conditions may cause more operations to eventually be executed by the software, leading to performance regressions (Huang et al., 2014).
• †Loop (Number of added, deleted or changed loop statements at the method level; Numerical): Changes involving loops may significantly slow down performance (L. Song & Lu, 2017).
• †ExpVariable (Number of added or deleted expensive variables at the method level; Numerical): Some variables are more expensive to hold in memory and need more resources to visit or operate (Costa, Andrzejak, Seboek, & Lo, 2017; Huang et al., 2014).
• †ExpParameter (Number of added or deleted expensive parameters at the method level; Numerical): Using a more expensive parameter, such as a reference type instead of a primitive type, can lead to performance regressions (J. Chen & Shang, 2017).
• †ExternalCall (Number of added or deleted external function calls at the method level; Numerical): Code changes that introduce new functionality and external operations may cause performance regressions (J. Chen & Shang, 2017; G. Jin et al., 2012).

Synchronization. We measure the added and deleted statements that are associated with synchronization. In particular, Java has two kinds of expressions related to synchronization, i.e., the synchronized statement and the synchronized specifier. We consider both expressions as synchronization and generate two metrics for added and deleted synchronization expressions, respectively.

Condition. We calculate condition metrics from the following statements in Java: if, elseif, else,

switch, case and assert statements. We generate two metrics for added and deleted conditions,

respectively.

Loop. We consider all kinds of loops in Java, such as for, while, foreach and do while. Besides generating two metrics for added and deleted loops, we also generate a metric for changed loops, such as changing the expressions in the loop.

Expensive variable change (expVariable). We count the variables that are changed from primitive data type to reference data type for this metric, as expensive variable change proposed by prior research Costa et al. (2017).

Expensive parameter change (expParameter). Similar to expensive variable change, we count the method invocation parameters that are changed from primitive type to reference type for this metric.

External function call (externalCall). Function calls that access external resources may introduce performance overhead, leading to performance regression. For example, adding logging statements to print the execution information to a file may introduce performance regressions. In this work, particularly, we only consider the function calls to the logging library. We generate three metrics for added, deleted and changed external function calls.
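The entity counting described above can be sketched as follows, assuming srcML has already converted the old and new versions of a changed file to XML (e.g., srcml Foo.java -o foo_old.xml). The element names, file names and the simple count-difference heuristic are illustrative assumptions; a faithful implementation would diff the two abstract syntax trees rather than compare totals.

# Sketch: approximate added/deleted counts of selected code entities by comparing
# srcML element counts of the old and new versions of a changed Java file.
from collections import Counter
from lxml import etree

ENTITIES = {"if", "for", "while", "do", "switch", "case", "try", "catch", "throw"}

def entity_counts(xml_path):
    """Count selected srcML elements (by local tag name) in one converted source file."""
    counts = Counter()
    for elem in etree.parse(xml_path).iter():
        if isinstance(elem.tag, str):               # skip comments / processing instructions
            local = etree.QName(elem).localname
            if local in ENTITIES:
                counts[local] += 1
    return counts

old_counts = entity_counts("foo_old.xml")           # parent-commit version (assumed file name)
new_counts = entity_counts("foo_new.xml")           # current-commit version (assumed file name)
for entity in sorted(ENTITIES):
    delta = new_counts[entity] - old_counts[entity]
    # crude approximation: a net increase counts as "added", a net decrease as "deleted"
    print(entity, "added:", max(delta, 0), "deleted:", max(-delta, 0))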

4.3.2 Data preprocessing

Before leveraging the extracted data to build classifiers, we preprocess the data. Prior research shows that multicollinearity and redundant metrics may be harmful when interpreting prediction models. In addition, multicollinearity may introduce bias when using the models to explain a phenomenon (Jiarpakdee, Tantithamthavorn, & Hassan, 2019; Tantithamthavorn & Hassan, 2018a).

To deal with multicollinearity, we first perform a correlation analysis on the metrics to remove the most highly correlated metrics. We use Pearson's correlation Benesty, Chen, Huang, and Cohen

(2009) coefficient among all metrics to find the pair of metrics that have a correlation higher than

60 0.7. From these two metrics, we remove the metric that has a higher average correlation with all other metrics. We repeat this step until there exists no correlation higher than 0.7 McIntosh, Kamei,

Adams, and Hassan (2016). We then perform a redundancy analysis on the metrics. The redundancy analysis considers a metric redundant if it can be predicted from a combination of all other metrics Harrell (2001). We use each metric as a dependent variable and the rest of the metrics as independent variables to build a regression model. We calculate the R² of each model, and if the R² is larger than a threshold (0.9) Syer et al. (2017), the current dependent variable is considered redundant. We then remove the metric with the highest R² and repeat the process until no metric can be predicted with an R² higher than the threshold.
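A minimal sketch of this preprocessing is shown below, assuming the per-test metrics are held in a pandas DataFrame with one column per metric. The ordinary-least-squares models used for the redundancy step are an assumption for illustration; the thesis relies on Harrell's redundancy analysis.

# Sketch: drop metrics with pairwise Pearson |r| > 0.7, then drop metrics that the
# remaining metrics predict with R^2 > 0.9.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def drop_correlated(df, cutoff=0.7):
    """Iteratively remove one metric of each pair with Pearson |r| above the cutoff."""
    df = df.copy()
    while df.shape[1] > 1:
        corr = df.corr(method="pearson").abs().fillna(0.0)
        np.fill_diagonal(corr.values, 0.0)
        if corr.values.max() <= cutoff:
            break
        i, j = np.unravel_index(np.argmax(corr.values), corr.shape)
        pair = [corr.columns[i], corr.columns[j]]
        # drop the metric that is, on average, more correlated with the remaining metrics
        df = df.drop(columns=max(pair, key=lambda m: corr[m].mean()))
    return df

def drop_redundant(df, r2_cutoff=0.9):
    """Iteratively remove metrics that the other metrics predict with R^2 above the cutoff."""
    df = df.copy()
    while df.shape[1] > 1:
        scores = {}
        for target in df.columns:
            X, y = df.drop(columns=target), df[target]
            scores[target] = LinearRegression().fit(X, y).score(X, y)
        worst = max(scores, key=scores.get)
        if scores[worst] <= r2_cutoff:
            break
        df = df.drop(columns=worst)
    return df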

4.3.3 Building classifiers and predicting performance-regression-prone tests

In this step, we build classifiers to model whether a test in a commit would manifest performance regressions. In particular, we build five classifiers, one for each performance metric

(i.e., response time, CPU usage, memory usage, I/O read and I/O write). We use data from prior commits to train the classifiers. The dependent variable of each classifier is whether the test manifests a performance regression with that particular performance metric (see Table 4.4). The independent variables are based on the metrics that are presented in Section 4.3.1. All the metrics are pre-processed as described in Section 4.3.2.

Projects may not have readily available historical performance evaluation results on prior commits to build classifiers. To "cold start" our approach, we require exercising performance evaluation on the prior 50 commits to create the first set of training data Schein, Popescul, Ungar, and Pennock (2002). The details of the performance evaluation are presented in Section 4.4.2. We choose 50 commits because of the number of metrics that we have. Fewer commits may result in over-fitting the classifier, while more commits may waste resources on the performance evaluation.

With a new commit, our approach leverages the five built classifiers (each classifier for one performance metric) to predict which tests are likely to manifest performance regressions.
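A minimal sketch of this step is shown below, assuming the labeled training data from prior commits is available as a pandas DataFrame with one row per (commit, test); the column names and hyper-parameters are illustrative, not the thesis' exact configuration.

# Sketch: one random forest classifier per performance metric, trained on the tests
# evaluated in prior commits and used to flag performance-regression-prone tests of
# a new commit.
from sklearn.ensemble import RandomForestClassifier

PERF_METRICS = ["response_time", "cpu", "memory", "io_read", "io_write"]

def train_classifiers(train, features):
    """train: DataFrame with the preprocessed metric columns in `features` and one
    boolean label column per performance metric, e.g. 'regression_cpu' (assumed names)."""
    classifiers = {}
    for metric in PERF_METRICS:
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(train[features], train["regression_" + metric])
        classifiers[metric] = clf
    return classifiers

def predict_new_commit(classifiers, new_tests, features):
    """Flag, for every test of the new commit, whether it is predicted to be
    performance-regression-prone with respect to each performance metric."""
    predictions = new_tests[["test_name"]].copy()
    for metric, clf in classifiers.items():
        predictions["prone_" + metric] = clf.predict(new_tests[features])
    return predictions

After the predicted tests are executed and labeled, their rows can simply be appended to the training data and the five classifiers re-trained before the next commit, as described in Section 4.3.4.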

4.3.4 Exercising tests and updating classifiers

After having the prediction results from our classifiers, developers can choose to only exercise the tests that are predicted to be performance-regression-prone for performance evaluation (cf.

Section 4.4.2). After the performance evaluation of the tests, the training data for the classifiers are updated by including the results of the newly evaluated tests. Afterwards, the five classifiers are re-built for the prediction of the next commit. Moreover, the practitioners can use the performance evaluation results of the tests to assist in locating the root causes of the performance regressions. In particular, developers may use the changed source code that is also executed by the test to assist in locating the root causes of the performance regression.

4.4 Evaluation Setup

In this section, we present the setup of our case study. We present the subject systems that are used and detail how to extract the tests that manifest performance regressions in each commit as our ground truth.

4.4.1 Subject systems

We choose three open-source systems, Hadoop, Cassandra and OpenJPA, as the subject systems of our case study. Hadoop White (2009) is a distributed system infrastructure. Hadoop performs data processing in a reliable, efficient, fault-tolerant, low-cost and scalable manner. Cassandra is an open-source distributed NoSQL database management system. OpenJPA is an open-source system from the Apache organization that implements the JPA standard. We choose the three subject systems since they are highly concerned with their performance and have been studied in prior research on mining performance data (T.-H. Chen et al., 2014; Syer et al., 2017). The overview of the three subject systems is shown in Table 4.2.

4.4.2 Extracting performance-regression-prone tests in each commit

In this subsection, we present how we perform performance evaluation to extract the performance-regression-prone tests in each commit. Such extracted data is used as our ground truth in the case study. The role of this data is similar to the extracted "bug-fixing commits" in a commit-level bug prediction approach Kamei et al. (2013). In addition, to "cold start" projects that do not have performance evaluation results on prior commits, this subsection describes how to obtain initial training data for our approach.

Table 4.2: Overview of our subject systems.

Subjects     #release   #commit   Starting and ending releases   Total lines of code (K)   #files   #tests
Hadoop       11         1,403     2.6.0 – 2.7.5                  970                       6,373    1,853
Cassandra    9          902       3.0.7 – 3.0.15                 346                       1,867    369
OpenJPA      4          726       2.3.0 – 2.4.2                  429                       4,579    916

Figure 4.1: An overview of our approach. (a) Building classifiers: for each prior commit, the approach processes the Git repository to identify the impacted tests, deals with changed tests, extracts traditional and performance-related metrics for each test, preprocesses the data, evaluates performance and applies statistical analysis to label whether each test is performance-regression-prone, and builds the classifiers. (b) Predicting performance-regression-prone tests in a new commit: for each test in the new commit, the approach identifies the impacted tests, extracts the metrics, preprocesses the data, and applies the classifiers built from prior commits to predict whether the test is performance-regression-prone.

Filtering commits. We start our approach by filtering commits, keeping only the commits that have source code changes, i.e., changes to .java files. To accomplish one development task, multiple commits, including temporary commits, may be made. We would like to avoid considering such temporary commits. Since all our subject systems use JIRA as their issue tracking system, we use the issue id mentioned in their commit messages to identify commits that belong to the same issue.

If multiple commits are associated with the same issue, we only consider the last commit.
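For illustration, a sketch of this filtering step is shown below; the commit representation, the JIRA project keys in the regular expression, and the decision to keep commits without an issue id are assumptions, not the thesis' exact rules.

# Sketch: keep only commits with Java changes and, when several commits mention the
# same JIRA issue id, keep only the last one. Commits are illustrative dicts; in
# practice they could come from `git log` or a library such as GitPython.
import re

ISSUE_ID = re.compile(r"\b(?:HADOOP|YARN|HDFS|MAPREDUCE|CASSANDRA|OPENJPA)-\d+\b")

def filter_commits(commits):
    """`commits` is ordered from oldest to newest; each has 'sha', 'message', 'files'."""
    last_commit_per_issue = {}
    kept = []
    for c in commits:
        if not any(f.endswith(".java") for f in c["files"]):
            continue                                   # skip commits without source code changes
        match = ISSUE_ID.search(c["message"])
        if match:
            last_commit_per_issue[match.group(0)] = c  # later commits overwrite earlier ones
        else:
            kept.append(c)                             # assumption: keep commits without an issue id
    return kept + list(last_commit_per_issue.values())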

Identifying impacted tests. Our approach considers using all the functional tests that exist in the repository. We use these tests since these tests are maintained and typically executed regularly during every build in the release pipeline of software development Tillmann and Schulte (2006).

Not all the tests are impacted by the code changes in a commit, and running those unimpacted tests is not likely to detect performance regressions. To identify all the tests impacted by each commit, we create mappings between source code and tests. We automatically instrument all the methods in every version of the source code of our subject systems by adding invocations to logging libraries.

We run all the available tests of each released version of the subject systems. By analyzing the output of our instrumentation, we obtain a list of methods that are executed during the running of each test. Then, we can create mappings between each test and the executed methods of the test. With such a mapping between tests and methods in the source code, for each commit, we can identify the tests that are likely to be impacted by identifying the methods that are changed in the commit, i.e., a method-to-test mapping for each commit. Due to the resources needed for creating such mappings, we only update such mappings for every release of the subject systems.
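A simplified sketch of building and using such a mapping is shown below; the tab-separated log format and the function names are assumptions for illustration only.

# Sketch: build a method-to-test mapping from instrumentation logs and use it to
# find the tests impacted by a commit.
from collections import defaultdict

def build_method_to_tests(log_path):
    """Assumed log format: one '<test>\t<method>' line per executed method per test."""
    method_to_tests = defaultdict(set)
    with open(log_path) as log:
        for line in log:
            test, method = line.rstrip("\n").split("\t")
            method_to_tests[method].add(test)
    return method_to_tests

def impacted_tests(method_to_tests, changed_methods):
    """Tests that execute at least one method changed in the commit."""
    tests = set()
    for method in changed_methods:
        tests |= method_to_tests.get(method, set())
    return tests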

Dealing with changed tests. Some commits may change both source code and test code. The changes to the test code may bias the performance evaluation. Therefore, we opt to use the test code before the code change, since the new version of the test code may execute new features, which is not the major concern of performance regressions. In the cases where the old test cases cannot compile or fail, we use the new test code. Finally, if both the new and old test cases fail or cannot compile, we do not include the test in the performance evaluation. In total, we have 226 tests in

121 commits that are evaluated with the new tests and 48 tests in 35 commits that are not included in our performance evaluation.

Evaluating performance. Finally, we exercise the selected tests of each pair of current and parent commits to evaluate their performance. Our performance evaluation environment is based on

Microsoft Azure node type Standard F8s (8 cores, 16 GB memory). In order to generate statistically rigorous performance results, we adopt the practice of repetitive measurements T.-H. Chen, Shang,

Hassan, et al. (2016) to evaluate performance. Conservatively, we execute each test 30 times independently, since prior research often only repeats the tests 5 to 20 times (Laaber & Leitner,

2018; Laaber, Scheuner, & Leitner, 2019; Leitner & Cito, 2016). We use a performance monitoring software named psutil (2017) to collect performance metrics during the execution, i.e., response time, CPU usage, memory usage, I/O read and I/O write.

Statistical analyses for labeling performance evaluation results. We perform statistical

analyses on the performance data from each pair of current and parent commits to determine the existence and the magnitude of performance regressions in a statistically rigorous manner. We use the Mann-Whitney U test Nachar et al. (2008) to examine if there exist statistically significant differences (i.e., p-value < 0.05). We choose the Mann-Whitney U test since it does not have any assumption on the distribution of the data. Researchers have shown that reporting only the statistical significance may lead to erroneous results (i.e., if the sample size is very large, the p-value can be small even if the difference is trivial). We use Cliff's delta to quantify the effect sizes Becker (2000). Cliff's delta measures the effect size statistically and has been used in prior engineering studies Kitchenham et al. (2002). We only consider a test to manifest a performance regression if the effect size is medium (0.33 < Cliff's delta ≤ 0.474) or large (Cliff's delta > 0.474).

Based on the statistical analysis results, each test in a commit is labeled with respect to five different performance metrics, i.e., response time, CPU usage, memory usage, I/O read and I/O write, that are collected during the execution of the test. For example, the CPU label indicates whether the test demonstrates a performance regression in terms of CPU usage. Therefore, with the same set of data, by considering one label, we can build a classifier that predicts the performance regression only for that metric. For example, if we only take the CPU label, we build a classifier that predicts whether a test in a commit would demonstrate a performance regression for CPU usage.
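A minimal sketch of this labeling step is shown below; the implementation of Cliff's delta and the function names are illustrative, not the thesis' exact code.

# Sketch: label one performance metric of a test as regressed when the difference
# between the parent-commit and current-commit measurements is statistically
# significant (Mann-Whitney U, p < 0.05) with at least a medium Cliff's delta.
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    """Fraction of (x, y) pairs where y > x minus the fraction where y < x."""
    gt = sum(1 for x in xs for y in ys if y > x)
    lt = sum(1 for x in xs for y in ys if y < x)
    return (gt - lt) / (len(xs) * len(ys))

def has_regression(parent_runs, current_runs, alpha=0.05, medium=0.33):
    """parent_runs / current_runs: repeated measurements of one metric (larger = worse)."""
    p_value = mannwhitneyu(parent_runs, current_runs, alternative="two-sided").pvalue
    return p_value < alpha and cliffs_delta(parent_runs, current_runs) > medium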

In addition, we calculate the average increase of mean values of each performance metric with and without regression, shown in Table 4.3. We can see that the ones without regressions have a much smaller increase of values in each performance metric. Such a small increase of values may be due to the measurement noise, hence not considered as performance regressions in our experiment.

On the other hand, there may exist extreme values as outliers that should not be considered by our approach. Therefore, we use the median ± 3 × median absolute deviation (MAD) as an indicator of outliers. We find that only 1.5% of our data is impacted, and we remove such outliers from our data.
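A sketch of this outlier rule, assuming the repeated measurements of one metric are given as a numeric array:

# Sketch: drop values outside median +/- 3 * MAD (median absolute deviation).
import numpy as np

def remove_outliers(values, k=3.0):
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    keep = np.abs(values - med) <= k * mad   # note: if MAD is 0, only values equal to the median are kept
    return values[keep]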

In total, we spend 133 machine days running all the tests. Table 4.4 shows the number of identified performance-regression-prone tests in the subject systems. We find that only a small portion of the tests is performance-regression-prone. Taking Hadoop as an example, out of the

1,349 tests, only 118 tests have performance regressions in response time, which only accounts for 9% of all tests. Therefore, running all the tests to detect these performance regressions is not cost-effective.

Table 4.3: Average increase of mean values of each performance metric by comparing commits without and with regressions.

                     Hadoop                        Cassandra                     OpenJPA
                     With regr.   Without regr.    With regr.   Without regr.    With regr.   Without regr.
Res. time (s)        9.68         0.06             1.18         0.01             0.44         0.01
CPU (s)              1.38         0.29             0.54         0.01             1.55         0.02
Mem (KB)             67.56        8.47             15.5         1.24             119.13       27.34
I/O read (count)     119.19       3.97             273.95       4.28             614.6        26.46
I/O write (count)    392.82       13.33            233.29       1.43             122.26       13.72

Table 4.4: The number and percentage of identified performance-regression-prone tests w.r.t. different performance metrics.

             Total    Response time   CPU         Memory      I/O read    I/O write
Hadoop       1,349    118 (9%)        111 (8%)    92 (7%)     110 (8%)    110 (8%)
Cassandra    985      23 (2%)         44 (4%)     31 (3%)     45 (5%)     28 (3%)
OpenJPA      1,868    154 (8%)        375 (20%)   265 (14%)   385 (21%)   488 (26%)

4.4.3 Preliminary study

One may consider measuring performance only once while executing the tests for every commit, since running all the tests for each commit is a typical practice in software development. If one can measure performance while executing the test once and calculate the differences in performance metrics to detect performance regression, our approach may be less useful. For example, by running a test once, a developer may find that the test takes 11 minutes compared to 10 minutes with the

last commit. In this case, the developer may conclude that the test has a 10% ((11 − 10)/10) regression in response time. Hence, a question that lingers is: can performance regressions be detected by only running the tests once?

In order to examine the applicability of such a simple approach, for each test in each commit, we measure the performance metrics of running each test only once and compare the metrics that

were measured by running the tests in the last commit. We calculate the relative differences between the performance metrics. Positive relative differences mean that the tests have performance regressions, while negative relative differences mean that the tests have performance improvements. We group the tests based on whether they have performance regressions, performance improvements or no significant performance changes based on our evaluation in Section 4.4.2. Finally, we use bean plots to visualize the distribution of the relative differences of the performance metrics in each group.

Figure 4.2: Distribution of the relative difference between performance metrics by running the test only once in each commit. (Bean plots of the relative difference, ranging roughly from −0.3 to 0.3, for the regression, improvement and not-significant groups.)

Figure 4.2 presents the distribution of the relative difference between performance metrics by running the test only once. We find that, in all three groups, most of the relative differences are close to zero. None of the groups has its values significantly shifted to either the positive or the negative side, implying the inapplicability of using the relative difference to detect performance regressions. For example, from Figure 4.2, we can see that more than half of the performance-regression-prone tests have positive relative performance differences; relying on this result will falsely lead to missing the detection of performance regressions. We apply the Mann-Whitney U test Nachar et al. (2008) and Cliff's delta Becker (2000) to compare the relative performance differences between each pair of the three groups. We find that the difference is either not statistically significant (p-value > 0.05) or has a negligible effect size. Such a result shows that one cannot trust the performance measurement from only running the test once to determine whether there exists a performance regression, confirming the need for our approach that aims to predict tests that are then executed rigorously to detect performance regressions.

4.5 Evaluation Results

Our study aims to answer five research questions. For each research question, we present the motivation for the question, the approach that we use to answer the question and the result for the question.

RQ1: How well can we predict performance-regression-prone tests?

Motivation. We want to provide developers with an accurate prediction of the performance-regression-prone tests when they commit code changes. By evaluating the accuracy of the classifiers, we can understand whether developers can depend on the prediction results of our approach in practice.

Approach. To answer RQ1, we first build five types of classifiers or models, including logistic regression (LR) Harrell (2001), support vector machine (SVM) Hearst, Dumais, Osuna, Platt, and

Scholkopf (1998), XGBoost (XG) T. Chen and Guestrin (2016) with 50 (XG-50), 100 (XG-100) and

500 (XG-500) iterations, and random forest classifiers Breiman (2001), to model whether a test would manifest performance regressions. In addition, we replicate Perphecy by performing the prediction at the test level, considering the source code and code changes that are impacted by each test. We compare our approach to Perphecy (baseline) in this research question.
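To illustrate this modelling step, the sketch below builds the same classifier families with scikit-learn and the xgboost Python package. It is only a sketch under those assumptions: the actual tooling used in this thesis may differ, and the feature matrix X_train and labels y_train are hypothetical placeholders.

```python
# A minimal sketch of the classifier families used in RQ1, assuming
# scikit-learn and the xgboost Python package. X_train and y_train
# (1 = performance-regression-prone) are hypothetical placeholders.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),               # probabilities are needed to rank tests later
    "XG-50": XGBClassifier(n_estimators=50),    # "iterations" ~ number of boosting rounds
    "XG-100": XGBClassifier(n_estimators=100),
    "XG-500": XGBClassifier(n_estimators=500),
    "RF": RandomForestClassifier(n_estimators=100),
}

def fit_all(X_train, y_train):
    # Fit every classifier on the change metrics of one commit's history
    for clf in classifiers.values():
        clf.fit(X_train, y_train)
    return classifiers
```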

To realistically evaluate our classifiers, we use our approach to predict performance-regression-prone tests and update the classifiers for each commit as presented in Section 4.3.

We utilize four metrics to evaluate our classifiers’ performance, including precision, recall,

F-measure and AUC. Precision measures the correctness of our classifier. Precision refers to the number of tests that were correctly labeled as performance-regression-prone divided by the total number of tests that were labeled as performance-regression-prone by the classifier. Recall measures the completeness of our classifier. Recall is defined as the number of tests that were correctly labeled as performance-regression-prone by the classifier divided by the total number of tests with actual performance regression. F-measure is the harmonic mean of precision and recall, which gives equal

weight to precision and recall. As shown in Table 4.4, our data is highly skewed, since the majority of the tests do not manifest performance regressions. Therefore, we exploit AUC, which allows

us to measure the overall ability of our model to discriminate between tests with and without performance regressions. The AUC is the area under the ROC curve, which indicates the performance of a binary classifier as its discrimination threshold is varied Lobo, Jiménez-Valverde, and

Real (2008). The value of AUC ranges from 0 to 1, and a larger value for AUC indicates a high discrimination in the prediction model.
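As an illustration, the following minimal sketch computes the four evaluation measures with scikit-learn; the labels and predicted probabilities are hypothetical and only show how the measures are derived.

```python
# Minimal sketch: computing precision, recall, F-measure and AUC with scikit-learn.
# y_true and y_score are hypothetical per-test labels and predicted probabilities.
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 0, 1, 0, 0, 0, 1]                  # 1 = performance-regression-prone
y_score = [0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8]  # predicted probabilities
y_pred  = [1 if p >= 0.5 else 0 for p in y_score]   # 0.5 threshold, as in RQ4

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))
```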

Results. The results of using our classifiers to predict whether a test is performance-regression-prone are shown in Table 4.5. Due to the limited space, we only present AUC of all classifiers while presenting random forest and Perphecy with precision, recall and F-measure (F1 in Table 4.5).

The full details can be found in Appendix A. The results show that the best of our classifiers is random forest, which achieves an average AUC of 0.85, 0.87 and 0.88 in Hadoop, Cassandra and

OpenJPA, respectively. Random forest classifiers achieve an average precision of 0.32, 0.3, 0.59 and recall of 0.75, 0.75, 0.77 in Hadoop, Cassandra and OpenJPA, respectively. We find that although

Perphecy achieves a higher average recall than other models in Hadoop and Cassandra, Perphecy only achieves an average precision of 0.08, 0.02, and 0.20 in Hadoop, Cassandra and OpenJPA, respectively. When considering the F-measures, our approach out-performs Perphecy in predicting performance regression with all the performance metrics of all subjects.

Our classifiers can accurately predict performance-regression-prone tests despite the fact that they are rare. Table 4.4 shows low percentages of tests that actually have performance regressions. For example, only 9% of the tests in Hadoop, 2% of the tests in Cassandra and 8% of the tests in OpenJPA have performance regressions in response time. However, the precision in predicting performance-regression-prone tests in response time is 0.3, 0.5 and 0.29, for Hadoop,

Cassandra and OpenJPA, respectively. With such a large improvement over the random classifier in precision, our classifiers still provide a high recall (an average recall of 0.76).

Our classifiers have a similar AUC for all performance metrics. By examining the prediction results with different performance metrics, we find that all the performance metrics have a similar

AUC. Most of the AUC values of our classifiers are over 0.8; only two classifiers have an AUC lower than 0.8 (i.e., 0.78 in response time for Cassandra and 0.79 in I/O write for OpenJPA). In general, response time and CPU usage always have a high AUC value. Since response time and CPU usage are widely used performance metrics in practice, such results advocate the usefulness of our approach.

Table 4.5: Results of using our approach to predict performance regressions with different performance metrics, comparing with Perphecy. Bold values highlight the best predictors.

Hadoop
             Random Forest               Perphecy                    LR     SVM    XG-50   XG-100   XG-500
Metric       Pre.   Recall  F1    AUC    Pre.   Recall  F1    AUC    AUC    AUC    AUC     AUC      AUC
Res. time    0.30   0.78    0.43  0.87   0.08   0.94    0.14  0.51   0.69   0.84   0.83    0.83     0.83
Cpu          0.43   0.77    0.55  0.87   0.09   0.90    0.16  0.55   0.63   0.72   0.80    0.80     0.79
Memory       0.20   0.84    0.32  0.89   0.06   0.88    0.11  0.53   0.58   0.82   0.83    0.82     0.80
I/O Read     0.43   0.67    0.52  0.82   0.07   0.86    0.13  0.52   0.62   0.81   0.74    0.75     0.76
I/O Write    0.25   0.71    0.36  0.81   0.08   0.95    0.14  0.53   0.60   0.61   0.69    0.68     0.68
Average      0.32   0.75    0.44  0.85   0.08   0.91    0.14  0.53   0.62   0.76   0.78    0.78     0.77

Cassandra
             Random Forest               Perphecy                    LR     SVM    XG-50   XG-100   XG-500
Metric       Pre.   Recall  F1    AUC    Pre.   Recall  F1    AUC    AUC    AUC    AUC     AUC      AUC
Res. time    0.50   0.63    0.56  0.78   0.03   0.79    0.05  0.54   0.66   0.65   0.78    0.79     0.78
Cpu          0.27   0.86    0.41  0.90   0.03   0.63    0.07  0.60   0.60   0.72   0.80    0.81     0.82
Memory       0.27   0.84    0.40  0.95   0.02   0.56    0.04  0.62   0.70   0.68   0.91    0.90     0.90
I/O Read     0.31   0.68    0.42  0.86   0.02   0.38    0.04  0.73   0.62   0.64   0.82    0.81     0.79
I/O Write    0.15   0.76    0.25  0.87   0.02   0.65    0.03  0.59   0.60   0.70   0.83    0.83     0.79
Average      0.30   0.75    0.41  0.87   0.02   0.60    0.05  0.62   0.64   0.68   0.83    0.83     0.82

OpenJPA
             Random Forest               Perphecy                    LR     SVM    XG-50   XG-100   XG-500
Metric       Pre.   Recall  F1    AUC    Pre.   Recall  F1    AUC    AUC    AUC    AUC     AUC      AUC
Res. time    0.29   0.85    0.43  0.92   0.09   0.80    0.16  0.60   0.67   0.60   0.90    0.90     0.89
Cpu          0.73   0.85    0.78  0.91   0.23   0.70    0.35  0.56   0.81   0.63   0.93    0.93     0.93
Memory       0.48   0.72    0.57  0.88   0.10   0.81    0.18  0.51   0.75   0.69   0.84    0.84     0.84
I/O Read     0.72   0.81    0.76  0.92   0.23   0.68    0.34  0.54   0.79   0.62   0.89    0.89     0.88
I/O Write    0.71   0.60    0.65  0.79   0.34   0.70    0.45  0.57   0.72   0.57   0.77    0.77     0.77
Average      0.59   0.77    0.64  0.88   0.20   0.74    0.30  0.56   0.75   0.62   0.87    0.87     0.86

Our random forest classifier can achieve high AUC values when predicting

performance-regression-prone tests. The precision and recall of the classifiers also

drastically outperform baseline classifiers.

RQ2: How cost-effective is the prioritization of performance-regression-prone tests?

Motivation. In RQ1, our results show that we can accurately predict the performance-regression-prone tests. After the developers are notified by the prediction results, one still needs to actually execute the tests in order to review and address the performance regressions. However, performance

evaluation is a time- and resource-consuming task T.-H. Chen, Shang, Hassan, et al. (2016). A desired prediction result would predict performance-regression-prone tests while minimizing the time needed to execute them. Therefore, in this research question, we want to factor in the cost

(performance testing time) that is associated with our prediction results.

Approach. First, we measure the actual performance testing time as a cost factor for every test in every commit of our subject systems. By knowing the actual testing time and whether each test actually manifests performance regression, we create an optimal model. In the optimal model, we sort all the actual performance-regression-prone tests by increasing the cost, i.e., the shortest test that manifests performance regression runs first. The optimal model is used as a baseline since it represents the optimal scenario of executing the tests in real-life.

Afterwards, we examine the results of our classifiers from RQ1. We call this model the Non-Cost-AWare random forest model (NCAW). We sort all the tests by decreasing predicted probability of being performance-regression-prone, without considering the cost of the tests.

Finally, we build a cost-aware prediction model by also considering the cost of the tests. Instead of modelling a binary outcome of whether the test is performance-regression-prone, we use the cost of the test to normalize the output. In particular, similar to Mende and Koschke (2010) and Kamei et al. (2013), we also calculate Rd as

Rd(x) = Y(x) / Cost(x)     (1)

where Y(x) is 1 if the test is performance-regression-prone and 0 otherwise, and Cost(x) is the running time of the test. We build a random forest regression model to predict the cost-aware output Rd. We call this model the Cost-AWare random forest model (CAW). We then sort all the tests by decreasing predicted cost-aware output Rd.

To evaluate the cost-effectiveness of the cost-aware models, we first evaluate the successfully predicted performance regressions given a threshold on the cost. Prior research on bug prediction finds that approximately 20% of the files with the highest faults contain at least 83% of the faults Ostrand, Weyuker, and Bell (2005). Since the percentage of tests with performance regressions ranges from 2% to 26% (see Table 4.4), if we simply aim to examine 20% of the tests, our approach would detect all the performance regressions. Therefore, to further demonstrate the strength of our approach, we set the threshold at 5%, further limiting the amount of resources that are available to execute the tests. We calculate the coverage as the percentage of predicted performance-regression-prone tests among all performance-regression-prone tests when we spend only 5% of the total running time. Moreover, since different threshold values lead to different results, to find the best threshold value, we use the number of detected performance-regression-prone tests divided by the effort as a measure. We compute this measure by varying the effort from 5% to 90% in steps of 5%. The result shows that 5% effort is the most cost-effective for Cassandra and Hadoop, and 10% effort is the most cost-effective for OpenJPA.

Table 4.6: Summary of non-cost-aware and cost-aware models. The coverage values are calculated when spending 5% of the total cost.

             Hadoop                    Cassandra                 OpenJPA
             NCAW          CAW         NCAW          CAW         NCAW          CAW
             Cov.   Popt   Cov.  Popt  Cov.   Popt   Cov.  Popt  Cov.   Popt   Cov.  Popt
Res. time    0.44   0.87   0.53  0.87  0.63   0.75   0.63  0.79  0.31   0.90   0.45  0.93
CPU          0.45   0.83   0.43  0.86  0.54   0.89   0.54  0.90  0.21   0.93   0.24  0.95
Memory       0.48   0.86   0.53  0.89  0.44   0.94   0.42  0.94  0.26   0.85   0.32  0.88
I/O read     0.52   0.88   0.52  0.82  0.46   0.75   0.62  0.86  0.21   0.91   0.23  0.92
I/O write    0.31   0.81   0.41  0.81  0.59   0.83   0.65  0.88  0.12   0.80   0.17  0.84
Average      0.44   0.85   0.48  0.85  0.53   0.83   0.57  0.87  0.22   0.88   0.28  0.90
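As an illustration of the cost-aware (CAW) ranking and Equation (1), the sketch below trains a regression model on Rd and ranks tests by its prediction. It is only a sketch assuming scikit-learn and numpy; the training data, costs and variable names are hypothetical placeholders, not the exact implementation used in this thesis.

```python
# A minimal sketch of the cost-aware (CAW) ranking, assuming scikit-learn.
# X_train holds change metrics, y_train is a numpy array of 0/1 regression
# labels, and cost_* are numpy arrays of measured running times (minutes).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_tests_cost_aware(X_train, y_train, cost_train, X_new, cost_new):
    r_d = y_train / cost_train                 # Rd(x) = Y(x) / Cost(x), Equation (1)
    model = RandomForestRegressor(n_estimators=100).fit(X_train, r_d)
    predicted_r_d = model.predict(X_new)
    order = np.argsort(-predicted_r_d)         # execute tests in decreasing predicted Rd
    return order, predicted_r_d
```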

Finally, we use cumulative lift charts Mende and Koschke (2009) to evaluate the prediction performance of the model. An example of the cumulative lift chart is shown in Figure 4.3. We generate a cumulative lift chart for the optimal model, the non-cost-aware model and the cost-aware model

(see Figure 4.3). The lines in the chart illustrate how many more performance-regression-prone tests can be evaluated by spending more time executing tests. Popt is calculated as the size of the area under each line (for the cost-aware and non-cost-aware models), divided by the area under the optimal line. Therefore, Popt has a range from 0 to 1. The closer the Popt is to 1, the better our model is, i.e., the closer it is to the optimal execution prioritization of the tests.
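The sketch below shows how Popt can be computed from such cumulative lift curves, following the area-ratio definition above. It is only a sketch: the inputs (per-test costs, regression labels and the execution orders of the models) are hypothetical numpy arrays.

```python
# A minimal sketch of computing Popt: the area under a model's lift curve
# divided by the area under the optimal curve. `order` and `optimal_order`
# are execution orders; `is_regression` and `cost` are per-test arrays.
import numpy as np

def popt(order, is_regression, cost, optimal_order):
    def area(o):
        o = np.asarray(o)
        x = np.concatenate(([0.0], np.cumsum(cost[o]) / np.sum(cost)))           # fraction of cost spent
        y = np.concatenate(([0.0], np.cumsum(is_regression[o]) / np.sum(is_regression)))  # fraction found
        return np.trapz(y, x)                                                    # area under the lift curve
    return area(order) / area(optimal_order)
```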

Figure 4.3: Cumulative distribution function of the optimal model, CAW and NCAW models for the performance counter of response time in Hadoop. The chart plots the percentage of performance-regression-prone tests (y-axis) against the percentage of required cost (x-axis) for the cost-aware, non-cost-aware and optimal models.

Results. Our cost-aware models have high cost-effectiveness, out-perform Perphecy and are close to the optimal model. Table 4.6 presents the cost-effectiveness of our prediction models. In particular, by spending only 5% of the cost of executing all tests, our approach can help detect 12% to 65% of the performance-regression-prone tests. In addition, we observe that all our models have a high

Popt value. The average Popt values are always higher than 0.8, indicating a high cost-effectiveness of our models. On the other hand, we find that our cost-aware models out-perform Perphecy. In particular, Perphecy only has an average Popt value of 0.51 and by spending only 5% of cost, Perphecy on average can only detect 15% of the performance-regression-prone tests.

When comparing the non-cost-aware and cost-aware models, we find that our cost-aware models can still provide improvements over the non-cost-aware model, even though the latter is already very close to the optimal model. The Popt values of the cost-aware models are higher than those of the non-cost-aware models for all the performance counters of all subject systems, except for one case, i.e., I/O read in Hadoop. Similarly, when having a threshold of spending only 5% of the cost of executing all tests, the cost-aware models can on average detect more performance-regression-prone tests than the non-cost-aware models. The non-cost-aware models have better results only in the

CPU usage for Hadoop and in the memory usage for Cassandra, where there is only a small (2% and

2%) difference between the results of the two models. On the other hand, in all other cases, the cost-aware models typically have an improvement over the non-cost-aware models.

Our classifiers are highly cost-effective. By building a cost-aware model, we can

further improve the cost-effectiveness of our classifiers.

RQ3: How much testing time can our approach save?

Motivation. The goal of our approach is to save developers the time of running all the tests to detect performance regressions. Therefore, in this research question, we would like to answer to what extent developers can benefit from our approach regarding time-saving.

Approach. To measure the time saving from our approach, we start by calculating the time needed without our approach. Since not all the tests are impacted by the code changes in each commit, one may opt to only run the tests that are impacted by each commit (c.f., identifying impacted tests in

Section 4.4.2). Therefore, we measure the average time needed to run the tests impacted per commit

as a baseline.

Afterwards, we measure the average time needed per commit for only running the tests that are predicted by our models in RQ1 to be performance-regression-prone. In particular, we calculate the time needed for each model related to each performance metric (e.g., CPU usage). In practice, developers may decide to run the test if any of the performance metrics are predicted to be performance-regression-prone. Therefore, we also calculate the time needed to run the tests if at least one performance metric is predicted to be performance-regression-prone.

Results. Our classifiers can considerably reduce the time needed for performance tests. We find that even executing only the impacted tests takes hours. Table 4.7 shows that running all the impacted tests per commit for Hadoop takes on average 324 minutes (5.4 hours). The long execution time of rigorous

performance evaluation makes it difficult to be carefully carried out in practice and hence confirms

the motivation of our work.

On the other hand, with our approach, the needed time can be reduced from hours down to a matter of minutes. For example, the needed time to run predicted tests per commit in Cassandra is

as low as 2 to 3 minutes, which represents a 95% to 97% time reduction compared with running only the

impacted tests. Even if developers choose to run the test if any performance metric is predicted to

have a regression, the needed time is 8 minutes, i.e., an 87% time reduction.

Table 4.7: The average needed performance testing time (in minutes) for each commit.

             Impacted   Tests predicted by one model                             Tests predicted
             tests      Res. Time   CPU    Memory   I/O read   I/O write         by any model
Hadoop       324        26          35     18       23         30                91
Cassandra    61         2           3      3        3          2                 8
OpenJPA      313        22          71     35       71         99                114

Our prediction drastically reduces the needed testing time compared with running

only the tests that are impacted by the code changes in a commit.

RQ4: Can our approach detect the introduction of real-life performance issues?

Motivation. In order to demonstrate the practical impact of our approach, we would like to examine whether the prediction results of our approach (cf. RQ1) can be used to assist in detecting the

introduction of performance issues.

Approach. The goal of this RQ is to study whether any test in the performance-issue-introducing commits is predicted by our approach and whether the predicted tests cover the code changes that introduce the performance issue.

We first collect all the performance issues that may have been introduced during our studied period. In particular, we collect all performance issues that are reported after the first commit of our study period in our subject systems. To collect the performance issues, we search for a list of keywords, such as performance and slow, in the content (e.g., title, description and discussion) of each issue. To avoid missing related keywords for performance issues, we identify all similar keywords using Word2vec trained on Stack Overflow4. We manually examine every issue that contains the keywords to ensure the issue is indeed a performance issue, and we label each issue with the performance metrics (e.g., CPU or memory) that can be used to illustrate the performance issue.
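The sketch below illustrates the keyword-expansion and filtering idea, assuming gensim and a Word2vec model trained on Stack Overflow text; the model path and seed keywords are hypothetical and not the exact list used in this thesis.

```python
# A minimal sketch of expanding performance-related keywords with Word2vec
# and filtering issue reports, assuming gensim. The model file and seed
# keywords are hypothetical placeholders.
from gensim.models import KeyedVectors

seed_keywords = ["performance", "slow", "latency", "regression"]
wv = KeyedVectors.load("stackoverflow_word2vec.kv")   # hypothetical pre-trained model

expanded = set(seed_keywords)
for kw in seed_keywords:
    if kw in wv:
        expanded.update(word for word, _ in wv.most_similar(kw, topn=10))

def looks_like_performance_issue(issue_text: str) -> bool:
    # Candidate issues are then examined manually, as described above.
    text = issue_text.lower()
    return any(kw in text for kw in expanded)
```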

However, not all of the performance issues are introduced during our studied period. Therefore, we leverage an SZZ algorithm Sliwerski, Zimmermann, and Zeller (2005) to detect the inducing commit of each performance issue. Studies show that there may exist false positive results from SZZ algorithms, i.e., the commit identified by SZZ is not a true issue-inducing commit Williams and

Spacco (2008). Therefore, we manually check the results from the SZZ algorithm to collect the list of issue-inducing commits and intersect the results with the commits in our studied period, obtaining the commits that indeed introduce a performance issue that is fixed later. Finally, for each of the performance-issue-introducing commits, we examine our prediction results from RQ1 to know whether our approach successfully identifies any test in the commit. In addition, we examine the covered source code of the predicted tests to know whether the covered source code is the location where the performance issue was introduced.

Results. Our approach can be used to detect the introduction of real-life performance issues.

In total, we identify nine performance issues that are introduced during our studied period. Table 4.8

presents the fixing commit of each performance issue, the inducing commits for each issue and their

corresponding performance metrics. Table 4.8 shows that six out of nine issues have at least one

4 The complete list of the keywords is listed in our replication package

true positive prediction, which shows that our approach could be used to detect the introduction of real-life performance issues. More importantly, we find that the code changes that introduce the performance issues are covered by the predicted tests.

Similar to the findings from RQ1, we find that, although Perphecy can detect two more real-life bugs, it produces a large number of false-positive detection results. As shown in Table 4.8, 348 tests are false-positively detected as manifesting performance regressions. Such a large number of false positive results may introduce much overhead in executing the tests and extra effort for the practitioners to manually examine the results.

We manually examine the three issues that are not successfully predicted by our approach, and we find that, for YARN-4307, one of the predicted probabilities of having a run-time performance regression is 0.49, which is almost at the threshold (0.5) of being considered a positive prediction.

The other two issues (YARN-7102 and HDFS-12754) are rather complicated performance issues

(like deadlocks), which are expected to be difficult to capture using our metrics. For the same bugs, because of its tendency to produce false-positive results, Perphecy predicts the tests as manifesting performance regressions for almost all metrics. We also observe a high false positive rate for

OPENJPA-2665. We find that almost all the false positives are related to physical performance metrics, such as CPU and Memory, while the description of the issue is only associated with its impact on response time.

Our approach can be used to detect commits that introduce performance issues.

In addition, the predicted tests cover the code changes that introduce the issue.

Developers in practice could use our approach to prevent the introduction of

performance issues.

RQ5: What are the most important factors in determining performance-regression-prone tests?

Motivation. The results in RQ1 show that our approach can successfully predict performance-regression-prone tests at a given commit. By understanding the influential factors of such tests, we may further

understand the characteristics of performance-regression-prone tests. Such characteristics can be leveraged by practitioners to proactively avoid introducing performance regressions. In particular, developers in practice often conduct code reviews to improve the quality of software, where, due to the complexity of performance regressions, it may be challenging to identify performance regressions based on reviewing source code. With the knowledge of the characteristics of performance-regression-prone tests, one can use such information to prioritize effort during the code review process to identify potential performance regressions. For example, we may learn that changes to loops have an important influence on the introduction of performance regressions. When doing code reviews, or even while writing source code, developers who are aware of this may then consider spending more effort on reviewing such code changes.

Table 4.8: Results of using our approach and Perphecy to detect the introduction of real-life performance issues.

Issue ID           Issue fixing   Issue inducing   Performance   Predicted?   #FP       Predicted?   #FP
                   commit         commit           metric        PerfJIT      PerfJIT   Perphecy     Perphecy
Hadoop
YARN-4307          308d63f        e914220          Resp. time    NO           0         NO           20
                                  7af5d6b          Resp. time    NO                     NO
YARN-5889          5fb723b        d9281fb          Resp. time    YES          1         YES          7
                                                   CPU           NO                     YES
                                                   I/O Write     YES                    YES
YARN-3388          444b2ea        d9281fb          Resp. time    YES          2         YES          13
                                                   I/O Write     NO                     YES
YARN-7102          ff8378e        528b809          Resp. time    NO           0         YES          3
YARN-4862          352cbaa        528b809          Resp. time    NO           0         YES          2
                                                   CPU           YES                    YES
                                                   Memory        NO                     YES
HDFS-12754         738d1a2        decf8a6          Resp. time    NO           0         YES          3
                                                   CPU           NO                     YES
Cassandra
CASSANDRA-12763    d73f45b        b32a9e6          Resp. time    YES          0         YES          2
                                  1b36740          Resp. time    YES                    YES
CASSANDRA-13794    f93e6e3        a7cb009          Resp. time    YES          0         YES          116
                                  88d2ac4          Resp. time    NO                     YES
OpenJPA
OPENJPA-2665       f0286a2        9fa9ef4          Resp. time    YES          46        YES          182

Approach. To address this RQ, we use the variable importance values that are calculated with the random forest classifiers. The variable importance value is calculated by permuting the values of the corresponding metric while keeping the values of the other metrics unchanged. The classifier measures the impact of such a permutation on the classification error rate Li, Shang, Zou, and

E. Hassan (2017). We use the function importance of the randomForest R package to compute the

variable importance values.

To avoid bias caused by analyzing just one classifier and ensure a robust conclusion, for each

classifier that is built for each commit, we use bootstrap to resample the training data and build

random forest predictors with the bootstrap sample data. We repeat the above process 100 times

and collect 100 variable importance values for each change metric in each commit. In detail, we use

the function boot of the boot R package to perform bootstrap resampling.
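A rough Python analogue of this bootstrap-based importance procedure is sketched below using scikit-learn's permutation importance. Note that the thesis itself relies on the R randomForest and boot packages, so this is only an approximation of the idea, and the data and parameters are hypothetical.

```python
# A rough Python analogue of the bootstrap + permutation-importance procedure
# (the thesis uses R's randomForest and boot packages); data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.utils import resample

def bootstrap_importance(X, y, n_boot=100, random_state=0):
    rng = np.random.RandomState(random_state)
    importances = []
    for _ in range(n_boot):
        Xb, yb = resample(X, y, random_state=rng)      # bootstrap sample of the training data
        rf = RandomForestClassifier(n_estimators=100).fit(Xb, yb)
        result = permutation_importance(rf, Xb, yb, n_repeats=5, random_state=rng)
        importances.append(result.importances_mean)    # one importance value per metric
    return np.array(importances)                       # shape: (n_boot, n_metrics)
```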

Afterwards, each metric will have 100 variable importance values in each commit. However, the

difference of variable importance values between two metrics may not be statistically significant.

To find statistically distinct ranks of metrics, we perform Scott-Knott Effect Size Difference (ESD)

test Tantithamthavorn, McIntosh, Hassan, and Matsumoto (2017) to cluster the metrics based on

both their statistical significance and effect sizes. The Scott-Knott ESD test uses hierarchical cluster

analysis to partition different metrics into distinct groups. With this analysis, each metric has a rank

in the classifier for each commit. Since we build a classifier for every commit, we calculate the

average rank of a metric based on its ranks in all the classifiers.

Finally, to understand whether the value of the metric has a positive or negative relationship with

the existence of performance regressions, we compare the average metric values in the tests that are

with and without performance regression by performing a t-test. We consider that the metric has a

positive relationship with the existence of performance regressions if the average metric values in

the tests with the performance regression is higher than that without performance regression. We

only consider the positive or negative relationship if the p-value of the Student T-test von Storch

(1999) is smaller than 0.05.
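The direction check above can be sketched as follows, assuming scipy and hypothetical numpy arrays of a metric's values in the two groups of tests.

```python
# Minimal sketch: determine the positive/negative relationship of a metric,
# keeping the sign only when the t-test is significant (p < 0.05).
from scipy.stats import ttest_ind

def metric_relationship(values_with_regression, values_without_regression, alpha=0.05):
    _, p_value = ttest_ind(values_with_regression, values_without_regression)
    if p_value >= alpha:
        return None                                   # no statistically significant relationship
    if values_with_regression.mean() > values_without_regression.mean():
        return "positive"
    return "negative"
```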

Results. The result of the average rank of important metrics is shown in Table 4.9. Due to limited

space, we only show the top important metrics for the classifiers and models for each performance

metric. Each row in the table presents the average rank of the importance of a metric among the

three subject systems. The arrows in the table indicate whether each metric has a positive (up arrow)

or negative (down arrow) relationship with the probability of having performance regressions.

Traditional metrics are more important than performance-related metrics in the predic-

tion of performance-regression-prone tests. Surprisingly, despite that our performance-related

metrics are derived based on prior research findings (Alam et al., 2017; J. Chen & Shang, 2017; Costa et al., 2017; Huang et al., 2014; L. Song & Lu, 2017) on software performance, we find that most of the top important metrics are traditional metrics. In fact, there is only one performance-related metric, i.e., for chg (changes to for loops), among the top metrics. Therefore, we re-run our approach with random forest classifiers built only on traditional metrics and evaluate it as in RQ1. We find that on average the AUC values of each classifier only decrease by 0.01, 0.06 and 0.05 for Hadoop, Cassandra and OpenJPA, respectively.

Table 4.9: Average rank of the top important metrics in our classifiers. The up/down arrows indicate whether the relationship is positive/negative.

Metric       Response time   CPU    Memory   I/O read   I/O write
AGE          2.2             1.3    1.5      1.4        1.6
SOL          2.6             2.9    2.8      2.3        2.7
LT           3.6             3.3    2.9      2.9        3.1
REXP         4.0             2.5    2.4      3.0        3.7
NDEV         3.1             4.0    4.5      3.7        4.4
Entropy      2.9             4.0    3.3      3.8        3.7
Complexity   5.2             4.4    4.2      4.3        4.6
LA           4.6             4.0    4.4      4.7        4.4
LD           4.7             5.9    5.5      5.4        5.2
for chg      6.4             9.3    8.4      5.6        6.9

The history and size dimensions are more important than other dimensions in the traditional metrics. Metrics SOL and AGE are the two most important factors in the prediction model.

AGE has a negative relationship with performance regressions, which implies that more recent and frequent changes are more likely to introduce performance regressions. Besides AGE, we also find that the metrics in the size dimension (e.g., SOL) have a positive relationship with performance regressions, which is also similar to prior research on commit-level defect prediction Kamei et al. (2013).

Changes to loops (for chg) is the most important factor among the performance-related metrics. This indicates that there is a relationship between changing the operation of a loop and performance regressions. For example, in commit #94a9a5d05 of Hadoop, developers added a for loop into the method getUsers in the source file LeafQueue.java. The for loop adds an item to a queue repetitively, leading to performance regressions in I/O read.

5https://github.com/apache/hadoop/commit/94a9a5d01464e6004f69054bdc308945d40db8e6

We find that the traditional software metrics dominate the important factors in the prediction of performance-regression-prone tests. On the other hand, changes to loops is the only top important factor among the performance-related metrics.

4.6 Discussion

In this section, we discuss the lessons learned during the implementation of our approach, the limitations and future work of our approach, and the generalizability of our study.

4.6.1 Traditional metrics are more important than performance-related metrics

In RQ5, we find that traditional metrics are more important than performance-related metrics when predicting performance-regression-prone tests. Such a result is unexpected, since all our performance-related metrics are derived from prior empirical findings on performance bugs (Alam et al., 2017; J. Chen & Shang, 2017; G. Jin et al., 2012). In order to understand why such metrics do not have high importance when predicting performance-regression-prone tests, we manually examine the tests that are not performance-regression-prone while having a high value in some performance-related metric. We find that the impact of the performance-related metrics often depends on a specific workload, while such a workload may not be exercised by the tests.

For example, in commit #94a9a5d0 in the Hadoop project, developers added a for loop into the class LeafQueue.java, which produces a value for the performance-related loop-change metric. In our prediction model, the two corresponding tests, TestCapacityScheduler.java and TestRMWebServicesCapacitySched.java, are predicted as performance-regression-prone tests. However, the tests are actually not performance-regression-prone because the two tests do not execute the loop a large number of times, which would be needed to demonstrate the regression.

4.6.2 Our approach out-performs Perphecy

In our evaluation, we compare our approach with Perphecy since it is the research approach closest to ours. However, we find that Perphecy does not provide an accurate prediction in our evaluation setting. In particular, we find that Perphecy suffers badly from low precision, e.g., the precision of Perphecy in Hadoop and Cassandra is only 0.06 and 0.02. Although Perphecy has a relatively high recall, it may mean that Perphecy always tends to predict a test as performance-regression-prone. In fact, we find that, for all the tests in our dataset, 61% to 92% of them are predicted as performance-regression-prone by Perphecy. We consider the reason may be that Perphecy only treats its metrics as boolean values based on a threshold. In addition, the tuning strategy and the use of disjunctions to combine all metrics may cause more tests to be predicted as performance-regression-prone. On the other hand, our approach uses a much larger set of metrics and depends on machine learning classifiers instead of simplifying metric values into boolean values to make predictions. In addition, our results in RQ5 show that most of the top influential factors are rather traditional product and process software metrics. The information in these top metrics is not captured in the indicators used by Perphecy. The lack of such information may also cause

Perphecy’s lower precision in our study.

4.6.3 Limitation of our approach and future work

The main limitation of our approach is that our extracted metrics are all based on static code analysis. When applying our approach to detect the introduction of real-life performance issues in RQ4, we find that our approach fails to detect a few performance bugs with complex root causes such as deadlocks. By examining our extracted metrics and manually studying the details of the performance bugs, we understand that our extracted metrics, which are based on static code analysis, may not be able to capture the characteristics of these performance bugs. On the other hand, performance bugs like deadlocks often occur under specific execution conditions, where dynamic data that captures the interactions between procedures when executing the source code is crucial to the detection and prediction of these bugs. Therefore, in order to improve our approach to detect such complex performance bugs, we plan in future work to extract dynamic information from the

test execution and to extract more metrics that capture the interactions between procedures in tests.

4.6.4 Generalizability of our study

Although our approach is not designed for any particular programming language or type of project, there still exist some aspects that may influence the generalizability of our study.

Test coverage. Our approach depends on executing small-scale tests that are readily available in the software system to detect performance regressions. If the source code or the code changes are not covered by the tests, our approach cannot help in detecting the associated regressions. We find that by measuring the method coverage by tests, our studied projects may not have high method coverage. In particular, the method coverage of Hadoop, Cassandra and OpenJPA is only 13.64%,

24.82%, and 11.09%, respectively. However, since our approach works for each code commit, only the changed methods need to be covered by the test. We find that for all the changed methods in all commits, 67.97%, 53.3%, and 64.62% are covered by the tests in Hadoop, Cassandra and

OpenJPA, respectively. Such high coverage ensures the success of our approach. This also implies that, in order to adopt our approach, practitioners may first evaluate whether the source code that is likely to be changed is covered by tests.

The granularity of commits. Different projects and developers may follow different granularities of making code commits. Some practitioners may stack a large amount of code changes in one commit. In such a scenario, it is easier for our approach to do the prediction, while, on the other hand, the prediction results may not be as beneficial to developers since they may still end up with many code changes to investigate. Conversely, a developer may commit very often, leading to commits with very few changes. Since a performance regression is often the result of a combination of contributions from multiple sources, such small commits may isolate the impact, making our approach unable to see the performance influence in each individual commit. In our three studied projects, the average code churn per commit is 155, 170 and 182 for Hadoop,

Cassandra and OpenJPA, respectively. The granularities of the commits among the three projects are rather similar, as is the accuracy of our approach on the three projects. However, future research may perform an in-depth investigation of the relationship between the granularity of commits and the accuracy of our approach.

The quality of the test itself (e.g., test being flaky). Our approach depends on the tests to

evaluate the performance of the associated source code. However, if the test itself is written with sub-optimal quality, the results may be biased. For example, the test failures of a flaky test may introduce noise and require extra running time to achieve the needed repetition. Recent research Ding, Chen, and Shang (2020) discusses the reasons that make tests unsuitable for performance evaluation, which can be leveraged to know how well another project can adopt our approach.

4.7 Threats to Validity

External validity. Due to the huge computing resources (133 machine days in our case studies) needed for identifying performance regressions, we carry out our study on three subject systems. All three subject systems are implemented in Java. Hence, our findings might not generalize to other systems. Future studies may especially focus on commercial closed-source systems written in other languages (such as C++).

Internal validity. We use traditional metrics and performance-related metrics based on the findings of prior research. We choose our classifiers based on their wide usage in prior software engineering research (Kabinna, Bezemer, Shang, Syer, & Hassan, 2018; Tantithamthavorn, McIntosh, Hassan, Ihara, & Matsumoto, 2015) and the high accuracy they typically provide in modeling. There

may exist other metrics that we do not include in our study and other machine learning classifiers

may also achieve an accurate classification. Although our results have shown high (0.86 on average)

AUC values of our prediction models, adding more metrics or employing other classifiers by future

research may further improve the models.

We were only able to collect nine real-life performance issues in our RQ4. However, there can

be other performance issues that are introduced during our studied period while not yet being fixed

or reported. We could not evaluate our approach on the un-reported or un-fixed performance issues.

A long-term study by applying our approach in practice may address this threat.

Construct validity. When extracting performance-regression-prone tests, we execute each test 30

times repetitively to address the flakiness and instability of the test results. The more repetitions, the less biased the testing results are. Future work may vary the number of repetitions to

complement our results. Due to the high cost of tests, we use the release to estimate the mapping

between code and test rather than creating such mapping for every commit. Such mapping might not be perfect since code changes during the release may alter such mapping. Future research can evaluate such an approach or investigate the use of other techniques to generate the mapping between code and test.

The results of our approach are achieved without re-balancing the training data. To know the impact of data re-balancing, we leveraged two re-balancing techniques, i.e., up-sampling and

SMOTE Chawla, Bowyer, Hall, and Kegelmeyer (2002). However, neither of the two techniques can improve our prediction results. Similar findings have been reported in software defect prediction (Tantithamthavorn, Hassan, & Matsumoto, 2018). In addition, we did not fine-tune the parameters of the classifiers, only running XGBoost with different numbers of iterations. Although practitioners mainly find the number of iterations important in XGBoost JAIN (2016), future work can investigate the optimization of other parameters of the classifiers. When measuring the time-saving in RQ3, we did not include the time needed to set up the performance measurement environment and to rebuild models for each commit. With automated scripts, the environment setup only takes seconds and only needs to be done once. In addition, extracting metrics for the newly detected tests and rebuilding models takes only negligible time (less than one minute) in all the cases in our study. We also examine the testing time needed for the initial 50 commits of each subject system when cold-starting our approach without any historical data. The testing time for the initial 50 commits is 2,425 minutes, 2,257 minutes, and 13,099 minutes, respectively. However, such time is only needed once to cold-start our approach.

4.8 Conclusion

In this work, we propose an approach that automatically predicts whether a test would manifest a performance regression given a code commit. The case study results show that our approach can provide accurate prediction results, drastically outperforming a random classifier and being able to detect the introduction of real-life performance issues. In particular, this study makes the following contributions:

• To the best of our knowledge, our work is the first to predict performance-regression-prone

tests at the commit level.

• Our approach can provide accurate prediction results, and save testing time, easing the adop-

tion of the approach in practice.

• The important factors identified in our case study can be leveraged by developers to proac-

tively avoid introducing performance regressions.

Chapter 5

Can we predict whether a configuration option manifests performance variation?

Maintaining good performance is a primordial task when evolving a software system. Performance regression issues are among the dominant problems that large software systems face. In addition, these large systems tend to be highly configurable, which allows users to change the behaviour of these systems by simply altering the values of certain configuration options. However, such flexibility comes with a cost. Such software systems suffer throughout their evolution from what we refer to as “Inconsistent Option Performance Variation” (IoPV). An IoPV indicates, for a given commit, that the performance regression or improvement of different values of the same configuration option is inconsistent compared to the prior commit. For instance, a new change might not suffer from any performance regression under the default configuration (i.e., when all the options are set to their default values), while altering one option’s value manifests a regression, which we refer to as a hidden regression as it is not manifested under the default configuration. Similarly, when developers improve the performance of their systems, a performance regression might still be manifested under a subset of the existing configurations. Unfortunately, such hidden regressions are harmful as they can go unseen into the production environment. In this study, we first quantify how prevalent (in)consistent performance regressions or improvements are among the values of an option. In particular, we study over 803 Hadoop and 502 Cassandra commits,

for which we execute a total of 4,902 and 4,197 tests, respectively, amounting to 12,536 machine hours of testing. We observe that IoPV is a common problem that is difficult to manually predict.

69% and 93% of the Hadoop and Cassandra commits have at least one configuration that hides a performance regression. Worse, most of the commits have different options or tests leading to

IoPV and hiding performance regressions. Therefore, we propose a prediction model that identifies whether a given combination of commit, test, and option (CTO) manifests an IoPV. Our evaluation of different models shows that random forest is the best performing classifier, with a median AUC of 0.91 and 0.82 for Hadoop and Cassandra, respectively. Our work defines and provides scientific evidence about the IoPV problem and its prevalence, which can be explored by future work. In addition, we provide an initial machine learning model for predicting IoPV.

5.1 Introduction

Software systems are increasingly evolving in their size and complexity. Large-scale software systems, such as Amazon, Google, and Facebook, tend to deal with a high operational workload (e.g., millions of active users), which is complicated by the complexity and the continuous evolution of such systems. Such systems usually report more performance bugs than feature-related ones Weyuker and Vokolos (2000). Performance assurance activities are an essential part of the release cycle of such large-scale software systems.

On the other hand, modern large-scale software systems tend to have a large number of con-

figuration options, which can hide performance issues. These options are used to customize the behaviour of a software system without changing its source code. Although these options add

flexibility to a software system, they make testing software performance a challenging task. For example, in theory, one has to run 2^10 tests for a software system with just 10 boolean configuration options, while a highly configurable software system such as Hadoop can have as many as 365 available options Sayagh et al. (2018). While there are constraints between configuration options, bringing down the total number of configurations in practice, this still amounts to too large a set of configurations to test exhaustively, especially for (long-running) performance tests.

Figure 5.1: The definition of IoPV and how it differs from the traditional way of comparing the performance of two values of the same configuration option: (a) approaches that do not consider the historical evaluation, (b) an option with an inconsistent performance variation (a−b), (c) an option with a consistent performance variation (a−b), and (d) an option with an inconsistent performance variation (a−b). V1 and V2 are two different values of the same configuration option. C1 and C2 are two revisions. A smaller performance metric value (e.g., CPU usage) indicates a better performance.

A large body of research proposed and evaluated approaches that detect performance issues (Foo et al., 2010, 2015; T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2011, 2014). Prior studies also

proposed approaches to test the performance of highly configurable software systems (Halin et al.,

2019; Saxena et al., 2000; Wu & Wang, 2010). However, existing approaches aim at a late stage

of the software release cycle, i.e., after the build and deployment of new releases. Nevertheless,

identifying performance issues at earlier stages, especially at the development stage, can minimize

the amount of resources required for identifying and fixing a performance regression. In fact, going

through the whole build, testing, and packaging process to find the root cause of a performance

regression is time-consuming.

Traditionally, prior work studied the difference in system performance caused by different

values of the same option, without considering how the performance impact of an option evolves due

to code changes Sayagh et al. (2018). For instance, traditional approaches compare different values

of a configuration option based on their raw performance values (Diao et al., 2003; Raghavachari

et al., 2003; Siegmund et al., 2015), as illustrated in Figure 5.1a. However, such a comparison is subjective, as the option value V2 with a worse performance might not necessarily be problematic; it might, for example, just enable the execution of some extra features. Instead, even if an option's value has a good performance compared to other values, it might be significantly different when compared to the performance of the same option value in the prior commit, as shown in the example in Figure 5.1b. Although the V2 value has a worse performance, it is much better than in the prior commit. Figure 5.1c shows that both values do not have a large inconsistent performance

variation. On the opposite side, an option's value that has a better performance compared to another value of the same option might be facing a large performance regression compared to the prior commit, as shown in the example of Figure 5.1d. V1 has a better performance compared to V2, but

V1 faces a severe performance regression.

Therefore, different values of an option can have an inconsistent variation in terms of perfor-

mance compared to the prior commit. This happens when one or more values of an option exhibit

a significant difference compared to the prior release. We refer to such an issue as Inconsistent

Options Performance Variation (a.k.a, IoPV). The IoPV might be problematic as it can hide a

performance regression that is manifested under one configuration option. Similarly, when a default

configuration shows an improvement, altering one option’s value may indicate no performance

improvement or even a performance regression. Such regressions can unfortunately go unseen into

the production environment.

In this work, we perform a case study on two large-scale open-source software systems: Hadoop

and Cassandra. We first conduct a preliminary study to quantify the prevalence of the IoPV problem.

We observe that 81% of the commits have at least one option manifesting an IoPV issue. We also

observe that manually identifying such issues is challenging, as commits do not share the same

options that manifest an IoPV. That motivates us to propose an automated model that predicts if

the combination of a Commit, a Test, and an Option (CTO) would exhibit an IoPV problem. We

evaluate our prediction model using the following two research questions:

• RQ1. How accurately can we predict IoPV problems?

Our prediction model reaches an AUC up to 0.93 and 0.90 for predicting IoPV for Hadoop

and Cassandra, respectively. Besides, we observe that random forest is the best performing

model for four and three out of five performance measures (i.e., response time, CPU, memory,

I/O Read, and I/O write) for Cassandra and Hadoop, respectively.

• RQ2. What are the most important metrics for predicting IoPV problems?

We observe that all four dimensions of metrics considered in our study, namely the code

structure, code change, code token, and configuration options metrics, have a statistically

significant impact in predicting IoPV. The dimensions that are related to the configuration

options and the tokens of the changed code are the most important dimensions for both case studies.

Table 5.1: Our definition of configuration, option, and value

Term            Definition                                                Example
Option          A typed, configurable item that allows users to set      A
                different values.
Value           A specific assignment of a value for an option.          A = 1
Configuration   An assignment of values to all options by a user.        A = 1; B = 2

Organization. The rest of this work is organized as follows: Section 5.2 provides the background information and defines IoPV. Section 5.7 provides prior work related to our work. Section 5.3

discusses our approach to conducting experiments and collecting data. Sections 5.4 and 5.5 present our results. Section 5.6 discusses the threats to the validity of our findings. Finally, Section 5.8 concludes the work.

5.2 Background

Software configuration is a mechanism used to customize the behaviour of a software system without changing the source code. The configuration options are often stored in configuration files as a set of key-value pairs, where the key represents an option's name and the value represents a default or user-chosen value for that option. We define a configuration as one particular assignment of a value to all existing options. Table 5.1 lists the definition of these terms. For example, A=1 and B=2 is one possible configuration for a software system with the two integer options A and

B. Configuration options enable users to adapt the execution of their software systems by simply modifying the values of certain configuration options, without re-compilation. For example, a user can change the directory that stores the cache for Cassandra by changing the value of the saved_caches_directory configuration option.

Although configuration introduces large flexibility for users, considering all the possible con-

figurations during testing is impossible. A software system with only three boolean configuration

options requires testing 2^3 configurations. In fact, configuration problems are among the dominant problems in software engineering (D. Jin et al., 2014; Sayagh et al., 2018).

In particular, a software system can suffer from what we refer to as Inconsistent Options

Performance Variation (a.k.a, IoPV). This occurs when, for a given commit C, the performance

of a subset of an option’s values evolved differently relative to their performance in the commit

prior to C. Considering the example in Figure 5.1, when comparing the raw performance of the two

option values V1 and V2 (Figure 5.1a), we observe that V1 shows a better performance than V2.

However, that might not be problematic as V2 might just enable an extra feature, such as logging a

transaction. In fact, in Figure 5.1b the performance of V2 is improved compared to the prior commit,

while that improvement does not manifest under option value V1. Similarly, Figure 5.1d shows that

even if V2 does not show any significant performance variation from the prior commit, V1 suffers

from a performance regression. A performance variation is calculated as the difference between the performance of each option's value after and before each commit, which is illustrated in

Figure 5.1 by “a − b”.
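To make the intuition concrete, the sketch below compares, for each value of an option, the repeated measurements of a performance metric before and after a commit, and flags an IoPV when the values of the same option disagree on whether the performance changed. This is only a simplified sketch assuming scipy and a hypothetical data structure; the exact discretization used in this thesis is described in Section 5.3.

```python
# A simplified sketch of the IoPV intuition: values of the same option
# disagree on whether the performance changed between two commits.
from scipy.stats import mannwhitneyu

def changed(before_runs, after_runs, alpha=0.05):
    # True if the metric differs significantly between the two commits
    _, p = mannwhitneyu(before_runs, after_runs, alternative="two-sided")
    return p < alpha

def is_iopv(option_measurements):
    # option_measurements: {value: (before_runs, after_runs)} for one option,
    # one test and one commit (a hypothetical structure)
    verdicts = {v: changed(b, a) for v, (b, a) in option_measurements.items()}
    return len(set(verdicts.values())) > 1   # values disagree -> inconsistent variation
```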

5.3 Data Collection

In this section, we present our subject systems and our approach to collect performance regres-

sions and configuration data.

5.3.1 Subject Systems

In this work, we consider Hadoop and Cassandra as two subject systems. Hadoop White

(2009) is a distributed system infrastructure, whereas Cassandra is a distributed NoSQL database

management system. We choose these two subject systems since their performance is critical for

the users and they have been studied in prior research on mining performance data (T.-H. Chen et

al., 2014; Syer et al., 2017). In addition, they have a large number of configuration options. The overview of our subject systems is shown in Table 5.2.

Table 5.2: Our studied dataset.

Subjects     # Studied releases   Last release   # Commits   # Configuration options   # Tests
Hadoop       7                    2.7.3          803         365                       1,853
Cassandra    5                    3.0.15         502         162                       369

5.3.2 Data Gathering

To answer our research questions, we followed the approach that is summarized in Figure 5.2 and discussed below.

[Figure 5.2 shows the seven steps of our approach: (1) filtering commits, (2) extracting options, (3) identifying impacted tests, (4) identifying impacted options, (5) evaluating performance for each test and each option, (6) statistical analyses on the performance evaluation results, and (7) discretizing each CTO into IoPV and non-IoPV.]

Figure 5.2: An overview of our approach to collect data.

Filtering Commits

Since we study performance variation across different versions of a software system, we only consider source code related changes. In particular, we filter out commits without any java source code changes. Furthermore, developers can commit multiple changes toward fixing the same issue, which is defined in the issue tracking system. As Hadoop and Cassandra use JIRA as a tracking system and have an explicit mapping between commits and issues, we use the issue id mentioned in the commit messages to identify the commits that belong to the same issue. If multiple commits are associated with the same issue, we only consider the last commit. This is important as developers can initially introduce a regression but then fix it before releasing the code changes related to the issue.
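The sketch below illustrates this filtering step with GitPython: keep only commits that touch Java files and, for commits that reference the same JIRA issue, keep the last one. The repository path, branch name and issue-id pattern are hypothetical.

```python
# A minimal sketch of the commit filtering step, assuming GitPython.
import re
from git import Repo

ISSUE_ID = re.compile(r"(HADOOP|HDFS|YARN|MAPREDUCE|CASSANDRA)-\d+")  # hypothetical pattern

def filter_commits(repo_path, branch="trunk"):
    repo = Repo(repo_path)
    last_commit_per_issue = {}
    for commit in repo.iter_commits(branch, reverse=True):            # oldest to newest
        touches_java = any(path.endswith(".java") for path in commit.stats.files)
        match = ISSUE_ID.search(commit.message)
        if not touches_java or not match:
            continue
        last_commit_per_issue[match.group(0)] = commit.hexsha          # later commits overwrite earlier ones
    return last_commit_per_issue
```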

Extracting Options

In the second step, we extract configuration options and their corresponding values for each subject system (i.e., 355 and 162 configuration options in the last studied releases of Hadoop and

Cassandra, respectively). We obtain option names and default values by crawling the documentation of Hadoop1 and Cassandra2 and extracting the configuration file that is shipped with the project's releases. Finally, we manually classified the extracted options based on their expected data types (e.g., Boolean when the default value is TRUE or FALSE).

Identifying Impacted Tests

We automatically create a mapping between the changed source code in each commit and the existing unit tests. We derive such a commit-test mapping based on automatically generated method-level code coverage results, similarly to our prior study J. Chen et al. (2020). For each commit, we automatically add logging instrumentation to each method, which prints log messages that indicate the execution of the method at runtime3. We then run each test for the commit. A test is considered impacted by the commit if any instrumented logging is outputted. Afterwards, we only run the tests that execute the changed source code for a given commit, since executing all the existing tests of a software system for each commit and each possible configuration is practically infeasible. In addition, running the tests that are not impacted by the code change of a commit is not likely to detect performance variations (regressions or improvements under some values of an option).
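A minimal sketch of deriving the commit-test mapping from such instrumentation logs is shown below; the log format and variable names are hypothetical.

```python
# Minimal sketch: a test is impacted if it executes any method changed in the commit.
def impacted_tests(changed_methods, test_logs):
    """changed_methods: set of fully qualified method names changed in the commit.
    test_logs: {test_name: iterable of log lines, each naming an executed method}."""
    impacted = []
    for test, lines in test_logs.items():
        executed = {line.strip() for line in lines}
        if executed & set(changed_methods):      # the test covers at least one changed method
            impacted.append(test)
    return impacted
```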

Identifying Impacted Options

Similar to identifying which tests to run for a given commit, we also select which configuration options to change when running each of the selected tests. To do so, we first identify how Hadoop and Cassandra access the value of a configuration option by searching for option names in the source

1 https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
2 https://cassandra.apache.org/doc/latest/configuration
3 Note that the instrumented versions are only used to identify impacted tests and options; they are not used in our actual performance evaluation.

code files. We found that all the options are accessible via getters that are defined in one provider class (e.g., DatabaseDescriptor.java4 to access Cassandra's options). Second, we identify the methods that invoke these configuration options' getters. Finally, we dynamically identify which of these methods are executed when running a test, similar to the approach discussed in Section 5.3.2. If any method that can access an option is executed, the option is considered impacted by the commit.

Evaluating Performance

After obtaining which tests and which options are impacted by each commit, we exercise each test on the commit and its parent commit (i.e., the previous commit) to evaluate their respective performance. We first execute each test with all the configuration options set to their default values. Then we alter the value of one configuration option at a time. For configuration options with Boolean values, we alter the configuration option to the value that is not the default. For example, if the default value is TRUE, we would alter the value to FALSE. For numeric options, we alter the configuration option to double the default value and to half the default value. For example, if a configuration option has a default value of 100, we would run the test with the value altered to 200 and then run the same test with the value altered to 50. For enumerated options, we test each of the possible values.
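The value-alteration rules described above can be summarized by a small helper like the following sketch; the option metadata format (a dictionary with a type, a default, and possible values) is a simplifying assumption.

def altered_values(option):
    """Return the non-default values to test for one configuration option.

    `option` is assumed to be a dict such as
    {"name": "...", "type": "boolean" | "numeric" | "enum", "default": ..., "values": [...]}.
    """
    if option["type"] == "boolean":
        # Flip the default: TRUE becomes FALSE and vice versa.
        return [not option["default"]]
    if option["type"] == "numeric":
        # Double and halve the default value, e.g., 100 -> [200, 50].
        return [option["default"] * 2, option["default"] / 2]
    if option["type"] == "enum":
        # Try every possible value other than the default.
        return [v for v in option["values"] if v != option["default"]]
    return []

# Example (hypothetical option):
# altered_values({"name": "io.file.buffer.size", "type": "numeric", "default": 100}) -> [200, 50.0]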

Our performance evaluation environment is based on Google Compute Engine5 with 8GB of memory and a 16-core CPU. In order to generate statistically rigorous performance results, we adopt the practice of repetitive measurements T.-H. Chen, Shang, Hassan, et al. (2016) to evaluate performance. Conservatively, we executed each test 30 times independently, which is larger than prior work that repeats a test only 5 to 20 times (Laaber & Leitner, 2018; Laaber et al., 2019; Leitner & Cito, 2016).

To measure the performance that is associated with each test, we use a performance monitoring

tool named psutil 6 (Python system and process utilities). Psutil can capture detailed performance

metrics and has been widely used in prior research (J. Chen & Shang, 2017; Yao et al., 2018). We

4 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
5 https://cloud.google.com/compute
6 https://github.com/giampaolo/psutil

collect both domain-level and physical-level performance metrics. In particular, we collect five performance metrics during the execution: response time, CPU usage, memory usage, I/O read, and I/O write. The entire data collection process took around 12,536 machine-hours.
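A minimal sketch of how psutil can be used to sample the performance metrics of a running test process is shown below; the sampling interval and the way the test command is launched are illustrative assumptions.

import subprocess
import time
import psutil

def measure_test(cmd, interval=1.0):
    """Run a test command and sample CPU, memory, and I/O of its process with psutil."""
    proc = subprocess.Popen(cmd)
    handle = psutil.Process(proc.pid)
    samples = []
    start = time.time()
    while proc.poll() is None:
        try:
            with handle.oneshot():
                io = handle.io_counters()
                samples.append({
                    "cpu_percent": handle.cpu_percent(interval=None),
                    "memory_rss": handle.memory_info().rss,
                    "io_read_bytes": io.read_bytes,
                    "io_write_bytes": io.write_bytes,
                })
        except psutil.NoSuchProcess:
            break  # the test process finished between polling and sampling
        time.sleep(interval)
    response_time = time.time() - start  # elapsed wall-clock time of the test
    return response_time, samples

# Example: rt, metrics = measure_test(["mvn", "-q", "-Dtest=TestKMS", "test"])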

Statistical Analyses on Performance Evaluation Results

To identify the IoPV, we statistically compare the performance of a given test and a configuration option value before and after each commit using the Mann-Whitney U test Nachar et al. (2008) (i.e.,

α = 0.05) and Cliff's delta Becker (2000) that measures the magnitude of performance regressions.

We choose the Mann-Whitney U test since it does not make any assumption about the distribution of the

data. Researchers have shown that reporting only the statistical significance may lead to erroneous

results (i.e., if the sample size is very large, p-value can indicate statistical significance even if the

difference is trivial). Thus, we also use Cliff's delta to quantify the magnitude of the differences

(a.k.a., effect sizes). Cliff's delta measures the effect size statistically and has been used in prior

engineering studies (Kitchenham et al., 2002; Li, Chen, Shang, & Hassan, 2018; Liao et al., 2020).

Cliff's delta ranges from -1 to +1, where a value of 0 indicates two identical distributions. Therefore,

each commit, test, and option’s value has a Cliff's delta value. We then calculate the differences

between the maximum and minimum Cliff's delta for an option’s different values, which we used

later on to discretize a commit, test, and option as an IoPV or a non-IoPV, as discussed in the

following paragraph.

We consider a test to manifest a performance regression when the effect size is positive and of medium (0.33 < Cliff's delta ≤ 0.474) or large (Cliff's delta > 0.474) magnitude. On the other hand, we consider a test to manifest a performance improvement if the effect size is negative and of medium (−0.474 ≤ Cliff's delta < −0.33) or large (Cliff's delta < −0.474) magnitude.

Note that we consider this statistical analysis for each performance metric (i.e., response time,

CPU usage, memory usage, I/O read and I/O write) separately. For example, a commit may show a

CPU regression or improvement but not show any difference in response time.
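The per-metric comparison described above can be sketched as follows; scipy provides the Mann-Whitney U test, while Cliff's delta is computed directly from its definition, and the thresholds follow the values used above. The direction convention (positive delta means regression) assumes larger metric values indicate worse performance.

from itertools import product
from scipy.stats import mannwhitneyu

def cliffs_delta(new, old):
    """Cliff's delta between two samples: +1 means every new value exceeds every old value."""
    gt = sum(a > b for a, b in product(new, old))
    lt = sum(a < b for a, b in product(new, old))
    return (gt - lt) / (len(new) * len(old))

def compare(new, old, alpha=0.05):
    """Classify one performance metric of one CTO as regression, improvement, or no change."""
    p = mannwhitneyu(new, old, alternative="two-sided").pvalue
    d = cliffs_delta(new, old)
    if p < alpha and d > 0.33:      # medium or large positive effect size -> regression
        return "regression", d
    if p < alpha and d < -0.33:     # medium or large negative effect size -> improvement
        return "improvement", d
    return "no-change", d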


(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.3: The automatically obtained threshold for splitting option variation into IoPV and non-IoPV groups for Hadoop.

Discretizing CTO into IoPV and non-IoPV

In the final step, we categorize each commit, test, and option (CTO) as IoPV or non-IoPV based on an automatically determined threshold. Our intuition is that the maximum difference values are concentrated in either small values (i.e., when adjusting an option does not make a difference) or large values (i.e., when adjusting an option does make a difference), as illustrated in Figure 5.1. Specifically, we use Ckmeans.1d.dp J. Song (2016), a one-dimensional clustering method, to find a threshold that separates the maximum difference values into two groups, i.e., IoPV and non-IoPV (see Figures 5.3 and 5.4). Note that the option variation ranges between 0, when there is no variation, and 2, when the effect size (c.f. Section 5.3.2) is 1 for one option value and −1 for another value of the same option. Most of the automatically obtained thresholds are close to 1. That indicates, as an example, that one option value might show a large performance effect size compared to the prior commit, while another value for the same option does not show any difference from the prior commit (effect size equal to 0). A second example is when one option's value shows a large performance improvement over the prior commit (effect size equal to −1), while another value of the same option does not show any statistically significant difference (effect size equal to 0).
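A minimal, dependency-light sketch of the one-dimensional two-cluster split is shown below; it is the k = 2 special case of what Ckmeans.1d.dp computes optimally, whereas the actual analysis uses the Ckmeans.1d.dp package.

import numpy as np

def split_threshold(diffs):
    """Find the cut that best separates 1-D values into two groups (k = 2).

    `diffs` are the per-CTO differences between the maximum and minimum Cliff's delta
    across an option's values (ranging from 0 to 2). The returned threshold minimizes
    the within-cluster sum of squares, mimicking Ckmeans.1d.dp for two clusters.
    """
    x = np.sort(np.asarray(diffs, dtype=float))
    best_cut, best_cost = None, np.inf
    for i in range(1, len(x)):
        left, right = x[:i], x[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost, best_cut = cost, (x[i - 1] + x[i]) / 2
    return best_cut

# Each CTO whose max-min Cliff's delta difference exceeds the threshold is labeled IoPV.
# threshold = split_threshold(option_variation_values)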


(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.4: The automatically obtained threshold for splitting option variation into IoPV and non-IoPV groups for Cassandra.

5.4 Preliminary Study: Quantifying the Prevalence of IoPV and the

Challenge of Identifying IoPV

This preliminary study sheds light on the problem of IoPV, which can be further explored by future work. We quantify, through this preliminary study, the existence of the IoPV problem in large and highly configurable software systems, as well as how difficult it is to identify IoPV.

This preliminary analysis will also motivate the need for an approach that automatically identifies the IoPV. In particular, we address the following preliminary research questions:

PQ1: How common are IoPV issues?

PQ2: How difficult is it to manually identify IoPV issues?

PQ1. How common are IoPV issues?

Motivation

The goal of this preliminary research question is to quantify and provide scientific evidence on how often a configuration option can suffer from the IoPV problem. While a new code change might not show any performance regression under the default configuration, another configuration can hide a performance regression that can go unseen into the production environment. This is an important problem as performance issues often lead to serious monetary losses7. Similarly, a performance improvement might not be manifested under all configurations.

7 https://www.eweek.com/networking/it-outages-cause-businesses-26.5-billion-in-lost-revenue-each-year-survey


(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.5: Percentage of IoPV for each commit of Hadoop.

Approach

To quantify the prevalence of IoPV, we followed the approach discussed in Section 5.3 to collect the performance data. In particular, we first collect performance measurements for each commit, test, and option (which we refer to as CTO) and label each CTO as IoPV or non-IoPV. Then, we identify for each commit and unit test the number of configurations under which the performance is statistically significantly worse (a.k.a., performance regression) or better (a.k.a., performance improvement) than the performance of the same test and configuration in the prior commit. Finally, we quantify for each commit the number of tests that show a performance regression or a performance improvement under just a subset of the existing configurations. In the studied Hadoop and Cassandra releases, there are 4,902 and 4,197 CTO, respectively.

Result

The IoPV is a common problem, as 61% and 91% of our studied CTO in Hadoop and Cassandra suffer from the IoPV problem. In addition, each Hadoop and Cassandra commit has a median percentage of 43% and 96% of the tests and options that manifest at least one IoPV, as further detailed in Figures 5.5 and 5.6. Although only a small percentage of Hadoop tests and options suffer from an IoPV in terms of response time, the other performance metrics show a large percentage of tests and options that suffer from IoPV, as shown in Figure 5.5. On the other hand, the percentage of tests and options that suffer from an IoPV is larger than that of the tests and options that do not face an IoPV for Cassandra across all the performance metrics, as shown in Figure 5.6.

24% and 54% of the CTO in Hadoop and Cassandra, respectively, show a performance regression on at least one performance measure when the default configuration does not show any performance regression.

(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.6: Percentage of IoPV for each commit of Cassandra.

68% and 89% of our studied commits in Hadoop and Cassandra have at least one CTO that shows a performance regression under one non-default configuration while showing no regression under the default configuration, which indicates that having a hidden performance regression is common. For instance, we observe a performance regression in terms of response time on 42 and 1,023 out of 4,902 and 4,197 Hadoop and Cassandra CTO, respectively, when the default configuration does not show any regression, as shown in Table 5.3. As shown in the same table, the performance measures that suffer the most from the IoPV problem are the I/O write and CPU measures. In addition, these are not minor regression differences, as 78% of the regressions are large based on our effect size analysis.

14% and 20% of the CTO in Hadoop and Cassandra show a performance regression when the default configuration shows a performance improvement. Therefore, improving the performance of a system should consider different configurations. For example, 439 and 594 Hadoop and Cassandra CTO show a CPU performance regression when the default configuration shows an improvement, as shown in Table 5.4. This problem occurs for all of the studied performance measures. Similarly to our last finding, 58% and 65% of the commits in Hadoop and Cassandra, and a median of 10% and 14% of CTO per commit in Hadoop and Cassandra, are impacted by such a problem.

Note that we also observe cases for which an option manifests a regression under the default value, but a non-regression or even an improvement under other values of the same option, as shown in Table 5.5. 25% and 34% of the CTO in Hadoop and Cassandra have a performance regression under the default value but non-regression under other values.

Table 5.3: Number of CTO with no regression under the default option value but with regression under other option values. Medium (large) means the effect size Cliff's delta of performance regression is medium (large).

                     Response time     CPU               Memory            I/O read          I/O write
Subject     #CTO     large  medium     large   medium    large  medium     large   medium    large  medium
Hadoop      4,902    24     18         517     84        208    214        208     214       528    98
Cassandra   4,197    600    423        1,094   352       788    404        1,033   363       921    326

Table 5.4: Number of CTO with improvement under the default option value but with regression under other option values.

                     Response time     CPU               Memory            I/O read          I/O write
Subject     #CTO     large  medium     large   medium    large  medium     large   medium    large  medium
Hadoop      4,902    4      3          425     14        102    36         170     30        426    46
Cassandra   4,197    122    53         450     95        220    52         412     93        327    74

PQ2. How difficult is it to manually identify IoPV issues?

Motivation

The goal of this preliminary question is to understand how difficult the manual identification of IoPV (i.e., identification of IoPV without running the tests) is. For instance, the higher the number of options that manifest an IoPV across a large number of commits and tests, the more difficult the identification of IoPV is, as it indicates that an IoPV can occur in an unexpected way and any option can be responsible for such a problem. On the other hand, the lower the number of options that suffer from an IoPV, the easier it is to test all of these IoPV-responsible options.

Approach

To investigate the difficulty of identifying an IoPV, we calculate the overlap of the CTO triplets between each pair of commits using the Jaccard similarity, defined as follows:

J(C_1, C_2) = \frac{|CTO_{C_1} \cap CTO_{C_2}|}{|CTO_{C_1} \cup CTO_{C_2}|}    (2)

where C1 and C2 refer to a pair of commits (both consecutive and non-consecutive commits). |CTO_{C1} ∩ CTO_{C2}| is the number of ⟨test, option, IoPV⟩ triplets that the two commits share (i.e., the intersection), and |CTO_{C1} ∪ CTO_{C2}| is the total number of unique triplets in commits C1 and C2 (i.e., the union).

Table 5.5: Number of CTO with regression under the default option value and non-regression/improvement under other values.

                     Response time     CPU               Memory            I/O read          I/O write
Subject     #CTO     large  medium     large   medium    large  medium     large   medium    large  medium
Hadoop      4,902    17     9          431     60        128    200        228     64        441    86
Cassandra   4,197    236    222        592     298       314    229        553     264       439    229

Table 5.6: Example of Jaccard similarity between each pair of commits. The two commits share two of the five unique triplets. Thus the Jaccard similarity is 2/5 = 0.4.

Commit     Test     Configuration option   IoPV
commit1    test 1   option a               1
           test 2   option a               0
           test 2   option b               0
commit2    test 1   option a               1
           test 2   option a               0
           test 2   option b               1
           test 3   option a               0

The Jaccard similarity ranges between 0 and 1, where a value of 1 means that the pair of commits share exactly the same triplets, while 0 indicates that the pair of commits do not share any triplet. For example, in Table 5.6, there are three CTO in Commit1 and four CTO in Commit2. The size of the intersection |CTO_{C1} ∩ CTO_{C2}| is 2 and the size of the union |CTO_{C1} ∪ CTO_{C2}| is 5. Therefore, J(C1, C2) between Commit1 and Commit2 is 2/5 = 0.4.
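The computation can be sketched as follows, with each commit represented by its set of (test, option, IoPV) triplets as in Table 5.6:

def jaccard(cto_c1, cto_c2):
    """Jaccard similarity between two commits' sets of (test, option, IoPV) triplets."""
    inter = len(cto_c1 & cto_c2)
    union = len(cto_c1 | cto_c2)
    return inter / union if union else 0.0

# Reproducing the example of Table 5.6:
commit1 = {("test 1", "option a", 1), ("test 2", "option a", 0), ("test 2", "option b", 0)}
commit2 = {("test 1", "option a", 1), ("test 2", "option a", 0),
           ("test 2", "option b", 1), ("test 3", "option a", 0)}
print(jaccard(commit1, commit2))  # 0.4, i.e., 2 shared triplets out of 5 unique ones
# The Jaccard distance plotted in Figures 5.7 and 5.8 is simply 1 - jaccard(...).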

Results

IoPV problems are hard to manually predict. In particular, 81% and 100% of the Hadoop and Cassandra commits show at least one CTO with an IoPV. Similarly, all the options of Hadoop and Cassandra suffer at least once from an IoPV across the studied commits. Table 5.7 shows more details about how common IoPV problems are among the studied commits, tests, and options. In summary, our results indicate that the IoPV problem is not limited to a small set of commits, tests, or options, which makes it a challenge to predict which CTO would have an IoPV.

Even if most of the commits show at least one IoPV, it is not easy to predict which test and option may suffer from the IoPV. Figure 5.7 and Figure 5.8 show the pairwise Jaccard distance between

(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.7: Pairwise Jaccard distance between the triplets of the studied commits of the Hadoop system. The x-axis and y-axis show the studied commits, ordered chronologically from left to right on the x-axis and bottom to top on the y-axis. Each cell of the Figure refers to the Jaccard distance of any pair of commits: the darker the color is, the larger the distance is.

(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.8: Pairwise Jaccard distance between the triplets of the studied commits of the Cassandra system. The x-axis and y-axis show the studied commits, ordered chronologically from left to right on the x-axis and bottom to top on the y-axis. Each cell of the figure refers to the Jaccard distance of any pair of commits: the darker the color is, the larger the distance is.

the triplets of the studied commits in the Hadoop and Cassandra systems, respectively. The figures indicate that most of the commits do not share any triplets (i.e., dark cells), especially for the Cassandra system (i.e., more dark cells). In other words, different commits are unlikely to have the same tests and options that can lead to IoPV problems.

Therefore, it is difficult for developers to manually identify which tests and options they need to run and configure to verify the existence of IoPV.

In order to understand why different commits show inconsistent triplets, we manually analyze some commits that show the largest Jaccard distance from other commits

(i.e., with dark horizontal/vertical lines in Figure 5.7 and Figure 5.8). We consider the response time measure as an example in our manual analysis. In particular, there are two and six commits with large Jaccard distance (> 0.8) to all other commits in Hadoop and Cassandra, respectively.

For Hadoop, we find that most of the options cause IoPV in the test named TestKMS.java in the Hadoop-common sub-project. Then, we pick one8 of the two commits of Hadoop to manually examine the configuration options and the commit changes. We find that the options that cause IoPV are usually related to connection time, such as the options dfs.ha.fencing.ssh.connect-timeout and fs.s3a.connection.timeout. By examining the code in the test TestKMS.java, we find that TestKMS.java loads the connection timeout configuration options. Thus, the commits that impact such connection time related options and the test may lead to IoPV problems while other commits may not lead to the same IoPV. For Cassandra, we select one commit9 with the largest Jaccard distance to other commits. Our results show that two tests named EmbeddedCassandraServiceTest and DebuggableScheduledThreadPoolExecutorTest manifest the largest performance regression regarding the options max_hints_file_size_in_mb and memtable_heap_space_in_mb, respectively. By manually examining the commit changes covered by the tests, we find that there exist code changes in the method start within the Java file EmbeddedCassandraService.java10. Such a change adds function calls to initialize the options and leads to performance regressions. In particular, 69% of commits in Hadoop and 96% of commits in Cassandra have a Jaccard distance of more than 0.5. In summary, different commits may lead to different options and tests that exhibit IoPV problems.

IoPV is a common problem and it is difficult to manually identify IoPV without

exhaustively running the tests. Our results suggest the need for an approach that

automatically identifies which CTO manifests an IoPV.

8 https://github.com/apache/hadoop/commit/b17d365f
9 https://github.com/apache/cassandra/commit/0fe82be8
10 https://github.com/apache/cassandra/blob/0fe82be83cceceb12172d63913388678253413bc/src/java/org/apache/cassandra/service/EmbeddedCassandraService.java

Table 5.7: Number of unique commits, tests, options with IoPV problems.

                           Commit            Test              Option
                           Total    IoPV     Total    IoPV     Total    IoPV
Hadoop      Res. time      74       27       74       13       122      74
            CPU            74       47       74       62       122      113
            Memory         74       38       74       59       122      113
            I/O read       74       45       74       53       122      108
            I/O write      74       47       74       57       122      117
            Any metric     74       60       74       67       122      117
Cassandra   Res. time      57       55       216      189      54       43
            CPU            57       55       216      204      54       49
            Memory         57       53       216      202      54       43
            I/O read       57       56       216      202      54       44
            I/O write      57       53       216      192      54       39
            Any metric     57       56       216      208      54       50

5.5 Predicting IoPV Problems

RQ1. How accurately can we predict IoPV problems?

Motivation

The goal of this research question is to evaluate different classification approaches at predicting for which CTO one has to check multiple values of an option. In our preliminary study, we observe that IoPV is common and hard to manually predict, which indicates that developers need to test different values for each option. However, as there are typically a large number of configuration options (e.g., Hadoop version 2.7.3 has 365 configuration options) with different possible values, exhaustively experimenting with all different options for each test in performance testing is time- and resource-consuming. In this RQ, we aim to reduce the effort of conducting configuration-aware performance testing by predicting the need for adjusting a configuration option for a test when a code change is made (i.e., for a CTO). Specifically, our approach predicts whether a CTO manifests

an IoPV, such that developers can make an informed decision on whether they should consider

different values for that option in their performance testing.

Approach

In this RQ, we build ML models to predict whether a CTO manifests an IoPV. Below, we describe the detailed steps involved in our modeling process.

Step 1. Data preparation.

Step 1.1. Defining the target variable. Our target variable is a binary variable that indicates whether a CTO manifests an IoPV, which we obtain following the approach discussed in Section 5.3. In particular, after collecting the performance measurements, we calculate the performance variation for each option and discretize each CTO into IoPV or non-IoPV following the discretization approach of Section 5.3.2.

Step 1.2. Selecting the features. We consider four dimensions of software metrics that are

related to the likelihood of a configuration option impacting the performance testing of a code

commit for each test (i.e., of a CTO). Table 5.8 lists the detailed metrics used in our models.

We already found in our prior work J. Chen et al. (2020) that the code structure and code change

dimensions are important for predicting performance regressions, but we did not consider the impact

of different configurations on the manifestation of performance regressions. Therefore, we use the

prior dimensions as well as an additional dimension about the configuration options.

Code change metrics. This dimension contains metrics that capture the code changes in a

commit (e.g., the number of changed lines of code). Some code changes (e.g., big code changes)

might be more likely to impact how different values of a configuration option make a difference in

performance testing.

Code structure metrics. This dimension contains metrics that describe the static structure of

the source code files (e.g., Cyclomatic complexity). Our intuition is that changing certain code (e.g.,

complex code) is more likely to alter the performance impact of a configuration option.

Code token metrics. This dimension considers the code tokens of the methods that are changed

in the commit and the test file. Some code tokens (e.g., “speed”) may be more related to the

performance than other tokens. We use the lscp11 tool to extract the code tokens from the source

code.

11 Lscp is a lightweight source code preprocessor that can be used to isolate and manipulate the linguistic data (i.e., identifier names, comments, and string literals) from the source code: https://github.com/doofuslarge/lscp

Table 5.8: Overview of our selected metrics.

Dimension: Code change
• Number of modified subsystems: The more subsystems are changed, the higher risk the change may be Mockus and Weiss (2000).
• Number of modified directories: Changing more directories may more likely introduce performance regressions Mockus and Weiss (2000).
• Number of modified files: Changing many source files is more likely to cause performance regressions N. Nagappan et al. (2006).
• Distribution of modified code across files: Scattered changes are more likely to introduce performance regressions Hassan (2009).
• Number of modified methods: Changes altering many methods are more likely to introduce performance regressions Zimmermann et al. (2007).
• Number of lines of code (SLOC) in tests: A program with more lines is more likely to suffer from performance regressions Koru et al. (2009).
• Lines of code added: The more lines of code added, the higher the risk that the program will suffer from performance regressions.
• Lines of code deleted: The more lines of code deleted, the higher the risk that a performance regression is introduced N. Nagappan and Ball (2005).

Dimension: Code structure
• Number of methods in impacted test: A program with a large number of methods is more likely to suffer from performance regressions.
• McCabe Cyclomatic complexity: A program with higher complexity is more likely to suffer from performance regressions Hassan (2009).
• Number of called subprograms: Large called subprograms will amplify the regression if there exist performance regressions in the called program N. Nagappan et al. (2006).
• Number of calling subprograms: Large calling subprograms will amplify the regressions if there exist performance regressions in the called program N. Nagappan et al. (2006).

Dimension: Code token
• Code tokens of the changed source code: Some code tokens may be more related to performance than other tokens.

Dimension: Configuration option
• Split configuration option names: The name of a configuration option may be related to a specific performance metric.

Configuration option metrics. This dimension considers the characteristics of the configuration option (e.g., the tokens in the configuration option name). We assume that some options (e.g., system resource-related options) are more likely to impact the performance regression detection results of a commit.

Step 1.3. Pre-processing the features. The code token metrics include thousands of unique code tokens. Thus, we need to pre-process such metrics into a numeric representation. We consider three approaches to pre-process the code token metrics.

• Term frequency-inverse document frequency (tf-idf). Tf-idf Ramos et al. (2003) generates a feature for each unique token. The value of a feature for a commit is the term frequency of the corresponding token (i.e., tf(t, c) = f_{t,c}, where f_{t,c} is the number of times a token t appears in commit c) times the inverse frequency of the commits that contain the token (i.e., idf(t) = log(N/N_t), where N is the total number of commits and N_t is the number of commits containing the token t).

• Principal component analysis (PCA). Using tf-idf generates a large number of features, which may lead to very complex models. Therefore, we apply PCA Wold, Esbensen, and Geladi (1987) on the features resulting from tf-idf to reduce the number of features. Specifically, we only consider the top principal components that together contribute to 95% of the variance in the features (a sketch of this pre-processing is shown after this list).

• Word embeddings. We use word2vec (Mikolov, Chen, Corrado, & Dean, 2013; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) to encode each token into a vector of 128 numerical values. Specifically, we pre-train the embeddings on a large code base12 and then apply the pre-trained embeddings to the tokens in our data. We then use a mean aggregation of the vectors representing the tokens in a commit.
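A minimal scikit-learn sketch of the tf-idf and PCA pre-processing described in this list is shown below; the whitespace-joined token strings per commit (as produced from the lscp output) are an assumption for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA

def token_features(commit_token_strings):
    """Turn per-commit code-token strings into tf-idf features reduced with PCA."""
    # Each element of `commit_token_strings` is the whitespace-joined tokens of one commit,
    # e.g., "connect timeout buffer size ...".
    tfidf = TfidfVectorizer(token_pattern=r"\S+").fit_transform(commit_token_strings)
    # Keep the top principal components that together explain 95% of the variance.
    pca = PCA(n_components=0.95)
    return pca.fit_transform(tfidf.toarray())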

Step 2. Model construction. We build machine learning models to predict whether a configuration option suffers from an IoPV on a given CTO. For the generalization of our results, we

consider five different types of models, including random forest (RF), logistic regression (LR),

XGBoost (XG), neural network (NN), and convolutional neural network (CNN). A random forest is a classifier consisting of a collection of decision tree classifiers, where each tree casts a vote for the most popular class for a given input Breiman (2001). Logistic regression is a statistical model that uses a logit function to model a binary variable (the target variable) as a linear combination of the independent variables Hosmer Jr, Lemeshow, and Sturdivant (2013), which is widely used in software analytics (Shang et al., 2015; Tantithamthavorn et al., 2018). XGBoost is an efficient and accurate implementation of the gradient boosting algorithm, which is reported to perform better than other machine learning models in software engineering applications Liao et al. (2020). The neural network model Glorot, Bordes, and Bengio (2011) used in our study consists of four layers and is trained with a batch size of 100 for 10 epochs. The CNN model Lawrence, Giles, Tsoi, and Back (1997) in our study consists of five layers and is trained with a batch size of 100 for 10 epochs.

12 https://doi.org/10.5281/zenodo.3801975

Prior to constructing our models, we check the pairwise correlation between our features using the Pearson correlation test (ρ) Benesty et al. (2009). In this work, we choose the correlation value 0.7 as the threshold to remove collinearity. In other words, if the correlation between a pair of features is greater than 0.7 (|ρ| > 0.7), we keep only one of the two features in the model. We then perform a redundancy analysis on the features. In particular, we use each feature as a dependent variable and the remaining features as independent variables to build a regression model and calculate the R2 of each model. If the R2 is more than 0.9 Syer et al. (2017), the current dependent variable is considered redundant.
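A sketch of the correlation-based filtering (|ρ| > 0.7) on a pandas feature table is shown below; the redundancy analysis via R² follows the same spirit but is omitted for brevity, and the exact implementation used in this chapter may differ.

import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Keep one feature out of every pair whose absolute Pearson correlation exceeds the threshold."""
    corr = features.corr(method="pearson").abs()
    keep = []
    for col in features.columns:
        # Retain the column only if it is not highly correlated with an already-kept column.
        if all(corr.loc[col, kept] <= threshold for kept in keep):
            keep.append(col)
    return features[keep]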

Step 3. Model evaluation. We use 10-fold cross-validation to evaluate the performance of our models. In each repetition of the 10-fold cross-validation, the whole data set is randomly partitioned into 10 sets of roughly equal size. One subset is used as the testing set (i.e., the held-out set) and the other nine subsets are used as the training set. We train our models using the training set and evaluate the performance of our models on the held-out set. The process repeats 10 times until all subsets are used as a testing set once.

In each fold of the cross-validation, we use precision, recall, and AUC to measure the performance of our models. Precision measures the ratio of cases in which a configuration option actually impacts the performance regression detection among all the cases in which our models predict to adjust a configuration option (i.e., true positives / (true positives + false positives)). Recall measures the ratio of cases in which our models predict to adjust a configuration option among all the cases in which a configuration option actually impacts the performance regression detection (i.e., true positives / (true positives + false negatives)). AUC measures our models' ability to discriminate the CTO cases into IoPV and non-IoPV cases.

Specifically, AUC is the area under the ROC curve which plots the true positive rate against the false positive rate under different classification thresholds. Prior work recommends the use of

AUC over threshold-dependent measures (e.g., precision and recall) when measuring the model performance Tantithamthavorn and Hassan (2018b).
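A minimal sketch of this evaluation procedure with a random forest is shown below; the feature matrix X and the binary IoPV labels y are assumed to come from the previous steps, and the hyperparameters are illustrative.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.metrics import precision_score, recall_score, roc_auc_score

def evaluate_rf(X, y, n_splits=10, seed=42):
    """10-fold cross-validation of a random forest, reporting precision, recall, and AUC."""
    X, y = np.asarray(X), np.asarray(y)
    scores = {"precision": [], "recall": [], "auc": []}
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        model = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X[train], y[train])
        pred = model.predict(X[test])
        prob = model.predict_proba(X[test])[:, 1]  # probability of the IoPV class
        scores["precision"].append(precision_score(y[test], pred, zero_division=0))
        scores["recall"].append(recall_score(y[test], pred, zero_division=0))
        scores["auc"].append(roc_auc_score(y[test], prob))
    return {k: float(np.mean(v)) for k, v in scores.items()}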

Results

Our models can effectively predict when a CTO manifests an IoPV for all of our five studied performance measures, as shown in Table 5.9 and Table 5.10. Our best models (i.e., as indicated by the bold-italic values) achieve an AUC of 0.85 to 0.94 on the Hadoop project and 0.79 to 0.90 on the Cassandra project, for the different performance metrics. For the Hadoop project, RF is the best model for four out of the five performance metrics, achieving an AUC of 0.85 to 0.93. Even though XG shows the best AUC for the fifth performance metric (i.e., response time), the difference between RF and XG is only 0.01. For the Cassandra project, RF shows the best performance on three out of five performance metrics. NN also shows the best performance on three performance metrics (for Memory and I/O read, it ties with the RF model). The average AUC of the best NN model is 0.83, while the average AUC of the best RF model is 0.82. Note that NN, on the other hand, requires a large amount of resources to train and test a model, while the improvement it shows over RF is trivial. CNN shows the best performance on only one performance metric (i.e., an AUC of 0.79 for response time). However, the average AUC of the best CNN model is 0.09 lower than that of RF. In summary, we suggest that developers consider the RF model for predicting when a CTO has an IoPV problem.

Our best models achieve a better performance for the Hadoop system than for the Cassandra system. As discussed in PQ2, different commits show more inconsistent triplets (i.e., more dark cells in Figure 5.8) in the Cassandra system than in the Hadoop system; thus, it is more difficult to predict IoPV for the Cassandra system, which could explain why our models perform better for the Hadoop project.

Table 5.9: Hadoop's results of using different models to predict whether configuration options cause the manifesting of performance regressions. The best results for each performance metric and each model are highlighted in italic. The best results for each performance metric across different models are highlighted in bold-italic.

Hadoop (RF)     with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.68   0.39    0.93      0.68   0.39    0.66      0.73   0.33    0.93
CPU             0.70   0.51    0.90      0.55   0.02    0.71      0.77   0.60    0.92
Memory          0.64   0.36    0.87      0.48   0.04    0.69      0.75   0.41    0.91
I/O Read        0.68   0.54    0.91      0.58   0.02    0.76      0.79   0.56    0.93
I/O Write       0.63   0.44    0.82      0.44   0.02    0.59      0.72   0.49    0.85
Average         0.67   0.45    0.89      0.55   0.10    0.68      0.75   0.48    0.91

Hadoop (LR)     with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.38   0.03    0.67      0.12   0.46    0.54      0.53   0.09    0.77
CPU             0.66   0.06    0.73      0.27   0.29    0.61      0.48   0.14    0.76
Memory          0.49   0.04    0.71      0.16   0.40    0.55      0.48   0.10    0.73
I/O Read        0.70   0.05    0.71      0.22   0.33    0.57      0.46   0.18    0.80
I/O Write       0.50   0.06    0.64      0.33   0.22    0.57      0.50   0.14    0.66
Average         0.55   0.05    0.69      0.22   0.34    0.57      0.49   0.13    0.74

Hadoop (XG)     with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.65   0.42    0.94      1.00   0.05    0.60      0.71   0.31    0.93
CPU             0.66   0.48    0.88      0.32   0.06    0.62      0.67   0.50    0.88
Memory          0.66   0.32    0.87      0.41   0.04    0.68      0.72   0.32    0.87
I/O Read        0.66   0.49    0.91      0.49   0.08    0.73      0.73   0.50    0.91
I/O Write       0.66   0.38    0.82      0.41   0.16    0.58      0.67   0.40    0.80
Average         0.66   0.42    0.88      0.52   0.08    0.64      0.70   0.41    0.88

Hadoop (NN)     with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.34   0.54    0.79      0.27   0.83    0.80      0.27   0.64    0.75
CPU             0.53   0.30    0.72      0.63   0.41    0.73      0.39   0.33    0.65
Memory          0.43   0.27    0.67      0.52   0.34    0.67      0.31   0.42    0.66
I/O Read        0.53   0.44    0.73      0.60   0.46    0.76      0.48   0.33    0.72
I/O Write       0.50   0.38    0.68      0.53   0.32    0.65      0.39   0.41    0.68
Average         0.47   0.39    0.72      0.51   0.47    0.72      0.37   0.43    0.69

Hadoop (CNN)    with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.29   0.48    0.75      0.06   0.90    0.73      0.23   0.51    0.79
CPU             0.22   0.68    0.78      0.18   0.78    0.76      0.63   0.25    0.81
Memory          0.47   0.25    0.69      0.13   0.87    0.74      0.20   0.57    0.76
I/O Read        0.32   0.41    0.68      0.27   0.25    0.68      0.20   0.38    0.66
I/O Write       0.27   0.31    0.64      0.14   0.64    0.65      0.19   0.60    0.67
Average         0.31   0.43    0.71      0.16   0.69    0.71      0.29   0.46    0.74

Table 5.10: Cassandra's results of using different models to predict whether configuration options cause the manifesting of performance regression. The best results for each performance metric and each model are highlighted in italic. The best results for each performance metric across different models are highlighted in bold-italic.

Cassandra (RF)  with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.74   0.37    0.74      0.45   0.13    0.62      0.67   0.46    0.75
CPU             0.68   0.39    0.76      0.46   0.15    0.61      0.73   0.59    0.82
Memory          0.71   0.37    0.78      0.35   0.04    0.61      0.71   0.58    0.84
I/O Read        0.74   0.48    0.79      0.54   0.32    0.67      0.74   0.63    0.83
I/O Write       0.76   0.50    0.82      0.58   0.32    0.68      0.77   0.65    0.86
Average         0.73   0.42    0.78      0.47   0.19    0.64      0.72   0.58    0.82

Cassandra (LR)  with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.38   0.38    0.59      0.28   0.54    0.54      0.38   0.51    0.63
CPU             0.49   0.42    0.64      0.33   0.41    0.53      0.46   0.58    0.65
Memory          0.44   0.26    0.62      0.29   0.28    0.55      0.43   0.47    0.66
I/O Read        0.50   0.50    0.63      0.35   0.44    0.55      0.49   0.61    0.67
I/O Write       0.53   0.51    0.69      0.36   0.37    0.54      0.47   0.63    0.68
Average         0.47   0.41    0.64      0.32   0.41    0.54      0.44   0.56    0.66

Cassandra (XG)  with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.66   0.38    0.75      0.37   0.20    0.60      0.63   0.44    0.74
CPU             0.65   0.49    0.77      0.49   0.32    0.63      0.70   0.60    0.82
Memory          0.65   0.49    0.80      0.45   0.13    0.60      0.70   0.56    0.83
I/O Read        0.69   0.55    0.79      0.46   0.35    0.61      0.72   0.62    0.81
I/O Write       0.74   0.59    0.84      0.48   0.33    0.65      0.74   0.64    0.85
Average         0.68   0.50    0.79      0.45   0.27    0.62      0.70   0.57    0.81

Cassandra (NN)  with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.55   0.36    0.71      0.53   0.40    0.73      0.47   0.41    0.67
CPU             0.27   0.94    0.88      0.31   0.96    0.90      0.26   0.88    0.84
Memory          0.56   0.51    0.76      0.64   0.57    0.84      0.61   0.49    0.76
I/O Read        0.66   0.46    0.75      0.67   0.57    0.83      0.60   0.54    0.74
I/O Write       0.67   0.39    0.77      0.70   0.57    0.84      0.63   0.55    0.77
Average         0.54   0.53    0.77      0.57   0.62    0.83      0.51   0.57    0.76

Cassandra (CNN) with tf-idf              with PCA                 with code embedding
                Pre.   Recall  AUC       Pre.   Recall  AUC       Pre.   Recall  AUC
Res. time       0.30   0.28    0.75      0.37   0.35    0.79      0.33   0.34    0.75
CPU             0.29   0.37    0.76      0.25   0.67    0.75      0.11   0.96    0.76
Memory          0.37   0.21    0.77      0.37   0.21    0.77      0.33   0.30    0.74
I/O Read        0.19   0.53    0.69      0.30   0.37    0.68      0.24   0.47    0.69
I/O Write       0.23   0.40    0.70      0.19   0.38    0.67      0.23   0.33    0.69
Average         0.28   0.36    0.73      0.30   0.40    0.73      0.25   0.48    0.73

Using different representations of the code tokens significantly impacts the performance of our models. As shown in Table 5.9 and Table 5.10, for the traditional models (RF, LR, and XG), using code embeddings to represent the code tokens often achieves the best performance, while using PCA usually results in the worst performance. For example, for the Hadoop project, the RF model achieves an AUC of 0.85 to 0.93 using code embeddings, 0.82 to 0.93 using tf-idf, and only 0.59 to 0.76 using PCA. The reason for the poor performance of the models using PCA might be that PCA significantly reduces the information in the tokens through dimension reduction, even though we consider the principal components that account for 95% of the variance in the original variables. In contrast, for the deep neural network models (NN and CNN), using PCA to represent the code tokens may achieve better results than the other two representations. For example, for the Cassandra project, the CNN model combined with PCA achieves the best AUC for two out of the five performance metrics, across all different models. The reason might be that the deep neural network models have many more parameters to train, while using PCA significantly reduces the number of input features.

Our models can effectively predict whether a CTO manifests an IoPV problem.

Random forest based on code embedding shows the best performance on predicting

IoPV for most of the performance measures.

RQ2. What are the most important metrics for predicting IoPV problems?

Motivation

The goal of this research question is to analyze the models (of RQ1) that predict the IoPV to understand the factors that play important roles in determining the impact of adjusting a configuration parameter. In particular, we focus on the random forest model with code embeddings, as it shows the best performance in predicting IoPV. Our results can help practitioners understand the scenarios in which they need to adjust their configuration parameters in their performance tests.

Approach

To analyze the most important metrics for predicting IoPV, we consider the following experi- ments:

Measuring the importance of each dimension of metrics by removing the dimension from the model. In order to study the importance of each dimension of metrics, we build a model with all the dimensions and compare it to a model with one dropped dimension at a time. That comparison consists of statistically comparing both models’ AUC values. The larger the difference is for a dimension, the more important that dimension is. Then, we compare the differences between the model with all dimensions and each of the other models that drop one dimension at a time.

Measuring the importance of each dimension of metrics by only keeping the dimension in the model. Since metrics from different dimensions can be correlated, we also consider comparing models that are built using one dimension at a time. For example, some tokens from the code token dimension can be correlated with tokens from the configuration dimension. Therefore, we build a model using one dimension at a time, which results in four models. We compare these models based on their respective AUC values.
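The two comparisons can be sketched as follows. The `dimensions` mapping from a dimension name to its feature columns is an assumed data layout, and `evaluate(X, y)` is assumed to return the list of per-fold AUC values (e.g., from a 10-fold cross-validation like the one sketched earlier); the AUC distributions of the full and reduced models are then compared with the Mann-Whitney U test.

from scipy.stats import mannwhitneyu

def dimension_importance(X_df, y, dimensions, evaluate):
    """Compare the full model against drop-one-dimension and keep-one-dimension variants.

    `X_df` is a pandas DataFrame of features, `dimensions` maps a dimension name to its
    column names, and `evaluate(X, y)` returns a list of per-fold AUC values.
    """
    full_auc = evaluate(X_df.values, y)
    results = {}
    for name, cols in dimensions.items():
        without = evaluate(X_df.drop(columns=cols).values, y)  # model without this dimension
        only = evaluate(X_df[cols].values, y)                  # model with only this dimension
        results[name] = {
            "p_without": mannwhitneyu(full_auc, without, alternative="two-sided").pvalue,
            "p_only": mannwhitneyu(full_auc, only, alternative="two-sided").pvalue,
        }
    return results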

Results

Every dimension of metrics plays a statistically significant role in predicting IoPV cases.

Table 5.11 shows the results of using the Mann-Whitney U test to compare the complete RF model with the RF model that uses only one dimension of metrics or that excludes one dimension of metrics. A p-value smaller than 0.05 indicates a statistically significant difference. Table 5.11 shows that, when only keeping one dimension of metrics, all the resulting models show a statistically different (worse) performance. When excluding each dimension of metrics, the resulting models show a statistically different (worse) performance in most of the cases (in 16 out of the 20 combinations of the four metric dimensions and the five performance measures for Hadoop, and in

14 out of the 20 combinations for Cassandra). Our results highlight that one should consider all the four dimensions of metrics together when building a model to predict which CTO manifests an

IoPV.

Table 5.11: The results (p-values) of using the Mann-Whitney U test to statistically compare the AUC of RF with the complete set of metrics vs. with a subset of metrics.

Hadoop      Without CC   Without CS   Without CT   Without CON   Only CC    Only CS    Only CT    Only CON
Res. time   <<0.0001     0.001        0.001        <<0.0001      <<0.0001   <<0.0001   <<0.0001   <<0.0001
CPU         0.002        0.052        <<0.0001     <<0.0001      <<0.0001   <<0.0001   <<0.0001   <<0.0001
Memory      0.016        0.396        0.019        <<0.0001      <<0.0001   <<0.0001   <<0.0001   0.001
I/O Read    0.052        0.154        0.027        0.002         <<0.0001   <<0.0001   <<0.0001   0.005
I/O Write   <<0.0001     0.001        0.005        <<0.0001      <<0.0001   <<0.0001   <<0.0001   <<0.0001

Cassandra   Without CC   Without CS   Without CT   Without CON   Only CC    Only CS    Only CT    Only CON
Res. time   0.093        0.061        0.052        0.038         <<0.0001   <<0.0001   0.019      <<0.0001
CPU         <<0.0001     0.001        <<0.0001     <<0.0001      <<0.0001   <<0.0001   0.011      <<0.0001
Memory      <<0.0001     0.009        0.093        0.013         <<0.0001   <<0.0001   0.002      0.011
I/O Read    0.001        0.016        <<0.0001     0.006         <<0.0001   <<0.0001   <<0.0001   0.005
I/O Write   0.001        0.312        <<0.0001     0.192         <<0.0001   <<0.0001   <<0.0001   <<0.0001

CC is Code Change, CS is Code Structure, CT is Code Token, CON is Configuration.

The code token and configuration dimensions show the best performance among the four dimensions of metrics. Figures 5.9 and 5.10 show the results of keeping only one dimension of metrics. For both Hadoop and Cassandra, and for all the performance measures, using only the code token metrics or the configuration metrics in the model achieves a better AUC than using any other single dimension of metrics, except that the configuration dimension leads to a relatively worse performance for the I/O write measure of Cassandra. The results indicate that the context of the change as well as the goal of configuration options, as expressed through their tokens, are the most important predictors of IoPV. However, when excluding each dimension of metrics from the model (Figures 5.11 and 5.12), the differences resulting from excluding each dimension are less pronounced, and removing the code token or the configuration dimension may not lead to the worst performance. For example, removing the code change dimension from the model for the response time measure of Hadoop actually leads to a worse performance than removing the code token dimension. This is because the different dimensions of metrics are highly correlated; thus, the impact of removing one dimension of metrics may be partially mitigated by the other dimensions.

115 0.80 0.85 0.90 0.75 0.80 0.85 0.90 0.80 0.85 0.90 0.95 0.75 0.80 0.85 0.90 0.95 All CC CS CT CON All CC CS CT CON All CC CS CT CON All CC CS CT CON 0.65 0.70All 0.75 CC 0.80 CS CT CON

(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.9: AUC of RF for Hadoop when only keeping one dimension of metrics.


(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.10: AUC of RF for Cassandra when only keeping one dimension of metrics.

Every dimension of metrics plays a statistically significant role in predicting

whether a CTO manifests an IoPV problem. The most important dimensions are

related to code tokens and configurations.

5.6 Threats to Validity

In this section, we discuss the threats to the validity of our study.

External validity. The first external threat to validity concerns the generalizability of our results to other software systems. Due to the expensive computing resources needed (we spent around 12,536 machine-hours collecting performance data), we conducted our evaluation on two open-source software systems, i.e., Hadoop and Cassandra. Our findings may not generalize to other software systems. However, we found motivating results on the prevalence of IoPV and the performance of our prediction model, which can be replicated by future studies on other software systems.

Internal validity. An internal threat to validity concerns the performance metrics that we consider.

In our approach, we collect five popular performance metrics, i.e., Response time, CPU, Memory,

116 0.80 0.85 0.90 0.82 0.86 0.90 0.94 0.86 0.90 0.94 0.70 0.74 0.78 0.82 All CC CS CT CON All CC CS CT CON All CC CS CT CON 0.80 0.84All 0.88 CC 0.92 CS CT CON All CC CS CT CON

(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.11: AUC of RF for Hadoop when removing one dimension of metrics.

(a) Res. time (b) CPU (c) Memory (d) I/O read (e) I/O write

Figure 5.12: AUC of RF for Cassandra when removing one dimension of metrics.

I/O read and write, while other performance metrics such as throughput can still be explored by

future research.

Similarly, our prediction model does not cover all the possible dimensions of metrics. For

example, we do not consider the developers’ dimension. However, our model shows a good AUC

performance in predicting whether a CTO manifests an IoPV. Future studies can explore more dimensions of metrics to improve the performance of our models.

Finally, our evaluation considers only traditional models (i.e., logistic regression, random forest, and XGBoost) and neural network models (i.e., a general neural network and a convolutional neural network). Although we do not cover all the existing models, our study covers the most popular ones used in software engineering. Future work is encouraged to explore more models.

Construct validity. There might exist environmental performance noise when we use psutil to capture the performance data. To minimize the noise, we capture the performance of the corresponding Linux process of the running test. Furthermore, for each test, we repeat the execution 30 times independently. Finally, we run all of our experiments in environments with the same configuration.

5.7 Related Work

In this section, we discuss prior work along two dimensions: software configuration and soft- ware performance.

5.7.1 Software Configuration

A large body of research efforts has been conducted on software configuration, which mainly focuses on understanding configuration problems, preventing configuration errors, and debugging configuration errors. Few research efforts consider the performance aspect of software configura- tion.

Understanding Configuration Problems

Configuration makes a software system complex Sayagh et al. (2018), which leads to configuration errors that are severe, common, and hard to debug Yin et al. (2011). For instance, D. Jin

et al. (2014) found that configuration options add more complexity to the development and test

of highly configurable software systems. Han and Yu (2016) found that configuration options are

responsible for 59% of the performance bugs. Gousios et al. (2006) observed that the configuration

of the garbage collector has an impact on the performance of server applications. Furthermore,

Sayagh et al. (Sayagh & Adams, 2015; Sayagh et al., 2017) found that the impact of a configuration option can spread to multiple layers of the LAMP stack.

A second line of research proposed and evaluated different approaches to identify misconfigured configuration options. Dong et al. (2015, 2013) leverage the slicing technique to identify the misconfigured option for a given error message or exception. Rabkin and Katz (2011) leverage a data flow analysis technique to identify, for each option, which source code lines it might impact. Attariyan and Flinn (2010) combined dynamic control and data flow analysis to identify

misconfigured options. Zhang et al. (S. Zhang, 2013; S. Zhang & Ernst, 2014) compared the trace

of a correct execution against the trace of an incorrect execution to identify culprit options. Prior

systematic literature review Sayagh et al. (2018), and the work of Tianyin and Yuanyuan (2015)

and Andrzejak et al. (2018) provide further details about the existing configuration debugging approaches.

Our work is different from this line of research since we do not consider debugging configuration errors, but rather focus on understanding and identifying performance regressions that are caused under certain configurations.

The Performance of Configuration

Another line of research considers the identification of the optimal configurations for a software system and the debugging of performance errors that are caused by configuration options. Attariyan et al. (2012) proposed an approach based on dynamic taint analysis technique to identify the option that causes a performance error. Siegmund et al. (2015) build mathematical models that describe the impact of a configuration on software performance based on each option’s value. Raghavachari et al.

(2003) proposed an iterative approach to identify an optimal configuration in terms of performance.

Their approach consists of selecting a first configuration for a J2EE web application and then iteratively comparing its performance against alternative configurations until an optimal configuration is reached. Similarly, Diao et al.

(2003) proposed an approach that automatically adjusts the values of existing configuration options at run-time to optimize the CPU and memory usage objectives. Li, Chen, Hassan, et al. (2018) leveraged performance monitoring data and execution logs to dynamically optimize the values of performance-related configuration options according to varying workloads in the field. J. Guo et al. (2013) leverage non-linear regression to suggest an optimal configuration. However, collecting a large amount of data for training a model that predicts the performance of a configuration is expensive. Therefore, Sarkar et al. (2015) evaluated progressive and projective sampling to train a model that predicts the performance of a configuration. For their initial training sample, they consider data in which each option is enabled at least once. Other efforts identified the optimal configuration options in terms of performance by leveraging existing optimization approaches, i.e., iterative search Lengauer and Mössenböck (2014), multi-objective optimization Singh et al. (2016),

and smart hill climbing B. Xi et al. (2004).

The goal of our work is not to identify optimal configuration options or debug an existing

performance-related configuration error, but we focus on studying the inconsistent options performance through different commits. In particular, we focus on understanding whether a performance

improvement or regression is consistent through all the values of an option. That is important, as

one can improve the performance of a software system or release new changes that do not impact the performance under one configuration while other configurations hide a performance regression. Furthermore, prior work in this line of research compares the absolute performance between two values of the same option, while this can be subjective as discussed earlier. One option value can naturally consume more resources as it enables the execution of some extra features. However, the execution of the software system under the same option value can be improved compared to the same option and value prior to that commit. In addition, a better performing option value can show a regression compared to the prior commit as well.

5.7.2 Software Performance

Performance is an important aspect of software quality. Extensive prior research has been conducted to study software performance. In this subsection, we summarize the empirical studies on understanding software performance and the studies on performance regression detection.

Empirical Studies on Software Performance

Several empirical studies have been conducted on the performance of software systems (Han et al., 2018; Huang et al., 2014; G. Jin et al., 2012; Leitner & Bezemer, 2017; Zaman et al., 2012,

2011). For instance, G. Jin et al. (2012) studied 109 real-world performance issues that are reported

from five open-source software systems and proposed an approach to detect performance issues.

Zaman et al. (2012, 2011) conducted qualitative and quantitative studies on performance issues.

They found that developers and users face problems in reproducing performance bugs, as they spend a lot of time discussing performance bugs compared to other kinds of bugs (e.g., functional bugs). Huang et al. (2014) proposed an approach to improve the efficiency of performance regression testing by leveraging a static analysis technique to estimate the risk of a given commit introducing a performance regression. Han et al. (2018) studied 300 bug reports from three large open-source projects. The authors found that most of the performance bugs occur for specific combinations of data input and configurations. They also proposed a framework named PerfLearning to extract such data inputs and configurations from bug reports to generate test frames. Leitner and Bezemer (2017) investigated the state of the practice related to performance tests. The authors found that

performance tests form only a small portion of the test suite.

The vast amount of research on software performance signifies its importance and motivates our work. Different from prior research, we evaluate software performance at the commit level and study performance regressions that are manifested under a subset of the possible configurations. In addition, our work is different from this line of research as we consider how to avoid performance regressions that are related to some configurations while being hidden by other configurations, instead of understanding existing performance-related issues.

Performance Regression Detection

Extensive prior research has proposed automated techniques to detect performance regressions.

Such detection techniques can be divided into two categories: measurement-based and model-based detection.

Measurement-based approaches compare performance metrics (e.g., CPU usage) between two consecutive versions to detect performance regressions. For example, Nguyen et al. (T. H. Nguyen et al., 2012; T. H. D. Nguyen et al., 2011, 2014) leveraged control charts to identify performance regressions. A control chart has an upper control limit and a lower control limit. A performance regression is detected when a performance metric is above the upper limit or below the lower limit.

Foo et al. (2010) proposed an approach that compares a test’s performance metrics to historical performance metrics.

Model-based approaches build a machine learning model with a set of performance metrics to detect performance regressions. Cohen et al. (2005) showed that simple records of raw system metrics are not sufficient to index and identify performance problems. Cohen et al. used TAN (Tree-Augmented Bayesian Network) models to model the system performance states based on a small subset of metrics. Bodík et al. (2008) leveraged logistic regression models to model system users' behavior in order to improve on Cohen et al.'s models. Foo et al. (2015) proposed an approach that uses ensembles of models to detect performance regressions in heterogeneous environments (e.g., different hardware and software configurations). Xiong et al. (2013) proposed a model-driven framework to diagnose application performance and identify the root cause of performance issues.

Our work complements this line of research in the sense that we consider the configuration aspect of highly configurable software systems. For instance, a code change might not show a performance regression under the default configuration, while leading to regressions under other configurations. This work sheds light on the IoPV problem by first quantifying the existence of inconsistent performance variations, then proposing a prediction model that identifies the commits, tests, and options that exhibit the IoPV problem.

5.8 Conclusion

Highly configurable software systems tend to have a large number of options, which makes testing all the possible configurations infeasible. That can, unfortunately, hide performance regression issues, which can slip into production unnoticed. Furthermore, a developer might improve the performance of a software system, while the improvement might not be manifested when altering certain options' values. In fact, the performance improvement or regression of a software change might not be equally manifested across all the possible values of the configuration options, which we refer to as Inconsistent Options Performance Variation (IoPV). In this work, we observe that IoPV is a common problem, which is difficult to identify manually without running exhaustive tests, because most of the commits do not share similar options or tests that may lead to IoPV and hide performance regressions. We also observe that predictive models (e.g., RF) can effectively predict IoPV problems using four dimensions of metrics that are related to code changes, code structures, code tokens, and configurations. Our findings highlight the importance of considering different configurations when performing performance regression detection, and show that leveraging predictive models can mitigate the difficulty of exhaustively considering all configurations of a system during such a process. We expect that our study will inspire a wide spectrum of future studies on configuration-aware performance regression detection.

Chapter 6

Can we generate load tests using log-recovered workloads at varying granularities of user behavior?

Designing field-representative load tests is an essential step for the quality assurance of large-scale systems. Practitioners may capture user behaviour at different levels of granularity. A coarse-grained load test may miss detailed user behaviour, leading to a non-representative load test; while an extremely fine-grained load test would simply replay user actions step by step, leading to load tests that are costly to develop, execute and maintain. Workload recovery is at the core of these load tests.

Prior research often captures the workload as the frequency of user actions. However, there exists much valuable information in the context and sequences of user actions. Such richer information would ensure that the load tests that leverage such workloads are more field-representative. In this work, we study the use of different granularities of user behaviour, i.e., basic user actions, basic user actions with contextual information, and user action sequences with contextual information, when recovering workloads for use in the load testing of large-scale systems. We propose three approaches that are based on the three granularities of user behaviour and evaluate our approaches on four subject systems, namely Apache James, OpenMRS, Google Borg, and an ultra-large-scale industrial system (SA) from Alibaba. Our results show that our approach that is based on user action

sequences with contextual information outperforms the other two approaches and can generate more representative load tests with similar throughput and CPU usage to the original field workload (i.e., mostly statistically insignificant differences or small/trivial effect sizes). Such representative load tests are generated based on only a small number of clusters of users, leading to a low cost of conducting and maintaining such tests. Finally, we demonstrate that our approaches can detect injected users in the original field workloads with high precision and recall. Our study demonstrates the importance of user action sequences with contextual information in the workload recovery of large-scale systems.

6.1 Introduction

Large-scale software systems (e.g., Amazon AWS and Google's Gmail) have a significant influence on the daily lives of billions of users worldwide. For example, Netflix serves 150 million subscribers across the globe Fiegerman (2019). As a result, the quality of such systems is extremely important. Failures of such planet-scale systems can result in negative reputational and monetary consequences (Kit, 2010; Linden, 2006). Quite often, failures in such systems are load and performance-related rather than due to functional bugs Weyuker and Vokolos (2000).

Hence, load tests are widely used in practice to ensure the quality of such systems under load.

The goal of a load test is to ensure that a system performs well in production under a realistic field workload. Therefore, one must first recover a workload (Calzarossa, Massari, & Tessera, 2016;

Elnaffar & Martin, 2002) then design a load test based on the recovered workload (Andreolini,

Colajanni, & Valente, 2005; Krishnamurthy, Rolia, & Majumdar, 2006; Meira, de Almeida, Sunyé, Traon, & Valduriez, 2013; Snellman, Ashraf, & Porres, 2011; Syer et al., 2017).

The recovery of a field-representative workload is a challenging task. In particular, one must achieve a balance between the level of granularity of the workload and the cost of conducting a load test with such a workload. All too often, in practice, the recovered workloads are too coarse, i.e., over-simplified workloads. For example, the SPECweb96 Benchmark defines a workload that only specifies the probability of accessing files, such as "files less than 1KB account for 35% of all requests" SPEC (2003). Such coarse-grained workloads fail to capture the variance of user behaviour, leading to non-representative load tests.

At the other extreme, a workload can simply replay the exact field workload step by step. Although such a workload replicates the exact user behaviour, conducting a load test using such a workload and maintaining such a workload are extremely costly. First of all, due to the large number of users of these systems, replaying the exact workload requires the load tests to simulate every user with a great amount of contextual information and complexity. One would also need to develop simulation code for each specific sequence of events. In addition, since it is almost impossible to observe the exact same workload twice, one would constantly need to update such a detailed workload.

To reach a desirable level of granularity for a workload, prior work often clusters user behaviour based on important aspects in the workload (Cohen et al., 2005; Shang et al., 2015; Syer et al., 2017).

With the clusters of users, instead of maintaining millions or billions of user profiles, a workload is designed based on representative user behaviours from a considerably smaller number of clusters.

For example, a recent workload clustering approach clusters users by the frequency of different user actions Syer et al. (2017). However, due to the high variability of users in ultra-large-scale software systems, we argue that solely considering the frequency of actions is too coarse; instead, the sequence and the context of user actions can make workloads much more representative of the actual field. Consider the following example: one user repetitively reads small pieces of data from a file and then writes each of the small pieces back to the file, while another user interactively reads and writes a large number of small pieces of data to a file. A workload should capture these two users differently. However, only considering the frequency of actions (read and write) would not differentiate the workloads of these two users. Adding more detailed information about these user actions would lead to a finer granularity of workload, which in turn might be too costly to recover, execute and maintain.

Therefore, in this work, we report on our experience in understanding the impact of adding different levels of detail to the recovered workloads for load tests. We first replicate a prior approach that captures the frequency of basic actions Syer et al. (2017) (we refer to this approach as Action). Afterwards, we design an approach that enriches the basic user actions with their context values (we refer to this approach as ActionContext). Finally, we design an approach that augments ActionContext with the frequently-appearing sequences of actions (we refer to this approach as ActionSequence). The three approaches use the frequency of actions, the frequency of enriched actions and the frequency of sequences of enriched actions, respectively, as signatures for each user, then group the users into clusters. Afterwards, we automatically generate load tests based on the signature of the representative (center) user in each cluster.

Our study is performed on two open-source systems: Apache James and OpenMRS, and two commercial systems: Google Borg and an ultra-large-scale software system from Alibaba (we refer to it as SA in the rest of this work). We compare our three approaches by recovering workloads based on the execution logs from the subject systems, generating load tests, and running the automatically generated load tests on the subject systems. In particular, we answer these two research questions:

RQ1: How field-representative are our generated workloads?

ActionSequence generates the most field-representative workloads. When conducting load tests using ActionSequence, the throughput of 10 out of 14 user actions as well as the CPU usage are either not statistically different from the original workload or differ only with a small/trivial effect size.

RQ2: How many clusters of users are captured by each of our recovery workload approaches?

The number of clusters of users is not overwhelming. The most field-representative workload

ActionSequence is based on eight to 39 clusters of users, which is only three to six clusters

more than a less field-representative workload ActionContext. The least field-representative

workload Action is based on as few as two clusters of users.

The rest of the work is organized as follows: Section 6.2 discusses the background and the work related to this study. Section 6.3 describes our approaches in detail. Section 6.4 presents our case study setup. Section 6.5 presents our case study results by answering our two aforementioned research questions. Section 6.6 discusses other usage scenarios for our approaches. Sections 6.7 and 6.8 discuss the challenges and the lessons that we learned from the industrial evaluation. Finally, Section 6.9 concludes the work.

6.2 Background and related work

Workload recovery is an essential part in the performance assurance of large-scale systems.

Prior research proposes approaches for recovering workloads to assist in the design of load tests Vögele et al. (2018), validating whether load tests are representative of the production field Syer et al. (2017), optimizing system performance (Summers et al., 2016; H. Xi et al., 2011) and detecting system performance issues (T.-H. Chen, Shang, Jiang, et al., 2016; Cohen et al., 2005; Hassan et al., 2008; Shang et al., 2015; Syer et al., 2017). All the above prior work illustrates the value and importance of recovering representative workloads.

6.2.1 Recovering workload

Prior approaches for recovering and replaying workloads can be categorized along the granularity of the captured user actions. One may choose a coarse granularity by recovering only the type of workload of a system, or, at the other extreme, consider each individual user and replay their individual workload one by one. One may anonymize all high-level user behaviours and only consider physical metrics such as CPU (Cohen et al., 2005; Shang et al., 2015), I/O (Busch et al., 2015; Haghdoost et al., 2017; Seo et al., 2014; Yadwadkar et al., 2010) and other system resources Cortez et al. (2017). Alternatively, one may choose a finer granularity by building complex models such as Hidden Markov Models Yadwadkar et al. (2010) to capture the details of each user. A pilot study by Cohen et al. (2005) demonstrates that grouping workloads into a smaller number of clusters outperforms having one unified workload. Intuitively, recovering a workload at too fine or too coarse a granularity is not desirable. A too coarse-grained approach may miss important characteristics of user behaviour, leading to a non-representative workload, while a too fine-grained approach may lead to a workload that is costly to replay and maintain.

To achieve an optimal granularity of user behaviour, prior research often chooses event or action-driven approaches for workload recovery (Hassan et al., 2008; Summers et al., 2016; Syer et al., 2017; Vögele et al., 2018; H. Xi et al., 2011). However, there exists extensive research on execution log analysis that demonstrates the value of considering contextual information and sequences of actions for various software engineering tasks (Barik et al., 2016; Beschastnikh et al., 2014, 2011; Fu et al., 2012; S. He et al., 2018; Z. M. Jiang et al., 2008b, 2009; Lin et al., 2016; Oliner et al., 2012; Shang et al., 2013). Such extensive usage of contextual information and user action sequences in log analysis motivates our approach to leverage similar information to recover richer workloads from execution logs for generating load tests. Therefore, compared to prior research, our study uses more valuable contextual and action sequence information from execution logs to recover workloads.

6.2.2 Software logs analysis

Software logs are widely used in software development to track system execution behaviors Barik et al. (2016). Such logs can be used to analyze system performance (T.-H. Chen, Shang, Hassan, et al., 2016; Syer et al., 2013; Yao et al., 2018). Various studies have been conducted on the use of software logs in performance analysis.

There exists a rich body of research using system execution logs to recover performance workloads. Vögele et al. (2018) propose an approach named WESSBAS to automatically extract workload specifications for load testing using session-based logs. Summers et al. (2016) characterize the workload of a Netflix streaming video web server by analyzing HTTP request log files to optimize the performance of the servers. H. Xi et al. (2011) analyze real query logs collected from web search engines to generate synthetic queries and replay them onto a search engine to evaluate its performance.

There also exists a fair number of studies analyzing system logs to identify performance issues. W. Xu, Huang, Fox, Patterson, and Jordan (2009) present an approach to mine console logs in order to ease the detection of system runtime problems for operators. Tan, Kavulya, Gandhi, and Narasimhan (2010) develop state-machine views from the native logs of the Hadoop system to observe the system behavior and debug performance problems. Syer et al. (2013) propose an automatic approach that combines execution logs and performance counters to diagnose memory-related issues in load tests. The authors evaluate the approach on two software systems to diagnose three types of memory-related issues: memory bloat, memory leaks and memory spikes. The results show that the approach can flag the log lines responsible for memory-related issues to performance analysts with high accuracy. T.-H. Chen, Shang, Hassan, et al. (2016) propose a lightweight

approach named CacheOptimizer to help developers decide cache configurations in web applications by analyzing available web logs. S. He et al. (2018) propose a cluster-based approach to identify impactful system problems, i.e., request latency and service availability, by analyzing log sequences. As an experience report, our focus is primarily on exploring whether research approaches work in practice. Based on our industrial experience, this was not the case; hence we had to propose two novel approaches. In particular, compared to prior research, our work uses more valuable contextual and action sequence information from execution logs to recover workloads. The next section presents our three approaches to cluster user actions, contextual information and user action sequences from execution logs for recovering a workload for load testing.

6.3 Our approaches to recovering a workload for load testing

In this section, we present our approaches to automatically recover workloads using system execution logs. An overview of our recovery process is shown in Figure 6.1. In total, our recovery process consists of six steps: A) extracting user actions, B) enriching user actions with context, C) identifying frequent action sequences, D) grouping similar frequent action sequences, E) grouping users into clusters and F) generating load tests. Our Action workloads are generated by steps A, E and F of our recovering process. Our ActionContext workloads are generated by steps A, B, E and

F. Our ActionSequence workloads are generated by all the steps.

6.3.1 Extracting user actions

User actions are a typical source of information to recover workloads Syer et al. (2017). Therefore, in this step, we extract user actions from execution logs that are generated during the execution of a software system. Execution logs record system events at runtime to enable monitoring, remote issue resolution and system understanding Z. M. Jiang, Hassan, Hamann, and Flora (2008a). Each line of an execution log contains a corresponding action and its contextual information (e.g., user names and data sizes). In this step, we first parse the execution logs by identifying the actions and their contextual information. For example, in Table 6.1, we extract four types of actions from the logs that are generated by OpenMRS. We also extract the Timestamp, the User and the Byte values

Table 6.1: Our running example of execution log lines with extracted user actions and context values.

Timestamp  User   Log line                                                           Action  Byte
00:00.00   Alice  GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  7204
00:00.92   Dan    GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  7216
00:01.52   Bob    GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  1249
00:01.54   Alice  DELETE //192.168.165.219:8080/openmrs/ws/rest/v1/person            Delete  0
00:02.04   Alice  GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  7227
00:02.26   Bob    DELETE //192.168.165.219:8080/openmrs/ws/rest/v1/person            Delete  0
00:02.41   Bob    POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/addPerson    Add     2008
00:02.58   Dan    POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/addPerson    Add     8109
00:03.42   Bob    GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  1247
00:03.78   Alice  DELETE //192.168.165.219:8080/openmrs/ws/rest/v1/person            Delete  0
00:04.13   Bob    DELETE //192.168.165.219:8080/openmrs/ws/rest/v1/person            Delete  0
00:04.38   Bob    POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/addPerson    Add     2010
00:05.69   Dan    GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  7213
00:06.06   Alice  GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  7231
00:07.31   Dan    DELETE //192.168.165.219:8080/openmrs/ws/rest/v1/person            Delete  0
00:07.41   Dan    GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  7221
00:07.81   Alice  POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/addPerson    Add     2006
00:09.08   Dan    DELETE //192.168.165.219:8080/openmrs/ws/rest/v1/person            Delete  0
00:10.18   Bob    GET //192.168.165.219:8080/openmrs/ws/rest/v1/person               Search  1251
00:10.32   Alice  POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/addPerson    Add     8121
00:10.52   Dan    POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/addPerson    Add     2012
00:11.02   Bob    POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/editPerson   Edit    868
00:11.47   Dan    POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/editPerson   Edit    881
00:12.12   Alice  POST //192.168.165.219:8080/openmrs/ws/rest/v1/person/editPerson   Edit    877

After extracting user actions, the workload signature of each user can be represented by one point in an n-dimensional space (where n is the total number of extracted actions), i.e., the n-dimensional vector for each user records the number of occurrences of each action, with each action type mapped to one dimension. Such vectors are directly fed into step

E to recover our Action workload. Table 6.2 shows the vectors of the frequency of user actions from our running example.

Table 6.2: Frequency of actions for users in our running example for the Action workloads.

User   Search  Add  Edit  Delete
Alice  3       2    1     2
Bob    3       2    1     2
Dan    3       2    1     2
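The following is a minimal sketch, in Python, of how step A could be implemented; the regular expressions, the parse_line helper and the simplified log format are our hypothetical illustrations of the OpenMRS logs in Table 6.1, not the thesis's actual parser.

    import re
    from collections import Counter, defaultdict

    # Hypothetical mapping from HTTP verbs/REST endpoints to action types,
    # mirroring the Search/Add/Edit/Delete actions of the running example.
    ACTION_RULES = [
        (re.compile(r"POST .*?/addPerson"), "Add"),
        (re.compile(r"POST .*?/editPerson"), "Edit"),
        (re.compile(r"DELETE .*?/person"), "Delete"),
        (re.compile(r"GET .*?/person"), "Search"),
    ]

    def parse_line(line):
        """Parse one log line into (timestamp, user, action, byte_count).

        Assumes a simplified format: '<timestamp> <user> <request> <bytes>'.
        """
        timestamp, user, rest = line.split(" ", 2)
        request, byte_str = rest.rsplit(" ", 1)
        action = next((name for pattern, name in ACTION_RULES
                       if pattern.search(request)), "Other")
        return timestamp, user, action, int(byte_str)

    def action_signatures(lines):
        """Build the per-user action-frequency vectors used by the Action workload."""
        counts = defaultdict(Counter)
        for line in lines:
            _, user, action, _ = parse_line(line)
            counts[user][action] += 1
        actions = sorted({a for c in counts.values() for a in c})
        # One point in an n-dimensional space per user (n = number of action types).
        return {user: [c[a] for a in actions] for user, c in counts.items()}, actions

On the running example, such a sketch would produce the vectors of Table 6.2, where all three users share the same action frequencies.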

130 Table 6.3: Frequency of enriched actions (with context) for users in our running example for the ActionContext workloads.

User   Search1  Search2  Add1  Add2  Edit1  Delete1
Alice  0        3        1     1     1      2
Bob    3        0        2     0     1      2
Dan    0        3        1     1     1      2

6.3.2 Enriching user actions with context

Each instance of a user action is associated with a context, which may contain useful information to represent workload. For example, a disk read event is often associated with the size of the read.

However, a disk read event with a large size of data versus one with a small size may correspond to different user behaviours. A small disk read may correspond to a user reading a data index while a large disk read may correspond to the actual data reading. Therefore, in this step, we enrich the recovered user actions by considering their context values. In particular, we split each action into multiple ones by categorizing the context values. For example, a disk read action may become two different actions, i.e., a large disk read and a small disk read. To do so, we use Jenks Natural Breaks Jenks (1967) on the context values of each action. Jenks Natural Breaks is a one-dimensional clustering algorithm, which minimizes each class's average deviation from the class mean, while maximizing each class's deviation from the means of the other classes. In our running example, we consider the Byte value of each action as its context and, for the Search action, we split it into two actions: Search1 (1,247 bytes to 1,251 bytes) and Search2 (7,204 bytes to 7,231 bytes).

After this step, the workload signature of each user becomes a vector where each dimension is the number of occurrences of each enriched user action. Such vectors are directly fed into step E to recover our ActionContext workload. Table 6.3 shows the vectors of enriched user actions from our running example.
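As an illustration of step B, the sketch below splits an action's context values into two categories by minimizing the within-class sum of squared deviations, which is what Jenks Natural Breaks computes for the two-class case; the function names and the restriction to two classes are our assumptions, not the thesis's implementation.

    def two_class_jenks_break(values):
        """Return sorted values and the index that splits them into two classes
        with the smallest total within-class sum of squared deviations
        (Jenks Natural Breaks for the two-class case)."""
        xs = sorted(values)

        def ssd(part):
            mean = sum(part) / len(part)
            return sum((x - mean) ** 2 for x in part)

        best_split, best_cost = 1, float("inf")
        for i in range(1, len(xs)):
            cost = ssd(xs[:i]) + ssd(xs[i:])
            if cost < best_cost:
                best_split, best_cost = i, cost
        return xs, best_split

    def enrich_action(action, context_values):
        """Map each context value to an enriched action name, e.g. Search1/Search2."""
        xs, split = two_class_jenks_break(context_values)
        threshold = xs[split]  # smallest value of the upper class
        return {v: f"{action}1" if v < threshold else f"{action}2"
                for v in context_values}

Applied to the Byte values of the Search action in Table 6.1, this labels the roughly 1,250-byte reads as Search1 and the roughly 7,200-byte reads as Search2, matching the split described above.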

6.3.3 Identifying frequent action sequences

Users often perform multiple actions frequently together. In order to capture these actions, we identify action sequences that frequently appear during system execution.

Figure 6.1: An overview of our workload recovery process, consisting of steps A (extracting user actions), B (enriching user actions with context), C (identifying frequent action sequences), D (grouping similar frequent action sequences), E (grouping users into clusters) and F (generating load tests).

Splitting user action sequences

We first group all user actions by each user and sort the actions by their timestamp, in order to generate a sequence of actions for each user. Such sequences are often very long, consisting of thousands of actions, while each user may not perform all the actions at once, i.e., a user may perform two series of actions with a long period of idling time in between. Therefore, we wish to split the long user action sequences into smaller ones. One naive approach to split long user action sequences is to consider the time interval between two actions, i.e., if there is a long time interval between two actions, the sequence is split into two sequences. However, some actions may actually require a long time to finish, leading to a long wait between two actions. In such cases, the naive approach may incorrectly split the sequences. Therefore, we wish to identify the actions where the long running time is due to idling between two separate actions. We leverage a heuristic that considers the relationship between context values and the time interval of each user action as an indicator of the type of wait.

For example, a data read action with a large data size may take a longer time than one with a small size. Based on this intuition, we build a linear regression model using the associated context values of each action as independent variables and its time interval to the next action as the dependent variable. With the linear regression model, if an action has a time interval higher than the predicted value with a residual that is greater than 50%, our process considers that the user has idled after the action. In our running example, we split user Dan's events into three sequences:

Search2→Delete1→Search2→Delete1, Search2→Add2 and Add1→Edit1.
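A minimal sketch of this splitting heuristic, assuming a single numeric context value per action; numpy.polyfit stands in for the linear regression, and the 50% residual threshold follows the description above.

    import numpy as np

    def split_sequences(actions, residual_threshold=0.5):
        """Split one user's chronologically ordered actions into sub-sequences.

        `actions` is a list of (name, context_value, gap_to_next_action_seconds);
        the last gap can be 0. We regress the gap on the context value and treat a
        gap that exceeds its prediction by more than `residual_threshold` (50%)
        as idling, i.e., as a split point.
        """
        contexts = np.array([a[1] for a in actions], dtype=float)
        gaps = np.array([a[2] for a in actions], dtype=float)
        slope, intercept = np.polyfit(contexts, gaps, deg=1)
        predicted = slope * contexts + intercept

        sequences, current = [], []
        for (name, _, _), gap, pred in zip(actions, gaps, predicted):
            current.append(name)
            # Relative residual: how much longer the observed gap is than predicted.
            if pred > 0 and (gap - pred) / pred > residual_threshold:
                sequences.append(current)
                current = []
        if current:
            sequences.append(current)
        return sequences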

Extracting frequent sub-sequences of actions

We aim to mine frequent sub-sequences in order to represent the series of actions that a user may often perform. An example frequent sub-sequence can be a user first reading small data (to locate the data using an index) before repetitively reading large data (reading the actual data), i.e.,

Small Read→Large Read→Large Read→Large Read. We apply a suffix array structure and the

Longest Common Prefix (LCP) algorithm to mine frequent sub-sequences for each sequence of user actions. Such an approach has been used in prior research to uncover usage patterns from execution logs M. Nagappan, Wu, and Vouk (2009). We re-implemented the same algorithm

as prior research M. Nagappan et al. (2009). Due to the limited space, our detailed implementation can be found in our replication package.1 In our running example, one of the frequent

sub-sequences that we extract is Search2→Delete1, which is identified originally from the sequence

Search2→Delete1→Search2→Delete1.

Since some sub-sequences may be trivial (too short) or do not frequently appear (too rare), we

rank the extracted sub-sequences based on the frequency of their occurrence and the frequency of

events in the actual sub-sequence, as follows

rank = α × #occurrence + (1 − α) × #events (3)

where #occurrence is the frequency of a sub-sequence’s occurrence and #events is the number

of events in the sub-sequence. α is a weight factor for the number of occurrences of a sub-sequence.

1https://github.com/senseconcordia/ASE2019Data.

Table 6.4: Frequency of frequent action sequences in our running example for the ActionSequence workloads.

User   Add1→Edit1  Add2→Edit1  Search1→Delete1→Add1  Search1→Edit1  Search2→Add1  Search2→Add2  Search2→Delete1
Alice  0           1           0                     0              1             0             2
Bob    0           0           2                     1              0             0             0
Dan    1           0           0                     0              0             1             2

We determine α as 0.5 since we consider the #occurrence and #events to be equally important.

We use the rank value to keep the top sub-sequences such that the kept sub-sequences cover more

than 90% of all actions. We call these kept sub-sequences frequent action sequences.

After extracting the frequent action sequences, the workload of each user is represented by

one point in an n-dimensional space (where n is the total number of identified frequent action

sequences), i.e., a vector for each user where each dimension is the frequency of each frequent

action sequence. Table 6.4 shows the result of frequent action sequences in our example.
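The ranking and filtering step can be sketched as follows; a naive contiguous sub-sequence counter stands in for the suffix-array/LCP mining of M. Nagappan et al. (2009), and the way coverage is computed here (occurrences times sub-sequence length) is our simplification of the 90% cut-off described above.

    from collections import Counter

    def mine_subsequences(sequences, min_len=2, max_len=4):
        """Count contiguous sub-sequences of each user sequence (naive stand-in
        for the suffix-array/LCP mining)."""
        counts = Counter()
        for seq in sequences:
            for length in range(min_len, max_len + 1):
                for start in range(len(seq) - length + 1):
                    counts[tuple(seq[start:start + length])] += 1
        return counts

    def keep_frequent(counts, alpha=0.5, coverage=0.9):
        """Rank with rank = alpha * #occurrence + (1 - alpha) * #events and keep
        the top sub-sequences until they cover `coverage` of all counted actions."""
        ranked = sorted(counts.items(),
                        key=lambda kv: alpha * kv[1] + (1 - alpha) * len(kv[0]),
                        reverse=True)
        total_actions = sum(len(sub) * occ for sub, occ in counts.items())
        kept, covered = [], 0
        for sub, occ in ranked:
            kept.append(sub)
            covered += len(sub) * occ
            if covered >= coverage * total_actions:
                break
        return kept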

6.3.4 Grouping similar frequent action sequences

The extracted frequent action sequences are not independent of each other. Intuitively, for example, two frequent action sequences Read1→Read2→Read2→Read1 and Read1→Read2→Read1 are similar. One user may only have Read1→Read2→Read2→Read1 and another user may only have Read1→Read2→Read1. The two users may be considered completely different if we do not consider the similarities between the two frequent action sequences.

Computing distance matrix

For all the frequent action sequences, we calculate the distance between each pair of them using the normalized Levenshtein distance Mednis and Aurich (2012). For example, the normalized Levenshtein distance between Read1→Read2→Read2→Read1 and Read1→Read2→Read1 is 0.75.

Table 6.5: Result of frequent action sequences after transformation based on distance matrix for ActionSequence workload.

User   Add1→Edit1  Add2→Edit1  Search1→Delete1→Add1  Search1→Edit1  Search2→Add1  Search2→Add2  Search2→Delete1
Alice  0.5         1           1                     0.5            2             1.5           2.5
Bob    0.5         0.5         2.33                  1.67           0.67          0             0.67
Dan    1           0.5         0.67                  0.5            1.5           2.0           2.5

Transforming vectors of frequent action sequences with the distance matrix

In order to address the similarities between frequent action sequences, we apply a vector transformation based on the distance matrix as follows:

vector^u = [s_1^u, s_2^u, …, s_n^u] ×
    | d_1^1  d_1^2  …  d_1^n |
    | d_2^1  d_2^2  …  d_2^n |
    | …      …      …  …     |
    | d_n^1  d_n^2  …  d_n^n |

where vector^u is the final vector for user u, s_n^u is the frequency of frequent action sequence n for user u, and d_n^1 is the normalized Levenshtein distance between frequent action sequence 1 and frequent action sequence n. We perform this vector transformation in our running example and the result is shown in Table 6.5.
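The transformation can be sketched as a matrix product. We assume here that the "normalized Levenshtein distance" is 1 − lev/max(|a|, |b|), since that formulation reproduces both the 0.75 reported for the example pair and the transformed values in Table 6.5; the function names are ours.

    import numpy as np

    def levenshtein(a, b):
        """Token-level Levenshtein edit distance via dynamic programming."""
        dp = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, y in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                         prev + (x != y))
        return dp[len(b)]

    def normalized_distance(a, b):
        # Assumption: 1 - lev/max(len) matches the 0.75 value in the running example.
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))

    def transform(user_vectors, sequences):
        """Multiply each user's frequency vector by the pairwise distance matrix."""
        n = len(sequences)
        D = np.array([[normalized_distance(sequences[i], sequences[j])
                       for j in range(n)] for i in range(n)])
        return {user: np.asarray(vec, dtype=float) @ D
                for user, vec in user_vectors.items()}

For instance, multiplying Alice's frequency vector from Table 6.4 by this matrix yields the values 0.5, 1, 1, 0.5, 2, 1.5 and 2.5 shown in Table 6.5.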

6.3.5 Grouping users into clusters

In order to reach a desirable level of granularity for a workload, in this step, we apply a clustering algorithm to group users into clusters.

Hierarchical clustering

We apply hierarchical clustering to cluster users based on the Pearson distance. We choose hierarchical clustering since it is suitable for data with arbitrary shape and there is no need to determine a specific number of clusters beforehand D. Xu and Tian (2015). In addition, hierarchical clustering performs well in prior research on workload recovery (Shang et al., 2015; Syer et al., 2017). In our approach, hierarchical clustering first considers each user as an individual cluster. Afterwards, we merge the nearest clusters into a new cluster and recalculate the Pearson distance matrix between every two clusters based on average linkage.

Dendrogram cutting

Hierarchical clustering can be visualized using a dendrogram. Such a dendrogram must be cut with a horizontal line at a particular height to create our clusters. In practice, one may choose a desired level of granularity in a recovered workload by cutting the dendrogram at different heights, in order to retrieve a different number of clusters. In order to avoid any subjective bias, we use the Calinski-Harabasz stopping rule Caliński and Harabasz (1974) to perform our dendrogram cutting. Prior research notes that the Calinski-Harabasz stopping rule outperforms other rules when clustering workload signatures Syer et al. (2014). The Calinski-Harabasz index measures the ratio of the inter-cluster variance to the intra-cluster variance. In our running example, Alice, Bob and Dan are all grouped into one cluster for the Action workload. Users Alice and Dan are grouped into one cluster while Bob is in another cluster for the ActionSequence and ActionContext workloads.
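A sketch of step E using SciPy's hierarchical clustering with average linkage on the correlation (Pearson) distance, and scikit-learn's Calinski-Harabasz score to choose the cut; this mirrors, but is not, the thesis's implementation, and the cap on the number of candidate cuts is our assumption.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.metrics import calinski_harabasz_score

    def cluster_users(signatures, max_clusters=40):
        """Cluster user signature vectors and cut the dendrogram at the number of
        clusters that maximizes the Calinski-Harabasz index."""
        users = sorted(signatures)
        X = np.array([signatures[u] for u in users], dtype=float)

        # Average-linkage hierarchical clustering on the Pearson (correlation) distance.
        Z = linkage(X, method="average", metric="correlation")

        best_k, best_score, best_labels = 1, -np.inf, None
        for k in range(2, min(max_clusters, len(users) - 1) + 1):
            labels = fcluster(Z, t=k, criterion="maxclust")
            if len(set(labels)) < 2:
                continue  # degenerate cut; the CH index is undefined
            score = calinski_harabasz_score(X, labels)
            if score > best_score:
                best_k, best_score, best_labels = k, score, labels
        if best_labels is None:  # too few users to cut meaningfully
            best_labels = np.ones(len(users), dtype=int)
        return {u: int(c) for u, c in zip(users, best_labels)}, best_k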

6.3.6 Generating load tests

In the final step of our process, we generate the load tests as the final outcome of our approach.

Generating a workload for load tests

We identify a representative user in each cluster of users to generate a workload for load testing.

We apply the Partitioning Around Medoids Kaufman and Rousseeuw (2009) (PAM) algorithm to identify the representative point of each cluster. PAM selects k representative medoids among the instances of the clustering data. PAM is an iterative process of replacing representative instances with other instances until the quality of the resulting clustering cannot be improved. The quality is measured by the medoids having the smallest average dissimilarity to all other points. In our case, we set k to 1 since we only choose one user to represent each cluster. We then iterate over each user inside the cluster to find the best representative user based on PAM.

After obtaining a representative user for each cluster, we obtain a vector [s_1^u, …, s_n^u] for that user, where s_n^u is the number of occurrences of frequent action sequence n for user u, for the ActionSequence workload. We use the frequency of occurrences of each frequent action sequence

from the representative user to calculate a probability of occurrence of that frequent action sequence.

Then, we generate the synthesized workload based on such probability of frequent action sequences.

In our running example, the center of the cluster that consists of users Alice and Dan is user Dan.

Then to generate a workload for user Alice in the load test, we replace the corresponding actions

into Search2→Delete1 with a probability of 50%, Search2→Add1 with a probability of 25% and

Add2→Edit1 with a probability of 25%, because each of them has a frequency of two, one and one,

respectively (see Table 6.4).

For the Action and the ActionContext workload, this step is similar to above but instead a

probability of having an action or enriched action is calculated. For the Action workload, since

all the users are in one cluster, the load test is generated based on the probability of having each

action shown in Table 6.2, i.e., the Search, Add, Edit and Delete actions have a probability of 37.5%,

25%, 12.5% and 25%, respectively. For the ActionContext workload, users Alice and Dan are in

one group with exactly the same frequency distribution of actions with context values. Therefore, in the generated load test, the Search2 action has a probability of 37.5%, the Add1, Add2, and Edit1 actions each have a probability of 12.5%, and the Delete1 action has a probability of 25%.
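Step F can be sketched as below: for k = 1, the PAM medoid reduces to the cluster member with the smallest average dissimilarity to the rest of the cluster, and a synthetic workload is then drawn from the medoid's signature with the probabilities described above. The function names and the use of Euclidean dissimilarity between signature vectors are our assumptions.

    import random
    import numpy as np

    def cluster_medoid(vectors):
        """Return the index of the member with the smallest average dissimilarity
        to all other members (PAM with k = 1)."""
        X = np.asarray(vectors, dtype=float)
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return int(np.argmin(dists.mean(axis=1)))

    def sample_workload(signature, names, n_events, rng=random.Random(42)):
        """Draw action sequences (or actions) proportionally to their frequency in
        the representative user's signature, e.g. Search2→Delete1 with p = 0.5."""
        total = float(sum(signature))
        probabilities = [f / total for f in signature]
        workload = []
        while len(workload) < n_events:
            chosen = rng.choices(names, weights=probabilities, k=1)[0]
            workload.extend(chosen if isinstance(chosen, (list, tuple)) else [chosen])
        return workload

    # Usage with Dan as the representative of the {Alice, Dan} cluster (Table 6.4):
    # sample_workload([1, 1, 2],
    #                 [("Add1", "Edit1"), ("Search2", "Add2"), ("Search2", "Delete1")],
    #                 n_events=100)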

Executing load tests

Finally, our approach executes the load tests based on FIO Axboe (2019) and JMeter JMeter

(1998). For software systems that cannot be directly driven by FIO and JMeter, our approach outputs simulated execution logs. Such systems can generate the load test by directly replaying the workload based on our simulated execution logs line by line.

6.4 Case study setup

In this section, we present the setup of our case study.

Table 6.6: Overview of our subject systems.

Subjects       Version    SLOC (K)  # Users  # lines of logs (K)
Apache James   2.3.2.1    37.6      2,000    134
OpenMRS        2.0.5      67.3      655      231
Google Borg    May 2011   N/A       4,895    450
SA             2018       N/A       5,000    1,500

6.4.1 Subject systems

We choose two open-source systems, Apache James and OpenMRS, as well as two industrial systems, Google Borg and one large software system (SA), as our subject systems. Apache James is a Java-based mail server developed by the Apache Foundation. OpenMRS is an open-source health care system developed to support customized medical records. OpenMRS is a web system developed using the Model-View-Controller (MVC) architecture. Google Borg is a large-scale cluster management system that is used to manage internal workloads and schedule machines, jobs and tasks. SA is an ultra-large-scale cloud computing service application that is deployed to support business worldwide and used by hundreds of millions of users. Due to a Non-Disclosure Agreement (NDA), we cannot reveal additional details about the system. We do note that the SA system is one of the largest in the world in its domain, with a long development history. All our subject systems cover different domains and are studied in prior research (Ahmed et al., 2016; T.-H. Chen, Shang, Hassan, et al., 2016; Gao, Jiang, Barna, & Litoiu, 2016; Verma et al., 2015). The overview of the four subject systems is shown in Table 6.6.

6.4.2 Data collection

In this subsection, we present how we collect execution logs in order to study the use of our different workload approaches. In particular, for the two open-source systems (Apache James and

OpenMRS), we deployed the systems in our experimental environment and conducted load tests to exercise the systems. We then collected the execution logs generated during the load tests. We also collected the CPU usage of both systems by monitoring the particular process of the system with a performance monitoring tool named Pidstat Godard (2019) every five seconds. The data from the two industrial systems are from real end users. The production deployment of SA provides the CPU usage of the system, while Google Borg does not provide the CPU usage (hence, we do not evaluate that aspect for this system). We discuss the details of our data collection for each of our subject systems below. The details of our data collection can be found in our replication package.

Apache James We use JMeter to create load tests that exercise Apache James. We replicate a workload similar to prior research Gao et al. (2016). In particular, we simulated 2,000 email users who send, receive and read emails of different sizes, with and without attachments of different sizes. Users may only read the email header or load the entire email. We deploy Apache James on a server machine with an Intel Core i7-8700K CPU (3.70GHz), 16 GB memory and a 3TB SATA hard drive. We run JMeter on a machine with an Intel Core i5-2400 CPU (3.10GHz), 8 GB memory and a 320GB SATA hard drive to generate a two-hour workload.

OpenMRS We used the default OpenMRS demo database in our load tests. The demo database contains data for over 5K patients and 476K observations. OpenMRS contains four typical scenarios: adding, deleting, searching and editing operations. We designed load tests that are composed of various searches of patients, concepts, encounters, and observations, and the addition, deletion and editing of patient information. To simulate a more realistic workload, we added random controllers in JMeter to vary the workload. We deployed OpenMRS on two machines, each with an Intel Core i5-2400 CPU (3.10GHz), 8 GB memory and a 512GB SATA hard drive. One machine is deployed as the application server and the other machine hosts the MySQL database. OpenMRS provides a web-based interface and RESTful services. We used the RESTful API from OpenMRS and ran JMeter on another machine with the same specification to simulate users on the client side with an eight-hour workload.

Google Borg We used a publicly-available open dataset from the Google Borg system Wilkes (2011). The data is published by Google to support workload-related research. Due to the large size of the Google Borg data, we only picked the first part of the data to analyze, which consists of 83 minutes of data from the entire Google Borg cluster. Due to the inaccessibility of the Google Borg system, we could not run load tests directly on the system. In the data from Google Borg, there exists no information about users; however, the workload is described as jobs. Therefore, we considered each job as a user when applying our approach to the Google Borg data.

SA We retrieved the execution logs and the corresponding CPU usage from SA, which is deployed in production and used by real users. SA is deployed on a cluster with more than a thousand machines. Due to the NDA, we cannot disclose detailed information about the hardware environment and the usage scenarios of SA.

6.4.3 Preliminary analysis: clustering tendency

Before we answer the research questions for our case study, we first conduct a preliminary analysis on the clustering tendency of the data from our subject systems. If the users from our subject systems have random behaviour and do not appear to have inherent groups of similar behaviours, the data from our subject systems would be unsuitable for our study.

Therefore, we calculated the Hopkins statistic to assess the clustering tendency of our data. The Hopkins statistic is a statistical hypothesis test that can be used to accept or reject the random position hypothesis Banerjee and Dave (2004). The value of the Hopkins statistic ranges from 0 to 1. A value close to 1 means that the data has a high clustering tendency, while a value close to 0 indicates that the data is uniformly distributed (no clustering tendency). Similar to previous research Banerjee and Dave (2004), we used 0.5 as the threshold: if the value of the Hopkins statistic is higher than 0.5, we consider that the data has a high clustering tendency. We used the function hopkins of the clustertend package in R to calculate the Hopkins statistic. We observe that our data has a high clustering tendency, with Hopkins statistic values that range between 0.80 and 0.99 with an average of 0.92 across all of our subject systems. Since the Hopkins statistic values are higher than 0.5, we reject the random position hypothesis and confirm that our data is suitable for our study.
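For illustration, the Hopkins statistic can be sketched in Python under the common formulation in which values near 1 indicate a high clustering tendency (matching the convention used above); the thesis itself uses the hopkins function of the R clustertend package, not this code, and the 10% sample size is our assumption.

    import numpy as np
    from scipy.spatial import cKDTree

    def hopkins_statistic(X, sample_size=None, seed=0):
        """Hopkins statistic H = sum(u) / (sum(u) + sum(w)); H close to 1 suggests
        a high clustering tendency, H around 0.5 suggests random data."""
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        n, d = X.shape
        m = sample_size or max(1, int(0.1 * n))

        tree = cKDTree(X)
        # u: distances from uniformly sampled points (in the data's bounding box)
        #    to their nearest real data point.
        uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
        u = tree.query(uniform, k=1)[0]
        # w: distances from sampled real points to their nearest *other* data point
        #    (k=2 so that the point itself, at distance 0, is skipped).
        sample_idx = rng.choice(n, size=m, replace=False)
        w = tree.query(X[sample_idx], k=2)[0][:, 1]

        return float(u.sum() / (u.sum() + w.sum()))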

6.5 Case study results

In this section, we present our case study results by answering two research questions.

RQ1: How field-representative are our generated workloads?

Motivation. In order to illustrate a practical impact, we wish to first examine whether the generated workloads can lead to similar system behaviour and performance as the original system workload. If the system behaviour and performance produced by the generated workloads are drastically different from the original workload, such automatically generated workloads are not

field-representative and hence would not be useful in practice.

Approach. We use all three workload approaches (Action, ActionContext and ActionSequence) to generate load tests based on the execution logs of the subject systems. The execution logs from

Google Borg do not contain any context values, hence we only use the Action and ActionSequence approaches to generate load tests for Google Borg. When running the generated load tests, we monitor the behaviour and the performance of the systems. For the behaviour of the system, we measure the throughput of each type of action for every minute during the execution of the load tests.

For the system performance, we measure the CPU usage of the particular process of the system. In the load tests, we use a performance monitoring tool named Pidstat Godard (2019) to collect the physical performance for every five seconds. For our subjects Apache James, OpenMRS and SA, we are able to run the generated load tests directly on the system to measure system behaviour and performance. However, for the subject Google Borg, we cannot directly run the load tests since we do not have access to the system. Therefore, we cannot measure the system performance. Since we can generate simulated execution logs for load tests, we use the simulated execution logs to compute the throughput of each type of user action.

We perform statistical analysis to examine the existence of significant differences between the generated workloads and the original workload, in terms of the throughput of each action and the

CPU usage. We use the Mann-Whitney U test Nachar et al. (2008) to determine if there exists a statistically significant difference (i.e., p-value < 0.05). We choose the Mann-Whitney U test because it does not enforce any assumptions on the distribution of the data. Reporting only the statistical significance may lead to erroneous results (i.e., if the sample size is very large, p-value can be small even if the difference is trivial). Hence, we use Cliff's delta to quantify the effect size Becker (2000). The smaller the effect sizes, the more similar the workload is to the original

141 40000

30000

20000 cumulative #actions cumulative

10000

Original Action ActionContext ActionSequence 0 0 30 60 90 120 elapsed time (minute)

Figure 6.2: Cumulative density plot of the number of user actions from the original workload and the generated load tests for the Receive action in Apache James. workload. Since statistical tests do not consider the trend of the actions, we visualize the differences between the number of each type of actions during execution from the load tests and the original workload using cumulative density graphs.

Results. The load tests from our recovered workloads have similar system behaviour to the original workload. The results of the throughput of actions between the original workload and the generated load tests are shown in Table 6.7. In 11 out of 14 user actions in all the subject systems, at least one workload is field-representative (bold in Table 6.7). Even the most coarse-grained workload Action has a similar system behaviour in five out of 14 user actions; while the most

fine-grained workload ActionSequence has a similar system behaviour in 10 out of 14 user actions.

142 Table 6.7: Comparing the throughput between the original workload and the generated workloads.

OpenMRS
                 Add (throughput per minute)           Delete (throughput per minute)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         143         N/A      N/A              91          N/A      N/A
Action           173         0.0001   0.52 (large)     87          0.0001   0.14 (small)
ActionContext    105         0.0001   0.96 (large)     69          0.0001   0.83 (large)
ActionSequence   155         0.0001   0.23 (small)     83          0.0001   0.26 (small)

                 Edit (throughput per second)          Search (throughput per second)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         92          N/A      N/A              131         N/A      N/A
Action           88          0.0001   0.15 (small)     110         0.0001   0.48 (large)
ActionContext    119         0.0001   0.87 (large)     165         0.0001   0.88 (large)
ActionSequence   71          0.0001   0.45 (medium)    149         0.0001   0.43 (medium)

Apache James
                 Send (throughput per minute)          Receive (throughput per minute)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         339         N/A      N/A              127         N/A      N/A
Action           253         0.0001   0.39 (medium)    213         0.0001   0.68 (large)
ActionContext    318         0.0001   0.30 (small)     149         0.0001   0.44 (medium)
ActionSequence   338         0.002    0.18 (small)     129         0.0001   0.13 (small)

SA
                 Action A (throughput per minute)      Action B (throughput per minute)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         —           N/A      N/A              —           N/A      N/A
Action           —           0.58     0.89 (large)     —           0.0001   0.79 (large)
ActionContext    —           0.58     0.06 (trivial)   —           0.0001   0.99 (large)
ActionSequence   —           0.89     0.01 (trivial)   —           0.49     0.17 (small)

Google Borg
                 Submit (throughput per minute)        Schedule (throughput per minute)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         4,022       N/A      N/A              3,840       N/A      N/A
Action           3,985       0.82     0.02 (trivial)   3,971       0.84     0.02 (trivial)
ActionSequence   4,192       0.51     0.06 (trivial)   3,849       0.98     0.003 (trivial)

                 Fail (throughput per minute)          Finish (throughput per minute)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         613         N/A      N/A              2,236       N/A      N/A
Action           647         0.006    0.25 (small)     2,679       0.007    0.42 (medium)
ActionSequence   608         0.68     0.04 (trivial)   1,793       0.007    0.45 (medium)

                 Evict (throughput per minute)         Kill (throughput per minute)
                 Throughput  p-value  effect size      Throughput  p-value  effect size
Original         933         N/A      N/A              217         N/A      N/A
Action           428         0.06     0.44 (medium)    151         0.0001   0.34 (medium)
ActionSequence   1,258       0.05     0.31 (medium)    162         0.37     0.08 (trivial)

Note: Bold font indicates that the load tests from the corresponding approaches are representative of the original workload (p-value > 0.05, or effect sizes trivial or small).

Table 6.8: Comparing CPU usage between the original workload and the load tests generated by our workload approaches.

Comparing with original
                 Action                    ActionContext             ActionSequence
Subjects         p-value   effect size     p-value   effect size     p-value   effect size
Apache James     0.0001    0.63 (large)    0.0001    0.98 (large)    0.0001    0.18 (small)
OpenMRS          0.0001    0.78 (large)    0.0001    0.61 (large)    0.0001    0.26 (small)
SA               0.0001    0.55 (large)    0.0001    0.47 (medium)   0.0001    0.29 (small)
Note: The bold font indicates the most field-representative workload.

The load tests generated by ActionSequence outperform the ones from Action and ActionContext. We observe that for 10 out of 14 user actions, ActionSequence outperforms Action and ActionContext when comparing the throughput of the user actions. For example, for the Send action in Apache James, the load test from ActionSequence generates 337 actions per minute, which is much closer to the original workload (339 actions per minute) than Action (253 actions per minute) and ActionContext (318 actions per minute). We also use cumulative density graphs to evaluate the trend of each action between the original workload and the load tests that are generated from our different workloads. The x-axis of the graphs is the elapsed time of the load tests and the y-axis is the total number of actions. Due to space limitations, we only present the cumulative density graph for one action of Apache James (see Figure 6.2). The detailed cumulative density graph for each user action can be found in our replication package. The graphs show that the trend of the Receive action from the ActionSequence workload is much closer to the original workload than the Action and ActionContext workloads. In particular, Figure 6.2 shows that the total number of user actions from the Action workload is far from the original.

The ActionSequence workloads produce system performance closer to the original workload than the Action and ActionContext workloads. As shown in Table 6.8, the effect sizes between the CPU usage during the load tests generated from the ActionSequence workloads and the original workload are always trivial or small. On the other hand, the effect sizes of the CPU usage differences are medium to large when comparing the original workload with the Action and ActionContext workloads.

RQ2: How many clusters of users are captured by each of our recovery workload approaches?

Motivation. Our workloads may contain a large number of clusters of users, leading to too fine a granularity for load tests. If our workloads consist of an overwhelming number of clusters, they would not be useful for practitioners due to the large overhead of developing the infrastructure to execute such workloads, as well as executing and maintaining them.

Approach. We use all the workloads of each of our approaches to generate load tests for the four subject systems. We compare the number of clusters of users that are recovered by each workload.

Afterwards, we manually examine each cluster of users to understand the differences between our different workload recovery approaches.

Results. Our approaches do not generate an overwhelming number of clusters. The numbers of generated user clusters are shown in Table 6.9. Although we include richer user information in our workloads, we do not generate an overwhelming number of clusters. In particular, Google Borg and SA are both large-scale systems with a large number of end users, while our process only generates a maximum of 25 and 13 clusters for the ActionSequence approach for Google Borg and SA, respectively. However, the granularity of the Action workloads is very coarse. In particular, for three subject systems, i.e., Apache James, OpenMRS and SA, the Action workload only consists of two to three clusters of users. Such a small number of clusters helps explain the results from RQ1, where the Action workload is not able to generate field-representative load tests. On the other hand, the most field-representative workload from RQ1, i.e., the ActionSequence workloads, consists of only three to six more clusters than the ActionContext workloads. Such a small number of clusters of users makes it possible to manually examine each cluster to qualitatively understand the field workload.

We manually examine each cluster of users and aim to understand the difference between the

recovered clusters for the ActionSequence and ActionContext workload recovery approaches. We

identify three typical scenarios that cause the differences. 1) Orders of actions. Some users have

the same distribution of actions but they are ordered differently. Such differences are not captured

by ActionContext workloads. 2) Similar sequences of actions but different distributions. Users may have the same action sequences but the general distribution of each sequence of actions is different. ActionContext workloads would place such users into different clusters while ActionSequence workloads group them together. 3) Varying frequency of actions. Some users have the same distribution of action sequences although their frequencies vary differently. ActionContext workloads would place such users in different clusters while ActionSequence workloads consider them the same. Such scenarios show that ActionSequence considers different levels of granularity of user behaviour, which may explain the differences in clusters.

Table 6.9: Number of user clusters for each of our workload approaches.

Subjects       Action  ActionContext  ActionSequence
Apache James   2       35             39
OpenMRS        3       2              8
Google Borg    20      20             25
SA             2       10             13

6.6 Discussion

In this section, we discuss the use of our three workload recovery approaches to detect unseen workloads, as well as a sensitivity analysis.

6.6.1 Detecting unseen workload

One of the challenges of designing load tests is keeping the load tests field-representative as the user workloads evolve (T. Chen et al., 2017; Cunha & e Silva, 2012). When there exist users with unseen workloads, practitioners should be informed in order to act accordingly. For example, if the unseen workload is due to new user behaviours, one may wish to update the load tests. On one hand, if the recovered workloads from our approaches are too fine-grained, our approaches would falsely report unseen workloads, leading to additional wasted costs for practitioners. On the other hand, if our recovered workloads are too coarse-grained, we may miss the reporting of unseen workloads, leading to load tests that are not field-representative. Therefore, we study the use of our workload recovery process to detect unseen workloads by injecting users with unseen workloads

into our existing data. We injected four types of unseen workloads that are typically used in prior research (Cherkasova, Ozonat, Mi, Symons, & Smirni, 2008; Cunha & e Silva, 2012), as listed below (a code sketch of these injections follows the list).

• Extra actions. We randomly pick one action that is not the most frequent action. Then we

replace all occurrences of the picked action by the most frequent action.

• Removing actions. We randomly pick an action type and remove all occurrences of the action

from a user.

• Double context values. For every action with a context value, we change the context value by

doubling it.

• Reordering actions. For all the actions that are performed by a user, we randomly reorder the

actions.

For every type of unseen workload, we randomly select one user and alter that user's data to inject the unseen workload. We then apply each of our approaches to recover workloads from the data in which only one user is injected with an unseen workload. In particular, if a user is located in a cluster without any other users, we consider that the user has an unseen workload (the user is not similar to any other user). If the user was indeed injected with an unseen workload, we consider it a true-positive detection. Any normal user located in a one-user cluster is considered a false-positive detection. We repeat this process 10 times for every type of unseen workload, each time injecting a random user with an unseen workload. In total, for our four subject systems, we have 160 sets of data, each having one user injected with an unseen workload, and we generate 160 workloads to detect those users. We define precision as the number of true-positive detections divided by the total number of users that are in a one-user cluster, and recall as the number of true-positive detections divided by the total number of users with injected unseen workloads.
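The following minimal sketch (in Python, with hypothetical helper and user names; the clusters themselves would come from one of our workload recovery approaches) illustrates the injection of one type of unseen workload and the precision and recall computation described above:

import random

def inject_reordered_actions(users, user_id, seed=0):
    # Simulate the "Reordering actions" unseen workload for one user:
    # `users` maps a user id to that user's list of actions, and only the
    # selected user's actions are randomly reordered.
    mutated = {uid: list(actions) for uid, actions in users.items()}
    random.Random(seed).shuffle(mutated[user_id])
    return mutated

def flag_unseen_users(clusters):
    # Flag every user that ends up alone in a cluster as an unseen workload.
    return {members[0] for members in clusters if len(members) == 1}

def precision_recall(flagged_users, injected_users):
    true_positives = len(flagged_users & injected_users)
    precision = true_positives / len(flagged_users) if flagged_users else 0.0
    recall = true_positives / len(injected_users) if injected_users else 0.0
    return precision, recall

# Hypothetical usage: `clusters` would be produced by a workload recovery
# approach on the mutated data.
clusters = [["u1"], ["u2", "u3", "u4"], ["u5", "u6"]]
print(precision_recall(flag_unseen_users(clusters), injected_users={"u1"}))  # (1.0, 1.0)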

ActionContext and ActionSequence workloads can accurately detect the injected users with an unseen workload. The results of detecting injected users with an unseen workload are shown in Table 6.10. ActionContext workloads have a precision between 0.75 and 1 with an average of 0.86, and a recall between 0.2 and 0.9 with an average of 0.58. The ActionSequence workloads achieve a precision between 0.75 and 1 with an average of 0.87, and a recall between 0.3 and 1 with an average of 0.79. However, the Action workloads have both a low average precision (0.47) and a low average recall (0.29). The precision and recall of the ActionSequence workloads are generally consistently high across all subject systems for all types of injected unseen workloads. The only exception is Apache James, which has a lower recall in detecting the Extra actions and Removing actions

workloads. We find that the Apache James mail server has a small number of action types (cf.

Section 6.4); hence, adding extra actions or removing actions may still generate a user with a high similarity to a previously recovered cluster of users.

Table 6.10: Results of precision and recall in detecting injected unseen workloads.

Apache James
                        Action             ActionContext      ActionSequence
Unseen workload type    precision  recall  precision  recall  precision  recall
Extra actions           1.00       0.30    0.80       0.40    0.80       0.40
Removing actions        1.00       0.40    0.75       0.30    0.75       0.30
Double context values   0          0       0.90       0.90    0.90       0.90
Reordering actions      0          0       0.75       0.30    0.89       0.80
Average                 0.50       0.18    0.80       0.48    0.84       0.60

OpenMRS
                        Action             ActionContext      ActionSequence
Unseen workload type    precision  recall  precision  recall  precision  recall
Extra actions           0.86       0.60    0.86       0.60    0.89       0.80
Removing actions        0.88       0.70    0.73       0.80    0.69       0.90
Double context values   0          0       0.75       0.60    0.75       0.60
Reordering actions      0          0       1.00       0.30    0.91       1.00
Average                 0.44       0.33    0.84       0.58    0.81       0.83

Google Borg
                        Action             ActionSequence
Unseen workload type    precision  recall  precision  recall
Extra actions           0.90       0.90    0.90       0.90
Removing actions        0.83       0.50    0.89       0.80
Double context values   1.00       0.10    0.90       0.90
Reordering actions      0.91       0.50    0.90       0.87
Average                 0.91       0.50    0.90       0.87

SA
                        Action             ActionContext      ActionSequence
Unseen workload type    precision  recall  precision  recall  precision  recall
Extra actions           1.00       0.80    0.89       0.80    0.90       0.90
Removing actions        0.88       0.70    0.89       0.80    0.89       0.80
Double context values   0          0       1.00       0.90    1.00       0.90
Reordering actions      0          0       1.00       0.20    0.90       0.90
Average                 0.47       0.38    0.95       0.68    0.92       0.88

All subject systems
                        Action             ActionContext      ActionSequence
                        precision  recall  precision  recall  precision  recall
All average             0.47       0.29    0.86       0.58    0.87       0.79

Note: Values in bold font indicate the best performing approach for each setting.

ActionSequence workloads have a similar precision but a much higher recall than ActionContext workloads. Table 6.10 shows that, on average, ActionSequence achieves a similar precision (0.87) to ActionContext (0.86), while the recall of ActionSequence (0.79) is much higher than that of ActionContext (0.58). ActionSequence considers finer-grained information than ActionContext and hence, intuitively, has a higher ability to uncover unseen workloads. In particular, in all cases, ActionSequence has a higher or similar recall compared to ActionContext. On the other hand, even in the cases where ActionContext has a higher precision, the difference is rather small.

6.6.2 Sensitivity analysis

Our workload recovery process leverages several techniques, such as hierarchical clustering,

which may be replaced by other similar techniques. Our process also leverages threshold values.

For example, the residual value for the linear regression prediction that is used to split user action

sequences is set to 0.5 in our process. The α value to rank frequent action sequences is also

set to 0.5. In order to better understand the sensitivity of our workloads to these thresholds, we

individually increased each threshold value to 0.75 and decreased it to 0.25. We also changed the clustering algorithm to Mean shift (Comaniciu & Meer, 2002). We chose Mean

shift, since similar to hierarchical clustering, Mean shift does not require us to pre-specify the

number of clusters. We re-generated the ActionSequence workload and calculated the throughput of

each action for each subject system. We used the ActionSequence workload in this analysis because

ActionSequence is the best performing workload shown in our evaluation results (cf. Section 6.5) and it covers all the steps of our three workload recovery approaches.

We compared the throughput of each action under each changed setting with that under the default threshold values and clustering algorithm, using the Mann-Whitney U test and Cliff's delta. We observed that our approaches are

insensitive to both the residual and α thresholds. In addition, the effect size between the hierarchical

clustering and Mean shift is trivial or small. Table 6.11 only shows the results of such a comparison for Apache James, which has the most peculiar workload among our subject systems. The other comparison results are in our replication package.

Table 6.11: Comparing the results of choosing different threshold values and clustering algorithms for Apache James.

Send action
Changed setting          p-value    effect size
residual=0.25            0.9        0.01 (trivial)
residual=0.75            0.91       0.01 (trivial)
α=0.25                   0.001      0.19 (small)
α=0.75                   0.99       0.0001 (trivial)
Mean shift clustering    0.0001     0.22 (small)

Receive action
Changed setting          p-value    effect size
residual=0.25            0.9        0.01 (trivial)
residual=0.75            0.9        0.01 (trivial)
α=0.25                   0.005      0.15 (small)
α=0.75                   0.99       0.0001 (trivial)
Mean shift clustering    0.25       0.07 (trivial)

Default setting: residual=0.5, α=0.5, hierarchical clustering.
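As a concrete illustration of this comparison, the following minimal sketch (in Python, with hypothetical throughput values; we use scipy for the Mann-Whitney U test and a straightforward implementation of Cliff's delta with its commonly used magnitude thresholds) compares the per-action throughput under a changed setting against the default setting:

from itertools import product
from scipy.stats import mannwhitneyu

def cliffs_delta(xs, ys):
    # Cliff's delta: proportion of pairs where x > y minus pairs where x < y.
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))

def magnitude(delta):
    # Commonly used thresholds for interpreting Cliff's delta.
    delta = abs(delta)
    if delta < 0.147:
        return "trivial"
    if delta < 0.33:
        return "small"
    if delta < 0.474:
        return "medium"
    return "large"

# Hypothetical throughput samples of one action (e.g., Send) per time interval.
throughput_default = [120, 118, 125, 122, 119, 121, 123, 120]
throughput_changed = [121, 117, 124, 123, 120, 122, 122, 119]

_, p_value = mannwhitneyu(throughput_default, throughput_changed, alternative="two-sided")
delta = cliffs_delta(throughput_default, throughput_changed)
print(p_value, delta, magnitude(delta))

A statistically insignificant p-value combined with a trivial or small delta would indicate that the changed setting does not meaningfully alter the recovered workload.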

6.7 Challenges and lessons learned from the industrial evaluation of

our approaches.

In this section, we discuss the lessons learned and the challenges faced during the implementation and evaluation of our approaches in industry. In particular, the first author was embedded on site with the industrial team for over half a year to enable a faster feedback loop with practitioners and to ensure the smooth adoption of our approaches in a large-scale, complex industrial setting. Our documented challenges and lessons can assist researchers and practitioners who would like to integrate their research in complex industrial settings.

6.7.1 Domain knowledge is crucial for the successful transfer of research to practice.

Our approaches depend on the availability and quality of the important knowledge that resides in system execution logs. Due to the large scale and complexity of the logs of System A, we often faced the challenge of not fully understanding the information that is communicated in the logs, making it challenging for us to determine the important contextual information to include and analyze in our approaches.

How to address: In a first attempt, we naïvely included all log information in our analysis. We ended up observing that such an attempt introduced noise that negatively impacted our results.

Hence, the first author flew down to spend six months on site at Alibaba, where he held several in-person scrum meetings with developers and operators of System A, in order to better understand the information that is communicated in System A's logs. These meetings helped the academic team and the industrial team get a better understanding of the problem at hand as well as the strengths and limitations of the research solutions. Being on site also helped the academic team create a strong relationship with the industrial team. Such a relationship enabled a much faster and more open feedback loop.

Lessons learned: Good domain knowledge is crucial in log analysis and workload recovery. Blindly applying log analysis techniques on large-scale complex logs may not achieve the expected goal.

One should work closely with practitioners to leverage their valuable knowledge about their logs.

6.7.2 Team support is crucial for the successful transfer of research to practice.

Customization of tooling

We started our research working on open-source systems. Such open-source systems commonly use standard load driver tools such as JMeter. However, industrial systems are commonly tested using various in-house custom tools. Such tools hinder us from demonstrating and evaluating our work. It is often costly, time-consuming and sometimes impossible to design and implement a specific load replay tool for each system.

How to address: Working closely with the practitioners of Alibaba, we gained a deeper understanding of their in-house load replay tools. We observed that many of them support the direct reading of logs and the replaying of the exact workload based on such logs. Therefore, we provided an option in our toolset to either directly drive a load test using widely used tools (like JMeter), or to transform our generated load test into logs, which can be fed to the customized load replay tools of Alibaba.
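As a sketch of the second option (in Python, with a hypothetical log format and function name; the actual format expected by the in-house replay tools is not public), a recovered workload could be serialized into synthetic log lines that a log-driven replay tool can consume:

from datetime import datetime, timedelta

def workload_to_log_lines(actions_per_second, duration_s, start=datetime(2020, 1, 1)):
    # Serialize a recovered workload (target action rates per second) into
    # synthetic, replayable log lines. The log format below is hypothetical and
    # would need to match the parser of the in-house replay tool.
    lines = []
    for second in range(duration_s):
        timestamp = start + timedelta(seconds=second)
        for action, rate in actions_per_second.items():
            for user_index in range(rate):
                lines.append(f"{timestamp.isoformat()} INFO user-{user_index} action={action}")
    return lines

# Example: replay 3 'search' actions and 1 'checkout' action per second for 5 seconds.
for line in workload_to_log_lines({"search": 3, "checkout": 1}, duration_s=5)[:4]:
    print(line)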

Lessons learned: There exists a strong need for future research in load replay using more flexible frameworks in order to avoid practitioners having to implement customized toolsets.

Addition of needed log lines when not all information is available

Not all the needed information was available in the logs when we started our research on

Alibaba's system. In particular, the log data was not perfectly designed for our approach, and important information was missing.

How to address: Working closely with the practitioners, we requested the addition of new logging probes. In order to demonstrate the value of our requests, we conducted several additional analyses.

Lessons learned: The technical support and quick turn-around from the industrial team were extremely crucial in addressing this challenge. However, even with all the logging probes in place, we had to wait until the builds with such probes had been deployed in the field long enough for us to have sufficient data for our approaches.

Setting up a realistic industrial environment

The context of the load testing environment has an important influence on the performed load tests. For example, there should be a large amount of realistic data in a database before one can test a database-driven application. For open-source systems, it is relatively easy to create a load testing environment that is similar to widely adopted benchmarks. However, setting up such an environment in an ultra-large-scale industrial setting is extremely challenging.

How to address: To make our automatically generated load tests run successfully, we had the luxury of being strongly supported by the infrastructure team of Alibaba. In particular, we were provided with a testing environment that is a replica of the field environment from which we collected the analyzed logs. However, such a solution is not optimal and may not be cost-effective, especially for practitioners that do not have direct access to infrastructures that are similar to their deployed systems.

Lessons learned: Preparing the load testing environment is an important, challenging and yet open problem for load testing practices and research. Technical and infrastructural support is crucial in overcoming this challenge.

6.7.3 Coping with large-scale industrial data

Our experiments on open-source systems are conducted with a limited scale of data. When we adopted our approach on the industry-scale data of Alibaba, it did not scale well.

The statistical analyses and clustering techniques often suffered from poor scalability.

How to address: In order to ease the adoption of our approach on industry-scale data, we optimized our solution by learning some threshold values when pre-processing the logs and caching these thresholds across runs. For example, learning the best threshold values to categorize contextual values is time-consuming. We can save the learned thresholds and use them directly to generate the categorized contextual values in logs, since such thresholds rarely change within a specific context.
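A minimal sketch of this optimization (in Python, with hypothetical file and function names; the real threshold learning step is considerably more expensive than the placeholder shown here) caches the learned categorization thresholds per contextual value and reuses them across runs:

import json
import os

CACHE_FILE = "context_thresholds.json"  # hypothetical cache location

def learn_thresholds(values, n_bins=3):
    # Placeholder for the expensive step that searches for the best split
    # points of a numeric contextual value; here we use equally spaced splits.
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def get_thresholds(context_key, values):
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)
    if context_key not in cache:  # only learn when no cached thresholds exist
        cache[context_key] = learn_thresholds(values)
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)
    return cache[context_key]

def categorize(value, thresholds):
    # Map a raw contextual value to a category index using the thresholds.
    return sum(value > t for t in thresholds)

print(get_thresholds("message_size", [10, 25, 40, 80, 200, 350]))
print(categorize(90, get_thresholds("message_size", [10, 25, 40, 80, 200, 350])))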

Lessons learned: One should not assume that a successful research tool can directly scale to industrial data. Optimization for the particular industrial setting is important.

6.8 Threats to validity

External validity. Our evaluation is conducted on data from two open source and two industrial

systems. Although our subject systems cover different domains and sizes, our evaluation results

may still not generalize to other systems. Given the automated nature of our process, others can

study its effectiveness on their own data sets using our replication script.

Internal validity. Our approaches depend on the availability and quality of system execution logs.

If a system records limited information about user actions in the execution logs, our approaches

may not perform as expected. The evaluation of our approaches uses the system CPU usage that is

recorded by Pidstat. The quality and the frequency of recorded CPU usage can impact the internal

validity of our study. Currently, our approach only categorizes numerical contextual values due to

the characteristics of the logs in our subject systems. Future work can complement our approach

by considering categorizing string literals. Our approach depends on various statistical analyses.

Therefore, for small systems with a small amount of data, our approach may not perform well due

to the nature of statistical analyses.

Construct validity. There exist other aspects of system behaviour and performance. We focus on throughput and CPU usage due to the specific needs of SA from our industrial collaborator. Future studies may investigate the impact on other system aspects to complement our findings. Due to the inaccessibility of the subject system, the workloads on Google Borg are based on simulated execution logs instead of actually running the load tests on Google Borg. Therefore, the evaluation results may be different if we were able to run the load tests on the actual Google Borg cluster. In the evaluation of our approaches to detect users with unseen workloads, we only injected four types of unseen workloads. Similar evaluation approaches based on mutation techniques have often been used in prior research (Svajlenko & Roy, 2015; Svajlenko, Roy, & Cordy, 2013). However, these unseen workloads may not be the same as those in real life. In addition, there may exist other ways to inject unseen workloads to complement our results.

6.9 Conclusions

Workload recovery from end users is an essential task for the load testing of large-scale systems.

In this work, we conduct a study on recovering workloads at different levels of granularity of user behaviour to automatically generate load tests. We design three approaches, based on user actions, user actions with contextual information, and user action sequences with contextual information, to generate load tests for two open source systems and two industrial systems. We find that our richest approach, which uses user action sequences with contextual information, outperforms the other two approaches. In most cases, the throughput and CPU usage of the load tests generated from the user-action-sequence-based workload outperform those of the other two, with differences from the original workload that are either statistically insignificant or have only small or trivial effect sizes. Such a field-representative workload is generated using only a small number of clusters of users. In addition, we find that the recovered workloads from our approaches can also be used to detect injected users with unseen workloads with high precision and recall.

This work has the following contributions:

• To the best of our knowledge, our work is the first large-scale study on the use of different granularities of recovered user details for load testing.

• To the best of our knowledge, our approaches are the first ones in the field to leverage user

contextual information and frequent action sequences in workload recovery.

• Our approaches have been adopted in practice to assist in the testing and operation of an

ultra-large-scale industrial software system.

Chapter 7

Summary, Contributions, and Future Work

This chapter summarizes the main ideas presented in this thesis. In addition, we propose future work related to software performance.

7.1 Summary

This thesis focuses on performance regression detection in DevOps. First, we focus on frequent performance assurance activities during development. Performance regressions might still slip from software development into operations. Therefore, second, we focus on assisting performance assurance activities with operational data. To detect or predict performance-regression-introducing source code changes early during development, we first present an approach to identify performance-regression-introducing source code changes during development. Second, we propose an approach to automatically predict whether a test would manifest performance regressions in a code commit. Most of the performance regressions are caused by misconfiguration.

Therefore, we also propose an approach to predict whether a configuration option manifests a performance regression. Our results and approaches are valuable for software performance engineering practitioners.

From the perspective of operations, we use operational data to analyze system performance

and detect performance regressions to assist developers in improving their system performance. In particular, we propose an approach to recover field-representative workloads from execution logs to generate load tests, helping operators perform their load tests.

7.2 Thesis contributions

The main contributions of this thesis are the following:

• We propose a statistically rigorous approach to identifying performance-regression-introducing code changes. Further research can adopt our methodology in studying performance

regressions (Chapter 3).

• We find six root-causes of performance regressions that are introduced by code changes.

12.5% of the manually examined regressions can be avoided or their performance impact

may be reduced (Chapter 3).

• We propose an approach that can predict performance-regression-prone tests at the commit

level. Our approach can provide accurate prediction results, and save testing time, easing the

adoption of the approach in practice (Chapter 4).

• Our findings highlight the importance of considering different configurations when performing performance regression detection, and show that leveraging predictive models can mitigate the difficulty of exhaustively considering all configurations of a system during such a process

(Chapter 5).

• We introduce an approach for recovering workload using user contextual information and

frequent action sequences from system execution logs. Our approach has been adopted in

practice to assist in the testing and operation of an ultra-large-scale industrial software system

(Chapter 6).

7.3 Future Work

Software performance involves many research areas, such as program analysis and software log analysis. There are still many open challenges and opportunities in the field of performance regression. We highlight some avenues for future work.

7.3.1 Performance data repository

There exist many systematic and mature repositories, such as code repositories (e.g., Git) and issue tracking systems (e.g., JIRA). However, there exists no repository for performance data, even though performance testing is known to be time-consuming and costly. A systematic repository for performance data would be valuable for both researchers and practitioners. Therefore, designing a repository to collect, store, and manage performance data is an open research question. To do so, we plan to collect as much existing performance data as possible and to propose an efficient data structure to store the performance data. We also plan to investigate approaches to systematically store such performance data in a repository.
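As a starting point for this discussion, the following sketch (in Python; the field names are our own assumptions and not a finalized design) shows one possible shape of a record in such a performance data repository:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PerformanceRun:
    # One hypothetical record in a performance data repository.
    system: str                       # e.g., "Hadoop"
    version: str                      # release or commit under test
    test_name: str                    # performance or load test identifier
    workload: str                     # identifier of the exercised workload
    environment: Dict[str, str]       # hardware, OS, JVM options, ...
    metrics: Dict[str, List[float]]   # e.g., {"cpu": [...], "response_time": [...]}
    tags: List[str] = field(default_factory=list)

run = PerformanceRun(
    system="Hadoop", version="abc123", test_name="TestDFSIO",
    workload="write-heavy", environment={"os": "Linux", "jvm": "OpenJDK 8"},
    metrics={"cpu": [0.42, 0.45], "response_time": [120.0, 118.5]},
)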

7.3.2 More empirical studies on the impact of software activities on performance

There is a lack of research on the impact of software activities on the performance of software systems. For instance, architecture reengineering, code cloning, and some types of refactoring activities might have negative impacts on performance. To fill this gap, future work can conduct more empirical studies on the impact of software activities on performance.

7.3.3 Domain-specific language for performance data analysis

Performance data are often analyzed using scripting languages, such as Python and shell. However, such analysis scripts are difficult to scale and maintain, since performance testing often produces large amounts of unstructured data. Reusing performance testing result analysis techniques and scaling such techniques to large-scale performance data is challenging. Therefore, future research should explore the design of domain-specific manipulation languages for performance data to ease the reuse and migration of performance data analysis techniques.

7.3.4 Reducing the length of performance testing

In practice, performance testing often lasts hours, days, or even weeks. Executing performance tests for such a long time requires a large amount of system resources. Prior research has reported that performance tests often repeat the same workload multiple times and produce repeated performance data, which provides little additional information. Therefore, future work can explore techniques that not only reduce the length of performance testing but also preserve the same system behaviour.

7.3.5 The use of unit tests to detect performance regressions in load tests

Load testing a system is a required testing procedure. However, first, load testing is complex and time-consuming, both in executing a load test and in analyzing its results. Second, load tests cannot fully represent the field workload due to frequent workload fluctuations. Finally, replaying the field workload is difficult due to the need for a dedicated testing environment and tooling support. Therefore, if we can use the field data generated in operations to predict performance regressions without the need to execute load tests, large amounts of resources can be saved. We plan to design an approach that mines the relationship between unit tests' logs and load tests' logs, and to study the use of readily available execution logs of unit tests to predict performance regressions in load tests.

Appendix

In this appendix, we present the detailed prediction results from RQ1 of Perf-JIT (Chapter 4) in Table .1, including precision, recall, AUC, and their corresponding improvements over the Perphecy models. We also present the detailed results of the average ranks of the important metrics from RQ5 of Perf-JIT (Chapter 4) in Table .2.

Table .1: Detailed results of predicting performance regression with different performance counters. Improvements are calculated by comparing with a random classifier. Bold values highlight the best predictors.

Hadoop
Counter | Random Forest (Pre. Improv. Recall Improv. F-1 Improv. AUC) | Perphecy (Pre. Recall F-1 AUC) | LR (Pre. Recall F-1 AUC) | SVM (Pre. Recall F-1 AUC) | XG-50 (Pre. Recall F-1 AUC) | XG-100 (Pre. Recall F-1 AUC) | XG-500 (Pre. Recall F-1 AUC)
Res. time | 0.30 275% 0.78 -17% 0.43 207% 0.87 | 0.08 0.94 0.14 0.08 | 0.23 0.54 0.32 0.69 | 0.20 0.82 0.32 0.84 | 0.34 0.64 0.44 0.83 | 0.35 0.64 0.45 0.83 | 0.30 0.68 0.41 0.83
Cpu | 0.43 378% 0.77 -14% 0.55 244% 0.87 | 0.09 0.90 0.16 0.09 | 0.15 0.54 0.23 0.63 | 0.18 0.51 0.26 0.72 | 0.46 0.58 0.51 0.80 | 0.50 0.57 0.53 0.80 | 0.46 0.57 0.51 0.79
Memory | 0.20 233% 0.84 -5% 0.32 191% 0.89 | 0.06 0.88 0.11 0.06 | 0.27 0.42 0.33 0.58 | 0.33 0.73 0.45 0.82 | 0.32 0.63 0.43 0.83 | 0.34 0.62 0.43 0.82 | 0.39 0.60 0.48 0.80
I/O Read | 0.43 514% 0.67 -22% 0.52 300% 0.82 | 0.07 0.86 0.13 0.07 | 0.29 0.39 0.33 0.62 | 0.13 0.99 0.23 0.81 | 0.7 0.49 0.58 0.74 | 0.70 0.49 0.58 0.75 | 0.69 0.49 0.57 0.76
I/O Write | 0.25 213% 0.71 -25% 0.36 157% 0.81 | 0.08 0.95 0.14 0.08 | 0.19 0.30 0.24 0.60 | 0.10 0.96 0.18 0.61 | 0.16 0.58 0.25 0.69 | 0.15 0.57 0.24 0.68 | 0.15 0.58 0.24 0.68
Average | 0.32 324% 0.75 -17% 0.44 221% 0.85 | 0.08 0.91 0.14 0.08 | 0.23 0.44 0.29 0.62 | 0.19 0.80 0.29 0.76 | 0.40 0.58 0.44 0.78 | 0.41 0.58 0.45 0.78 | 0.40 0.58 0.44 0.77
Cassandra

Counter | Random Forest (Pre. Improv. Recall Improv. F-1 Improv. AUC) | Perphecy (Pre. Recall F-1 AUC) | LR (Pre. Recall F-1 AUC) | SVM (Pre. Recall F-1 AUC) | XG-50 (Pre. Recall F-1 AUC) | XG-100 (Pre. Recall F-1 AUC) | XG-500 (Pre. Recall F-1 AUC)
Res. time | 0.50 1567% 0.63 -20% 0.56 1020% 0.78 | 0.03 0.79 0.05 0.54 | 0.04 0.79 0.08 0.66 | 0.04 0.84 0.07 0.65 | 0.28 0.58 0.38 0.78 | 0.16 0.68 0.25 0.79 | 0.16 0.68 0.26 0.78
Cpu | 0.27 800% 0.86 37% 0.41 486% 0.90 | 0.03 0.63 0.07 0.60 | 0.06 0.69 0.12 0.60 | 0.15 0.46 0.23 0.72 | 0.12 0.83 0.21 0.80 | 0.14 0.83 0.24 0.81 | 0.12 0.83 0.21 0.82
Memory | 0.27 1250% 0.84 50% 0.40 900% 0.95 | 0.02 0.56 0.04 0.62 | 0.08 0.76 0.14 0.70 | 0.13 0.56 0.21 0.68 | 0.19 0.84 0.30 0.91 | 0.14 0.88 0.25 0.90 | 0.20 0.72 0.31 0.90
I/O Read | 0.31 1450% 0.68 79% 0.42 950% 0.86 | 0.02 0.38 0.04 0.73 | 0.11 0.59 0.18 0.62 | 0.08 0.76 0.13 0.64 | 0.18 0.70 0.28 0.82 | 0.17 0.70 0.27 0.81 | 0.29 0.57 0.39 0.79
I/O Write | 0.15 650% 0.76 17% 0.25 733% 0.87 | 0.02 0.65 0.03 0.59 | 0.06 0.47 0.10 0.60 | 0.05 0.65 0.09 0.70 | 0.16 0.71 0.26 0.83 | 0.14 0.71 0.23 0.83 | 0.08 0.71 0.14 0.79
Average | 0.30 1150% 0.75 25% 0.41 787% 0.87 | 0.02 0.60 0.05 0.62 | 0.07 0.66 0.12 0.64 | 0.09 0.65 0.15 0.68 | 0.18 0.73 0.29 0.83 | 0.15 0.76 0.25 0.83 | 0.17 0.70 0.26 0.82
Openjpa
Counter | Random Forest (Pre. Improv. Recall Improv. F-1 Improv. AUC) | Perphecy (Pre. Recall F-1 AUC) | LR (Pre. Recall F-1 AUC) | SVM (Pre. Recall F-1 AUC) | XG-50 (Pre. Recall F-1 AUC) | XG-100 (Pre. Recall F-1 AUC) | XG-500 (Pre. Recall F-1 AUC)
Res. time | 0.29 222% 0.85 6% 0.43 169% 0.92 | 0.09 0.80 0.16 0.60 | 0.29 0.59 0.39 0.67 | 0.10 0.95 0.17 0.60 | 0.24 0.79 0.36 0.90 | 0.23 0.79 0.35 0.90 | 0.19 0.86 0.31 0.89
Cpu | 0.73 217% 0.85 21% 0.78 123% 0.91 | 0.23 0.70 0.35 0.56 | 0.49 0.75 0.59 0.81 | 0.34 0.50 0.41 0.63 | 0.67 0.87 0.75 0.93 | 0.73 0.81 0.77 0.93 | 0.61 0.86 0.71 0.93
Memory | 0.48 380% 0.72 -11% 0.57 217% 0.88 | 0.10 0.81 0.18 0.51 | 0.49 0.60 0.54 0.75 | 0.23 0.54 0.32 0.69 | 0.36 0.67 0.47 0.84 | 0.31 0.70 0.43 0.84 | 0.37 0.66 0.48 0.84
I/O Read | 0.72 213% 0.81 19% 0.76 124% 0.92 | 0.23 0.68 0.34 0.54 | 0.66 0.60 0.63 0.79 | 0.54 0.27 0.36 0.62 | 0.60 0.79 0.68 0.89 | 0.58 0.80 0.67 0.89 | 0.59 0.78 0.67 0.88
I/O Write | 0.71 109% 0.60 -14% 0.65 44% 0.79 | 0.34 0.70 0.45 0.57 | 0.67 0.51 0.58 0.72 | 0.40 0.36 0.38 0.57 | 0.66 0.59 0.63 0.77 | 0.75 0.55 0.63 0.77 | 0.70 0.55 0.61 0.77
Average | 0.59 195% 0.77 4% 0.64 113% 0.88 | 0.20 0.74 0.30 0.56 | 0.52 0.61 0.55 0.75 | 0.32 0.52 0.33 0.62 | 0.51 0.74 0.58 0.87 | 0.52 0.73 0.57 0.87 | 0.49 0.74 0.56 0.86

Table .2: Average rank of the important metrics in our classifiers

Metric              Resp. time  CPU   Memory  I/O read  I/O write
AGE                 2.2         1.3   1.5     1.4       1.6
SOL                 2.6         2.9   2.8     2.3       2.7
LT                  3.6         3.3   2.9     2.9       3.1
REXP                4.0         2.5   2.4     3.0       3.7
NDEV                3.1         4.0   4.5     3.7       4.4
Entropy             2.9         4.0   3.3     3.8       3.7
Complexity          5.2         4.4   4.2     4.3       4.6
LA                  4.6         4.0   4.4     4.7       4.4
LD                  4.7         5.9   5.5     5.4       5.2
for chg             6.4         9.3   8.4     5.6       6.9
EXP                 7.5         5.0   5.6     6.5       6.1
if chg              7.5         6.9   7.3     6.9       6.3
NTM                 6.1         6.7   8.2     7.5       6.6
expVariable add     7.5         7.0   8.7     8.2       7.8
expVariable del     8.2         9.7   8.4     8.6       8.9
if add              8.8         8.1   9.1     8.6       9.1
NF                  9.6         9.7   9.7     8.9       8.3
externalCall add    7.7         10.2  9.4     9.6       11.4
FN                  10.4        8.8   9.5     9.7       9.9
while chg           9.6         11.0  12.2    9.8       10.2
try chg             9.9         10.7  11.4    9.9       10.7
ND                  10.7        11.3  12.3    10.9      9.9
static add          17.8        15.8  10.9    11.2      17.9
for del             14.7        12.0  12.0    11.3      13.0
if del              8.9         11.8  10.0    11.8      10.8
NS                  12.2        12.8  10.2    11.9      11.5
assert del          14.4        14.6  12.4    12.4      6.6
switch chg          17.8        15.1  11.8    12.9      18.0
try del             15.8        16.6  13.7    13.9      11.2
synchronized chg    14.8        14.6  14.8    14.0      12.4
externalCall del    14.6        13.5  13.8    14.1      15.3
elseif add          19.3        16.1  14.2    14.3      14.5
try add             11.7        15.7  14.9    14.4      15.6
expParameter add    13.1        14.6  15.2    15.3      14.3
case add            17.4        17.2  14.0    15.3      15.3
synchronized add    16.6        15.6  18.0    15.5      16.5
final add           17.9        16.6  17.4    15.6      15.9
throw del           17.7        19.1  15.2    15.7      18.2
return add          12.6        14.4  15.4    16.1      13.9
return del          13.0        15.3  15.8    16.1      15.7
expParameter del    15.8        16.3  16.6    16.8      16.5
synchronized del    19.0        16.0  18.7    16.9      17.4
else add            15.1        18.0  17.7    17.0      15.2
static del          21.0        19.6  18.7    17.3      20.4

References

Ahmed, T. M., Bezemer, C.-P., Chen, T.-H., Hassan, A. E., & Shang, W. (2016). Studying the

effectiveness of application performance management (apm) tools for detecting performance

regressions for web applications: An experience report. In Proceedings of the 13th

international conference on mining software repositories (pp. 1–12). New York, NY, USA:

ACM.

Alam, M. M. u., Liu, T., Zeng, G., & Muzahid, A. (2017). Syncperf: Categorizing, detecting,

and diagnosing synchronization performance bugs. In Proceedings of the twelfth european

conference on computer systems (pp. 298–313). New York, NY, USA: ACM.

Alghmadi, H. M., Syer, M. D., Shang, W., & Hassan, A. E. (2016, Oct). An automated approach

for recommending when to stop performance tests. In 2016 ieee international conference on

software maintenance and evolution (icsme) (p. 279-289). doi: 10.1109/ICSME.2016.46

Anderson, J., Salem, S., & Do, H. (2015). Striving for failure: an industrial case study about

test failure prediction. In 2015 ieee/acm 37th ieee international conference on software

engineering (Vol. 2, pp. 49–58).

Andreolini, M., Colajanni, M., & Valente, P. (2005). Design and testing of scalable web-based

systems with performance constraints. In Firb-perf workshop on techniques, methodologies and tools for performance evaluation of complex systems (firb-perf 2005), 19 september 2005,

torino, italy (pp. 15–25).

Andrzejak, A., Friedrich, G., & Wotawa, F. (2018). Software configuration diagnosis - A survey of

existing methods and open challenges. In A. Felfernig, J. Tiihonen, L. Hotz, & M. Stettinger

(Eds.), Proceedings of the 20th configuration workshop, graz, austria, september 27-28, 2018

(Vol. 2220, pp. 85–92). CEUR-WS.org.

Attariyan, M., Chow, M., & Flinn, J. (2012). X-ray: Automating root-cause diagnosis of

performance anomalies in production software. In Proceedings of the 10th usenix symposium

on operating systems design and implementation (osdi) (pp. 307–320).

Attariyan, M., & Flinn, J. (2010). Automating configuration troubleshooting with dynamic

information flow analysis. In Proceedings of the 9th usenix symposium on operating systems

design and implementation (osdi) (pp. 1–14).

Axboe. (2019, Mar). axboe/fio. Retrieved from https://github.com/axboe/fio

Banerjee, A., & Dave, R. N. (2004). Validating clusters using the hopkins statistic. In 2004 ieee

international conference on fuzzy systems (ieee cat. no. 04ch37542) (Vol. 1, pp. 149–153).

Barik, T., DeLine, R., Drucker, S., & Fisher, D. (2016, May). The bones of the system: A case

study of logging and telemetry at microsoft. In 2016 ieee/acm 38th international conference

on software engineering companion (icse-c) (p. 92-101).

Becker, L. A. (2000). Effect size (es). Accessed on October, 12(2006), 155–159.

Benesty, J., Chen, J., Huang, Y., & Cohen, I. (2009). Pearson correlation coefficient. In Noise

reduction in speech processing (pp. 1–4). Springer.

Beschastnikh, I., Brun, Y., Ernst, M. D., & Krishnamurthy, A. (2014). Inferring models of

concurrent systems from logs of their behavior with csight. In Proceedings of the 36th

international conference on software engineering (pp. 468–479). New York, NY, USA:

ACM. Retrieved from http://doi.acm.org/10.1145/2568225.2568246 doi:

10.1145/2568225.2568246

Beschastnikh, I., Brun, Y., Schneider, S., Sloan, M., & Ernst, M. D. (2011). Leveraging existing

instrumentation to automatically infer invariant-constrained models. In Proceedings of the 19th acm sigsoft symposium and the 13th european conference on foundations of software

engineering (pp. 267–277). New York, NY, USA: ACM. Retrieved from http://doi

.acm.org/10.1145/2025113.2025151 doi: 10.1145/2025113.2025151

Bodík, P., Goldszmidt, M., & Fox, A. (2008). Hilighter: Automatically building robust signatures

of performance behavior for small-and large-scale systems. In Sysml.

Breiman, L. (2001). Random forests. Machine learning, 45(1), 5–32.

Brunnert, A., van Hoorn, A., Willnecker, F., Danciu, A., Hasselbring, W., Heger, C., . . . Wert, A.

(2015). Performance-oriented devops: A research agenda. CoRR, abs/1508.04752.

Busch, A., Noorshams, Q., Kounev, S., Koziolek, A., Reussner, R. H., & Amrehn, E.

(2015). Automated workload characterization for I/O performance analysis in virtualized

environments. In Proceedings of the 6th ACM/SPEC international conference on performance

engineering, austin, tx, usa, january 31 - february 4, 2015 (pp. 265–276).

Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in

Statistics-theory and Methods, 3(1), 1–27.

Calzarossa, M. C., Massari, L., & Tessera, D. (2016). Workload characterization: A survey

revisited. ACM Comput. Surv., 48(3), 48:1–48:43.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). Smote: synthetic minority

over-sampling technique. Journal of artificial intelligence research, 16, 321–357.

Chen, J. (2020). Performance regression detection in devops. In G. Rothermel & D. Bae (Eds.), ICSE ’20: 42nd international conference on software engineering, companion volume, seoul,

south korea, 27 june - 19 july, 2020 (pp. 206–209). ACM.

Chen, J., & Shang, W. (2017, Sept). An exploratory study of performance regression introducing

code changes. In 2017 ieee international conference on software maintenance and evolution

(icsme) (p. 341-352).

Chen, J., Shang, W., Hassan, A. E., Wang, Y., & Lin, J. (2019). An experience report of generating

load tests using log-recovered workloads at varying granularities of user behaviour. In 34th IEEE/ACM international conference on automated software engineering, ASE 2019, san

diego, ca, usa, november 11-15, 2019 (pp. 669–681). IEEE.

Chen, J., Shang, W., & Shihab, E. (2020). Perfjit: Test-level just-in-time prediction for performance

regression introducing commits. IEEE Transactions on Software Engineering (TSE), To

Appear.

Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of

the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp.

785–794). New York, NY, USA: ACM.

Chen, T., Syer, M. D., Shang, W., Jiang, Z. M., Hassan, A. E., Nasser, M. N., & Flora, P. (2017).

Analytics-driven load testing: An industrial experience report on load testing of large-scale

systems. In 39th IEEE/ACM international conference on software engineering: Software engineering in practice track, ICSE-SEIP 2017, buenos aires, argentina, may 20-28, 2017

(pp. 243–252).

Chen, T.-H., Shang, W., Hassan, A. E., Nasser, M., & Flora, P. (2016). Cacheoptimizer:

Helping developers configure caching frameworks for hibernate-based database-centric web

applications. In Proceedings of the 2016 24th acm sigsoft international symposium on

foundations of software engineering (pp. 666–677). New York, NY, USA: ACM.

Chen, T.-H., Shang, W., Jiang, Z. M., Hassan, A. E., Nasser, M., & Flora, P. (2014). Detecting

performance anti-patterns for applications developed using object-relational mapping. In

Proceedings of the 36th international conference on software engineering (pp. 1001–1012).

New York, NY, USA: ACM.

Chen, T.-H., Shang, W., Jiang, Z. M., Hassan, A. E., Nasser, M., & Flora, P. (2016, December).

Finding and evaluating the performance impact of redundant data access for applications that

are developed using object-relational mapping frameworks. IEEE Trans. Softw. Eng., 42(12),

1148–1161.

Cherkasova, L., Ozonat, K., Mi, N., Symons, J., & Smirni, E. (2008). Anomaly? application

change? or workload change? towards automated detection of application performance

anomaly and change. In 2008 ieee international conference on dependable systems and

networks with ftcs and dcc (dsn) (pp. 452–461).

Cohen, I., Chase, J. S., Goldszmidt, M., Kelly, T., & Symons, J. (2004). Correlating instrumentation

data to system states: A building block for automated diagnosis and control. In Osdi (Vol. 4,

pp. 16–16).

Cohen, I., Zhang, S., Goldszmidt, M., Symons, J., Kelly, T., & Fox, A. (2005). Capturing, indexing,

clustering, and retrieving system history. In Proceedings of the twentieth acm symposium on

operating systems principles (pp. 105–118). New York, NY, USA: ACM.

Comaniciu, D., & Meer, P. (2002, May). Mean shift: A robust approach toward feature space

analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24(5), 603–619.

Cortellessa, V., Di Marco, A., & Inverardi, P. (2011). Model-based software performance analysis.

Springer Science & Business Media.

Cortez, E., Bonde, A., Muzio, A., Russinovich, M., Fontoura, M., & Bianchini, R. (2017). Resource

central: Understanding and predicting workloads for improved resource management in large

cloud platforms. In Proceedings of the 26th symposium on operating systems principles,

shanghai, china, october 28-31, 2017 (pp. 153–167).

Costa, D., Andrzejak, A., Seboek, J., & Lo, D. (2017). Empirical study of usage and performance

of java collections. In Proceedings of the 8th acm/spec on international conference on

performance engineering (pp. 389–400). New York, NY, USA: ACM. doi: 10.1145/

3030207.3030221

Cunha, C. A., & e Silva, L. M. (2012). Separating performance anomalies from workload-explained

failures in streaming servers. In 2012 12th ieee/acm international symposium on cluster, cloud

and grid computing (ccgrid 2012) (pp. 292–299).

Dean, J., & Barroso, L. A. (2013). The tail at scale. Communications of the ACM, 56(2), 74–80.

de Oliveira, A. B., Fischmeister, S., Diwan, A., Hauswirth, M., & Sweeney, P. F. (2017). Perphecy:

Performance regression test selection made simple but effective. In 2017 IEEE international conference on software testing, verification and validation, ICST 2017, tokyo, japan, march

13-17, 2017 (pp. 103–113).

Diao, Y., Hellerstein, J. L., Parekh, S., & Bigus, J. P. (2003). Managing web server performance

with autotune agents. IBM Systems Journal, 42(1), 136–149.

Di Nardo, D., Alshahwan, N., Briand, L., & Labiche, Y. (2015). Coverage-based regression test case

selection, minimization and prioritization: A case study on an industrial system. Software

Testing, Verification and Reliability, 25(4), 371–396.

Ding, Z., Chen, J., & Shang, W. (2020). Towards the use of the readily available tests from the

release pipeline as performance tests: are we there yet? In G. Rothermel & D. Bae (Eds.), ICSE ’20: 42nd international conference on software engineering, seoul, south korea, 27 june

- 19 july, 2020 (pp. 1435–1446). ACM.

Dong, Z., Andrzejak, A., & Shao, K. (2015). Practical and accurate pinpointing of configuration

errors using static analysis. In Proceedings of the 31st international conference on software

maintenance and evolution (pp. 171–180).

Dong, Z., Ghanavati, M., & Andrzejak, A. (2013). Automated diagnosis of software

misconfigurations based on static analysis. In Proceedings of the international symposium

on software reliability engineering workshops (issrew’13) (pp. 162–168).

Elnaffar, S., & Martin, P. (2002). Characterizing computer systems workloads. Submitted to ACM

Computing Surveys Journal.

Enbody, R. (1999). Perfmon: Performance monitoring tool. Michigan State University.

Fiegerman, S. (2019, January). Netflix adds 9 million paying subscribers, but stock falls. https://www.cnn.com/2019/01/17/media/netflix-earnings-q4/

index.html. ((Accessed on 03/29/2019))

Foo, K. C., Jiang, Z. M., Adams, B., Hassan, A. E., Zou, Y., & Flora, P. (2010). Mining performance

regression testing repositories for automated performance analysis. In Quality software (qsic),

2010 10th international conference on (pp. 32–41).

Foo, K. C., Jiang, Z. M., Adams, B., Hassan, A. E., Zou, Y., & Flora, P. (2015). An industrial case

study on the automated detection of performance regressions in heterogeneous environments.

In 2015 ieee/acm 37th ieee international conference on software engineering (Vol. 2, pp.

159–168).

Fu, X., Ren, R., Zhan, J., Zhou, W., Jia, Z., & Lu, G. (2012). Logmaster: Mining event

correlations in logs of large-scale cluster systems. In Proceedings of the 2012 ieee 31st

symposium on reliable distributed systems (pp. 71–80). Washington, DC, USA: IEEE

Computer Society. Retrieved from http://dx.doi.org/10.1109/SRDS.2012.40

doi: 10.1109/SRDS.2012.40

Gao, R., Jiang, Z. M., Barna, C., & Litoiu, M. (2016). A framework to evaluate the effectiveness

of different load testing analysis techniques. In 2016 IEEE international conference on software testing, verification and validation, ICST 2016, chicago, il, usa, april 11-15, 2016

(pp. 22–32).

Gao, R., & Jiang, Z. M. J. (2017). An exploratory study on assessing the impact of environment

variations on the results of load tests. In Proceedings of the 14th international conference

on mining software repositories, MSR 2017, buenos aires, argentina, may 20-28, 2017 (pp.

379–390).

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In G. J. Gordon,

D. B. Dunson, & M. Dudík (Eds.), Proceedings of the fourteenth international conference on artificial intelligence and statistics, AISTATS 2011, fort lauderdale, usa, april 11-13, 2011

(Vol. 15, pp. 315–323). JMLR.org.

Godard, S. (2019). pidstat(1): Report statistics for tasks - linux man page. https://linux

.die.net/man/1/pidstat. ((Accessed on 04/06/2019))

Gousios, G., Karakoidas, V., & Spinellis, D. (2006). Tuning java's memory manager for high

performance server applications. memory, 11(22), 7–15.

Graves, T. L., Karr, A. F., Marron, J. S., & Siy, H. (2000, Jul). Predicting fault incidence using

software change history. IEEE Transactions on Software Engineering, 26(7), 653-661. doi:

10.1109/32.859533

Guo, J., Czarnecki, K., Apel, S., Siegmund, N., & Wasowski, A. (2013). Variability-aware

performance prediction: A statistical learning approach. In E. Denney, T. Bultan, & A. Zeller

(Eds.), 2013 28th IEEE/ACM international conference on automated software engineering,

ASE 2013, silicon valley, ca, usa, november 11-15, 2013 (pp. 301–311). IEEE.

Guo, P. J., Zimmermann, T., Nagappan, N., & Murphy, B. (2010, May). Characterizing and

predicting which bugs get fixed: an empirical study of microsoft windows. In 2010 acm/ieee

32nd international conference on software engineering (Vol. 1, p. 495-504).

Haghdoost, A., He, W., Fredin, J., & Du, D. H. C. (2017). On the accuracy and scalability of

intensive I/O workload replay. In 15th USENIX conference on file and storage technologies,

FAST 2017, santa clara, ca, usa, february 27 - march 2, 2017 (pp. 315–328).

Halin, A., Nuttinck, A., Acher, M., Devroey, X., Perrouin, G., & Baudry, B. (2019). Test them

all, is it worth it? assessing configuration sampling on the jhipster web development stack.

Empir. Softw. Eng., 24(2), 674–717.

Han, X., & Yu, T. (2016). An empirical study on performance bugs for highly configurable

software systems. In Proceedings of the 10th international symposium on empirical software engineering and measurement.

Han, X., Yu, T., & Lo, D. (2018). Perflearner: learning from bug reports to understand and generate

performance test frames. In Proceedings of the 33rd ACM/IEEE international conference on

automated software engineering, ASE 2018, montpellier, france, september 3-7, 2018 (pp.

17–28).

Harrell, F. E. (2001). Regression modeling strategies: with applications to linear models, logistic

regression, and survival analysis. Springer.

Hartung, J., Knapp, G., & Sinha, B. K. (2011). Statistical meta-analysis with applications

(Vol. 738). John Wiley & Sons.

Hasan, S., King, Z., Hafiz, M., Sayagh, M., Adams, B., & Hindle, A. (2016). Energy profiles

of java collections classes. In Proceedings of the 38th international conference on software

engineering (pp. 225–236). New York, NY, USA: ACM. doi: 10.1145/2884781.2884869

Hassan, A. E. (2009). Predicting faults using the complexity of code changes. In Proceedings of the

31st international conference on software engineering (pp. 78–88). Washington, DC, USA:

IEEE Computer Society.

Hassan, A. E., Martin, D. J., Flora, P., Mansfield, P., & Dietz, D. (2008). An industrial case

study of customizing operational profiles using log compression. In Proceedings of the

30th international conference on software engineering (pp. 713–723). New York, NY, USA:

ACM.

He, P., Li, B., Liu, X., Chen, J., & Ma, Y. (2015). An empirical study on software defect prediction

with a simplified metric set. Information and Software Technology, 59, 170–190.

He, S., Lin, Q., Lou, J., Zhang, H., Lyu, M. R., & Zhang, D. (2018). Identifying impactful service

system problems via log analysis. In Proceedings of the 2018 ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering,

ESEC/SIGSOFT FSE 2018, lake buena vista, fl, usa, november 04-09, 2018 (pp. 60–70).

Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998, July). Support vector

machines. IEEE Intelligent Systems and their Applications, 13(4), 18-28.

Heger, C., Happe, J., & Farahbod, R. (2013). Automated root cause isolation of performance

regressions during software development. In Proceedings of the 4th acm/spec international

conference on performance engineering (pp. 27–38). New York, NY, USA: ACM.

Hindle, A. (2015, Apr 01). Green mining: a methodology of relating software change and

configuration to power consumption. Empirical Software Engineering, 20(2), 374–409.

Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398).

John Wiley & Sons.

Huang, P., Ma, X., Shen, D., & Zhou, Y. (2014). Performance regression testing target prioritization

via performance risk analysis. In Proceedings of the 36th international conference on

software engineering (pp. 60–71). New York, NY, USA: ACM. doi: 10.1145/2568225

.2568232

Hüttermann, M. (2012). Devops for developers. Apress.

Jain, A. (2016). Complete guide to parameter tuning in xgboost with codes in python. https://www.analyticsvidhya.com/blog/2016/03/complete

-guide-parameter-tuning-xgboost-with-codes-python.

Jenks, G. F. (1967). The data model concept in statistical mapping. International yearbook of

cartography, 7, 186–190.

Jiang, M., Munawar, M. A., Reidemeister, T., & Ward, P. A. (2009). Automatic fault detection and

diagnosis in complex software systems by information-theoretic monitoring. In Dependable

systems & networks, 2009. dsn’09. ieee/ifip international conference on (pp. 285–294).

Jiang, Z. M. (2010). Automated analysis of load testing results. In Proceedings of the nineteenth international symposium on software testing and analysis, ISSTA 2010, trento, italy, july

12-16, 2010 (pp. 143–146).

Jiang, Z. M., & Hassan, A. E. (2015). A survey on load testing of large-scale software systems.

IEEE Trans. Software Eng., 41(11), 1091–1118.

Jiang, Z. M., Hassan, A. E., Hamann, G., & Flora, P. (2008a). An automated approach

for abstracting execution logs to execution events. Journal of Software Maintenance and

Evolution: Research and Practice, 20(4), 249–267.

Jiang, Z. M., Hassan, A. E., Hamann, G., & Flora, P. (2008b). Automatic identification of load

testing problems. In 24th IEEE international conference on software maintenance (ICSM

2008), september 28 - october 4, 2008, beijing, china (pp. 307–316).

Jiang, Z. M., Hassan, A. E., Hamann, G., & Flora, P. (2009). Automated performance analysis

of load tests. In 25th IEEE international conference on software maintenance (ICSM 2009),

september 20-26, 2009, edmonton, alberta, canada (pp. 125–134).

Jiarpakdee, J., Tantithamthavorn, C., & Hassan, A. E. (2019). The impact of correlated metrics on

the interpretation of defect models. IEEE Transactions on Software Engineering.

Jin, D., Qu, X., Cohen, M. B., & Robinson, B. (2014). Configurations everywhere: implications

for testing and debugging in practice. In P. Jalote, L. C. Briand, & A. van der Hoek (Eds.), 36th international conference on software engineering, ICSE ’14, companion proceedings,

hyderabad, india, may 31 - june 07, 2014 (pp. 215–224). ACM.

Jin, G., Song, L., Shi, X., Scherpelz, J., & Lu, S. (2012). Understanding and detecting real-world

performance bugs. In Proceedings of the 33rd acm sigplan conference on programming

language design and implementation (pp. 77–88). New York, NY, USA: ACM. doi:

10.1145/2254064.2254075

JMeter. (1998). Apache jmeter - apache jmete. https://jmeter.apache.org/. ((Accessed

on 03/29/2019))

Joel, M. (2017). Mozilla performance regressions policy. https://www.mozilla.org/en

-US/about/governance/policies/regressions. ((Accessed on 01/12/2019))

Kabinna, S., Bezemer, C.-P., Shang, W., Syer, M. D., & Hassan, A. E. (2018, Feb 01). Examining

the stability of logging statements. Empirical Software Engineering, 23(1), 290–333.

Kamei, Y., Fukushima, T., McIntosh, S., Yamashita, K., Ubayashi, N., & Hassan, A. E. (2016).

Studying just-in-time defect prediction using cross-project models. Empirical Software

Engineering, 21(5), 2072–2106.

Kamei, Y., & Shihab, E. (2016). Defect prediction: Accomplishments and future challenges.

In Software analysis, evolution, and reengineering (saner), 2016 ieee 23rd international

conference on (Vol. 5, pp. 33–45).

Kamei, Y., Shihab, E., Adams, B., Hassan, A. E., Mockus, A., Sinha, A., & Ubayashi, N. (2013,

June). A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on

Software Engineering, 39(6), 757-773.

Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: an introduction to cluster analysis

(Vol. 344). John Wiley & Sons.

Kazmi, R., Jawawi, D. N., Mohamad, R., & Ghani, I. (2017). Effective regression test case selection:

A systematic literature review. ACM Computing Surveys (CSUR), 50(2), 1–32.

Kim, J.-M., & Porter, A. (2002). A history-based test prioritization technique for regression testing

in resource constrained environments. In Proceedings of the 24th international conference on

software engineering (pp. 119–129).

Kit, E. (2010). How one second could cost amazon $1.6 billion in sales. https:// www.fastcompany.com/1825005/how-one-second-could-cost-amazon

-16-billion-sales. ((Accessed on 09/06/2019))

Kitchenham, B. A., Pfleeger, S. L., Pickard, L. M., Jones, P. W., Hoaglin, D. C., El Emam, K., &

Rosenberg, J. (2002). Preliminary guidelines for empirical research in software engineering.

IEEE Transactions on software engineering, 28(8), 721–734.

Koru, A. G., Zhang, D., Emam, K. E., & Liu, H. (2009, March). An investigation into the functional

form of the size-defect relationship for software modules. IEEE Transactions on Software

Engineering, 35(2), 293-304. doi: 10.1109/TSE.2008.90

Krishnamurthy, D., Rolia, J. A., & Majumdar, S. (2006). A synthetic workload generation technique

for stress testing session-based systems. IEEE Trans. Software Eng., 32(11), 868–882.

Laaber, C., & Leitner, P. (2018). An evaluation of open-source software microbenchmark suites

for continuous performance assessment. In Proceedings of the 15th international conference

on mining software repositories (pp. 119–130). New York, NY, USA: ACM. doi: 10.1145/

3196398.3196407

Laaber, C., Scheuner, J., & Leitner, P. (2019). Software microbenchmarking in the cloud. how bad

is it really? Empirical Software Engineering, 24(4), 2469–2508.

Laali, M., Liu, H., Hamilton, M., Spichkova, M., & Schmidt, H. W. (2016). Test case prioritization

using online fault detection information. In Ada-europe international conference on reliable

software technologies (pp. 78–93).

Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face recognition: a convolutional

neural-network approach. IEEE Trans. Neural Networks, 8(1), 98–113.

Leitner, P., & Bezemer, C.-P. (2017). An exploratory study of the state of practice of

performance testing in java-based open source projects. In Proceedings of the 8th acm/spec on

international conference on performance engineering (pp. 373–384). New York, NY, USA:

ACM.

Leitner, P., & Cito, J. (2016, April). Patterns in the chaos—a study of performance variation

and predictability in public iaas clouds. ACM Trans. Internet Technol., 16(3), 15:1–15:23.

doi: 10.1145/2885497

Lengauer, P., & Mössenböck, H. (2014). The taming of the shrew: increasing performance by

automatic parameter tuning for java garbage collectors. In K. Lange, J. Murphy, W. Binder,

& J. Merseguer (Eds.), ACM/SPEC international conference on performance engineering,

icpe’14, dublin, ireland, march 22-26, 2014 (pp. 111–122). ACM.

Li, H., Chen, T.-H. P., Hassan, A. E., Nasser, M., & Flora, P. (2018). Adopting autonomic

computing capabilities in existing large-scale systems: An industrial experience report.

In Proceedings of the 40th international conference on software engineering: Software

engineering in practice (p. 110).

Li, H., Chen, T. P., Shang, W., & Hassan, A. E. (2018). Studying software logging using topic

models. Empir. Softw. Eng., 23(5), 2655–2694.

Li, H., Shang, W., Zou, Y., & E. Hassan, A. (2017, Aug 01). Towards just-in-time suggestions for

log changes. Empirical Software Engineering, 22(4), 1831–1865.

Liao, L., Chen, J., Li, H., Zeng, Y., Shang, W., Guo, J., . . . Sajedi, S. (2020). Using black-box

performance models to detect performance regressions under varying workloads: an empirical

study. Empir. Softw. Eng., 25(5), 4130–4160.

Lim, M., Lou, J., Zhang, H., Fu, Q., Teoh, A. B. J., Lin, Q., . . . Zhang, D. (2014). Identifying

recurrent and unknown performance issues. In R. Kumar, H. Toivonen, J. Pei, J. Z. Huang,

& X. Wu (Eds.), 2014 IEEE international conference on data mining, ICDM 2014, shenzhen,

china, december 14-17, 2014 (pp. 320–329). IEEE Computer Society.

Lin, Q., Zhang, H., Lou, J.-G., Zhang, Y., & Chen, X. (2016). Log clustering based problem

identification for online service systems. In Proceedings of the 38th international conference

on software engineering companion (pp. 102–111). New York, NY, USA: ACM.

LINDEN, G. (2006). Geeking with greg: Marissa mayer at web 2.0. http://glinden

.blogspot.com/2006/11/marissa-mayer-at-web-20.html. ((Accessed on

04/01/2019))

Lobo, J. M., Jiménez-Valverde, A., & Real, R. (2008). Auc: a misleading measure of the

performance of predictive distribution models. Global ecology and Biogeography, 17(2),

145–151.

Luo, Q., Poshyvanyk, D., & Grechanik, M. (2016). Mining performance regression inducing

code changes in evolving software. In Proceedings of the 13th international conference on

mining software repositories (pp. 25–36). New York, NY, USA: ACM. doi: 10.1145/2901739

.2901765

lxml. (2005). Parsing xml and html with lxml. https://lxml.de/3.1/parsing.html.

Malik, H., Hemmati, H., & Hassan, A. E. (2013). Automatic detection of performance deviations in the load testing of large scale systems. In Proceedings of the 2013 international conference on software engineering (pp. 1012–1021). Piscataway, NJ, USA: IEEE Press.

Malik, H., Jiang, Z. M., Adams, B., Hassan, A. E., Flora, P., & Hamann, G. (2010). Automatic comparison of load tests to support the performance analysis of large enterprise systems. In Software maintenance and reengineering (csmr), 2010 14th european conference on (pp. 222–231).

Matsumoto, S., Kamei, Y., Monden, A., Matsumoto, K.-i., & Nakamura, M. (2010). An analysis of developer metrics for fault prediction. In Proceedings of the 6th international conference on predictive models in software engineering (p. 18).

McIntosh, S., Kamei, Y., Adams, B., & Hassan, A. E. (2016). An Empirical Study of the Impact of Modern Code Review Practices on Software Quality. Empirical Software Engineering, 21(5), 2146–2189.

Mednis, M., & Aurich, M. (2012, 01). Application of string similarity ratio and edit distance in automatic metabolite reconciliation comparing reconstructions and models. Biosystems and Information technology, 1, 14–18.

Meira, J. A., de Almeida, E. C., Sunyé, G., Traon, Y. L., & Valduriez, P. (2013). Stress testing of transactional database systems. JIDM, 4(3), 279–294.

Mende, T., & Koschke, R. (2009). Revisiting the evaluation of defect prediction models. In Proceedings of the 5th international conference on predictor models in software engineering (pp. 7:1–7:10). New York, NY, USA: ACM.

Mende, T., & Koschke, R. (2010, March). Effort-aware defect prediction models. In 2010 14th european conference on software maintenance and reengineering (pp. 107–116).

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. In Y. Bengio & Y. LeCun (Eds.), 1st international conference on learning representations, ICLR 2013, scottsdale, arizona, usa, may 2-4, 2013, workshop track proceedings.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th international conference on neural information processing systems - volume 2 (pp. 3111–3119). USA: Curran Associates Inc.

Mockus, A., & Weiss, D. M. (2000). Predicting risk of software changes. Bell Labs Technical Journal, 5(2), 169–180.

Mostafa, S., Wang, X., & Xie, T. (2017). Perfranker: Prioritization of performance regression tests for collection-intensive software. In Proceedings of the 26th acm sigsoft international symposium on software testing and analysis (pp. 23–34). New York, NY, USA: ACM. doi: 10.1145/3092703.3092725

Murphy-Hill, E., Zimmermann, T., Bird, C., & Nagappan, N. (2013). The design of bug fixes. In Proceedings of the 2013 international conference on software engineering (pp. 332–341). Piscataway, NJ, USA: IEEE Press.

Mytkowicz, T., Diwan, A., Hauswirth, M., & Sweeney, P. F. (2009, March). Producing wrong data without doing anything obviously wrong! SIGPLAN Not., 44(3), 265–276.

Nachar, N., et al. (2008). The mann-whitney u: A test for assessing whether two independent samples come from the same distribution. Tutorials in Quantitative Methods for Psychology, 4(1), 13–20.

Nagappan, M., Wu, K., & Vouk, M. A. (2009). Efficiently extracting operational profiles from execution logs using suffix arrays. In ISSRE 2009, 20th international symposium on software reliability engineering, mysuru, karnataka, india, 16-19 november 2009 (pp. 41–50).

Nagappan, N., & Ball, T. (2005). Use of relative code churn measures to predict system defect density. In Proceedings of the 27th international conference on software engineering (pp. 284–292). New York, NY, USA: ACM.

Nagappan, N., Ball, T., & Zeller, A. (2006). Mining metrics to predict component failures. In Proceedings of the 28th international conference on software engineering (pp. 452–461). New York, NY, USA: ACM.

Najafi, A., Shang, W., & Rigby, P. C. (2019). Improving test effectiveness using test executions history: an industrial experience report. In 2019 ieee/acm 41st international conference on software engineering: Software engineering in practice (icse-seip) (pp. 213–222).

Nguyen, T. H., Adams, B., Jiang, Z. M., Hassan, A. E., Nasser, M., & Flora, P. (2012). Automated detection of performance regressions using statistical process control techniques. In Proceedings of the 3rd acm/spec international conference on performance engineering (pp. 299–310). New York, NY, USA: ACM.

Nguyen, T. H. D., Adams, B., Jiang, Z. M., Hassan, A. E., Nasser, M., & Flora, P. (2011, Dec). Automated verification of load tests using control charts. In 2011 18th asia-pacific software engineering conference (pp. 282–289). doi: 10.1109/APSEC.2011.59

Nguyen, T. H. D., Nagappan, M., Hassan, A. E., Nasser, M., & Flora, P. (2014). An industrial case study of automatically identifying performance regression-causes. In Proceedings of the 11th working conference on mining software repositories (pp. 232–241). New York, NY, USA: ACM.

Nistor, A., Jiang, T., & Tan, L. (2013). Discovering, reporting, and fixing performance bugs. In 2013 10th working conference on mining software repositories (msr) (pp. 237–246).

Noor, T. B., & Hemmati, H. (2017). Studying test case failure prediction for test case prioritization. In Proceedings of the 13th international conference on predictive models and data analytics in software engineering (pp. 2–11).

Oliner, A., Ganapathi, A., & Xu, W. (2012, February). Advances and challenges in log analysis. Commun. ACM, 55(2), 55–61.

Ostrand, T. J., Weyuker, E. J., & Bell, R. M. (2005, April). Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engineering, 31(4), 340–355. doi: 10.1109/TSE.2005.49

Pradel, M., Huggler, M., & Gross, T. R. (2014). Performance regression testing of concurrent classes. In Proceedings of the 2014 international symposium on software testing and analysis (pp. 13–25). New York, NY, USA: ACM. doi: 10.1145/2610384.2610393

psutil. (2017). Retrieved 2017-2-11, from https://github.com/giampaolo/psutil

Qusef, A., Bavota, G., Oliveto, R., De Lucia, A., & Binkley, D. (2014, February). Recovering test-to-code traceability using slicing and textual analysis. J. Syst. Softw., 88, 147–168. doi: 10.1016/j.jss.2013.10.019

Rabkin, A., & Katz, R. H. (2011). Precomputing possible configuration error diagnoses. In P. Alexander, C. S. Pasareanu, & J. G. Hosking (Eds.), 26th IEEE/ACM international conference on automated software engineering (ASE 2011), lawrence, ks, usa, november 6-10, 2011 (pp. 193–202). IEEE Computer Society.

Raghavachari, M., Reimer, D., & Johnson, R. D. (2003). The deployer’s problem: configuring application servers for performance and reliability. In Proceedings of the 25th international conference on software engineering (pp. 484–489).

Ramos, J., et al. (2003). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, pp. 133–142).

Saha, R. K., Zhang, L., Khurshid, S., & Perry, D. E. (2015). An information retrieval approach for regression test prioritization based on program changes. In 2015 ieee/acm 37th ieee international conference on software engineering (Vol. 1, pp. 268–279).

Sarkar, A., Guo, J., Siegmund, N., Apel, S., & Czarnecki, K. (2015). Cost-efficient sampling for performance prediction of configurable systems (T). In M. B. Cohen, L. Grunske, & M. Whalen (Eds.), 30th IEEE/ACM international conference on automated software engineering, ASE 2015, lincoln, ne, usa, november 9-13, 2015 (pp. 342–352). IEEE Computer Society.

Saxena, N. R., Fernández-Gómez, S., Huang, W., Mitra, S., Yu, S., & McCluskey, E. J. (2000). Dependable computing and online testing in adaptive and configurable systems. IEEE Des. Test Comput., 17(1), 29–41.

Sayagh, M., & Adams, B. (2015). Multi-layer software configuration: Empirical study on wordpress. In M. W. Godfrey, D. Lo, & F. Khomh (Eds.), 15th IEEE international working conference on source code analysis and manipulation, SCAM 2015, bremen, germany, september 27-28, 2015 (pp. 31–40). IEEE Computer Society.

Sayagh, M., Kerzazi, N., & Adams, B. (2017). On cross-stack configuration errors. In S. Uchitel, A. Orso, & M. P. Robillard (Eds.), Proceedings of the 39th international conference on software engineering, ICSE 2017, buenos aires, argentina, may 20-28, 2017 (pp. 255–265). IEEE / ACM.

Sayagh, M., Kerzazi, N., Adams, B., & Petrillo, F. (2018). Software configuration engineering in practice: Interviews, survey, and systematic literature review. IEEE Transactions on Software Engineering.

Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. In Proceedings of the 25th annual international acm sigir conference on research and development in information retrieval (pp. 253–260). New York, NY, USA: ACM. doi: 10.1145/564376.564421

Seo, B., Kang, S., Choi, J., Cha, J., Won, Y., & Yoon, S. (2014). IO workload characterization revisited: A data-mining approach. IEEE Trans. Computers, 63(12), 3026–3038.

Shahin, M., Babar, M. A., & Zhu, L. (2017). Continuous integration, delivery and deployment: a systematic review on approaches, tools, challenges and practices. IEEE Access, 5, 3909–3943.

Shang, W., Hassan, A. E., Nasser, M. N., & Flora, P. (2015). Automated detection of performance regressions using regression models on clustered performance counters. In L. K. John, C. U. Smith, K. Sachs, & C. M. Lladó (Eds.), Proceedings of the 6th ACM/SPEC international conference on performance engineering, austin, tx, usa, january 31 - february 4, 2015 (pp. 15–26). ACM.

Shang, W., Jiang, Z. M., Hemmati, H., Adams, B., Hassan, A. E., & Martin, P. (2013). Assisting developers of big data analytics applications when deploying on hadoop clouds. In Proceedings of the 2013 international conference on software engineering (pp. 402–411). Piscataway, NJ, USA: IEEE Press.

Shivaji, S., Whitehead, E. J., Akella, R., & Kim, S. (2013, April). Reducing features to improve code change-based bug prediction. IEEE Transactions on Software Engineering, 39(4), 552–569.

Siegmund, N., Grebhahn, A., Apel, S., & Kästner, C. (2015). Performance-influence models for highly configurable systems. In Proceedings of the 10th joint meeting of the european software engineering conference and the acm sigsoft symposium on the foundations of software engineering (esec/fse’15) (pp. 284–294).

Simic, B., & Conklin, S. (2012). Improving the usability of apm data: Essential capabilities and benefits. https://assets.extrahop.com/uploads/trac-market-insight-usability-of-apm-data.pdf. (Accessed on 09/01/2020)

Singh, R., Bezemer, C.-P., Shang, W., & Hassan, A. E. (2016). Optimizing the performance-related configurations of object-relational mapping frameworks using a multi-objective genetic algorithm. In Proceedings of the 7th acm/spec on international conference on performance engineering (pp. 309–320).

Śliwerski, J., Zimmermann, T., & Zeller, A. (2005). When do changes induce fixes? In Proceedings of the 2005 international workshop on mining software repositories (pp. 1–5). New York, NY, USA: ACM. doi: 10.1145/1082983.1083147

Smith, C. U., & Williams, L. G. (2002). Performance solutions: a practical guide to creating responsive, scalable software (Vol. 23). Addison-Wesley Reading.

Snellman, N., Ashraf, A., & Porres, I. (2011). Towards automatic performance and scalability testing of rich internet applications in the cloud. In (pp. 161–169).

Song, J. (2016). Ckmeans.1d.dp function — r documentation. https://www.rdocumentation.org/packages/Ckmeans.1d.dp/versions/3.4.0-1/topics/Ckmeans.1d.dp. (Accessed on 01/02/2020)

Song, L., & Lu, S. (2017). Performance diagnosis for inefficient loops. In Proceedings of the 39th international conference on software engineering (pp. 370–380).

SPEC. (2003). The workload for the specweb96 benchmark. https://www.spec.org/web96/workload.html.

srcml. (2017). Retrieved from http://www.srcml.org/

Summers, J., Brecht, T., Eager, D., & Gutarin, A. (2016). Characterizing the workload of a netflix streaming video server. In 2016 ieee international symposium on workload characterization (iiswc) (pp. 1–12).

Svajlenko, J., & Roy, C. K. (2015). Evaluating clone detection tools with bigclonebench. In 2015 ieee international conference on software maintenance and evolution (icsme) (pp. 131–140).

Svajlenko, J., Roy, C. K., & Cordy, J. R. (2013). A mutation analysis based benchmarking framework for clone detectors. In 2013 7th international workshop on software clones (iwsc) (pp. 8–9).

Syer, M. D., Jiang, Z. M., Nagappan, M., Hassan, A. E., Nasser, M., & Flora, P. (2014). Continuous validation of load test suites. In Proceedings of the 5th acm/spec international conference on performance engineering (pp. 259–270).

Syer, M. D., Jiang, Z. M., Nagappan, M., Hassan, A. E., Nasser, M. N., & Flora, P. (2013). Leveraging performance counters and execution logs to diagnose memory-related performance issues. In 2013 IEEE international conference on software maintenance, eindhoven, the netherlands, september 22-28, 2013 (pp. 110–119).

Syer, M. D., Shang, W., Jiang, Z. M., & Hassan, A. E. (2017). Continuous validation of performance test workloads. Automated Software Engineering, 24(1), 189–231.

Tan, J., Kavulya, S., Gandhi, R., & Narasimhan, P. (2010). Visual, log-based causal tracing for performance debugging of mapreduce systems. In 2010 international conference on distributed computing systems, ICDCS 2010, genova, italy, june 21-25, 2010 (pp. 795–806).

Tantithamthavorn, C., & Hassan, A. E. (2018a). An experience report on defect modelling in practice: Pitfalls and challenges. In Proceedings of the 40th international conference on software engineering: Software engineering in practice (pp. 286–295). New York, NY, USA: ACM.

Tantithamthavorn, C., & Hassan, A. E. (2018b). An experience report on defect modelling in practice: Pitfalls and challenges. In Proceedings of the 40th international conference on software engineering: Software engineering in practice (pp. 286–295).

Tantithamthavorn, C., Hassan, A. E., & Matsumoto, K. (2018). The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Transactions on Software Engineering.

Tantithamthavorn, C., McIntosh, S., Hassan, A. E., Ihara, A., & Matsumoto, K. (2015, May). The impact of mislabelling on the performance and interpretation of defect prediction models. In 2015 ieee/acm 37th ieee international conference on software engineering (Vol. 1, pp. 812–823).

Tantithamthavorn, C., McIntosh, S., Hassan, A. E., & Matsumoto, K. (2017, Jan). An empirical comparison of model validation techniques for defect prediction models. IEEE Transactions on Software Engineering, 43(1), 1–18.

Tarvo, A., & Reiss, S. P. (2012). Using computer simulation to predict the performance of multithreaded programs. In Proceedings of the 3rd acm/spec international conference on performance engineering (pp. 217–228). New York, NY, USA: ACM.

Tianyin, X., & Yuanyuan, Z. (2015). Systems approaches to tackling configuration errors: a survey. ACM Computing Surveys, 47(4), 1–41.

Tillmann, N., & Schulte, W. (2006). Unit tests reloaded: Parameterized unit testing with symbolic execution. IEEE Software, 23(4), 38–47.

Tourani, P., & Adams, B. (2016). The impact of human discussions on just-in-time quality assurance: An empirical study on openstack and eclipse. In IEEE 23rd international conference on software analysis, evolution, and reengineering, SANER 2016, suita, osaka, japan, march 14-18, 2016 - volume 1 (pp. 189–200). IEEE Computer Society.

Trubiani, C., Bran, A., van Hoorn, A., Avritzer, A., & Knoche, H. (2018). Exploiting load testing and profiling for performance antipattern detection. Information & Software Technology, 95, 329–345.

Tsakiltsidis, S., Miranskyy, A., & Mazzawi, E. (2016). On automatic detection of performance bugs. In Software reliability engineering workshops (issrew), 2016 ieee international symposium on (pp. 132–139).

Verma, A., Pedrosa, L., Korupolu, M. R., Oppenheimer, D., Tune, E., & Wilkes, J. (2015). Large-scale cluster management at Google with Borg. In Proceedings of the european conference on computer systems (eurosys). Bordeaux, France.

Vögele, C., van Hoorn, A., Schulz, E., Hasselbring, W., & Krcmar, H. (2018). WESSBAS: extraction of probabilistic workload specifications for load testing and performance prediction - a model-driven approach for session-based application systems. Software and System Modeling, 17(2), 443–477.

von Storch, H. (1999). Misuses of statistical analysis in climate research. In Analysis of climate variability (pp. 11–26). Berlin, Heidelberg: Springer Berlin Heidelberg.

Wang, S., Nam, J., & Tan, L. (2017). Qtep: quality-aware test case prioritization. In Proceedings of the 2017 11th joint meeting on foundations of software engineering (pp. 523–534).

Weyuker, E., & Vokolos, F. (2000, Dec). Experience with performance testing of software systems: issues, an approach, and case study. IEEE Transactions on Software Engineering, 26(12), 1147–1156.

White, T. (2009). Hadoop: The definitive guide (1st ed.). O’Reilly Media, Inc.

Wikipedia. (2016). Devops. https://en.wikipedia.org/wiki/DevOps. (Accessed on 09/06/2019)

Wilkes, J. (2011). Google ai blog: More google cluster data. https://ai.googleblog.com/2011/11/more-google-cluster-data.html. (Accessed on 04/04/2019)

Williams, C., & Spacco, J. (2008). Szz revisited: Verifying when changes induce fixes. In Proceedings of the 2008 workshop on defects in large software systems (pp. 32–36). New York, NY, USA: ACM.

Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and intelligent laboratory systems, 2(1-3), 37–52.

Wu, Q., & Wang, Y. (2010). Performance testing and optimization of j2ee-based web applications. In 2010 second international workshop on education technology and computer science (Vol. 2, pp. 681–683).

Xi, B., Liu, Z., Raghavachari, M., Xia, C. H., & Zhang, L. (2004). A smart hill-climbing algorithm for application server configuration. In Proceedings of the thirteenth international world wide web conference (pp. 287–296).

Xi, H., Zhan, J., Jia, Z., Hong, X., Wang, L., Zhang, L., . . . Lu, G. (2011). Characterization of real workloads of web search engines. In 2011 ieee international symposium on workload characterization (iiswc) (pp. 15–25).

Xiong, P., Pu, C., Zhu, X., & Griffith, R. (2013). vperfguard: An automated model-driven framework for application performance diagnosis in consolidated cloud environments. In Proceedings of the 4th acm/spec international conference on performance engineering (pp. 271–282). New York, NY, USA: ACM.

Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165–193.

Xu, W., Huang, L., Fox, A., Patterson, D. A., & Jordan, M. I. (2009). Detecting large-scale system problems by mining console logs. In Proceedings of the 22nd ACM symposium on operating systems principles 2009, SOSP 2009, big sky, montana, usa, october 11-14, 2009 (pp. 117–132).

Yadwadkar, N. J., Bhattacharyya, C., Gopinath, K., Niranjan, T., & Susarla, S. (2010). Discovery of application workloads from network file traces. In 8th USENIX conference on file and storage technologies, san jose, ca, usa, february 23-26, 2010 (pp. 183–196).

Yang, Y., Zhou, Y., Liu, J., Zhao, Y., Lu, H., Xu, L., . . . Leung, H. (2016). Effort-aware just-in-time defect prediction: Simple unsupervised models could be better than supervised models. In Proceedings of the 2016 24th acm sigsoft international symposium on foundations of software engineering (pp. 157–168). New York, NY, USA: ACM.

Yao, K., de Pádua, G. B., Shang, W., Sporea, S., Toma, A., & Sajedi, S. (2018). Log4perf: Suggesting logging locations for web-based systems’ performance monitoring. In K. Wolter, W. J. Knottenbelt, A. van Hoorn, & M. Nambiar (Eds.), Proceedings of the 2018 ACM/SPEC international conference on performance engineering, ICPE 2018, berlin, germany, april 09-13, 2018 (pp. 127–138). ACM. doi: 10.1145/3184407.3184416

Yin, Z., Ma, X., Zheng, J., Zhou, Y., Bairavasundaram, L. N., & Pasupathy, S. (2011). An empirical study on configuration errors in commercial and open source systems. In T. Wobber & P. Druschel (Eds.), Proceedings of the 23rd ACM symposium on operating systems principles 2011, SOSP 2011, cascais, portugal, october 23-26, 2011 (pp. 159–172). ACM.

Zaman, S., Adams, B., & Hassan, A. E. (2012). A qualitative study on performance bugs. In Proceedings of the 9th ieee working conference on mining software repositories (pp. 199–208). Piscataway, NJ, USA: IEEE Press.

Zaman, S., Adams, B., & Hassan, A. E. (2011). Security versus performance bugs: a case study on firefox. In Proceedings of the 8th working conference on mining software repositories (pp. 93–102).

Zhang, F., Mockus, A., Keivanloo, I., & Zou, Y. (2014). Towards building a universal defect prediction model. In Proceedings of the 11th working conference on mining software repositories (pp. 182–191). New York, NY, USA: ACM.

Zhang, S. (2013). Confdiagnoser: An automated configuration error diagnosis tool for java software. In Proceedings of the 35th international conference on software engineering (icse’13) (pp. 1438–1440).

Zhang, S., & Ernst, M. D. (2014). Which configuration option should i change? In Proceedings of the 36th international conference on software engineering (icse’14) (pp. 152–163).

Zhu, Y., Shihab, E., & Rigby, P. C. (2018). Test re-prioritization in continuous testing environments. In 2018 ieee international conference on software maintenance and evolution (icsme) (pp. 69–79).

Zimmermann, T., Premraj, R., & Zeller, A. (2007, May). Predicting defects for eclipse. In Predictor models in software engineering, 2007. promise’07: Icse workshops 2007 (p. 9).
