MINING DEVELOPMENT KNOWLEDGE TO UNDERSTAND AND

SUPPORT SOFTWARE LOGGING PRACTICES

by

HENG LI

A thesis submitted to the Graduate Program in Computing

in conformity with the requirements for the

Degree of Doctor of Philosophy

Queen’s University

Kingston, Ontario, Canada

December 2018

Copyright © Heng Li, 2018

Abstract

Developers insert logging statements in their source code to trace the runtime behaviors of software systems. Logging statements print runtime log messages, which play a critical role in monitoring system status, diagnosing field failures, and bookkeeping user activities. However, developers typically insert logging statements in an ad hoc manner, which usually results in fragile logging code, i.e., insufficient logging in some code snippets and excessive logging in other code snippets. Insufficient logging can significantly increase the difficulty of diagnosing field failures, while excessive logging can cause performance overhead and hide truly important information. The goal of this thesis is to help developers improve their logging practices and the quality of their logging code. We believe that development knowledge (i.e., source code, code change history, and issue reports) contains valuable information that explains developers’ rationale of logging, which can help us understand existing logging practices and provide helpful tooling support for such logging practices.

Therefore, this thesis proposes to mine different aspects of development knowledge to understand and support software logging practices. We mine issue reports to understand developers’ logging concerns, i.e., the benefits and costs of logging from developers’ perspective. Our findings shed light on future research opportunities for helping developers leverage the benefits of logging while minimizing logging costs. We mine source code to learn how developers distribute logging statements in their source code, and propose an approach to provide automated suggestions about where to log.

We find that the semantic topics of a code snippet provide another dimension to explain the likelihood of logging a code snippet. We mine code change history to understand how developers develop and maintain their logging code, and propose an automated approach that can provide developers with log change suggestions when they change their code. We also mine code change history to understand how developers choose log levels for their logging statements, and propose an automated approach that can help developers determine the appropriate log level when they add a new logging statement. This thesis highlights the need for standard logging guidelines and automated tooling support for logging.

Acknowledgments

I would like to thank my supervisor Dr. Ahmed E. Hassan for all his guidance and support throughout these four years. His academic advising has fostered my enthusiasm in scientific research and guided me to grow as a researcher. In addition, his valuable support for my personal life has helped me go through many difficulties. I feel very grateful and lucky that I had the chance to work under his supervision. I would like to express my sincere gratitude to Dr. Ahmed E. Hassan, my advisor and my friend.

A sincere appreciation to my supervisory and examination committee members,

Dr. Hossam S. Hassanein, Dr. Mohammad Zulkernine, and Dr. Joshua Dunfield for their continued critique, support and guidance. Many thanks to my examiners for their insightful feedback and valuable advice on my work.

I am very lucky to work with some of the brightest researchers during my Ph.D. research. I would like to thank all of my colleagues and collaborators, including Dr.

Weiyi Shang, Dr. Tse-Hsun (Peter) Chen, Dr. Ying (Jenny) Zou, Dr. Bram Adams, Safwat

Hassan, Filipe Cogo, Dr. Shane McIntosh, Dr. Cor-Paul Bezemer, Dr. Shaowei Wang,

Dr. Mohamed Sami Rakha, Dayi Lin, Dr. Gustavo Oliva, Dr. Muhammad Asaduzzam, and Suhas Kabinna for all the fruitful discussions and collaborations.

I would like to thank the BlackBerry company and the members of the BlackBerry

Performance Engineering team. In particular, I would like to thank Mohamed Nasser and Parminder Flora for their continuous support during my internship with BlackBerry. The industrial environment has encouraged and will continue to encourage my research that deals with practical challenges in the field.

During these years I have been receiving so much love from many people, including

Dehui Kong, Xubo Miao, Chuansheng Dong, Jiaxing Len, Zhen Zeng, Min Xie, Zhiying

Zhang, Panpan Zhang, Safwat Hassan, Stephen Hung, Hannah Zhang, Nancy Smith, and Shaowei Wang, and their families. Thank you all for making my Ph.D. life enjoyable and wonderful.

A special thanks to my family. My gratitude for all your support and sacrifices is beyond any words. I am so proud of being a member of your family. I dedicate this thesis to my family.

Co-authorship

The work presented in this thesis is published or under review as listed below:

• Heng Li, Weiyi Shang, Ying Zou, and Ahmed E. Hassan. 2017. Towards just-in-time suggestions for log changes. Empirical Software Engineering: An International Journal (EMSE). Volume 22, Issue 4, pages 1831–1865. This work is described in Chapter 5.

• Heng Li, Weiyi Shang, and Ahmed E. Hassan. 2017. Which log level should developers choose for a new logging statement? Empirical Software Engineering: An International Journal (EMSE). Volume 22, Issue 4, pages 1684–1716. This work is described in Chapter 6.

• Heng Li, Tse-Hsun (Peter) Chen, Weiyi Shang, and Ahmed E. Hassan. 2018. Studying software logging using topic models. Empirical Software Engineering: An International Journal (EMSE). In Press. This work is described in Chapter 4.

• Heng Li, Weiyi Shang, Bram Adams, and Ahmed E. Hassan. A qualitative study of developers’ logging concerns. IEEE Transactions on Software Engineering (TSE). Under review. This work is described in Chapter 3.

For each of these works, my contributions include proposing the initial research idea, investigating background knowledge and related work, proposing research methods, conducting experiments and collecting the data, analyzing the data and verifying the hypotheses, and writing and polishing the manuscript. My co-contributors supported me in refining the initial ideas, providing suggestions for potential research methods, providing feedback on experimental results and earlier manuscript drafts, and providing advice for polishing the writing.

Table of Contents

Abstract i

Acknowledgments iii

Co-authorship v

List of Tables ix

List of Figures xii

1 Introduction 1 1.1 Research Hypothesis ...... 3 1.2 Thesis Overview ...... 4 1.3 Thesis Contributions ...... 8

2 Literature Review 10 2.1 Literature selection ...... 11 2.2 Mining logging code ...... 13 2.3 Mining log messages ...... 15 2.4 Automatic log insertion ...... 18 2.5 Learning to Log ...... 19

3 Understanding Developers’ Logging Concerns 21 3.1 Introduction ...... 22 3.2 Case Study Setup ...... 24 3.3 Case Study Results ...... 30 3.4 Threats to Validity ...... 54 3.5 Chapter Summary ...... 55

4 Understanding Software Logging Using Topic Models 56 4.1 Introduction ...... 57 4.2 Motivation Examples ...... 60

4.3 Topic Modeling ...... 61 4.4 Case Study Setup ...... 64 4.5 Case Study Results ...... 69 4.6 Threats to Validity ...... 105 4.7 Related Work ...... 108 4.8 Chapter Summary ...... 109

5 Automated Suggestions for Log Changes 111 5.1 Introduction ...... 112 5.2 Case Study Setup ...... 117 5.3 Case Study Results ...... 121 5.4 Discussion ...... 151 5.5 Threats to Validity ...... 153 5.6 Chapter Summary ...... 157

6 Automated Suggestions for Choosing Log Levels 159 6.1 Introduction ...... 160 6.2 Case Study Setup ...... 164 6.3 Preliminary Study ...... 166 6.4 Case Study Results ...... 174 6.5 Discussion ...... 196 6.6 Threats to Validity ...... 199 6.7 Chapter Summary ...... 201

7 Automated Suggestions for Logging Stack Traces 203 7.1 Introduction ...... 204 7.2 Methodology ...... 207 7.3 Experimental Results ...... 213 7.4 Chapter Summary ...... 217

8 Conclusions and Future Work 219 8.1 Thesis Contributions ...... 220 8.2 Limitations ...... 222 8.3 Future Research ...... 223

Bibliography 226

List of Tables

2.1 Names of conferences and journals as starting venues of the literature review...... 12

3.1 Overview of the studied projects...... 25 3.2 Number of studied logging issue reports per project...... 29 3.3 The status distribution of the studied logging issues...... 47 3.4 Developers’ resolutions for addressing their logging issues...... 48

4.1 Examples of JIRA issues of the Qpid-Java system that are concerned with the logging of “connections”...... 62 4.2 Overview of the studied systems...... 66 4.3 The five number summary and the skewness of the LD values of the 500 topics in each of the six studied systems...... 72 4.4 The five number summary and the skewness of the CumLD values of the 500 topics in each of the six studied systems...... 72 4.5 Top six log-intensive topics in each system. The listed topics have the highest LD values and highest CumLD values. A topic label is manually derived from the top words in each topic and its corresponding source code methods. We use underscores to concatenate words into bigrams. A topic label marked with a “∗” symbol or a “†” symbol indicates that the topic is concerned with communication between machines or in- teraction between threads, respectively...... 74 4.6 The five number summary and the skewness of the LD values of the top- ics in the Hadoop system...... 76 4.7 The five number summary and the skewness of the CumLD values of the topics in the Hadoop system...... 77 4.8 Top six log-intensive topics in the Hadoop system, using different num- ber of topics. A topic label marked with a “∗” symbol or a “†” symbol indicates that the topic is concerned with communication between ma- chines or interaction between threads, respectively. The bold font high- lights the common topics that appear among the top log-intensive top- ics when varying the number of topics...... 78

4.9 Number of topics that are shared by N ∈ {1,2,...,6} systems...... 84 4.10 Common topics between two similar systems: Qpid-Java and ActiveMQ. The symbols below a correlation value indicate the statistical significance of the correlation: *** p < 0.001...... 88 4.11 The common topics that are shared by all of the six studied systems: The six most log-intensive topics and the six least log-intensive topics. A topic label marked with a “∗” symbol or a “†” symbol indicates that the topic is concerned with communication between machines or interaction between threads, respectively...... 89 4.12 Cross-system topic modeling results when varying the number of topics, using the Hadoop system as an example...... 90 4.13 Selected baseline metrics and the rationale behind the choices of these metrics...... 92 4.14 Performance of the LASSO models, evaluated by AUC and BA...... 97 4.15 The top ten important metrics for determining the likelihood of a method being logged and their standardized coefficients. A letter “T” followed by a parenthesis indicates a topic-based metric and the manually derived topic label. A topic label followed by a ‡ symbol indicates that the particular topic is a log-intensive topic as listed in Table 4.5...... 99 4.16 The performance (AUC) of the cross-system models using baseline metrics. The row names indicate the training systems and the column names indicate the testing systems...... 102 4.17 The performance (AUC) of the cross-system models using both baseline and topic-based metrics. The row names indicate the training systems and the column names indicate the testing systems...... 102 4.18 Performance (AUC) of the LASSO models that leverage the baseline metrics and the topics-based metrics derived from different numbers of topics...... 103 4.19 Performance of the LASSO models (without filtering out small methods), evaluated by AUC and BA...... 105

5.1 Overview of the studied systems...... 119 5.2 Log-change reasons and the distribution: manual analysis result...... 123 5.3 Software measures used to model the drivers for log changes, measured per each commit...... 129 5.4 Confusion matrix for the classification results of a commit...... 136 5.5 The BA results for the within-project evaluation...... 141 5.6 The TP,FN, TN, FP results for the within-project evaluation...... 141 5.7 The BA and AUC results for the cross-project evaluation...... 142

5.8 The influence mean of the top 10 factors (measures) for the log change classifier, divided into distinct homogeneous groups by Scott-Knott clustering...... 146 5.9 The influence mean of the top 10 factors (measures) for the log addition classifier, divided into distinct homogeneous groups by Scott-Knott clustering...... 147 5.10 The influence mean of the top 10 factors (measures) for the log modification classifier, divided into distinct homogeneous groups by Scott-Knott clustering...... 148 5.11 Number of log-changing-only commits...... 153 5.12 L-threshold’s impact on the identification of log modifications (for the Hadoop project). The values in the brackets are the percentages of difference when compared with the 0.5 L-threshold...... 157

6.1 Overview of the studied projects...... 167 6.2 Software metrics used to model log levels...... 177 6.3 Comparing the performance of the ordinal regression model with the random guessing model and the naive model...... 188 6.4 Variables’ importance in the ordinal models, represented by the Wald Chi-square. The percentage following a Wald χ 2 value is calculated by dividing that particular Wald χ 2 value by the “TOTAL” Wald χ 2 value. . . . 192 6.5 The joint importance of each dimension of metrics in the ordinal mod- els, calculated using the joint Wald test. The percentage following a Wald χ 2 value is calculated by dividing that particular Wald χ 2 value by the “TOTAL” Wald χ 2 value...... 193 6.6 The results of the cross-project evaluation...... 197 6.7 Summary of the patterns of log level changes in the four studied projects. Each row represents a original log level and each column represents a new log level...... 198

7.1 Examples of issue reports of the Hadoop project that are concerned with whether to log the stack trace of an exception...... 206 7.2 Overview of the studied projects...... 209 7.3 Selected code features that are relevant to the likelihood of logging the stack trace of an exception in an exception logging statement...... 211 7.4 The performance of our Random Forest models for suggesting the likelihood of logging a stack trace in an exception logging statement...... 214 7.5 The mean importance of the top 10 features for explaining the likelihood of logging a stack trace in an exception logging statement. These features are divided into distinct homogeneous groups by Scott-Knott clustering of their importance scores...... 216

List of Figures

1.1 An overview of mining development knowledge to understand and sup- port software logging practices...... 3

3.1 The JQL query that we used to search for the logging-related issue reports. 26 3.2 The process of preparing and analyzing logging issue data...... 27 3.3 Developers’ logging concerns and the percentage of logging issues in each project that raise each logging concern...... 33 3.4 Example of logging for assisting in debugging...... 34 3.5 Example of logging for exposing runtime problems...... 35 3.6 Example of logging for bookkeeping...... 37 3.7 Example of logging for showing execution progress...... 37 3.8 Example of logging for providing runtime performance...... 38 3.9 Developers’ resolutions for addressing their logging issues and the per- centage of logging issues in each project that are addressed by each res- olution. The resolutions are grouped into four dimensions: adding/re- moving log content, adjusting log level, improving logging configuration, and improving logging statements...... 49 3.10 Developers’ resolutions for addressing each of the logging concerns. Each grid in the heat map indicates the percentage of logging issues that are addressed by each resolution as normalized for each logging concern...... 52

4.1 A logged method that is related to the “connection” topic...... 58 4.2 A method that is related to the “string builder” topic...... 58 4.3 A code snippet that is part of the fix for issue QPID-4038, showing that a logging statement was added to a code snippet within the context of “connections”...... 62 4.4 An example result of topic models, where three topics are discovered from four files. (a) The three discovered topics (z1, z2, z3) are defined by their top (i.e., highest probable) words. (b) The four original source code files (f1, f2, f3, f4) are represented by the topic membership vectors (e.g., {z1 = 0.2, z2 = 0.8, z3 = 0.0} for file f1)...... 64

4.5 An overview of our data extraction approach...... 68 4.6 Pairwise Spearman correlation between cyclomatic complexity (CCN), topic diversity (TD), and topic-weighted log density (TWLD). The symbols below the correlation values indicate the statistical significance of the respective correlation: o p ≥ 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001. 80 4.7 The cumulative assignment of all the topics in each studied system. The topics are sorted by their assignments from high to low...... 83 4.8 The number of topics that are shared between each pair of systems. The numbers in the diagonal cells show the number of important topics per system. The percentage values show the percentage of topics in the system indicated by the row name that are shared with the system indicated by the column name...... 85 4.9 The Spearman correlation of the log density of the common topics that are shared between each pair of systems. The values in the diagonal cells show the average log density correlation between each system and other systems on the shared topics. The symbols below the correlation values indicate the statistical significance of the respective correlation: o p ≥ 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001...... 86

5.1 An “svn blame” example showing that a later commit (1240413) added a logging statement that was missing in an earlier commit (1190122). . . . 114 5.2 An overview of our data extraction and analysis approaches...... 119 5.3 An example of log changes with the reason category block change...... 124 5.4 An example of log changes with the reason category log improvement. . . 125 5.5 An example of log changes with the reason category dependence-driven change...... 126 5.6 An example of log changes with the reason category logging issue...... 127 5.7 An example of a logging guard...... 132 5.8 Correlation analysis using Spearman hierarchical clustering (for the combo data)...... 134 5.9 The balanced accuracy of the within-project classifiers for Hadoop. . . . . 139 5.10 The balanced accuracy of the within-project classifiers for Directory- Server...... 140 5.11 The balanced accuracy of the within-project classifiers for HttpClient. . . 140 5.12 The balanced accuracy of the within-project classifiers for Qpid...... 141 5.13 An example of adding a missing logging statement...... 152 5.14 An example of removing logging statements due to changed logging re- quirements...... 152

6.1 Patch for JIRA issue HADOOP-10274 (svn commit number: 1561934). . . 162 6.2 Patch for JIRA issue HADOOP-10015 (svn commit number: 1580977). . . 162 6.3 An overview of our data extraction and analysis approaches...... 167 6.4 Log level distribution in the added logging statements...... 169 6.5 Log level distribution in the added logging statements in different types of blocks...... 172 6.6 Log level distribution in the added logging statements in different types of exception-catching blocks (Qpid)...... 174 6.7 Correlation analysis using Spearman hierarchical clustering (for Directory Server). The red line indicates the threshold (0.7) that is used to remove collinear metrics...... 181

7.1 Example of logging the stack trace of an exception...... 205 7.2 Overview of our approach for providing automated suggestions about whether to print the stack trace of an exception in an exception logging statement...... 208

CHAPTER 1

Introduction

Software developers insert logging statements in their source code to record valuable runtime information. A logging statement, when executed at runtime, prints a time-stamped log message into a pre-specified log file. Log messages help software practitioners (i.e., developers, testers, and operators) better understand system behaviors at runtime and assist in quality assurance efforts. For example, software developers rely on log messages for debugging field failures (Glerum et al., 2009; Yuan et al., 2010). Software operators leverage the rich information in log messages to guide capacity planning efforts (Sharma et al., 2011; Kavulya et al., 2010), to monitor system health (Bitincka et al., 2010), and to identify abnormal behaviors (Fu et al., 2009; Xu et al., 2009b; Jiang et al., 2008).
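To make the notion of a logging statement concrete, the following minimal Java sketch (a hypothetical class using the SLF4J API, not code from any of the studied systems) shows a logging statement and the kind of time-stamped log message it produces at runtime:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ConnectionManager {
        // Logger instances are typically static and per-class.
        private static final Logger LOG = LoggerFactory.getLogger(ConnectionManager.class);

        public void connect(String host, int port) {
            // A logging statement: a log level, a message template, and logged variables.
            LOG.info("Connecting to {}:{}", host, port);
            // At runtime this prints a time-stamped log message such as:
            // 2017-06-30 10:15:04 INFO ConnectionManager - Connecting to example.org:8080
        }
    }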

However, developers usually insert logging statements in an ad hoc manner (Yuan et al., 2012b). As a result, software practitioners usually miss important logging statements in a system, which often results in difficulties when debugging a field issue (Yuan et al., 2010). Nevertheless, adding logging statements excessively is not a good solution, since adding unnecessary logging statements can negatively impact system performance (Zeng et al., 2015) and mask the truly important information (Fu et al., 2014). In practice, providing appropriate logging statements (i.e., maximizing the value of the logged information while minimizing logging overhead) remains a challenge for software developers (Fu et al., 2014; Yuan et al., 2012b).

Prior studies proposed approaches to improve logging through proactive logging (Yuan et al., 2012a, 2011) and learning to log (Zhu et al., 2015; Jia et al., 2018). Proactive logging approaches use static analysis to conservatively add more logged information to the existing code (e.g., in exception catch blocks), in order to improve software failure diagnosis. However, these approaches do not consider developers’ expertise and significantly increase the amount of logged information (i.e., excessive logging). Learning to log approaches, on the other hand, learn statistical models from existing logging code and further leverage the models to provide suggestions for new logging code. These approaches have four main drawbacks: 1) They focus on specific types of code snippets (e.g., exception catch blocks or function calls) which together only cover 41% of the logging instances (Fu et al., 2014); 2) They provide logging suggestions as a post-development process instead of providing logging suggestions during the development process, i.e., when developers are changing their code; 3) They do not consider logging patterns (e.g., log levels, or stack trace logging) which also play important roles in determining the overall amount of log information; and 4) They do not consider developers’ code change history and issue reports that explain the rationale behind developers’ logging practices.

Figure 1.1: An overview of mining development knowledge to understand and support software logging practices.

1.1 Research Hypothesis

This thesis aims to mine development knowledge to understand and support developers’ logging practices. Development knowledge includes source code, change history (code changes and commit messages), as well as issue reports (issue descriptions, comments and patches). Development knowledge records developers’ development activities and their intention behind these activities. This thesis proposes the following research hypothesis:

Research hypothesis: Development knowledge (e.g., source code, change history and issue reports) contains valuable information that can explain developers’ rationale of logging, which can help us understand current logging practices and develop helpful tools to support such logging practices.

Figure 1.1 shows an overview of this thesis work. Through mining development knowledge, we find the best logging practices and the existing problems (e.g., excessive logging) within current logging practices. Based on these findings, we derive general logging advice and build automated tools to support current logging practices (i.e., to help developers solve existing logging problems). Specifically, this thesis mines three types of repositories: 1) mining issue reports (Chapter 3), 2) mining source code (Chapter 4 and Chapter 7), and 3) mining change history (Chapter 5 and Chapter 6).

1.2 Thesis Overview

We now give a brief overview of the work presented in this thesis.

1.2.1 Chapter 2: Literature Review

For our literature review, we focus on prior studies that attempt to understand or improve software logging practices. We characterize and compare the surveyed literature along four categories:

• Mining logging code. Prior work empirically studies how developers insert logging statements in their source code and the evolution of their logging code.

• Mining log messages. Prior studies mine the rich source of log messages that are generated at run time, in order to support various software engineering purposes (e.g., anomaly detection).

• Automatic log insertion. Based on static code analysis and heuristics, prior studies automatically add log information in the source code, in order to support failure diagnosis.

• Learning to log. Prior studies learn statistical models or heuristics from existing logging code, in order to provide logging suggestions such as where to log.

From the literature review, we observe that software logging is a pervasive software engineering process. However, developers usually have difficulties when making their logging decisions, and they spend much effort maintaining their logging code. Prior studies propose automated approaches to help developers improve their logging code.

Yet, these studies rarely take developers’ expertise and concerns for logging into consideration. These studies provide logging suggestions as a post-development process instead of providing logging suggestions during the development process, i.e., when developers are changing their code.

1.2.2 Chapter 3: Understanding Developers’ Logging Concerns

Modern software development processes use issue reports to manage development tasks (e.g., new features, feature enhancement, or bug fixes). Issue reports record the description, rationale, developers’ comments, and related code changes of a development task. Mining logging-related issue reports helps us better understand developers’ logging concerns, i.e., the benefits (e.g., assisting in debugging) and costs (e.g., performance overhead) of logging.

Therefore, we perform a qualitative study on the logging-related issue reports from three large and successful open source projects. We manually investigate these issue reports and derive high-level concepts about developers’ logging concerns and how they address their logging concerns. Along with our qualitative study, we also summarize best logging practices and general logging advice that can help developers (and logging library providers) improve their logging code (and logging libraries).

1.2.3 Chapter 4: Understanding Software Logging Using Topic Models

Source code represents the resulting fact of developers’ code change activities. Prior studies learn statistical models from existing source code and provide logging suggestions about where to log. These studies use the structural information of the source code (e.g., exception types or method calls) to explain the likelihood that a code snippet needs a logging statement. We believe that the semantic topics of a code snippet (e.g., “network connection”) also help explain the likelihood of having a logging statement in the code snippet.

Therefore, we study the relationship between the topics of a code snippet and the likelihood of a code snippet being logged (i.e., to contain a logging statement).

Through a case study on six open source systems, we find that there exists a small number of “log-intensive” topics that are more likely to be logged than other topics, and that our topic-based metrics help explain the likelihood of a code snippet being logged. As a result, leveraging both structural metrics and topic-based metrics can provide better suggestions about “where to log”. Our findings highlight that topics contain valuable information that can help guide and drive developers’ logging decisions.
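To illustrate the intuition, the following hypothetical Java methods (modelled loosely on the kinds of examples discussed in Chapter 4, but not taken from the studied systems) contrast a snippet dominated by a “connection”-like topic, which developers tend to log, with a snippet dominated by a “string builder”-like topic, which they rarely log:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class TopicExamples {
        private static final Logger LOG = LoggerFactory.getLogger(TopicExamples.class);

        // A method dominated by a "connection"-like topic: such code is often logged,
        // because remote interactions fail in ways that are hard to reproduce.
        void closeConnection(java.net.Socket socket) throws java.io.IOException {
            LOG.info("Closing connection to {}", socket.getRemoteSocketAddress());
            socket.close();
        }

        // A method dominated by a "string builder"-like topic: purely local string
        // manipulation, which developers rarely instrument with logging statements.
        String joinKeys(java.util.List<String> keys) {
            StringBuilder sb = new StringBuilder();
            for (String key : keys) {
                sb.append(key).append(',');
            }
            return sb.toString();
        }
    }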

1.2.4 Chapter 5: Automated Suggestions for Log Changes

Code change history records developers’ code change activities that have been committed and the associated commit messages. Developers’ logging activities (i.e., adding, deleting, or updating logging statements) are also recorded in the code change history. Such logging activities can help us better understand current software logging

practices and the rationale behind these activities.

Therefore, we first empirically study why developers make log changes through a manual investigation of developers’ logging activities. Based on our findings, we propose an automated approach to provide developers with log change suggestions as soon as they commit a code change. Through a case study on four open source projects, we find that the reasons for log changes can be grouped along four categories: block change, log improvement, dependence-driven change, and logging issue. We also find that our automated approach can effectively suggest whether a log change is needed for a code change.
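As a rough illustration of how such just-in-time suggestions could be surfaced at commit time, the sketch below wires a toy, hand-written rule into a commit-level check; the CommitMetrics fields and the rule are illustrative placeholders, not the commit measures or the classifiers evaluated in Chapter 5:

    // A sketch of how a just-in-time log-change suggester could be invoked when a
    // commit is made. All names and thresholds here are hypothetical.
    public class LogChangeAdvisor {
        static class CommitMetrics {
            int changedSloc;           // lines of code changed in the commit
            int changedCatchBlocks;    // exception handling blocks touched
            boolean touchesLoggedFile; // whether a changed file already contains logging
        }

        // Returns true when the commit looks like it should also update logging code.
        static boolean suggestLogChange(CommitMetrics m) {
            return m.changedCatchBlocks > 0 || (m.touchesLoggedFile && m.changedSloc > 50);
        }
    }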

1.2.5 Chapter 6: Automated Suggestions for Choosing Log Levels

Software practitioners use log levels to disable some verbose log messages while allowing the printing of other important ones. However, prior research observes that developers often have difficulties when determining the appropriate level for their logging statements. We analyze the development history of four open source projects to study how developers assign log levels to their logging statements. We also propose an automated approach to help developers determine the most appropriate log level when they add a new logging statement. Our automated approach can accurately suggest the levels of logging statements with an AUC (Area Under the Curve) of 0.75 to 0.81. We find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.
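The trade-off behind log levels can be illustrated with a small, hypothetical SLF4J example (not taken from the studied projects): the same retry logic logs a recoverable event at a verbose level and the final failure at a severe level:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class RetryHandler {
        private static final Logger LOG = LoggerFactory.getLogger(RetryHandler.class);

        void reportRetry(int attempt, int maxAttempts) {
            if (attempt < maxAttempts) {
                // A transient, recoverable event: a verbose level keeps production logs quiet.
                LOG.debug("Retrying request, attempt {}/{}", attempt, maxAttempts);
            } else {
                // The operation has actually failed: a severe level makes it visible by default.
                LOG.error("Request failed after {} attempts", maxAttempts);
            }
        }
    }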

1.2.6 Chapter 7: Automated Suggestions for Logging Stack Traces

Software developers log the stack traces of program exceptions for debugging purposes. However, stack traces can also pollute log files very fast because a stack trace is usually much longer than a regular log message. We observe that developers have difficulties deciding whether to log the stack trace of an exception. Therefore, we propose an automated approach to help developers make informed decisions about whether to print the stack trace of an exception in a logging statement. Our experimental results on four open source projects show that our automated approach can accurately suggest whether to print the stack trace of an exception in a logging statement, with an AUC of 0.85 to 0.94. Our findings also provide developers and researchers insights into the important factors that drive developers’ decisions of logging exception stack traces.
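The decision that our approach targets can be illustrated with a hypothetical SLF4J example (not code from the studied projects), contrasting an exception logged with and without its stack trace:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ConfigLoader {
        private static final Logger LOG = LoggerFactory.getLogger(ConfigLoader.class);

        void load(String path) {
            try {
                byte[] raw = Files.readAllBytes(Paths.get(path));
                // ... parse raw into configuration objects ...
            } catch (IOException e) {
                // Option 1: message only. Compact, but the stack trace is lost.
                LOG.warn("Failed to load config {}: {}", path, e.toString());
                // Option 2: pass the exception as the last argument. SLF4J appends the
                // full stack trace, which helps debugging but inflates the log file.
                LOG.error("Failed to load config " + path, e);
            }
        }
    }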

1.3 Thesis Contributions

This thesis empirically studies developers’ logging practices and proposes automated approaches to help developers make informed logging decisions. The results of the thesis highlight the importance of considering developers’ expertise and concerns when providing automated approaches for logging improvement. The thesis mainly makes the following contributions:

• The discussion of developers’ logging concerns (in Chapter 3) sheds light on future research opportunities for logging improvement (i.e., helping developers leverage the benefits of logging while minimizing logging costs).

• We show that the semantic topics of a code snippet can provide another dimension to explain the likelihood of logging a code snippet (in Chapter 4).

• We propose an automated approach to provide developers with log change suggestions as soon as they commit a code change (in Chapter 5).

• We propose an automated approach to help developers determine the most appropriate log levels when they add new logging statements in the source code (in Chapter 6).

• We propose an automated approach to help developers make informed decisions about whether to print an exception stack trace in a logging statement (in Chapter 7).

CHAPTER 2

Literature Review

My thesis aims to understand and support software logging practices through mining development knowledge. Understanding current logging practices is the first step towards helping developers improve their logging practices. Two categories of prior studies help software practitioners and researchers understand the current logging practices in industry and in the open source community: mining logging code and mining log messages. The former category studies how logging code is added into software products, while the latter category studies how runtime log messages are leveraged to support software engineering processes.

• Mining logging code. Prior work empirically studies how developers insert logging statements in their source code and the evolution of their logging code.

• Mining log messages. Prior studies mine the rich source of log messages that are generated at run time, in order to support various software engineering purposes, for example, anomaly detection or failure diagnosis.

Prior studies also aim to improve software logging through automatic log insertion or learning to log. Both categories of studies propose automatic approaches to identify the code snippets that need to be logged.

• Automatic log insertion. Based on static code analysis and heuristics, prior studies automatically add (or improve) logging statements in the source code, in order to support failure diagnosis.

• Learning to log. Prior studies learn statistical models or heuristics from existing logging code, in order to provide logging suggestions such as where to log.

This chapter first explains the literature selection process, then discusses the existing studies along the aforementioned four categories.

2.1 Literature selection

There has been a large number of prior studies that focused on log analysis, i.e., leveraging runtime logs for various domain-specific purposes (e.g., monitoring system performance). However, this literature review is focused on the papers that aim to understand and support software logging practices (i.e., the software engineering process that adds logging code to the source code). Therefore, this literature review starts from papers that are published in major software engineering journals and conferences.

This literature review starts from the venues that are listed in Table 2.1. It considers the papers that were published in the past 10 years (i.e., from 2008 to 2018). To improve the coverage of this literature review, we also check the citations of each reviewed paper. Initially, all the reviewed papers fall into the categories of mining logging code, automatic log insertion and learning to log. We found that many papers about mining log messages are cited by the other three categories of papers, thus we included the mining log messages category in this literature review. We detail each category of papers below.

Table 2.1: Names of conferences and journals as starting venues of the literature review.

Journal     IEEE Transactions on Software Engineering (TSE)
Journal     ACM Transactions on Software Engineering and Methodology (TOSEM)
Journal     Empirical Software Engineering (EMSE)
Journal     Automated Software Engineering (ASE)
Journal     Journal of Systems and Software (JSS)
Conference  ACM SIGSOFT Symposium on the Foundations of Software Engineering / European Software Engineering Conference (FSE/ESEC)
Conference  International Conference on Software Engineering (ICSE)
Conference  International Conference on Automated Software Engineering (ASE)
Conference  International Conference on Software Maintenance and Evolution (ICSME)
Conference  International Conference on Software Analysis, Evolution, and Reengineering (SANER)
Conference  International Conference on Mining Software Repositories (MSR)
Conference  International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)

2.2 Mining logging code

Characterizing logging practices. Prior work performs empirical studies to charac-

terize current logging practices. Yuan et al. (2012b) make the first attempt to provide

a characteristic study of the current logging practices within four C/C++-based open-

source projects. They quantitatively study the logging code and the change history of

the logging code. They find that software logging is a pervasive practice in software

development, and that developers spend much effort maintaining their logging code

(e.g., modifying log levels).

Chen and Jiang (2017b) replicate the work of Yuan et al. (2012b) on 21 Java-based

open source projects. They also find that logging is a pervasive software logging prac-

tice and that developers spend much effort on logging code maintenance. However,

these two studies conflict with each other in some findings. For example, the former

study (Yuan et al., 2012b) finds that developers spend less time fixing reported failures when log messages are present in the failure reports, while the latter study finds the opposite (Chen and Jiang, 2017b).

Chen and Jiang (2017a) also characterize the anti-patterns of logging code in open

source projects by learning from how developers fix the defects in their logging code.

They find six different anti-patterns in the logging code, such as wrong log level and

logging nullable objects.

Shang et al. (2015) explore the relationship between logging characteristics and

code quality. Surprisingly, their results show that logging characteristics provide a

strong indicator of post-release defects. They explain that it might be the case that

developers often relay their concerns about a piece of code through logging statements,

thus source code files with high log density are more defect-prone.

Logging practices in industry. Software logging is a widely adopted practice in industry. Fu et al. (2014) conduct source code analysis on two industrial software projects at Microsoft, in order to find out what categories of code snippets are logged and what factors are considered for logging. They find five categories of logged code snippets (including return-value-check snippets and exception-catch snippets). In addition, they discuss the characteristics of the logged exception-catch snippets versus unlogged exception-catch snippets.

Pecchia et al. (2015) study the industrial logging practices at Selex ES. They find that logging is a widely adopted practice in a critical industrial domain. They also observe that the logging behavior is strongly developer dependent and development-team dependent. They highlight the need to establish standard company-wide logging policies.

Evolution of logging code. Prior research studies the evolution of logging code in software projects. They find that logging code changes over time at a high rate. Shang et al. (2011, 2014a) perform a case study on two open source projects and one industrial software project, in order to explore the evolution of logging code in these projects. They

find that the logging code changes at a high rate across versions, which might break the functionality of log processing applications. They also suggest that the majority of the logging code changes could be avoided.

Kabinna et al. (2016a,b, 2018) study the evolution of logging code in Apache Software Foundation (ASF) projects. They find that many ASF projects have undergone logging library migrations (Kabinna et al., 2016a). However, performance is rarely improved after a migration. They also find that a large number of logging statements change throughout their lifetime, and they discuss the factors that impact the stability

of a logging statement (Kabinna et al., 2016b, 2018).

Prior studies find that software logging is a pervasive practice in software development, and that developers spend much effort maintaining their logging code. Most empirical studies of software logging practices analyze the logging code and its evolution. They focus on developers’ logging behaviors without exploring the rationale behind developers’ logging decisions.

2.3 Mining log messages

Prior studies have proposed various approaches to mine log messages for different purposes.

Understanding system runtime behaviors. System logs are widely used by system operators to understand system behaviors. With the increasing scale and complexity of software systems, it has become challenging for system operators to manually analyze system logs. Fu et al. (2013) propose an approach to help operators understand system behaviors by mining execution patterns from system logs. An execution pattern is reflected by a sequence of system logs. Based on the mined execution patterns, their approach further learns essential contextual factors that cause a specific code path to be executed. Their approach helps system operators understand system runtime behaviors in various tasks (e.g., system problem diagnosis).

An operation profile captures the common usage scenarios (e.g., sending email) of a particular system (e.g., an email client) and their occurring rate. Hassan et al. (2008) propose an approach to customize operational profiles for large deployments. Their approach can uncover the most repetitive usage scenarios which are usually critical to system performance. They leverage a textual compression algorithm to compress

different segments of log files. A high compression ratio indicates highly repetitive sequences of log messages and thus represents a common usage scenario.
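The following minimal Java sketch illustrates the underlying idea of using a textual compression ratio as a repetitiveness signal; it is a simplification using java.util.zip.Deflater for illustration, not the authors' implementation:

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.Deflater;

    public class LogCompressionRatio {
        // Returns originalSize / compressedSize for one segment of a log file.
        // A high ratio means the segment is highly repetitive, i.e., it likely
        // corresponds to a common, frequently repeated usage scenario.
        static double compressionRatio(String logSegment) {
            byte[] input = logSegment.getBytes(StandardCharsets.UTF_8);
            Deflater deflater = new Deflater();
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            deflater.end();
            return (double) input.length / out.size();
        }
    }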

Shang et al. (2013) mine log messages of big data analytics applications to reduce developers’ effort to verify the deployment of such applications on Hadoop clouds.

Their approach uncovers the differences between a pseudo-cloud deployment and a large-scale cloud deployment, and directs developers’ attention to examining such differences, thereby reducing the deployment verification effort.

Anomaly detection. Log messages are widely used to monitor the health of software systems and identify abnormal conditions. Traditionally, software practitioners can search keywords such as “error” and “failure” to spot the failures of a system execution. Prior studies also propose more sophisticated approaches to detect more implicit failures. Xu et al. (2009b,a, 2008) propose a general methodology to mine the rich information in logs to detect system runtime problems. Based on the assumption that a problem manifests as an abnormality in the relationships among different types of log messages, their approaches extract features that capture various correlations among log messages (e.g., relative frequencies). Then, they use a Principal Component Analysis based anomaly detection method with the extracted features to identify runtime problems.
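As a simplified illustration of the feature extraction step (not the authors' implementation), the sketch below turns the abstracted log events of one execution session into a relative-frequency vector, the kind of input that such PCA-based detectors consume:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LogEventFeatures {
        // Builds a relative-frequency vector of log event types (message templates)
        // for one execution session; anomaly detectors compare such vectors across
        // sessions to spot abnormal relationships among log messages.
        static Map<String, Double> relativeFrequencies(List<String> eventTypes) {
            Map<String, Double> counts = new HashMap<>();
            for (String type : eventTypes) {
                counts.merge(type, 1.0, Double::sum);
            }
            for (Map.Entry<String, Double> entry : counts.entrySet()) {
                entry.setValue(entry.getValue() / eventTypes.size());
            }
            return counts;
        }
    }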

Fu et al. (2009) and Mariani and Pastore (2008) learn a Finite State Automaton (FSA) from training log sequences to represent the normal workflow for each system component. Then, the FSA can automatically detect anomalies in newly input log files.

Jiang et al. (2008) mine load testing logs to learn dominant behavior (i.e., execution sequences) and flag deviations (i.e., anomalies) from the dominant behavior, in order to detect problems in load testing tasks.

Failure diagnosis. Log messages are usually the most important clues for failure di-

agnosis and the only resource for diagnosing field failures. Yuan et al. (2010) propose

SherLog, which leverages runtime log information and source code analysis to infer

the execution paths (i.e., what must or may have happened) during a failed produc-

tion run, in order to assist developers in failure diagnosis. They find that the informa-

tion inferred by SherLog is very useful for developers to diagnose real world software

failures.

Syer et al. (2013) leverage both performance counters and execution logs to diagnose memory-related performance issues. They combine the performance counters and execution events (i.e., abstracted execution logs) by discretizing them into time slices. Then, they use statistical techniques to identify the set of execution events that are associated with a performance issue.

Nagaraj et al. (2012) propose DISTALYZER, which leverages the vast log data available from large scale systems to support developers in diagnosing performance problems. DISTALYZER uses machine learning techniques to compare logs with good performance and logs with bad performance, and automatically infers the strongest associations between system components and performance.

Quality log messages are critical for understanding system runtime behaviors, anomaly detection and failure diagnosis. The importance of logging quality motivates our study to understand current logging practices and assist software developers in making better logging code.

2.4 Automatic log insertion

Proactive logging. As the run-time log information is frequently insufficient for detailed failure diagnosis, prior studies propose approaches to automatically insert additional log information to the source code. LogEnhancer (Yuan et al., 2011) automatically adds variables into existing logging statements, in order to aid in the diagnosis of future failures. LogEnhancer conducts static analysis on the source code and automatically identifies causally-related variables that, if logged, can minimize the uncertainty of the execution paths during failure diagnosis.
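The kind of enhancement can be illustrated with a hypothetical before/after example (the method, variables, and messages are invented for illustration and are not LogEnhancer's actual output):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class BlockReader {
        private static final Logger LOG = LoggerFactory.getLogger(BlockReader.class);

        void read(long blockId, long offset, int retries) {
            // Original logging statement: little context for later diagnosis.
            LOG.error("read failed");

            // Enhanced logging statement: variables that determine which execution
            // path was taken are recorded, narrowing the search space when the
            // failure has to be diagnosed from logs alone.
            LOG.error("read failed: blockId={}, offset={}, retries={}",
                      blockId, offset, retries);
        }
    }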

Errlog (Yuan et al., 2012a) proactively adds appropriate logging statements into

source code. Errlog analyzes real-world failures and derives common error sites (e.g.,

system call return errors, exceptions), then automatically inserts missing logging state-

ments into such error sites. Both LogEnhancer and Errlog are reported to significantly

reduce failure diagnosis time.

Zhao et al. (2017b,a) propose Log20, which automatically places logging statements under a specified threshold of performance overhead. Log20 measures how effective each logging statement is in disambiguating code paths, and automatically places logging statements so that they can minimize code path ambiguity while satisfying the given threshold of performance overhead.

Interactive logging. Prior research also explores interactive logging, i.e., inserting logging code when it is needed. AutoLog (Zhang et al., 2011) generates additional informative logs to help developers discover the root causes when there is a failure. When developers need more clues to diagnose a system failure, AutoLog performs program slicing to find the execution paths that might lead to the failure, and incrementally adds logging statements along the execution paths, with the goal of approaching the root cause quickly with fewer logs, and then executes the program again to generate more logs. The process ends when developers have enough clues to find the root cause of the failure.

Automatic log insertion tools are helpful when developers need to collect more clues for failure diagnosis. However, these tools do not consider developers’ expertise and concerns for logging. Therefore, the logging statements that are generated by these tools are difficult to integrate into production software.

2.5 Learning to Log

Learning statistical models. Prior studies learn statistical models from common logging practices and leverage the models to provide logging suggestions. Zhu et al. (2015) propose LogAdvisor, which automatically learns where to log from existing logging code and provides informative logging guidance to developers. LogAdvisor extracts contextual features of a code snippet (an exception-catch snippet or a return-value-check snippet), then learns statistical models to suggest whether a logging statement should be added to such a code snippet. LogAdvisor is the first step towards “learning to log”. Similarly, LogOpt (Lal and Sureka, 2016) extracts contextual features from the source code and builds statistical models to predict whether a logging statement is needed in an exception-catch block.
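To give a flavor of what such contextual features look like, the sketch below defines a small, illustrative feature record for a catch block; the field names and the encoding are assumptions for illustration and do not reproduce the exact feature sets of LogAdvisor or LogOpt:

    // A few of the kinds of contextual features that "learning to log" approaches
    // extract from an exception-catch snippet before training a classifier.
    public class CatchBlockFeatures {
        String exceptionType;       // e.g., "IOException"
        int snippetSloc;            // size of the catch block in lines of code
        boolean rethrows;           // whether the exception is rethrown
        boolean returnsInCatch;     // whether the catch block returns early
        int methodCallCount;        // number of method calls in the snippet

        // A naive numeric encoding of the features for a statistical model.
        double[] toVector() {
            return new double[] {
                Math.floorMod(exceptionType.hashCode(), 1000), // stand-in for a proper categorical encoding
                snippetSloc,
                rethrows ? 1 : 0,
                returnsInCatch ? 1 : 0,
                methodCallCount
            };
        }
    }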

Jia et al. (2018) propose an intention-aware log automation tool called SmartLog, which uses an Intention Description Model (IDM) to explore the intention of existing logs and mine log rules from such intentions. SmartLog only focuses on the logging placement of function call code snippets.

Learning logging heuristics. Prior research also learns logging heuristics from experiences. King et al. (2015) propose empirical heuristics to help developers identify mandatory log events (i.e., actions that must be logged to hold the software user accountable for performing the actions). They extract 3,513 verb-object pairs from natural language software artifacts, and manually classify each verb-object pair as either a mandatory log event or not. Then, they use grounded theory analysis to derive 12 heuristics that can help determine whether a verb-object pair describes a mandatory log event or not.

King et al. (2017) also perform a controlled experiment to evaluate whether the derived heuristics can effectively help developers identify mandatory log events.

However, the heuristics do not help developers more correctly identify mandatory log events at a statistically significant level.

Prior studies learn common logging practices from the source code; they never consider the code change history nor the issue reports that provide additional dimensions to explain developers’ logging practices. Prior studies provide logging suggestions as a post-development process instead of providing logging suggestions during the development process, i.e., when developers are changing their code.

CHAPTER 3

Understanding Developers’ Logging Concerns

As we observed in Chapter 2, prior studies aimed to improve logging either by proactively inserting logging statements in certain code snippets or by learning where to log from existing logging code. However, there exists no work that studies developers’ logging concerns, i.e., the benefits and costs of logging from developers’ perspective. Without understanding developers’ logging concerns, automated approaches for logging improvement are based primarily on researchers’ intuition and are unconvincing to developers. In order to fill this gap, we performed a qualitative study on 533 logging-related issue reports from three large and successful open source projects. We manually investigated these issue reports and derived high-level concepts about developers’ logging concerns and how they address their logging concerns. Along with our qualitative analysis, we also summarized best logging practices and general logging advice that can help developers (and logging library providers) improve their logging code (and logging libraries). Our empirical findings also shed light on future research opportunities for helping developers leverage the benefits of logging while minimizing logging costs.

An earlier version of this chapter is under review at the IEEE Transactions on Software Engineering (TSE) journal.


3.1 Introduction

Software developers leverage logging statements in the source code to generate runtime log messages which are crucial for understanding system runtime behaviors and diagnosing runtime issues. Missing an important piece of logging information can increase the difficulty of diagnosing a field failure (Yuan et al., 2011, 2012a). For example, issue report HADOOP-13458 [1] complains that it was hard to comprehend the meaning of an exception message without logging the stack trace. On the other hand, adding logging statements excessively is not an optimal solution, since adding too much logging can significantly increase system overhead (Zeng et al., 2015; Fu et al., 2014). For example, issue report HADOOP-12903 complains that too much logging slowed down the speed of servers.

Prior studies have proposed automated approaches to improve logging through proactive logging (Yuan et al., 2011, 2012a) and learning to log (Zhu et al., 2015; Lal and Sureka, 2016; Jia et al., 2018). Proactive logging approaches use static analysis to automatically add more logged information to the existing code, in order to improve software failure diagnosis. Learning to log approaches, on the other hand, learn statistical models from existing logging practices and further leverage the models to suggest where to log. All these existing studies aim to ease or improve developers’ logging practices. Their goal is to help developers address their logging concerns, i.e., by leveraging the benefits of logging while minimizing its costs.

However, there exists no work that studies developers’ logging concerns, i.e., the

[1] All the issue reports mentioned in this chapter can be accessed through the URL https://issues.apache.org/jira/browse/.

benefits and costs of logging, from the perspective of developers. Without a clear understanding of developers’ logging concerns, automated approaches for logging improvement are not convincing to developers. On the other hand, developers are also not fully aware of the benefits and costs of logging, and in some cases raise con-

flicting concerns regarding some logging issues (see RQ1 - Discussion). Therefore, this chapter aims to understand developers’ logging concerns, and how they address their logging concerns.

Developers communicate their logging concerns in their logging-related issue reports. For example, issue report HADOOP-13693 raises a concern that logging an error message for a successful operation is confusing and misleading. Therefore, we performed a qualitative study on 533 logging-related issue reports from three large and successful open source projects. We used a manual coding approach to derive high-level concepts from these issue reports, in order to understand developers’ logging concerns and how they address their logging concerns. Along with our qualitative analysis, we also derived best logging practices and general logging advice which can be leveraged by software practitioners in their logging practices. In particular, we address the following two research questions (RQs).

RQ1: What are developers’ logging concerns?

Our goal is not to find the direct causes of logging issues (e.g., incorrect log levels),

but rather to go deeper and understand why developers raise such logging issues,

i.e., what are the benefits of logging that they want to leverage, and what are the

costs of logging that they want to avoid.

RQ2: How do developers address their logging concerns?

Developers address their logging concerns by fixing logging-related issue re-

ports. By examining developers’ issue-fixing processes, we want to understand

how they balance the benefits and costs of logging (e.g., by adjusting log level).

By learning from developers’ logging concerns, we can provide developers with insight on how to leverage the benefits of logging while avoiding too much negative impact. We can also provide suggestions for logging library providers to improve their logging libraries (e.g., to support different log levels for different parts of a logging statement). Finally, our empirical findings inspire new research opportunities for logging improvement (e.g., developing methods and tools to leverage logging benefits and minimize logging costs).

Chapter organization. The remainder of the chapter is organized as follows. Section 3.2 describes our case study setup, covering our subject projects, data preparation and data analysis approaches. Section 3.3 presents the experimental results for answering our research questions. Section 3.4 discusses threats to the validity of our findings. Finally, Section 3.5 draws conclusions and outlines future research opportunities that are inspired by our study.

3.2 Case Study Setup

This section describes our case study subjects, the process that we used to prepare the data for our case study, and the qualitative analysis approaches that we used to study developers' logging concerns.

Table 3.1: Overview of the studied projects.

Project          Studied History      SLOC*      Primary Language (SLOC)   # Logging Statements*
Hadoop Common    2012.06 - 2017.06    313 K      Java (226 K)                638
Hive             2012.06 - 2017.06    1,445 K    Java (1,081 K)            5,503
Kafka            2012.06 - 2017.06    238 K      Java (149 K)                853
* The SLOC and the number of logging statements are calculated at the end of the studied period, i.e., June 30th, 2017. The number of logging statements for each project is calculated for the primary language.

3.2.1 Subject Projects

In order to study developers' logging concerns, we manually investigated the logging-related issue reports from three large and successful open source software projects, namely Hadoop Common2, Hive3, and Kafka4. Hadoop Common implements the common utilities for Hadoop, a distributed computing platform. Hive is a data warehouse that supports accessing big data sets residing in distributed storage using SQL. Kafka is a streaming platform for messaging, storing and processing real-time records. All of these projects are widely used by today's tech giants, such as Google, Amazon, and Facebook. We selected these three subject projects because their logging code is well maintained; for example, they have many logging-related issue reports that are dedicated to maintaining logging code. As the log messages that are generated by these projects are exposed to the aforementioned tech giants as well as a much wider audience, the quality of their logging code is critical to their success.

Table 3.1 shows the overall information of the studied projects. Hadoop Common has 313 K source lines of code (SLOC), and it is primarily implemented in Java. Hive has

2 http://hadoop.apache.org
3 https://hive.apache.org
4 https://kafka.apache.org

project in ("Project Name") AND summary ~ "(log || logger || print) NOT (\"log in\" || \"log out\" || \"blue print\" || \"print command\"~10)" ORDER BY created DESC

Figure 3.1: The JQL query that we used to search for the logging-related issue reports.

an SLOC of 1,445 K, and its dominant programming language is also Java. Kafka has an

SLOC value of 238 K, and it is mostly implemented in Java. Hive has the largest number

(i.e., 5,503) of logging statements, while Hadoop Common has the least number (i.e.,

638) of logging statements. We study the logging-related issue reports that were created from June 2012 to June 2017. We extracted the logging issue data in December 2017 (at least six months after the creation of any studied issue report), to ensure that the status of the studied logging issues is relatively stable after a long time since their creation.

3.2.2 Data Preparation

We extract our logging issues from the Apache JIRA issue tracking system5. Figure 3.2 demonstrates our data extraction process. First, we use the JIRA Query Language (JQL) to automatically search for the JIRA issue reports that are related to logging (i.e., issue reports with logging-related keywords in their summaries). We use the JQL query in

Figure 3.1 to search for the logging issue reports of each of the studied projects. The

“Project Name” is replaced by “Hadoop Common”, “Hive”, and “Kafka” for our respective projects. This JQL query searches for all the issue reports of the specified project that have “log”, “logger”, “print”, or their variations (e.g., “logging”), but do not have “log in”, “log out”, “blue print”, or “print” and “command” together, in their summaries, sorted by their creation time in reverse-chronological order.

5 https://issues.apache.org/jira

Figure 3.2: The process of preparing and analyzing logging issue data. (Data preparation: automated JQL filtering of issue reports from the issue tracking system, followed by manual filtering to separate logging-related issues from non-logging issues. Data analysis: manual coding and qualitative analysis that yield high-level concepts for RQ1 and RQ2, as well as best logging practices and logging advice.)
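The JQL query in Figure 3.1 can also be executed programmatically against the issue tracking system. The sketch below is only an illustration of such an automated filtering step: it assumes JIRA's public REST search endpoint (/rest/api/2/search) and the standard java.net.http client, neither of which is documented as the exact tooling used in this study.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class JqlSearchExample {
    public static void main(String[] args) throws Exception {
        // The JQL query of Figure 3.1, with the project name filled in for Hadoop Common.
        String jql = "project in (\"Hadoop Common\") AND summary ~ \"(log || logger || print) "
                + "NOT (\\\"log in\\\" || \\\"log out\\\" || \\\"blue print\\\" || \\\"print command\\\"~10)\" "
                + "ORDER BY created DESC";

        // Assumed: the standard JIRA REST search endpoint of the Apache issue tracker.
        String url = "https://issues.apache.org/jira/rest/api/2/search?jql="
                + URLEncoder.encode(jql, StandardCharsets.UTF_8)
                + "&fields=summary,description,status,resolution&maxResults=50";

        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response body is a JSON document listing the matching issue reports.
        System.out.println(response.body());
    }
}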

The resulting issue reports from the automated filtering process may falsely include some non-logging issue reports. For example, issue report HADOOP-14060 has “log” in its summary but it is about the access control for the “logs” folder instead of a logging issue. In order to remove these non-logging issue reports, we manually examined all the resulting issue reports from the automated filtering process. For each issue report, we first checked its summary to determine whether it is a logging issue. If we could not decide from the summary alone, we further checked the description of the issue report. We only kept those issue reports that dealt with logging issues. We also removed duplicated logging issue reports and kept only one issue report from each set of duplicates. We ended up with 533 logging-related issue reports.

Table 3.2 shows the number of issue reports that resulted from the automated filtering process and the number of remaining issue reports after the manual filtering process (i.e., the number of logging issue reports that are studied in the rest of the chapter). Using our query criterion (i.e., Figure 3.1), we get 193, 395 and 314 issue reports of Hadoop Common, Hive and Kafka, respectively. 74% and 68% of the JQL-queried issue reports are concerned with logging for the Hadoop Common and Hive projects, respectively. However, only 39% of the JQL-queried issue reports are concerned with logging for the Kafka project. As the Kafka project deals with messaging, storing and processing of log messages, it has a large number of issue reports with the keyword “log” (or its variations) in their summaries that are not necessarily concerned with the logging aspect of the project.

Table 3.2: Number of studied logging issue reports per project.

Project          # JQL-queried issues    # Logging issues
Hadoop Common    193                     143 (74%)
Hive             395                     268 (68%)
Kafka            314                     122 (39%)
Total            902                     533 (59%)

3.2.3 Data Analysis

Figure 3.2 also shows our data analysis process. In order to understand developers' logging concerns (RQ1) and how they address their logging concerns (RQ2), we performed a qualitative analysis on the 533 logging issue reports. We use a manual coding approach (see details below) to extract high-level concepts (e.g., developers' logging concerns) from the detailed information of these issue reports (e.g., summaries, descriptions, and comments). Inspired by our understanding of developers' logging concerns, we further derived some best logging practices and general logging advice that can help developers improve their logging code.

Our manual coding approach. Our manual coding approach follows an open card sorting approach (Spencer, 2009; Rugg and McGeorge, 2005; Zimmermann, 2016), except that we did not print our content (i.e., issue reports) on physical cards. The reason that we did not print our content on physical cards is that, for each issue report, we need to investigate the summary, description, comments, patches, code review comments, and commit messages, which cannot fit in a small card. Two researchers, including the author of this thesis and a collaborator, jointly performed our manual coding process. We first examined 50 issue reports together and jointly assigned codes to these issue reports. For each examined issue report, we compared the new issue report with existing codes; if we could not find an appropriate existing code for the new issue report, we created a new code and assigned the new code to the issue report. Then, we divided up the remaining issue reports and individually assigned codes to these issue reports. We kept communicating during the entire coding process. We discussed whenever one of us was not certain about which code an issue report fit, and we informed the other coder whenever a new code was created. We grouped lower-level codes into higher-level codes when appropriate. We also constantly made changes to our existing coding results whenever appropriate. Another two collaborators reviewed our coding results and suggested appropriate changes.

3.3 Case Study Results

In this section, we present the results for answering our research questions. For each research question, we discuss our motivation, our approach, and the detailed experimental results.

3.3.1 RQ1: What are developers’ logging concerns?

Motivation

Prior studies proposed automated approaches (e.g., statistical models) to help developers improve their logging practices. Without a clear understanding of developers' logging concerns, automated approaches for logging improvement may not meet developers' real needs. In order to fill the gap, we qualitatively examined 533 issue reports from three large and successful open source projects, to better understand the logging concerns from the perspective of developers. We are not trying to understand the direct causes of these issue reports (e.g., incorrect log levels), but rather to go deeper and understand why developers raise such logging issues, i.e., what are the benefits of logging that they want to leverage, and what are the costs of logging that they want to avoid. Software practitioners can learn from our findings to better understand the benefits and costs of logging and improve their logging code. Our findings also shed light on future research opportunities for logging improvement (e.g., helping developers minimize the costs of logging).

Approach

In order to analyze developers' logging concerns, we qualitatively examine the issue summaries, issue descriptions, issue comments, patches, commit messages and code review comments that are associated with and directly accessible from the studied logging issue reports. We use the manual coding approach discussed in Section 3.2.3 to derive high-level concepts about developers' logging concerns from the detailed information of these issue reports. Along with our qualitative analysis of developers' logging concerns, we also derived best practices and general advice for logging, based on the fact that developers raise similar logging issues across different issue reports and different projects.

A few (17) logging issue reports combine several logging issues together in one issue report (e.g., HIVE-12713, “Miscellaneous improvements in driver compile and execute logging”). We treat each of such issue reports as multiple logging issues and analyze each logging issue separately. We end up with 560 logging issues that are raised in 533 logging issue reports.

Some issue reports show different (and even opposite) concerns regarding a logging issue. For example, issue report HADOOP-11180 raises a concern that printing out too many “warn” messages for a successful execution can mislead end users, so the developer proposes to downgrade the “warn” messages to “debug” messages. However, another developer raises another concern that downgrading the “warn” messages can hide fundamental software problems. In such cases, we only consider a primary logging concern for each logging issue. For example, the primary logging concern for the example issue report (HADOOP-11180) is “misleading end users”. In some cases, we could not understand developers' logging concerns from the issue reports. For example, issue report HADOOP-14296 proposes to migrate logging APIs to SLF4J6 but never explains the rationale for doing so. We assign an “unknown” label for such cases.

Results

Half of the logging issues are concerned with the benefits of logging, while the other half are concerned with the costs of logging. Figure 3.3 summarizes the high-level categories of logging concerns that are derived from our qualitative analysis. Developers' logging concerns are grouped into ten high-level categories, among which five are concerned with logging benefits (i.e., assisting in debugging, exposing runtime problems, bookkeeping, showing execution progress, and providing runtime performance), and the other five are concerned with logging costs (i.e., excessive log information, misleading end users, performance overhead, exposing sensitive information, and exposing unnecessary details). In total, we investigated 560 logging issues from the three studied projects (some issue reports discuss multiple logging issues). We could

6 https://www.slf4j.org

Figure 3.3: Developers' logging concerns and the percentage of logging issues in each project that raise each logging concern.

not conceptualize the logging concerns of 54 (∼10%) logging issues (i.e., marked as

“unknown”), because these issue reports never explain the rationale for changing the

logging code. Among the logging issues for which we are able to conceptualize logging concerns, 255 (∼50%) of them are concerned with the benefits of logging, and 251

(∼50%) of them are concerned with the costs of logging. Detailed discussions about

each of these logging concerns are as follows.

BENEFIT 1: Assisting in debugging. The most frequently raised benefit of logging is assisting in debugging. In the studied projects, 26% to 36% of the studied logging issues raise the logging benefit of assisting in the debugging of errors that have been identified by developers or end users. Log messages help developers narrow down the

try {
    Thread.sleep(retryInfo.delay);
} catch (InterruptedException e) {
    /* Logging the stack trace by specifying the exception (e) as
       the last parameter of a logging statement. */
    LOG.debug("Interrupted while waiting to retry", e);
}

Figure 3.4: Example of logging for assisting in debugging.

execution paths of a process and find the root cause of an execution failure (Yuan et al.,

2010). For example, issue report HADOOP-14497 requests to log the lifecycle (i.e., creation, renewal, cancellation, and expiration) of delegation tokens, in order to identify the root causes of authentication failures related to delegation tokens. In particular, logging specific information instead of general information (e.g., HIVE-11163), logging the causes of an error in addition to the error itself (e.g., KAFKA-4164), and logging the stack trace of an unexpected exception (e.g., HADOOP-13682) in addition to the exception message can effectively help developers narrow down the root causes. Modern logging libraries (e.g., Log4j7 and SLF4J6) usually support convenient ways to log the stack trace of an exception; e.g., Figure 3.4 shows a code example of logging the stack trace of an exception for assisting in debugging. Developers also recommend logging context information (e.g., thread id, session id, query id, user id, etc.) in a multi-task program (e.g., HIVE-13517, HIVE-11488, KAFKA-3816, HIVE-15631, HIVE-6876).

PRACTICE 1: Log specific information (e.g., detailed error spots) instead of general

information and the context information (e.g., thread ids) of an event to support

better failure diagnosis.
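One common way to attach such context information (e.g., thread, session, or query ids) to every log message is a mapped diagnostic context. The sketch below assumes the SLF4J MDC API and a hypothetical QueryHandler class; it illustrates the practice rather than the exact approach taken in the studied projects.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class QueryHandler {
    private static final Logger LOG = LoggerFactory.getLogger(QueryHandler.class);

    public void handle(String queryId, String sessionId) {
        // Attach context information to every log message emitted by this thread.
        MDC.put("queryId", queryId);
        MDC.put("sessionId", sessionId);
        try {
            LOG.info("Compiling query");
            // ... actual work ...
        } finally {
            // Always clear the context so it does not leak to unrelated tasks.
            MDC.clear();
        }
    }
}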

BENEFIT 2: Exposing runtime problems. The second most frequently raised benefit of

7 https://logging.apache.org/log4j/2.x/

catch (IOException ex) {
    LOG.warn("Failed to connect to {}:{}", url.getHost(), url.getPort());
}

Figure 3.5: Example of logging for exposing runtime problems.

logging is exposing runtime problems, which can be used to triage, understand and prioritize runtime problems. In the Kafka project, in particular, 15% of the studied logging

issues raise the logging benefit of exposing runtime problems. Log messages can help

developers and users identify the problems or anomalies in a system execution. For

example, issue report HADOOP-12901 requests to log a “warn” message when a client

fails to connect to a server, so that users can easily identify the connection problem and

fix it. Figure 3.5 shows a code example of logging for exposing runtime problems. In

particular, developers need to log unhandled exceptions, otherwise there is nothing to

indicate such exceptions (e.g., HADOOP-12749). Anomaly detection tools (Jiang et al.,

2008; Fu et al., 2009; Xu et al., 2009b) automatically analyze large amounts of log messages (that are hard for human beings to investigate manually) and alert on anomalies that are indicated in the log messages. Missing such alerting log messages can make it hard to identify runtime problems at an early stage and difficult to locate the problems (e.g., HADOOP-11328). These alerting logs are usually logged at the “warn” or “error” levels instead of lower log levels. For example, issue report HADOOP-12887 requests to change the log level of a logging statement from “info” to “warn” so that one can easily identify a configuration error (e.g., by searching the keyword “warn”).

However, logging normal events or properly handled problems at the “warn” or “error” levels can spam the log files with “warn” or “error” messages and make it difficult to identify real problems (e.g., HIVE-8382).

PRACTICE 2: Log real problems (e.g., unhandled exceptions) at the “warn” or “error” levels and properly handled problems at lower levels to help in uncovering runtime problems.

BENEFIT 3: Bookkeeping. Developers can use log messages to record (i.e., bookkeep) important transactions or operations in a system execution, such as user login/logout, database operations, remote queries and requests. Such bookkeeping log information can be later processed for various analysis activities, such as security analysis (Oliner et al., 2012), performance analysis (Syer et al., 2013), and capacity planning (Kavulya et al., 2010). In the Hive project, 10% of the studied logging issues raise the logging benefit of bookkeeping (e.g., bookkeeping database queries). The

Sarbanes-Oxley Act of 2002 (Sarbanes, 2002) requires all telecommunication and

financial applications to log some mandatory log events, such as user activities, network activities and database activities8. The logging of the traceable information of a transaction or operation, such as the hostname (e.g., HIVE-12235) and client IP

(e.g., HIVE-3512) of a query, and the identities of the operated objects (e.g., the ids of the cancelled queries, HIVE-16286; the names of the created/deleted directories,

HIVE-13058; the ids of the opened/closed sessions, HIVE-14209), is usually required.

Figure 3.6 shows a code example of logging for bookkeeping. Developers also suggest that bookkeeping logs need to be symmetric. For example, when the creation of a certain object (e.g., a table) is logged, the deletion of the object should also be logged.

Otherwise one could not confirm whether the object still exists (e.g., HIVE-13058).

PRACTICE 3: Log traceable information (e.g., object identities or client IPs) when bookkeeping transactions or operations.

8 https://sarbanes-oxley-101.com/sarbanes-oxley-audits.htm

hdfsSessionPath.getFileSystem(conf).delete(hdfsSessionPath, true);
LOG.info("Deleted HDFS directory: " + hdfsSessionPath);

Figure 3.6: Example of logging for bookkeeping.

LOG.debug("Waiting to acquire compile lock: " + command);
compileLock.lock();
LOG.debug("Acquired the compile lock");

Figure 3.7: Example of logging for showing execution progress.

BENEFIT 4: Showing execution progress. Log messages help in tracking the status or progress of an execution, such as the start or end of an event (e.g., HIVE-11314), a status change (e.g., flipping a flag, HADOOP-10046), an ongoing action (e.g., retrying, HADOOP-10657), or the status of waiting for some resources (e.g., waiting for a lock, HIVE-14263). Figure 3.7 shows a code example of logging for showing execution progress. While bookkeeping logging supports post-execution analysis of important transactions and operations, progress logging can help in determining whether a system is progressing as expected or something is going wrong. In particular, printing the progress information for a process that takes a long time is important for figuring out what is going on in the process. For example, issue report KAFKA-5000 requests regular progress information to be logged for a long process so that one can know whether the process is progressing or stuck. Concerns are often raised about the symmetry of progress logging. For example, “waiting for lock” should be followed by “lock acquired”

(HIVE-14263), while “start of process” should be followed by “end of process” (HIVE-

12787).

perfLogger.PerfLogBegin(CLASS_NAME, method.getName());
doSomething();
perfLogger.PerfLogEnd(CLASS_NAME, method.getName());

Figure 3.8: Example of logging for providing runtime performance.

PRACTICE 4: Logging needs to be symmetric, e.g., logging both the creation and deletion of an object, or logging both the start and end of a process. Future research is needed to help developers make their logging symmetric.
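A minimal sketch of PRACTICE 4, assuming an SLF4J-style logger named LOG (as in Figure 3.7) and a hypothetical compact() method: the try/finally block keeps the start and end messages symmetric even when the process fails.

LOG.info("Start of compaction for table: {}", tableName);
try {
    compact(tableName);  // hypothetical long-running process being tracked
} finally {
    // Printed on success and on failure, so every "Start of compaction"
    // message is matched by a corresponding "End of compaction" message.
    LOG.info("End of compaction for table: {}", tableName);
}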

BENEFIT 5: Providing runtime performance. Logging statements are used to record the performance information of a system at runtime (a.k.a., performance logging). Such performance information is usually related to the execution time or memory usage of a process. For example, issue report HIVE-14922 requests the logging of the time spent in several performance-critical tasks, and issue report KAFKA-4044 requests the logging of the actually used buffer size. Figure 3.8 shows a code example of logging for providing runtime performance. Such performance information helps in understanding system health (e.g., HIVE-8210), in tuning system performance

(e.g., HADOOP-13301), and in adjusting resource allocation (e.g., KAFKA-4044). It is suggested to log such performance information using standardized perf loggers to separate performance logging from event logging, for better performance analysis

(e.g., HIVE-11891).

PRACTICE 5: Logging the performance information of critical tasks can help developers understand system health, tune system performance, and allocate system resources. Standardized perf loggers are preferred over using event logging libraries.

COST 1: Excessive log information. The most frequently raised logging cost is excessive log information. In the studied projects, 14% to 24% of the studied logging issues raise the concern of excessive log information. Excessive log information is usually caused by repeated log messages for a single event type (i.e., log messages produced by a single logging statement), such as logging database operations on every row of a table

(e.g., HIVE-8153), logging every entry (e.g., KAFKA-3792), logging every request (e.g.,

KAFKA-3737), or logging every user (e.g., HADOOP-12450). Such excessive log information can mask other important information and lead to expensive storage costs (Fu et al., 2014). In particular, repetitive logging of stack traces usually grows the log files very fast and frustrates the end users. For example, issue report HADOOP-11868 raises a major concern about the excessive logging of stack traces for invalid user logins. It is advisable to aggregate such highly repetitive log lines, for example, by logging aggregated information at a higher log level and detailed information at a lower log level

(e.g., HIVE-10214, KAFKA-4829).

Many issue reports suggest, for a single event, logging the normal log text at a higher level (e.g., “error”) and the stack trace at a lower level (e.g., “debug”), such that the stack traces are hidden in normal cases and are only printed out when needed (e.g.,

HADOOP-13669). It is also suggested to log the important information of an event at a higher level, while logging the detailed information of the same event at a lower level

(e.g., KAFKA-1199). Therefore, there is a strong need for supporting logging different parts (in particular, stack traces) of a logging statement at different log levels, which is not supported by modern logging libraries. As a workaround, developers usually need to insert two separate logging statements at different log levels for a single event (e.g.,

HADOOP-11868). Developers also need to be cautious when throwing and logging an

exception at the same time9, because it may lead to duplicated logging: the handler of the exception may log the exception again (e.g., KAFKA-1591).

ADVICE 1: Repetitive log messages for a single event type can cause excessive log information; such messages should be aggregated.

ADVICE 2: Logging libraries should consider supporting different log levels for different parts (e.g., the error message vs. the stack trace) of a logging statement.

ADVICE 3: Developers need to be cautious when logging and throwing an exception at the same time, since the handler of the exception may also log the exception.
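The workaround mentioned above can be sketched as follows, assuming an SLF4J-style logger named LOG as in Figure 3.4 (the exception and message text are made up for illustration): the short message is logged at “warn”, while the bulky stack trace is emitted by a second statement that only fires at the “debug” level.

catch (IOException e) {
    // Concise, user-facing summary of the problem.
    LOG.warn("Failed to refresh metadata: {}", e.getMessage());
    // The bulky stack trace is only produced when debug logging is enabled,
    // so it does not flood the logs during normal operation.
    if (LOG.isDebugEnabled()) {
        LOG.debug("Stack trace for the metadata refresh failure", e);
    }
}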

COST 2: Misleading end users. The second most frequently raised logging cost is misleading end users. In the studied projects, 12% to 25% of the studied logging issues are concerned with misleading end users. As log messages are directly exposed to end users, inappropriate log information can be confusing and misleading. In particular, logging “warn” or “error” messages for successful operations is the most frequent cause for this concern. For example, HADOOP-13693 complains that a warning in a successful operation confuses end users. Even worse, sometimes inappropriate log messages can annoy or frustrate end users. For example, HADOOP-13552 complains that there are too many “scary looking” stack traces being printed out in the log files, but in fact those exceptions can be handled automatically. Such a large amount of repetitive stack traces can frustrate (e.g., HIVE-11062) or annoy (e.g., HIVE-7737) end users.

ADVICE 4: Developers should avoid logging successful operations at the “warn” or “error” levels. In particular, logging the stack traces of properly handled exceptions can unnecessarily alarm end users.

9 https://www.loggly.com/blog/logging-exceptions-in-java/

COST 3: Performance overhead. Performance overhead is considered a major cost of logging (Fu et al., 2014; Zhu et al., 2015), as printing a log message into a log file involves expensive I/O operations, string concatenations, and possible method invocations for producing the log strings. 3% to 8% of the studied logging issues in the studied projects raise performance concerns. One cause of performance overhead is overwhelmingly repetitive printing of similar log messages. For example, issue report

HIVE-12312 complains that the compilation of a complex query is significantly slowed down because a code snippet with a logging statement is called many thousands of times.

The execution process was sped up by 20% after disabling the logging statement.

Another cause of performance overhead is the invocation of expensive methods in logging statements. For example, issue report HADOOP-14369 complains that including some method calls in logging statements is “pretty expensive”. Surprisingly, even disabled lower-level logging can cause serious performance overhead, because the parameters of a logging statement are evaluated before the check for the log level. For example, issue KAFKA-2992 reports that “trace” logs in tight loops cause significant performance issues even when “trace” logs are not enabled.

ADVICE 5: Logging overhead might exist even when logging is disabled. Developers should minimize logging in tight loops and avoid expensive method invocations in logging. Future research is needed to detect performance-critical logging.
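The cost of disabled logging can be illustrated with an SLF4J-style logger; expensiveSummary() is a hypothetical costly method. The arguments of the first statement are evaluated even when the “debug” level is disabled, which is exactly the kind of overhead raised in KAFKA-2992 and HADOOP-14369; the guarded and parameterized forms below are common mitigations.

// Costly even when "debug" is disabled: the string concatenation and the
// expensiveSummary() call happen before the log level is checked.
LOG.debug("Partition state: " + expensiveSummary(partitions));

// Mitigation 1: guard the statement so its arguments are only evaluated
// when the "debug" level is actually enabled.
if (LOG.isDebugEnabled()) {
    LOG.debug("Partition state: " + expensiveSummary(partitions));
}

// Mitigation 2: parameterized messages avoid the string concatenation,
// although method arguments such as expensiveSummary() are still evaluated.
LOG.debug("Partition state: {}", expensiveSummary(partitions));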

COST 4: Exposing sensitive information. Sensitive information (e.g., usernames

and passwords) should not be printed in log files. Once such sensitive information

is logged, it might be archived for years and cannot be tampered with due to legal

regulations. However, sometimes such sensitive information might end up logged by

mistake. For example, issue report HIVE-14098 complains that users' passwords are

logged in clear text, which is undesirable. In particular, developers have difficulty avoiding logging sensitive information that is contained in a URL (e.g., a URL containing usernames and passwords, HIVE-13091) or a user configuration field (e.g., cloud storage keys, HADOOP-13494). Developers may log the content of a URL or a configuration

field without noticing the contained sensitive information. In an even more difficult

situation, users may put their sensitive information in an unknown configuration field

(e.g., caused by a typo). In such a case, developers are likely to log the unknown configuration field (i.e., to alert about the unknown configuration) and unintentionally expose users'

sensitive information (e.g., KAFKA-4056).

ADVICE 6: Developers should not log users' sensitive information. In particular, they need to pay extra attention when logging URLs, configuration fields, or other user inputs. Future research and tooling support is needed to help in preventing logging sensitive information.
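A minimal sketch of ADVICE 6, assuming a hypothetical LogSanitizer helper and key list: password-like configuration values are masked before the configuration ever reaches a logging statement.

import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public final class LogSanitizer {
    // Hypothetical list of sensitive keys; real projects would make this configurable.
    private static final Set<String> SENSITIVE_KEYS =
            Set.of("password", "secret", "access.key", "token");

    private LogSanitizer() { }

    /** Returns a copy of the configuration with sensitive values masked. */
    public static Map<String, String> mask(Map<String, String> config) {
        return config.entrySet().stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> isSensitive(e.getKey()) ? "******" : e.getValue()));
    }

    private static boolean isSensitive(String key) {
        String lower = key.toLowerCase();
        return SENSITIVE_KEYS.stream().anyMatch(lower::contains);
    }
}

// Usage: LOG.info("Loaded configuration: {}", LogSanitizer.mask(config));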

COST 5: Exposing unnecessary details. While excessive log information is concerned with the overall amount of log messages, and exposing sensitive information is concerned with divulging users' sensitive data, exposing unnecessary details is concerned with the exposure of the inner structures (e.g., library dependencies) and processes (e.g., algorithms) of a software system. Such inner structures and processes may be used by developers for their debugging purposes. However, sometimes developers log such inner information at a higher (more user-facing) log level (e.g., “info”) and unnecessarily expose the inner information to end users. For example, issue report HADOOP-13550 complains that the internal information about disabling threads when thread count is zero should not be logged at a high level such as “warn”. Another issue report, HIVE-7737, argues that printing the whole stack trace for “table not

found” is unnecessary and misleading because “table not found” is usually caused by user errors. Developers should avoid exposing such unnecessary and misleading details of their software systems to end users.

ADVICE 7: Printing detailed structure or algorithm information to end users is unnecessary and misleading. Such detailed information should be avoided or logged at lower levels.

Discussion

Conflicting concerns about log levels. Developers usually have a hard time deciding the appropriate log levels for their logging statements (Yuan et al., 2012b; Oliner et al.,

2012). In many cases they have conflicting concerns about a log level. For example, in issue report HADOOP-11180, a developer proposes to change a logging statement from “warn” to “debug”, as “there are too many such warnings” (i.e., excessive log information) and it can mislead end users. However, another developer raises a conflicting concern and argues that “downgrading the logs is going to hide fundamental problems” (i.e., not being able to expose runtime problems). Developers are mostly confused about the “debug” and “info” levels (Li et al., 2017a). In some cases, developers tend to print detailed debugging information at the “info” level (e.g., HIVE-16629). In other cases, a “critical piece of information” was logged at the “debug” level (e.g., HADOOP-

12789). Therefore, we developed an automated approach to help developers choose the most appropriate log level when they add a logging statement to their source code

(Li et al., 2017a; Chapter 6).

Logging the stack trace of an exception? Logging exceptions is considered a good logging practice (Yuan et al., 2012a; Li et al., 2017b). In particular, logging the stack traces of exceptions is very helpful for debugging the exceptions (i.e., assisting in debugging).

However, as a stack trace is usually much longer than a log message, logging the stack

traces can usually cause excessive log information and performance overhead. Logging unnecessary stack traces to the end users can also expose unnecessary details of a software system and mislead end users (e.g., “scary looking” stack traces for successful processes, as reported in HADOOP-13552). Developers usually have difficulty balancing the benefits and costs of logging stack traces. In fact, they raise conflicting concerns about adding stack traces to exception logging. For example, issue report HADOOP-10571 proposes to add stack traces to many exception logging statements across modules. However, other developers raise concerns that stack traces should be avoided for some of these exception logging statements. As a result, it takes significant effort (e.g., as much as 10 patches) to resolve the conflicting concerns. Future research and tooling support is needed to help developers make informed decisions about whether to log the stack trace of an exception.

Developers leverage five categories of logging benefits, including assisting in debugging, exposing runtime problems, bookkeeping, showing execution progress, and providing runtime performance. However, developers are also concerned about five categories of logging costs, including excessive log information (size), misleading end users (accuracy), performance overhead (performance), exposing sensitive information (safety), and exposing unnecessary details (exposure).

3.3.2 RQ2: How do developers address their logging concerns?

Motivation

In the previous research question, we discuss developers' logging concerns (i.e., logging benefits and costs) that are communicated in their logging-related issue reports.

In this research question, we want to understand how developers address their logging

concerns, i.e., how they balance the benefits and costs of logging. Our results can help

software developers understand the maintenance effort of their logging code, which

can help them make better allocation of their limited logging resources. In addition,

developers can learn from others’ logging experiences (e.g., about whether to remove

a costly logging statement or reduce its log level) and apply them in their own logging

practices. Our results also shed light on future research opportunities for addressing

logging concerns automatically.

Approach

Developers usually track their activities of fixing an issue in an issue report (e.g., a JIRA

issue report). A JIRA issue report uses a “Status” field to indicate the status of an issue

in its lifecycle. The “Status” of an issue report can be “Open”, “Patch Available”, “Resolved”, “Closed”, etc. A JIRA issue report also uses a “Resolution” field to indicate

how the issue was resolved (e.g., “Fixed”, “Duplicate”, “Won’t Fix”, etc.), or otherwise

“Unresolved”. As mentioned in Section 3.2.1, we extracted our logging issue data a long time (i.e., at least six months) after the creation of the logging issue reports, so the status of these issue reports tends to be stable at the time of our data extraction and follow-up analysis. In this RQ, we categorize the status of an issue report into four types: open, fixed, patched, and rejected.

• Open issue reports - the issue reports with an “Open”, “Reopened”, or “In

Progress” status. These issues have not reached a solution yet, and a patch is

not submitted for fixing these issues.

• Rejected issue reports - the issue reports with an “Invalid”, “Not A Problem”, or

“Won’t Fix” resolution. Developers refused to make changes for these issues.

• Fixed issue reports - the issue reports with a “Fixed” resolution (in rare cases the

resolution can be “Done”, “Resolved”, or “Pending Closed”). Developers have

made code changes to address these issue reports.

• Patched issue reports - the issue reports with a “Patch Available” status. Developers have made code changes to address these issue reports (i.e., a patch is submitted), but the code changes are not integrated into the central code repositories

(e.g., patch rejected by reviewers).

We continue to use the qualitative approach that we discuss in Section 3.2.3 to analyze how developers address their logging concerns. We focus our analysis on the fixed issue reports, as developers have not yet reached a solution for other issue reports. We examine the issue summaries, descriptions, comments, patches, commit messages and code review comments of the studied issue reports, and use the manual coding approach discussed in Section 3.2.3 to derive high-level concepts that conceptualize developers' resolutions for their logging issues.

We also discuss the resolutions for each of the logging concerns that are discussed in RQ1. For example, issue report HADOOP-13552 raises a concern about generating a “scary looking” stack trace as a warning when an exception is automatically handled by the code (i.e., misleading end users). The developer fixed the issue by reducing the log level from “warn” to “debug”, thereby addressing the concern of misleading end users.

Table 3.3: The status distribution of the studied logging issues.

Status             Open        Rejected    Fixed        Patched
# Logging issues   58 (10%)    31 (6%)     422 (75%)    49 (9%)

Results

Developers fixed 75% of the studied logging issues. Table 3.3 shows the distribution of the status of the studied logging issues. Among the 560 studied logging issues (as discussed in RQ1, some issue reports raise multiple logging issues), developers rejected 6% of the logging issues, and left 10% and 9% of the remaining logging issues as open and patched, respectively. Figure 3.9 summarizes developers' resolutions for addressing their logging issues, which are derived from our qualitative analysis. In total, we derived 19 high-level resolutions which fall into four dimensions, namely adding/removing log content, adjusting log level, improving logging configuration, and improving logging statements. Table 3.4 explains each of the derived resolutions.

Developers constantly balance their log information through adding/removing log content and adjusting log level. Developers increase their log information through adding log content, logging stack trace, increasing log level, and lowering logging threshold. In comparison, they reduce their log information through removing log content, reducing repetitive information, removing stack trace, reducing log level, and raising logging threshold. As shown in Figure 3.9, developers add log content to address 32% to 37% of the studied logging issues. In contrast, they only remove log content to address 3% to 6% of the studied logging issues. Developers reduce log levels to address 8% to 20% of the studied logging issues, while they only increase log levels to address 2% to 6% of the studied logging issues. In our manual analysis of these issue reports, we

Table 3.4: Developers’ resolutions for addressing their logging issues.

Dimension: Adding/removing log content
• Adding log content: Adding new logging statements or more information to existing logging statements. Example: HADOOP-13854 added a new logging statement that records the error details of an exception, for debugging purposes.
• Removing log content: Removing logging statements or partial content of logging statements. Example: HADOOP-10422 removed a redundant logging statement.
• Logging stack trace: Adding stack trace information to existing logging statements. Example: HADOOP-13458 added stack trace information to an existing logging statement as it's hard to comprehend the meaning of an exception message without a stack trace.
• Reducing repetitive information: Reducing log messages that communicate repetitive information. Example: HADOOP-10756 reduced repetitive log information by consolidating similar log messages.
• Removing stack trace: Removing stack trace information from logging statements. Example: HADOOP-13710 removed the stack trace information from an exception logging statement as the exception is expected in a routine operation.

Dimension: Adjusting log level
• Reducing log level: Reducing the log level of a logging statement (e.g., changing from "error" to "warn"). Example: HADOOP-13850 reduced the log level of a logging statement from "info" to "debug", since users don't need to know the logged details.
• Increasing log level: Increasing the log level of a logging statement (e.g., changing from "info" to "warn"). Example: HADOOP-12789 increased the log level of a logging statement from "debug" to "info", since the log information (i.e., a class path) is critical for troubleshooting.
• Raising logging threshold: Raising the logging threshold (i.e., the enabled log level) of a module, thus allowing fewer logging statements to print log messages. Example: HIVE-15954 configured the log level of some modules from the default "info" level to the "warn" level, in order to disable noisy information that is logged at the "info" level.
• Lowering logging threshold: Lowering the logging threshold (i.e., the enabled log level) of a module, thus allowing more logging statements to print log messages. Example: HIVE-5599 changed the allowed log level of Hive's root logger from "warn" to "info", since "Hive logs a large amount of important information at the info level".

Dimension: Improving logging configuration
• Changing logging configuration: Changing the environment setting of logging, such as logging format and logging destination. Example: HADOOP-8552 added usernames to log file names, in order to avoid naming conflicts in multi-user scenarios.
• Improving logging configurability: Improving the configurability of logging (e.g., log level configurability). Example: HADOOP-13098 added support for case-insensitive log level settings.
• Turning on logging: Enabling logging in a module or to a destination. Example: HIVE-3277 enabled Metastore audit logging for insecure connections.
• Turning off logging: Disabling logging in a module or to a destination. Example: HIVE-14852 disabled "qtest" logging to the console.
• Redirecting log information: Redirecting log messages from one destination to another. Example: KAFKA-2633 redirected log information from stdout to stderr.

Dimension: Improving logging statements
• Improving log content: Improving existing logging statements to fix inappropriate or misleading log information. Example: HADOOP-10126 improved the content of a misleading logging statement.
• Reducing unneeded computing: Reducing unneeded computing when the log level of a logging statement is disabled, in order to reduce performance overhead. Example: HADOOP-14369 removed expensive "toString()" invocations within logging statements as such invocations are not necessary when the "debug" level is disabled.
• Masking sensitive information: Removing/masking sensitive information (e.g., usernames and passwords) from log messages. Example: HIVE-14098 removed passwords from the logging of environment variables by masking passwords with certain symbols.
• Moving log location: Changing the location of logging statements in the source code. Example: HIVE-10167 moved the location of a logging statement since the log was printed too late compared to the occurring time of the actual logged event.
• Migrating logging API: Changing logging APIs from one logging library to another. Example: HIVE-12237 migrated logging APIs to SLF4J logging APIs.

Figure 3.9: Developers' resolutions for addressing their logging issues and the percentage of logging issues in each project that are addressed by each resolution. The resolutions are grouped into four dimensions: adding/removing log content, adjusting log level, improving logging configuration, and improving logging statements.

find that developers tend to use a higher log level when they initially add a logging

statement and later on reduce the log level. Such behavior could introduce unnecessary logging costs in the first place and increase the effort for maintaining the logging code.

Developers need to pay extra attention to examine the log levels of their newly added

logging statements (Li et al., 2017a; Chapter 6).

Developers’ primary approach of addressing their concerns of logging benefits is adding log content, while their primary approach of addressing their concerns of logging costs is reducing log level. Figure 3.10 summarizes developers’ resolutions for

addressing each of their logging concerns. Developers add log content to address 67%

to 82% of the logging issues that are concerned with each of the logging benefits. In

addition, developers increase log levels to address 21% of the logging issues that are concerned with the logging benefit of exposing runtime problems. As we highlight in

PRACTICE 2, higher log levels such as “warn” and “error” are needed to expose runtime problems.

The primary resolution for addressing the concerns of excessive log information

(i.e., in 37% of the cases) and exposing unnecessary details (i.e., in 55% of the cases) is reducing log level. Developers also reduce log levels to address 24% of the logging issues that are concerned with misleading end users. In general, developers are more likely to reduce the level of a costly logging statement than to remove the logging statement. Reducing the log level of a costly logging statement is safer than removing the logging statement, as the logging statement might still be useful in some scenarios (e.g., for debugging). However, developers are more likely to remove logging statements (i.e., in 13% of the cases) than to reduce log levels (i.e., in 9% of the cases) to address the logging cost of performance overhead. This phenomenon can be explained

by the fact that even disabled low-level logging statements can still cause performance

overhead (see ADVICE 5).

Developers improve logging configuration to address 17% of the studied logging

issues. Logging is usually highly configurable in a software system. Besides log level

configuration (which we include in the adjusting log level dimension), developers also

configure logging format, logging destinations, log rotation, and logging synchronization, etc. As shown in Figure 3.9, developers frequently change logging configuration

(i.e., in 3% to 11% of the studied logging issues) or improve logging configurability (i.e., in 4% to 6% of the studied logging issues) to meet users’ growing needs. Therefore, it is advisable to carefully examine the potential usage scenarios of the logging code when designing the configurability of logging.

Developers improve logging statements to address 17% of the studied logging issues. Logging statements produce log messages that are directly exposed to end users. Therefore, logging statements need to be accurate and should not expose inappropriate information (e.g., sensitive information). In practice, however, developers usually insert logging statements in an ad hoc manner. As a result, they usually need to improve their logging statements as afterthoughts (Yuan et al., 2012b). As shown in Figure 3.10, developers improve log content to address the logging cost of misleading end users, and mask sensitive information to address the logging cost of exposing sensitive information. Developers' primary resolutions for addressing the logging cost of performance overhead are reducing unneeded computing and migrating logging API. Future research is needed to help developers improve their logging statements automatically (e.g., by reducing unneeded computing in logging statements).

Figure 3.10: Developers' resolutions for addressing each of the logging concerns. Each grid in the heat map indicates the percentage of logging issues that are addressed by each resolution, normalized for each logging concern.

Discussion

Increasing/reducing log level versus lowering/raising logging threshold. As discussed in this RQ, developers usually change their log levels, either by adjusting the

log levels of their individual logging statements (i.e., increasing log level or reducing

log level) or by adjusting the default log level setting of a module (i.e., raising logging

threshold or lowering logging threshold). Changing the log level of an individual

logging statement would only affect the printing of the individual logging statement.

In comparison, changing the default log level setting of a module would affect the

printing of all the logging statements in that module.
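The difference can be sketched in code. The per-statement level is fixed in the logging statement itself, while the module threshold is a configuration setting; the Configurator call below assumes the Log4j 2 API and a hypothetical logger name, and is only one possible way to adjust a module's threshold.

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.core.config.Configurator;

public class LogLevelVsThreshold {
    private static final Logger LOG = LogManager.getLogger(LogLevelVsThreshold.class);

    public static void main(String[] args) {
        // (1) Per-statement log level: changing "info" to "debug" here would be
        //     "reducing log level" and affects only this statement.
        LOG.info("Renewed delegation token for the current user");

        // (2) Module-wide logging threshold: raising the threshold of a logger
        //     (e.g., from INFO to WARN) suppresses every "info" and "debug"
        //     statement under that logger, i.e., "raising logging threshold".
        Configurator.setLevel("org.example.metastore", Level.WARN);
    }
}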

Developers usually want the system to print the logging statements that are relevant to themselves, while suppressing other logging statements which could introduce noise. In some cases, developers play the “log level” trick to have their own logs highlighted. Take issue report HIVE-10291 for example: developers use the “info” level for debugging purposes (which is not recommended) so that they could avoid configuring the log level setting to “debug” and seeing too much irrelevant “debug” information: “I thought of dumping at debug level, but it might be quite a pain to sift through all the debug logs to get at this.” In issue report HIVE-10700, developers also use “info”/“warn” logs for debugging purposes. Their reason is similar: other irrelevant “debug” logs could cause too much noise. However, if many developers of a system use the “info” or higher log levels for their debugging logging statements, the system logs would be flooded with too much detailed log information. When assigning log levels for their logging statements, developers need to consider other stakeholders' logging needs and the overall logging resources of a system. It is advisable for logging libraries to provide the ability to detect developers' logging intentions and spot inappropriate

usage of log levels. Future research is also needed to collect the log messages that are relevant to a developer (or a failure) and filter out the irrelevant ones.

Developers balance the benefits and costs of logging through adding/removing log content, adjusting log level, improving logging configuration, and improving logging statements. Developers' primary approach of addressing their concerns of logging benefits is adding log content, while their primary approach of addressing their concerns of logging costs is reducing log level.

3.4 Threats to Validity

External Validity. This chapter studies the logging concerns from the perspectives of the developers of three open source projects. Developers of other software projects might be concerned about different aspects of logging. We find that our findings are consistent among the three studied projects, and we expect that our findings also hold for other projects. However, findings from additional case studies on other projects can benefit our study.

Internal Validity. In this chapter, we study developers' logging concerns through a qualitative analysis of logging-related issue reports. However, developers' logging concerns may also be expressed in other forms, such as requirement specifications or mailing lists. Therefore, our findings may not represent all of developers' logging concerns. In the three studied projects, an issue report is always needed for every code commit. Thus, we expect that our findings are quite representative of developers' logging concerns that involve code changes.

Construct Validity. We use a qualitative analysis to study developers' logging concerns and how developers address their logging concerns. Like other qualitative studies, our

results may be biased by the individuals who conduct the qualitative analysis. In order to reduce this bias, two researchers (the author of the thesis and a collaborator) jointly performed the qualitative analysis to derive high-level concepts from the issue reports. Moreover, our goal is not to accurately estimate the distribution of the derived logging concerns; future quantitative studies are encouraged to validate the statistical distribution of developers’ logging concerns.

3.5 Chapter Summary

Logging statements are beneficial for developers to understand system runtime behaviors and debug field failures. However, logging can also have a negative impact (e.g., performance overhead) on a software system. In order to understand developers’ logging concerns (i.e., the benefits and costs of logging from developers’ perspective) and how they address these concerns, we performed a qualitative study on 533 logging-related issue reports from three large and successful open source projects. We conceptualized developers’ logging concerns (and how they address them) into easy-to-perceive categories. We also derived best logging practices and general logging advice along with our qualitative analysis, which can help developers improve their logging code and be aware of some logging traps. In addition, logging library providers can learn from our advice to improve their logging libraries, e.g., to support different log levels for different parts of a logging statement. Our empirical findings also shed light on future research opportunities for helping developers leverage the benefits of logging while minimizing logging costs.

CHAPTER 4

Understanding Software Logging Using Topic Models

In Chapter 3, we study developers’ logging concerns and how they address such concerns. “Learning to log” approaches aim to help developers address their logging concerns by learning from existing logging code. These approaches usually consider the structural information of a code snippet to guide developers’ logging decisions. In our qualitative analysis of logging-related issue reports (Chapter 3), we find that many issue reports are concerned with the logging of a few topics (e.g., “connections”). Therefore, this chapter studies the relationship between the semantic topics of a code snippet and the likelihood of a code snippet being logged. Our driving intuition is that certain topics in the source code are more likely to be logged than others. To validate our intuition, we conduct a case study on six open source systems, and we find that i) there exists a small number of “log-intensive” topics that are more likely to be logged than other topics; ii) each pair of the studied systems shares 12% to 62% of their topics, and the likelihood of logging these common topics has a statistically significant correlation of 0.35 to 0.62 among all the studied systems; and iii) our topic-based metrics help explain the likelihood of a code snippet being logged, providing an improvement of 3% to 13% on AUC and 6% to 16% on balanced accuracy over a set of baseline metrics that capture the structural information of a code snippet. Our findings highlight that topics contain valuable information that can help guide and drive developers’ logging decisions.

An earlier version of this chapter is published in the Empirical Software Engineering Journal (EMSE) (Li et al., 2018).


4.1 Introduction

Developers depend heavily on logging statements for collecting valuable

runtime information of software systems. Such information can be used for a variety of software quality assurance tasks, such as debugging and understanding system usage in production (Oliner et al., 2012; Xu et al., 2009b; Yuan

et al., 2010; Mariani and Pastore, 2008; Syer et al., 2013; Chen et al., 2016a, 2017b). Log-

ging statements are inserted by developers manually in the code to trace the system

execution. As there exists no standard guidelines nor unified policies for software log-

ging, developers usually miss including important logging statements in a system, re-

sulting in blind code spots (i.e., code regions whose execution paths cannot be recovered) when debug-

ging (Yuan et al., 2014, 2011).

However, adding logging statements excessively is not an optimal solution, since

adding unnecessary logging statements can also bring negative impacts, such as ex-

cessive log information and performance overhead (Chapter 3). Prior studies pro-

posed approaches to enhance the information that is contained in logging statements

through static analysis (Yuan et al., 2012a, 2011) and statistical models (Zhu et al., 2015;

Lal and Sureka, 2016; Li et al., 2017b,a). These approaches help developers identify

code locations that are in need of additional logging statements, or in need of log en-

hancement (e.g., requiring the logging of additional variables).

However, the aforementioned approaches do not take into account the functional-

ity of a code snippet when making logging suggestions. We believe that code snippets

that implement certain functionalities are more likely to require logging statements than others.

public QueueConnection createQueueConnection() throws JMSException {
    QpidRASessionFactoryImpl s = new QpidRASessionFactoryImpl(_mcf, _cm,
            QpidRAConnectionFactory.QUEUE_CONNECTION);
    if (_log.isTraceEnabled())
        _log.trace("Created queue connection: " + s);
    return s;
}

Figure 4.1: A logged method that is related to the “connection” topic.

public String toString( String tabs ) {
    StringBuilder sb = new StringBuilder();
    sb.append( tabs ).append( "LessEqEvaluator : " ).append( super.toString() ).append( "\n" );
    return sb.toString();
}

Figure 4.2: A method that is related to the “string builder” topic.

For example, Figure 4.1 and Figure 4.2 show two code snippets from the

Qpid-Java1 system. These two methods are of similar size and complexity, yet the

method shown in Figure 4.1 has a logging statement to track a connection creation

event, while the method shown in Figure 4.2 has no logging statements. The differ- ent logging decisions in these two code snippets might be explained by the fact that these two code snippets are related to different functionalities: the first code snippet is concerned with “connection”, while the second code snippet is concerned with “string builder”. In addition, in Section 4.2, we show real-life requirements for adding logging statements in the context of “connection”.

Prior research (Liu et al., 2009a; Nguyen et al., 2011; Maskeri et al., 2008; Linstead

1 https://qpid.apache.org/components/java-broker

et al., 2008) leverages statistical topic models such as latent Dirichlet allocation (Blei et al., 2003) to approximate the functionality of a code snippet. Such topic models

create automated topics (using co-occurrences of words in code snippets), and these

topics provide high-level representations of the functionality of code snippets (Baldi

et al., 2008; Thomas et al., 2010; Chen et al., 2016b).

We conjecture that source code that is related to certain topics is more likely to con- tain logging statements. We also want to determine if there exist common topics that are similarly logged across software systems. In particular, we performed an empirical study on the relationship between code topics and logging decisions in six open source systems: Hadoop, Directory-Server, Qpid-Java, CloudStack, Camel and Airavata. We

focus on the following research questions:

RQ1: Which topics are more likely to be logged?

A small number of topics are more likely to be logged than other topics. Most of

these log-intensive topics capture communication between machines or inter-

action between threads. Furthermore, we observe that the logging information

that is captured by topics is not statistically correlated to code complexity.

RQ2: Are common topics logged similarly across different systems?

Each studied system shares a portion (12% to 62%) of its topics with other sys-

tems, and the likelihood of logging the common topics has a statistically signifi-

cant correlation of 0.35 to 0.62 among these studied systems. Therefore, develop-

ers of a particular system can consult other systems when making their logging

decisions or when developing logging guidelines.

RQ3: Can topics provide additional explanatory power for the likelihood of a code

snippet being logged?

Our topic-based metrics provide additional explanatory power (i.e., an improve-

ment of 3% to 13% on AUC and an improvement of 6% to 16% on balanced ac-

curacy) to a baseline model that is built using a set of metrics that capture the

structural information of a code snippet, for explaining the likelihood of a code

snippet being logged. Five to seven out of the top ten important metrics for de-

termining the likelihood of a method being logged are our topic-based metrics.

Our chapter is the first work that studies the relationship between topics and

logging decisions. Our findings show that source code related to certain topics is

more likely to contain logging statements. Future log recommendation tools should

consider topic information in order to help researchers and practitioners in deciding

where to add logging statements.

Chapter organization. Section 4.2 uses examples to motivate the study of software logging using topic models. Section 4.3 provides a brief background about topic mod-

els. Section 4.4 describes our case study setup. Section 4.5 presents the answers to our

research questions. Section 4.6 discusses potential threats to the validity of our study.

Section 4.7 surveys related work. Finally, Section 4.8 concludes the chapter.

4.2 Motivation Examples

In this section, we use several real-life examples to motivate our study of the relation-

ship between code topics and logging. Table 4.1 lists ten JIRA issue reports of the Qpid-

Java system that we fetched from the Apache JIRA issue repository2.

A closer examination of these ten issue reports shows that all these issue reports

2 https://issues.apache.org/jira

are concerned with logging in the context of “connections”. For example, issue report QPID-4038 proposes to log certain connection details (e.g., local and remote addresses) after each successful connection, as “it will provide useful information when trying to match client application behaviour with broker behaviour during incident analysis”. The developer fixed this issue by adding the required logging information.

Figure 4.3 gives a code snippet that is part of the code fix4 for this issue. The code snip- pet shows that it is concerned with the topics that are related to “connections” (i.e., connection setting, connecting, get user ID, etc.). In fact, in RQ1 we found that “con- nection management” is one of the most log-intensive topics for the Qpid-Java system.

From these examples, we observed that software practitioners tend to use logs to record certain functionalities (or topics), for example, “connections”. However, we can- not manually investigate all the topics that need logging. Therefore, in this chapter, we propose to use topic modeling to understand the relationship between software log- ging and code topics in an automated fashion. Specifically, we want to study whether certain topics are more likely to be logged (RQ1). We also want to study whether there exist common topics that are similarly logged across systems (RQ2). Finally, we want to study whether topics can help explain the likelihood of a code snippet being logged

(RQ3).

4.3 Topic Modeling

In this section, we briefly discuss the background of latent Dirichlet allocation (LDA), which is the topic modeling approach that we used in our study.

3 https://issues.apache.org/jira/browse/QPID-4038
4 Qpid-Java git commit: d606368b92f3952f57dbabd8553b3b6f426305e1

Table 4.1: Examples of JIRA issues of the Qpid-Java system that are concerned with the logging of “connections”.

Issue ID    Issue report summary
QPID-4038   Log the connection number and associated local and remote address after each successful [re]connection
QPID-7058   Log the current connection state when connection establishment times out
QPID-7079   Add connection state logging on idle timeout to 0-10 connections
QPID-3740   Add the client version string to the connection establishment logging
QPID-7539   Support connection and user level logging
QPID-2835   Implement connections (CON) operational logging on 0-10
QPID-3816   Add the client version to the connection open log messages
QPID-7542   Add connection and user info to log messages
QPID-5266   The client product is not logged in the connection open message
QPID-5265   The client version is only logged for 0-8/9/9-1 connections if a clientid is also set
1 For more details about each issue, the readers can refer to its web link, which is “https://issues.apache.org/jira/browse/” followed by the issue ID. For example, the link for the first issue is “https://issues.apache.org/jira/browse/QPID-4038”.

  ConnectionSettings conSettings = retriveConnectionSettings(brokerDetail);
  _qpidConnection.setConnectionDelegate(new ClientConnectionDelegate(conSettings, _conn.getConnectionURL()));
  _qpidConnection.connect(conSettings);
  _conn.setConnected(true);
  _conn.setUsername(_qpidConnection.getUserID());
  _conn.setMaximumChannelCount(_qpidConnection.getChannelMax());
  _conn.getFailoverPolicy().attainedConnection();
+ _conn.logConnected(_qpidConnection.getLocalAddress(), _qpidConnection.getRemoteAddress());

Figure 4.3: A code snippet that is part of the fix for issue QPID-4038, showing that a logging statement was added to a code snippet within the context of “connections”.

Our goal is to extract the functionality of a code snippet; however, such information is not readily available. Thus, we used the linguistic data in the source code files (i.e.,

the identifier names and comments) to extract topics of the code snippet in order to

approximate the functionality in an automated and scalable fashion. We leveraged

topic modeling approaches to derive topics (i.e., co-occurring words). Topic modeling

approaches can automatically discover the underlying relationships among words in a

corpus of documents (e.g., classes or methods in source code files), and group similar

words together as topics. Unlike using words directly, topic models provide a higher-

level overview and interpretable labels of the documents in a corpus (Blei et al., 2003;

Steyvers and Griffiths, 2007).

In this chapter, we used latent Dirichlet allocation (LDA) (Blei et al., 2003) to derive

topics. LDA is a probabilistic topic model that is widely used in Software Engineering

research for modeling topics in software repositories (Chen et al., 2016b). Moreover,

LDA generated topics are less likely to overfit and are easier to interpret, in comparison

to other topic models such as probabilistic latent semantic analysis (PLSA), and latent

semantic analysis (LSA) (Blei et al., 2003).

In LDA, a topic is a collection of frequently co-occurring words in the corpus. Given a corpus of n documents f_1, ..., f_n, LDA automatically discovers a set Z of topics, Z = {z_1, ..., z_K}, as well as the mapping between topics and documents (see Figure 4.4).

The number of topics, K, is an input that controls the granularity of the topics. We use the notation θ_ij to describe the topic membership value of topic z_i in document f_j. In a nutshell, LDA will generate two matrices – a topic-word matrix and a document-topic

matrix. The topic-word matrix shows the most probable words in each topic, and the

document-topic matrix shows the most probable topics in each document.

Formally, each topic is defined by a probability distribution over all of the unique

words in the corpus (e.g., all source code files).

(a) Topics (Z):
    z1: thread, sleep, notify, interrupt
    z2: network, bandwidth, timeout
    z3: view, html, javascript, css

(b) Topic memberships (θ):
         z1   z2   z3
    f1  0.2  0.8  0.0
    f2  0.0  0.8  0.2
    f3  0.6  0.0  0.4
    f4  1.0  0.0  0.0

Figure 4.4: An example result of topic models, where three topics are discovered from four files. (a) The three discovered topics (z1, z2, z3) are defined by their top (i.e., highest probable) words. (b) The four original source code files (f1, f2, f3, f4) are represented by topic membership vectors (e.g., {z1 = 0.2, z2 = 0.8, z3 = 0.0} for file f1).

Given two Dirichlet priors (used for computing Dirichlet distributions), α and β, LDA will generate a topic distribution, called θ_j, for each file f_j based on α, and generate a word distribution, called φ_i, for each topic z_i based on β. We exclude the mathematical details of LDA since they are out of the scope of this chapter. Interested readers may refer to the original paper on

LDA (Blei et al., 2003) for the details.

4.4 Case Study Setup

This section describes the studied systems and the process that we followed to prepare

the data for our case study5.

4.4.1 Studied Systems

We performed a case study on six open source Java systems: Hadoop, Directory-Server,

Qpid-Java, CloudStack, Camel and Airavata (Table 4.2). The studied systems are large

and successful systems across different domains with years of development. Hadoop is

5 Our replication package: http://sailhome.cs.queensu.ca/replication/LoggingTopicModel

a distributed computing platform; Directory-Server is an embeddable directory server;

Qpid-Java is a message broker; CloudStack is a cloud computing platform; Camel is

a rule-based routing and mediation framework; and Airavata is a framework for ex-

ecuting and managing computational jobs and workflows on distributed computing

resources. The Java source code of these systems uses standard logging libraries such

as Log4j6, SLF4J7, and Commons Logging8. We excluded test files from our analysis, since we are interested in the logging practices in the main source code files of these systems, and we expect that logging practices will vary between main and test code.

4.4.2 Data Extraction

Our goal is to study the relationship between logging decisions and the topics of the

source code. We use topics to approximate the functionality of a code snippet. There-

fore, we applied LDA at the granularity level of a source code method, since a method

usually implements a relatively independent functionality. We did not apply LDA at

the class level granularity because a class typically implements a mixture of function-

alities. For example, a calculator class may implement input, internal calculation, and

output functionalities.

Figure 4.5 presents an overview of our data extraction approach. We fetched the source code files of the studied systems from their Git repositories. We used the Eclipse

Java development tools (JDT9) to analyze the source code and extract all the methods.

Small methods usually implement simple functionalities (e.g., getters and setters, or initializing the fields of a class object). Intuitively, such methods are less likely to have logging statements.

6 http://logging.apache.org/log4j
7 http://www.slf4j.org
8 https://commons.apache.org/logging
9 http://www.eclipse.org/jdt

Table 4.2: Overview of the studied systems.

System        Release    LOC      Methods  Logged methods  Filtered methods  Logged filtered  Remaining methods  Logged remaining
Hadoop        2.5.0      1,194K   42.7K    2.9K (6.7%)     25.6K             156 (0.6%)       17.1K              2.7K (15.9%)
Directory-S.  2.0.0-M20  399K     29.4K    1.8K (6.1%)     11.1K             26 (0.2%)        18.4K              1.8K (9.6%)
Qpid-Java     6.0.0      476K     20.0K    1.3K (6.6%)     13.1K             62 (0.5%)        6.9K               1.2K (18.2%)
CloudStack    4.8.0      820K     40.1K    4.4K (10.9%)    28.4K             251 (0.9%)       11.7K              4.1K (35.1%)
Camel         2.17.0     1,342K   41.1K    2.9K (7.0%)     21.4K             126 (0.6%)       19.8K              2.7K (13.8%)
Airavata      0.15       446K     7.9K     883 (11.2%)     3.3K              46 (1.4%)        4.5K               837 (18.4%)

For example, in the Hadoop system, 95% of the logged methods are among the top 40% (17.1K out of 42.7K) largest methods, while only 5% of the logged methods are among the remaining 60% (25.6K out of 42.7K) of the methods. Moreover, topic models are known to perform poorly on short documents. Therefore, for each system, we filtered out the methods that are smaller, in terms of LOC, than a predefined threshold. We defined the threshold for each system as the LOC of the 5% smallest methods that contain a logging statement. The thresholds are 8, 8, 8, 5, 8 and 4 for Hadoop,

Directory-Server, Qpid-Java, Camel, CloudStack and Airavata, respectively. Table 4.2

also shows the effect of our filtering process, i.e., the number of methods that are fil-

tered and kept, as well as the portions of them being logged, respectively. Section 4.5.3

discusses the effect of such filtering on our modeling results.
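The filtering step can be sketched as follows. This sketch makes several assumptions of its own: the MethodInfo record, the percentile convention (the LOC value below which the 5% smallest logged methods fall, obtained by flooring), and the choice to keep methods at or above the threshold are illustrative rather than the exact implementation behind Table 4.2.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MethodFilter {

    // Hypothetical record of one method: its lines of code and whether it contains a logging statement.
    static class MethodInfo {
        final int loc;
        final boolean logged;
        MethodInfo(int loc, boolean logged) { this.loc = loc; this.logged = logged; }
    }

    // The threshold is defined as the LOC of the 5% smallest methods that contain a logging
    // statement (assumes the list contains at least one logged method).
    static int locThreshold(List<MethodInfo> methods) {
        List<Integer> loggedLoc = new ArrayList<>();
        for (MethodInfo m : methods) {
            if (m.logged) loggedLoc.add(m.loc);
        }
        Collections.sort(loggedLoc);
        int index = (int) Math.floor(0.05 * loggedLoc.size());
        return loggedLoc.get(Math.min(index, loggedLoc.size() - 1));
    }

    // Keep only the methods whose LOC is at least the threshold; the rest are filtered out.
    static List<MethodInfo> filterSmallMethods(List<MethodInfo> methods, int threshold) {
        List<MethodInfo> remaining = new ArrayList<>();
        for (MethodInfo m : methods) {
            if (m.loc >= threshold) remaining.add(m);
        }
        return remaining;
    }
}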

In order to study the relationship between logging decisions and the topics of meth-

ods, we removed all the logging statements from the logged methods before we per-

formed the topic modeling. As logging statements contain textual information (e.g.,

“logger”,“info”,“print”) that is related to logging, extracting topics from the source code

without removing logging statements would introduce bias for studying the relation-

ship between logging and code topics. The use of standard logging libraries in these

systems brings uniform formats (e.g., logger.error(message)) to the logging statements, thus we used a set of regular expressions to identify the logging statements. Finally, we preprocessed the log-removed methods and applied topic modeling on the preprocessed corpus of methods (see Section 4.4.3 “Source Code Preprocessing and LDA”).
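As an illustration, the identification and removal of logging statements could be approximated as below. The single regular expression is a simplified stand-in for the actual set of regular expressions used in this thesis, and the sketch assumes that each logging statement fits on a single line.

import java.util.regex.Pattern;

public class LogStatementDetector {

    // Simplified pattern for common logging calls such as LOG.info(...), logger.error(...),
    // or _log.trace(...); real logging code may require additional patterns.
    private static final Pattern LOG_CALL = Pattern.compile(
            "(?:log|logger)\\s*\\.\\s*(?:trace|debug|info|warn|error|fatal)\\s*\\(",
            Pattern.CASE_INSENSITIVE);

    // Returns true if a line of source code contains a logging call.
    static boolean isLoggingStatement(String line) {
        return LOG_CALL.matcher(line).find();
    }

    // Removes single-line logging statements from a method body before topic modeling.
    static String removeLoggingStatements(String methodBody) {
        StringBuilder kept = new StringBuilder();
        for (String line : methodBody.split("\n")) {
            if (!isLoggingStatement(line)) {
                kept.append(line).append('\n');
            }
        }
        return kept.toString();
    }
}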

(Flowchart: source code files → extract methods → methods → remove small methods → filtered methods → remove logging statements → log-removed methods → preprocess → preprocessed methods → LDA → topics.)

Figure 4.5: An overview of our data extraction approach.

4.4.3 Source Code Preprocessing and LDA

In this subsection, we discuss our source code preprocessing approach, and how we

apply LDA on the preprocessed source code.

We extracted the linguistic data (i.e., identifier names, string literals, and com-

ments) from the source code of each method, and tokenized the linguistic data into a

set of words, similar to an approach that was proposed by Kuhn et al. (2007) and used

in many prior studies (Chen et al., 2016b). With the set of words for each method, we

applied common text preprocessing approaches such as removing English stop words

(e.g., “a” and “the”) and stemming (e.g., from “interruption” to “interrupt”). We also

removed programming language keywords (e.g., “catch” and “return”) from the set

of words for each method. An open source implementation by Thomas (2012) eased

our preprocessing of the source code. Finally, we applied LDA on both unigram (i.e.,

single word) and bigram (i.e., pairs of adjacent words) in each method, since including

bigrams helps improve the assignments of words to topics and the creation of more

meaningful topics (Brown et al., 1992).
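The preprocessing pipeline can be sketched as follows. The word lists are truncated for brevity and the stemmer is a crude placeholder (a real implementation would use, e.g., a Porter stemmer); none of the identifiers below come from the replication package.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CorpusPreprocessor {

    // Truncated illustrative word lists; the study used full English stop-word and Java keyword lists.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "the", "of", "to"));
    private static final Set<String> JAVA_KEYWORDS = new HashSet<>(Arrays.asList("catch", "return", "if", "else", "new"));

    // Splits identifiers, string literals, and comments into lower-case word tokens
    // (snake_case is split on the underscore, camelCase on the case change).
    static List<String> tokenize(String linguisticData) {
        List<String> tokens = new ArrayList<>();
        for (String raw : linguisticData.split("[^A-Za-z]+")) {
            for (String part : raw.split("(?<=[a-z])(?=[A-Z])")) {
                if (!part.isEmpty()) tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Removes stop words and Java keywords, stems each token, and appends bigrams.
    static List<String> preprocess(String linguisticData) {
        List<String> words = new ArrayList<>();
        for (String t : tokenize(linguisticData)) {
            if (!STOP_WORDS.contains(t) && !JAVA_KEYWORDS.contains(t)) {
                words.add(stem(t));
            }
        }
        List<String> result = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            result.add(words.get(i) + "_" + words.get(i + 1));   // bigram, e.g., "interrupt_except"
        }
        return result;
    }

    // Placeholder stemmer: only strips a trailing "s"; a real stemmer maps, e.g., "interruption" to "interrupt".
    static String stem(String word) {
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }
}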

Running LDA requires specifying a number of parameters such as K, α, and β (as explained in Section 4.3), as well as the number of Gibbs sampling iterations (II) for

computing the Dirichlet distributions (i.e., per-document topic distributions and per-

topic word distributions). These LDA parameters directly affect the quality of the LDA

generated topics. However, choosing the optimal parameters values can be a computa-

tional expensive task (Panichella et al., 2013), and such optimal values may vary across

systems and tasks (Panichella et al., 2013; Wallach et al., 2009; Chang et al., 2009). As a result, we applied hyper-parameter optimization to automatically find the optimal

α and β when applying LDA using the MALLET tool (McCallum, 2002). A prior study

by Wallach et al. (2009) found that using optimized hyper-parameters can improve the

quality of the derived topics. We also set the number of Gibbs sampling iterations II to

a relatively large number (10,000) such that LDA can produce more stable topics (Bink-

ley et al., 2014).

We chose our K to be 500 when applying LDA on each studied system. As suggested by prior studies (Wallach et al., 2009; Chen et al., 2016b), using a larger K does not sig-

nificantly affect the quality of LDA generated topics. The additional topics would have

low topic membership values (i.e., noise topics), and can be filtered out. On the other

hand, choosing a smaller K can be more problematic, since the topics cannot be sepa-

rated precisely. We also tried other values of K in our study. However, we did not notice

any significant differences in our findings (Section 4.6).
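A sketch of this setup against the MALLET Java API is shown below. The initial alpha sum (1.0) and beta (0.01) passed to the constructor are common MALLET starting values that are subsequently tuned by the hyper-parameter optimization; method names should be checked against the MALLET version at hand, and the sketch is not the configuration script of the replication package.

import java.util.ArrayList;
import java.util.regex.Pattern;
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

public class LdaRunner {

    // Derives K = 500 topics from preprocessed method texts and returns the topic membership
    // matrix theta, where theta[j][i] is the membership of topic z_i in method m_j.
    static double[][] deriveTopics(String[] methodTexts) throws Exception {
        ArrayList<Pipe> pipes = new ArrayList<>();
        pipes.add(new CharSequence2TokenSequence(Pattern.compile("\\S+")));  // texts are already preprocessed
        pipes.add(new TokenSequence2FeatureSequence());
        InstanceList instances = new InstanceList(new SerialPipes(pipes));
        for (int j = 0; j < methodTexts.length; j++) {
            instances.addThruPipe(new Instance(methodTexts[j], null, "method-" + j, null));
        }

        ParallelTopicModel lda = new ParallelTopicModel(500, 1.0, 0.01);
        lda.addInstances(instances);
        lda.setNumIterations(10000);     // II = 10,000 Gibbs sampling iterations
        lda.setOptimizeInterval(10);     // enable hyper-parameter optimization of alpha and beta
        lda.estimate();

        double[][] theta = new double[methodTexts.length][];
        for (int j = 0; j < methodTexts.length; j++) {
            theta[j] = lda.getTopicProbabilities(j);
        }
        return theta;
    }
}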

4.5 Case Study Results

In this section, we present the results of our research questions. For each research

question, we present the motivation behind the research question, the approach that

we used to answer the research question, and our experimental results.

4.5.1 RQ1: Which topics are more likely to be logged?

Motivation

In this research question, we study the relationship between topics in the source code

and logging decisions. By studying this relationship, we can verify our intuition that the

source code related to certain topics is more likely to contain logging statements. We

are also interested in understanding which topics are more likely to contain logging

statements. Since topics provide a high-level overview of a system, studying which

topics are more likely to contain logging statements may provide insights about the

logging practices in general.

Approach

We applied LDA on each of our studied systems separately to derive the topics for in-

dividual systems. In order to quantitatively measure how likely a topic is to be logged,

we define the log density (LD) for a topic z_i as

    LD(z_i) = \frac{\sum_{j=1}^{n} \theta_{ij} \cdot LgN(m_j)}{\sum_{j=1}^{n} \theta_{ij} \cdot LOC(m_j)},    (4.1)

where LgN(m_j) is the number of logging statements of method m_j, LOC(m_j) is the number of lines of code of method m_j, n is the total number of source code methods, and θ_ij is the topic membership of topic z_i in method m_j. A topic with a higher LD value is more likely to be logged.

As the LD metric does not consider the popularity of a topic, i.e., how many times a topic is logged, we also follow the approach of prior studies (Chen et al., 2012, 2017a)

and define a cumulative log density (CumLD) for a topic z_i as

    CumLD(z_i) = \sum_{j=1}^{n} \theta_{ij} \cdot \frac{LgN(m_j)}{LOC(m_j)}.    (4.2)

A topic with a higher CumLD value is logged more often than a topic with a lower

CumLD value. While the LD metric indicates the likelihood of a method of a particular topic being logged, the CumLD metric captures the overall relationship between a topic and logging. A topic might have a very high LD value, but there might only be a small number of methods that have a membership of such a topic; in that case, the topic would have a low CumLD value. Therefore, we consider both the LD and CumLD metrics when we determine the top-log-density topics for detailed analysis. We define a topic as a log-intensive topic if the topic has both a high LD value and a high CumLD value.
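For illustration, Equations 4.1 and 4.2 translate directly into code. The array-based representation below (theta[j][i] for θ_ij, together with per-method counts of logging statements and LOC) is an assumption of this sketch rather than the data layout of the replication package.

public class LogDensityMetrics {

    // Log density of topic z_i (Equation 4.1).
    static double logDensity(int i, double[][] theta, int[] lgn, int[] loc) {
        double num = 0.0, den = 0.0;
        for (int j = 0; j < theta.length; j++) {
            num += theta[j][i] * lgn[j];
            den += theta[j][i] * loc[j];
        }
        return den == 0.0 ? 0.0 : num / den;
    }

    // Cumulative log density of topic z_i (Equation 4.2).
    static double cumulativeLogDensity(int i, double[][] theta, int[] lgn, int[] loc) {
        double sum = 0.0;
        for (int j = 0; j < theta.length; j++) {
            sum += theta[j][i] * ((double) lgn[j] / loc[j]);
        }
        return sum;
    }
}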

We analyzed the statistical distribution of the log density values for all 500 topics in each system, to verify the assumption that some topics are more likely to be logged than other topics. We also manually studied the topics that have the highest log den- sity values, i.e., the log-intensive topics, to find out which topics are more likely to be logged. For each log-intensive topic, we not only analyzed the top words in this topic, but also investigated the methods that have the largest composition (i.e., large θ value) of the topic, as well as the context of the methods, to understand the meaning and con- text of that particular topic.

Results

A small number of topics are much more likely to be logged. Table 4.3 shows the five

number summary and the skewness of the log density (LD) values of the 500 topics for each studied system.

Table 4.3: The five number summary and the skewness of the LD values of the 500 topics in each of the six studied systems.

System       Min   1st Qu.  Median  3rd Qu.  Max.  Skewness
Hadoop       0.00  0.01     0.01    0.02     0.07  0.98
Directory-S  0.00  0.00     0.01    0.02     0.10  2.10
Qpid-Java    0.00  0.00     0.01    0.01     0.06  1.72
Camel        0.00  0.01     0.01    0.02     0.10  1.61
Cloudstack   0.00  0.02     0.03    0.04     0.14  0.88
Airavata     0.00  0.00     0.01    0.02     0.16  2.32

Table 4.4: The five number summary and the skewness of the CumLD values of the 500 topics in each of the six studied systems.

System       Min   1st Qu.  Median  3rd Qu.  Max.   Skewness
Hadoop       0.00  0.11     0.24    0.44     3.55   2.90
Directory-S  0.00  0.01     0.04    0.10     3.68   9.76
Qpid-Java    0.00  0.01     0.05    0.16     7.58   13.49
Camel        0.00  0.11     0.25    0.57     5.95   3.65
CloudStack   0.00  0.16     0.42    0.82     5.14   2.64
Airavata     0.00  0.01     0.06    0.20     15.69  10.53

The LD distribution is always positively skewed in every studied system. Taking the Hadoop system as an example, the minimal LD value for a topic is 0.00, the interquartile range (the range from the first quartile to the third quartile) ranges from 0.01 to 0.02, while the maximum LD value for a topic is 0.07. The

LD distribution for the Hadoop system has a skewness of 0.98 (a skewness of 1 is considered highly skewed (Groeneveld and Meeden, 1984)). Other studied systems have similar or more skewed distributions of the LD values, i.e., skewness ranges from 0.88 to 2.32. The high positive skewness indicates that a small number of topics are much more likely to be logged than other topics. Table 4.4 shows the five number summary and the skewness of the cumulative log density (CumLD) values of the 500 topics for

each studied system. The CumLD values also present a highly skewed distribution, i.e.,

with a skewness of 2.64 to 13.49. The high skewness of the CumLD values implies that

a small number of topics are logged more often than other topics.

Most of the log-intensive topics in the studied systems can be generalized to top-

ics that are concerned with communication between machines or interaction be-

tween threads. Table 4.5 lists the top six log-intensive topics for each system. In order

to ensure that the six topics for each system have both the highest LD and CumLD

values, we used an iterative approach to get these topics. Initially, we chose the inter-

section of the six topics with the highest LD values and the six topics with the highest

CumLD values. If the number of topics in the intersection set is less than six, we chose

the intersection of the seven topics with the highest LD values and the seven topics

with the highest CumLD values. We continued expanding our search scope until we

got the top six log-intensive topics. By manually studying the log-intensive topics in

the studied systems, we labeled the meaning of each of these log-intensive topics in

Table 4.5. 61% (22 out of 36) of the top log-intensive topics capture communication

between machines, while 14% (5 out of 36) of the top log-intensive topics capture in-

teractions between threads. We use a ∗ symbol in Table 4.5 to mark topics that are

concerned with communication between machines, and use a † symbol in Table 4.5 to

mark topics that are concerned with interactions between threads. For instance, the

first log-intensive topic in the Directory-Server system, as well as the third log-intensive topic in the Qpid-Java system, are concerned with “connection management”. Devel-

opers tend to log the management operations, such as connecting, refreshing, closing,

and information syncing, of a connection between two machines. As the communi-

cation process between two machines cannot be controlled or determined by a single

Table 4.5: Top six log-intensive topics in each system. The listed topics have the highest LD values and highest CumLD values. A topic label is manually derived from the top words in each topic and its corresponding source code methods. We use underscores to concatenate words into bigrams. A topic label marked with a “∗” symbol or a “†” symbol indicates that the topic is concerned with communication between machines or interaction between threads, respectively.

System       LD    CumLD  Top words                                                    Topic label
Hadoop       0.07  1.32   attr, file, client, nfsstatu, handl                          network file system ∗
             0.05  3.55   thread, interrupt, except, interrupt_except, sleep           thread interruption †
             0.05  1.04   write, respons, verifi, repli, channel                       handling write request ∗
             0.04  1.85   deleg, token, deleg_token, number, sequenc                   delegation tokens ∗
             0.04  2.31   event, handl, handler, event_handler, handler_handl          event handling †
             0.04  1.07   command, shell, exec, executor, execut                       OS command execution †
Directory-S  0.09  0.48   statu, disconnect, connect, replic_statu, replic             connection management ∗
             0.08  0.78   target, target_target, mojo, instal, command                 installer target
             0.08  0.84   session, messag, session_session, session_write, write       session management ∗
             0.08  0.41   ldap, permiss, princip, permiss_except, ldap_permiss         LDAP1 permission ∗
             0.06  2.17   contain, decod_except, except, decod, length                 decoder exception
             0.06  3.68   close, debug, inherit, except, close_except                  cursor operation
Qpid-Java    0.06  7.58   except, messag, error, except_except, occur                  message exception ∗
             0.06  0.73   activ, spec, endpoint, handler, factori                      Qpid activation
             0.05  1.15   connect, manag, manag_connect, info, qpid                    connection management ∗
             0.05  1.21   resourc, except, resourc_except, resourc_adapt, adapt        JCA2 ∗
             0.05  0.66   interv, heartbeat, setup_interv, heartbeat_interv, setup     heartbeat3 ∗
             0.05  0.78   locat, transact_manag, manag, transact, manag_locat          transaction management
Camel        0.10  2.63   level, level_level, info, warn, messag                       customized logging
             0.07  2.09   header, event, transact, event_header, presenc_agent         event header ∗
             0.07  2.41   interrupt, sleep, thread, reconnect, except                  thread interruption †
             0.06  2.52   file, gener, gener_file, except, fail                        remote file operation ∗
             0.06  4.23   channel, close, channel_channel, futur, disconnect           channel operation ∗
             0.05  2.30   send, messag, send_messag, websocket, messag_send            sending message ∗
CloudStack   0.10  1.75   result, router, execut, control, root                        router operation ∗
             0.09  2.68   agent, host, attach, disconnect, transfer                    agent connection ∗
             0.08  1.84   wait, except, timeout, interrupt, thread                     thread interruption †
             0.08  1.92   command, citrix, base, resourc_base, citrix_resourc          citrix connection ∗
             0.07  2.64   context, context_context, overrid_context, overrid, manag    VM context operation
             0.07  3.02   host, hyper, hyper_host, context, vmware                     host command request ∗
Airavata     0.16  9.21   object, overrid, object_object, format, format_object        customized logging
             0.13  15.69  type, resourc, except, resourc_type, registri                resource operation
             0.10  2.14   channel, except, queue, connect, exchang                     channel operation ∗
             0.09  1.40   except, client, airavata, airavata_client, except_airavata   client connection ∗
             0.09  1.85   server, derbi, start, jdbc, except                           server operation exception ∗
             0.08  2.63   server, port, transport, except, server_port                 server operation ∗
1 Lightweight directory access protocol.
2 Java EE Connector Architecture (JCA) is a solution for connecting application servers and enterprise information systems.
3 A heartbeat is a periodic signal sent between machines to indicate normal operations.

machine, logging statements provide a way for developers, testers, or users to mon-

itor the communication processes and provide rich information for debugging such

processes. Similarly, the interaction between threads cannot be controlled by a single

thread, thus developers may also use logging statements more often to track such in-

teractions between threads. As an example, the second log-intensive topic in Hadoop

is about “thread interruption”.

Most top log-intensive topics only appear in one individual system, but a few top-

ics emerge across systems. As we applied LDA on each studied system separately, it is

not surprising that we generate mostly different topics for different systems, likewise

for top log-intensive topics. For example, the first log-intensive topic in Hadoop is re-

lated to “network file system” (NFS). Developers use logging statements to track vari-

ous operations on a network file system, such as creation, reading, writing and lookup.

Although we know that such a topic is concerned with communication, the topic it-

self is not a general topic for all systems. Systems that do not use network file systems

would not consider logging such a topic. Another example is the fourth log-intensive

topic “LDAP permission” in Directory-Server. If a party is accessing a directory but it

does not have the permission to access that particular directory, such a behavior would

be logged as an error. Only the systems that use LDAP need to consider logging such

a topic. However, a few topics do emerge across systems. For example, the second

log-intensive topic in Hadoop, the third log-intensive topic in Camel and the third log-intensive topic in CloudStack are all concerned with “thread interruption”. For another example, the fifth log-intensive topic in Camel and the third log-intensive topic in Airavata are both related to “channel operation”. These findings motivate us to study how common topics (i.e., topics shared by multiple systems) are logged across different systems (see RQ2).

Table 4.6: The five number summary and the skewness of the LD values of the topics in the Hadoop system.

Number of topics  Min   1st Qu.  Median  3rd Qu.  Max.  Skewness
100               0.00  0.01     0.01    0.02     0.04  0.71
500               0.00  0.01     0.01    0.02     0.07  0.98
1,000             0.00  0.01     0.01    0.02     0.07  1.29

Discussion

Impact of choosing a different number of topics. In this RQ, we use LDA to identify

500 topics for each system and study the distribution of log density among these top- ics. We now explore how the choice of the number of topics impacts our analysis in this

RQ. In this sub-section, we consider the Hadoop system as an example, and vary the number of topics between 100 and 1,000. Table 4.6 and Table 4.7 summarize the distri- butions of the LD values and the CumLD values for the Hadoop system when varying the number of topics. As we increase the number of topics, the skewness of the LD val- ues and the skewness of the CumLD values both increase. This phenomenon can be explained by the intuition that using a larger number of topics can better distinguish log-intensive topics from other topics. However, both the LD values and the CumLD values still present highly positive-skewed distributions when we vary the number of topics, which supports our observation that a small number of topics are much more likely to be logged.

Table 4.8 lists the top six log-intensive topics in the Hadoop system when choosing a different number of topics (i.e., 100, 500, and 1,000). The top log-intensive topics do not remain the same when we vary the number of topics, because using a different number of topics generates topics at different granularity.

Table 4.7: The five number summary and the skewness of the CumLD values of the topics in the Hadoop system.

Number of topics  Min   1st Qu.  Median  3rd Qu.  Max.  Skewness
100               0.30  0.87     1.37    2.35     8.66  1.99
500               0.00  0.11     0.24    0.44     3.55  2.90
1,000             0.00  0.02     0.08    0.23     3.56  4.21

However, some topics, such

as “thread interruption”, “event handling”, “network file system”, and “OS command

execution”, do appear among the top log-intensive topics when varying the number

of topics. We highlight these common topics in bold font in Table 4.8. Moreover, even

when we vary the number of topics, most of the log-intensive topics are still about com-

munication between machines or interaction between threads. We also have similar

observations in the other studied systems.

Relationship between topics and structural complexity. In this RQ, we found that

a few topics are more likely to be logged than other topics. However, it is possible

that these differences are related to the differences of the code structures. In this sub-

section, we examine the relationship between the topics and the structural complexity

of a method.

We use McCabe’s cyclomatic complexity (CCN) (McCabe, 1976) to measure the

structural complexity of a method. We define two metrics, topic diversity (TD) and

topic-weighted log density (TWLD), to measure the diversity of topics in a method

(i.e., cohesion) and the log density of a method which is inferred from its topics,

respectively. The topic diversity, which is also called topic entropy (Misra et al., 2008; Hall et al., 2008), of a method m_j is defined as

    TD(m_j) = -\sum_{i=0}^{T} \theta_{ij} \log_2 \theta_{ij},

where θ_ij is the membership of topic i in method m_j and T is the total number of topics. A larger topic diversity means that a method is more heterogeneous, while a smaller topic diversity means that a method is more coherent.

The topic-weighted log density of a method m_j is defined as

    TWLD(m_j) = \sum_{i=0}^{T} \theta_{ij} \cdot LD_{i,-j},

where LD_{i,-j} is the log density of topic i that is calculated from Equation 4.1 considering all the methods except for the method m_j. When calculating the TWLD value of a method, we excluded that particular method from Equation 4.1 to calculate the log density of topics, in order to avoid bias. A large TWLD value means that a method contains a large proportion of log-intensive topics.

Table 4.8: Top six log-intensive topics in the Hadoop system, using different numbers of topics. A topic label marked with a “∗” symbol or a “†” symbol indicates that the topic is concerned with communication between machines or interaction between threads, respectively. The bold font highlights the common topics that appear among the top log-intensive topics when varying the number of topics.

Number of topics  Top words                                                Topic label
100               thread, except, interrupt, interrupt_except, wait        thread interruption †
                  servic, server, stop, start, handler                     server operation ∗
                  event, event_event, handl, event_type, handler           event handling †
                  block, replica, datanod, pool, block_block               work node operation ∗
                  resourc, request, contain, prioriti, node                resource allocation ∗
                  contain, contain_contain, statu, launch, contain_statu   container allocation ∗
500               attr, file, client, nfsstatu, handl                      network file system ∗
                  thread, interrupt, except, interrupt_except, sleep       thread interruption †
                  write, respons, verifi, repli, channel                   handling write request ∗
                  deleg, token, deleg_token, number, sequenc               delegation tokens ∗
                  event, handl, handler, event_handler, handler_handl      event handling †
                  command, shell, exec, executor, execut                   OS command execution †
1,000             attr, file, client, nfsstatu, handl                      network file system ∗
                  bean, mbean, info, object, info_bean                     bean object
                  node, path, node_path, data, path_node                   work node operation ∗
                  thread, interrupt, except, interrupt_except, wait        thread interruption †
                  state, deleg, master, secret_manag, manag                delegation tokens ∗
                  command, shell, exec, exit, exit_code                    OS command execution †
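The two metrics defined above can be computed from the same inputs as the LD metric. The sketch below is illustrative only: it recomputes the leave-one-out log density LD_{i,-j} naively for every method, which would be cached or computed incrementally in practice.

public class TopicStructureMetrics {

    // Topic diversity (entropy) of method m_j: TD(m_j) = -sum_i theta_ij * log2(theta_ij).
    static double topicDiversity(int j, double[][] theta) {
        double td = 0.0;
        for (int i = 0; i < theta[j].length; i++) {
            double p = theta[j][i];
            if (p > 0.0) {
                td -= p * (Math.log(p) / Math.log(2));
            }
        }
        return td;
    }

    // Topic-weighted log density of method m_j: TWLD(m_j) = sum_i theta_ij * LD_{i,-j},
    // where LD_{i,-j} is the log density of topic i computed with method j excluded.
    static double topicWeightedLogDensity(int j, double[][] theta, int[] lgn, int[] loc) {
        double twld = 0.0;
        for (int i = 0; i < theta[j].length; i++) {
            double num = 0.0, den = 0.0;
            for (int k = 0; k < theta.length; k++) {
                if (k == j) continue;   // leave-one-out: exclude method j itself
                num += theta[k][i] * lgn[k];
                den += theta[k][i] * loc[k];
            }
            twld += theta[j][i] * (den == 0.0 ? 0.0 : num / den);
        }
        return twld;
    }
}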

Figure 4.6 shows the pairwise Spearman rank correlation between cyclomatic com-

plexity (CCN), topic diversity (TD), and topic-weighted log density (TWLD) of all the

methods in our studied systems. We use the Spearman rank correlation because it is ro-

bust to non-normally distributed data (Swinscow et al., 2002). In fact, the Shapiro-Wilk

normality test shows that the distributions of these three metrics are all statistically sig-

nificantly different from a normal distribution (i.e., p-value < 0.05). Topic diversity and cyclomatic complexity have a positive correlation of 0.22 to 0.39 in the studied systems.

In other words, more structurally complex methods tend to have more diverse topics, which matches prior findings (Liu et al., 2009b). On the other hand, the topic-weighted

log density of a method has a very weak (-0.15 to 0.21) correlation (Swinscow et al.,

2002) with the cyclomatic complexity of a method, which means that the log intensity

of the topics is unlikely to be correlated with the cyclomatic complexity of the code.

Therefore, even though structurally complex methods tend to have diverse topics,

the logging information that is captured by these topics is not correlated with code

complexity.

A small number of topics are more likely to be logged than other topics. Most of

these log-intensive topics in the studied systems correspond to communication

between machines or interaction between threads. Our findings encourage future

work to develop topic-based logging guidelines (i.e., which topics need developers’

further attention for logging).

(Figure 4.6 consists of one correlation matrix per studied system. The pairwise Spearman correlations CCN–TWLD, CCN–TD, and TWLD–TD are 0.08, 0.27, 0.24 for hadoop; 0.15, 0.37, 0.22 for directory-server; −0.15, 0.22, 0.29 for qpid-java; 0.21, 0.34, 0.16 for cloudstack; 0.15, 0.39, 0.19 for camel; and 0.09, 0.31, 0.50 for airavata.)

Figure 4.6: Pairwise Spearman correlation between cyclomatic complexity (CCN), topic diversity (TD), and topic-weighted log density (TWLD). The symbols below the correlation values indicate the statistical significance of the respective correlation: o p ≥ 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.

4.5.2 RQ2: Are common topics logged similarly across different systems?

Motivation

In RQ1, we applied LDA on each system separately and we got mostly different top log-intensive topics for different systems. However, we did find a few top log-intensive topics that emerge across different systems. Therefore, in this research question, we CHAPTER 4. UNDERSTANDING SOFTWARE LOGGING USING TOPIC MODELS 81

quantitatively study how common topics are logged across different systems. If com-

mon topics are similarly logged across different systems, we might be able to provide

general suggestions on what topics should be logged across systems; otherwise, devel-

opers should make logging decisions based on the context of their individual system.

Approach

Cross-system topics. In order to precisely study the logged topics across different sys-

tems, we combined the methods of the studied systems together into one corpus, and

applied LDA using K = 3,000. We use 3,000 topics as we hope to identify topics that

have the same granularity as the topics that we identified in RQ1 (i.e., 500 topics ∗ 6

systems). We used the same preprocessing and topic modeling approach as we had

applied to individual systems in RQ1. We refer to the resulting topics as “cross-system

topics”. With the cross-system topics, we firstly need to determine whether a topic

exists in each studied system. If a topic exists in multiple systems, then this topic is

common among multiple systems.

Topic assignment in a system. We use the topic assignment to measure the total presence of a topic in a system. The assignment of a topic in a system is the sum of that topic’s memberships in all the methods of that system. A higher topic assignment means that a larger portion of the methods is related to the topic (Thomas et al., 2014;

Baldi et al., 2008). The assignment of topic z_i in system s_k is defined as

    A(z_i, s_k) = \sum_{j=0}^{N_k} \theta_{ij},    (4.3)

where N_k is the number of methods in system s_k, and θ_ij is the topic membership of topic z_i in method m_j.

As different systems have different numbers of methods, it is unfair to compare the

assignment of a topic in different systems. Therefore, we instead use a normalized

definition of assignment:

    AN(z_i, s_k) = \frac{\sum_{j=0}^{N_k} \theta_{ij}}{N_k}.    (4.4)

The normalized assignment values of all the topics sum up to 1 for each individual system. We refer to normalized assignment as “assignment” hereafter.
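Equations 4.3 and 4.4 can be computed directly from the cross-system topic membership matrix of a system; as before, the array layout (theta[j][i] for θ_ij) is an assumption of this sketch.

public class TopicAssignment {

    // Assignment of topic z_i in a system (Equation 4.3): the sum of its memberships over all methods.
    static double assignment(int i, double[][] theta) {
        double sum = 0.0;
        for (int j = 0; j < theta.length; j++) {
            sum += theta[j][i];
        }
        return sum;
    }

    // Normalized assignment (Equation 4.4): the assignment divided by the number of methods,
    // so that the normalized assignments of all topics in a system sum up to 1.
    static double normalizedAssignment(int i, double[][] theta) {
        return assignment(i, theta) / theta.length;
    }
}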

Common topics shared across systems. Figure 4.7 shows the cumulative assign-

ments of all the topics in each system when sorting the topics by their assignments.

For each system, a small portion of topics (208 to 696 out of 3,000 topics) account for

90% of the total assignment of each system. In other words, only a small portion of

topics are significantly assigned in each system. For each system, we define its impor-

tant topics as its most assigned topics that account for 90% of the total assignment of that particular system. For example, 696 out of 3,000 topics are important topics in the

Hadoop system.

We define a topic to be a common topic if the topic is important in multiple sys-

tems. For example, if a topic is important in two systems, then this topic is commonly

shared between the two systems. If a topic is important in all the studied systems, then

this topic is commonly shared across all the studied systems.
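A possible computation of a system’s important topics (the most assigned topics that account for 90% of its total assignment) and of the topics shared by two systems is sketched below; the handling of equally assigned topics is our own assumption.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ImportantTopics {

    // Returns the indices of the most assigned topics that together account for 90% of the
    // total assignment of a system; assignments holds one (normalized) assignment per topic.
    static Set<Integer> importantTopics(double[] assignments) {
        Integer[] order = new Integer[assignments.length];
        double total = 0.0;
        for (int i = 0; i < assignments.length; i++) {
            order[i] = i;
            total += assignments[i];
        }
        Arrays.sort(order, (a, b) -> Double.compare(assignments[b], assignments[a]));  // most assigned first

        Set<Integer> important = new HashSet<>();
        double cumulative = 0.0;
        for (int idx : order) {
            if (cumulative >= 0.9 * total) break;
            important.add(idx);
            cumulative += assignments[idx];
        }
        return important;
    }

    // A topic is common to two systems if it is important in both of them.
    static Set<Integer> commonTopics(Set<Integer> importantInA, Set<Integer> importantInB) {
        Set<Integer> common = new HashSet<>(importantInA);
        common.retainAll(importantInB);
        return common;
    }
}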

Log density correlation. In order to study whether common topics are logged similarly across different systems, we measured the pairwise correlation of the log density of the common topics that are shared among different systems. Specifically, for each pair of systems, we first calculated their respective log density values for their common topics, obtaining two sets of log density values for the same set of common topics.

(Figure 4.7 shows one panel per system — Hadoop, Qpid-Java, Camel, Directory-Server, CloudStack, and Airavata — plotting the cumulative assignment against the number of topics; the marked points show that 696, 427, 664, 299, 526, and 208 topics, respectively, account for 90% of the total assignment.)

Figure 4.7: The cumulative assignment of all the topics in each studied system. The topics are sorted by their assignments from high to low.

We then calculated the Spearman rank correlation between these two sets of log density values. A large correlation value indicates that the common topics are logged similarly across these two systems. As discussed in RQ1, the log density values of the topics have a skewed distribution. In fact, the Shapiro-Wilk test shows that the distributions of the log density values are statistically significantly different from a normal distribution (i.e., p-value < 0.05). Therefore, we chose the Spearman rank correlation method because it is robust to non-normally distributed data (Swinscow et al., 2002). Prior studies also applied the Spearman rank correlation to measure similarity (e.g., Goshtasby, 2012).
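As an illustration, the Spearman rank correlation between the two sets of log density values can be computed as the Pearson correlation of their (tie-averaged) ranks; the sketch below computes only the coefficient, not the associated p-value.

public class SpearmanCorrelation {

    // Spearman rank correlation between two paired samples (e.g., the log density values of the
    // common topics in two systems).
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    // Assigns ranks to the values; tied values receive the average of their ranks.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        java.util.Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) j++;
            double avgRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) rank[order[k]] = avgRank;
            i = j + 1;
        }
        return rank;
    }

    // Pearson correlation of two samples.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}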

Table 4.9: Number of topics that are shared by N ∈ {1,2,...,6} systems.

# Systems        N = 0        N = 1        N = 2     N = 3     N = 4    N = 5    N = 6
# Shared topics  1,359 (45%)  1,130 (38%)  203 (7%)  109 (4%)  77 (3%)  83 (3%)  39 (1%)

Results

All the studied systems share a portion (i.e., 12% to 62%) of their topics with other systems. Table 4.9 lists the number of topics that are shared by N ∈ {1,2,...,6} systems.

Among all the 3,000 topics, around half (1,641) of them are important in at least one system, while the rest (1,359) are not important in any system. Around one-sixth (511 topics) of the topics are shared by at least two systems, among which only 39 topics are shared by all six studied systems. Figure 4.8 lists the numbers of common topics that are shared between each pair of systems. For each system, Figure 4.8 also shows the percentage of its topics that are shared with each of the other systems. As shown in the figure, each studied system shares 12% to 62% of its topics with each of the other systems. In general, Hadoop and Camel share the most topics with other systems, possibly because they are platform or framework applications that contain many modules of various functionalities. In comparison, Airavata shares the fewest topics with other systems. Specifically, Hadoop and Camel share the most topics (296) between them, while Directory-server and Airavata share the fewest topics (51).

The likelihood of logging the common topics has a statistically significant corre- lation of 0.35 to 0.62 among all the studied systems. Figure 4.9 shows the Spearman correlation of the log density between each pair of systems on their common topics.

For each pair of systems, their log density values of the common topics have a statistically significant (i.e., p-value < 0.05) correlation of 0.35 to 0.62. In other words, the likelihood of logging the common topics is statistically significantly correlated between each pair of the studied systems.

                  hadoop     directory-server  qpid-java  cloudstack  camel      airavata
hadoop            696        169 (24%)         239 (34%)  233 (33%)   296 (43%)  83 (12%)
directory-server  169 (57%)  299               140 (47%)  130 (43%)   164 (55%)  51 (17%)
qpid-java         239 (56%)  140 (33%)         427        185 (43%)   266 (62%)  73 (17%)
cloudstack        233 (44%)  130 (25%)         185 (35%)  526         227 (43%)  71 (13%)
camel             296 (45%)  164 (25%)         266 (40%)  227 (34%)   664        80 (12%)
airavata          83 (40%)   51 (25%)          73 (35%)   71 (34%)    80 (38%)   208

Figure 4.8: The number of topics that are shared between each pair of systems. The numbers in the diagonal cells show the number of important topics per system. The percentage values show the percentage of topics in the system indicated by the row name that are shared with the system indicated by the column name.

                  hadoop    directory-server  qpid-java  cloudstack  camel     airavata
hadoop            0.50      0.47 ***          0.51 ***   0.62 ***    0.49 ***  0.42 ***
directory-server            0.46              0.42 ***   0.43 ***    0.46 ***  0.53 ***
qpid-java                                     0.42       0.45 ***    0.39 ***  0.35 **
cloudstack                                               0.48        0.46 ***  0.43 ***
camel                                                                0.46      0.49 ***
airavata                                                                       0.44

Figure 4.9: The Spearman correlation of the log density of the common topics that are shared between each pair of systems. The values in the diagonal cells show the average log density correlation between each system and other systems on the shared topics. The symbols below the correlation values indicate the statistical significance of the respective correlation: o p ≥ 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.

The Hadoop system and the Cloudstack system have the largest log density correlation (0.62) on their common topics. As a distributed computing platform and a cloud computing platform, respectively, these two systems are likely to share similar logging needs for their common topics. The Qpid-Java system and the Airavata system have the smallest log density correlation (0.35) on their common topics. As a message broker and a framework for managing and executing computational jobs, respectively, these two systems are less likely to have similar logging needs.

Discussion

How do similar systems log common topics? In our case study, we chose six systems from different domains. We found that each system shares a portion (12% to 62%) of topics with other systems, and that the likelihood of logging the common topics is statistically significantly correlated among these systems. It is interesting to discuss how similar systems log their common topics. Therefore, we analyzed the common topics that are shared by two similar systems: Qpid-Java and ActiveMQ. Both systems are popular open source message brokers implemented in Java. Specifically, we added the ActiveMQ system into our cross-system topic modeling. We still set the number of topics to be 3,000, as we found that adding the new system into our cross-system topic modeling does not significantly change the number of important topics of the existing systems.

Table 4.10 shows the number of common topics between these two systems and their log density correlation. As shown in the table, ActiveMQ has a wider range of topics than Qpid-Java. The former has 675 important topics while the latter has 432 important topics. The larger number of important topics in ActiveMQ is likely because ActiveMQ is not only a message broker, but also supports many other features such as enterprise integration patterns10. These two systems share 294 common topics. The Qpid-Java system shares 68% (the largest such percentage among all pairs of systems) of its topics with the ActiveMQ system. The respective log density values of these common topics have a statistically significant correlation of 0.45, which is not the highest correlation value among all pairs of systems. In summary, similar systems such as Qpid-Java and ActiveMQ may share a relatively large portion of common topics; however, their likelihood of logging such common topics does not necessarily have a larger correlation than a pair of systems from different domains.

10 http://activemq.apache.org

Topics shared by all the studied systems. As shown in Table 4.9, there are only 39 topics that are commonly shared among all the studied systems.

Table 4.10: Common topics between two similar systems: Qpid-Java and ActiveMQ. The symbols below a correlation value indicate the statistical significance of the correlation: *** p < 0.001.

System      # Important topics   # Common topics   Log density correlation
Qpid-Java   432                  294 (68%)         0.45
ActiveMQ    675                  294 (44%)         ***

We measured each system's log density for these 39 topics and calculated their pairwise Spearman correlations. The log density values of the studied systems have a statistically significant correlation of 0.38 to 0.70. In other words, the likelihood of logging these common topics is statistically significantly correlated among all the studied systems. Table 4.11 also lists the six most log-intensive topics and the six least log-intensive topics among the 39 common topics. After manual analysis and labeling, we found that these two groups of topics have very distinguishable patterns. Most of the top-logged topics are concerned with communication between machines or interactions between threads, such as “stopping server” and “finding host”. In comparison, most of the least-logged topics are concerned with low-level data structure operations, such as “hash coding” and “string indexing”.

Impact of choosing a different number of topics. In this RQ, we chose 3,000 topics for the cross-system topic modeling. We now examine whether our choice of the number of topics impacts our results. Using the Hadoop system as an example, Table 4.12 shows the cross-system topic modeling results when varying the number of topics from 3,000 to 2,000 and 1,000. As we decrease the number of topics from 3,000 to 1,000, the number of important topics for the Hadoop system also decreases from 696 to 384, though at a lower rate than the decrease in the total number of topics. The median number of common topics that are shared between Hadoop and the other systems also decreases from 233 to 148. However, the percentage of the common topics increases from 33% to 39%. In other words, as we decrease the number of topics, the topics become more coarse-grained and they are more likely to be shared by multiple systems. Finally, the log density correlation of the common topics between the Hadoop system and the other systems does not change significantly when we vary the number of topics from 3,000 to 1,000; in fact, the median correlation values remain around 0.5 and the correlations are always statistically significant while we vary the number of topics. Similar observations also hold for the other studied systems.

Table 4.11: The common topics that are shared by all of the six studied systems: the six most log-intensive topics and the six least log-intensive topics. A topic label marked with a “∗” symbol or a “†” symbol indicates that the topic is concerned with communication between machines or interaction between threads, respectively.

Most likely logged topics (topic label: top words):
  stopping server ∗: stop, except, overrid, stop_except, overrid_stop, servic, except_stop, shutdown, servic_stop, stop_servic
  throwing exception: except, except_except, error, thrown, except_thrown, param, occur, error_occur, except_error, thrown_error
  finding host ∗: host, host_host, list_host, find, host_type, list, host_list, host_find, type_host, find_host
  connection management ∗: connect, connect_connect, except, except_connect, connect_except, close, connect_close, creat_connect, connect_host, creat
  event handling †: event, event_event, handl, event_type, type, event_handler, handler, handler_handl, overrid, event_applic
  message exception ∗: messag, messag_messag, except, except_messag, messag_except, messag_param, param_messag, object_messag, overrid, object

Least likely logged topics (topic label: top words):
  hash coding: hash, code, hash_code, overrid, overrid_hash, code_result, prime, prime_result, result_prime, code_hash
  equal operation: equal, object, overrid, equal_object, overrid_equal, result_equal, equal_equal, object_equal, equal_type, type_equal
  string builder: append, append_append, builder, builder_builder, overrid, builder_append, overrid_builder, length_append, time_append, type_append
  printing: system, println, system_println, print, usag, except_system, println_system, exit, println_usag, usag_system
  string indexing: index, index_index, substr, start_index, param, substr_index, length, length_index, size, list_index
  graph node management: node, node_node, node_list, list_node, param_node, type_node, except_node, node_type, node_param, param

Overall, our results in this research question are not sensitive to the number of topics that is used in the cross-system topic modeling.

Table 4.12: Cross-system topic modeling results when varying the number of topics, using the Hadoop system as an example.

System   # Topics   # Important topics   # Common topics (median)   Log density correlation (median)
Hadoop   3,000      696                  233 (33%)                  0.49
         2,000      584                  213 (36%)                  0.45
         1,000      384                  148 (39%)                  0.53

Each studied system shares a portion (12% to 62%) of its topics with other systems. The likelihood of logging the common topics has a statistically significant correlation of 0.35 to 0.62 among all the studied systems. Developers of a particular system can consult other systems when making their logging decisions or when developing logging guidelines.

4.5.3 RQ3: Can topics provide additional explanatory power for the likelihood of a code snippet being logged?

Motivation

In RQ1, we observed that source code that is related to certain topics is more likely to be logged. In this RQ, we further studied the statistical relationship between topics and logging. We are interested in knowing whether our code topics can offer a different view of logging. Namely, we want to study whether adding topic-based metrics to a set of baseline metrics can provide additional explanatory power for the likelihood of a code snippet being logged.

Approach

To answer this research question, we built regression models to study the relationship between the topics in a method and the likelihood of a method being logged. The response variable of our regression models is a dichotomous variable that indicates whether a method should have a logging statement or not, and the explanatory variables are represented by a set of baseline metrics and topic-based metrics. The baseline metrics capture the structural information of a method, while the topic-based metrics capture the semantic information of a method.

Baseline metrics. We used 14 baseline metrics, as listed in Table 4.13, to capture the structural information of a method. Prior studies (Yuan et al., 2012a; Fu et al., 2014; Zhu et al., 2015) found that the structure of a code snippet exhibits a strong relation with its logging needs. Table 4.13 also briefly explains the rationale behind studying each of these baseline metrics.

Topic-based metrics. The topic modeling results give us the membership (θ) assigned for each of the topics in each method. We consider the membership values that are assigned to the topics as the topic-based metrics, denoted by T0-T499. Prior studies also used similar topic-based metrics to predict or understand the relationship between topics and software defects (Nguyen et al., 2011; Chen et al., 2012). We filtered out topic membership values that are less than a threshold (we use 0.01 as the threshold) to remove noise topics for each method (Wallach et al., 2009; Chen et al., 2012).
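
A minimal R sketch of this filtering step, assuming a (here simulated) method-by-topic membership matrix theta produced by the topic model:

  # One row per method, one column per topic (T0-T499); memberships of a method sum to 1.
  set.seed(2)
  theta <- matrix(runif(100 * 500), nrow = 100, dimnames = list(NULL, paste0("T", 0:499)))
  theta <- sweep(theta, 1, rowSums(theta), "/")

  # Remove noise topics: memberships below the 0.01 threshold are set to zero.
  theta[theta < 0.01] <- 0

  # The remaining membership values are used as the topic-based metrics of each method.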

Model construction. We built LASSO (least absolute shrinkage and selection operator (Tibshirani, 1996)) models to study the relationship between the explanatory metrics of a method and a response variable that indicates whether a method should have a logging statement or not. We use a LASSO model because it uses regularization to penalize a complex model that leads to over-fitting, and it conducts feature selection simultaneously (Tibshirani, 1996; Kuhn and Johnson, 2013).

Table 4.13: Selected baseline metrics and the rationale behind the choices of these metrics.

LOC
  d: Number of lines of code in a method.
  r: Large methods are likely to have more logging statements.
CCN
  d: McCabe's cyclomatic complexity (McCabe, 1976) of a method.
  r: Complex methods are likely to have more logging statements.
NUM_TRY
  d: Number of try statements in a method.
  r: A try block indicates developers' uncertainty about the execution outcome of code, thus developers tend to use logging statements for monitoring or debugging purposes.
NUM_CATCH
  d: Number of catch clauses in a method.
  r: Exception catching code is often logged (Yuan et al., 2012a; Fu et al., 2014; Zhu et al., 2015; Microsoft-MSDN, 2016; Apache-Commons, 2016).
NUM_THROW
  d: Number of throw statements in a method.
  r: A logging statement is sometimes inserted right before a throw statement (Fu et al., 2014); developers also sometimes re-throw an exception instead of logging an exception.
NUM_THROWS
  d: Number of throws clauses in a method declaration.
  r: Methods that throw exceptions are likely to have logging statements.
NUM_IF
  d: Number of if statements in a method.
  r: Developers tend to log logic-branch points for understanding execution traces (Fu et al., 2014).
NUM_ELSE
  d: Number of else clauses in a method.
  r: Developers tend to log logic-branch points for understanding execution traces (Fu et al., 2014).
NUM_SWITCH
  d: Number of switch statements in a method.
  r: Developers tend to log logic-branch points for understanding execution traces (Fu et al., 2014).
NUM_FOR
  d: Number of for statements in a method.
  r: Logging statements inside loops usually record the execution path or status of the loops.
NUM_WHILE
  d: Number of while statements in a method.
  r: Logging statements inside loops usually record the execution path or status of the loops.
NUM_RETURN
  d: Number of return statements in a method.
  r: More return statements indicate a more complex method (i.e., more possible execution outcomes); such a method is more likely to be logged for monitoring or debugging purposes.
NUM_METHOD
  d: Number of method invocations in a method.
  r: Developers tend to check and log a return value from a method invocation (Fu et al., 2014).
FANIN
  d: The number of classes that depend on (i.e., reference) the containing class of a method.
  r: High fan-in classes, such as libraries, might have fewer logging statements to avoid generating too much logging output.

An over-fitted model performs very well on the data on which the model was built, but usually has poor accuracy on a new data sample (Kuhn and Johnson, 2013). It is generally true that more complex models are more likely to lead to over-fitting (Kuhn and Johnson, 2013). The LASSO model uses a λ parameter to penalize the complexity of a model: the larger the λ value, the simpler the model (Tibshirani, 1996). Among the 500 topic-based metrics, many have little or no contribution to determining the logging likelihood of a method. A LASSO model, with a proper setting of the λ parameter, enables us to significantly reduce the number of variables in the model and reduce the possibility of over-fitting (Tibshirani, 1996).

We used the stratified random sampling method (Witten and Frank, 2005; Kuhn and Johnson, 2013) to split the dataset of a system into an 80% training dataset and a 20% testing dataset, such that the distributions of logged methods and unlogged methods are properly reflected in both the training and testing datasets. We used the 80% training dataset to construct the model and tune the λ parameter, and left the 20% testing dataset only for testing purposes using the already-tuned λ parameter. Similar “80%:20%” splitting approaches were also used by prior studies (Martin et al., 2012; Kuhn and Johnson, 2013). Splitting the dataset into distinct sets for model construction (including parameter tuning) and model evaluation ensures that we avoid over-fitting and that we provide an unbiased sense of model performance (Kuhn and Johnson, 2013).

We used 10-fold cross-validation to tune the λ value in a LASSO model, using only the training dataset. For each λ value, we used a 10-fold cross-validation to measure the performance of the model (represented by AUC), and repeated this process for different λ values until we found the λ value with the best model performance. In this way, we obtained a LASSO model with the best cross-validated performance and avoided over-fitting. We used the “cv.glmnet” function in the “glmnet” R package (Friedman et al., 2010; Simon et al., 2011) to implement our model tuning process.
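
The following R sketch illustrates this setup on simulated data (the object names and synthetic data are illustrative; this is not the thesis' original script): a stratified 80%/20% split using caret's createDataPartition, followed by tuning λ with cv.glmnet on the training data only:

  library(glmnet)
  library(caret)
  set.seed(42)

  # Synthetic stand-in data: 500 methods, 20 explanatory metrics, binary "logged" label.
  metrics <- matrix(rnorm(500 * 20), nrow = 500, dimnames = list(NULL, paste0("m", 1:20)))
  logged  <- rbinom(500, 1, plogis(metrics[, 1] + 0.5 * metrics[, 2]))

  # Stratified 80%/20% split, preserving the proportion of logged/unlogged methods.
  idx     <- as.vector(createDataPartition(factor(logged), p = 0.8, list = FALSE))
  x_train <- metrics[idx, ];  y_train <- logged[idx]
  x_test  <- metrics[-idx, ]; y_test  <- logged[-idx]

  # 10-fold cross-validation on the training data, tuning lambda to maximize the AUC.
  cv_fit <- cv.glmnet(x_train, y_train, family = "binomial",
                      type.measure = "auc", nfolds = 10)

  # Predicted probabilities on the held-out 20% using the tuned lambda.
  p_test <- as.numeric(predict(cv_fit, newx = x_test, s = "lambda.min", type = "response"))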

Model evaluation. We used balanced accuracy (BA), as proposed by a prior study (Zhu et al., 2015), to evaluate the performance of our LASSO models. BA averages the probability of correctly identifying a logged method and the probability of correctly identifying a non-logged method. BA is widely used to evaluate modeling results on imbalanced data (Zhu et al., 2015; Cohen et al., 2004; Zhang et al., 2005), since it avoids over-optimism on imbalanced data sets. BA is calculated by Equation (4.5):

BA = \frac{1}{2} \times \frac{TP}{TP + FN} + \frac{1}{2} \times \frac{TN}{FP + TN} \qquad (4.5)

where TP, FP, FN and TN represent true positive, false positive, false negative and true negative, respectively.

We also used the area under the ROC (receiver operating characteristic) curve (AUC) to evaluate the performance of the LASSO models. While BA provides a balanced measure of our models' accuracy in classifying logged and non-logged methods, the AUC evaluates our models' discriminative ability, i.e., how likely the model is to correctly classify an actual logged method as logged, rather than classify an actual unlogged method as logged. The AUC is the area under the ROC curve, which plots the true positive rate (TP/(TP + FN)) against the false positive rate (FP/(FP + TN)). The AUC ranges between 0 and 1. A high AUC indicates a classifier with a high discriminative ability; an AUC of 0.5 indicates a performance that is no better than random guessing.
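
Continuing the previous sketch, the snippet below computes BA (Equation 4.5) and the AUC for the held-out predictions; the 0.5 classification cutoff is an illustrative assumption:

  library(pROC)

  pred <- as.integer(p_test >= 0.5)
  tp <- sum(pred == 1 & y_test == 1); fn <- sum(pred == 0 & y_test == 1)
  tn <- sum(pred == 0 & y_test == 0); fp <- sum(pred == 1 & y_test == 0)

  ba  <- 0.5 * tp / (tp + fn) + 0.5 * tn / (fp + tn)   # balanced accuracy (Equation 4.5)
  auc <- as.numeric(auc(roc(y_test, p_test)))          # area under the ROC curve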

Evaluating the effect of the metrics on the model output. We evaluated the effect of the metrics (i.e., the explanatory variables) on the model output, i.e., the likelihood of a method being logged, by comparing the metrics' standardized regression coefficients in the LASSO models. Standardized regression coefficients describe the expected change in the response variable (in standard deviation units) for a standard deviation change in an explanatory variable, while keeping the other explanatory variables fixed (Kabacoff, 2011; Bring, 1994). A positive coefficient means that a high value of that particular variable is associated with a higher probability of a method being logged, while a negative coefficient means that a high value of that particular variable is associated with a lower probability of a method being logged. For example, a topic-based metric with a positive coefficient means that a method with a greater membership of that particular topic has a higher chance of being logged. The standardized regression coefficients are not biased by the different scales of the different variables in the model. In this work, we calculate the standardized regression coefficients by standardizing each of the explanatory variables to a mean of 0 and a standard deviation of 1, before feeding the data to the LASSO models.
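
A minimal sketch of this standardization step, continuing the same example: the explanatory variables are scaled to a mean of 0 and a standard deviation of 1 before fitting, and the resulting coefficients are ranked by their absolute values:

  x_std  <- scale(x_train)     # each column standardized to mean 0, standard deviation 1
  cv_std <- cv.glmnet(x_std, y_train, family = "binomial",
                      type.measure = "auc", nfolds = 10)

  std_coef <- as.matrix(coef(cv_std, s = "lambda.min"))[-1, , drop = FALSE]  # drop intercept
  head(std_coef[order(abs(std_coef[, 1]), decreasing = TRUE), , drop = FALSE], 10)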

Results

Table 4.14 shows the performance of the models that are built using the baseline metrics, and the models that are built using both the baseline and topic-based metrics. A high AUC indicates that our LASSO models are able to discriminate logged methods from non-logged methods. A high BA implies that our LASSO models are able to provide an accurate classification of the likelihood of a method being logged. The results highlight that developers are able to leverage a model to aid their logging decisions.

Adding topic-based metrics to the baseline models gives a 3% to 13% improvement on AUC and a 6% to 16% improvement on BA for the LASSO models. In order to evaluate the statistical significance of adding the topic-based metrics to our baseline models, we used a Wilcoxon signed-rank test to compare the performance of the models that only use the baseline metrics and the performance of the models that use both the baseline and topic-based metrics. The Wilcoxon signed-rank test is the non-parametric analogue of the paired t-test. We use the Wilcoxon signed-rank test instead of the paired t-test because the former does not assume a normal distribution of the compared data. We consider a p-value below 0.05 to indicate that the performance change is statistically significant. The test on the AUC values and the test on the BA values both result in a p-value of 0.02, which means that adding the topic-based metrics statistically significantly improves the performance of our LASSO models. We also computed Cliff's δ effect size (Macbeth et al., 2011) to compare the performance of the models that only use the baseline metrics versus the performance of the models that use both the baseline metrics and the topic-based metrics. Cliff's δ also makes no assumption about the normality of the compared data. The magnitude of Cliff's δ is assessed using the thresholds provided by Romano et al. (2006), i.e., δ < 0.147 “negligible”, δ < 0.33 “small”, δ < 0.474 “medium”, and δ ≥ 0.474 “large”. As shown in Table 4.14, the effect size of the AUC improvement is 0.72 (large), and the effect size of the BA improvement is 0.69 (large). Therefore, topic-based metrics provide additional explanatory power to the models that are built using the structural baseline metrics. In other words, topics can provide additional explanatory power for the likelihood of a method being logged.
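
For illustration, the R snippet below applies the same two tests to the per-system AUC values reported in Table 4.14; whether the original test was one-sided or two-sided is an assumption here, and cliff.delta from the effsize package is one possible implementation:

  # AUC of the six studied systems (values from Table 4.14).
  auc_baseline <- c(0.82, 0.86, 0.80, 0.86, 0.83, 0.96)
  auc_topics   <- c(0.87, 0.94, 0.90, 0.90, 0.88, 0.99)

  # Paired Wilcoxon signed-rank test (here one-sided, testing for an improvement).
  wilcox.test(auc_topics, auc_baseline, paired = TRUE, alternative = "greater")

  # Cliff's delta effect size, interpreted with the Romano et al. (2006) thresholds.
  library(effsize)
  cliff.delta(auc_topics, auc_baseline)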

Table 4.14: Performance of the LASSO models, evaluated by AUC and BA.

                      Baseline metrics        Baseline + Topics
Project               AUC      BA             AUC           BA
Hadoop                0.82     0.72           0.87 (+6%)    0.78 (+7%)
Directory-Server      0.86     0.75           0.94 (+9%)    0.86 (+16%)
Qpid-Java             0.80     0.74           0.90 (+13%)   0.82 (+10%)
Camel                 0.86     0.78           0.90 (+4%)    0.82 (+6%)
CloudStack            0.83     0.76           0.88 (+6%)    0.80 (+6%)
Airavata              0.96     0.88           0.99 (+3%)    0.95 (+8%)
Cliff's δ             --       --             0.72 (large)  0.69 (large)
P-value (Wilcoxon)    --       --             0.02 (sig.)   0.02 (sig.)

Both our baseline and topic-based metrics play important roles in determining the likelihood of a method being logged. Table 4.15 shows the top ten metrics for each LASSO model that uses both the baseline metrics and the topic-based metrics. These metrics are ordered by the absolute value of their corresponding standardized coefficients in the models. In each model, five to seven of the top ten important metrics for determining the likelihood of a method being logged are our topic-based metrics.

The baseline metrics NUM_TRY, NUM_METHOD, and NUM_CATCH have a strong relationship with the likelihood of a method being logged. Each of these three metrics appears at least four times in the top ten metrics and has a positive coefficient in the LASSO models for all studied systems. Developers tend to log try blocks as they are concerned about the uncertainty during the execution of try blocks; developers log method invocations as they usually need to check and record the return values of such method invocations; and developers log catch blocks as a means to handle exceptions for debugging purposes (Microsoft-MSDN, 2016; Apache-Commons, 2016). The baseline metrics NUM_THROW, NUM_THROWS and FANIN each appear twice in the top ten metrics. The NUM_THROW metric has a negative coefficient in both occurrences, indicating that developers tend not to throw an exception and log it at the same time; instead, they tend to log when they are catching an exception. In contrast, the NUM_THROWS metric has a positive coefficient, showing that developers tend to add logging statements in methods that specify potential exceptions that might be thrown in that particular method or its callee methods (with the latter case being more usual). The FANIN metric has a negative coefficient, indicating that high fan-in code tends to be associated with fewer logging statements, possibly to reduce logging overhead when called by other methods. Both the LOC and CCN metrics appear only once in the top ten metrics. The LOC metric has a positive coefficient, which is expected, as larger methods are more likely to require logging statements. The CCN metric also has a positive coefficient, indicating that developers tend to log complex methods which may need future debugging (Shang et al., 2015).

The topic-based metrics play important roles in the LASSO models; in particular, the log-intensive topics have a strong and positive relationship with the likelihood of a method being logged. As shown in Table 4.15, we manually derived the topic label for each topic-based metric, by investigating the top words in the topic, the methods that have the largest membership of the topic, and the containing classes of these methods. We use a ‡ symbol to mark the log-intensive topics that we uncovered in RQ1. The metrics based on the log-intensive topics that are labeled as “cursor operation”, “decoder exception”, “message exception”, “session management”, “connection management”, “event handling”, “resource operation” and “customized logging” have positive coefficients in the LASSO models, indicating that these topics have a positive relationship with the likelihood of a method being logged.

Table 4.15: The top ten important metrics for determining the likelihood of a method being logged and their standardized coefficients. A letter “T” followed by a parenthesis indicates a topic-based metric and the manually derived topic label. A topic label followed by a ‡ symbol indicates that the particular topic is a log-intensive topic as listed in Table 4.5.

Hadoop (metric: standardized coefficient):
  NUM_METHOD: 0.72; NUM_CATCH: 0.42; T (prototype builder): -0.31; CCN: 0.28; T (server protocol): -0.26; NUM_TRY: 0.25; NUM_THROW: -0.22; T (client protocol): -0.21; T (equal operation): -0.15; T (string builder): -0.14
Directory-Server:
  NUM_METHOD: 0.73; NUM_TRY: 0.58; T (cursor operation) ‡: 0.43; T (decoder exception) ‡: 0.31; T (cursor exception): -0.28; T (string builder): -0.24; T (naming exception): -0.22; FANIN: -0.18; T (state transition): -0.18; T (tree operation): 0.15
Qpid-Java:
  T (message exception) ‡: 0.77; LOC: 0.62; NUM_RETURN: -0.54; T (list iteration): -0.49; NUM_IF: -0.26; T (connection management) ‡: 0.25; NUM_CATCH: 0.25; T (object attribute): -0.20; T (write flag): -0.19; T (session management) ‡: 0.17
Camel:
  NUM_METHOD: 1.13; NUM_TRY: 0.29; NUM_THROWS: 0.28; T (JSON schema): -0.22; NUM_CATCH: 0.22; NUM_THROW: -0.17; T (string builder): -0.16; T (model description): -0.15; T (REST configuration): -0.13; T (event handling) ‡: 0.11
CloudStack:
  NUM_TRY: 0.80; NUM_METHOD: 0.62; NUM_CATCH: 0.44; T (search parameter): -0.25; T (search entity): -0.25; T (server response): -0.20; T (legacy transaction): -0.16; T (search criteria): -0.15; NUM_RETURN: 0.14; T (equal operation): -0.14
Airavata:
  NUM_TRY: 2.09; FANIN: -0.83; T (Thrift code - object reader): -0.69; T (Thrift code - object writer): -0.69; NUM_THROWS: 0.39; NUM_METHOD: 0.37; T (result validation): -0.33; T (resource operation) ‡: 0.31; T (customized logging) ‡: 0.23; T (result transfer): 0.17

In particular, the topic labeled as “message exception” has the strongest relationship with the likelihood of a method being logged in the Qpid-Java system. The topics labeled as “cursor operation” and “decoder exception” also play the most important roles in determining the likelihood of a method being logged in the Directory-Server system. The “tree operation” topic in the Directory-Server system and the “result transfer” topic in the Airavata system also have a positive relationship with the likelihood of a method being logged. We found that the “tree operation” topic has an LD value of 0.03, and the “result transfer” topic has an LD value of 0.07. These two topics are also considered log-intensive topics. The other topics that are listed in Table 4.15 have a negative relationship with the likelihood of a method being logged. These topics have LD values of 0.00 to 0.01, which are much smaller than the log density values of the log-intensive topics (i.e., methods related to these topics most likely do not have any logging statements).

Discussion

Cross-system evaluation. In this research question, we evaluated the performance of our log recommendation models in a within-system setting. It is also interesting to study the performance of the models in a cross-system evaluation, i.e., train a model using one system (i.e., the training system) and then use the trained model to predict the likelihood of logging a method in another system (i.e., the testing system). As in RQ2, we applied cross-system topic modeling on a combined corpus of the six studied systems and set the number of topics to 3,000. We then derived topic-based metrics that are used as explanatory variables in our LASSO models.

As discussed in RQ2, however, different systems have different sets of important topics. This issue poses a challenge to our cross-system evaluation, i.e., the training system and the testing system have different variable settings, which results in the poor performance of the cross-system models that leverage topic-based metrics.

Even though we cannot fully overcome the fact that different systems have different sets of important topics, which leads to the poor performance of cross-system models, we adopted two strategies to alleviate the issue:

• When training a LASSO model, we used the common topics between the training system and the testing system as our topic-based metrics. We used the method mentioned in RQ2 to get the common topics of each pair of systems.

• When training the LASSO model, we assigned more weight to the methods in the training system that have a larger membership of the important topics in the testing system. Specifically, for each method in the training system, we gave it a weight that is its total membership of all the important topics in the testing system (a minimal sketch of this weighting follows the list).
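
A minimal R sketch of the second strategy on simulated data (all object and topic names are illustrative); the weights argument of cv.glmnet carries the per-method weights:

  library(glmnet)
  set.seed(7)

  # Synthetic stand-in data for the training system: 300 methods, 5 baseline metrics,
  # 40 topic memberships; 'common' and 'imp_test' are illustrative topic sets.
  baseline_train <- matrix(rnorm(300 * 5), nrow = 300, dimnames = list(NULL, paste0("b", 1:5)))
  theta_train    <- matrix(runif(300 * 40), nrow = 300, dimnames = list(NULL, paste0("T", 1:40)))
  y_train        <- rbinom(300, 1, 0.3)
  common   <- paste0("T", 1:20)   # topics shared by the training and testing systems
  imp_test <- paste0("T", 5:15)   # important topics of the testing system

  # Explanatory variables: baseline metrics plus the common topics only.
  x_train <- cbind(baseline_train, theta_train[, common])

  # Weight each training method by its total membership of the testing system's important topics.
  w_train <- rowSums(theta_train[, imp_test, drop = FALSE])

  cv_fit <- cv.glmnet(x_train, y_train, family = "binomial",
                      type.measure = "auc", weights = w_train)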

Tables 4.16 and 4.17 list the performance (AUC) of the cross-system models that use the baseline metrics and the performance (AUC) of the cross-system models that use both the baseline and topic-based metrics, respectively. For each system, we also calculated the average performance (AUC) of the models that were trained using the other systems and tested on that particular system. The average AUC values increase by 1% to 7% when topic-based metrics are added to the baseline models. We also used a Wilcoxon signed-rank test and computed Cliff's δ effect size to compare the average AUC values when using the baseline metrics versus when using both the baseline and topic-based metrics. The Wilcoxon signed-rank test yielded a p-value of 0.02, which indicates that the topic-based metrics bring a statistically significant improvement to the baseline models. The Cliff's δ effect size is 0.44, which means that the improvement is considered “medium”.

The effect of choosing a different number of topics. In this chapter, we derived 500 topics from the source code of a software system and leveraged these topics to study the relationship between the topics of a method and the likelihood of a method being logged. In order to evaluate the impact of the choice of the number of topics on our findings, we conducted a sensitivity analysis to quantitatively measure how different numbers of topics influence the topic model's ability to explain the likelihood of a code snippet being logged.

Table 4.16: The performance (AUC) of the cross-system models using baseline met- rics. The row names indicate the training systems and the column names indicate the testing systems.

                  Hadoop   Directory-Server   Qpid-Java   CloudStack   Camel   Airavata
Hadoop            -        0.80               0.66        0.82         0.86    0.88
Directory-Server  0.74     -                  0.61        0.74         0.78    0.91
Qpid-Java         0.60     0.69               -           0.53         0.43    0.61
CloudStack        0.78     0.80               0.61        -            0.84    0.93
Camel             0.80     0.81               0.65        0.82         -       0.90
Airavata          0.74     0.81               0.61        0.80         0.78    -
Average           0.73     0.78               0.63        0.74         0.74    0.85

Table 4.17: The performance (AUC) of the cross-system models using both baseline and topic-based metrics. The row names indicate the training systems and the column names indicate the testing systems.

                  Hadoop       Directory-Server   Qpid-Java    CloudStack   Camel        Airavata
Hadoop            -            0.82               0.67         0.83         0.86         0.90
Directory-Server  0.78         -                  0.63         0.79         0.81         0.92
Qpid-Java         0.74         0.69               -            0.71         0.67         0.82
CloudStack        0.79         0.80               0.70         -            0.84         0.90
Camel             0.82         0.82               0.69         0.82         -            0.90
Airavata          0.74         0.81               0.67         0.80         0.80         -
Average           0.77 (+5%)   0.79 (+1%)         0.67 (+6%)   0.79 (+7%)   0.79 (+7%)   0.89 (+5%)

Specifically, we changed the number of topics that we used in RQ3 from 500 to various values (i.e., from 20 to 3,000), and built LASSO models that leverage both the baseline metrics and the topic-based metrics. Table 4.18 shows the performance (evaluated using AUC) of the LASSO models that leverage the baseline metrics and the topic-based metrics derived from these different numbers of topics.

As we increase the number of topics from 20 to 3,000, the AUC values of the LASSO models increase until they reach a plateau. The AUC values of the LASSO models stay at, or slightly fluctuate around, the maximum point as we continue to increase the number of topics.

Table 4.18: Performance (AUC) of the LASSO models that leverage the baseline metrics and the topic-based metrics derived from different numbers of topics.

                          Baseline + 20–3,000 topics
Project        Baseline   20     50     100    300    500    800    1,000  1,500  2,000  2,500  3,000
Hadoop         0.82       0.83   0.84   0.84   0.86   0.87   0.88   0.88   0.86   0.86   0.87   0.86
Directory-S.   0.86       0.88   0.87   0.90   0.93   0.94   0.94   0.94   0.94   0.93   0.94   0.93
Qpid-Java      0.80       0.83   0.85   0.88   0.90   0.90   0.90   0.89   0.89   0.89   0.89   0.89
Camel          0.86       0.87   0.88   0.88   0.90   0.90   0.90   0.90   0.90   0.90   0.89   0.90
Cloudstack     0.83       0.85   0.86   0.86   0.89   0.88   0.88   0.88   0.88   0.87   0.88   0.88
Airavata       0.96       0.98   0.99   0.99   0.99   0.99   0.99   0.99   0.99   0.98   0.98   0.99
Cliff's δ 1    -          0.33M  0.44M  0.56L  0.67L  0.72L  0.72L  0.72L  0.67L  0.67L  0.72L  0.67L
1 The superscripts S, M, and L represent small, medium, and large effect sizes, respectively.

Taking the Directory Server system as an example, the AUC values of the LASSO models increase from 0.88 to 0.94 as we increase the number of topics from 20 to 500. However, as we continue to increase the number of topics, the AUC values stay around 0.94. As observed by Wallach et al. (2009), the reason may be that as the number of topics increases, the additional topics are rarely used in the topic assignment process; thus, these additional topics are removed by the LASSO models.

The AUC values reach their maximum points when using 50 to 800 topics for the studied systems. In particular, four out of the six systems reach their maximum AUC values when using 300 topics or fewer. The LASSO models that leverage both the baseline metrics and the topic-based metrics that are derived from 300 topics achieve a 3% to 13% improvement in AUC over the LASSO models that only leverage the baseline metrics.

Table 4.18 also shows the Cliff's δ effect sizes of comparing the performance of the models that only use the baseline metrics versus the performance of the models that use both the baseline metrics and the topic-based metrics. Using 20 or 50 topics improves the AUC of the baseline models with a medium effect size; using 100 or more topics improves the AUC of the baseline models with a large effect size.

The impact of filtering out small methods. In this chapter, we filtered out small methods for each studied system (Section 4.4.2), as intuitively small methods usually implement simple functionalities (e.g., getters and setters) and are less likely to need logging statements. We now examine the effect of filtering out small methods on our models. Table 4.19 shows the performance of the LASSO models without the filtering process. Without filtering out small methods, both the models that leverage the baseline metrics and the models that leverage the baseline and topic-based metrics have better performance in terms of AUC and BA. Yet the topic-based metrics still bring a 1% to 7% improvement on AUC and a 4% to 14% improvement on BA, over the baseline metrics, for the LASSO models. The AUC improvement has an effect size of 0.53 (large) and the BA improvement has an effect size of 0.72 (large), both of which are statistically significant.

However, the additional explanatory power (i.e., 1% to 7% improvement on AUC and 4% to 14% improvement on BA) is smaller than it is when the filtering process is applied (i.e., 3% to 13% improvement on AUC and 6% to 16% improvement on BA). These results can be explained by the fact that the filtered small methods are much less likely to have logging statements. Taking the Hadoop system as an example, the filtered small methods make up 60% of all the methods, but they only contain 5% of all the logged methods. The structural metrics (e.g., LOC) can simply be used to predict such small methods as not being logged. In other words, topic-based metrics are less likely to bring additional explanatory power for the small methods; however, such methods are far less likely to be logged.

Table 4.19: Performance of the LASSO models (without filtering out small methods), evaluated by AUC and BA.

                      Baseline metrics        Baseline + Topics
Project               AUC      BA             AUC           BA
Hadoop                0.92     0.81           0.94 (+2%)    0.84 (+4%)
Directory-Server      0.89     0.78           0.95 (+7%)    0.89 (+14%)
Qpid-Java             0.89     0.79           0.93 (+4%)    0.84 (+6%)
Camel                 0.92     0.83           0.93 (+1%)    0.86 (+4%)
CloudStack            0.95     0.82           0.96 (+1%)    0.89 (+9%)
Airavata              0.97     0.92           0.99 (+2%)    0.97 (+5%)
Cliff's δ             --       --             0.53 (large)  0.72 (large)
P-value (Wilcoxon)    --       --             0.02 (sig.)   0.02 (sig.)

Our LASSO models that combine baseline metrics and topic-based metrics achieve an AUC of 0.87 to 0.99 and a BA of 0.78 to 0.95. The topic-based metrics provide an AUC improvement of 3% to 13% and a BA improvement of 6% to 16% over the baseline metrics. The topic-based metrics play important roles in the LASSO models; in particular, the log-intensive topics have a strong and positive relationship with the likelihood of a method being logged.

4.6 Threats to Validity

External Validity. Different systems are concerned with different topics. The discussions of the specific topics in this chapter may not generalize to other systems.

Findings from additional case studies on other systems can benefit our study. However, through a case study on six systems of different domains and sizes, we expect that our general findings (i.e., the answers to the research questions) hold for other systems. We believe that developers can leverage the specific topics in their own systems to help understand and guide their logging decisions.

Our study focused on the source code (i.e., production code) of the studied systems and excluded the testing code. We are more interested in the production code because the logging in the source code directly impacts the customer's experience of the performance and diagnosability of a system. On the other hand, testing code is mainly used for in-house diagnosis, and the impact of logging is usually less of a concern.

However, it is interesting to study the differences between the logging statements in the production code and the testing code. We expect future studies to explore the differences between production code logging and testing code logging.

Internal Validity. The regression modeling results present the relation between the likelihood of a method being logged and a set of software metrics. The relation does not represent the causal effects of these metrics on the likelihood of a method being logged.

In RQ3, we used 14 structural metrics to form the baseline of our models. The selected metrics do not necessarily represent all the structural information of a method.

However, we used both the general information (e.g., LOC and CCN) and the detailed information (e.g., the number of if-statements and the number of catch blocks), trying to cover a large spectrum of structural information about a method.

In this chapter, we studied the relationship between logging decisions and the underlying topics in the software systems. Our study was based on the assumption that the logging practices of these projects are appropriate. However, the logging practices of these projects may not always be appropriate. In order to avoid learning bad practices, we chose several successful and widely-used open source systems.

Construct Validity. Interpreting LDA-generated topics may not always be an easy task (Hindle et al., 2015), and the interpretation may be subjective. Thus, the author of the thesis first tried to understand the topics and derive topic labels, and another researcher (i.e., a collaborator) validated the labels. In cases where a topic was hard to interpret, we studied the source code (i.e., both classes and methods) related to the topic.

As suggested by prior studies (Wallach et al., 2009; Chen et al., 2016b), we chose 500 topics for the topic modeling of individual systems in RQ1. However, determining the appropriate number of topics to be used in topic modeling is a subjective process. As our primary purpose of using topic models is interpretation, the appropriateness of a choice of topic number should be determined by how one plans to leverage the resulting topics for interpreting the meaning of the source code. We found that using 500 topics for each studied system provides reasonable and tractable results for us to interpret the generated topics. In addition, we discuss how different numbers of topics influence the observations of each RQ.

When running LDA, we applied MALLET's hyper-parameter optimization to automatically find the optimal α and β values. However, the optimization heuristics are designed for natural language documents instead of source code files. As the source code is different from natural language, we may not get the optimal topics. Future in-depth studies are needed to explore this wide-ranging concern across the multitude of uses of LDA on software data (Chen et al., 2016b).

Topic models create automated topics that capture the co-occurrences of words in methods. However, one may be concerned about the rationale of studying the logging practices using topics instead of simply using the words that exist in a method. We use topics instead of words for two reasons: 1) topic models provide a higher-level overview and interpretable labels of a code snippet (Blei et al., 2003; Steyvers and Griffiths, 2007); and 2) using the words in a code snippet to model the likelihood of a code snippet being logged is very computationally expensive, and the resulting model is more likely to over-fit. Our experiments show that there are 2,117 to 5,474 different words (excluding English stop words and programming language keywords) in our studied systems; hence, one would need to build a very expensive model (2,117 to 5,474 metrics) using these words. Our experiments also show that using 2,117 to 5,474 words as explanatory variables provides 3% to 10% (with a median of 4%) additional explanatory power (in terms of AUC) to the baseline models. In comparison, using only 300 topics as explanatory variables provides 3% to 13% (with a median of 6%) additional explanatory power to the baseline models.

4.7 Related Work

4.7.1 Applying Topic Models on Software Engineering Tasks

Topic models are widely used in Software Engineering research for various tasks (Chen et al., 2016b; Sun et al., 2016), such as concept location (Cleary et al., 2008; Poshyvanyk et al., 2007; Rao and Kak, 2011), traceability linking (Asuncion et al., 2010), understanding software evolution (Thomas et al., 2011; Hu et al., 2015), code search (Tian et al., 2009), software refactoring (Bavota et al., 2014), and software maintenance (Sun et al., 2015a,b). Recent studies explored how to effectively leverage topic models in software engineering tasks (Panichella et al., 2013, 2016). However, there is no prior study of software logging using topic models (Chen et al., 2016b). Some prior studies (Chen et al., 2012; Nguyen et al., 2011) successfully show that topics in source code are correlated with some source code metrics (e.g., quality). Thus, in this chapter, we followed up on that intuition and studied the relationship between code topics and logging decisions.

Prior studies (De Lucia et al., 2012, 2014) also found that most LDA-generated topics are easy for developers to understand, and these topics can be useful for developers to get a high-level overview of a system (Thomas et al., 2011). In this chapter, we also conducted a manual study on the topics, and our study provides a high-level overview of which topics are more likely to need logging statements in our studied systems.

4.8 Chapter Summary

Inserting logging statements in the source code appropriately is a challenging task, as both logging too much and logging too little are undesirable. We believe that the code snippets of different topics have different logging requirements. In this chapter, we used LDA to extract the underlying topics from the source code, and studied the relationship between the logging decisions and the recovered topics. We found that a small number of topics, in particular, the topics that can be generalized to communication between machines or interaction between threads, are much more likely to be logged than other topics. We also found that the likelihood of logging the common topics has a significant correlation across all the studied systems; thus, developers of a particular system can consult other systems when making their logging decisions or developing logging guidelines. Finally, we leveraged the recovered topics in regression models to provide additional explanatory power for the likelihood of a method being logged. Our case study on six open source software systems suggests that topics can statistically help explain the likelihood of a method being logged.

As code topics contain valuable information that is correlated with logging decisions, topic information should be considered in the logging practices of practitioners when they wish to allocate limited logging resources (e.g., by allocating more logging resources to log-intensive topics). Future work on logging recommendation tools should also consider topic information in order to help software practitioners make more informed logging decisions. Furthermore, our findings encourage future work to develop topic-influenced logging guidelines (e.g., which topics need further logging), in addition to the best practices and logging advice that are derived in Chapter 3.

CHAPTER 5

Automated Suggestions for Log Changes

As we find in Chapter 2, prior approaches automatically enhance logging statements as a post-implementation process. Such automated approaches do not take into account developers' domain knowledge and concerns that we discuss in Chapter 3; nevertheless, developers usually need to carefully design their logging statements since logs are a rich source of information about the field operation of a software system. This chapter empirically studies why developers make log changes and proposes an automated approach to provide developers with log change suggestions as soon as they commit a code change. In particular, we derive a set of measures based on manually examining the reasons for log changes and on our experiences. We use these measures as explanatory variables in random forest classifiers to model whether a code commit requires log changes. We perform a case study on four open source projects, and we find that: (i) the reasons for log changes can be grouped along four categories: block change, log improvement, dependence-driven change, and logging issue; (ii) our random forest classifiers can effectively suggest whether a log change is needed: the classifiers that are trained from within-project data achieve a balanced accuracy of 0.76 to 0.82, and the classifiers that are trained from cross-project data achieve a balanced accuracy of 0.76 to 0.80; (iii) the characteristics of code changes in a particular commit and the current snapshot of the source code are the most influential factors for determining the likelihood of a log change in a commit.

An earlier version of this chapter is published in the Empirical Software Engineering Journal (EMSE) (Li et al., 2017b).


5.1 Introduction

Logs are generated at runtime from logging statements in the source code. Logs record valuable run-time information. A logging statement, as shown below, typically specifies a verbosity level (e.g., debug/info/warn/error/fatal), a static text, and one or more variables (Fu et al., 2014; Yuan et al., 2012b; Gülcü and Stark, 2003):

logger.error(“static text” + variable);

Logs help software practitioners better understand the behaviors of large-scale software systems and assist in improving the quality of these systems (Fu et al., 2013; Shang et al., 2014b). Software operators leverage the rich information in logs to guide capacity planning efforts (Sharma et al., 2011; Kavulya et al., 2010), to monitor system health (Bitincka et al., 2010), and to identify abnormal behaviors (Fu et al., 2009; Syer et al., 2013; Xu et al., 2009b). In addition, software developers rely on logs for debugging field failures (Glerum et al., 2009; Yuan et al., 2010). In recent years, the broad usage of logs has led to the emergence of a new market of Log Processing Applications (LPAs) (e.g., Splunk1 (Bitincka et al., 2010), XpoLog2, and Logstash3), which support the collection, storage, search, and analysis of large amounts of log data.

However, appropriate logging is difficult to achieve in practice. Both logging too little and logging too much are undesirable (Fu et al., 2014; Chapter 3). Logging too little may result in the lack of runtime information that is crucial for understanding and diagnosing software systems (Yuan et al., 2010; Chapter 3), while logging too much may lead to runtime overhead and costly log maintenance efforts (Fu et al., 2014; Chapter 3).

1 Splunk. http://www.splunk.com/
2 XpoLog. http://www.xpolog.com/
3 Logstash. http://logstash.net/

Prior work shows that developers spend a large amount of effort on maintaining logging statements, and that 33% of log changes are introduced as after-thoughts (i.e., as follow-up changes instead of being done when the actual surrounding code is being changed) (Yuan et al., 2012b).

Figure 5.1 shows a code snippet that is taken from Hadoop revision 1240413. We use “svn blame”4 to show the contributing commit of each code line. The code snippet shows that commit 1190122 added a try-catch statement, and an if statement in the catch clause. In a later commit (1240413), which contributed to a bug fix, the developer added an error logging statement to record important runtime information, in order to help fix bug MAPREDUCE-3711. The information that is recorded by the logging statement even helps in understanding the execution of the system when fixing bugs in the future (e.g., MAPREDUCE-5501 and MAPREDUCE-6317). Had the developer been suggested to add the logging statement in the earlier commit (1190122) together with the try-catch statement, the logged information would have helped developers and operators understand the system's behavior when fixing bug MAPREDUCE-3711. In this chapter, we propose an approach that can automatically provide developers with such suggestions for log changes when they commit new code changes.

4 svn blame. http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.blame.html
5 https://issues.apache.org/jira/browse/MAPREDUCE-3711
6 https://issues.apache.org/jira/browse/MAPREDUCE-5501
7 https://issues.apache.org/jira/browse/MAPREDUCE-6317

Log enhancement approaches, such as Errlog (Yuan et al., 2012a) and LogEnhancer (Yuan et al., 2011), aim to improve software failure diagnosis by automatically adding more logged information to the existing code, as a post-implementation process. However, these automatic log enhancement approaches never take into account developers' domain knowledge and concerns that we discuss in Chapter 3. In practice, developers need to carefully design logging statements since logs contain valuable information for both software developers and operators (Yuan et al., 2012b).

/* A code snippet taken from Hadoop revision 1240413, in file:
 *   hadoop-mapreduce-project/hadoop-mapreduce-client/
 *   hadoop-mapreduce-client-app/src/main/java/org/apache/
 *   hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
 * Commit 1240413 is part of a patch to fix a bug with JIRA issue ID MAPREDUCE-3711
 */
1190122  try {
1190122    response = makeRemoteRequest();
1190122    retrystartTime = System.currentTimeMillis();
1190122  } catch (Exception e) {
1190122    if (System.currentTimeMillis() - retrystartTime >= retryInterval) {
1240413      LOG.error("Could not contact RM after " + retryInterval +
1240413          " milliseconds.");
1190122      eventHandler.handle(new JobEvent(this.getJob().getID(),
1190122          JobEventType.INTERNAL_ERROR));
1190122      throw new YarnException("Could not contact RM after " +
1190122          retryInterval + " milliseconds.");
1190122    }
1190122    throw e;
1190122  }

Figure 5.1: An “svn blame” example showing that a later commit (1240413) added a logging statement that was missing in an earlier commit (1190122).

Recent studies investigate where developers insert logging statements (Fu et al., 2014) and automatically suggest locations in need of logging statements (Zhu et al., 2015). In particular, the authors conducted source code analysis to investigate the types of code snippets (e.g., catch blocks) in which developers often insert logging statements. The study provides post-coding guidelines for inserting logging statements into the source code. However, to the best of our knowledge, there exist no studies that guide developers during coding, i.e., that provide guidance about whether to change (add, delete, or modify) logging statements when developers are committing code changes.

In this chapter, we propose an approach that can provide just-in-time suggestions as to whether a log change is needed when a code change occurs. The term “just-in-time” is based on prior research by Kamei et al. (2013) that advocates the benefits of providing suggestions to developers at commit time. Follow-up studies (Kamei et al., 2016; Tourani and Adams, 2016; Fukushima et al., 2014) also use the term “just-in-time” to describe commit-time suggestions or alerts. In this chapter, we leverage prior commits to build classifiers in order to suggest whether log changes are needed for a new commit. We perform a case study on four open source systems (Hadoop, Directory Server, Commons HttpClient, and Qpid) to answer the following three research questions:

RQ1: What are the reasons for changing logging statements?

Through a manual analysis of a statistically representative sample of logging

statements, we find that the reasons for log changes can be grouped along four

categories: block change, log improvement, dependence-driven change, and

logging issue.

RQ2: How well can we provide just-in-time log change suggestions?

We build random forest classifiers using software measures that are derived from

our manual study in RQ1, and from our experience, in order to model the drivers

for log changes in a code commit. We evaluate our classifiers in both a within-

project and a cross-project evaluation. For our within-project evaluation, we

build a random forest classifier for every code commit using all previous code

commits as training data, in order to suggest whether a log change is needed

for the current commit. The random forest classifiers that are built from histor-

ical data from the same project achieve a balanced accuracy of 0.76 to 0.82. For

our cross-project evaluation, we build random forest classifiers that are trained

from three out of four studied projects and suggest log changes in the remaining

project. We repeat the process for each of the studied projects. The classifiers

reach a balanced accuracy of 0.76 to 0.80 and an AUC of 0.84 to 0.88.

RQ3: What are the influential factors that explain log changes?

Factors which capture characteristics about the changes to the non-logging code

in a commit (i.e., change measures, such as the number of changed control flow

statements) and factors that capture characteristics of the current snapshot of

the source code (i.e., product measures, such as the number of existing logging

statements) are the most influential factors for explaining log changes in a com-

mit. In particular, change measures are the most influential explanatory factors

for log additions, while product measures are the most influential explanatory

factors for log modifications.

Chapter organization. The remainder of the chapter is organized as follows. Section

5.2 describes the studied systems and our experimental setup. Section 5.3 explains the approaches that we used to answer the research questions and presents the results of our case study. Section 5.4 discusses the characteristics of commits that only change logging statements without changing the non-logging code. Section 5.5 discusses the

threats to validity. Finally, Section 5.6 draws conclusions based on our presented find-

ings.

5.2 Case Study Setup

This section describes the subject systems and the process that we used to prepare the

data for our case study.

5.2.1 Subject Systems

This chapter studies the reasons for log changes and explores the feasibility of pro-

viding accurate just-in-time suggestions for log changes through a case study on four

open source projects: Hadoop, Directory Server, Commons HttpClient, and Qpid. All

the four projects are mature Java projects with years of development history and from

different domains. Table 5.1 shows the studied development history for each project.

We use the “svn log”8 command to retrieve the development history for each project

(i.e., the svn commit records). We analyze the development history of the main branch

(trunk) of each project, and focus on Java source code (excluding Java test code). Some

commits import a large number of atomic commits from a branch into the trunk

(a.k.a. merge commits), which usually contain a large amount of code changes and

log changes. Such merge commits would introduce noise in our study (Zimmermann

et al., 2004; Hassan and Holt, 2004; Hassan, 2008) of log changes in a commit. We unroll each merge commit into the various commits of which it is composed (using the “use-merge-history” option of the “svn log” command).

8 svn log. http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.log.html

Table 5.1 also presents an overview of the studied systems. The source lines of code

(SLOC) of each project is measured at the end of the studied development history. The

Hadoop project is the largest project. It has 458K lines of source code, while HttpClient is the smallest project, with an SLOC of 18K. We study 5,401, 4,968, 949 and 3,538 com- mits for Hadoop, DirectoryServer, HttpClient and Qpid, respectively. These commits include all the commits in the studied development history that change at least one

Java source code file. Table 5.1 also shows the numbers and percentages of commits that change (i.e., add, modify, or delete) logging statements, for each project. 30.0%

(1,621 out of 5,401) of Hadoop's commits are accompanied by log changes, while the percentage of commits that change logging statements ranges from 22.7% to 26.6% for

Directory Server, HttpClient, and Qpid. The last column of Table 5.1 lists the number of log changes (a log change is an occurrence of either adding, deleting, or modifying a logging statement) that occurred during the studied development history. The Direc- toryServer project has the most log changes within the studied history. We provide our dataset9 for all four studied projects for better replication.

5.2.2 Data Extraction

Figure 5.2 presents an overview of our data extraction and data analysis approaches.

From the version control repositories of each subject system, we analyze the code changes in each commit and identify the commits that contain changes to logging statements. As a result, we are able to create a log-change database (i.e., a collection of log changes) and label each commit as to whether it contains log changes or not. The log-change database is used in our manual analysis (RQ1), and the labeled commit data is employed in our modeling analysis (RQ2 and RQ3).

9 http://sailhome.cs.queensu.ca/replication/JITLogSuggestions/dataset.zip

Table 5.1: Overview of the studied systems.

Project            Studied history             #SLOC   #Commits   #Log-changing commits   #Log changes
Hadoop             2009-05-19 to 2014-07-02    458 K   5,401      1,621 (30.0%)           9,503
Directory Server   2006-01-03 to 2014-06-30    119 K   4,968      1,130 (22.7%)           11,883
HttpClient         2001-04-25 to 2012-12-16    18 K    949        252 (26.6%)             2,333
Qpid               2006-09-19 to 2014-07-01    271 K   3,538      908 (25.7%)             8,761

Figure 5.2: An overview of our data extraction and analysis approaches. (The figure shows the pipeline: version control repositories → code changes and the source code at each commit → log change identification → log changes and labelled commits → manual analysis of a statistically representative random sample (RQ1) and random forest modeling analysis of software measures (RQ2, RQ3).)

Within the commits that change logging statements, only 1.2% to 4.2% do not change any other source code (i.e., log-changing-only commits). Since our models aim to provide developers with just-in-time suggestions for log changes when they are changing other source code, we exclude these log-changing-only commits from our modeling analysis. We revisit the characteristics of the log-changing-only commits at the end of the chapter, in Section 5.4.

5.2.3 Log Change Identification

In order to use the term log change more precisely, we define the following four terms:

• Log addition, measures the new logging statements that are added in a commit.

• Log deletion, measures the obsolete logging statements that are deleted in a

commit.

• Log modification, measures the existing logging statements that are modified in

a commit.

• Log change, measures any kind of change (addition, deletion, and modification)

that is made to logging statements in a commit.

The studied projects leverage standard logging libraries (e.g., Apache Commons

Logging10, Log4j11 and SLF4J12) for logging. The usage of the standard libraries brings uniform formats (e.g., logger.error(message)) to the logging statements, thus we can accurately identify the logging statements.

10 http://commons.apache.org/proper/commons-logging
11 http://logging.apache.org/log4j/2.x
12 http://www.slf4j.org

We use regular expressions to identify the added and deleted logging statements

across commits (see on-line replication package for the used regular expressions13).

If an added logging statement and a deleted one are within the same code snippet and they are textually similar to each other, the pair of logging statements is considered a log modification. Otherwise, they are considered one log addition and one log deletion. We measure the textual similarity between two logging statements by calculating the Levenshtein distance ratio (Levenshtein, 1966) between their

concatenation of static text and variable names. Two logging statements are consid-

ered similar if the Levenshtein distance ratio between them is larger than a specified

threshold for which we choose 0.5 in this chapter (see Section 5.5 for a sensitivity anal-

ysis of the impact of this threshold on the identification of log modifications).
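As an illustration of this identification step, the following R sketch pairs an added and a deleted logging statement using the Levenshtein distance ratio and the 0.5 threshold; the regular expression, the helper names, and the example statements are simplified assumptions for illustration rather than the exact implementation used in this chapter.

# Assumed, simplified pattern for Commons Logging / Log4j / SLF4J style calls,
# e.g., LOG.error(...) or _logger.debug(...)
is_logging_statement <- function(line) {
  grepl("(log|logger)\\s*\\.\\s*(trace|debug|info|warn|error|fatal)\\s*\\(",
        line, ignore.case = TRUE)
}

# Levenshtein distance ratio between two statements; adist() is base R
levenshtein_ratio <- function(a, b) {
  1 - adist(a, b)[1, 1] / max(nchar(a), nchar(b))
}

# An added and a deleted logging statement from the same code snippet
deleted <- 'LOG.debug("write to " + datanodes[j].getName() + ": " + block)'
added   <- 'LOG.debug("write to " + datanodes[j] + ": " + block)'

if (is_logging_statement(deleted) && is_logging_statement(added) &&
    levenshtein_ratio(deleted, added) > 0.5) {
  change <- "log modification"          # textually similar pair
} else {
  change <- "one log deletion and one log addition"
}
change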

5.3 Case Study Results

In this section, we present the results of our research questions. For each research

question, we present the motivation of the research question, the approach that we

used to address the research question, and our experimental results.

5.3.1 RQ1: What are the reasons for changing logging statements?

Motivation

Before proposing an approach that can provide just-in-time suggestions for log

changes, we first conduct a manual study in order to investigate the reasons for

changing logging statements. Our manual observation will assist us in defining

13 http://sailhome.cs.queensu.ca/replication/JITLogSuggestions/log_change_regex.zip

appropriate measures that we can use later on to build models to provide just-in-time suggestions for log changes when developers commit code changes.

Approach

There is a total of 32,480 logging statement changes in the studied commits of the four studied projects (9,503 for Hadoop, 11,883 for DirectoryServer, 2,333 for HttpClient and

8,761 for Qpid). Each commit may contain multiple logging statement changes. We randomly selected a statistically representative sample (95% confidence level with a

5% confidence interval) of 380 log changes. Among the 380 log changes, there are

204 log additions, 91 log modifications, and 85 log deletions. We manually examine the possible reasons for these log changes. For each log change, we check the log change itself, the co-changed code, the commit message, and the associated issue report if an issue id is noted in the commit message. Certain log change reasons (e.g., a typo) can be detected by only looking at the log change itself. Examining the co-changed code can help us determine log change reasons such as "a logging statement is changed because the logged variables are changed". The commit message and the issue report directly communicate the intention of the developer and the issue owner for a log change. Two researchers, including the author of the thesis and a collaborator, work together by manually examining all log changes from the random sample. We examine the log change, code change, commit message and the associated issue report to understand the reason of a log change. If the reason is new, we add it to the list of identified reasons. If there is a disagreement during the process, the two researchers discuss and reach a consensus.
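For reference, a sample of 380 is what the standard finite-population sample-size formula yields under the usual assumptions (z = 1.96 for a 95% confidence level, p = 0.5, and a 5% margin of error, e = 0.05); the calculation below is an illustration rather than a derivation taken from the thesis:

n = N × z²p(1 − p) / (e²(N − 1) + z²p(1 − p)) = 32,480 × 0.9604 / (0.0025 × 32,479 + 0.9604) ≈ 380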

Table 5.2: Log-change reasons and the distribution: manual analysis result.

Reason category (total #log changes)   Reason                             #Log changes   Log change type

Block change (260)                     adding/deleting try-catch block    80             add, delete
                                       adding/deleting method             69             add, delete
                                       adding/deleting branch             52             add, delete
                                       adding/deleting if-null branch     49             add, delete
                                       adding/deleting loop               10             add, delete

Log improvement (63)                   improving debugging capability     19             add, modify
                                       improving readability              13             add, modify
                                       leveraging message translation     11             modify
                                       improving runtime information       9             add, modify
                                       redundant log information           6             delete
                                       log library migration               4             modify
                                       security issue                      1             delete

Dependence-driven change (39)          logger change                      20             modify
                                       variable change                    14             modify
                                       method change                       2             modify
                                       class change                        2             modify
                                       dependence removal                  1             modify

Logging issue (18)                     inappropriate log level            13             modify
                                       inappropriate log text              4             modify
                                       incorrect message translation       1             modify

Results

We find 20 reasons for log changes across four categories: block change, log improvement, dependence-driven change, and logging issue. Table 5.2 summarizes the log change reasons. We present below the four categories of reasons for log changes.

Block change. Logging statements are added (or deleted) as a result of the change of the surrounding code blocks. According to our manual analysis, logging statements are added (or deleted) when developers are adding (or deleting) try-catch blocks, adding (or deleting) methods, adding (or deleting) branches (if branches and switch branches), adding (or deleting) if-null branches (if branches checking an abnormal condition), and adding (or deleting) loops (for loops and while loops). For example, the code snippet shown in Figure 5.3 indicates that a logging statement is added to record the error information as part of the newly added try-catch block. (Note: the plus sign (+) or minus sign (-) leading a code line indicates that the code line is added or deleted in that particular commit.)

/* Project: DirectoryServer; Commit: 664015
 * File: directory/apacheds/branches/bigbang/core-integ/src/main/java/
 *       org/apache/directory/server/core/integ/state/NonExistentState.java */
+ try
+ {
+     create( settings );
+ }
+ catch ( NamingException ne )
+ {
+     LOG.error( "Failed to create and start new server instance: " + ne );
+     notifier.testAborted( settings.getDescription(), ne );
+     return;
+ }

Figure 5.3: An example of log changes with the reason category block change.

Log improvement. Logging statements are added, deleted or modified to achieve a better logging practice. Developers change logging statements (e.g., by adding a logging statement which tracks the value of a variable) to improve the debugging capability of the logged information. They also change a logging statement to improve the readability of the logged information; for example, they rephrase a logging statement such that the log message would be easier to understand. Some logging statements are changed to leverage a log message translation method (i.e., using predefined code such as "I18n.ERR_115" to represent a log message). Developers also change logging

statements, for example, by adding a logging statement to record the occurrence of

an event, to improve the logged runtime information. Removing redundant log infor-

mation is another way to improve the logging of a system; the redundant log infor-

mation includes duplicated log information and unnecessary log information. Devel-

opers sometimes improve their logging by migrating from an old logging style (e.g.,

“System.out”) to a more advanced logging library (e.g., Log4j) (i.e., log library migra-

tion (Kabinna et al., 2016a)). Finally, we also find that a logging statement is removed

because of a security issue that is mentioned in the associated issue report. Figure 5.4

shows that a logging statement is added to a method in order to enhance the debugging

capability. The commit message states that the developer added “some extra debug log

entries for the authentication process".

/* Project: HttpClient; Commit: 159615
 * File: /jakarta/commons/proper/httpclient/trunk/src/java/
 *       org/apache/commons/httpclient/HttpMethodDirector.java
 * Commit message: "Some extra debug log entries for the authenticaton process". */
private Credentials promptForProxyCredentials(
        final AuthScheme authScheme,
        final HttpParams params,
        final AuthScope authscope) {
+   LOG.debug("Proxy credentials required");
    /* other operations */
}

Figure 5.4: An example of log changes with the reason category log improvement.

Dependence-driven change. Logging statements are changed because they depend on other code elements (e.g., variables) that are changed by developers. A log change might be driven by the change of a logger (i.e., a class object that is used to invoke a logging method), a variable, a method or a class. We also find that a logging statement is changed to remove its dependence on a different module, in order to remove the coupling between modules. The example in Figure 5.5 shows that a logging statement is modified because the method ("DatanodeID:getName") that it depended on has been replaced by a new method ("toString"). The reason for the log change is recorded in the commit message and the associated issue report14.

14 https://issues.apache.org/jira/browse/HDFS-3144

/* Project: Hadoop; Commit: 1308205
 * File: hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/
 *       org/apache/hadoop/hdfs/DFSClient.java
 * Commit message: "HDFS-3144. Refactor DatanodeID#getName by use".
 * Issue report (HDFS-3144): "DataNodeID#getName is no longer available. The following
 *       are introduced so each context in which we use the "name" has it's own method:
 *       toString - for logging". */
- LOG.debug("write to " + datanodes[j].getName() + ": "
+ LOG.debug("write to " + datanodes[j] + ": "
      + Op.BLOCK_CHECKSUM + ", block=" + block);

Figure 5.5: An example of log changes with the reason category dependence-driven change.

Logging issue. Logging statements are modified because issues (e.g., defects) are discovered in the existing logging statements. Some logging statements are modified due to an inappropriate log level. Some logging statements are modified because the old logging statement has an inappropriate log text (e.g., a typo). We also find a log change which is caused by an incorrect log message translation. In the example shown in Figure 5.6, the level of a logging statement is downgraded from info to debug because the info level caused too much noise, as noted in the commit message.

/* Project: Qpid; Commit: 1298555
 * File: qpid/trunk/qpid/java/client/src/main/java/
 *       org/apache/qpid/client/BasicMessageConsumer.java
 * Commit message: "it reduces noise by downgrading most log messages from info to debug". */
public void close(boolean sendClose) throws JMSException
{
-   if (_logger.isInfoEnabled())
+   if (_logger.isDebugEnabled())
    {
-       _logger.info("Closing consumer:" + debugIdentity());
+       _logger.debug("Closing consumer:" + debugIdentity());
    }
    /* other operations */
}

Figure 5.6: An example of log changes with the reason category logging issue.

The manually identified reasons for log changes assist us in defining measures

to model the drivers for log changes. The log change reasons in the block change

category motivate us to consider measures that capture the changes in the commit

itself. These measures may include the number of changed method declarations, try-

catch, if/if-null, and for/while statements in a commit. The log change reasons from the dependence-driven change category also suggest that we consider measures that capture the changes in the commit itself, since the code elements that a logging statement depends on might get changed in the commit. The log change reasons from the categories of log improvement and logging issue suggest that we should consider measures that capture the current snapshot of the source code, such as log density, number of logs, average log length, average log level, average number of log variables and complexity measures. The log change reasons from the dependence-driven change category also motivate us to consider measures that capture the current snapshot of the source code, as logging statements with higher dependence on other source code (e.g., more

log variables) are more likely to be changed.

We find four categories of log change reasons: block change, log improvement,

dependence-driven change, and logging issue. The log change reasons give us valu-

able insight for defining measures to model the drivers for log changes.

5.3.2 RQ2: How well can we provide just-in-time log change sugges-

tions?

Motivation

We want to provide developers with just-in-time suggestions on whether a log change

(log addition, log deletion, or log modification) is needed when they are changing the code. We need a classifier that can tell whether a code commit should contain log changes. By evaluating the accuracy of the classifier, we can understand whether de- velopers can depend in practice on the suggestions that can be provided by our ap- proach.

Approach

We use random forest classifiers to provide just-in-time suggestions for log changes. A random forest classifier models a binary response variable which measures the likeli- hood of a log change occurring in a particular code commit.

In order to model the drivers for log changes, we extract and calculate a set of measures from three dimensions: change measures, historical measures, and product measures. Table 5.3 presents a list of measures that we collect for each dimension. Table 5.3 also describes our proposed measures and explains our motivation behind each measure. We build classifiers at the granularity of a code commit, thus we calculate all of our proposed measures for every commit during the studied development history.

Table 5.3: Software measures used to model the drivers for log changes, measured for each commit (d: definition, r: rationale).

Change measures:
• class declaration. d: Number of changed class declarations in the commit. r: Developers might add logging statements in a new class so that they can better observe the behavior of the class.
• method declaration. d: Number of changed method declarations in the commit. r: Developers might add logging statements in a new method so that they can better observe the behavior of the method.
• try statement. d: Number of changed try statements in the commit. r: Logging statements often reside inside try blocks; hence logging statements are likely to co-change with try statements.
• catch clause. d: Number of changed catch clauses in the commit. r: Exception catching code is often logged (Yuan et al., 2012b; Fu et al., 2014; Zhu et al., 2015); hence logging statements are likely to co-change with catch clauses.
• throw statement. d: Number of changed throw statements in the commit. r: A logging statement is often inserted right before a throw statement (Fu et al., 2014); hence developers changing a throw statement are likely to change the corresponding logging statement.
• throws clause. d: Number of method definitions with throws clauses (which declare that a method can throw exceptions) changed in the commit. r: Methods that throw exceptions are likely to have logging statements; thus logging statements might co-change with throws clauses.
• if statement. d: Number of changed if statements in the commit. r: Logging statements are usually inside if branches (Fu et al., 2014; Zhu et al., 2015); thus logging statements are likely to co-change with if statements.
• if-null statement. d: Number of changed if-null statements (if statements with a null condition, e.g., "if (outcome == NULL)") in the commit. r: if-null branches are usually corner-case execution paths which are likely to be logged (Fu et al., 2014; Zhu et al., 2015); thus logging statements might co-change with if-null blocks.
• else clause. d: Number of changed else clauses in the commit. r: Logging statements are usually inside if-else branches (Fu et al., 2014; Zhu et al., 2015); thus logging statements are likely to co-change with else clauses.
• for statement. d: Number of changed for statements in the commit. r: Logging statements inside for loops usually record the execution path or status of the for loops; hence these logging statements are likely to co-change with the for statements.
• while statement. d: Number of changed while statements in the commit. r: Logging statements inside while loops usually record the execution path or status of the while loops; hence these logging statements are likely to co-change with the while statements.
• commit type. d: Change type of the commit: Bug/Improvement/New Feature/Task/Subtask/Test. r: The change type characterizes the context of a code change, thus it might affect developers' logging behavior.

Historical measures:
• log churn in history. d: Number of changed logs in the development history of the involved files. r: Files experiencing frequent log changes in the past might expect frequent log changes in the future.
• log churn ratio in history. d: Ratio of the number of changed logging statements to the number of changed lines of code in the development history of the involved files. r: Files experiencing frequent log changes in the past are likely to exhibit frequent log changes in the future.
• log-changing commits in history. d: Number of commits involving log changes in the development history of the involved files. r: Files experiencing frequent log changes in the past are likely to exhibit frequent log changes in the future.
• code churn in history. d: Number of changed lines of code in the development history of the involved files. r: Frequently changed code is problem-prone and thus more likely to be logged.
• commits in history. d: Number of commits in the development history of the involved files. r: Frequently changed code is problem-prone and thus more likely to be logged.

Product measures:
• log number. d: The number of logging statements in the files that are involved in the commit. r: Code snippets with more logging statements are more likely to require frequent log changes.
• log density. d: The density of logging statements in the files that are involved in the commit, calculated by dividing the total number of logging statements by the lines of source code across all the involved files. r: Issues with the existing logging statements might cause log changes; thus code snippets with denser logging statements are more likely to require log changes.
• average log length. d: Average length of the existing logging statements in the changed files. r: Longer logging statements are more likely to require continuous maintenance.
• average log level. d: Average level of the existing logging statements in the changed files, obtained by quantifying the log levels into integers and calculating the average. r: Logs with a lower verbosity level might get changed more often since they are more likely to be used for debugging.
• average log variables. d: Average number of variables in the existing logging statements in the changed files. r: Logs with more variables are likely more coupled with the code, hence they may be changed more often.
• SLOC. d: Number of source lines of code in the changed files. r: Large source files are likely to have more logging statements, thus they get more chances for log changes.
• McCabe complexity. d: McCabe's cyclomatic complexity of the changed files. r: Complex source files are likely to have more logging points, thus they are more likely to exhibit log changes.
• fan-in. d: The number of classes that depend on (i.e., reference) the changed code. r: Classes with a high fan-in, such as library classes, tend to have fewer logging statements.

We describe below each dimension of measures:

• Change measures capture the changes in the commit itself, represented by the

changes of control flow statements (e.g., try statement, if statement), and the type

of a commit (commit type, Bug/Improvement/New Feature/Task/Subtask/Test).

We choose the change measures according to our manual analysis results. As

shown in the results of RQ1, most log changes are accompanied with contex-

tual code changes (i.e., block change and dependence-driven change). For exam-

ple, adding/deleting try-catch block is one of the most frequent reasons for a log

change. The commit type captures the context or purpose of the code change;

we use the “type” field of the JIRA issue report that is linked to the commit.

• Historical measures capture the code changes throughout the development his-

tory (before the considered commit) of the changed files. Based on our intu-

ition, source code files undergoing frequent log changes in the past may have

log changes in the future. Besides, prior research shows that files with high churn

rate are more defect-prone (Nagappan et al., 2006; Nagappan and Ball, 2007), and

developers are likely to add more logs in defect-prone source code files (Shang

et al., 2015).

• Product measures capture the current snapshot of the source code, represented

by the status of logging statements and other source code, of the software sys-

tem just before the considered commit. For example, log number and log den-

sity capture the log-intensiveness of the changed code. Our manual analysis in

RQ1 shows that many log changes are committed to improve existing logging


(i.e., the log improvement reasons) or fix logging issues (i.e., the logging issue rea-

sons). Thus the code changes on log-intensive code are more likely to involve

log changes. In addition, based on our intuition, the appropriateness of logging

statements should be related to their contextual source code. Therefore, we also

calculate several product measures that capture the contextual source code of

logging statements (i.e., SLOC, McCabe complexity and fan-in).

For the if statement measure from the dimension of change measures, we do not

consider if statements that act as logging guards. An example of such if statements is

shown in Figure 5.7.

if (LOGGER.isDebugEnabled()) {
    LOGGER.debug("This is a debug message ");
}

Figure 5.7: An example of a logging guard.
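The following R sketch illustrates how such logging guards could be excluded when counting changed if statements; the diff lines, patterns, and helper names are illustrative assumptions rather than the exact implementation used in this chapter.

# Added/deleted lines of a hypothetical commit (diff-style, leading + or -)
changed_lines <- c(
  '+ if (LOGGER.isDebugEnabled()) {',
  '+     LOGGER.debug("This is a debug message");',
  '+ }',
  '+ if (response == null) {',
  '+     LOG.error("Could not contact RM");',
  '+ }'
)

is_if_statement  <- grepl("^[+-]\\s*if\\s*\\(", changed_lines)
is_logging_guard <- grepl("is(Trace|Debug|Info|Warn|Error)Enabled\\s*\\(\\s*\\)",
                          changed_lines)

# The "if statement" change measure ignores the logging guards
sum(is_if_statement & !is_logging_guard)   # counts 1 changed if statement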

Correlation analysis. Prior to constructing the classifiers for log changes, we check the pairwise correlation between our proposed measures using the Spearman rank cor- relation test (ρ). Specifically, we use the “varclus” function in the “Hmisc” R package to cluster measures based on their Spearman rank correlation. We choose the Spearman rank correlation method because it is robust to non-normally distributed data (McIn-

tosh et al., 2014). In this work, we choose the correlation value 0.8 as the threshold to

remove collinearity. In other words, if the correlation between a pair of measures is

greater than 0.8 (|ρ| > 0.8), we keep one of the two measures in the classifier. We find that the measures that are listed in Table 5.3 present similar patterns of correlation across all four studied projects, thus we drop (i.e., do not consider in the classifier) the same measures for all the studied projects. Dropping the same set of measures for the

projects enables us to build cross-project classifiers as discussed in the “Cross-project

Evaluation” part that follows. We combine the data of the four projects together and

perform correlation analysis on the combo data. Figure 5.8 shows the result of the cor-

relation analysis on the combo data, where the horizontal bridge between each pair

of measures indicates the correlation, and the red dotted line represents the thresh-

old value (0.8 in this case). The results of our correlation analysis on each individual

project can be downloaded at a public link15. To ease the interpretation of the classi-

fier, we try to keep the one that is easier to understand and calculate from each pair of highly-correlated measures. For example, the SLOC and McCabe complexity measures

have a correlation higher than 0.8, we keep the SLOC measure and drop the McCabe complexity measure. Based on the result shown in Figure 5.8, we drop the following

measures: log churn in history, log-changing commits in history, McCabe complexity,

commits in history, log churn ratio in history, try statement and if-null statement, as

they are highly correlated with other measures.
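A minimal R sketch of this clustering step, assuming the per-commit measures are already collected in a numeric data frame named measures (the data frame, its column names, and the list of dropped measures mirror the discussion above but are illustrative):

library(Hmisc)   # provides varclus()

# Cluster the measures by their pairwise Spearman rank correlation
vc <- varclus(as.matrix(measures), similarity = "spearman")
plot(vc)   # the resulting dendrogram corresponds to Figure 5.8

# Keep one measure from every pair whose |rho| exceeds the 0.8 threshold,
# e.g., keep SLOC and drop McCabe complexity
dropped <- c("log churn in history", "log-changing commits in history",
             "McCabe complexity", "commits in history",
             "log churn ratio in history", "try statement", "if-null statement")
measures <- measures[, setdiff(colnames(measures), dropped)]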

Modeling technique. We build random forest classifiers to model the drivers for log

changes. A random forest is a classifier consisting of a collection of decision tree clas-

sifiers and each tree casts a vote for the most popular class for a given input (Breiman,

2001). Random forests construct each tree using a different bootstrap sample (i.e., if

the number of instances in the training set is N , randomly sample N instances with

replacement) of the input data as the training set. The random forest classifier uses a

bootstrap approach internally to get an unbiased evaluation of the performance of a

classifier (Breiman, 2001). In addition, unlike standard decision trees where each de-

cision node is split using the best split among all variables, random forests split each

node using the best among a randomly chosen subset of variables from each of the

constructed trees (Liaw and Wiener, 2002). Random forests are naturally robust against overfitting, and they perform very well in terms of accuracy (Breiman, 2001). Random forests also provide us with a way to do sensitivity analysis on the measures, so that we can understand the most influential factors in our classifiers (Breiman, 2002; Liaw and Wiener, 2002). Besides, a recent study (Ghotra et al., 2015) compares 31 classifiers for software defect prediction and suggests that random forest outperforms other classifiers.

15 http://sailhome.cs.queensu.ca/replication/JITLogSuggestions/correlation.zip

Figure 5.8: Correlation analysis using Spearman hierarchical clustering (for the combo data).
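A minimal R sketch of how such a classifier could be trained and queried with the randomForest package (the commit data frame, the measure_cols vector, and the parameter settings are illustrative assumptions; the exact configuration used in this chapter may differ):

library(randomForest)

# train: software measures of all prior commits; train$log.change is TRUE/FALSE
rf <- randomForest(x = train[, measure_cols],
                   y = as.factor(train$log.change),
                   ntree = 100, importance = TRUE)

# Likelihood of a log change for the current commit (the fraction of trees
# voting "TRUE"); a likelihood above 0.5 suggests that a log change is needed
likelihood <- predict(rf, newdata = current_commit[, measure_cols],
                      type = "prob")[, "TRUE"]
suggest_log_change <- likelihood > 0.5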

Within-project Evaluation. We build a random forest classifier to determine the

likelihood of a log change for each commit based on the development history prior to

that particular commit. Specifically, for each commit, we build a random forest clas-

sifier using all the prior commits as training data, and use the classifier to determine

whether a log change is needed for that particular commit. A classification result can

be “true” (i.e., the likelihood of a log change is higher than 0.5) or “false” (i.e., the likeli-

hood of a log change is lower than 0.5). A “true” classification result suggests the need

of log changes in that code commit, while a “false” classification result suggests that

no log change is needed for that commit. Then, we update the classifier with the new

commit and use the updated classifier to determine the likelihood of a log change for

the following commit, and so on. Evaluating the classification result for a commit can

have one of four outcomes: TP - true positive, FP - false positive, FN - false negative, and TN - true negative. The outcomes are illustrated in the confusion matrix that is shown in Table 5.4.

We use balanced accuracy (BA), following prior research (Zhu et al., 2015), to evaluate the

performance of our within-project evaluation. BA averages the probability of correctly

identifying a log-changing commit and the probability of correctly identifying a non-

log-changing commit. BA is widely used to evaluate the modeling results on imbal-

anced data (Zhu et al., 2015; Cohen et al., 2004; Zhang et al., 2005), because it avoids

over-optimism on imbalanced data. BA is calculated by Equation (5.1):

BA = 1/2 × TP/(TP + FN) + 1/2 × TN/(FP + TN)        (5.1)

We determine the likelihood of a log change for each commit throughout the life- time of a project, using a random forest classifier that is trained from all the prior com- mits of the same project, and get an outcome that is represented by one of TP,FP,FN and TN. We train the first classifier for each project when there are 50 commits in the development history; in other words, we evaluate our first classifier on the 51st com- mit. For each project, we sum up the TP, FP, FN and TN for all commits (except for the first 50 commits) and apply Equation 5.1 to calculate the overall performance of our just-in-time suggestions that is represented by a BA. Moreover, in order to observe the evolution of our classifier’s performance over the lifetime of each project, we use a “sliding window” technique to calculate the BA for each commit. Specifically, the

“sliding window” of a particular commit contains 101 consecutive commits, including

50 preceding commits, the commit itself, and 50 following commits. In order to get the BA for the particular commit, we sum up the TP,FN, TN and FP in the “sliding win- dow” and then calculate the averaged BA using Equation (5.1). Each time we move the sliding window forward by one commit to calculate the BA for the next commit. The

BA for each commit that is calculated from the “sliding window” enables us to exam- ine the stability of the performance of our just-in-time suggestions, and whether the suggestions are accurate when there are only a small number of commits available to train a classifier at the start of a project.

Table 5.4: Confusion matrix for the classification results of a commit.

                          Actual
                          Logging      Non-logging
Classified   Logging      TP           FP
             Non-logging  FN           TN
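The within-project evaluation loop, the BA of Equation (5.1), and the sliding-window BA could be outlined in R as follows; this is a simplified sketch in which commits (one row per commit, ordered by time, with a logical column log.change) and measure_cols are illustrative names:

library(randomForest)

actual    <- logical(0)
predicted <- logical(0)

for (i in 51:nrow(commits)) {          # the first classifier is trained on 50 commits
  train <- commits[1:(i - 1), ]
  test  <- commits[i, , drop = FALSE]
  rf <- randomForest(x = train[, measure_cols],
                     y = as.factor(train$log.change))
  p  <- predict(rf, newdata = test[, measure_cols], type = "prob")[, "TRUE"]
  actual    <- c(actual, test$log.change)
  predicted <- c(predicted, p > 0.5)
}

balanced_accuracy <- function(actual, predicted) {
  tp <- sum(actual & predicted);   fn <- sum(actual & !predicted)
  tn <- sum(!actual & !predicted); fp <- sum(!actual & predicted)
  0.5 * tp / (tp + fn) + 0.5 * tn / (tn + fp)      # Equation (5.1)
}

overall_ba <- balanced_accuracy(actual, predicted)

# Sliding-window BA: 50 preceding outcomes, the outcome itself, and 50 following ones
window_ba <- sapply(51:(length(actual) - 50), function(j)
  balanced_accuracy(actual[(j - 50):(j + 50)], predicted[(j - 50):(j + 50)]))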

Cross-project Evaluation. Since small projects or new projects might not have

enough history data for log change classification, we also evaluate our classifiers’ per-

formance in cross-project classification. We train a classifier using a combo data of

N − 1 projects (i.e., the training projects), and use the classifier to determine the like- lihood of a log change for each of the commits of the remaining one project (i.e., the testing project).

We evaluate the BA of the cross-project classifiers. For each testing project, we sum

up the TP,FN, TN and FP that are computed from determining the likelihood of a log

change of all the commits of the project, and apply Equation (5.1) to calculate the BA.

We also use the area under the ROC curve (AUC) to evaluate the performance of the cross-project classifiers. While the BA measures our classifiers’ accuracy in log change classification, the AUC evaluates how well our classifiers can discriminate log- changing commits and non-log-changing commits. The AUC is the area under the

ROC curve which plots the true positive rate (TP /(TP +FN )) against false positive rate

(FP /(FP +TN )). The AUC ranges between 0 and 1. A high value for the AUC indicates a high discriminative ability of the classifiers; an AUC of 0.5 indicates a performance that is no better than random guessing.

To avoid an imbalance in the number of commits contributed by each project to the training data, we leverage up-sampling to balance the training data such that each project has the same number of entries. Specifically, we keep the largest training project unchanged, while we randomly up-sample the entries of the other training projects with replacement to match the number of entries of the largest training project. In order to reduce the non-determinism caused by the random sampling, we repeat the "up-sampling - training - testing" process 100 times and calculate the average BA and AUC values.
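A condensed R sketch of the cross-project procedure with up-sampling; it reuses the balanced_accuracy() helper from the earlier sketch, and the pROC package is one possible way to compute the AUC rather than the exact tooling used in this chapter:

library(randomForest)
library(pROC)

cross_project_eval <- function(train_projects, test_project, measure_cols) {
  # Keep the largest training project unchanged; up-sample the others
  # (with replacement) to the same number of entries
  n_max <- max(sapply(train_projects, nrow))
  balanced <- do.call(rbind, lapply(train_projects, function(p)
    if (nrow(p) == n_max) p else p[sample(nrow(p), n_max, replace = TRUE), ]))

  rf   <- randomForest(x = balanced[, measure_cols],
                       y = as.factor(balanced$log.change))
  prob <- predict(rf, newdata = test_project[, measure_cols], type = "prob")[, "TRUE"]

  list(ba  = balanced_accuracy(test_project$log.change, prob > 0.5),
       auc = as.numeric(auc(roc(as.numeric(test_project$log.change), prob))))
}

# Repeat the "up-sampling - training - testing" process (e.g., 100 times)
# and average the resulting BA and AUC values.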

Results

Our random forest classifiers can effectively provide just-in-time suggestions for log changes using historical data from the same project. The overall BA values when considering all commits in Hadoop, DirectoryServer, HttpClient, and Qpid are 0.76,

0.83, 0.77, 0.77, respectively (see Table 5.5). Figure 5.9, Figure 5.10, Figure 5.11 and

Figure 5.12 illustrate the within-project classification results using the “sliding win- dow” technique for Hadoop, DirectoryServer, HttpClient, and Qpid, respectively. For each figure, the horizontal axis denotes the commit index while the vertical axis shows the BA value. The black solid curve plots the BA value at each commit, and the red dashed line indicates an average BA over all the commits of the project. These figures show that our random forest classifiers achieve an average BA of 0.76 to 0.82. In other words, given a commit, our classifiers can tell whether this commit should change log- ging statements or not, with an average accuracy of 0.76 to 0.82. Table 5.6 presents the detailed TP, FN, TN, and FP numbers that are used to calculate the overall BAs using

Equation 5.1. Taking the Hadoop project for example, among the 1,541 actual logging commits (TP + FN), 76% (1,166) of them are correctly classified as logging commits, and 24% (375) of them are incorrectly classified as non-logging commits; among the

3,718 actual non-logging commits (TN + FP), 76% (2,822) of them are correctly classi-

fied as non-logging commits, and 24% (896) of them are incorrectly classified as logging commits. On average, training such a random forest classifier for a large system like Hadoop on a workstation (Intel i7 CPU, 8G RAM) takes about 2 seconds, and classifying the log changes for a particular commit takes about 0.02 seconds. For each commit, we would only need to perform the classification step in real-time, while the training can be done offline. These results indicate that our within-project classifiers can effectively provide just-in-time suggestions for log changes.

Figure 5.9: The balanced accuracy of the within-project classifiers for Hadoop (average BA: 0.76; the average BA is first reached at commit 209).

Our classifiers achieve the average balanced accuracy with a small number of commits as training data. We measure when a project accumulates sufficient commit data to train a classifier that reaches the average performance in terms of BA. In Fig- ure 5.9, Figure 5.10, Figure 5.11 and Figure 5.12, we have marked the commit number where each classifier reaches its average BA for the first time. We find that the within- project classifiers for Hadoop, DirectoryServer, HttpClient, and Qpid reach their aver- age BAs when the projects got 209, 193, 211 and 155 commits, respectively. The results indicate that the classifiers can learn an average classification power after a relatively small number of commits.

The performance of our classifiers fluctuates over time. As shown in Figure 5.9 through 5.12, the performance (in terms of the BA) of the within-project classifiers

fluctuates over time, and the fluctuation does not follow any clear trend. These results might be explained by the assumption that developers follow different logging practices at different development stages. For example, developers might be less focused on logging at the beginning of a release cycle and might pay more attention to logging when a product is approaching its release (and final testing is being performed on it).

Figure 5.10: The balanced accuracy of the within-project classifiers for DirectoryServer (average BA: 0.82; the average BA is first reached at commit 193).

Figure 5.11: The balanced accuracy of the within-project classifiers for HttpClient (average BA: 0.77; the average BA is first reached at commit 211).

Table 5.5 shows the average BA, the 5th percentile BA and the 95th percentile BA for the within-project classifiers over time. Only 5% of the commits get a BA smaller than 0.68, 0.71, 0.66 and 0.68 for Hadoop, DirectoryServer, HttpClient, and Qpid, respectively, and only 5% of the commits get a BA larger than 0.82, 0.92, 0.83 and 0.88, respectively.

Figure 5.12: The balanced accuracy of the within-project classifiers for Qpid (average BA: 0.77; the average BA is first reached at commit 155).

Table 5.5: The BA results for the within-project evaluation (the average, 5% and 95% BAs are computed over the sliding windows).

Project            Overall BA   Average BA   5% BA   95% BA
Hadoop             0.76         0.76         0.68    0.82
DirectoryServer    0.83         0.82         0.71    0.92
HttpClient         0.77         0.77         0.66    0.83
Qpid               0.77         0.77         0.68    0.88

Table 5.6: The TP, FN, TN, FP results for the within-project evaluation.

Project            TP      FN    TN      FP
Hadoop             1,166   375   2,822   896
DirectoryServer    907     167   2,998   721
HttpClient         193     50    490     161
Qpid               664     196   2,006   575
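As a quick arithmetic check of Equation (5.1) against Table 5.6, the Hadoop counts give BA = 1/2 × 1,166/(1,166 + 375) + 1/2 × 2,822/(2,822 + 896) ≈ 1/2 × 0.757 + 1/2 × 0.759 ≈ 0.76, which matches the overall BA reported for Hadoop in Table 5.5.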

Our random forest classifiers can effectively provide just-in-time suggestions for log changes using the development history of other projects. Table 5.7 lists the per- formance of the cross-project classifiers, expressed by the BA and the AUC measures.

Each row of the table shows the performance of the classifier that uses the specified

project as testing data and all the other projects as training data. The cross-project classifiers reach a BA of 0.76 to 0.80, indicating that the cross-project classifiers can effectively determine the likelihood of a log change for each commit of a project with little development history.

Table 5.7: The BA and AUC results for the cross-project evaluation.

Project            BA     AUC
Hadoop             0.76   0.84
DirectoryServer    0.80   0.88
HttpClient         0.79   0.87
Qpid               0.78   0.86

Our random forest classifiers can effectively discriminate log-changing commits and non-log-changing commits. As shown in Table 5.7, the cross-project classifiers achieve an AUC of 0.84 to 0.88. The high AUC values indicate that the classifiers per- form much better than random guessing in discriminating the log-changing commits and non-log-changing commits.

Cross-project classification and within-project classification achieve similar performance. The within-project classifiers achieve an average BA of 0.76, 0.82,

0.77, 0.77 for Hadoop, DirectoryServer, HttpClient, and Qpid, respectively; while the cross-project classifiers reach a BA of 0.76, 0.80, 0.79 and 0.78 for these four projects, respectively. These results show that developers of a new software project that does not have a large amount of development history can leverage classifiers that are built from other projects to provide just-in-time log change suggestions.

Our random forest classifiers can effectively provide just-in-time suggestions for

log changes, with a balanced accuracy of 0.76 to 0.82 in a within-project evaluation

and a balanced accuracy of 0.76 to 0.80 in a cross-project evaluation. Developers

of the studied systems can leverage such classifiers to guide their log changes in

practice.

5.3.3 RQ3: What are the influential factors that explain log changes?

Motivation

In order to quantitatively understand the reasons for log changes, we analyze the random forest classifiers to find out the most influential factors that are associated with log changes. There are three types of log changes: log additions, log deletions, and log modifications. The influential factors for log additions, log deletions and log modifications may be different. Therefore, we analyze the influence of factors for log changes, log additions, and log modifications separately. However, our manual analysis in RQ1 shows that the deletion of a logging statement is usually associated with the deletion of its enclosing code block (see Table 5.2). Therefore, we do not analyze the factors that influence log deletions.

Approach

Bootstrap analysis. In order to study the influential factors that affect log changes, we use the bootstrap method to repeatedly sample training data and build a large num- ber of random forest classifiers, so as to statistically analyze the influence of the factors.

Bootstrap (Efron, 1979) is a general approach to infer the relationship between a sample and the overall population, by resampling the sample data with replacement and analyzing the relationship between the resampled data and the original sample. The

bootstrap analysis is implemented in the following steps:

• Step 1. From the original dataset with N instances, we choose a random boot-

strap sample of N instances with replacement.

• Step 2. We build a random forest classifier using the bootstrap sample.

• Step 3. We collect the influence of each factor in the classifier.

• Step 4. We repeat the above steps 1,000 times.

By repeatedly using the bootstrap samples to analyze the influence of the factors, we

can avoid the bias that might be caused by a single round of modeling analysis.
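The bootstrap procedure could be sketched in R as follows, collecting in each iteration the permutation importance of every factor (the importance computation is discussed in the next paragraph; the commits data frame, measure_cols, and the settings are illustrative assumptions):

library(randomForest)

set.seed(1)
n_boot    <- 1000
influence <- matrix(NA, nrow = n_boot, ncol = length(measure_cols),
                    dimnames = list(NULL, measure_cols))

for (b in 1:n_boot) {
  # Step 1: a bootstrap sample of N commits, drawn with replacement
  boot <- commits[sample(nrow(commits), replace = TRUE), ]

  # Step 2: build a random forest classifier on the bootstrap sample
  rf <- randomForest(x = boot[, measure_cols],
                     y = as.factor(boot$log.change),
                     importance = TRUE)

  # Step 3: collect the influence (mean decrease in accuracy) of each factor
  influence[b, ] <- importance(rf, type = 1)[measure_cols, 1]
}

# Step 4 (after 1,000 repetitions): average influence of each factor
sort(colMeans(influence), decreasing = TRUE)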

Variable influence in Random Forest. The random forest classifier evaluates the influence of each factor by permuting the values of the corresponding measure of that factor while keeping the values of the other factors unchanged in the testing data (the so-called “OOB” data) (Breiman, 2001). The classifier measures the impact of such a

permutation on the classification error rate (Breiman, 2002; Liaw and Wiener, 2002). In

each round of the 1,000 bootstraps, we use the “importance” function in the R package

“randomForest” to evaluate the influence of the factors.

Scott-Knott clustering. In this step, we compare the average influence of the fac-

tors from the 1,000 bootstrap iterations. However, the differences among the influence

of some factors might actually be due to random variability. Thus we need to partition

all the factors into statistically homogeneous groups so that the influence means of the

factors within the same group are not significantly different (i.e., p-value ≥ 0.05). The

Scott-Knott (SK) algorithm is a hierarchical clustering approach that can partition the

results into distinct homogeneous groups by comparing their means (Scott and Knott,

1974; Jelihovschi et al., 2014). The SK algorithm hierarchically divides the factors into

groups and uses the likelihood ratio test to judge the significance of difference among

the groups. The SK method generates statistically distinct groups of factors, i.e., each

two groups of factors have a p-value < 0.05 in a likelihood ratio test of their influence values.

In this work, we use an enhanced SK approach (Tantithamthavorn et al., 2017), which considers the effect size in addition to the statistical significance of the differ- ence between groups, to divide the factors into distinct groups according to their in-

fluence in the random forest classifiers.
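One way to obtain such groups is the ScottKnottESD R package, which implements the enhanced Scott-Knott test of Tantithamthavorn et al. (2017); the short sketch below assumes the bootstrap influence matrix from the earlier sketch and is illustrative rather than the exact script used in this chapter:

library(ScottKnottESD)

# influence: the n_boot x n_measures matrix of bootstrap importance scores.
# sk_esd() partitions the factors into statistically distinct groups,
# considering both statistical significance and effect size.
sk <- sk_esd(as.data.frame(influence))
sk$groups   # the Scott-Knott ESD group (rank) of each factor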

We repeat our approach (bootstrap analysis, variable influence and Scott-Knott clustering) for the log addition classifier and the log modification classifier to study the most influential factors for log additions and log modifications respectively.

We also compare the influential factors with the log change reasons that we ob- served in our manual analysis step in RQ1. The connections between the two outcomes help us better understand the reasons behind developers’ decision of changing logs.

Results

Change measures and product measures are the most influential factors for log

changes. Table 5.8, Table 5.9 and Table 5.10 present the influence values of the 10 most influential predictor variables for the log change classifier, the log addition classifier and the log modification classifier, respectively. For each studied project, we measure the mean value of each factor's influence. The factors are divided into statistically distinct groups by a Scott-Knott test on the influence values. Overall, the most influential factors in these classifiers include the if statement (with a median group ranking of 2), catch clause (with a median group ranking of 3) and method

declaration (with a median group ranking of 5) measures from the dimension of change measures, as well as the log density (with a median group ranking of 2), log number (with a median group ranking of 3.5), average log variables (with a median group ranking of 6), average log length (with a median group ranking of 6.5) and average log level (with a median group ranking of 6.5) measures from the dimension of product measures.

Table 5.8: The influence mean of the top 10 factors (measures) for the log change classifier, divided into distinct homogeneous groups by Scott-Knott clustering (the group number is given in parentheses, followed by the factor and its influence mean).

Hadoop: (1) log density 0.149; (2) if statement 0.135; (3) catch clause 0.122; (4) log number 0.107; (5) method declaration 0.072; (6) average log length 0.069; (7) average log variables 0.068; (8) average log level 0.066; (9) SLOC 0.062; (10) else clause 0.061.
Directory Server: (1) catch clause 0.169, if statement 0.169; (2) log density 0.134; (3) throw statement 0.125; (4) method declaration 0.085; (5) log number 0.080; (6) average log variables 0.076; (7) SLOC 0.072; (8) else clause 0.068; (9) average log length 0.065.
HttpClient: (1) if statement 0.119; (2) log density 0.081, log number 0.080; (3) average log length 0.074; (4) average log level 0.072; (5) average log variables 0.066; (6) method declaration 0.060; (7) catch clause 0.057; (8) code churn in history 0.052, SLOC 0.051.
Qpid: (1) log density 0.155; (2) catch clause 0.125; (3) method declaration 0.095, log number 0.095; (4) if statement 0.094; (5) average log level 0.082; (6) average log variables 0.073; (7) else clause 0.069; (8) average log length 0.064; (9) SLOC 0.057.

The strong influence of the measures from the change measures dimension indicates that log changes are highly associated with other code changes, which is in accordance with the fact that the block change reasons are the most frequent reasons that we observed in our manual analysis. In particular, the influence of the catch clause measure suggests a strong association between exception handling and logging, which matches the adding/deleting try-catch block reason that we observed in our manual analysis. This result also quantitatively supports the best-practice recommendation that exception-handling code should log the information that is associated with the exception being handled (Apache-Commons, 2016; Microsoft-MSDN, 2016). The strong influence of the if statement and if-null statement measures (if-null statement has a high correlation with if statement) shows that developers tend to change logging statements while changing conditional branches, which corresponds to the adding/deleting branch and adding/deleting if-null branch reasons that we observed in RQ1. As another influential factor from the change measures family, the method declaration measure suggests that a change to a method declaration is a strong indicator of log changes. Again, this result is consistent with the manually detected reason adding/deleting method.

Table 5.9: The influence mean of the top 10 factors (measures) for the log addition classifier, divided into distinct homogeneous groups by Scott-Knott clustering.

Hadoop
  Group  Factor                  Influence mean
  1      catch clause            0.152
  2      if statement            0.145
  3      log density             0.141
  4      log number              0.102
  5      else clause             0.092
  6      average log level       0.079
  7      method declaration      0.070
  8      SLOC                    0.067
  9      average log length      0.065
  10     fan-in                  0.064

Directory Server
  Group  Factor                  Influence mean
  1      catch clause            0.204
  2      if statement            0.182
  3      throw statement         0.130
  4      log density             0.124
  5      SLOC                    0.096
  6      else clause             0.087
  7      method declaration      0.086
  8      average log length      0.081
  9      log number              0.070
  10     fan-in                  0.066

HttpClient
  Group  Factor                  Influence mean
  1      if statement            0.121
  2      log density             0.091
  3      catch clause            0.085
  4      throw statement         0.081
         log number              0.080
  5      average log level       0.071
  6      average log variables   0.066
         average log length      0.064
  7      throws clause           0.059
  8      method declaration      0.055

Qpid
  Group  Factor                  Influence mean
  1      catch clause            0.162
  2      log density             0.145
  3      if statement            0.103
  4      method declaration      0.091
  5      log number              0.088
  6      average log variables   0.082
  7      average log level       0.073
  8      else clause             0.071
  9      fan-in                  0.065
  10     class declaration       0.064

Table 5.10: The influence mean of the top 10 factors (measures) for the log modification classifier, divided into distinct homogeneous groups by Scott-Knott clustering.

Hadoop
  Group  Factor                  Influence mean
  1      log density             0.191
  2      log number              0.160
  3      if statement            0.142
  4      average log variables   0.106
  5      method declaration      0.100
  6      average log length      0.097
  7      average log level       0.089
         fan-in                  0.088
  8      SLOC                    0.084
  9      code churn in history   0.071

Directory Server
  Group  Factor                  Influence mean
  1      log number              0.177
  2      log density             0.168
  3      average log length      0.155
  4      if statement            0.117
  5      method declaration      0.109
  6      average log variables   0.098
  7      SLOC                    0.092
  8      average log level       0.076
  9      fan-in                  0.071
  10     throw statement         0.063

HttpClient
  Group  Factor                  Influence mean
  1      log number              0.140
  2      if statement            0.110
  3      log density             0.096
         average log variables   0.096
  4      method declaration      0.093
  5      average log level       0.091
  6      SLOC                    0.076
  7      average log length      0.068
  8      code churn in history   0.058
  9      fan-in                  0.047

Qpid
  Group  Factor                  Influence mean
  1      log density             0.184
  2      log number              0.165
  3      if statement            0.106
  4      average log variables   0.101
         catch clause            0.101
  5      average log level       0.095
  6      average log length      0.090
  7      method declaration      0.087
  8      SLOC                    0.070
  9      fan-in                  0.068

The strong influence of the measures from the product measures dimension implies that the current snapshot of the source code impacts the logging decisions of developers. Specifically, the fact that the log density and log number measures are influential factors indicates that log changes occur more often in code snippets with a higher log density, which agrees with our manually-observed log improvement and logging issue reason categories; the influence of the average log level, average log length and average log variables measures shows that the characteristics of the existing logging statements impact developers' decision on whether to change logs in a commit, which also corresponds to the manually detected reason categories log improvement and logging issue.

Change measures are the most influential measures for the log addition classifier, while product measures are the most influential ones for the log modification classifier. Developers tend to add logging statements when adding contextual code, while the current snapshot of the source code is the best indicator for log modifications. This result agrees with the observations from our manual analysis that the most frequent reasons for log additions are from the block change category, while the reasons for log modifications are from the log improvement, dependence-driven change and logging issue categories. The catch clause (with a median group ranking of 1 in the log addition classifier) and if statement (with a median group ranking of 2 in the log addition classifier) measures from the change measures dimension play the most influential roles in the log addition classifier: developers tend to add logging statements when they are adding an exception-catching (catch) block or dealing with a conditional branch. The log density and log number measures (both with a median group ranking of 1.5 in the log modification classifier) from the product measures dimension are the most influential measures for the log modification classifier, which means that log modifications occur more often in code snippets with a high log density.
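For readers who want a concrete picture of how such an influence ranking can be obtained, the sketch below approximates it in Python with scikit-learn's permutation importance. This is only an illustration, not the thesis' original R-based random forest and Scott-Knott pipeline; the DataFrame commit_metrics and the label column name are hypothetical placeholders.

    # A minimal sketch (not the authors' pipeline): rank candidate measures by
    # their influence on a "does this commit change a log?" classifier.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    def rank_measures(commit_metrics: pd.DataFrame, label_col: str = "log_changed"):
        """commit_metrics: one row per commit, columns = software measures + label."""
        X = commit_metrics.drop(columns=[label_col])
        y = commit_metrics[label_col]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42, stratify=y)

        clf = RandomForestClassifier(n_estimators=100, random_state=42)
        clf.fit(X_train, y_train)

        # Permutation importance on held-out data approximates each measure's
        # influence on the classifier's predictions.
        result = permutation_importance(clf, X_test, y_test,
                                        n_repeats=10, random_state=42)
        ranking = pd.Series(result.importances_mean, index=X.columns)
        return ranking.sort_values(ascending=False)

A clustering step such as Scott-Knott would then group measures whose mean influence is not statistically distinguishable, as reported in Tables 5.8 to 5.10.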

Different projects present different log change practices. The catch clause measure is one of the most influential indicators of a log change in the Hadoop, Directory Server, and Qpid projects, with a group ranking of 3, 1, and 2, respectively; however, the catch clause measure is a less influential indicator of log changes in the HttpClient project (with a group ranking of 7). This might imply that developers of the HttpClient project are less likely to log exception-handling blocks, or it may be because there are significantly fewer changes to catch clauses in the HttpClient project compared to the other three projects. However, the latter inference is unlikely, since 18.5% of the commits of the HttpClient project contain changes to catch clauses, while those proportions are 21.0%, 20.3% and 27.5% for the Hadoop, Directory Server, and Qpid projects, respectively.

For the log addition classifier, the throw statement measure shows much stronger explanatory power in the Directory Server and HttpClient projects (with a group ranking of 3 and 4, respectively) than in the other two projects. This difference might indicate that developers of the Directory Server and HttpClient projects are more likely to add logging statements when they throw an exception, or it may be due to the rarity of changes to throw statements in the Hadoop and Qpid projects. Again, the latter inference is not likely, since the proportions of commits that contain changes to throw statements are 26.5% and 28.6% for the Hadoop and Qpid projects, respectively, and these proportions for the Directory Server and HttpClient projects are 25.8% and 22.8%, respectively.

Code changes in a commit and the current snapshot of the source code are the most influential indicators of a log change in a commit. Change measures are the most influential factors for the log addition classifier, while product measures are the most influential factors for the log modification classifier.

5.4 Discussion

Discussion regarding log-changing-only commits. Our random forest classifiers aim to provide developers with just-in-time log change suggestions when they commit code changes. However, developers sometimes also change logging statements without affecting other source code (we call such commits log-changing-only commits). In such cases, our approach cannot provide just-in-time log change suggestions. As shown in Table 5.11, log-changing-only commits account for 1.2% to 4.2% of all the commits that change logging statements. By manually examining all these log-changing-only commits, we find that a log-changing-only commit occurs when either 1) the developers have not correctly implemented the needed logging statements in the first place, or 2) the requirements of logging have changed afterwards. The example in Figure 5.13 shows that a logging statement was missing in

an exception-handling block, which increased the difficulty of finding the root cause of a reported bug (i.e., QPID-1352). A log-changing-only commit (704187) added a logging statement in the exception-handling block so that "hopefully the next time it shows up we have a bit more info".

The commit shown in Figure 5.14, in contrast, removes two logging statements that were previously inserted for debugging purposes, because the logging requirements have changed: the developers no longer need the debug messages.

Table 5.11 presents the number of log-changing-only commits that add, delete and modify logging statements, respectively. 88 out of 129 (68.2%) log-changing-only commits modify existing logging statements, while 38 (29.5%) of them add new logging statements. Developers are least likely to delete a logging statement without changing other source code: only 24 (18.6%) of the log-changing-only commits delete logging statements.

16 https://issues.apache.org/jira/browse/QPID-1352

/* Project: Qpid; Commit: 704187
 * File: /incubator/qpid/trunk/qpid/java/client/src/main/java/
 *       org/apache/qpid/client/AMQConnection.java
 * Commit message: "log the original exception so we don't lose the stack trace"
 * JIRA issue report: "I've committed a change to log the original error. I can't
 *   reproduce this on my system, so hopefully the next time it shows up we have
 *   a bit more info." */
catch (AMQException e)
{
+   _logger.error("error:", e);
    JMSException jmse = new JMSException("Error closing connection: " + e);
    jmse.setLinkedException(e);
    throw jmse;
}

Figure 5.13: An example of adding a missing logging statement.

/* Project: Qpid; Commit: 1230013
 * File: /qpid/trunk/qpid/java/broker/src/main/java/
 *       org/apache/qpid/server/subscription/DefinedGroupMessageGroupManager.java
 * Commit message: "Remove debugging log statements" */
if(newState == QueueEntry.State.ACQUIRED)
{
-   _logger.debug("Adding to " + _group);
    _group.add();
}
else if(oldState == QueueEntry.State.ACQUIRED)
{
-   _logger.debug("Subtracting from " + _group);
    _group.subtract();
}

Figure 5.14: An example of removing logging statements due to changed logging requirements.

Table 5.11: Number of log-changing-only commits.

  Project        Log-changing    Log-changing-only commits
                 commits         Change logs    Add logs    Del. logs    Mod. logs
  Hadoop         1621            68 (4.2%)      20          9            49
  Directory S.   1130            32 (2.8%)      7           6            24
  HttpClient     252             3 (1.2%)       0           2            1
  Qpid           908             26 (2.9%)      11          7            14
  Total          3911            129 (3.3%)     38          24           88

If developers forget to add, delete or modify a logging statement, leading to a log-changing-only commit afterwards, our approach would help them avoid forgetting to change logs in the first place. If existing logging statements need to be fixed or improved, our approach cannot provide such suggestions without the context of other code changes. In the studied projects, however, log-changing-only commits represent only 1.2% to 4.2% of the commits that change logging statements.

5.5 Threats to Validity

5.5.1 External Validity

The external threat to validity is concerned with the generalization of the results. In our work, we investigated four open source projects that are of different domains and sizes. However, since other software systems may follow different logging practices, the results may not generalize to other systems. Further, we only analyze Java source code in this study, thus the results may not generalize to systems that are developed in non-Java languages.

5.5.2 Internal Validity

The manual analysis for log change reasons is subjective by definition, and it is very difficult, if not impossible, to ensure the correctness of all the inferred log change reasons. We classified the log change reasons into four categories; however, there may be different categorizations. Nevertheless, there is a strong agreement between the results of our manual and automated analysis.

The random forest modeling results present the relation between log changes and a set of software measures. The relation does not represent the causal effects of these measures on log changes. In order to learn log change reasons from the modeling results, we link the influential factors in the random forest classifiers back to the manually detected reasons while analyzing the important factors in these classifiers. Future studies should conduct longitudinal developer studies or interviews to further understand the rationale for log changes.

In this study we only analyze Java source code. However, the Qpid project is developed in multiple languages including Java, C++, C#, Perl, Python and Ruby. Although a large percentage of Qpid code (2,995 Java files out of a total of 4,757 source code files) is developed in Java, the results still cannot fully represent its log change practices.

In this chapter, we learn from developers' past logging practices and leverage the learned knowledge to provide suggestions for future log changes. Our study is based on the assumption that these projects' logging practices are appropriate and are good practices that future changes should follow. However, the logging practices in these projects may not always be appropriate. In order to avoid learning bad practices, we chose several successful open source projects that follow a strict code review process.

In this chapter, we study the development of logging statements in the source code. However, we do not consider whether these logging statements would actually output log messages during system execution (i.e., the dynamic impact). Whether a logging statement outputs log messages depends on various dynamic factors, such as: 1) whether the path of the logging statement is executed; and 2) whether the level of the logging statement is turned on to print messages. Extending our study by leveraging dynamic information is a promising avenue for future work.

In the results of RQ3, we analyze the influential factors that impact log changes. The explanations are inferred based on the modeling results and our manual exploration of log change reasons. However, they are not necessarily the actual causes that lead to log changes; instead, they are possible explanations of the logging practices of the studied projects.

5.5.3 Construct Validity

The construct threat is concerned with how we identify log changes. We identify log changes using a set of predefined regular expressions. The regular expressions may not identify all the log changes. For example, developers may define their own logging functions which are difficult to track. However, our approach can detect all the logging statements that leverage the standard logging libraries with a precision of 100%. Future studies might consider using a static analysis approach. Nevertheless, a manual verification of the non-standard (i.e., defined by developers themselves) logging functions will always be needed since it is impossible to automatically determine whether a function is a logging wrapper function.

We identify a log modification by calculating the Levenshtein distance ratio between a pair of added and deleted logging statements. Two logging statements are considered similar if the Levenshtein distance ratio between them is larger than a specified threshold (L-threshold). In this chapter, we choose an L-threshold value of 0.5 to identify a log modification. However, this approach is not guaranteed to identify all log modifications. In order to address this threat, we perform a sensitivity analysis to measure the impact of the L-threshold value on the identification of log modifications for all the log changes in the Hadoop project. Specifically, we change the L-threshold to 0.4 and 0.6 and examine the difference in the identification results for these two thresholds. Table 5.12 shows summary statistics of the identification results using the 0.4, 0.5 and 0.6 L-thresholds. We identify 1,861 log modifications using the 0.5 threshold, while we identify 1,932 (+3.8%) and 1,788 (−3.9%) modifications for the 0.4 and 0.6 L-thresholds, respectively. The results show that using a different L-threshold does not lead to a large change in the number of identified log modifications.
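For illustration, the Python sketch below shows one common way such a similarity ratio can be computed; it is our own example, not the thesis' implementation, and the exact ratio definition used in the study is assumed rather than quoted.

    def levenshtein_distance(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        if len(a) < len(b):
            a, b = b, a
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                insert = current[j - 1] + 1
                delete = previous[j] + 1
                substitute = previous[j - 1] + (ca != cb)
                current.append(min(insert, delete, substitute))
            previous = current
        return previous[-1]

    def similarity_ratio(added: str, deleted: str) -> float:
        """1.0 means identical statements, 0.0 means completely different."""
        if not added and not deleted:
            return 1.0
        return 1.0 - levenshtein_distance(added, deleted) / max(len(added), len(deleted))

    # A pair is treated as a log modification when the ratio exceeds the L-threshold.
    L_THRESHOLD = 0.5
    def is_modification(added: str, deleted: str) -> bool:
        return similarity_ratio(added, deleted) > L_THRESHOLD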

We examine the number of added logging statements that are classified differently (i.e., classified as a log addition or as part of a log modification) when using different L-thresholds. We find that only 248 out of 7,215 (3.4%) added logging statements are classified differently using different L-thresholds. We manually examine these 248 added logging statements and identify which L-threshold can accurately classify them. We find that using 0.5 as a threshold has the highest accuracy. In particular, 106 out of the 248 (42.7%) added logging statements that are classified differently are correctly classified using the 0.4 threshold; 162 (65.3%) of the added logging statements are correctly classified using the 0.5 threshold; and 149 (60.1%) of the added logging statements are correctly classified using the 0.6 threshold.

Table 5.12: L-threshold's impact on the identification of log modifications (for the Hadoop project). The values in brackets are the percentages of difference when compared with the 0.5 L-threshold.

  L-threshold    Log addition     Log deletion     Log modification
  0.4            5,283 (−1.3%)    2,217 (−3.1%)    1,932 (+3.8%)
  0.5            5,354            2,288            1,861
  0.6            5,427 (+1.4%)    2,361 (+3.2%)    1,788 (−3.9%)

In this chapter we choose 25 software measures (as listed in Table 5.3) based on our manual analysis results and our intuition. However, there may be other measures, such as OO measures (D’Ambros et al., 2012), that can assist in suggesting log changes. Some feature learning techniques may also help improve the performance of our classifiers.

As a first step of suggesting proper log changing practices, we aim to show the benefit of leveraging historical information for providing log change suggestions in the future.

We expect more measures or feature learning techniques to be proposed in follow-up research in order to provide more accurate classifiers to suggest log changes.

This chapter proposes an approach that can provide developers with suggestions on whether to change a logging statement when they commit code changes. Future work may include real developers’ feedback to better evaluate and further improve the usefulness of our approach.

5.6 Chapter Summary

In this work, we leverage machine learning classifiers to provide just-in-time suggestions for changing logs when developers commit code changes. We first manually investigate a statistically representative random sample of log changes from four open source projects (Hadoop, Directory Server, Commons HttpClient, and Qpid) in order to understand the reasons behind log changes. Based on the results of our manual analysis and our experiences, we derive measures as input into random forest classifiers to model the drivers of log changes. Our experimental results show that the random forest classifiers can accurately provide just-in-time log change suggestions in both within-project and cross-project evaluations. Finally, we study which measures play influential roles in the models and thereby influence log changing practices the most. Some of the key

findings of our study are as follows:

• Log change reasons can be grouped along four categories: block change, log improvement, dependence-driven change, and logging issue.

• Our random forest classifiers can provide accurate just-in-time suggestions for log changes. The classifiers trained from historical data of the same project achieve a balanced accuracy of 0.76 to 0.82; the classifiers trained from other projects reach a balanced accuracy of 0.76 to 0.80 and an AUC of 0.84 to 0.88.

• Code changes in a commit and the current snapshot of the source code are the most influential factors for determining the likelihood of a log change in a commit.

Our work provides insights into the reasons why developers change (add, modify or delete) logging statements in their code. These reasons, together with the logging concerns derived in Chapter 3, can help developers and researchers better understand current logging practices. Developers can also leverage our automated approach to guide their log-changing practices.

CHAPTER 6

Automated Suggestions for Choosing Log Levels

In Chapter 4 and Chapter 5, we propose automated approaches to provide developers with suggestions for where to log and for log changes. As we observe in Chapter 3, developers often have difficulties when determining the appropriate levels for their logging statements. In this chapter, we propose an approach to help developers determine the appropriate log level when they add a new logging statement. We analyze the development history of four open source projects, and leverage ordinal regression models to automatically suggest the most appropriate level for each newly-added logging statement. First, we find that our ordinal regression model can accurately suggest the levels of logging statements with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.

An earlier version of this chapter is published in the Empirical Software Engineering Journal (EMSE) (Li et al., 2017a).


6.1 Introduction

Logs are widely used by software developers to record valuable run-time information about software systems. Logs are produced by logging statements that developers insert into the source code. A logging statement, as shown below, typically specifies a log level (e.g., debug/info/warn/error/fatal), a static text and one or more variables.

logger.error(“static text” + variable);

However, appropriate logging is difficult to achieve in practice. Both logging too little and logging too much are undesirable (Fu et al., 2014; Chapter 3). Logging too little may result in the lack of runtime information that is crucial for understanding software systems and diagnosing field issues (Yuan et al., 2010; Chapter 3). On the other hand, logging too much may lead to system runtime overhead and cost software practitioners' effort to maintain these logging statements (Fu et al., 2014; Chapter 3). Too many logs may contain noisy information that becomes a burden for developers during failure diagnosis (Chapter 3).

The mechanism of "log levels" allows developers and users to specify the appropriate amount of logs to print during the execution of the software. Using log levels, developers and users can enable the printing of logs for critical events (e.g., errors), while suppressing logs for less critical events (e.g., bookkeeping events) (Gülcü and Stark, 2003). Log levels are beneficial for both developers and users to trade off the rich information in logs with their associated overhead. Common logging libraries such as Apache Log4j, Apache Commons Logging and SLF4J typically support six log levels, including trace, debug, info, warn, error, and fatal. The log levels are ordered by the verbosity level of a logged event: "trace" is the most verbose level and "fatal" is the least verbose level. Users can control the verbosity level of logging statements to be printed out during execution. For example, if a user sets the verbosity level to be printed at the "warn" level, it means that only the logging statements with the "warn" level or with a log level that is less verbose than "warn" ("error" and "fatal") would be printed out.

1 http://logging.apache.org/log4j/2.x
2 http://commons.apache.org/proper/commons-logging
3 http://www.slf4j.org
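This level-filtering mechanism is not specific to the Java logging libraries above. As an added illustration (an analogy of ours, not part of the studied systems), the snippet below reproduces the same behavior with Python's standard logging module: once the threshold is set to WARNING, the more verbose debug and info messages are suppressed.

    import logging

    logging.basicConfig(level=logging.WARNING)  # verbosity threshold: "warn" and less verbose
    log = logging.getLogger("demo")

    log.debug("most verbose: suppressed")       # not printed
    log.info("bookkeeping event: suppressed")   # not printed
    log.warning("potential problem: printed")
    log.error("error condition: printed")
    log.critical("most severe: printed")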

As we observe in Chapter 3, developers often have difficulties when determining the appropriate levels for their logging statements. Prior work also finds that developers spend much effort on adjusting the levels of their logging statements (Yuan et al., 2012b). Oliner et al. (2012) explain this issue by arguing that developers rarely have complete knowledge of how the code will ultimately be used. For example, JIRA issues HADOOP-10274 and HADOOP-10015 are both about an inappropriate choice of log level. The logging statement was initially added with an error level. However, the log level of the logging statement was later changed to the warn level (see the code patch in Figure 6.1), as it was argued that "the error may not really be an error if client code can handle it" (HADOOP-10274); the log level of the same logging statement was finally changed to the debug level (see the patch in Figure 6.2) after active discussions among the stakeholders (HADOOP-10015). The discussion involved eight people to decide on the most appropriate log level of the logging statement and to make the code changes. In addition, as detailed in the "Discussion" section (see Section 6.5), we observe 491 logging statements in the studied projects that experienced at least one subsequent log level change after their initial commits. These observations indicate that developers do maintain and update log levels over the lifetime of a project.

4 https://issues.apache.org/jira/browse/HADOOP-10274
5 https://issues.apache.org/jira/browse/HADOOP-10015

} catch (PrivilegedActionException pae) {
    Throwable cause = pae.getCause();
-   LOG.error("PriviledgedActionException as:"+this+" cause:"+cause);
+   LOG.warn("PriviledgedActionException as:"+this+" cause:"+cause);

Figure 6.1: Patch for JIRA issue HADOOP-10274 (svn commit number: 1561934).

} catch (PrivilegedActionException pae) {
    Throwable cause = pae.getCause();
-   LOG.warn("PriviledgedActionException as:"+this+" cause:"+cause);
+   if (LOG.isDebugEnabled()) {
+     LOG.debug("PrivilegedActionException as:" + this + " cause:" + cause);
+   }

Figure 6.2: Patch for JIRA issue HADOOP-10015 (svn commit number: 1580977).

To the best of our knowledge, there exists no prior research regarding log level guidelines. Yuan et al. (2012b) build a simple log level checker to detect inconsistent log levels. Their checker is based on the assumption that if the logging code within two similar code snippets has inconsistent log levels, at least one of them is likely to be incorrect. In other words, the checker only detects inconsistent levels but does not suggest the most appropriate log levels.

In this chapter, we propose an automated approach to help developers determine the most appropriate log level when adding a logging statement. Admittedly, it is hard, if not impossible, to evaluate whether the log level of a logging statement is correct, because different projects have different logging requirements. However, we believe that it is a good practice for a single project to follow a consistent approach for setting the log level of its logging statements. In this chapter, we assume that in most cases developers of a project can keep consistent logging practices, and we define the "appropriateness" of a log level as whether the log level is consistent with the common practice of choosing a log level within the project.

Our preliminary study shows that logging statements have a different distribution of log levels across the different containing code blocks, and particularly, in different types of exception handling blocks. Based on our preliminary study and our intuition, we choose a set of software metrics and build ordinal regression models to automatically suggest the most appropriate level for a newly-added logging statement. We leverage ordinal regression models in automated log level prediction because log level has a small number (e.g., six) of categorical values and the relative ordering among these categorical values is important, hence neither a logistic regression model nor a classification model is as appropriate as an ordinal regression model. We also carefully analyze our models to find the important factors for determining the most appropriate log level for a newly-added logging statement. In particular, we aim to address the following two research questions.

RQ1: How well can we model the log levels of logging statements?

Our ordinal regression models for log levels achieve an AUC (the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportion of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76).

RQ2: What are the important factors for determining the log level of a logging statement?

We find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that particular logging statement.

This is the first work to support developers in making informed decisions when determining the appropriate log level for a logging statement. Developers can leverage our models to receive automatic suggestions on the choice of log levels for their newly-added logging statements. Our results also provide insights into the factors that influence developers when determining the appropriate log level for a newly-added logging statement.

Chapter organization. The remainder of the chapter is organized as follows. Section 6.2 describes the studied software projects and our experimental setup. Section 6.3 performs an empirical study on the log level distribution in the studied projects. Section 6.4 explains the approaches that we used to answer the research questions and presents the results of our case study. Section 6.5 discusses cross-project evaluation and log level changes. Section 6.6 discusses threats to the validity of our findings. Finally, Section 6.7 draws conclusions.

6.2 Case Study Setup

This section describes the subject projects and the process that we used to prepare the data for our case study.

6.2.1 Subject Projects

We study how to determine the appropriate log level of a logging statement through a case study on four open source projects: Hadoop, Directory Server, Hama, and Qpid.

We choose these projects as case study projects for the following reasons: 1) All four projects are successful and mature projects with more than six years of development history. 2) They represent different domains, which ensures that our findings are not limited to a particular domain. Hadoop is a distributed computing platform, developed in Java; Directory Server is an embeddable directory server written in Java; Qpid is an instant messaging tool, developed in Java, C++, C#, Perl, Python and Ruby; Hama is a general-purpose computing engine for speeding up computing-intensive jobs, written in Java. 3) The Java source code of these projects makes extensive use of standard Java logging libraries such as the Apache Log4j, SLF4J and Apache Commons Logging libraries, which support six log levels, i.e., from trace (the most verbose) to fatal (the least verbose).

We analyze the log levels of the newly-added logging statements during the development history of the studied projects, considering only the Java source code (excluding the Java test code). We focus our study on the development history of the main branch (trunk) of each project. We use the "svn log" command to retrieve the development history for each project (i.e., the svn commit records). Some revisions import a large number of atomic revisions from a branch into the trunk (a.k.a. merge revisions), and such revisions usually contain a large amount of code changes and log changes. Such merge revisions would introduce noise (Zimmermann et al., 2004; Hassan and Holt, 2004; Hassan, 2008) in our study of the log level of a newly-added logging statement. We therefore unroll each merge revision into the various revisions of which it is composed (using the "use-merge-history" option of the "svn log" command).

Table 6.1 presents the size of these projects in terms of source lines of code (SLOC), the studied development history, the number of added logging statements in the history, and the number of added logging statements that experience a log level change afterwards. The Hadoop project is the largest project with 458K SLOC, while Hama is the smallest project, with an SLOC of 39K. In the studied history, the number of added logging statements within these projects ranges from 1,683 (for Hama) to 5,388 (for Hadoop); 1.2% to 4.6% of the added logging statements experience a log level change eventually.

6 svn log. http://svnbook.red-bean.com/en/1.7/svn.ref.svn.c.log.html

6.2.2 Data Extraction

Figure 6.3 presents an overview of our data extraction and model analysis approaches.

From the version control repository of each subject project, we collect all the revisions during the development history of the subject project. For each revision, we use the "svn diff" command to obtain the code changes in that revision. Then, we use a regular expression to identify the newly-added logging statements in each revision and extract the log level of each logging statement. The regular expression is derived from the format of the logging statements as specified by the used logging libraries. To achieve an accurate model, we remove all newly-added logging statements that experience a log level change afterwards, because the levels of these changed logging statements may have been inappropriate in the first place.
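The exact regular expression is not reproduced in this chapter; the pattern below is a simplified, hypothetical example of the kind of expression that matches calls to the standard Java logging libraries (logger objects named log/LOG/logger and the six standard level methods), together with a helper that scans the added lines of a unified diff.

    import re

    # Hypothetical simplification: match "logger.level(...)"-style calls from
    # Log4j / Commons Logging / SLF4J, e.g. LOG.warn("..."), logger.debug("...").
    LOG_PATTERN = re.compile(
        r"\b(?:log|logger)\s*\.\s*"
        r"(trace|debug|info|warn|error|fatal)\s*\(",
        re.IGNORECASE)

    def added_logging_statements(diff_text: str):
        """Yield (log_level, statement) for each added line in a unified diff."""
        for line in diff_text.splitlines():
            if line.startswith("+") and not line.startswith("+++"):
                match = LOG_PATTERN.search(line)
                if match:
                    yield match.group(1).lower(), line[1:].strip()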

6.3 Preliminary Study

We first perform an empirical study on the usage of log levels in the four studied open source projects.

Table 6.1: Overview of the studied projects. (The table reports, for each of Hadoop, Directory Server, Hama, and Qpid, and in total: the SLOC, the studied development history, the number of added logging statements, and the number of added logging statements that experience a modification of their log level after their introduction.)

Figure 6.3: An overview of our data extraction and analysis approaches (data extraction: code revisions from the version control repository of the studied project, log change identification, added logging statements, log levels, and metrics; analysis: ordinal regression modeling to address RQ1 and RQ2).

6.3.1 Log level distribution in the studied projects

No single log level dominates all other log levels. As shown in Figure 6.4, developers tend to use a variety of log levels. Compared to trace and fatal, the four middle levels (debug, info, warn, and error) are used more frequently. For the Directory Server, Hadoop and Hama projects, more than 95% of the logging statements use one of the four middle levels. For the Qpid project, 86% of the logging statements use one of the four middle levels. As the least verbose log level, the fatal level is the least frequently used level (less than 2%) in the four projects. The reason may be that fatal issues are unlikely to appear in these projects. As the most verbose log level, the trace level is only used in less than 4% of the logging statements in the Directory Server, Hadoop and Hama projects. The low usage of the trace level may be because developers do not typically use logs to trace their software, but rather might use existing tracing tools such as JProfiler. However, the trace level is used in 14% of the logging statements in the Qpid project.

Each project exhibits a varying distribution of log levels. Three of the four studied projects leverage all six log levels (trace, debug, info, warn, error, and fatal), while Directory Server uses only five log levels (no fatal). Directory Server shows frequent usage of the debug and error levels, while the logging statements that are inserted in the Hadoop project are more likely to use the info, debug and warn levels. For both Hama and Qpid, the debug, info and error levels are most frequently used in their logging statements. On the one hand, the different distributions can be explained by the different usage of logs in these open source projects, which are from different domains. For example, if logs are used mainly for bookkeeping purposes, there might be more info logs; debug logs are widely used for debugging purposes; if developers use logs for monitoring, there might be more warn and error logs. On the other hand, such differences might be the result of a lack of standard guidelines for determining log levels, which motivates our work to assist developers in determining the most appropriate log level for their logging statements. Our preliminary analysis highlights that each studied project appears to follow a different pattern in its use of log levels. Hence, we believe that our choice of projects ensures heterogeneity in our studied subjects.

7 http://www.ej-technologies.com/products/jprofiler/overview.html

Figure 6.4: Log level distribution in the added logging statements.

6.3.2 Log level distribution in different blocks

Logging statements have a different distribution of log levels across the different containing blocks. In this chapter, a "block" (or "code block") refers to a block of source code which is treated as a programming unit. For example, a catch block is a block of source code which completes a catch clause. The "containing block" of a logging statement is the smallest (or innermost) block that contains the logging statement. In other words, there is no intermediate block that is contained in the particular block and that contains the particular logging statement. Similarly, when we say a logging statement is "directly inserted" into a block, we mean that there is no other intermediate block that contains the particular logging statement. We use an abstract syntax tree (AST) parser provided by the Eclipse JDT to identify the containing block of each logging statement. We consider seven types of containing blocks that cover more than 99% of all the logging statements inserted in the studied projects: try blocks, catch blocks, if-else (or if for short) blocks, switch-case-default (or switch for short) blocks, for blocks, while or do-while (while for short) blocks, and methods. We do not consider other types of blocks (e.g., finally blocks) because very few logging statements are inserted in the other types of blocks (less than 1% in total).

Figure 6.5 shows the distributions of log levels for the logging statements that are inserted directly in the seven types of blocks. The percentage numbers marked on the stacked bars describe the distribution of log levels that are used in the logging statements that are inserted in each type of block; the number above each stack shows the total number of logging statements that are inserted in each type of block. We find that the top two most-frequently used log levels for each type of block are used in more than 60% of the logging statements that are inserted in that particular type of block. In the Directory Server project, for example, more than 79% of the logging statements that are inserted in each type of block use the top two most-frequently used log levels. However, our findings highlight that the choice of log level is not a simple one that can be easily determined by simply checking the containing block of a logging statement. Instead, determining the appropriate log level requires a more elaborate model, and such a model is likely to vary across projects.

8 https://eclipse.org/jdt/

projects.

Logging statements that are directly inserted in catch blocks tend to use less ver-

bose log levels, while logging statements in try blocks are more likely to use more

verbose log levels. 77% to 91% of the logging statements that are inserted in catch

blocks use the warn, error and even fatal log levels. In contrast, 96% to 100% of the

logging statements inserted in try blocks adopt more verbose log levels (i.e., info, de-

bug and trace). Logging statements in try blocks are triggered during the normal execu-

tion of an application, thus they usually print the normal run-time information of the

application using the more verbose log levels; while logging statements in catch blocks

are only triggered when exceptions occur, hence they often log abnormal conditions

using the less verbose log levels.

Logging statements that are directly inserted in loop blocks (i.e., for and while blocks) and methods are usually associated with more verbose log levels. 87% to 97% of the logging statements that are directly inserted in while blocks, 94% to 100% of the logging statements in for blocks and 86% to 97% of the logging statements in methods choose more verbose log levels (i.e., info, debug and trace). The logging statements in loop blocks might be executed a large number of times, but they may not print logs in field execution when the verbosity level is set at a less verbose level (e.g., warn); in other words, these logging statements will take effect only when the verbosity level is set at a more verbose level (e.g., debug), i.e., when application users need the detailed information from logs. The logging statements directly inserted in methods typically record some expected runtime events, such as startup or shutdown, thus they usually use more verbose log levels such as info or debug. Figure 6.5 also shows that different projects log loops and methods at different log levels. For example, Hadoop tends to log methods at the info level, while Directory Server uses more debug level logging statements in the method blocks. Again, this might be explained by the different usage of logs in these open source projects from different domains.

Figure 6.5: Log level distribution in the added logging statements in different types of blocks.

6.3.3 Log level distribution in catch blocks

A common best practice for exception handling is to log the information associated with the exception (Microsoft-MSDN, 2016). Logging libraries like Log4j even provide special methods for logging exceptions. In our preliminary study, we also find that logging statements present a much higher density in catch blocks than in any other block. However, experts argue that "not all exceptions are errors" (Eberhardt, 2014), and that exceptions are sometimes anticipated or even expected (Zhu et al., 2015). Therefore, blindly assigning the same log level to all logging statements that are within catch blocks may result in inappropriate log levels.

Logging statements that are directly inserted into catch blocks present different distributions of log levels for different types of handled exceptions. Figure 6.6 illustrates the log level distribution of the logging statements inserted in the top 12 types of exception-catching blocks that contain the most logging statements, for the Qpid project. For some types of exception-handling blocks, such as the catch blocks that handle the DatabaseException and the OpenDataException, all the inserted logging statements use the error or warn levels, indicating that these exceptions lead to problems and the developers or users may need to take care of the abnormal condition. On the other hand, 89% of the logging statements inserted in catch blocks dealing with the QpidException choose the debug log level, which implies that the exception is not a serious one and that the code can itself handle the condition, or that the exception is simply used by developers for debugging purposes. For each exception type, the top two most-frequently used log levels cover more than 60% of the logging statements that are inserted in the particular exception handling blocks.

Not all exceptions are logged as warn, error or fatal, which matches experts' knowledge that "not all exceptions are errors" (Zhu et al., 2015; Eberhardt, 2014). For most types of exceptions (10 out of 12, as shown in Figure 6.6), at least a small portion of the logging statements inserted in the handling catch blocks choose more verbose log levels (i.e., info, debug or trace). Therefore, developers should be careful when they insert warn or less verbose level logging statements into exception-handling blocks.

(The exception types shown in Figure 6.6 are: Exception, Throwable, IOException, JMException, JMSException, AMQException, QpidException, SQLException, RuntimeException, DatabaseException, InterruptedException, and OpenDataException.)

Figure 6.6: Log level distribution in the added logging statements in different types of exception-catching blocks (Qpid).

6.4 Case Study Results

In this section, we present the results of our research questions. For each research question, we present the motivation of the research question, the approach that we used to address the research question, and our experimental results.

6.4.1 RQ1: How well can we model the log levels of logging statements?

Motivation

In order to help developers select the appropriate log level for a newly-added logging statement, we build a regression model to predict the appropriate log level using a set of software metrics. Developers can leverage such a model to receive suggestions on the most appropriate log level for a newly-added logging statement or to receive warnings on an inappropriately selected log level.

Approach

In order to model the log levels of the newly-added logging statements, we extract and calculate five dimensions of metrics: logging statement metrics, file metrics, change metrics, historical metrics, and containing block metrics.

• Logging statement metrics measure the characteristics of the newly-added logging statement itself. It is intuitive that the level of a logging statement is highly influenced by the content of the logging statement itself, e.g., the static text.

• Containing block metrics characterize the blocks that contain the newly-added logging statements. The containing block determines the condition under which a logging statement would be triggered, thus it is reasonable to consider the containing block when choosing the log level for a logging statement.

• File metrics measure the characteristics of the file in which the logging statement is added. Logging statements in the same file may share the same purpose of logging or log the same feature. Hence, information derived from the containing file may influence the choice of the appropriate log level.

• Change metrics measure information about the actual code changes associated with the newly-added logging statement. The characteristics of the code changes in a revision might indicate developers' purpose of adding logging statements in that revision, thereby affecting the choice of log levels.

• Historical metrics record the code changes in the containing file in the development history. Stable code might no longer need detailed logging, hence the newly-added logging statements in stable code are more likely to use less verbose log levels (e.g., error, warn). Source code undergoing frequent changes might contain logging statements with more verbose log levels for debugging purposes.

Table 6.2 presents a list of all the metrics that we collected along the five dimensions, describes the definition of each metric, and explains our motivation behind the choice of each metric.

Re-encoding categorical metrics. In order to integrate the containing block type metric (with categorical values) as an independent variable in our regression analysis, we need to convert the categorical metric to a quantitative variable. A categorical vari- able of k categories can be coded into k −1 dummy variables, which contain the com- plete information of the categorical variable. In this chapter we use the weighted effect coding method, since it is most appropriate when the values of a categorical variable have a substantially unbalanced distribution (e.g., the unbalanced distribution of the containing blocks of logging statements, as shown in Figure 6.5)(Aguinis, 2004; Cohen et al., 2013). We use the weighted effect coding method to convert the categorical vari- able containing block type (with seven categories) into six dummy variables: try block, CHAPTER 6. AUTOMATED SUGGESTIONS FOR CHOOSING LOG LEVELS 177

Table 6.2: Software metrics used to model log levels.

Dimension Metric name Definition (d) | Rationale (r) Logging d: The length of the static text of the logging statement. Text length statement r: Longer logging statements are desirable for debugging purposes where detailed log in- metrics formation is needed; however, a too long logging statement might cause noise in scenarios like event monitoring where only less verbose log information is needed. Variable d: The number of variables in the logging statement. number r: Logging more variables is desirable for debugging purposes while too many logged vari- ables might cause noise in scenarios in need of only less verbose log information. d: The tokens that compose the content (static text and variables) of the logging statement, Log tokens1 represented by the frequency of each token in the logging statement. r: The content of a logging statement communicates the logging purpose thereby affecting the log level. Containing Containing d: Number of source lines of code in the containing block. block block SLOC r: The length of code in the containing block of the logging statement indicates the amount metrics of effort that is spent on handling the logged event, which might be associated with the seriousness (i.e., verbosity) level of the logged event. Containing d: The type of the containing block of the logging statement, including seven categories: block type try block, catch block, if block, switch block, for block, while block and method block. (Categorical)2 r: Different types of blocks tend to be logged with different log levels. For example, catch blocks are more likely to be logged with less verbose log levels; logging statements inside try blocks are more likely to have more verbose log levels; and logging statements inside a loop are more likely to have more verbose log levels. Exception d: The exception type of the containing catch block, represented by the average level of type other logging statements that are inserted in the catch blocks that handle the same type of exception. This metric is only valid when a logging statement is enclosed in a catch block. r: Different types of exceptions are logged using different log levels. d: The number of logging statements divided by the number of lines of code in the con- Log density taining file. r: Source code with denser logs tends to record detailed run-time information and have File more verbose logging statements. metrics d: The number of logging statements in the containing file. Log number r: Source code with more logs tend to record detailed run-time information and have more verbose logging statements. Average d: Average length of the static text of the logging statements in the containing file. log length r: The average log length in a file might indicate the overall logging purpose in the file, e.g., having shorter and simpler log text is more likely to be associated with debugging purpose. The overall logging purpose affects the choice of log level for individual logging statements. Average d: Average level of other logging statements in the containing file, obtained by quantifying log level the log levels into integers and calculating the average. r: The level of the added logging statement is likely to be similar with other existing ones in the same file. Average d: Average number of variables in the logging statements in the containing file. 
log variables r: The average number of log variables in a file might indicate the overall logging purpose in the file, e.g., having more and detailed log variables is more likely to be associated with debugging purpose. The overall logging purpose affects the choice of log level for individual logging statements. d: Number of source lines of code in the containing file. SLOC r: Large source code files are often bug-prone (Shihab et al., 2010; D’Ambros et al., 2012), thus they are likely to have more verbose logging statements for debugging purposes. McCabe d: McCabe’s cyclomatic complexity of the containing file. complexity r: Complex source code files are often bug-prone (Shihab et al., 2010; D’Ambros et al., 2012), thus they are likely to have more verbose logging statements for debugging purposes. d: The number of classes that depend on (i.e., reference) the containing class of the logging Fan in statement. r: Classes with a high fan in, such as library classes, are likely to use less verbose logging statements; otherwise these logging statements will generate noise in the dependent code. CHAPTER 6. AUTOMATED SUGGESTIONS FOR CHOOSING LOG LEVELS 178

Dimension Metric name Definition (d) | Rationale (r) d: Number of changed source lines of code in the revision. Change Code churn r: When developers change a large amount of code, they might add more-verbose log in- metrics formation (i.e., for tracing or debugging purposes). d: Number of changed logging statements in the revision. Log churn r: When many logging statements are added in a revision, these loggings statements tend to record detailed run-time information and have more verbose levels. For example, de- velopers are more likely to add a large number of debugging or tracing logging statements rather than a large number of logging statements that record error information. Log churn d: Ratio of the number of changed logging statements to the number of changed lines of ratio code. r: A lower log churn ratio indicates that developers only use logging statements for more important events (i.e., using less verbose log levels), while a high log churn ratio indicates that developers also use logging statements for less important events (i.e., using more ver- bose log levels). Revisions d: Number of revisions in the development history of the containing file. in history r: Frequently-changed code is often bug-prone (Graves et al., 2000; Hassan, 2009; Historical D’Ambros et al., 2012). Such code tends to have more more-verbose level logging state- metrics ments for debugging purpose. Code churn d: The total number of lines of code changed in the development history of the containing in history file. r: Source code that experienced large code churn in history is often bug-prone (Graves et al., 2000; Hassan, 2009; D’Ambros et al., 2012). Such code tends to have more more- verbose level logging statements for debugging prupose. Log churn d: The total number of logs changed in the development history of the containing file. in history r: The log churn in history might reflect the overall logging purposes thereby affecting the choices of log levels. For example, frequently-changed logging statements are more likely to be used by developers for debugging or tracing purposes, and the logging statements that generate less verbose information are expected to be stable. Log churn d: Ratio of the number of changed logging statements to the number of changed lines of ratio in code in the development history of the containing file. history r: The log churn ratio in history might reflect the overall logging purposes thereby affecting the choices of log levels. For example, a high log churn ratio in history might indicate that logging statements are used to record detailed events thus more verbose log levels should be used. d: Number of revisions involving log changes in the development history of the containing Log-changing file. revisions r: The number of log-changing revisions in history might reflect the overall logging pur- in history poses thereby affecting the choices of log levels. For example, a file that experienced many log-changing revisions might indicate that the functionalities in the file is not stable, thus more detailed logging statements might be used for debugging or tracing purposes. 1 Each token actually represents an independent variable (i.e., taken-based variable) in the ordinal regression model, and the value of a token-based variable is the frequency of the token in the logging statements, or zero if the token does not exist in the logging statement. The vast majority of these token-based variables are filtered out in the “backward (step-down) variable selection” step. 
2 The metric is re-encoded into several dummy variables to be used in the ordinal regression model. This section provides a detailed description of the re-encoding approach.

catch block, if block, switch block, for block, and while block, where each of them repre-

sents whether a logging statement is directly inside the corresponding code block; and

we use the “method block” value of the containing block type metric as a “reference

group” (Aguinis, 2004; Cohen et al., 2013). For example, if the value of the containing

block type metric is “try block”, we code the six dummy variables as 1, 0, 0, 0, 0, and

0, respectively; if the value of the containing block type metric is “method block”, we code the six variables as −n_t/n_m, −n_c/n_m, −n_i/n_m, −n_s/n_m, −n_f/n_m and −n_w/n_m,

respectively, where n_m, n_t, n_c, n_i, n_s, n_f and n_w represent the number of instances of

the containing block type metric that have the value of method block, try block, catch

block, if block, switch block, for block and while block, respectively.
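The following R sketch illustrates this re-encoding on a small, hypothetical set of containing block type values; the data and the column names are ours and are only meant to make the coding scheme concrete.

# Hypothetical example of encoding the "containing block type" metric into six
# dummy variables, with "method block" as the reference group.
block.type <- factor(c("method block", "try block", "catch block", "if block",
                       "method block", "for block", "while block", "switch block"))

types  <- c("try block", "catch block", "if block", "switch block", "for block", "while block")
counts <- table(block.type)              # number of instances per block type
n.m    <- counts["method block"]         # size of the reference group

encode.row <- function(value) {
  if (value == "method block") {
    # reference group: -n_x / n_m for each non-reference block type
    -as.numeric(counts[types]) / as.numeric(n.m)
  } else {
    as.numeric(types == value)           # 1 for the matching block type, 0 otherwise
  }
}

dummies <- t(sapply(as.character(block.type), encode.row))
colnames(dummies) <- c("try", "catch", "if", "switch", "for", "while")
dummies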

Interaction between metrics. The exception type metric is only valid when a log-

ging statement is enclosed in a catch block. Therefore, there is a significant interaction

between the exception type metric and the encoded variable catch block. We use the

interaction (i.e., the product) between these two variables in our modeling analysis,

and ignore the two individual variables. We hereafter use the term “catch block” to

represent the interaction.

Correlation analysis. Before constructing our ordinal regression models, we cal-

culate the pairwise correlation between our collected metrics using the Pearson cor-

relation test (r ). Specifically, we use the varclus function (from the R package Hmisc

(Harrell et al., 2014)) to cluster metrics based on their Pearson correlation. In this work, we follow prior work (McIntosh et al., 2014) and choose the correlation value of 0.7 as

the threshold to remove collinear metrics; Kuhn and Johnson (2013) also suggest a

similar choice of the threshold value (0.75). If the correlation between a pair of met-

rics is greater than 0.7 (|r| > 0.7), we only keep one of the two metrics in the model.

Figure 6.7 shows the result of the correlation analysis for the Directory Server project,

where the horizontal bridge between each pair of metrics indicates the correlation, and

the red line represents the threshold value (0.7 in our case). To make the model easy to

interpret, from each group of highly-correlated metrics, we try to keep the one metric

that is more directly associated with logs. For example, the log churn and code churn

metrics have a correlation higher than 0.7, thus we keep the log churn metric and drop

(i.e., do not consider in the model) the code churn metric. Based on the result shown in

Figure 6.7, we drop the following metrics: code churn, SLOC, McCabe complexity, code

churn in history, log-changing revisions in history and revisions in history, due to the

high correlation between them and other metrics. We find that our selected metrics

present similar patterns of correlation across all four studied projects, thus we drop

the same metrics for all the studied projects. Dropping the same set of metrics for the

projects enables us to compare the metric importance of different projects (in “RQ2”)

and perform cross-project evaluation (in the “Discussion” section).
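The clustering and thresholding steps can be sketched with the varclus function mentioned above; the data below are randomly generated stand-ins for the real metrics of Table 6.2 (the column names are ours) and only illustrate how the 0.7 threshold is applied.

library(Hmisc)

# Illustrative, randomly generated metrics; the real metrics are listed in Table 6.2.
set.seed(1)
metrics <- data.frame(SLOC = rpois(200, 500))
metrics$code.churn <- metrics$SLOC * 0.2 + rnorm(200, sd = 2)   # deliberately collinear with SLOC
metrics$log.churn  <- rpois(200, 3)
metrics$fan.in     <- rpois(200, 5)

vc <- varclus(as.matrix(metrics), similarity = "pearson")
plot(vc)   # dendrogram similar in spirit to Figure 6.7

# Flag metric pairs whose absolute Pearson correlation exceeds the 0.7 threshold.
r <- cor(metrics, method = "pearson")
high <- which(abs(r) > 0.7 & upper.tri(r), arr.ind = TRUE)
data.frame(metric.1 = rownames(r)[high[, 1]],
           metric.2 = colnames(r)[high[, 2]],
           correlation = round(r[high], 2))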

Ordinal regression modeling. We build ordinal regression models (McKelvey and

Zavoina, 1975; McCullagh, 1980), to suggest the most appropriate log level for a given

logging statement. The ordinal regression model is an extension of the logistic regres-

sion model; instead of predicting the dichotomous values as what the logistic regres-

sion does, the ordinal regression model is used to predict an ordinal dependent vari-

able, i.e., a variable with categorical values where the relative ordering between differ-

ent values is important. In our case, the ordinal response variable (log level) has six

values, i.e., trace, debug, info, warn, error and fatal, from more verbose levels to less verbose levels. We use the “orm” function from the R package “rms” (Harrell, 2015b).

Figure 6.7: Correlation analysis using Spearman hierarchical clustering (for Directory Server). The red line indicates the threshold (0.7) that is used to remove collinear met- rics.

The outcome of an ordinal regression model is the cumulative probabilities of each or-

dinal value9. Specifically, in this case the ordinal regression model generates the cumulative probability of each log level, including P[level ≥ debug], P[level ≥ info],

P[level ≥ warn], P[level ≥ error] and P[level ≥ fatal]. The list does not include

P[level ≥ trace] because the probability of log levels greater than or equal to trace is always 1.

9http://www.inside-r.org/packages/cran/rms/docs/orm

We calculate the predicted probability of each log level by subtracting adjacent

cumulative probabilities:

• P[level = trace] = 1 − P[level ≥ debug]

• P[level = debug] = P[level ≥ debug] − P[level ≥ info]

• P[level = info] = P[level ≥ info] − P[level ≥ warn]

• P[level = warn] = P[level ≥ warn] − P[level ≥ error]

• P[level = error] = P[level ≥ error] − P[level ≥ fatal]

• P[level = fatal] = P[level ≥ fatal]

We then select the log level with the highest probability as the predicted log level.
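As a small illustration, the following R sketch converts a set of made-up cumulative probabilities for one logging statement into per-level probabilities and selects the suggested level; the numbers are not from the studied projects.

# Cumulative probabilities P[level >= debug], ..., P[level >= fatal] for one
# logging statement (illustrative values only).
cum <- c(debug = 0.85, info = 0.40, warn = 0.15, error = 0.10, fatal = 0.02)

# Individual probabilities obtained by subtracting adjacent cumulative probabilities.
prob <- c(1 - cum["debug"],
          cum["debug"] - cum["info"],
          cum["info"]  - cum["warn"],
          cum["warn"]  - cum["error"],
          cum["error"] - cum["fatal"],
          cum["fatal"])
names(prob) <- c("trace", "debug", "info", "warn", "error", "fatal")

names(prob)[which.max(prob)]   # suggested log level = the most probable one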

Backward (step-down) variable selection. We use the metrics defined in Table 6.2

as the independent (predictor) variables to build the ordinal regression models. How-

ever, not all the independent variables are statistically significant in our models. There-

fore, we use the backward (step-down) variable selection method (Lawless and Singhal,

1978) to determine the statistically significant variables that are included in our final

regression models. The backward selection process starts with using all the variables

as predictor variables in the model. At each step, we remove the variable that is the

least significant in the model. This process continues until all the remaining variables

are significant (i.e., p < 0.05). We choose the backward selection method since prior

research shows that backward selection method usually performs better than the for-

ward selection approach (i.e., adding one statistically significant variable to the model

at a time) (Mantel, 1970). We use the fastbw (Fast Backward Variable Selection) func-

tion from the R package rms (Harrell, 2015b) to perform the backward variable selection process.
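A minimal sketch of the model construction and backward selection is shown below. It uses synthetic data and a handful of illustrative variable names (not the exact variables of our models); it is only meant to show how the orm and fastbw functions are wired together.

library(rms)

# Illustrative data; the real variables come from Table 6.2 after re-encoding.
set.seed(1)
n <- 400
d <- data.frame(
  average.log.level     = runif(n, 1, 6),
  containing.block.SLOC = rpois(n, 20),
  log.tokens            = rpois(n, 3),
  text.length           = rpois(n, 40),
  catch.block           = rbinom(n, 1, 0.3)   # stands in for the catch-block/exception-type interaction
)
score <- 0.8 * d$average.log.level - 1.5 * d$catch.block + rnorm(n)
d$level <- cut(score, breaks = quantile(score, probs = seq(0, 1, length.out = 7)),
               labels = c("trace", "debug", "info", "warn", "error", "fatal"),
               include.lowest = TRUE, ordered_result = TRUE)

dd <- datadist(d); options(datadist = "dd")
fit <- orm(level ~ average.log.level + containing.block.SLOC + log.tokens +
             text.length + catch.block, data = d, x = TRUE, y = TRUE)

# Backward (step-down) selection: keep only variables that remain significant (p < 0.05).
fastbw(fit, rule = "p", sls = 0.05)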

Evaluation technique. We measure the performance of our ordinal regression

models using the multi-class Brier score and AUC metrics.

Brier score (BS) is commonly used to measure the accuracy of probabilistic predic-

tions. It is essentially the mean squared error of the probability forecasts (Wilks, 2011).

The Brier score was defined by Brier (1950) to evaluate multi-category prediction. It is

calculated by

BS = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{R} (f_{ij} - o_{ij})^2 \qquad (6.1)

where N is the number of prediction events (or number of added logging statements in

our case), R is the number of predicted classes (i.e., the number of log levels in the level

modeling case), f_ij is the predicted probability that the outcome of event i falls into

class j, and o_ij takes the value 1 or 0 according to whether the event i actually occurred

in class j or not. The Brier score ranges from 0 to 2 (Brier, 1950). The lower the Brier

score, the better the performance of the model.
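The following R function sketches the multi-class Brier score computation of Equation 6.1; the small example at the end uses made-up probabilities for three log levels.

# Multi-class Brier score (Brier, 1950): mean squared error between the predicted
# probabilities and the one-hot encoded actual log levels.
brier.score <- function(predicted, actual) {
  # predicted: N x R matrix of predicted probabilities (columns = log levels)
  # actual:    factor of length N with levels matching colnames(predicted)
  observed <- model.matrix(~ actual - 1)          # one-hot encoding (the o_ij values)
  colnames(observed) <- levels(actual)
  observed <- observed[, colnames(predicted), drop = FALSE]
  mean(rowSums((predicted - observed)^2))
}

# Illustrative example with three instances and three levels.
pred <- rbind(c(0.7, 0.2, 0.1),
              c(0.1, 0.6, 0.3),
              c(0.3, 0.3, 0.4))
colnames(pred) <- c("debug", "info", "warn")
brier.score(pred, factor(c("debug", "info", "warn"), levels = colnames(pred)))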

AUC (area under the curve) is used to evaluate the degree of discrimination that is

achieved by a model. The AUC is the area under the ROC (receiver operating charac-

teristic) curve that plots the true positive rate against the false positive rate. The AUC

value ranges between 0 and 1. A high value for the AUC indicates a high discrimina-

tive ability of a model; an AUC of 0.5 indicates a performance that is no better than

random guessing. In this chapter, we use an R implementation (Cullmann, 2015) of a

multiple-class version of the AUC, as defined by Hand and Till (2001).
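For readers who prefer a self-contained illustration, the sketch below implements the Hand and Till (2001) measure directly from its definition; in our experiments we use the existing R implementation (Cullmann, 2015), so this sketch is only meant to make the computation explicit.

# A from-scratch sketch of the Hand and Till (2001) multi-class AUC (the M measure):
# the average of the pairwise class-separation measures A(i, j).
hand.till.auc <- function(prob, actual) {
  # prob:   N x R matrix of predicted probabilities, columns named by log level
  # actual: factor of length N whose levels match colnames(prob)
  a.i.given.j <- function(ci, cj) {
    idx <- actual %in% c(ci, cj)
    p   <- prob[idx, ci]                  # probability assigned to class ci
    y   <- actual[idx] == ci
    r   <- rank(p)                        # Mann-Whitney / rank-sum formulation
    n1  <- sum(y); n0 <- sum(!y)
    (sum(r[y]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  }
  pairs <- combn(colnames(prob), 2)
  mean(apply(pairs, 2, function(p) (a.i.given.j(p[1], p[2]) + a.i.given.j(p[2], p[1])) / 2))
}
# Usage: hand.till.auc(pred, actual), where `pred` is an N x R probability matrix.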

The Brier score measures the error between the predicted probabilities and the ac-

tual observations, i.e., how likely the predicted log level is equal to the actual log level.

A probabilistic prediction may assign an extremely high probability (e.g., 100%) to the

correct log level, or assign a probability to the correct category that is only slightly

higher than the probability of an incorrect log level. The Brier score evaluation favors

the former case to measure the model’s ability to predict the correct category accu-

rately.

The AUC gives us an insight into how well the model can discriminate the log levels,

e.g., how likely a model is able to predict an actual error level as error correctly, rather

than predict an actual info level as error (a false positive). In particular, the AUC pro- vides a good performance measure even when the distribution of the predicted categories is not balanced (Kuhn and Johnson, 2013). Figure 6.4 shows that some log levels are less fre-

quently used in the studied projects, hence harder to predict. A higher AUC ensures the

model’s performance when predicting such log levels. A prior study (Mant et al., 2009)

also uses a combination of AUC and Brier score to get a more complete evaluation of

the performance of a model.

Bootstrapping and optimism. The Brier score and AUC provide us an insight on

how well the models fit the observed dataset, but they might overestimate the per-

formance of the models when applied to future observations (Efron, 1986; McIntosh

et al., 2016). Bootstrapping (Efron, 1979) is a general approach to infer the relationship

between sample data and the population, by resampling the sample data with replace-

ment and analyzing the relationship between the resample data and the sample data.

In order to avoid overestimation (or optimism), we subtract the bootstrap-averaged op-

timism (Efron, 1986) from the original performance measurement (i.e., Brier score and

AUC). The optimism values of the Brier score and AUC are calculated by the following

steps, similar to prior research (McIntosh et al., 2016):

• Step 1. From the original dataset with N logging statements (i.e., instances), we

randomly select a bootstrap sample of N instances with replacement. On aver-

age, 63.2% of the logging statements in the original dataset are included in the

bootstrap sample at least once (Kuhn and Johnson, 2013).

• Step 2. We build an ordinal regression model using the bootstrap sample (which

contains, on average, 63.2% of the logging statements in the original dataset).

• Step 3. We test the model (built from bootstrap sample) on the bootstrap and

original datasets separately, and calculate the Brier score and AUC on both

datasets.

• Step 4. We measure the optimism by calculating the difference between the

model’s performance (Brier score and AUC) on the bootstrap sample and on the

original dataset.

The steps are repeated 1,000 times to ensure that our random sampling is not biased (Harrell, 2015a). We calculate the average optimism values of the Brier score

and AUC, and then obtain the optimism-reduced Brier score and AUC by subtracting

the optimism from the original values. The optimism-reduced Brier score and AUC

values give us an indication of how well we can expect the model to fit to the entire

dataset (but not just the sample or the observed data).
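The optimism-correction procedure can be sketched as the following R function; train.fn and perf.fn are assumed wrappers (not part of the thesis) around the model construction and the Brier score/AUC computations described above.

# Generic sketch of the bootstrap optimism correction. `train.fn` fits a model on
# a data frame and `perf.fn` returns a named vector of performance values (e.g.,
# c(brier = ..., auc = ...)) for a fitted model on a data frame; both are assumed
# wrappers around the modeling and evaluation steps described in this section.
bootstrap.optimism <- function(d, train.fn, perf.fn, n.boot = 1000) {
  optimism <- replicate(n.boot, {
    boot <- d[sample(nrow(d), replace = TRUE), ]   # Step 1: resample with replacement
    fit  <- train.fn(boot)                         # Step 2: fit on the bootstrap sample
    perf.fn(fit, boot) - perf.fn(fit, d)           # Steps 3-4: optimism of this iteration
  })
  apparent <- perf.fn(train.fn(d), d)              # performance on the original dataset
  apparent - rowMeans(optimism)                    # optimism-reduced Brier score and AUC
}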

Baseline models. We compare the performance of our ordinal regression models

with two baseline models: a random guessing model, and a naive model based on the

proportional distribution of log levels. The random guessing model predicts the log

level of a logging statement to be each candidate level with an identical probability of

1/R, where R is the number of candidate levels. The intuition of the naive model is

that when a developer does not know the appropriate log level, a default log level of

the project may be chosen for the logging statement. However, we do not know the de-

fault log level of each project. Therefore, for each project we calculate the proportion of

each log level used in the logging statements, and use the proportion as the predicted

probability of that particular level. In other words, the naive model allocates the pre-

dicted probability of the candidate log levels according to the proportional distribution

of the log levels in that particular project.
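The two baselines can be expressed in a few lines of R; the level proportions below are made up for illustration and do not correspond to any studied project.

# Baseline predictions for a set of log levels. The random guessing model assigns
# each level an identical probability of 1/R; the naive model uses the proportional
# distribution of log levels observed in the project.
log.levels <- c("trace", "debug", "info", "warn", "error", "fatal")
set.seed(1)
observed <- factor(sample(log.levels, 500, replace = TRUE,
                          prob = c(0.10, 0.30, 0.30, 0.15, 0.13, 0.02)),
                   levels = log.levels)        # illustrative data only

random.guess <- setNames(rep(1 / length(log.levels), length(log.levels)), log.levels)

naive <- prop.table(table(observed))           # predicted probability = observed proportion
naive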

Results

The ordinal regression model achieves an optimism-reduced Brier score of 0.44 to

0.66 and an optimism-reduced AUC of 0.75 to 0.81. An AUC of 0.75 to 0.81 indicates

that the ordinal regression model performs well in discriminating the six log levels, and

that the model can accurately suggest log levels for newly-added logging statements.

As shown in Table 6.3, the original Brier score for the ordinal regression model, which measures the model's performance when both the training data and the testing data are the whole dataset, ranges from 0.43 to 0.65. The optimism-reduced Brier score of the ordinal regression model, which measures the model's performance when the model is trained on a subset (bootstrapped resample) of the data while tested on the whole data, ranges from 0.44 to 0.66, with only a difference between 0 and 0.01. The difference be- tween the original AUC and the optimism-reduced AUC is also very small, ranging from

0 to 0.01. The negligible difference between the original and the optimism-reduced performance values indicates that the model is not over-fitted to the training data. Instead, the model can also be effectively applied to a new data set. In other words, the performance of our models, when applied on new data in practice, would only exhibit very minimal degradation.

We also measure the computational cost for determining the appropriate log level

for a newly-added logging statement. On average, training an ordinal regression model

for a large project like Hadoop on a workstation (Intel i7 CPU, 8G RAM) takes less than

0.3 seconds, and predicting the log level for a newly-added logging statement takes less

than 0.01 seconds. For each commit, we would only need to perform the prediction

step in real-time, while the training can be done offline. Our approach can provide

suggestions for log levels in an interactive real-time manner.

The performance of the ordinal regression model outperforms that of the naive

model and the random guessing model. For all the four studied projects, as presented

in Table 6.3, the ordinal regression model achieves a higher AUC than the baseline

models (the random guessing model and the naive model), by 0.25 to 0.31. The sig-

nificantly higher AUC of the ordinal regression model indicates that it outperforms the

baseline models in discriminating the six log levels. For three out of the four studied

projects, Directory Server, Hama and Qpid, the Brier score of the ordinal regression

model is better than that of the baseline models. The Brier score of the ordinal regres-

sion model outperforms the random guessing model by 0.28 to 0.36. The Brier score

of the ordinal regression model outperforms the naive model by 0.21 to 0.22.

For the Hadoop project, the Brier score of the ordinal regression model improves less markedly over that of the baseline models: the ordinal regression model gets a

Brier score of 0.66, while the naive model and the random guessing model get a Brier score of 0.75 and 0.83, respectively. A possible reason for the ordinal regression model’s performance degradation in the Hadoop project is that the Hadoop project presents a

more varied usage of log levels in each type of block. As shown in Figure 6.5, for the

Hadoop project, the most-frequently used log level in each type of block only covers

Table 6.3: Comparing the performance of the ordinal regression model with the random guessing model and the naive model.

Project    | Ordinal model              | Naive model       | Random guess
           | Brier Score1  AUC2         | Brier Score  AUC  | Brier Score  AUC
D. Server  | 0.44 (0.43)   0.78 (0.78)  | 0.65         0.50 | 0.80         0.50
Hadoop     | 0.66 (0.65)   0.76 (0.76)  | 0.75         0.50 | 0.83         0.50
Hama       | 0.51 (0.50)   0.75 (0.76)  | 0.73         0.50 | 0.83         0.50
Qpid       | 0.55 (0.55)   0.81 (0.82)  | 0.76         0.50 | 0.83         0.50
1 The value outside the parenthesis is the optimism-reduced Brier score, and the value inside the parenthesis is the original Brier score.
2 The value outside the parenthesis is the optimism-reduced AUC, and the value inside the parenthesis is the original AUC.

33% to 55% of the logging statements that are inserted in that particular type of block.

However, the proportions of the most-frequently used log level in each type of block

are more dominant in the other studied projects, ranging from 49% to 76%, 35% to 66%, and 38%

to 63% for the Directory Server, Hama and Qpid projects, respectively. For the Hadoop project, the top two most-frequently used log levels in each type of block are used in

62% to 93% of the logging statements that are inserted in that particular type of block.

The top two most-frequently used log levels in each type of block cover 79% to 100%,

68% to 100%, and 68% to 92% of the logging statements for the Directory Server, Hama, and Qpid projects, respectively. The variant usage of log levels in each type of block makes it hard for a model to achieve an accurate probabilistic prediction. However, a high AUC of 0.76 still indicates that the ordinal regression model performs well in discriminating the six log levels for the logging statements in the Hadoop project.

The ordinal regression models can effectively suggest log levels with an AUC of 0.75

to 0.81 and a Brier score of 0.44 to 0.66, which outperforms the performance of the

naive model based on the log level distribution and a random guessing model.

6.4.2 RQ2: What are the important factors for determining the log

level of a logging statement?

Motivation

In order to understand which factors (metrics) play important roles in determining

the appropriate log levels, we analyze the ordinal regression models to get the relative

importance of each variable. Understanding the important factors can provide soft-

ware practitioners insight regarding the selection of the most appropriate log level for

a newly-added logging statement.

Approach

Wald Chi-Square (χ 2) test. We use the Wald statistic in a Wald χ 2 maximum likeli-

hood test to measure the importance of a particular variable (i.e., calculated metric)

on model fit. The Wald test tests the significance of a particular variable against the

null hypothesis that the corresponding coefficient of that variable is equal to zero (i.e., H0: θ = 0) (Harrell, 2015a). The Wald statistic of a variable is essentially the square

of the coefficient estimate (θ̂) divided by the variance of the coefficient (i.e., W = θ̂^2/var(θ̂)). A larger Wald statistic indicates that an explanatory variable has a larger impact on the model's performance, i.e., model fit. The Wald statistic can be compared

against a chi-square (χ 2) distribution to get a p-value that indicates the significance

level of the coefficient. We use the term “Wald χ 2” to represent the Wald statistic here-

after. Prior research has leveraged Wald χ 2 test in measuring the importance of vari-

ables (McIntosh et al., 2016; Sommer and Huggins, 1996). We perform the Wald χ 2 test using the anova function provided by the rms package (Harrell, 2015b) of R.

Joint Wald Chi-Square (χ 2) test. In order to control for the effect of multiple met- rics in each dimension, we use a joint Wald χ 2 test (a.k.a, “chunk test”) (Harrell, 2015a)

to measure the joint importance of each dimension of metrics. For example, we test

the joint importance of the text length, variable number and log tokens metrics, and

get a single Wald χ 2 value to represent the joint importance of the logging statement

metrics dimension. The Wald χ 2 value resulted form a joint Wald test on a group of

metrics is not simply the sum of the Wald χ 2 values that are resulted from testing the

importance of the corresponding individual metrics; instead, the Wald test measures

the joint importance of a group of metrics by testing the null hypothesis that all metrics in the group have a coefficient of zero (i.e., H0: θ0 = θ1 = ... = θk−1 = 0, where k is the number of metrics in the group for the joint Wald test) (Harrell, 2015a). The larger the

Wald χ 2 value, the larger the joint impact that a group of metrics have on the model’s

performance. We also use the anova function from the R package rms (Harrell, 2015b)

to perform the joint Wald χ 2 test.
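Assuming fit is the ordinal regression model fitted in the earlier sketch, both tests can be obtained from the anova function of the rms package; the variable names in the chunk test are those of that illustrative model, not the exact variables of our final models.

library(rms)

# Wald chi-square and p-value for every variable in the model, plus a TOTAL row.
anova(fit)

# Joint ("chunk") test of one dimension of metrics, here two of the logging
# statement metrics from the illustrative model above.
anova(fit, log.tokens, text.length)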

Results

The containing block metrics, which characterize the surrounding block of a log-

ging statement, play the most important roles in the ordinal regression models for

log levels. Table 6.4 shows the Wald χ 2 test results for all the individual metrics that

are used in our final ordinal regression models. As listed in the “Sig.” columns, all

the final metrics that we used in our ordinal regression models are statistically signif-

icant in our models (i.e., p < 0.05), since we used the backward variable selection ap-

proach to remove those insignificant metrics. Table 6.5 shows the joint Wald χ 2 test results for each dimension of metrics. The dimension of containing block metrics is

the most important one (i.e., with the highest χ 2) in the models for the Hadoop and

the Hama projects; and it is the second most important dimension of metrics in the

models for the Directory Server and the Qpid projects. Specifically, both the contain-

ing block type and the containing block SLOC metrics are statistically significant in

the models for all four studied projects. Moreover, the containing block type metric as

an individual metric plays the most important role in the models for the Hama project;

and it is the second most important metric for the models for the other three projects.

The containing block SLOC metric is the fourth most important metric in the models for the

Hadoop and Hama projects, while it plays less important roles in the models for the

other two projects. Developers need to consider the characteristics of the surround-

ing block (e.g., block type) of a newly-added logging statement to determine the most

appropriate log level for the particular logging statement.

The file metrics, which capture the overall logging practices in the containing

files, are also important factors for predicting the log level of a newly-added logging

statement. As shown in Table 6.5, the file metrics is the most important dimension in the ordinal regression models for the Qpid project, the second most important for the Hadoop project, and the third most important for the Directory Server and Hama projects. As shown in Table 6.4, in particular, the average log level metric is the most

important individual metric for determining the log levels for the Hadoop and Qpid projects; and it is the third and fourth most important one in the Hama and Directory Server projects, respectively. Such a result suggests that the logging statements that are in- serted in the same file are likely to use similar log levels; this might be explained by the intuition that these logging statements share the same purposes of logging and they

Table 6.4: Variables’ importance in the ordinal models, represented by the Wald Chi-square. The percentage following a Wald χ2 value is calculated by dividing that particular Wald χ2 value by the “TOTAL” Wald χ2 value.

Directory Server
Variable name           | Wald χ2         | Sig.1
Log tokens              | 555 (29.3%)     | ***
Containing block type   | 507 (26.8%)     | ***
Variable number         | 217 (11.4%)     | ***
Average log level       | 200 (10.6%)     | ***
Text length             | 123 (6.5%)      | ***
Average log variable    | 46 (2.4%)       | ***
Average log length      | 24 (1.3%)       | ***
Log number              | 23 (1.2%)       | ***
Log density             | 23 (1.2%)       | ***
Containing block SLOC   | 8 (0.4%)        | **
Fan in                  | 5 (0.3%)        | *
Log churn ratio         | 5 (0.3%)        | *
TOTAL                   | 1894 (100.0%)   | ***

Qpid
Variable name           | Wald χ2         | Sig.
Average log level       | 937 (34.8%)     | ***
Containing block type   | 460 (17.1%)     | ***
Log number              | 238 (8.8%)      | ***
Log tokens              | 207 (7.7%)      | ***
Log churn               | 78 (2.9%)       | ***
Average log variable    | 37 (1.4%)       | ***
Containing block SLOC   | 27 (1.0%)       | ***
Fan in                  | 20 (0.7%)       | ***
Log density             | 11 (0.4%)       | **
Text length             | 9 (0.3%)        | **
TOTAL                   | 2692 (100.0%)   | ***

Hadoop
Variable name           | Wald χ2         | Sig.
Average log level       | 494 (23.1%)     | ***
Containing block type   | 385 (18.1%)     | ***
Log tokens              | 171 (8.0%)      | ***
Containing block SLOC   | 136 (6.4%)      | ***
Log number              | 76 (3.6%)       | ***
Log churn in history    | 42 (2.0%)       | ***
Text length             | 29 (1.4%)       | ***
Average log variable    | 20 (0.9%)       | ***
Log churn ratio in hist.| 14 (0.6%)       | ***
TOTAL                   | 2134 (100.0%)   | ***

Hama
Variable name           | Wald χ2         | Sig.
Containing block type   | 257 (29.4%)     | ***
Log tokens              | 96 (10.9%)      | ***
Average log level       | 88 (10.1%)      | ***
Containing block SLOC   | 37 (4.2%)       | ***
Log number              | 20 (2.2%)       | ***
TOTAL                   | 876 (100.0%)    | ***

1 Statistical significance of explanatory power according to Wald χ2 likelihood ratio test: o p ≥ 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001

Table 6.5: The joint importance of each dimension of metrics in the ordinal models, calculated using the joint Wald χ2 test. The percentage following a Wald χ2 value is calculated by dividing that particular Wald χ2 value by the “TOTAL” Wald χ2 value.

Metric dimension          | Directory Server  | Hadoop            | Hama             | Qpid
Containing block metrics  | 640 (33.8%) ***   | 894 (41.9%) ***   | 577 (65.8%) ***  | 723 (26.9%) ***
File metrics              | 258 (13.6%) ***   | 549 (25.7%) ***   | 91 (10.3%) ***   | 1134 (42.1%) ***
Logging statement metrics | 889 (46.9%) ***   | 189 (8.9%) ***    | 96 (10.9%) ***   | 224 (8.3%) ***
Change metrics            | 5 (0.3%) *        | 0 (0.0%) o        | 0 (0.0%) o       | 78 (2.9%) ***
Historical metrics        | 0 (0.0%) o        | 67 (3.1%) ***     | 0 (0.0%) o       | 0 (0.0%) o
TOTAL                     | 1894 (100.0%) *** | 2134 (100.0%) *** | 876 (100.0%) *** | 2692 (100.0%) ***
1 Statistical significance of explanatory power according to Wald χ2 likelihood ratio test: o p ≥ 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001

log the same or closely-connected features. Other metrics in the file metrics dimension (the log number, log density, average log variables, average log length and fan in metrics) are also statistically significant in the log level models. These metrics together capture the overall characteristics of the logging practices in a source code file. Developers should always keep in mind the overall logging characteristics (e.g., the log level

of the existing logging statements) in the same file when determining the appropriate

log level for a newly-added logging statement.

The logging statement metrics, which measure the content of a logging state-

ment, also play important roles in explaining the log level for a newly-added logging

statement. As shown in Table 6.5, the logging statement metrics is the most important

dimension for explaining the log level for a newly-added logging statement in the Di-

rectory Server project. It is the second most important metric for explaining the log

levels for the Hama project, and the third important metric for explaining the log levels

for the Hadoop and Qpid projects. In particular, the log tokens metric is the most im-

portant individual metric for explaining the log levels for the Directory Server project; and

it is the second, third, and fourth most important metric for explaining the log levels for

the Hama, Hadoop and Qpid projects, respectively. We investigated why the log tokens

metric is more important for explaining the log levels for the Directory Server project

than other projects. We find that the Directory Server project uses the least variety of tokens in its logging statements. Specifically, on average each logging statement in the Directory Server project contributes 0.08 unique tokens, while on average each log- ging statement contributes 0.12 to 0.21 unique tokens in the other studied projects. The uniqueness of tokens in Directory Server makes it more certain to determine the appro- priate log levels using such information. Other metrics in the logging statement metrics

dimension (the text length and variable number metrics) are also among the most important metrics for explaining the log levels for the Directory Server project; however, these two metrics are less important or even not statistically significant for explaining the log levels for the other three projects. In order to find the root cause of this discrepancy, we dig into the text length and variable number metrics for the four studied

projects. We find that the Directory Server project generally uses significantly shorter text but more variables in the error level logs which are the most popular ones in the

Directory Server project (see Figure 6.4). Therefore, these two metrics have great ex- planatory power to the levels of the logging statements in the Directory Server project.

However, we do not find similar patterns in the other three projects. In order to deter-

mine the most appropriate log level for a newly-added logging statement, developers

should not only refer to the overall logging characteristics of the containing file and the

containing block, but should also pay attention to the content of the logging statement

itself.

The change metrics and historical metrics are the least important in explaining the choices of log levels. As shown in Table 6.5, the change metrics and historical met-

rics are the least important dimensions in the ordinal regression models for all studied

projects. The log level of a newly-added logging statement is not significantly impacted

by the characteristics of the code change that introduces the logging statement. The

log level of a newly-added logging statement is also not significantly impacted by the

characteristics of the previous code changes that affect the containing file of the log-

ging statement. The appropriate log level is more likely to be influenced by the static

characteristics of the source code, rather than the change history of the source code.

These results suggest that we should focus our effort on the current snapshot of the

source code, rather than the development history, to determine the appropriate log

level for a newly-added logging statement.

The characteristics of the containing block, the existing logging statements in the

containing file, and the content of a newly-added logging statement play important

roles in determining the appropriate log level for the newly-added logging state-

ment. The appropriate log level is more likely to be influenced by the current snap-

shot of the source code, rather than the development history of the source code.

6.5 Discussion

Cross-project Evaluation. Since small projects or new projects might not have enough history data for log level prediction, we also evaluate our model's performance in cross- project prediction. We train a model using the combined data of N − 1 projects (i.e., the training projects), and use the model to predict the log levels of the newly-added log- ging statements in the remaining project (i.e., the testing project). We use the AUC and the Brier score to evaluate the performance of the cross-project prediction models.

To avoid an unbalanced number of newly-added logging statements for each project in the training data, we leverage up-sampling to balance the training data such that each project has the same number of newly-added logging statements in the training data. Specifically, we keep the largest training project in the training data unchanged, while we randomly up-sample the entries of the other training projects with replacement to match the number of entries of the largest training project. In order to reduce the non-determinism caused by the random sampling, we repeat the

“up-sampling - training - testing” process 100 times and calculate the average AUC and Brier score values.
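The up-sampling step can be sketched as follows in R; train is assumed to be the combined training data with a project column, and fit.and.evaluate is a hypothetical wrapper (not part of the thesis) around the model construction and evaluation steps described earlier.

# Sketch of the up-sampling step. `train` is assumed to be a data frame of
# newly-added logging statements from the N - 1 training projects, with a
# `project` column identifying the originating project.
upsample <- function(train) {
  sizes  <- table(train$project)
  target <- max(sizes)                                # size of the largest training project
  idx <- unlist(lapply(names(sizes), function(p) {
    rows <- which(train$project == p)
    if (length(rows) == target) rows                  # keep the largest project unchanged
    else sample(rows, target, replace = TRUE)         # up-sample the others with replacement
  }))
  train[idx, ]
}

# Repeated "up-sampling - training - testing" runs; fit.and.evaluate() is a
# hypothetical wrapper around the model construction and evaluation steps above.
# results <- replicate(100, fit.and.evaluate(upsample(train), test))
# rowMeans(results)   # average AUC and Brier score over the 100 runs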

Table 6.6 lists the performance of the cross-project models. Each row of the table shows the performance of the model that uses the specified project as testing data and

all the other projects as training data.

Table 6.6: The results of the cross-project evaluation.

Project           | Brier score | AUC
Directory Server  | 0.67        | 0.76
Hadoop            | 0.77        | 0.72
Hama              | 0.57        | 0.71
Qpid              | 0.70        | 0.80

The cross-project models reach a Brier score of

0.57 to 0.77 and an AUC of 0.71 to 0.80.

Comparing Table 6.6 and Table 6.3, we find a significant performance degradation of the ordinal regression model when applied in cross-project prediction. The accu- racy of probability prediction decreases significantly: the Brier score increases by 0.06 to 0.23. A likely explanation for the performance degradation is the different distribution of log levels in different projects, as shown in Figure 6.4. Another expla- nation might be that the most important factors for the log level models are different among the studied projects. However, the AUC only decreases by 0.01 to 0.04. The cross-project models still have the ability to discriminate different log levels for newly- added logging statements. Overall, our cross-project evaluation results suggest that one is better off building separate models for each individual project.

Log Level Changes. We have left out all the newly-added logging statements that expe- rience a later log level change in our study, as the levels of these logging statements may have been inappropriate in the first place. Developers change the log level of a logging statement either to fix an inappropriate log level or because the logging requirement has changed.

As shown in Table 6.1, there are 491 added logging statements that experience a later log level change in the four studied projects. Table 6.7 summarizes the patterns

Table 6.7: Summary of the patterns of log level changes in the four studied projects. Each row represents an original log level and each column represents a new log level.

        | trace | debug | info | warn | error | fatal
trace   | 0     | 25    | 2    | 0    | 0     | 0
debug   | 16    | 91    | 41   | 3    | 7     | 1
info    | 8     | 211   | 7    | 13   | 4     | 0
warn    | 0     | 23    | 35   | 0    | 16    | 3
error   | 0     | 12    | 23   | 23   | 2     | 4
fatal   | 0     | 1     | 0    | 1    | 1     | 0
1 Sometimes a log level is eventually changed back to the same level after some changes.

of log level changes that these 491 logging statements experience. Each row in the ta-

ble represents the original level of an added logging statement, and each column rep-

resents the final level of the logging statements after one or more log level changes.

211 out of the 491 logging statements undergo a log level change from the info level to the debug level. On the other hand, 41 out of the 491 logging statements have their log level changed from the debug level to the info level. It seems that developers often change between the info and debug levels; the changes between info and debug levels represent 51% of all the log level changes. Other notable level change patterns include: warn to info, trace to debug, warn to debug, error to info, error to warn, warn to error, and debug to trace.

We notice that 354 (72%) out of the 491 logging statements have undergone a log level change from a less verbose level to a more verbose level (e.g., from info to debug); these changes will cause the software systems to generate less log information when more verbose log levels are not enabled. 119 (24%) out of the 491 logging statements have their log levels changed from a more verbose log level to a less verbose log level

(e.g., from debug to info); these changes will cause the software systems to generate

more log information when more verbose log levels are not enabled. 18 (4%) out of the

491 logging statements have their log levels changed to a different level, but eventually

changed back to their initial log levels.

We also find that 385 (78%) out of the 491 logging statements have undergone a log

level change to a log level that is only one level away from the original log level (e.g.,

from debug to info or from error to warn). 470 (96%) out of the 491 logging statements

have their log levels changed to a level that is no more than 2 levels away (e.g., from

warn to debug). Developers are likely to adjust their log levels between adjacent log levels rather than between log levels that are far apart. In this chapter we study the log levels of logging statements generally; however, future study should explore the confusion and distinction between adjacent log levels.

6.6 Threats to Validity

External Validity. The external threat to validity is concerned with the generalization

of our results. In our work, we investigate four open source projects that are of dif-

ferent domains and sizes. However, since other software projects may use different

logging libraries and apply different logging rules, the results may not generalize to

other projects. Further, we only analyze Java source code in this study, thus the re-

sults may not generalize to projects programmed in non-Java languages. For example,

other logging libraries in other programming languages may not support all the six log

levels. Findings from more case studies on other projects, especially those with other

programming languages and other logging libraries, can benefit our study.

Our preliminary study finds that different projects have different distributions of

log levels. The results for RQ2 show that the most important factors for the log level

models are different among the studied projects. Besides, our cross-project evaluation

highlights a significant performance degradation when our ordinal regression models

are applied across projects. Therefore, in order to provide the most appropriate sug-

gestion for the log level of a newly-added logging statement, it is recommended to build

separate models for each individual project.

Internal Validity. The ordinal regression modeling results show the relationship be- tween log levels and a set of software metrics. The relationship does not represent the causal effects of these metrics on log levels. The choice of log levels can be associated with many factors other than the metrics that we used to model log levels. Future stud- ies may extend our study by considering other factors.

In this chapter we study the appropriate choice of log level for a newly-added log-

ging statement. We assume that in most cases developers of the studied projects are

able to determine the most appropriate (i.e., consistent) log levels for their logging

statements. However, the choices of log levels in the studied projects might not be

always appropriate. To address this issue, we choose several successful and widely-

used open source projects, and we remove all newly-added logging statements that

experience a log level change later on in their lifetime.

This chapter studies the log level, which is fixed in the code, for a logging statement.

The users of software systems that leverage log levels can configure at which verbosity

level the logging statements should be printed, for different usage scenarios (e.g., de-

bugging). In this chapter we do not capture the usage scenarios of the logging state-

ments of these software systems. We may need to consider different usage scenarios

of the logging statements to better determine the appropriate log levels in future work.

Construct Validity. This chapter proposes an approach that can provide developers

with suggestions on the most appropriate log level when they add a new logging state- ment. Future work should conduct user studies with real developers to better evaluate how well our approach would perform in a real life setting.

We choose five dimensions of software metrics to model the appropriate log level of a logging statement. However, there might be other metrics, such as the characteris- tics of the logged variables, that can help improve the performance of our models. We expect future work to expand this study and consider more relevant software metrics.

6.7 Chapter Summary

Prior studies highlight the challenges that developers face when determining the ap- propriate log level for a newly-added logging statement. However, there is no standard or detailed guideline on how to use log levels appropriately; besides, we find that the usage of log levels varies across projects. We propose to address this issue (i.e., to pro- vide suggestions on the choices of log levels) by learning from the prior logging prac- tices of software projects. We first study the development history of four open source projects (Directory Server, Hadoop, Hama, and Qpid) to discover the distribution of log levels in various types of code snippets in these projects. Based on the insight from the preliminary study and our intuition, we propose an automated approach that leverages ordinal regression models to learn the usage of log levels from the existing logging prac- tices, and to suggest the appropriate log levels for newly-added logging statements.

Some of the key findings of our study are as follows:

• The distribution of log levels varies across different types of blocks, while the

log levels in the same type of block show similar distributions across different

projects.

• Our automated approach based on ordinal regression models can accurately

suggest the appropriate log level of a newly-added logging statement with an

AUC of 0.75 to 0.81 and a Brier score of 0.44 to 0.66. Such performance outper-

forms the performance of a naive model based on the log level distribution and

a random guessing model.

• The characteristics of the containing block of a newly-added logging statement,

the existing logging statements in the containing source code file, and the con-

tent of the newly-added logging statement play important roles in determining

the appropriate log level for that logging statement.

Making appropriate choices of log levels is critical to balance the benefits and costs of logging (Chapter 3). Developers can leverage our models to receive automatic sugges- tions on the choices of log levels or to receive warnings on inappropriate usages of log levels. Our results also provide insight into the factors (e.g., the containing block of a logging statement, the existing logging statements in the containing source code file, and the content of a logging statement) that developers consider when determining the appropriate log level for a newly-added logging statement.

CHAPTER 7

Automated Suggestions for Logging Stack Traces

As discussed in Chapter 3, logging the stack trace of an exception is important to assist developers in failure diagnosis. However, stack traces can grow log files very fast and easily frustrate the end users of a software system. In Chapter 3, we also observe that developers have difficulties deciding whether to log the stack trace of an exception. Therefore, in this chapter, we propose an automated approach to help developers make informed decisions about whether to print the stack trace of an exception in a logging statement. We use static program analysis to extract the contextual information of a logging statement in a catch block (e.g., exception type), and construct Random Forest models to suggest whether a logging statement in a catch block should log the stack trace of an exception. Our experimental results on four open source projects show that our automated approach can accurately suggest whether to print the stack trace of an exception in a logging statement, with a precision of 0.81 to 0.87, a recall of 0.81 to 0.89, and an AUC of 0.85 to 0.94. Our findings also provide developers and researchers insights into the important factors (e.g., exception types) that drive developers' decisions of logging exception stack traces.


7.1 Introduction

Modern logging libraries (e.g., Log4j1 and SLF4J2) usually support conve-

nient ways to log the stack trace of an exception in a logging statement. For example, in the code snippet shown in Figure 7.1, the stack trace of an exception is logged by specifying the exception object as the last parameter of a

logging statement. In this chapter, we name a logging statement in a catch block as an

exception logging statement. An exception logging statement can either log the stack

trace of an exception or not. Logging the stack traces of exceptions is very important

to assist developers in diagnosing field failures (Han et al., 2012; Schroter et al., 2010;

Chapter 3). However, stack traces can usually grow the log files very fast, lead to perfor-

mance overhead, and easily frustrate the end users of a software system (Chapter 3).

As discussed in Chapter 3, developers usually have difficulties balancing the bene-

fits and costs of logging stack traces and they raise conflicting concerns about whether

to log the stack trace of an exception in a logging statement. For example, issue report

HADOOP-105713 proposes to add stack traces to many exception logging statements across modules. However, other developers raise concerns that stack traces should be avoided for some of these exception logging statements. As a result, it takes signifi- cant efforts (e.g., as much as 10 patches) to resolve the conflicting concerns. Table 7.1

lists ten issue report examples that are concerned with whether to log the stack trace

of an exception. Seven of the issue report examples request to add missing stack traces

in exception logging statements, while the other three issue report examples request

to remove existing stack traces from exception logging statements. These examples

1https://logging.apache.org/log4j/2.x/
2https://www.slf4j.org
3https://issues.apache.org/jira/browse/HADOOP-10571

try {
  Thread.sleep(detectionInterval);
} catch (InterruptedException e) {
  /* Logging the stack trace by specifying the exception (e) as the
     last parameter of a logging statement. */
  LOG.error("Disk Outlier Detection thread interrupted", e);
}

Figure 7.1: Example of logging the stack trace of an exception.

motivate us to develop automated approaches to help developers make appropriate

decisions about whether to log the stack trace of an exception in the first place.

Prior work of “learning to log” learns statistical models from existing logging code

to provide suggestions about where to log (Zhu et al., 2015; Jia et al., 2018; Chapter 4)

and which log level to use (Chapter 6). However, prior work never provides suggestions about whether to log the stack trace of an exception. We believe that making appro- priate decisions of logging stack traces is also very important to developers, because unnecessary stack traces can significantly increase the size of log files (i.e., a stack trace is usually much longer than a regular log message) and falsely alarm end users (i.e., a stack trace usually indicates a system problem).

In this work, we propose an automated approach to provide suggestions about whether to print the stack trace of an exception in an exception logging statement.

We first use static program analysis to extract the contextual information of an excep- tion logging statement (e.g., the exception type) and whether the exception logging statement logs the stack trace. Then, we construct Random Forest models to provide automated suggestions about whether an exception logging statement should log the stack trace of an exception. We perform a case study on four open source projects, namely Hadoop, Directory Server, Hive, and Kafka. Our experimental results show that

our automated approach can accurately suggest whether to log the stack trace of an exception in an exception logging statement.

Table 7.1: Examples of issue reports of the Hadoop project that are concerned with whether to log the stack trace of an exception.

Issue ID1      | Issue report summary                                                                              | Issue request
HADOOP-14481   | Print stack trace when native bzip2 library does not load                                        | Add stack trace
HADOOP-13711   | Supress CachingGetSpaceUsed from logging interrupted exception stacktrace                        | Remove stack trace
HADOOP-13682   | Log missing stacktrace in catch block                                                            | Add stack trace
HADOOP-13458   | LoadBalancingKMSClientProvider#doOp should log IOException stacktrace                            | Add stack trace
HADOOP-12840   | UGI to log@ debug stack traces when failing to find groups for a user                            | Add stack trace
HADOOP-12795   | KMS does not log detailed stack trace for unexpected errors                                      | Add stack trace
HADOOP-11868   | Invalid user logins trigger large backtraces in server log                                       | Remove stack trace
HADOOP-10571   | Use Log.*(Object, Throwable) overload to log exceptions                                          | Add stack trace
HADOOP-10562   | Namenode exits on exception without printing stack trace in AbstractDelegationTokenSecretManager | Add stack trace
HADOOP-8711    | Provide an option for IPC server users to avoid printing stack information for certain exceptions | Remove stack trace
1 For more details about each issue, one can refer to its web link which is “https://issues.apache.org/jira/browse/” followed by the issue ID. For example, the link for the first issue is “https://issues.apache.org/jira/browse/HADOOP-14481”.

Chapter organization. The remainder of the chapter is organized as follows. Sec- tion 7.2 describes our case study methodology, covering our subject projects, data ex- traction and data analysis approaches. Section 7.3 presents our experimental results.

Finally, Section 7.4 draws conclusions based on our presented findings.

7.2 Methodology

7.2.1 Overview

Figure 7.2 shows an overview of our approach for providing automated suggestions about whether to print the stack trace of an exception in an exception logging state- ment. From the source code of each studied open source project, we use static program analysis to search for all the exception logging statements (i.e., the logging statements within a catch block). We implemented an IntelliJ plugin based on the IntelliJ Platform

SDK 4 to perform our static program analysis. For each exception logging statement,

we then use our IntelliJ plugin to extract a label (i.e., whether a stack trace is logged) and

a set of code features that capture the contextual information of the logging statement.

Based on the extracted label and code features for each exception logging statement,

we construct a Random Forest model to suggest whether an exception logging state-

ment should log the stack trace of an exception. Based on our model, we analyze the

feature importance to understand the important factors that explain the likelihood of

printing an exception stack trace in a logging statement.

7.2.2 Subject Projects

We perform a case study on four open source projects, including Hadoop, Directory

Server, Hive, and Kafka. Table 7.2 lists the overview information about our subject

projects. Hadoop is a distributed computing framework that supports distributed

storage and processing of big data sets. Directory Server is an embeddable directory

server. Hive is a data warehouse that supports accessing distributed data sets using

4https://www.jetbrains.org/intellij/sdk/docs/welcome.html


Figure 7.2: Overview of our approach for providing automated suggestions about whether to print the stack trace of an exception in an exception logging statement.

SQL. Kafka supports a streaming platform for messaging, storing and processing real-time records. Our case study projects are successful and mature projects with many years of development history. They represent different domains, which ensures that our findings are not limited to a particular domain.

All of these projects are primarily developed in Java. Table 7.2 shows the SLOC

(Source Lines of Code) and the Java SLOC of the studied projects. In this chapter, we only consider the Java source code, and we exclude the testing code. The last column of Table 7.2 shows the number of exception logging statements in each studied project and the percentage of exception logging statements that print exception stack traces.

Hadoop has the largest SLOC (i.e., 2,906 K) and largest number (i.e., 2,522) of excep- tion logging statements. Kafka has the smallest number (i.e., 264) of exception logging statements but the largest percentage (i.e., 76%) of exception logging statements that print exception stack traces. Directory Server has the smallest SLOC (i.e., 245 K) and also the smallest percentage (i.e., 47%) of exception logging statements that print ex- ception stack traces.

Table 7.2: Overview of the studied projects.

Project           | Studied release (Time of release) | SLOC     | Primary Language (SLOC) | # Exception logging statements* (% stack traces logged)
Hadoop            | 3.1.1 (2018.08)                   | 2,906 K  | Java (1,518 K)          | 2,522 (61%)
Directory Server  | 2.0.0.AM25 (2018.08)              | 245 K    | Java (232 K)            | 391 (47%)
Hive              | 3.1.0 (2018.07)                   | 1,721 K  | Java (1,256 K)          | 1,615 (63%)
Kafka             | 2.0.0 (2018.07)                   | 327 K    | Java (214 K)            | 264 (76%)
* The number of exception logging statements for each project is calculated for the primary language.

7.2.3 Data Extraction

In order to explain the likelihood of logging an exception stack trace in an exception logging statement, we extract a label that captures whether a stack trace is printed in the logging statement and a set of code features that capture the contextual informa- tion of the logging statement. As shown in Figure 7.2, we use static program analysis

(based on the IntelliJ Platform SDK) to search for all the exception logging statements in the source code, and then extract the label and code features for each exception log- ging statement. Below, we explain the label and the code features that we extract for each exception logging statement.

Label (logging an exception stack trace or not): A boolean variable indicating whether an exception logging statement prints an exception stack trace.

Code features. We extract a number of code features for each exception logging state- ment to explain the likelihood of logging a stack trace in the logging statement. Ta- ble 7.3 lists our code features for each exception logging statement and explains our

rationale for choosing each of these features. These features fall into six dimensions:

• Logging statement features capture the characteristics of an exception logging statement and its local code context (e.g., whether the logging statement is inside a loop).

• Exception features capture the characteristics of the exception that is caught by the containing catch block of an exception logging statement. If there are multiple exceptions caught by the same catch block, we use the first one.

• Catch block features capture the characteristics of the containing catch block of an exception logging statement.

• Try block features capture the characteristics of the pairing try block of the containing catch block of an exception logging statement.

• Exception-throwing method features capture the characteristics of the method in the try block that can throw the caught exception (i.e., declared in the throws clause). If there are multiple methods that can throw the caught exception, we use the first one.

• Containing code features capture the characteristics of the code structures beyond the try-catch scope, such as the fan-in of the containing method.
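As a hypothetical illustration (the snippet below is not from the studied projects, and the annotated feature values follow our reading of the definitions in Table 7.3), several of these features can be read directly off a try-catch block:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FeatureIllustration {
    private static final Logger logger = LoggerFactory.getLogger(FeatureIllustration.class);
    private int retries = 0;

    byte[] fetch(URL url) throws IOException {
        try {
            InputStream in = url.openStream();     // "Exception method" = openStream() (first method
                                                   // in the try block that can throw IOException; JDK source)
            byte[] payload = in.readAllBytes();
            in.close();                            // "Method calls in try" = 3 (openStream, readAllBytes, close)
            return payload;                        // "Return in try" = true; "Throw in try" = false
        } catch (IOException e) {                  // "Exception type" = IOException;
                                                   // "Exception category" = checked; "Exception source" = JDK
            retries++;                             // "LOC before logging" = 1; "Method calls before logging" = 0
            logger.warn("Fetch failed for {}", url, e);  // the exception logging statement:
                                                   // "Log level" = warn; "Log in loop" = false; "Log in branch" = false
            throw e;                               // "Throw in catch" = true; "Return in catch" = false;
                                                   // "LOC after logging" = 1
        }
    }
}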

7.2.4 Data Analysis

Model construction and evaluation. Using these code features as explanatory variables and the labels as a response variable, we train Random Forest models to suggest the likelihood of logging an exception stack trace in an exception logging statement.

We use 10-fold cross-validation to estimate the efficacy of our models. The whole data set is randomly partitioned into 10 sets of roughly equal size. One subset is used as the

Table 7.3: Selected code features that are relevant to the likelihood of logging the stack trace of an exception in an exception logging statement.

Dimension | Feature | Definition (d) | Rationale (r)

Logging statement features
  Log level — d: The verbosity level of the logging statement. r: Lower log levels (e.g., debug) are more likely to print stack traces for debugging.
  Log in loop — d: Is the logging statement contained in a loop? r: Logging statements in loops are more likely to produce excessive logging.
  Log in branch — d: Is the logging statement contained in a branch inside the containing catch block? r: Logging statements in a branch are less likely to be related to the exception.

Exception features
  Exception type — d: Type of the exception that is caught by the containing catch block. r: Indicates the severity of a problem. Severe problems are more likely to need stack traces.
  Exception category — d: Category of the caught exception (runtime exception, error, or checked exception). r: Indicates the severity of the problem at a higher level.
  Exception source — d: Source of the exception class (i.e., from the JDK, libraries, or the project). r: Problems related to different sources might have different levels of severity.
  Exception package — d: The containing package of the exception class. r: The exceptions that are defined in the same package might indicate similar problems.

Catch block features
  Throw in catch — d: Does the containing catch block contain a throw statement? r: An exception is less likely to be logged with the stack trace if it is re-thrown.
  Return in catch — d: Does the containing catch block contain a return statement? r: An early return might indicate a severe problem.
  LOC before logging — d: Number of lines of code in the containing catch block that are prior to the logging statement. r: Indicates how the logging statement is related to the caught exception.
  LOC after logging — d: Number of lines of code in the containing catch block that are after the logging statement. r: Indicates how well the exception is handled. Well-handled exceptions are less likely to need stack traces for debugging.
  Method calls before logging — d: Number of method calls in the containing catch block that are prior to the logging statement. r: Indicates how the logging statement is related to the caught exception.
  Method calls after logging — d: Number of method calls in the containing catch block that are after the logging statement. r: Indicates how well the exception is handled.

Try block features
  Throw in try — d: Does the try block contain a throw statement? r: Indicates a problem that might need a stack trace to assist in debugging.
  Return in try — d: Does the try block contain a return statement? r: An exception before a return might indicate a severe problem.
  Method calls in try — d: Number of method calls in the try block excluding the logging statement. r: Indicates if the pairing catch block handles a specific problem or general problems.

Exception-throwing method features
  Exception method — d: The method that throws the caught exception, or the containing method if the exception is thrown by the try block. r: The exceptions that are triggered by the same method call might be logged in similar ways.
  Exception method source — d: The source of the exception-throwing method (i.e., from the JDK, libraries, or the project). r: Problems related to different sources might have different levels of severity.
  Exception method package — d: The containing package of the method that throws the exception. r: The exceptions that are triggered by methods from the same package might be logged in similar ways.

Containing code features
  Containing method fan-in — d: Number of usages of the containing method in the entire project. r: Methods called at many code locations tend to log the stack traces for debugging.
  Containing file — d: The containing file of the logging statement. r: Logging statements in the same file might share similar logging patterns.
  Containing package — d: The containing package of the logging statement. r: Logging statements in the same package might share similar logging patterns.

testing set (i.e., the held-out set) and the other nine subsets are used as the training set. We train our models using the training set and evaluate the performance of our models on the held-out set. The process repeats 10 times until every subset has been used as the testing set once. We repeat the 10-fold cross-validation 10 times (i.e., repeated 10-fold cross-validation), which means that 100 different held-out sets are used to estimate the efficacy of our models.

There are some nominal features (e.g., “exception type”) which have many classes.

Directly using these features as model variables would lead to highly sparse feature vectors. Instead, we use a target coding method to transform these nominal features into numerical variables. For example, for each exception type, we calculate the ratio of the instances that print stack traces against all the instances with the same exception type.

We only use the training set to calculate the ratios.
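A minimal sketch of this target coding is shown below; the class name is hypothetical, and the fallback to the global ratio for values that never appear in the training set is our assumption (the thesis does not describe how unseen values are handled).

import java.util.*;

/** Sketch of the target coding described above: each nominal value (e.g., an
 *  exception type) is replaced by the ratio of *training* instances with that
 *  value whose label is "stack trace logged". */
public class TargetCoder {
    private final Map<String, double[]> counts = new HashMap<>(); // value -> {logged, total}
    private double globalRatio = 0.5; // assumed fallback for values unseen in the training set

    public void fit(List<String> values, List<Boolean> labels) {
        int logged = 0;
        for (int i = 0; i < values.size(); i++) {
            double[] c = counts.computeIfAbsent(values.get(i), k -> new double[2]);
            if (labels.get(i)) { c[0]++; logged++; }
            c[1]++;
        }
        if (!values.isEmpty()) globalRatio = (double) logged / values.size();
    }

    public double encode(String value) {
        double[] c = counts.get(value);
        return (c == null || c[1] == 0) ? globalRatio : c[0] / c[1];
    }
}

The coder is fit on the training folds only and then applied to both the training and the held-out instances, so that no label information from the held-out set leaks into the features.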

In each fold of cross-validation, we use precision, recall, and AUC to measure the performance of our models. Precision measures the ratio of logging statements that are both predicted and observed to log stack traces against all the logging statements that are predicted to log stack traces. Recall measures the ratio of logging statements that are both predicted and observed to log stack traces against all the logging statements that are observed to log stack traces. AUC measures our models’ ability to discriminate the logging statements that log stack traces from the logging statements that do not.
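The thesis does not show the code used to train and evaluate the models (its citations suggest the R randomForest implementation). Purely as an illustrative sketch of the evaluation setup described above, and assuming the encoded features have been exported to a file (the ARFF file name below is hypothetical), the repeated 10-fold cross-validation could be reproduced in Java with Weka:

import java.util.Random;
import weka.classifiers.evaluation.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StackTraceModel {
    public static void main(String[] args) throws Exception {
        // One row per exception logging statement: encoded code features plus the
        // boolean label "stack trace logged" as the last (class) attribute.
        Instances data = DataSource.read("exception_logging_features.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        // Repeated 10-fold cross-validation (10 x 10 = 100 held-out sets).
        for (int repeat = 0; repeat < 10; repeat++) {
            RandomForest rf = new RandomForest();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(rf, data, 10, new Random(repeat));
            int positive = 1; // index of the "stack trace logged" class value
            System.out.printf("repeat %d: precision=%.2f recall=%.2f AUC=%.2f%n",
                    repeat, eval.precision(positive), eval.recall(positive),
                    eval.areaUnderROC(positive));
        }
    }
}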

Feature importance. The random forest model measures the importance of a feature by permuting the values of the feature while keeping the values of the other features unchanged in the testing data (i.e., the so-called out-of-bag or “OOB” data) (Breiman, 2001). The importance score of a feature measures the impact of such a permutation of the feature on the classification error rate (Breiman, 2002; Liaw and Wiener, 2002). For each of the

100 folds in our repeated 10-fold cross-validation, we measure the importance score

of each of our features. As a result, we get 100 importance scores for each feature.
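The random forest implementation computes this permutation-based importance on its out-of-bag data internally; the sketch below is only a simplified illustration of the underlying idea, written against Weka types (an assumption carried over from the earlier sketch) and applied to an explicit held-out set: permute one feature, keep the others fixed, and measure how much the error rate increases.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.evaluation.Evaluation;
import weka.core.Instances;

public class PermutationImportance {
    /** Increase in error rate on held-out data after permuting one feature,
     *  keeping all other features unchanged (the idea behind the RF importance). */
    public static double importance(Classifier model, Instances heldOut, int attrIndex,
                                    Random rng) throws Exception {
        double baseline = errorRate(model, heldOut);

        Instances permuted = new Instances(heldOut);          // deep copy of the held-out data
        List<Double> column = new ArrayList<>();
        for (int i = 0; i < permuted.numInstances(); i++) {
            column.add(permuted.instance(i).value(attrIndex));
        }
        Collections.shuffle(column, rng);                      // permute the feature's values
        for (int i = 0; i < permuted.numInstances(); i++) {
            permuted.instance(i).setValue(attrIndex, column.get(i));
        }
        return errorRate(model, permuted) - baseline;
    }

    private static double errorRate(Classifier model, Instances data) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(model, data);
        return eval.errorRate();
    }
}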

Scott-Knott clustering. In order to understand the important factors that explain the

likelihood of logging an exception stack trace in a logging statement, we compare the

average importance of our features. However, the differences among the importance of

some features might actually be due to random variability. Thus we use a Scott-Knott

(SK) algorithm (Scott and Knott, 1974) to partition all the features into statistically homogeneous groups. The SK algorithm hierarchically clusters the features into groups and uses a likelihood ratio test to judge the significance of the difference of the importance scores among the feature groups. As a result, the SK algorithm generates statistically distinct groups of features, i.e., the importance scores of the features in two different groups are significantly different (i.e., p-value < 0.05), while the importance scores of the features within the same group are not significantly different (i.e., p-value

≥ 0.05).

7.3 Experimental Results

Our Random Forest models accurately suggest the likelihood of logging a stack trace

in an exception logging statement, providing a precision of 0.81 to 0.87, a recall of

0.81 to 0.89, and an AUC of 0.85 to 0.94. Table 7.4 shows the performance of our Random Forest models for the Hadoop, Directory Server, Hive and Kafka projects. A high precision value indicates that a high percentage of the exception logging statements that are suggested to print stack traces actually need to do so. A high recall value indicates that our models can detect a high percentage of the exception logging statements that need to print stack traces. A high AUC value indicates that our models are very likely to distinguish

Table 7.4: The performance of our Random Forest models for suggesting the likelihood of logging a stack trace in an exception logging statement.

Project            Precision   Recall   AUC
Hadoop             0.81        0.81     0.85
Directory Server   0.86        0.88     0.94
Hive               0.86        0.87     0.90
Kafka              0.87        0.89     0.86

exception logging statements that need to print stack traces from those that do not.

Table 7.4 shows that our models perform better in the Directory Server, Hive and

Kafka projects than in the Hadoop project. These results might be because the Directory Server, Hive and Kafka projects follow more consistent logging patterns. In fact, the Hadoop project implements quite heterogeneous functionalities, such as distributed storage, distributed computing, and resource management, which might have different logging needs.

Table 7.4 also shows that our models have a better precision and recall in the Kafka

project than in the Directory Server and Hive projects. On the other hand, our models

have a better AUC in the Directory Server and Hive projects than in the Kafka project.

These results might look contradictory. In fact, the contradiction can be explained by

the class imbalance of the response variable (i.e., “logging an exception stack trace or

not”). As shown in Table 7.2, 76% of exception logging statements in the Kafka project log stack traces. In contrast, only 47% to 63% of the exception logging statements in the Directory Server and Hive projects log stack traces. In the case of class imbalance, AUC

is likely to produce more meaningful contrasts between models than precision and

recall (Kuhn and Johnson, 2013).

Our automated approach can accurately suggest the likelihood of logging a stack

trace in an exception logging statement, providing a precision of 0.81 to 0.87, a recall

of 0.81 to 0.89, and an AUC of 0.85 to 0.94.

The containing code features play the most important roles in explaining the

likelihood of logging exception stack traces in two out of the four studied projects.

Table 7.5 lists the top 10 most important features for explaining the likelihood of printing a stack trace in an exception logging statement. The containing code features, including “containing file” and “containing package”, play the most important roles (i.e., with an importance rank of the first and the second, respectively) for determining whether an exception logging statement in the Hadoop and Hive projects needs to print a stack trace. The containing code features are also important (i.e., with an importance rank of the second and the sixth) for explaining the likelihood of logging stack traces in the Directory Server project. These projects tend to follow consistent logging practices (i.e., in terms of logging stack traces) within the same files and packages.

The exception features and exception-throwing method features are consistently important in explaining the likelihood of logging exception stack traces in all the studied projects. In particular, the “exception type” feature ranks as the first, second, third, and fifth most important feature in Directory Server, Kafka, Hive, and Hadoop, respectively. The “exception method package” feature, which captures the containing package of an exception-throwing method, ranks as the third, third, fourth, and fifth most important feature in Hadoop, Directory Server, Hive, and Kafka, respectively. Therefore, the type of an exception and the method that throws the exception are important for determining the likelihood of logging the stack trace of the exception.

Table 7.5: The mean importance of the top 10 features for explaining the likelihood of logging a stack trace in an exception logging statement. These features are divided into distinct homogeneous groups by Scott-Knott clustering of their importance scores.

Hadoop
  Group   Feature                    Mean Importance
  1       Containing file            0.067
  2       Containing package         0.053
  3       Exception method package   0.052
  4       Log level                  0.034
  5       Exception type             0.031
  6       Exception method           0.020
  7       Exception package          0.014
  8       LOC before logging         0.009
  9       Method calls in try        0.009
  10      LOC after logging          0.007

Directory Server
  Group   Feature                    Mean Importance
  1       Exception type             0.162
  2       Containing package         0.120
  3       Exception method package   0.108
  4       Exception package          0.076
  5       Exception method           0.046
  6       Containing file            0.035
  6       LOC after logging          0.035
  7       Exception source           0.026
  8       Exception category         0.022
  9       Log level                  0.017

Hive
  Group   Feature                    Mean Importance
  1       Containing file            0.085
  2       Containing package         0.062
  3       Exception type             0.043
  4       Exception method package   0.040
  5       Log level                  0.027
  5       Exception package          0.026
  6       Exception method           0.019
  7       LOC after logging          0.012
  7       Method calls in try        0.011
  8       Method fan-in              0.008

Kafka
  Group   Feature                    Mean Importance
  1       Log level                  0.059
  2       Exception type             0.044
  3       Exception package          0.033
  4       LOC after logging          0.032
  5       Exception method package   0.029
  6       Exception category         0.027
  7       Return in catch            0.021
  8       Exception method           0.016
  9       Containing package         0.010
  9       Containing file            0.010

The log level of a logging statement plays an important role in explaining the likelihood of logging a stack trace in the logging statement. In particular, the “log level” feature is the most important feature for explaining the likelihood of logging exception stack traces in the Kafka project. The “log level” feature also ranks as the fourth and fifth most important feature in Hadoop and Hive, respectively. As discussed in Chapter 3, logging a stack trace at a high log level is likely to incur many logging costs, such as excessive log information and misleading end users.

The containing code features, exception features, and exception-throwing method

features play the most important roles in explaining the likelihood of logging the

stack trace of an exception.

7.4 Chapter Summary

Logging the stack trace of an exception can bring both benefits (e.g., assisting in failure diagnosis) and costs (e.g., rapidly growing log files). In Chapter 3, we observe that developers have difficulty deciding whether to log the stack trace of an exception. In this chapter, we propose an approach to provide automated suggestions about whether to print the stack trace of an exception in a logging statement. We first use static program analysis to extract the code features that capture the contextual information of an exception logging statement. We then construct Random Forest models to suggest whether to print an exception stack trace in an exception logging statement.

Finally, we analyze the resulting models to understand the important factors that explain the likelihood of printing an exception stack trace in a logging statement. Our experimental results on four open source projects show that our automated approach can accurately suggest whether to print the stack trace of an exception in a logging statement, with a precision of 0.81 to 0.87, a recall of 0.81 to 0.89, and an AUC of 0.85 to 0.94. We find that the features that capture the containing file and package of an exception catch block, and the features that capture the characteristics of an exception itself and its exception-throwing method, play the most important roles in explaining the likelihood of logging the stack trace of an exception.

Making appropriate decisions about logging exception stack traces is critical for balancing the benefits and costs of logging (Chapter 3). Our automated approach can help developers make informed decisions about whether to print an exception stack trace in a logging statement. Our findings also provide developers and researchers insights into the important factors (e.g., exception type) that drive developers’ decisions about logging exception stack traces.

CHAPTER 8

Conclusions and Future Work

LOG messages provide valuable information for software practitioners to understand system runtime behaviors and diagnose field failures. However, developers typically insert logging statements in an ad hoc manner, which usually results in insufficient logging in some code snippets and excessive logging in other code snippets. This Ph.D. thesis aims to understand and support software logging practices through mining development knowledge (i.e., source code, code change history, and issue reports). We believe that development knowledge contains valuable information that explains developers’ rationale behind their logging practices, which can help us better understand existing logging practices and develop helpful tools to


assist developers in their logging practices. To evaluate our hypothesis, we mine different aspects of development knowledge: 1) mining issue reports to understand developers’ logging concerns; 2) mining source code to understand how developers distribute their logging statements in the source code; 3) mining code change history to learn how developers develop and maintain their logging code, and how they choose log levels for their logging statements. Based on our empirical findings, we also propose automated approaches to support developers’ logging decisions. Our findings and approaches make valuable contributions towards improving current logging practices and the quality of logging code.

8.1 Thesis Contributions

Below, we highlight the main contributions of the thesis.

• Understanding Developers’ Logging Concerns. We perform a qualitative study

on 533 logging-related issue reports from three large and successful open source

projects. We conceptualize developers’ logging concerns (and how they address

their concerns) into easy-to-perceive categories. We derive best logging practices and general logging advice from our qualitative study, which can help developers improve their logging code and be aware of common logging traps. In addition, logging library providers can learn from our advice to improve their logging libraries. Our empirical findings also shed light on future research opportunities for improving software logging.

• Understanding Software Logging Using Topic Models. We use LDA to extract

the underlying topics from the source code, and study the relationship between

the logging decisions and the recovered topics. We find that a small number

of topics, in particular, the topics that can be generalized to communication

between machines or interaction between threads, are much more likely to be

logged than other topics. We also find that topics can provide additional explanatory power over the structural information of a code snippet for explaining the

likelihood of a code snippet being logged. Our findings suggest that future work

on logging recommendation tools should consider topic information in order to

help software practitioners make more informed logging decisions. Our findings

also encourage future work to develop topic-influenced logging guidelines (e.g.,

which topics need further logging).

• Automated Suggestions for Log Changes. We empirically study why developers

make log changes and propose an automated approach to provide developers

with log change suggestions as soon as they commit a code change (i.e., just-in-

time suggestions). Through a case study on four open source projects, we find

that the reasons for log changes can be grouped along four categories: block

change, log improvement, dependence-driven change, and logging issue. Our

experimental results show that our automated approach can accurately provide

just-in-time log change suggestions in both within-project and cross-project evaluations. Our findings also demonstrate that developers can leverage machine learning models to guide their log changing practices.

• Automated Suggestions for Choosing Log Levels. We analyze the development

history of four open source projects to study how developers assign log levels to

their logging statements. We find that the distribution of log levels varies across

different types of blocks, while the log levels in the same type of block show

similar distributions across different projects. We also propose an automated

approach based on ordinal regression models to help developers determine

the most appropriate log level when they add a new logging statement. Our

automated approach can accurately suggest the appropriate log level of a newly-

added logging statement, outperforming a naive model

based on the log level distribution and a random guessing model. Developers

can leverage our approach to receive automatic suggestions on the choices of

log levels or to receive warnings on inappropriate usages of log levels.

• Automated Suggestions for Logging Stack Traces. We observe that logging

exception stack traces appropriately is crucial for balancing the benefits and

costs of logging. We propose an automated approach that can accurately suggest

whether to print the stack trace of an exception in a logging statement. Our

automated approach can help developers make informed decisions regarding

logging exception stack traces. Our findings based on our automated approach

also provide developers and researchers insights into the important factors (e.g.,

exception types) that explain the likelihood of logging an exception stack trace.

8.2 Limitations

The first limitation is concerned with the generalizability of the findings in this thesis. In order to understand and support software logging practices, we perform case studies on several open source projects. Although our findings are consistent across the studied projects, they may not generalize to other open source projects and closed source projects. For example, the log-intensive topics that we derive in Chapter 4 might

vary in other software projects from different domains. Findings from additional case studies on other software projects can benefit our studies. However, the methodologies proposed in this thesis can apply to other software projects. For example, developers can apply our proposed approach in Chapter 4 to leverage the specific topics in their own projects to understand and guide their logging decisions.

The second limitation is that some findings in this thesis may seem obvious. For example, in Chapter 3, we find that developers are concerned about the logging cost of excessive log information, which might seem obvious to some software practitioners and researchers. However, as discussed in this thesis, software practitioners still face challenges in solving such “obvious” problems. Our work uses rigorous methods to help software practitioners and researchers better understand such problems. Our work also encourages future work to tackle these “obvious” problems that still concern software practitioners.

8.3 Future Research

This thesis highlights the need for standard logging guidelines and automated tooling support for software logging practices. Below, we propose some potential research opportunities that may benefit logging practices in the future.

8.3.1 Studying human aspects of software logging

Existing studies about software logging focus their analysis on the source code. However, the human aspect (e.g., developers) plays an important role in making logging decisions. For example, in Chapter 3, we find that different developers usually raise

conflicting concerns about the same logging code. Therefore, it is very interesting to

study how human aspects impact logging practices and to develop automated logging

tooling that considers such human aspects.

8.3.2 Automated generation of logging code

Existing studies provide automated suggestions about where to log. However, there is no work yet that can automatically generate logging statements for developers (i.e., what to log). It would be a promising avenue to investigate approaches that can automatically generate logging statements based on the surrounding code context and the specified logging purposes.

8.3.3 Categorizing logging statements by their purposes

As discussed in Chapter 3, developers insert logging statements for different purposes (or benefits), e.g., assisting in debugging, exposing runtime problems, and bookkeeping. For many usage scenarios, only one or two logging purposes need to be fulfilled. Therefore, it is valuable to develop approaches that can automatically categorize logging statements and log messages by their purposes, in order to help practitioners filter out unneeded log messages.

8.3.4 Detecting performance-critical logging statements

Performance overhead is a major cost for software logging. As discussed in Chapter 3, developers sometimes are not aware of such a cost and insert logging statements that significantly slow down their systems. Future studies are needed to help developers

identify the logging code that introduces excessive performance overhead, through static or dynamic analysis.

Bibliography

Aguinis, H. (2004). Regression analysis for categorical moderators. Guilford Press.

Apache-Commons (2016). Apache commons logging user guide - best practices. http://commons.apache.org/proper/commons-logging/guide.html. Accessed 25 July 2018.

Asuncion, H. U., Asuncion, A. U., and Taylor, R. N. (2010). Software traceability with

topic modeling. In Proceedings of the 32nd International Conference on Software

Engineering, ICSE ’10, pages 95–104.

Baldi, P. F., Lopes, C. V., Linstead, E. J., and Bajracharya, S. K. (2008). A theory of as-

pects as latent topics. In Proceedings of the 23rd ACM SIGPLAN Conference on Object-

oriented Programming Systems Languages and Applications, OOPSLA ’08, pages 543–

562.


Bavota, G., Oliveto, R., Gethers, M., Poshyvanyk, D., and Lucia, A. D. (2014). Method-

book: Recommending move method refactorings via relational topic models. IEEE

Transactions on Software Engineering, 40(7):671–694.

Binkley, D., Heinz, D., Lawrie, D., and Overfelt, J. (2014). Understanding LDA in source

code analysis. In Proceedings of the 22nd International Conference on Program Com-

prehension, ICPC ’14, pages 26–36.

Bitincka, L., Ganapathi, A., Sorkin, S., and Zhang, S. (2010). Optimizing data analysis

with a semi-structured time series database. In Proceedings of the 2010 Workshop

on Managing Systems via Log Analysis and Machine Learning Techniques, SLAML’10,

pages 7–7.

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of

Machine Learning Research, 3:993–1022.

Breiman, L. (2001). Random forests. Machine learning, 45(1):5–32.

Breiman, L. (2002). Manual on setting up, using, and understanding random forests

v3.1. https://www.stat.berkeley.edu/~breiman/Using_random_forests_V3.1.pdf. Accessed 25 July 2018.

Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly

weather review, 78(1):1–3.

Bring, J. (1994). How to standardize regression coefficients. The American Statistician,

48(3):209–213.

Brown, P. F., deSouza, P. V., Mercer, R. L., Pietra, V. J. D., and Lai, J. C. (1992). Class-based

n-gram models of natural language. Computational Linguistics, 18:467–479.

Chang, J., Gerrish, S., Wang, C., Boyd-graber, J. L., and Blei, D. M. (2009). Reading

tea leaves: How humans interpret topic models. In Advances in Neural Information

Processing Systems 22, pages 288–296.

Chen, B. and Jiang, Z. M. (2017a). Characterizing and detecting anti-patterns in the

logging code. In Proceedings of the 39th International Conference on Software Engi-

neering, ICSE ’17, pages 71–81.

Chen, B. and Jiang, Z. M. J. (2017b). Characterizing logging practices in Java-based

open source software projects – a replication study in apache software foundation.

Empirical Software Engineering, 22(1):330–374.

Chen, T.-H., Shang, W., Hassan, A. E., Nasser, M., and Flora, P.(2016a). Cacheoptimizer:

Helping developers configure caching frameworks for hibernate-based database-

centric web applications. In Proceedings of the 24th ACM SIGSOFT International

Symposium on Foundations of Software Engineering, FSE ’16, pages 666–677.

Chen, T.-H., Shang, W., Nagappan, M., Hassan, A. E., and Thomas, S. W. (2017a). Topic-

based software defect explanation. Journal of Systems and Software, 129:79–106.

Chen, T.-H., Syer, M. D., Shang, W., Jiang, Z. M., Hassan, A. E., Nasser, M., and Flora,

P. (2017b). Analytics-driven load testing: An industrial experience report on load

testing of large-scale systems. In Proceedings of the 39th International Conference on

Software Engineering: Software Engineering in Practice Track, ICSE-SEIP ’17, pages

243–252.

Chen, T.-H., Thomas, S. W., and Hassan, A. E. (2016b). A survey on the use of topic mod-

els when mining software repositories. Empirical Software Engineering, 21(5):1843–

1919.

Chen, T.-H., Thomas, S. W., Nagappan, M., and Hassan, A. (2012). Explaining software

defects using topic models. In Proceedings of the 9th Working Conference on Mining

Software Repositories, MSR ’12, pages 189–198.

Cleary, B., Exton, C., Buckley, J., and English, M. (2008). An empirical analysis of in-

formation retrieval based concept location techniques in software comprehension.

Empirical Software Engineering, 14(1):93–130.

Cohen, I., Goldszmidt, M., Kelly, T., Symons, J., and Chase, J. S. (2004). Correlating

instrumentation data to system states: A building block for automated diagnosis and

control. In Proceedings of the 6th Conference on Symposium on Opearting Systems

Design & Implementation - Volume 6, OSDI’ 04, pages 16–16.

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences. Routledge.

Cullmann, A. D. (2015). HandTill2001: Multiple Class Area under ROC Curve. R package

version 0.2-10.

D’Ambros, M., Lanza, M., and Robbes, R. (2012). Evaluating defect prediction ap-

proaches: a benchmark and an extensive comparison. Empirical Software Engineer-

ing, 17(4-5):531–577.

De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., and Panichella, S. (2012). Using

IR methods for labeling source code artifacts: Is it worthwhile? In Proceedings of the

20th International Conference on Program Comprehension, ICPC ’12, pages 193–202.

De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., and Panichella, S. (2014). Label-

ing source code with information retrieval methods: an empirical study. Empirical

Software Engineering, pages 1–38.

Eberhardt, C. (2014). The art of logging. http://www.codeproject.com/Articles/42354/The-Art-of-Logging. Accessed 25 July 2018.

Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statis-

tics, 7(1):1–26.

Efron, B. (1986). How biased is the apparent error rate of a prediction rule? Journal of

the American Statistical Association, 81(394):461–470.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized

linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.

Fu, Q., Lou, J.-G., Lin, Q., Ding, R., Zhang, D., and Xie, T. (2013). Contextual analy-

sis of program logs for understanding system behaviors. In Proceedings of the 10th

Working Conference on Mining Software Repositories, MSR ’13, pages 397–400.

Fu, Q., Lou, J.-G., Wang, Y., and Li, J. (2009). Execution anomaly detection in distributed

systems through unstructured log analysis. In Proceedings of the 9th IEEE Interna-

tional Conference on Data Mining, ICDM ’09, pages 149–158.

Fu, Q., Zhu, J., Hu, W., Lou, J.-G., Ding, R., Lin, Q., Zhang, D., and Xie, T. (2014). Where

do developers log? An empirical study on logging practices in industry. In Compan-

ion Proceedings of the 36th International Conference on Software Engineering, ICSE

Companion ’14, pages 24–33.

Fukushima, T., Kamei, Y., McIntosh, S., Yamashita, K., and Ubayashi, N. (2014). An

empirical study of just-in-time defect prediction using cross-project models. In Pro-

ceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014,

pages 172–181.

Ghotra, B., McIntosh, S., and Hassan, A. E. (2015). Revisiting the impact of classification

techniques on the performance of defect prediction models. In Proceedings of the

37th International Conference on Software Engineering - Volume 1, ICSE ’15, pages

789–800.

Glerum, K., Kinshumann, K., Greenberg, S., Aul, G., Orgovan, V., Nichols, G., Grant, D.,

Loihle, G., and Hunt, G. (2009). Debugging in the (very) large: Ten years of imple-

mentation and experience. In Proceedings of the ACM SIGOPS 22nd Symposium on

Operating Systems Principles, SOSP ’09, pages 103–116.

Goshtasby, A. A. (2012). Similarity and dissimilarity measures. In Image Registration:

Principles, Tools and Methods, pages 7–66. Springer London, London.

Graves, T. L., Karr, A. F., Marron, J. S., and Siy, H. (2000). Predicting fault incidence using

software change history. IEEE Transactions on Software Engineering, 26(7):653–661.

Groeneveld, R. A. and Meeden, G. (1984). Measuring Skewness and Kurtosis. Journal

of the Royal Statistical Society. Series D (The Statistician), 33(4).

Gülcü, C. and Stark, S. (2003). The complete log4j manual. Quality Open Software,

Lausanne, Switzerland.

Hall, D., Jurafsky, D., and Manning, C. D. (2008). Studying the history of ideas using

topic models. In Proceedings of the 2008 conference on empirical methods in natural

language processing, EMNLP ’08, pages 363–371.

Han, S., Dang, Y., Ge, S., Zhang, D., and Xie, T. (2012). Performance debugging in the

large via mining millions of stack traces. In Proceedings of the 34th International

Conference on Software Engineering, ICSE ’12, pages 145–155.

Hand, D. J. and Till, R. J. (2001). A simple generalisation of the area under the ROC

curve for multiple class classification problems. Machine learning, 45(2):171–186.

Harrell, Jr., F. E. (2015a). Regression modeling strategies: with applications to linear

models, logistic and ordinal regression, and survival analysis. Springer.

Harrell, Jr., F. E. (2015b). rms: Regression Modeling Strategies. R package version 4.4-1.

Harrell, Jr., F. E., with contributions from Charles Dupont, and many others. (2014).

Hmisc: Harrell Miscellaneous. R package version 3.14-5.

Hassan, A. E. (2008). The road ahead for mining software repositories. In Frontiers of

Software Maintenance, FoSM ’08, pages 48–57.

Hassan, A. E. (2009). Predicting faults using the complexity of code changes. In

Proceedings of the 31st International Conference on Software Engineering, ICSE ’09,

pages 78–88.

Hassan, A. E. and Holt, R. C. (2004). Predicting change propagation in software systems.

In Proceedings of the 20th IEEE International Conference on Software Maintenance,

ICSM ’04, pages 284–293.

Hassan, A. E., Martin, D. J., Flora, P., Mansfield, P., and Dietz, D. (2008). An industrial

case study of customizing operational profiles using log compression. In Proceedings

of the 30th International Conference on Software Engineering, ICSE ’08, pages 713–

723.

Hindle, A., Bird, C., Zimmermann, T., and Nagappan, N. (2015). Do topics make sense

to managers and developers? Empirical Software Engineering, 20(2):479–515.

Hu, J., Sun, X., Lo, D., and Li, B. (2015). Modeling the evolution of development topics

using dynamic topic models. In Proceedings of the 22nd IEEE International Confer-

ence on Software Analysis, Evolution, and Reengineering, SANER’ 15, pages 3–12.

Jelihovschi, E. G., Faria, J. C., and Allaman, I. B. (2014). Scottknott: A package for per-

forming the scott-knott clustering algorithm in R. Trends in Applied and Computa-

tional Mathematics, 15(1):3–17.

Jia, Z., Li, S., Liu, X., Liao, X., and Liu, Y. (2018). Smartlog: Place error log statement by

deep understanding of log intention. In Proceedings of the 25th IEEE International

Conference on Software Analysis, Evolution and Reengineering, SANER ’18, pages 61–

71.

Jiang, Z. M., Hassan, A. E., Hamann, G., and Flora, P. (2008). Automatic identification

of load testing problems. In Proceedings of the 2008 IEEE International Conference

on Software Maintenance, ICSM ’08, pages 307–316.

Kabacoff, R. (2011). R in Action. Manning Publications Co.

Kabinna, S., Bezemer, C.-P., Shang, W., and Hassan, A. E. (2016a). Logging library mi-

grations: A case study for the Apache Software Foundation projects. In Proceedings

of the 13th International Conference on Mining Software Repositories, MSR ’16, pages

154–164.

Kabinna, S., Bezemer, C.-P., Shang, W., Syer, M. D., and Hassan, A. E. (2018). Examining

the stability of logging statements. Empirical Software Engineering, 23(1):290–333.

Kabinna, S., Shang, W., Bezemer, C.-P., and Hassan, A. E. (2016b). Examining the stabil-

ity of logging statements. In Proceedings of the 23rd IEEE International Conference

on Software Analysis, Evolution, and Reengineering, SANER ’16, pages 326–337.

Kamei, Y., Fukushima, T., McIntosh, S., Yamashita, K., Ubayashi, N., and Hassan, A. E.

(2016). Studying just-in-time defect prediction using cross-project models. Empiri-

cal Software Engineering, 21(5):2072–2106.

Kamei, Y., Shihab, E., Adams, B., Hassan, A. E., Mockus, A., Sinha, A., and Ubayashi, N.

(2013). A large-scale empirical study of just-in-time quality assurance. IEEE Trans-

actions on Software Engineering, 39(6):757–773.

Kavulya, S., Tan, J., Gandhi, R., and Narasimhan, P. (2010). An analysis of traces from a

production mapreduce cluster. In Proceedings of the 10th IEEE/ACM International

Conference on Cluster, Cloud and Grid Computing, CCGRID ’10, pages 94–103.

King, J., Pandita, R., and Williams, L. (2015). Enabling forensics by proposing heuris-

tics to identify mandatory log events. In Proceedings of the 2015 Symposium and

Bootcamp on the Science of Security, HotSoS ’15, pages 6:1–6:11.

King, J., Stallings, J., Riaz, M., and Williams, L. (2017). To log, or not to log: using heuris-

tics to identify mandatory log events – a controlled experiment. Empirical Software

Engineering, 22(5):2684–2717.

Kuhn, A., Ducasse, S., and Gírba, T. (2007). Semantic clustering: Identifying topics in

source code. Information and Software Technology, 49:230–243.

Kuhn, M. and Johnson, K. (2013). Applied predictive modeling. Springer.

Lal, S. and Sureka, A. (2016). Logopt: Static feature extraction from source code for

automated catch block logging prediction. In Proceedings of the 9th India Software

Engineering Conference, ISEC ’16, pages 151–155.

Lawless, J. and Singhal, K. (1978). Efficient screening of nonnormal regression models.

Biometrics, 34(2):318–327.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and

reversals. Soviet physics doklady, 10(8):707–710.

Li, H., Chen, T.-H. P., Shang, W., and Hassan, A. E. (2018). Studying software logging

using topic models. Empirical Software Engineering.

Li, H., Shang, W., and Hassan, A. E. (2017a). Which log level should developers choose

for a new logging statement? Empirical Software Engineering, 22(4):1684–1716.

Li, H., Shang, W., Zou, Y., and E. Hassan, A. (2017b). Towards just-in-time suggestions

for log changes. Empirical Software Engineering, 22(4):1831–1865.

Liaw, A. and Wiener, M. (2002). Classification and regression by randomforest. R news,

2(3):18–22.

Linstead, E., Lopes, C., and Baldi, P.(2008). An application of latent Dirichlet allocation

to analyzing software evolution. In Proceedings of Seventh International Conference

on Machine Learning and Applications, ICMLA ’12, pages 813–818.

Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimothy, T., and Chrisochoides, N. (2009a). Mod-

eling class cohesion as mixtures of latent topics. In Proceedings of the 25th Interna-

tional Conference on Software Maintenance, ICSM ’09, pages 233–242.

Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimothy, T., and Chrisochoides, N. (2009b). Mod-

eling class cohesion as mixtures of latent topics. In Proceedings of the 25th IEEE

International Conference on Software Maintenance, ICSM ’09, pages 233–242.

Macbeth, G., Razumiejczyk, E., and Ledesma, R. D. (2011). Cliff’s delta calculator: A

non-parametric effect size program for two groups of observations. Universitas Psy-

chologica, 10(2):545–555.

Mant, J., Doust, J., Roalfe, A., Barton, P., Cowie, M. R., Glasziou, P., Mant, D., McManus,

R., Holder, R., Deeks, J., et al. (2009). Systematic review and individual patient data

meta-analysis of diagnosis of heart failure, with modelling of implications of differ-

ent diagnostic strategies in primary care. Health Technol Assess, 13(32):1–207.

Mantel, N. (1970). Why stepdown procedures in variable selection. Technometrics,

12(3):621–625.

Mariani, L. and Pastore, F. (2008). Automated identification of failure causes in sys-

tem logs. In Proceedings of the 19th International Symposium on Software Reliability

Engineering, ISSRE ’08, pages 117–126.

Martin, T. M., Harten, P., Young, D. M., Muratov, E. N., Golbraikh, A., Zhu, H., and Trop-

sha, A. (2012). Does rational selection of training and test sets improve the outcome

of qsar modeling? Journal of chemical information and modeling, 52(10):2570–2578.

Maskeri, G., Sarkar, S., and Heafield, K. (2008). Mining business topics in source code

using latent Dirichlet allocation. In Proceedings of the 1st India Software Engineering

Conference, pages 113–120.

McCabe, T. J. (1976). A complexity measure. IEEE Transactions on Software Engineering,

(4):308–320.

McCallum, A. K. (2002). Mallet: A machine learning for language toolkit.

McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical

Society. Series B (Methodological), 42(2):109–142.

McIntosh, S., Kamei, Y., Adams, B., and Hassan, A. E. (2014). The impact of code re-

view coverage and code review participation on software quality: A case study of the

qt, vtk, and itk projects. In Proceedings of the 11th Working Conference on Mining

Software Repositories, MSR ’14, pages 192–201.

McIntosh, S., Kamei, Y., Adams, B., and Hassan, A. E. (2016). An empirical study of

the impact of modern code review practices on software quality. Empirical Software

Engineering, 21(5):2146–2189.

McKelvey, R. D. and Zavoina, W. (1975). A statistical model for the analysis of ordinal

level dependent variables. Journal of Mathematical Sociology, 4(1):103–120.

Microsoft-MSDN (2016). Logging an exception. https://msdn.microsoft.com/en-us/library/ff664711(v=pandp.50).aspx. Accessed 25 July 2018.

Misra, H., Cappé, O., and Yvon, F. (2008). Using lda to detect semantically incoherent

documents. In Proceedings of the 12th Conference on Computational Natural Lan-

guage Learning, CoNLL ’08, pages 41–48.

Nagappan, N. and Ball, T. (2007). Using software dependencies and churn metrics to

predict field failures: An empirical case study. In Proceedings of the First Interna-

tional Symposium on Empirical Software Engineering and Measurement, ESEM ’07,

pages 364–373.

Nagappan, N., Ball, T., and Zeller, A. (2006). Mining metrics to predict component

failures. In Proceedings of the 28th International Conference on Software Engineering,

ICSE ’06, pages 452–461.

Nagaraj, K., Killian, C., and Neville, J. (2012). Structured comparative analysis of sys-

tems logs to diagnose performance problems. In Proceedings of the 9th USENIX Con-

ference on Networked Systems Design and Implementation, NSDI’12, pages 26–26,

Berkeley, CA, USA. USENIX Association.

Nguyen, T. T., Nguyen, T. N., and Phuong, T. M. (2011). Topic-based defect prediction.

In Proceedings of the 33rd International Conference on Software Engineering, ICSE

’11, pages 932–935.

Oliner, A., Ganapathi, A., and Xu, W. (2012). Advances and challenges in log analysis.

Communications of the ACM, 55(2):55–61.

Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., and De Lucia, A. (2013).

How to effectively use topic models for software engineering tasks? an approach

based on genetic algorithms. In Proceedings of the 2013 International Conference on

Software Engineering, ICSE ’13, pages 522–531.

Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., and De Lucia, A. (2016).

Parameterizing and assembling ir-based solutions for se tasks using genetic algo-

rithms. In Proceedings of the 23rd IEEE International Conference on Software Analy-

sis, Evolution, and Reengineering, SANER ’16, pages 314–325.

Pecchia, A., Cinque, M., Carrozza, G., and Cotroneo, D. (2015). Industry practices and

event logging: Assessment of a critical software development process. In Proceedings

of the 37th International Conference on Software Engineering, ICSE ’15, pages 169–

178.

Poshyvanyk, D., Gueheneuc, Y., Marcus, A., Antoniol, G., and Rajlich, V. (2007). Feature

location using probabilistic ranking of methods based on execution scenarios and

information retrieval. IEEE Trans. on Software Engineering, pages 420–432.

Rao, S. and Kak, A. (2011). Retrieval from software libraries for bug localization: A

comparative study of generic and composite text models. In Proceeding of the 8th

Working Conference on Mining Software Repositories, MSR ’11, pages 43–52.

Romano, J., Kromrey, J. D., Coraggio, J., and Skowronek, J. (2006). Appropriate statistics

for ordinal level data: Should we really be using t-test and Cohen’s d for evaluating

group differences on the nsse and other surveys. In annual meeting of the Florida

Association of Institutional Research, pages 1–33.

Rugg, G. and McGeorge, P. (2005). The sorting techniques: a tutorial paper on card

sorts, picture sorts and item sorts. Expert Systems, 22(3):94–107.

Sarbanes, P. (2002). Sarbanes-Oxley Act of 2002. In The Public Company Accounting

Reform and Investor Protection Act.

Schroter, A., Schrter, A., Bettenburg, N., and Premraj, R. (2010). Do stack traces help

developers fix bugs? In Proceedings of the 7th IEEE Working Conference on Mining

Software Repositories, MSR ’10, pages 118–121.

Scott, A. and Knott, M. (1974). A cluster analysis method for grouping means in the

analysis of variance. Biometrics, 30(3):507–512.

Shang, W., Jiang, Z. M., Adams, B., Hassan, A. E., Godfrey, M. W., Nasser, M., and Flora,

P. (2011). An exploratory study of the evolution of communicated information about

the execution of large software systems. In 2011 18th Working Conference on Reverse

Engineering, WCRE ’11, pages 335–344.

Shang, W., Jiang, Z. M., Adams, B., Hassan, A. E., Godfrey, M. W., Nasser, M., and Flora, P.

(2014a). An exploratory study of the evolution of communicated information about

the execution of large software systems. Journal of Software: Evolution and Process,

26(1):3–26.

Shang, W., Jiang, Z. M., Hemmati, H., Adams, B., Hassan, A. E., and Martin, P. (2013).

Assisting developers of big data analytics applications when deploying on hadoop

clouds. In Proceedings of the 2013 International Conference on Software Engineering,

ICSE ’13, pages 402–411.

Shang, W., Nagappan, M., and Hassan, A. E. (2015). Studying the relationship between

logging characteristics and the code quality of platform software. Empirical Software

Engineering, 20(1):1–27.

Shang, W., Nagappan, M., Hassan, A. E., and Jiang, Z. M. (2014b). Understanding log

lines using development knowledge. In Proceedings of the 30th IEEE International

Conference on Software Maintenance and Evolution, ICSME ’14, pages 21–30.

Sharma, B., Chudnovsky, V., Hellerstein, J. L., Rifaat, R., and Das, C. R. (2011). Modeling

and synthesizing task placement constraints in google compute clusters. In Proceed-

ings of the 2nd ACM Symposium on Cloud Computing, SOCC ’11, pages 3:1–3:14.

Shihab, E., Jiang, Z. M., Ibrahim, W. M., Adams, B., and Hassan, A. E. (2010). Under-

standing the impact of code and process metrics on post-release defects: A case

study on the Eclipse project. In Proceedings of the 4th ACM-IEEE International Sym-

posium on Empirical Software Engineering and Measurement, ESEM ’10, pages 4:1–

4:10.

Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2011). Regularization paths for

cox’s proportional hazards model via coordinate descent. Journal of Statistical Soft-

ware, 39(5):1–13.

Sommer, S. and Huggins, R. M. (1996). Variables selection using the Wald test and a

robust CP. Applied statistics, 45(1):15–29.

Spencer, D. (2009). Card sorting: Designing usable categories. Rosenfeld Media.

Steyvers, M. and Griffiths, T. (2007). Probabilistic topic models. Handbook of latent

semantic analysis, 427(7):424–440.

Sun, X., Li, B., Leung, H., Li, B., and Li, Y. (2015a). Msr4sm: Using topic models to

effectively mining software repositories for software maintenance tasks. Information

and Software Technology, 66:1–12.

Sun, X., Li, B., Li, Y., and Chen, Y. (2015b). What information in software historical

repositories do we need to support software maintenance tasks? an approach based

on topic model. In Computer and Information Science, pages 27–37. Springer Inter-

national Publishing, Cham.

Sun, X., Liu, X., Li, B., Duan, Y., Yang, H., and Hu, J. (2016). Exploring topic models in

software engineering data analysis: A survey. In Proceedings of the 17th IEEE/ACIS

International Conference on Software Engineering, Artificial Intelligence, Networking

and Parallel/Distributed Computing, SNPD’ 16, pages 357–362.

Swinscow, T. D. V., Campbell, M. J., et al. (2002). Statistics at Square One. BMJ, London.

Syer, M. D., Jiang, Z. M., Nagappan, M., Hassan, A. E., Nasser, M., and Flora, P. (2013).

Leveraging performance counters and execution logs to diagnose memory-related

performance issues. In Proceedings of the 29th IEEE International Conference on

Software Maintenance, ICSM ’13, pages 110–119.

Tantithamthavorn, C., McIntosh, S., Hassan, A. E., and Matsumoto, K. (2017). An

empirical comparison of model validation techniques for defect prediction models.

IEEE Transactions on Software Engineering, 43(1):1–18.

Thomas, S., Adams, B., Hassan, A. E., and Blostein, D. (2010). Validating the use of

topic models for software evolution. In Proceedings of the 10th International Working

Conference on Source Code Analysis and Manipulation, SCAM ’10, pages 55–64.

Thomas, S. W. (2012). A lightweight source code preprocesser. https://github.com/doofuslarge/lscp. Accessed 25 July 2018.

Thomas, S. W., Adams, B., Hassan, A. E., and Blostein, D. (2011). Modeling the evolution

of topics in source code histories. In Proceedings of the 8th Working Conference on

Mining Software Repositories, pages 173–182.

Thomas, S. W., Adams, B., Hassan, A. E., and Blostein, D. (2014). Studying software

evolution using topic models. Science of Computer Programming, 80:457–479.

Tian, K., Revelle, M., and Poshyvanyk, D. (2009). Using latent Dirichlet allocation for

automatic categorization of software. In Proceedings of the 6th International Working

Conference on Mining Software Repositories, MSR ’09, pages 163–166.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the

Royal Statistical Society. Series B (Methodological), pages 267–288.

Tourani, P. and Adams, B. (2016). The impact of human discussions on just-in-time

quality assurance: An empirical study on openstack and eclipse. In Proceedings of the

23rd International Conference on Software Analysis, Evolution, and Reengineering,

SANER ’16, pages 189–200.

Wallach, H. M., Mimno, D. M., and McCallum, A. (2009). Rethinking lda: Why priors

matter. In Advances in neural information processing systems, NIPS ’09, pages 1973–

1981.

Wilks, D. S. (2011). Statistical methods in the atmospheric sciences, volume 100. Aca-

demic press.

Witten, I. H. and Frank, E. (2005). Data Mining: Practical machine learning tools and

techniques. Morgan Kaufmann.

Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan, M. (2008). Mining console logs

for large-scale system problem detection. In Proceedings of the Third Conference on

Tackling Computer Systems Problems with Machine Learning Techniques, SysML’08,

pages 4–4.

Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan, M. (2009a). Online system problem

detection by mining patterns of console logs. In Proceedings of the 2009 Ninth IEEE

International Conference on Data Mining, ICDM ’09, pages 588–597.

Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan, M. I. (2009b). Detecting large-scale

system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd

Symposium on Operating Systems Principles, SOSP ’09, pages 117–132.

Yuan, D., Luo, Y., Zhuang, X., Rodrigues, G. R., Zhao, X., Zhang, Y., Jain, P. U., and

Stumm, M. (2014). Simple testing can prevent most critical failures: An analysis of

production failures in distributed data-intensive systems. In Proceedings of the 11th

USENIX Conference on Operating Systems Design and Implementation, OSDI’14,

pages 249–265.

Yuan, D., Mai, H., Xiong, W., Tan, L., Zhou, Y., and Pasupathy, S. (2010). Sherlog: Er-

ror diagnosis by connecting clues from run-time logs. In Proceedings of the 15th

International Conference on Architectural Support for Programming Languages and

Operating Systems, ASPLOS ’10, pages 143–154.

Yuan, D., Park, S., Huang, P., Liu, Y., Lee, M. M., Tang, X., Zhou, Y., and Savage, S. (2012a).

Be conservative: Enhancing failure diagnosis with proactive logging. In Proceedings

of the 10th USENIX Conference on Operating Systems Design and Implementation,

OSDI’12, pages 293–306.

Yuan, D., Park, S., and Zhou, Y. (2012b). Characterizing logging practices in open-

source software. In Proceedings of the 34th International Conference on Software

Engineering, ICSE ’12, pages 102–112.

Yuan, D., Zheng, J., Park, S., Zhou, Y., and Savage, S. (2011). Improving software diag-

nosability via log enhancement. In Proceedings of the 16th International Conference

on Architectural Support for Programming Languages and Operating Systems, ASP-

LOS ’11, pages 3–14.

Zeng, L., Xiao, Y., and Chen, H. (2015). Linux auditing: Overhead and adaptation.

In Proceedings of 2015 IEEE International Conference on Communications, ICC ’15,

pages 7168–7173.

Zhang, C., Guo, Z., Wu, M., Lu, L., Fan, Y., Zhao, J., and Zhang, Z. (2011). Autolog:

Facing log redundancy and insufficiency. In Proceedings of the Second Asia-Pacific

Workshop on Systems, APSys ’11, pages 10:1–10:5.

Zhang, S., Cohen, I., Symons, J., and Fox, A. (2005). Ensembles of models for automated

diagnosis of system performance problems. In Proceedings of the 2005 International

Conference on Dependable Systems and Networks, DSN ’05, pages 644–653.

Zhao, X., Rodrigues, K., Luo, Y., Stumm, M., Yuan, D., and Zhou, Y. (2017a). The game

of twenty questions: Do you know where to log? In Proceedings of the 16th Workshop

on Hot Topics in Operating Systems, HotOS ’17, pages 125–131.

Zhao, X., Rodrigues, K., Luo, Y., Stumm, M., Yuan, D., and Zhou, Y. (2017b). Log20: Fully

automated optimal placement of log printing statements under specified overhead

threshold. In Proceedings of the 26th Symposium on Operating Systems Principles,

SOSP ’17, pages 565–581.

Zhu, J., He, P., Fu, Q., Zhang, H., Lyu, M. R., and Zhang, D. (2015). Learning to log:

Helping developers make informed logging decisions. In Proceedings of the 37th In-

ternational Conference on Software Engineering - Volume 1, ICSE ’15, pages 415–425.

Zimmermann, T. (2016). Card-sorting: From text to themes. In Menzies, T., Williams, L., and Zimmermann, T., editors, Perspectives on Data Science for Software Engineering,

pages 137–141. Morgan Kaufmann, Burlington, Massachusetts.

Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A. (2004). Mining version histo-

ries to guide software changes. In Proceedings of the 26th International Conference

on Software Engineering, ICSE ’04, pages 563–572.