Tackling Build Failures in Continuous Integration
Total Page:16
File Type:pdf, Size:1020Kb
TACKLING BUILD FAILURES IN CONTINUOUS INTEGRATION by FOYZUL HASSAN, M.Sc. DISSERTATION Presented to the Graduate Faculty of The University of Texas at San Antonio In Partial Fulfillment Of the Requirements For the Degree of DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE COMMITTEE MEMBERS: Xiaoyin Wang, Ph.D., Chair Jianwei Niu, Ph.D. Wei Wang, Ph.D. Palden Lama, Ph.D. Lingming Zhang, Ph.D. THE UNIVERSITY OF TEXAS AT SAN ANTONIO College of Sciences Department of Computer Science May 2020 ProQuest Number:27957408 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent on the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. ProQuest 27957408 Published by ProQuest LLC (2020). Copyright of the Dissertation is held by the Author. All Rights Reserved. This work is protected against unauthorized copying under Title 17, United States Code Microform Edition © ProQuest LLC. ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106 - 1346 Copyright 2020 Foyzul Hassan All rights reserved. DEDICATION To my parents To my wife Fariha To rest of my family ACKNOWLEDGEMENTS First of all, I would like to thank my adviser, Dr. Xiaoyin Wang. Without his guidance, sup- port, and assistance, it would not be possible for me to finish my dissertation. I very much enjoyed spending time and working with him. Dr. Wang always helped in identifying, formulating, and solving research problems, and patiently advised the publications and presentations to improve the quality of my research results. Apart from research work, his advises on professional career paths and preparation for professional carrier have been invaluable. Besides my advisor, I would like to thank the rest of my dissertation committee: Prof. Jianwei Niu, Prof. Palden Lama, Prof. Wei Wang and Prof. Lingming Zhang for their support, insightful comments, constructive feedback, and challenging questions. I am thankful to Prof. Na Meng from Virginia Tech for being a great collaborator and imparting her knowledge of fault-localization techniques. I am also grateful to my research internship mentors at Microsoft Research Nachi Nagappan, Tom Zimmermann, Chetan Bansal and Ahmed Hassan Awadallah for their guidance and fruitful discussion on identifying new research problems. I am privileged to be associated with the current software engineering group at UTSA and like to thank my fellow lab mates: Shaikh Nahid Mostafa, and Xue Qin. Without your great and enjoy- able company, it was not possible for me to pass the long journey here. My sincerest gratitude and appreciation go to my lovely wife Fariha Nusrat for constantly sup- porting me and going through hardships during my PhD years. I would also like to thank my parents, my brother and my sister for their never-ending love, care, and support. Finally, I would like to thank the financial support provided by the Department of Com- puter Science at UTSA, as well as the funding support from the National Science Foundation (Grant:1464425,1846467). iv This Masters Thesis/Recital Document or Doctoral Dissertation was produced in accordance with guidelines which permit the inclusion as part of the Masters Thesis/Recital Document or Doc- toral Dissertation the text of an original paper, or papers, submitted for publication. The Masters Thesis/Recital Document or Doctoral Dissertation must still conform to all other requirements explained in the Guide for the Preparation of a Masters Thesis/Recital Document or Doctoral Dis- sertation at The University of Texas at San Antonio. It must include a comprehensive abstract, a full introduction and literature review, and a final overall conclusion. Additional material (proce- dural and design data as well as descriptions of equipment) must be provided in sufficient detail to allow a clear and precise judgment to be made of the importance and originality of the research reported. It is acceptable for this Masters Thesis/Recital Document or Doctoral Dissertation to include as chapters authentic copies of papers already published, provided these meet type size, margin, and legibility requirements. In such cases, connecting texts, which provide logical bridges between different manuscripts, are mandatory. Where the student is not the sole author of a manuscript, the student is required to make an explicit statement in the introductory material to that manuscript describing the students contribution to the work and acknowledging the contribution of the other author(s). The signatures of the Supervising Committee which precede all other material in the Masters Thesis/Recital Document or Doctoral Dissertation attest to the accuracy of this statement. May 2020 v TACKLING BUILD FAILURES IN CONTINUOUS INTEGRATION Foyzul Hassan, Ph.D. The University of Texas at San Antonio, 2020 Supervising Professor: Xiaoyin Wang, Ph.D. In popular continuous integration(CI) practice, coding is followed by building, integration and system testing, pre-release inspection and deploying artifacts. This can reduce integration time and speed up the development process. At CI environment, dedicated infrastructure with different build systems such as Make, Ant, Maven, Gradle, Bazel, etc. are used to automate CI tasks like compilation, test invocation, etc. For continuous integration, developers describe build process though build scripts such as build.xml for Ant, pom.xml for Maven, and build.gradle for Gradle. But with the growing functionality and CI requirement, build scripts can very complex and may require frequent maintenance. Meanwhile, a large number of continuous integration failures may interrupt normal software development so that they need to be repaired as soon as possible. According to the TravisTorrent dataset of open-source software continuous integration, 22% of code commits include changes in build script files to maintain build scripts or to resolve CI failures. CI failures bring both challenges and opportunities to fault localization and program repair techniques. On the challenge side, unlike traditional fault localization scenarios (e.g., unit testing, regression testing) where only source code needs to be considered, CI failures can also be caused by build configuration errors and environment changes. Actually, the CI practice has make it a necessity for developers to automate the software build process with build scripts and configuration files. As a result, the build maintenance needs a substantial amount of developer efforts, and developer’s carelessness may lead to defects in build scripts also. On the opportunity side, the scenario of continuous integration provides rich code commit history and build logs from previous passing builds and current failing builds. Such information sources are often not available in other application scenarios. Taking advantage of these additional vi information sources, we may be able to further improve the accuracy of automatic repair and fault localization for CI failures. Automated program repair techniques have great potential to reduce the cost of resolving soft- ware failures, but the existing techniques mostly focus on repairing source code so that they cannot directly help resolving software CI failures. To address this limitation, we proposed the first ap- proach to automatic patch generation for build scripts, using fix patterns automatically generated from existing build script fixes and recommending fix patterns based on build log similarity. Apart from build script, CI failures are often a combination of test failures and build failures, and sometimes both source code and various build scripts need to be touched in the repair of one failure. To address this issue, we proposed a unified technique to localize faults in both source code and build scripts given CI failures. Adopting the information retrieval (IR) strategy, UniLoc locates buggy files by treating source code and build scripts as documents to search, and by considering build logs as search queries. However, instead of naively applying an off-the-shelf IR technique to these software artifacts, for more accurate fault localization, UniLoc (Unified Fault Localiza- tion) applies various domain-specific heuristics to optimize the search queries, search space, and ranking formulas. However, UniLoc can localize faults up to file level and limited to find faults in source code and build scripts. In the future, we are planning to expand our fault localization approach to the source code and build script block level to assist developers and automatic repair approaches better. Finally, beyond source code and build scripts, there are also other types of files to be involved during software repair, especially in other scenarios. For example, in the fault locali- zation of web applications, we need to consider Html files, css files, client-side JavaScript files and server side source code. We plan to adapt our technique to more scenarios with heterogeneous bug locations. Proposed CI fault localization technique with program repair can be a handy solution to fix a range of CI failures. vii TABLE OF CONTENTS Acknowledgements . iv Abstract . vi List of Tables . xiii List of Figures . xv Chapter 1: Introduction . 1 1.1 Motivation . .1 1.2 Thesis Statement . .4 1.3 Contributions . .4 1.4 Organization . .6 Chapter 2: Background and Related Work . 7 2.1 Analysis of Build Configuration Files . .7 2.1.1 Automatic Program Repair . .8 2.2 Automatic Fault Localization . .9 2.3 Containerized Software Development . 10 Chapter 3: Automatic Building of Java Projects in Software Repositories: A Study on Feasibility and Challenges . 12 3.1 Introduction . 12 3.2 Study Design . 15 3.2.1 Research Questions . 15 3.2.2 Study Setup . 15 3.3 Study on Successfulness of Default Build Commands (RQ1)............ 16 3.4 A Taxonomy of Root Causes of Build Failures (RQ2)................ 17 viii 3.4.1 Environment Issues . 18 3.4.2 Process Issues . 21 3.4.3 Project Issues . 22 3.5 Identifying Root Causes of Build Failures (RQ3).................