Increasing Developer Productivity by Improving Build Performance and Automating Logging Code Injection
A Dissertation for the Degree of Ph.D. in Engineering

Increasing Developer Productivity by Improving Build Performance and Automating Logging Code Injection

February 2021
Graduate School of Science and Technology
Keio University
Takafumi Kubota

Acknowledgment

I would like to thank my advisor, Prof. Kenji Kono. His constant guidance helped me at every stage of my research. I would like to express my sincere gratitude to Ph.D. Takeshi Yoshimura, Ph.D. Yusuke Suzuki, and Naohiro Aota for their invaluable insight. This dissertation would not have been possible without their advice and encouragement. I am grateful to the members of my thesis committee as well: Prof. Shingo Takada, Prof. Motomichi Toyama, and Prof. Hideya Iwasaki. This dissertation was greatly improved by their valuable feedback. I am also thankful to my colleagues in the sslab. Their surprising enthusiasm and skills have always inspired me. I appreciate the financial support from the Core Research for Evolutional Science and Technology of the Japan Science and Technology Agency and the scholarships of our university. Finally, I thank my family, my parents, and sister for their support all these years. Without their support and encouragement, many accomplishments in my life, including this dissertation, would not have been possible.

Abstract

As software grows in size and complexity, it is critical to develop software efficiently and reliably. For example, Continuous Integration (CI) has become a de-facto practice of daily development in large software projects: builds and tests are automated, so that numerous developers' modifications are efficiently integrated into the mainline. However, software engineers spend a great deal of time on tasks outside the actual development of the software, which is an obstacle to improving development efficiency. This dissertation addresses two such problems. 1) Build time: builds occur frequently during software development. As a result, the time spent on builds is a noticeable overhead.
2) Logging code insertion: the quality of log messages is critical to the efficiency of failure diagnosis. However, inserting logging code appropriately is time-consuming because it depends on developers' expertise and engineering effort.

To deal with these problems, I introduce two effective tools. For build times, I present a new build system, called Cauldron, which aims to improve build performance for large C++ projects. Cauldron supports sophisticated unity builds and adapts its build behavior to the number of files to be compiled. My experiments show that Cauldron outperforms existing approaches; for example, it reduces build times of WebKit by 23% in continuous builds. For logging code insertion, I introduce a new logging tool, called K9, which automatically inserts logging code to trace inter-thread data dependencies caused by data shared among threads. In multi-threaded systems, the traceability of inter-thread data dependencies is essential in failure diagnosis because the thread that actually causes the failure may be different from the thread executing the buggy code. In my experiments, I show that the log of K9 provides useful clues for debugging four bugs in the Linux kernel, including one previously unknown bug.

The contribution of this dissertation is summarized as follows. As software development consists of multiple tasks, it is important to consider the various processes in the development cycle. This dissertation proposes two tools to improve the efficiency of two specific parts of software development: build time and logging code insertion. I describe in detail the design, implementation, and evaluation of the two tools.

Contents

1 Introduction
  1.1 Motivation
  1.2 Dissertation Contributions
    1.2.1 Build System for Sophisticated Unity Builds
    1.2.2 Logging Automation Tool for Logging Inter-thread Dependencies
  1.3 Organization
2 Related Work
  2.1 Improving C++ Build Performance
    2.1.1 Compile Caching Tools
    2.1.2 Compiler Approach
    2.1.3 Language Approach
  2.2 Failure Diagnosis
    2.2.1 Diagnosis without Reproducing the Failure
    2.2.2 Diagnosis with Failure Reproduction
  2.3 Other Related Work
    2.3.1 Test Case Generation & Selection
    2.3.2 Static Analysis for Bug Detection
  2.4 Summary
3 Build System for Unity Builds with Sophisticated Bundle Strategies
  3.1 Background
    3.1.1 Build systems
    3.1.2 Incremental builds
    3.1.3 Long build times of large C++ projects
    3.1.4 Unity builds
  3.2 Problems in Unity Builds
  3.3 A Case Study on Unity Builds in WebKit
    3.3.1 Research Questions
    3.3.2 Metrics
    3.3.3 Experimental Results
  3.4 Design and Implementation of Cauldron
    3.4.1 Design Choice: Meta-Build System vs. Native Build System
    3.4.2 Bundle strategies in Cauldron
    3.4.3 Overview
    3.4.4 Dependency graph analysis
    3.4.5 Build behavior decision
    3.4.6 Bundling source files
    3.4.7 Bundle Configuration Refinement
  3.5 Experiments
    3.5.1 Build Performance in Continuous Builds
    3.5.2 Incremental-build performance
    3.5.3 Full-build performance
  3.6 Summary
4 Logging Automation for Inter-thread Data Dependencies
  4.1 Motivation
    4.1.1 Inter-Thread Data Dependency
    4.1.2 Bug Examples in Linux
  4.2 Design Goals and Overview of K9
  4.3 Inter-thread Data Dependency Model
    4.3.1 Collections and Items
    4.3.2 Dependencies between Collections and Items
    4.3.3 Log Points for Collections and Items
  4.4 Design and Implementation of K9
    4.4.1 Collection Support Library
    4.4.2 Data-flow Graph of K9
    4.4.3 Direct Dependency Analysis
    4.4.4 Indirect Dependency Analysis
  4.5 Experiments
    4.5.1 Scalability
    4.5.2 Precision of Log Points
    4.5.3 Diagnosing failures
    4.5.4 Performance Overheads
  4.6 Summary
5 Conclusion
  5.1 Contribution Summary
  5.2 Future Directions
Bibliography

List of Figures

1.1 Typical development cycle
1.2 Example of unity files in WebKit
1.3 Traditional fault-error-failure model
1.4 Logging code examples
2.1 An overview of previous studies
3.1 Example of the dependency graph in build systems
3.2 How much time does the compiler spend parsing?
3.3 Benefits of unity builds
3.4 A patch to avoid bundling source files in WebKit
3.5 The impact of the header similarity on the unity build
3.6 The impact of the front-end ratio on the unity build
3.7 Heat map of header similarity among subdirectories
3.8 Example of the header files dynamically generated by llvm-tblgen
3.9 Dependency graph including the dynamically generated header file
3.10 Compile time estimation
3.11 Overview of the workflow
3.12 Example of bundling source files in Cauldron
3.13 Compilation of B.cc will be finished before A.cc and B.cc are bundled
3.14 Build performance during 101 builds of real git commits: LLVM
3.15 Build performance during 101 builds of real git commits: WebKit
3.16 CDF of incremental-build overheads caused by unity builds
4.1 Two types of inter-thread data dependencies
4.2 Inter-thread data dependencies in the write system call in Linux
4.3 A bug in Btrfs in Linux kernel v3.17-rc5. An error propagates from kworker to sync through a shared extent_buffer
4.4 CFQ priority violation. I/O throughput is not proportional to priority. A thread with priority 4 submits all I/O requests
4.5 Simplified log of CFQ priority violations
4.6 The workflow of K9
4.7 Example of an array collection and item in Linux
4.8 Typical structure of the graph collection and item
4.9 Examples of graph collection, head, and item in Linux
4.10 Simplified example of queuing a socket buffer into a socket
4.11 Example of data-flow graph construction of Figure 4.10
4.12 Type-flow graphs showing the results of indirect dependency analysis
4.13 Kernel workqueue bug: data race on cwq->nr_active
4.14 Failure logs for Figure 4.13
4.15 An unknown bug in Btrfs: remaining writeback bit
4.16 Failure logs for Figure 4.15

List of Tables

3.1 The compile times of unity builds (UB) with changing bundle sizes
3.2 Notable results on incremental build overheads (seconds). The results of existing unity builds are shown in parentheses
3.3 Full-build performance. UB: unity build
3.4 The number of unity files and bundled source files in Cauldron. The results of existing unity builds are shown in parentheses
4.1 Experimental Environment
4.2 Analysis results (LP: "log point", IR: "intermediate representation")
4.3 Characterization of log points identified in the direct dependency analysis of the file system case. G denotes a graph collection and A represents an array collection
4.4 Diagnosed failures that are caused by three known bugs and one unknown bug
4.5 Performance overheads in macro benchmarks

Chapter 1

Introduction

Software has evolved rapidly [95, 114, 151]. Numerous contributors have collaborated on the same project via version control repository hosting services [64, 65, 15].