Increasing Developer Productivity by Improving Build Performance and Automating Logging Code Injection

A Dissertation for the Degree of Ph.D. in Engineering
February 2021
Graduate School of Science and Technology
Keio University
Takafumi Kubota

Acknowledgment

I would like to thank my advisor, Prof. Kenji Kono. His constant guidance helped me throughout my research. I would like to express my sincere gratitude to Dr. Takeshi Yoshimura, Dr. Yusuke Suzuki, and Naohiro Aota for their invaluable insight. This dissertation would not have been possible without their advice and encouragement. I am grateful to the members of my thesis committee as well: Prof. Shingo Takada, Prof. Motomichi Toyama, and Prof. Hideya Iwasaki. This dissertation was greatly improved by their valuable feedback. I am also thankful to my colleagues in the sslab. Their surprising enthusiasm and skills have always inspired me. I appreciate the financial support from the Core Research for Evolutional Science and Technology program of the Japan Science and Technology Agency and the scholarships of our university. Finally, I thank my family, my parents and my sister, for their support all these years. Without their support and encouragement, many accomplishments in my life, including this dissertation, would not have been possible.

Abstract

As software grows in size and complexity, it is critical to develop it efficiently and reliably. For example, Continuous Integration (CI) has become a de facto practice in the daily development of large software projects: builds and tests are automated so that the modifications of numerous developers are efficiently integrated into the mainline. However, software engineers spend much of their time on work outside the actual development of the software, which is an obstacle to improving development efficiency. This dissertation addresses two problems.
1) Build time: builds occur frequently during software development. As a result, the time spent on builds is a noticeable overhead.
2) Logging code insertion: the quality of log messages is critical to efficient failure diagnosis. However, inserting logging code appropriately is time-consuming because it depends on developers' expertise and engineering effort.

To deal with these problems, I introduce two effective tools. For build times, I present a new build system, called Cauldron, which aims to improve the build performance of large C++ projects. Cauldron supports sophisticated unity builds and adaptive build behavior based on the number of files to be compiled. My experiments show that Cauldron outperforms existing approaches; for example, it reduces the build times of WebKit by 23% in continuous builds. For logging code insertion, I introduce a new logging tool, called K9, which automatically inserts logging code to trace inter-thread data dependencies caused by data shared among threads. In multi-threaded systems, the traceability of inter-thread data dependencies is essential in failure diagnosis because the thread actually causing the failure may be different from the thread executing the buggy code. In my experiments, I show that the log of K9 provides useful clues for debugging four bugs in the Linux kernel, including one previously unknown bug.

The contribution of this dissertation is summarized as follows. As software development consists of multiple tasks, it is important to consider the various processes in the development cycle. This dissertation proposes two tools to improve the efficiency of two specific parts of software development: build time and logging code insertion. I describe in detail the design, implementation, and evaluation of the two tools.

Contents

1 Introduction
1.1 Motivation
1.2 Dissertation Contributions
1.2.1 Build System for Sophisticated Unity Builds
1.2.2 Logging Automation Tool for Logging Inter-thread Dependencies
1.3 Organization
2 Related Work
2.1 Improving C++ Build Performance
2.1.1 Compile Caching Tools
2.1.2 Compiler Approach
2.1.3 Language Approach
2.2 Failure Diagnosis
2.2.1 Diagnosis without Reproducing the Failure
2.2.2 Diagnosis with Failure Reproduction
2.3 Other Related Work
2.3.1 Test Case Generation & Selection
2.3.2 Static Analysis for Bug Detection
2.4 Summary
3 Build System for Unity Builds with Sophisticated Bundle Strategies
3.1 Background
3.1.1 Build systems
3.1.2 Incremental builds
3.1.3 Long build times of large C++ projects
3.1.4 Unity builds
3.2 Problems in Unity Builds
3.3 A Case Study on Unity Builds in WebKit
3.3.1 Research Questions
3.3.2 Metrics
3.3.3 Experimental Results
3.4 Design and Implementation of Cauldron
3.4.1 Design Choice: Meta-Build System vs. Native Build System
3.4.2 Bundle strategies in Cauldron
3.4.3 Overview
3.4.4 Dependency graph analysis
3.4.5 Build behavior decision
3.4.6 Bundling source files
3.4.7 Bundle Configuration Refinement
3.5 Experiments
3.5.1 Build Performance in Continuous Builds
3.5.2 Incremental-build performance
3.5.3 Full-build performance
3.6 Summary
4 Logging Automation for Inter-thread Data Dependencies
4.1 Motivation
4.1.1 Inter-Thread Data Dependency
4.1.2 Bug Examples in Linux
4.2 Design Goals and Overview of K9
4.3 Inter-thread Data Dependency Model
4.3.1 Collections and Items
4.3.2 Dependencies between Collections and Items
4.3.3 Log Points for Collections and Items
4.4 Design and Implementation of K9
4.4.1 Collection Support Library
4.4.2 Data-flow Graph of K9
4.4.3 Direct Dependency Analysis
4.4.4 Indirect Dependency Analysis
4.5 Experiments
4.5.1 Scalability
4.5.2 Precision of Log Points
4.5.3 Diagnosing failures
4.5.4 Performance Overheads
4.6 Summary
5 Conclusion
5.1 Contribution Summary
5.2 Future Directions
Bibliography

List of Figures

1.1 Typical development cycle
1.2 Example of unity files in WebKit
1.3 Traditional fault-error-failure model
1.4 Logging code examples
2.1 An overview of previous studies
3.1 Example of the dependency graph in build systems
3.2 How much time does the compiler spend parsing?
3.3 Benefits of unity builds
3.4 A patch to avoid bundling source files in WebKit
3.5 The impact of the header similarity on the unity build
3.6 The impact of the front-end ratio on the unity build
3.7 Heat map of header similarity among subdirectories
3.8 Example of the header files dynamically generated by llvm-tblgen
3.9 Dependency graph including the dynamically generated header file
3.10 Compile time estimation
3.11 Overview of the work-flow
3.12 Example of bundling source files in Cauldron
3.13 Compilation of B.cc will be finished before A.cc and B.cc are bundled
3.14 Build performance during 101 builds of real git commits: LLVM
3.15 Build performance during 101 builds of real git commits: WebKit
3.16 CDF of incremental-build overheads caused by unity builds
4.1 Two types of inter-thread data dependencies
4.2 Inter-thread data dependencies in the write system call in Linux
4.3 A bug in Btrfs in Linux kernel v3.17-rc5. An error propagates from kworker to sync through a shared extent_buffer.
4.4 CFQ priority violation. I/O throughput is not proportional to priority. A thread with priority 4 submits all I/O requests.
4.5 Simplified log of CFQ priority violations
4.6 The work-flow of K9
4.7 Example of an array collection and item in Linux
4.8 Typical structure of the graph collection and item
4.9 Examples of graph collection, head, and item in Linux
4.10 Simplified example of queuing a socket buffer into a socket
4.11 Example of data-flow graph construction of Figure 4.10
4.12 Type-flow graphs showing the results of indirect dependency analysis
4.13 Kernel workqueue bug: data race on cwq->nr_activate
4.14 Failure logs for Figure 4.13
4.15 An unknown bug in Btrfs: remaining writeback bit
4.16 Failure logs for Figure 4.15

List of Tables

3.1 The compile times of unity builds (UB) with changing bundle sizes
3.2 Notable results on incremental build overheads (seconds). The results of existing unity builds are shown in parentheses.
3.3 Full-build performance. UB: unity build.
3.4 The number of unity files and bundled source files in Cauldron. The results of existing unity builds are shown in parentheses.
4.1 Experimental Environment
4.2 Analysis results (LP: "log point", IR: "intermediate representation")
4.3 Characterization of log points identified in the direct dependency analysis of the file system case. G denotes a graph collection and A an array collection.
4.4 Diagnosed failures that are caused by three known bugs and one unknown bug
4.5 Performance overheads in macro benchmarks

Chapter 1 Introduction

Software has evolved rapidly [95, 114, 151]. Numerous contributors have collaborated on the same project via version control repository hosting services [64, 65, 15].
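The abstract says Cauldron supports unity builds with adaptive behavior based on the number of files to be compiled. A unity build compiles several source files as one translation unit: a generated unity file #includes them, so headers they have in common are parsed once per bundle instead of once per file, at the price that editing one file recompiles its whole bundle. The sketch below illustrates that trade-off under stated assumptions; it is not Cauldron's algorithm. The names `plan_build`, `CompileUnit`, `bundle_size`, and `threshold`, the rule "bundle only when at least `threshold` files must be compiled", and grouping by directory are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One compilation unit: a single source file, or a unity file that
// #includes several source files (hypothetical type for illustration).
struct CompileUnit {
    std::string name;                  // file handed to the compiler
    std::vector<std::string> sources;  // source files compiled in this unit
};

// Directory part of a path, or "" if the path has no '/'.
std::string directory_of(const std::string& path) {
    std::size_t pos = path.find_last_of('/');
    return pos == std::string::npos ? "" : path.substr(0, pos);
}

// Hypothetical adaptive planner, not Cauldron's actual strategy:
// - fewer than `threshold` files: compile each file on its own, so an edit
//   never forces the rest of a bundle to be recompiled;
// - otherwise: bundle files of the same directory, at most `bundle_size`
//   per unity file, so headers they share are parsed once per bundle.
std::vector<CompileUnit> plan_build(const std::vector<std::string>& files,
                                    std::size_t bundle_size,
                                    std::size_t threshold) {
    assert(bundle_size > 0 && "bundle_size must be positive");
    std::vector<CompileUnit> units;
    if (files.size() < threshold) {
        for (const std::string& f : files) units.push_back({f, {f}});
        return units;
    }
    // std::map keeps directories sorted, so the plan is deterministic.
    std::map<std::string, std::vector<std::string>> by_dir;
    for (const std::string& f : files) by_dir[directory_of(f)].push_back(f);
    for (const auto& entry : by_dir) {
        const std::vector<std::string>& srcs = entry.second;
        for (std::size_t i = 0; i < srcs.size(); i += bundle_size) {
            CompileUnit unit{"unity_" + std::to_string(units.size()) + ".cpp", {}};
            for (std::size_t j = i; j < srcs.size() && j < i + bundle_size; ++j)
                unit.sources.push_back(srcs[j]);
            units.push_back(unit);
        }
    }
    return units;
}

// Contents of a unity file: one #include line per bundled source file.
std::string unity_file_text(const CompileUnit& unit) {
    std::string text;
    for (const std::string& s : unit.sources) text += "#include \"" + s + "\"\n";
    return text;
}
```

For example, with the five files dom/a.cpp, dom/b.cpp, dom/c.cpp, css/x.cpp, and css/y.cpp, a `bundle_size` of 2, and a `threshold` of 4, the planner produces three unity files (css/x.cpp with css/y.cpp, dom/a.cpp with dom/b.cpp, and dom/c.cpp alone), whereas with only two files to compile it would compile each file separately.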
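The inter-thread data dependencies that K9 traces arise when one thread stores data in a shared structure and another thread later uses it, which is why the thread causing a failure may differ from the thread executing the buggy code. The user-space sketch below shows the kind of log point this implies, borrowing the dissertation's terms "collection" and "item": the shared queue records which thread inserted and which thread removed each item, so the producer of an item can be recovered from the log. `LoggedQueue`, `LogRecord`, `producer_of`, the logical thread names, and the log format are hypothetical; K9 inserts its log points automatically into the Linux kernel, and this hand-written queue is not its generated code.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// One log record: which thread touched which item of the shared collection.
struct LogRecord {
    std::string thread;  // logical thread name, e.g. "writer" (hypothetical)
    std::string op;      // "insert" or "remove"
    int item;            // identifier of the item
};

// Hypothetical shared collection with hand-written log points where items
// enter and leave it. Illustrative only, not code generated by K9.
class LoggedQueue {
public:
    void push(const std::string& thread, int item) {
        std::lock_guard<std::mutex> lock(mu_);
        items_.push_back(item);
        log_.push_back({thread, "insert", item});  // log point: item enters
    }

    // Moves the oldest item into `out`; returns false if the queue is empty.
    bool try_pop(const std::string& thread, int& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        log_.push_back({thread, "remove", out});  // log point: item leaves
        return true;
    }

    // Copy of the log, taken under the lock.
    std::vector<LogRecord> log() const {
        std::lock_guard<std::mutex> lock(mu_);
        return log_;
    }

private:
    mutable std::mutex mu_;
    std::deque<int> items_;
    std::vector<LogRecord> log_;
};

// The thread that inserted `item`. Any thread that later removed `item`
// depends on data produced by it. Returns "" if no insert was logged.
std::string producer_of(const std::vector<LogRecord>& log, int item) {
    for (const LogRecord& r : log)
        if (r.op == "insert" && r.item == item) return r.thread;
    return "";
}
```

In use, `push` and `try_pop` would run on different threads; the mutex keeps the log consistent, and matching records on the item identifier links a thread that consumed a corrupted item back to the thread that produced it, which is the kind of clue the abstract describes for failure diagnosis.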