Delft University of Technology
An Empirical Evaluation of Feedback-Driven Software Development
Beller, Moritz

DOI: 10.4233/uuid:b2946104-2092-42bb-a1ee-3b085d110466
Publication date: 2018
Document Version: Final published version

Citation (APA)
Beller, M. (2018). An Empirical Evaluation of Feedback-Driven Software Development. https://doi.org/10.4233/uuid:b2946104-2092-42bb-a1ee-3b085d110466

Important note
To cite this publication, please use the final published version (if applicable). Please check the document version above.

Copyright
Other than for strictly personal use, it is not permitted to download, forward, or distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Takedown policy
Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

This work is downloaded from Delft University of Technology. For technical reasons, the number of authors shown on this cover page is limited to a maximum of 10.

AN EMPIRICAL EVALUATION OF FEEDBACK-DRIVEN SOFTWARE DEVELOPMENT
Moritz Beller

An Empirical Evaluation of Feedback-Driven Software Development

Dissertation for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft, by the authority of the Rector Magnificus, prof. dr. ir. T.H.J.J. van der Hagen, chair of the Board for Doctorates, to be defended publicly on Friday, 23 November 2018 at 15:00 by Moritz Marc BELLER, Master of Science in Computer Science, Technische Universität München, Germany, born in Schweinfurt, Germany.

This dissertation has been approved by the promotors:
Dr. A.E. Zaidman, Prof. dr. A. van Deursen
Copromotor: Dr. ir. G. Gousios

Composition of the doctoral committee:
Rector Magnificus, chairperson
Prof. dr. A. van Deursen, Technische Universiteit Delft
Dr. A.E. Zaidman, Technische Universiteit Delft
Dr. ir. G. Gousios, Technische Universiteit Delft

Independent members:
Prof. dr. ir. G.J.P.M. Houben, Technische Universiteit Delft
Prof. dr. P. Runeson, Lund Universitet, Sweden
Dr. Th. Zimmermann, Microsoft Research, United States of America
Prof. dr. D. Spinellis, Athens University of Economics and Business, Greece
Prof. dr. ir. E. Visser, Technische Universiteit Delft, reserve member

Prof. dr. D. Spinellis contributed to the final phase of writing Chapter 6.

The work in this thesis was carried out under the auspices of the research school IPA (Institute for Programming research and Algorithmics) and was financed by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO), project TestRoots, grant number 016.133.324.

Keywords: Feedback-Driven Development (FDD), Developer Testing, Empirical Software Engineering, Continuous Integration
Printed by: ProefschriftMaken, www.proefschriftmaken.nl
Cover: Cloud of '2,443 points' by Zsófia Varga
The author set this thesis in LaTeX using the Libertinus and Inconsolata fonts.
ISBN 978-94-6380-065-5
An electronic version of this dissertation is available at http://repository.tudelft.nl/.

I [...] like to give the maximum in everything I do. The maximum I have. The maximum I can give. I am not perfect. But if I do something, I do it [as best I can].
Reinhold Messner

Contents

Summary
Samenvatting
Acknowledgments

1 Introduction
1.1 Background & Context
1.1.1 A Model of Feedback-Driven Development
1.1.2 The Case for FDD in a Collaborative Coding World
1.2 Feedback-Driven Development in Practice
1.3 Research Goal and Questions
1.4 Research Methodology
1.4.1 Research Method Categorization
1.4.2 Enablement of Large-Scale Studies
1.4.3 Ethical Implications
1.5 Replicability, Open Science & Source
1.5.1 Open Data Sets
1.5.2 Open-Source Contributions
1.6 Outline & Contribution
1.6.1 Thesis Structure
1.6.2 Other Contributions

2 Analyzing the State of Static Analysis
2.1 Related Work
2.1.1 Automatic Static Analysis Tools
2.1.2 Defect Classifications
2.2 Research Questions
2.3 Prevalence Analysis (RQ I.1)
2.3.1 Methodology
2.3.2 Results
2.4 General Defect Classification (GDC)
2.5 Configuration & Evolution (RQ I.2, RQ I.3)
2.5.1 Study Design
2.5.2 Methods
2.5.3 Study Objects
2.5.4 Results
2.6 Discussion
2.6.1 Results
2.6.2 Threats to Validity
2.7 Tool Construction UAV
2.7.1 Introduction
2.7.2 User Story
2.7.3 Related Work
2.7.4 Implementation
2.7.5 Evaluation
2.7.6 Development Roadmap
2.8 Future Work & Conclusions

3 The Last Line Effect Explained
3.1 Study Setup
3.1.1 Study Design C1: Spread and Prevalence of the Last Line Effect within Micro-Clones
3.1.2 Study Design C2: Analyzing Reasons Behind the Existence of the Last Line Effect
3.1.3 Study Objects
3.1.4 How to Replicate This Study
3.2 Methods
3.2.1 Inaptness of Current Clone Detectors
3.2.2 How to Find Faulty Micro-Clones Instead
3.2.3 Inferring the Origin of an Erroneous Micro-Clone Instance
3.2.4 Putting Commit Sizes in Perspective
3.3 Results
3.3.1 Overview Description of Results
3.3.2 In-Depth Investigation of Findings
3.3.3 Statistical Evaluation
3.3.4 Origin of Micro-Clones
3.3.5 Developer Interviews
3.3.6 Usefulness of Results
3.4 Discussion
3.4.1 Technical Complexity & Reasons
3.4.2 Psychological Mechanisms & Reasons
3.4.3 Threats to Validity
3.5 Related Work
3.6 Future Work & Conclusion

4 Developer Testing in the IDE: Patterns, Beliefs, and Behavior
4.1 Study Infrastructure Design
4.1.1 Field Study Infrastructure
4.1.2 WatchDog Developer Survey & Testing Analytics
4.1.3 IDE Instrumentation
4.2 Research Methods
4.2.1 Correlation Analyses (RQ III.1, RQ III.2)
4.2.2 Analysis of Induced Test Failures (RQ III.3)
4.2.3 Sequentialization of Intervals (RQ III.3, RQ III.4)
4.2.4 Test Flakiness Detection (RQ III.3)
4.2.5 Recognition of Test-Driven Development (RQ III.4)
4.2.6 Statistical Evaluation (RQ III.1–RQ III.5)
4.3 Study Participants
4.3.1 Acquisition of Participants
4.3.2 Demographics of Study Subjects
4.3.3 Data Normalization
4.4 Results
4.4.1 RQ III.1: Which Testing Patterns Are Common in the IDE?
4.4.2 RQ III.2: What Characterizes the Tests Developers Run in the IDE?
4.4.3 RQ III.3: How Do Developers Manage Failing Tests?
4.4.4 RQ III.4: Do Developers Follow TDD in the IDE?
4.4.5 RQ III.5: How Much Do Developers Test in the IDE?
4.5 Discussion
4.5.1 RQ III.1: Which Testing Patterns Are Common in the IDE?
4.5.2 RQ III.2: What Characterizes the Tests Developers Run?
4.5.3 RQ III.3: How Do Developers Manage Failing Tests?
4.5.4 RQ III.4: Do Developers Follow TDD?
4.5.5 RQ III.5: How Much Do Developers Test?
4.5.6 A Note on Generality and Replicability
4.5.7 Toward a Theory of Test-Guided Development
4.6 Threats to Validity
4.6.1 Limitations
4.6.2 Construct Validity
4.6.3 Internal Validity
4.6.4 External Validity
4.7 Related Work
4.7.1 Related Tools and Plugins
4.7.2 Related Research
4.8 Conclusion

5 Oops, My Tests Broke the Build: An Analysis of Travis CI
5.1 Background
5.1.1 Related Work
5.1.2 Travis CI
5.2 Research Setup
5.2.1 Study Design
5.2.2 Tools
5.2.3 Build Linearization and Mapping to Git
5.2.4 Statistical Evaluation
5.3 The TravisTorrent Data Set
5.3.1 Descriptive Statistics
5.3.2 Data-Set-as-a-Service