Continuous Experimentation for Software Developers

Zurich Open Repository and Archive
University of Zurich, Main Library, Strickhofstrasse 39, CH-8057 Zurich
www.zora.uzh.ch

Year: 2019

Schermann, Gerald

Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL: https://doi.org/10.5167/uzh-172623
Dissertation, Published Version

Originally published at:
Schermann, Gerald. Continuous experimentation for software developers. 2019, University of Zurich, Faculty of Economics.

Department of Informatics

Dissertation submitted to the Faculty of Business, Economics and Informatics of the University of Zurich to obtain the degree of Doktor der Wissenschaften, Dr. sc. (corresponds to Doctor of Science, PhD)

presented by Gerald Schermann from Austria

approved in July 2019 at the request of
Prof. Dr. Harald C. Gall
Dr. Philipp Leitner
Prof. Dr. Elisabetta Di Nitto

The Faculty of Business, Economics and Informatics of the University of Zurich hereby authorizes the printing of this dissertation, without indicating an opinion of the views expressed in the work.

Zurich, July 17, 2019

Chairman of the Doctoral Board: Prof. Dr. Thomas Fritz

Acknowledgments

First and foremost, I thank my advisors Philipp Leitner and Harald Gall. Their continuous guidance and support, as well as their expertise, were invaluable during the course of my doctoral studies. Thank you for giving me the opportunity to pursue a PhD paired with the freedom to work on the topics that interest me the most.

I thank Elisabetta Di Nitto for serving on my PhD committee as an external examiner, for dedicating her valuable time to evaluating my work, and for the great feedback that I received.

Special thanks go to Erik Wittern and Fábio Oliveira, who gave me the opportunity to do an internship at IBM Research in New York. Thanks for the great discussions and valuable feedback.
I especially thank my current and former colleagues from the Software Evolution and Architecture Lab (SEAL); without them this adventure would have been only half the fun: Carol Alexandru, Martin Brandtner, Adelina Ciurumelea, Thomas Fritz, Christian Inzinger, André Meyer, Sebastiano Panichella, Sebastian Proksch, Manuela Züger, and in particular Jürgen Cito, Giovanni Grano, Katja Kevic, Christoph Laaber, Sebastian Müller, and Carmine Vassallo. Be it endless discussions during coffee breaks or lunches, playing Uno and Exploding Kittens, or all the other fun we had in our offices, at retreats, or on conference trips, all these little fragments made the time of my PhD memorable.

Finally, I want to thank my family for their unconditional support and encouragement during my studies.

Gerald Schermann
Zürich, July 2019

Abstract

To stay competitive in highly contested and fast-growing markets such as the Web, companies need to continuously adapt their software. Releasing incremental changes fast, while at the same time guaranteeing high quality, requires release processes that rely heavily on tools to automate software build, test, and deployment. While previous methods of releasing changes hardly involved evidence to support decisions (e.g., do users appreciate my new feature?), nowadays sophisticated telemetry solutions keep track of releases, and captured live production data has become the basis for data-driven decision making. High automation bundled with telemetry promotes the advent of continuous experimentation practices (e.g., canary releases or A/B testing) that guide development activities based on data collected from a fraction of the user population exposed to a new experimental version of the software in the production environment.
However, adopting continuous experimentation to move towards data-driven decision making is not a straightforward process: it involves setting up a complex experimentation infrastructure and requires methods and tools to cover the entire life cycle of experiments, from their design to the assessment of their outcome. In the context of this thesis, we address challenges surfacing within experiment life cycle phases with the goal of devising research approaches to support the thesis statement: “A detailed understanding of the characteristics of continuous experiments enables building a conceptual framework for planning, executing, and analyzing experiments”. To account for the trend towards decentralized microservice teams independently running experiments, our approaches are tailored to software developers and release engineers and designed to foster the parallel execution of experiments with as little overhead as possible, to identify optimal plans to collect required sample sizes for sound statistical interpretation, and to provide means for experiment health assessment.

Informed by the findings from an empirical study on the state of practice, we classified experimentation practices into regression-driven experiments (e.g., canary releases) and business-driven experiments (e.g., A/B testing), and derived a conceptual framework for experimentation. This framework formed the basis for three research approaches and prototypes as concrete instantiations that have been extensively validated through numerical experimentation:

Fenrir targets the planning phase of experiments and scheduling in particular. We formulate scheduling as an optimization problem with the aim of fostering the parallel execution of experiments, while at the same time ensuring that enough data is collected for every experiment and that the collected data is not skewed by overlapping experiments.
Fenrir outperforms other approaches not only in the quality of the schedules identified but also in terms of execution time.

Bifrost supports the execution phase and involves the automated, data-driven execution of multi-phased experiments (e.g., an A/B test follows a canary release). Experiments are specified in a domain-specific language, and our concept of conditional chaining allows triggering automated actions such as rollbacks when irregularities are spotted. Bifrost supports running more than a hundred experiments in parallel without introducing a significant performance degradation.

Finally, we investigated approaches and devised a research prototype for experiment health assessment with the goal of raising the developer’s awareness about (topological) changes in the context of experiments. We characterized change types that surface within the evolution of microservice-based applications and developed and evaluated multiple heuristics to rank identified changes according to their potential impact on the application’s health state.

Overall, we demonstrated that our framework enables planning, executing, and analyzing large-scale continuous experiments. There are multiple opportunities for future work to extend our framework and approaches, including smarter experimentation platforms that dynamically decide how experimentation logic is executed, visualization extensions to IDEs (integrated development environments), and means for experiment verification based on statistical models.

Zusammenfassung

To remain competitive in fiercely contested and fast-growing markets such as the Web, it is essential for companies to continuously adapt their software. Fast and continuous releases of incremental changes, while at the same time guaranteeing high quality, require release processes that rely heavily on tools to automate software builds, tests, and deployments. While previous release procedures hardly drew on real-time information for decision making (for example, do users appreciate my new feature?), nowadays sophisticated telemetry solutions monitor software releases, and the real-time information collected in the process forms the basis for data-driven decision making. Telemetry bundled with a high degree of automation promotes the emergence of continuous experimentation practices (e.g., canary releases or A/B tests), which guide development activities based on information collected for a fraction of the user population on a new experimental version of the software directly in the production environment.

However, adopting continuous experimentation with the goal of data-driven decision making is not a simple process. It requires not only setting up a complex infrastructure but also methods and tools that cover the entire life cycle of continuous experiments, from planning to the evaluation of results.

In the context of this dissertation, we address challenges that arise in the different life cycle phases of experiments, with the goal of developing research approaches that support the thesis of this dissertation: “A comprehensive understanding of the characteristics of continuous experiments enables the realization of a conceptual framework for planning, executing, and analyzing experiments”. To account for the trend towards decentralized microservice teams that run experiments independently of one another, our approaches are tailored to software developers and release engineers. Furthermore, these research approaches are intended to support the parallel execution of experiments with as little performance loss as possible, to identify ways of ensuring the required sample sizes for robust statistical evaluation, and to provide means for a meaningful
