Federal University of Rio Grande do Norte
Center of Exact and Earth Sciences
Department of Informatics and Applied Mathematics
Graduate Program in Systems and Computing
Academic Master's Degree in Systems and Computing

The Impact of Adopting Continuous Integration on the Delivery Time of Merged Pull Requests: An Empirical Study

João Helis Junior de Azevedo Bernardo

Natal, Brazil
July, 2017

João Helis Junior de Azevedo Bernardo

The Impact of Adopting Continuous Integration on the Delivery Time of Merged Pull Requests: An Empirical Study

A dissertation submitted to the Graduate Program in Systems and Computing of the Center of Exact and Earth Sciences in conformity with the requirements for the degree of Master in Systems and Computing.

PPgSC - Graduate Program in Systems and Computing
DIMAp - Department of Informatics and Applied Mathematics
UFRN - Federal University of Rio Grande do Norte

Advisor: Uirá Kulesza
Co-Advisor: Daniel Alencar da Costa

Natal, Brazil July, 2017


Acknowledgements

First and foremost, I would like to thank God, the Almighty, for giving me strength and support throughout this quest for knowledge, and especially for showing me the way forward in the most difficult moments of my life. Without His blessings, I certainly would not have come this far.

My deep gratitude goes to my parents, João Helis Bernardo and Rosilda de Azevedo Bernardo, and to my sister, Juliana Raffaely de Azevedo Bernardo. Without their love, dedication, and support in every single part of my life, I would not be who I am. Thank you for teaching me that I can never give up on my dreams.

I would like to express my deepest gratitude and special thanks to my girlfriend, Milenna Veríssimo, for her love, support, and constant patience. Thank you for always encouraging me to be a better man. I love you.

I would like to express my sincere gratitude to my advisor, Uirá Kulesza, who gave me the opportunity to work with him and expertly guided me along the path I walked during my master's degree. I would also like to thank my co-advisor and friend, Daniel Alencar da Costa, for mentoring me and providing all the support I needed to conduct the studies performed in this dissertation. Without his precious guidance, I would not have been able to bring this work to its current state.

I would like to extend my appreciation to my laboratory colleagues, Leo Moreira, Fabio Penha, and Eduardo Nascimento, who helped to lighten the pressures that we faced together in the final stages of our master's degrees by providing moments of knowledge sharing and fun through the so-called "coffee time".

Finally, I am very grateful to CNPq for the financial support.

Society must learn that we Indians can and should use technology and information in our everyday activities. That does not make us any less Indian. Being Indian is in the blood that flows through our veins, not in the clothing and utensils that we use or in any external characteristic.

Abstract

Continuous Integration (CI) is a software development practice that leads developers to integrate their work more frequently. Software projects have broadly adopted CI to ship new releases more frequently and to improve code integration. The adoption of CI is usually motivated by the allure of delivering new software content more quickly and frequently. However, there is little empirical evidence to support such claims. Over the last years, many software projects hosted in social coding environments such as GitHub have adopted the CI practice using CI services that are integrated into these environments (e.g., Travis-CI). In this dissertation, we empirically investigate the impact of adopting CI on the time-to-delivery of pull requests (PRs), through the analysis of 167,037 PRs of 90 GitHub projects that are implemented in 5 different programming languages. When analyzing the percentage of merged PRs per project that missed at least one release prior to being delivered to end users, the results show that before adopting CI, a median of 13.8% of merged PRs are postponed by at least one release, while after adopting CI, a median of 24% of merged PRs have their delivery postponed to future releases. Contrary to what one might speculate, we find that PRs tend to wait longer to be delivered after the adoption of CI in the majority (53%) of the studied projects. The large increase in PR submissions after CI is a key reason why these projects deliver PRs more slowly after adopting CI: 77.8% of the projects increase the rate of PR submissions after adopting CI. To investigate the factors that are related to the time-to-delivery of merged PRs, we train linear and logistic regression models, which obtain sound median R-squared values of 0.72–0.74 and good median AUC values of 0.85–0.90. A deeper analysis of our models suggests that, before and after the adoption of CI, the intensity of code contributions to a release may increase the delivery time due to a higher integration load (in terms of integrated commits) on the development team. Finally, we are able to accurately identify merged pull requests that have a prolonged delivery time: our regression models obtain median AUC values of 0.92 to 0.97.

Keywords: Continuous Integration; Pull-based Development; Pull Request; Delivery Time; Delivery Delay; Mining Software Repositories.

List of Figures

Figure 1 – An overview of the scope of the dissertation.
Figure 2 – An overview of the pull-based development model that is integrated with Continuous Integration.
Figure 3 – An illustrative example of how we compute delivery time in terms of days.
Figure 4 – An illustrative example of how we compute delivery time in terms of releases.
Figure 5 – The basic life-cycle of a released pull request.
Figure 6 – Training Linear and Logistic Regression Models.
Figure 7 – Percentage of merged pull requests that have a long delivery time.
Figure 8 – An overview of our project selection process.
Figure 9 – Number of projects grouped by programming language.
Figure 10 – An overview of our data collection process.
Figure 11 – Distribution of pull requests per bucket before and after continuous integration.
Figure 12 – Number of days between the studied releases of the projects, before and after continuous integration.
Figure 13 – Merge timing metric. We present the distribution of the merge timing metric for merged pull requests that are prevented from integration in at least one release.
Figure 14 – The required number of days to merge and deliver pull requests (pull request lifetime).
Figure 15 – Pull request submission, merge, and delivery rates per release.
Figure 16 – Number of pull request submissions (per release) before and after the adoption of continuous integration.
Figure 17 – Distribution of the Brier score and the Brier optimism of the models before and after CI.
Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI.
Figure 19 – Distributions of models' R² and R² optimism.
Figure 20 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of releases).
Figure 21 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of releases).
Figure 22 – The relationship between the most influential variables and delivery time in terms of releases.
Figure 23 – The number of models per most influential variables (Delivery Time in terms of releases).
Figure 24 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of days).
Figure 25 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of days).
Figure 26 – The number of models per most influential variables (Delivery Time in terms of days).
Figure 27 – The relationship between the most influential variables and delivery time in terms of days.
Figure 28 – Distribution of the Brier score and the Brier optimism of the models before and after CI.
Figure 29 – Distribution of the AUC and the AUC optimism of the models before and after CI.
Figure 30 – Explanatory power of variables before adopting continuous integration (Prolonged delivery time analysis).
Figure 31 – Explanatory power of variables after adopting continuous integration (Prolonged delivery time analysis).
Figure 32 – The number of models per most influential variables (Prolonged delivery time analysis).

List of Tables

Table 1 – Long delivery time thresholds (PART I).
Table 2 – Long delivery time thresholds (PART II).
Table 3 – Summary of the number of projects and released pull requests grouped by programming language.
Table 4 – Metrics that are used in our explanatory models (Contributor, Pull Request and Project families).
Table 5 – Metrics that are used in our explanatory models (Process family).
Table 6 – Brier score and AUC values for the models that we fitted using pull request data from before continuous integration.
Table 7 – Brier score and AUC values for the models that we fitted using pull request data from after continuous integration.
Table 8 – R² and R² optimism values for the linear models that we fitted using pull request data from before continuous integration.
Table 9 – R² and R² optimism values for the linear models that we fitted using pull request data from after continuous integration.
Table 10 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of releases).
Table 11 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of days).
Table 12 – Brier score and AUC values for the models that we fitted using pull request data from before the adoption of continuous integration.
Table 13 – Brier score and AUC values for the models that we fitted using pull request data from after the adoption of continuous integration.
Table 14 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Prolonged delivery time analysis).

List of abbreviations and acronyms

OSS Open Source Software

OSD Open Source Definition

CI Continuous Integration

ITS Issue Tracker System

PR Pull Request

XP Extreme Programming

ARE Agile Release Engineering

DVCS Distributed Version Control Systems

DF Degrees of Freedom

Contents

1 INTRODUCTION
1.1 Problem Statement
1.2 Current Research Limitations
1.3 Dissertation Proposal
1.3.1 Chronology of Analyses
1.4 Dissertation Contributions
1.5 Dissertation Organization

2 BACKGROUND & DEFINITIONS
2.1 The pull-based development model
2.2 Continuous Integration
2.3 Delivery Time
2.4 Chapter Summary

3 EMPIRICAL STUDY
3.1 Research Questions
3.1.1 RQ1: How often are merged pull requests prevented from being released?
3.1.2 RQ2: Are pull requests released more quickly using continuous integration?
3.1.3 RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of pull requests?
3.1.4 RQ4: How well can we model the delivery time of merged pull requests?
3.1.5 RQ5: What are the most influential attributes for modeling delivery time?
3.1.6 RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?
3.1.7 RQ7: What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?
3.2 Studied Projects
3.3 Data Collection
3.4 Chapter Summary

4 STUDY RESULTS
4.1 Analysis I — What is the impact of continuous integration on the delivery time of pull requests?
4.2 Analysis II — What is the impact of continuous integration on the prolonged delivery time?
4.3 Threats to the Validity

5 CONCLUSION
5.1 Dissertation Contributions
5.2 Related Work
5.3 Future Work

BIBLIOGRAPHY

APPENDIX

APPENDIX A – STUDIED PROJECTS
APPENDIX B – R² AND R² OPTIMISM FOR THE LINEAR MODELS
APPENDIX C – PERCENTAGE OF DELIVERED PULL REQUESTS PER PROJECT IN THE NEXT AND LATER RELEASE BUCKETS

1 Introduction

The increasing user demand for new functionalities and performance improvements rapidly changes customer requirements and turns software development into a competitive market (WNUK; GORSCHEK; ZAHDA, 2013). In this scenario, software development teams need to deliver new functionalities more quickly to their customers to improve the time-to-market (DEBBICHE; DIENÉR; SVENSSON, 2014; LAUKKANEN; PAASIVAARA; ARVONEN, 2015). This faster delivery may lead customers to become engaged in the project and to give valuable feedback. The failure to provide new functionalities and bug-fixes, on the other hand, may reduce the number of users and the project's success.

Over the last years, agile methodologies, such as Scrum (SCHWABER, 1997) and Extreme Programming (XP) (BECK, 2000), brought a series of practices with the allure of providing more flexible software development and a faster delivery of new software releases. The frequency of releases is one of the factors that may lead a software project to success (CHEN; REILLY; LYNN, 2005; WOHLIN; XIE; AHLGREN, 1995). The releasing frequency may also indicate the vitality level of a software project (CROWSTON; ANNABI; HOWISON, 2003).

To improve the process of shipping new releases, i.e., in terms of software integration and packaging, Continuous Integration (CI) appears as an important practice that may quicken the delivery of new functionalities (LAUKKANEN; PAASIVAARA; ARVONEN, 2015). In addition, continuous integration may reduce problems of code integration in a collaborative environment (VASILESCU et al., 2014). The continuous integration practice has been widely adopted by the software community in open source and industrial settings (DUVALL; MATYAS; GLOVER, 2007). It is especially important for open source projects, given their lack of requirement documents and their geographically distributed teams (VASILESCU et al., 2014). 70% of the most popular GitHub projects use continuous integration, and the percentage of projects that use it is growing (HILTON et al., 2016).

GitHub is considered the most popular code hosting service worldwide (GOUSIOS; SPINELLIS, 2012), with more than 14 million registered users and a wide variety of projects of different programming languages, sizes, and characteristics. Any user can send contributions to any public repository that is hosted on GitHub by sending a pull request (VASILESCU et al., 2015). A pull request is a change proposal to be applied to the project code-base. Pull requests may fix bugs or provide enhancements or new functionalities. In some cases, a pull request is linked to a change request (a.k.a. an issue report) that is registered in the Issue Tracking System (ITS). Pull requests are reviewed by core developers or project integrators and are accepted when the changes are useful and meet the project's pre-set quality standards.

The basic life-cycle of a pull request is comprised of four steps. First, a pull request is submitted to a software project by a contributor. Once submitted, the continuous integration service automatically builds the whole project and runs the test suite to verify whether the pull request breaks the codebase. If all tests pass during the continuous integration process, the integrators thoroughly review the pull request and decide to merge or reject it. A merged pull request is integrated into the project codebase, i.e., a solution is provided, tested, and ready to be delivered to the end users. Finally, the merged pull request is delivered to the end users through an official software release.

1.1 Problem Statement

Once a pull request is merged (i.e., ready to be delivered to the end users of a software system through an official software release), such a pull request may still be delayed before being released. In this dissertation, we use the term delivery time to refer to the delay that merged pull requests suffer prior to their delivery to end users. This delay can be frustrating to end users, because these users care most about when a new functionality is delivered, so they can benefit from it (COSTA et al., 2016). Furthermore, a longer delay to deliver pull requests may lead software projects to lose their users, given the increasing competition between software organizations (BASKERVILLE; PRIES-HEJE, 2004). This competition has forced organizations to release new functionalities at a faster pace, i.e., projects such as Unity3D shifted from a traditional release cycle (12–18 months) to a rapid release cycle (1–3 months) to meet the pressure of the market (SOUZA; CHAVEZ; BITTENCOURT, 2014).

A long delivery time can also frustrate contributors of open source projects, since one of their motivations to contribute is to see their proposed contributions available to the end users in a timely manner (JIANG; ADAMS; GERMAN, 2013). An important reason why developers contribute to an open source project is that such developers (so-called contributors) are always users of the produced contributions; hence, they do not want to wait long to benefit from those contributions. Furthermore, research on the social-psychological feedback effect reveals that people get more involved in a task if they receive feedback, and that the feedback loop of attention is important to motivate contributors to persist (LIU; LI; HE, 2016). Contributors who stop receiving attention (e.g., who often have their pull requests merged and released late) tend to stop contributing (WU; WILKINSON; HUBERMAN, 2009). On the other hand, attracting and retaining the interest of talented developers is crucial for open source projects to achieve sustained success (LONG, 2006).

In this matter, the present dissertation has the goal of reducing the lack of empirical understanding of the impact of adopting continuous integration on the delivery time of merged pull requests. A deep understanding of such delays can help software projects to diminish these undesired delays. Also, this understanding may help project managers to be aware of which factors most impact the delay to deliver merged pull requests to end users, so that they can handle it properly.

1.2 Current Research Limitations

Prior work has analyzed the usage of continuous integration in open source projects that are hosted on GitHub (HILTON et al., 2016; BELLER; GOUSIOS; ZAIDMAN, 2016; YU et al., 2016; VASILESCU et al., 2014; VASILESCU et al., 2015). For instance, Vasilescu et al. (VASILESCU et al., 2015) investigated the productivity and quality outcomes of projects that use continuous integration on GitHub. They found that projects that use continuous integration merge pull requests more quickly when they are submitted by core developers. Also, core developers discover a significantly larger amount of bugs when they use continuous integration. Yu et al. (YU et al., 2016) show that the more succinct a pull request is, the greater the probability that such a pull request is reviewed and merged earlier. Finally, Ståhl and Bosch (STÅHL; BOSCH, 2014b) stated that continuous integration may also improve the release frequency, which hints that software functionalities may be delivered more quickly to users.

Recent research has studied the delivery time of new features, enhancements, and bug fixes (COSTA et al., 2014; COSTA et al., 2016; CHOETKIERTIKUL et al., 2015; CHOETKIERTIKUL et al., 2017). For instance, Costa et al. (COSTA et al., 2014) mined data from the VCSs and ITSs of the Firefox, ArgoUML, and Eclipse projects to investigate how frequently the delivery of fixed issues is delayed in such projects. In a follow-up study, Costa et al. (COSTA et al., 2016) investigated the impact of switching from traditional releases to rapid releases on the delivery time of fixed issues of the Firefox project. They used predictive models to discover which factors significantly impact the delivery time of fixed issues in each release strategy. However, to the best of our knowledge, no prior work has investigated the impact of adopting continuous integration on the delivery time of merged pull requests. Hence, understanding the impact of the adoption of continuous integration on the different delivery time dimensions (see Definitions 1 and 2) that were already proposed in the literature remains an open challenge.

1.3 Dissertation Proposal

The general research question that is investigated in this dissertation is: what is the impact of the adoption of continuous integration on the delivery time of merged pull requests?

This dissertation proposes to empirically analyze the impact of the adoption of continuous integration on the delivery time of merged pull requests from two perspectives. First, we investigate the impact of adopting continuous integration on the delivery time of pull requests in terms of days and releases (Definitions 1 and 2). Second, we analyze the impact of the adoption of continuous integration on the prolonged delivery time (Definition 3). Figure 1 shows an overview of the scope of the analyses that we perform in this dissertation.

Figure 1 – An overview of the scope of the dissertation.

Based on our general research question, seven research questions were proposed to guide this work. RQ1–RQ5 investigate the impact of adopting continuous integration on the delivery time of pull requests, while RQ6 and RQ7 analyze the impact of continuous integration on the prolonged delivery time. To address these research questions, we analyzed data of 90 GitHub projects that are implemented in 5 different programming languages (see Appendix A). We investigate a total of 167,037 pull requests, with 40,321 pull requests before and 126,716 pull requests after the adoption of continuous integration. We present the research questions of each group of analysis in the following. Furthermore, for each RQ we provide a detailed description of its motivation and research approach in Section 3.1.

Analysis I — What is the impact of continuous integration on the delivery time of pull requests?

RQ1 How often are merged pull requests prevented from being released?

RQ2 Are pull requests released more quickly using continuous integration?

RQ3 Does the increased development activity after adopting continuous integration increase the delivery time of merged pull requests?

RQ4 How well can we model the delivery time of merged pull requests?

RQ5 What are the most influential attributes for modeling delivery time?

Analysis II — What is the impact of continuous integration on the prolonged delivery time?

RQ6 How well can we identify the merged pull requests that will suffer from a long delivery time?

RQ7 What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?

1.3.1 Chronology of Analyses

The arrow in Figure 1 shows which analysis inspired the other. We based our first analysis on the study performed by Costa et al. (COSTA et al., 2017), which shows that despite issues being addressed well before an upcoming release, 34% to 98% of such addressed issues are delayed by at least one release in the ArgoUML, Eclipse, and Firefox projects. Based on their results, in our first analysis we study the impact of adopting continuous integration on the delivery time of merged pull requests of open source projects. We find that despite most pull requests being merged well before the release date, 13.8% (median) of them miss at least one release before continuous integration, while 24% miss at least one release after continuous integration (see RQ1). Also, we observe that the time from submission to release of a pull request (i.e., the pull request lifetime) is shorter before the adoption of continuous integration in 53% of the studied projects (see RQ2).

After conducting Analysis I, we performed an exploratory analysis of our data and observed that, in median, 24% of the pull requests of the investigated projects have a prolonged delivery time. These results motivate Analysis II of this dissertation, which investigates the impact of adopting continuous integration on the prolonged delivery time. Such an investigation helps us to better understand which factors are most influential to predict pull requests that are going to have a prolonged delivery time; hence, it may help contributors and project managers to avoid such undesired delays.

1.4 Dissertation Contributions

The main contribution of this dissertation is to provide an empirical understanding of the impact of the adoption of continuous integration on the time-to-delivery of merged pull requests. Based on an analysis of 90 GitHub projects and 167,037 pull requests, we outline the contributions of this dissertation below, grouped by their respective dimension of analysis.

Analysis I — What is the impact of continuous integration on the delivery time of pull requests?

• On analyzing the percentage of merged PRs per project that missed at least one release prior to being delivered to the end users, the results show that before adopting CI, a median of 13.8% of merged PRs are postponed by at least one release, while after adopting CI, a median of 24% of merged PRs have their delivery postponed to future releases. Furthermore, we find that many pull requests that miss at least one release were merged well before the release date of the missed releases (RQ1).

• We find that the time from submission to release of a pull request (i.e., pull request lifetime) is shorter before the adoption of continuous integration in most of the studied projects (53%) (RQ2).

• In the majority of the studied projects (68.9%), the merge time of pull requests increases after adopting continuous integration (RQ2).

• It is not clear whether the adoption of continuous integration increases or decreases the delivery time of merged pull requests (RQ2).

• We find that the large increase in the number of pull request submissions after adopting continuous integration is a key reason why projects deliver pull requests more slowly after adopting continuous integration. 77.8% of the projects increase the rate of pull request submissions after adopting continuous integration (RQ3).

• We are able to build models that obtain sound results when estimating the delivery time of merged pull requests in terms of number of days and releases, both before and after continuous integration. Our explanatory models achieve sound median R² values of 0.72 to 0.74 (RQ4).

• The number of commits performed to produce a release is the most influential factor to estimate delivery time of merged pull requests in terms of days and in terms of releases, both before and after continuous integration (RQ5).

• The time at which a pull request is merged (i.e., queue rank) and the number of pull requests competing to be merged (i.e., merge workload) also have a strong impact on estimating the delivery time in terms of days and releases, both before and after continuous integration (RQ5).

Analysis II — What is the impact of continuous integration on the prolonged delivery time?

• A median of 24% of the merged pull requests of the investigated projects have a prolonged delivery time (RQ6).

• Our models that identify merged pull requests that have a prolonged delivery time obtain excellent median AUC values of 0.92 to 0.97 (RQ6).

• Prolonged delivery time is more closely associated with the required number of commits to produce a release and with project characteristics, such as the queue rank and the merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a prolonged delivery time, both before and after continuous integration (RQ7).

1.5 Dissertation Organization

The remainder of this dissertation is organized as follows. In Chapter 2, we present the necessary background and definitions. In Chapter 3, we explain the design of our empirical study: in Section 3.1, we present each RQ and its respective motivation and research approach, while we present the project selection and data collection processes in Sections 3.2 and 3.3, respectively. In Chapter 4, we present the results of this study and their threats to validity. Finally, we draw conclusions in Chapter 5.

2 Background & Definitions

In this chapter, we outline the key concepts and definitions that are necessary to understand the analyses that are performed in this dissertation.

2.1 The pull-based development model

Distributed Version Control Systems (DVCSs), e.g., Git, have revolutionized the way people develop software. The purpose of distributed development is to enable contributors around the world to contribute to a software project that is managed by a core team (GOUSIOS; PINZGER; DEURSEN, 2014). There are two general ways for potential contributors to submit their contributions to a software project in a distributed code-hosting environment (e.g., GitHub): (i) the shared repository approach, and (ii) pull-based development. We explain each of these approaches in the following.

(i) Shared repository

The core team shares read and write access to the central repository, enabling external contributors to clone the repository, work locally, and push their code contributions back to the central repository.

(ii) Pull-based development

Pull-based development is a paradigm broadly used by contributors of open source projects to develop software in a distributed and collaborative way (VASILESCU et al., 2015). By definition, open source software is software for which interested users have access to the source code (MADEY; FREEH; TYNAN, 2002). More generally, open source software is computer software that is freely available in source code form and that allows users to freely use, study, and change its source code, improving the software according to their requirements (TIWARI, 2010). Open source projects typically use code hosting providers (e.g., GitHub) to manage their code contributions.

The most popular code hosting providers, e.g., GitHub and Bitbucket, provide support to the pull-based development model. On GitHub, almost half of all collaborative projects use pull requests in their development process (GOUSIOS et al., 2015). GitHub and Bitbucket allow any user to fork and clone any public repository and send pull requests (GOUSIOS; PINZGER; DEURSEN, 2014). A pull request is a mechanism enabled by Git that allows contributors to work locally on a forked repository and ask to have their contributions merged into the main repository. Write access to a repository is not mandatory to submit pull requests (VASILESCU et al., 2015). Figure 2 shows an overview of the process of sending contributions to a repository using pull requests. We explain each step of the process below:

Figure 2 – An overview of the pull-based development model that is integrated with Continuous Integration. Step 4 is only performed when continuous integration is used.

• Step 1. Fork a repository: The main repository of a project is not shared with external contributors. Instead, contributors can create their own copy of the main repository by forking it, so that they can modify the code without interfering with other repositories and without needing to be a team member.

• Step 2. Work locally on the forked repository: Contributors develop new functionalities, fix bugs, or provide features and enhancements in the forked repository.

• Step 3. Submit the local changes to the main repository: When changes are ready to be submitted, contributors request a pull of such changes into the main repository by sending a pull request (YU et al., 2016). Such a pull request specifies the local branch that is to be merged into a given branch of the main repository.

• Step 4. Verify whether the pull request breaks the build: The continuous integration service automatically merges the pull request into a test branch. Next, the continuous integration service builds the whole project and runs the test suite to verify whether the pull request breaks the codebase. Typically, if tests fail during the continuous integration process, the pull request is rejected and additional changes are requested from the external contributor to improve his/her pull request (YU et al., 2016). If all tests pass during the CI process, the integrators thoroughly review the pull request before deciding to accept the contributions. This decision is based on the quality, the technical design, and the priorities of the submitted pull requests (GOUSIOS et al., 2015).

• Step 5. Accept or reject a pull request: After the pull request submission, an integrator of the main repository must inspect the changes to decide whether they are satisfactory. If the changes fulfill the requirements of the project, the integrator pulls them into the specified branch of the main repository. Otherwise, the core team may request additional changes from the external contributor to make his/her pull request acceptable. In pull-based development, the integrator plays a crucial role by managing contributions (GOUSIOS et al., 2015).

Projects that use a shared repository strategy can also use pull requests in a complementary way, so that core team members push their contributions directly, while external contributors submit their contributions via pull requests. Projects can also use pull requests to conduct code reviews and to discuss new features. In many projects, all contributions are submitted via pull requests, even when they are sent by core developers. By using this approach, the projects ensure that only reviewed code gets merged (GOUSIOS et al., 2015).

2.2 Continuous Integration

Continuous Integration is a set of practices that lead developers to integrate their work more frequently, i.e., at least daily (FOWLER; FOEMMEL, 2006; MEYER, 2014). The main goal of continuous integration is to integrate early, so that developers do not have to keep their code changes localized in their workspace for long. Instead, an automated system verifies that the changes do not break the codebase of the software project, and the changes are then shared with the development team quickly (VIRMANI, 2015). In this way, continuous integration aims to avoid the unpredictability of the code and a large integration effort (LAUKKANEN; PAASIVAARA; ARVONEN, 2015) by identifying software errors and defects quickly, so that developers can correct such errors sooner (LAI; LEU, 2015).

In continuous integration, all code must be maintained in a single repository. When a contributor commits to the repository, an automated system verifies whether the change breaks the codebase (Step 4 of Figure 2) (MEYER, 2014). The entire process must be automated. Ideally, a build should compile the code and include a test suite to verify whether the codebase is broken after adding new changes. In continuous integration, the work of developers is continually compiled, built, and tested (YU et al., 2016).

Continuous integration was originally proposed as one of the twelve Extreme Programming (XP) practices, but it is often used outside the context of XP (BELLER; GOUSIOS; ZAIDMAN, 2016). Continuous integration is widely used on GitHub. According to Gousios et al. (GOUSIOS et al., 2015), 75% of the GitHub projects that make heavy use of pull requests also tend to use continuous integration. Several CI services, such as Jenkins, TeamCity, Bamboo, CloudBees, and Travis-CI (MEYER, 2014), are available to development teams. Jenkins and Travis-CI are the most used by GitHub projects (VASILESCU et al., 2015). Travis-CI is a CI platform for open source and private GitHub projects; currently, over 300k projects use it.

The wide adoption of continuous integration is related to the perceived benefits that are brought by this practice. According to Fowler (FOWLER; FOEMMEL, 2006), the greatest benefit of continuous integration is to reduce risk. The study of Duvall et al. (DUVALL; MATYAS; GLOVER, 2007) also states that the adoption of continuous integration contributes to a higher confidence of the development team regarding their software product. Furthermore, continuous integration is often adopted by software projects with the allure of delivering new features more quickly (LAUKKANEN; PAASIVAARA; ARVONEN, 2015) and of increasing the release frequency and predictability (STÅHL; BOSCH, 2014b).

2.3 Delivery Time

Delivery time refers to the time between the moment at which a pull request is merged and the time at which such a pull request is delivered to the end users of a software system through an official software release. In this dissertation, we investigate two dimensions of delivery time: (i) delivery time in terms of number of days; and (ii) delivery time in terms of number of releases. Additionally, we investigate characteristics of pull requests that have a (iii) prolonged delivery time.

Definition 1 — Delivery time in terms of days

Figure 3 shows the basic life-cycle of a released pull request and provides an example of how we measure delivery time in terms of days. To compute delivery time in terms of days, we count the number of days between the moment at which a pull request was merged (t1) and the moment at which such a pull request was released (t2).

Figure 3 – An illustrative example of how we compute delivery time in terms of days.
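To make Definition 1 concrete, the following minimal R sketch computes delivery time in days from merge and release timestamps. The data frame and its column names (merged_at, released_at) are hypothetical illustrations, not the actual schema of our dataset.

```r
# A minimal sketch (base R), assuming hypothetical column names.
prs <- data.frame(
  id          = 1:3,
  merged_at   = as.Date(c("2016-01-10", "2016-02-01", "2016-02-20")),
  released_at = as.Date(c("2016-03-01", "2016-03-01", "2016-06-15"))
)
# Delivery time in days: days between the merge and the release of a PR.
prs$delivery_days <- as.numeric(prs$released_at - prs$merged_at)
prs$delivery_days
#> [1]  51  29 116
```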

Definition 2 — Delivery time in terms of releases

Figure 4 provides an example of how we measure the delivery time of merged pull requests in terms of releases. To compute the delivery time in terms of releases, we count the number of releases from which a given merged pull request is prevented from delivery. For instance, in Figure 4, PR #05 is submitted at time t1, merged at t2, and shipped at time t3. The delivery time in terms of releases for PR #05 is the number of official releases that are shipped between t2 and t3. In the given example, PR #05 was prevented from delivery in release v1.1 and was delivered in release v2.0; hence, PR #05 has a delivery time of one release.

Figure 4 – An illustrative example of how we compute delivery time in terms of releases.
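A corresponding sketch for Definition 2 is shown below; release_dates is a hypothetical, chronologically ordered vector of a project's official release dates.

```r
# A minimal sketch: delivery time in terms of releases (Definition 2).
release_dates <- as.Date(c("2016-02-01",  # v1.0
                           "2016-04-01",  # v1.1
                           "2016-06-01")) # v2.0

# Releases shipped strictly between the merge and the delivering release;
# 0 means the pull request was delivered in the next possible release.
delivery_releases <- function(merged_at, released_at, release_dates) {
  sum(release_dates > merged_at & release_dates < released_at)
}

# Like PR #05 in Figure 4: merged after v1.0, skips v1.1, shipped in v2.0.
delivery_releases(as.Date("2016-03-01"), as.Date("2016-06-01"), release_dates)
#> [1] 1
```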

Definition 3 — Prolonged delivery time

We follow an approach similar to the one used by Costa et al. (COSTA et al., 2017) to identify pull requests that suffer from a prolonged delivery time. Let T = {t1, t2, ..., tn} be the set of delivery times for the pull requests p1, p2, ..., pn of a given project. We consider that pi has a long delivery time ti if ti > MAD(T) + median(T). The MAD is the Median Absolute Deviation of the distribution of delivery times of the pull requests of a given project. The greater the MAD, the higher the variation of a distribution with respect to its median (HOWELL, 2014; EFRON, 1986). The MAD is commonly used as an alternative approach to detect outliers: instead of using the standard deviation around the mean, we use the absolute deviation around the median.
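The rule of Definition 3 can be sketched in a few lines of R. Note that R's mad() rescales the deviation by 1.4826 by default; constant = 1 yields the raw median absolute deviation used in the rule above. The example distribution is hypothetical.

```r
# A minimal sketch of Definition 3: ti is prolonged if ti > MAD(T) + median(T).
is_prolonged <- function(t) {
  t > mad(t, constant = 1) + median(t)  # constant = 1: raw (unscaled) MAD
}
delivery_days <- c(3, 5, 7, 8, 10, 11, 60)  # hypothetical delivery times
is_prolonged(delivery_days)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

Here, median(T) = 8 and MAD(T) = 3, so only delivery times above 11 days are flagged as prolonged.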

2.4 Chapter Summary

In this chapter, we provided the key concepts and terms that we use in this dissertation. We first described the pull-based development model and how developers contribute to a software project by sending pull requests (Section 2.1). Next, we outlined the key concepts of continuous integration, which is a set of practices that lead developers to integrate their work at least daily, and we described how continuous integration works with pull-based development (Section 2.2). Finally, we defined the different types of delivery time that we study in this dissertation (Section 2.3).

3 Empirical Study

In this chapter, we outline the motivation and research approach for each research question that is addressed in this study. We also explain how we select the studied projects and construct the dataset that we use to perform the analyses that compose this dissertation.

3.1 Research Questions

In this section, we present the motivation and research approach for each studied RQ of this dissertation, grouped by its respective dimension of analysis: RQ1–RQ5 compose Analysis I, which studies the impact of continuous integration on the delivery time of pull requests, while RQ6 and RQ7 compose Analysis II, which studies the impact of continuous integration on the prolonged delivery time.

Analysis I — What is the impact of continuous integration on the delivery time of pull requests?

3.1.1 RQ1: How often are merged pull requests prevented from being released?

RQ1: Motivation

A long delay to release pull requests can be frustrating to users and contributors of a software project, since they care most about the time for a pull request to become available rather than the time required to merge such a pull request into the project code base. In this matter, it is important to investigate whether pull requests are being delivered immediately (e.g., in the next possible release after they have been merged) or not, because a long delivery time may frustrate users and contributors.

In RQ1, we study how often merged pull requests are prevented from delivery, both before and after continuous integration. This investigation is our first step towards understanding how long the delivery times of pull requests are in terms of releases.

RQ1: Approach

We use an approach similar to the one used by Costa et al. (COSTA et al., 2017) to investigate how often pull requests are prevented from being released. First, we compute the delivery time in terms of releases for each merged pull request of the investigated projects (see Definition 2). Next, for each investigated project, we observe the percentage of pull requests that were delivered in the next upcoming release and the percentage of pull requests that were prevented from being delivered in at least one release. We then group the pull requests of each project into two buckets: before and after continuous integration. For each bucket, we also observe the percentage of pull requests per project that were prevented from being delivered in one or more releases. The pull requests that do not miss any release are grouped into the next release bucket, while the pull requests that miss one or more releases are grouped into the later release bucket.

Finally, we analyze whether merged pull requests are being prevented from being released because their merge occurs close to an upcoming release date, i.e., one day or one week before the release date. For this purpose, we compute the merge timing metric, which represents the moment at which a pull request is merged within the release cycle. The merge timing ranges from 0 to 1. A merge timing value close to 1 indicates that the pull request was merged early in the release cycle, while values close to 0 represent the opposite. The merge timing is computed as (i) the number of days remaining between the merge of a pull request and the upcoming release, over (ii) the duration in days of its release cycle (see Equation 3.1).

$$\text{merge timing} = \frac{\#\text{ days remaining until the upcoming release}}{\text{release cycle duration (in days)}} \qquad (3.1)$$
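A small R sketch of Equation 3.1 follows; the argument names are hypothetical, and the release cycle is assumed to start at the previous release date.

```r
# A minimal sketch of the merge timing metric (Equation 3.1).
# Values close to 1: merged early in the release cycle; close to 0: late.
merge_timing <- function(merged_at, cycle_start, release_date) {
  days_remaining <- as.numeric(release_date - merged_at)
  cycle_duration <- as.numeric(release_date - cycle_start)
  days_remaining / cycle_duration
}
# Merged 45 days before the upcoming release in a 60-day cycle.
merge_timing(as.Date("2016-02-15"), as.Date("2016-01-31"), as.Date("2016-03-31"))
#> [1] 0.75
```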

3.1.2 RQ2: Are pull requests released more quickly using continuous integration?

RQ2: Motivation

In recent years, many software companies have adopted the continuous integration practice in their development life cycle. This wide adoption is related to the perceived benefits that are brought by continuous integration: for instance, risk reduction, a higher confidence of the development team regarding their software product (DUVALL; MATYAS; GLOVER, 2007), higher productivity, higher release frequency and predictability (STÅHL; BOSCH, 2014b), and the allure of delivering new features more quickly (LAUKKANEN; PAASIVAARA; ARVONEN, 2015). However, there is a lack of studies that empirically verify whether continuous integration really reduces the time-to-delivery of merged pull requests. In RQ2, we study the delivery time of merged pull requests before and after the adoption of continuous integration.

RQ2: Approach

Figure 5 shows the basic life cycle of a released pull request: (t1) the merge phase, and (t2) the delivery phase. We refer to the t1 + t2 time as the lifetime of a pull request. In RQ2, we analyze the merge and delivery phases. The merge phase (t1) is the time required for pull requests to be merged into the codebase, whereas the delivery phase (t2) refers to the time required for pull requests to be released after they have been merged, i.e., ready to be delivered to end users.

Figure 5 – The basic life-cycle of a released pull request.

We use beanplots (KAMPSTRA et al., 2008) to visually compare the different distributions of delivery time (see Figure 14). The higher the data frequency for a given value, the wider the bean is plotted on the Y axis for that particular value. In addition, we use Mann-Whitney-Wilcoxon (MWW) tests (WILKS, 2011) followed by Cliff's delta effect-size measures (CLIFF, 1993). The MWW test is a non-parametric test whose null hypothesis is that two distributions come from the same population (α = 0.05). Cliff's delta is a non-parametric effect-size metric that verifies the magnitude of the difference between the values of two distributions. The higher the Cliff's delta value, the greater the difference between distributions. A positive Cliff's delta indicates how much larger the values of the first distribution are, while a negative Cliff's delta indicates the opposite. We use the thresholds provided by Romano et al. (ROMANO et al., 2006), i.e., delta < 0.147 (negligible), delta < 0.33 (small), delta < 0.474 (medium), and delta >= 0.474 (large). We use these statistical tools to analyze the entire life-cycle of a pull request before and after continuous integration. First, we analyze the pull request lifetime (t1 + t2). Then, we analyze the (t1) merge and (t2) delivery phases of a pull request separately.
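As an illustration of this statistical machinery, the sketch below runs an MWW test with base R's wilcox.test and computes Cliff's delta by hand over two hypothetical delivery-time samples (packages such as effsize provide a ready-made cliff.delta as well).

```r
# Hypothetical delivery times (in days) before and after CI.
before_ci <- c(2, 4, 5, 7, 9, 12, 15)
after_ci  <- c(5, 8, 10, 14, 18, 21, 30)

# MWW test: H0 is that both samples come from the same population (alpha = 0.05).
wilcox.test(before_ci, after_ci)

# Cliff's delta: P(x > y) - P(x < y) over all pairs of observations.
cliffs_delta <- function(x, y) {
  mean(outer(x, y, ">")) - mean(outer(x, y, "<"))
}
cliffs_delta(before_ci, after_ci)  # negative: the first sample tends to be smaller
```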

3.1.3 RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of pull requests?

RQ3: Motivation

In RQ2, we find that 53% (48/90) of our studied projects deliver submitted pull requests more quickly before adopting continuous integration. However, since the adoption of continuous integration is motivated by an increase in release frequency and predictability (STÅHL; BOSCH, 2014b), we suspected that pull requests would be delivered more quickly after the adoption of continuous integration. Nevertheless, the results suggest an opposite trend, which leads us to the following question: why do 53% of our studied projects deliver submitted pull requests more quickly before adopting continuous integration? This investigation is important to better understand the impact of adopting continuous integration on software development.

RQ3: Approach

Similar to RQ2, we use Mann-Whitney-Wilcoxon tests (WILKS, 2011) and Cliff's deltas (CLIFF, 1993) to analyze the data. We also use box plots (WILLIAMSON; PARKER; KENDRICK, 1989) to visually summarize and compare the distributions. In this research question, we investigate whether the increase in the delivery time of pull requests after adopting continuous integration is related to a significant increase in pull request submissions after adopting continuous integration. We group our dataset into two buckets: before and after the adoption of continuous integration. For each bucket, we count the number of pull requests that are submitted, merged, and delivered per release. We perform three comparisons in this RQ. First, we compare whether pull request submissions (per release) significantly increase after adopting continuous integration. Next, we organize our projects into two groups: (i) the projects for which the delivery time of pull requests increased after adopting continuous integration, and (ii) the projects for which the delivery time of pull requests decreased after adopting continuous integration. For each group, we compare whether the submissions of pull requests significantly increased after adopting continuous integration.
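The per-release counting that feeds these comparisons can be sketched as follows; the data frame and its columns are hypothetical stand-ins for our mined pull request data.

```r
# Hypothetical records: the release each PR was submitted in, and whether
# that release cycle predates the adoption of continuous integration.
prs <- data.frame(
  release   = c("v1.0", "v1.0", "v1.1", "v2.0", "v2.0", "v2.0", "v2.1", "v2.1"),
  before_ci = c(TRUE,   TRUE,   TRUE,   FALSE,  FALSE,  FALSE,  FALSE,  FALSE)
)
# Pull request submissions per release, in each bucket.
rates_before <- as.numeric(table(prs$release[prs$before_ci]))
rates_after  <- as.numeric(table(prs$release[!prs$before_ci]))
# One-sided MWW test: did submissions per release increase after adopting CI?
wilcox.test(rates_before, rates_after, alternative = "less")
```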

3.1.4 RQ4: How well can we model the delivery time of merged pull requests?

RQ4: Motivation

Several studies have proposed approaches to investigate the time required to merge a pull request (YU et al., 2015; YU et al., 2016) and to prioritize pull requests based on their characteristics (VEEN; GOUSIOS; ZAIDMAN, 2015). These studies can help integrators to prioritize their work in the face of multiple concurrent pull requests, and they can also help to estimate when a pull request will be merged by an integrator of a software project. However, even though most pull requests are merged well before the next release date, many of them are not delivered in the next release. In this matter, knowing the delivery time of merged pull requests is of great interest to the users and contributors of a software project. In RQ4, we investigate whether we can accurately model the delivery time of merged pull requests in terms of number of days and releases (see Definitions 1 and 2 of delivery time). Our explanatory models are important to understand which variables may impact the delivery time of pull requests. Furthermore, the models could be used in future work and by practitioners to estimate when a merged pull request will likely be delivered (i.e., in the next release or after one or more releases).

RQ4: Approach

To study when a merged pull request is released, we apply supervised machine learning. The input for the learning algorithm is a set of attributes that describes each pull request in as much detail as possible. During the feature selection process, we collect information from the VCSs of the studied projects to include attributes that belong to one of the following families: contributor, pull request, project, and process. We choose these families of attributes because we intend to investigate a variety of perspectives that may have an influence on the delivery time of a merged pull request. Tables 4 and 5 provide a complete description of the attributes that we compute for each family and show the rationale that we use to include each attribute as a predictor of delivery time.

We train explanatory models to study whether a merged pull request will be delivered in the next possible release or whether such a pull request will be prevented from delivery in one or more releases (see Definition 2). To study delivery time in terms of releases, we use Logistic Regression Models (DAYTON, 1992; HILBE, 2009). We model the response variable Y as Y = 1 for the merged pull requests that were delayed, i.e., the pull requests that missed at least one release before being released, and Y = 0 otherwise.

In this context, our models are intended to explain why a given merged pull request has its delivery delayed (i.e., Y = 1). We use the Area Under the Curve (AUC) and the Brier score to evaluate the performance of our models. The AUC is used to evaluate the degree of discrimination achieved by the models (HANLEY; MCNEIL, 1982). For instance, the AUC can be used to evaluate how well our models distinguish between merged pull requests that are delivered in the next possible release after they have been merged and pull requests that are prevented from delivery in one or more releases. The AUC is the area below the curve that plots the true positive rate against the false positive rate. AUC values range from 0 (worst) to 1 (best). An area greater than 0.5 indicates that the explanatory model outperforms random guessing (COSTA et al., 2017). Mehdi et al. (MEHDI et al., 2011) provide a rough guide for classifying the accuracy of a diagnostic test using the AUC: 0.90–1: excellent; 0.80–0.90: good; 0.70–0.80: fair; 0.60–0.70: poor; 0.50–0.60: fail. The Brier score (EFRON, 1986), in turn, is used to evaluate the accuracy of probabilistic predictions. The Brier score measures the mean squared difference between the probability of delay assigned by our models to a particular pull request P and the actual outcome of P (i.e., whether P is actually delayed or not). Hence, the lower the Brier score, the more accurate the probabilities that are produced by our explanatory models (COSTA et al., 2016).

We also study delivery time in terms of number of days (Definition 1). To perform this analysis, we use multiple linear regression modeling (Ordinary Least Squares). Linear regression models are simple and often provide an adequate and interpretable description of how one or more explanatory variables X affect the dependent variable Y (HASTIE; TIBSHIRANI; FRIEDMAN, 2009). Regression models fit a curve of the form

Y = \beta_0 + \sum_{j=1}^{n} X_j \beta_j \qquad (3.2)

The Y variable is the dependent variable (i.e., delivery time in terms of days in our study), while X is the set of explanatory variables that may share a relationship with Y (e.g., churn and description length in our case). The set of β coefficients represents the weights given by the model to adjust the values of X in order to better estimate the dependent variable Y. Tables 4 and 5 show the set of explanatory variables that we use in our study to predict delivery time in terms of days. They also show the definition and rationale that is used to adopt each variable of our set of explanatory variables. We assess the fit of our linear regression models using the R2. The R2 corresponds to the proportion of the variability in Y that can be explained by using X. In general, it is a challenge to determine what is a good R2 value, since it depends on the nature of the problem that is being investigated (JAMES et al., 2014). In this study, we consider in our analyses only the models that achieve R2 values higher than 0.5. In other words, we ensure that at least 50% of the variability of our data is explained by our models. We analyze 91 models in total (41 using pull request data before continuous integration, and 50 using data after continuous integration). Appendix B provides the R2 value for each model that we fit. We follow the guidelines of Harrell Jr. (HARRELL, 2015) to build our explanatory models (Logistic and Linear Regression Models). Figure 6 provides an overview of the process that we use to build our models. First, for each studied project we group its data into two buckets: before and after continuous integration. Then, we create two Logistic Regression Models for each project, one using the pull request data of before continuous integration, and another using the pull request data of after continuous integration. Also, we train two Linear Regression Models for each project, one using pull request data of the before-CI bucket, and another using pull request data of the after-CI bucket.
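A sketch of the per-bucket linear models, under the same assumed prs data frame plus a delivery_days response and a ci flag marking the after-CI bucket:

    library(rms)

    # One Ordinary Least Squares model per bucket (before and after CI).
    fit_bucket <- function(d) {
      ols(delivery_days ~ churn + description_size + merge_workload,
          data = d, x = TRUE, y = TRUE)
    }

    fit_before <- fit_bucket(subset(prs, ci == FALSE))
    fit_after  <- fit_bucket(subset(prs, ci == TRUE))

    # R2: the proportion of the variability in Y that the model explains;
    # we only keep models with R2 > 0.5.
    fit_after$stats["R2"]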

Figure 6 – Training Linear and Logistic Regression Models. We follow the guidelines that are provided by Harrell Jr. (HARRELL, 2015) to train the explanatory models; the process involves eight activities, from data collection to model validation. We present a description of Steps 5.2 and 5.3 in RQ5.

In steps 1 and 2 we account for collinearity in our explanatory variables. In step 1, we check the redundancy of our explanatory variables. Redundant variables do not increase the explanatory power of the models and can distort the relationship between explanatory (X) and response (Y) variables. We use the redun function from the rms R package to remove the redundant variables from our set of explanatory variables. The redun function fits models to explain each explanatory variable using the other explanatory variables (COSTA et al., 2017). We discard explanatory variables that are estimated with R2 >= 0.9. In step 2, we check the correlation of the surviving explanatory variables. We remove the highly correlated variables by using a variable clustering analysis (SARLE, 1990). When variables within a cluster have a correlation of |ρ| > 0.7, we choose only one of them to include in our models. In steps 3 and 4, we compute and allocate the budget of degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting the models low. When using a Logistic Regression Model, we compute the D.F. budget that we can spend in our models using the equation n/10, where n is the number of instances of the class with the lowest number of instances, and 10 is a denominator that is recommended by Harrell Jr. (HARRELL, 2015). Furthermore, the value of n/10 must be greater than or equal to the number of explanatory variables of our models. For example, we have two possible classes in our models (Next and Later) and 13 explanatory variables. If a given project has 1,000 pull requests before the adoption of continuous integration, of which 100 belong to the Later class and the remaining 900 belong to the Next class, then the D.F. budget restriction is not satisfied, since 13 (the number of explanatory variables of the model) is greater than 100/10 = 10, where 100 is the number of instances of the class with the lowest number of instances (Later). We filter out models with such settings. In step 5, we fit 54 Logistic Regression models (12 using data of before continuous integration, and 42 using data of after continuous integration). It is important to highlight that the pull request data of a given project may be used to build 0, 1, or 2 models. For instance, if the project data of before continuous integration does not satisfy the D.F. budget restriction, but the project data of after continuous integration does, then we train just one model using the data of after continuous integration, and vice versa. Furthermore, if the project data of both before and after continuous integration do not satisfy the D.F. budget restriction, we train no model with the data of such a project. The reason why the number of models that use data of after continuous integration is greater than the number of models that use data of before continuous integration is that most studied projects have fewer pull requests before continuous integration; also, as 86.2% (median) of the pull requests of before continuous integration are delivered in the Next release/class, for most projects the number of instances (pull requests) of the Later release/class does not satisfy the D.F. budget restriction. In step 5.1, we assess the stability of our Logistic Regression models by computing the optimism-reduced AUC and Brier Score.
The optimism of each metric is computed as follows: (i) we count the D.F. that are spent to fit the original model, then we select a bootstrap sample to fit another model with the same D.F. as the original model; (ii) the model built from the bootstrap sample is applied both to the bootstrap and original samples (AUC and Brier score are computed for each sample). The optimism is the difference in the AUC and Brier score between the bootstrap and original samples. In our analyses, we fit models for 1,000 bootstrap samples and compute the average optimism. The optimism-reduced AUC and Brier score are calculated by subtracting the average optimism from the initial AUC and Brier score estimates. In step 5.1, we also evaluate the stability of our Linear Regression models by computing the optimism-reduced R2. While R2 gives an indication of how much variability may be explained by our Linear Regression models, this metric may also be very dependent on the specific data to which our models were fitted, i.e., overfitted (MCINTOSH et al., 2016). Therefore, the optimism-reduced R2 measures how stable our models are. The optimism of the R2 is computed by fitting models using bootstrap samples of the original data. For each model fit to a bootstrap sample, we calculate the difference between the R2 of such a model and that of the model fit to the original data. This difference is a measure of the optimism in the original model (COSTA et al., 2017). In this study, the bootstrap-calculated optimism is the average optimism obtained over a set of 1,000 bootstrap samples. The smaller the bootstrap-calculated optimism, the higher the stability of our explanatory models (EFRON, 1986). In step 5.2, we evaluate the impact that each variable of our set of explanatory variables has on the models that we fit, while in step 5.3 we study the relationship that the most influential variables share with the response variable (delivery time). We use these steps to answer RQ5 and RQ7 of this study, and we detail each of them (5.2 and 5.3) in the respective sections.
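The sketch below illustrates steps 1, 2 and 5.1 under the same assumptions (the prs data frame and its column names are illustrative):

    library(rms)  # attaches Hmisc, which provides redun() and varclus()

    # Step 1: discard variables that the others explain with R2 >= 0.9.
    red <- redun(~ churn + files + comments + activities + merge_time,
                 data = prs, r2 = 0.9)
    red$Out  # names of the redundant variables

    # Step 2: variable clustering on rank correlations; within a cluster
    # of |rho| > 0.7 we keep a single variable.
    vc <- varclus(~ churn + files + comments + activities + merge_time,
                  data = prs)
    plot(vc)

    # Step 5.1: optimism-reduced AUC and Brier score over 1,000 bootstrap
    # samples; validate() reports Dxy (AUC = (Dxy + 1) / 2) and B (Brier).
    fit <- lrm(delayed ~ churn + comments + merge_time, data = prs,
               x = TRUE, y = TRUE)
    val <- validate(fit, B = 1000)
    auc_corrected   <- (val["Dxy", "index.corrected"] + 1) / 2
    brier_corrected <- val["B", "index.corrected"]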

3.1.5 RQ5: What are the most influential attributes for modeling delivery time?

RQ5: Motivation

In RQ4, we found that our models can accurately model the delivery time of pull requests, both in terms of days and in terms of releases (Definitions 1 and 2). To fit our models, we use attributes that we collect from the VCSs of the studied projects. As described in Tables 4 and 5, we collected attributes that belong to different families (contributor, pull request, project and process) that may be related to the delivery time of merged pull requests. In RQ5, we investigate which attributes are the most influential to model the delivery time of merged pull requests, both before and after continuous integration.

RQ5: Approach

In RQ5, we separately investigate which variables are the most influential to model delivery time according to the models that we fit using pull request data of before continuous integration, and according to the models that we fit using data of after continuous integration. Next, we show the relationship that the most influential variables share with delivery time. To identify the most influential variables for estimating the delivery time of merged pull requests, both in terms of days (Definition 1) and in terms of releases (Definition 2), we use Wald χ2 maximum likelihood tests (Step 5.2 of Figure 6). The larger the χ2 value for a variable, the higher the influence that such a variable has on our explanatory models. To calculate the χ2 value for each explanatory variable of the models that we fitted, we use the anova function of the rms R package. We use the following approach to calculate the percentage of the explanatory power of each variable of our models. Let V = (v1, v2, ..., vk) be the set of explanatory variables of our models, and f(vi) be the function that represents the χ2 value for vi. The explanatory power of vi on modeling delivery time, denoted as P(vi), can be computed using Equation 3.3. The explanatory power of a variable ranges from 0 to 1. The higher the explanatory power of a variable, the larger the influence of such a variable to model the delivery time.

P(v_i) = \frac{f(v_i)}{\sum_{j=1}^{k} f(v_j)} \qquad (3.3)

To study the relationship that the most influential variables of our models share with the response variable (delivery time), we use the Predict function of the rms package of the R language. The Predict function plots the change in the delivery time against the change in each influential variable while holding the other variables constant at their median values.
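A sketch of steps 5.2 and 5.3 for a fitted logistic model (fit, from the earlier sketch):

    # Step 5.2: Wald chi-square per explanatory variable, turned into the
    # explanatory power P(v_i) of Equation 3.3.
    a <- anova(fit)  # anova.rms reports a Wald chi-square per variable
    chisq <- a[rownames(a) != "TOTAL", "Chi-Square"]
    power <- chisq / sum(chisq)
    sort(power, decreasing = TRUE)

    # Step 5.3: effect of one influential variable on the response, holding
    # the remaining variables at their medians (requires datadist()).
    plot(Predict(fit, churn))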

Analysis II — What is the impact of continuous integration on the prolonged delivery time?

3.1.6 RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?

RQ6: Motivation

A long delivery time of merged pull requests may frustrate end users and contributors of a software project. End users are not much interested in just having a new functionality integrated in the code base of a project; instead, they care most about when such a new functionality will be released, so they can benefit from it. Moreover, if users are not aware of such a long delivery time, their frustration may increase considerably because they are not used to such delivery times (COSTA et al., 2017). This investigation helps us to understand how well we can model long delivery times of pull requests, and hence it may also help us to mitigate the problem of a prolonged delivery time.

RQ6: Approach

We calculate the prolonged delivery time of pull requests (Definition 3) as described in Section 2.3. Table 1 and Table 2 show the medians and MADs used for each studied project to identify merged pull requests that have a long delivery time. For instance, in the Yelp/mrjob project, when a pull request takes more than the threshold of 83.4 days (median delivery time + MAD) to be released, we consider that such a pull request has a long delivery time. First, we calculate a long delivery time threshold for all pull requests of each studied project, including pull requests of before and after continuous integration in a single set. Next, we separately calculate a long delivery time threshold for the pull requests delivered before and after continuous integration. We distinguish the long delivery time thresholds for pull requests delivered before and after continuous integration because, if a project changes its policy of shipping releases after the adoption of continuous integration (i.e., quickens the time to ship new releases), then a given delivery time may be considered long for pull requests submitted after continuous integration, while it may not be considered long for pull requests submitted before continuous integration. In median, delivery times higher than 91 days are considered long for pull requests delivered after continuous integration, while delivery times higher than 76 days (median) are considered long for pull requests delivered before continuous integration.
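A sketch of the threshold computation for one project, assuming a delivery_days column; note that R's mad() already applies the 1.4826 consistency constant by default:

    # Long delivery time threshold (Definition 3): median + MAD, computed
    # per project. R's mad() applies the 1.4826 consistency constant by
    # default, which matches the MAD values reported in Tables 1 and 2.
    threshold <- median(prs$delivery_days) + mad(prs$delivery_days)

    # Flag the pull requests whose delivery time exceeds the threshold.
    prs$long_delivery <- prs$delivery_days > threshold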

Figure 7 shows the distribution of the percentage of pull requests per project that have a long delivery time. On investigating all pull requests (before and after continuous integration together) of each studied project, we observe that in median 24% of them have a long delivery time. Moreover, on investigating the pull requests of each project separated into two buckets, before and after continuous integration, we observe that in median 22% of such pull requests have a long delivery time, both before and after continuous integration.

Figure 7 – Percentage of merged pull requests that have a long delivery time. We present the distribution of the percentage of merged pull requests that have a long delivery time on the studied projects (medians: 24% General, 22% CI, 22% NO-CI).

To investigate whether a given merged pull request is likely to have a long delivery time, we use explanatory models (i.e., Logistic Regression Models). As the long delivery time threshold of a project may vary depending on whether the pull request was delivered before or after the project adopted continuous integration, we separately investigate how well we can identify whether a pull request will suffer from a long delivery time, both before and after continuous integration. To train the Logistic Regression models, we produce a dichotomous response variable Y, where Y = 1 means that a merged pull request has a long delivery time, while Y = 0 means that the delivery time of that pull request is normal. Similar to RQ4, when using a Logistic Regression Model we must account for the budget of degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting low. Using the guideline provided by Harrell Jr. (HARRELL, 2015) to calculate the D.F. budget of the data of each studied project, we only build models for the projects that satisfy the following restriction: n/10 >= k, where n is the number of instances of the class (prolonged or normal) with the lowest number of instances, 10 is a denominator that is recommended by Harrell Jr. (HARRELL, 2015), and k is the number of explanatory variables used in the model. We build 62 models (13 before continuous integration and 49 after continuous integration). Similar to RQ4, we assess the goodness of the models using the AUC and Brier score metrics.
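A sketch of the D.F. budget restriction check, where y is the dichotomous response (prolonged or normal) and k the number of explanatory variables:

    # Only fit a model when n / 10 >= k, where n is the number of instances
    # of the smallest class (HARRELL, 2015) and k the number of explanatory
    # variables.
    df_budget_ok <- function(y, k) {
      n <- min(table(y))
      (n / 10) >= k
    }

    df_budget_ok(prs$long_delivery, k = 13)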

Table 1 – Long delivery time thresholds (PART I). We present the median delivery time in terms of days and the MAD for each project. The threshold for a long delivery time is calculated as the median delivery time + MAD (see Definition 3).

# | Project | CI Median | CI MAD | NO-CI Median | NO-CI MAD | General Median | General MAD
1 | Yelp/mrjob | 41.50 | 57.08 | 32.50 | 39.29 | 36.00 | 47.44
2 | yiisoft/yii | 97.00 | 109.71 | 96.00 | 72.65 | 97.00 | 96.37
3 | roots/sage | 14.00 | 19.27 | 37.00 | 42.25 | 27.00 | 32.62
4 | vanilla/vanilla | 304.50 | 110.45 | 518.50 | 151.97 | 318.00 | 126.02
5 | processing/p5.js | 13.00 | 14.83 | 7.00 | 8.15 | 12.00 | 14.83
6 | bokeh/bokeh | 17.00 | 17.79 | 13.00 | 13.34 | 16.00 | 17.79
7 | serverless/serverless | 39.50 | 52.63 | 14.00 | 19.27 | 27.00 | 38.55
8 | craftyjs/Crafty | 216.00 | 94.89 | 32.00 | 32.62 | 50.00 | 65.23
9 | invoiceninja/invoiceninja | 21.00 | 16.31 | 10.00 | 8.90 | 12.00 | 11.86
10 | scikit-image/scikit-image | 96.00 | 90.44 | 31.00 | 25.20 | 79.00 | 83.03
11 | dropwizard/dropwizard | 141.50 | 101.56 | 26.00 | 26.69 | 107.00 | 115.64
12 | androidannotations/androidannotations | 98.00 | 102.30 | 94.00 | 118.61 | 94.00 | 114.16
13 | aframevr/aframe | 31.00 | 41.51 | 56.50 | 43.74 | 44.00 | 45.22
14 | jashkenas/backbone | 71.00 | 84.51 | 67.50 | 80.80 | 70.00 | 84.51
15 | openhab/openhab | 84.50 | 103.04 | 65.00 | 58.56 | 70.00 | 68.20
16 | bcit-ci/CodeIgniter | 696.00 | 386.96 | 1138.00 | 102.30 | 924.00 | 330.62
17 | mizzy/serverspec | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
18 | spinnaker/spinnaker | 3.00 | 4.45 | 8.00 | 11.86 | 4.00 | 5.93
19 | sensu/sensu | 13.00 | 19.27 | 2.00 | 2.97 | 7.00 | 10.38
20 | cython/cython | 81.00 | 76.35 | 90.00 | 66.72 | 83.50 | 74.87
21 | buildbot/buildbot | 236.00 | 277.25 | 169.00 | 139.36 | 217.00 | 243.15
22 | jsbin/jsbin | 0.00 | 0.00 | 32.50 | 48.18 | 0.00 | 0.00
23 | PokemonGoF/PokemonGo-Bot | 12.00 | 4.45 | 1.00 | 0.00 | 7.00 | 8.90
24 | naver/pinpoint | 99.00 | 102.30 | 59.00 | 5.93 | 92.00 | 91.92
25 | siacs/Conversations | 7.00 | 8.90 | 9.00 | 7.41 | 8.00 | 8.90
26 | photonstorm/phaser | 16.00 | 14.83 | 8.00 | 8.90 | 15.00 | 14.83
27 | fchollet/keras | 20.00 | 14.83 | 27.00 | 21.50 | 22.00 | 16.31
28 | robolectric/robolectric | 108.00 | 84.51 | 134.50 | 159.38 | 111.00 | 93.40
29 | TelescopeJS/Telescope | 17.00 | 23.72 | 19.00 | 27.43 | 18.00 | 25.20
30 | andypetrella/spark-notebook | 58.00 | 32.62 | 182.50 | 183.10 | 82.00 | 99.33
31 | apache/incubator-airflow | 33.50 | 30.39 | 11.00 | 8.90 | 20.00 | 23.72
32 | ReactiveX/RxJava | 161.00 | 231.29 | 5.00 | 7.41 | 14.00 | 20.76
33 | driftyco/ng-cordova | 8.00 | 11.86 | 12.00 | 16.31 | 9.00 | 13.34
34 | haraka/Haraka | 68.50 | 71.91 | 19.00 | 23.72 | 41.50 | 49.67
35 | isagalaev/highlight.js | 24.00 | 22.24 | 40.00 | 44.48 | 28.00 | 23.72
36 | bundler/bundler | 74.00 | 82.28 | 116.00 | 87.47 | 93.00 | 90.44
37 | humhub/humhub | 36.00 | 48.18 | 33.00 | 30.39 | 33.00 | 34.10
38 | square/picasso | 34.50 | 40.77 | 28.00 | 28.17 | 32.00 | 35.58
39 | Netflix/Hystrix | 21.00 | 26.69 | 2.00 | 2.97 | 15.50 | 21.50
40 | dropwizard/metrics | 94.50 | 110.45 | 134.00 | 152.71 | 113.00 | 136.40
41 | refinery/refinerycms | 212.00 | 219.42 | 170.00 | 217.94 | 178.50 | 208.31
42 | gollum/gollum | 19.00 | 23.72 | 23.00 | 31.13 | 22.00 | 29.65
43 | jhipster/generator-jhipster | 7.00 | 7.41 | 6.00 | 7.41 | 7.00 | 7.41
44 | mapbox/mapbox-gl-js | 8.00 | 10.38 | 5.00 | 5.93 | 7.00 | 8.90
45 | request/request | 9.00 | 10.38 | 183.00 | 250.56 | 13.00 | 17.79
46 | alohaeditor/Aloha-Editor | 2.00 | 2.97 | 29.00 | 43.00 | 6.00 | 7.41
47 | boto/boto | 16.00 | 19.27 | 34.00 | 38.55 | 19.00 | 23.72
48 | grails/grails-core | 28.00 | 37.06 | 111.50 | 110.45 | 74.50 | 88.21
49 | Pylons/pyramid | 142.00 | 123.80 | 50.00 | 63.75 | 129.00 | 131.21
50 | mantl/mantl | 28.00 | 28.17 | 18.00 | 19.27 | 20.00 | 22.24

Table 2 – Long delivery time thresholds (PART II). We present the median delivery time in terms of days and the MAD for each project. The threshold for a long delivery time is calculated as the median delivery time + MAD (see Definition 3).

# | Project | CI Median | CI MAD | NO-CI Median | NO-CI MAD | General Median | General MAD
51 | ether/etherpad-lite | 23.00 | 28.17 | 67.00 | 85.99 | 29.00 | 38.55
52 | jashkenas/underscore | 70.00 | 74.13 | 25.00 | 28.91 | 53.00 | 63.75
53 | apereo/cas | 180.00 | 146.78 | 353.00 | 277.25 | 210.00 | 174.21
54 | kivy/kivy | 129.50 | 131.21 | 38.00 | 28.17 | 90.00 | 106.75
55 | elastic/logstash | 23.00 | 22.24 | 40.50 | 43.74 | 30.00 | 32.62
56 | getsentry/sentry | 20.00 | 19.27 | 3.00 | 2.97 | 19.00 | 20.76
57 | hapijs/hapi | 2.00 | 2.97 | 5.00 | 7.41 | 3.00 | 4.45
58 | HabitRPG/habitica | 193.00 | 180.14 | 642.00 | 204.60 | 201.00 | 209.05
59 | pyrocms/pyrocms | 95.00 | 120.09 | 34.00 | 35.58 | 61.00 | 69.68
60 | BabylonJS/Babylon.js | 50.00 | 43.00 | 27.00 | 23.72 | 41.00 | 40.03
61 | Leaflet/Leaflet | 462.00 | 438.11 | 102.50 | 114.90 | 364.00 | 424.02
62 | laravel/laravel | 20.00 | 26.69 | 18.00 | 23.72 | 19.00 | 23.72
63 | zurb/foundation-sites | 16.00 | 19.27 | 9.00 | 11.86 | 14.00 | 17.79
64 | callemall/material-ui | 17.00 | 22.24 | 9.00 | 8.90 | 14.00 | 17.79
65 | loomio/loomio | 111.00 | 161.60 | 85.00 | 115.64 | 96.50 | 137.14
66 | scikit-learn/scikit-learn | 162.00 | 123.06 | 57.00 | 48.93 | 125.00 | 117.13
67 | frappe/erpnext | 2.00 | 2.97 | 135.00 | 200.15 | 2.00 | 2.97
68 | Theano/Theano | 220.00 | 164.57 | 144.00 | 188.29 | 208.00 | 192.74
69 | puppetlabs/puppet | 42.00 | 38.55 | 41.00 | 29.65 | 41.00 | 37.06
70 | chef/chef | 28.00 | 40.03 | 1147.00 | 662.72 | 30.00 | 43.00
71 | woocommerce/woocommerce | 52.00 | 59.30 | 50.50 | 68.94 | 52.00 | 66.72
72 | divio/django-cms | 49.00 | 50.41 | 85.00 | 78.58 | 49.00 | 50.41
73 | scipy/scipy | 138.00 | 80.06 | 167.00 | 69.68 | 143.00 | 81.54
74 | matplotlib/matplotlib | 109.00 | 108.23 | 108.00 | 68.20 | 109.00 | 105.26
75 | sympy/sympy | 131.50 | 123.06 | 251.50 | 138.62 | 148.00 | 143.81
76 | twbs/bootstrap | 36.00 | 37.06 | 12.00 | 13.34 | 34.00 | 37.06
77 | AnalyticalGraphicsInc/cesium | 909.00 | 31.13 | 953.00 | 19.27 | 945.00 | 34.84
78 | elastic/kibana | 66.00 | 54.86 | 217.00 | 133.43 | 70.00 | 60.79
79 | mozilla/pdf.js | 33.00 | 37.06 | 131.00 | 102.30 | 58.00 | 74.13
80 | appcelerator/titanium_mobile | 75.50 | 68.94 | 63.00 | 69.68 | 66.00 | 69.68
81 | StackStorm/st2 | 20.00 | 25.20 | 196.00 | 65.23 | 41.00 | 56.34
82 | TryGhost/Ghost | 19.00 | 22.24 | 16.00 | 14.83 | 18.00 | 20.76
83 | fog/fog | 17.00 | 16.31 | 22.00 | 19.27 | 18.00 | 16.31
84 | ansible/ansible | 121.50 | 71.91 | 36.00 | 28.17 | 52.00 | 45.96
85 | ipython/ipython | 104.00 | 102.30 | 484.50 | 94.15 | 125.00 | 131.95
86 | cakephp/cakephp | 46.00 | 63.75 | 26.00 | 28.17 | 43.00 | 59.30
87 | owncloud/core | 47.00 | 44.48 | 81.00 | 84.51 | 55.00 | 56.34
88 | rails/rails | 184.00 | 152.71 | 121.00 | 106.75 | 184.00 | 148.26
89 | mozilla-b2g/gaia | 97.00 | 108.23 | 128.00 | 108.23 | 110.00 | 115.64
90 | saltstack/salt | 44.00 | 44.48 | 664.00 | 512.98 | 44.00 | 44.48

3.1.7 RQ7: What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?

RQ7: Motivation

The results of RQ6 show that our explanatory models can accurately identify whether a merged pull request is likely to have a long delivery time. Nevertheless, it is also important to understand which variables are more influential for identifying merged pull requests with a long delivery time.

RQ7: Approach

Similar to RQ5, in this research question we analyze our explanatory models by computing the explanatory power score of each variable. We separately investigate which variables are the most influential to model a prolonged delivery time, both before and after the adoption of continuous integration.

3.2 Studied Projects

We intend to identify projects that have long historical data and that adopted continuous integration at some point in their lives. We use such projects to better understand the impact of adopting continuous integration on the delivery time of merged pull requests. We use a similar approach to that of Vasilescu et al. (VASILESCU et al., 2015) to select our projects. The selection process of our projects is shown in Figure 8. We describe each step of this process in the following.

Figure 8 – An overview of our project selection process.

First, we use the GitHub API to identify the 3,000 most popular projects that are written in the five most popular programming languages (Java, Python, Ruby, PHP and JavaScript) of GitHub. The popularity of a project is measured by the number of stars that are assigned to that project. We performed our search on GitHub on November 11th, 2016. Next, we check whether a project adopts a continuous integration service. In our study, we only consider projects that use Travis-CI. Similar to Vasilescu et al. (VASILESCU et al., 2015), we avoid projects that use Jenkins, since the entire build history of such projects is not available. We identify that a given project uses Travis-CI when there are builds that are associated with the Travis-CI API. We use the date of the first Travis-CI build as the moment at which a project started to adopt continuous integration. Out of 3,000 projects, 1,784 (59.5%) have used Travis-CI. In step 3, we use the GitHub API to gather the number of merged pull requests of each project. We group the pull requests into the before- and after-CI buckets. We exclude projects that have less than 100 merged pull requests in the before or the after buckets to maintain a considerable amount of data to perform our analyses. As we use Linear Regression Models to perform analyses in RQ2 and RQ3 of this study, we must take into account the degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting the models low. We follow the guidelines that are provided by Harrell Jr. (HARRELL, 2015) to train regression models. As we have 13 explanatory variables (see Tables 4 and 5), and we separately study delivery time before and after continuous integration, we must have at least 13 × 10 = 130 pull requests in each studied bucket, where 13 is the number of explanatory variables that we use in our models, and 10 is a constant that is recommended by Harrell Jr. (HARRELL, 2015). However, as we remove collinear variables before training our regression models, instead of using 130 as a minimum threshold for the number of pull requests in each bucket, we use 100 pull requests. 156 projects remain after Step 3. Finally, we use the GitHub API to fetch all pull requests and their metadata for the remaining projects. We then link the pull requests to their specific releases. Such links help us to calculate the total time between when a pull request was merged and when that pull request was released. We refer to this time interval as “delivery time” (see Definition 1). Finally, by using a similar approach as in Step 3, we filter out projects that have less than 100 linked pull requests in the before or after buckets. A total of 90 projects remain. Figure 9 shows the number of studied projects grouped by programming language (33 JavaScript, 25 Python, 12 Java, 10 Ruby and 10 PHP).
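As an illustration of Step 1, the search could be issued with the gh R package (this sketch assumes a configured GITHUB_PAT token; the endpoint and parameters are those of the GitHub search API):

    library(gh)

    # First page of the most-starred repositories for one of the studied
    # languages, via the GitHub search API.
    res <- gh("GET /search/repositories",
              q = "language:Python", sort = "stars", order = "desc",
              per_page = 100)
    vapply(res$items, function(r) r$full_name, character(1))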

Figure 9 – Number of projects grouped by programming language.

On analyzing the studied projects, we observe that 126,716 pull requests were delivered after the adoption of continuous integration, while 40,321 were delivered before the adoption of continuous integration (a total of 167,037 pull requests, see Table 3). This unbalanced number of pull requests in each bucket may occur due to an increase in pull request submissions after the adoption of continuous integration. Also, the reason why our database has more pull requests from the after continuous integration bucket may be related to the time at which the projects started using continuous integration. For instance, on average the studied projects are 5.05 years old, of which 2.01 years were spent without continuous integration and 3.04 years with continuous integration. Table 3 shows the number of pull requests per programming language before and after continuous integration.

Table 3 – Summary of the number of projects and released pull requests grouped by programming language.

Language | Projects | PRs total | PRs after CI | PRs before CI
JavaScript | 33 | 57,408 | 39,668 | 17,740
Python | 25 | 57,750 | 47,754 | 9,996
Java | 12 | 9,122 | 5,575 | 3,547
Ruby | 10 | 22,902 | 19,695 | 3,207
PHP | 10 | 19,855 | 14,024 | 5,831
Total | 90 | 167,037 | 126,716 | 40,321

3.3 Data Collection

After we select our studied projects, we fetch pull request and release meta-data for each project. The data collection process is shown in Figure 10. Each step of the process is detailed below.

Figure 10 – An overview of our data collection process.

Step 1. Collect pull request information

We use the GitHub API to collect pull requests and their respective metadata. For each pull request, we select the following attributes: author (GitHub user), pull number, title, description, number of added and deleted lines (churn), number of changed files, number of activities, number of comments, date of comments, state (Open, Closed, Merged), creation date, close date, and closedBy (GitHub user).

Step 2. Link pull requests to releases

After we collect the pull request information, we collect the release information of the studied pull requests. We collect the publish date, start date, the number of commits and the number of pull requests for each release of the studied projects. We also manually verify whether the releases are user-intended, so that we do not consider pre, beta, alpha, and rc (release candidate) releases in our analyses. Instead, when a pull request was released in a non-user-intended release, we link such a pull request with the next user-intended release. For example, if a project has the following release tags, chronologically ordered: v1.0, v1.1.pre, v1.1 and v2.0, and a pull request is released in the v1.1.pre release, we move the publish date of this pull request to the next user-intended release (i.e., the date of release v1.1). We clone all repositories of our studied projects and fetch all of their release tags. We then compute a diff between these tags to verify which commit logs were added in a given tag. Next, we parse the obtained commit logs. For instance, if we find the pattern “Merge pull request #” (which is automatically generated by GitHub when a pull request is merged) between release tags v1.1 and v2.0, we consider that such a commit log was released in v2.0. By using the pattern “Merge pull request #” we could link 85% (167,037/196,716) of the merged pull requests to their commits. The remaining 15% may still be waiting for a release to be shipped, or the integrator might have cherry-picked the commits of these pull requests. In the latter case, the pattern “Merge pull request #” is not automatically recorded in the respective commit logs. Finally, we link merged pull requests to their releases based on the tags that are associated with the commits.
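A sketch of this linking heuristic on a cloned repository, using the release tags of the example above:

    # Commit logs added between two release tags of a cloned repository;
    # keep the merge commits that GitHub generates for merged pull requests.
    logs <- system2("git", c("log", "v1.1..v2.0", "--oneline"),
                    stdout = TRUE)
    merges <- grep("Merge pull request #", logs, value = TRUE)

    # Extract the pull request numbers, to be linked to release tag v2.0.
    pr_numbers <- as.integer(sub(".*Merge pull request #([0-9]+).*", "\\1",
                                 merges))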

Step 3 and 4. Compute metrics and perform analyses

We use data from Steps 1 and 2 to compute the metrics that we use in the regression analyses that we perform in RQ2, RQ3, RQ4 and RQ5 of this work. During the metric selection process, we collect information from the VCSs of the studied projects to include attributes that belong to one of the following families: contributor, pull request, project and process. We choose these families of attributes because we intend to investigate a variety of perspectives that may have an influence on the delivery time of merged pull requests. We describe each of these families in the following. Furthermore, Tables 4 and 5 show the complete description of the attributes that we compute for each family, and show the rationale that we use to include each attribute as a predictor of delivery time. The attributes presented in Tables 4 and 5 were chosen based on the attributes that Costa et al. (COSTA et al., 2016) used in their work. They used a set of attributes as predictors in regression models to investigate the delivery time of fixed issues in the Firefox project.

• Contributor: refers to the developers that send contributions to a software project by means of pull requests. Pull requests that are submitted by experienced contributors may be more easily reviewed and merged into the project code base; such pull requests may also be quickly released to end users.

• Pull request: refers to the attributes of pull requests. Integrators may use this information to merge and deliver pull requests. For instance, a poor description of a pull request can make integrators unable to understand the content of such a pull request and assess its importance, which may increase its delivery time.

• Project: refers to the status of the project when a pull request was merged. If the project team has a higher delivery workload, i.e., many pull requests are waiting to be released, the release of a newly merged pull request is likely to be delayed in one or more releases.

• Process: refers to the process of merging a pull request. A merged pull request that is involved in a complex process (i.e., long comment threads, larger number of impacted files) could be more difficult to understand and deliver.

3.4 Chapter Summary

In this chapter, we present the procedures, methods and metrics that we use to perform the analyses that compose this dissertation. We first present each RQ of this study along with its motivation and research approach (Section 3.1). Next, we describe the process used to select the studied projects (Section 3.2). Finally, we describe how we collect the pull request data of each studied project (Section 3.3).

Table 4 – Metrics that are used in our explanatory models (Contributor, Pull Request and Project families).

Contributor family:
• Contributor Experience (Numeric). Definition: the number of previously released PRs that were submitted by the contributor of a particular PR. We consider the author of the PR to be its contributor. Rationale: the greater the experience and participation of a user within a specific open source project, the greater his/her chance of having his/her PR reviewed, merged, and released quickly (SHIHAB et al., 2010).
• Contributor Integration (Numeric). Definition: the average, in days, of the delivery time of the previously released PRs that were submitted by a particular contributor. Rationale: if a particular contributor usually submits PRs that are merged and released quickly, his/her future PRs might be merged and released quickly as well (COSTA et al., 2016).

Pull Request family:
• Stack Trace Attached (Boolean). Definition: we verify whether the PR has a stack trace attached in its description. Rationale: if the PR provides a bug fix, an attached stack trace may provide useful information regarding the causes of the bug and the importance of the submitted code, which may quicken the merge of the PR and its delivery in a release of the project (SCHROTER et al., 2010).
• Description Size (Numeric). Definition: the number of characters in the body (description) of a PR. Rationale: PRs that are well described might be easier to merge and release than PRs that are more difficult to understand (COSTA et al., 2016).

Project family:
• Queue Rank (Numeric). Definition: the number that represents the moment when a PR is merged compared to other merged PRs in the release cycle. For example, in a queue that contains 100 PRs, the first merged PR has position 1, while the last merged PR has position 100. Rationale: a PR with a high queue rank is a recently merged PR. A merged PR might be released faster/slower depending on its queue position (COSTA et al., 2016).
• Merge Workload (Numeric). Definition: the amount of PRs that were created and are still waiting to be merged by a core integrator at the moment at which a specific PR is submitted. Rationale: a PR might be released faster/slower depending on the amount of submitted PRs waiting to be merged. The higher the amount of created PRs waiting to be analyzed and merged, the greater the workload of the contributors to analyze these PRs, which may impact their delivery time.

Table 5 – Metrics that are used in our explanatory models (Process family).

Process family:
• Number of Impacted Files (Numeric). Definition: the number of files linked to a PR submission. Rationale: the delivery time might be related to a high number of files in a PR, because more effort must be spent to integrate it (JIANG; ADAMS; GERMAN, 2013).
• Churn (Numeric). Definition: the number of added lines plus the number of deleted lines of a PR. Rationale: a higher churn suggests that a great amount of work might be required to verify and integrate the code contribution sent by means of the PR (JIANG; ADAMS; GERMAN, 2013; NAGAPPAN; BALL, 2005).
• Merge Time (Numeric). Definition: the number of days between the submission and the merge of a PR. Rationale: if a PR is merged quickly, it is more likely to be released faster.
• Number of Activities (Numeric). Definition: an activity is an entry in the PR’s history. Rationale: a high number of activities might indicate that much work was required to make the PR acceptable, which may impact the integration of such a PR into a release (JIANG; ADAMS; GERMAN, 2013).
• Number of Comments (Numeric). Definition: the number of comments of a PR. Rationale: a high number of comments might indicate the importance of a PR or the difficulty of understanding it (GIGER; PINZGER; GALL, 2010), which may impact its delivery time (JIANG; ADAMS; GERMAN, 2013).
• Interval of Comments (Numeric). Definition: the sum of the time intervals (days) between comments divided by the total number of comments of a PR. Rationale: a short interval of comments indicates the discussion was held with priority, which suggests that the PR is important; thus, the PR might be delivered faster (COSTA et al., 2016).
• Release Commits (Numeric). Definition: the number of commits in the release associated with a PR. Rationale: the higher the number of commits in a release, the greater the amount of contribution to be delivered in such a release, which might impact its duration and, hence, the delivery time of the PRs.

4 Study Results

In this chapter, we present the results of each investigated RQ grouped by its respective dimension of analysis. First, we present the results of RQ1–RQ5, which compose Analysis I and investigate the impact of continuous integration on the delivery time of pull requests. Finally, we present the results of RQ6 and RQ7, which compose Analysis II and investigate the impact of continuous integration on the prolonged delivery time.

4.1 Analysis I — What is the impact of continuous inte- gration on the delivery time of pull requests?

RQ1: How often are merged pull requests prevented from being re- leased?

Merged pull requests are usually delivered in the next upcoming release after they have been merged, both before and after continuous integration. On analyzing how often pull requests are prevented from being released considering all pull requests of each studied project, we observe that in median 18.3% of them are delayed by one or more releases (see Figure 11a). In addition, Figures 11b and 11c show the distribution of the percentage of pull requests per project delivered in each release bucket (next or later), before and after continuous integration, respectively. Before continuous integration, projects deliver 86.2% (median) of their pull requests into the next bucket, while they deliver 76% after continuous integration. One possible reason for the percentage of pull requests that are delivered into the next bucket before continuous integration to be superior to the percentage after continuous integration may be related to the increase of the development activity workload after continuous integration, which may lead pull requests to be delayed (see RQ3). Projects ship new releases in a time interval of 35.7 days (median) before continuous integration, while taking 33.1 days to ship new releases after continuous integration.

Figure 11 – Distribution of pull requests per bucket before and after continuous integration. The pull requests are grouped into the next and later buckets: (a) PRs per bucket (All Time; medians of 81.7% next and 18.3% later), (b) PRs per bucket after CI (76% next, 24% later), (c) PRs per bucket before CI (86.2% next, 13.8% later).

Figure 12 shows the distributions of the time interval between the studied releases of the projects, before and after continuous integration. The difference regarding the required time to ship new releases in the studied projects may be due to the release policies that are followed by each project (COSTA et al., 2017). According to Stahl and Bosch (STÅHL; BOSCH, 2014b), one of the benefits of continuous integration is to increase the release frequency and predictability. However, when comparing the median duration of the release cycles of each studied project before and after continuous integration, we do not observe a significant difference (p-value = 0.1109, MWW test; negligible Cliff’s delta of 0.138).

Figure 12 – Number of days between the studied releases of the projects, before and after continuous integration. The number shown over each boxplot is the median interval (33.1 days with CI, 35.7 days without CI).

In median, 13.8% of the merged pull requests per project miss at least one release before being delivered to end users before continuous integration, while 24% miss at least one release after continuous integration. This result indicates that even though a pull request is merged, its delivery may be prevented by one or more releases, both before and after continuous integration, which can frustrate end users and contributors. Appendix C shows the percentage of delivered pull requests per project in the next and later release buckets, before and after the adoption of continuous integration. Many pull requests that were prevented from delivery were merged well before the upcoming release date. Pull requests that are submitted and merged late in the release cycle, e.g., one day or one week before the upcoming release, may be prevented from delivery (COSTA et al., 2017). To check whether merged pull requests are being prevented from delivery mostly because they are being merged late in the release cycle, we compute the merge timing metric. Figure 13 shows the distribution of the median merge timing metric of the pull requests for each project, before and after continuous integration. It shows that many pull requests that were prevented from delivery in at least one release were merged well before the end of the release cycle. The median merge timing for the pull requests is 0.78 and 0.80 before and after continuous integration, respectively. Hence, it is unlikely that most merged pull requests are prevented from delivery solely because they were merged too close to an upcoming release date.
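A sketch of the merge timing metric, assuming per-pull-request dates for the start of the release cycle, the merge, and the release (the exact column names, and this formulation of the metric adapted from the fix timing of Costa et al., are assumptions):

    # Merge timing: relative position of the merge within its release cycle,
    # from 0 (cycle start) to 1 (release date). Values near 1 mean that the
    # pull request was merged late in the cycle.
    prs$merge_timing <- as.numeric(prs$merge_date - prs$cycle_start) /
                        as.numeric(prs$release_date - prs$cycle_start)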

Figure 13 – Merge timing metric. We present the distribution of the merge timing metric for merged pull requests that are prevented from integration in at least one release (medians: 0.80 with CI, 0.78 without CI).

In median, 13.8% of the merged pull requests per project were prevented from delivery in at least one release before continuous integration, while 24% missed at least one release after continuous integration. Furthermore, we found that many pull requests that missed at least one release were merged well before the release date of the missed releases.

RQ2 - Are pull requests released more quickly using continuous in- tegration?

In 53% of the projects, the time from submission to delivery of pull requests (i.e., pull request lifetime) is shorter before the adoption of continuous integration. Surprisingly, the majority of our studied projects (48/90) increase the time from sub- mission to delivery of pull requests after the adoption of continuous integration. We observe that pull requests take 46.5 days (median) to be merged and released after adopting continuous integration, while taking 45 days before continuous integration. Figure 14a compares the distributions of lifetime of pull requests before and after the adoption of continuous integration.

We observe that 74.4% (67/90) of the projects have a statistically significant difference (p-value < 0.05) and a non-negligible median delta between the distributions of lifetime of pull requests (delta >= 0.147). 34.3% (23/67) of such projects obtained a large delta (median 0.631), while 25.3% (17/67) and 40.3% (27/67) of the projects obtained medium and small deltas, respectively (medians of 0.360 and 0.222). Regarding the projects that obtained a p-value < 0.05, we observe that 52.2% (35/67) had a shorter pull request lifetime before adopting continuous integration, while 47.8% (32/67) had a shorter pull request lifetime after adopting continuous integration. We also analyze the trend of the data for projects that obtained non-significant p-values. In these projects, we verify that there is a similar trend of a shorter lifetime of pull requests before adopting continuous integration (56.5%, 13/23). Our results do not corroborate the perceived decrease in the lifetime of pull requests that should occur after adopting continuous integration (LAUKKANEN; PAASIVAARA; ARVONEN, 2015).

Figure 14 – The required number of days to merge and deliver pull requests (pull request lifetime): (a) Lifetime, (b) Merge time, (c) Delivery time.

In 68.9% (62/90) of the projects, pull requests are merged slightly faster before adopting continuous integration. Figure 14b shows the distribution for the merge phase (t1). We observe that submitted pull requests take 3.4 days (median) to be merged before continuous integration, and 5.2 days after continuous integration. A total of 68.9% (62/90) of the projects have a statistically significant difference in the time to merge pull requests, with a median Cliff’s delta of 0.160 (small). With respect to such projects, we observe that 74.2% (46/62) merge pull requests more quickly before continuous integration. With respect to the 31.1% (28/90) of projects for which p-values are > 0.05, we observe a somewhat similar trend, i.e., 57.1% (16/28) of the projects merge pull requests more quickly before using continuous integration. Regarding the delivery phase (t2), we observe that roughly half of the projects for which p-values are < 0.05 (51.3%, 39/76) have a shorter delivery time of their pull requests after the adoption of continuous integration. Nevertheless, by analyzing the projects for which p-values are > 0.05 (14/90), we verify that in 57.1% (8/14) of them, their pull requests have a shorter delivery time before adopting continuous integration. The median delivery times (t2) for the projects before and after the adoption of continuous integration are 39 and 43 days, respectively. Figure 14c shows the distribution for the delivery time of pull requests per project, before and after continuous integration. Our analyses indicate that 84.4% (76/90) of the projects have a statistically significant difference in the delivery time of merged pull requests, but a small median Cliff’s delta of 0.297.
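A sketch of the per-project comparison, assuming a lifetime_days column and the effsize package for Cliff's delta:

    library(effsize)

    # Mann-Whitney-Wilcoxon test plus Cliff's delta on pull request
    # lifetimes, before (NO-CI) and after (CI) continuous integration.
    before <- prs$lifetime_days[prs$ci == FALSE]
    after  <- prs$lifetime_days[prs$ci == TRUE]

    wilcox.test(before, after)           # MWW p-value
    cliff.delta(after, before)$estimate  # effect size; see also $magnitude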

In 68.9% (62/90) of the projects, submitted pull requests tend to wait longer to be merged after the adoption of continuous integration. On the other hand, on comparing the delivery time of merged pull requests, we observe that roughly half of the projects have a shorter delivery time after the adoption of continuous integration.

RQ3 - Does the increased development activity after adopting con- tinuous integration increase the delivery time of pull requests?

77.8% (70/90) of the projects increase pull request submissions after adopting continuous integration. While 55.6% (50/90) of the projects obtained p-values < 0.05, the median Cliff’s delta for these projects is 0.603 (large). Although 44.4% (40/90) of the projects obtain a p-value > 0.05, it is interesting to observe that the majority of such projects (65%, 26/40) tend to increase the number of pull request submissions after adopting continuous integration. Figure 15 shows the distributions of the number of submitted, merged and delivered pull requests per release for each project. We observe that projects tend to submit a median of 44.8 pull requests per release after adopting continuous integration, against a median of 15.9 pull requests before adopting continuous integration. We also observe a significant increase in the number of merged and delivered pull requests per release (medians of 30 and 32.9, respectively) after adopting continuous integration. The median values of merged and released pull requests before adopting continuous integration are 10.9 and 9.4, respectively. With respect to the projects for which the delivery time of pull requests increased after adopting continuous integration, we find that 64.9% (24/37) significantly increased their pull request submissions per release (large median Cliff’s delta of 0.579). Only 2.7% (1/37) had higher pull request submissions before adopting continuous integration. Although 32.4% (12/37) of these projects do not obtain a significant p-value, we observe that 75% (9/12) of them tend to increase pull request submissions after adopting continuous integration. Figure 16 shows the distributions of submitted pull requests per release for the projects in which delivery time increased after adopting continuous integration. Our results suggest that the adoption of continuous integration will not always decrease the delivery time of merged pull requests. This result may be explained by the large increase of pull request submissions after the adoption of continuous integration.

Figure 15 – Pull request submission, merge, and delivery rates per release: (a) submitted pull requests (medians of 44.8 with CI and 15.9 without CI), (b) merged pull requests (30 with CI, 10.9 without CI), (c) delivered pull requests (32.9 with CI, 9.4 without CI).

56.4% (22/39) of the projects for which the delivery time of pull requests decreased after adopting continuous integration preserve the rate of pull request submissions per release. We also observe that pull request submissions are lower in 10.3% (4/39) of these projects. Moreover, 33.3% (13/39) of these projects obtain an increase in pull request submissions per release. The large increase of pull request submissions in the studied projects after adopting continuous integration is a possible reason why projects may deliver pull requests more quickly before the adoption of continuous integration.

RQ4: How well can we model the delivery time of merged pull re- quests?

RQ4: Results for delivery time in terms of releases

Our models achieve a median Brier score of 0.1168 when fitted using data of pull requests that were delivered before continuous integration, while achieving 0.1166 using the data of pull requests that were delivered after continuous integration. Furthermore, the median bootstrap-calculated optimism is 0.003 for the Brier score of the models that were fitted using pull request data of after continuous integration, whereas it is 0.007 when fitted using data of before continuous integration. Figure 17a shows the distribution of the Brier score values of our models, while Figure 17b shows the distribution of the bootstrap-calculated optimism for the Brier score of our models. Moreover, Tables 6 and 7 show all the Brier score and AUC values for each model that we fitted.


Figure 16 – Number of pull request submissions (per release) before and after the adoption of continuous integration.

Our models obtain median AUCs between 0.85 and 0.90, which indicates that our model estimations highly outperform random guessing (an AUC of 0.5).

Table 6 – Brier Score and AUC values for the models that we fitted using pull request data of before continuous integration.

# | Project | Brier Score | Brier Optimism | AUC | AUC Optimism
1 | openhab/openhab | 0.022 | 0.004 | 0.995 | 0.003
2 | refinery/refinerycms | 0.084 | 0.006 | 0.951 | 0.007
3 | grails/grails-core | 0.116 | 0.015 | 0.895 | 0.024
4 | Leaflet/Leaflet | 0.039 | 0.014 | 0.983 | 0.012
5 | loomio/loomio | 0.127 | 0.013 | 0.889 | 0.020
6 | woocommerce/woocommerce | 0.019 | 0.002 | 0.997 | 0.001
7 | matplotlib/matplotlib | 0.128 | 0.011 | 0.889 | 0.016
8 | appcelerator/titanium_mobile | 0.194 | 0.001 | 0.777 | 0.002
9 | ansible/ansible | 0.118 | 0.001 | 0.912 | 0.002
10 | cakephp/cakephp | 0.153 | 0.015 | 0.857 | 0.024
11 | owncloud/core | 0.071 | 0.002 | 0.939 | 0.003
12 | saltstack/salt | 0.164 | 0.008 | 0.836 | 0.013

Table 7 – Brier Score and AUC values for the models that we fitted using pull request data of after continuous integration.

# | Project | Brier Score | Brier Optimism | AUC | AUC Optimism
1 | yiisoft/yii | 0.087 | 0.007 | 0.935 | 0.009
2 | dropwizard/dropwizard | 0.156 | 0.008 | 0.841 | 0.014
3 | aframevr/aframe | 0.210 | 0.019 | 0.717 | 0.040
4 | buildbot/buildbot | 0.060 | 0.004 | 0.967 | 0.003
5 | jsbin/jsbin | 0.168 | 0.008 | 0.694 | 0.025
6 | naver/pinpoint | 0.083 | 0.003 | 0.939 | 0.003
7 | apache/incubator-airflow | 0.110 | 0.010 | 0.929 | 0.013
8 | ReactiveX/RxJava | 0.048 | 0.005 | 0.982 | 0.003
9 | bundler/bundler | 0.175 | 0.010 | 0.778 | 0.020
10 | Netflix/Hystrix | 0.190 | 0.010 | 0.704 | 0.027
11 | refinery/refinerycms | 0.048 | 0.007 | 0.980 | 0.005
12 | jhipster/generator-jhipster | 0.030 | 0.003 | 0.983 | 0.003
13 | Pylons/pyramid | 0.174 | 0.004 | 0.813 | 0.008
14 | ether/etherpad-lite | 0.133 | 0.008 | 0.789 | 0.028
15 | getsentry/sentry | 0.201 | 0.005 | 0.724 | 0.011
16 | pyrocms/pyrocms | 0.090 | 0.007 | 0.944 | 0.008
17 | Leaflet/Leaflet | 0.139 | 0.006 | 0.869 | 0.010
18 | laravel/laravel | 0.116 | 0.005 | 0.859 | 0.010
19 | zurb/foundation-sites | 0.095 | 0.003 | 0.878 | 0.008
20 | callemall/material-ui | 0.092 | 0.003 | 0.910 | 0.006
21 | scikit-learn/scikit-learn | 0.198 | 0.004 | 0.710 | 0.011
22 | frappe/erpnext | 0.104 | 0.002 | 0.895 | 0.003
23 | puppetlabs/puppet | 0.189 | 0.002 | 0.761 | 0.004
24 | chef/chef | 0.131 | 0.003 | 0.812 | 0.008
25 | woocommerce/woocommerce | 0.045 | 0.003 | 0.975 | 0.004
26 | divio/django-cms | 0.174 | 0.003 | 0.810 | 0.005
27 | scipy/scipy | 0.155 | 0.002 | 0.707 | 0.011
28 | matplotlib/matplotlib | 0.113 | 0.002 | 0.877 | 0.004
29 | sympy/sympy | 0.181 | 0.002 | 0.714 | 0.006
30 | twbs/bootstrap | 0.053 | 0.001 | 0.884 | 0.005
31 | elastic/kibana | 0.088 | 0.002 | 0.925 | 0.005
32 | appcelerator/titanium_mobile | 0.117 | 0.003 | 0.923 | 0.005
33 | StackStorm/st2 | 0.165 | 0.003 | 0.725 | 0.008
34 | TryGhost/Ghost | 0.053 | 0.002 | 0.931 | 0.006
35 | fog/fog | 0.046 | 0.002 | 0.895 | 0.008
36 | ansible/ansible | 0.050 | 0.002 | 0.947 | 0.008
37 | ipython/ipython | 0.117 | 0.001 | 0.841 | 0.003
38 | cakephp/cakephp | 0.145 | 0.001 | 0.779 | 0.004
39 | owncloud/core | 0.174 | 0.001 | 0.805 | 0.002
40 | rails/rails | 0.083 | 0.000 | 0.775 | 0.002
41 | mozilla-b2g/gaia | 0.092 | 0.000 | 0.712 | 0.002
42 | saltstack/salt | 0.159 | 0.000 | 0.759 | 0.001

Figure 17 – Distribution of the Brier score and the Brier optimism of the models before and after CI: (a) Brier score of the models (medians of 0.1166 with CI and 0.1168 without CI), (b) Brier score optimism of the models (medians of 0.003 with CI and 0.0071 without CI).

Figure 18 shows the distribution of the AUC values and the bootstrap-calculated optimism of the AUC values of our models, both before and after continuous integration. The median bootstrap-calculated optimism of the AUC values of our models is 0.007 and 0.006 for the models that were fitted with pull request data of before and after continuous integration, respectively. Summarizing, our models provide sound Brier score and AUC values, and they may be used as a starting point for studying whether a merged pull request will be prevented from being released.

Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI: (a) AUC of the models, (b) AUC optimism of the models.

We are able to accurately model whether pull requests are likely to be prevented from being released in the next upcoming release after they have been merged. Our models achieve sound median AUC values of 0.85 when using pull request data from after CI and 0.90 when using pull request data from before CI.

RQ4: Results for delivery time in terms of days

Our linear models achieve a median R² of 0.72 using pull request data from before continuous integration, while achieving 0.74 after continuous integration. Moreover, the median bootstrap-calculated optimism is less than 0.045 for both sets of R² values of our models. Figure 19 shows the distribution of the R² values and the R² optimism values that are achieved by each of our sets of models. Tables 8 and 9 show all the R² and R² optimism values of each model that we fitted using pull request data from before and after continuous integration. These results suggest that our models are stable enough to perform the statistical inferences that we perform in RQ5.

Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI: (a) AUC of the models; (b) AUC optimism of the models.

Figure 19 – Distributions of models' R² and R² optimism: (a) models' R²; (b) models' R² optimism.
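The optimism values reported above follow the bootstrap procedure of Efron (EFRON, 1986): each model is refitted on a bootstrap sample, and the drop in performance when that refitted model is applied to the original data estimates how optimistic the apparent performance is. Below is a minimal sketch of the idea for the R² of a linear model; the data layout, names, and iteration count are illustrative, not our exact setup.

```python
# Minimal sketch of the bootstrap optimism procedure (EFRON, 1986) for
# the R² of a linear model; names and iteration count are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

def r2_with_optimism(X: np.ndarray, y: np.ndarray, n_boot: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    apparent = LinearRegression().fit(X, y).score(X, y)   # apparent (in-sample) R²
    optimism = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # bootstrap sample (with replacement)
        m = LinearRegression().fit(X[idx], y[idx])
        # Optimism: performance on the bootstrap sample minus the
        # performance of the same model applied to the original data.
        optimism.append(m.score(X[idx], y[idx]) - m.score(X, y))
    mean_opt = float(np.mean(optimism))
    return apparent, mean_opt, apparent - mean_opt        # corrected R² = apparent − optimism
```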

We are able to accurately estimate the delivery time in terms of number of days. The median R² is 0.72 for the models fitted using pull request data from before continuous integration, while it is 0.74 for the models fitted using data from after continuous integration. Our explanatory models are quite stable, with a median bootstrap-calculated optimism of less than 0.045.

Table 8 – R² and R² optimism values for the linear models that we fitted using pull request data from before continuous integration.

#  Project  R²  R² optimism
1  yiisoft/yii  0.91  0.040
2  roots/sage  0.55  0.042
3  vanilla/vanilla  0.90  0.020
4  processing/p5.js  0.88  0.054
5  bokeh/bokeh  0.87  0.193
6  serverless/serverless  0.88  0.017
7  scikit-image/scikit-image  0.59  0.075
8  dropwizard/dropwizard  0.76  0.029
9  androidannotations/androidannotations  0.98  0.002
10  buildbot/buildbot  0.58  0.232
11  jsbin/jsbin  0.59  0.238
12  naver/pinpoint  0.53  0.046
13  siacs/Conversations  0.63  0.063
14  robolectric/robolectric  0.76  0.033
15  TelescopeJS/Telescope  0.51  0.096
16  andypetrella/spark-notebook  0.77  0.127
17  apache/incubator-airflow  0.80  0.119
18  haraka/Haraka  0.54  0.072
19  bundler/bundler  0.63  0.059
20  square/picasso  0.66  0.060
21  Netflix/Hystrix  0.91  0.027
22  dropwizard/metrics  0.64  0.077
23  jhipster/generator-jhipster  0.72  0.379
24  grails/grails-core  0.52  0.120
25  Pylons/pyramid  0.68  0.395
26  jashkenas/underscore  0.90  0.060
27  Leaflet/Leaflet  0.97  0.024
28  laravel/laravel  0.58  0.105
29  loomio/loomio  0.88  0.012
30  frappe/erpnext  0.73  0.029
31  Theano/Theano  0.70  0.028
32  chef/chef  0.55  0.060
33  woocommerce/woocommerce  0.52  0.039
34  AnalyticalGraphicsInc/cesium  0.88  0.004
35  mozilla/pdf.js  0.75  0.004
36  StackStorm/st2  0.90  0.011
37  TryGhost/Ghost  0.73  0.023
38  ansible/ansible  0.58  0.015
39  ipython/ipython  0.66  0.026
40  owncloud/core  0.63  0.003
41  mozilla-b2g/gaia  0.76  0.003

RQ5: What are the most influential attributes for modeling delivery time?

RQ5: Results for delivery time in terms of releases

The “release commits” attribute is the most influential one to model the delivery time in terms of releases, both before and after continuous integration. Figures 20 and 21 use boxplots to show the distribution of the explanatory power of each variable that we use in our logistic regression models before and after continuous integration. The higher the median of the explanatory power of a variable, the higher the influence that such a variable has to predict whether a pull request will be prevented from delivery in an upcoming release.

Table 9 – R² and R² optimism values for the linear models that we fitted using pull request data from after continuous integration.

#  Project  R²  R² optimism
1  yiisoft/yii  0.61  0.026
2  bokeh/bokeh  0.60  0.009
3  serverless/serverless  0.96  0.003
4  craftyjs/Crafty  0.90  0.015
5  invoiceninja/invoiceninja  0.82  0.088
6  scikit-image/scikit-image  0.71  0.008
7  dropwizard/dropwizard  0.74  0.010
8  androidannotations/androidannotations  0.93  0.017
9  aframevr/aframe  0.81  0.723
10  jashkenas/backbone  0.81  0.007
11  openhab/openhab  0.93  0.007
12  bcit-ci/CodeIgniter  0.84  0.008
13  buildbot/buildbot  0.82  0.010
14  photonstorm/phaser  0.70  0.028
15  fchollet/keras  0.74  0.040
16  robolectric/robolectric  0.62  0.017
17  TelescopeJS/Telescope  0.97  0.003
18  andypetrella/spark-notebook  0.68  0.111
19  ReactiveX/RxJava  0.87  0.020
20  haraka/Haraka  0.77  0.022
21  bundler/bundler  0.53  0.040
22  humhub/humhub  0.86  0.032
23  square/picasso  0.85  0.053
24  dropwizard/metrics  0.85  0.019
25  refinery/refinerycms  0.85  0.127
26  gollum/gollum  0.61  16.788
27  jhipster/generator-jhipster  0.84  0.003
28  mapbox/mapbox-gl-js  0.92  0.025
29  jashkenas/underscore  0.74  0.040
30  apereo/cas  0.54  0.014
31  kivy/kivy  0.97  0.001
32  HabitRPG/habitica  0.66  0.077
33  pyrocms/pyrocms  0.61  0.017
34  BabylonJS/Babylon.js  0.87  0.006
35  Leaflet/Leaflet  0.54  0.029
36  callemall/material-ui  0.79  0.004
37  loomio/loomio  0.72  0.005
38  frappe/erpnext  0.76  0.006
39  Theano/Theano  0.82  0.002
40  puppetlabs/puppet  0.52  0.004
41  woocommerce/woocommerce  0.62  0.009
42  matplotlib/matplotlib  0.51  0.006
43  AnalyticalGraphicsInc/cesium  0.70  0.019
44  appcelerator/titanium_mobile  0.57  0.009
45  TryGhost/Ghost  0.55  0.010
46  ansible/ansible  0.60  0.009
47  ipython/ipython  0.76  0.002
48  owncloud/core  0.57  0.002
49  mozilla-b2g/gaia  0.51  0.001
50  saltstack/salt  0.57  0.001

Additionally, Figure 22 shows the relationship between the most influential variables of our models and the delivery time. The relationship between release commits and delivery time is shown in Figure 22a. We chose 3 of the 54 models with high AUC values to plot the relationships; nevertheless, we observe that the remaining models produce the same trend. The results indicate that the larger the number of commits that are performed to produce a release, the higher the development workload of the project in the release cycle, which may lead pull requests to be prevented from being delivered in the upcoming release.

Figure 20 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of releases).

Table 10 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of releases).

Explanatory Variable  CI (min / mean / median / max)  NO-CI (min / mean / median / max)
release commits  1.226 / 37.9 / 43.01 / 68.1  0.119 / 32.89 / 34.24 / 72.83
changed files  0.006 / 0.8 / 0.33 / 7.8  0.001 / 0.41 / 0.07 / 2.25
churn  0.000 / 0.8 / 0.49 / 8.8  0.079 / 2.38 / 0.54 / 12.90
comments  0.002 / 0.5 / 0.28 / 2.9  0.059 / 0.47 / 0.24 / 1.19
comments interval  0.006 / 0.5 / 0.04 / 3.2  0.000 / 0.13 / 0.13 / 0.26
merge workload  0.000 / 11.7 / 7.96 / 49.0  2.878 / 18.65 / 14.61 / 65.37
queue rank  1.955 / 36.0 / 37.95 / 66.6  2.570 / 31.42 / 33.88 / 58.17
description length  0.000 / 0.9 / 0.22 / 8.0  0.004 / 0.70 / 0.40 / 2.84
contributor experience  0.000 / 5.4 / 1.85 / 44.1  0.002 / 9.02 / 2.24 / 27.85
contributor delivery  0.007 / 9.2 / 1.56 / 72.9  0.301 / 9.29 / 5.32 / 39.67
stacktrace attached  0.001 / 0.3 / 0.06 / 2.1  0.004 / 0.58 / 0.09 / 3.04
activities  0.000 / 1.7 / 0.54 / 8.9  0.004 / 5.83 / 2.01 / 24.64
merge time  0.003 / 2.1 / 0.37 / 28.2  0.004 / 4.28 / 1.32 / 12.46
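As an illustration of how a per-variable explanatory power such as the one in Table 10 can be computed, the sketch below approximates it as each coefficient's Wald χ² expressed as a share of the model's total Wald χ² (in the spirit of the anova() decomposition of Harrell's rms package for R). It is an approximation on synthetic data, not our exact script; the predictor names are illustrative.

```python
# Illustrative sketch: per-variable explanatory power of a logistic model,
# approximated by each coefficient's Wald chi-square as a share of the total.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def explanatory_power(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Percentage of the model's total Wald chi-square attributed to each predictor."""
    model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    wald = (model.params / model.bse) ** 2   # Wald chi-square of each coefficient
    wald = wald.drop("const")                # the intercept is not an explanatory variable
    return 100 * wald / wald.sum()

# Illustrative usage on synthetic data (three hypothetical predictors).
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(300, 3)),
                 columns=["release_commits", "queue_rank", "merge_workload"])
y = pd.Series((2 * X["release_commits"] + X["queue_rank"]
               + rng.normal(size=300) > 0).astype(int))
print(explanatory_power(X, y).round(2))
```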

Figure 21 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of releases).

Figure 23 shows each explanatory variable and the number of models for which that variable is the most influential one to predict whether a pull request is likely to have its delivery delayed by one or more releases. Additionally, Table 10 shows descriptive statistics (min, mean, median and max) for the explanatory power of each variable of our models. Indeed, release commits is the most influential variable in 29 models (6 before continuous integration and 23 after continuous integration). Therefore, in the median, 34.24% to 43.01% of the delivery time of pull requests in terms of releases is explained by the number of commits that are performed to produce a release.

Our results suggest that the moment at which a pull request is merged with respect to the other merged pull requests in the release cycle, i.e., the queue rank, is the second most influential variable in our models, both before and after continuous integration. In the median, 33.88% and 37.95% of the delivery time of merged pull requests in terms of releases, before and after continuous integration, respectively, may be explained by the queue rank variable. Figure 22b shows the relationship that queue rank shares with delivery time. Our models reveal that a merged pull request is more likely to be prevented from delivery in an upcoming release when it is merged late compared to the other pull requests in the release cycle.

We also observe that “merge workload” has a strong relationship with delivery time. Merge workload is the third most influential variable to model the delivery time of pull requests (median explanatory power of 7.96%–14.61%), both before and after continuous integration.

Figure 22 – The relationship between the most influential variables and delivery time in terms of releases: (a) release commits (ReactiveX/RxJava); (b) queue rank (naver/pinpoint); (c) merge workload (yiisoft/yii).

Our models show that the greater the number of pull requests competing to be merged at the moment that a pull request is submitted, the higher the probability that such a pull request will be delayed by one or more releases (Figure 22c).

The variables comments, comments interval, churn, changed files and stacktrace attached have little influence on delivery time in terms of releases, both before and after continuous integration. The median explanatory power of these variables lies between 0.04% and 0.54%. Our results suggest that the amount of code change in a pull request has little influence when modeling delivery time. On the other hand, our models show that the intensity of commits required to produce a release, as well as the variables of the project family (e.g., merge workload and delivery workload), have a strong influence on modeling delivery time in terms of releases, both before and after continuous integration.

Figure 23 – The number of models per most influential variable (Delivery Time in terms of releases).

Our models suggest that the number of commits performed to produce a release is the most influential factor to model the delivery time of merged pull requests in terms of releases, both before and after continuous integration. Additionally, our models show that “queue rank” and “merge workload” also have a strong impact when predicting whether a pull request is likely to be prevented from delivery by one or more releases.

RQ5: Results for delivery time in terms of days

The “release commits” attribute is the most influential one to model the delivery time of merged pull requests in terms of days. Figures 24 and 25 show the distributions of the explanatory power of each variable that we use in our models before and after the adoption of continuous integration. The higher the median of the explanatory power of a variable, the higher the influence that such a variable has on the delivery time of pull requests. Similar to the results for delivery time in terms of releases, we observe that release commits has the largest influence in our models to explain delivery time in terms of days, before and after the adoption of continuous integration. This result might indicate that the greater the number of commits that are performed to produce a release, the higher the integration load of that release cycle, which may increase the delivery time of pull requests.

Figure 24 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of days).

Figure 25 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of days).

Figure 26 shows each explanatory variable and the number of models for which that variable is the most influential. Moreover, Table 11 shows descriptive statistics (min, mean, median and max) for the explanatory power of each variable of our models. Indeed, release commits is the most influential variable in 55 models (31 before continuous integration and 24 after continuous integration). Figure 27 shows the relationship that the most influential variables of our models share with delivery time. The relationship between release commits and delivery time is shown in Figure 27a. We chose 4 of the 91 models with the highest R² values to plot the relationships; nevertheless, the rest of our models produce the same trend.

Table 11 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of days).

Explanatory Variable  CI (min / mean / median / max)  NO-CI (min / mean / median / max)
release commits  0.015 / 44.57 / 45.44 / 96.91  0.014 / 48.23 / 49.29 / 99.29
changed files  0.000 / 0.88 / 0.06 / 11.05  0.000 / 1.76 / 0.25 / 40.11
churn  0.000 / 0.52 / 0.08 / 9.28  0.000 / 2.20 / 0.11 / 53.55
comments  0.000 / 0.35 / 0.10 / 3.18  0.000 / 1.46 / 0.16 / 21.86
comments interval  0.000 / 1.39 / 0.03 / 19.32  0.000 / 0.54 / 0.06 / 4.25
merge workload  0.000 / 10.26 / 2.87 / 96.80  0.004 / 17.66 / 7.60 / 92.62
queue rank  0.001 / 34.67 / 33.69 / 97.88  0.019 / 28.83 / 25.16 / 98.26
description length  0.000 / 1.30 / 0.13 / 43.69  0.000 / 0.99 / 0.22 / 15.97
contributor experience  0.000 / 3.89 / 0.65 / 37.50  0.001 / 4.05 / 1.07 / 42.73
contributor delivery  0.000 / 7.76 / 0.75 / 82.62  0.000 / 7.74 / 1.49 / 59.43
stacktrace attached  0.000 / 0.22 / 0.02 / 2.76  0.000 / 0.46 / 0.03 / 2.68
activities  0.001 / 1.28 / 0.25 / 16.26  0.000 / 6.13 / 0.94 / 49.56
merge time  0.000 / 3.96 / 0.12 / 51.46  0.000 / 3.93 / 0.40 / 62.90

The queue rank variable is the second most influential variable in our models, both before and after the adoption of continuous integration. Queue rank is the moment at which a pull request is merged with respect to the other merged pull requests in the project backlog. Figure 27b shows the relationship that queue rank shares with delivery time. Our models reveal that merged pull requests have a higher delivery time when they are merged late compared to the other pull requests in the release cycle. The third and fourth most influential metrics are merge workload and contributor delivery, respectively. The higher the merge workload of the project, the higher the delivery time of its pull requests (Figure 27c). Our models also reveal that if a contributor has his/her previously submitted pull requests delivered quickly, his/her next submitted pull requests tend to be delivered more quickly as well (Figure 27d).
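As a concrete illustration of these two process metrics, the sketch below derives queue rank and merge workload from pull request timestamps, following the definitions above. The data frame layout and column names are illustrative simplifications, not our exact collection schema.

```python
# Sketch: deriving queue rank and merge workload from pull request
# timestamps (illustrative column names, not our exact schema).
import pandas as pd

def add_process_metrics(prs: pd.DataFrame) -> pd.DataFrame:
    """Expects one row per merged pull request with the columns
    'submitted_at', 'merged_at' (timestamps) and 'release_cycle'
    (the release in which the pull request was delivered)."""
    # Queue rank: position of the merge among all merges of the same
    # release cycle (1 = merged first, larger = merged later).
    prs["queue_rank"] = prs.groupby("release_cycle")["merged_at"].rank(method="first")
    # Merge workload: pull requests already submitted but not yet merged
    # at the moment this pull request was submitted.
    prs["merge_workload"] = [
        int(((prs["submitted_at"] < t) & (prs["merged_at"] > t)).sum())
        for t in prs["submitted_at"]
    ]
    return prs
```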

Figure 26 – The number of models per most influential variable (Delivery Time in terms of days).

Our models suggest that “release commits” is the most influential variable to model the delivery time of merged pull requests in terms of days, before and after the adoption of continuous integration. Additionally, our models show that “queue rank” and “merge workload” also have a strong impact on the time to deliver merged pull requests.

Figure 27 – The relationship between the most influential variables and delivery time in terms of days: (a) release commits (TelescopeJS/Telescope); (b) queue rank (kivy/kivy); (c) merge workload (bcit-ci/CodeIgniter); (d) contributor delivery (HabitRPG/habitica).

4.2 Analysis II — What is the impact of continuous integration on the prolonged delivery time?

RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?

The models that we fitted using data of pull requests that were submitted before continuous integration obtained a median Brier score of 0.043, while the models fitted using pull request data from after continuous integration obtained 0.091. Furthermore, the median bootstrap-calculated optimism of the Brier scores is 0.006 for the models that use data from before continuous integration, while it is 0.003 for the models that were fitted with data from after continuous integration. Figure 28 shows the distribution of the Brier score values and the bootstrap-calculated optimism of the Brier score of our models, before and after continuous integration. Finally, Tables 12 and 13 show all the Brier scores and the AUC values of each model that we fitted.

Figure 28 – Distribution of the Brier Score and the Brier optimism of the models before and after CI: (a) Brier score of the models; (b) Brier score optimism of the models.

Table 12 – Brier Score and AUC values for the models that we fitted using pull request data from before the adoption of continuous integration.

#  Project  Brier Score  Brier Optimism  AUC  AUC Optimism
1  serverless/serverless  0.021  0.008  0.997  0.003
2  ReactiveX/RxJava  0.163  0.007  0.822  0.013
3  refinery/refinerycms  0.129  0.009  0.856  0.018
4  alohaeditor/Aloha-Editor  0.025  0.006  0.983  0.004
5  BabylonJS/Babylon.js  0.043  0.012  0.987  0.008
6  zurb/foundation-sites  0.113  0.007  0.887  0.015
7  frappe/erpnext  0.018  0.006  0.994  0.002
8  mozilla/pdf.js  0.020  0.003  0.994  0.001
9  appcelerator/titanium_mobile  0.163  0.001  0.740  0.002
10  ansible/ansible  0.092  0.002  0.911  0.004
11  ipython/ipython  0.068  0.008  0.968  0.007
12  owncloud/core  0.036  0.001  0.962  0.001
13  mozilla-b2g/gaia  0.031  0.001  0.990  0.001

Our models obtained a median AUC of 0.92 when fitted using pull request data from after continuous integration, while they obtained 0.97 when using data from before continuous integration. Additionally, the median bootstrap-calculated optimism of the AUC values is 0.005 for the models that we fitted using pull request data from after continuous integration, whereas it is 0.004 for the models that were fitted using pull request data from before continuous integration. Figure 29 shows the distribution of the AUC and the bootstrap-calculated optimism of the AUC values of the models. Such results suggest that our logistic regression models hugely outperform naive models, such as random guessing (AUC value of 0.50), and that the models are stable enough to perform our statistical inferences.

Table 13 – Brier Score and AUC values for the models that we fitted using pull request data from after the adoption of continuous integration.

#  Project  Brier Score  Brier Optimism  AUC  AUC Optimism
1  yiisoft/yii  0.064  0.006  0.963  0.007
2  vanilla/vanilla  0.125  0.005  0.850  0.017
3  bokeh/bokeh  0.099  0.003  0.915  0.005
4  scikit-image/scikit-image  0.081  0.005  0.930  0.008
5  jashkenas/backbone  0.064  0.007  0.964  0.008
6  sensu/sensu  0.161  0.010  0.799  0.024
7  jsbin/jsbin  0.180  0.009  0.698  0.026
8  naver/pinpoint  0.086  0.003  0.923  0.005
9  photonstorm/phaser  0.072  0.009  0.947  0.013
10  robolectric/robolectric  0.041  0.004  0.969  0.006
11  ReactiveX/RxJava  0.065  0.006  0.967  0.007
12  haraka/Haraka  0.010  0.004  0.998  0.003
13  jhipster/generator-jhipster  0.063  0.003  0.966  0.003
14  boto/boto  0.157  0.008  0.785  0.020
15  Pylons/pyramid  0.099  0.004  0.876  0.006
16  ether/etherpad-lite  0.144  0.008  0.810  0.020
17  getsentry/sentry  0.013  0.002  0.998  0.001
18  hapijs/hapi  0.168  0.009  0.806  0.019
19  HabitRPG/habitica  0.042  0.003  0.983  0.004
20  pyrocms/pyrocms  0.106  0.007  0.911  0.010
21  BabylonJS/Babylon.js  0.056  0.007  0.965  0.010
22  Leaflet/Leaflet  0.043  0.005  0.979  0.005
23  laravel/laravel  0.138  0.006  0.849  0.010
24  zurb/foundation-sites  0.169  0.004  0.734  0.012
25  callemall/material-ui  0.033  0.003  0.990  0.002
26  loomio/loomio  0.064  0.002  0.973  0.002
27  scikit-learn/scikit-learn  0.117  0.003  0.716  0.016
28  frappe/erpnext  0.056  0.002  0.952  0.002
29  Theano/Theano  0.038  0.002  0.981  0.002
30  puppetlabs/puppet  0.091  0.001  0.903  0.002
31  chef/chef  0.129  0.003  0.814  0.007
32  woocommerce/woocommerce  0.041  0.003  0.985  0.004
33  divio/django-cms  0.117  0.003  0.867  0.007
34  scipy/scipy  0.110  0.002  0.768  0.011
35  matplotlib/matplotlib  0.091  0.002  0.903  0.003
36  sympy/sympy  0.148  0.002  0.748  0.006
37  twbs/bootstrap  0.107  0.002  0.872  0.004
38  elastic/kibana  0.085  0.002  0.922  0.005
39  mozilla/pdf.js  0.129  0.002  0.845  0.005
40  appcelerator/titanium_mobile  0.034  0.003  0.987  0.004
41  StackStorm/st2  0.135  0.002  0.781  0.012
42  TryGhost/Ghost  0.069  0.002  0.943  0.003
43  fog/fog  0.106  0.003  0.881  0.006
44  ipython/ipython  0.019  0.001  0.997  0.000
45  cakephp/cakephp  0.158  0.001  0.837  0.002
46  owncloud/core  0.074  0.001  0.950  0.001
47  rails/rails  0.151  0.000  0.788  0.001
48  mozilla-b2g/gaia  0.107  0.000  0.921  0.000
49  saltstack/salt  0.082  0.000  0.932  0.000

Figure 29 – Distribution of the AUC and the AUC optimism of the models before and after CI: (a) AUC of the models; (b) AUC optimism of the models.

Our models are able to accurately identify whether a merged pull request is likely to have a long delivery time in a given project. The median AUC value of our models is 0.92 when using pull request data from after continuous integration, while it is 0.97 when fitted using pull request data from before continuous integration.

RQ7: What are the most influential attributes for identifying the pull requests that will suffer from a long delivery time?

Long delivery time is most consistently associated with “release commits” and with attributes of the project family, e.g., “queue rank” and “merge workload”, both before and after continuous integration. Figures 30 and 31 show the explanatory power (χ²) of each variable that we use in our models, before and after continuous integration, respectively. We observe that the required number of commits to produce a release is the most influential variable to identify whether a merged pull request will suffer from a long delivery time; moreover, queue rank and merge workload also have a strong influence when modeling a prolonged delay, both before and after continuous integration. Figure 32 shows the number of models for which each variable that we use is the most influential one. Indeed, release commits is the most influential variable in 30 models (4 before continuous integration and 26 after continuous integration).

Our models show that the contributor experience and the velocity at which contributors have their previously submitted pull requests released (i.e., contributor delivery) also have a small influence when identifying prolonged delivery times, both before and after continuous integration. Table 14 shows the descriptive statistics for the explanatory power of each variable of our models, before and after continuous integration. The average explanatory power of the contributor experience and contributor delivery variables is, respectively, 9.02% and 9.29% for the models fitted with data from before continuous integration, while it is 5.4% and 9.2% for the models that were fitted using data from after continuous integration.

Table 14 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Prolonged delivery time analysis).

Explanatory Variable  CI (min / mean / median / max)  NO-CI (min / mean / median / max)
release commits  1.226 / 37.9 / 43.01 / 68.1  0.119 / 32.89 / 34.24 / 72.83
changed files  0.006 / 0.8 / 0.33 / 7.8  0.001 / 0.41 / 0.07 / 2.25
churn  0.000 / 0.8 / 0.49 / 8.8  0.079 / 2.38 / 0.54 / 12.90
comments  0.002 / 0.5 / 0.28 / 2.9  0.059 / 0.47 / 0.24 / 1.19
comments interval  0.006 / 0.5 / 0.04 / 3.2  0.000 / 0.13 / 0.13 / 0.26
merge workload  0.000 / 11.7 / 7.96 / 49.0  2.878 / 18.65 / 14.61 / 65.37
queue rank  1.955 / 36.0 / 37.95 / 66.6  2.570 / 31.42 / 33.88 / 58.17
description length  0.000 / 0.9 / 0.22 / 8.0  0.004 / 0.70 / 0.40 / 2.84
contributor experience  0.000 / 5.4 / 1.85 / 44.1  0.002 / 9.02 / 2.24 / 27.85
contributor delivery  0.007 / 9.2 / 1.56 / 72.9  0.301 / 9.29 / 5.32 / 39.67
stacktrace attached  0.001 / 0.3 / 0.06 / 2.1  0.004 / 0.58 / 0.09 / 3.04
activities  0.000 / 1.7 / 0.54 / 8.9  0.004 / 5.83 / 2.01 / 24.64
merge time  0.003 / 2.1 / 0.37 / 28.2  0.004 / 4.28 / 1.32 / 12.46

Figure 30 – Explanatory power of variables before adopting continuous integration (Prolonged delivery time analysis).

Figure 31 – Explanatory power of variables after adopting continuous integration (Prolonged delivery time analysis).

Figure 32 – The number of models per most influential variable (Prolonged delivery time analysis).

Our explanatory models suggest that long delivery time is most closely associated with the required number of commits to produce a release and with project characteristics, such as the queue rank and the merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a long delivery time, both before and after continuous integration.

4.3 Threats to Validity

In this section, we discuss the threats to the validity of our study.

Construct Validity

The construct threats to validity are concerned with errors caused by the methods that we use to collect our data. We use the GitHub API to develop tools to collect our data. We also develop tools to link pull requests to their respective releases. Bugs in these tools may influence our results. However, we use subsamples of the studied projects to carefully assess our tools' outcomes, which produced consistent results.
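As an illustration of the linking step, the sketch below assigns a merged pull request to the first release published after its merge date, which is the essence of our linking tool. The function and field names are illustrative simplifications, not the actual implementation.

```python
# Sketch: linking a merged pull request to the first official release
# published after its merge date (illustrative simplification).
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

def link_to_release(merged_at: datetime,
                    releases: List[Tuple[datetime, str]]) -> Optional[str]:
    """'releases' holds (published_at, tag_name) pairs sorted by date;
    pre/alpha/beta/rc tags are assumed to be filtered out beforehand."""
    dates = [published_at for published_at, _ in releases]
    i = bisect.bisect_right(dates, merged_at)              # first release after the merge
    return releases[i][1] if i < len(releases) else None   # None = not yet delivered
```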

Internal Validity

Internal threats are concerned with the ability to draw conclusions from the relationship between the dependent variable (the delivery time of merged pull requests) and the independent variables (e.g., release commits and queue rank). The method that we use to link pull requests to releases may not match the actual number of delivered pull requests per release. For instance, if the version control system of a project has the release tags v1.0, v2.0, no-ver and v3.0, we remove the no-ver tag. If there are pull requests associated with the no-ver release, such pull requests will be associated with the release v3.0. However, only 5.37% (485/9,035) of our studied releases fall into this case.

Also, the way that we segment the response variable (Y) of our logistic regression models is subject to bias. We use an approach similar to the one used by Costa et al. (COSTA et al., 2017) to categorize whether a pull request has a prolonged delivery time. We use at least one MAD above the median delivery time as the threshold to define prolonged delivery time (Definition 3; see the sketch below). Although we found it to be a reasonable classification, one could use a different threshold, which may yield different results.

With respect to our explanatory models, the predictors that we use are not exhaustive. Based on the studies of Costa et al. (COSTA et al., 2014; COSTA et al., 2016), we chose a starting set of variable families that may share a relationship with the delivery time of merged pull requests and that can be easily computed through the GitHub API. Although our logistic and linear regression models achieve sound AUC and R² values, other variables may be used to improve performance (e.g., a boolean indicating whether a pull request is associated with an issue report, and another boolean indicating whether a pull request was submitted by a core developer or an external contributor). Nevertheless, our set of predictors should be approached as a preliminary set that can be easily computed rather than as a final solution.

Finally, the main limitation of our explanatory models (i.e., logistic and linear regression models) concerns causal relationships. When using logistic and linear regression models, we cannot claim a causal relationship between our explanatory variables (i.e., the attributes described in Tables 4 and 5) and delivery time. Alternatively, we draw conclusions based on associations between delivery time and the average behavior of such explanatory variables.
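To make the threshold concrete, the sketch below labels pull requests as having a prolonged delivery time when their delivery time is at least one MAD above the project's median, mirroring the classification described above (variable names are illustrative):

```python
# Sketch of Definition 3: a merged pull request has a prolonged delivery
# time when its delivery time is at least one MAD above the project's
# median delivery time (illustrative variable names).
import numpy as np

def prolonged_delivery(delivery_times: np.ndarray) -> np.ndarray:
    med = np.median(delivery_times)
    mad = np.median(np.abs(delivery_times - med))  # median absolute deviation
    return delivery_times >= med + mad             # True = prolonged delivery time
```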

External Validity

External threats are concerned with the extent to which we can generalize our results (PERRY; PORTER; VOTTA, 2000). In this dissertation, we analyzed 167,037 pull requests of 90 popular open source projects from GitHub that have moved to continuous integration. All projects adopt the most popular continuous integration server on GitHub, i.e., Travis-CI. However, we do not have information about the extent to which projects that use self-hosted CI servers, such as Jenkins, affect the delivery time of pull requests to their users.

Since we study public GitHub projects only, we cannot guarantee that the results are also applicable to projects from private companies. In such cases, results may deviate significantly. Nevertheless, we study projects from the five most popular programming languages on GitHub (JavaScript, Python, Java, Ruby and PHP). Our investigation covers projects of different sizes and domains that use statically-typed and dynamically-typed languages. Future work should replicate this study with a larger number of programming languages and projects. For replication purposes, we make our datasets and results publicly available to interested researchers.

We use a small sample of projects compared to the size of GitHub. However, we chose projects with sufficient data to make a fair comparison of the impact of the usage of continuous integration on the delivery time of merged pull requests. The data that we gathered is able to fit sound models and to produce significant results. We believe that our results might be representative of projects with domains and characteristics similar to those of the studied projects, but replication is the best way to verify such an assumption.

5 Conclusion

Pull-based development is a paradigm broadly used by contributors of open source projects to develop software in a distributed and collaborative way by sending pull requests. A pull request may fix bugs or provide enhancements or new functionalities. The basic life cycle of a pull request comprises three main steps. First, a contributor submits a pull request to a software project. Next, the integrators of such a project merge the pull request into its code base. Finally, the merged pull request is delivered to the end users of the software system through an official software release. However, merged pull requests may suffer undesirable delays before being released (i.e., a long delivery time). A long delivery time may cause software projects to lose users and the interest of contributors, given the increasingly competitive software development market, which requires projects to deliver changes to their users at a faster pace to improve time-to-market and customer satisfaction.

In this dissertation, we performed an empirical study that investigates the impact of adopting continuous integration on the delivery time of merged pull requests. We use 167,037 pull requests of 90 GitHub projects to explore the impact of continuous integration on the delivery time of pull requests and to analyze the impact of continuous integration on the prolonged delivery time. In the remainder of this chapter, we describe the contributions of this dissertation in Section 5.1 and discuss related work in Section 5.2. Finally, we discuss possibilities for future work in Section 5.3.

5.1 Dissertation Contributions

The main goal of this dissertation was to understand the impact of adopting continuous integration on the time required to deliver merged pull requests to the end users of a software project. In the following, we answer the questions that guided the analyses performed in our study.

• Analysis I — What is the impact of continuous integration on the delivery time of pull requests? Delayed deliveries are frequent in the studied projects, both before and after continuous integration. In the median, 13.8% of the merged pull requests per project are prevented from delivery in at least one release before continuous integration, while 24% miss at least one release after continuous integration. Furthermore, we find that many pull requests that missed at least one release were merged well before the release date of the missed releases. In most projects (53%), the time from submission to delivery of pull requests (i.e., the pull request lifetime) is shorter before the adoption of continuous integration. We also observe that projects tend to merge pull requests earlier before continuous integration. One possible reason for the faster delivery of pull requests before continuous integration might be the large increase in the number of pull request submissions after the adoption of continuous integration: 77.8% of the projects that adopted continuous integration increased their rate of pull request submissions. Finally, we find that the number of commits performed to produce a release is the most influential factor to estimate the delivery time of merged pull requests, both before and after continuous integration. The moment at which a pull request is merged (i.e., queue rank) and the number of pull requests competing to be merged (i.e., merge workload) also have a strong impact on estimating the delivery time in terms of days and releases.

• Analysis II — What is the impact of continuous integration on the prolonged delivery time? In the median, 24% of the pull requests of the investigated projects have a prolonged delivery time. We are able to accurately identify merged pull requests that have a prolonged delivery time, both before and after continuous integration. Our explanatory models obtained sound median AUC values of 0.92 to 0.97. Prolonged delivery time is most closely associated with the required number of commits to produce a release and with project characteristics, such as the queue rank and the merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a prolonged delivery time, both before and after continuous integration.

Open source projects that plan to adopt continuous integration should be aware that the adoption of continuous integration will not necessarily deliver pull requests more quickly. On the other hand, as pull-based development can attract the interest of external contributors and, hence, increase the project's workload, continuous integration may help in other aspects, e.g., delivering more functionalities to end users (see RQ3).

5.2 Related Work

In this section, we survey related research that analyzes the impact of adopting continuous integration in open source projects.

Despite the wide adoption of Agile Release Engineering (ARE) practices (i.e., continuous integration, rapid releases, continuous delivery and continuous deployment), there is still a lack of empirical studies that investigate the impact that these practices have on software development activities, i.e., in terms of productivity and quality. Through a systematic literature review, Karvonen et al. (KARVONEN et al., 2017) analyzed 619 papers and selected 71 primary studies that are related to ARE practices. They found that only 8 out of the 71 primary studies empirically investigate continuous integration. These studies use diverse approaches for data collection: (i) 5 studies (DEBBICHE; DIENÉR; SVENSSON, 2014; DOWNS; HOSKING; PLIMMER, 2010; FERREIRA; COHEN, 2008; KNAUSS et al., 2015; STÅHL; BOSCH, 2013) use surveys and/or interviews to perform their analyses; (ii) 2 studies (VASILESCU et al., 2014; DESHPANDE; RIEHLE, 2008) use data from error reports and log files; and (iii) 1 study (STÅHL; BOSCH, 2014a) uses both approaches. This systematic literature review highlights that empirical research in this field is highly necessary to better understand the impact of adopting continuous integration on software development.

Hilton et al. (HILTON et al., 2016) analyzed 34,544 open source projects from GitHub and surveyed 442 developers. The authors found that 70% of the most popular GitHub projects use continuous integration and that the percentage of projects that use continuous integration is growing. They also found that continuous integration helps projects to release more often and that the continuous integration build status may lead to a faster integration of pull requests. Their results show that, before adopting continuous integration, projects used to ship at a rate of 0.34 releases per month, well below the rate of 0.54 releases per month after the adoption of continuous integration. In contrast with their results, we do not observe a significant difference when comparing the median duration of the release cycles of each studied project before and after continuous integration. Our studied projects ship new releases in a median time interval of 35.7 days before continuous integration, while taking 33.1 days to ship new releases after continuous integration. One factor that might contribute to such a divergence in the release-frequency results is that we only study user-intended releases, i.e., we do not consider pre, beta, alpha and rc (release candidate) releases in our analyses. Also, the difference in the number and characteristics of the studied projects may make our results deviate significantly.

Beller et al. (BELLER; GOUSIOS; ZAIDMAN, 2016) performed an analysis of continuous integration builds on GitHub. The authors investigated 2,640,825 Java and Ruby builds from Travis-CI and found that testing is the most important reason why builds fail. The results also show that the majority of builds trigger at least one test, and only 20% of the investigated projects did not include the test phase in their continuous integration process. In contrast to the above-mentioned works, our dissertation examines 167,037 pull requests of 90 GitHub projects that have adopted continuous integration at some point of their life span.
Our focus is to study the time necessary for merged pull requests to be delivered to end users, before and after the adoption of continuous integration.

Vasilescu et al. (VASILESCU et al., 2014) studied the usage of Travis-CI in a sample of 223 GitHub projects that are written in Ruby, Python and Java. They found that the majority of the projects (92.3%) are configured to use Travis-CI, but less than half actually use it. In a follow-up study, Vasilescu et al. (VASILESCU et al., 2015) investigated the productivity and quality of 246 GitHub projects that use continuous integration. They found that projects that use continuous integration merge pull requests more quickly when they are submitted by core developers. Also, core developers find significantly more bugs when using continuous integration. We use an approach similar to the one used by Vasilescu et al. (VASILESCU et al., 2015) to identify projects that use Travis-CI. We also analyze the merge time of pull requests and find that the majority of the studied projects merge pull requests more quickly before continuous integration. In addition, we observe that the number of merged pull requests per release is higher after adopting continuous integration for most of the projects.

Regarding the factors that affect the acceptance and latency of pull requests in the context of continuous integration, Yu et al. (YU et al., 2016) used regression models in a sample of 40 GitHub projects that use Travis-CI. The authors found that the likelihood of rejection of a pull request increases to 89.6% when the pull request breaks the build. The results also show that the more succinct a pull request is, the greater the probability that such a pull request is reviewed and merged earlier. We complement this prior work by analyzing the most influential factors that impact the delivery time of merged pull requests before and after the adoption of continuous integration.

Other research has studied the delivery time of new features, enhancements, and bug fixes (COSTA et al., 2014; COSTA et al., 2016; CHOETKIERTIKUL et al., 2015; COSTA et al., 2017; CHOETKIERTIKUL et al., 2017). Costa et al. (COSTA et al., 2014; COSTA et al., 2017) mined data from the VCSs and ITSs of the Firefox, ArgoUML and Eclipse projects to investigate how frequent the delayed delivery of fixed issues is in such projects. They found that delivery delays of addressed issues are frequent in their subject projects (e.g., 34% to 98% of addressed issues were delayed by at least one release).

They also observe that 13%, 12%, and 22% of the fixed issues of the Eclipse, Firefox, and ArgoUML projects have a long delivery time, respectively. In contrast, we study the delivery time and the prolonged delivery time of merged pull requests in a set of 90 GitHub projects. Also, we observe that, despite pull requests being merged well before an upcoming release, in the median 13.8% of such merged pull requests are delayed by one or more releases before the adoption of continuous integration, while 24% are delayed by at least one release after continuous integration. Furthermore, we observe that, in the median, 24% of the pull requests of the investigated projects have a prolonged delivery time. In a follow-up study, Costa et al. (COSTA et al., 2016) investigated the impact of switching from traditional releases to rapid releases on the delivery time of fixed issues of the Firefox project. They used predictive models to discover which factors significantly impact the delivery time of issues in each release strategy. Differently from prior work, our study focuses on the impact of adopting continuous integration on the time-to-delivery of merged pull requests.

5.3 Future Work

This dissertation contributes to reducing the lack of empirical understanding of the impact of adopting continuous integration on the time-to-delivery of merged pull requests. However, more research is necessary to better understand and improve the activities of integrating and delivering pull requests. We outline some avenues for future work below.

Replication. Future work could replicate the analyses that are performed in this dissertation using additional projects and programming languages. For instance, one could perform a cross-programming-language analysis to investigate whether the factors that most impact the delivery time of pull requests change depending on the programming language. Furthermore, replications of this study using private projects are necessary (i.e., studying the delivery time of pull requests from privately developed projects rather than open source projects). Such replication studies are important to achieve more generalizable conclusions regarding the delivery time of merged pull requests. For replication purposes, we make our datasets available to interested researchers.

Tooling. Further research in the field could build tools. For example, issue tracking systems could tag submitted pull requests that are likely to be delayed. Such tools can be used to help developers and project managers to be aware of the estimated time-to-delivery of new pull requests based on their characteristics.

Software Quality. To study the trade-off between a shorter delivery time of pull requests and software quality, future work could empirically investigate whether pull requests that are merged and delivered quickly also have a high quality, i.e., in terms of bugs.

Prediction. According to Costa (COSTA, 2017), software engineering researchers have invested considerable effort in the prediction of bugs, so that their predictions may help to avoid unwanted costs. Another possibility for future work is to build accurate prediction models of the delivery time of pull requests, which could help project managers and developers to better plan their activities.

Qualitative Study. In our analysis, we quantitatively study the impact of adopting continuous integration on the time required to deliver pull requests (i.e., delivery time) to the end users of software systems. We only perform statistical analyses based on the data available on GitHub and Travis-CI for our subject projects. We found that, after adopting continuous integration, projects deliver almost 3 times more pull requests per release than before. However, to reach a deeper understanding as to why projects deliver significantly more pull requests after adopting continuous integration, we could perform a qualitative study: (i) to obtain insights from developers of projects that use continuous integration, which would not be possible by only performing quantitative analyses; and (ii) to verify whether developers agree with our results from the quantitative analysis. In a prior quantitative study, Costa et al. (COSTA et al., 2014) identified that 98% of the Firefox project issues had their delivery postponed by at least one release. In a follow-up qualitative study, Costa et al. (COSTA; MCINTOSH, 2017; COSTA, 2017) found that the reasons for the delivery delays of addressed issues are related to decision making, team collaboration, and risk management.

Bibliography

BASKERVILLE, R.; PRIES-HEJE, J. Short cycle time systems development. Information Systems Journal, Wiley Online Library, v. 14, n. 3, p. 237–264, 2004. Cited on page 15.
BECK, K. Extreme Programming Explained: Embrace Change. [S.l.]: Addison-Wesley Professional, 2000. Cited on page 14.
BELLER, M.; GOUSIOS, G.; ZAIDMAN, A. Oops, my tests broke the build: An analysis of Travis CI builds with GitHub. PeerJ PrePrints, v. 4, p. e1984, 2016. Cited 3 times on pages 16, 24, and 80.
CHEN, J.; REILLY, R. R.; LYNN, G. S. The impacts of speed-to-market on new product success: the moderating effects of uncertainty. IEEE Trans. Eng. Manage., IEEE, v. 52, n. 2, p. 199–212, 2005. Cited on page 14.
CHOETKIERTIKUL, M. et al. Predicting delays in software projects using networked classification (T). In: IEEE. Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. [S.l.], 2015. p. 353–364. Cited 2 times on pages 16 and 81.
CHOETKIERTIKUL, M. et al. Predicting the delay of issues with due dates in software projects. Empirical Software Engineering Journal, p. 1–41, 2017. Cited 2 times on pages 16 and 81.
CLIFF, N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, American Psychological Association, v. 114, n. 3, p. 494, 1993. Cited 2 times on pages 29 and 30.
COSTA, D. A. da. Understanding the delivery delay of addressed issues in large software projects. PhD Thesis — Federal University of Rio Grande do Norte, Natal, 2017. Cited on page 83.
COSTA, D. A. da et al. An empirical study of delays in the integration of addressed issues. In: IEEE. Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on. [S.l.], 2014. p. 281–290. Cited 4 times on pages 16, 76, 81, and 83.
COSTA, D. A. da et al. The impact of switching to a rapid release cycle on the integration delay of addressed issues: An empirical study of the Mozilla Firefox project. In: Proceedings of the 13th International Conference on Mining Software Repositories. New York, NY, USA: ACM, 2016. (MSR '16), p. 374–385. Cited 9 times on pages 15, 16, 32, 45, 47, 48, 76, 81, and 82.
COSTA, D. A. da et al. An empirical study of the integration time of fixed issues. Empirical Software Engineering, p. to appear, 2017. Cited 11 times on pages 18, 25, 28, 32, 33, 35, 37, 50, 51, 76, and 81.
COSTA, D. A. da; MCINTOSH, S.; TREUDE, C.; KULESZA, U.; HASSAN, A. E. The impact of rapid release cycles on the integration delay of fixed issues. Journal of Empirical Software Engineering, p. to appear, 2017. Cited on page 83.

CROWSTON, K.; ANNABI, H.; HOWISON, J. Defining open source software project success. ICIS 2003 Proceedings, p. 28, 2003. Cited on page 14.
DAYTON, C. M. Logistic regression analysis. Stat, p. 474–574, 1992. Cited on page 31.
DEBBICHE, A.; DIENÉR, M.; SVENSSON, R. B. Challenges when adopting continuous integration: A case study. In: SPRINGER. International Conference on Product-Focused Software Process Improvement. [S.l.], 2014. p. 17–32. Cited 2 times on pages 14 and 80.
DESHPANDE, A.; RIEHLE, D. Continuous integration in open source software development. In: SPRINGER. IFIP International Conference on Open Source Systems. [S.l.], 2008. p. 273–280. Cited on page 80.
DOWNS, J.; HOSKING, J.; PLIMMER, B. Status communication in agile software teams: A case study. In: IEEE. Software Engineering Advances (ICSEA), 2010 Fifth International Conference on. [S.l.], 2010. p. 82–87. Cited on page 80.
DUVALL, P.; MATYAS, S. M.; GLOVER, A. Continuous Integration: Improving Software Quality and Reducing Risk (The Addison-Wesley Signature Series). [S.l.]: Addison-Wesley Professional, 2007. Cited 3 times on pages 14, 24, and 29.
EFRON, B. How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, Taylor & Francis, v. 81, n. 394, p. 461–470, 1986. Cited 3 times on pages 25, 32, and 35.
FERREIRA, C.; COHEN, J. Agile systems development and stakeholder satisfaction: a South African empirical study. In: ACM. Proceedings of the 2008 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries: riding the wave of technology. [S.l.], 2008. p. 48–55. Cited on page 80.
FOWLER, M.; FOEMMEL, M. Continuous integration. ThoughtWorks, http://www.thoughtworks.com/ContinuousIntegration.pdf, p. 122, 2006. Cited 2 times on pages 23 and 24.
GIGER, E.; PINZGER, M.; GALL, H. Predicting the fix time of bugs. In: ACM. Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering. [S.l.], 2010. p. 52–56. Cited on page 48.
GOUSIOS, G.; PINZGER, M.; DEURSEN, A. v. An exploratory study of the pull-based software development model. In: Proceedings of the 36th International Conference on Software Engineering. [S.l.: s.n.], 2014. p. 345–355. Cited 2 times on pages 21 and 22.
GOUSIOS, G.; SPINELLIS, D. GHTorrent: GitHub's data from a firehose. In: Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on. [S.l.: s.n.], 2012. p. 12–21. Cited on page 14.
GOUSIOS, G. et al. Work practices and challenges in pull-based development: the integrator's perspective. In: Proceedings of the 37th International Conference on Software Engineering - Volume 1. [S.l.: s.n.], 2015. p. 358–368. Cited 3 times on pages 22, 23, and 24.
HANLEY, J. A.; MCNEIL, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, v. 143, n. 1, p. 29–36, 1982. Cited on page 32.
HARRELL, F. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. [S.l.]: Springer, 2015. Cited 5 times on pages 33, 34, 38, 39, and 42.
HASTIE, T.; TIBSHIRANI, R.; FRIEDMAN, J. The elements of statistical learning: data mining, inference and prediction. 2. ed. [S.l.]: Springer, 2009. Cited on page 32.
HILBE, J. M. Logistic regression models. [S.l.]: CRC Press, 2009. Cited on page 31.
HILTON, M. et al. Usage, costs, and benefits of continuous integration in open-source projects. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016. [S.l.: s.n.], 2016. Cited 3 times on pages 14, 16, and 80.
HOWELL, D. C. Median absolute deviation. In: Wiley StatsRef: Statistics Reference Online. [S.l.]: John Wiley & Sons, Ltd, 2014. Cited on page 25.
JAMES, G. et al. An Introduction to Statistical Learning: With Applications in R. [S.l.]: Springer Publishing Company, Incorporated, 2014. Cited on page 33.
JIANG, Y.; ADAMS, B.; GERMAN, D. M. Will my patch make it? And how fast? Case study on the Linux kernel. In: IEEE. Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on. [S.l.], 2013. p. 101–110. Cited 2 times on pages 15 and 48.
KAMPSTRA, P. et al. Beanplot: A boxplot alternative for visual comparison of distributions. Journal of Statistical Software, v. 28, n. 1, p. 1–9, 2008. Cited on page 29.
KARVONEN, T. et al. Systematic literature review on the impacts of agile release engineering practices. Information and Software Technology, v. 86, p. 87–100, 2017. Cited on page 80.
KNAUSS, E. et al. Research preview: Supporting requirements feedback flows in iterative system development. In: SPRINGER. International Working Conference on Requirements Engineering: Foundation for Software Quality. [S.l.], 2015. p. 277–283. Cited on page 80.
LAI, S.-T.; LEU, F.-Y. Applying continuous integration for reducing web applications development risks. In: IEEE. Broadband and Wireless Computing, Communication and Applications (BWCCA), 2015 10th International Conference on. [S.l.], 2015. p. 386–391. Cited on page 24.
LAUKKANEN, E.; PAASIVAARA, M.; ARVONEN, T. Stakeholder perceptions of the adoption of continuous integration – a case study. In: Proceedings of the 2015 Agile Conference. [S.l.]: IEEE Computer Society, 2015. (AGILE '15), p. 11–20. Cited 4 times on pages 14, 24, 29, and 53.
LIU, J.; LI, J.; HE, L. A comparative study of the effects of pull request on GitHub projects. In: IEEE. Computer Software and Applications Conference (COMPSAC), 2016 IEEE 40th Annual. [S.l.], 2016. v. 1, p. 313–322. Cited on page 16.
LONG, J. Understanding the role of core developers in open source software development. Journal of Information, Information Technology, and Organizations, Informing Science Institute, v. 1, p. 75–85, 2006. Cited on page 16.
MADEY, G.; FREEH, V.; TYNAN, R. The open source software development phenomenon: An analysis based on social network theory. AMCIS 2002 Proceedings, p. 247, 2002. Cited on page 21.
MCINTOSH, S. et al. An empirical study of the impact of modern code review practices on software quality. Empirical Softw. Engg., Kluwer Academic Publishers, v. 21, n. 5, p. 2146–2189, 2016. Cited on page 35.
MEHDI, T. et al. Kernel smoothing for ROC curve and estimation for thyroid stimulating hormone. International Journal of Public Health Research, Universiti Kebangsaan Malaysia, p. 239–242, 2011. Cited on page 32.
MEYER, M. Continuous integration and its tools. IEEE Softw., IEEE, v. 31, n. 3, p. 14–16, 2014. Cited 2 times on pages 23 and 24.
NAGAPPAN, N.; BALL, T. Use of relative code churn measures to predict system defect density. In: IEEE. Software Engineering, 2005. ICSE 2005. Proceedings. 27th International Conference on. [S.l.], 2005. p. 284–292. Cited on page 48.
PERRY, D. E.; PORTER, A. A.; VOTTA, L. G. Empirical studies of software engineering: A roadmap. In: Proceedings of the Conference on The Future of Software Engineering. [S.l.]: ACM, 2000. (ICSE '00), p. 345–355. Cited on page 76.
ROMANO, J. et al. Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the NSSE and other surveys? In: Annual Meeting of the Florida Association of Institutional Research. [S.l.: s.n.], 2006. p. 1–3. Cited on page 29.
SARLE, W. The VARCLUS procedure. SAS/STAT User's Guide. SAS Institute, Inc., Cary, NC, USA, 1990. Cited on page 34.
SCHROTER, A. et al. Do stack traces help developers fix bugs? In: IEEE. Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on. [S.l.], 2010. p. 118–121. Cited on page 47.
SCHWABER, K. SCRUM development process. In: SUTHERLAND, D. J. et al. (Ed.). Business Object Design and Implementation. [S.l.]: Springer London, 1997. p. 117–134. Cited on page 14.
SHIHAB, E. et al. Predicting re-opened bugs: A case study on the Eclipse project. In: IEEE. Reverse Engineering (WCRE), 2010 17th Working Conference on. [S.l.], 2010. p. 249–258. Cited on page 47.
SOUZA, R.; CHAVEZ, C.; BITTENCOURT, R. A. Do rapid releases affect bug reopening? A case study of Firefox. In: IEEE. Software Engineering (SBES), 2014 Brazilian Symposium on. [S.l.], 2014. p. 31–40. Cited on page 15.
STÅHL, D.; BOSCH, J. Experienced benefits of continuous integration in industry software product development: A case study. In: The 12th IASTED International Conference on Software Engineering (Innsbruck, Austria, 2013). [S.l.: s.n.], 2013. p. 736–743. Cited on page 80.

STÅHL, D.; BOSCH, J. Automated software integration flows in industry: a multiple-case study. In: ACM. Companion Proceedings of the 36th International Conference on Software Engineering. [S.l.], 2014. p. 54–63. Cited on page 80.
STÅHL, D.; BOSCH, J. Modeling continuous integration practice differences in industry software development. J. Syst. Softw., Elsevier Science Inc., v. 87, p. 48–59, 2014. Cited 5 times on pages 16, 24, 29, 30, and 50.
TIWARI, V. Some observations on open source software development on software engineering perspectives. International Journal of Computer Science & Information Technology (IJCSIT), v. 2, n. 6, p. 113–125, 2010. Cited on page 21.
VASILESCU, B. et al. Continuous integration in a social-coding world: Empirical evidence from GitHub. In: IEEE. Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on. [S.l.], 2014. p. 401–405. Cited 4 times on pages 14, 16, 80, and 81.
VASILESCU, B. et al. Quality and productivity outcomes relating to continuous integration in GitHub. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015. [S.l.: s.n.], 2015. Cited 8 times on pages 15, 16, 21, 22, 24, 41, 42, and 81.
VEEN, E. V. D.; GOUSIOS, G.; ZAIDMAN, A. Automatically prioritizing pull requests. In: IEEE PRESS. Proceedings of the 12th Working Conference on Mining Software Repositories. [S.l.], 2015. p. 357–361. Cited on page 31.
VIRMANI, M. Understanding DevOps bridging the gap from continuous integration to continuous delivery. In: Fifth International Conference on the Innovative Computing Technology (INTECH 2015). [S.l.: s.n.], 2015. p. 78–82. Cited on page 24.
WILKS, D. S. Statistical methods in the atmospheric sciences. [S.l.]: Academic Press, 2011. v. 100. Cited 2 times on pages 29 and 30.
WILLIAMSON, D. F.; PARKER, R. A.; KENDRICK, J. S. The box plot: A simple visual method to interpret data. Annals of Internal Medicine, v. 110, p. 916–921, 1989. Cited on page 30.
WNUK, K.; GORSCHEK, T.; ZAHDA, S. Obsolete software requirements. Information and Software Technology, v. 55, n. 6, p. 921–940, 2013. Cited on page 14.
WOHLIN, C.; XIE, M.; AHLGREN, M. Reducing time to market through optimization with respect to soft factors. In: The Engineering Management Conference. [S.l.: s.n.], 1995. p. 116–121. Cited on page 14.
WU, F.; WILKINSON, D. M.; HUBERMAN, B. A. Feedback loops of attention in peer production. In: IEEE. Computational Science and Engineering, 2009. CSE'09. International Conference on. [S.l.], 2009. v. 4, p. 409–415. Cited on page 16.
YU, Y. et al. Wait for it: Determinants of pull request evaluation latency on GitHub. In: IEEE. Mining Software Repositories (MSR), 2015 IEEE/ACM 12th Working Conference on. [S.l.], 2015. p. 367–371. Cited on page 31.
YU, Y. et al. Determinants of pull-based development in the context of continuous integration. Sci. China Inf. Sci., Science China Press, v. 59, n. 8, 2016. Cited 5 times on pages 16, 23, 24, 31, and 81.

APPENDIX A – Studied Projects
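The table below lists the 90 studied projects, their primary programming language, and the number of merged pull requests (PRs) submitted before and after each project adopted CI. As a rough illustration of how the "before CI" and "after CI" counts can be derived, the following minimal Python sketch splits merged PRs around a known CI adoption date; the record fields, dates, and the use of a single adoption date are illustrative assumptions, not the exact mining pipeline of this dissertation.

    from datetime import datetime

    # Illustrative CI adoption date and merged-PR records; in practice these
    # would come from the project's history (e.g., GHTorrent and the CI service).
    ci_adoption = datetime(2013, 5, 1)  # hypothetical adoption date
    merged_prs = [
        {"number": 101, "merged_at": datetime(2012, 11, 3)},
        {"number": 456, "merged_at": datetime(2014, 2, 17)},
    ]

    # Split the merged PRs exactly as the table's columns imply: by comparing
    # each PR's merge date against the CI adoption date.
    before_ci = [pr for pr in merged_prs if pr["merged_at"] < ci_adoption]
    after_ci = [pr for pr in merged_prs if pr["merged_at"] >= ci_adoption]

    print(len(before_ci), len(after_ci), len(merged_prs))  # before, after, total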

# | Project | Language | PRs before CI / PRs after CI / Total of PRs
1 | Yelp/mrjob | Python | 168 274 442
2 | yiisoft/yii | PHP | 148 645 793
3 | roots/sage | PHP | 282 156 438
4 | vanilla/vanilla | PHP | 112 988 1100
5 | processing/p5.js | JavaScript | 100 335 435
6 | bokeh/bokeh | Python | 134 1419 1553
7 | serverless/serverless | JavaScript | 285 494 779
8 | craftyjs/Crafty | JavaScript | 305 129 434
9 | invoiceninja/invoiceninja | JavaScript | 288 143 431
10 | scikit-image/scikit-image | Python | 219 859 1078
11 | dropwizard/dropwizard | Java | 157 560 717
12 | androidannotations/androidannotations | Java | 215 198 413
13 | aframevr/aframe | JavaScript | 124 254 378
14 | jashkenas/backbone | JavaScript | 252 452 704
15 | openhab/openhab | Java | 1008 506 1514
16 | bcit-ci/CodeIgniter | PHP | 287 560 847
17 | mizzy/serverspec | Ruby | 109 266 375
18 | spinnaker/spinnaker | Python | 211 160 371
19 | sensu/sensu | Ruby | 230 473 703
20 | cython/cython | Python | 104 256 360
21 | buildbot/buildbot | Python | 111 913 1024
22 | jsbin/jsbin | JavaScript | 160 531 691
23 | PokemonGoF/PokemonGo-Bot | Python | 126 216 342
24 | naver/pinpoint | Java | 109 1302 1411
25 | siacs/Conversations | Java | 211 113 324
26 | photonstorm/phaser | JavaScript | 163 504 667
27 | fchollet/keras | Python | 118 211 329
28 | robolectric/robolectric | Java | 204 735 939
29 | TelescopeJS/Telescope | JavaScript | 156 141 297
30 | andypetrella/spark-notebook | JavaScript | 150 143 293
31 | apache/incubator-airflow | Python | 229 392 621
32 | ReactiveX/RxJava | Java | 736 662 1398
33 | driftyco/ng-cordova | JavaScript | 115 175 290
34 | haraka/Haraka | JavaScript | 328 600 928
35 | isagalaev/highlight.js | JavaScript | 105 158 263
36 | bundler/bundler | Ruby | 193 410 603
37 | humhub/humhub | PHP | 114 138 252
38 | square/picasso | Java | 115 116 231
39 | Netflix/Hystrix | Java | 146 440 586
40 | dropwizard/metrics | Java | 121 102 223
41 | refinery/refinerycms | Ruby | 497 427 924
42 | gollum/gollum | JavaScript | 116 105 221
43 | jhipster/generator-jhipster | JavaScript | 107 1257 1364
44 | mapbox/mapbox-gl-js | JavaScript | 106 105 211
45 | request/request | JavaScript | 151 400 551
46 | alohaeditor/Aloha-Editor | JavaScript | 272 204 476
47 | boto/boto | Python | 205 715 920
48 | grails/grails-core | Java | 298 166 464
49 | Pylons/pyramid | Python | 138 1198 1336

50 | mantl/mantl | Python | 313 147 460
51 | ether/etherpad-lite | JavaScript | 317 602 919
52 | jashkenas/underscore | JavaScript | 118 332 450
53 | apereo/cas | Java | 227 675 902
54 | kivy/kivy | Python | 157 1136 1293
55 | elastic/logstash | Ruby | 458 447 905
56 | getsentry/sentry | Python | 121 1051 1172
57 | hapijs/hapi | JavaScript | 402 466 868
58 | HabitRPG/habitica | JavaScript | 177 986 1163
59 | pyrocms/pyrocms | PHP | 279 611 890
60 | BabylonJS/Babylon.js | JavaScript | 259 566 825
61 | Leaflet/Leaflet | JavaScript | 290 874 1164
62 | laravel/laravel | PHP | 300 500 800
63 | zurb/foundation-sites | JavaScript | 479 1163 1642
64 | callemall/material-ui | JavaScript | 350 1323 1673
65 | loomio/loomio | Ruby | 366 1420 1786
66 | scikit-learn/scikit-learn | Python | 323 1489 1812
67 | frappe/erpnext | Python | 241 1866 2107
68 | Theano/Theano | Python | 616 1882 2498
69 | puppetlabs/puppet | Ruby | 247 3008 3255
70 | chef/chef | Ruby | 103 1684 1787
71 | woocommerce/woocommerce | PHP | 1342 1150 2492
72 | divio/django-cms | Python | 131 1913 2044
73 | scipy/scipy | Python | 287 1612 1899
74 | matplotlib/matplotlib | Python | 396 2074 2470
75 | sympy/sympy | Python | 792 2316 3108
76 | twbs/bootstrap | JavaScript | 126 1824 1950
77 | AnalyticalGraphicsInc/cesium | JavaScript | 515 547 1062
78 | elastic/kibana | JavaScript | 235 1686 1921
79 | mozilla/pdf.js | JavaScript | 1029 1875 2904
80 | appcelerator/titanium_mobile | JavaScript | 5492 1328 6820
81 | StackStorm/st2 | Python | 745 1554 2299
82 | TryGhost/Ghost | JavaScript | 299 2275 2574
83 | fog/fog | Ruby | 479 1694 2173
84 | ansible/ansible | Python | 3150 1462 4612
85 | ipython/ipython | Python | 504 3273 3777
86 | cakephp/cakephp | PHP | 287 3653 3940
87 | owncloud/core | PHP | 2680 5623 8303
88 | rails/rails | Ruby | 525 9866 10391
89 | mozilla-b2g/gaia | JavaScript | 4369 17691 22060
90 | saltstack/salt | Python | 457 19366 19823

APPENDIX B – R² and R² optimism for the linear models
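In the table below, "R² reduced" is the apparent R² minus the bootstrap-estimated R² optimism (for example, for Yelp/mrjob under CI, 0.452 - 0.069 = 0.383). The following minimal Python sketch shows one way to estimate such an optimism correction in the spirit of Efron (1986) and Harrell (2015); it assumes numpy/scikit-learn and an ordinary least squares model, which may differ from the exact model-fitting setup used in this dissertation.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    def optimism_reduced_r2(X, y, n_boot=1000, seed=1):
        """Bootstrap estimate of R^2 optimism for a linear model (a sketch).

        Returns (apparent R^2, estimated optimism, optimism-reduced R^2),
        mirroring the three columns reported per model in the table below.
        """
        rng = np.random.default_rng(seed)
        apparent = r2_score(y, LinearRegression().fit(X, y).predict(X))
        optimisms = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), len(y))  # resample rows with replacement
            model = LinearRegression().fit(X[idx], y[idx])
            boot_r2 = r2_score(y[idx], model.predict(X[idx]))  # fit-sample R^2
            orig_r2 = r2_score(y, model.predict(X))            # same model on original data
            optimisms.append(boot_r2 - orig_r2)
        optimism = float(np.mean(optimisms))
        return apparent, optimism, apparent - optimism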

# | Project | CI: R² / R² optimism / R² reduced | NO-CI: R² / R² optimism / R² reduced
1 | Yelp/mrjob | 0.452 0.069 0.383 | 0.488 0.08 0.410
2 | yiisoft/yii | 0.606 0.026 0.580 | 0.907 0.04 0.867
3 | roots/sage | 0.184 0.129 0.055 | 0.553 0.04 0.511
4 | vanilla/vanilla | 0.131 0.036 0.096 | 0.897 0.02 0.877
5 | processing/p5.js | 0.119 0.034 0.085 | 0.885 0.05 0.830
6 | bokeh/bokeh | 0.597 0.009 0.588 | 0.875 0.19 0.681
7 | serverless/serverless | 0.958 0.003 0.954 | 0.883 0.02 0.866
8 | craftyjs/Crafty | 0.903 0.015 0.888 | 0.361 0.31 0.049
9 | invoiceninja/invoiceninja | 0.823 0.088 0.734 | 0.402 0.18 0.224
10 | scikit-image/scikit-image | 0.714 0.008 0.707 | 0.586 0.08 0.511
11 | dropwizard/dropwizard | 0.736 0.010 0.725 | 0.759 0.03 0.730
12 | androidannotations/androidannotations | 0.928 0.017 0.911 | 0.980 0.00 0.978
13 | aframevr/aframe | 0.809 0.723 0.087 | 0.208 0.14 0.069
14 | jashkenas/backbone | 0.806 0.007 0.799 | 0.057 0.09 -0.029
15 | openhab/openhab | 0.934 0.007 0.927 | 0.435 0.04 0.400
16 | bcit-ci/CodeIgniter | 0.845 0.008 0.837 | 0.452 0.14 0.310
17 | mizzy/serverspec | 0.180 0.076 0.103 | 0.045 0.20 -0.151
18 | spinnaker/spinnaker | 0.259 0.212 0.047 | 0.437 0.16 0.274
19 | sensu/sensu | 0.298 0.072 0.226 | 0.224 0.14 0.081
20 | buildbot/buildbot | 0.816 0.010 0.806 | 0.581 0.23 0.348
21 | jsbin/jsbin | 0.102 0.035 0.066 | 0.587 0.24 0.349
22 | PokemonGoF/PokemonGo-Bot | 0.330 0.095 0.236 | 0.093 0.75 -0.659
23 | naver/pinpoint | 0.397 0.011 0.387 | 0.526 1.05 -0.520
24 | siacs/Conversations | 0.191 0.759 -0.568 | 0.628 0.06 0.565
25 | photonstorm/phaser | 0.701 0.028 0.673 | 0.344 0.40 -0.057
26 | fchollet/keras | 0.741 0.040 0.701 | 0.467 0.15 0.314
27 | robolectric/robolectric | 0.621 0.017 0.604 | 0.765 0.03 0.732
28 | TelescopeJS/Telescope | 0.970 0.003 0.967 | 0.512 0.10 0.416
29 | andypetrella/spark-notebook | 0.682 0.111 0.571 | 0.775 0.13 0.648
30 | apache/incubator-airflow | 0.449 0.036 0.413 | 0.797 0.12 0.677
31 | ReactiveX/RxJava | 0.870 0.020 0.850 | 0.176 0.07 0.111
32 | driftyco/ng-cordova | 0.211 0.071 0.141 | 0.158 0.47 -0.315
33 | haraka/Haraka | 0.768 0.022 0.747 | 0.544 0.07 0.472
34 | isagalaev/highlight.js | 0.183 0.069 0.114 | 0.338 0.31 0.023
35 | bundler/bundler | 0.528 0.040 0.488 | 0.630 0.06 0.571
36 | humhub/humhub | 0.861 0.032 0.830 | 0.304 0.12 0.181
37 | square/picasso | 0.847 0.053 0.794 | 0.658 0.06 0.598
38 | Netflix/Hystrix | 0.479 0.033 0.446 | 0.907 0.03 0.881
39 | dropwizard/metrics | 0.855 0.019 0.836 | 0.641 0.08 0.564
40 | refinery/refinerycms | 0.849 0.127 0.722 | 0.365 0.05 0.313
41 | gollum/gollum | 0.606 16.788 -16.182 | 0.471 0.27 0.198
42 | jhipster/generator-jhipster | 0.838 0.003 0.834 | 0.723 0.38 0.344
43 | mapbox/mapbox-gl-js | 0.918 0.025 0.893 | 0.482 0.29 0.192
44 | request/request | 0.229 0.064 0.165 | 0.452 0.20 0.252
45 | alohaeditor/Aloha-Editor | 0.335 0.845 -0.509 | 0.452 0.07 0.377

46 | boto/boto | 0.247 0.047 0.200 | 0.327 0.21 0.122
47 | grails/grails-core | 0.084 0.108 -0.024 | 0.523 0.12 0.403
48 | Pylons/pyramid | 0.319 0.029 0.290 | 0.679 0.39 0.284
49 | mantl/mantl | 0.258 0.113 0.145 | 0.092 0.06 0.028
50 | ether/etherpad-lite | 0.237 0.041 0.196 | 0.372 0.04 0.335
51 | jashkenas/underscore | 0.737 0.040 0.697 | 0.903 0.06 0.842
52 | apereo/cas | 0.537 0.014 0.523 | 0.146 0.12 0.025
53 | kivy/kivy | 0.967 0.001 0.965 | 0.433 0.11 0.327
54 | elastic/logstash | 0.414 0.080 0.333 | 0.152 0.09 0.067
55 | getsentry/sentry | 0.454 0.021 0.433 | 0.069 0.21 -0.143
56 | hapijs/hapi | 0.427 0.028 0.399 | 0.248 0.04 0.206
57 | HabitRPG/habitica | 0.660 0.077 0.582 | 0.416 0.07 0.346
58 | pyrocms/pyrocms | 0.607 0.017 0.590 | 0.137 0.25 -0.112
59 | BabylonJS/Babylon.js | 0.866 0.006 0.860 | 0.280 0.28 -0.004
60 | Leaflet/Leaflet | 0.542 0.029 0.513 | 0.966 0.02 0.941
61 | laravel/laravel | 0.472 0.019 0.453 | 0.584 0.10 0.480
62 | zurb/foundation-sites | 0.087 0.094 -0.007 | 0.451 0.04 0.415
63 | callemall/material-ui | 0.786 0.004 0.782 | 0.284 0.13 0.156
64 | loomio/loomio | 0.720 0.005 0.715 | 0.879 0.01 0.868
65 | scikit-learn/scikit-learn | 0.079 0.046 0.033 | 0.340 0.11 0.226
66 | frappe/erpnext | 0.756 0.006 0.750 | 0.733 0.03 0.704
67 | Theano/Theano | 0.817 0.002 0.814 | 0.698 0.03 0.670
68 | puppetlabs/puppet | 0.516 0.004 0.512 | 0.288 0.14 0.151
69 | chef/chef | 0.426 0.008 0.418 | 0.554 0.06 0.495
70 | woocommerce/woocommerce | 0.624 0.009 0.614 | 0.517 0.04 0.478
71 | divio/django-cms | 0.403 0.009 0.394 | 0.477 0.50 -0.022
72 | scipy/scipy | 0.137 0.028 0.109 | 0.273 0.11 0.165
73 | matplotlib/matplotlib | 0.511 0.006 0.505 | 0.363 0.14 0.218
74 | sympy/sympy | 0.261 0.008 0.253 | 0.278 0.08 0.200
75 | twbs/bootstrap | 0.399 0.011 0.388 | 0.075 0.26 -0.189
76 | AnalyticalGraphicsInc/cesium | 0.701 0.019 0.682 | 0.885 0.00 0.881
77 | elastic/kibana | 0.413 0.012 0.400 | 0.379 0.36 0.019
78 | mozilla/pdf.js | 0.455 0.007 0.449 | 0.751 0.00 0.747
79 | appcelerator/titanium_mobile | 0.566 0.009 0.557 | 0.241 0.00 0.238
80 | StackStorm/st2 | 0.225 0.011 0.215 | 0.901 0.01 0.890
81 | TryGhost/Ghost | 0.548 0.010 0.538 | 0.733 0.02 0.710
82 | fog/fog | 0.358 0.016 0.342 | 0.192 0.22 -0.026
83 | ansible/ansible | 0.600 0.009 0.590 | 0.581 0.01 0.567
84 | ipython/ipython | 0.760 0.002 0.758 | 0.660 0.03 0.634
85 | cakephp/cakephp | 0.333 0.007 0.326 | 0.407 0.06 0.348
86 | owncloud/core | 0.566 0.002 0.564 | 0.629 0.00 0.626
87 | rails/rails | 0.194 0.001 0.193 | 0.075 0.06 0.016
88 | mozilla-b2g/gaia | 0.513 0.001 0.512 | 0.763 0.00 0.760
89 | saltstack/salt | 0.573 0.001 0.572 | 0.306 0.04 0.270

APPENDIX C – Percentage of delivered pull requests per project in the next and later release buckets
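The "% next" and "% later" columns below report the share of each project's merged PRs that were delivered in the first release after their merge versus a later release. The minimal Python sketch below reproduces this bucketing for a toy project, assuming a sorted list of release dates and, for each merged PR, its merge date and the date of the release that first shipped it; all names and dates are illustrative, not data from the study.

    import bisect
    from datetime import datetime

    # Illustrative sorted release dates and (merge date, shipping release date)
    # pairs for merged PRs; real data would come from the project's release history.
    releases = [datetime(2016, 1, 10), datetime(2016, 4, 2), datetime(2016, 7, 20)]
    prs = [
        (datetime(2016, 2, 1), datetime(2016, 4, 2)),   # shipped in the next release
        (datetime(2016, 2, 5), datetime(2016, 7, 20)),  # skipped one release: "later"
    ]

    def bucket(merged_at, shipped_at):
        # The first release strictly after the merge date is the "next" release;
        # delivery in any subsequent release counts as "later". (PRs merged after
        # the last known release would need extra handling.)
        i = bisect.bisect_right(releases, merged_at)
        return "next" if releases[i] == shipped_at else "later"

    counts = {"next": 0, "later": 0}
    for merged_at, shipped_at in prs:
        counts[bucket(merged_at, shipped_at)] += 1
    total = sum(counts.values())
    print({k: round(100 * v / total, 1) for k, v in counts.items()})  # % next / % later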

# | Project | CI: % next / % later / Total of PRs | NO-CI: % next / % later / Total of PRs
1 | Yelp/mrjob | 84.3 15.7 274 | 72.6 27.4 168
2 | yiisoft/yii | 80.2 19.8 645 | 95.9 4.1 148
3 | roots/sage | 98.1 1.9 156 | 92.6 7.4 282
4 | vanilla/vanilla | 4.5 95.5 988 | 9.8 90.2 112
5 | processing/p5.js | 91.6 8.4 335 | 99 1 100
6 | bokeh/bokeh | 93.4 6.6 1419 | 88.8 11.2 134
7 | serverless/serverless | 86.0 14.0 494 | 74.0 26.0 285
8 | craftyjs/Crafty | 98.4 1.6 129 | 91.8 8.2 305
9 | invoiceninja/invoiceninja | 52.4 47.6 143 | 95.1 4.9 288
10 | scikit-image/scikit-image | 92.8 7.2 859 | 87.2 12.8 219
11 | dropwizard/dropwizard | 43.9 56.1 560 | 85.4 14.6 157
12 | androidannotations/androidannotations | 75.3 24.7 198 | 81.4 18.6 215
13 | aframevr/aframe | 60.6 39.4 254 | 64.5 35.5 124
14 | jashkenas/backbone | 94.7 5.3 452 | 98.8 1.2 252
15 | openhab/openhab | 70.6 29.4 506 | 71.8 28.2 1008
16 | bcit-ci/CodeIgniter | 0.4 99.6 560 | 0.3 99.7 287
17 | mizzy/serverspec | 89.1 10.9 266 | 92.7 7.3 109
18 | spinnaker/spinnaker | 91.9 8.1 160 | 98.6 1.4 211
19 | sensu/sensu | 96.6 3.4 473 | 99.6 0.4 230
20 | cython/cython | 36.7 63.3 256 | 55.8 44.2 104
21 | buildbot/buildbot | 61.8 38.2 913 | 79.3 20.7 111
22 | jsbin/jsbin | 74.6 25.4 531 | 88.1 11.9 160
23 | PokemonGoF/PokemonGo-Bot | 95.4 4.6 216 | 99.2 0.8 126
24 | naver/pinpoint | 43.9 56.1 1302 | 99.1 0.9 109
25 | siacs/Conversations | 89.4 10.6 113 | 87.2 12.8 211
26 | photonstorm/phaser | 87.9 12.1 504 | 92.6 7.4 163
27 | fchollet/keras | 97.6 2.4 211 | 94.9 5.1 118
28 | robolectric/robolectric | 98.0 2.0 735 | 70.1 29.9 204
29 | TelescopeJS/Telescope | 93.6 6.4 141 | 92.9 7.1 156
30 | andypetrella/spark-notebook | 98.6 1.4 143 | 23.3 76.7 150
31 | apache/incubator-airflow | 43.9 56.1 392 | 90.0 10.0 229
32 | ReactiveX/RxJava | 32.2 67.8 662 | 87.0 13.0 736
33 | driftyco/ng-cordova | 84.6 15.4 175 | 93.9 6.1 115
34 | haraka/Haraka | 97.2 2.8 600 | 95.7 4.3 328
35 | isagalaev/highlight.js | 91.8 8.2 158 | 91.4 8.6 105
36 | bundler/bundler | 34.9 65.1 410 | 2.6 97.4 193
37 | humhub/humhub | 87.7 12.3 138 | 87.7 12.3 114
38 | square/picasso | 98.3 1.7 116 | 92.2 7.8 115
39 | Netflix/Hystrix | 68.2 31.8 440 | 72.6 27.4 146
40 | dropwizard/metrics | 69.6 30.4 102 | 74.4 25.6 121
41 | refinery/refinerycms | 60.9 39.1 427 | 35.6 64.4 497
42 | gollum/gollum | 98.1 1.9 105 | 94.8 5.2 116
43 | jhipster/generator-jhipster | 79.4 20.6 1257 | 97.2 2.8 107

44 | mapbox/mapbox-gl-js | 85.7 14.3 105 | 94.3 5.7 106
45 | request/request | 90.5 9.5 400 | 93.4 6.6 151
46 | alohaeditor/Aloha-Editor | 8.3 91.7 204 | 68.8 31.3 272
47 | boto/boto | 83.8 16.2 715 | 93.7 6.3 205
48 | grails/grails-core | 31.3 68.7 166 | 33.9 66.1 298
49 | Pylons/pyramid | 35.9 64.1 1198 | 58.0 42.0 138
50 | mantl/mantl | 89.1 10.9 147 | 92.0 8.0 313
51 | ether/etherpad-lite | 79.9 20.1 602 | 95.6 4.4 317
52 | jashkenas/underscore | 95.8 4.2 332 | 100 0 118
53 | apereo/cas | 8 92 675 | 22.9 77.1 227
54 | kivy/kivy | 89.8 10.2 1136 | 96.2 3.8 157
55 | elastic/logstash | 79.4 20.6 447 | 81.7 18.3 458
56 | getsentry/sentry | 52.5 47.5 1051 | 76.9 23.1 121
57 | hapijs/hapi | 84.5 15.5 466 | 82.6 17.4 402
58 | HabitRPG/habitica | 97.4 2.6 986 | 87.6 12.4 177
59 | pyrocms/pyrocms | 59.6 40.4 611 | 77.8 22.2 279
60 | BabylonJS/Babylon.js | 99.8 0.2 566 | 97.7 2.3 259
61 | Leaflet/Leaflet | 36.0 64.0 874 | 60.7 39.3 290
62 | laravel/laravel | 74 26 500 | 76.7 23.3 300
63 | zurb/foundation-sites | 83.1 16.9 1163 | 84.8 15.2 479
64 | callemall/material-ui | 76.8 23.2 1323 | 81.7 18.3 350
65 | loomio/loomio | 96.2 3.8 1420 | 32.8 67.2 366
66 | scikit-learn/scikit-learn | 66.0 34.0 1489 | 83.9 16.1 323
67 | frappe/erpnext | 65.4 34.6 1866 | 97.9 2.1 241
68 | Theano/Theano | 98.8 1.2 1882 | 97.1 2.9 616
69 | puppetlabs/puppet | 36.7 63.3 3008 | 15.4 84.6 247
70 | chef/chef | 21.6 78.4 1684 | 4.9 95.1 103
71 | woocommerce/woocommerce | 36.6 63.4 1150 | 58.7 41.3 1342
72 | divio/django-cms | 43.0 57.0 1913 | 45.0 55.0 131
73 | scipy/scipy | 23.4 76.6 1612 | 38.0 62.0 287
74 | matplotlib/matplotlib | 72.3 27.7 2074 | 64.9 35.1 396
75 | sympy/sympy | 68.4 31.6 2316 | 88.3 11.7 792
76 | twbs/bootstrap | 85.1 14.9 1824 | 98.4 1.6 126
77 | AnalyticalGraphicsInc/cesium | 0 100 547 | 2.3 97.7 515
78 | elastic/kibana | 27.9 72.1 1686 | 72.8 27.2 235
79 | mozilla/pdf.js | 92.1 7.9 1875 | 97.8 2.2 1029
80 | appcelerator/titanium_mobile | 41.2 58.8 1328 | 45.2 54.8 5492
81 | StackStorm/st2 | 74.2 25.8 1554 | 99.5 0.5 745
82 | TryGhost/Ghost | 90.4 9.6 2275 | 87.3 12.7 299
83 | fog/fog | 89.8 10.2 1694 | 95.2 4.8 479
84 | ansible/ansible | 10.4 89.6 1462 | 45.4 54.6 3150
85 | ipython/ipython | 22.6 77.4 3273 | 0 100 504
86 | cakephp/cakephp | 22.6 77.4 3653 | 43.2 56.8 287
87 | owncloud/core | 34.3 65.7 5623 | 18.1 81.9 2680
88 | rails/rails | 11.3 88.7 9866 | 1.7 98.3 525
89 | mozilla-b2g/gaia | 10.7 89.3 17691 | 94.6 5.4 4369
90 | saltstack/salt | 24.2 75.8 19366 | 47.5 52.5 457