Federal University of Rio Grande do Norte
Center of Exact and Earth Sciences
Department of Informatics and Applied Mathematics
Graduate Program in Systems and Computing
Academic Master's Degree in Systems and Computing

The Impact of Adopting Continuous Integration on the Delivery Time of Merged Pull Requests: An Empirical Study

João Helis Junior de Azevedo Bernardo

Natal, Brazil
July, 2017

João Helis Junior de Azevedo Bernardo

The Impact of Adopting Continuous Integration on the Delivery Time of Merged Pull Requests: An Empirical Study

A dissertation submitted to the Graduate Program in Systems and Computing of the Center of Exact and Earth Sciences in conformity with the requirements for the degree of Master in Systems and Computing.

PPgSC - Graduate Program in Systems and Computing
DIMAp - Department of Informatics and Applied Mathematics
UFRN - Federal University of Rio Grande do Norte

Advisor: Uirá Kulesza
Co-Advisor: Daniel Alencar da Costa

Natal, Brazil July, 2017


Acknowledgements

First and foremost, I would like to thank God, the Almighty, for giving me strength and support throughout this quest for knowledge, and especially for showing me the way forward in the most difficult moments of my life. Without His blessings, I certainly would not have come this far.

My deep gratitude goes to my parents, João Helis Bernardo and Rosilda de Azevedo Bernardo, and to my sister, Juliana Raffaely de Azevedo Bernardo. Without their love, dedication, and support in every single part of my life, I would not be who I am. Thank you for teaching me that I can never give up on my dreams.

I would like to express my deepest gratitude and special thanks to my girlfriend, Milenna Veríssimo, for her love, support, and constant patience. Thank you for always encouraging me to be a better man. I love you.

I would like to express my sincere gratitude to my advisor, Uirá Kulesza, who gave me the opportunity to work with him and expertly guided me along the path I walked during my master's degree. I would also like to thank my co-advisor and friend, Daniel Alencar da Costa, for mentoring me and providing all the support I needed to conduct the studies performed in this dissertation. Without his precious guidance, I would not have been able to bring this work to its current state.

I would like to extend my appreciation to my laboratory colleagues, Leo Moreira, Fabio Penha, and Eduardo Nascimento, who helped to lighten the pressures that we faced together in the final stages of our master's degrees by providing moments of knowledge sharing and fun through the so-called "coffee time".

Finally, I am very grateful to CNPq for the financial support.

Society must learn that we Indians can and should use technology and information in our everyday activities. That does not make us any less Indian. Being Indian is in the blood that flows through our veins, not in the clothing and utensils that we use or in any external characteristic.

Abstract

Continuous Integration (CI) is a software development practice that leads developers to integrate their work more frequently. Software projects have broadly adopted CI to ship new releases more frequently and to improve code integration. The adoption of CI is usually motivated by the allure of delivering new software content more quickly and frequently. However, there is little empirical evidence to support such claims. Over the last years, many software projects hosted in social coding environments such as GitHub have adopted the CI practice using CI services that are integrated into these environments (e.g., Travis-CI). In this dissertation, we empirically investigate the impact of adopting CI on the time-to-delivery of pull requests (PRs), through the analysis of 167,037 PRs of 90 GitHub projects that are implemented in 5 different programming languages. When analyzing the percentage of merged PRs per project that missed at least one release prior to being delivered to end users, the results show that before adopting CI, a median of 13.8% of merged PRs are postponed by at least one release, while after adopting CI, a median of 24% of merged PRs have their delivery postponed to future releases. Contrary to what one might speculate, we find that PRs tend to wait longer to be delivered after the adoption of CI in the majority (53%) of the studied projects. The large increase in PR submissions after CI is a key reason why these projects deliver PRs more slowly after adopting CI: 77.8% of the projects increase the rate of PR submissions after adopting CI. To investigate the factors that are related to the time-to-delivery of merged PRs, we train linear and logistic regression models, which obtain sound median R-squared values of 0.72–0.74 and good median AUC values of 0.85–0.90. A deeper analysis of our models suggests that, before and after the adoption of CI, the intensity of code contributions to a release may increase the delivery time due to a higher integration load (in terms of integrated commits) on the development team. Finally, we are able to accurately identify merged pull requests that have a prolonged delivery time: our regression models obtain median AUC values of 0.92 to 0.97.

Keywords: Continuous Integration; Pull-based Development; Pull Request; Delivery Time; Delivery Delay; Mining Software Repositories.

List of Figures

Figure 1 – An overview of the scope of the dissertation.
Figure 2 – An overview of the pull-based development model that is integrated with Continuous Integration.
Figure 3 – An illustrative example of how we compute delivery time in terms of days.
Figure 4 – An illustrative example of how we compute delivery time in terms of releases.
Figure 5 – The basic life-cycle of a released pull request.
Figure 6 – Training Linear and Logistic Regression Models.
Figure 7 – Percentage of merged pull requests that have a long delivery time.
Figure 8 – An overview of our project selection process.
Figure 9 – Number of projects grouped by programming language.
Figure 10 – An overview of our data collection process.
Figure 11 – Distribution of pull requests per bucket before and after continuous integration.
Figure 12 – Number of days between the studied releases of the projects, before and after continuous integration.
Figure 13 – Merge timing metric. We present the distribution of the merge timing metric for merged pull requests that are prevented from integration in at least one release.
Figure 14 – The required number of days to merge and deliver pull requests (pull request lifetime).
Figure 15 – Pull request submission, merge, and delivery rates per release.
Figure 16 – Number of pull request submissions (per release) before and after the adoption of continuous integration.
Figure 17 – Distribution of the Brier score and the Brier optimism of the models before and after CI.
Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI.
Figure 19 – Distributions of models' R² and R² optimism.
Figure 20 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of releases).
Figure 21 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of releases).
Figure 22 – The relationship between the most influential variables and delivery time in terms of releases.
Figure 23 – The number of models per most influential variables (Delivery Time in terms of releases).
Figure 24 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of days).
Figure 25 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of days).
Figure 26 – The number of models per most influential variables (Delivery Time in terms of days).
Figure 27 – The relationship between the most influential variables and delivery time in terms of days.
Figure 28 – Distribution of the Brier score and the Brier optimism of the models before and after CI.
Figure 29 – Distribution of the AUC and the AUC optimism of the models before and after CI.
Figure 30 – Explanatory power of variables before adopting continuous integration (Prolonged delivery time analysis).
Figure 31 – Explanatory power of variables after adopting continuous integration (Prolonged delivery time analysis).
Figure 32 – The number of models per most influential variables (Prolonged delivery time analysis).

List of Tables

Table 1 – Long delivery time thresholds (PART I).
Table 2 – Long delivery time thresholds (PART II).
Table 3 – Summary of the number of projects and released pull requests grouped by programming language.
Table 4 – Metrics that are used in our explanatory models (Contributor, Pull Request and Project families).
Table 5 – Metrics that are used in our explanatory models (Process family).
Table 6 – Brier score and AUC values for the models that we fitted using pull request data from before continuous integration.
Table 7 – Brier score and AUC values for the models that we fitted using pull request data from after continuous integration.
Table 8 – R² and R² optimism values for the linear models that we fitted using pull request data from before continuous integration.
Table 9 – R² and R² optimism values for the linear models that we fitted using pull request data from after continuous integration.
Table 10 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of releases).
Table 11 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of days).
Table 12 – Brier score and AUC values for the models that we fitted using pull request data from before the adoption of continuous integration.
Table 13 – Brier score and AUC values for the models that we fitted using pull request data from after the adoption of continuous integration.
Table 14 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Prolonged delivery time analysis).

List of abbreviations and acronyms

OSS Open Source Software

OSD Open Source Definition

CI Continuous Integration

ITS Issue Tracker System

PR Pull Request

XP Extreme Programming

ARE Agile Release Engineering

DVCS Distributed Version Control Systems

DF Degrees of Freedom

Contents

1 INTRODUCTION
1.1 Problem Statement
1.2 Current Research Limitations
1.3 Dissertation Proposal
1.3.1 Chronology of Analyses
1.4 Dissertation Contributions
1.5 Dissertation Organization

2 BACKGROUND & DEFINITIONS
2.1 The pull-based development model
2.2 Continuous Integration
2.3 Delivery Time
2.4 Chapter Summary

3 EMPIRICAL STUDY
3.1 Research Questions
3.1.1 RQ1: How often are merged pull requests prevented from being released?
3.1.2 RQ2: Are pull requests released more quickly using continuous integration?
3.1.3 RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of pull requests?
3.1.4 RQ4: How well can we model the delivery time of merged pull requests?
3.1.5 RQ5: What are the most influential attributes for modeling delivery time?
3.1.6 RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?
3.1.7 RQ7: What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?
3.2 Studied Projects
3.3 Data Collection
3.4 Chapter Summary

4 STUDY RESULTS
4.1 Analysis I — What is the impact of continuous integration on the delivery time of pull requests?
4.2 Analysis II — What is the impact of continuous integration on the prolonged delivery time?
4.3 Threats to the Validity

5 CONCLUSION
5.1 Dissertation Contributions
5.2 Related Work
5.3 Future Work

BIBLIOGRAPHY

APPENDIX

APPENDIX A – STUDIED PROJECTS
APPENDIX B – R² AND R² OPTIMISM FOR THE LINEAR MODELS
APPENDIX C – PERCENTAGE OF DELIVERED PULL REQUESTS PER PROJECT IN THE NEXT AND LATER RELEASE BUCKETS

1 Introduction

The increasing user demand for new functionalities and performance improvements rapidly changes customer requirements and turns software development into a competitive market (WNUK; GORSCHEK; ZAHDA, 2013). In this scenario, software development teams need to deliver new functionalities more quickly to their customers to improve the time-to-market (DEBBICHE; DIENÉR; SVENSSON, 2014; LAUKKANEN; PAASIVAARA; ARVONEN, 2015). This faster delivery may lead customers to become engaged in the project and to give valuable feedback. The failure to provide new functionalities and bug-fixes, on the other hand, may reduce the number of users and the project's success.

Over the last years, agile methodologies, such as Scrum (SCHWABER, 1997) and Extreme Programming (XP) (BECK, 2000), brought a series of practices with the allure of providing more flexible software development and a faster delivery of new software releases. The frequency of releases is one of the factors that may lead a software project to success (CHEN; REILLY; LYNN, 2005; WOHLIN; XIE; AHLGREN, 1995). The releasing frequency may also indicate the vitality level of a software project (CROWSTON; ANNABI; HOWISON, 2003).

To improve the process of shipping new releases, i.e., in terms of software integration and packaging, Continuous Integration (CI) appears as an important practice that may quicken the delivery of new functionalities (LAUKKANEN; PAASIVAARA; ARVONEN, 2015). In addition, continuous integration may reduce problems of code integration in a collaborative environment (VASILESCU et al., 2014). The continuous integration practice has been widely adopted by the software community in open source and industrial settings (DUVALL; MATYAS; GLOVER, 2007). It is especially important for open source projects, given their lack of requirement documents and their geographically distributed teams (VASILESCU et al., 2014). 70% of the most popular GitHub projects use continuous integration, and the percentage of projects that use it is growing (HILTON et al., 2016).

GitHub is considered the most popular code hosting service worldwide (GOUSIOS; SPINELLIS, 2012), with more than 14 million registered users and a wide variety of projects of different programming languages, sizes, and characteristics. Any user can send contributions to any public repository that is hosted on GitHub by sending a pull request (VASILESCU et al., 2015). A pull request is a change proposal to be applied to the project code-base. Pull requests may fix bugs or provide enhancements or new functionalities. In some cases, a pull request is linked to a change request (a.k.a. an issue report) that is registered in the Issue Tracking System (ITS). Pull requests are reviewed by core developers or project integrators and are accepted when the changes are useful and meet the project's pre-set quality standards.

The basic life-cycle of a pull request is comprised of four steps. First, a pull request is submitted to a software project by a contributor. Once submitted, the continuous integration service automatically builds the whole project and runs the test suite to verify whether the pull request breaks the codebase. If all tests pass during the continuous integration process, the integrators thoroughly review the pull request and decide to merge or reject it. A merged pull request is integrated into the project codebase, i.e., a solution is provided, tested, and ready to be delivered to the end users. Finally, the merged pull request is delivered to the end users through an official software release.

1.1 Problem Statement

Once a pull request is merged (i.e., ready to be delivered to the end users of a software system through an official software release), such a pull request may still be delayed before being released. In this dissertation, we use the term delivery time to refer to the delay that merged pull requests suffer prior to their delivery to end users. This delay can be frustrating to end users, because these users care most about when a new functionality is delivered, so they can benefit from it (COSTA et al., 2016). Furthermore, a longer delay to deliver pull requests may lead software projects to lose their users, given the increasing competition between software organizations (BASKERVILLE; PRIES-HEJE, 2004). This competition has forced organizations to release new functionalities at a faster pace, i.e., projects such as Unity3D shifted from a traditional release cycle (12–18 months) to a rapid release cycle (1–3 months) to meet the pressure of the market (SOUZA; CHAVEZ; BITTENCOURT, 2014).

A long delivery time can also frustrate contributors of open source projects, since one of their motivations to contribute is to see their proposed contributions available to the end users in a timely manner (JIANG; ADAMS; GERMAN, 2013). An important reason why developers contribute to an open source project is that such developers (so-called contributors) are always users of the produced contributions; hence, they do not want to wait long to benefit from those contributions. Furthermore, research on the social-psychological feedback effect reveals that people get more involved in a task if they receive feedback, and that the feedback loop of attention is important to motivate contributors to persist (LIU; LI; HE, 2016). Contributors who stop receiving attention (e.g., who often have their pull requests merged and released late) tend to stop contributing (WU; WILKINSON; HUBERMAN, 2009). On the other hand, attracting and retaining the interest of talented developers is crucial for open source projects to achieve sustained success (LONG, 2006).

In this matter, the present dissertation has the goal of reducing the lack of empirical understanding of the impact of adopting continuous integration on the delivery time of merged pull requests. A deep understanding of such delays can help software projects to diminish these undesired delays. Also, this understanding may help project managers to be aware of which factors most impact the delay to deliver merged pull requests to end users, so that they can handle it properly.

1.2 Current Research Limitations

Prior work has analyzed the usage of continuous integration in open source projects that are hosted on GitHub (HILTON et al., 2016; BELLER; GOUSIOS; ZAIDMAN, 2016; YU et al., 2016; VASILESCU et al., 2014; VASILESCU et al., 2015). For instance, Vasilescu et al. (VASILESCU et al., 2015) investigated the productivity and quality outcomes of projects that use continuous integration on GitHub. They found that projects that use continuous integration merge pull requests more quickly when they are submitted by core developers. Also, core developers discover a significantly larger amount of bugs when they use continuous integration. Yu et al. (YU et al., 2016) show that the more succinct a pull request is, the greater the probability that such a pull request is reviewed and merged earlier. Finally, Ståhl and Bosch (STÅHL; BOSCH, 2014b) stated that continuous integration may also improve the release frequency, which hints that software functionalities may be delivered more quickly to users.

Recent research has studied the delivery time of new features, enhancements, and bug fixes (COSTA et al., 2014; COSTA et al., 2016; CHOETKIERTIKUL et al., 2015; CHOETKIERTIKUL et al., 2017). For instance, Costa et al. (COSTA et al., 2014) mined data from the VCSs and ITSs of the Firefox, ArgoUML, and Eclipse projects to investigate how frequently the delivery of fixed issues is delayed in such projects. In a follow-up study, Costa et al. (COSTA et al., 2016) investigated the impact of switching from traditional releases to rapid releases on the delivery time of fixed issues of the Firefox project. They used predictive models to discover which factors significantly impact the delivery time of fixed issues in each release strategy. However, to the best of our knowledge, no prior work has investigated the impact of adopting continuous integration on the delivery time of merged pull requests. Hence, understanding the impact of the adoption of continuous integration on the different delivery time dimensions (see Definitions 1 and 2) that were already proposed in the literature remains an open challenge.

1.3 Dissertation Proposal

The general research question that is investigated in this dissertation is: what is the impact of the adoption of continuous integration on the delivery time of merged pull requests?

This dissertation proposes to empirically analyze the impact of the adoption of continuous integration on the delivery time of merged pull requests from two perspectives. First, we investigate the impact of adopting continuous integration on the delivery time of pull requests in terms of days and releases (Definitions 1 and 2). Second, we analyze the impact of the adoption of continuous integration on the prolonged delivery time (Definition 3). Figure 1 shows an overview of the scope of the analyses that we perform in this dissertation.

Figure 1 – An overview of the scope of the dissertation.

Based on our general research question, seven research questions were proposed to guide this work. RQ1–RQ5 investigate the impact of adopting continuous integration on the delivery time of pull requests, while RQ6 and RQ7 analyze the impact of continuous integration on the prolonged delivery time. To address these research questions, we analyzed data of 90 GitHub projects that are implemented in 5 different programming languages (see Appendix A). We investigate a total of 167,037 pull requests, with 40,321 pull requests before and 126,716 pull requests after the adoption of continuous integration. We present the research questions of each group of analysis in the following. Furthermore, for each RQ we provide a detailed description of its motivation and research approach in Section 3.1.

Analysis I — What is the impact of continuous integration on the delivery time of pull requests?

RQ1 How often are merged pull requests prevented from being released?

RQ2 Are pull requests released more quickly using continuous integration?

RQ3 Does the increased development activity after adopting continuous integration increase the delivery time of merged pull requests?

RQ4 How well can we model the delivery time of merged pull requests?

RQ5 What are the most influential attributes for modeling delivery time?

Analysis II — What is the impact of continuous integration on the prolonged delivery time?

RQ6 How well can we identify the merged pull requests that will suffer from a long delivery time?

RQ7 What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?

1.3.1 Chronology of Analyses

The arrow in Figure 1 shows which analysis inspired the other. We based our first analysis on the study performed by Costa et al. (COSTA et al., 2017), which shows that despite issues being addressed well before an upcoming release, 34% to 98% of such addressed issues are delayed by at least one release in the ArgoUML, Eclipse, and Firefox projects. Based on their results, in our first analysis we study the impact of adopting continuous integration on the delivery time of merged pull requests of open source projects. We find that despite most pull requests being merged well before the release date, 13.8% (median) of them miss at least one release before continuous integration, while 24% miss at least one release after continuous integration (see RQ1). Also, we observe that the time from submission to release of a pull request (i.e., the pull request lifetime) is shorter before the adoption of continuous integration in 53% of the studied projects (see RQ2).

After conducting Analysis I, we performed an exploratory analysis of our data and observed that, in median, 24% of the pull requests of the investigated projects have a prolonged delivery time. These results motivate Analysis II of this dissertation, which investigates the impact of adopting continuous integration on the prolonged delivery time. Such an investigation helps us to better understand which factors are most influential to predict pull requests that are going to have a prolonged delivery time; hence, it may help contributors and project managers to avoid such undesired delays.

1.4 Dissertation Contributions

The main contribution of this dissertation is to provide an empirical understanding of the impact of the adoption of continuous integration on the time-to-delivery of merged pull requests. Based on an analysis of 90 GitHub projects and 167,037 pull requests, we outline the contributions of this dissertation below, grouped by their respective dimension of analysis.

Analysis I — What is the impact of continuous integration on the delivery time of pull requests?

• On analyzing the percentage of merged PRs per project that missed at least one release prior to being delivered to the end users, the results show that before adopting CI, a median of 13.8% of merged PRs are postponed by at least one release, while after adopting CI, a median of 24% of merged PRs have their delivery postponed to future releases. Furthermore, we find that many pull requests that miss at least one release were merged well before the release date of the missed releases (RQ1).

• We find that the time from submission to release of a pull request (i.e., pull request lifetime) is shorter before the adoption of continuous integration in most of the studied projects (53%) (RQ2).

• In the majority of the studied projects (68.9%), the merge time of pull requests increases after adopting continuous integration (RQ2).

• It is not clear whether the adoption of continuous integration increases or decreases the delivery time of merged pull requests (RQ2).

• We find that the large increase in the number of pull request submissions after adopting continuous integration is a key reason why projects deliver pull requests more slowly after adopting continuous integration. 77.8% of the projects increase the rate of pull request submissions after adopting continuous integration (RQ3).

• We are able to build models that obtain sound results when estimating the delivery time of merged pull requests in terms of number of days and releases, both before and after continuous integration. Our explanatory models achieve sound median R² values of 0.72 to 0.74 (RQ4).

• The number of commits performed to produce a release is the most influential factor to estimate delivery time of merged pull requests in terms of days and in terms of releases, both before and after continuous integration (RQ5).

• The time at which a pull request is merged (i.e., queue rank) and the number of pull requests competing to be merged (i.e., merge workload) also have a strong impact on estimating the delivery time in terms of days and releases, both before and after continuous integration (RQ5).

Analysis II — What is the impact of continuous integration on the prolonged delivery time?

• A median of 24% of the merged pull requests of the investigated projects have a prolonged delivery time (RQ6).

• Our models that identify merged pull requests that have a prolonged delivery time obtain excellent median AUC values of 0.92 to 0.97 (RQ6).

• Prolonged delivery time is more closely associated with the required number of commits to produce a release and with project characteristics, such as the queue rank and the merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a prolonged delivery time, both before and after continuous integration (RQ7).

1.5 Dissertation Organization

The remainder of this dissertation is organized as follows. In Chapter 2, we present the necessary background and definitions. In Chapter 3, we explain the design of our empirical study: in Section 3.1, we present each RQ and its respective motivation and research approach, while we present the project selection and data collection processes in Sections 3.2 and 3.3, respectively. In Chapter 4, we present the results of this study and their threats to validity. Finally, we draw conclusions in Chapter 5.

2 Background & Definitions

In this chapter, we outline the key concepts and definitions that are necessary to understand the analyses that are performed in this dissertation.

2.1 The pull-based development model

Distributed Version Control Systems (DVCSs), e.g., Git, have revolutionized the way people develop software. The purpose of distributed development is to enable contributors around the world to contribute to a software project that is managed by a core team (GOUSIOS; PINZGER; DEURSEN, 2014). There are two general ways for potential contributors to submit their contributions to a software project in a distributed code-hosting environment (e.g., GitHub): (i) the shared repository approach, and (ii) pull-based development. We explain each of these approaches in the following.

(i) Shared repository

The core team shares read and write access to the central repository, enabling external contributors to clone the repository, work locally, and push their code contributions back to the central repository.

(ii) Pull-based development

Pull-based development is a paradigm broadly used by contributors of open source projects to develop software in a distributed and collaborative way (VASILESCU et al., 2015). By definition, open source software is software for which interested users have access to the source code (MADEY; FREEH; TYNAN, 2002). More generally, open source software is computer software that is freely available in source code form and that allows users to freely use, study, and change its source code, improving the software according to their requirements (TIWARI, 2010). Open source projects typically use code hosting providers (e.g., GitHub) to manage their code contributions.

The most popular code hosting providers, e.g., GitHub and Bitbucket, provide support to the pull-based development model. On GitHub, almost half of all collaborative projects use pull requests in their development process (GOUSIOS et al., 2015). GitHub and Bitbucket allow any user to fork and clone any public repository and send pull requests (GOUSIOS; PINZGER; DEURSEN, 2014). A pull request is a mechanism enabled by Git that allows contributors to work locally on a forked repository and ask to have their contributions merged into the main repository. Write access to a repository is not mandatory to submit pull requests (VASILESCU et al., 2015). Figure 2 shows an overview of the process of sending contributions to a repository using pull requests. We explain each step of the process below:

Figure 2 – An overview of the pull-based development model that is integrated with Continuous Integration. Step 4 is only performed when continuous integration is used.

• Step 1. Fork a repository: The main repository of a project is not shared with external contributors. Instead, contributors can create their own copy of the main repository by forking it, so that they can modify the code without interfering with other repositories and without needing to be a team member.

• Step 2. Work locally on the forked repository: Contributors develop new functionalities, fix bugs, or provide features and enhancements in the forked repository.

• Step 3. Submit the local changes to the main repository: When changes are ready to be submitted, contributors request a pull of such changes into the main repository by sending a pull request (YU et al., 2016). Such a pull request specifies the local branch that is to be merged into a given branch of the main repository.

• Step 4. Verify whether the pull request breaks the build: The continuous integration service automatically merges the pull request into a test branch. Next, the continuous integration service builds the whole project and runs the test suite to verify whether the pull request breaks the codebase. Typically, if tests fail during the continuous integration process, the pull request is rejected and additional changes are requested from the external contributor to improve his/her pull request (YU et al., 2016). If all tests pass during the CI process, the integrators thoroughly review the pull request before deciding to accept the contributions. This decision is based on the quality, the technical design, and the priorities of the submitted pull requests (GOUSIOS et al., 2015).

• Step 5. Accept or reject a pull request: After the pull request submission, an integrator of the main repository must inspect the changes to decide whether they are satisfactory. If the changes fulfill the requirements of the project, the integrator pulls them into the specified branch of the main repository. Otherwise, the core team may request additional changes from the external contributor to make his/her pull request acceptable. In pull-based development, the integrator plays a crucial role by managing contributions (GOUSIOS et al., 2015).

Projects that use a shared repository strategy can also use pull requests in a complementary way, so that core team members push their contributions directly, while external contributors submit their contributions via pull requests. Projects can also use pull requests to conduct code reviews and to discuss new features. In many projects, all contributions are submitted via pull requests, even when they are sent by core developers. By using this approach, the projects ensure that only reviewed code gets merged (GOUSIOS et al., 2015).

2.2 Continuous Integration

Continuous Integration is a set of practices that lead developers to integrate their work more frequently, i.e., at least daily (FOWLER; FOEMMEL, 2006; MEYER, 2014). The main goal of continuous integration is to integrate early, so that developers do not have to keep their code changes localized in their workspace for long. Instead, an automated system verifies that the changes do not break the codebase of the software project, and the changes are then shared with the development team quickly (VIRMANI, 2015). In this way, continuous integration aims to avoid the unpredictability of the code and a large integration effort (LAUKKANEN; PAASIVAARA; ARVONEN, 2015) by identifying software errors and defects quickly, so that developers can correct such errors sooner (LAI; LEU, 2015).

In continuous integration, all code must be maintained in a single repository. When a contributor commits to the repository, an automated system verifies whether the change breaks the codebase (Step 4 of Figure 2) (MEYER, 2014). The entire process must be automated. Ideally, a build should compile the code and include a test suite to verify whether the codebase is broken after adding new changes. In continuous integration, the work of developers is continually compiled, built, and tested (YU et al., 2016).

Continuous integration was originally proposed as one of the twelve Extreme Programming (XP) practices, but it is often used outside the context of XP (BELLER; GOUSIOS; ZAIDMAN, 2016). Continuous integration is widely used on GitHub. According to Gousios et al. (GOUSIOS et al., 2015), 75% of the GitHub projects that make heavy use of pull requests also tend to use continuous integration. Several CI services, such as Jenkins, TeamCity, Bamboo, CloudBees, and Travis-CI (MEYER, 2014), are available to development teams. Jenkins and Travis-CI are the most used by GitHub projects (VASILESCU et al., 2015). Travis-CI is a CI platform for open source and private GitHub projects; currently, over 300k projects use it.

The wide adoption of continuous integration is related to the perceived benefits that are brought by this practice. According to Fowler (FOWLER; FOEMMEL, 2006), the greatest benefit of continuous integration is to reduce risk. The study of Duvall et al. (DUVALL; MATYAS; GLOVER, 2007) also states that the adoption of continuous integration contributes to a higher confidence of the development team regarding their software product. Furthermore, continuous integration is often adopted by software projects with the allure of delivering new features more quickly (LAUKKANEN; PAASIVAARA; ARVONEN, 2015) and of increasing the release frequency and predictability (STÅHL; BOSCH, 2014b).

2.3 Delivery Time

Delivery time refers to the time between the moment at which a pull request is merged and the time at which such a pull request is delivered to the end users of a software system through an official software release. In this dissertation, we investigate two dimensions of delivery time: (i) delivery time in terms of number of days; and (ii) delivery time in terms of number of releases. Additionally, we investigate characteristics of pull requests that have a (iii) prolonged delivery time.

Definition 1 — Delivery time in terms of days

Figure 3 shows the basic life-cycle of a released pull request and provides an example of how we measure delivery time in terms of days. To compute delivery time in terms of days, we count the number of days between the moment at which a pull request was merged (t1) and the moment at which such a pull request was released (t2).

Figure 3 – An illustrative example of how we compute delivery time in terms of days.
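To make Definition 1 concrete, the following minimal R sketch computes delivery time in days from merge and release timestamps. The data frame and its column names (merged_at, released_at) are hypothetical illustrations, not the actual schema of our dataset.

```r
# A minimal sketch (base R), assuming hypothetical column names.
prs <- data.frame(
  id          = 1:3,
  merged_at   = as.Date(c("2016-01-10", "2016-02-01", "2016-02-20")),
  released_at = as.Date(c("2016-03-01", "2016-03-01", "2016-06-15"))
)
# Delivery time in days: days between the merge and the release of a PR.
prs$delivery_days <- as.numeric(prs$released_at - prs$merged_at)
prs$delivery_days
#> [1]  51  29 116
```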

Definition 2 — Delivery time in terms of releases

Figure 4 provides an example of how we measure the delivery time of merged pull requests in terms of releases. To compute the delivery time in terms of releases, we count the number of releases from which a given merged pull request is prevented from delivery. For instance, in Figure 4, PR #05 is submitted at time t1, merged at t2, and shipped at time t3. The delivery time in terms of releases for PR #05 is the number of official releases that are shipped between t2 and t3. In the given example, PR #05 was prevented from delivery in release v1.1 and was delivered in release v2.0; hence, PR #05 has a delivery time of one release.

Figure 4 – An illustrative example of how we compute delivery time in terms of releases.
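A corresponding sketch for Definition 2 is shown below; release_dates is a hypothetical, chronologically ordered vector of a project's official release dates.

```r
# A minimal sketch: delivery time in terms of releases (Definition 2).
release_dates <- as.Date(c("2016-02-01",  # v1.0
                           "2016-04-01",  # v1.1
                           "2016-06-01")) # v2.0

# Releases shipped strictly between the merge and the delivering release;
# 0 means the pull request was delivered in the next possible release.
delivery_releases <- function(merged_at, released_at, release_dates) {
  sum(release_dates > merged_at & release_dates < released_at)
}

# Like PR #05 in Figure 4: merged after v1.0, skips v1.1, shipped in v2.0.
delivery_releases(as.Date("2016-03-01"), as.Date("2016-06-01"), release_dates)
#> [1] 1
```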

Definition 3 — Prolonged delivery time

We follow an approach similar to the one used by Costa et al. (COSTA et al., 2017) to identify pull requests that suffer from a prolonged delivery time. Let T = {t1, t2, ..., tn} be the set of delivery times for the pull requests p1, p2, ..., pn of a given project. We consider that pi has a long delivery time ti if ti > MAD(T) + median(T). The MAD is the Median Absolute Deviation of the distribution of delivery times of the pull requests of a given project. The greater the MAD, the higher the variation of a distribution with respect to its median (HOWELL, 2014; EFRON, 1986). The MAD is commonly used as an alternative approach to detect outliers: instead of using the standard deviation around the mean, we use the absolute deviation around the median.
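The rule of Definition 3 can be sketched in a few lines of R. Note that R's mad() rescales the deviation by 1.4826 by default; constant = 1 yields the raw median absolute deviation used in the rule above. The example distribution is hypothetical.

```r
# A minimal sketch of Definition 3: ti is prolonged if ti > MAD(T) + median(T).
is_prolonged <- function(t) {
  t > mad(t, constant = 1) + median(t)  # constant = 1: raw (unscaled) MAD
}
delivery_days <- c(3, 5, 7, 8, 10, 11, 60)  # hypothetical delivery times
is_prolonged(delivery_days)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

Here, median(T) = 8 and MAD(T) = 3, so only delivery times above 11 days are flagged as prolonged.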

2.4 Chapter Summary

In this chapter, we provided the key concepts and terms that we use in this dissertation. We first described the pull-based development model and how developers contribute to a software project by sending pull requests (Section 2.1). Next, we outlined the key concepts of continuous integration, which is a set of practices that lead developers to integrate their work at least daily, and we described how continuous integration works with pull-based development (Section 2.2). Finally, we defined the different types of delivery time that we study in this dissertation (Section 2.3).

3 Empirical Study

In this chapter, we outline the motivation and research approach for each research question that is addressed in this study. We also explain how we select the studied projects and construct the dataset that we use to perform the analyses that compose this dissertation.

3.1 Research Questions

In this section, we present the motivation and research approach for each studied RQ of this dissertation, grouped by its respective dimension of analysis: RQ1–RQ5 compose Analysis I, which studies the impact of continuous integration on the delivery time of pull requests, while RQ6 and RQ7 compose Analysis II, which studies the impact of continuous integration on the prolonged delivery time.

Analysis I — What is the impact of continuous integration on the delivery time of pull requests?

3.1.1 RQ1: How often are merged pull requests prevented from being released?

RQ1: Motivation

A long delay to release pull requests can be frustrating to users and contributors of a software project, since they care most about the time for a pull request to become available rather than the time required to merge such a pull request into the project code base. In this matter, it is important to investigate whether pull requests are being delivered immediately (e.g., in the next possible release after they have been merged) or not, because a long delivery time may frustrate users and contributors.

In RQ1, we study how often merged pull requests are prevented from delivery, both before and after continuous integration. This investigation is our first step towards understanding how long the delivery times of pull requests are in terms of releases.

RQ1: Approach

We use an approach similar to the one used by Costa et al. (COSTA et al., 2017) to investigate how often pull requests are prevented from being released. First, we compute the delivery time in terms of releases for each merged pull request of the investigated projects (see Definition 2). Next, for each investigated project, we observe the percentage of pull requests that were delivered in the next upcoming release and the percentage of pull requests that were prevented from being delivered in at least one release. We then group the pull requests of each project into two buckets: before and after continuous integration. For each bucket, we also observe the percentage of pull requests per project that were prevented from being delivered in one or more releases. The pull requests that do not miss any release are grouped into the next release bucket, while the pull requests that miss one or more releases are grouped into the later release bucket.

Finally, we analyze whether merged pull requests are being prevented from being released because their merge occurs close to an upcoming release date, i.e., one day or one week before the release date. For this purpose, we compute the merge timing metric, which represents the moment at which a pull request is merged within the release cycle. The merge timing ranges from 0 to 1. A merge timing value close to 1 indicates that the pull request was merged early in the release cycle, while values close to 0 represent the opposite. The merge timing is computed as (i) the number of days remaining between the merge of a pull request and the upcoming release, over (ii) the duration in days of its release cycle (see Equation 3.1).

$$\text{merge timing} = \frac{\#\text{ days remaining until the upcoming release}}{\text{release cycle duration (in days)}} \qquad (3.1)$$
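A small R sketch of Equation 3.1 follows; the argument names are hypothetical, and the release cycle is assumed to start at the previous release date.

```r
# A minimal sketch of the merge timing metric (Equation 3.1).
# Values close to 1: merged early in the release cycle; close to 0: late.
merge_timing <- function(merged_at, cycle_start, release_date) {
  days_remaining <- as.numeric(release_date - merged_at)
  cycle_duration <- as.numeric(release_date - cycle_start)
  days_remaining / cycle_duration
}
# Merged 45 days before the upcoming release in a 60-day cycle.
merge_timing(as.Date("2016-02-15"), as.Date("2016-01-31"), as.Date("2016-03-31"))
#> [1] 0.75
```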

3.1.2 RQ2: Are pull requests released more quickly using continuous integration?

RQ2: Motivation

In recent years, many software companies have adopted the continuous integration practice in their development life cycle. This wide adoption is related to the perceived benefits that are brought by continuous integration: for instance, risk reduction, a higher confidence of the development team regarding their software product (DUVALL; MATYAS; GLOVER, 2007), higher productivity, higher release frequency and predictability (STÅHL; BOSCH, 2014b), and the allure of delivering new features more quickly (LAUKKANEN; PAASIVAARA; ARVONEN, 2015). However, there is a lack of studies that empirically verify whether continuous integration really reduces the time-to-delivery of merged pull requests. In RQ2, we study the delivery time of merged pull requests before and after the adoption of continuous integration.

RQ2: Approach

Figure 5 shows the basic life cycle of a released pull request: (t1) the merge phase, and (t2) the delivery phase. We refer to the t1 + t2 time as the lifetime of a pull request. In RQ2, we analyze the merge and delivery phases. The merge phase (t1) is the time required for pull requests to be merged into the codebase, whereas the delivery phase (t2) refers to the time required for pull requests to be released after they have been merged, i.e., ready to be delivered to end users.

Figure 5 – The basic life-cycle of a released pull request.

We use beanplots (KAMPSTRA et al., 2008) to visually compare the different distributions of delivery time (see Figure 14). The higher the data frequency for a given value, the wider the bean is plotted on the Y axis for that particular value. In addition, we use Mann-Whitney-Wilcoxon (MWW) tests (WILKS, 2011) followed by Cliff's delta effect-size measures (CLIFF, 1993). The MWW test is a non-parametric test whose null hypothesis is that two distributions come from the same population (α = 0.05). Cliff's delta is a non-parametric effect-size metric that verifies the magnitude of the difference between the values of two distributions. The higher the Cliff's delta value, the greater the difference between distributions. A positive Cliff's delta indicates how much larger the values of the first distribution are, while a negative Cliff's delta indicates the opposite. We use the thresholds provided by Romano et al. (ROMANO et al., 2006), i.e., delta < 0.147 (negligible), delta < 0.33 (small), delta < 0.474 (medium), and delta >= 0.474 (large). We use these statistical tools to analyze the entire life-cycle of a pull request before and after continuous integration. First, we analyze the pull request lifetime (t1 + t2). Then, we analyze the (t1) merge and (t2) delivery phases of a pull request separately.
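As an illustration of this statistical machinery, the sketch below runs an MWW test with base R's wilcox.test and computes Cliff's delta by hand over two hypothetical delivery-time samples (packages such as effsize provide a ready-made cliff.delta as well).

```r
# Hypothetical delivery times (in days) before and after CI.
before_ci <- c(2, 4, 5, 7, 9, 12, 15)
after_ci  <- c(5, 8, 10, 14, 18, 21, 30)

# MWW test: H0 is that both samples come from the same population (alpha = 0.05).
wilcox.test(before_ci, after_ci)

# Cliff's delta: P(x > y) - P(x < y) over all pairs of observations.
cliffs_delta <- function(x, y) {
  mean(outer(x, y, ">")) - mean(outer(x, y, "<"))
}
cliffs_delta(before_ci, after_ci)  # negative: the first sample tends to be smaller
```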

3.1.3 RQ3: Does the increased development activity after adopting continuous integration increase the delivery time of pull requests?

RQ3: Motivation

In RQ2, we find that 53% (48/90) of our studied projects deliver submitted pull requests more quickly before adopting continuous integration. However, since the adoption of continuous integration is motivated by an increase in release frequency and predictability (STÅHL; BOSCH, 2014b), we suspected that pull requests would be delivered more quickly after the adoption of continuous integration. Nevertheless, the results suggest an opposite trend, which leads us to the following question: why do 53% of our studied projects deliver submitted pull requests more quickly before adopting continuous integration? This investigation is important to better understand the impact of adopting continuous integration on software development.

RQ3: Approach

Similar to RQ2, we use Mann-Whitney-Wilcoxon tests (WILKS, 2011) and Cliff's deltas (CLIFF, 1993) to analyze the data. We also use box plots (WILLIAMSON; PARKER; KENDRICK, 1989) to visually summarize and compare the distributions. In this research question, we investigate whether the increase in the delivery time of pull requests after adopting continuous integration is related to a significant increase in pull request submissions after adopting continuous integration. We group our dataset into two buckets: before and after the adoption of continuous integration. For each bucket, we count the number of pull requests that are submitted, merged, and delivered per release. We perform three comparisons in this RQ. First, we compare whether pull request submissions (per release) significantly increase after adopting continuous integration. Next, we organize our projects into two groups: (i) the projects for which the delivery time of pull requests increased after adopting continuous integration, and (ii) the projects for which the delivery time of pull requests decreased after adopting continuous integration. For each group, we compare whether the submissions of pull requests significantly increased after adopting continuous integration.
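The per-release counting that feeds these comparisons can be sketched as follows; the data frame and its columns are hypothetical stand-ins for our mined pull request data.

```r
# Hypothetical records: the release each PR was submitted in, and whether
# that release cycle predates the adoption of continuous integration.
prs <- data.frame(
  release   = c("v1.0", "v1.0", "v1.1", "v2.0", "v2.0", "v2.0", "v2.1", "v2.1"),
  before_ci = c(TRUE,   TRUE,   TRUE,   FALSE,  FALSE,  FALSE,  FALSE,  FALSE)
)
# Pull request submissions per release, in each bucket.
rates_before <- as.numeric(table(prs$release[prs$before_ci]))
rates_after  <- as.numeric(table(prs$release[!prs$before_ci]))
# One-sided MWW test: did submissions per release increase after adopting CI?
wilcox.test(rates_before, rates_after, alternative = "less")
```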

3.1.4 RQ4: How well can we model the delivery time of merged pull requests?

RQ4: Motivation

Several studies have proposed approaches to investigate the time required to merge a pull request (YU et al., 2015; YU et al., 2016) and to prioritize pull requests based on their characteristics (VEEN; GOUSIOS; ZAIDMAN, 2015). These studies can help integrators to prioritize their work in the face of multiple concurrent pull requests, and they can also help to estimate when a pull request will be merged by an integrator of a software project. However, even though most pull requests are merged well before the next release date, many of them are not delivered in the next release. In this matter, knowing the delivery time of merged pull requests is of great interest to the users and contributors of a software project. In RQ4, we investigate whether we can accurately model the delivery time of merged pull requests in terms of number of days and releases (see Definitions 1 and 2 of delivery time). Our explanatory models are important to understand which variables may impact the delivery time of pull requests. Furthermore, the models could be used in future work and by practitioners to estimate when a merged pull request will likely be delivered (i.e., in the next release or after one or more releases).

RQ4: Approach

To study when a merged pull request is released, we apply supervised machine learning. The input for the learning algorithm is a set of attributes that describes each pull request in as much detail as possible. During the feature selection process, we collect information from the VCSs of the studied projects to include attributes that belong to one of the following families: contributor, pull request, project, and process. We choose these families of attributes because we intend to investigate a variety of perspectives that may have an influence on the delivery time of a merged pull request. Tables 4 and 5 provide a complete description of the attributes that we compute for each family and show the rationale that we use to include each attribute as a predictor of delivery time.

We train explanatory models to study whether a merged pull request will be delivered in the next possible release or whether such a pull request will be prevented from delivery in one or more releases (see Definition 2). To study delivery time in terms of releases, we use Logistic Regression Models (DAYTON, 1992; HILBE, 2009). We model the response variable Y as Y = 1 for the merged pull requests that were delayed, i.e., the pull requests that missed at least one release before being released, and Y = 0 otherwise.

In this context, our models are intended to explain why a given merged pull request has its delivery delayed (i.e., Y = 1). We use the Area Under the Curve (AUC) and the Brier score to evaluate the performance of our models. The AUC is used to evaluate the degree of discrimination achieved by the models (HANLEY; MCNEIL, 1982). For instance, the AUC can be used to evaluate how well our models distinguish between merged pull requests that are delivered in the next possible release after they have been merged and pull requests that are prevented from delivery in one or more releases. The AUC is the area below the curve that plots the true positive rate against the false positive rate. AUC values range from 0 (worst) to 1 (best). An area greater than 0.5 indicates that the explanatory model outperforms random guessing (COSTA et al., 2017). Mehdi et al. (MEHDI et al., 2011) provide a rough guide for classifying the accuracy of a diagnostic test using the AUC: 0.90–1: excellent; 0.80–0.90: good; 0.70–0.80: fair; 0.60–0.70: poor; 0.50–0.60: fail. The Brier score (EFRON, 1986), in turn, is used to evaluate the accuracy of probabilistic predictions. The Brier score measures the mean squared difference between the probability of delay assigned by our models to a particular pull request P and the actual outcome of P (i.e., whether P is actually delayed or not). Hence, the lower the Brier score, the more accurate the probabilities that are produced by our explanatory models (COSTA et al., 2016).

We also study delivery time in terms of number of days (Definition 1). To perform this analysis, we use multiple linear regression modeling (Ordinary Least Squares). Linear regression models are simple and often provide an adequate and interpretable description of how one or more explanatory variables X affect the dependent variable Y (HASTIE; TIBSHIRANI; FRIEDMAN, 2009). Regression models fit a curve of the form

Y = \beta_0 + \sum_{j=1}^{n} X_j \beta_j \qquad (3.2)

The Y variable is the dependent variable (i.e., delivery time in terms of days in our study), while X is the set of explanatory variables that may share a relationship with Y (e.g., churn and description length in our case). The set of β coefficients represents the weights given by the model to adjust the values of X in order to better estimate the dependent variable Y. Tables 4 and 5 show the set of explanatory variables that we use in our study to predict delivery time in terms of days. They also show the definition and rationale that is used to adopt each variable of our set of explanatory variables. We assess the fit of our linear regression models using the R2. The R2 corresponds to the proportion of the variability in Y that can be explained by using X. In general, it is a challenge to determine what is a good R2 value, since it depends on the nature of the problem that is being investigated (JAMES et al., 2014). In this study, we consider in our analyses only the models that achieve R2 values higher than 0.5. In other words, we ensure that at least 50% of the variability of our data is explained by our models. We analyze 91 models in total (41 using pull request data before continuous integration, and 50 using data after continuous integration). Appendix B provides the R2 value for each model that we fit. We follow the guidelines of Harrell Jr. (HARRELL, 2015) to build our explanatory models (Logistic and Linear Regression Models). Figure 6 provides an overview of the process that we use to build our models. First, for each studied project we group its data into two buckets: before and after continuous integration. Then, we create two Logistic Regression Models for each project, one using the pull request data of before continuous integration, and another using the pull request data of after continuous integration. Also, we train two Linear Regression Models for each project, one using pull request data of the before-CI bucket, and another using pull request data of the after-CI bucket.
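A sketch of the per-bucket linear models, under the same assumed prs data frame plus a delivery_days response and a ci flag marking the after-CI bucket:

    library(rms)

    # One Ordinary Least Squares model per bucket (before and after CI).
    fit_bucket <- function(d) {
      ols(delivery_days ~ churn + description_size + merge_workload,
          data = d, x = TRUE, y = TRUE)
    }

    fit_before <- fit_bucket(subset(prs, ci == FALSE))
    fit_after  <- fit_bucket(subset(prs, ci == TRUE))

    # R2: the proportion of the variability in Y that the model explains;
    # we only keep models with R2 > 0.5.
    fit_after$stats["R2"]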

Figure 6 – Training Linear and Logistic Regression Models. We follow the guidelines that are provided by Harrell Jr. (HARRELL, 2015) to train the explanatory models; the process involves eight activities, from data collection to model validation. We present a description of Steps 5.2 and 5.3 in RQ5.

In steps 1 and 2 we account for collinearity in our explanatory variables. In step 1, we check the redundancy of our explanatory variables. Redundant variables do not increase the explanatory power of the models and can distort the relationship between explanatory (X) and response (Y) variables. We use the redun function from the rms R package to remove the redundant variables from our set of explanatory variables. The redun function fits models to explain each explanatory variable using the other explanatory variables (COSTA et al., 2017). We discard explanatory variables that are estimated with R2 >= 0.9. In step 2, we check the correlation of the surviving explanatory variables. We remove the highly correlated variables by using a variable clustering analysis (SARLE, 1990). When variables within a cluster have a correlation of |ρ| > 0.7, we choose only one of them to include in our models. In steps 3 and 4, we compute and allocate the budget of degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting the models low. When using a Logistic Regression Model, we compute the D.F. budget that we can spend in our models using the equation n/10, where n is the number of instances of the class with the lowest number of instances, and 10 is a denominator that is recommended by Harrell Jr. (HARRELL, 2015). Furthermore, the value of n/10 must be greater than or equal to the number of explanatory variables of our models. For example, we have two possible classes in our models (Next and Later) and 13 explanatory variables. If a given project has 1,000 pull requests before the adoption of continuous integration, of which 100 belong to the Later class and the remaining 900 belong to the Next class, then the D.F. budget restriction is not satisfied, since 13 (the number of explanatory variables of the model) is greater than 100/10 = 10, where 100 is the number of instances of the class with the lowest number of instances (Later). We filter out models with such settings. In step 5, we fit 54 Logistic Regression models (12 using data of before continuous integration, and 42 using data of after continuous integration). It is important to highlight that the pull request data of a given project may be used to build 0, 1, or 2 models. For instance, if the project data of before continuous integration does not satisfy the D.F. budget restriction, but the project data of after continuous integration does, then we train just one model using the data of after continuous integration, and vice versa. Furthermore, if the project data of both before and after continuous integration do not satisfy the D.F. budget restriction, we train no model with the data of such a project. The reason why the number of models that use data of after continuous integration is greater than the number of models that use data of before continuous integration is that most studied projects have fewer pull requests before continuous integration; also, as 86.2% (median) of the pull requests of before continuous integration are delivered in the Next release/class, for most projects the number of instances (pull requests) of the Later release/class does not satisfy the D.F. budget restriction. In step 5.1, we assess the stability of our Logistic Regression models by computing the optimism-reduced AUC and Brier Score.
The optimism of each metric is computed as follows: (i) we count the D.F. that are spent to fit the original model, then we select a bootstrap sample to fit another model with the same D.F. as the original model; (ii) the model built from the bootstrap sample is applied both to the bootstrap and original samples (AUC and Brier score are computed for each sample). The optimism is the difference in the AUC and Brier score between the bootstrap and original samples. In our analyses, we fit models for 1,000 bootstrap samples and compute the average optimism. The optimism-reduced AUC and Brier score are calculated by subtracting the average optimism from the initial AUC and Brier score estimates. In step 5.1, we also evaluate the stability of our Linear Regression models by computing the optimism-reduced R2. While R2 gives an indication of how much variability may be explained by our Linear Regression models, this metric may also be very dependent on the specific data to which our models were fitted, i.e., overfitted (MCINTOSH et al., 2016). Therefore, the optimism-reduced R2 measures how stable our models are. The optimism of the R2 is computed by fitting models using bootstrap samples of the original data. For each model fit to a bootstrap sample, we calculate the difference between the R2 of such a model and that of the model fit to the original data. This difference is a measure of the optimism in the original model (COSTA et al., 2017). In this study, the bootstrap-calculated optimism is the average optimism obtained over a set of 1,000 bootstrap samples. The smaller the bootstrap-calculated optimism, the higher the stability of our explanatory models (EFRON, 1986). In step 5.2, we evaluate the impact that each variable of our set of explanatory variables has on the models that we fit, while in step 5.3 we study the relationship that the most influential variables share with the response variable (delivery time). We use these steps to answer RQ5 and RQ7 of this study, and we detail each of them (5.2 and 5.3) in the respective sections.
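The sketch below illustrates steps 1, 2 and 5.1 under the same assumptions (the prs data frame and its column names are illustrative):

    library(rms)  # attaches Hmisc, which provides redun() and varclus()

    # Step 1: discard variables that the others explain with R2 >= 0.9.
    red <- redun(~ churn + files + comments + activities + merge_time,
                 data = prs, r2 = 0.9)
    red$Out  # names of the redundant variables

    # Step 2: variable clustering on rank correlations; within a cluster
    # of |rho| > 0.7 we keep a single variable.
    vc <- varclus(~ churn + files + comments + activities + merge_time,
                  data = prs)
    plot(vc)

    # Step 5.1: optimism-reduced AUC and Brier score over 1,000 bootstrap
    # samples; validate() reports Dxy (AUC = (Dxy + 1) / 2) and B (Brier).
    fit <- lrm(delayed ~ churn + comments + merge_time, data = prs,
               x = TRUE, y = TRUE)
    val <- validate(fit, B = 1000)
    auc_corrected   <- (val["Dxy", "index.corrected"] + 1) / 2
    brier_corrected <- val["B", "index.corrected"]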

3.1.5 RQ5: What are the most influential attributes for modeling delivery time?

RQ5: Motivation

In RQ4, we found that our models can accurately model the delivery time of pull requests, both in terms of days and in terms of releases (Definitions 1 and 2). To fit our models, we use attributes that we collect from the VCSs of the studied projects. As described in Tables 4 and 5, we collected attributes that belong to different families (contributor, pull request, project and process) that may be related to the delivery time of merged pull requests. In RQ5, we investigate which attributes are the most influential to model the delivery time of merged pull requests, both before and after continuous integration.

RQ5: Approach

In RQ5, we separately investigate which variables are the most influential to model delivery time according to the models that we fit using pull request data of before continuous integration, and according to the models that we fit using data of after continuous integration. Next, we show the relationship that the most influential variables share with delivery time. To identify the most influential variables for estimating the delivery time of merged pull requests, both in terms of days (Definition 1) and in terms of releases (Definition 2), we use Wald χ2 maximum likelihood tests (Step 5.2 of Figure 6). The larger the χ2 value for a variable, the higher the influence that such a variable has on our explanatory models. To calculate the χ2 value for each explanatory variable of the models that we fitted, we use the anova function of the rms R package. We use the following approach to calculate the percentage of the explanatory power of each variable of our models. Let V = (v1, v2, ..., vk) be the set of explanatory variables of our models, and f(vi) be the function that represents the χ2 value for vi. The explanatory power of vi on modeling delivery time, denoted as P(vi), can be computed using Equation 3.3. The explanatory power of a variable ranges from 0 to 1. The higher the explanatory power of a variable, the larger the influence of such a variable to model the delivery time.

P(v_i) = \frac{f(v_i)}{\sum_{j=1}^{k} f(v_j)} \qquad (3.3)

To study the relationship that the most influential variables of our models share with the response variable (delivery time), we use the Predict function of the rms package of the R language. The Predict function plots the change in the delivery time against the change in each influential variable while holding the other variables constant at their median values.
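A sketch of steps 5.2 and 5.3 for a fitted logistic model (fit, from the earlier sketch):

    # Step 5.2: Wald chi-square per explanatory variable, turned into the
    # explanatory power P(v_i) of Equation 3.3.
    a <- anova(fit)  # anova.rms reports a Wald chi-square per variable
    chisq <- a[rownames(a) != "TOTAL", "Chi-Square"]
    power <- chisq / sum(chisq)
    sort(power, decreasing = TRUE)

    # Step 5.3: effect of one influential variable on the response, holding
    # the remaining variables at their medians (requires datadist()).
    plot(Predict(fit, churn))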

Analysis II — What is the impact of continuous integration on the prolonged delivery time?

3.1.6 RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?

RQ6: Motivation

A long delivery time of merged pull requests may frustrate end users and contributors of a software project. End users are not much interested in just having a new functionality integrated in the code base of a project; instead, they care most about when such a new functionality will be released, so they can benefit from it. Moreover, if users are not aware of such a long delivery time, their frustration may increase considerably because they are not used to such delivery times (COSTA et al., 2017). This investigation helps us to understand how well we can model long delivery times of pull requests, and hence it may also help us to mitigate the problem of a prolonged delivery time.

RQ6: Approach

We calculate the prolonged delivery time of pull requests (Definition 3) as described in Section 2.3. Table 1 and Table 2 show the medians and MADs used for each studied project to identify merged pull requests that have a long delivery time. For instance, in the Yelp/mrjob project, when a pull request takes more than the threshold of 83.4 days (median delivery time + MAD) to be released, we consider that such a pull request has a long delivery time. First, we calculate a long delivery time threshold for all pull requests of each studied project, including pull requests of before and after continuous integration in a single set. Next, we separately calculate a long delivery time threshold for the pull requests delivered before and after continuous integration. We distinguish the long delivery time thresholds for pull requests delivered before and after continuous integration because, if a project changes its policy of shipping releases after the adoption of continuous integration (i.e., quickens the time to ship new releases), then a given delivery time may be considered long for pull requests submitted after continuous integration, while it may not be considered long for pull requests submitted before continuous integration. In median, delivery times higher than 91 days are considered long for pull requests delivered after continuous integration, while delivery times higher than 76 days (median) are considered long for pull requests delivered before continuous integration.
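A sketch of the threshold computation for one project, assuming a delivery_days column; note that R's mad() already applies the 1.4826 consistency constant by default:

    # Long delivery time threshold (Definition 3): median + MAD, computed
    # per project. R's mad() applies the 1.4826 consistency constant by
    # default, which matches the MAD values reported in Tables 1 and 2.
    threshold <- median(prs$delivery_days) + mad(prs$delivery_days)

    # Flag the pull requests whose delivery time exceeds the threshold.
    prs$long_delivery <- prs$delivery_days > threshold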

Figure 7 shows the distribution of the percentage of pull requests per project that have a long delivery time. On investigating all pull requests (before and after continuous integration together) of each studied project, we observe that in median 24% of them have a long delivery time. Moreover, on investigating the pull requests of each project separated into two buckets, before and after continuous integration, we observe that in median 22% of such pull requests have a long delivery time, both before and after continuous integration.

Figure 7 – Percentage of merged pull requests that have a long delivery time. We present the distribution of the percentage of merged pull requests that have a long delivery time on the studied projects (medians: 24% General, 22% CI, 22% NO-CI).

To investigate whether a given merged pull request is likely to have a long delivery time, we use explanatory models (i.e., Logistic Regression Models). As the long delivery time threshold of a project may vary depending on whether the pull request was delivered before or after the project adopted continuous integration, we separately investigate how well we can identify whether a pull request will suffer from a long delivery time, both before and after continuous integration. To train the Logistic Regression models, we produce a dichotomous response variable Y, where Y = 1 means that a merged pull request has a long delivery time, while Y = 0 means that the delivery time of that pull request is normal. Similar to RQ4, when using a Logistic Regression Model we must account for the budget of degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting low. Using the guideline provided by Harrell Jr. (HARRELL, 2015) to calculate the D.F. budget of the data of each studied project, we only build models for the projects that satisfy the following restriction: n/10 >= k, where n is the number of instances of the class (prolonged or normal) with the lowest number of instances, 10 is a denominator that is recommended by Harrell Jr. (HARRELL, 2015), and k is the number of explanatory variables used in the model. We build 62 models (13 before continuous integration and 49 after continuous integration). Similar to RQ4, we assess the goodness of the models using the AUC and Brier score metrics.
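A sketch of the D.F. budget restriction check, where y is the dichotomous response (prolonged or normal) and k the number of explanatory variables:

    # Only fit a model when n / 10 >= k, where n is the number of instances
    # of the smallest class (HARRELL, 2015) and k the number of explanatory
    # variables.
    df_budget_ok <- function(y, k) {
      n <- min(table(y))
      (n / 10) >= k
    }

    df_budget_ok(prs$long_delivery, k = 13)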

Table 1 – Long delivery time thresholds (PART I). We present the median delivery time in terms of days and the MAD for each project. The threshold for a long delivery time is calculated as the median delivery time + MAD (see Definition 3).

# | Project | CI Median | CI MAD | NO-CI Median | NO-CI MAD | General Median | General MAD
1 | Yelp/mrjob | 41.50 | 57.08 | 32.50 | 39.29 | 36.00 | 47.44
2 | yiisoft/yii | 97.00 | 109.71 | 96.00 | 72.65 | 97.00 | 96.37
3 | roots/sage | 14.00 | 19.27 | 37.00 | 42.25 | 27.00 | 32.62
4 | vanilla/vanilla | 304.50 | 110.45 | 518.50 | 151.97 | 318.00 | 126.02
5 | processing/p5.js | 13.00 | 14.83 | 7.00 | 8.15 | 12.00 | 14.83
6 | bokeh/bokeh | 17.00 | 17.79 | 13.00 | 13.34 | 16.00 | 17.79
7 | serverless/serverless | 39.50 | 52.63 | 14.00 | 19.27 | 27.00 | 38.55
8 | craftyjs/Crafty | 216.00 | 94.89 | 32.00 | 32.62 | 50.00 | 65.23
9 | invoiceninja/invoiceninja | 21.00 | 16.31 | 10.00 | 8.90 | 12.00 | 11.86
10 | scikit-image/scikit-image | 96.00 | 90.44 | 31.00 | 25.20 | 79.00 | 83.03
11 | dropwizard/dropwizard | 141.50 | 101.56 | 26.00 | 26.69 | 107.00 | 115.64
12 | androidannotations/androidannotations | 98.00 | 102.30 | 94.00 | 118.61 | 94.00 | 114.16
13 | aframevr/aframe | 31.00 | 41.51 | 56.50 | 43.74 | 44.00 | 45.22
14 | jashkenas/backbone | 71.00 | 84.51 | 67.50 | 80.80 | 70.00 | 84.51
15 | openhab/openhab | 84.50 | 103.04 | 65.00 | 58.56 | 70.00 | 68.20
16 | bcit-ci/CodeIgniter | 696.00 | 386.96 | 1138.00 | 102.30 | 924.00 | 330.62
17 | mizzy/serverspec | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
18 | spinnaker/spinnaker | 3.00 | 4.45 | 8.00 | 11.86 | 4.00 | 5.93
19 | sensu/sensu | 13.00 | 19.27 | 2.00 | 2.97 | 7.00 | 10.38
20 | cython/cython | 81.00 | 76.35 | 90.00 | 66.72 | 83.50 | 74.87
21 | buildbot/buildbot | 236.00 | 277.25 | 169.00 | 139.36 | 217.00 | 243.15
22 | jsbin/jsbin | 0.00 | 0.00 | 32.50 | 48.18 | 0.00 | 0.00
23 | PokemonGoF/PokemonGo-Bot | 12.00 | 4.45 | 1.00 | 0.00 | 7.00 | 8.90
24 | naver/pinpoint | 99.00 | 102.30 | 59.00 | 5.93 | 92.00 | 91.92
25 | siacs/Conversations | 7.00 | 8.90 | 9.00 | 7.41 | 8.00 | 8.90
26 | photonstorm/phaser | 16.00 | 14.83 | 8.00 | 8.90 | 15.00 | 14.83
27 | fchollet/keras | 20.00 | 14.83 | 27.00 | 21.50 | 22.00 | 16.31
28 | robolectric/robolectric | 108.00 | 84.51 | 134.50 | 159.38 | 111.00 | 93.40
29 | TelescopeJS/Telescope | 17.00 | 23.72 | 19.00 | 27.43 | 18.00 | 25.20
30 | andypetrella/spark-notebook | 58.00 | 32.62 | 182.50 | 183.10 | 82.00 | 99.33
31 | apache/incubator-airflow | 33.50 | 30.39 | 11.00 | 8.90 | 20.00 | 23.72
32 | ReactiveX/RxJava | 161.00 | 231.29 | 5.00 | 7.41 | 14.00 | 20.76
33 | driftyco/ng-cordova | 8.00 | 11.86 | 12.00 | 16.31 | 9.00 | 13.34
34 | haraka/Haraka | 68.50 | 71.91 | 19.00 | 23.72 | 41.50 | 49.67
35 | isagalaev/highlight.js | 24.00 | 22.24 | 40.00 | 44.48 | 28.00 | 23.72
36 | bundler/bundler | 74.00 | 82.28 | 116.00 | 87.47 | 93.00 | 90.44
37 | humhub/humhub | 36.00 | 48.18 | 33.00 | 30.39 | 33.00 | 34.10
38 | square/picasso | 34.50 | 40.77 | 28.00 | 28.17 | 32.00 | 35.58
39 | Netflix/Hystrix | 21.00 | 26.69 | 2.00 | 2.97 | 15.50 | 21.50
40 | dropwizard/metrics | 94.50 | 110.45 | 134.00 | 152.71 | 113.00 | 136.40
41 | refinery/refinerycms | 212.00 | 219.42 | 170.00 | 217.94 | 178.50 | 208.31
42 | gollum/gollum | 19.00 | 23.72 | 23.00 | 31.13 | 22.00 | 29.65
43 | jhipster/generator-jhipster | 7.00 | 7.41 | 6.00 | 7.41 | 7.00 | 7.41
44 | mapbox/mapbox-gl-js | 8.00 | 10.38 | 5.00 | 5.93 | 7.00 | 8.90
45 | request/request | 9.00 | 10.38 | 183.00 | 250.56 | 13.00 | 17.79
46 | alohaeditor/Aloha-Editor | 2.00 | 2.97 | 29.00 | 43.00 | 6.00 | 7.41
47 | boto/boto | 16.00 | 19.27 | 34.00 | 38.55 | 19.00 | 23.72
48 | grails/grails-core | 28.00 | 37.06 | 111.50 | 110.45 | 74.50 | 88.21
49 | Pylons/pyramid | 142.00 | 123.80 | 50.00 | 63.75 | 129.00 | 131.21
50 | mantl/mantl | 28.00 | 28.17 | 18.00 | 19.27 | 20.00 | 22.24

Table 2 – Long delivery time thresholds (PART II). We present the median delivery time in terms of days and the MAD for each project. The threshold for a long delivery time is calculated as the median delivery time + MAD (see Definition 3).

# | Project | CI Median | CI MAD | NO-CI Median | NO-CI MAD | General Median | General MAD
51 | ether/etherpad-lite | 23.00 | 28.17 | 67.00 | 85.99 | 29.00 | 38.55
52 | jashkenas/underscore | 70.00 | 74.13 | 25.00 | 28.91 | 53.00 | 63.75
53 | apereo/cas | 180.00 | 146.78 | 353.00 | 277.25 | 210.00 | 174.21
54 | kivy/kivy | 129.50 | 131.21 | 38.00 | 28.17 | 90.00 | 106.75
55 | elastic/logstash | 23.00 | 22.24 | 40.50 | 43.74 | 30.00 | 32.62
56 | getsentry/sentry | 20.00 | 19.27 | 3.00 | 2.97 | 19.00 | 20.76
57 | hapijs/hapi | 2.00 | 2.97 | 5.00 | 7.41 | 3.00 | 4.45
58 | HabitRPG/habitica | 193.00 | 180.14 | 642.00 | 204.60 | 201.00 | 209.05
59 | pyrocms/pyrocms | 95.00 | 120.09 | 34.00 | 35.58 | 61.00 | 69.68
60 | BabylonJS/Babylon.js | 50.00 | 43.00 | 27.00 | 23.72 | 41.00 | 40.03
61 | Leaflet/Leaflet | 462.00 | 438.11 | 102.50 | 114.90 | 364.00 | 424.02
62 | laravel/laravel | 20.00 | 26.69 | 18.00 | 23.72 | 19.00 | 23.72
63 | zurb/foundation-sites | 16.00 | 19.27 | 9.00 | 11.86 | 14.00 | 17.79
64 | callemall/material-ui | 17.00 | 22.24 | 9.00 | 8.90 | 14.00 | 17.79
65 | loomio/loomio | 111.00 | 161.60 | 85.00 | 115.64 | 96.50 | 137.14
66 | scikit-learn/scikit-learn | 162.00 | 123.06 | 57.00 | 48.93 | 125.00 | 117.13
67 | frappe/erpnext | 2.00 | 2.97 | 135.00 | 200.15 | 2.00 | 2.97
68 | Theano/Theano | 220.00 | 164.57 | 144.00 | 188.29 | 208.00 | 192.74
69 | puppetlabs/puppet | 42.00 | 38.55 | 41.00 | 29.65 | 41.00 | 37.06
70 | chef/chef | 28.00 | 40.03 | 1147.00 | 662.72 | 30.00 | 43.00
71 | woocommerce/woocommerce | 52.00 | 59.30 | 50.50 | 68.94 | 52.00 | 66.72
72 | divio/django-cms | 49.00 | 50.41 | 85.00 | 78.58 | 49.00 | 50.41
73 | scipy/scipy | 138.00 | 80.06 | 167.00 | 69.68 | 143.00 | 81.54
74 | matplotlib/matplotlib | 109.00 | 108.23 | 108.00 | 68.20 | 109.00 | 105.26
75 | sympy/sympy | 131.50 | 123.06 | 251.50 | 138.62 | 148.00 | 143.81
76 | twbs/bootstrap | 36.00 | 37.06 | 12.00 | 13.34 | 34.00 | 37.06
77 | AnalyticalGraphicsInc/cesium | 909.00 | 31.13 | 953.00 | 19.27 | 945.00 | 34.84
78 | elastic/kibana | 66.00 | 54.86 | 217.00 | 133.43 | 70.00 | 60.79
79 | mozilla/pdf.js | 33.00 | 37.06 | 131.00 | 102.30 | 58.00 | 74.13
80 | appcelerator/titanium_mobile | 75.50 | 68.94 | 63.00 | 69.68 | 66.00 | 69.68
81 | StackStorm/st2 | 20.00 | 25.20 | 196.00 | 65.23 | 41.00 | 56.34
82 | TryGhost/Ghost | 19.00 | 22.24 | 16.00 | 14.83 | 18.00 | 20.76
83 | fog/fog | 17.00 | 16.31 | 22.00 | 19.27 | 18.00 | 16.31
84 | ansible/ansible | 121.50 | 71.91 | 36.00 | 28.17 | 52.00 | 45.96
85 | ipython/ipython | 104.00 | 102.30 | 484.50 | 94.15 | 125.00 | 131.95
86 | cakephp/cakephp | 46.00 | 63.75 | 26.00 | 28.17 | 43.00 | 59.30
87 | owncloud/core | 47.00 | 44.48 | 81.00 | 84.51 | 55.00 | 56.34
88 | rails/rails | 184.00 | 152.71 | 121.00 | 106.75 | 184.00 | 148.26
89 | mozilla-b2g/gaia | 97.00 | 108.23 | 128.00 | 108.23 | 110.00 | 115.64
90 | saltstack/salt | 44.00 | 44.48 | 664.00 | 512.98 | 44.00 | 44.48

3.1.7 RQ7: What are the most influential attributes for identifying the merged pull requests that will suffer from a long delivery time?

RQ7: Motivation

The results of RQ6 show that our explanatory models can accurately identify whether a merged pull request is likely to have a long delivery time. Nevertheless, it is also important to understand which variables are more influential for identifying merged pull requests with a long delivery time.

RQ7: Approach

Similar to RQ5, in this research question we analyze our explanatory models by computing the explanatory power score of each variable. We separately investigate which variables are the most influential to model a prolonged delivery time, both before and after the adoption of continuous integration.

3.2 Studied Projects

We intend to identify projects that have long historical data and that adopted continuous integration at some point in their lives. We use such projects to better understand the impact of adopting continuous integration on the delivery time of merged pull requests. We use a similar approach to that of Vasilescu et al. (VASILESCU et al., 2015) to select our projects. The selection process of our projects is shown in Figure 8. We describe each step of this process in the following.

Figure 8 – An overview of our project selection process.

First, we use the GitHub API to identify the 3,000 most popular projects that are written in the five most popular programming languages (Java, Python, Ruby, PHP and JavaScript) of GitHub. The popularity of a project is measured by the number of stars that are assigned to that project. We performed our search on GitHub on November 11th, 2016. Next, we check whether a project adopts a continuous integration service. In our study, we only consider projects that use Travis-CI. Similar to Vasilescu et al. (VASILESCU et al., 2015), we avoid projects that use Jenkins, since the entire build history of such projects is not available. We identify that a given project uses Travis-CI when there are builds that are associated with the Travis-CI API. We use the date of the first Travis-CI build as the moment at which a project started to adopt continuous integration. Out of 3,000 projects, 1,784 (59.5%) have used Travis-CI. In step 3, we use the GitHub API to gather the number of merged pull requests of each project. We group the pull requests into the before- and after-CI buckets. We exclude projects that have less than 100 merged pull requests in the before or the after buckets to maintain a considerable amount of data to perform our analyses. As we use Linear Regression Models to perform analyses in RQ2 and RQ3 of this study, we must take into account the degrees of freedom (D.F.) that the data of each studied project can accommodate while keeping the risk of overfitting the models low. We follow the guidelines that are provided by Harrell Jr. (HARRELL, 2015) to train regression models. As we have 13 explanatory variables (see Tables 4 and 5), and we separately study delivery time before and after continuous integration, we must have at least 13 × 10 = 130 pull requests in each studied bucket, where 13 is the number of explanatory variables that we use in our models, and 10 is a constant that is recommended by Harrell Jr. (HARRELL, 2015). However, as we remove collinear variables before training our regression models, instead of using 130 as a minimum threshold for the number of pull requests in each bucket, we use 100 pull requests. 156 projects remain after Step 3. Finally, we use the GitHub API to fetch all pull requests and their metadata for the remaining projects. We then link the pull requests to their specific releases. Such links help us to calculate the total time between when a pull request was merged and when that pull request was released. We refer to this time interval as “delivery time” (see Definition 1). Finally, by using a similar approach as in Step 3, we filter out projects that have less than 100 linked pull requests in the before or after buckets. A total of 90 projects remain. Figure 9 shows the number of studied projects grouped by programming language (33 JavaScript, 25 Python, 12 Java, 10 Ruby and 10 PHP).
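As an illustration of Step 1, the search could be issued with the gh R package (this sketch assumes a configured GITHUB_PAT token; the endpoint and parameters are those of the GitHub search API):

    library(gh)

    # First page of the most-starred repositories for one of the studied
    # languages, via the GitHub search API.
    res <- gh("GET /search/repositories",
              q = "language:Python", sort = "stars", order = "desc",
              per_page = 100)
    vapply(res$items, function(r) r$full_name, character(1))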

Figure 9 – Number of projects grouped by programming language.

On analyzing the studied projects, we observe that 126,716 pull requests were delivered after the adoption of continuous integration, while 40,321 were delivered before the adoption of continuous integration (a total of 167,037 pull requests, see Table 3). This unbalanced number of pull requests in each bucket may occur due to an increase in pull request submissions after the adoption of continuous integration. Also, the reason why our database has more pull requests from the after continuous integration bucket may be related to the time at which the projects started using continuous integration. For instance, on average the studied projects are 5.05 years old, of which 2.01 years were spent without continuous integration and 3.04 years with continuous integration. Table 3 shows the number of pull requests per programming language before and after continuous integration.

Table 3 – Summary of the number of projects and released pull requests grouped by programming language.

Language | Projects | PRs total | PRs after CI | PRs before CI
JavaScript | 33 | 57,408 | 39,668 | 17,740
Python | 25 | 57,750 | 47,754 | 9,996
Java | 12 | 9,122 | 5,575 | 3,547
Ruby | 10 | 22,902 | 19,695 | 3,207
PHP | 10 | 19,855 | 14,024 | 5,831
Total | 90 | 167,037 | 126,716 | 40,321

3.3 Data Collection

After we select our studied projects, we fetch pull request and release meta-data for each project. The data collection process is shown in Figure 10. Each step of the process is detailed below.

Figure 10 – An overview of our data collection process.

Step 1. Collect pull request information

We use the GitHub API to collect pull requests and their respective metadata. For each pull request, we select the following attributes: author (GitHub user), pull number, title, description, number of added and deleted lines (churn), number of changed files, number of activities, number of comments, date of comments, state (Open, Closed, Merged), creation date, close date, and closedBy (GitHub user).

Step 2. Link pull requests to releases

After we collect the pull request information, we collect the release information of the studied pull requests. We collect the publish date, start date, the number of commits and the number of pull requests for each release of the studied projects. We also manually verify whether the releases are user-intended, so that we do not consider pre, beta, alpha, and rc (release candidate) releases in our analyses. Instead, when a pull request was released in a non-user-intended release, we link such a pull request with the next user-intended release. For example, if a project has the following release tags, chronologically ordered: v1.0, v1.1.pre, v1.1 and v2.0, and a pull request is released in the v1.1.pre release, we move the publish date of this pull request to the next user-intended release (i.e., the date of release v1.1). We clone all repositories of our studied projects and fetch all of their release tags. We then compute a diff between these tags to verify which commit logs were added in a given tag. Next, we parse the obtained commit logs. For instance, if we find the pattern “Merge pull request #” (which is automatically generated by GitHub when a pull request is merged) between release tags v1.1 and v2.0, we consider that such a commit log was released in v2.0. By using the pattern “Merge pull request #” we could link 85% (167,037/196,716) of the merged pull requests to their commits. The remaining 15% may still be waiting for a release to be shipped, or the integrator might have cherry-picked the commits of these pull requests. In the latter case, the pattern “Merge pull request #” is not automatically recorded in the respective commit logs. Finally, we link merged pull requests to their releases based on the tags that are associated with the commits.
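A sketch of this linking heuristic on a cloned repository, using the release tags of the example above:

    # Commit logs added between two release tags of a cloned repository;
    # keep the merge commits that GitHub generates for merged pull requests.
    logs <- system2("git", c("log", "v1.1..v2.0", "--oneline"),
                    stdout = TRUE)
    merges <- grep("Merge pull request #", logs, value = TRUE)

    # Extract the pull request numbers, to be linked to release tag v2.0.
    pr_numbers <- as.integer(sub(".*Merge pull request #([0-9]+).*", "\\1",
                                 merges))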

Step 3 and 4. Compute metrics and perform analyses

We use data from Steps 1 and 2 to compute the metrics that we use in the regression analyses that we perform in RQ2, RQ3, RQ4 and RQ5 of this work. During the metric selection process, we collect information from the VCSs of the studied projects to include attributes that belong to one of the following families: contributor, pull request, project and process. We choose these families of attributes because we intend to investigate a variety of perspectives that may have an influence on the delivery time of merged pull requests. We describe each of these families in the following. Furthermore, Tables 4 and 5 show the complete description of the attributes that we compute for each family, and show the rationale that we use to include each attribute as a predictor of delivery time. The attributes presented in Tables 4 and 5 were chosen based on the attributes that Costa et al. (COSTA et al., 2016) used in their work. They used a set of attributes as predictors in regression models to investigate the delivery time of fixed issues in the Firefox project.

• Contributor: refers to the developers that send contributions to a software project by means of pull requests. Pull requests that are submitted by experienced contributors may be more easily reviewed and merged into the project code base; such pull requests may also be quickly released to end users.

• Pull request: refers to the attributes of pull requests. Integrators may use this information to merge and deliver pull requests. For instance, a poor description of a pull request can make integrators unable to understand the content of such a pull request and assess its importance, which may increase its delivery time.

• Project: refers to the status of the project when a pull request was merged. If the project team has a higher delivery workload, i.e., many pull requests are waiting to be released, the release of a newly merged pull request is likely to be delayed in one or more releases.

• Process: refers to the process of merging a pull request. A merged pull request that is involved in a complex process (i.e., long comment threads, larger number of impacted files) could be more difficult to understand and deliver.

3.4 Chapter Summary

In this chapter, we present the procedures, methods and metrics that we use to perform the analyses that compose this dissertation. We first present each RQ of this study along with its motivation and research approach (Section 3.1). Next, we describe the process used to select the studied projects (Section 3.2). Finally, we describe how we collect the pull request data of each studied project (Section 3.3).

Table 4 – Metrics that are used in our explanatory models (Contributor, Pull Request and Project families).

Contributor family:
• Contributor Experience (Numeric). Definition: the number of previously released PRs that were submitted by the contributor of a particular PR. We consider the author of the PR to be its contributor. Rationale: the greater the experience and participation of a user within a specific open source project, the greater his/her chance of having his/her PR reviewed, merged, and released quickly (SHIHAB et al., 2010).
• Contributor Integration (Numeric). Definition: the average, in days, of the delivery time of the previously released PRs that were submitted by a particular contributor. Rationale: if a particular contributor usually submits PRs that are merged and released quickly, his/her future PRs might be merged and released quickly as well (COSTA et al., 2016).

Pull Request family:
• Stack Trace Attached (Boolean). Definition: we verify whether the PR has a stack trace attached in its description. Rationale: if the PR provides a bug fix, an attached stack trace may provide useful information regarding the causes of the bug and the importance of the submitted code, which may quicken the merge of the PR and its delivery in a release of the project (SCHROTER et al., 2010).
• Description Size (Numeric). Definition: the number of characters in the body (description) of a PR. Rationale: PRs that are well described might be easier to merge and release than PRs that are more difficult to understand (COSTA et al., 2016).

Project family:
• Queue Rank (Numeric). Definition: the number that represents the moment when a PR is merged compared to other merged PRs in the release cycle. For example, in a queue that contains 100 PRs, the first merged PR has position 1, while the last merged PR has position 100. Rationale: a PR with a high queue rank is a recently merged PR. A merged PR might be released faster/slower depending on its queue position (COSTA et al., 2016).
• Merge Workload (Numeric). Definition: the amount of PRs that were created and are still waiting to be merged by a core integrator at the moment at which a specific PR is submitted. Rationale: a PR might be released faster/slower depending on the amount of submitted PRs waiting to be merged. The higher the amount of created PRs waiting to be analyzed and merged, the greater the workload of the contributors to analyze these PRs, which may impact their delivery time.

Table 5 – Metrics that are used in our explanatory models (Process family).

Process family:
• Number of Impacted Files (Numeric). Definition: the number of files linked to a PR submission. Rationale: the delivery time might be related to a high number of files in a PR, because more effort must be spent to integrate it (JIANG; ADAMS; GERMAN, 2013).
• Churn (Numeric). Definition: the number of added lines plus the number of deleted lines of a PR. Rationale: a higher churn suggests that a great amount of work might be required to verify and integrate the code contribution sent by means of the PR (JIANG; ADAMS; GERMAN, 2013; NAGAPPAN; BALL, 2005).
• Merge Time (Numeric). Definition: the number of days between the submission and the merge of a PR. Rationale: if a PR is merged quickly, it is more likely to be released faster.
• Number of Activities (Numeric). Definition: an activity is an entry in the PR’s history. Rationale: a high number of activities might indicate that much work was required to make the PR acceptable, which may impact the integration of such a PR into a release (JIANG; ADAMS; GERMAN, 2013).
• Number of Comments (Numeric). Definition: the number of comments of a PR. Rationale: a high number of comments might indicate the importance of a PR or the difficulty of understanding it (GIGER; PINZGER; GALL, 2010), which may impact its delivery time (JIANG; ADAMS; GERMAN, 2013).
• Interval of Comments (Numeric). Definition: the sum of the time intervals (days) between comments divided by the total number of comments of a PR. Rationale: a short interval of comments indicates the discussion was held with priority, which suggests that the PR is important; thus, the PR might be delivered faster (COSTA et al., 2016).
• Release Commits (Numeric). Definition: the number of commits in the release associated with a PR. Rationale: the higher the number of commits in a release, the greater the amount of contribution to be delivered in such a release, which might impact its duration and, hence, the delivery time of the PRs.

4 Study Results

In this chapter, we present the results of each investigated RQ grouped by its respective dimension of analysis. First, we present the results of RQ1–RQ5, which compose Analysis I and investigate the impact of continuous integration on the delivery time of pull requests. Finally, we present the results of RQ6 and RQ7, which compose Analysis II and investigate the impact of continuous integration on the prolonged delivery time.

4.1 Analysis I — What is the impact of continuous inte- gration on the delivery time of pull requests?

RQ1: How often are merged pull requests prevented from being re- leased?

Merged pull requests are usually delivered in the next upcoming release after they have been merged, both before and after continuous integration. On analyzing how often pull requests are prevented from being released considering all pull requests of each studied project, we observe that in median 18.3% of them are delayed by one or more releases (see Figure 11a). In addition, Figures 11b and 11c show the distribution of the percentage of pull requests per project delivered in each release bucket (next or later), before and after continuous integration, respectively. Before continuous integration, projects deliver 86.2% (median) of their pull requests into the next bucket, while they deliver 76% after continuous integration. One possible reason for the percentage of pull requests that are delivered into the next bucket before continuous integration to be superior to the percentage after continuous integration may be related to the increase of the development activity workload after continuous integration, which may lead pull requests to be delayed (see RQ3). Projects ship new releases in a time interval of 35.7 days (median) before continuous integration, while taking 33.1 days to ship new releases after continuous integration.

Figure 11 – Distribution of pull requests per bucket before and after continuous integration. The pull requests are grouped into the next and later buckets: (a) PRs per bucket (All Time; medians of 81.7% next and 18.3% later), (b) PRs per bucket after CI (76% next, 24% later), (c) PRs per bucket before CI (86.2% next, 13.8% later).

Figure 12 shows the distributions of the time interval between the studied releases of the projects, before and after continuous integration. The difference regarding the required time to ship new releases in the studied projects may be due to the release policies that are followed by each project (COSTA et al., 2017). According to Stahl and Bosch (STÅHL; BOSCH, 2014b), one of the benefits of continuous integration is to increase the release frequency and predictability. However, when comparing the median duration of the release cycles of each studied project before and after continuous integration, we do not observe a significant difference (p-value = 0.1109, MWW test; negligible Cliff’s delta of 0.138).

Figure 12 – Number of days between the studied releases of the projects, before and after continuous integration. The number shown over each boxplot is the median interval (33.1 days with CI, 35.7 days without CI).

In median, 13.8% of the merged pull requests per project miss at least one release before being delivered to end users before continuous integration, while 24% miss at least one release after continuous integration. This result indicates that even though a pull request is merged, its delivery may be prevented by one or more releases, both before and after continuous integration, which can frustrate end users and contributors. Appendix C shows the percentage of delivered pull requests per project in the next and later release buckets, before and after the adoption of continuous integration. Many pull requests that were prevented from delivery were merged well before the upcoming release date. Pull requests that are submitted and merged late in the release cycle, e.g., one day or one week before the upcoming release, may be prevented from delivery (COSTA et al., 2017). To check whether merged pull requests are being prevented from delivery mostly because they are being merged late in the release cycle, we compute the merge timing metric. Figure 13 shows the distribution of the median merge timing metric of the pull requests for each project, before and after continuous integration. It shows that many pull requests that were prevented from delivery in at least one release were merged well before the end of the release cycle. The median merge timing for the pull requests is 0.78 and 0.80 before and after continuous integration, respectively. Hence, it is unlikely that most merged pull requests are prevented from delivery solely because they were merged too close to an upcoming release date.
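A sketch of the merge timing metric, assuming per-pull-request dates for the start of the release cycle, the merge, and the release (the exact column names, and this formulation of the metric adapted from the fix timing of Costa et al., are assumptions):

    # Merge timing: relative position of the merge within its release cycle,
    # from 0 (cycle start) to 1 (release date). Values near 1 mean that the
    # pull request was merged late in the cycle.
    prs$merge_timing <- as.numeric(prs$merge_date - prs$cycle_start) /
                        as.numeric(prs$release_date - prs$cycle_start)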

Figure 13 – Merge timing metric. We present the distribution of the merge timing metric for merged pull requests that are prevented from integration in at least one release (medians: 0.80 with CI, 0.78 without CI).

In median, 13.8% of the merged pull requests per project were prevented from delivery in at least one release before continuous integration, while 24% missed at least one release after continuous integration. Furthermore, we found that many pull requests that missed at least one release were merged well before the release date of the missed releases.

RQ2 - Are pull requests released more quickly using continuous in- tegration?

In 53% of the projects, the time from submission to delivery of pull requests (i.e., pull request lifetime) is shorter before the adoption of continuous integration. Surprisingly, the majority of our studied projects (48/90) increase the time from sub- mission to delivery of pull requests after the adoption of continuous integration. We observe that pull requests take 46.5 days (median) to be merged and released after adopting continuous integration, while taking 45 days before continuous integration. Figure 14a compares the distributions of lifetime of pull requests before and after the adoption of continuous integration.

We observe that 74.4% (67/90) of the projects have a statistically significant difference (p-value < 0.05) and a non-negligible median delta between the distributions of lifetime of pull requests (delta >= 0.147). 34.3% (23/67) of such projects obtained a large delta (median 0.631), while 25.3% (17/67) and 40.3% (27/67) of the projects obtained medium and small deltas, respectively (medians of 0.360 and 0.222). Regarding the projects that obtained a p-value < 0.05, we observe that 52.2% (35/67) had a shorter pull request lifetime before adopting continuous integration, while 47.8% (32/67) had a shorter pull request lifetime after adopting continuous integration. We also analyze the trend of the data for projects that obtained non-significant p-values. In these projects, we verify that there is a similar trend of a shorter lifetime of pull requests before adopting continuous integration (56.5%, 13/23). Our results do not corroborate the perceived decrease in the lifetime of pull requests that should occur after adopting continuous integration (LAUKKANEN; PAASIVAARA; ARVONEN, 2015).

Figure 14 – The required number of days to merge and deliver pull requests (pull request lifetime): (a) Lifetime, (b) Merge time, (c) Delivery time.

In 68.9% (62/90) of the projects, pull requests are merged slightly faster before adopting continuous integration. Figure 14b shows the distribution for the merge phase (t1). We observe that submitted pull requests take 3.4 days (median) to be merged before continuous integration, and 5.2 days after continuous integration. A total of 68.9% (62/90) of the projects have a statistically significant difference in the time to merge pull requests, with a median Cliff’s delta of 0.160 (small). With respect to such projects, we observe that 74.2% (46/62) merge pull requests more quickly before continuous integration. With respect to the 31.1% (28/90) of projects for which p-values are > 0.05, we observe a somewhat similar trend, i.e., 57.1% (16/28) of the projects merge pull requests more quickly before using continuous integration. Regarding the delivery phase (t2), we observe that roughly half of the projects for which p-values are < 0.05 (51.3%, 39/76) have a shorter delivery time of their pull requests after the adoption of continuous integration. Nevertheless, by analyzing the projects for which p-values are > 0.05 (14/90), we verify that in 57.1% (8/14) of them, their pull requests have a shorter delivery time before adopting continuous integration. The median delivery times (t2) for the projects before and after the adoption of continuous integration are 39 and 43 days, respectively. Figure 14c shows the distribution for the delivery time of pull requests per project, before and after continuous integration. Our analyses indicate that 84.4% (76/90) of the projects have a statistically significant difference in the delivery time of merged pull requests, but a small median Cliff’s delta of 0.297.
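A sketch of the per-project comparison, assuming a lifetime_days column and the effsize package for Cliff's delta:

    library(effsize)

    # Mann-Whitney-Wilcoxon test plus Cliff's delta on pull request
    # lifetimes, before (NO-CI) and after (CI) continuous integration.
    before <- prs$lifetime_days[prs$ci == FALSE]
    after  <- prs$lifetime_days[prs$ci == TRUE]

    wilcox.test(before, after)           # MWW p-value
    cliff.delta(after, before)$estimate  # effect size; see also $magnitude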

In 68.9% (62/90) of the projects, submitted pull requests tend to wait longer to be merged after the adoption of continuous integration. On the other hand, on comparing the delivery time of merged pull requests, we observe that roughly half of the projects have a shorter delivery time after the adoption of continuous integration.

RQ3 - Does the increased development activity after adopting con- tinuous integration increase the delivery time of pull requests?

77.8% (70/90) of the projects increase pull request submissions after adopting continuous integration. While 55.6% (50/90) of the projects obtained p-values < 0.05, the median Cliff’s delta for these projects is 0.603 (large). Although 44.4% (40/90) of the projects obtain a p-value > 0.05, it is interesting to observe that the majority of such projects (65%, 26/40) tend to increase the number of pull request submissions after adopting continuous integration. Figure 15 shows the distributions of the number of submitted, merged and delivered pull requests per release for each project. We observe that projects tend to submit a median of 44.8 pull requests per release after adopting continuous integration, against a median of 15.9 pull requests before adopting continuous integration. We also observe a significant increase in the number of merged and delivered pull requests per release (medians of 30 and 32.9, respectively) after adopting continuous integration. The median values of merged and released pull requests before adopting continuous integration are 10.9 and 9.4, respectively. With respect to the projects for which the delivery time of pull requests increased after adopting continuous integration, we find that 64.9% (24/37) significantly increased their pull request submissions per release (large median Cliff’s delta of 0.579). Only 2.7% (1/37) had higher pull request submissions before adopting continuous integration. Although 32.4% (12/37) of these projects do not obtain a significant p-value, we observe that 75% (9/12) of them tend to increase pull request submissions after adopting continuous integration. Figure 16 shows the distributions of submitted pull requests per release for the projects in which delivery time increased after adopting continuous integration. Our results suggest that the adoption of continuous integration will not always decrease the delivery time of merged pull requests. This result may be explained by the large increase of pull request submissions after the adoption of continuous integration.

Figure 15 – Pull request submission, merge, and delivery rates per release: (a) submitted pull requests (medians of 44.8 with CI and 15.9 without CI), (b) merged pull requests (30 with CI, 10.9 without CI), (c) delivered pull requests (32.9 with CI, 9.4 without CI).

56.4% (22/39) of the projects for which the delivery time of pull requests decreased after adopting continuous integration preserve the rate of pull request submissions per release. We also observe that pull request submissions are lower in 10.3% (4/39) of these projects. Moreover, 33.3% (13/39) of these projects obtain an increase in pull request submissions per release. The large increase of pull request submissions in the studied projects after adopting continuous integration is a possible reason why projects may deliver pull requests more quickly before the adoption of continuous integration.

RQ4: How well can we model the delivery time of merged pull re- quests?

RQ4: Results for delivery time in terms of releases

Our models achieve a median Brier score of 0.1168 when fitted using data of pull requests that were delivered before continuous integration, while achieving 0.1166 using the data of pull requests that were delivered after continuous integration. Furthermore, the median bootstrap-calculated optimism is 0.003 for the Brier score of the models that were fitted using pull request data of after continuous integration, whereas it is 0.007 when fitted using data of before continuous integration. Figure 17a shows the distribution of the Brier score values of our models, while Figure 17b shows the distribution of the bootstrap-calculated optimism for the Brier score of our models. Moreover, Tables 6 and 7 show all the Brier score and AUC values for each model that we fitted.


Figure 16 – Number of pull request submissions (per release) before and after the adoption of continuous integration.

Our models obtain median AUCs between 0.85 and 0.90, which indicates that our model estimations highly outperform random guessing (an AUC of 0.5).

Table 6 – Brier Score and AUC values for the models that we fitted using pull request data of before continuous integration.

# | Project | Brier Score | Brier Optimism | AUC | AUC Optimism
1 | openhab/openhab | 0.022 | 0.004 | 0.995 | 0.003
2 | refinery/refinerycms | 0.084 | 0.006 | 0.951 | 0.007
3 | grails/grails-core | 0.116 | 0.015 | 0.895 | 0.024
4 | Leaflet/Leaflet | 0.039 | 0.014 | 0.983 | 0.012
5 | loomio/loomio | 0.127 | 0.013 | 0.889 | 0.020
6 | woocommerce/woocommerce | 0.019 | 0.002 | 0.997 | 0.001
7 | matplotlib/matplotlib | 0.128 | 0.011 | 0.889 | 0.016
8 | appcelerator/titanium_mobile | 0.194 | 0.001 | 0.777 | 0.002
9 | ansible/ansible | 0.118 | 0.001 | 0.912 | 0.002
10 | cakephp/cakephp | 0.153 | 0.015 | 0.857 | 0.024
11 | owncloud/core | 0.071 | 0.002 | 0.939 | 0.003
12 | saltstack/salt | 0.164 | 0.008 | 0.836 | 0.013

Table 7 – Brier Score and AUC values for the models that we fitted using pull request data of after continuous integration.

# | Project | Brier Score | Brier Optimism | AUC | AUC Optimism
1 | yiisoft/yii | 0.087 | 0.007 | 0.935 | 0.009
2 | dropwizard/dropwizard | 0.156 | 0.008 | 0.841 | 0.014
3 | aframevr/aframe | 0.210 | 0.019 | 0.717 | 0.040
4 | buildbot/buildbot | 0.060 | 0.004 | 0.967 | 0.003
5 | jsbin/jsbin | 0.168 | 0.008 | 0.694 | 0.025
6 | naver/pinpoint | 0.083 | 0.003 | 0.939 | 0.003
7 | apache/incubator-airflow | 0.110 | 0.010 | 0.929 | 0.013
8 | ReactiveX/RxJava | 0.048 | 0.005 | 0.982 | 0.003
9 | bundler/bundler | 0.175 | 0.010 | 0.778 | 0.020
10 | Netflix/Hystrix | 0.190 | 0.010 | 0.704 | 0.027
11 | refinery/refinerycms | 0.048 | 0.007 | 0.980 | 0.005
12 | jhipster/generator-jhipster | 0.030 | 0.003 | 0.983 | 0.003
13 | Pylons/pyramid | 0.174 | 0.004 | 0.813 | 0.008
14 | ether/etherpad-lite | 0.133 | 0.008 | 0.789 | 0.028
15 | getsentry/sentry | 0.201 | 0.005 | 0.724 | 0.011
16 | pyrocms/pyrocms | 0.090 | 0.007 | 0.944 | 0.008
17 | Leaflet/Leaflet | 0.139 | 0.006 | 0.869 | 0.010
18 | laravel/laravel | 0.116 | 0.005 | 0.859 | 0.010
19 | zurb/foundation-sites | 0.095 | 0.003 | 0.878 | 0.008
20 | callemall/material-ui | 0.092 | 0.003 | 0.910 | 0.006
21 | scikit-learn/scikit-learn | 0.198 | 0.004 | 0.710 | 0.011
22 | frappe/erpnext | 0.104 | 0.002 | 0.895 | 0.003
23 | puppetlabs/puppet | 0.189 | 0.002 | 0.761 | 0.004
24 | chef/chef | 0.131 | 0.003 | 0.812 | 0.008
25 | woocommerce/woocommerce | 0.045 | 0.003 | 0.975 | 0.004
26 | divio/django-cms | 0.174 | 0.003 | 0.810 | 0.005
27 | scipy/scipy | 0.155 | 0.002 | 0.707 | 0.011
28 | matplotlib/matplotlib | 0.113 | 0.002 | 0.877 | 0.004
29 | sympy/sympy | 0.181 | 0.002 | 0.714 | 0.006
30 | twbs/bootstrap | 0.053 | 0.001 | 0.884 | 0.005
31 | elastic/kibana | 0.088 | 0.002 | 0.925 | 0.005
32 | appcelerator/titanium_mobile | 0.117 | 0.003 | 0.923 | 0.005
33 | StackStorm/st2 | 0.165 | 0.003 | 0.725 | 0.008
34 | TryGhost/Ghost | 0.053 | 0.002 | 0.931 | 0.006
35 | fog/fog | 0.046 | 0.002 | 0.895 | 0.008
36 | ansible/ansible | 0.050 | 0.002 | 0.947 | 0.008
37 | ipython/ipython | 0.117 | 0.001 | 0.841 | 0.003
38 | cakephp/cakephp | 0.145 | 0.001 | 0.779 | 0.004
39 | owncloud/core | 0.174 | 0.001 | 0.805 | 0.002
40 | rails/rails | 0.083 | 0.000 | 0.775 | 0.002
41 | mozilla-b2g/gaia | 0.092 | 0.000 | 0.712 | 0.002
42 | saltstack/salt | 0.159 | 0.000 | 0.759 | 0.001

Figure 17 – Distribution of the Brier score and the Brier optimism of the models before and after CI: (a) Brier score of the models (medians of 0.1166 with CI and 0.1168 without CI), (b) Brier score optimism of the models (medians of 0.003 with CI and 0.0071 without CI).

Figure 18 shows the distribution of the AUC values and the bootstrap-calculated optimism of the AUC values of our models, both before and after continuous integration. The median bootstrap-calculated optimism of the AUC values of our models is 0.007 and 0.006 for the models that were fitted with pull request data of before and after continuous integration, respectively. Summarizing, our models provide sound Brier score and AUC values, and they may be used as a starting point for studying whether a merged pull request will be prevented from being released.

Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI: (a) AUC of the models, (b) AUC optimism of the models.

We are able to accurately model whether pull requests are likely to be prevented from being released in the next upcoming release after they have been merged. Our models achieve sound median AUC values of 0.85 when using pull request data from after CI and 0.90 when using pull request data from before CI.

RQ4: Results for delivery time in terms of days

Our linear models achieve a median R² of 0.72 using pull request data from before continuous integration, while achieving 0.74 after continuous integration. Moreover, the median bootstrap-calculated optimism is less than 0.045 for both sets of R² values of our models. Figure 19 shows the distribution of the R² values and the R² optimism values that are achieved by each of our sets of models. Tables 8 and 9 show all the R² and R² optimism values of each model that we fitted using pull request data from before and after continuous integration. These results suggest that our models are stable enough to perform the statistical inferences that we perform in RQ5.

Figure 18 – Distribution of the AUC and the AUC optimism of the models before and after CI: (a) AUC of the models; (b) AUC optimism of the models.

Figure 19 – Distributions of models' R² and R² optimism: (a) models' R²; (b) models' R² optimism.
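The optimism values reported above follow the bootstrap procedure of Efron (EFRON, 1986): each model is refitted on a bootstrap sample, and the drop in performance when that refitted model is applied to the original data estimates how optimistic the apparent performance is. Below is a minimal sketch of the idea for the R² of a linear model; the data layout, names, and iteration count are illustrative, not our exact setup.

```python
# Minimal sketch of the bootstrap optimism procedure (EFRON, 1986) for
# the R² of a linear model; names and iteration count are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

def r2_with_optimism(X: np.ndarray, y: np.ndarray, n_boot: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    apparent = LinearRegression().fit(X, y).score(X, y)   # apparent (in-sample) R²
    optimism = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # bootstrap sample (with replacement)
        m = LinearRegression().fit(X[idx], y[idx])
        # Optimism: performance on the bootstrap sample minus the
        # performance of the same model applied to the original data.
        optimism.append(m.score(X[idx], y[idx]) - m.score(X, y))
    mean_opt = float(np.mean(optimism))
    return apparent, mean_opt, apparent - mean_opt        # corrected R² = apparent − optimism
```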

We are able to accurately estimate the delivery time in terms of number of days. The median R² is 0.72 for the models fitted using pull request data from before continuous integration, while it is 0.74 for the models fitted using data from after continuous integration. Our explanatory models are quite stable, with a median bootstrap-calculated optimism of less than 0.045.

Table 8 – R² and R² optimism values for the linear models that we fitted using pull request data from before continuous integration.

#  Project  R²  R² optimism
1  yiisoft/yii  0.91  0.040
2  roots/sage  0.55  0.042
3  vanilla/vanilla  0.90  0.020
4  processing/p5.js  0.88  0.054
5  bokeh/bokeh  0.87  0.193
6  serverless/serverless  0.88  0.017
7  scikit-image/scikit-image  0.59  0.075
8  dropwizard/dropwizard  0.76  0.029
9  androidannotations/androidannotations  0.98  0.002
10  buildbot/buildbot  0.58  0.232
11  jsbin/jsbin  0.59  0.238
12  naver/pinpoint  0.53  0.046
13  siacs/Conversations  0.63  0.063
14  robolectric/robolectric  0.76  0.033
15  TelescopeJS/Telescope  0.51  0.096
16  andypetrella/spark-notebook  0.77  0.127
17  apache/incubator-airflow  0.80  0.119
18  haraka/Haraka  0.54  0.072
19  bundler/bundler  0.63  0.059
20  square/picasso  0.66  0.060
21  Netflix/Hystrix  0.91  0.027
22  dropwizard/metrics  0.64  0.077
23  jhipster/generator-jhipster  0.72  0.379
24  grails/grails-core  0.52  0.120
25  Pylons/pyramid  0.68  0.395
26  jashkenas/underscore  0.90  0.060
27  Leaflet/Leaflet  0.97  0.024
28  laravel/laravel  0.58  0.105
29  loomio/loomio  0.88  0.012
30  frappe/erpnext  0.73  0.029
31  Theano/Theano  0.70  0.028
32  chef/chef  0.55  0.060
33  woocommerce/woocommerce  0.52  0.039
34  AnalyticalGraphicsInc/cesium  0.88  0.004
35  mozilla/pdf.js  0.75  0.004
36  StackStorm/st2  0.90  0.011
37  TryGhost/Ghost  0.73  0.023
38  ansible/ansible  0.58  0.015
39  ipython/ipython  0.66  0.026
40  owncloud/core  0.63  0.003
41  mozilla-b2g/gaia  0.76  0.003

RQ5: What are the most influential attributes for modeling delivery time?

RQ5: Results for delivery time in terms of releases

The “release commits” attribute is the most influential one to model the delivery time in terms of releases, both before and after continuous integration. Figures 20 and 21 use boxplots to show the distribution of the explanatory power of each variable that we use in our logistic regression models before and after continuous integration. The higher the median of the explanatory power of a variable, the higher the influence that such a variable has to predict whether a pull request will be prevented from delivery in an upcoming release.

Table 9 – R² and R² optimism values for the linear models that we fitted using pull request data from after continuous integration.

#  Project  R²  R² optimism
1  yiisoft/yii  0.61  0.026
2  bokeh/bokeh  0.60  0.009
3  serverless/serverless  0.96  0.003
4  craftyjs/Crafty  0.90  0.015
5  invoiceninja/invoiceninja  0.82  0.088
6  scikit-image/scikit-image  0.71  0.008
7  dropwizard/dropwizard  0.74  0.010
8  androidannotations/androidannotations  0.93  0.017
9  aframevr/aframe  0.81  0.723
10  jashkenas/backbone  0.81  0.007
11  openhab/openhab  0.93  0.007
12  bcit-ci/CodeIgniter  0.84  0.008
13  buildbot/buildbot  0.82  0.010
14  photonstorm/phaser  0.70  0.028
15  fchollet/keras  0.74  0.040
16  robolectric/robolectric  0.62  0.017
17  TelescopeJS/Telescope  0.97  0.003
18  andypetrella/spark-notebook  0.68  0.111
19  ReactiveX/RxJava  0.87  0.020
20  haraka/Haraka  0.77  0.022
21  bundler/bundler  0.53  0.040
22  humhub/humhub  0.86  0.032
23  square/picasso  0.85  0.053
24  dropwizard/metrics  0.85  0.019
25  refinery/refinerycms  0.85  0.127
26  gollum/gollum  0.61  16.788
27  jhipster/generator-jhipster  0.84  0.003
28  mapbox/mapbox-gl-js  0.92  0.025
29  jashkenas/underscore  0.74  0.040
30  apereo/cas  0.54  0.014
31  kivy/kivy  0.97  0.001
32  HabitRPG/habitica  0.66  0.077
33  pyrocms/pyrocms  0.61  0.017
34  BabylonJS/Babylon.js  0.87  0.006
35  Leaflet/Leaflet  0.54  0.029
36  callemall/material-ui  0.79  0.004
37  loomio/loomio  0.72  0.005
38  frappe/erpnext  0.76  0.006
39  Theano/Theano  0.82  0.002
40  puppetlabs/puppet  0.52  0.004
41  woocommerce/woocommerce  0.62  0.009
42  matplotlib/matplotlib  0.51  0.006
43  AnalyticalGraphicsInc/cesium  0.70  0.019
44  appcelerator/titanium_mobile  0.57  0.009
45  TryGhost/Ghost  0.55  0.010
46  ansible/ansible  0.60  0.009
47  ipython/ipython  0.76  0.002
48  owncloud/core  0.57  0.002
49  mozilla-b2g/gaia  0.51  0.001
50  saltstack/salt  0.57  0.001

Additionally, Figure 22 shows the relationship between the most influential variables of our models and the delivery time. The relationship between release commits and delivery time is shown in Figure 22a. We chose 3 of the 54 models with high AUC values to plot the relationships; nevertheless, we observe that the remaining models produce the same trend. The results indicate that the larger the number of commits that are performed to produce a release, the higher the development workload of the project in the release cycle, which may lead pull requests to be prevented from being delivered in the upcoming release.

Figure 20 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of releases).

Table 10 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of releases).

Explanatory Variable  CI (min / mean / median / max)  NO-CI (min / mean / median / max)
release commits  1.226 / 37.9 / 43.01 / 68.1  0.119 / 32.89 / 34.24 / 72.83
changed files  0.006 / 0.8 / 0.33 / 7.8  0.001 / 0.41 / 0.07 / 2.25
churn  0.000 / 0.8 / 0.49 / 8.8  0.079 / 2.38 / 0.54 / 12.90
comments  0.002 / 0.5 / 0.28 / 2.9  0.059 / 0.47 / 0.24 / 1.19
comments interval  0.006 / 0.5 / 0.04 / 3.2  0.000 / 0.13 / 0.13 / 0.26
merge workload  0.000 / 11.7 / 7.96 / 49.0  2.878 / 18.65 / 14.61 / 65.37
queue rank  1.955 / 36.0 / 37.95 / 66.6  2.570 / 31.42 / 33.88 / 58.17
description length  0.000 / 0.9 / 0.22 / 8.0  0.004 / 0.70 / 0.40 / 2.84
contributor experience  0.000 / 5.4 / 1.85 / 44.1  0.002 / 9.02 / 2.24 / 27.85
contributor delivery  0.007 / 9.2 / 1.56 / 72.9  0.301 / 9.29 / 5.32 / 39.67
stacktrace attached  0.001 / 0.3 / 0.06 / 2.1  0.004 / 0.58 / 0.09 / 3.04
activities  0.000 / 1.7 / 0.54 / 8.9  0.004 / 5.83 / 2.01 / 24.64
merge time  0.003 / 2.1 / 0.37 / 28.2  0.004 / 4.28 / 1.32 / 12.46
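As an illustration of how a per-variable explanatory power such as the one in Table 10 can be computed, the sketch below approximates it as each coefficient's Wald χ² expressed as a share of the model's total Wald χ² (in the spirit of the anova() decomposition of Harrell's rms package for R). It is an approximation on synthetic data, not our exact script; the predictor names are illustrative.

```python
# Illustrative sketch: per-variable explanatory power of a logistic model,
# approximated by each coefficient's Wald chi-square as a share of the total.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def explanatory_power(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Percentage of the model's total Wald chi-square attributed to each predictor."""
    model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    wald = (model.params / model.bse) ** 2   # Wald chi-square of each coefficient
    wald = wald.drop("const")                # the intercept is not an explanatory variable
    return 100 * wald / wald.sum()

# Illustrative usage on synthetic data (three hypothetical predictors).
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(300, 3)),
                 columns=["release_commits", "queue_rank", "merge_workload"])
y = pd.Series((2 * X["release_commits"] + X["queue_rank"]
               + rng.normal(size=300) > 0).astype(int))
print(explanatory_power(X, y).round(2))
```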

Figure 21 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of releases).

Figure 23 shows each explanatory variable and the number of models for which that variable is the most influential one to predict whether a pull request is likely to have its delivery delayed by one or more releases. Additionally, Table 10 shows descriptive statistics (min, mean, median and max) for the explanatory power of each variable of our models. Indeed, release commits is the most influential variable in 29 models (6 before continuous integration and 23 after continuous integration). Therefore, in the median, 34.24% to 43.01% of the delivery time of pull requests in terms of releases is explained by the number of commits that are performed to produce a release.

Our results suggest that the moment at which a pull request is merged with respect to the other merged pull requests in the release cycle, i.e., the queue rank, is the second most influential variable in our models, both before and after continuous integration. In the median, 33.88% and 37.95% of the delivery time of merged pull requests in terms of releases, before and after continuous integration, respectively, may be explained by the queue rank variable. Figure 22b shows the relationship that queue rank shares with delivery time. Our models reveal that a merged pull request is more likely to be prevented from delivery in an upcoming release when it is merged late compared to the other pull requests in the release cycle.

We also observe that “merge workload” has a strong relationship with delivery time. Merge workload is the third most influential variable to model the delivery time of pull requests (median explanatory power of 7.96%–14.61%), both before and after continuous integration.

Figure 22 – The relationship between the most influential variables and delivery time in terms of releases: (a) release commits (ReactiveX/RxJava); (b) queue rank (naver/pinpoint); (c) merge workload (yiisoft/yii).

Our models show that the greater the number of pull requests competing to be merged at the moment that a pull request is submitted, the higher the probability that such a pull request will be delayed by one or more releases (Figure 22c).

The variables comments, comments interval, churn, changed files and stacktrace attached have little influence on delivery time in terms of releases, both before and after continuous integration. The median explanatory power of these variables lies between 0.04% and 0.54%. Our results suggest that the amount of code change in a pull request has little influence when modeling delivery time. On the other hand, our models show that the intensity of commits required to produce a release, as well as the variables of the project family (e.g., merge workload and delivery workload), have a strong influence on modeling delivery time in terms of releases, both before and after continuous integration.

Figure 23 – The number of models per most influential variable (Delivery Time in terms of releases).

Our models suggest that the number of commits performed to produce a release is the most influential factor to model the delivery time of merged pull requests in terms of releases, both before and after continuous integration. Additionally, our models show that “queue rank” and “merge workload” also have a strong impact when predicting whether a pull request is likely to be prevented from delivery by one or more releases.

RQ5: Results for delivery time in terms of days

The “release commits” attribute is the most influential one to model the delivery time of merged pull requests in terms of days. Figures 24 and 25 show the distributions of the explanatory power of each variable that we use in our models before and after the adoption of continuous integration. The higher the median of the explanatory power of a variable, the higher the influence that such a variable has on the delivery time of pull requests. Similar to the results for delivery time in terms of releases, we observe that release commits has the largest influence in our models to explain delivery time in terms of days, before and after the adoption of continuous integration. This result might indicate that the greater the number of commits that are performed to produce a release, the higher the integration load of that release cycle, which may increase the delivery time of pull requests.

Figure 24 – Explanatory power of variables before adopting continuous integration (Delivery Time in terms of days).

Figure 25 – Explanatory power of variables after adopting continuous integration (Delivery Time in terms of days).

Figure 26 shows each explanatory variable and the number of models for which that variable is the most influential. Moreover, Table 11 shows descriptive statistics (min, mean, median and max) for the explanatory power of each variable of our models. Indeed, release commits is the most influential variable in 55 models (31 before continuous integration and 24 after continuous integration). Figure 27 shows the relationship that the most influential variables of our models share with delivery time. The relationship between release commits and delivery time is shown in Figure 27a. We chose 4 of the 91 models with the highest R² values to plot the relationships; nevertheless, the rest of our models produce the same trend.

Table 11 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Delivery Time in terms of days).

Explanatory Variable  CI (min / mean / median / max)  NO-CI (min / mean / median / max)
release commits  0.015 / 44.57 / 45.44 / 96.91  0.014 / 48.23 / 49.29 / 99.29
changed files  0.000 / 0.88 / 0.06 / 11.05  0.000 / 1.76 / 0.25 / 40.11
churn  0.000 / 0.52 / 0.08 / 9.28  0.000 / 2.20 / 0.11 / 53.55
comments  0.000 / 0.35 / 0.10 / 3.18  0.000 / 1.46 / 0.16 / 21.86
comments interval  0.000 / 1.39 / 0.03 / 19.32  0.000 / 0.54 / 0.06 / 4.25
merge workload  0.000 / 10.26 / 2.87 / 96.80  0.004 / 17.66 / 7.60 / 92.62
queue rank  0.001 / 34.67 / 33.69 / 97.88  0.019 / 28.83 / 25.16 / 98.26
description length  0.000 / 1.30 / 0.13 / 43.69  0.000 / 0.99 / 0.22 / 15.97
contributor experience  0.000 / 3.89 / 0.65 / 37.50  0.001 / 4.05 / 1.07 / 42.73
contributor delivery  0.000 / 7.76 / 0.75 / 82.62  0.000 / 7.74 / 1.49 / 59.43
stacktrace attached  0.000 / 0.22 / 0.02 / 2.76  0.000 / 0.46 / 0.03 / 2.68
activities  0.001 / 1.28 / 0.25 / 16.26  0.000 / 6.13 / 0.94 / 49.56
merge time  0.000 / 3.96 / 0.12 / 51.46  0.000 / 3.93 / 0.40 / 62.90

The queue rank variable is the second most influential variable in our models, both before and after the adoption of continuous integration. Queue rank is the moment at which a pull request is merged with respect to the other merged pull requests in the project backlog. Figure 27b shows the relationship that queue rank shares with delivery time. Our models reveal that merged pull requests have a higher delivery time when they are merged late compared to the other pull requests in the release cycle. The third and fourth most influential metrics are merge workload and contributor delivery, respectively. The higher the merge workload of the project, the higher the delivery time of its pull requests (Figure 27c). Our models also reveal that if a contributor has his/her previously submitted pull requests delivered quickly, his/her next submitted pull requests tend to be delivered more quickly as well (Figure 27d).
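As a concrete illustration of these two process metrics, the sketch below derives queue rank and merge workload from pull request timestamps, following the definitions above. The data frame layout and column names are illustrative simplifications, not our exact collection schema.

```python
# Sketch: deriving queue rank and merge workload from pull request
# timestamps (illustrative column names, not our exact schema).
import pandas as pd

def add_process_metrics(prs: pd.DataFrame) -> pd.DataFrame:
    """Expects one row per merged pull request with the columns
    'submitted_at', 'merged_at' (timestamps) and 'release_cycle'
    (the release in which the pull request was delivered)."""
    # Queue rank: position of the merge among all merges of the same
    # release cycle (1 = merged first, larger = merged later).
    prs["queue_rank"] = prs.groupby("release_cycle")["merged_at"].rank(method="first")
    # Merge workload: pull requests already submitted but not yet merged
    # at the moment this pull request was submitted.
    prs["merge_workload"] = [
        int(((prs["submitted_at"] < t) & (prs["merged_at"] > t)).sum())
        for t in prs["submitted_at"]
    ]
    return prs
```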

Figure 26 – The number of models per most influential variable (Delivery Time in terms of days).

Our models suggest that “release commits” is the most influential variable to model the delivery time of merged pull requests in terms of days, before and after the adoption of continuous integration. Additionally, our models show that “queue rank” and “merge workload” also have a strong impact on the time to deliver merged pull requests.

Figure 27 – The relationship between the most influential variables and delivery time in terms of days: (a) release commits (TelescopeJS/Telescope); (b) queue rank (kivy/kivy); (c) merge workload (bcit-ci/CodeIgniter); (d) contributor delivery (HabitRPG/habitica).

4.2 Analysis II — What is the impact of continuous integration on the prolonged delivery time?

RQ6: How well can we identify the merged pull requests that will suffer from a long delivery time?

The models that we fitted using data of pull requests that were submitted before continuous integration obtained a median Brier score of 0.043, while the models fitted using pull request data from after continuous integration obtained 0.091. Furthermore, the median bootstrap-calculated optimism of the Brier scores is 0.006 for the models that use data from before continuous integration, while it is 0.003 for the models that were fitted with data from after continuous integration. Figure 28 shows the distribution of the Brier score values and the bootstrap-calculated optimism of the Brier score of our models, before and after continuous integration. Finally, Tables 12 and 13 show all the Brier scores and the AUC values of each model that we fitted.

Figure 28 – Distribution of the Brier Score and the Brier optimism of the models before and after CI: (a) Brier score of the models; (b) Brier score optimism of the models.

Table 12 – Brier Score and AUC values for the models that we fitted using pull request data from before the adoption of continuous integration.

#  Project  Brier Score  Brier Optimism  AUC  AUC Optimism
1  serverless/serverless  0.021  0.008  0.997  0.003
2  ReactiveX/RxJava  0.163  0.007  0.822  0.013
3  refinery/refinerycms  0.129  0.009  0.856  0.018
4  alohaeditor/Aloha-Editor  0.025  0.006  0.983  0.004
5  BabylonJS/Babylon.js  0.043  0.012  0.987  0.008
6  zurb/foundation-sites  0.113  0.007  0.887  0.015
7  frappe/erpnext  0.018  0.006  0.994  0.002
8  mozilla/pdf.js  0.020  0.003  0.994  0.001
9  appcelerator/titanium_mobile  0.163  0.001  0.740  0.002
10  ansible/ansible  0.092  0.002  0.911  0.004
11  ipython/ipython  0.068  0.008  0.968  0.007
12  owncloud/core  0.036  0.001  0.962  0.001
13  mozilla-b2g/gaia  0.031  0.001  0.990  0.001

Our models obtained a median AUC of 0.92 when fitted using pull request data from after continuous integration, while they obtained 0.97 when using data from before continuous integration. Additionally, the median bootstrap-calculated optimism of the AUC values is 0.005 for the models that we fitted using pull request data from after continuous integration, whereas it is 0.004 for the models that were fitted using pull request data from before continuous integration. Figure 29 shows the distribution of the AUC and the bootstrap-calculated optimism of the AUC values of the models. Such results suggest that our logistic regression models hugely outperform naive models, such as random guessing (AUC value of 0.50), and that the models are stable enough to perform our statistical inferences.

Table 13 – Brier Score and AUC values for the models that we fitted using pull request data from after the adoption of continuous integration.

#  Project  Brier Score  Brier Optimism  AUC  AUC Optimism
1  yiisoft/yii  0.064  0.006  0.963  0.007
2  vanilla/vanilla  0.125  0.005  0.850  0.017
3  bokeh/bokeh  0.099  0.003  0.915  0.005
4  scikit-image/scikit-image  0.081  0.005  0.930  0.008
5  jashkenas/backbone  0.064  0.007  0.964  0.008
6  sensu/sensu  0.161  0.010  0.799  0.024
7  jsbin/jsbin  0.180  0.009  0.698  0.026
8  naver/pinpoint  0.086  0.003  0.923  0.005
9  photonstorm/phaser  0.072  0.009  0.947  0.013
10  robolectric/robolectric  0.041  0.004  0.969  0.006
11  ReactiveX/RxJava  0.065  0.006  0.967  0.007
12  haraka/Haraka  0.010  0.004  0.998  0.003
13  jhipster/generator-jhipster  0.063  0.003  0.966  0.003
14  boto/boto  0.157  0.008  0.785  0.020
15  Pylons/pyramid  0.099  0.004  0.876  0.006
16  ether/etherpad-lite  0.144  0.008  0.810  0.020
17  getsentry/sentry  0.013  0.002  0.998  0.001
18  hapijs/hapi  0.168  0.009  0.806  0.019
19  HabitRPG/habitica  0.042  0.003  0.983  0.004
20  pyrocms/pyrocms  0.106  0.007  0.911  0.010
21  BabylonJS/Babylon.js  0.056  0.007  0.965  0.010
22  Leaflet/Leaflet  0.043  0.005  0.979  0.005
23  laravel/laravel  0.138  0.006  0.849  0.010
24  zurb/foundation-sites  0.169  0.004  0.734  0.012
25  callemall/material-ui  0.033  0.003  0.990  0.002
26  loomio/loomio  0.064  0.002  0.973  0.002
27  scikit-learn/scikit-learn  0.117  0.003  0.716  0.016
28  frappe/erpnext  0.056  0.002  0.952  0.002
29  Theano/Theano  0.038  0.002  0.981  0.002
30  puppetlabs/puppet  0.091  0.001  0.903  0.002
31  chef/chef  0.129  0.003  0.814  0.007
32  woocommerce/woocommerce  0.041  0.003  0.985  0.004
33  divio/django-cms  0.117  0.003  0.867  0.007
34  scipy/scipy  0.110  0.002  0.768  0.011
35  matplotlib/matplotlib  0.091  0.002  0.903  0.003
36  sympy/sympy  0.148  0.002  0.748  0.006
37  twbs/bootstrap  0.107  0.002  0.872  0.004
38  elastic/kibana  0.085  0.002  0.922  0.005
39  mozilla/pdf.js  0.129  0.002  0.845  0.005
40  appcelerator/titanium_mobile  0.034  0.003  0.987  0.004
41  StackStorm/st2  0.135  0.002  0.781  0.012
42  TryGhost/Ghost  0.069  0.002  0.943  0.003
43  fog/fog  0.106  0.003  0.881  0.006
44  ipython/ipython  0.019  0.001  0.997  0.000
45  cakephp/cakephp  0.158  0.001  0.837  0.002
46  owncloud/core  0.074  0.001  0.950  0.001
47  rails/rails  0.151  0.000  0.788  0.001
48  mozilla-b2g/gaia  0.107  0.000  0.921  0.000
49  saltstack/salt  0.082  0.000  0.932  0.000

Figure 29 – Distribution of the AUC and the AUC optimism of the models before and after CI: (a) AUC of the models; (b) AUC optimism of the models.

Our models are able to accurately identify whether a merged pull request is likely to have a long delivery time in a given project. The median AUC value of our models is 0.92 when using pull request data from after continuous integration, while it is 0.97 when fitted using pull request data from before continuous integration.

RQ7: What are the most influential attributes for identifying the pull requests that will suffer from a long delivery time?

Long delivery time is most consistently associated with “release commits” and with attributes of the project family, e.g., “queue rank” and “merge workload”, both before and after continuous integration. Figures 30 and 31 show the explanatory power (χ²) of each variable that we use in our models, before and after continuous integration, respectively. We observe that the required number of commits to produce a release is the most influential variable to identify whether a merged pull request will suffer from a long delivery time; moreover, queue rank and merge workload also have a strong influence when modeling a prolonged delay, both before and after continuous integration. Figure 32 shows the number of models for which each variable that we use is the most influential one. Indeed, release commits is the most influential variable in 30 models (4 before continuous integration and 26 after continuous integration).

Our models show that the contributor experience and the velocity at which contributors have their previously submitted pull requests released (i.e., contributor delivery) also have a small influence when identifying prolonged delivery times, both before and after continuous integration. Table 14 shows the descriptive statistics for the explanatory power of each variable of our models, before and after continuous integration. The average explanatory power of the contributor experience and contributor delivery variables is, respectively, 9.02% and 9.29% for the models fitted with data from before continuous integration, while it is 5.4% and 9.2% for the models that were fitted using data from after continuous integration.

Table 14 – Descriptive metrics for the percentage of the explanatory power of each variable of our models, before and after the adoption of continuous integration (Prolonged delivery time analysis).

Explanatory Variable  CI (min / mean / median / max)  NO-CI (min / mean / median / max)
release commits  1.226 / 37.9 / 43.01 / 68.1  0.119 / 32.89 / 34.24 / 72.83
changed files  0.006 / 0.8 / 0.33 / 7.8  0.001 / 0.41 / 0.07 / 2.25
churn  0.000 / 0.8 / 0.49 / 8.8  0.079 / 2.38 / 0.54 / 12.90
comments  0.002 / 0.5 / 0.28 / 2.9  0.059 / 0.47 / 0.24 / 1.19
comments interval  0.006 / 0.5 / 0.04 / 3.2  0.000 / 0.13 / 0.13 / 0.26
merge workload  0.000 / 11.7 / 7.96 / 49.0  2.878 / 18.65 / 14.61 / 65.37
queue rank  1.955 / 36.0 / 37.95 / 66.6  2.570 / 31.42 / 33.88 / 58.17
description length  0.000 / 0.9 / 0.22 / 8.0  0.004 / 0.70 / 0.40 / 2.84
contributor experience  0.000 / 5.4 / 1.85 / 44.1  0.002 / 9.02 / 2.24 / 27.85
contributor delivery  0.007 / 9.2 / 1.56 / 72.9  0.301 / 9.29 / 5.32 / 39.67
stacktrace attached  0.001 / 0.3 / 0.06 / 2.1  0.004 / 0.58 / 0.09 / 3.04
activities  0.000 / 1.7 / 0.54 / 8.9  0.004 / 5.83 / 2.01 / 24.64
merge time  0.003 / 2.1 / 0.37 / 28.2  0.004 / 4.28 / 1.32 / 12.46

Figure 30 – Explanatory power of variables before adopting continuous integration (Prolonged delivery time analysis).

Figure 31 – Explanatory power of variables after adopting continuous integration (Prolonged delivery time analysis).

Figure 32 – The number of models per most influential variable (Prolonged delivery time analysis).

Our explanatory models suggest that long delivery time is most closely associated with the required number of commits to produce a release and with project characteristics, such as the queue rank and the merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a long delivery time, both before and after continuous integration.

4.3 Threats to Validity

In this section, we discuss the threats to the validity of our study.

Construct Validity

The construct threats to validity are concerned with errors caused by the methods that we use to collect our data. We use the GitHub API to develop tools to collect our data. We also develop tools to link pull requests to their respective releases. Bugs in these tools may influence our results. However, we use subsamples of the studied projects to carefully assess our tools' outcomes, which produced consistent results.
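As an illustration of the linking step, the sketch below assigns a merged pull request to the first release published after its merge date, which is the essence of our linking tool. The function and field names are illustrative simplifications, not the actual implementation.

```python
# Sketch: linking a merged pull request to the first official release
# published after its merge date (illustrative simplification).
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

def link_to_release(merged_at: datetime,
                    releases: List[Tuple[datetime, str]]) -> Optional[str]:
    """'releases' holds (published_at, tag_name) pairs sorted by date;
    pre/alpha/beta/rc tags are assumed to be filtered out beforehand."""
    dates = [published_at for published_at, _ in releases]
    i = bisect.bisect_right(dates, merged_at)              # first release after the merge
    return releases[i][1] if i < len(releases) else None   # None = not yet delivered
```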

Internal Validity

Internal threats are concerned with the ability to draw conclusions from the relationship between the dependent variable (the delivery time of merged pull requests) and the independent variables (e.g., release commits and queue rank). The method that we use to link pull requests to releases may not match the actual number of delivered pull requests per release. For instance, if the version control system of a project has the release tags v1.0, v2.0, no-ver and v3.0, we remove the no-ver tag. If there are pull requests associated with the no-ver release, such pull requests will be associated with the release v3.0. However, only 5.37% (485/9,035) of our studied releases fall into this case.

Also, the way that we segment the response variable (Y) of our logistic regression models is subject to bias. We use an approach similar to the one used by Costa et al. (COSTA et al., 2017) to categorize whether a pull request has a prolonged delivery time. We use at least one MAD above the median delivery time as the threshold to define prolonged delivery time (Definition 3; see the sketch below). Although we found it to be a reasonable classification, one could use a different threshold, which may yield different results.

With respect to our explanatory models, the predictors that we use are not exhaustive. Based on the studies of Costa et al. (COSTA et al., 2014; COSTA et al., 2016), we chose a starting set of variable families that may share a relationship with the delivery time of merged pull requests and that can be easily computed through the GitHub API. Although our logistic and linear regression models achieve sound AUC and R² values, other variables may be used to improve performance (e.g., a boolean indicating whether a pull request is associated with an issue report, and another boolean indicating whether a pull request was submitted by a core developer or an external contributor). Nevertheless, our set of predictors should be approached as a preliminary set that can be easily computed rather than as a final solution.

Finally, the main limitation of our explanatory models (i.e., logistic and linear regression models) concerns causal relationships. When using logistic and linear regression models, we cannot claim a causal relationship between our explanatory variables (i.e., the attributes described in Tables 4 and 5) and delivery time. Alternatively, we draw conclusions based on associations between delivery time and the average behavior of such explanatory variables.
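To make the threshold concrete, the sketch below labels pull requests as having a prolonged delivery time when their delivery time is at least one MAD above the project's median, mirroring the classification described above (variable names are illustrative):

```python
# Sketch of Definition 3: a merged pull request has a prolonged delivery
# time when its delivery time is at least one MAD above the project's
# median delivery time (illustrative variable names).
import numpy as np

def prolonged_delivery(delivery_times: np.ndarray) -> np.ndarray:
    med = np.median(delivery_times)
    mad = np.median(np.abs(delivery_times - med))  # median absolute deviation
    return delivery_times >= med + mad             # True = prolonged delivery time
```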

External Validity

External threats are concerned with the extent to which we can generalize our results (PERRY; PORTER; VOTTA, 2000). In this dissertation, we analyzed 167,037 pull requests of 90 popular open source projects from GitHub that have moved to continuous integration. All projects adopt the most popular continuous integration server on GitHub, i.e., Travis-CI. However, we do not have information about the extent to which projects that use self-hosted CI servers, such as Jenkins, affect the delivery time of pull requests to their users.

Since we study public GitHub projects only, we cannot guarantee that the results are also applicable to projects from private companies. In such cases, results may deviate significantly. Nevertheless, we study projects from the five most popular programming languages on GitHub (JavaScript, Python, Java, Ruby and PHP). Our investigation covers projects of different sizes and domains that use statically-typed and dynamically-typed languages. Future work should replicate this study with a larger number of programming languages and projects. For replication purposes, we make our datasets and results publicly available to interested researchers.

We use a small sample of projects compared to the size of GitHub. However, we chose projects with sufficient data to make a fair comparison of the impact of the usage of continuous integration on the delivery time of merged pull requests. The data that we gathered is able to fit sound models and to produce significant results. We believe that our results might be representative of projects with domains and characteristics similar to those of the studied projects, but replication is the best way to verify such an assumption.

5 Conclusion

Pull-based development is a paradigm broadly used by contributors of open source projects to develop software in a distributed and collaborative way by sending pull requests. A pull request may fix bugs or provide enhancements or new functionalities. The basic life cycle of a pull request comprises three main steps. First, a contributor submits a pull request to a software project. Next, the integrators of such a project merge the pull request into its code base. Finally, the merged pull request is delivered to the end users of the software system through an official software release. However, merged pull requests may suffer undesirable delays before being released (i.e., a long delivery time). A long delivery time may cause software projects to lose users and the interest of contributors, given the increasingly competitive software development market, which requires projects to deliver changes to their users at a faster pace to improve time-to-market and customer satisfaction.

In this dissertation, we performed an empirical study that investigates the impact of adopting continuous integration on the delivery time of merged pull requests. We use 167,037 pull requests of 90 GitHub projects to explore the impact of continuous integration on the delivery time of pull requests and to analyze the impact of continuous integration on the prolonged delivery time. In the remainder of this chapter, we describe the contributions of this dissertation in Section 5.1 and discuss related work in Section 5.2. Finally, we discuss possibilities for future work in Section 5.3.

5.1 Dissertation Contributions

The main goal of this dissertation was to understand the impact of adopting continuous integration on the time required to deliver merged pull requests to the end users of a software project. In the following, we answer the questions that guided the analyses performed in our study.

• Analysis I — What is the impact of continuous integration on the delivery time of pull requests? Delayed deliveries are frequent in the studied projects, both before and after continuous integration. In the median, 13.8% of the merged pull requests per project are prevented from delivery in at least one release before continuous integration, while 24% miss at least one release after continuous integration. Furthermore, we find that many pull requests that missed at least one release were merged well before the release date of the missed releases. In most projects (53%), the time from submission to delivery of pull requests (i.e., the pull request lifetime) is shorter before the adoption of continuous integration. We also observe that projects tend to merge pull requests earlier before continuous integration. One possible reason for the faster delivery of pull requests before continuous integration might be the large increase in the number of pull request submissions after the adoption of continuous integration: 77.8% of the projects that adopted continuous integration increased their rate of pull request submissions. Finally, we find that the number of commits performed to produce a release is the most influential factor to estimate the delivery time of merged pull requests, both before and after continuous integration. The moment at which a pull request is merged (i.e., queue rank) and the number of pull requests competing to be merged (i.e., merge workload) also have a strong impact on estimating the delivery time in terms of days and releases.

• Analysis II — What is the impact of continuous integration on the prolonged delivery time? In the median, 24% of the pull requests of the investigated projects have a prolonged delivery time. We are able to accurately identify merged pull requests that have a prolonged delivery time, both before and after continuous integration. Our explanatory models obtained sound median AUC values of 0.92 to 0.97. Prolonged delivery time is most closely associated with the required number of commits to produce a release and with project characteristics, such as the queue rank and the merge workload. Moreover, the contributor experience and contributor delivery variables also play an influential role in identifying a prolonged delivery time, both before and after continuous integration.

Open source projects that plan to adopt continuous integration should be aware that the adoption of continuous integration will not necessarily deliver pull requests more quickly. On the other hand, as pull-based development can attract the interest of external contributors and, hence, increase the project's workload, continuous integration may help in other aspects, e.g., delivering more functionalities to end users (see RQ3).

5.2 Related Work

In this section, we survey related research that analyzes the impact of adopting continuous integration in open source projects.

Despite the wide adoption of Agile Release Engineering (ARE) practices (i.e., continuous integration, rapid releases, continuous delivery and continuous deployment), there is still a lack of empirical studies that investigate the impact that these practices have on software development activities, i.e., in terms of productivity and quality. Through a systematic literature review, Karvonen et al. (KARVONEN et al., 2017) analyzed 619 papers and selected 71 primary studies that are related to ARE practices. They found that only 8 out of the 71 primary studies empirically investigate continuous integration. These studies use diverse approaches for data collection: (i) 5 studies (DEBBICHE; DIENÉR; SVENSSON, 2014; DOWNS; HOSKING; PLIMMER, 2010; FERREIRA; COHEN, 2008; KNAUSS et al., 2015; STÅHL; BOSCH, 2013) use surveys and/or interviews to perform their analyses; (ii) 2 studies (VASILESCU et al., 2014; DESHPANDE; RIEHLE, 2008) use data from error reports and log files; and (iii) 1 study (STÅHL; BOSCH, 2014a) uses both approaches. This systematic literature review highlights that empirical research in this field is highly necessary to better understand the impact of adopting continuous integration on software development.

Hilton et al. (HILTON et al., 2016) analyzed 34,544 open source projects from GitHub and surveyed 442 developers. The authors found that 70% of the most popular GitHub projects use continuous integration and that the percentage of projects that use continuous integration is growing. They also found that continuous integration helps projects to release more often and that the continuous integration build status may lead to a faster integration of pull requests. Their results show that, before adopting continuous integration, projects used to ship at a rate of 0.34 releases per month, well below the rate of 0.54 releases per month after the adoption of continuous integration. In contrast with their results, we do not observe a significant difference when comparing the median duration of the release cycles of each studied project before and after continuous integration. Our studied projects ship new releases in a median time interval of 35.7 days before continuous integration, while taking 33.1 days to ship new releases after continuous integration. One factor that might contribute to such a divergence in the release-frequency results is that we only study user-intended releases, i.e., we do not consider pre, beta, alpha and rc (release candidate) releases in our analyses. Also, the difference in the number and characteristics of the studied projects may make our results deviate significantly.

Beller et al. (BELLER; GOUSIOS; ZAIDMAN, 2016) performed an analysis of continuous integration builds on GitHub. The authors investigated 2,640,825 Java and Ruby builds from Travis-CI and found that testing is the most important reason why builds fail. The results also show that the majority of builds trigger at least one test, and only 20% of the investigated projects did not include the test phase in their continuous integration process. In contrast to the above-mentioned works, our dissertation examines 167,037 pull requests of 90 GitHub projects that have adopted continuous integration at some point of their life span.
Our focus is to study the time necessary for merged pull requests to be delivered to end users, before and after the adoption of continuous integration.

Vasilescu et al. (VASILESCU et al., 2014) studied the usage of Travis-CI in a sample of 223 GitHub projects that are written in Ruby, Python and Java. They found that the majority of the projects (92.3%) are configured to use Travis-CI, but less than half actually use it. In a follow-up study, Vasilescu et al. (VASILESCU et al., 2015) investigated the productivity and quality of 246 GitHub projects that use continuous integration. They found that projects that use continuous integration merge pull requests more quickly when they are submitted by core developers. Also, core developers find significantly more bugs when using continuous integration. We use an approach similar to the one used by Vasilescu et al. (VASILESCU et al., 2015) to identify projects that use Travis-CI. We also analyze the merge time of pull requests and find that the majority of the studied projects merge pull requests more quickly before continuous integration. In addition, we observe that the number of merged pull requests per release is higher after adopting continuous integration for most of the projects.

Regarding the factors that affect the acceptance and latency of pull requests in the context of continuous integration, Yu et al. (YU et al., 2016) used regression models in a sample of 40 GitHub projects that use Travis-CI. The authors found that the likelihood of rejection of a pull request increases to 89.6% when the pull request breaks the build. The results also show that the more succinct a pull request is, the greater the probability that such a pull request is reviewed and merged earlier. We complement this prior work by analyzing the most influential factors that impact the delivery time of merged pull requests before and after the adoption of continuous integration.

Other research has studied the delivery time of new features, enhancements, and bug fixes (COSTA et al., 2014; COSTA et al., 2016; CHOETKIERTIKUL et al., 2015; COSTA et al., 2017; CHOETKIERTIKUL et al., 2017). Costa et al. (COSTA et al., 2014; COSTA et al., 2017) mined data from the VCSs and ITSs of the Firefox, ArgoUML and Eclipse projects to investigate how frequent the delayed delivery of fixed issues is in such projects. They found that delivery delays of addressed issues are frequent in their subject projects (e.g., 34% to 98% of addressed issues were delayed by at least one release).

They also observe that 13%, 12%, and 22% of the fixed issues of the Eclipse, Firefox, and ArgoUML projects have a long delivery time, respectively. In contrast, we study the delivery time and the prolonged delivery time of merged pull requests in a set of 90 GitHub projects. Also, we observe that, despite pull requests being merged well before an upcoming release, in the median 13.8% of such merged pull requests are delayed by one or more releases before the adoption of continuous integration, while 24% are delayed by at least one release after continuous integration. Furthermore, we observe that, in the median, 24% of the pull requests of the investigated projects have a prolonged delivery time. In a follow-up study, Costa et al. (COSTA et al., 2016) investigated the impact of switching from traditional releases to rapid releases on the delivery time of fixed issues of the Firefox project. They used predictive models to discover which factors significantly impact the delivery time of issues in each release strategy. Differently from prior work, our study focuses on the impact of adopting continuous integration on the time-to-delivery of merged pull requests.

5.3 Future Work

This dissertation contributes to reducing the lack of empirical understanding of the impact of adopting continuous integration on the time-to-delivery of merged pull requests. However, more research is necessary to better understand and improve the activities of integrating and delivering pull requests. We outline some avenues for future work below.

Replication. Future work could replicate the analyses that are performed in this dissertation using additional projects and programming languages. For instance, one could perform a cross-programming-language analysis to investigate whether the factors that most impact the delivery time of pull requests change depending on the programming language. Furthermore, replications of this study using private projects are necessary (i.e., studying the delivery time of pull requests from privately developed projects rather than open source projects). Such replication studies are important to achieve more generalizable conclusions regarding the delivery time of merged pull requests. For replication purposes, we make our datasets available to interested researchers.

Tooling. Further research in the field could build tools. For example, issue tracking systems could tag submitted pull requests that are likely to be delayed. Such tools can be used to help developers and project managers to be aware of the estimated time-to-delivery of new pull requests based on their characteristics.

Software Quality. To study the trade-off between a shorter delivery time of pull requests and software quality, future work could empirically investigate whether pull requests that are merged and delivered quickly also have a high quality, i.e., in terms of bugs.

Prediction. According to Costa (COSTA, 2017), software engineering researchers have invested considerable effort in the prediction of bugs, so that their predictions may help to avoid unwanted costs. Another possibility for future work is to build accurate prediction models of the delivery time of pull requests, which could help project managers and developers to better plan their activities.

Qualitative Study. In our analysis, we quantitatively study the impact of adopting continuous integration on the time required to deliver pull requests (i.e., delivery time) to the end users of software systems. We only perform statistical analyses based on the data available on GitHub and Travis-CI for our subject projects. We found that, after adopting continuous integration, projects deliver almost 3 times more pull requests per release than before. However, to reach a deeper understanding as to why projects deliver significantly more pull requests after adopting continuous integration, we could perform a qualitative study: (i) to obtain insights from developers of projects that use continuous integration, which would not be possible by only performing quantitative analyses; and (ii) to verify whether developers agree with our results from the quantitative analysis. In a prior quantitative study, Costa et al. (COSTA et al., 2014) identified that 98% of the Firefox project issues had their delivery postponed by at least one release. In a follow-up qualitative study, Costa et al. (COSTA; MCINTOSH, 2017; COSTA, 2017) found that the reasons for the delivery delays of addressed issues are related to decision making, team collaboration, and risk management.

Bibliography

BASKERVILLE, R.; PRIES-HEJE, J. Short cycle time systems development. Information Systems Journal, Wiley Online Library, v. 14, n. 3, p. 237–264, 2004. Cited on page 15.
BECK, K. Extreme Programming Explained: Embrace Change. [S.l.]: Addison-Wesley Professional, 2000. Cited on page 14.
BELLER, M.; GOUSIOS, G.; ZAIDMAN, A. Oops, my tests broke the build: An analysis of Travis CI builds with GitHub. PeerJ PrePrints, v. 4, p. e1984, 2016. Cited 3 times on pages 16, 24, and 80.
CHEN, J.; REILLY, R. R.; LYNN, G. S. The impacts of speed-to-market on new product success: the moderating effects of uncertainty. IEEE Trans. Eng. Manage., IEEE, v. 52, n. 2, p. 199–212, 2005. Cited on page 14.
CHOETKIERTIKUL, M. et al. Predicting delays in software projects using networked classification (T). In: IEEE. Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on. [S.l.], 2015. p. 353–364. Cited 2 times on pages 16 and 81.
CHOETKIERTIKUL, M. et al. Predicting the delay of issues with due dates in software projects. Empirical Software Engineering Journal, p. 1–41, 2017. Cited 2 times on pages 16 and 81.
CLIFF, N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, American Psychological Association, v. 114, n. 3, p. 494, 1993. Cited 2 times on pages 29 and 30.
COSTA, D. A. da. Understanding the delivery delay of addressed issues in large software projects. PhD Thesis — Federal University of Rio Grande do Norte, Natal, 2017. Cited on page 83.
COSTA, D. A. da et al. An empirical study of delays in the integration of addressed issues. In: IEEE. Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on. [S.l.], 2014. p. 281–290. Cited 4 times on pages 16, 76, 81, and 83.
COSTA, D. A. da et al. The impact of switching to a rapid release cycle on the integration delay of addressed issues: An empirical study of the Mozilla Firefox project. In: Proceedings of the 13th International Conference on Mining Software Repositories. New York, NY, USA: ACM, 2016. (MSR '16), p. 374–385. Cited 9 times on pages 15, 16, 32, 45, 47, 48, 76, 81, and 82.
COSTA, D. A. da et al. An empirical study of the integration time of fixed issues. Empirical Software Engineering, p. to appear, 2017. Cited 11 times on pages 18, 25, 28, 32, 33, 35, 37, 50, 51, 76, and 81.
COSTA, D. A. da; MCINTOSH, S.; TREUDE, C.; KULESZA, U.; HASSAN, A. E. The impact of rapid release cycles on the integration delay of fixed issues. Journal of Empirical Software Engineering, p. to appear, 2017. Cited on page 83.

CROWSTON, K.; ANNABI, H.; HOWISON, J. Defining open source software project success. ICIS 2003 Proceedings, p. 28, 2003. Cited on page 14.
DAYTON, C. M. Logistic regression analysis. Stat, p. 474–574, 1992. Cited on page 31.
DEBBICHE, A.; DIENÉR, M.; SVENSSON, R. B. Challenges when adopting continuous integration: A case study. In: SPRINGER. International Conference on Product-Focused Software Process Improvement. [S.l.], 2014. p. 17–32. Cited 2 times on pages 14 and 80.
DESHPANDE, A.; RIEHLE, D. Continuous integration in open source software development. In: SPRINGER. IFIP International Conference on Open Source Systems. [S.l.], 2008. p. 273–280. Cited on page 80.
DOWNS, J.; HOSKING, J.; PLIMMER, B. Status communication in agile software teams: A case study. In: IEEE. Software Engineering Advances (ICSEA), 2010 Fifth International Conference on. [S.l.], 2010. p. 82–87. Cited on page 80.
DUVALL, P.; MATYAS, S. M.; GLOVER, A. Continuous Integration: Improving Software Quality and Reducing Risk (The Addison-Wesley Signature Series). [S.l.]: Addison-Wesley Professional, 2007. Cited 3 times on pages 14, 24, and 29.
EFRON, B. How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, Taylor & Francis, v. 81, n. 394, p. 461–470, 1986. Cited 3 times on pages 25, 32, and 35.
FERREIRA, C.; COHEN, J. Agile systems development and stakeholder satisfaction: a South African empirical study. In: ACM. Proceedings of the 2008 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries: riding the wave of technology. [S.l.], 2008. p. 48–55. Cited on page 80.
FOWLER, M.; FOEMMEL, M. Continuous integration. ThoughtWorks, http://www.thoughtworks.com/ContinuousIntegration.pdf, p. 122, 2006. Cited 2 times on pages 23 and 24.
GIGER, E.; PINZGER, M.; GALL, H. Predicting the fix time of bugs. In: ACM. Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering. [S.l.], 2010. p. 52–56. Cited on page 48.
GOUSIOS, G.; PINZGER, M.; DEURSEN, A. v. An exploratory study of the pull-based software development model. In: Proceedings of the 36th International Conference on Software Engineering. [S.l.: s.n.], 2014. p. 345–355. Cited 2 times on pages 21 and 22.
GOUSIOS, G.; SPINELLIS, D. GHTorrent: GitHub's data from a firehose. In: Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on. [S.l.: s.n.], 2012. p. 12–21. Cited on page 14.
GOUSIOS, G. et al. Work practices and challenges in pull-based development: the integrator's perspective. In: Proceedings of the 37th International Conference on Software Engineering - Volume 1. [S.l.: s.n.], 2015. p. 358–368. Cited 3 times on pages 22, 23, and 24.
HANLEY, J. A.; MCNEIL, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, v. 143, n. 1, p. 29–36, 1982. Cited on page 32.
HARRELL, F. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. [S.l.]: Springer, 2015. Cited 5 times on pages 33, 34, 38, 39, and 42.
HASTIE, T.; TIBSHIRANI, R.; FRIEDMAN, J. The elements of statistical learning: data mining, inference and prediction. 2. ed. [S.l.]: Springer, 2009. Cited on page 32.
HILBE, J. M. Logistic regression models. [S.l.]: CRC Press, 2009. Cited on page 31.
HILTON, M. et al. Usage, costs, and benefits of continuous integration in open-source projects. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016. [S.l.: s.n.], 2016. Cited 3 times on pages 14, 16, and 80.
HOWELL, D. C. Median absolute deviation. In: Wiley StatsRef: Statistics Reference Online. [S.l.]: John Wiley & Sons, Ltd, 2014. Cited on page 25.
JAMES, G. et al. An Introduction to Statistical Learning: With Applications in R. [S.l.]: Springer Publishing Company, Incorporated, 2014. Cited on page 33.
JIANG, Y.; ADAMS, B.; GERMAN, D. M. Will my patch make it? And how fast? Case study on the Linux kernel. In: IEEE. Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on. [S.l.], 2013. p. 101–110. Cited 2 times on pages 15 and 48.
KAMPSTRA, P. et al. Beanplot: A boxplot alternative for visual comparison of distributions. Journal of Statistical Software, v. 28, n. 1, p. 1–9, 2008. Cited on page 29.
KARVONEN, T. et al. Systematic literature review on the impacts of agile release engineering practices. Information and Software Technology, v. 86, p. 87–100, 2017. Cited on page 80.
KNAUSS, E. et al. Research preview: Supporting requirements feedback flows in iterative system development. In: SPRINGER. International Working Conference on Requirements Engineering: Foundation for Software Quality. [S.l.], 2015. p. 277–283. Cited on page 80.
LAI, S.-T.; LEU, F.-Y. Applying continuous integration for reducing web applications development risks. In: IEEE. Broadband and Wireless Computing, Communication and Applications (BWCCA), 2015 10th International Conference on. [S.l.], 2015. p. 386–391. Cited on page 24.
LAUKKANEN, E.; PAASIVAARA, M.; ARVONEN, T. Stakeholder perceptions of the adoption of continuous integration – a case study. In: Proceedings of the 2015 Agile Conference. [S.l.]: IEEE Computer Society, 2015. (AGILE '15), p. 11–20. Cited 4 times on pages 14, 24, 29, and 53.
LIU, J.; LI, J.; HE, L. A comparative study of the effects of pull request on GitHub projects. In: IEEE. Computer Software and Applications Conference (COMPSAC), 2016 IEEE 40th Annual. [S.l.], 2016. v. 1, p. 313–322. Cited on page 16.
LONG, J. Understanding the role of core developers in open source software development. Journal of Information, Information Technology, and Organizations, Informing Science Institute, v. 1, p. 75–85, 2006. Cited on page 16.
MADEY, G.; FREEH, V.; TYNAN, R. The open source software development phenomenon: An analysis based on social network theory. AMCIS 2002 Proceedings, p. 247, 2002. Cited on page 21.
MCINTOSH, S. et al. An empirical study of the impact of modern code review practices on software quality. Empirical Softw. Engg., Kluwer Academic Publishers, v. 21, n. 5, p. 2146–2189, 2016. Cited on page 35.
MEHDI, T. et al. Kernel smoothing for ROC curve and estimation for thyroid stimulating hormone. International Journal of Public Health Research, Universiti Kebangsaan Malaysia, p. 239–242, 2011. Cited on page 32.
MEYER, M. Continuous integration and its tools. IEEE Softw., IEEE, v. 31, n. 3, p. 14–16, 2014. Cited 2 times on pages 23 and 24.
NAGAPPAN, N.; BALL, T. Use of relative code churn measures to predict system defect density. In: IEEE. Software Engineering, 2005. ICSE 2005. Proceedings. 27th International Conference on. [S.l.], 2005. p. 284–292. Cited on page 48.
PERRY, D. E.; PORTER, A. A.; VOTTA, L. G. Empirical studies of software engineering: A roadmap. In: Proceedings of the Conference on The Future of Software Engineering. [S.l.]: ACM, 2000. (ICSE '00), p. 345–355. Cited on page 76.
ROMANO, J. et al. Appropriate statistics for ordinal level data: Should we really be using t-test and Cohen's d for evaluating group differences on the NSSE and other surveys? In: Annual Meeting of the Florida Association of Institutional Research. [S.l.: s.n.], 2006. p. 1–3. Cited on page 29.
SARLE, W. The VARCLUS procedure. SAS/STAT User's Guide. SAS Institute, Inc., Cary, NC, USA, 1990. Cited on page 34.
SCHROTER, A. et al. Do stack traces help developers fix bugs? In: IEEE. Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on. [S.l.], 2010. p. 118–121. Cited on page 47.
SCHWABER, K. SCRUM development process. In: SUTHERLAND, D. J. et al. (Ed.). Business Object Design and Implementation. [S.l.]: Springer London, 1997. p. 117–134. Cited on page 14.
SHIHAB, E. et al. Predicting re-opened bugs: A case study on the Eclipse project. In: IEEE. Reverse Engineering (WCRE), 2010 17th Working Conference on. [S.l.], 2010. p. 249–258. Cited on page 47.
SOUZA, R.; CHAVEZ, C.; BITTENCOURT, R. A. Do rapid releases affect bug reopening? A case study of Firefox. In: IEEE. Software Engineering (SBES), 2014 Brazilian Symposium on. [S.l.], 2014. p. 31–40. Cited on page 15.
STÅHL, D.; BOSCH, J. Experienced benefits of continuous integration in industry software product development: A case study. In: The 12th IASTED International Conference on Software Engineering (Innsbruck, Austria, 2013). [S.l.: s.n.], 2013. p. 736–743. Cited on page 80.

STÅHL, D.; BOSCH, J. Automated software integration flows in industry: a multiple-case study. In: ACM. Companion Proceedings of the 36th International Conference on Software Engineering. [S.l.], 2014. p. 54–63. Cited on page 80.
STÅHL, D.; BOSCH, J. Modeling continuous integration practice differences in industry software development. J. Syst. Softw., Elsevier Science Inc., v. 87, p. 48–59, 2014. Cited 5 times on pages 16, 24, 29, 30, and 50.
TIWARI, V. Some observations on open source software development on software engineering perspectives. International Journal of Computer Science & Information Technology (IJCSIT), v. 2, n. 6, p. 113–125, 2010. Cited on page 21.
VASILESCU, B. et al. Continuous integration in a social-coding world: Empirical evidence from GitHub. In: IEEE. Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on. [S.l.], 2014. p. 401–405. Cited 4 times on pages 14, 16, 80, and 81.
VASILESCU, B. et al. Quality and productivity outcomes relating to continuous integration in GitHub. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2015. [S.l.: s.n.], 2015. Cited 8 times on pages 15, 16, 21, 22, 24, 41, 42, and 81.
VEEN, E. V. D.; GOUSIOS, G.; ZAIDMAN, A. Automatically prioritizing pull requests. In: IEEE PRESS. Proceedings of the 12th Working Conference on Mining Software Repositories. [S.l.], 2015. p. 357–361. Cited on page 31.
VIRMANI, M. Understanding DevOps bridging the gap from continuous integration to continuous delivery. In: Fifth International Conference on the Innovative Computing Technology (INTECH 2015). [S.l.: s.n.], 2015. p. 78–82. Cited on page 24.
WILKS, D. S. Statistical methods in the atmospheric sciences. [S.l.]: Academic Press, 2011. v. 100. Cited 2 times on pages 29 and 30.
WILLIAMSON, D. F.; PARKER, R. A.; KENDRICK, J. S. The box plot: A simple visual method to interpret data. Annals of Internal Medicine, v. 110, p. 916–921, 1989. Cited on page 30.
WNUK, K.; GORSCHEK, T.; ZAHDA, S. Obsolete software requirements. Information and Software Technology, v. 55, n. 6, p. 921–940, 2013. Cited on page 14.
WOHLIN, C.; XIE, M.; AHLGREN, M. Reducing time to market through optimization with respect to soft factors. In: The Engineering Management Conference. [S.l.: s.n.], 1995. p. 116–121. Cited on page 14.
WU, F.; WILKINSON, D. M.; HUBERMAN, B. A. Feedback loops of attention in peer production. In: IEEE. Computational Science and Engineering, 2009. CSE'09. International Conference on. [S.l.], 2009. v. 4, p. 409–415. Cited on page 16.
YU, Y. et al. Wait for it: Determinants of pull request evaluation latency on GitHub. In: IEEE. Mining Software Repositories (MSR), 2015 IEEE/ACM 12th Working Conference on. [S.l.], 2015. p. 367–371. Cited on page 31.
YU, Y. et al. Determinants of pull-based development in the context of continuous integration. Sci. China Inf. Sci., Science China Press, v. 59, n. 8, 2016. Cited 5 times on pages 16, 23, 24, 31, and 81.

APPENDIX A – Studied Projects
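The table below lists the 90 studied projects, their primary programming language, and the number of merged pull requests (PRs) submitted before and after each project adopted CI. As a rough illustration of how the "before CI" and "after CI" counts can be derived, the following minimal Python sketch splits merged PRs around a known CI adoption date; the record fields, dates, and the use of a single adoption date are illustrative assumptions, not the exact mining pipeline of this dissertation.

    from datetime import datetime

    # Illustrative CI adoption date and merged-PR records; in practice these
    # would come from the project's history (e.g., GHTorrent and the CI service).
    ci_adoption = datetime(2013, 5, 1)  # hypothetical adoption date
    merged_prs = [
        {"number": 101, "merged_at": datetime(2012, 11, 3)},
        {"number": 456, "merged_at": datetime(2014, 2, 17)},
    ]

    # Split the merged PRs exactly as the table's columns imply: by comparing
    # each PR's merge date against the CI adoption date.
    before_ci = [pr for pr in merged_prs if pr["merged_at"] < ci_adoption]
    after_ci = [pr for pr in merged_prs if pr["merged_at"] >= ci_adoption]

    print(len(before_ci), len(after_ci), len(merged_prs))  # before, after, total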

# | Project | Language | PRs before CI / PRs after CI / Total of PRs
1 | Yelp/mrjob | Python | 168 274 442
2 | yiisoft/yii | PHP | 148 645 793
3 | roots/sage | PHP | 282 156 438
4 | vanilla/vanilla | PHP | 112 988 1100
5 | processing/p5.js | JavaScript | 100 335 435
6 | bokeh/bokeh | Python | 134 1419 1553
7 | serverless/serverless | JavaScript | 285 494 779
8 | craftyjs/Crafty | JavaScript | 305 129 434
9 | invoiceninja/invoiceninja | JavaScript | 288 143 431
10 | scikit-image/scikit-image | Python | 219 859 1078
11 | dropwizard/dropwizard | Java | 157 560 717
12 | androidannotations/androidannotations | Java | 215 198 413
13 | aframevr/aframe | JavaScript | 124 254 378
14 | jashkenas/backbone | JavaScript | 252 452 704
15 | openhab/openhab | Java | 1008 506 1514
16 | bcit-ci/CodeIgniter | PHP | 287 560 847
17 | mizzy/serverspec | Ruby | 109 266 375
18 | spinnaker/spinnaker | Python | 211 160 371
19 | sensu/sensu | Ruby | 230 473 703
20 | cython/cython | Python | 104 256 360
21 | buildbot/buildbot | Python | 111 913 1024
22 | jsbin/jsbin | JavaScript | 160 531 691
23 | PokemonGoF/PokemonGo-Bot | Python | 126 216 342
24 | naver/pinpoint | Java | 109 1302 1411
25 | siacs/Conversations | Java | 211 113 324
26 | photonstorm/phaser | JavaScript | 163 504 667
27 | fchollet/keras | Python | 118 211 329
28 | robolectric/robolectric | Java | 204 735 939
29 | TelescopeJS/Telescope | JavaScript | 156 141 297
30 | andypetrella/spark-notebook | JavaScript | 150 143 293
31 | apache/incubator-airflow | Python | 229 392 621
32 | ReactiveX/RxJava | Java | 736 662 1398
33 | driftyco/ng-cordova | JavaScript | 115 175 290
34 | haraka/Haraka | JavaScript | 328 600 928
35 | isagalaev/highlight.js | JavaScript | 105 158 263
36 | bundler/bundler | Ruby | 193 410 603
37 | humhub/humhub | PHP | 114 138 252
38 | square/picasso | Java | 115 116 231
39 | Netflix/Hystrix | Java | 146 440 586
40 | dropwizard/metrics | Java | 121 102 223
41 | refinery/refinerycms | Ruby | 497 427 924
42 | gollum/gollum | JavaScript | 116 105 221
43 | jhipster/generator-jhipster | JavaScript | 107 1257 1364
44 | mapbox/mapbox-gl-js | JavaScript | 106 105 211
45 | request/request | JavaScript | 151 400 551
46 | alohaeditor/Aloha-Editor | JavaScript | 272 204 476
47 | boto/boto | Python | 205 715 920
48 | grails/grails-core | Java | 298 166 464
49 | Pylons/pyramid | Python | 138 1198 1336

50 | mantl/mantl | Python | 313 147 460
51 | ether/etherpad-lite | JavaScript | 317 602 919
52 | jashkenas/underscore | JavaScript | 118 332 450
53 | apereo/cas | Java | 227 675 902
54 | kivy/kivy | Python | 157 1136 1293
55 | elastic/logstash | Ruby | 458 447 905
56 | getsentry/sentry | Python | 121 1051 1172
57 | hapijs/hapi | JavaScript | 402 466 868
58 | HabitRPG/habitica | JavaScript | 177 986 1163
59 | pyrocms/pyrocms | PHP | 279 611 890
60 | BabylonJS/Babylon.js | JavaScript | 259 566 825
61 | Leaflet/Leaflet | JavaScript | 290 874 1164
62 | laravel/laravel | PHP | 300 500 800
63 | zurb/foundation-sites | JavaScript | 479 1163 1642
64 | callemall/material-ui | JavaScript | 350 1323 1673
65 | loomio/loomio | Ruby | 366 1420 1786
66 | scikit-learn/scikit-learn | Python | 323 1489 1812
67 | frappe/erpnext | Python | 241 1866 2107
68 | Theano/Theano | Python | 616 1882 2498
69 | puppetlabs/puppet | Ruby | 247 3008 3255
70 | chef/chef | Ruby | 103 1684 1787
71 | woocommerce/woocommerce | PHP | 1342 1150 2492
72 | divio/django-cms | Python | 131 1913 2044
73 | scipy/scipy | Python | 287 1612 1899
74 | matplotlib/matplotlib | Python | 396 2074 2470
75 | sympy/sympy | Python | 792 2316 3108
76 | twbs/bootstrap | JavaScript | 126 1824 1950
77 | AnalyticalGraphicsInc/cesium | JavaScript | 515 547 1062
78 | elastic/kibana | JavaScript | 235 1686 1921
79 | mozilla/pdf.js | JavaScript | 1029 1875 2904
80 | appcelerator/titanium_mobile | JavaScript | 5492 1328 6820
81 | StackStorm/st2 | Python | 745 1554 2299
82 | TryGhost/Ghost | JavaScript | 299 2275 2574
83 | fog/fog | Ruby | 479 1694 2173
84 | ansible/ansible | Python | 3150 1462 4612
85 | ipython/ipython | Python | 504 3273 3777
86 | cakephp/cakephp | PHP | 287 3653 3940
87 | owncloud/core | PHP | 2680 5623 8303
88 | rails/rails | Ruby | 525 9866 10391
89 | mozilla-b2g/gaia | JavaScript | 4369 17691 22060
90 | saltstack/salt | Python | 457 19366 19823

APPENDIX B – R² and R² optimism for the linear models
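In the table below, "R² reduced" is the apparent R² minus the bootstrap-estimated R² optimism (for example, for Yelp/mrjob under CI, 0.452 - 0.069 = 0.383). The following minimal Python sketch shows one way to estimate such an optimism correction in the spirit of Efron (1986) and Harrell (2015); it assumes numpy/scikit-learn and an ordinary least squares model, which may differ from the exact model-fitting setup used in this dissertation.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    def optimism_reduced_r2(X, y, n_boot=1000, seed=1):
        """Bootstrap estimate of R^2 optimism for a linear model (a sketch).

        Returns (apparent R^2, estimated optimism, optimism-reduced R^2),
        mirroring the three columns reported per model in the table below.
        """
        rng = np.random.default_rng(seed)
        apparent = r2_score(y, LinearRegression().fit(X, y).predict(X))
        optimisms = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), len(y))  # resample rows with replacement
            model = LinearRegression().fit(X[idx], y[idx])
            boot_r2 = r2_score(y[idx], model.predict(X[idx]))  # fit-sample R^2
            orig_r2 = r2_score(y, model.predict(X))            # same model on original data
            optimisms.append(boot_r2 - orig_r2)
        optimism = float(np.mean(optimisms))
        return apparent, optimism, apparent - optimism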

# | Project | CI: R² / R² optimism / R² reduced | NO-CI: R² / R² optimism / R² reduced
1 | Yelp/mrjob | 0.452 0.069 0.383 | 0.488 0.08 0.410
2 | yiisoft/yii | 0.606 0.026 0.580 | 0.907 0.04 0.867
3 | roots/sage | 0.184 0.129 0.055 | 0.553 0.04 0.511
4 | vanilla/vanilla | 0.131 0.036 0.096 | 0.897 0.02 0.877
5 | processing/p5.js | 0.119 0.034 0.085 | 0.885 0.05 0.830
6 | bokeh/bokeh | 0.597 0.009 0.588 | 0.875 0.19 0.681
7 | serverless/serverless | 0.958 0.003 0.954 | 0.883 0.02 0.866
8 | craftyjs/Crafty | 0.903 0.015 0.888 | 0.361 0.31 0.049
9 | invoiceninja/invoiceninja | 0.823 0.088 0.734 | 0.402 0.18 0.224
10 | scikit-image/scikit-image | 0.714 0.008 0.707 | 0.586 0.08 0.511
11 | dropwizard/dropwizard | 0.736 0.010 0.725 | 0.759 0.03 0.730
12 | androidannotations/androidannotations | 0.928 0.017 0.911 | 0.980 0.00 0.978
13 | aframevr/aframe | 0.809 0.723 0.087 | 0.208 0.14 0.069
14 | jashkenas/backbone | 0.806 0.007 0.799 | 0.057 0.09 -0.029
15 | openhab/openhab | 0.934 0.007 0.927 | 0.435 0.04 0.400
16 | bcit-ci/CodeIgniter | 0.845 0.008 0.837 | 0.452 0.14 0.310
17 | mizzy/serverspec | 0.180 0.076 0.103 | 0.045 0.20 -0.151
18 | spinnaker/spinnaker | 0.259 0.212 0.047 | 0.437 0.16 0.274
19 | sensu/sensu | 0.298 0.072 0.226 | 0.224 0.14 0.081
20 | buildbot/buildbot | 0.816 0.010 0.806 | 0.581 0.23 0.348
21 | jsbin/jsbin | 0.102 0.035 0.066 | 0.587 0.24 0.349
22 | PokemonGoF/PokemonGo-Bot | 0.330 0.095 0.236 | 0.093 0.75 -0.659
23 | naver/pinpoint | 0.397 0.011 0.387 | 0.526 1.05 -0.520
24 | siacs/Conversations | 0.191 0.759 -0.568 | 0.628 0.06 0.565
25 | photonstorm/phaser | 0.701 0.028 0.673 | 0.344 0.40 -0.057
26 | fchollet/keras | 0.741 0.040 0.701 | 0.467 0.15 0.314
27 | robolectric/robolectric | 0.621 0.017 0.604 | 0.765 0.03 0.732
28 | TelescopeJS/Telescope | 0.970 0.003 0.967 | 0.512 0.10 0.416
29 | andypetrella/spark-notebook | 0.682 0.111 0.571 | 0.775 0.13 0.648
30 | apache/incubator-airflow | 0.449 0.036 0.413 | 0.797 0.12 0.677
31 | ReactiveX/RxJava | 0.870 0.020 0.850 | 0.176 0.07 0.111
32 | driftyco/ng-cordova | 0.211 0.071 0.141 | 0.158 0.47 -0.315
33 | haraka/Haraka | 0.768 0.022 0.747 | 0.544 0.07 0.472
34 | isagalaev/highlight.js | 0.183 0.069 0.114 | 0.338 0.31 0.023
35 | bundler/bundler | 0.528 0.040 0.488 | 0.630 0.06 0.571
36 | humhub/humhub | 0.861 0.032 0.830 | 0.304 0.12 0.181
37 | square/picasso | 0.847 0.053 0.794 | 0.658 0.06 0.598
38 | Netflix/Hystrix | 0.479 0.033 0.446 | 0.907 0.03 0.881
39 | dropwizard/metrics | 0.855 0.019 0.836 | 0.641 0.08 0.564
40 | refinery/refinerycms | 0.849 0.127 0.722 | 0.365 0.05 0.313
41 | gollum/gollum | 0.606 16.788 -16.182 | 0.471 0.27 0.198
42 | jhipster/generator-jhipster | 0.838 0.003 0.834 | 0.723 0.38 0.344
43 | mapbox/mapbox-gl-js | 0.918 0.025 0.893 | 0.482 0.29 0.192
44 | request/request | 0.229 0.064 0.165 | 0.452 0.20 0.252
45 | alohaeditor/Aloha-Editor | 0.335 0.845 -0.509 | 0.452 0.07 0.377

46 | boto/boto | 0.247 0.047 0.200 | 0.327 0.21 0.122
47 | grails/grails-core | 0.084 0.108 -0.024 | 0.523 0.12 0.403
48 | Pylons/pyramid | 0.319 0.029 0.290 | 0.679 0.39 0.284
49 | mantl/mantl | 0.258 0.113 0.145 | 0.092 0.06 0.028
50 | ether/etherpad-lite | 0.237 0.041 0.196 | 0.372 0.04 0.335
51 | jashkenas/underscore | 0.737 0.040 0.697 | 0.903 0.06 0.842
52 | apereo/cas | 0.537 0.014 0.523 | 0.146 0.12 0.025
53 | kivy/kivy | 0.967 0.001 0.965 | 0.433 0.11 0.327
54 | elastic/logstash | 0.414 0.080 0.333 | 0.152 0.09 0.067
55 | getsentry/sentry | 0.454 0.021 0.433 | 0.069 0.21 -0.143
56 | hapijs/hapi | 0.427 0.028 0.399 | 0.248 0.04 0.206
57 | HabitRPG/habitica | 0.660 0.077 0.582 | 0.416 0.07 0.346
58 | pyrocms/pyrocms | 0.607 0.017 0.590 | 0.137 0.25 -0.112
59 | BabylonJS/Babylon.js | 0.866 0.006 0.860 | 0.280 0.28 -0.004
60 | Leaflet/Leaflet | 0.542 0.029 0.513 | 0.966 0.02 0.941
61 | laravel/laravel | 0.472 0.019 0.453 | 0.584 0.10 0.480
62 | zurb/foundation-sites | 0.087 0.094 -0.007 | 0.451 0.04 0.415
63 | callemall/material-ui | 0.786 0.004 0.782 | 0.284 0.13 0.156
64 | loomio/loomio | 0.720 0.005 0.715 | 0.879 0.01 0.868
65 | scikit-learn/scikit-learn | 0.079 0.046 0.033 | 0.340 0.11 0.226
66 | frappe/erpnext | 0.756 0.006 0.750 | 0.733 0.03 0.704
67 | Theano/Theano | 0.817 0.002 0.814 | 0.698 0.03 0.670
68 | puppetlabs/puppet | 0.516 0.004 0.512 | 0.288 0.14 0.151
69 | chef/chef | 0.426 0.008 0.418 | 0.554 0.06 0.495
70 | woocommerce/woocommerce | 0.624 0.009 0.614 | 0.517 0.04 0.478
71 | divio/django-cms | 0.403 0.009 0.394 | 0.477 0.50 -0.022
72 | scipy/scipy | 0.137 0.028 0.109 | 0.273 0.11 0.165
73 | matplotlib/matplotlib | 0.511 0.006 0.505 | 0.363 0.14 0.218
74 | sympy/sympy | 0.261 0.008 0.253 | 0.278 0.08 0.200
75 | twbs/bootstrap | 0.399 0.011 0.388 | 0.075 0.26 -0.189
76 | AnalyticalGraphicsInc/cesium | 0.701 0.019 0.682 | 0.885 0.00 0.881
77 | elastic/kibana | 0.413 0.012 0.400 | 0.379 0.36 0.019
78 | mozilla/pdf.js | 0.455 0.007 0.449 | 0.751 0.00 0.747
79 | appcelerator/titanium_mobile | 0.566 0.009 0.557 | 0.241 0.00 0.238
80 | StackStorm/st2 | 0.225 0.011 0.215 | 0.901 0.01 0.890
81 | TryGhost/Ghost | 0.548 0.010 0.538 | 0.733 0.02 0.710
82 | fog/fog | 0.358 0.016 0.342 | 0.192 0.22 -0.026
83 | ansible/ansible | 0.600 0.009 0.590 | 0.581 0.01 0.567
84 | ipython/ipython | 0.760 0.002 0.758 | 0.660 0.03 0.634
85 | cakephp/cakephp | 0.333 0.007 0.326 | 0.407 0.06 0.348
86 | owncloud/core | 0.566 0.002 0.564 | 0.629 0.00 0.626
87 | rails/rails | 0.194 0.001 0.193 | 0.075 0.06 0.016
88 | mozilla-b2g/gaia | 0.513 0.001 0.512 | 0.763 0.00 0.760
89 | saltstack/salt | 0.573 0.001 0.572 | 0.306 0.04 0.270

APPENDIX C – Percentage of delivered pull requests per project in the next and later release buckets
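The "% next" and "% later" columns below report the share of each project's merged PRs that were delivered in the first release after their merge versus a later release. The minimal Python sketch below reproduces this bucketing for a toy project, assuming a sorted list of release dates and, for each merged PR, its merge date and the date of the release that first shipped it; all names and dates are illustrative, not data from the study.

    import bisect
    from datetime import datetime

    # Illustrative sorted release dates and (merge date, shipping release date)
    # pairs for merged PRs; real data would come from the project's release history.
    releases = [datetime(2016, 1, 10), datetime(2016, 4, 2), datetime(2016, 7, 20)]
    prs = [
        (datetime(2016, 2, 1), datetime(2016, 4, 2)),   # shipped in the next release
        (datetime(2016, 2, 5), datetime(2016, 7, 20)),  # skipped one release: "later"
    ]

    def bucket(merged_at, shipped_at):
        # The first release strictly after the merge date is the "next" release;
        # delivery in any subsequent release counts as "later". (PRs merged after
        # the last known release would need extra handling.)
        i = bisect.bisect_right(releases, merged_at)
        return "next" if releases[i] == shipped_at else "later"

    counts = {"next": 0, "later": 0}
    for merged_at, shipped_at in prs:
        counts[bucket(merged_at, shipped_at)] += 1
    total = sum(counts.values())
    print({k: round(100 * v / total, 1) for k, v in counts.items()})  # % next / % later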

# | Project | CI: % next / % later / Total of PRs | NO-CI: % next / % later / Total of PRs
1 | Yelp/mrjob | 84.3 15.7 274 | 72.6 27.4 168
2 | yiisoft/yii | 80.2 19.8 645 | 95.9 4.1 148
3 | roots/sage | 98.1 1.9 156 | 92.6 7.4 282
4 | vanilla/vanilla | 4.5 95.5 988 | 9.8 90.2 112
5 | processing/p5.js | 91.6 8.4 335 | 99 1 100
6 | bokeh/bokeh | 93.4 6.6 1419 | 88.8 11.2 134
7 | serverless/serverless | 86.0 14.0 494 | 74.0 26.0 285
8 | craftyjs/Crafty | 98.4 1.6 129 | 91.8 8.2 305
9 | invoiceninja/invoiceninja | 52.4 47.6 143 | 95.1 4.9 288
10 | scikit-image/scikit-image | 92.8 7.2 859 | 87.2 12.8 219
11 | dropwizard/dropwizard | 43.9 56.1 560 | 85.4 14.6 157
12 | androidannotations/androidannotations | 75.3 24.7 198 | 81.4 18.6 215
13 | aframevr/aframe | 60.6 39.4 254 | 64.5 35.5 124
14 | jashkenas/backbone | 94.7 5.3 452 | 98.8 1.2 252
15 | openhab/openhab | 70.6 29.4 506 | 71.8 28.2 1008
16 | bcit-ci/CodeIgniter | 0.4 99.6 560 | 0.3 99.7 287
17 | mizzy/serverspec | 89.1 10.9 266 | 92.7 7.3 109
18 | spinnaker/spinnaker | 91.9 8.1 160 | 98.6 1.4 211
19 | sensu/sensu | 96.6 3.4 473 | 99.6 0.4 230
20 | cython/cython | 36.7 63.3 256 | 55.8 44.2 104
21 | buildbot/buildbot | 61.8 38.2 913 | 79.3 20.7 111
22 | jsbin/jsbin | 74.6 25.4 531 | 88.1 11.9 160
23 | PokemonGoF/PokemonGo-Bot | 95.4 4.6 216 | 99.2 0.8 126
24 | naver/pinpoint | 43.9 56.1 1302 | 99.1 0.9 109
25 | siacs/Conversations | 89.4 10.6 113 | 87.2 12.8 211
26 | photonstorm/phaser | 87.9 12.1 504 | 92.6 7.4 163
27 | fchollet/keras | 97.6 2.4 211 | 94.9 5.1 118
28 | robolectric/robolectric | 98.0 2.0 735 | 70.1 29.9 204
29 | TelescopeJS/Telescope | 93.6 6.4 141 | 92.9 7.1 156
30 | andypetrella/spark-notebook | 98.6 1.4 143 | 23.3 76.7 150
31 | apache/incubator-airflow | 43.9 56.1 392 | 90.0 10.0 229
32 | ReactiveX/RxJava | 32.2 67.8 662 | 87.0 13.0 736
33 | driftyco/ng-cordova | 84.6 15.4 175 | 93.9 6.1 115
34 | haraka/Haraka | 97.2 2.8 600 | 95.7 4.3 328
35 | isagalaev/highlight.js | 91.8 8.2 158 | 91.4 8.6 105
36 | bundler/bundler | 34.9 65.1 410 | 2.6 97.4 193
37 | humhub/humhub | 87.7 12.3 138 | 87.7 12.3 114
38 | square/picasso | 98.3 1.7 116 | 92.2 7.8 115
39 | Netflix/Hystrix | 68.2 31.8 440 | 72.6 27.4 146
40 | dropwizard/metrics | 69.6 30.4 102 | 74.4 25.6 121
41 | refinery/refinerycms | 60.9 39.1 427 | 35.6 64.4 497
42 | gollum/gollum | 98.1 1.9 105 | 94.8 5.2 116
43 | jhipster/generator-jhipster | 79.4 20.6 1257 | 97.2 2.8 107

44 | mapbox/mapbox-gl-js | 85.7 14.3 105 | 94.3 5.7 106
45 | request/request | 90.5 9.5 400 | 93.4 6.6 151
46 | alohaeditor/Aloha-Editor | 8.3 91.7 204 | 68.8 31.3 272
47 | boto/boto | 83.8 16.2 715 | 93.7 6.3 205
48 | grails/grails-core | 31.3 68.7 166 | 33.9 66.1 298
49 | Pylons/pyramid | 35.9 64.1 1198 | 58.0 42.0 138
50 | mantl/mantl | 89.1 10.9 147 | 92.0 8.0 313
51 | ether/etherpad-lite | 79.9 20.1 602 | 95.6 4.4 317
52 | jashkenas/underscore | 95.8 4.2 332 | 100 0 118
53 | apereo/cas | 8 92 675 | 22.9 77.1 227
54 | kivy/kivy | 89.8 10.2 1136 | 96.2 3.8 157
55 | elastic/logstash | 79.4 20.6 447 | 81.7 18.3 458
56 | getsentry/sentry | 52.5 47.5 1051 | 76.9 23.1 121
57 | hapijs/hapi | 84.5 15.5 466 | 82.6 17.4 402
58 | HabitRPG/habitica | 97.4 2.6 986 | 87.6 12.4 177
59 | pyrocms/pyrocms | 59.6 40.4 611 | 77.8 22.2 279
60 | BabylonJS/Babylon.js | 99.8 0.2 566 | 97.7 2.3 259
61 | Leaflet/Leaflet | 36.0 64.0 874 | 60.7 39.3 290
62 | laravel/laravel | 74 26 500 | 76.7 23.3 300
63 | zurb/foundation-sites | 83.1 16.9 1163 | 84.8 15.2 479
64 | callemall/material-ui | 76.8 23.2 1323 | 81.7 18.3 350
65 | loomio/loomio | 96.2 3.8 1420 | 32.8 67.2 366
66 | scikit-learn/scikit-learn | 66.0 34.0 1489 | 83.9 16.1 323
67 | frappe/erpnext | 65.4 34.6 1866 | 97.9 2.1 241
68 | Theano/Theano | 98.8 1.2 1882 | 97.1 2.9 616
69 | puppetlabs/puppet | 36.7 63.3 3008 | 15.4 84.6 247
70 | chef/chef | 21.6 78.4 1684 | 4.9 95.1 103
71 | woocommerce/woocommerce | 36.6 63.4 1150 | 58.7 41.3 1342
72 | divio/django-cms | 43.0 57.0 1913 | 45.0 55.0 131
73 | scipy/scipy | 23.4 76.6 1612 | 38.0 62.0 287
74 | matplotlib/matplotlib | 72.3 27.7 2074 | 64.9 35.1 396
75 | sympy/sympy | 68.4 31.6 2316 | 88.3 11.7 792
76 | twbs/bootstrap | 85.1 14.9 1824 | 98.4 1.6 126
77 | AnalyticalGraphicsInc/cesium | 0 100 547 | 2.3 97.7 515
78 | elastic/kibana | 27.9 72.1 1686 | 72.8 27.2 235
79 | mozilla/pdf.js | 92.1 7.9 1875 | 97.8 2.2 1029
80 | appcelerator/titanium_mobile | 41.2 58.8 1328 | 45.2 54.8 5492
81 | StackStorm/st2 | 74.2 25.8 1554 | 99.5 0.5 745
82 | TryGhost/Ghost | 90.4 9.6 2275 | 87.3 12.7 299
83 | fog/fog | 89.8 10.2 1694 | 95.2 4.8 479
84 | ansible/ansible | 10.4 89.6 1462 | 45.4 54.6 3150
85 | ipython/ipython | 22.6 77.4 3273 | 0 100 504
86 | cakephp/cakephp | 22.6 77.4 3653 | 43.2 56.8 287
87 | owncloud/core | 34.3 65.7 5623 | 18.1 81.9 2680
88 | rails/rails | 11.3 88.7 9866 | 1.7 98.3 525
89 | mozilla-b2g/gaia | 10.7 89.3 17691 | 94.6 5.4 4369
90 | saltstack/salt | 24.2 75.8 19366 | 47.5 52.5 457