Understanding the Evolution of Socio-Technical Aspects in Open Source Ecosystems: An Empirical Analysis of GNOME
UMONS
Faculté des Sciences
Département d'Informatique

Understanding the Evolution of Socio-technical Aspects in Open Source Ecosystems: An Empirical Analysis of GNOME

Mathieu Goeminne

A dissertation submitted in fulfillment of the requirements of the degree of Docteur en Sciences

Advisor:
Dr. Tom Mens, Université de Mons, Belgium

Jury:
Dr. Xavier Blanc, Université de Bordeaux 1, France
Dr. Véronique Bruyère, Université de Mons, Belgium
Dr. Jesus M. Gonzalez-Barahona, Universidad Rey Juan Carlos, Spain
Dr. Tom Mens, Université de Mons, Belgium
Dr. Alexander Serebrenik, Technische Universiteit Eindhoven, The Netherlands
Dr. Jef Wijsen, Université de Mons, Belgium

June 2013

Acknowledgments

This work could never have been completed without the support of many people.

First and foremost, I would like to thank Dr. Tom Mens for his help throughout the years. He provided me with advice and support, and helped me grow professionally and personally. He pushed me to surpass myself, and I thank him warmly for that.

I would like to thank the other members of my jury: Dr. Xavier Blanc, Dr. Véronique Bruyère, Dr. Jesus M. Gonzalez-Barahona, Dr. Alexander Serebrenik, and Dr. Jef Wijsen. All of them have positively influenced my vision of the world and left their mark on me. Thank you for the time you have given me. I hope I will have the pleasure of continuing to work with you in the future.

I would also like to thank Bogdan Vasilescu, who co-authored a journal article from which a chapter of this dissertation has been extracted. Thank you for your collaboration and your precious advice.

I wish to thank my friends and colleagues, who brought me so much joy and who regularly showed me that there is life beyond the thesis! Special thanks go to my family for their encouragement and for being present in my life. To all of you: thank you for being on my side during this crazy adventure.

Finally, I would like to thank my wife Nadège and my son Antoine.
I am sorry for the many hours I spent far away from you, even when I was sitting close to you. Thank you for your unconditional support.

Abstract

Since the 1970s, software development has grown exponentially. The number of software products developed, their size and their complexity have increased so much that understanding how they work and managing their evolution have become very hard today. Open source software (OSS) does not escape this growth or the problems it raises. For more than a decade, OSS systems have been the subject of increasing interest from the academic community, individuals and the software industry at large. Their development is booming because of their low cost of use (OSS systems are generally freely available), their low barriers to entry for developers, their low cost of development (they may be built by reusing other OSS systems), and the large quantity of easily available historical data.

Contrary to traditional commercial and proprietary software, OSS is typically developed by a group of people dispersed all over the world. This geographical distribution forces contributors to use tools that support asynchronous communication and information exchange over large distances. The public availability of the historical data handled by these tools facilitates the analysis of OSS evolution. Initially, empirical analyses of the evolution of OSS projects were limited to the study of source code evolution. Later, other software development artefacts were taken into account as well. For instance, the first analyses of OSS project mailing lists date from 2002 [157]. However, the main factor that drives the evolution of a software project is the people contributing to it. Hence, to better understand how OSS projects evolve, one needs better insight into the socio-technical aspects surrounding them.
To obtain a more accurate model of the interactions between project contributors, one needs to consider development artefacts that contain information about the project's social aspects, such as bug reports, e-mail discussions and version commits.

Frequently, collections of different projects are developed and evolve together in the same environment. We refer to these collections as software ecosystems. Since the contributors to the projects belonging to these ecosystems work together towards a common goal, they tend to form de facto communities. It is therefore important to study social aspects not only at the level of individual projects, but also at the level of the entire ecosystem.

The goal of this dissertation is to understand the evolution of the social aspects of open source ecosystems. More precisely, we study how contributors to open source ecosystems can be grouped into different communities that evolve and collaborate in different ways. In doing so, we provide evidence that contributors have specificities that are not taken into account by today's analysis tools. Becoming aware of these specificities opens up new questions, relevant both to research and to practice, on how new automated tools can be designed and used to better support the ecosystem's contributors in their activities.

The contributions of this dissertation are manifold. We developed an application framework that allows us to empirically study the evolution of software ecosystems. Focusing on the GNOME ecosystem, we designed a systematic approach for detecting the multiple accounts used by contributors to access the software repositories, and used it to gain better insight into the communities belonging to the ecosystem. We defined objective criteria according to which these contributors can be categorised.
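The abstract mentions a systematic approach for detecting the multiple accounts that a single contributor uses across repositories, but does not describe it. The following Python sketch is therefore only an illustration under stated assumptions, not the dissertation's method: each account is a hypothetical (name, e-mail) pair, and two accounts are assumed to belong to the same person if they share an e-mail address, share a normalised full name, or if the local part of one address matches the normalised name of another account. The helper names `normalise`, `UnionFind` and `merge_identities` are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple

Account = Tuple[str, str]  # hypothetical (name, e-mail) pair


def normalise(text: str) -> str:
    """Lower-case, strip accents and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in stripped.lower() if c.isalnum())


class UnionFind:
    """Disjoint-set structure used to make matches transitive."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)


def merge_identities(accounts: List[Account]) -> List[List[Account]]:
    """Group accounts that probably belong to the same person.

    Assumed heuristics (illustrative only): accounts are linked when they
    share an e-mail address, share a normalised name, or when the local
    part of one address equals the normalised name of another account.
    """
    uf = UnionFind(len(accounts))
    first_seen: Dict[Tuple[str, str], int] = {}  # key -> first account index
    for idx, (name, email) in enumerate(accounts):
        email = email.strip().lower()
        local_part = email.split("@")[0]
        keys = {
            ("email", email),
            ("name", normalise(name)),
            ("name", normalise(local_part)),  # compare address with names
        }
        for key in keys:
            if not key[1]:
                continue  # ignore empty names or addresses
            if key in first_seen:
                uf.union(idx, first_seen[key])
            else:
                first_seen[key] = idx
    groups: Dict[int, List[Account]] = defaultdict(list)
    for idx, account in enumerate(accounts):
        groups[uf.find(idx)].append(account)
    # Largest groups first
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    sample = [
        ("Frédéric Dupont", "fred@example.org"),
        ("Frederic Dupont", "fdupont@example.com"),
        ("fred", "fred@example.org"),
        ("Jane Roe", "jane.roe@example.net"),
        ("J. Roe", "janeroe@example.org"),
    ]
    for group in merge_identities(sample):
        print([email for _, email in group])
```

The union-find structure makes the matching transitive, so accounts linked through different clues end up in the same group. The same transitivity is also why such simple heuristics can produce false merges (for example, two different people whose addresses share a common local part such as "john"), so the resulting groups would typically need manual validation.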
Over the history of GNOME, we observed a power-law behaviour between the number of contributors and their contributions, in terms of commits submitted, mails sent and bug reports handled. Through further statistical analyses, we established correlations and trends between the contributors' effort, their preferred means of communication and the activity types in which they are involved. For example, we observed that contributors tend to restrict themselves to a limited number of activity types, but the more active a contributor is, the more he tends to spread his effort over different types of activity. When studying the evolution of GNOME contributors, we observed a tendency towards specialisation in fewer activity types. We also observed that, in recent years, the effort in each of the studied activity types has been decreasing.

Résumé

Since the 1970s, software development has experienced exponential growth. The number of software products developed, their size and their complexity have become so large that understanding how they work and managing their evolution have become very difficult nowadays. Open source software (OSS) escapes neither this growth nor the problems it raises. For more than a decade, open source systems have been the subject of growing interest from the academic community, individuals and the software industry in general. Their development is booming because of their low cost of use (open source systems are generally freely accessible), their low entry barrier for developers, their low development cost (they can be built by reusing other open source systems), and the large quantity of historical data that can easily be obtained.

Unlike traditional commercial and proprietary software, open source software is typically developed by a group of people dispersed across the world.
This geographical distribution forces contributors to use tools that enable asynchronous communication and the exchange of information over large distances. The public availability of the historical data managed by these tools facilitates the analysis of the evolution of open source software. Initially, the empirical analysis of the evolution of open source projects was limited to the study of source code evolution. Subsequently, other software development artefacts were taken into account. For example, the first analyses of open source project mailing lists date from 2002 [157]. However, the people contributing to a software project are the main driver of its evolution. Thus, to better understand how open source projects evolve, it is necessary to gain better insight into the socio-technical aspects surrounding them. To obtain a more precise and accurate model of the interactions between project contributors, it is necessary to consider the development artefacts that contain information about the project's social aspects, such as bug reports, e-mail discussions and version commits.

Frequently, software projects are developed and evolve together in the same environment. We call such collections of projects software ecosystems. Since the contributors to the projects belonging to these ecosystems work together towards a common goal, they tend to form de facto communities. It is therefore important to study social aspects not only at the level of individual projects, but also at the level of the ecosystem as a whole.

The objective of this thesis is to understand the evolution of the social aspects of open source ecosystems.
More precisely, we study how the contributors involved in open source ecosystems can be grouped into different communities that evolve and collaborate in different ways. In doing so, we provide convincing evidence that contributors have specificities that are not taken into account by current analysis tools. Becoming aware of these specificities raises new research questions and suggests new practices regarding the design of automated tools that more effectively help the ecosystem's contributors carry out their activities.