UNIVERSIDAD REY JUAN CARLOS
ESCUELA SUPERIOR DE CIENCIAS EXPERIMENTALES Y TECNOLOGÍA
EMPIRICAL SOFTWARE ENGINEERING RESEARCH ON LIBRE SOFTWARE: DATA SOURCES, METHODOLOGIES AND RESULTS
DOCTORAL THESIS
Gregorio Robles
Ingeniero de Telecomunicación
Madrid, 2005
Thesis submitted to the Departamento de Informática, Estadística y Telemática in partial fulfillment of the requirements for the degree of Doctor europeus of Philosophy in Computer Science
Escuela Superior de Ciencias Experimentales y Tecnología
Universidad Rey Juan Carlos
Móstoles, Madrid, Spain
DOCTORAL THESIS
EMPIRICAL SOFTWARE ENGINEERING RESEARCH ON LIBRE SOFTWARE: DATA SOURCES, METHODOLOGIES AND RESULTS
Author: Gregorio Robles-Martínez
Ingeniero de Telecomunicación - Telecommunication Engineer
Director: Jesús M. González-Barahona
Doctor Ingeniero de Telecomunicación - Doctor Telecommunication Engineer
Móstoles (Madrid), Spain, 2005
DOCTORAL THESIS: Empirical Software Engineering Research on Libre Software: Data Sources, Methodologies and Results
AUTHOR: Gregorio Robles-Martínez
DIRECTOR: Jesús M. González-Barahona
The committee appointed to evaluate the above-mentioned thesis, composed of the following doctors:
PRESIDENT: Prof. Dr. Manuel Hermenegildo (Universidad Politécnica de Madrid, Madrid, Spain)
MEMBERS: Prof. Dr. Brian Fitzgerald (University of Limerick, Limerick, Ireland)
Dr. Stefan Koch (Wirtschaftsuniversität Wien, Vienna, Austria)
Dr. Daniel M. Germán (University of Victoria, Victoria, Canada)
SECRETARY: Dr. Antonio Fernández-Anta (Universidad Rey Juan Carlos, Madrid, Spain)
has decided to grant the qualification of
Móstoles (Madrid, Spain), February 9th 2006
The secretary of the committee.
(c) 2005 Gregorio Robles
This work is licensed under a Creative Commons Attribution-ShareAlike License. http://creativecommons.org/licenses/by-sa/2.0/ (see Appendix D for further details).
To my father, mother, sister and Nuria for their infinite love, understanding and support.
Acknowledgments
It has been a lot of work over these last four years, but at the same time it has been a lot of fun. A great part of this fun is due to the people I've been involved with, both in my research and teaching duties and in my personal life (although sometimes I don't know where the former ends and the latter starts). I have to start by expressing my gratitude to Jesús, who has officially been my advisor, but unofficially much more. There has been a special connection between us from the first minute regarding what we wanted to research and how we wanted to do it. With much effort and many sleepless nights (which I myself can easily afford, but for which I must apologize to Jesús' family) we have started a new research line, many research projects and a new research group that now numbers eight people. My name is the only one that appears on the front cover of this thesis, but without any doubt it would only be fair if both our names appeared there. I won't have enough time in this life to say Muchas gracias, Jesús as many times as I should.
Of course, I have to mention all the guys from our research team (Juanjo, Israel, Teo, Álvaro N., Jorge, Diego and Carlos, as well as Luis L. and Álvaro del C.). They have brought new ideas and lots of amusement into my research, and I'm sure they will all agree when I say that we've built a small family in our labs in Móstoles. I'm also very thankful to all those who make up the Grupo de Comunicaciones y Sistemas (GSyC) and are not part of the CALIBRE team: José (with whom I share the pride of being a Real Madrid supporter), José María (when my friends tell me that I work too much, I always explain to them that I've got a colleague named José María...), Agustín, Pedro, Vicente, Luis R., Miguel, Eva, Paco, Pablo, Rafaela, Sergio, Antonio, nemo, Gorka, Katia, Andrés, Juan and Quique.
During these years of research, I have also had the opportunity to visit some research groups abroad: in 2003 I spent 8 weeks in Vienna, and in 2004 I spent four months in Maastricht. The time I spent there has had a great impact on this work, but I also very much enjoyed the people I met there. So, many thanks go to Stefan (plus Lale and child), who acted as my local host in Vienna, and to Rishab in Maastricht, and also to the many great people I had the chance to meet there: Rüdiger, Bernhard, Wilma, Ekin, Semih, Abraham, Paolo and Elad, among others.
I cannot forget all those with whom I have co-authored some of my works or actively exchanged research ideas in these last years. In this sense, Martin Michlmayr (Univ. Cambridge), Andrea Capiluppi (Univ. Torino and Open Univ.), Juan F. Ramil (Open Univ.), Ioannis Samoladas (Univ. Thessaloniki), Juan Julián Merelo (Univ. Granada), Paul A. David (Univ. Stanford & Univ. Oxford) and Jorge Ferrer (Univ. Politécnica de Madrid) are somewhat guilty of some of the ideas and concepts included in this thesis. Rosario Plaza deserves her place here as well, as she is the one who has reviewed my English with much patience and dedication.
I also have to mention those who have been the victims of this PhD thesis: my friends. They are the ones who have suffered from the lack of time I could devote to them, time that they, of course, deserve. Many thanks go to Diego, Rodrigo & Cristina, Carlos Martín Ugalde, Enrique Zamora, José María Nadal, my scouting group, the people of Scouts-es, some of my students, and many, many others...
Finally, I will always be indebted to my family. My father, my mother and my sister Elisa have always been a great support and have shown the understanding and patience required when dealing with a PhD student at home. A special hug goes to my grandparents, Gregorio and Celia, and Ramón and Maria del Carmen.
And last but not least, I'm deeply grateful to Nuria, who has always been there showing infinite love.
Gregorio
Madrid, October 2005
Abstract
With the appearance and spread of the Internet, new ways of developing software have arisen that make use of telematic tools, follow flexible methodologies and incorporate third-party contributions. One of the paradigmatic examples of software development with these characteristics is the phenomenon of libre (free/open source) software. Of special interest are those projects that are large both in number of participants and in software size. Although at first these new environments are less controllable than traditional ones (development is generally geographically distributed, there is no company leading the development, traditional hierarchical structures are not followed, and external contributions are hardly predictable), we have access to much information: the software product itself and many of the by-products created during the development process (communication archives, bug-tracking systems and versioning systems, among others). These data sources are usually publicly available on the Internet, so we can perform exhaustive analyses with a great amount of data (much of which is hardly obtainable in traditional, industrial environments).
The goal of this thesis is to identify the data sources that libre software projects offer publicly, to present some methodologies for the analysis of these sources and of the data that can be extracted from them, and to show the results obtained from applying these methodologies. In particular, our intention is to better understand the libre software phenomenon, but also software creation processes in general, since the acquired knowledge is not necessarily specific to libre software and could be applied to many other development environments.
Thus, this thesis starts with a description of the data sources publicly available on the Internet and of the data that can be extracted from them. Afterwards, several source-dependent methods are used to obtain information from the data and to filter out interference. Finally, several methodologies are presented and applied to the data obtained from the libre software projects selected as case studies. The methodologies range from classical to novel ones. Among the classical ones, we analyze the growth of software systems as known from the field of software evolution, and we apply social network analysis, a technique from the social sciences. In both cases, the contribution of this thesis has been to apply them to libre software projects. Regarding novel methodologies, we propose the archaeological analysis of software systems, with the aim of determining what remains from previous versions; the generalization of software evolution to file types other than source code (for instance, documentation, translation or user interface files, among others); and the study of the evolution of volunteer participation and of the regeneration of the leading "core" group. Also, a series of tools has been created to automate, at least partially, the whole process. These tools make it possible to reuse these methodologies on other projects.
Among the main contributions of this thesis, this is the first exhaustive analysis of a large number of software projects; in addition, the proposed methodologies and the tools that have been developed allow more projects to be studied in the near future. Furthermore, we have shown that technical analysis should be complemented with socio-technical analysis to fully understand the development process and many of the technical issues of (libre) software projects.