UNIVERSITY OF MACEDONIA
DEPARTMENT OF APPLIED INFORMATICS

MASTER THESIS

A STUDY OF WORK DISTRIBUTION IN OPEN SOURCE SOFTWARE PROJECTS

KONSTANTINOS STAMATIADIS

ADVISOR: ALEXANDER CHATZIGEORGIOU

JANUARY 2012

Konstantinos Stamatiadis
[email protected]

PREFACE

In software projects in general, and Open Source Software (OSS) projects in particular, the most important aspect is the team of people that develops them (in OSS we call it the “Community”). As projects grow in size and complexity, so do the teams that develop and maintain them.

The emergence of the OSS movement provided software engineering researchers with massive amounts of data on every aspect of the software development process, ranging from the social behavior within the teams to various metrics of the code being produced. Numerous studies have explored how the teams operate [15], [13] and evolve [14], [9], what motivates the participating developers [10], [18], and which factors affect the quality of the output [1].

The goal of this thesis is to contribute to the study of the social aspect of the OSS movement. We focus on how developers contribute to open source projects, employing the Gini coefficient as a measure of the distribution of effort. Although the Gini coefficient has been used before [5], [17] (albeit in only a few studies, and only recently), this thesis is, to our knowledge, the first to use data extracted from a large source of around 1,200 open source projects of varying size and duration, and thus to describe what seems to be the norm rather than a limited observation.

We chose to study how developers contribute to OSS projects because we believe, as do others [16], that it is one of the factors that indicate how viable a project is (i.e. how active the community around it is, and in what way), and that in essence it influences the decision of individuals, academics and corporations on whether to invest and get involved in an open source project.

The remainder of this thesis is organized as follows. Chapter 1 introduces empirical studies in software engineering and explains why they are important today. Chapter 2 presents the FLOSSMetrics project (the source of the data we analyzed), describing what it offers and the challenges it poses in use. Chapter 3 defines our specific research target and describes the decisions we took, how we obtained the results, and our findings. Chapter 4 discusses threats to validity, and Chapter 5 concludes our research and proposes future work.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Alexander Chatzigeorgiou, for suggesting a research area well suited to both my interests and skills, and for giving me solid advice as I worked on it. His encouragement, enthusiasm and contribution, during numerous lengthy and productive sessions, always helped me push ahead.

I also want to thank all my friends, fellow M.Sc. and Ph.D. students, and family members for helping me and believing in me at moments when I couldn’t.

CONTENTS

Preface
Acknowledgements
Contents
List of Figures
List of Tables
List of Source Code
1 Introduction
2 FLOSSMetrics
  2.1 About FLOSSMetrics
  2.2 Data Preparation
  2.3 Schema
  2.4 Description of Tables
    2.4.1 Description of MLS Tables
    2.4.2 Description of SCM Tables
    2.4.3 Description of TRK Tables
  2.5 Working with FLOSSMetrics Data
    2.5.1 Challenges
    2.5.2 Working with the Data
    2.5.3 “Bird’s Eye” View of the Data
3 Work Distribution
  3.1 Gini Coefficient
  3.2 Data Retrieval and Preparation
  3.3 Gini/Project
  3.4 Correlations
    3.4.1 Number of Committers & Gini
    3.4.2 Number of Commits & Gini
    3.4.3 Project’s Duration & Gini
    3.4.4 Aggregated SLOC & Gini
  3.5 Gini Progress
  3.6 Survival Analysis
4 Threats to Validity
5 Conclusions and Future Work
A. Appendix
  A.1 SQL Queries
  A.2 MATLAB Code
  A.3 Numerical Data
Bibliography

LIST OF FIGURES

Figure 2-1: Unified schema
Figure 2-2: MLS schema
Figure 2-3: SCM schema
Figure 2-4: TRK schema
Figure 3-1: Income disparity since WWII
Figure 3-2: Defining Gini coefficient using a Lorenz curve
Figure 3-3: Gini coefficient per project
Figure 3-4: Number of projects per Gini coefficient range
Figure 3-5: Gini coefficient values in a Box Plot
Figure 3-6: Correlation coefficient and plot of committers and Gini coefficient
Figure 3-7: Correlation coefficient and plot of commits and Gini coefficient
Figure 3-8: Correlation coefficient and plot of duration and Gini coefficient
Figure 3-9: Correlation coefficient and plot of aggregated SLOC and Gini coefficient
Figure 3-10: Negative and positive Gini trends (all projects)
Figure 3-11: Negative and positive Gini trends (projects with actual change rate)
Figure 3-12: Survival Analysis

LIST OF TABLES

Table 2-1: Various sizes