Indicators for Missing Maintainership in Collaborative Open Source Projects
TECHNISCHE UNIVERSITÄT CAROLO-WILHELMINA ZU BRAUNSCHWEIG

Studienarbeit

Indicators for Missing Maintainership in Collaborative Open Source Projects

Andre Klapper

February 04, 2013

Institute of Software Engineering and Automotive Informatics
Prof. Dr.-Ing. Ina Schaefer
Supervisor: Michael Dukaczewski

Affidavit

Hereby I, Andre Klapper, declare that I wrote the present thesis without any assistance from third parties and without any sources other than those indicated in the thesis itself.

Braunschweig / Prague, February 04, 2013

Abstract

This thesis attempts to identify missing maintainership in free and open source software projects from freely accessible metadata, by querying various data sources and rating the gathered information. GNOME and Apache are used as case studies.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

Keywords

Maintenance, Activity, Open Source, Free Software, Metrics, Metadata, DOAP

Contents

List of Tables
1 Introduction
  1.1 Problem and Motivation
  1.2 Objective
  1.3 Outline
2 Theoretical Background
  2.1 Reasons for Inactivity
  2.2 Problems Caused by Inactivity
  2.3 Ways to Pass Maintainership
3 Data Sources in Projects
  3.1 Identification and Accessibility
  3.2 Potential Sources and their Exploitability
    3.2.1 Code Repositories
    3.2.2 Mailing Lists
    3.2.3 IRC Chat
    3.2.4 Wikis
    3.2.5 Issue Tracking Systems
    3.2.6 Forums
    3.2.7 Releases
    3.2.8 Patch Review
    3.2.9 Social Media
    3.2.10 Other Sources
  3.3 Weighting and Relevance of Sources
  3.4 Sources in Potential Case Studies
  3.5 Selection of Case Studies
    3.5.1 The Apache Organization
    3.5.2 The GNOME organization
4 Concept and Implementation
  4.1 Existing Tools for Specific Sources
    4.1.1 Code Repositories
    4.1.2 Mailing Lists
    4.1.3 IRC Chat
    4.1.4 Wikis
    4.1.5 Issue Tracking Systems
    4.1.6 Forums
    4.1.7 Releases
    4.1.8 Patch Review
    4.1.9 Social Media
  4.2 Data Collection Implementation
    4.2.1 Prerequisites
    4.2.2 Data Queries
  4.3 Rating Criteria
    4.3.1 Code Repository Activity
    4.3.2 Issue Tracker Activity
5 Results and Interpretation
  5.1 Results
    5.1.1 Metadata Consistency and Completeness
    5.1.2 Code Repositories
    5.1.3 Issue Tracker
  5.2 Rating Results
  5.3 Interpretation
  5.4 Threats to Validity and Data Correctness
6 Conclusions
  6.1 Data Analysis
  6.2 Recommended Actions
    6.2.1 Data Gathering: On Metadata Availability and Consistency
    6.2.2 Taking Action on Results
  6.3 Outlook
Bibliography
A Source Code
  A.1 Database Setup
  A.2 Apache Subversion Checkout and Update
  A.3 GNOME Git Checkout and Update
  A.4 Extracting Metadata from Apache DOAP files
  A.5 Extracting Metadata from GNOME DOAP files
  A.6 Maintainer Information from Apache JIRA
  A.7 Code Repository Activity Data in Apache
  A.8 Code Repository Activity Data in GNOME
  A.9 Code Repository Activity Rating in Apache
  A.10 Code Repository Activity Rating in GNOME
  A.11 Issue Tracker Activity Data in Apache
  A.12 Issue Tracker Activity Data in GNOME
  A.13 Issue Tracker Activity Rating in Apache
  A.14 Issue Tracker Activity Rating in GNOME
B List of Projects by Maintenance Rating
  B.1 Apache: ratings
  B.2 GNOME: ratings
C Code Repository Activity
  C.1 Apache
  C.2 GNOME
D Disclosure of Interest

List of Tables

3.1 Organization Infrastructures
3.2 Organization Infrastructures (continued)
5.1 Metadata Consistency and Completeness
5.2 Commits and Committers in Apache: Absolute Numbers
5.3 Commits and Committers in Apache: Percentage Values
5.4 Commits and Committers in GNOME: Absolute Numbers
5.5 Commits and Committers in GNOME: Percentage Values
5.6 Number of Projects per Rating Category

1 Introduction

1.1 Problem and Motivation

„Open source development is a form of distributed, collaborative, asynchronous, partly volunteer, software development“ [DM08, p. 159]. It is largely driven and maintained by volunteers who develop software in order to solve a problem they face [Gra04, p. 236].
Potential reasons for involvement include „fun, reputation, learning, enjoyment, and peer recognition“ [vKS07, p. 239]. According to [MH03, p. 105], „the majority of Free Software projects are directed by a single lead developer – usually the software’s original author – who assumes the role of maintainer, integrates patches submitted from developers in his or her user community, and organizes and coordinates releases of the software.“

At any time, a maintainer of a free software/open source project¹ (in short: FOSS project) might lose interest in continuing development and become inactive. „As a result of the centralized decision making structure, development stalls in the absence of the lead developer“ [MH03, p. 105]. While a maintainer often notices that he or she is spending less time on the project, „there’s no point at which he consciously realizes that he can no longer fulfill the duties of the role“ [Fog06, p. 216], as the process takes place gradually. Furthermore, developers feel emotionally bound to their project and ask the community „to be patient but never find the time“ [MH03, p. 105]. Hence, no public call for new contributors is issued either. To make the problem worse, „most free software projects are distributed, which makes it hard to quickly identify volunteers who neglect their duties“ [Mic04, p. 93]. Terms commonly used for this situation are AWOL (Away/Absent Without Official Leave) and MIA (Missing in Action) [Mic04, p. 95].

About 80% of the projects registered on the popular open source platform SourceForge do not see any error reports, patches, feature requests, or other measurable interaction with the community after their registration [CMP05, p. 9]. Such a project can be considered inactive and unmaintained if no code development takes place and if nobody feels responsible for its state. The code base rots and does not get ported to potential new technologies in its environment, or falls behind the quality of competing projects.
Inactivity is often not recognized by parties with an interest in further development who could have stepped up to fill the gap if only they had known early enough.

¹ The terms Free Software and Open Source are often used synonymously. However, the first and older term implies an ideology and suffers from the ambiguity of the term free in the English language (free as in freedom, but also free of cost), while Open Source focuses on technical aspects. For elaboration see [Fog06, p. 11–15].

1.2 Objective

Big and established FOSS organizations can consist of hundreds of projects (modular packages) specialized in providing certain functionality in the stack [Gra04, p. 237]. Such packages, or modules, are often led and maintained by one person or a small core team of developers who lead and moderate discussions and decide on the direction of the project and on design questions [Seb08, p. 9][Gra04, p. 237]. Previous research shows that most projects have only a small number of developers: [CMP05, p. 6] reports that 82.6% of the projects hosted on SourceForge have only one or two developers, and [Wei05, p. 31] shows that the average number of developers per project on SourceForge is 2.0067. Most of these teams are loosely organized and do not require any official membership. Positions are frequently filled by individuals based on merit (meritocracy²), determined by previous contributions, commitment (cf. [Gra04, p. 237]), and reputation. As a result, volunteers and paid developers often work together on a project (see for example [Nea10] for an analysis of the contributor base of the GNOME organization). Moreover, most projects face the problem of only having a few main developers.