Uncorrected Proof
JSS 8229, No. of Pages 12, Model 5G. ARTICLE IN PRESS. 20 November 2008.

The Journal of Systems and Software xxx (2008) xxx–xxx
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/jss

Identifying exogenous drivers and evolutionary stages in FLOSS projects

Karl Beecher *, Andrea Capiluppi, Cornelia Boldyreff
Centre of Research on Open Source Software – CROSS, Department of Computing and Informatics, University of Lincoln, UK

Article info

Article history:
Received 11 January 2008
Received in revised form 22 October 2008
Accepted 24 October 2008
Available online xxxx

Keywords:
Open source software
Software evolution
Software repositories

Abstract

The success of a Free/Libre/Open Source Software (FLOSS) project has been evaluated in the past through the number of commits made to its configuration management system, number of developers and number of users. Most studies, based on a popular FLOSS repository (SourceForge), have concluded that the vast majority of projects are failures.

This study’s empirical results confirm and expand conclusions from an earlier and more limited work. Not only do projects from different repositories display different process and product characteristics, but a more general pattern can be observed. Projects may be considered as early inceptors in highly visible repositories, or as established projects within desktop-wide projects, or finally as structured parts of FLOSS distributions. These three possibilities are formalized into a framework of transitions between repositories.

The framework developed here provides a wider context in which results from FLOSS repository mining can be more effectively presented.
Researchers can draw different conclusions based on the overall characteristics studied about an Open Source software project’s potential for success, depending on the repository that they mine. These results also provide guidance to OSS developers when choosing where to host their project and how to distribute it to maximize its evolutionary success.

© 2008 Published by Elsevier Inc.

* Corresponding author. Tel.: +44 01522 886858.
E-mail address: [email protected] (K. Beecher).

0164-1212/$ - see front matter © 2008 Published by Elsevier Inc.
doi:10.1016/j.jss.2008.10.026

1. Introduction

The environment in which software is deployed is known to have a direct effect on its subsequent evolution. Lehman’s first law of software evolution anticipates that useful real-world software systems (i.e., E-type) must undergo continuing change in response to various requirements; in other words, they must evolve (Lehman et al., 1997). The LAMP stack (Linux, Apache, MySQL, Perl/PHP/Python), the Mozilla Foundation and the BSD family are well-known examples of open source E-type software systems, and as such are no exception to this rule.

The successful evolution of such open source software projects has been made possible, among other factors, by their attraction of large communities of both users and developers, two categories that notably are not mutually exclusive in open source software. Users initiate the need for change and the developers implement it (Mockus et al., 2002). The extent to which an open source project is successful has often been evaluated empirically by measuring endogenous characteristics, such as the amount of developer activity, the number of developers, or the size of the project (Crowston et al., 2006; Godfrey and Tu, 2000; Robles et al., 2003). As an example, a thorough study of Sourceforge.net (a popular repository of more than 200,000 open source projects) concluded that the majority of projects housed there should be considered “tragedies” by virtue of their failure to initiate a steady series of releases (English and Schweik, 2007).

The general success of open source software projects has accompanied the wider establishment of organized repositories aiming to facilitate their development and management. In a previous work (Beecher, XXXX), the authors examined a collection of open source projects and studied instead the exogenous drivers acting upon them, establishing to what extent the repository in which a project is located affects its evolutionary characteristics. By comparing equally sized random samples from two open source repositories and also tracking the evolution of projects that moved between them, this earlier study concluded that a repository typically has statistically significant effects upon characteristics such as the number of contributing developers as well as the period and amount of development activity.

This work extends and expands the previous study in two ways. First, it encompasses a greater number of repositories: instead of the original two, this paper formulates hypotheses and gathers empirical evidence from data extracted from six different FLOSS repositories, and provides further empirical evidence for the earlier assertions. By making multiple comparisons between them, a structured body of knowledge has been constructed regarding the key practical differences between the individual FLOSS repositories being studied. Secondly, the paper formulates a framework of evolution for FLOSS projects, based on the repository to which they belong, comprising a typical path of evolution between repositories, which exploits better process and product characteristics of projects in particular repositories.

The paper is organized as follows: Section 2 explores previous work and shows how the findings of this paper extend and expand upon past literature on the subject. Section 3 tailors the Goal-Question-Metric methodology to this specific case study, introduces the empirical hypotheses on which this study is based (the null hypotheses and their alternative counterparts), and discusses how they have been operationalized. It also describes which repositories have been selected, how the data has been extracted from them, and which attributes have been used to characterize their process and product aspects. Sections 4 and 5 illustrate the results gathered and verify whether the hypotheses have to be rejected. Section 6 provides the discussion of the empirical findings and introduces the framework for the evolution of FLOSS projects along repositories; Section 7 explores the threats to the external and internal validity of this empirical study, while Section 8 provides the key findings of this research.

2. Previous work

There are two main types of studies found in the FLOSS literature, one termed external and the other internal to the FLOSS phenomenon (Beecher, XXXX). Based on the availability of FLOSS data, the former has traditionally used FLOSS artefacts in order to propose models (Hindle and German, 2005), test existing or new frameworks (Canfora et al., 2007; Livieri et al., 2007), or build theories (Antoniol et al., 2001) to provide advances in software engineering. The latter includes several other studies that have analyzed the FLOSS phenomenon per se (Capiluppi, 2003; German, 2004; Herraiz et al., 2008; Stamelos et al., 2002) with their results aimed at both building a theory of FLOSS, and characterizing the results and their validity specifically as inherent to this type of software and style of development.

[...] than their SourceForge counterparts; in addition, it was found that, within the Debian sample, these increased measures could be observed typically after the projects were included in the Debian repository. The present study expands the previous database and results by considering four other repositories (KDE, GNOME, RubyForge and Savannah), extracts similar samples from each of the resulting six repositories (50 projects each from the repository’s “stable” pool), and studies four product and process characteristics of the projects in the samples. Based on these experiments, this study also provides a more general framework for the evolution of FLOSS projects.

There are several tools and data sources which are used to analyze FLOSS projects. FLOSSmole¹ is a single point of access to data gathered from a number of FLOSS repositories (e.g., SourceForge, Freshmeat, RubyForge). While FLOSSmole provides a simple querying tool, its main function is to act as a source of data for others to analyze. CVSAnaly² is a tool used to measure and analyze large FLOSS projects (Robles et al., 2004). It is used in this paper to determine such measures as the number of commits and developers associated with a particular project.

3. Empirical study definition and planning

The Goal-Question-Metric (GQM) method (Basili et al., 1994) evaluates whether a goal has been reached by associating that goal with questions that explain it from an operational point of view and provide the basis for applying metrics to answer these questions. The aim of the method is to determine the information and metrics needed to be able to draw conclusions on the achievement of the goal.

In this study, the GQM method is applied firstly to identify the overall goals of this research; then to formulate a number of questions related to FLOSS repositories and their exogenous (or external) effects on the underlying process and product characteristics