Identifying Exogenous Drivers and Evolutionary Stages in FLOSS Projects
Total Page:16
File Type:pdf, Size:1020Kb
University of Groningen Quality Factors and Coding Standards - a Comparison Between Open Source Forges Capiluppi, Andrea; Boldyreff, Cornelia; Beecher, Karl; Adams, Paul J. Published in: Electronic Notes in Theoretical Computer Science DOI: 10.1016/j.entcs.2009.02.063 IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2009 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): Capiluppi, A., Boldyreff, C., Beecher, K., & Adams, P. J. (2009). Quality Factors and Coding Standards - a Comparison Between Open Source Forges. Electronic Notes in Theoretical Computer Science, 233(C), 89- 103. https://doi.org/10.1016/j.entcs.2009.02.063 Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). The publication may also be distributed here under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license. More information can be found on the University of Groningen website: https://www.rug.nl/library/open-access/self-archiving-pure/taverne- amendment. Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 29-09-2021 The Journal of Systems and Software 82 (2009) 739–750 Contents lists available at ScienceDirect The Journal of Systems and Software journal homepage: www.elsevier.com/locate/jss Identifying exogenous drivers and evolutionary stages in FLOSS projects Karl Beecher *, Andrea Capiluppi, Cornelia Boldyreff Centre of Research on Open Source Software – CROSS, Department of Computing and Informatics, University of Lincoln, UK article info abstract Article history: The success of a Free/Libre/Open Source Software (FLOSS) project has been evaluated in the past through Received 11 January 2008 the number of commits made to its configuration management system, number of developers and num- Received in revised form 22 October 2008 ber of users. Most studies, based on a popular FLOSS repository (SourceForge), have concluded that the Accepted 24 October 2008 vast majority of projects are failures. Available online 6 November 2008 This study’s empirical results confirm and expand conclusions from an earlier and more limited work. Not only do projects from different repositories display different process and product characteristics, but Keywords: a more general pattern can be observed. Projects may be considered as early inceptors in highly visible Open Source software repositories, or as established projects within desktop-wide projects, or finally as structured parts of Software evolution Software repositories FLOSS distributions. These three possibilities are formalized into a framework of transitions between repositories. The framework developed here provides a wider context in which results from FLOSS repository mining can be more effectively presented. Researchers can draw different conclusions based on the overall char- acteristics studied about an Open Source software project’s potential for success, depending on the repos- itory that they mine. These results also provide guidance to OSS developers when choosing where to host their project and how to distribute it to maximize its evolutionary success. Ó 2008 Elsevier Inc. All rights reserved. 1. Introduction cluded that the majority of projects housed there should be considered ‘‘tragedies” by virtue of their failure to initiate a steady The environment in which software is deployed is known to series of releases (English and Schweik, 2007). have a direct effect on its subsequent evolution. Lehman’s first The general success of open source software projects has law of software evolution anticipates that useful real-world soft- accompanied the wider establishment of organized repositories ware systems (i.e., E-type) must undergo continuing change in re- aiming to facilitate their development and management. In a pre- sponse to various requirements, in other words they must evolve vious work Beecher (2008) the authors examined a collection of (Lehman et al., 1997). The LAMP stack (Linux, Apache, MySQL, open source projects, and studied instead the exogenous drivers Perl/PHP/Python), the Mozilla Foundation and the BSDs family acting upon them and established to what extent the repositories are well-known examples of open source E-type software systems, in which a project is located affects its evolutionary characteristics. and as such are no exception to this rule. By comparing equally sized random samples from two open source The successful evolution of such open source software projects repositories and also tracking the evolution of projects that moved has been made possible (among other factors) also by their attrac- between them, this earlier study concluded that a repository typi- tion of large communities of both users and developers, two cate- cally has statistically significant effects upon characteristics such gories that notably are not mutually exclusive in open source as the number of contributing developers as well as the period software. Users initiate the need for change and the developers and amount of development activity. implement it (Mockus et al., 2002). The extent to which an open This work extends and expands the previous study in two ways. source project is successful has often been evaluated empirically First, it encompasses a greater number of repositories; instead of by measuring endogenous characteristics, such as the amount of the original two, this paper formulates hypotheses and gathers developer activity, the number of developers, or the size of the pro- empirical evidence from data extracted from six different FLOSS ject (Crowston et al., 2006; Godfrey and Tu, 2000; Robles et al., repositories, and provides further empirical evidence for the earlier 2003). As an example, a thorough study of Sourceforge.net (a pop- assertions. By making multiple comparisons between them, a ular repository of more than 200,000 open source projects) con- structured body of knowledge has been constructed regarding the key practical differences between the individual FLOSS reposi- * Corresponding author. Tel.: +44 (0)1522 886858. tories being studied. Secondly, the paper formulates a framework E-mail address: [email protected] (K. Beecher). of evolution for FLOSS projects, based on the repository to which 0164-1212/$ - see front matter Ó 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2008.10.026 740 K. Beecher et al. / The Journal of Systems and Software 82 (2009) 739–750 they belong, comprising a typical path of evolution between repos- within the Debian sample, these increased measures could be ob- itories, which exploits better process and product characteristics of served typically after the projects were included in the Debian projects in particular repositories. repository. The present study expands the previous data base and The paper is articulated as follows: Section 2 explores previous results by considering four other repositories (KDE, GNOME, Ruby- work and shows how the findings of this paper extend and expand Forge and Savannah), extracts similar samples from each of the upon past literature on the subject. Section 3 tailors the Goal-Ques- resulting six repositories (50 projects each from the repository’s tion-Metric methodology to this specific case study, and introduces ‘‘stable” pool), and studies four product and process characteristics the empirical hypotheses on which this study is based, the null of the projects in the samples. Based on these experiments, this hypotheses and their alternative counterparts, and discusses how study also provides a more general framework for the evolution they have been operationalized. It also describes which reposito- of FLOSS projects. ries have been selected, how the data has been extracted from There are several tools and data sources which are used to ana- them, and which attributes have been used to characterize their lyze FLOSS projects. FLOSSmole1 is a single point of access to data process and product aspects. Sections 4 and 5 illustrate the results gathered from a number of FLOSS repositories (e.g., SourceForge, gathered, and verifies whether the hypotheses have to be rejected. Freshmeat, Rubyforge). While FLOSSmole provides a simple querying Section 6 provides the discussion for the empirical findings and tool, its main function is to act as a source of data for others to ana- introduces the framework for the evolution of FLOSS projects along lyze. CVSAnaly2 is a tool which is used to measure any analyses from repositories; Section 7 explores the threats to the external and large FLOSS projects (Robles et al., 2004). It is used in this paper to internal validity of this empirical study, while Section 8 provides determine such measures as the number of commits and developers the key findings of this research. associated with a particular project. 2. Previous work 3. Empirical study definition