Measuring Developer Contribution from Software Repository Data
Eirini Kalliamvakou, Georgios Gousios, Diomidis Spinellis, Nancy Pouloudi
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
ikaliam,gousiosg,dds,[email protected]

Abstract

Our work is concerned with an enriched perspective of what constitutes developer contribution in software infrastructures supporting incremental development and distributed software projects. We use the term “contribution” to express the combination of all the actions a developer has performed during the development process and propose a model for calculating this individually for developers participating in a software project. Our approach departs from the traditional practice of only measuring the contribution to the final outcome (the code) and puts emphasis additionally on other activities that do not directly affect the product itself but are essential to the development process. We use the Open Source Software (OSS) context to take advantage of the public availability of data in software repositories. In this paper, we present our method of calculation and its system implementation and we apply our measurements on various projects from the GNOME ecosystem.

1. Introduction

An important aspect of all engineering principles is the assessment of the contribution of individuals that work on a project. Contribution assessments are performed to monitor the rate of project development, identify implementation bottlenecks and isolate exceptional cases, while the results of contribution assessments can help with project planning and future estimations. An open issue linked with contribution assessment is the definition of what contribution is in a particular context and also the selection and application of the appropriate measurements.

In software engineering, contribution assessment entails the measurement of the contribution of a person in terms of lines of code (LOC) or function points towards the final development of a software project [Kan, 2003]. This practice clearly focuses on the contribution to the final outcome of the project (i.e. the source code). To this end, only LOC is regarded as measured contribution. In recent years, however, the shift towards modern development practices and the proliferation of software and project management tools challenge this perspective. A software developer today is not only required to write code, but also to communicate and coordinate with colleagues effectively and to use a variety of tools that produce and modify code with minimal input from his or her side. This change has become more apparent with the emergence of Open Source Software (OSS).

In this respect, a developer contributes to a wide range of activities both involving the process and the product. Such an enriched perspective on a developer’s contribution requires all individual actions to be taken into account. In this paper we discuss and measure contribution in this respect; a combination of all the actions a person has performed during the software development process weighted for their significance to the specific project. Our practice, then, encompasses the contribution to the final outcome as well as to the process that generated it.

This paper introduces a new model for measuring developer contribution, assuming that a more comprehensive image can be formed about a developer’s contribution by combining actions directed towards the product itself and the process that yields it. For implementing our contribution calculation algorithm we have combined our proposed model of calculation with repository mining techniques. We provide a visual representation of the results, thus offering rich information regarding the total contribution per developer and how it is divided among different actions during the development process. Our initial observations set the basis for discussing contribution to multiagent, distributed software projects based on this new kind of information.

2. Existing work

We use the term “contribution” to express the combination of all the actions a developer has performed during the development process. In today’s changing software development environment a developer’s work items have been enriched with the addition of further activities that benefit the whole project, and this reality needs to be reflected. Contribution, as a notion, encapsulates other notions that have been frequently used in the literature to express activity, participation, effort or performance. In these cases, we see that although the name changes, the same concept is being described and the same measurement is used.

Productivity is a reoccurring discussion in all processes that involve inputs and outputs. In economic terms, productivity is the ratio of output to input, the output of a process divided by the effort required to produce it. In [Walston and Felix, 1977], programmer productivity is defined as the ratio of the delivered source lines of code (DSL) to the total effort in man-months (MM) required to produce the delivered program. Input and output in software engineering processes are frequently addressed with output usually measured in LOC [Walston and Felix, 1977], [Asundi, 2005], [Maxwell and Forselius, 2000]. As the LOC metric cannot be determined safely before the end of the project, function point analysis usually complements it. Input, on the other hand, is not as a straightforward notion in software development and its calculation requires further explanation.

In a software project there are several assets that receive input [Hertel et al., 2003], [Koch and Schneider, 2000], leaving trails of the actions of participating developers. Participation and performance of developers, which can be calculated from their input, are frequently discussed in productivity contexts. Again, although it is noted that OSS developers provide many different kinds of services to their projects, participation is measured in terms of number of source code contributions, showing a complete focus on participation to the outcome, while performance is mainly expressed in terms of rank advancement [Roberts et al., 2006].

The shortcomings of just measuring LOC to account for a developer’s significance to a project have been discussed by researchers [Amor et al., 2006]. Aiming to estimate cost in the OSS context, Amor et al. propose that developer activity should be calculated. Although this is usually done by means of LOC, they stress that it is necessary to enhance this by a more detailed description of activity that accounts for actions other than simply writing code. This is a first attempt to move from focus on the outcome to examining the whole process. Cost is considered a function of effort, which in turn is considered a function of activity and suggested sources of information include CVS repositories, mailing list archives and bug tracking systems. In this regard, Amor et al. differentiate from previous literature that regards participation of developers simply as the addition of LOC [Koch and Schneider, 2002], [Mockus et al., 2002], [Mockus and German, 2003].

Today, with software development following more agile practices, developers in a project contribute to more

developers’ contribution that accounts not only for the LOC that they have produced but also their support via posting to mailing lists, submitting bug reports and building wikis.

3. Our approach

Our work is concerned with the measurement of developer involvement and activity in the face of incremental and distributed development practices. The model we are building exploits the availability of publicly accessible software repositories to perform measurements and its system implementation can run fully automatically with no human intervention. The current paper extends previous work [Gousios et al., 2008], both theoretically as well as technically. Specifically, we present an updated method of calculation and a more detailed table of actions. Also, we have applied our methods and measurements to generate results.

Our model departs from the classic measurement practices as it does not consider the added lines of source code as the only contribution metric. This is a deliberate choice that we believe better reflects how software is developed using modern development methodologies, in the context of which, an important portion of development time is spent on communication and manipulation of development support tools. Our model does not neglect the importance of source code either; we still use the lines of code which we have represented via three actions (CADD, CREM, CCGN), but we also combine them with the developers’ other actions on the project. We argue that this combination provides a more complete image of how much a developer has contributed to the software development process, not accounting only for writing code.

To identify which actions can be classified as contribution, we follow a hierarchical, top-down approach: we first identify the project assets that can potentially receive contribution and then analyze the actions that can be performed on each of the identified assets to see if they constitute a contribution or not. The actions have been initially identified intuitively and through personal experience and based on related literature [Hertel et al., 2003], [Koch and Schneider, 2000], [Amor et al., 2006]. After consulting with experts the table is updated and refined.

In Table 1, we present a non-exhaustive breakdown of actions that can be performed