Using Software Archaeology to Measure Knowledge Loss in Software Projects Due to Developer Turnover ∗
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009 Using Software Archaeology To Measure Knowledge Loss in Software Projects Due To Developer Turnover ∗ Daniel Izquierdo-Cortazar, Gregorio Robles, Felipe Ortega and Jesus M. Gonzalez-Barahona GSyC/LibreSoft Universidad Rey Juan Carlos (Madrid, Spain) dizquierdo, grex, jfelipe, jgb @gsyc.escet.urjc.es { } Abstract The lifespan of a software system can range from sev- eral years to decades. In such scenarios, the develop- Developer turnover can result in a major problem ment team in charge of the software may suffer from when developing software. When senior developers turnover: old developers leave while new developers abandon a software project, they leave a knowledge gap join the project. With the abandonment of old, senior that has to be managed. In addition, new (junior) devel- developers, projects lose human resources experienced opers require some time in order to achieve the desired both with the details of the software system and with the level of productivity. In this paper, we present a method- organizational and cultural circumstances of the project. ology to measure the effect of knowledge loss due to de- New developers will need some time to become famil- veloper turnover in software projects. For a given soft- iar with both issues. In this regard, the time for volun- ware project, we measure the quantity of code that has teers to become core contributors to FLOSS 1 projects been authored by developers that do not belong to the has been measured to be 30 months in mean, since their current development team, which we define as orphaned first contribution [11]. code. Besides, we study how orphaned code is managed Although maintaining the current development team by the project. Our methodology is based on the concept could be thought as a plausible solution to mitigate of software archaeology, a derivation of software evolu- this problem, turnover is usually unavoidable. Being tion. As case studies we have selected four FLOSS (free, a highly intellectual work, developers have a tendency libre, open source software) projects, from purely driven to lose the original motivation on the software system as by volunteers to company-supported. The application time passes by, and they have the natural desire to search of our methodology to these case studies will give in- for new objectives. In this sense, high turnovers have sight into the turnover that these projects suffer and how been observed in most large FLOSS projects, where sev- they have managed it and shows that this methodology eral generations of successive development teams have is worth being augmented in future research. been identified [21]. Although these environments are Keywords: developer turnover, orphaning, software partially, if not mostly, driven by volunteers, turnover in risk management, software archaeology industrial environments is also high. In this paper, we present a methodology to measure the effect of developer turnover in software projects. It 1. Introduction is based on quantifying the knowledge that developers contribute to a project based on the number of lines of code written for it. In case developers leave, their lines Software development is an activity intense in hu- become orphaned.Theamountoforphaned lines can man resources. The work of many developers is re- be considered as a measure of the knowledge that the quired to create almost any non-trivial piece of code. 1Through this paper we will use the term FLOSS to refer to “free, ∗This work has been funded in part by the European Commission, libre, open source software”, including code that conforms either to the under the FLOSSMETRICS (FP6-IST-5-033547), QUALOSS (FP6- definition of “free software” (according to the Free Software Founda- IST-5-033547) and QUALIPSO (FP6-IST-034763) projects, and by tion) or “open source software” (according to the Open Source Initia- the Spanish CICyT, project SobreSalto (TIN2007-66172). tive). 978-0-7695-3450-3/09 $25.00 © 2009 IEEE 1 Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009 project has lost with the abandonment of developers. chapters are devoted to the management of such issues. Hence, orphaned lines constitute a legacy of past devel- Developer turnover is one of the most important factors opers. As the amount of orphaned lines increases, the to be considered. Nonetheless, in the editors notice of a project may become more obscure to the current devel- special issue on software risk management [2], Boehm opment team, since major parts of the code were written and DeMarco state that “Indian software managers re- by developers who are no longer available. vealed that they perceived personnel turnover as their Our methodology can be useful to determine the biggest source of risk”. A questionnaire addressed by knowledge that the current development team has about Otte [18] shows that around a 12% of the total SLOCs the software system. This is specially interesting in in average is abandoned code. He demonstrated that the case of software maintenance, a non-trivial task projects with high levels of abandoned code tend to re- that accumulates over 80% of total activity in software port more bugs. projects [3]. In our vision, projects with a high share of Beyond pure industrial settings, recent research has orphaned lines have an inferior starting position, since focused on this issue in the case of FLOSS projects. they need to maintain code authored by others. Mockus et al. identified that in FLOSS projects a small, We have applied our methodology to several FLOSS but very active number of developers perform a large projects. Among them there are some led by companies, part of the software development; they labelled this following development models which can be considered group as the core group [16]. Robles et al. studied similar to those of non-FLOSS software development. the composition of the core group for several FLOSS Therefore results can (in part) be extended to traditional projects over time and found that only a minority of software development. However, the methodology is es- projects shown a stable core group for many years [21]. pecially interesting for FLOSS projects, since the de- Most projects showed a pattern where generations of de- scriptive information obtained can be of great interest velopers successively take the lead for a limited amount for decision-making developers. It should be noted that of time, generally in the range of two to four years. regarding in-house development, companies may have Michlmayr et al. introduced the measure of half-life for much more knowledge over projects they manage di- the human resources of a project [15]. They analyzed the rectly. Nonetheless, in the FLOSS world, where devel- population of developers participating in a project and opment is performed following distributed, highly dy- identified the point in time when only half of them are namic patterns, and involving volunteers, gaining insight still active. In the case of Debian, the half-life observed into a project is a major issue. was 7.5 years. However, other studies have pointed The structure of this paper is as follows. Next sec- out that packages maintained by Debian developers who tion presents related research. The third section intro- abandoned the project were later maintained by others, duces our methodology, based on extracting authorship showing a natural “regeneration” process [22]. information by mining the source control management system of software projects. We also highlight the new In general, developers who leave a project some- aspects that software archaeology introduces in research how carry with them their knowledge about it. If that on software maintenance and evolution. The fourth sec- knowledge is not shared with other developers, it can tion gives some insight into the four FLOSS projects that be considered as disappearing from the project. How to have been selected as case studies, while the fifth one characterize that knowledge is a common case of study. shows the results of applying our methodology on them. Hutchison [13] concludes that “The knowledge system Then, the methodology is discussed in detail, as well as is not the individual but the entire history of problem current limitations and challenges that further research solving teams in which individuals actively participate”. may target are also presented. Finally, conclusions are Ge et al., focusing on FLOSS projects, state that the ex- drawn in the last section. pertise is not centered in just one member [7]. They conclude that no single member is the owner of the whole knowledge, even for specific parts of the code. 2. Related research However it is a challenge to recover all this expertise (knowledge) when developers leave, because it is usu- Risk management during the software development ally not recorded anywhere. In the case of FLOSS and maintenance process has been a matter of research projects, it is recorded only in part, in mailing lists for a long time [1]. Some of the risks affecting soft- archives, source code management or bug tracking sys- ware development are related to human resources. In tems, where source code, new ideas or comments are DeMarco and Lister’s classic Peopleware [4], several registered. All this common knowledge can help the 2 Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009 “regeneration” of knowledge once a given developers time, as the software experiments further changes. leaves. The life of orphaned lines is different depending on German observed empirically that developers have what is happening in the project. For instance, if de- a tendency to work on specific parts of their software velopers are focused on adding new functionality, it is project, so that it is easy to determine the files in which likely that orphaned lines are only seldom touched.