Using Emergent Team Structure to Focus Collaboration
by Shawn Minto
B.Sc., The University of British Columbia, 2005

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science
The Faculty of Graduate Studies (Computer Science)
The University of British Columbia
January 30, 2007
© Shawn Minto 2007

Abstract

To build successful complex software systems, developers must collaborate with each other to solve issues. To facilitate this collaboration, specialized tools are being integrated into development environments. Although these tools facilitate collaboration, they do not foster it. The problem is that the tools require the developers to maintain a list of other developers with whom they may wish to communicate. In any given situation, it is the developer who must determine who within this list has expertise for the specific situation. Unless the team is small and static, maintaining the knowledge about who is expert in particular parts of the system is difficult. As many organizations are beginning to use agile development and distributed software practices, which result in teams with dynamic membership, maintaining this knowledge is impossible.

This thesis investigates whether emergent team structure can be used to support collaboration amongst software developers. The membership of an emergent team is determined from analysis of software artifacts. We first show that emergent teams exist within a particular open-source software project, the Eclipse integrated development environment. We then present a tool called Emergent Expertise Locator (EEL) that uses emergent team information to propose experts to a developer within their development environment as the developer works.
We validated this approach by applying it to historical data gathered from the Eclipse project, Firefox and Bugzilla, and comparing the results to an existing heuristic that recommends experts based on the revision history of individual files. We found that EEL produces, on average, results with higher precision and higher recall than the existing heuristic.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Scenario
  1.2 Validation Approach
  1.3 Thesis Structure
2 Related Work
3 Emergent Teams Exist
4 Approach and Implementation
  4.1 Approach
    4.1.1 Mechanics
  4.2 Implementation
    4.2.1 Extensibility
5 Validation
  5.1 Methodology
  5.2 Data
  5.3 Results
  5.4 Threats
    5.4.1 Construct Validity
    5.4.2 Internal Validity
    5.4.3 External Validity
6 Discussion
  6.1 Other Sources of Information
  6.2 Using Emergent Team Information
  6.3 Future Evaluation
  6.4 Limitations
7 Summary
Bibliography
A Complete Results
B Name Mapping Method
C CVS to Bugzilla Name Mappings
  C.1 Eclipse
  C.2 Mozilla

List of Tables

3.1 Number of projects each JDT developer has committed to in the last two years other than JDT projects
5.1 Bug and change set statistics per project
5.2 Optimistic average precision
5.3 Optimistic average recall
A.1 Mapping of test case number to unique combination of bug comment partition, size of change set and number of recommendations
B.1 E-mail to CVS username mapping statistics per project
C.1 Eclipse one-to-one username to e-mail mappings
C.2 Eclipse one-to-many, duplicate and unknown username to e-mail mappings
C.3 Mozilla one-to-one username to e-mail mappings
C.4 Mozilla one-to-one username to e-mail mappings continued
C.5 Mozilla one-to-one username to e-mail mappings continued
C.6 Mozilla one-to-one username to e-mail mappings continued
C.7 Mozilla one-to-one username to e-mail mappings continued
C.8 Mozilla one-to-many username to e-mail mappings
C.9 Mozilla one-to-many username to e-mail mappings continued
C.10 Mozilla one-to-many username to e-mail mappings continued
C.11 Mozilla Duplicate username mappings
C.12 Mozilla Duplicate username mappings continued
C.13 Mozilla unknown username mappings

List of Figures

3.1 JDT developer project activity for mkeller
3.2 JDT developer project activity for ffusier
3.3 Platform developer project activity for emoffatt
3.4 Platform developer project activity for teicher
4.1 Context menu list of developers showing the multiple methods of communication available
4.2 EEL in use within Jazz
4.3 Architecture of EEL
5.1 The 9 cases for validation. Lines in this diagram represent a combination of comment partition and change set subset for validation purposes
5.2 Eclipse optimistic precision
5.3 Eclipse optimistic recall
5.4 Bugzilla optimistic precision
5.5 Bugzilla optimistic recall
5.6 Firefox optimistic precision
5.7 Firefox optimistic recall
A.1 All Eclipse optimistic precision
A.2 All Eclipse pessimistic precision
A.3 All Eclipse optimistic recall
A.4 All Eclipse pessimistic recall
A.5 All Bugzilla optimistic precision
A.6 All Bugzilla pessimistic precision
A.7 All Bugzilla optimistic recall
A.8 All Bugzilla pessimistic recall
A.9 All Firefox optimistic precision
A.10 All Firefox pessimistic precision
A.11 All Firefox optimistic recall
A.12 All Firefox pessimistic recall

Acknowledgements

I would like to thank my supervisor Gail Murphy for introducing me to research during a co-op work term as an undergraduate. I would not have known about the interesting world of research if it were not for her.
Also, through working on Mylar with Mik, my research interests in collaboration and task-based development were made apparent. Furthermore, I would like to thank all of the members of the Software Practices Lab for engaging conversations about a range of research topics related to software engineering. Finally, I could not have finished this thesis without all of the love and support from Kenedee over the past two years.

Chapter 1
Introduction

Software developers must collaborate with each other at all stages of the software life-cycle to build successful complex software systems. To enable this collaboration, integrated development environments (IDEs) are including an increasing number of tools to support collaboration, such as chat support (e.g., ECF¹ and the Team Work Facilitation in IntelliJ²) and screen sharing (e.g., IBM Jazz³). These tools have two limitations that make them harder to use than necessary. First, the tools require the user to spend time and effort explicitly describing all members of a team with whom he may want to communicate over time (i.e., a buddy list). Given that the composition of software teams is increasingly dynamic for many organizations due to agile development processes, distributed software development and other similar trends, it may not be straightforward for a developer to keep up to date a description of colleagues on the many teams in which she may work.⁴ Second, the tools require the user to determine with whom he should collaborate in a particular situation. This requirement forces the user to have some knowledge of who has expertise on particular parts of the system.

To support collaboration amongst members of such dynamic teams, there is a need for a mechanism to determine the composition of the team automatically so that developers do not need to spend time configuring membership lists for the many teams to which they may belong.
We believe that, for many cases in which collaboration needs to occur, the context from which a developer initiates communication, combined with information about the activity of developers on items related to that context, can be used to determine the appropriate composition of the team. We consider that the team structure emerges from the activity, and thus refer to this problem as determining emergent team structure.

¹ ECF is the Eclipse Communications Framework, http://www.eclipse.org/ecf/, verified 12/17/06.
² IntelliJ is a Java development environment, http://www.jetbrains.com/idea/, verified 12/17/06.
³ Jazz is an IBM software development environment supporting team development and designed to incorporate all development artifacts and processes for a company. Some of the features included are source control, issue tracking and synchronous communication through chat.
⁴ As one example, the Eclipse development process uses dynamic teams as described by Gamma and Wiegand in an EclipseCon 2005 presentation, http://eclipsecon.org/2005/presentations/econ2005-eclipse-way.pdf, verified 12/17/06.

In this thesis, we describe an approach and tool, called Emergent Expertise Locator (EEL), that overcomes these limitations for developers working on code. The intuition is that a useful definition of a team, from the point of view of aiding collaboration, is the set of colleagues who can provide useful help in solving a particular problem. We approximate the nature of a problem by the file(s) on which a developer is working. Based on the history of how files have changed together in the past and who has participated in the changes, we can recommend members of an emergent team for the current problem of interest. Our approach uses the framework from Cataldo et al. [2], adapting their matrix-based computation to support on-line recommendations using different information, specifically files rather than task communication evidence.
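
To make the intuition concrete, the following is a minimal sketch of how co-change history and authorship could be combined to rank candidate experts for a set of files. It is an illustration under stated assumptions, not the EEL implementation: the function names (`build_history`, `recommend`), the input shape (a list of `(author, files)` change sets), the scoring rule (co-change count multiplied by authorship count) and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_history(change_sets):
    """Build two sparse count tables from (author, files) change sets.

    cochange[f][g] counts change sets containing both f and g
    (the diagonal counts how often f itself changed).
    authored[dev][f] counts change sets in which dev changed f.
    """
    cochange = defaultdict(lambda: defaultdict(int))
    authored = defaultdict(lambda: defaultdict(int))
    for author, files in change_sets:
        unique = sorted(set(files))
        for f in unique:
            authored[author][f] += 1
            cochange[f][f] += 1
        for f, g in combinations(unique, 2):
            cochange[f][g] += 1
            cochange[g][f] += 1
    return cochange, authored


def recommend(files_of_interest, cochange, authored, exclude=(), top_n=3):
    """Rank developers by their activity on files related to the given files.

    Assumed scoring rule: for each file of interest f and each file g a
    developer has changed, add cochange[f][g] * authored[dev][g].
    """
    scores = {}
    for dev, touched in authored.items():
        if dev in exclude:
            continue
        score = sum(
            cochange[f].get(g, 0) * count
            for f in files_of_interest
            if f in cochange
            for g, count in touched.items()
        )
        if score > 0:
            scores[dev] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical history: each entry is one change set.
history = [
    ("alice", ["Parser.java", "Lexer.java"]),
    ("bob", ["Parser.java", "Ast.java"]),
    ("alice", ["Lexer.java"]),
    ("carol", ["Ui.java"]),
]
cochange, authored = build_history(history)
print(recommend(["Parser.java"], cochange, authored))  # ['alice', 'bob']
```

In this sketch, carol is not recommended for `Parser.java` because nothing she changed has ever changed together with it, while alice outranks bob because of her repeated work on a file that changes alongside it. Computing scores only for the files currently of interest, rather than for every pair of developers, is one way to keep such a recommendation cheap enough to run as the developer works.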