Using Emergent Team Structure to Focus Collaboration

by

Shawn Minto

B.Sc, The University of British Columbia, 2005

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

The Faculty of Graduate Studies

(Computer Science)

The University Of British Columbia

January 30, 2007

© Shawn Minto 2007

Abstract

To build successful complex software systems, developers must collaborate with each other to solve issues. To facilitate this collaboration, specialized tools are being integrated into development environments. Although these tools facilitate collaboration, they do not foster it. The problem is that the tools require the developers to maintain a list of other developers with whom they may wish to communicate. In any given situation, it is the developer who must determine who within this list has expertise for the specific situation. Unless the team is small and static, maintaining the knowledge about who is expert in particular parts of the system is difficult. As many organizations are beginning to use agile development and distributed software practices, which result in teams with dynamic membership, maintaining this knowledge is impossible. This thesis investigates whether emergent team structure can be used to support collaboration amongst software developers. The membership of an emergent team is determined from analysis of software artifacts. We first show that emergent teams exist within a particular open-source software project, the Eclipse integrated development environment. We then present a tool called Emergent Expertise Locator (EEL) that uses emergent team information to propose experts to a developer within their development environment as the developer works. We validated this approach to supporting collaboration by applying it to historical data gathered from the Eclipse project and comparing the results to an existing heuristic for recommending experts that produces a list of experts based on the revision history of individual files. We found that EEL produces, on average, results with higher precision and higher recall than the existing heuristic.

Contents

Abstract

Contents

List of Tables

List of Figures

Acknowledgements

1 Introduction
  1.1 Scenario
  1.2 Validation Approach
  1.3 Thesis Structure

2 Related Work

3 Emergent Teams Exist

4 Approach and Implementation
  4.1 Approach
    4.1.1 Mechanics
  4.2 Implementation
    4.2.1 Extensibility

5 Validation
  5.1 Methodology
  5.2 Data
  5.3 Results
  5.4 Threats
    5.4.1 Construct Validity
    5.4.2 Internal Validity
    5.4.3 External Validity

6 Discussion
  6.1 Other Sources of Information
  6.2 Using Emergent Team Information
  6.3 Future Evaluation
  6.4 Limitations

7 Summary

Bibliography

A Complete Results

B Name Mapping Method

C CVS to Bugzilla Name Mappings
  C.1 Eclipse
  C.2

List of Tables

3.1 Number of projects each JDT developer has committed to in the last two years other than JDT projects

5.1 Bug and change set statistics per project
5.2 Optimistic average precision
5.3 Optimistic average recall

A.1 Mapping of test case number to unique combination of bug comment partition, size of change set and number of recommendations

B.1 E-mail to CVS username mapping statistics per project

C.1 Eclipse one-to-one username to e-mail mappings
C.2 Eclipse one-to-many, duplicate and unknown username to e-mail mappings
C.3 Mozilla one-to-one username to e-mail mappings
C.4 Mozilla one-to-one username to e-mail mappings continued
C.5 Mozilla one-to-one username to e-mail mappings continued
C.6 Mozilla one-to-one username to e-mail mappings continued
C.7 Mozilla one-to-one username to e-mail mappings continued
C.8 Mozilla one-to-many username to e-mail mappings
C.9 Mozilla one-to-many username to e-mail mappings continued
C.10 Mozilla one-to-many username to e-mail mappings continued
C.11 Mozilla duplicate username mappings
C.12 Mozilla duplicate username mappings continued
C.13 Mozilla unknown username mappings

List of Figures

3.1 JDT developer project activity for mkeller
3.2 JDT developer project activity for ffusier
3.3 Platform developer project activity for emoffatt
3.4 Platform developer project activity for teicher

4.1 Context menu list of developers showing the multiple methods of communication available
4.2 EEL in use within Jazz
4.3 Architecture of EEL

5.1 The 9 cases for validation. Lines in this diagram represent a combination of comment partition and change set subset for validation purposes
5.2 Eclipse optimistic precision
5.3 Eclipse optimistic recall
5.4 Bugzilla optimistic precision
5.5 Bugzilla optimistic recall
5.6 Firefox optimistic precision
5.7 Firefox optimistic recall

A.1 All Eclipse optimistic precision
A.2 All Eclipse pessimistic precision
A.3 All Eclipse optimistic recall
A.4 All Eclipse pessimistic recall
A.5 All Bugzilla optimistic precision
A.6 All Bugzilla pessimistic precision
A.7 All Bugzilla optimistic recall
A.8 All Bugzilla pessimistic recall
A.9 All Firefox optimistic precision
A.10 All Firefox pessimistic precision
A.11 All Firefox optimistic recall
A.12 All Firefox pessimistic recall

Acknowledgements

I would like to thank my supervisor Gail Murphy for introducing me to research during a co-op work term as an undergraduate. I would not have known about the interesting world of research if it were not for her. Also, through working on Mylar with Mik, my research interests in collaboration and task-based development were made apparent. Furthermore, I would like to thank all of the members of the Software Practices Lab for engaging conversations about a range of research topics related to software engineering. Finally, I could not have finished this thesis without all of the love and support from Kenedee over the past two years.

Chapter 1

Introduction

Software developers must collaborate with each other at all stages of the software life-cycle to build successful complex software systems. To enable this collaboration, integrated development environments (IDEs) are including an increasing number of tools to support collaboration, such as chat support (e.g., ECF¹ and the Team Work Facilitation in IntelliJ²) and screen sharing (e.g., IBM Jazz³). All of these tools have two limitations that make them harder to use than necessary. First, the tools require the user to spend time and effort describing to the tool all members of a team with whom he may want to communicate over time (i.e., a buddy list). Given that the composition of software teams is increasingly dynamic for many organizations due to agile development processes, distributed software development and other similar trends, it may not be straightforward for a developer to keep up to date a description of colleagues on the many teams in which she may work.⁴ Second, the tools require the user to determine with whom he should collaborate in a particular situation. This requirement forces the user to have some knowledge of who has expertise on particular parts of the system.

To support collaboration amongst members of such dynamic teams, there is a need for a mechanism to determine the composition of the team automatically so that developers do not need to spend time configuring membership lists for the many teams to which they may belong. We believe that, for many cases in which collaboration needs to occur, the context from which a developer initiates communication, combined with information about the activity of developers on items related to that context, can be used to determine the appropriate composition of the team. We consider that the team structure emerges from the activity, and thus refer to this problem as determining emergent team structure.

In this thesis, we describe an approach and tool, called Emergent Expertise Locator (EEL), that overcomes these limitations for developers working on code.

¹ ECF is the Eclipse Communications Framework, http://www.eclipse.org/ecf/, verified 12/17/06.
² IntelliJ is a Java development environment, http://www.jetbrains.com/idea/, verified 12/17/06.
³ Jazz is an IBM software development environment supporting team development and designed to incorporate all development artifacts and processes for a company. Some of the features included are source control, issue tracking and synchronous communication through chat.
⁴ As one example, the Eclipse development process uses dynamic teams as described by Gamma and Wiegand in an EclipseCon 2005 presentation, http://eclipsecon.org/2005/presentations/econ2005-eclipse-way.pdf, verified 12/17/06.

The intuition is that a useful definition of a team, from the point of view of aiding collaboration, is the set of colleagues who can provide useful help in solving a particular problem. We approximate the nature of a problem by the file(s) on which a developer is working. Based on the history of how files have changed together in the past and who has participated in the changes, we can recommend members of an emergent team for the current problem of interest. Our approach uses the framework from Cataldo et al. [2], adapting their matrix-based computation to support on-line recommendations using different information, specifically files rather than task communication evidence. EEL produces a collaboration matrix $C$ through the computation $C = (F_A F_D) F_A^T$, where $F_A$ is a file authorship matrix and $F_D$ is a file dependency matrix. Further information on the matrices and the computation is presented in Section 4.1.1. After this computation, a value $C_{ij}$ describes the extent to which developers $i$ and $j$ should interact, based on the revisions that they have committed to the repository in the past. We use the values in $C$ to recommend a ranked list of the likely emergent team members with whom to communicate given a set of files currently of interest.

1.1 Scenario

To describe why and how EEL can help developers as they work, we describe a scenario of a common development task that may require communication between two developers to gather knowledge required to solve an issue. This scenario provides insight into the simplicity and usefulness of EEL.

Selena, Dave and John are developers on the same open-source project. Selena, Dave and John each live in a different part of the world. This project has been released and the developers are currently working to fix bugs that have been reported by users. Even though these three developers all work on the same project, each is more knowledgeable about a different area of the code than the others because he or she has worked on that part of the code base more frequently. Many of the bugs that have been reported refer to the core data model for the system, a part of the system about which Selena is the most knowledgeable. Dave, on the other hand, was in charge of the external representation of the model for the release. Since Dave is somewhat knowledgeable about the model, and there have been no bugs reported related to his portion of the system, he decides to help Selena and fix some bugs that are more appropriate for her.

Dave picks a bug. To start working on it, he investigates a stack trace provided in the bug that is related to the error. Although he can reproduce the problem, he is unable to determine the source of the problem because it requires extensive knowledge of how events are sent and handled within the model. Since he is unaware of how this works, he right-clicks on the file on which he is currently working and views a list of people to contact on the team associated with that part of the software as determined by EEL. This ranked list shows Selena listed as the most knowledgeable, followed by John. Since Dave knows that Selena is busy, he does not want to contact her. However, EEL has made it evident that John is also knowledgeable, a fact of which Dave was unaware. John is listed because, although he worked only part time on the model, he is very knowledgeable about it. Dave decides to contact John through chat and is able to gain the knowledge that he needs to fix the bug. If John had not been online, Dave could have used an asynchronous communication method like e-mail to contact him.

EEL is not only useful during the exploration of a system to fix a bug; it can also be useful during regular development and testing, and during mentoring when junior developers are becoming familiar with a system. EEL is integrated into an IDE such that at any point while working on a system, a developer can open the context menu on a file, determine whom they may contact to gather more information about a task on which they are currently working and easily initiate communication with the appropriate colleague.

1.2 Validation Approach

To determine the accuracy of EEL in predicting emergent teams, we applied the approach to historical data for the Eclipse project, Firefox and Bugzilla. The validation of EEL was fully automated and did not involve users, since it is difficult to recruit developers if there is no evidence of the usefulness of the tool. To perform the validation, we needed two pieces of information: a bug report and the list of files needed to fix the bug report. Bug reports provide a record of communication on a particular issue, and we use the commenters on the bug as a list of potential experts. We populate EEL using the list of files needed to fix the bug, as determined using a standard means of associating bugs and file revisions (see Section 5.2). We then used the recommendations from EEL and the list of potential experts from the bug report to calculate precision and recall. Finally, we compared the performance of EEL to an existing heuristic for recommending experts that produces a list of experts based on the revision history of individual files. We found that EEL produces, on average, results with higher precision and higher recall than the existing heuristic.
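As an illustration of the comparison described above, the following minimal sketch computes precision and recall for one bug report. It assumes the standard information-retrieval definitions; the optimistic and pessimistic variants used later in the thesis are not reproduced here. The function name `precision_recall` and the developer names are hypothetical.

```python
def precision_recall(recommended, potential_experts):
    """Return (precision, recall) of a recommendation list for one bug.

    recommended:       developers proposed by the recommender
    potential_experts: commenters on the bug report (assumed ground truth)
    """
    recommended_set = set(recommended)
    expert_set = set(potential_experts)
    hits = recommended_set & expert_set
    # Guard against empty lists so the ratios are always defined.
    precision = len(hits) / len(recommended_set) if recommended_set else 0.0
    recall = len(hits) / len(expert_set) if expert_set else 0.0
    return precision, recall


# Hypothetical data: three recommendations, four commenters on the bug.
recommended = ["selena", "john", "dave"]
commenters = ["selena", "john", "maria", "li"]
precision, recall = precision_recall(recommended, commenters)
print(precision, recall)
```

Here two of the three recommended developers commented on the bug, and two of the four commenters were recommended.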

1.3 Thesis Structure

We begin by comparing our approach with existing work on locating experts (Chapter 2). Next, we show that emergent teams exist (Chapter 3). We then describe our approach and implementation (Chapter 4) before presenting our validation of the approach (Chapter 5). Before summarizing, we discuss outstanding issues with our approach and validation (Chapter 6).

Chapter 2

Related Work

Three types of approaches have been used to recommend experts for a software development project: heuristic-based (e.g., [15]), social network-based (e.g., [16]) and machine learning-based (e.g., [1]).

Heuristic-based recommenders apply heuristics against data collected from and about the development to determine who is expert in various areas of the system. Some approaches require users to maintain profiles that describe their area of expertise (e.g., Hewlett-Packard's CONNEX¹) or organizational position (e.g., [13]). CONNEX is a traditional expertise finder which requires users to maintain a profile of their expertise. CONNEX then allows users to search or browse the directory of profiles looking for a person with the expertise in which they are interested. Expertise Recommender (ER) uses an organizational chart of departments within a company to determine if an expert should be recommended based on the "distance" the departments are from each other [13]. This allows ER to limit the recommendations to people who are in departments that are "connected" to the department of the developer requesting the recommendation. These profiles can be effective because they gather information from the source of the expertise. Unfortunately, it is difficult to keep such profiles up-to-date. During a field study of expertise location, it was found that a seven-year-old profile-based system was available but the profiles had never been updated [12]. To avoid this problem, EEL does not use any profile-based information.

¹ http://www.carrozza.com/atwork/connex/about.html, verified 01/08/07.

Other heuristic-based expertise recommenders are based solely on data extracted from the archives of the software development. The Expertise Browser (ExB), for example, uses experience atoms (EA), basic units of experience, as the basis for recommending experts [15]. Experience atoms are created by mining the version control system for the author of each file revision and the changes made to the file. A mined experience atom is then associated with multiple domains (e.g., the file containing a modification, the technology used, the purpose of the change and/or the release of the software). A simple counting of experience atoms for each domain in question is then used to determine the experience in that area. Similar to our approach, ExB equates experience to expertise. In contrast, our approach accounts for how files are modified together (see Section 4.1), which we believe contains rich information about the expertise of the developer who made the change.

As another example, the Expertise Recommender (ER) by McDonald [13] was deployed using two heuristics: tech support and change history. The change history heuristic, which is related to our work, uses the "Line 10" rule that states that the revision authors are the experts for a file. These experts are ranked according to revision time so that the last developer to modify the file has the highest rank [13]. If multiple modules are selected as the target for an expertise request in ER, an intersection of the experts is performed, raising the possibility of ER producing an empty set of experts. In contrast, EEL uses the frequency of file modifications that occur together and can always produce a recommendation.

Both ER and ExB require the user to switch from the application in which they are currently working to a special one designed for supplying the expertise recommendations. Yimam-Seid et al. wrote that "it is beneficial if expert finding systems are embedded in the day-to-day problem solving and information search environments" since expertise finding is a daily occurrence [18, p. 13]. EEL takes this approach by providing the expertise recommendations from within the development environment. This allows developers to work as normal, and if a problem arises, they can request the expertise list without having to switch applications. Furthermore, ER requires the user to enter potentially complicated queries to the system [13] to get recommendations and ExB makes the developer select the module on which they are currently working [15]. Since activity within the IDE provides the context for a developer's current work, EEL determines the files in which the user is interested by monitoring their work. EEL is therefore able to determine the experts in the area that is currently under investigation automatically, without a user-entered query.

A social network describes relationships between developers built using data mined from the system development (e.g., [9]). These networks often become large. As a result, many tools support queries to prune the network to show the most relevant portion; for instance, NetExpert enables the production of a view with experts in a particular area [16]. NetExpert provides support for searching for an expert, browsing the social and knowledge networks, and initiating communication [16]. NetExpert requires the user to initially create a profile, which is then maintained automatically by using information in documents that the user submits to the system as well as their personal web pages [16]. This social network approach adds complexity for the user since they must be able to interpret and search the network to extract the information that they want. In contrast, the query needed to determine the experts in EEL is formed behind the scenes automatically based on the artifacts and tasks on which the developer is working.

Social networks were also used in the Expertise Recommender (ER) to tailor the expertise results for each user [11]. For ER, the social networks were created by hand through information gained directly from the users. The networks created were then used to change the expertise recommendations based on the relationships between users. This means that each user might get different recommendations based on the people with whom they would rather communicate. Explicit social network information is an excellent way to tailor the recommendations but was not used in EEL since it would be a per-project customization that would require extensive analysis of the teams to build. Furthermore, the method used by ER to get the social network has the same problems as profiles since it requires the networks to be updated as both the project and teams evolve.

Machine learning-based approaches in the area of expertise recommendation have focused on using text categorization techniques to characterize bugs [1] and documents [17]. Anvik et al. describe a system to recommend developers who should fix a bug based on the history of bug fixes for the system and the description of the newly reported bug [1]. A more generalized machine learning-based expertise locator is ExpertiseNet as described by Song et al. [17]. ExpertiseNet examines files, specifically papers, to dynamically update a user's expertise profile [17]. Similar to machine learning-based expertise recommenders, EEL relies on past information to form recommendations. In contrast to these approaches, EEL uses a simple frequency-based weighting to form recommendations and does not produce any general model of the activity between developers.

To investigate the coordination requirements of a project, Cataldo et al. introduced an elegant matrix-based solution to finding and investigating these requirements [2]. In their approach, the product of a task dependency matrix and a task assignment matrix is multiplied by the transpose of the task assignment matrix to produce a description of the extent to which people involved in a development share tasks. In comparison, we use the basic matrix framework to consider how work performed on files, irrespective of tasks, can be used to predict expertise as a developer works, as opposed to analyzing post-development whether the communication matches the coordination requirements.

Previous expertise recommenders were validated using human subjects; few have undergone a systematic user evaluation [10]. Expertise Browser (ExB) was deployed in two companies and the type and number of interactions of users with ExB was recorded [15]. Use of the tool was used to infer whether the tool worked well. No information was collected on the accuracy of the ExB recommendations. Expertise Recommender (ER) was validated using a systematic user evaluation [10]. This study had users rank a list of potential experts and then the results were compared to the ranking provided by ER [10]. The validation of EEL presented in this thesis focuses on an automated validation strategy, as initial accuracy numbers are needed prior to inserting the technology in a development environment.
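For reference, the "Line 10" change-history heuristic discussed earlier in this chapter can be sketched as follows. This is only an illustration of the rule as summarized here, not ER's actual implementation: the function names, the data layout and the way the intersected list is ordered (by the ranking for the first file) are assumptions.

```python
def line10_experts(revisions):
    """Rank the authors of one file, most recent modifier first.

    revisions: list of (timestamp, author) pairs for a single file.
    """
    latest = {}
    for timestamp, author in revisions:
        # Remember only each author's most recent revision time.
        if author not in latest or timestamp > latest[author]:
            latest[author] = timestamp
    return sorted(latest, key=lambda author: latest[author], reverse=True)


def experts_for_modules(histories):
    """Intersect the per-file expert lists; the result may be empty."""
    ranked = [line10_experts(history) for history in histories]
    if not ranked:
        return []
    common = set(ranked[0]).intersection(*map(set, ranked[1:]))
    # Assumption: keep the ordering of the first file's ranking.
    return [author for author in ranked[0] if author in common]


# Hypothetical revision histories for two files.
model_history = [(1, "selena"), (2, "john"), (3, "selena")]
view_history = [(1, "dave"), (4, "dave")]

print(line10_experts(model_history))
print(experts_for_modules([model_history, view_history]))
```

In this hypothetical data the two files share no authors, so the intersection is empty, which is the situation the chapter notes can arise when several modules are selected.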

Chapter 3

Emergent Teams Exist

Two recent trends in software development are the use of more agile software development processes [6] and global (distributed) software development [4]. These trends have arisen for many reasons, including the need to ensure appropriate expertise for the development at appropriate times. With these development techniques, formal teams are replaced by dynamic ones that are created during development and that are constantly changing. The team structure emerges from the activity, and thus we refer to this as the emergent team structure.

Lewin and Regine define an emergent team as "a dynamic way of working together that keeps organisations on the edge" [8]. In an EclipseCon 2005 presentation entitled "the eclipse way: processes that adapt", Erich Gamma and John Wiegand described the use of dynamic teams during the development of Eclipse; these teams are established to solve cross-component issues and consist of developers from all components affected by the issue.

We believe that emergent teams in software development are not only created explicitly to solve particular issues but, more commonly, form implicitly through the action of working in the same area of the system. If a new developer begins work on an area of a software system, they are joining the emergent team consisting of the other developers who have previously worked in that area. The previous developers have the expertise needed to work on the system, which the new developer must gather by interacting with the team. Even if formal teams are defined, developers naturally create, join and leave emergent teams on a daily basis based on the area of the system in which they are currently working.

To show that emergent teams exist and to understand more about their creation and composition, we investigated whether such teams exist on the Eclipse project. We found that, on average, each committer on the Eclipse Java development tools (JDT) component team committed to eight different Eclipse Java projects¹ other than JDT projects² within the past year. Table 3.1 shows the number of projects, other than JDT projects, that each of the Eclipse JDT developers has committed code to within the last year. As this table shows, some developers are stationary (just working on the code within their designated team) while others are much more active across the rest of Eclipse. Furthermore, research on developer social networks by Madey et al. saw a similar trend on SourceForge³. They noted that the "busiest" developer worked on between 17 and 27 projects during the 14-month period that they monitored [9].

¹ A Java project is a module that is at the top level of the Eclipse CVS tree.
² Java projects beginning with org.eclipse.jdt.
³ A project hosting site, http://sourceforge.net, verified 12/17/06.

A more fine-grained view of the dynamic nature of teams on the Eclipse project is to look at the amount of work developers perform on each project within a given time frame. Figures 3.1, 3.2, 3.3 and 3.4 show the activity of four different developers, two from the Eclipse JDT team and two from the Eclipse Platform team. The graphs consider a six-month period divided into two-week spans. The bar for each two-week period represents the total number of commits that the developer made to all of the Eclipse Java projects. Each of the lines represents the number of commits made to a separate project. These graphs show how a developer's activity on a specific project changes over time and how their area of focus in the system changes as well. Because these developers are not the only contributors to the different parts, this change in activity shows that they participate in many different teams. This provides evidence that even though a developer belongs to a formal team, their emergent team changes as they work, showing how teams emerge.
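The per-period activity counts plotted in such graphs could be derived along the following lines. This is a minimal sketch under assumed inputs, not the procedure used to produce the figures: the function name `commits_per_period`, the (date, project) tuple layout and the sample commits are hypothetical.

```python
from collections import Counter
from datetime import date


def commits_per_period(commits, start, period_days=14):
    """Group commits into fixed-length periods counted from `start`.

    commits: iterable of (commit_date, project_name) pairs.
    Returns {period_index: Counter mapping project -> commit count}.
    Commits dated before `start` receive negative period indices.
    """
    periods = {}
    for day, project in commits:
        index = (day - start).days // period_days
        periods.setdefault(index, Counter())[project] += 1
    return periods


# Hypothetical commits by one developer.
start = date(2006, 1, 2)
commits = [
    (date(2006, 1, 3), "org.eclipse.jdt.ui"),
    (date(2006, 1, 10), "org.eclipse.jdt.core"),
    (date(2006, 1, 20), "org.eclipse.jdt.ui"),
    (date(2006, 1, 24), "org.eclipse.ui.workbench"),
]
periods = commits_per_period(commits, start)
# Bar height per period: total commits across all projects.
totals = {index: sum(counts.values()) for index, counts in periods.items()}
print(totals)
```

The per-project counters correspond to the individual lines in the graphs, and the totals correspond to the bars.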

Table 3.1: Number of projects each JDT developer has committed to in the last two years other than JDT projects.

Username     Number of Projects
sdimitro     52
dmegert      29
dejan        26
kmoir        20
darins       12
mfaraj       12
darin        11
teicher      11
krbarnes     9
dbaeumer     9
twidmer      8
mkeller      7
maeschli     5
aweinand     5
mrennie      3
bbaumgart    2
jeromel      1
oliviert     1
ffusier      1
pmulet       1
wharley      1
jgarms       1
lbourlier    0
tyeung       0
daudel       0
thanson      0
mdaniel      0
kent         0
Average      8

Figure 3.1: JDT developer project activity for mkeller. (Axes: 2 Week Period versus Number of Commits.)

Figure 3.2: JDT developer project activity for ffusier.

Figure 3.3: Platform developer project activity for emoffatt.

Figure 3.4: Platform developer project activity for teicher.

Chapter 4

Approach and Implementation

4.1 Approach

The goal of the Emergent Expertise Locator (EEL) is to make it easier for a developer to determine with whom to communicate during a programming task. EEL displays a ranked list of other developers with expertise on the set of files that the user of EEL has recently edited or selected (their current change set). To use EEL, a developer accesses a menu on a source file that displays a ranked list of developers along with ways to initiate a communication, as in Figure 4.1. These communication methods may be synchronous (e.g., chat) or asynchronous (e.g., e-mail). This approach aims to minimize the impact of the communication on a developer's work flow and to provide assistance in context; for example, a developer need not switch to an external application to perform the communication, and context about the developer's current state may be automatically transmitted to the expert with whom communication is begun.

4.1.1 Mechanics

Our approach is based on the mechanism of using matrices to compute coordination requirements introduced by Cataldo et al. [2]. Our approach requires two matrices, the file dependency matrix and the file authorship matrix, and produces a third, the expertise matrix.

1. File Dependency Matrix. A cell ij (or ji) in this matrix represents the number of times that file i and file j have been modified together¹. Since this produces a symmetric matrix, EEL only records data in the upper triangle to save space. This matrix is populated by querying a version control system, for each version of a file, for the files that changed with it.

2. File Authorship Matrix. A cell ij in this matrix represents the number of times that developer i has modified file j.

¹ The time duration can be set within EEL. By default, the entire project history is used.

Figure 4.1: Context menu list of developers showing the multiple methods of communication available.

3. Expertise Matrix. This matrix represents the current experts based on the file dependency matrix and the file authorship matrix. A cell ij (or ji) in this matrix specifies the amount of expertise that developer i has relative to developer j. We consider that the higher the number in cell ij, the more of an expert developer j is for developer i. This matrix is computed using the equation:

$$C = (F_A F_D) F_A^T \qquad (4.1)$$

where $C$ is the expertise matrix, $F_A$ is the file authorship matrix and $F_D$ is the file dependency matrix.
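The three matrices described above can be illustrated with a small sketch. It is not EEL's implementation: the commit data, developer names and function names are hypothetical, the dependency matrix is stored in full rather than as an upper triangle, and its diagonal is left at zero, which is an assumption since the text does not say how a file's co-change count with itself is treated.

```python
from itertools import combinations


def build_matrices(commits, developers, files):
    """Build the file authorship and file dependency matrices.

    commits: list of (author, [files changed together]) pairs.
    """
    dev_index = {d: i for i, d in enumerate(developers)}
    file_index = {f: i for i, f in enumerate(files)}
    authorship = [[0] * len(files) for _ in developers]
    dependency = [[0] * len(files) for _ in files]
    for author, changed in commits:
        for name in changed:
            authorship[dev_index[author]][file_index[name]] += 1
        # Count every pair of files modified in the same commit.
        for a, b in combinations(sorted(set(changed)), 2):
            i, j = file_index[a], file_index[b]
            dependency[i][j] += 1
            dependency[j][i] += 1  # full symmetric storage, for simplicity
    return authorship, dependency


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def expertise_matrix(authorship, dependency):
    """C = (F_A F_D) F_A^T, as in Equation 4.1."""
    return matmul(matmul(authorship, dependency), transpose(authorship))


# Hypothetical history loosely following the scenario in Section 1.1.
developers = ["dave", "selena", "john"]
files = ["Model.java", "Events.java", "View.java"]
commits = [
    ("selena", ["Model.java", "Events.java"]),
    ("selena", ["Model.java", "Events.java"]),
    ("john", ["Events.java", "Model.java"]),
    ("dave", ["View.java", "Model.java"]),
]
F_A, F_D = build_matrices(commits, developers, files)
C = expertise_matrix(F_A, F_D)
print(C[developers.index("dave")])
```

For this data, the row for dave gives selena a higher value than john, so selena would be ranked first.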

The tool that we have built on this basic approach uses a developer's current change set (the files the developer has recently selected or edited) to suggest an ordered list of developers with whom to communicate. Figure 4.2 shows this tool working within the Jazz Eclipse client. To provide this support, EEL mines information as a developer works. When a developer selects or edits a file, it triggers EEL to access the version control system and mine the related files and authors to populate the matrices. Once the user right-clicks on a file and attempts to collaborate with another developer, EEL calculates the expertise matrix on the fly to ensure up-to-date information. The calculation of the expertise matrix can be time intensive. To mitigate this problem, since we are interested in experts only for the current developer, we modify the expertise matrix calculation to be

$$v = (R_{F_A} F_D) F_A^T \qquad (4.2)$$

where $v$ is a vector that represents the experts related to just the current developer, $R_{F_A}$ is the row that corresponds to the current developer in the file authorship matrix, $F_D$ is the file dependency matrix and $F_A$ is the file authorship matrix. By using only the row that corresponds to the current developer, the matrix multiplications are reduced to simple vector calculations. Even though we are only interested in the experts relative to the developer performing the query, the entire file dependency and file authorship matrices must be populated since they are required for the expertise matrix calculation.
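A minimal sketch of the per-developer calculation in Equation 4.2 follows. The matrices are small hypothetical examples, and the choices to drop the current developer and any zero-valued entries from the ranked list are assumptions made for illustration rather than documented behaviour of EEL.

```python
def vec_mat(vector, matrix):
    """Multiply a row vector by a matrix."""
    return [sum(x * y for x, y in zip(vector, col)) for col in zip(*matrix)]


def rank_experts(current, developers, authorship, dependency):
    """Compute v = (R_FA F_D) F_A^T and return experts ranked by value."""
    row = authorship[developers.index(current)]   # R_FA
    weighted = vec_mat(row, dependency)           # R_FA F_D
    # Multiplying by F_A^T is a dot product with each developer's row.
    scores = [sum(x * y for x, y in zip(weighted, other))
              for other in authorship]
    ranked = [(dev, score) for dev, score in zip(developers, scores)
              if dev != current and score > 0]    # assumption: filter these
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical matrices: rows of F_A are developers, columns are files.
developers = ["dave", "selena", "john"]
F_A = [[1, 0, 1],
       [2, 2, 0],
       [1, 1, 0]]
F_D = [[0, 3, 1],
       [3, 0, 0],
       [1, 0, 0]]
experts = rank_experts("dave", developers, F_A, F_D)
print(experts)
```

Only one vector-matrix product and one dot product per developer are needed, instead of two full matrix products.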

Figure 4.2: EEL in use within Jazz.

IBM Jazz EEL Plug-in

IBM Jazz SDK EEL Core

Eclipse SDK MTJ Library

IBM Jazz Server

Figure 4.3: Architecture of EEL.

4.2 Implementation

EEL is implemented as a Java plug-in for Eclipse since Jazz has an Eclipse client. The core plug-in contains the algorithms and data structures required to determine the potential experts based on the repository data. This core plug-in also handles scheduling the queries to the repository in the background so that they are transparent to the developer. Furthermore, it provides an extension point so that additional repository support can be added easily. The second plug-in is the repository plug-in. This plug-in is related directly to the type of repository that EEL queries for author and related file information. Figure 4.3 shows the architecture of EEL with respect to Eclipse and Jazz. We have developed a repository plug-in for both Jazz and Subversion repositories.

EEL uses the application programming interface (API) provided by Jazz to gather the information from the version control system (i.e., change sets and authors) that is needed to build the file authorship and dependency matrices. Since EEL is developed as a client-side plug-in, no changes had to be made to the Jazz server. Being client-side also means that EEL is personalized to each user and the files on which they have worked. EEL could have been implemented as a server, but we chose the client-based approach since we wanted to ensure that each developer could personalize the tool to suit their needs. Furthermore, implementing EEL as a server would require additional infrastructure that we did not want to impose on teams.

To support matrix computations in EEL, we used the open-source matrix package Matrix Toolkits for Java (MTJ).² This package provides the ability to create the required matrices as well as perform the necessary calculations to obtain the expertise matrix. To ensure that the final matrix contains enough relevant information to determine the appropriate experts in the area of interest, EEL uses matrices that are 1000 elements square. This choice means that we can track up to 1000 files, enabling a substantial portion of a developer's work to be used for the recommendation of experts. Even though the matrices are fairly large, they can fill up quickly due to the number of related files per revision of a file. To mitigate this problem, EEL uses a least recently used approach to determine which entries to remove from the matrix once it becomes full, allowing the files that are either related or viewed more often to remain in the matrix longer. To ensure that the files that a developer has worked on (the current change set) remain in the matrix, they are removed only if they occupy 50% of the matrix. The current change set, the files which the developer has selected or edited, is treated differently since the set contains information that directly pertains to the developer's current work.

Since the files in the change sets provide the basis for the determination of expertise within EEL, it is necessary that they provide accurate information. Ying and colleagues noted that, while mining software repositories, change sets containing over 100 files are often not meaningful since they usually correspond to automated modifications, such as formatting the code or changing the licensing [19]. After inspecting the change logs for several projects,³ we noted that this is true of most change sets with over 50 files. With this knowledge, we prevent EEL from mining information from change sets that contain more than 50 files. This choice ensures that irrelevant related file data does not pollute the file authorship and dependency matrices.

The time required for EEL to produce a recommendation is dependent on two main factors. The first factor is the speed of the repository from which EEL accesses the author and related file information. This speed is affected by many factors such as network speed, repository size and server load. Generally, these systems can provide the information that is required by EEL quickly; therefore, it is not a major factor in the usability of EEL. The second factor is the speed of the calculation of the expertise matrix. Since the expertise matrix is computed when the user opens the menu, it is the main factor in producing a recommendation quickly. On a 2.13 GHz Core 2 Duo system with 2 GB of memory, the calculation of the expertise matrix takes 891 ms using the vector calculation approach.

² MTJ, http://rs.cipr.uib.no/mtj/, verified 12/17/06.
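The bookkeeping described in this section (a fixed number of tracked files, least recently used eviction, special treatment of the current change set and the 50-file cut-off) can be sketched as follows. This is an illustrative Python sketch rather than EEL's Java code: the class, method and parameter names are hypothetical, and it encodes one reading of the 50% rule, namely that change set files become evictable only once they occupy at least half of the available slots.

```python
from collections import OrderedDict


def usable_change_set(files, max_files=50):
    """Skip very large change sets, which tend to be automated edits."""
    return len(files) <= max_files


class FileSlots:
    """Assigns matrix indices to files, evicting the least recently used file when full."""

    def __init__(self, capacity=1000, protected_fraction=0.5):
        self.capacity = capacity
        self.protected_fraction = protected_fraction
        self.slots = OrderedDict()                     # path -> index, least recently used first
        self.free = list(range(capacity - 1, -1, -1))  # pop() hands out 0, 1, 2, ...
        self.change_set = set()                        # files the developer selected or edited

    def touch(self, path, in_change_set=False):
        """Record a use of path and return its matrix index."""
        if in_change_set:
            self.change_set.add(path)
        if path in self.slots:
            self.slots.move_to_end(path)
            return self.slots[path]
        if not self.free:
            self._evict()
        self.slots[path] = self.free.pop()
        return self.slots[path]

    def _evict(self):
        """Free the slot of the oldest file that is currently allowed to leave."""
        tracked = sum(1 for p in self.slots if p in self.change_set)
        protect = tracked < self.protected_fraction * self.capacity
        victim = next(p for p in self.slots
                      if not (protect and p in self.change_set))
        self.free.append(self.slots.pop(victim))
        self.change_set.discard(victim)
```

A caller would invoke `touch` for every mined file and clear the freed row and column of each matrix before reusing an index; that step is omitted here.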

4.2.1 Extensibility

EEL was designed to be extensible since many different repositories and communication methods exist. Currently, EEL supports only the addition of a repository that is tied to a single communication tool (e.g., Jabber⁴ or IBM Lotus Sametime⁵). This is an issue since two teams using the same repository may use different systems for communication. Each repository needs the communication mechanism handled separately since many systems, unlike Jazz, do not have collaboration support built into them and external tools need to be used.

³ Most notably Eclipse and Gnome Evolution.
⁴ Jabber is an open-source instant messaging system, http://www.jabber.org/, verified 01/08/07.
⁵ Sametime is an enterprise instant messaging and web conference system, http://www-142.ibm.com/software/sw-lotus/sametime, verified 01/08/07.

If EEL is to be employed, it would be beneficial to provide a pluggable communication framework since not all projects use the same collaboration techniques, even if they use the same version control system. To add support to EEL for a new repository, a single class needs to be created. This class must implement methods for retrieving author and related file information from the repository given a file that the developer is currently interested in. These methods are called from the core plug-in of EEL when the developer selects or edits a file.
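The shape of such a repository class can be sketched as follows. EEL's actual extension point is a Java interface that is not reproduced in this section, so everything here is hypothetical: the class names, the method names and the return types are assumptions chosen for illustration, and Python is used only for brevity.

```python
from abc import ABC, abstractmethod
from collections import Counter


class RepositoryProvider(ABC):
    """Hypothetical extension interface: one subclass per repository type."""

    @abstractmethod
    def authors_of(self, path):
        """Return {author: number of change sets by that author that touched path}."""

    @abstractmethod
    def related_files(self, path):
        """Return {other_path: number of change sets that changed both files}."""


class InMemoryRepository(RepositoryProvider):
    """Toy provider backed by a list of (author, [paths]) change sets."""

    def __init__(self, change_sets):
        self.change_sets = change_sets

    def authors_of(self, path):
        return dict(Counter(author for author, files in self.change_sets
                            if path in files))

    def related_files(self, path):
        counts = Counter()
        for _, files in self.change_sets:
            if path in files:
                counts.update(f for f in files if f != path)
        return dict(counts)


REPO = InMemoryRepository([
    ("ann", ["A.java", "B.java"]),
    ("bob", ["A.java"]),
    ("ann", ["A.java", "C.java", "B.java"]),
])
```

In this sketch the core would call `authors_of` and `related_files` whenever the developer selects or edits a file, and use the results to update the file authorship and file dependency matrices.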

Chapter 5

Validation

Ideally, we would validate EEL by gathering statistics about the accuracy of EEL's recommendations as developers use the tool as a part of their daily work. Such an evaluation requires a moderately-sized, preferably distributed, development team. Engaging such a team in an evaluation is difficult without any proven information about the effectiveness of the technique. To provide initial evaluation information, we have thus chosen to apply the approach to the history of existing open-source systems. We use information about the revisions to files stored in the version control system of a project to drive our approach. We use the communication patterns recorded on bug reports as a partial glimpse into the collaborations that actually occurred between the developers. Because we have only a glimpse into the communication that occurred during the project, the results we provide in this section are essentially a lower bound on the accuracy of the recommendations provided.

For validation purposes, we created an extension to EEL that allowed for the mining of Subversion¹ repositories. Subversion was used since it retains information about change sets stored in the repository, unlike CVS, which stores single file modifications. We did not use Jazz for the validation because the early access version of Jazz had limitations that prevented us from importing an active open-source project, limiting the size of project we could use for validation.² In contrast, we were able to access and create Subversion repositories for a variety of open-source projects, enabling a more thorough evaluation.

5.1 Methodology

Our validation method involved selecting a bug of interest and recreating the development state at that time by considering only source code revisions that were committed before the bug was closed. We used a determination of the files required to fix the bug to populate the matrices and determine the recommendations. We then compared our list of experts to those who had communicated on the bug report, as determined through comments posted to the bug report. Since the communication recorded on a bug report largely discusses the issue underlying the report, the developers involved in this discussion either have expertise in the area or gain expertise through the discussion.³

¹ http://subversion.tigris.org/, verified 01/08/07.
² The Jazz system has not yet been released and only limited development data from Jazz itself is becoming available now.

To perform this validation, we needed to determine a set of bugs with a sufficient number of recorded comments to infer communication amongst developers and with associated revisions of the source files that "solved" the bug, the resolving change set for the bug. We searched through all of the bugs marked as resolved and fixed for reports with ten or more comments and where at least five different developers had recorded comments; Appendix B provides a description of how we determined whether an entered comment represented a developer. For the validation, we retained only the comments provided by developers, discarding the others as they are not relevant to providing a lower bound on the developer communication. We used a standard approach (see Section 5.2) to determine the resolving change set for a bug and ensured that all change sets considered contained between three and nine files. We chose a range of change set sizes to enable evaluation across a range of situations.

EEL is intended to be used as development proceeds. To mimic development in this validation, we used the following process:

• Create three subsets of the resolving change set
Given the resolving change set for the solved bug, we create three subsets of differing size (1/3 of the files, 2/3 of the files, and the entire change set) to test how well EEL performs in finding experts given less information than what is needed to fix the given bug. We choose the files for each subset randomly with the constraint that at least one file in each subset must not be an initial revision when the bug was fixed, to ensure that we have some history from which EEL can recommend emergent team members. Random subset formation is necessary since we do not know in what order a developer may have modified the files used to solve the bug.

• Partition the comments in the bug into three groups
We partition the comments in the bug into three approximately equal groups based on the date of the comment. The first group has the oldest comments while the last group contains the newest ones, enabling us to mimic how development occurs. The comments could have been partitioned based on a date span, but this would have produced partitions of varying size and could have produced poor results due to the frequency and speed of communication that occurred on the report. For example, with a one-week time period, a bug that was fixed within a week would have all of its comments in one partition, whereas a bug that took two years to fix and was commented on only once per month would have each comment in its own partition. As this example illustrates, the number of partitions would also vary per bug, making it difficult to compare the results. Furthermore, if a comment partition has no developers communicating within it, the entire bug is discarded. When no developers communicate, the precision would always be 0% and the recall would be incomputable since it would involve a division by 0 (see below). To rectify this situation, we chose to discard these bugs from our final dataset.

³ Communication on a bug report unrelated to the underlying issue is typically moved into another bug report.

• Apply EEL to each combination of comment partitions and change set subsets
We apply EEL to each of the nine cases resulting from combining the comment partitions with the file revision subsets (see Figure 5.1) and evaluate the precision and recall of the recommendations produced by EEL. Specifically, for each case we:

1. find the last revision of each file (in the change set subset) before the earliest comment in the comment partition,

2. apply EEL to the file revisions in the change set subset obtained in the previous step to produce an ordered list of emergent team members,

3. determine who all of the commenters were in the bug partition, forming the set of relevant developers (see Appendix B for details on how we determined which comment corresponded to a developer),

4. compute the precision, representing the percentage of correctly identified team members (5.1), and recall, representing the percentage of potential team members correctly identified (5.2).

$$\text{Precision} = \frac{\#\ \text{Appropriate Recommendations}}{\text{Total}\ \#\ \text{Recommendations}} \qquad (5.1)$$

$$\text{Recall} = \frac{\#\ \text{Appropriate Recommendations}}{\#\ \text{Possibly Relevant Developers}} \qquad (5.2)$$

The # Appropriate Recommendations is the number of developers recommended by EEL that commented on a bug report, the Total # Recommendations is the number of recommendations that EEL made and # Possibly Relevant Developers is the number of developers that communicated on the bug report.
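The comment partitioning and the precision and recall measures of Equations 5.1 and 5.2 can be sketched as follows. This Python sketch is illustrative only: the function names and the toy data are hypothetical, and the choice to give any leftover comments to the later partitions is an assumption (it mirrors the ten-comment example in Figure 5.1, but the text does not state the rule).

```python
def partition_comments(comments, groups=3):
    """Split (date, author) pairs into date-ordered parts of near-equal size."""
    ordered = sorted(comments)
    base, extra = divmod(len(ordered), groups)
    parts, start = [], 0
    for i in range(groups):
        size = base + (1 if i >= groups - extra else 0)  # later groups take the remainder
        parts.append(ordered[start:start + size])
        start += size
    return parts


def precision_recall(recommended, relevant):
    """Equations 5.1 and 5.2 for one case; both inputs must be non-empty."""
    if not recommended or not relevant:
        raise ValueError("precision and recall are undefined for empty inputs")
    appropriate = len(set(recommended) & set(relevant))
    return appropriate / len(set(recommended)), appropriate / len(set(relevant))


# Hypothetical bug report with ten developer comments as (ISO date, author).
COMMENTS = [
    ("2006-03-01", "ann"), ("2006-03-02", "bob"), ("2006-03-05", "ann"),
    ("2006-03-09", "cy"), ("2006-03-10", "bob"), ("2006-03-11", "dee"),
    ("2006-04-01", "cy"), ("2006-04-02", "eve"), ("2006-04-03", "ann"),
    ("2006-04-20", "cy"),
]

PARTS = partition_comments(COMMENTS)
RELEVANT = {author for _, author in PARTS[0]}      # commenters in the first partition
RECOMMENDED = ["bob", "cy", "zed", "ann", "kim"]   # hypothetical top-five list
PRECISION, RECALL = precision_recall(RECOMMENDED, RELEVANT)
```

Bugs with an empty comment partition are rejected by the `ValueError`, which corresponds to discarding them from the dataset.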

In our validation, we also compare EEL against the "Line 10" rule that was used in the Expertise Recommender (ER) [13]. Under the "Line 10" rule, the last person to modify a file is considered the expert in that file. ER extends this approach to rank all of the developers that have modified the file by their last edit date. If multiple files are selected, the expert lists are computed separately for each file and then an intersection of the lists is performed to produce the final expert list.

After a preliminary run of the validation on EEL, we noticed that some of the precision and recall values were 0%. Further investigation revealed that the files that changed to fix a bug may not have been created prior to the dates of earlier comments. This means that EEL was unable to get any historical data for the file since it did not exist, and therefore EEL was unable to produce a list of experts. This situation can occur even if the file is not new as of that change set, since development is ongoing and the file that needed to be changed may have been added after the bug was commented on but before it was fixed. For these cases, we report an optimistic and a pessimistic case precision and recall. The optimistic case precision and recall are 100%, which is appropriate since, if the file did not exist at the time, producing no experts is correct. On the other hand, since EEL was unable to produce any experts, we can consider the pessimistic case and assume a precision and recall of 0%. These optimistic and pessimistic case values are used only if EEL is unable to produce a list of experts. If EEL is able to produce a list of experts, only one precision and recall value is given, computed using equations 5.1 and 5.2 as described earlier.

[Figure 5.1 diagram: comment partition #1 (comments 1 to 3), #2 (comments 4 to 6) and #3 (comments 7 to 10), each linked to change set subset #1 (file 2), #2 (files 1 and 3) and #3 (files 1, 2 and 3).]

Figure 5.1: The 9 cases for validation. Lines in this diagram represent a combination of comment partition and change set subset for validation purposes.

Another thing that we noticed upon the preliminary run of the validation was that EEL's recommendations performed worse than those of the "Line 10" rule for some projects. Investigation into this problem revealed that many of the developers contributing to these projects committed only a few changes during a short time period and then never committed again. Since EEL considers the entire project history, the addition of authors that are no longer active affects EEL's recommendations. The "Line 10" rule is not impacted by this case since it ranks experts by their last authorship date, ensuring that the most recent authors are recommended first. In contrast, EEL uses all authors and related files to determine the recommended set of experts, resulting in the potential for an author who was previously highly active on the project, but who has not been active recently, to be recommended over a more recently active developer. To ensure EEL provides relevant recommendations, we apply it to the last twelve months of development history. To be fair in our comparison, we use the same underlying data for the "Line 10" rule. The use of a limited amount of recent data from the project archives has limited effects on the results of applying the "Line 10" rule because this approach already ranks the most recent activity higher.
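The "Line 10" baseline used in the comparison can be sketched as follows. This is a Python illustration based only on the description in this section, not the Expertise Recommender's implementation. The function names and toy history are hypothetical, timestamps are plain integers, and the ordering of the intersected list (by each author's most recent edit across the selected files) is an assumption, since the text specifies only that the per-file lists are intersected.

```python
def last_edits(revisions, since=None):
    """Map each author of one file to the timestamp of their latest edit."""
    latest = {}
    for timestamp, author in revisions:
        if since is not None and timestamp < since:
            continue  # outside the history window (e.g., the last twelve months)
        if timestamp > latest.get(author, float("-inf")):
            latest[author] = timestamp
    return latest


def line10_ranking(revisions, since=None):
    """Authors of a single file, most recent editor first."""
    latest = last_edits(revisions, since)
    return sorted(latest, key=lambda a: (-latest[a], a))


def line10_for_files(history, files, since=None):
    """Intersect the per-file expert lists; the result may be empty."""
    per_file = [last_edits(history.get(f, []), since) for f in files]
    if not per_file:
        return []
    common = set(per_file[0]).intersection(*per_file[1:])
    newest = {a: max(edits[a] for edits in per_file) for a in common}
    return sorted(newest, key=lambda a: (-newest[a], a))


# Hypothetical history: file -> list of (timestamp, author) revisions.
HISTORY = {
    "A.java": [(1, "ann"), (5, "bob"), (9, "ann")],
    "B.java": [(2, "bob"), (7, "cy")],
    "C.java": [(3, "dee")],
}
```

Selecting `A.java` and `C.java` together yields an empty list because no developer has edited both files, which is how an intersection-based rule can fail to recommend anyone.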

5.2 Data

We used three existing open-source software projects in our validation: the Eclipse project,⁴ Firefox⁵ and Bugzilla.⁶ Eclipse is an open-source platform for integrating tools implemented in Java, Firefox is a popular open-source web browser and Bugzilla is an open-source issue tracking system. As Firefox and Bugzilla are both part of the Mozilla project, they use the same version control and issue tracking systems. These projects were chosen because they each have a number of developers who have been active in committing code and bugs in their histories, and there is a sufficient amount of data to run the validation after we apply the constraints we outlined in the previous section. Furthermore, these projects use the repository types our infrastructure supported: CVS for source control and Bugzilla for issue tracking. These two data sources are used to populate EEL and validate its recommendations.

To form appropriate change sets for Eclipse, we obtained an archive copy of the CVS repository from the Eclipse.org archive site⁷ and imported it into Subversion using cvs2svn. We performed a similar operation on the Mozilla CVS repository.⁸ Cvs2svn is a Python script developed along with Subversion (SVN) by the Tigris.org⁹ community. This script must be run on the system on which the CVS repository resides since it directly reads the RCS files. Cvs2svn uses many passes to prepare the SVN repository and to determine the change sets associated with each of the CVS files, ensuring that the imported data is robust and correct. Cvs2svn uses a simple algorithm, similar to that described by Mockus et al. [14], that inspects the author and log message for each revision of a file and has a notion of time. Using this information, it inspects all of the revisions of all files, grouping revisions that have the same author and log message and that occur within a five minute window of time to create a single change set. If multiple files are changed by the same author and with the same comment but span more than five minutes, multiple change sets are created; hence a single change made to the system may be split into two or more change sets. Furthermore, this algorithm could cause unrelated files to be committed as a single change set, but the cvs2svn design notes observe that this will only happen with insufficiently detailed log messages, like "changed doc" or an empty message, coupled with multiple commits performed quickly [3]. The design notes also state that if a log message is insufficiently detailed, then the change must not be important and there is no real harm if the files are grouped [3].

In selecting bugs for the validation, we used those marked as closed and fixed within the past two years. These bugs were obtained by downloading the XML version of all bugs that fit these criteria from the respective projects' Bugzilla database. Our selection criteria ensure that the fix is fairly recent and that it was actually committed to the repository. If the status of a bug was not closed and fixed, it could still be under development, be a duplicate of another bug that has an unknown status, or be marked as not a bug.

Since most open-source projects associate a bug with a single change set, many of them require that the identifying number of the bug that was fixed be entered into the comment of a commit to the version control system. Using this knowledge, we were able to map a bug to a single commit so that we could recreate the development state needed to fix the bug in question. To perform this mapping, we searched each of the log messages obtained from Subversion (one per revision) for a reference to one of the bugs that could be usable for validation. This was done by creating a log of all of the change sets stored in the version control system along with the files that were changed, and searching for a string that indicates that it corresponds to a bug fix, for example, "bug 321", "fix 321" or just "321". We then matched logs with a reference to a bug to the bugs that we had determined were appropriate for validation. Any bug that did not have a reference in a log message was discarded from the validation.

Table 5.1 provides some statistics on the size of the data sets that were used. The total number of change sets presented in Table 5.1 for Firefox and Bugzilla is the same since they are both contained in the same source code repository.

Table 5.1: Bug and change set statistics per project.

                                                            Eclipse  Firefox  Bugzilla
Total # change sets                                          122614   174581    174581
Total # bugs resolved in last 2 years                         10013     4918      2148
Total # bugs with at least 5 developers and 10 comments         354      501       291
Total # criteria fitting bugs with reference in change log      182      283       216
Total # bugs with reference and correct change set size          49       70        81
Total # bugs excluded due to no developers in partition           1        2         0

⁴ http://www.eclipse.org, verified 01/08/07.
⁵ http://www.mozilla.com/en-US/firefox/, verified 01/08/07.
⁶ http://www.bugzilla.org/, verified 01/08/07.
⁷ Eclipse makes archives of the CVS repository available in compressed form at http://archive.eclipse.org/arch/, verified 01/08/07.
⁸ The Mozilla CVS was obtained using rsync. Rsync is an open-source utility for file transfer, http://samba.anu.edu.au/rsync/, verified 01/08/07.
⁹ http://www.tigris.org, verified 01/08/07.
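The two data preparation steps described in this section, grouping single-file revisions into change sets and mapping log messages to bug numbers, can be sketched as follows. This Python sketch follows the prose description only and is not cvs2svn's actual code. The function names and toy data are hypothetical, and measuring the five minute window from the first revision of a change set is an assumption.

```python
import re


def group_revisions(revisions, window=300):
    """Group (timestamp_seconds, author, log_message, path) revisions into change sets.

    Revisions share a change set when author and log message match and the
    revision falls within `window` seconds of the change set's first revision.
    """
    change_sets = []
    open_sets = {}  # (author, message) -> most recent change set for that pair
    for timestamp, author, message, path in sorted(revisions):
        key = (author, message)
        current = open_sets.get(key)
        if current is None or timestamp - current["start"] > window:
            current = {"author": author, "message": message,
                       "start": timestamp, "files": []}
            open_sets[key] = current
            change_sets.append(current)
        current["files"].append(path)
    return change_sets


def referenced_bugs(message, candidate_ids):
    """Return the candidate bug numbers that appear as whole numbers in a log message."""
    numbers = {int(n) for n in re.findall(r"\d+", message)}
    return numbers & set(candidate_ids)


# Hypothetical revisions: three by the same author with the same message.
REVISIONS = [
    (0, "ann", "fix 321", "A.java"),
    (120, "ann", "fix 321", "B.java"),
    (400, "ann", "fix 321", "C.java"),   # more than five minutes after the first
    (130, "bob", "changed doc", "D.txt"),
]

CHANGE_SETS = group_revisions(REVISIONS)
```

Matching a bare number such as "321" is permissive, so in this sketch matches are restricted to the bugs already selected for validation.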

5.3 Results

Since EEL can produce a varying number of recommendations, we computed the precision and recall for three different sized lists of potential team members, namely three, five and seven recommendations. These lists were obtained by taking the top parts of the ordered list produced by EEL and the "Line 10" rule. We varied the recommendations to investigate the impact of the size of the recommendations on the performance of EEL.

Figures 5.2 through 5.7 present a subset of the optimistic precision and recall values using the Eclipse, Bugzilla and Firefox datasets. The presented results represent a more interesting subset of the data that was collected. This subset presents the results for all of the time frames provided by the bug partitions, but only 1/3 and 2/3 of the files in the change set. Furthermore, the figures present only the results when five developers were recommended. We chose to focus our presentation of results on these cases, as these cases represent a developer looking for expertise prior to a problem being fixed and because the recommendation list is of a reasonable size for a developer to consider. Appendix A provides both the optimistic and pessimistic results of all of the 27 different test cases for each of the projects.

The results are presented in box-and-whisker plots. These plots assist in viewing the distribution of the results. The shaded box in the plot represents the second and third quartiles of the data set, whereas the lines extending above and below the box, the whiskers, represent the fourth and first quartiles respectively. The large black dot represents the average value of the data and the line represents the median. Any small unshaded circles that are located above or below the whiskers represent outliers that are well outside the range of common values. On the results graphs, the y-axis represents the percentage value of the precision or the recall. The x-axis separates each of the cases that we are interested in. A label on the x-axis that reads T1,1/3 represents the first bug partition (T1) and the subset containing 1/3 of the files in the change set (1/3).

Table 5.2 shows the overall average optimistic precision for both EEL and the "Line 10" rule for all three of the projects used in the validation. Table 5.3 presents the average optimistic recall for the three tested projects. Figures 5.2 and 5.3 present the optimistic precision and recall values for Eclipse. The overall average optimistic precision and recall of EEL for this project are 37% and 49% respectively, compared to 28% and 35% for the "Line 10" rule. Figures 5.4 and 5.5 present the optimistic precision and recall values for Bugzilla. The overall average optimistic precision and recall of EEL are 28% and 38% respectively, compared to 23% and 28% for the "Line 10" rule. Figures 5.6 and 5.7 present the optimistic precision and recall values for Firefox. The overall average optimistic precision and recall of EEL are 16% and 21% respectively, compared to 13% and 16% for the "Line 10" rule.

In each of the three projects tested, EEL produces higher precision and higher recall than the "Line 10" rule. On average, in 88% of the 27 different test cases for each of the three projects, EEL produced a higher precision and recall than the "Line 10" rule. This shows that EEL produces better results than the "Line 10" rule.

Table 5.2: Optimistic average precision.

Project    EEL    Line 10
Eclipse    37%    28%
Bugzilla   28%    23%
Firefox    16%    13%

Table 5.3: Optimistic average recall.

Project    EEL    Line 10
Eclipse    49%    35%
Bugzilla   38%    28%
Firefox    21%    16%

It is an open question whether these precision and recall values are sufficient to create an effective tool for recommendations. We are optimistic that an effective tool can be based on this approach because McDonald's study found that people working on the project generally agreed with the recommendations provided [10]. Knowing that the "Line 10" rule performs well when strict testing is performed, we believe that our results show that EEL provides better expertise recommendations than the "Line 10" rule.

Furthermore, there were some cases where EEL was able to produce a list of experts when the "Line 10" rule recommended an empty list. Recommending an empty list of experts does not help developers find expertise, forcing them to modify the information that they are interested in until they find an expert that might be of interest. EEL always produced a recommendation if there was history in the repository for it to use. For Eclipse, 6.5% of the cases we tried resulted in the "Line 10" rule being unable to produce a result when EEL could. For Bugzilla, this occurred 15.0% of the time, and for Firefox, 14.9%. The "Line 10" rule produces an empty recommendation list since it performs an intersection of the authors when there are multiple files in the change set. This situation can occur when the files are relatively new, or when new dependencies have been added within the files that are not reflected within the history of the project.

The difference between the optimistic and pessimistic values gives an insight into the number of times that there was insufficient history for either EEL or the "Line 10" rule to produce a recommendation.
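The relationship between the optimistic and pessimistic averages can be made concrete with a small sketch. This Python example uses made-up case values, not data from the validation, and the function name is hypothetical. A case for which no recommendation could be produced is represented as `None` and is scored as 100% in the optimistic average and 0% in the pessimistic one.

```python
def summarize(cases):
    """Average (precision, recall) pairs, scoring missing cases both ways.

    cases: list of (precision, recall) tuples, or None when no
    recommendation could be produced for lack of history.
    """
    n = len(cases)

    def average(fill, index):
        return sum((c if c is not None else fill)[index] for c in cases) / n

    return {
        "optimistic": (average((1.0, 1.0), 0), average((1.0, 1.0), 1)),
        "pessimistic": (average((0.0, 0.0), 0), average((0.0, 0.0), 1)),
        "no_history_share": sum(c is None for c in cases) / n,
    }


# Hypothetical cases: two with recommendations, two without any history.
SUMMARY = summarize([(0.4, 0.5), None, (0.2, 0.25), None])
```

Because every case with a recommendation contributes the same value to both averages, the gap between the optimistic and pessimistic averages equals the share of cases without a recommendation, which is the insight noted above.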

[Figures 5.2 to 5.7 are box-and-whisker plots comparing EEL and the "Line 10" rule. The x-axis ("Test Number") lists the cases T1,1/3, T1,2/3, T2,1/3, T2,2/3, T3,1/3 and T3,2/3; the y-axis gives the percentage value.]

Figure 5.2: Eclipse optimistic precision.

Figure 5.3: Eclipse optimistic recall.

Figure 5.4: Bugzilla optimistic precision.

Figure 5.5: Bugzilla optimistic recall.

Figure 5.6: Firefox optimistic precision.

Figure 5.7: Firefox optimistic recall.

5.4 Threats

Several factors could affect the construct, internal and external validity in our study of EEL.

5.4.1 Construct Validity

Construct validity considers whether the measures used in a study represent the concept that was being studied. A potential threat to the construct validity of our validation is how we determined the experts against whom we compare the recommendations made by the two approaches. We used the developers who commented on the bug report as the experts for the area of the system in question. However, it is possible that the comments posted to the bug report were not related to the bug or were not technical in content. Either situation would mean that those we consider to be experts may not actually be experts in that part of the implementation of the system. As we described in section 5.1, these situations are unlikely given how the bug reporting system is used in practice. In the open-source community, bug reports are used as a collaboration device to track the technical communication surrounding the bug. For instance, in the open-source communities we studied, if unrelated issues arise in a report, they are generally continued within a new bug report or through other communication means. The result is that the commenters on the bug report are potential experts since they are commenting on the technical aspects of the bug.

Our use of the bug report comment data is a lower bound on the communication that occurred during the development of the system. As a result, we may not have a complete list of the experts, or a ranking of the importance of each of the developers with respect to a bug. A user study where developers could use their knowledge regarding expertise and provide a more detailed view of the correctness of EEL would have better construct validity. The validation presented here is intended to be a preliminary study to show the effectiveness of EEL so that it can be deployed into a development team.

Another potential threat to the construct validity of our study is that the version control system may contain non-technical changes to the system. These kinds of changes would mean that a developer committing a part of the system made a change that did not require knowledge of the area (e.g., changing the licence agreement that appears at the top of each file). As we previously mentioned, these types of commits are generally large in size and, because of this, EEL ignores change sets with more than 50 related files. This approach tries to ensure that these non-technical changes to the system are not used in the expertise recommendation. Furthermore, since EEL uses the frequency of the commits as a factor in recommending experts, if a few of these commits are included, there is a high probability that they will not affect the recommendations.
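The change set size filter described above can be illustrated with a small sketch. This is not EEL's implementation: the dictionary layout of a change set, the function names and the sample history are assumptions made for illustration, and the per-author commit count is a simplified stand-in for EEL's frequency-based ranking. Only the threshold of 50 files comes from the text.

```python
from collections import Counter

MAX_CHANGE_SET_SIZE = 50  # threshold stated in the text


def filter_change_sets(change_sets, max_size=MAX_CHANGE_SET_SIZE):
    """Drop change sets that touch more than max_size files."""
    return [cs for cs in change_sets if len(cs["files"]) <= max_size]


def commit_counts(change_sets, target_file):
    """Count, per author, the retained change sets that include target_file."""
    counts = Counter()
    for cs in filter_change_sets(change_sets):
        if target_file in cs["files"]:
            counts[cs["author"]] += 1
    return counts


# Hypothetical history: the last entry mimics a licence-header update
# that touches 120 files and is therefore ignored.
history = [
    {"author": "alice", "files": ["Parser.java", "Lexer.java"]},
    {"author": "alice", "files": ["Parser.java"]},
    {"author": "bob", "files": ["Parser.java", "Ast.java"]},
    {"author": "carol",
     "files": ["Parser.java"] + [f"File{i}.java" for i in range(119)]},
]

print(commit_counts(history, "Parser.java").most_common())
```

In this sketch the large change set contributes nothing to the counts, so its author does not appear among the candidates for Parser.java.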

5.4.2 Internal Validity

Internal validity concerns whether there was a causal relationship between the method being studied and the results. The username to e-mail mappings could threaten the internal validity of the evaluation of EEL. Mapping between e-mail addresses and usernames is a difficult problem with no easy solution. Since we are comparing the performance of EEL to the performance of the "Line 10" rule, this should not affect the outcome. Both methods use the mappings in the same way and are compared against the same set of potential experts. This means that if one of the mappings is incorrect, it will affect both approaches equally and therefore will not affect the comparison between the two.

5.4.3 External Validity

External validity concerns the generalizability, or applicability, of the results to other situations. One potential threat to external validity is that we only considered open-source projects. This could be an issue since the processes used in a corporate environment might be different from those of an open-source community. As the projects that we studied involve professional developers and the systems developed are of high quality, we believe that the processes used and the structure of the projects are similar to those in a corporate situation.

Another threat to external validity is the size of the teams that we investigated. We believe our approach is better suited to large teams and targeted large teams in the evaluation, with both of the systems having between 200 and 800 developers committing.

Finally, the use of mature projects could be a threat to the generalizability of the validation. We feel that this is not a threat since a new project does not contain the information that is needed to provide recommendations. There is no easy way to test the validity of a recommendation tool on a new project since the project history is non-existent, and therefore no information can be collected about the experts of the system. Furthermore, a recommender would produce moot results since only a few people have edited each file, meaning that they are the creators of that file and therefore the experts. On a new project, a profile-based expertise recommender would perform best since developers can list their areas of expertise.

Chapter 6

Discussion

In this chapter, we consider both extensions and limitations to EEL, and describe the next steps in evaluating the approach.

6.1 Other Sources of Information

There are several data sources other than source revision information that EEL could mine to produce the emergent team recommendations, including bug reports and the developer's own activity. We chose to use revision information because our motivating use case is to recommend team members with expertise on the code. The best information for this is developers who have demonstrated work on the code.

Bug report information may be useful for augmenting the source revision information. To do this, a few changes would need to be made to EEL. First, EEL would need to understand the bug that the user is currently working on and have the ability to extract data from the bug tracking system. Second, a mapping between the identities used in the source control repository and those used in the bug tracking system (usernames and e-mail addresses) would be needed so that the users' expertise rankings could be unified within EEL. If using a tool like Jazz, this mapping is not needed since the names are unified throughout all of the components of the software life cycle. For EEL to properly utilize bug data, it needs dependency information to gather the history of past bug fixes, but this information is sparse and often non-existent since developers do not properly record it. Some tools are being developed to automatically determine bug similarity for finding duplicate and dependent bugs, but these technologies are not mature enough to reliably produce correct results that could be used for expertise location (e.g., [5]).

Alternatively, a developer's interaction with the system, similar to that collected by Mylar [7], could be helpful in two ways. First, Mylar's task contexts capture information about which files a developer referred to when performing a task. Files to which a developer referred frequently may also be areas of the system in which the developer is knowledgeable, and for which the developer could be considered a member of the emergent team. Second, Mylar's task information could be useful to maintain EEL's matrices on a per-task basis. This would mean that when a user changes the task that they are working on, they would start with an empty list of experts until they begin investigating the code. This would ensure that the list of experts is directly related to that task and not the previous one. However, we did not implement this because the number of related files is large, so the matrices, and therefore the expert list, are quickly cleared as the user works. A drawback of this approach is that when a task is started there will be no information with which to recommend experts, leading to fewer recommendations when a developer may most need one: early in a task's life cycle.

6.2 Using Emergent Team Information

Emergent team information has potential uses beyond recommending experts. It could also be used to examine churn or to determine the load on a particular developer on a project.

One use of emergent team structure is to examine the churn of a project or component. Churn occurs when an entire team (or most of a team) continues to change over time. This means that the team is always composed of new developers who must learn the area of the system, resulting in a lack of experts for that part of the system. A project that has little churn normally has a stable development team that retains knowledge of the project. A manager could use this information to determine problem areas in a project and redistribute developers to fix the problem. By periodically determining the emergent team for an area of a system, one could compare the current team with previous team snapshots to determine the size or frequency of the change in the emergent team. If a large number of developers on a team leave or if the team changes frequently, it could indicate a problem.

Emergent team structure could also be used to determine the load on a developer. A manager could use this information to reduce the load on one developer by reassigning work to other developers. To determine the load of a developer, the emergent teams of different areas of a project could be analyzed. If a developer is contained in a larger number of teams than other developers, this would indicate that they probably have a higher load and are key to the success of the project.
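As a rough illustration of these two uses, the sketch below compares team snapshots and counts team memberships. The thesis does not define a churn metric; the fraction of departed members used here, along with the function names and sample teams, is an assumption chosen for illustration.

```python
def churn(previous_team, current_team):
    """Fraction of the previous snapshot's members missing from the current one."""
    if not previous_team:
        return 0.0
    departed = previous_team - current_team
    return len(departed) / len(previous_team)


def developer_load(teams_by_area):
    """Number of emergent teams each developer belongs to."""
    load = {}
    for members in teams_by_area.values():
        for dev in members:
            load[dev] = load.get(dev, 0) + 1
    return load


# Hypothetical snapshots of the emergent team for one area of a system.
last_quarter = {"alice", "bob", "carol", "dave"}
this_quarter = {"alice", "erin", "frank", "grace"}
print(churn(last_quarter, this_quarter))  # 0.75

# Hypothetical emergent teams for three areas of a project.
teams = {
    "ui": {"alice", "bob"},
    "core": {"alice", "carol"},
    "debug": {"alice"},
}
load = developer_load(teams)
print(max(load, key=load.get))  # alice
```

A manager-facing report would likely track this value over several snapshots, since the text treats both the size and the frequency of change as signals.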

6.3 Future Evaluation

The next step in evaluating EEL is to deploy the tool into an active development project. Ideally, this project would have a relatively large code base (e.g., one million lines of code), follow some agile practices and communicate primarily through electronic means. We would like the team to follow some agile practices so that EEL can be useful and does not simply recommend developers who are already known to be experts by members of the team. It would be beneficial if the communication was done through electronic means so that we could track and analyze the communication that was initiated through EEL.

The focus of this evaluation should be on both the correctness of EEL's recommendations and the users' experience. Data should be collected on an ongoing basis when the tool is deployed. Information about the frequency of using EEL to make a recommendation, along with the frequency of communication initiated from EEL, could be collected and analyzed. To further ensure accuracy, a more formal study could be performed, similar to the one performed by McDonald [10]. This study involved having users rank a list of developers who could be experts on the system in question. The results provided by the users are then compared to the tool's recommendations to determine the accuracy of the tool. This second study could be used to supplement the results collected from the deployment of EEL, and provide statistical evidence for the performance of EEL.

In addition to correctness, the experience of using EEL should be evaluated. The goal of EEL is to provide the simplest way to display the expert recommendations and to initiate communication. This evaluation could be run in parallel with the correctness study when EEL is deployed. Users could compare the usability of EEL to their previous methods of expertise location and communication tools. Furthermore, evaluation could involve a separate study that compares the current way that EEL provides the list of experts with another system, such as manually configured buddy lists.

Finally, since we discovered that the performance of EEL is highly dependent on the amount of history data that it uses, it would be beneficial to investigate this further. This would require testing EEL with different lengths of history from the software repository. We chose twelve months to increase the recency of the data being used, but using more or less time may produce better results. We recognize that there is a high probability that this factor is dependent on the project used. For the Eclipse results, using the entire history or the last twelve months produces similar results with EEL, whereas for the Mozilla projects, there was a large change in the results. It would be beneficial to have an automated way to determine this value on a per-project basis so that the customization could be made available to EEL.
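Testing different history lengths could start from a filter like the following sketch. The commit record layout, the function name and the 30-day approximation of a month are assumptions for illustration; the thesis states only that twelve months of history and the entire history were compared.

```python
from datetime import datetime, timedelta


def commits_in_window(commits, now, months=None):
    """Keep commits from the last `months` months; None keeps the entire history."""
    if months is None:
        return list(commits)
    cutoff = now - timedelta(days=30 * months)  # a month is approximated as 30 days
    return [c for c in commits if c["date"] >= cutoff]


# Hypothetical commit history.
now = datetime(2006, 12, 1)
commits = [
    {"author": "alice", "date": datetime(2006, 10, 15)},
    {"author": "bob", "date": datetime(2006, 3, 1)},
    {"author": "carol", "date": datetime(2004, 6, 20)},
]

for months in (3, 12, None):
    authors = [c["author"] for c in commits_in_window(commits, now, months)]
    print(months, authors)
```

Running a recommender over each window and comparing precision and recall per project would be one way to explore the per-project setting discussed above.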

6.4 Limitations

A limitation of EEL's approach is that it cannot easily work with many traditional version control systems like CVS and RCS. These systems maintain commit information on a per-file basis and therefore do not record which other files changed along with a given file. This means that we are limited to newer version control systems such as Subversion and Jazz, since they support atomic commits across a number of files. Tools and methods exist for extracting change sets from CVS, but it is infeasible to run them every time data is mined for a file. As an alternative, an external tool could be run periodically to extract this information, but this is an intensive operation and would therefore create the need for a server-based approach, which we are attempting to avoid. Many open-source projects still use CVS, but Subversion is gaining popularity and some major open-source projects (like Apache and Gnome) have either migrated or have plans to migrate to this new repository system in the near future.

Another limitation of EEL is that if a new developer is added to the team, but they are already an expert, there is no support to ensure that this person is correctly recommended. If a client-server approach was used, a simple skew or replacement value could be added to augment the recommendations to ensure that this new developer is recommended. This method could also be used if a developer leaves the team. To solve this within EEL, the ability to personalize the recommendations could be added. One personalization could be the ability to substitute an expert who is recommended by EEL with a different expert specified by the user. This could be done to replace a developer who left the project with a new hire. Another use of this type of personalization would be to augment the recommendations based on the social structure of the team. A developer may prefer to talk to an expert that they know over another member who has similar knowledge.

Another way to address this limitation would be to weight recent development activity more heavily than older activity. This would mean that a developer who has worked on a file more recently could be rated higher even if they are new to the team. Before implementing this mechanism, an in-depth experiment would need to be performed to determine whether it is feasible.
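The two remedies discussed above, substituting experts and weighting recent activity, could look roughly like the sketch below. Neither is implemented in EEL; the exponential decay, the 180-day half-life, the function names and the sample data are all assumptions for illustration, and the text notes that an experiment would be needed before adopting any weighting scheme.

```python
from datetime import datetime


def recency_weighted_scores(commits, now, half_life_days=180.0):
    """Sum per-author commit weights that halve every half_life_days."""
    scores = {}
    for c in commits:
        age_days = (now - c["date"]).days
        weight = 0.5 ** (age_days / half_life_days)
        scores[c["author"]] = scores.get(c["author"], 0.0) + weight
    return scores


def apply_substitutions(ranked, substitutions):
    """Swap recommended experts for user-specified replacements, keeping order."""
    result = []
    for dev in ranked:
        dev = substitutions.get(dev, dev)
        if dev not in result:  # avoid listing the same person twice
            result.append(dev)
    return result


# Hypothetical history: three old commits versus one recent commit.
now = datetime(2007, 1, 1)
commits = [
    {"author": "old_hand", "date": datetime(2005, 1, 1)},
    {"author": "old_hand", "date": datetime(2005, 1, 1)},
    {"author": "old_hand", "date": datetime(2005, 1, 1)},
    {"author": "newcomer", "date": datetime(2006, 12, 1)},
]
scores = recency_weighted_scores(commits, now)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)

# A user replaces a developer who left the project with a new hire.
print(apply_substitutions(["old_hand", "newcomer"], {"old_hand": "new_hire"}))
```

With these sample values the single recent commit outweighs the three two-year-old commits, which is the behaviour the paragraph above describes for a developer who is new to the team.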

Chapter 7

Summary

To build successful complex software systems, software developers must collaborate with each other at all stages of the software life-cycle. Current development tools facilitate this collaboration by integrating communication tools, such as chat and screen sharing, within the development environment. This integration has made it easier for developers to communicate, but as the composition of software teams becomes increasingly dynamic, it may not be straightforward for a developer to keep a description of colleagues on the many teams in which she may work up-to-date. This leaves the developer to determine with whom she should collaborate in a particular situation, forcing her to have some knowledge of who has expertise on particular parts of the system.

EEL mitigates these problems by determining the composition of the team automatically so that developers do not need to spend time configuring membership lists for the many teams to which they may belong. This is done by using the context from which the developer initiates communication, combined with the project history, to produce a recommendation of experts related to the area in which the developer is currently working.

Using an automated validation and historical data from three different open-source projects, we found that EEL produces higher precision and higher recall than the "Line 10" rule. The results are promising, but EEL still needs further validation using a team of developers due to the limited availability of expertise information.

Bibliography

[1] John Anvik, Lyndon Hiew, and Gail C. Murphy. Who should fix this bug? In Proceeding of the 28th international conference on Software engineering, pages 361-370, 2006.

[2] Marcelo Cataldo, Patrick Wagstrom, James Herbsleb, and Kathleen Carley. Identification of coordination requirements: Implications for the design of collaboration and awareness tools. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, pages 353-362, 2006.

[3] cvs2svn Community. How cvs2svn Works. http://cvs2svn.tigris.org/svn/cvs2svn/trunk/design-notes.txt, 2006.

[4] James D. Herbsleb and Deependra Moitra. Global software development. IEEE Software, 18(2):16-20, Mar/Apr 2001.

[5] Lyndon Hiew. Assisted detection of duplicate bug reports. Master's thesis, University of British Columbia, 2006.

[6] Jim Highsmith and Alistair Cockburn. Agile software development: the business of innovation. IEEE Computer, 34(9):120-127, Sept 2001.

[7] Mik Kersten and Gail C. Murphy. Using task context to improve programmer productivity. In Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering, pages 1-11, 2006.

[8] Roger Lewin and Birute Regine. Complexity and business success. http://www.psych.lse.ac.uk/complexity/Seminars/1999/report99oct.htm, 1999.

[9] Greg Madey, Vincent Freeh, and Renee Tynan. The open source software development phenomenon: An analysis based on social network theory. In Americas conf. on Information Systems (AMCIS2002), pages 1806-1813, 2002.

[10] David W. McDonald. Evaluating expertise recommendations. In GROUP '01: Proceedings of the 2001 International ACM SIGGROUP Conference on Supporting Group Work, pages 214-223, 2001.

[11] David W. McDonald. Recommending collaboration with social networks: A comparative evaluation. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 593-600, 2003.

[12] David W. McDonald and Mark Ackerman. Just talk to me: a field study of expertise location. In CSCW '98: Proceedings of the 1998 ACM conference on Computer supported cooperative work, pages 315-324, 1998.

[13] David W. McDonald and Mark S. Ackerman. Expertise recommender: a flexible recommendation system and architecture. In Proceedings of the 2000 ACM conference on Computer supported cooperative work, pages 231-240, 2000.

[14] Audris Mockus, Roy T. Fielding, and James D. Herbsleb. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3):1-38, 2002.

[15] Audris Mockus and James D. Herbsleb. Expertise browser: a quantitative approach to identifying expertise. In ICSE '02: Proceedings of the 24th International Conference on Software Engineering, pages 503-512, 2002.

[16] Ramon Sanguesa and Josep M. Pujol. Netexpert: A multiagent system for expertise location. In International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Organizational Memories and Knowledge Management, pages 85-93, 2001.

[17] Xiaodan Song, Belle L. Tseng, Ching-Yung Lin, and Ming-Ting Sun. Expertisenet: Relational and evolutionary expert modeling. In Proceedings of the Tenth International Conference on User Modeling, pages 99-108, 2005.

[18] Dawit Yimam-Seid and Alfred Kobsa. Expert finding systems for organizations: Problem and domain analysis and the demoir approach. Journal of Organizational Computing and Electronic Commerce, 13(1):1-24, 2003.

[19] Annie T.T. Ying, Gail C. Murphy, Raymond Ng, and Mark C. Chu-Carroll. Predicting source code changes by mining change history. IEEE Transactions on Software Engineering, 30(9):574-586, Sept. 2004.

Appendix A

Complete Results

Figures A.1, A.2, A.3, A.4, A.5, A.6, A.7, A.8, A.9, A.10, A.11 and A.12 present detailed results of the validation of EEL. Each of the figures presents 27 different cases that were tested during the validation of EEL. Each of the test cases represents a unique combination of the bug comment partition, the size of the change set used to make a recommendation, and the number of recommendations made. Table A.1 describes the combinations for each of the test cases. Section 5.3 has a detailed description of how to read the box-and-whisker diagrams presented here.

Table A.1: Mapping of test case number to unique combination of bug comment partition, size of change set and number of recommendations.

Test Case Number   Data Combination
 1                 Comment Partition 1, 1/3 files, 3 recommendations
 2                 Comment Partition 1, 1/3 files, 5 recommendations
 3                 Comment Partition 1, 1/3 files, 7 recommendations
 4                 Comment Partition 1, 2/3 files, 3 recommendations
 5                 Comment Partition 1, 2/3 files, 5 recommendations
 6                 Comment Partition 1, 2/3 files, 7 recommendations
 7                 Comment Partition 1, all files, 3 recommendations
 8                 Comment Partition 1, all files, 5 recommendations
 9                 Comment Partition 1, all files, 7 recommendations
10                 Comment Partition 2, 1/3 files, 3 recommendations
11                 Comment Partition 2, 1/3 files, 5 recommendations
12                 Comment Partition 2, 1/3 files, 7 recommendations
13                 Comment Partition 2, 2/3 files, 3 recommendations
14                 Comment Partition 2, 2/3 files, 5 recommendations
15                 Comment Partition 2, 2/3 files, 7 recommendations
16                 Comment Partition 2, all files, 3 recommendations
17                 Comment Partition 2, all files, 5 recommendations
18                 Comment Partition 2, all files, 7 recommendations
19                 Comment Partition 3, 1/3 files, 3 recommendations
20                 Comment Partition 3, 1/3 files, 5 recommendations
21                 Comment Partition 3, 1/3 files, 7 recommendations
22                 Comment Partition 3, 2/3 files, 3 recommendations
23                 Comment Partition 3, 2/3 files, 5 recommendations
24                 Comment Partition 3, 2/3 files, 7 recommendations
25                 Comment Partition 3, all files, 3 recommendations
26                 Comment Partition 3, all files, 5 recommendations
27                 Comment Partition 3, all files, 7 recommendations
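The numbering in Table A.1 follows a regular pattern: three comment partitions, three change set sizes and three recommendation counts, with the recommendation count varying fastest. A minimal sketch that regenerates the mapping is shown below; the constant and function names are illustrative and do not come from the validation scripts.

```python
from itertools import product

PARTITIONS = (1, 2, 3)
CHANGE_SET_SIZES = ("1/3 files", "2/3 files", "all files")
RECOMMENDATION_COUNTS = (3, 5, 7)


def enumerate_test_cases():
    """Map test case number (1 to 27) to (partition, change set size, count)."""
    combos = product(PARTITIONS, CHANGE_SET_SIZES, RECOMMENDATION_COUNTS)
    return {number: combo for number, combo in enumerate(combos, start=1)}


cases = enumerate_test_cases()
for number in (1, 14, 27):
    partition, size, count = cases[number]
    print(f"{number}: Comment Partition {partition}, {size}, {count} recommendations")
```

This can be handy for looking up what a given x-axis position in Figures A.1 to A.12 represents.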

[Box-and-whisker plots for Figures A.1 to A.12 are not reproduced here. In each plot the x-axis is the test number (1 to 27) and the two series are EEL and Line 10.]

Figure A.1: All Eclipse optimistic precision.

Figure A.2: All Eclipse pessimistic precision.

Figure A.3: All Eclipse optimistic recall.

Figure A.4: All Eclipse pessimistic recall.

Figure A.5: All Bugzilla optimistic precision.

Figure A.6: All Bugzilla pessimistic precision.

Figure A.7: All Bugzilla optimistic recall.

Figure A.8: All Bugzilla pessimistic recall.

Figure A.9: All Firefox optimistic precision.

Figure A.10: All Firefox pessimistic precision.

Figure A.11: All Firefox optimistic recall.

Figure A.12: All Firefox pessimistic recall.

Appendix B

Name Mapping Method

In the validation of EEL we needed to determine the set of developers working on the system. At first, this step would seem to be simple: match the e-mail addresses used in the bug repository to the usernames from the source code repository. We needed to map e-mail addresses to usernames since the validation required that we compare information between two different systems with disjoint username schemes. Bugzilla requires users to use their e-mail address as their login name, and the source code repository has an independent set of usernames that are normally not based on e-mail addresses. Unfortunately, as described elsewhere [1], this mapping is non-trivial. To produce an initial mapping, we applied a longest substring matching algorithm to the usernames and e-mail addresses. After attempting to automatically determine the mapping, the entire list was inspected and completed by hand. This approach produced a unique one-to-one mapping for many usernames and e-mail addresses, but three situations remained: 1) no mapping could be determined, 2) a single username could map to multiple e-mail addresses (one-to-many) and 3) multiple usernames could map to one or more e-mail addresses (many-to-many).

The first case occurred only with a small subset of names. After some investigation, we determined that these users had not been active for at least three years, and that it was acceptable, for our situation, to discard them from the recommended list of experts. We discarded this information since we would be unable to validate the correctness of the recommendation if one of these users was suggested by EEL.

The second case occurs when users changed their e-mail address or used a different address based on the parameters of the bug report (i.e., the product). This case did not have to be handled specially since EEL does not use the e-mail addresses, but the usernames; multiple e-mail addresses therefore map to a single username.

The third case exists since both projects changed their CVS username scheme after the project was started. Eclipse was originally developed by IBM, and when this transition occurred, many of the developers continued to work on the product, but were given new usernames that followed the Eclipse standard. Mozilla decided to change the CVS usernames to be the e-mail address of the committer since many of the changes were submitted as patches and this was the easiest way to track who had made the change. To solve the problem of the many-to-many mapping, we chose one of the usernames to be the "master" username. This username is the one to which all of the e-mail addresses associated with that username are mapped. Since the other usernames still existed within the software repository, we created a second mapping that mapped a username to the "master" username if one existed. The master username is used as if it was a single username mapped to multiple e-mail addresses. In this way, we ensured that the information mined from the software repository was consistent with the information that we were using from the bug reports and that we did not lose any pertinent information. Table B.1 provides the statistics of the username to e-mail mappings for each project. Appendix C provides a detailed list of all of the mappings used for the validation of EEL for each of the projects.

Table B.1: E-mail to CVS username mapping statistics per project.

                                   Eclipse          Mozilla
1-1 Username to E-Mail Mappings    142 ( 82.1%)     493 ( 62.1%)
1-N Username to E-Mail Mappings      3 (  1.7%)     103 ( 13.0%)
N-M Username to E-Mail Mappings     18 ( 10.4%)     178 ( 22.4%)
Usernames With No Mappings          10 (  5.8%)      20 (  2.5%)
Total Number Usernames             173 (100.0%)     794 (100.0%)
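The automatic first pass and the "master" username lookup described in this appendix might be sketched as follows. The thesis does not give the algorithm's details, so the scoring by longest common substring against the e-mail local part, the minimum overlap of four characters, the function names and the sample identities are all assumptions for illustration. The proposals would still need to be inspected and completed by hand, as was done for the validation.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def propose_mapping(usernames, emails, min_overlap=4):
    """For each username, propose the e-mail whose local part overlaps most.

    Assumes `emails` is non-empty; weak matches are left as None for manual review.
    """
    proposals = {}
    for user in usernames:
        scored = [
            (longest_common_substring(user.lower(), email.split("@")[0].lower()), email)
            for email in emails
        ]
        score, email = max(scored)
        proposals[user] = email if score >= min_overlap else None
    return proposals


def resolve_master(username, duplicates):
    """Return the master username for a duplicate, or the username itself."""
    return duplicates.get(username, username)


# Hypothetical usernames and e-mail addresses.
emails = ["john_smith@example.org", "mlee@example.org"]
print(propose_mapping(["jsmith", "mlee2", "zzz"], emails))
print(resolve_master("jsmith_old", {"jsmith_old": "jsmith"}))
```

Leaving low-scoring usernames unmapped mirrors the first of the three remaining situations, where no mapping could be determined automatically.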

Appendix C

CVS to Bugzilla Name Mappings

C.1 Eclipse

The following tables provide a detailed listing of the name mappings that were used in the validation of EEL for Eclipse. Table C.1 contains all of the unique one-to-one mappings of username to e-mail address. Table C.2 contains the mappings where a single username mapped to multiple e-mail addresses, the duplicate usernames, as well as the usernames that we were unable to map.

Table C.1: Eclipse one-to-one username to e-mail mappings.

One-to-one Mappings
akiezun=>[email protected]
deboer=>[email protected] .com
johna=>John [email protected]
nick=>[email protected]
toiTes=>alexandre. [email protected]
dejan=>dejan@ca. ibm.com
jolmw=>.Iolm_\Viegand@us .ibm.com
obcsedin=>[email protected]
aweiuand=>[email protected]
jdeupree=>[email protected]
kmaetzel=>kai-uwe_maetzel@ch. i bm. com
othoniaim=>[email protected]
acovas=>[email protected]
dbaeumer=>dirk [email protected]
gkarasiu=>[email protected]
prapicau=>[email protected]
anicfer=>[email protected]
dj=>dj [email protected]
karice=>[email protected]
pdubroy=>[email protected] .com
ahunter=>[email protected]
eidsness=>[email protected]
kcornell=>[email protected]
pmulet=>[email protected]
yamamikn=>[email protected]
]lioiuher=>[email protected]
kdkelley=>[email protected]
pwcbster=>pwebster@ca. ibm.com
bbaumaii=>[email protected]
jszm"szewsld=>tx;lipse@szui'Szcwski.com
dkclm=>[email protected]
rdiaves=>[email protected]
bbawngmi=>[email protected]
tcicher=>[email protected]
kjolmson=>kent_jolmson@ca. ibm.com
randyg=>R andy _G iffen@oti .com
dbirsan=>[email protected]
eduardo=>eduardo_per [email protected]
kevinh=>[email protected]
rpcretti=>[email protected]
bbokowski=>Boris [email protected]
emoffatt=>emofEatt@ca. ibm.com
keviniii=>[email protected]
semion=>[email protected]
briany=>[email protected]
egamma=>erich_gamma@ch. ibm.com
khalsted=>khalsted@ca. ibm.com
silenio=>Silenio_Quart [email protected]
bshingar=>[email protected]
bfam=>[email protected]
khorne=>[email protected]
skaegi=>[email protected]
sarsenau=>[email protected]
btripkovic=>[email protected]
flieidric=>feli pe_hei dri [email protected]. com
kmoir=>[email protected]
sdimitro=>so ni [email protected]
carolyn=>[email protected]
ffusier=>[email protected]
kradlofF=>[email protected]
steve=>steve [email protected]
cmckillop=>[email protected]
gheorghe=>gheoi"[email protected]
kkolosow=>[email protected]
ssarkar=>[email protected]
celek=>[email protected]
ginendcl=>gmendel@us .ibm.com
krbarnes=>[email protected]
sfranklin=>sus an [email protected]
cgoldthor=>ego [email protected]
ggayed=>[email protected]
airvine=>lenar [email protected]
sxenos=>sxenos@gmail .com
schan=>cl [email protected]
greg=>greg_ad [email protected]
lkcm m el=>[email protected]
tmaeder=>[email protected]
cwong=>chcrie. wong@gmail .com
giumar=>guniiar@wagei iknedit.org
lparsons=>[email protected]
thanson=>[email protected]
cmarti=>du'[email protected]
liar grave=>[email protected]
lkues=>[email protected]
tbay=>[email protected]
ccoriiu=>christoplie_coniu@ca. ibni.com
ikhelifi=>[email protected]
inhuebscher=>[email protected]
tellison=>[email protected]
cknaus=>Claude [email protected]
jlebrun=>Jacques Jebnm@oti .com
mkeller=>[email protected]
twatson=>tjwat.so [email protected]
cmclarcn=>[email protected]
jam es=>[email protected]
maesclilimaim=>[email protected]
twidmer=>tobias_widmer@cli .ibm .com
curtispd=>[email protected]
janeklb=>[email protected]
mhatem=>Matthew [email protected]
tod=>Tod [email protected]
dmegert=>daiiieljnegert@ch. ibm.com
j aburns=>jared _bur ns@us. ibm. com
mdaniel=>[email protected]
tyeimg=>tyeung@bea. com
dswaiLSon=>[email protected]
jbunis=>jar [email protected]
in elder=>[email protected] .com
veronilca=>[email protected]
dwright.=>Darm_Wright@ca. ibm.com
jlemieux=>jean-michol [email protected]
mfaraj=>[email protected]
vlad=>[email protected]
daved=>[email protected]
jed=>j ed. ander son@genui tec .com
mremiie=>[email protected]
wmelhem=>[email protected]
dorme=>daveo@asc-iseri es.com
jbrown=>[email protected]
nival ent.a=>[email protected]
droy=>[email protected]
daudel=>david_audel@fr. ibm.com
jefE=>[email protected]
mcq=>Mike_Wilso [email protected]
wharley=>[email protected]
dspriiiggay=>david_springgay@ca. ibm.com
jlaimeluc=>Jerome [email protected]
mvoroiiiii=>Mikliail. [email protected]
windiest=>[email protected]
davids=>davi([email protected]
jfogell=>jfogeLl@us .ibm.com
mkaufman=>[email protected]
drobeits=>[email protected]
jgarms=>[email protected]
mpawlowsk=>[email protected]
dwilsoii=>[email protected]
jdesrivieres=>[email protected]
mvanmeek=>mvm@ca. ibm.com

Table C.2: Eclipse one-to-many, duplicate and unknown username to e-mail mappings.

One-to-many Mappings
bbiggs=>[email protected]
bbiggs=>billy, [email protected]
dpollock=>[email protected]
dpollock=>[email protected]
dpollock=>[email protected]
ebb=>[email protected]
ebb=>[email protected]

Duplicate Username Mappings
bbokowsk=>bbokowski
chrix=>ccornu
darins=>dswanson
darin=>dwright
droberts2=>droberts
jszursze=>jszurszewski
erich=>egamma
jeem=>jdesrivieres
kent=>kjohnson
knutr=>kradloff
lynne=>lkues
maeschli=>maeschlimann
oliviert=>othomann
rodrigo=>rperetti
ssq=>silenio
sdimitro2=>sdimitro
ptobias=>twidmer
jeromel=>jlanneluc

Unknown Usernames
fbelling
lchui
malice
ptff
seven
wadman
wchoi
wmtest
Jonathan
oconstan

C.2 Mozilla

The following tables provide a detailed listing of the name mappings that were used in the validation of EEL for Mozilla. Tables C.3, C.4, C.5, C.6 and C.7 contain all of the unique one-to-one mappings of username to e-mail address. Tables C.8, C.9 and C.10 contain the mappings where a single username mapped to multiple e-mail addresses. Tables C.11 and C.12 contain the duplicate usernames. Finally, Table C.13 has all of the usernames that we were unable to map.

Table C.3: Mozilla one-to-one username to e-mail mappings.

One-to-one Mappings aaronleventhal%moonset.net=>aaron [email protected] benjamin%smedbergs.us=>benjamin@smed bergs, us chanial%noos.fr—>[email protected] aaronr%us.ibm. com=>aaronr@us. ibm.com bent.mozilla%gmail.com=>[email protected] chjung%netscape.com=>[email protected] ajsclHilt%verizori.net=>[email protected] bernd.mielke%snafu.de=> [email protected] chouck=>[email protected] akhii.arora%sun.com=>[email protected] bliartOO%yahoo.com=> [email protected] chrisk%netscape.com=> [email protected] akkana%netscape.com=> [email protected] bishakhabanerjee% net scape, com=> [email protected] christophe. ravel.bugs%sun.com=>christophe.ravel. [email protected] akkziIla%sliallowsky.coni=>[email protected] bjorn%netscape.com=> [email protected] diuan g%n etscape.com=> ch uan g@n etscape. com alex%croczil!a.com=>alex@cro czilla.com BlakeR.1234%aol.com=>BlakeR [email protected] chuckb%netscape.com=>[email protected] atftx.fritze%croc'odilf>dips.(:oni = >atcx.iritze@crocodi]i>dips.corn blakeross%telocity.com = >[email protected] clt.bid%netscape.com=>cl [email protected] alexci. volkov.bugs%smi.com = >a [email protected] blythe%netscape.com—>bl [email protected] cl u %11etscape.com—> [email protected] a lexsavulov%netscape.com=>a.l ex [email protected] bmlk%gmx.de—>[email protected] cm p%mozilla.org—>cm [email protected] alfred.peng%sun.com=>al tied, [email protected] lmesse%netscape.com=> [email protected] colin%theblakes.com=>[email protected] aUa%lysator.tiu.se=>[email protected] bolian. yin%sun. com — > [email protected] colinp%oeone.com=>[email protected] allan%beauibi.r.dk=> [email protected] braddr%puremagic.com=>[email protected] conrad%ingress.com—>conrad@i ngress.com amardare%qnx.com—>[email protected] Bradley.Iunk%cinci.rr.com=> Bradley.! [email protected] coop%netscape.com—>[email protected] amasri%netscape. 
com=> [email protected] brate.l%lysator.liu.se=> [email protected] cotter%netscape.com=>[email protected] amiisil%netscape.com=>[email protected] brettw%gmail.com=>[email protected] cpeyer%adobe. com=> cpeyer@adobe. com anatoliya%netscape.com = >[email protected] briane%qnx.com—>[email protected] cvsliook%sicking.cc—>[email protected] andreww%netscape.com=>andrew\[email protected] bruce%citbik.org—>[email protected] d ac%x. cx=> dac@x. cx ann.adamcik%sun.com=>ann [email protected] bruce%cybersight.com=>[email protected] Dale.Stansberry%Nex warecorp.com=> [email protected] annie.su llivan%gmail.com = >an [email protected] bryce-mozilla%nextbus.com=>[email protected] daniel%glazman.org— >[email protected] anton.bobrov%>sun. com=>[email protected] bsharma%netscape. com=> [email protected] daniel.boelzle%sun.com=>[email protected] antonio.xu%sun.com=>[email protected] bsmed berg%covad.net—>bsmed [email protected] daumling%adobe.com—>[email protected] arielb%nets(:ape.com=>a [email protected] bugreport%peshkin.net—> [email protected] dave%intrec.com—> [email protected] arielb%rice.edu=>[email protected] bugzilla%arlen.demon.co.uk=> [email protected] davel%mozil!a.com — >davel@ mozilla. com arik%netscape.com = >[email protected] bugzilla%babylonsounds.com=>bugzilla@baby lonsounds.com davidm%netscape.com=> [email protected] aroiigthopher7olizardland.net.= >[email protected] bugzilla%glob.com.au=>[email protected] davidnic%netscape.com=>davi([email protected] arvid%quadrone.org=>[email protected] bugzilla%micropipes.com=>[email protected] d bl 8x%yahoo. com—> d b4 8x@yahoo. 
com ashishbhatt%netscape.com=> ash [email protected] bugzilla%standard8.demon.co.uk=> [email protected] dbragg%netscape.com—>[email protected] ashuk%eng.sun.com=>as [email protected] burnus%gmx.de—>[email protected] d con e%netscape.com—> dco ne@ netscape, com atremon%elansoftware.com=>atremon@elanso ftware.com buster%netscape.com=>[email protected] ddrinan%netsc ape.com—> [email protected] attinasi%netscape.com=>[email protected] bzbarsky%mit.edu—>bzbarsky@m it.edu dean_tessman%hotmaiI.com=>dean [email protected] av%netscape.cbm—> [email protected] carl. wong%intel.com—>carl. [email protected] depstein%netscape.com=>[email protected] axel%pike.org—>[email protected] cata%netscape.corn—>[email protected] despotdaemon%netscape. com=>despo [email protected] badami%netscape.com=> [email protected] cat.h leen %n'etscape. co m=> cath leen ©netscape .com dfm%netscape.com=>dfm@netscape,com barnboy%trilobyte.net=> [email protected] cavin%netscape.com—>cavin@net scape.com dianesun%netscape.com=>dtan [email protected] bear%code-bear.com=> [email protected] chak%netscape.com=>[email protected] dietrich%mozilla.com = >[email protected]

Table C.4: Mozilla one-to-one username to e-mail mappings continued.

One-to-one Mappings Continued din1ator%11etscape.com — >[email protected] gayatrib%netscape.com = >gayat [email protected] jat%princetoii.edu=>[email protected] ding] is96qnx.com => [email protected] gbeasley%ne (scape, com—> [email protected] javi%11etscape.com=>[email protected] disttsc%bart.nl = >[email protected] german%netseape.com=>[email protected] * jay%mozilla.org=>[email protected] djani%netsca pe.com—>djani@netscape. com ghen dricks%no veil, com—> [email protected] jay. yan%sun.com=>jay. [email protected] dkl%redhat.com=>[email protected] gi jskru i tbosch %gm ail. com—>gij [email protected] j ban d %netscape. com=>j band@ netscape, com don%netscape.com=> [email protected] lbert.fang%sun.com—> gilbert, [email protected] j betak%11etscape.com=> [email protected] donglas%stcbiIa.ca=>douglas@stebi!a.ca nn.chen%sun.com — >[email protected] jcgriggs%sympatico.ca=>[email protected] dp%net scape.com—>[email protected] rish. manwani%eng.sun. com—>girish. [email protected] jdunn%netscape.com=>[email protected] dprice%netscape.com=> [email protected] glazman%netscape.com—> [email protected] jeff.dyer%compi]ercompany.com=>je [email protected] dr%netscape.com=>dr@net scape.com glen. beas!ey%suii.com=>[email protected] jeff.liedlund%matnxsi.com=>jefT. [email protected] drapeau%cng.sun.com—> [email protected] grail %cafebabe.org=> [email protected] jefft%iietscape.com=>[email protected] drichuis%playbeing.org—> [email protected] guha%netscape.com=>[email protected] jelwell%netscape.com=>jel [email protected] dschaffc%adobe.com—> [email protected] guru%startrek.com=>guru@st artrek.com Jerry.Kirk%Nexwarecorp.com=> [email protected] dsirnapalli%netscape.com=>dsirnapalli@nctsca pe.com hangas%netscape.com=> [email protected] jerry.tan%sun.com=>jerry. [email protected] d uncan %be. com=> d unc an@be. com hardts%netscape.com=>[email protected] jeveri ng%netscape .com=> je ver i iig@netscape. coin dwitt.e%staiiford.edu—>[email protected] hari shd%netscape. 
com—> [email protected] jfrancis%netscape.com=>[email protected] ebina%netscape.com=>[email protected] harrison% netscape, com—> [email protected] jgau nt%netscape. com=> jgau 11 L@netscape. com edburns%acm.org—>[email protected] henry .jia%sun.com—> [email protected] jgellman%netscape.com=>[email protected] eddyk%netscape.com=>[email protected] he witt% netscape, com—> [email protected] jglick%11etscape.com=>jgli [email protected] edwi n %woudt.nl=>ed [email protected] hidday%geocities.com—> [email protected] j i m _n ance%yahoo. com=> j i m _nance@yahoo .com emaijala%kolumbus.fi=>[email protected] hoa.nguyen%intel.com=>[email protected] jj %netscape. com=> jj @netsc ape. com endico%mozil]a.org=>[email protected] hpradhan%hotpop.com=> [email protected] jminta%gmail.com=>[email protected] enndeakin%sympatico.ca=>[email protected] hshaw%netscape.com—> [email protected] jocuri%softhome.net=>[email protected] ere%atp.fi=>[email protected] ian%hixie.ch—>[email protected] joe%retrovirus.com=>[email protected] ericb%neop]aiiet.(:om=>ericb@iieop!aiiet.com ian.mcgreer%sun.com=>[email protected] John.marmion%iireland.sun.com=>johii.in arm [email protected] etsai%netscape.com=>[email protected] idk%eng.sun. com —>[email protected] jol1ng%c0rel.com=> [email protected] e vaughan% netscape, com- > [email protected] igor%mir"2.org—> [email protected] joki %11etscape.com=> [email protected] eyork%netscape.com=>[email protected] igor .bu kanov%gm ai 1. com—> igor. bukanov @gm ai 1. com jonas.utterstrom%vit.tran.iiorniod.sc=>[email protected] fcrgus%!iet.scape.com=> [email protected] imajes%php.net—>[email protected] joshmoz%gmail.com=> [email protected] ttamingice%sourmilk.net=> [email protected] inaky.gonzalez%intel.com=> [email protected] joshua.xia%sun.com=>[email protected] fligtar%gmail.com—> [email protected] jab%atdot.org—> [email protected] jouni%heikniemi.net.=>[email protected] frankm%eng.sun.com=>[email protected] jag%tty. nl=> jag@tty. nl . 
jpierrc%inetscape.com =>[email protected] ftang%netscape.(;orn=>[email protected] jaggernaut%netscape.corn=>[email protected] jsun%netscape.com=>[email protected] gngan%11etscape.com—>gaga [email protected] Jan.Varga7ogmail.com—> Jan. [email protected] jtaylor%netscape.com=>j [email protected] garyf%netscape.coni=>[email protected] janc%11etscape.com—>[email protected] julien.pierre.bngs%sun.com=>[email protected] ga ry wade%11et.se: ape. com—>ga [email protected] jar%netscape.com —>[email protected] jwaa%jwatt.org=>jwat,t@j watt.org

Table C.5: Mozilla one-to-one username to e-mail mappings continued.

One-to-one Mappings Continued kairo%kairo.at=>[email protected] mark%moxienet.com = > [email protected] morse%netscape.com—>[email protected] kandrot%netscape. com = > [email protected] mark.lin%eng.sun.com=>[email protected] mostafah%oeone.com=>[email protected] karnaze%netscape.roni=>[email protected] markh%activestate.com=>[email protected] mozedi tor%floppy m oose .com=> m ozedi tor@floppy moose, com katakai%japaii.sim.com=>[email protected] marria%gm ail. com=>marria@gm ail.com mozil!a%justcameron.com—>[email protected] kevm%perldap.org=>[email protected] martijn.martijn%gmail.com=>mart [email protected] mozil!a%wei lbacher.org—> [email protected] klianson%i)etscftpe.<:om=>[email protected] marti iii%11etscape.com=> [email protected] mozilla. mano%sent. com=> [email protected] kiko%async.com.br=> [email protected] masayuki%d-toy box.com=>[email protected] m r bkap%gni ail. com—> m rbkap@g m ai 1. com %11etscape kin%netscape.com==> [email protected] mats.palmgren%bredband.net—>[email protected] mscott .com—> [email protected] kipp%net scape. com=> [email protected] matt%netscape.com=> [email protected] nistoltz%netscape.com — > [email protected] kjh-57277ocomcast.net=>kj [email protected] in auhias%i sorted.org=>[email protected] msw%oginip.org=>[email protected] kmcc]usk%netscape.com=>[email protected] mattwiliis%gmail.com = >[email protected] mvl%exedo.nl—> [email protected] koehler%mythrium.com=>koe]iler@my thrium.com matty%chariot.net.au=>[email protected] mwelcli%netscape.com—> [email protected] kostello7oiietscape.com=>[email protected] maxf%magma.ca=>[email protected] myk%mozilla.org— >[email protected] kvisco%ziplink.net=> [email protected] mccabe7onetscape.com=> [email protected] namachi%netscape.com—>[email protected] kyle.yuan%sun.com=>[email protected] mcgreer%netscape.com=>[email protected] navi ng%netscape. com—>na [email protected] nbhatla7onetscape kysmith%netscape.com=> [email protected] mcmullen%netscape. 
com=>mcmullen@netsca pe.com .com— >[email protected] kzhou%netscape.com=> [email protected] m cs%pear Icrescent. com=> m cs@pearlc rescent .com nboyd%atg.com=>[email protected] !aa%sparc.spb.su=>[email protected] mgalli%geckonnection.com=>[email protected] neeti%net scape, [email protected] !aw%netscape.com=>[email protected] mgleesonl7onetscape.com=> [email protected] neil%pftrkwaycc.co.uk=>[email protected] ]eif%netscape.com=> [email protected] mbammond%skippinet.com.au=>[email protected] neil. wi!liams%sun.com—>neil. [email protected] s011%b0lyard !eila.garin%eng.sun.com=> [email protected] niicliael.buettner%suii.com=>[email protected] nel .com—>[email protected] nels011b%11etscape leon.sha%sun.com=> [email protected] Michael. Kedl%Nexwarecorp.com=>Michael. [email protected] .com=> [email protected] ]eon.zhang%sun.com=> [email protected] michael.lowe%ibigfoot.com = >michael. [email protected] nhotta%netscape.com—>[email protected] kreeger7opark load runner%betak.net=> [email protected] michaelp%iietscape.com=>[email protected] nick. .edu=>[email protected] 11icolso11 locka%iol.ie—>[email protected] mike%neoplanet.com=> [email protected] nicolson%netscape.com — > @netscape.com longsonr%gmail.com=>[email protected] mike. morgan%oregonstatc.edu—> [email protected] nis%sparc.spb.su—>[email protected] lordpixel%mac.com=>[email protected] mike-Hnozilla7oineer.net=>mike-Hn [email protected] nislieeth%netscape.com—>[email protected] louie.zhao%sun. com=>louie. [email protected] mikep%oeone.com=>[email protected] nivedita%netscape.com—> [email protected] !ouis.martin%eng.sun.com=>[email protected] miodrag%netscape.coni=>[email protected] nki!ider%redhat.com^> [email protected] 11k !pham%netscape.com=> [email protected] mitchf%iietscape.com=>[email protected] wan % red hat. com—> n k wan@red hat. 
com lpso!it%gmail.com=> [email protected] mitesh%netscape.com=>[email protected] noriri ty%jcom.home.ne.jp—>[email protected] ltabb%netscape.coin=> [email protected] mj%digicool. com=> [email protected] norris%netscape.com—> [email protected] m_kato%ga2.so-net.ne.jp=>m_kato@ga'2.so-net. ne.jp mjudge%netscape.com=>mjudge@!ietscape.com oeschger%netscape.com—> [email protected] maolson7teart!ilink.iiet=>[email protected] m kaply%us.i bm .com=> m [email protected] .com olav%bkor.dhs.org=>[email protected] .fi marco%gnome.org=>[email protected] mnyromyr%tprac.de=>[email protected] 011i.Pettay%helsinki.fi->011i.Pettay@helsinki margaret.clian%sun.com=>[email protected] modgock%eng.sun.com=>[email protected] p_ch% verizon. net=> [email protected] Table C.6: Mozilla one-to-one username to e-mail mappings continued.

One-to-one Mappings Continued pamg.bugs%gmail.com=>[email protected] real peterv%mac.com=> [email protected] sar%netscape.com = >[email protected] paper%animecity.nu= >[email protected] redfive%acm.org—>[email protected] saul.edwards%sun.com=>saul.ed [email protected] pave!%gi ngerall.cz—> [email protected] reed % reedloden. com=> reed@ reed loden .com sayrer%gmail.com=>sayrer@gmai!.com pchen%netscape.com=> [email protected] relyea %net scape .com=> relyca@netscape. com scooterm6rris%comcast.net=>[email protected] pedemont%ils.ibm.co in — >[email protected] repka%netscape.com—> [email protected] scot t%scott^macgregor.org=> [email protected] pepper%inetscape.com—> [email protected] rgoodger%ihug.co.nz—>[email protected] scullin%netscape.com=>sculli [email protected] petc%a1phanumerica.com ==>pete@alphanumerica. com rhelmer%mozilla.com—> [email protected] sdagley%netscape.com=>[email protected] pete%imozdevgroup.com—>[email protected] rhp%net scape.com=> [email protected] sdv7osparc.spb.su—>sdv@sparc. spb.su pete.zha%sun.com—> [email protected] rich.burridge%suii.com=>[email protected] seaii7obeatnik.com=>[email protected] peter%propagandism.org=> [email protected] richm%stanfordalumni.org=>[email protected] serge%iietscape.com=>[email protected] Peter. VanderBeken%pandora.be—>Peter. [email protected] ri<:kg%netscape.com—>[email protected] sergei _d7ofi. tar tu.ee=> sergei _d@fi. tartu.ee risto%netscape.com=>[email protected] seth7ocs.brandeis.edu=>seth@cs. 
brandeis.edu peterlubczynski%11etscape.com—>peterlubc7Jynski@net scape.com peti tta%iiietscape.com—> [email protected] rje%netscape.com=> [email protected] sford3%swbell.net=>[email protected] pliil7oiiet.s(:ape.com=>[email protected] rjesup7owgate.com=> [email protected] sgehani7onetscape.com=>[email protected] phi!lip%inetscape.com=>philli [email protected] rko7otietscape.com—> [email protected] shanjiaii%netscape.com=>[email protected] phitringnalda%gmail.com—>philri [email protected] rlk%trfenv.com=>[email protected] shannond%netscape.com=>[email protected] pierre%11etscape.com—>pierre@netscape. com rob_strong%exchangecode.com=>robj;[email protected] sharparrow 1 %yahoo.com=>sharparrow 1 @yahoo.com pkasthig%google.com~>[email protected] robert%accettura.com—> [email protected] shawnp%earthling.net=>[email protected] pkw%us.ibm.com—>[email protected] robin.lu%sun.com=>[email protected] sherry.shen%siin.com=>[email protected] pnunn7onetscape.com=>[email protected] robinf%netscape.com=>[email protected] shliaJig7onetscape.com=>[email protected] poll mann%netscape.com=> [email protected] robodan%netscape. com=> [email protected] shrutiv%netscape.com=>shruti [email protected] pp%ludusdesign.com=>[email protected] rods7onetscape.com—> [email protected] silver%warwickcompsoc.co.uk=>sil [email protected] prasad%netscape.com—> [email protected] roeber%11etscape.com—>[email protected] simford.dong7csun.com=>[email protected] prass7oiietscape.com=>[email protected] rogc%netscape.com—>[email protected] simon%softel.co.il—>[email protected] psychoticwolf%carolina.rr.com—>psychot.icwolf@carolina. rr.com roger!%netscape.com=>[email protected] s!avomir.katuscak%sun.com=>slavom [email protected] puLterman%11etscape.com—>[email protected] rpallath%eng.sun.com=>[email protected] sman%netscape.com=>[email protected] quy7oigelaiis.com.au=>quy@igelaus. 
com.au rpotts%net scape.com—> [email protected] smeredith%netscape.com=>smeredi [email protected] racliam%nelscape.com—> [email protected] rrelyea7oredhat.com—>[email protected] smfr%smfr.org=>[email protected] radl1a%11etscape.com—>[email protected] - rth7ocygnus.com—> [email protected] sonja.mirtitsch%sun.com=>[email protected] raman%netscape.com—> [email protected] ruslan%netscape. com—> [email protected] sonmi%netscape.com=>[email protected] rangansen%netscape.com—> rangansen@net scape.com rusty.lyncb%intel.com=>rusty [email protected] spence%iietscape.com=>[email protected] rayw%]]iel.scape.com—> [email protected] r welt man %netscape. com — > [email protected] spider7onetscape.com=>[email protected] rbs%mal.hs. uq.edu.au=>rbs@maths. uq.edu.au saari %netsca.pc.com—>saari@net scape.com srilatha7oiietscape.com=>[email protected] rcassin%supernova..org=> [email protected] samuel7osieb.net—>[email protected] , srinivas%netscape.com=>srini [email protected] rchen%netscape.com—> [email protected] sa11cus%0ff.net—>sancus@ofT. net ssaux7onetscape.com=>ss [email protected] rdayal%11etscape.com—> [email protected] san deep, kon chady%sun. com=> sail deep, [email protected] ssu%netscape.com—>[email protected] Table C.7: Mozilla one-to-one username to e-mail mappings continued.

One-to-one Mappings Continued Stefan. Borggraefe%igmx.de=>Stefan. [email protected] valeski%11etscape.com—> [email protected] steffen.wilberg%web.de=> [email protected] var ada%o net scape, com— >var [email protected] stejohns%adobe.com—> [email protected] vidur%onetscape.com—>[email protected] stridey%gmai].com—> [email protected] vishy%onetscape.com=> [email protected] stuart.morgan%a1umni.case.edu—> smart, [email protected] vladd%b11gzilla.org—> [email protected] sudu%netscape.com=>[email protected] vladimir%pobox.com—> [email protected] suresh%netscape. com=> [email protected] wade%ezri .org= > wade@enri .org svn%xmlterm.org=>[email protected] waldem ar%oiietscape. com—> [email protected] waqar%netscape.com—> [email protected] syd%netscape.com=>[email protected] warren%11etscape.com—> [email protected] szegedia%freemail.hu=>[email protected] waterson%>netscape.com—> [email protected] t_mutreja%yahoo.com=> [email protected] wchangO222%0aol.com—>[email protected] • taek%netscape.com=>[email protected] wclouser%mozilIa. com—> [email protected] tague%netscape.com—>tague@net scape, com webmail%kmgerich.com—> [email protected] tajima%eng.sun.com=> [email protected] wr %rosenauer .org—> wr @rosenauer. org taka%netscape.com—>[email protected] wsharp%adobe.com—> [email protected] talisman%oanamorphic.com—> [email protected] vvtc%i netscape, com—> [email protected] tao%netscape.com=> [email protected] wtchang%oredhat.com—> [email protected] tara%tequilarista.org=> [email protected] wurblzap%gmail.com=> [email protected] tbogard%aol.net—> [email protected] Xiaobin.Lu%e11g.Sun.com—>[email protected] technutz%11etseape.net—> [email protected] yokoyama%onetscape.com—>yo [email protected] ted. mielczarek%ogm ail.com—>ted. 
[email protected] yueheng.xu%ointel.com=>[email protected] tfox%o net scape, com—>[email protected] yxia%011etscape.com—>[email protected] thayes%netscape.com—> [email protected] zach%zachlipton.com—>[email protected] thesteve%11etscape.com=> [email protected] zack%kde.org—>[email protected] thorn as. beni sch%psun. com—> [email protected] timm%netscape.coni—>[email protected] tingley%sundell.net—> [email protected] tomw%netscape.com=>tomw@net scape.com tony%oponderer.org—> [email protected] tonyr%fbdesigns.com — > tony [email protected] tor%cs.brown.edu=>[email protected] tmvi s%osedsy stem s.ca—>tra.vis@sedsy stems, ca troy%11etscape.com=>[email protected]. coin twalker%netscape.com—> [email protected] uriber%gm ail.com—> [email protected] val4%cornell.edu—> [email protected] Table C.8: Mozilla one-to-many username to e-mail mappings.

One-to-many Mappings a&ronl%netscape.com = >[email protected] bryner%netscape.com—> [email protected] dmose%mozilla.org=>dm [email protected] aaron]%i]etscape.eom = >aaroii]@netscape.eoni bryner%netscape.com=>[email protected] dmose%mozilla.org=>[email protected] alecf%netscape.com=>[email protected] bryner%11etscape.com—>[email protected] donm%netscape.com = >[email protected] alecf%netscape.com=> [email protected] bstei!%inetscape.com—>bs [email protected] donm%onetscape.com = >[email protected] andreas.otte%primn.s-onlme.de=>[email protected] bsteil%onetscape.com—>bstelt@netsca pe.com doronr%us. i bm .com => [email protected] andreas.otte%prinms-online.de=>andreas. [email protected] caillon%redhat.com—> [email protected] doronr%us. ibm.com = >[email protected] anthonyd%netscape.com=>[email protected] caillon%redhat.com—>[email protected] dougt%iiietscape.com=>[email protected] anthonyd%netseape.com=>[email protected] cailloii%redhat.com—>[email protected] dougt%oiietscape.com=>[email protected] cbegle%011etscape.com — >cbegle@formerly-netscape. com. 
tld asa%mozilla.org=>[email protected] ducarroz% netscape, com=> d ucarroz@du car roz .org cbegle%11etscape.com—>cbegle@nelsca pe.com asa%mozilla.org=>[email protected] ducarroz%netscape.com=>[email protected] cbiesisiger%web.de—>[email protected] bbaetz%acm.org—> [email protected] ducarroz%onetscape.com=>[email protected] cbiesmger%web.de—> [email protected] bbaeiz%acm.org—>[email protected] dveditz%iietscape.com=>[email protected] ccarlen%netscape.com=>[email protected] bbaetz%acm.org=>bbaef [email protected] dveditz%netscape.com=>dvedi [email protected] ccarlen%onetscape.com—>[email protected] bbaetz%ac in.org=> [email protected] erik%netscape.com=>[email protected] ecooper%netscape.com=>[email protected] beard %net.seape.coni=>[email protected] erik%netscape.com=>[email protected] ccooper%netscape.com—> [email protected] beard%netscape.com=> [email protected] erik%netscape.com=>[email protected] cmanske%netscape.com — >cmanske@j ivamedia.com ben%netscape.com—>[email protected] friedman%oiietscape.com = > [email protected] cmanske%onetscape.com=>[email protected] ben%netscape.com=>[email protected] friedman%inetscape.com=>fri [email protected] cst%andrew.emu.edu—> [email protected] ben%netscape.com=> [email protected] fur%onetscape.com=>[email protected] cst% and rew. cm u. edu=> cs t.@yecc. com fur%onetscape.com = > [email protected] ben%netscape.com=>[email protected] curt%11etscape.com—> cur [email protected] gandalf% firefox.pl=>[email protected] ben%netscape.com=>[email protected] curt%inetscape.com=>[email protected] gandal f% firefox.pl=>[email protected] bieiiveiiu%inetscape.com=>[email protected] cyeh%inetscape.com—>Chris. 
[email protected] bienvemi%iiietscape.com=>[email protected] cyeh%onetscape.com—>cyeh@bluem artini.com gavin%gavinsharp.com=>[email protected] blizzard%redhat.com=> [email protected] cyeh%netscape.com—> [email protected] gavin%ogavinsharp.com=>[email protected] blizzard%redhat.com=>[email protected] danm%011et.scape.com—>[email protected] gerv%gerv.net=>[email protected] bob%cbclary.com—>[email protected] danm%oiietscape.com=>[email protected] gerv%gerv.net=>[email protected] bob%ibc,lary.com—>[email protected] danm%iiietsca pe.com—>[email protected] gordon%netscape.com=>[email protected] brade%netscape.com=> [email protected] darin%111etscape.com—>[email protected] gordon%netscape.com=>[email protected] brade%11etscape.com=> [email protected] darin%oiietscape.com=>darin@m eer.net granrose%netscape.com=>[email protected] brendaii%omozilla.org=> [email protected] dariu%inetscape.com—>[email protected] granrose%netscape.com=> [email protected] brenda.ii%mozilla.org=> [email protected] dbaron%dbaron.org—>[email protected] heikki%netseape.com=>[email protected] briano%net scape, com=> [email protected] dbaron%dbaron.org—>[email protected] heikki%11etscape.com=> [email protected] briano%netscape.eom=>[email protected] dbaron%dbaron.org—>[email protected] hjtoi%ocomcast.net=>hj [email protected] brof!eld%jellycan.com=>[email protected] dbradley%netscape.com=>[email protected] hjtoi%comcast.net=>hj [email protected] brofield%jellycaii.com=>[email protected] dbradley%11etscape.com—>dbradley@nctscape. com hwaara%gmail.com=>liwaara@che!lo.se hwaara%gmail.com=> [email protected] Table C.9: Mozilla one-to-many username to e-mail mappings continued.

One-to-Many Mappings Continued hy«tt%mozilla.org=>[email protected] kestes%walrus. com->kest [email protected] peterv%11etscape.com—>[email protected] hyat.t%ruozilla.org=>[email protected] kestes7o walrus. eom=>kest [email protected] pet.erv%11etscape.com—> [email protected] jake%bugzillti.org=> [email protected] kestes% walrus. eom=>keslcs@walrus. com piiikerton7onetscape.com—>[email protected] jake%bugzilla.org=> [email protected] kestes% walrus. com=>kest [email protected] pinkerton %inotscape.com—>pinkerton@net scape.com jginyers7onetscape.com—>[email protected] kieran%et.enial.imdonet.eom==>kieran@etcrnal. undonet.com preed%imozilla.com=>[email protected] jginyers%netscape.com—> [email protected] kieran7oeternal.undonet.com = >[email protected] preed%mozilla.com=>[email protected] jkeiser%mct scape.com=>[email protected] kirke%netscape.com=> [email protected] preed%mozilla.com=>[email protected] jkeiser%net scape.com = >j [email protected] kirke7onetscape.com—> [email protected] pschwartau%netscape.com=>[email protected] jkeiser%net scape.com—>j oh [email protected] larryh7o netscape, com=> larry Jiowe@hgsi .com pschwartau%netscape.com=>[email protected] jmas%softcatala.org=>andrejohn. [email protected] larryh7onetscape.com=> [email protected] ramiro7onetscape.corn—>[email protected] leaf%mozilla.org=> [email protected] jmas%softcatala.org—>jmas@so ft.catala.org ramiro%netscape.com—> [email protected] leaf%mozilIa.org=> [email protected] joe.chou%sun.com=>[email protected] ramiro7oiietscape.com—> [email protected] mang%netscape.com=> [email protected] joe.chou%sun.com—> joe. [email protected] rgi nda%netscape. com—> [email protected] i mang7onetscape.com=>[email protected] . jrgm7onetscape.com—>j [email protected] rginda7onetscape.com=>[email protected] mbarnson7osisna.com=> [email protected] jrgm%net scape. com=>jrgmorri [email protected] rginda%netscape.com=>[email protected] mbarnson%sisna. 
com—> [email protected] jruderman%hmc.edu—>[email protected] rj.ke!ler7obeonex.com=>rj. [email protected] mcafee%net scape, com=> [email protected] jrudernian%hmc.edu=>jruderm [email protected] rj.keller7obeonex.com=>[email protected] nicafee%iietscape.com=>[email protected] jshin%mailaps.org=>[email protected] roc+%cs.cmu.edu=>[email protected] mconnor%steelgryphon.com=> [email protected] jshin%mailaps.nrg=>jshin] [email protected] roc+%cs.emu.edu=>roc-i-@cs. cmu.edu mconnor%steelgryphon.com=> [email protected] scc7omoziila.org=>[email protected] jst%mozilla.org=>[email protected] mcoiinor%steelgryphon.a;>m=> [email protected] scc7omozilla.org=>[email protected] jst7omozilla.org—>[email protected] mkanat7obugzilla.org=>[email protected] seawood7oiietscape.com —>[email protected] jst7omozilla.org=>[email protected] mkanat%bugzilla.org=> [email protected] jst7omozilla.org=> [email protected] mlm7onetscape.com=>mlm@aya. yale.edu seawood%netscape.com — > [email protected] jiistdave7obugzilla.org=>[email protected] m 1 m %netscape. com=> m 1 m ©netscape, corn selmer%11etscape.com —>[email protected] justdave7obugzilla.org—>[email protected] momoi%netscape.com=>momoi@a!umni.indiana.edu selmer%netscape.com=>[email protected] justdave7obugzilla.org—>justdave@syndicomm. com momoi7onetscape.com=> [email protected] sfraser7onetscape.com=>[email protected] jwalden7omit.edu—>[email protected] mozilla7ocolinogilvie.co.uk=>[email protected] sfraser7oiietscape.com=>[email protected] jwalden%init.edu=>[email protected] mozilla%colinogilvie.co.uk=>mozilla@co!inogi!vie.co.uk shaver%mozilla.org=>[email protected] jwz7oniozi!la.org=>[email protected] mozilla. BenB%bucksch.org=> ben. [email protected] shaver%mozilla.org=>[email protected] jwz7omozi!la.org—>jwz@netsca pe.com mozilla.BenB%bucksch.org=>[email protected] sicking7obigfoot.com=>[email protected] kaie%>netseape.com=>[email protected] mozilla. 
BenB%bucksch.org=>mozilla@bucksc!i.org sicking7obigfoot.com=>[email protected] kaie%nelseape.com=>[email protected] pavlov7onetscape.com=> [email protected] slamm=>[email protected] karl.kornel7omindspeed.com—>[email protected] pavlov7onetscape.com=> [email protected] slamm=>[email protected] karl.kornel7omindspeed.com=>[email protected] peterl%netscape.com—> [email protected] smontagu7onct.scape.com=>[email protected] kerz7onct.scape.com—>[email protected] peterl%netscape.com=> [email protected] smoiitagu7onetscape.eoin=>[email protected] kerz%netsco pe.com = > [email protected] Table C.10: Mozilla one-to-many username to e-mail mappings continued.

One-to-many Mappings Continued
sspitzer%mozilla.org=>[email protected]
sspitzer%mozilla.org=>sspi [email protected]
stephend%netscape.com=>[email protected]
stephend%netscape.com=>[email protected]
terry%mozilla.org=>[email protected]
terry%mozilla.org=>[email protected]
thom%netscape.com=>[email protected]
thom%netscape.com=>[email protected]
timeless%mozdev.org=>[email protected]
timeless%mozdev.org=>[email protected]
timeless%mozdev.org=>[email protected]
tomk%mitre.org=>[email protected]
tomk%mitre.org=>[email protected]
toshok%netscape.com=>[email protected]
toshok%netscape.com=>[email protected]
varga%netscape.com=>[email protected]
varga%netscape.com=>[email protected]
varga%netscape.com=>[email protected]
zuperdee%penguinpowered.com=>[email protected]
zuperdee%penguinpowered.com=>[email protected]

Table C.11: Mozilla Duplicate username mappings.

Duplicate Username Mappings

aaronl%chorus.net=>aaronl%netscape.com
akkana=>akkana%netscape.com
alecf%flett.org=>alecf%netscape.com
andreas.otte%debitel.net=>andreas.otte%primus-online.de
anthonyd=>anthonyd%netscape.com
asasaki%netscape.com=>asa%mozilla.org
bbaetz%cs.mcgill.ca=>bbaetz%acm.org
bbaetz%student.usyd.edu.au=>bbaetz%acm.org
bclary%bclary.com=>bob%bclary.com
beard=>beard%netscape.com
ben%bengoodger.com=>ben%netscape.com
beng%bengoodger.com=>ben%netscape.com
bienvenu%nventure.com=>bienvenu%netscape.com
bjorn=>bjorn%netscape.com
blizzard%appliedtheory.com=>blizzard%redhat.com
blythe=>blythe%netscape.com
brade=>brade%netscape.com
brade%comcast.net=>brade%netscape.com
brendan=>brendan%mozilla.org
brendan%netscape.com=>brendan%mozilla.org
briano=>briano%netscape.com
bryner%brianryner.com=>bryner%netscape.com
bryner%uiuc.edu=>bryner%netscape.com
bstell%ix.netcom.com=>bstell%netscape.com
buster=>buster%netscape.com
caillon%returnzero.com=>caillon%redhat.com
ccarlen%mac.com=>ccarlen%netscape.com
ccooper%deadsquid.com=>ccooper%netscape.com
Chris.Yeh%nokia.com=>cyeh%netscape.com
chuang=>chuang%netscape.com
cls%seawood.org=>seawood%netscape.com
cltbld=>cltbld%netscape.com
clu=>clu%netscape.com
cmanske=>cmanske%netscape.com
cmanske%jivamedia.com=>cmanske%netscape.com
cst%yecc.com=>cst%andrew.cmu.edu
curt%scruznet.com=>curt%netscape.com
cyeh=>cyeh%netscape.com
cyeh%bluemartini.com=>cyeh%netscape.com
danm=>danm%netscape.com
[email protected]=>danm%netscape.com
darin%meer.net=>darin%netscape.com
davidm=>davidm%netscape.com
dbaron%fas.harvard.edu=>dbaron%dbaron.org
dbragg=>dbragg%netscape.com
dcone=>dcone%netscape.com
despotdaemon=>despotdaemon%netscape.com
dfm=>dfm%netscape.com
dmose%netscape.com=>dmose%mozilla.org
don=>don%netscape.com
donm=>donm%netscape.com
donm%bluemartini.com=>donm%netscape.com
dougt%meer.net=>dougt%netscape.com
dveditz=>dveditz%netscape.com
dveditz%cruzio.com=>dveditz%netscape.com
erik%dasbistro.com=>erik%netscape.com
erik%vanderpoel.org=>erik%netscape.com
eyork=>eyork%netscape.com
friedman=>friedman%netscape.com
ftang=>ftang%netscape.com
fur=>fur%netscape.com
fur%fur%)netscape.com
gagan=>gagan%netscape.com
ghendricks=>ghendricks%novell.com
gordon=>gordon%netscape.com
guha=>guha%netscape.com
hardts=>hardts%netscape.com
heikki%citec.fi=>heikki%netscape.com
hshaw=>hshaw%netscape.com
hwaara%chello.se=>hwaara%gmail.com
hyatt=>hyatt%mozilla.org
hyatt%netscape.com=>hyatt%mozilla.org
jake%acutex.net=>jake%bugzilla.org
jevering=>jevering%netscape.com
jgellman=>jgellman%netscape.com
jgmyers%speakeasy.net=>jgmyers%netscape.com
jkeiser%iname.com=>jkeiser%netscape.com
joe.chou%eng.sun.com=>joe.chou%sun.com
john%johnkeiser.com=>jkeiser%netscape.com
johnkeis=>jkeiser%netscape.com
joki=>joki%netscape.com
jst%citec.fi=>jst%mozilla.org
jst%mozilla.jstenback.com=>jst%mozilla.org
jst%netscape.com=>jst%mozilla.org
justdave%syndicomm.com=>justdave%bugzilla.org
jwz=>jwz%mozilla.org
jwz%netscape.com=>jwz%mozilla.org
kaie%kuix.de=>kaie%netscape.com
karl%kornel.name=>karl.kornel%mindspeed.com
karnaze=>karnaze%netscape.com
kerz%mozillazine.org=>kerz%netscape.com
kestes%staff.mail.com=>kestes%walrus.com
kestes%tradinglinx.com=>kestes%walrus.com
kestesisme%yahoo.com=>kestes%walrus.com
kin=>kin%netscape.com
kipp=>kipp%netscape.com
kirk.erickson%isuti.com=>kirke%netscape.com
kmcclusk=>kmcclusk%netscape.com
kostello=>kostello%netscape.com
law=>law%netscape.com
leaf=>leaf%mozilla.org
leaf%netscape.com=>leaf%mozilla.org
leif=>leif%netscape.com
ltabb=>ltabb%netscape.com
mang%subcarrier.org=>mang%netscape.com
matt=>matt%netscape.com
mbarnson%excitehome.net=>mbarnson%sisna.com
mcafee=>mcafee%netscape.com

Table C.12: Mozilla Duplicate username mappings continued.

Duplicate Username Mappings Continued

mcafee%mocha.com=>mcafee%netscape.com
mccabe=>mccabe%netscape.com
mconnor%myrealbox.com=>mconnor%steelgryphon.com
michaelp=>michaelp%netscape.com
mjudge=>mjudge%netscape.com
mkanat%kerio.com=>mkanat%bugzilla.org
mlm=>mlm%netscape.com
morse=>morse%netscape.com
mozilla%ducarroz.org=>ducarroz%netscape.com
nisheeth=>nisheeth%netscape.com
norris=>norris%netscape.com
pavlov%pavlov.net=>pavlov%netscape.com
peterl=>peterl%netscape.com
peterv%propagandism.org=>peterv%netscape.com
pierre=>pierre%netscape.com
pinkerton=>pinkerton%netscape.com
pinkerton%aol.net=>pinkerton%netscape.com
pnunn=>pnunn%netscape.com
pollmann=>pollmann%netscape.com
preed%netscape.com=>preed%mozilla.com
preed%sigkill.com=>preed%mozilla.com
racham=>racham%netscape.com
radha=>radha%netscape.com
raman=>raman%netscape.com
ramiro=>ramiro%netscape.com
ramiro%eazel.com=>ramiro%netscape.com
ramiro%fateware.com=>ramiro%netscape.com
rginda%hacksrus.com=>rginda%netscape.com
rginda%ndcico.com=>rginda%netscape.com
rickg=>rickg%netscape.com
rjc=>rjc%netscape.com
robinf=>robinf%netscape.com
rods=>rods%netscape.com
rpotts=>rpotts%netscape.com
sar=>sar%netscape.com
scc=>scc%mozilla.org
scc%netscape.com=>scc%mozilla.org
scullin=>scullin%netscape.com
sdagley=>sdagley%netscape.com
selmer=>selmer%netscape.com
sfraser=>sfraser%netscape.com
shaver=>shaver%mozilla.org
shaver%netscape.com=>shaver%mozilla.org
slamm=>slamm%netscape.com
sman=>sman%netscape.com
smontagu%smontagu.org=>smontagu%netscape.com
spence=>spence%netscape.com
spider=>spider%netscape.com
srinivas=>srinivas%netscape.com
sspitzer%netscape.com=>sspitzer%mozilla.org
stephendonner%yahoo.com=>stephend%netscape.com
sudu=>sudu%netscape.com
tague=>tague%netscape.com
terry=>terry%mozilla.org
terry%netscape.com=>terry%mozilla.org
thom=>thom%netscape.com
timeless%mac.com=>timeless%mozdev.org
timm=>timm%netscape.com
toshok=>toshok%netscape.com
toshok%hungry.com=>toshok%netscape.com
troy=>troy%netscape.com
valeski=>valeski%netscape.com
varga%nixcorp.com=>varga%netscape.com
varga%utcru.sk=>varga%netscape.com
vidur=>vidur%netscape.com
waldemar=>waldemar%netscape.com
warren=>warren%netscape.com
waterson=>waterson%netscape.com
wtc=>wtc%netscape.com
zuperdee%yahoo.com=>zuperdee%penguinpowered.com

Table C.13: Mozilla unknown username mappings.

Unknown Usernames

uid401 uid402 uid408 uid502 uid504 uid623 uid815 root ta atotic cboatwri c clayton dario

djw jsw mervin montulli relliott ricardob henrit
