Network Analysis of the Evolution of an Open Source Development Community

Network analysis of the evolution of an open source development community by Zhaocheng Fan A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Applied Science in Technology Innovation Management Carleton University Ottawa, Canada, K1S 5B6 September 2013 © 2013 Zhaocheng Fan Abstract This research investigated the evolution of an open source development community by undertaking network analysis across time. Several open source communities have been previously studied using network analysis techniques, including Apache, Debian, Drupal, Python, and SourceForge. However, only static snapshots of the network were considered by previous research. In this research, we created a tool that can help researchers and practitioners to find out how the Eclipse development community dynamically evolved over time. The input dataset was collected from the Eclipse Foundation, and then the evolution of the Eclipse development community was visualized and analyzed by using different network analysis techniques. Six network analysis techniques were applied: (i) visualization, (ii) weight filtering, (iii) degree centrality, (iv) eigenvector centrality, (v) betweenness centrality, and (vi) closeness centrality. Results include the benefits of performing multiple techniques in combination, and the analysis of the evolution of an open source development community. i Acknowledgements Firstly, I thank my parents for their constant encouragement without which this research would not be possible. I take this opportunity to express my profound gratitude and deep regards to my guides Prof. Michael Weiss and Prof. Steven Muegge for their exemplary guidance, monitoring and constant encouragement throughout the course of this thesis. I also would like to thank Prof. Tony Bailetti and Prof. Mika Westerlund, who also gave me many valuable suggestions on my research. I want to thank Wayne Beaton in Eclipse Foundation. He provided the important input database that was used in this study. ii Table of Contents Abstract ................................................................................................................................ i Acknowledgements ............................................................................................................. ii List of Figures ..................................................................................................................... v List of Tables ..................................................................................................................... vi List of Appendices ............................................................................................................ vii Definitions of Key Terms ................................................................................................ viii Chapter 1 Introduction ................................................................................................... 1 1.1 Objective .............................................................................................................. 4 1.2 Deliverables .......................................................................................................... 4 1.3 Relevance ............................................................................................................. 5 1.4 Contributions ........................................................................................................ 5 1.5 Overview of method and results........................................................................... 6 1.6 Organization ......................................................................................................... 7 Chapter 2 Literature Review.......................................................................................... 9 2.1 Open Source Communities and Project Participation ........................................ 10 2.2 Structure of Open Source Development Communities ...................................... 11 2.3 Network Analysis of Open Source Communities .............................................. 12 2.4 Lessons Learned ................................................................................................. 17 Chapter 3 Research Design and Method ..................................................................... 19 3.1 Unit of Analysis ................................................................................................. 19 3.2 Study Period ....................................................................................................... 19 3.3 Method ............................................................................................................... 19 3.3.1 Define research objectives .......................................................................... 20 3.3.2 Data collection ............................................................................................ 21 3.3.3 Network analysis techniques selection ....................................................... 22 3.3.4 Network analysis framework ...................................................................... 25 3.3.5 Tools ........................................................................................................... 26 3.3.6 Network Analysis Approach ....................................................................... 28 3.3.7 Results Comparison .................................................................................... 28 Chapter 4 Results ......................................................................................................... 29 iii 4.1 Visualization....................................................................................................... 29 4.1.1 Project Development Network .................................................................... 32 4.1.2 Community Structure .................................................................................. 34 4.1.3 Community Evolution ................................................................................. 36 4.2 Weight Filtering and Most Active Committer ................................................... 45 4.2.1 Weight Filtering .......................................................................................... 46 4.2.2 Primary Committer ..................................................................................... 50 4.3 Degree Distribution and Eigenvector Centrality ................................................ 55 4.3.1 Company network ....................................................................................... 55 4.3.2 Degree Distribution ..................................................................................... 59 4.3.3 Eigenvector Centrality ................................................................................ 66 4.4 Betweenness Centrality and Closeness Centrality ............................................. 72 4.4.1 Betweenness Centrality ............................................................................... 72 4.4.2 Closeness Centrality.................................................................................... 78 Chapter 5 Discussions ................................................................................................. 84 5.1 Summary of results............................................................................................. 84 5.2 Answers to the Research Questions ................................................................... 88 5.3 Contributions ...................................................................................................... 92 5.3.1 Methodological contributions ..................................................................... 93 5.3.2 Practical contributions ................................................................................ 96 5.4 Reliability and Validity ...................................................................................... 98 Chapter 6 Conclusions ............................................................................................... 101 6.1 Limitation ......................................................................................................... 103 6.2 Opportunities for future research ..................................................................... 105 References ....................................................................................................................... 107 Appendix ......................................................................................................................... 116 Appendix 1 Eclipse project network (2001-2012) ...................................................... 116 Appendix 2 Eclipse company network (2001-2012)................................................... 123 Appendix 3 Company network converter (A piece of code) ...................................... 129 iv List of Figures Figure 3.1 Illustration of the Data extraction .................................................................... 22 Figure 3.2 Study framework ............................................................................................. 26 Figure 3.3 Company network converter ........................................................................... 27 Figure 4.1 The overall Eclipse development network (2001-2012) ................................. 33 Figure 4.2 Connection between two local communities ................................................... 35 Figure 4.3 Eclipse project

Network Analysis of the Evolution of an Open Source Development Community

DICE Framework – Initial Version

D2.1 State-Of-The-Art, Detailed Use Case Definitions and User Requirements

Eclipse Ganymede at a Glance 10/13/09 9:39 AM

BIRT) Mastering BIRT

Flexibility at the Roots of Eclipse

Annex 1 Technology Architecture 1 Source Layer

D2.1 OPENREQ Approach for Analytics & Requirements Intelligence

Introduction to the BIRT Project | © 2007 by Actuate; Made Available Under the EPL V1.0 Business Intelligence and Reporting Primer

Eclipse Members Meeting, December 16, 2004, Teleconference

COVID-WAREHOUSE: a Data Warehouse of Italian COVID-19, Pollution, and Climate Data

Using Actuate BIRT Designer Professional Information in This Document Is Subject to Change Without Notice

Installing and Deploying Eclipse BIRT Information in This Document Is Subject to Change Without Notice