Relationship between Geographical Location and Evaluation of Developer Contributions in GitHub

Ayushi Rastogi, University of California, Irvine, [email protected]
Nachiappan Nagappan, Microsoft Research, Redmond, [email protected]
Georgios Gousios, TU Delft, Netherlands, [email protected]
André van der Hoek, University of California, Irvine, [email protected]

ABSTRACT
Background: Open source software projects show gender bias, suggesting that other demographic characteristics of developers, like geographical location, can negatively influence the evaluation of contributions too. Aim: This study contributes to this emerging body of knowledge by presenting a quantitative analysis of the relationship between the geographical location of developers and the evaluation of their contributions on GitHub. Method: We present an analysis of 70,000+ pull requests selected from the 17 most actively participating countries to model the relationship between the geographical location of developers and the pull request acceptance decision. Results and Conclusion: We observed structural differences in pull request acceptance rates across the 17 countries. Countries with no apparent similarities, such as Switzerland and Japan, had among the highest pull request acceptance rates, while countries like China and Germany had among the lowest. Notably, higher acceptance rates were observed for all but one country when pull requests were evaluated by developers from the same country.

CCS CONCEPTS
• Software and its engineering → Open source model; Programming teams; • Human-centered computing → Empirical studies in collaborative and social computing.

KEYWORDS
Open source, geographical location, pull requests, GitHub.

ACM Reference Format:
Ayushi Rastogi, Nachiappan Nagappan, Georgios Gousios, and André van der Hoek. 2018. Relationship between Geographical Location and Evaluation of Developer Contributions in GitHub. In ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (ESEM '18), October 11–12, 2018, Oulu, Finland. ACM, New York, NY, USA, 8 pages. https://doi.org/10.475/123_4

1 INTRODUCTION
Open source software (OSS) development has always been envisioned as a merit-based model [24]. This gave rise to the term 'code is king' [11][24], indicating a belief that the quality of the code being contributed should be the sole factor in determining whether or not the code is accepted to be included in the primary line of development.

Various studies, however, have shown that in addition to technical factors relating to code quality, social factors influence acceptance or rejection decisions. For instance, social closeness between submitter and integrator, as built through prior interactions, can positively influence acceptance [27]. As another example, status in the community increases acceptance [27], while the size of the contribution decreases acceptance [15].

Only recently, a demographic attribute of contributors, gender, was found to influence the evaluation of contributions in OSS projects. In a study of 1.4 million GitHub user profiles, Terrell et al. found that code contributions by female developers were less often accepted than those of their male counterparts when their gender was identifiable [26].

This paper contributes to this emerging body of knowledge in software development with a study that focuses on another demographic attribute of developers: geographical location. We elicit the relationship between the geographical location of developers and the evaluation of their contributions by modeling GitHub projects' archival data, using the country of residence of developers to measure geographical location and the pull request acceptance decision to measure evaluation of contributions.

We present an analysis of 70,000+ pull requests originating from the 17 most actively participating countries on GitHub. We control for other factors known to influence the pull request acceptance decision in order to quantitatively analyze the relationship of the country of submitters with the overall pull request acceptance decision. We also consider the case in which submitter and integrator are from the same country for its possible relationship with the pull request acceptance decision.

Our findings reveal that there are statistically significant differences in the acceptance rates among pull requests issued by developers from different countries. We, however, could not attribute the observed differences to factors that may seem obvious otherwise. For example, we examined whether contributors from non-English-speaking countries have a lower pull request acceptance rate compared to English-speaking countries, but we did not find any such pattern.

2 BACKGROUND
This section presents a brief history of bias at work originating from an individual's demographic attribute of geographical location, and how it translates to OSS projects. This is followed by relevant background material concerning the notion of pull-based development and the factors that are already known to influence pull requests.

2.1 Bias at work originating from individuals' geographical location
The role of individuals' geographical location in influencing the evaluation of contributions has long been known in traditional workplaces, where people meet in person for work and are aware of the demographic attributes of fellow contributors. Indeed, a whole body of literature has emerged, albeit dispersed over many communities. For instance, in the Olympics and other forms of international competition, experienced judges with significant training in fairly evaluating all participants were found to nonetheless rank participants from their own nations higher than participants from other nations [23]. As another example, in academia, papers with authors from some regions receive fewer citations than papers from authors of other regions, even when the papers are of comparable quality [19].

In open source software development, in contrast, developers collaborate online, from the same or different geographical locations, and traditionally were not aware of the demographic attributes of fellow developers. In recent years, with the rise of environments such as GitHub [3], Bitbucket [1], and others (e.g., [4][5]), OSS developers today can become much more aware of the demographic attributes of their fellow developers [28], which may lead them to consciously or subconsciously change the way they interact with and make decisions toward their fellow developers [29].

A recent study of 1.4 million GitHub user profiles reported bias in the evaluation of code contributions when a demographic attribute of contributors, gender, was known to fellow contributors [26]. The study by Terrell et al. found that code contributions by female developers were less often accepted than those of their male counterparts when their gender was identifiable [26]. The results of this study suggest that other demographic attributes of contributors, like geographical location, which are known to a significant percentage of fellow contributors [28], may influence the evaluation of contributions as well.

2.2 Pull-based development
GitHub, the site of our study, supports two models of collaborative development: (1) the shared repository model and (2) the pull-based development model. The shared repository model grants everyone access to directly make changes to a single, shared repository of code. Every change is therefore public immediately and not subject to a further level of review first. The use of the shared repository model is prevalent with small teams and organizations collaborating on private projects (see https://help.github.com/articles/about-collaborative-development-models/).

The pull-based development model contrasts with the shared repository model in separating the development effort of individual contributors (submitters) from the decision whether or not to include the code they submit for consideration [12]. A separate role, often called the integrator or committer, receives what is called a pull request when a contributor submits their proposed update. The integrator then 'pulls' the code from the repository of the contributor, examines it closely, and decides whether to push the code to the main branch. This two-phased process allows projects to be more transparent, with open discussions taking place around complex pull requests, thereby making the overall process more democratic [18]. Today, nearly half of the projects on GitHub use the pull-based development model [12]. Particularly because these projects tend to be larger and public, numerous previous studies have focused on various aspects and implications of the pull-based development model (e.g., Yu et al. examined factors influencing latency in pull request evaluation [32] and Gousios et al. studied aspects of work practices and challenges in pull-based development [13]).

2.3 Factors influencing pull request acceptance
Previous studies have already begun to look at pull request acceptance and the influence that different factors may have. For instance, it has been found that a developer's technical and social reputation positively influences acceptance of the changes they submit in their pull requests [7][15][20][27]. Some of these studies further demonstrated that adhering to a project's technical and social norms increases the chances of acceptance [15][17][27], although with the caveat that the more mature and popular a project is, the lower the chances of pull request acceptance are overall [15][27].

Focusing on the nature of a pull request itself, it has also been shown that the size of the change, its perceived quality, and the theme and objective of the pull request, among others, influence its chances of acceptance considerably [15][25][27][31].

Broadly, the factors that have been found to seemingly influence pull request acceptance can be categorized as being related to: (1) the developer themselves, (2) the project as a whole, and (3) the specific pull request. Our study adds to what is known thus far by focusing on the country of developers as having a potential impact. As we will show below, we control for the factors that previous studies have found, thereby being able to more singularly attribute whether the country in which a developer resides has an effect on the evaluation of their contributions.

3 METHOD
We provide a step-by-step introduction to how we collected and analyzed the relationship using GitHub projects' archival data. All data and procedures used are publicly available for replication (DOI: 10.6084/m9.figshare.6865799).

3.1 GitHub projects' archival data
To analyze the influence of the country of developers on pull request acceptance decisions, we measure a variety of factors that: (1) have been previously identified as possibly influencing pull request acceptance, and (2) can be deduced from stored archival data regarding past activities of developers (submitters and integrators). To these factors, we add the country in which developers reside when they submit pull requests, as well as the country of the integrators. Using the combined data, we prepare two statistical models, one focusing solely on the country of the submitter in relationship to pull request acceptance, and one taking into account whether the submitter and integrator are in the same country.

3.1.1 Factors influencing pull request acceptance. As discussed in Section 2.3, factors that may influence pull request acceptance can be classified as relating to the developer themself, the project overall, or the pull request. For our study, we use a combination of the factors that were identified by Tsay et al. [27] and by Gousios et al. [15]. Table 1 presents the result, organized by project characteristics first, then developer characteristics, and finally pull request characteristics. We add a fourth category, geographical location, which naturally belongs to developer characteristics, but is separated out since it is the focus of our study.

The individual columns of the table should be read as follows. The first column lists specific factors that may influence pull request acceptance. The second and third columns list the specific measures, as mined from the respective repositories they studied, through which Tsay et al. and Gousios et al. incorporated various factors in their models (note that they did not study the exact same set of factors). The fourth column lists the specific measures we choose to use in our models, with the fifth column listing the variable name we use in the remainder of the paper for each of the measures. As one example, social closeness was estimated by Tsay et al. using two measures: whether the developer followed the integrator prior to submitting their pull request and whether the developer watched the project repository prior to submitting. Gousios et al. did not measure this factor. In our model, we choose to measure it using both measures from Tsay et al., and use the names dev_followed_integrator and dev_watched_project for these two measures.

Where possible, we adopt the measures of Tsay et al. or Gousios et al., although we made an exception for three measures. We changed the measure of code quality from test LOC per 1,000 LOC (kloc) to test LOC per 100,000 LOC (lloc). We do this to bring all factors with numeric values to a comparable scale of measurement. Similarly, we measure test cases per 1,000 LOC and asserts per 1,000 LOC as test cases per 100,000 LOC and asserts per 100,000 LOC, respectively.

Note that our model omits a specific measure for team size, as well as two measures for code quality (#test cases per kloc and #asserts per kloc). As we discuss below, these three measures contain information that is included in others, and hence we leave them out of further consideration.

Note also that our model does not include gender, a factor that was recently found to correlate with pull request acceptance (specifically, pull requests by female developers were more often rejected) [26]. We were unable to find a reliable way to obtain the gender for all 70,740 pull requests we include in our study (see below). If we only included those pull requests for which we could obtain gender data, our sample would have been too small.

3.1.2 Data collection. The basis for our analysis is the data made publicly available by Gousios et al. [14], covering 1,069 projects and 370,411 pull requests in Python (357), Java (315), Ruby (359), and Scala (38). Gousios et al. selected these projects to represent the top 1% of all GitHub projects in terms of their respective counts of pull requests at the time. Our sample is therefore not representative of all projects in GitHub, but this is intentionally so in order to focus on the more active projects that are more likely to include participants from across the globe. We enriched this data set by leveraging the GHTorrent dataset made available on August 18, 2015 [2] to add the country from which each individual pull request was made by a submitter and the country from which each individual pull request was accepted or rejected by an integrator.

To decide upon a country, we needed to apply some heuristics, as developers in GitHub can choose to specify their location as free-form text in their GitHub profiles. Developers who, for instance, write 'US', 'United States', or 'XYZ Apartments, New York' are all somewhere in the United States. To infer the country of developers from these free-form textual specifications, we use the 'countryNameManager' script used in a previous study by Vasilescu et al. [29], which uses heuristics to, for instance, map the three previous examples all to the United States. We augmented the script with one additional heuristic to address situations where developers did specify an affiliation and/or domain name but did not provide a location, so we can map, for instance, someone who specified an affiliation of 'Peking University' to China. To do so, our heuristic learns from those entries where the country is specified in addition to the affiliation and/or domain name and subsequently maps it to entries where the country is missing but the affiliation and/or domain name is the same. To minimize false positives, we only apply this heuristic if the affiliation and/or domain name maps to a single country across all such entries, and if more than 20 entries exist. For example, we map 'Peking University' to China only if there are at least 20 occurrences in which 'Peking University' maps to China, and no mappings of 'Peking University' to other countries.
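To illustrate the affiliation-based augmentation described above, the sketch below shows one way the learning step could be implemented. It is a minimal illustration, not the actual script; the field names (location, affiliation, country) and the helper data structure are assumptions, and the base resolution of free-form locations is delegated to the existing countryNameManager script.

    from collections import defaultdict

    MIN_OCCURRENCES = 20  # threshold from the paper: require 20+ supporting entries

    def learn_affiliation_map(profiles):
        """Learn affiliation/domain -> country from profiles whose country is already known."""
        counts = defaultdict(lambda: defaultdict(int))
        for p in profiles:
            if p.get("affiliation") and p.get("country"):
                counts[p["affiliation"]][p["country"]] += 1

        mapping = {}
        for affiliation, per_country in counts.items():
            # keep the mapping only if it is unambiguous (single country) and well supported
            if len(per_country) == 1:
                country, n = next(iter(per_country.items()))
                if n >= MIN_OCCURRENCES:
                    mapping[affiliation] = country
        return mapping

    def infer_country(profile, affiliation_map):
        """Fall back to the learned affiliation mapping when no location-based country exists."""
        if profile.get("country"):  # already resolved, e.g., by countryNameManager
            return profile["country"]
        return affiliation_map.get(profile.get("affiliation"))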
3.1.3 Pull request selection. After we collected all the data, we applied a number of criteria, in order, to select the set of pull requests to use as the basis for our analysis. We first examined the state of the pull requests. At any time, a pull request can be in the 'open', 'merged', or 'not-merged' state. 'Merged' pull requests were accepted by the integrator, 'not-merged' pull requests were rejected, and 'open' pull requests did not have a decision associated with them yet. We included 'merged' and 'not-merged' pull requests in our analysis, but excluded 'open' pull requests since we cannot predict their eventual outcome.

Second, when we started engaging with the data, we noticed that the distribution of submitters across countries is highly skewed (kurtosis: γ = 98.2). To address this, we only kept those pull requests from countries that represent at least 1% (which equates to 1,000+ pull requests) of the total number of pull requests. This ensures diversity while maintaining sufficient data points per country for analysis. As a result, our analysis includes 17 countries: United States (38%), United Kingdom (8%), Germany (6%), France (5%), Canada (4%), Japan (3%), Brazil (3%), Australia (2%), Russia (2%), Netherlands (2%), China (2%), Spain (2%), India (2%), Switzerland (1%), Sweden (1%), Italy (1%), and Belgium (1%). Together, the pull requests originating from the selected 17 countries constitute approximately 83% of the pull requests available for analysis.

As a final step, we removed any pull requests that were integrated by the submitters themselves. This occurred in a surprisingly high 37% of the cases. Given the focus of our analysis, we only include those pull requests that were integrated by developers other than the submitter.
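A compact sketch of the three selection steps described above, assuming a pandas DataFrame with hypothetical columns state, submitter_country, submitter, and integrator (the actual dataset schema may differ):

    import pandas as pd

    def select_pull_requests(prs: pd.DataFrame) -> pd.DataFrame:
        """Apply the selection criteria in order: decided pull requests only,
        countries with at least 1% of pull requests, and no self-integrated pull requests."""
        # 1. Keep only pull requests with a decision (merged or not-merged).
        decided = prs[prs["state"].isin(["merged", "not-merged"])]

        # 2. Keep only countries contributing at least 1% of the decided pull requests.
        shares = decided["submitter_country"].value_counts(normalize=True)
        top_countries = shares[shares >= 0.01].index
        by_country = decided[decided["submitter_country"].isin(top_countries)]

        # 3. Drop pull requests integrated by their own submitter.
        return by_country[by_country["submitter"] != by_country["integrator"]]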

Table 1: Factors influencing pull request acceptance

Characteristic | Measure: Tsay et al. [27] | Measure: Gousios et al. [15] | Measure: Our study | Variable name
Project characteristics
Maturity | #months in existence | - | #months in existence | proj_months_existence
Team size | #contributors | #active core team members | #active core team members | -
Popularity | #watchers | #watchers | #watchers | proj_watchers
Size of code | - | #non-comment LOC | #non-comment LOC | proj_ncloc
Openness to external contributions | - | %external contributions | %external contributions | proj_external_contribs
Code quality | - | #test LOC per 1,000 LOC | #test LOC per 100,000 LOC | proj_test_loc_per_lloc
Code quality | - | #test cases per 1,000 LOC | #test cases per 100,000 LOC | -
Code quality | - | #asserts per 1,000 LOC | #asserts per 100,000 LOC | -
Developer characteristics
Status in community | #followers | #followers | #followers | dev_followers
Status in project | direct commit access (yes=1; no=0) | - | direct commit access (yes=1; no=0) | dev_commit_access
Social closeness | followed integrator prior (yes=1; no=0) | - | followed integrator prior (yes=1; no=0) | dev_followed_integrator
Social closeness | watched project prior (yes=1; no=0) | - | watched project prior (yes=1; no=0) | dev_watched_project
Experience | - | #previous pull requests on project | #previous pull requests on project | dev_prev_pull_requests
Experience | - | previous pull request success rate | previous pull request success rate | dev_success_rate
Experience | - | - | #months of project participation | dev_months_participation
Pull request characteristics
Uncertainty of pull request | #comments | #comments | #comments | pr_comments
Size of change | #changed LOC | #changed LOC | #changed LOC | pr_changed_loc
Size of change | #files changed | #files changed | #files changed | pr_changed_files
Quality | inclusion of tests (yes=1; no=0) | #changed LOC in tests | inclusion of tests (yes=1; no=0) | pr_test_inclusion
Geographical location
Country of submitter | - | - | country | geo_country
Same country submitter/integrator | - | - | same country (yes=1; no=0) | geo_same_country

Ultimately, of the 370,411 pull requests that were included in the Gousios data set, 70,740 pull requests remain as the subject of the analysis below, all of them containing full location information. Because we performed the removals described above in order, the minimum number of pull requests across all countries was 813.

3.1.4 Statistical method. Pull request acceptance is a binary classification problem. We built two logistic regression models. The first model (termed Model 1 in the remainder of the text) captures the effect of the country of the submitter on pull request acceptance. The second model (Model 2) captures the effect of the submitter and integrator being in the same country on pull request acceptance. Both models control for the project, developer, and pull request characteristics presented in Table 1, so we can isolate the effect of country. In both models, each pull request (with its unique project, developer, and pull request characteristics) is an independent observation. We choose the R implementation of logistic regression [16][21] and report statistical significance at a p-value < 0.05. The chances of pull request acceptance are measured as log odds. If the value of the log odds is 0, the chances of pull request acceptance and rejection are the same; if the value of the log odds is less than 0, the probability of pull request acceptance is less than the probability of pull request rejection, and vice versa. The impact of the various characteristics shown in Table 1 on pull request acceptance is reported as percentage of deviance [22]. The interpretation of percentage of deviance is similar to the percentage of total variance explained by least squares regression [9].

During feature selection, we add the value one to independent count variables and log-transform the result to stabilize the variance. We verify the variance by using the AIC and the Vuong test for non-nested models [30], for both the transformed and original data.
We 452 395 second model (Model 2) captures the effect of the submitter and measure the effect size using Cramer’s10 V[ ]. We computed the 453 396 committer being in the same country on pull request acceptance. Variance Inflation Factor (VIF) to check for multicollinearity and 454 397 Both models control for the project, developer, and pull request eliminated highly correlated variables that caused multicollinearity. 455 398 characteristics presented in Table 1, so we can isolate the effect of Any VIF value greater than 5 indicates multicollinearity [9]. As a 456 399 country. In both models, each pull request (with its unique project, result, we removed team size measured as active core team members, 457 400 developer, and pull request characteristics) is an independent ob- code quality measured as test cases per kloc, and code quality 458 401 servation. We choose the R implementation of logistic regression measured as asserts per kloc from further consideration (which is 459 402 [16][21] and report statistical significance at a p-value <0.05. The why Table 1 lists them as ‘-’ for our study). 460 403 chances of pull request acceptance are measured as log odds. If the We show the effect of predictor variables on pull request ac- 461 404 value of log odds is 0, the chances of pull request acceptance and ceptance via three values: coefficient, standard error, and p-value 462 405 rejection are same. If the value of log odds is less than 0 then the 463 406 4 464 Relationship between Geographical Location and Evaluation of Developer Contributions in GitHub ESEM, 2018

We show the effect of predictor variables on pull request acceptance via three values: coefficient, standard error, and p-value (shown as coefficient (standard error), with significance stars, in Table 2). Each coefficient in the logistic regression models reads as the effect of that variable on the log odds of pull request acceptance when the other predictor variables are kept constant. For instance, in Table 2, an increase in proj_months_existence by a month multiplies the estimated odds of pull request acceptance by exp(-0.01) = 0.99, i.e., a decrease, as indicated by the negative coefficient. The standard error gives the confidence interval of the computed log odds, exp(coefficient ± 1.96 × standard error), and the p-value shows the statistical significance of the variable in predicting pull request acceptance. P-values less than 0.05 are considered statistically significant.

A slight difference exists in the interpretation of the coefficients for continuous and categorical variables. The coefficient of a continuous variable signifies the estimated change in the log odds of pull request acceptance for a unit increase in its value. The coefficient of a categorical variable shows the change in the log odds of pull request acceptance relative to a base level, i.e., a log odds ratio. For instance, to measure the influence of country, we choose the United States, where the majority of developers reside, as the base level. The influence of being located in the United Kingdom, then, reads as the estimated change in the log odds ratio of pull request acceptance when the country is the United Kingdom versus the United States. This implies that, for every 100 pull requests accepted from the United States, approximately exp(0.13) × 100 = 114 pull requests are accepted from the United Kingdom. Similarly, to study the influence of submitters and integrators being in the same country, we present the estimated change in the log odds ratio of pull request acceptance when the submitter and integrator are from the same country versus different countries.
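As a concrete illustration of this interpretation, the helper below converts a coefficient and standard error from Table 2 into an odds ratio with a 95% confidence interval. The numbers for the United Kingdom (coefficient 0.13, standard error 0.04) come from Table 2; the function itself is only an illustrative aid, not part of the paper's tooling.

    import math

    def odds_ratio(coef: float, se: float):
        """Return the odds ratio and its 95% confidence interval for a logit coefficient."""
        lower, upper = coef - 1.96 * se, coef + 1.96 * se
        return math.exp(coef), (math.exp(lower), math.exp(upper))

    # United Kingdom vs. United States (Model 1): coefficient 0.13, standard error 0.04.
    or_uk, ci_uk = odds_ratio(0.13, 0.04)
    print(f"odds ratio: {or_uk:.2f}, 95% CI: ({ci_uk[0]:.2f}, {ci_uk[1]:.2f})")
    # odds ratio: 1.14, 95% CI: (1.05, 1.23) -- roughly 114 UK acceptances per 100 US acceptances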
As 547 490 study the influence of submitters and integrators being in the same discussed, results are to be interpreted relative to the United States, 548 491 country, we present the estimated changes in log odds ratio of pull which has the largest population of submissions. As one exam- 549 492 request acceptance when the submitter and integrator are from the ple, for every 100 pull requests accepted from submitters from the 550 493 same country versus different countries. United States, an estimated count of exp(0.26)×100=130 pull re- 551 494 Each model is summarized in terms of six values: Akaike Infor- quests are accepted from submitters from the Netherlands. This 552 495 mation Criteria (AIC), Bayesian Information Criteria (BIC), Log means that pull requests from Netherlands based submitters have a 553 496 Likelihood, Deviance, Number of Observations (Num. obs.), and higher chance of being accepted than pull requests from submitters 554 497 Area Under Curve (AUC). Number of Observations reports the based in the United States. 555 498 count of pull requests analyzed for data modeling. AIC, BIC, Log We identify three groups of countries, each arranged in decreas- 556 499 Likelihood, and Deviance assess the fit of the model: the lower their ing order of chances of pull request acceptance. The first group, con- 557 500 values, the stronger is the fit of the model. We also evaluated the sisting of Switzerland (146), Netherlands (130), Japan (128), United 558 501 fitness of the model by calculating the Area Under Curve (AUC). Kingdom (114), and Canada (113), represents countries in which 559 502 An AUC greater than 0.5 is considered acceptable.3 submitters have a higher chance of pull request acceptance rela- 560 503 The deviance table (Table 3) presents the variance explained tive to the United States. The second group, consisting of Sweden 561 504 by each predictor variable. The NULL model is the default model (81), Germany (78), Brazil (76), Italy (73), and China (68), represents 562 505 without any predictor variables. Predictor variables are added to countries in which submitters have a lower chance of pull request 563 506 improve the model fit and are characterized by degree of freedom acceptance relative to the United States. Finally, countries in the 564 507 (Df), deviance explained (Deviance), and statistical significance (p- third group include Belgium (109), Spain (108), Australia (105), India 565 508 value<0.05) of the variable. For ease of understanding, the residual (102), France (102), and Russia (94), which exhibit a pull request 566 509 degree of freedom and residual deviance after the addition of each acceptance rate that is not distinguishable from that of submitters 567 510 predictor variable are provided. from the United States. 568 511 Examining this grouping, we cannot necessarily identify pat- 569 512 4 RESULTS terns. We might have expected, for instance, to see consistently 570 513 This section presents our findings from the data analysis of GitHub higher or lower acceptance rates from countries in the same con- 571 514 projects’ archival data. tinent, or consistently higher acceptance rates from countries in 572 515 which English is either the primary language or a major secondary 573 516 4.1 GitHub projects’ archival data analysis language in school. This is not the case. 
Examining this grouping, we cannot necessarily identify patterns. We might have expected, for instance, to see consistently higher or lower acceptance rates from countries in the same continent, or consistently higher acceptance rates from countries in which English is either the primary language or a major secondary language in school. This is not the case. The three countries with the highest rates (Switzerland, Netherlands, Japan) are from two different continents; the three countries with the lowest rates are from three different continents, yet two of those continents are also represented among the three countries with the highest rates. In terms of English, Sweden and Germany are known for teaching students English in schools, yet are among the countries with the lowest acceptance rates. We return to this discussion in Section 6.

Table 2: Logistic regression models of factors influencing pull request acceptance [AUC: 0.7]

Variable | Model 1 | Model 2
(Intercept) | 2.82 (0.14)*** | 2.61 (0.14)***
Control variables
proj_months_existence | -0.01 (0.00)*** | -0.01 (0.00)***
proj_watchers | -0.00 (0.00)*** | -0.00 (0.00)***
log(proj_ncloc + 1) | -0.06 (0.01)*** | -0.06 (0.01)***
proj_external_contribs | -0.01 (0.00)*** | -0.01 (0.00)***
proj_test_loc_per_lloc | 0.00 (0.00)*** | 0.00 (0.00)***
log(dev_followers + 1) | 0.06 (0.01)*** | 0.07 (0.01)***
dev_commit_access | 0.06 (0.07) | 0.05 (0.07)
dev_followed_integrator | 0.11 (0.03)** | 0.10 (0.03)**
dev_watched_project | 0.04 (0.03) | 0.05 (0.03)
log(dev_prev_pull_requests + 1) | 0.17 (0.01)*** | 0.17 (0.01)***
dev_success_rate | 0.01 (0.00)*** | 0.01 (0.00)***
dev_months_participation | -0.00 (0.00)*** | -0.00 (0.00)***
log(pr_comments + 1) | -0.25 (0.01)*** | -0.24 (0.01)***
log(pr_changed_loc + 1) | -0.06 (0.01)*** | -0.06 (0.01)***
log(pr_changed_files + 1) | 0.01 (0.02) | 0.01 (0.02)
pr_test_inclusion | 0.26 (0.03)*** | 0.26 (0.03)***
geo_country_switzerland | 0.38 (0.11)*** | 0.46 (0.11)***
geo_country_netherlands | 0.26 (0.09)** | 0.36 (0.09)***
geo_country_japan | 0.25 (0.08)*** | 0.34 (0.08)***
geo_country_united kingdom | 0.13 (0.04)** | 0.20 (0.04)***
geo_country_canada | 0.12 (0.07) | 0.22 (0.07)**
geo_country_belgium | 0.09 (0.12) | 0.18 (0.12)
geo_country_spain | 0.08 (0.10) | 0.15 (0.10)
geo_country_australia | 0.05 (0.07) | 0.14 (0.07)
geo_country_india | 0.02 (0.07) | 0.12 (0.07)
geo_country_france | 0.02 (0.06) | 0.11 (0.06)
geo_country_russia | -0.06 (0.07) | 0.04 (0.07)
geo_country_sweden | -0.21 (0.09)* | -0.10 (0.09)
geo_country_germany | -0.25 (0.04)*** | -0.16 (0.05)***
geo_country_brazil | -0.27 (0.06)*** | -0.19 (0.07)**
geo_country_italy | -0.31 (0.08)*** | -0.21 (0.09)*
geo_country_china | -0.39 (0.09)*** | -0.27 (0.10)**
geo_same_country | - | 0.18 (0.03)***
AIC | 49231.59 | 49198.20
BIC | 49534.09 | 49509.87
Log Likelihood | -24582.80 | -24565.10
Deviance | 49165.59 | 49130.20
Num. obs. | 70740 | 70740
*** p < 0.001, ** p < 0.01, * p < 0.05

Table 3: Deviance explained by factors influencing pull request acceptance

Variable | Df | Deviance | Resid. Df | Resid. Dev | Pr(>Chi)
NULL | | | 70739 | 52354.02 |
proj_months_existence | 1 | 509.46 | 70738 | 51844.56 | 0.0000
proj_watchers | 1 | 175.49 | 70737 | 51669.07 | 0.0000
proj_external_contribs | 1 | 321.47 | 70736 | 51347.59 | 0.0000
proj_test_loc_per_lloc | 1 | 47.84 | 70735 | 51299.75 | 0.0000
log(proj_ncloc + 1) | 1 | 30.85 | 70734 | 51268.90 | 0.0000
dev_months_participation | 1 | 25.46 | 70733 | 51243.45 | 0.0000
log(dev_prev_pull_requests + 1) | 1 | 1014.64 | 70732 | 50228.81 | 0.0000
dev_success_rate | 1 | 389.62 | 70731 | 49839.19 | 0.0000
pr_test_inclusion | 1 | 22.73 | 70730 | 49816.46 | 0.0000
log(pr_changed_loc + 1) | 1 | 164.02 | 70729 | 49652.44 | 0.0000
log(pr_changed_files + 1) | 1 | 0.11 | 70728 | 49652.34 | 0.7444
pr_comments | 1 | 2.62 | 70727 | 49649.72 | 0.1055
log(dev_followers + 1) | 1 | 57.46 | 70726 | 49592.26 | 0.0000
log(pr_comments + 1) | 1 | 273.45 | 70725 | 49318.81 | 0.0000
dev_watched_project | 1 | 2.82 | 70724 | 49315.99 | 0.0931
dev_followed_integrator | 1 | 6.79 | 70723 | 49309.20 | 0.0092
geo_country | 16 | 143.61 | 70707 | 49165.59 | 0.0000
geo_same_country | 1 | 35.39 | 70706 | 49130.20 | 0.0000

We note that the coefficients for the various countries are larger than the coefficients for the control variables. This does not necessarily mean that the effect is larger. We remind the reader that, while most control variables are numerical and thus interpreted relative to the intercept, countries are categorical and interpreted relative to a baseline (United States). The variability in pull request acceptance across different countries, thus, is responsible for the larger coefficients.

Compared to Model 1, Model 2 in Table 2 documents the effect of taking into account whether or not the pull request submitter is in the same country as the pull request integrator (geo_same_country). First, we note that being in the same country increases the chances of pull request acceptance by a factor of exp(0.18) = 1.2 compared to when the submitter and integrator are from different countries. To understand whether this is a consistent phenomenon across all countries, or whether it is somehow an effect of one or a few countries dominating the results (for instance, based on a much higher volume of 'same country contributor and integrator' pull requests coming from those countries), we analyzed our data per country as well.
We found that, for all countries except India, acceptance rates for contributions coming from submitters from the same country as the integrators were higher, ranging from just 2.4% higher (United States) to 13.4% higher (China). The sole exception, India, was lower in acceptance rate by 7.4%. This implies that integrators have a subtly higher preference for contributions from their own countries (except for, again, India, which we discuss later in the paper).

Second, again comparing the results from Model 2 to Model 1, we observe that, compared to the United States, Canada is now included in the group of countries from which pull requests are more likely to be accepted, and Sweden is no longer in the group of countries for which pull request acceptance is less likely. With no other significant shifts in results, the prior observations regarding an absence of possible patterns in the groupings hold for Model 2 as well.

Finally, we observe that the coefficients of all countries increase in Model 2 as compared to Model 1. We caution against over-interpretation of this difference. The variable geo_same_country is clearly related to the various geo_country variables. Given that its addition to the model is the only change, it is multicollinearity that we believe causes the amplification of the coefficients for the individual countries; interpreting the increase in other ways is likely to draw false conclusions. The reason we did not remove the individual countries from Model 2 is that, given our results in Model 1, they serve as a control for the independent variable geo_same_country.
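The per-country comparison described above can be expressed as a simple aggregation. The sketch below, again using hypothetical column names (merged, submitter_country, integrator_country), computes the acceptance rate for same-country and cross-country pull requests per submitter country; it illustrates the comparison, not the authors' exact analysis.

    import pandas as pd

    def same_country_comparison(prs: pd.DataFrame) -> pd.DataFrame:
        """Acceptance rates per submitter country, split by whether the integrator
        is from the same country, plus the difference between the two rates."""
        prs = prs.assign(same_country=prs["submitter_country"] == prs["integrator_country"])
        rates = (prs.groupby(["submitter_country", "same_country"])["merged"]
                    .mean()
                    .unstack("same_country"))
        rates.columns = ["cross_country", "same_country"]
        rates["difference"] = rates["same_country"] - rates["cross_country"]
        return rates.sort_values("difference", ascending=False)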

Table 3 shows the impact on pull request acceptance in terms of deviance. Relative to the control variables, the country of submitters (geo_country) and the submitter and integrator being from the same country (geo_same_country) each explain some of the effect present. While the effect of being in the same location is relatively small (not surprising, given that it is multicollinear and residual after the effect of the individual countries is already accounted for), the effect of geo_country is moderate. This implies, once again, that integrators are somehow differentiating pull requests from submitters from different countries.

5 THREATS TO VALIDITY
5.1 Construct validity
Pull request acceptance. A pull request can be reopened multiple times after it is first merged or rejected. Because we chose to use the first decision regarding acceptance and not later ones, our results may not reflect eventual decisions if later decisions exhibit a different pattern. This choice was also made in previous studies (e.g., [27][32]), but future work should consider extending our study (and previous studies) to assess the entire life cycle of a pull request.

Other confounding factors. It is possible that other confounding factors (e.g., academic versus industry affiliation, programming language, gender) may impact our findings.

5.2 Internal validity
Data accuracy. The accuracy of the results of a study depends on the accuracy of the data on which it is built. To mitigate collecting our own data and possibly introducing errors, we used GHTorrent data, which has been used extensively in several prior studies (e.g., [26][29]). The only additional data collection we performed concerned obtaining the geographic location of submitters and integrators. For this, we used an existing script used in prior research and added a heuristic to improve on its results.

5.3 External validity
Generalisability. The quantitative analysis presented in this study is performed on a subset of projects found on GitHub. Our sample is not representative of all software projects (see Section 3.1.2 as to why).

6 DISCUSSION
Is there a difference in pull request acceptance rates among different countries? Our analysis of GitHub pull requests indeed shows differences to exist, with the differences significant for a number of countries as compared to the United States as a baseline. Five countries (i.e., Switzerland, the Netherlands, Japan, the United Kingdom, and Canada) have significantly higher acceptance rates; five other countries (i.e., Sweden, Germany, Brazil, Italy, and China) have significantly lower acceptance rates.

Despite these wide differences in acceptance rates across the 17 countries analyzed, all countries except India have a higher pull request acceptance rate when both the submitter and the integrator are from the same country.

What factors cause differences in pull request acceptance? It is known that many different factors can influence pull request acceptance, with several previous studies (discussed throughout the paper) identifying various factors and showing how they play a role. Our study adds two additional factors: (1) the country of the submitter who issues the pull request, and (2) the co-location of submitter and integrator in the same country. Controlling for the factors identified in prior studies, we built two models through which we showed that both of these factors have an effect on pull request acceptance. Our study, thus, adds to the set of factors known.

What could explain the effect of country on pull request acceptance? This is perhaps the most difficult question to answer. Our analysis already eliminates, through an extensive set of control variables, the typical reasons for differences in pull request acceptance rates. Moreover, as part of our analysis we dove deeper into our results to make sure that hidden effects within the data were not inadvertently leading us to draw certain conclusions. We carefully set minimum thresholds (to avoid over-interpretation of small data) and examined whether certain skews in the data could be a cause (such as, for instance, one country dominating all others). These kinds of skews turned out not to be present. This means we need to look elsewhere for possible explanations as to why country might have something to do with pull request acceptance.

One particular reason we looked at was language, ... Another reason could be demographics: could one country mostly have, for instance, college students contributing whereas another has mostly professionals?
Our data does not capture this information, though we have no good reason to believe that such strong differences in open source participant demographics exist along country lines.

As can be seen from the above discussion, it is evident that a lot more work is required in this space to understand the reasons for our observed distribution. We hope to partner with experts in the empirical community with experience in qualitative analysis to help investigate these factors further. Additionally, we look forward to collaborating with other empirical researchers in the analyzed countries to better understand whether there are specific cultural or external factors that can help explain the results. That said, this empirical study is, to the best of our knowledge, the first step towards understanding this problem. In empirical studies it is important to contextualize the environment in which the results are obtained in order to generalize results across studies. In general, practitioners become more confident in a theory when similar findings emerge in different contexts [6]. Towards this end, we hope that other researchers replicate our study in different contexts and environments to build an empirical body of knowledge. Overall, at present, we do not have a strong reason as to why the differences are as they are. We can only conclude that they exist.

What is next? Our study represents a first step in highlighting the issue of differences in pull request acceptance across submitters from different countries. The issue must be understood in more detail: why does it seem to happen, how does it happen, what triggers it, and in what ways might it be preventable? We advocate a multi-faceted research agenda moving forward: replication of our study on other platforms and in commercial settings, and investigation of potential factors explaining differences in acceptance rates through field and laboratory studies.

REFERENCES
[1] 2017. Bitbucket. https://bitbucket.org/
[2] 2017. GHTorrent. http://ghtorrent.org/downloads.html
[3] 2017. GitHub, Inc. https://github.com/
[4] 2017. Stack Exchange. https://stackexchange.com/
[5] 2017. Stack Overflow. https://stackoverflow.com/
[6] Victor R. Basili, Forrest Shull, and Filippo Lanubile. 1999. Building knowledge through families of experiments. IEEE Transactions on Software Engineering 25, 4 (1999), 456–473.
[7] Christian Bird, Alex Gourley, Prem Devanbu, Anand Swaminathan, and Greta Hsu. 2007. Open borders? Immigration in open source projects. In Mining Software Repositories, 2007 (ICSE Workshops MSR'07), Fourth International Workshop on. IEEE, 6–6.
[8] Marilynn B. Brewer. 1979. In-group bias in the minimal intergroup situation: A cognitive-motivational analysis. Psychological Bulletin 86, 2 (1979), 307.
[9] Jacob Cohen, Patricia Cohen, Stephen G. West, and Leona S. Aiken. 2013. Applied multiple regression/correlation analysis for the behavioral sciences. Routledge.
[10] Harald Cramér. 1999. Mathematical methods of statistics. Vol. 9. Princeton University Press.
[11] Shane Curcuru. 2015. 3 key elements that define every open source. https://opensource.com/life/15/2/3-key-elements-every-open-source-project
[12] Georgios Gousios, Martin Pinzger, and Arie van Deursen. 2014. An exploratory study of the pull-based software development model. In Proceedings of the 36th International Conference on Software Engineering. ACM, 345–355.
[13] Georgios Gousios, Margaret-Anne Storey, and Alberto Bacchelli. 2016. Work practices and challenges in pull-based development: The contributor's perspective. In Software Engineering (ICSE), 2016 IEEE/ACM 38th International Conference on. IEEE, 285–296.
[14] Georgios Gousios, Margaret-Anne Storey, and Alberto Bacchelli. 2016. Work Practices and Challenges in Pull-Based Development: The Contributor's Perspective. In Proceedings of the 38th International Conference on Software Engineering (ICSE). https://doi.org/10.1145/2884781.2884826
[15] Georgios Gousios and Andy Zaidman. 2014. A Dataset for Pull-based Development Research. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014). ACM, New York, NY, USA, 368–371. https://doi.org/10.1145/2597073.2597122
[16] Marek Hlavac. 2015. stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2. http://CRAN.R-project.org/package=stargazer
[17] Jennifer Marlow, Laura Dabbish, and Jim Herbsleb. 2013. Impression formation in online peer production: Activity traces and personal profiles in GitHub. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work. ACM, 117–128.
[18] Nora McDonald and Sean Goggins. 2013. Performance and participation in open source software on GitHub. In CHI'13 Extended Abstracts on Human Factors in Computing Systems. ACM, 139–144.
[19] Gianmarco Paris, Giulio De Leo, Paolo Menozzi, and Marino Gatto. 1998. Region-based citation bias in science. Nature 396, 6708 (1998), 210.
[20] Raphael Pham, Leif Singer, Olga Liskin, Fernando Figueira Filho, and Klaus Schneider. 2013. Creating a shared understanding of testing culture on a social coding site. In Software Engineering (ICSE), 2013 35th International Conference on. IEEE, 112–121.
[21] R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
[22] Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A large scale study of programming languages and code quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 155–165.
[23] Anna Sandberg. 2015. Competing biases: Effects of gender and nationality in sports judging. (2015).
[24] Walt Scacchi. 2007. Free/open source software development: Recent research results and methods. Advances in Computers 69 (2007), 243–295.
[25] Daricélio Moreira Soares, Manoel Limeira de Lima Júnior, Leonardo Murta, and Alexandre Plastino. 2015. Acceptance factors of pull requests in open-source projects. In Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, 1541–1546.
[26] Josh Terrell, Andrew Kofink, Justin Middleton, Clarissa Rainear, Emerson Murphy-Hill, and Chris Parnin. 2016. Gender bias in open source: Pull request acceptance of women versus men. Technical Report. PeerJ PrePrints.
[27] Jason Tsay, Laura Dabbish, and James Herbsleb. 2014. Influence of social and technical factors for evaluating contribution in GitHub. In Proceedings of the 36th International Conference on Software Engineering. ACM, 356–366.
[28] Bogdan Vasilescu, Vladimir Filkov, and Alexander Serebrenik. 2015. Perceptions of diversity on GitHub: A user survey. CHASE. IEEE (2015).
[29] Bogdan Vasilescu, Daryl Posnett, Baishakhi Ray, Mark G. J. van den Brand, Alexander Serebrenik, Premkumar Devanbu, and Vladimir Filkov. 2015. Gender and tenure diversity in GitHub teams. In CHI. ACM.
[30] Quang H. Vuong. 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society (1989), 307–333.
[31] Peter Weißgerber, Daniel Neu, and Stephan Diehl. 2008. Small patches get in!. In Proceedings of the 2008 International Working Conference on Mining Software Repositories. ACM, 67–76.
[32] Y. Yu, H. Wang, V. Filkov, P. Devanbu, and B. Vasilescu. 2015. Wait for It: Determinants of Pull Request Evaluation Latency on GitHub. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. 367–371. https://doi.org/10.1109/MSR.2015.42