Klaas-Jan Stol, Björn Lundell, Gregory R. Madey (Eds.)

Proceedings of the Doctoral Consortium at the 9th International Conference on Open Source Systems, 2013

Koper-Capodistria, Slovenia, 25 June 2013

Skövde University Studies in Informatics 2013:2

Proceedings of the Doctoral Consortium at the 9th International Conference on Open Source Systems, Koper-Capodistria, Slovenia, 2013. Edited by K. Stol, B. Lundell, and G. Madey.

Copyright of the papers contained in this proceedings remains with the respective authors.

Skövde University Studies in Informatics 2013:2 ISSN 1653-2325 ISBN 978-91-978513-5-0 www.his.se

Proceedings of the Doctoral Consortium at the 9th International Conference on Open Source Systems, Koper-Capodistria, Slovenia, 2013

Edited by:

Klaas-Jan Stol Lero—The Irish Software Engineering Research Centre University of Limerick, Ireland

Björn Lundell University of Skövde, Sweden

Gregory R. Madey University of Notre Dame, IN, USA

Preface

Open Source Software (OSS) remains an active research area, and the OSS conference, its premier publication venue, has reached its ninth edition this year.

To provide new researchers with an arena to present and receive feedback on their research, the OSS conference has had a Doctoral Consortium for several years. The principal objective of the consortium is to give doctoral students the opportunity to present their research at various stages of production – from early drafts of their research design to near completion of their dissertation – in a forum where they can get critical and helpful feedback from a community of interested scholars and other students as they work to finish their degree.

This volume contains five papers, each of which was reviewed by members of the program committee. After the reviews, authors were given the opportunity to revise their papers based on the input they received from the reviewers. The revised versions collected here were presented at the Doctoral Consortium at the Ninth International Conference on Open Source Systems, in Koper-Capodistria, Slovenia in June 2013. The papers provide a look into the research in progress by Ph.D. students who study a variety of aspects of OSS.

We wish to thank the members of the Program Committee of the Doctoral Consortium who have provided valuable feedback on the papers. We also thank all Ph.D. students for their participation.

Klaas-Jan Stol Björn Lundell Gregory Madey

Program Committee

Cornelia Boldyreff, University of East London, UK
Andrea Capiluppi, Brunel University, UK
U. Yeliz Eseryel, University of Groningen, Netherlands
Joseph Feller, University College Cork, Ireland
Patrick Finnegan, University of New South Wales, Australia
Jesús M. Gonzalez-Barahona, Universidad Rey Juan Carlos, Spain
Imed Hammouda, Tampere University of Technology, Finland
Chris Jensen, Google, USA
Stefan Koch, Bogazici University, Turkey
Lorraine Morgan, Lero, NUI Galway, Ireland
Gregorio Robles, Universidad Rey Juan Carlos, Spain
Barbara Russo, Free University of Bolzano-Bozen, Italy
Walt Scacchi, University of California, Irvine, USA
Maha Shaikh, University of Warwick, UK
Giancarlo Succi, Free University of Bolzano-Bozen, Italy
Tony Wasserman, Carnegie Mellon Silicon Valley, USA

Table of Contents

Analyzing FOSS Collaboration and Social Dynamics with Temporal Social Networks
Amir Azarbakht and Carlos Jensen

Professionalization Dynamics in a Network of an Open Source Software Project
Clément Bert-Erboul

Open-Source Platform-based Strategies: Multiple Case Studies in the Mobile Devices Industry
Jose Teixeira

QA Practices and FLOSS Communities
Adina Barham

The role of Free/Libre Open Source Software User’s Face-to-Face Meeting: A Case Study of Drupal User’s Local Meetup
Eunyoung Moon

Analyzing FOSS Collaboration and Social Dynamics with Temporal Social Networks

Amir Azarbakht, Carlos Jensen

Oregon State University, School of Electrical Engineering & Computer Science
1148 Kelley Engineering Center, Corvallis OR 97331, USA
{azarbaam,cjensen}@eecs.oregonstate.edu
http://eecs.oregonstate.edu/~azarbaam
http://eecs.oregonstate.edu/people/jensen

Abstract. How can we understand FOSS collaboration better? Can social issues that emerge be identified and addressed before it is too late? Can the community heal itself, become more transparent and inclusive, and promote diversity? We propose a technique to address these issues by quantitative analysis of social dynamics in FOSS communities. We propose using social network analysis metrics to identify growth patterns and unhealthy dynamics; giving the community a heads-up when they can still take action to ensure the sustainability of the project.

Keywords. FOSS, Social Dynamics, Temporal Analysis, Free/Open Source Software, FLOSS, Forking, Social Network Analysis

1 Introduction

Social networks are a ubiquitous part of our social lives, and the creation of online social communities has been a natural extension of this phenomenon. Free/Open Source Software (FOSS) development efforts are prime examples of how community can be leveraged in software development, as efforts are formed around communities of interest, and depend on continued interest and involvement in order to stay alive [Nyman 2011]. Though the bulk of collaboration and communication in FOSS communities occurs online and is publicly accessible, there are many open questions about the social dynamics in FOSS communities. Projects might go through a metamorphosis when faced with an influx of new developers or the involvement of an outside organization. Conflicts between developers raised as the result of divergent opinions about the future of the project might lead to an ensuing fork of the project and the dilution of the community. Forking, either as a [...]

Please cite as: A. Azarbakht and C. Jensen, 2013, Analyzing FOSS Collaboration and Social Dynamics with Temporal Social Networks, in: K. Stol, B. Lundell and G. Madey (Eds.) Proceedings of the OSS Doctoral Consortium.

[...] process. Second, many social dynamics in FOSS have been studied using a case-study methodology, focusing on a selected subset of the available data.

In this paper, we propose to use social network analysis to study the evolution and social dynamics of FOSS communities. With these techniques we hope to identify measures associated with unhealthy group dynamics, e.g. simmering conflict, as well as early indicators of major events in the lifespan of an online community. One dynamic we are especially interested in is that of forked FOSS projects. We will seek to validate this technique by comparing the results of our analysis to the results of a study of forked FOSS projects by [Robles and Gonzalez-Barahona 2012]. The goal is to demonstrate that this quantitative approach can be applied to commonly available FOSS archives to get a better understanding of the evolution of these communities.

This paper is organized as follows: We present related literature on online social communities, recounting their focus and the findings. We then present the gap in the literature, and what further study needs to be done. Next, we discuss why the issue needs to be addressed, and who benefits from it, in the motivation section. Following that, we present three research questions framed as hypotheses that we are going to test. After that, we propose a methodology for testing the validity of the hypotheses, which includes gathering data, doing the analysis, and the visualization of the findings. At the end, we present future work and challenges.

2 Related Work

The social structures of FOSS communities have been studied extensively over the past decade. Researchers have studied the social structure and dynamics of team communications [Howison et al. 2006, Bird et al. 2008], identifying knowledge brokers and their associated activities in FOSS projects [Sowe et al. 2006], their sustainability [Nyman 2011], FOSS forking [Nyman 2011, Robles and Gonzalez-Barahona 2012], their topology [Bird et al. 2008], their demographic diversity [Kunegis et al. 2012], gender differences in the process of joining them [Kuechler et al. 2012] and the role of the core team in their communities [Torres et al. 2011]. They have tended to look at community as a static structure rather than a dynamic process. This makes it hard to determine cause and effect, or the exact impact of social changes.

The study of communities has grown in popularity in part thanks to advances in social network analysis. From the earliest works on studying information flow and predicting conflict and fission in groups by Zachary [Zachary 1977] to the more recent works of [Leskovec et al.] on the statistical properties of community structure in social networks, there is a growing body of quantitative research on online communities.

The earliest work on communities was done with a focus on information diffusion in community [Zachary 1977]. Zachary investigated the fission of community: the process of communities splitting into two or more parts. He found that fission could be predicted by applying the Ford-Fulkerson min-cut algorithm [Ford and Fulkerson 1957].
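To make the min-cut idea concrete, the following is a minimal sketch, not Zachary's original procedure or data. It uses the Edmonds-Karp variant of Ford-Fulkerson (augmenting paths found by breadth-first search) on a small undirected graph whose edge weights stand for tie strength. The function name `min_cut_partition` and the toy network are assumptions made for illustration.

```python
from collections import deque


def min_cut_partition(edges, source, sink):
    """Split an undirected weighted graph with a minimum cut.

    edges: iterable of (u, v, weight) tuples; weight models tie strength.
    Returns (cut_value, nodes_on_source_side).
    """
    # Residual capacities: an undirected tie has capacity in both directions.
    cap = {}
    for u, v, w in edges:
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + w
        cap[v][u] = cap[v].get(u, 0) + w

    def reachable_with_parents():
        # Breadth-first search over edges that still have spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt, spare in cap[node].items():
                if spare > 0 and nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return parent

    flow = 0
    while True:
        parent = reachable_with_parents()
        if sink not in parent:
            # No augmenting path is left: the reachable nodes form one side.
            return flow, set(parent)
        # Find the bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        node = sink
        while parent[node] is not None:
            bottleneck = min(bottleneck, cap[parent[node]][node])
            node = parent[node]
        # Push the bottleneck amount of flow along that path.
        node = sink
        while parent[node] is not None:
            prev = parent[node]
            cap[prev][node] -= bottleneck
            cap[node][prev] += bottleneck
            node = prev
        flow += bottleneck


# Hypothetical toy network: two tight triangles joined by one weak tie.
ties = [
    ("a", "b", 3), ("a", "c", 3), ("b", "c", 3),
    ("x", "y", 3), ("x", "z", 3), ("y", "z", 3),
    ("c", "x", 1),
]
cut_value, side = min_cut_partition(ties, "a", "z")
```

In this toy case the cheapest way to separate `a` from `z` is to cut the single weak tie, so the predicted split follows the two triangles.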



Table 1. The measures of diversity used by [Asur et al. 2009]

Metrics | Meaning
Stability | Tendency of a node to have interactions with the same nodes over time
Sociability | Tendency of a node to have different interactions
Influence | Number of followers a node has on a network and how its actions are copied and/or followed by other nodes (e.g. when it joins/leaves a conversation, many other nodes join/leave the conversation, too)
Popularity | Number of nodes in a cluster (how crowded a sub-community is)

The communication patterns of FOSS developers in a bug repository were examined by [Howison et al. 2006]. They calculated outdegree centrality as their metric. Outdegree centrality measures the proportion of the number of times a node contacted other nodes (outgoing) over how many times it was contacted by other nodes (incoming). They calculated this centrality over time “in [...]-day windows, moving the window forward [...] days at a time.” They found that “while change at the center of FLOSS projects is relatively uncommon,” participation across the community is highly skewed, following a power-law distribution, where many participants appear for a short period of time, and a very small number of participants are at the center for long periods. Our approach is similar to theirs in how we form collaboration graphs and perform our temporal analysis. Our approach is different in terms of our project selection criteria, the metrics we examine, and our research questions.

The tension between diversity and homogeneity in a community was studied by [Kunegis et al. 2012]. Table 2 lists five network statistics they used to examine the evolution of large-scale networks over time. They found that except for the diameter, all other measures of diversity shrunk as the networks matured over their lifespan. Kunegis et al. [Kunegis et al. 2012] argued that one possible reason could be that the community structure consolidates as projects mature.

Table 2. The measures of diversity used by [Kunegis et al. 2012]

Network property | A network is diverse when | Measure of diversity
Paths between nodes | Paths are long | Effective diameter
Degrees of nodes | Degrees are equal | Gini coefficient of the degree distribution
Communities | Communities have similar sizes | Fractional rank of the adjacency matrix
Random walks | Random walks have high probability of return | Weighted spectral distribution
Control of nodes | Nodes are hard to control | Number of driver nodes
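As an illustration of one row of Table 2, the sketch below computes the Gini coefficient of a degree distribution from an edge list. It uses the standard rank-based formula for the Gini coefficient. The helper names `degree_sequence` and `gini` and the two toy graphs are assumptions for illustration and are not taken from [Kunegis et al. 2012].

```python
from collections import Counter


def degree_sequence(edges):
    """Return the sorted list of node degrees for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


def gini(values):
    """Gini coefficient: 0.0 when all values are equal, near 1.0 when one dominates."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values in ascending order (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical examples: a ring (all degrees equal) and a star (one hub).
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star = [("hub", "p"), ("hub", "q"), ("hub", "r"), ("hub", "s")]
ring_gini = gini(degree_sequence(ring))
star_gini = gini(degree_sequence(star))
```

The ring, where every node has the same degree, gives a coefficient of zero, which corresponds to the "degrees are equal" condition in the table; the star gives a higher value because one hub holds most of the ties.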

To date, most studies on FOSS have only been carried out on a small number of projects, and using snapshots in time. To our knowledge, no study has been done of project forking that has taken into account the temporal dimension.

3 Motivation

To better understand and measure the evolution and social dynamics of FOSS projects, integral components to understanding their evolution and direction, we need new and better tools. With this knowledge and these tools, we could help projects reflect on their actions, and help community leaders make informed decisions about possible changes or interventions. We want to map the dynamics of communities to real world phenomena. Identification is the first step to rectify an undesired dynamic before the damage is done.

A community that does not manage growing pains may end up stagnating or dissolving. Managing growing pains is especially important in the case of FOSS, where nearly half the project contributors are volunteers [Forrest et al. 2012]. [Oh et al. 2004] have argued that openness in FOSS is “[...] generally perceived as having a positive connotation, however, the term can also be interpreted as referring to some unconstructive characteristics, such as unobstructed exit, susceptible, vulnerable, fragile, lacking effective regulation, and so on. The unobstructed exit and lack of regulatory force inherent in the OSS community can result in a community’s susceptibility and vulnerability to herded exits by its participants. Commercial vendor intervention, an alternative project becoming available, and licensing issues can result in some original core members ceasing to provide their loyal service for the community, which can prompt their coworkers to leave as well” [Oh et al. 2004].

Recipes for success or stagnation, sustainability or fragmentation could be identifiable, leading to a set of best practices and pitfalls.

4 Research Questions

We argue that the social interactions data reflects the changes the community goes through, and will be able to describe the context surrounding a forking event. Robles and Gonzalez-Barahona [Robles and Gonzalez-Barahona 2012] classify the main reasons for forking into six classes, listed in Table 3.

Table 3. The main reasons for forking as classified by [Robles and Gonzalez-Barahona 2012]

Reason for forking | Example forks
Technical (Addition of functionality) | Xpdf & Poppler
More community-driven development | EGCS & GCC
Discontinuation of the original project | Apache web server
Commercial strategy forks | LibreOffice & OpenOffice.org
Legal issues | X.Org & XFree
Differences among developer team | OpenBSD & NetBSD

Three of the six listed reasons are socially related, and so should arguably be reflected somehow in the social interactions data. As an example, if a fork occurs because of a desire for “more community-driven development”, we expect to see interaction patterns in the collaboration data showing a strongly connected core community getting more isolated from the rest of the community, and the formation of another very active and well-connected core. More specifically, our research hypotheses are:

Hypothesis #1: Social interactions data can be used to predict an imminent fork.

Hypothesis #2: “Personal differences” and “technical differences” are associated with distinctly different interaction patterns. It is possible to tell the difference between when the reason for forking was personal vs. technical.

Hypothesis #3: It is possible to tell from collaboration patterns whether the reason for forking was “more community-driven development” or “differences among the developer team”.

5 Proposed Methodology

Determining how FLOSS networks evolve over time requires us to adopt a case study methodology. To this end, we propose to examine several FLOSS projects. This will involve obtaining mailing list archives, bug repositories, and Git logs, creating the collaboration graphs, applying social network analysis techniques and measuring SNA metrics, and visualizing the evolving graphs. We plan to have five phases, as described in the following:

5.1 Phase 1: Data Mining

We plan to study six projects that have forked and six projects that have not forked as a control group. Data collection involves mining mailing list archives, bug repositories, and Git logs. [Howison and Crowston 2004] described their experiences of “spidering, parsing and analysis of SourceForge data.” Building on their research, we will take care to avoid the pitfalls that could skew the study. We plan to contribute to and use FLOSSmole [Howison et al. 2006] resources, which provide scripts for FLOSS research data and analyses.
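As a rough illustration of the mailing-list part of this phase, the sketch below pulls out the headers needed to reconstruct who replied to whom. It relies only on Python's standard `email` package. The function name `extract_record`, the record field names, and the sample message are assumptions for illustration, not part of the planned tooling.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def extract_record(raw_message):
    """Reduce one raw RFC 822 style message to the fields used for graph building."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Display Name <address>" into its two parts.
    _, sender = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    return {
        "id": msg.get("Message-ID"),
        "sender": sender.lower(),  # normalise case so one person maps to one node
        "in_reply_to": msg.get("In-Reply-To"),  # None for thread-starting messages
        "date": parsedate_to_datetime(date_header) if date_header else None,
    }


# Hypothetical sample message, invented for this example.
SAMPLE = (
    "From: Alice Example <Alice@Example.org>\n"
    "Message-ID: <m2@example.org>\n"
    "In-Reply-To: <m1@example.org>\n"
    "Date: Tue, 25 Jun 2013 10:00:00 +0000\n"
    "Subject: Re: patch review\n"
    "\n"
    "Looks good to me.\n"
)
record = extract_record(SAMPLE)
```

For archives stored in mbox format, the standard library's `mailbox.mbox` class can iterate over the messages of a file, and the same header lookups apply to each message. Real archives also need handling for missing or malformed headers and for one person posting from several addresses, which this sketch leaves out.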

5.2 Phase 2: Creating Communication Graphs

Many social structures can be represented as graphs. The nodes represent actors/players and the edges between them represent the interactions among the actors. Such graphs can be a snapshot of the network, which forms a static graph, or a longitudinally changing network, also called a dynamic graph.

In this phase, we process the data to form a communication graph representing the community. This is fraught with complications. The collaboration graph for a large community, e.g. the Linux kernel with 9515 contributors, will be a complex network of 9515 nodes and millions of edges. Special graph sampling algorithms are needed to use clustering techniques and social network analysis tools on such a large and complex network. These graph-sampling techniques have been studied and compared by [Wang et al. 2011] and [Lu et al. 2011]. [Chi et al. 2007], for example, applied temporal smoothness to their spectral clustering algorithm, and argued that incorporating this method produces more robust clustering results that are less sensitive to short-term noise while being “adaptive to long-term clustering drift.”
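The distinction between a static snapshot and a dynamic graph can be sketched as a sequence of windowed snapshots built from timestamped interactions. The code below is one possible approach under stated assumptions: the function name `windowed_graphs`, the `(timestamp, sender, recipient)` tuple layout, and the 30-day window and step in the example are all illustrative choices, not values fixed by the methodology.

```python
from collections import Counter
from datetime import datetime, timedelta


def windowed_graphs(interactions, window_days, step_days):
    """Build one weighted, directed edge list per time window.

    interactions: list of (timestamp, sender, recipient) tuples.
    Returns a list of (window_start, Counter mapping (sender, recipient) to count).
    """
    if not interactions:
        return []
    ordered = sorted(interactions, key=lambda item: item[0])
    window = timedelta(days=window_days)
    step = timedelta(days=step_days)
    start = ordered[0][0]
    last = ordered[-1][0]
    snapshots = []
    while start <= last:
        end = start + window
        # Count each directed interaction inside [start, end); skip self-replies.
        edges = Counter(
            (sender, recipient)
            for when, sender, recipient in ordered
            if start <= when < end and sender != recipient
        )
        snapshots.append((start, edges))
        start += step
    return snapshots


# Hypothetical interaction log, invented for this example.
log = [
    (datetime(2013, 1, 1), "a", "b"),
    (datetime(2013, 1, 10), "b", "a"),
    (datetime(2013, 2, 5), "a", "c"),
]
snapshots = windowed_graphs(log, window_days=30, step_days=30)
```

Choosing a step smaller than the window gives overlapping snapshots, which smooths the series at the cost of more graphs to analyse. The scan over all interactions per window is quadratic in the worst case and would need an index for archives of realistic size.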

5.3 Phase 3: Temporal Evolution Analysis

In this phase, we want to analyze the changes that happen to the community over a given period of time, e.g. two years before and two years after the forking event. To this end, we are interested in seeing the changes in the roles individuals have, any shifts in the center of gravity of the project, unusual dilution or concentration of part of the community, and how information can diffuse through the community. These changes can translate into questions, e.g. is everyone still connected to everyone as before, or have bridges been burnt?

The roles individuals take in the community are on a par with the individual’s importance or centrality in the network. We can measure each individual’s importance with a social network analysis measure called centrality, i.e. who is more important/central than the others? For example, who has the largest number of friends in a community (degree centrality)?

There are many ways of looking at an individual’s importance. First, it can be measured by a centrality called closeness centrality. The farness of a node is defined as the sum of its distances to all other nodes. The closeness of a node is defined as the inverse of the farness. More informally, the more central a node is, the lower its total distance to all other nodes. Closeness centrality can be used as a measure of how fast information will spread through the network [Chakrabarti and Faloutsos 2006].

Secondly, if we are looking for people who can serve as bridges between two distinct communities, we could measure the node’s betweenness centrality. Betweenness centrality is higher for mediators who act as intermediate entities between other nodes [Chakrabarti and Faloutsos 2006].

Third, if cross-community collaboration is the focus, we can measure edge betweenness centrality. Edges connecting nodes from different communities have higher edge centrality values. In the community collaboration graph, the edge betweenness, or stress, of an edge is the number of shortest paths that the edge belongs to, considering all shortest paths between all pairs of nodes in the graph.

Fourth, one can claim that certain people in the community are more important than others, and whoever is close to them is relatively more important than others. In graph terms, this is measured by eigenvector centrality, which is based on the assumption that connections to high-profile nodes contribute more to the importance of a node. Google’s PageRank link-analysis algorithm by [Brin and Page 1998] is a variant of the eigenvector centrality measure. In short, centrality measures have been used in several studies to identify key players in a community.

In addition to the centrality measures, we plan to look into the resilience of the community as well. By resilience, we mean how well the network holds its structure and form when some parts of it are deleted, added, or changed. The resilience of a graph is a measure of its robustness to node or edge failures. This could occur, for instance, when an influential member of the community leaves. Many real-world graphs are resilient to random failures but vulnerable to targeted attacks. Resilience can be related to the graph diameter: a graph whose diameter does not increase much on node or edge removal has higher resilience [Chakrabarti and Faloutsos 2006].

We plan to measure these metrics defined on the communication graph, as well as the clustering coefficient of nodes, the degree distribution of the entire network, and the graph’s diameter, whose definitions are out of the scope of this paper. The significance of SNA measures on a single collaboration graph is defined as compared to a random graph, e.g. an Erdős-Rényi random graph of the same size [Erdős and Rényi 1959], to examine their relative significance. We plan to do this analysis over time, rather than just a one-snapshot analysis. So, the temporal changes in measured values are what we are primarily interested in.

Fig. 1. Heat-map color-coded examples of nodes with a high centrality metric. The same network is analysed four times with the following centrality measures: A) Degree centrality, B) Closeness centrality, C) Betweenness centrality and D) Eigenvector centrality [Rocchini 2012]

We are aware that while datasets consisting of huge real-world graphs are obtainable, their sizes may reach thousands of nodes and several million edges. While conventional algorithms to compute graph metrics (shortest paths, centrality, betweenness, etc.) exist, many of these algorithms become impractical for large graphs. Leskovec [Leskovec et al. 2008] argues that one can use sampling algorithms, but that is a non-trivial problem since the goal is to maintain structural properties of the network.
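The centrality definitions used in this phase can be illustrated on a small unweighted, undirected graph. The following is a minimal sketch under those assumptions: closeness follows the farness definition given in the text (unnormalised), and node betweenness uses Brandes' algorithm, which is one standard way to compute it and not necessarily the implementation the study will use. All function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree_centrality(graph):
    """Number of direct neighbours of each node (unnormalised)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def closeness_centrality(graph):
    """Inverse of farness, where farness sums distances to reachable nodes."""
    result = {}
    for node in graph:
        farness = sum(bfs_distances(graph, node).values())
        result[node] = 1.0 / farness if farness else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm; each unordered pair of endpoints is counted once."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []                               # nodes in order of discovery
        preds = {node: [] for node in graph}     # predecessors on shortest paths
        paths = {node: 0 for node in graph}      # number of shortest paths
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    preds[nxt].append(node)
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in preds[node]:
                share = paths[pred] / paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Every pair was visited from both endpoints in an undirected graph.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical graph: two triangles bridged by the tie c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
degrees = degree_centrality(toy)
closeness = closeness_centrality(toy)
betweenness = betweenness_centrality(toy)
```

In the toy graph the two bridging nodes `c` and `d` score highest on both closeness and betweenness, which matches the intuition about mediators described in the text. Libraries such as NetworkX provide their own centrality functions with different normalisation conventions, so values from this sketch should not be compared directly with theirs.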

5.4 Phase 4: Temporal Visualization

Several visualization techniques and tools are used in the field of social network analysis, for instance Gephi [Bastian et al. 2009], NetLogo [Wilensky 1999], igraph [Csardi and Nepusz 2006], NetworkX [Hagberg et al. 2008], SoNIA [Bender-deMoll and McFarland 2006], and NodeXL [Smith et al. 2009]. Gephi is a FLOSS tool for exploring and manipulating networks. It is capable of handling large networks with more than 20,000 nodes and features several SNA algorithms. It is customizable with plugins and can be used for dynamic network visualization.

Fig. 2. Github community visualization, color-coded by users’ primary programming language used [Cuny 2010]

5.5 Phase 5: Interviewing & Triangulation

To complement the quantitative techniques, we plan to interview key players identified in the community and the situation studied to gain their perspective on the community as well. This will be useful in cross-validating the results, as the part of communication that was not logged online could be taken into account in this way. Furthermore, we will compare our findings to those of Robles and Gonzalez-Barahona. This should allow us to determine how well the SNA techniques work to identify the underlying reasons for a fork.

5.6 Challenges

The proposed technique uses the data from online communications. The assumption that all the communication can be captured by mining repositories is intuitively imperfect, but inevitable. Hence, to minimize the effect of this assumption, we complement the quantitative approach with a qualitative approach of interviewing key individuals from the community. This will help triangulate the results.

6 Preliminary Results

Some initial work has been done, namely scraping mailing list data from several FOSS projects, as well as their bug repositories. We have also done a literature review of the existing work on social network analysis of FOSS communities, and have gained experience working with large graphs. There are a myriad of possible applications for the outcomes of this line of research, and the gap in the existing body of research shows a need for further study of FOSS social network analysis.

7 Acknowledgments

We would like to thank the peers at the Human-Computer Interaction Lab at Oregon State University, with special thanks to Victor Kuechler and Jennifer L. Davidson for their feedback.

References

1. Asur, S., S. Parthasarathy, and D. Ucar, “An event-based framework for characterizing the evolutionary behavior of interaction graphs,” ACM Trans. Knowledge Discovery Data, 3, 4, Article 16, 36 pages, November 2009.
2. Bastian, M., S. Heymann, and M. Jacomy, “Gephi: an open source software for exploring and manipulating networks,” presented at the International AAAI Conference on Weblogs and Social Media, 2009.
3. Bender-deMoll, S. and D. A. McFarland, “The Art and Science of Dynamic Network Visualization,” Journal of Social Structure, vol. 7, 2006.
4. Bergquist, M. and J. Ljungberg, “The power of gifts: organizing social relationships in open source communities,” Info. Systems J., 11, 305-320, 2001.
5. Bird, C., D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu, “Latent social structure in open source projects,” in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, New York, NY, USA, pp. 24-35, 2008.
6. Brin, S. and L. Page, “The anatomy of a large-scale hypertextual Web search engine,” in Proceedings of the 7th International Conference on World Wide Web (WWW7), Philip H. Enslow, Jr. and Allen Ellis (Eds.), Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 107-117, 1998.
7. Chakrabarti, D. and C. Faloutsos, “Graph mining: Laws, generators, and algorithms,” ACM Computing Surveys, 38, 1, Article 2, 2006.
8. Chi, Y., X. Song, D. Zhou, K. Hino, and B. L. Tseng, “Evolutionary spectral clustering by incorporating temporal smoothness,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), ACM, New York, NY, USA, 153-162, 2007.
9. Crowston, K. and B. Scozzi, “Open Source Software Projects as Virtual Organizations: Competency Rallying for Software Development,” IEE Proceedings Software, 149(1), 3-17, 2002.

10. Crowston, K., K. Wei, and J. Howison, “Free/Libre open-source software development: What we know and what we do not know,” ACM Comput. Surv., 44, 2, Article 7, 2012.
11. Csardi, G. and T. Nepusz, “The igraph software package for complex network research,” InterJournal Complex Systems, 2006.
12. Cuny, F., Github Explorer, Mar. 25 2010. Available at: http://www.flickr.com/photos/franck[...]/4460144638/
13. De Souza, C. R. B., J. Froehlich, and P. Dourish, “Seeking the Source: Software Source Code as a Social and Technical Artifact,” Proc. ACM Intern. Conf. Supporting Group Work (GROUP 2005), 197-206, Sanibel Island, Florida, 2005.
14. Elliott, M. and W. Scacchi, “Free Software Developers as an Occupational Community: Resolving Conflicts and Fostering Collaboration,” Proc. ACM Intern. Conf. Supporting Group Work, 21-30, Sanibel Island, FL, November 2003.
15. Erdős, P. and A. Rényi, “On random graphs, I,” Publicationes Mathematicae (Debrecen), vol. 6, pp. 290-297, 1959.
16. FLOSS, “Free/Libre and Open Source Software: Survey and Study, FLOSS Final Report,” http://www.flossproject.org/report/, 2002.
17. Ford, L. R. and D. R. Fulkerson, “A simple algorithm for finding maximal network flows and an application to the Hitchcock problem,” Canadian Journal of Mathematics, vol. 9, pp. 210-218, 1957.
18. Hagberg, A. A., D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” in Proceedings of the 7th Python in Science Conference (SciPy2008), Pasadena, CA USA, 2008.
19. Howison, J. and K. Crowston, “The perils and pitfalls of mining SourceForge,” in Proceedings of the International Workshop on Mining Software Repositories (MSR 2004), pp. [...], 2004.
20. Howison, J., K. Inoue, and K. Crowston, “Social dynamics of free and open source team communications,” in Proceedings of the IFIP Second International Conference on Open Source Systems, 319-330, 2006.
21. Howison, J., M. Conklin, and K. Crowston, “FLOSSmole: A collaborative repository for FLOSS research data and analyses,” International Journal of Information Technology and Web Engineering, 1(3), 17-26, 2006.
22. Huntley, C. L., “Organizational Learning in Open-Source Software Projects: An Analysis of Debugging Data,” IEEE Trans. Engineering Management, 50(4), 485-493, 2003.
23. Jergensen, C., A. Sarma, and P. Wagstrom, “The onion patch: migration in open source ecosystems,” in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE ’11), ACM, New York, NY, USA, 70-80, 2011.
24. Koch, S. (Ed.), “Free/Open Source Software Development,” Idea Group Publishing, Hershey, PA, 2005.
25. Kuechler, V., C. Gilbertson, and C. Jensen, “Gender Differences in Early Free and Open Source Software Joining Process,” Open Source Systems: Long-Term Sustainability, pp. 78-93, 2012.
26. Kunegis, J., S. Sizov, F. Schwagereit, and D. Fay, “Diversity dynamics in online networks,” in Proceedings of the 23rd ACM Conference on Hypertext and Social Media, Milwaukee, Wisconsin, USA, 2012.

27. Leskovec, J., J. Kleinberg, and C. Faloutsos, “Graphs over time: densification laws, shrinking diameters and possible explanations,” in Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.
28. Leskovec, J., K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Statistical properties of community structure in large social and information networks,” in Proceedings of the 17th International Conference on World Wide Web (WWW ’08), ACM, New York, NY, USA, 695-704, 2008.
29. Lu, J. and D. Li, “Sampling online social networks by random walk,” in Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research, ACM, New York, NY, USA, 33-40, 2012.
30. Madey, G., V. Freeh, and R. Tynan, “The Open Source Development Phenomenon: An Analysis Based on Social Network Theory,” Proc. Americas Conf. Info. Systems (AMCIS2002), 1806-1813, Dallas, TX, 2002.
31. Madey, G., V. Freeh, and R. Tynan, “Modeling the F/OSS Community: A Quantitative Investigation,” in S. Koch (ed.), Free/Open Source Software Development, 203-221, Idea Group Publishing, Hershey, PA, 2005.
32. Nyman, L., “Understanding code forking in open source software,” in Proceedings of the 7th International Conference on Open Source Systems Doctoral Consortium, Salvador, Brazil, 2011.
33. Nyman, L., T. Mikkonen, J. Lindman, and M. Fougère, “Forking: the invisible hand of sustainability in open source software,” in Proceedings of SOS 2011: Towards Sustainable Open Source, 2011.
34. Oh, W. and S. Jeon, “Membership dynamics and network stability in the open-source community: the Ising perspective,” in Proceedings of the International Conference on Information Systems, 2004.
35. Ovaska, P., M. Rossi, and P. Marttiin, “Architecture as a Coordination Tool in Multi-site Software Development,” Software Process: Improvement and Practice, 8(3), 233-247, 2003.
36. Robles, G. and J. M. Gonzalez-Barahona, “A comprehensive study of software forks: dates, reasons and outcomes,” in Proceedings of the 8th International Conference on Open Source Systems, Hammamet, Tunisia, 2012.
37. Rocchini, C., Wikimedia Commons, Nov. 27 2012. Available: http://en.wikipedia.org/wiki/File:Centrality.svg
38. Scacchi, W., “Free/Open Source Software Development Practices in the Computer Game Community,” IEEE Software, 21(1), 59-67, January/February 2004.
39. Scacchi, W., “Free/Open Source Software Development: Recent Research Results and Methods,” in M. V. Zelkowitz (ed.), Advances in Computers, 69, 243-295, 2007.
40. Smith, M. A., B. Shneiderman, N. Milic-Frayling, E. Mendes Rodrigues, V. Barash, C. Dunne, T. Capone, A. Perer, and E. Gleave, “Analyzing (social media) networks with NodeXL,” in Proceedings of the 4th International Conference on Communities and Technologies (C&T ’09), 2009.
41. Sowe, S., L. Stamelos, and L. Angelis, “Identifying knowledge brokers that yield software engineering knowledge in OSS projects,” Information and Software Technology, vol. 48, pp. 1025-1033, Nov 2006.
42. Torres, M. R. M., S. L. Toral, M. Perales, and F. Barrero, “Analysis of the Core Team Role in Open Source Communities,” in Complex, Intelligent and Software Intensive Systems (CISIS), 2011 International Conference on, pp. 109-114, IEEE, 2011.

43. Wang, T., Y. Chen, Z. Zhang, T. Xu, L. Jin, P. Hui, B. Deng, and X. Li. “Understanding graph sampling algorithms for social network analysis,” in Distributed Computing Systems Workshops (ICDCSW), 31st International Conference on, pp. 123-128. IEEE, 2011. 44. Wilensky, U., “NetLogo,” in Center for Connected Learning and Computer-Based Modeling, Northwestern University. Evanston, IL., 1999. 45. Xu, J., Y. Gao, S Christley, and G Madey. 2005, “A Topological Analysis of the Open Souce Software Development Community,” in Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences - Volume 07 (HICSS ’05), Vol. 7. IEEE Computer Society, Washington, DC, USA. 2005. 46. Ye, Y. and Kishida, K., (2003), “Towards an understanding of the motivation of open source software developers,” Proc. 25th Intern. Conf. Software Engineering, Portland, OR, 419-429, IEEE Computer Society, May. 2003. 47. Zachary, W., “   !    !       ” Journal of Anthropological Research, vol. 33, no. 4, pp. 452-473, 1977. ! !

! ! ! !

! ! (This page intentionally left blank) From Community to Collective: the History of an Open Source Software Project

Clément Bert-Erboul PhD student in Sociology at Clersé Laboratory CNRS UMR 8019 Lille University, France [email protected]

Abstract. We examine the correlation between changes in the contribution rules of a free software project and changes in the structure of its contributors’ network. By studying the archives of a mailing list of the VideoLAN project, we show how a community of engineering students turned into a digital collective of voluntary contributors working in IT companies. Software development by the students was characterized by a high turnover of contributors and by rampant development within a small clique living in the same place. Afterwards, the network structure changed with the arrival of IT professionals from around the world and of more stable core contributors.

Keywords: social network, open source community, mailing list, digital collective.

1 Introduction

We analyze the emancipation of a group of open source developers in relation to the context of its birth¹. At the beginning, the organization was led by members of personal networks and looked like a community. During this period, students of the Ecole Centrale de Paris (ECP) wrote personal and experimental pieces of source code. Thereafter the group was composed of people working in the IT industry, who are now linked through a more formal, stable and hierarchical organization. The rules of contribution shifted from the commitment of engineering students to the participation of professional developers following professional standards. These transformations show that the relations between the various actors are recomposed during key moments in the life of the digital collective (Kilamo 2011, Bert-Erboul, 2012). During these periods, norms and motivations of commitment (Lakhani and Wolf, 2003) are redefined, and they determine whether the activity of the group continues or the contributors stop it.

Taking as an example the case of the nonprofit association VideoLAN, we study the changes in a group of contributors producing, via the Internet, open software able to broadcast and play multimedia content (containing video and/or sound). After the opening of the source code in 2001, the VideoLAN software became popular: it has been downloaded for free more than a billion times. Such popularity reveals the quasi-industrial distribution activity conducted by the free software movement (Weber, 2004).

Sharing source code has been practiced since the beginning of the computer industry (Armer, 1980). In the 1950s, writing and using software required varied technical skills, often gathered within a team of developers. Under these circumstances the circulation of source code allowed the coordination of programmers working on similar software. The exchange of source code between software developers was formalized in the late 1980s by free licenses (Stallman et al., 2011). Free licenses ensure that developers are allowed to copy, study, modify and redistribute the software, and they forbid the individual ownership of work done collectively. These rules of exchange are largely maintained through hobbyist activities outside an economic system mainly based on closed source code. Computer clubs (Levy, 1984) have played an important role in the software economy by institutionalizing and legitimizing open source code production. VideoLAN developers follow this tradition by producing open source software massively used by individuals and businesses on the Internet.

Through a historical study of the nonprofit organization and an analysis of the relationships generated by the exchange of messages on a mailing list between 2001 and 2011, we show changes in the network linking the contributors. First, we present elements to define the terms “community” and “social network” in an open source project. Second, we present the methodology used to analyze the set of emails exchanged on the mailing list. Third, we show that the network of VLC developers was at first a set of personal networks of students. Later this group became a larger social network of professional multimedia engineers linked in a hierarchical structure.

¹ I thank Catherine Darville for her review.

Please cite as: C. Bert-Erboul, 2013, From Community to Collective: the History of an Open Source Software Project, in: K. Stol, B. Lundell and G. Madey (Eds.) Proceedings of the OSS Doctoral Consortium.

2 Communities and Social Network in an open source project

Social changes in the VideoLAN contributors’ network between 2001 and 2011 show the commitments (Becker, 1960) and the withdrawals (Fillieule, 2005) of the open source project developers. Such comings and goings of contributors are studied by sociologists of social movements. To answer Olson’s paradox (Olson, 1965), researchers analyze the influence of context to understand why people participate in a collective action (Tilly, 1978) without a direct reward. In the VLC case, the series of trials and errors between participants is close to the thesis of actor-network theory. In his research on scientific and economic systems, Michel Callon reveals how profit sharing and the mechanisms of enlistment and betrayal lead actors to enter or leave collective action (Callon, 1986). With the ethnographic approach of actor-network theory we study the changes in the socio-technical groups, and we analyze whether heterogeneous contributors define a coherent set of behaviors.

2.1 The Videolan community

The unity, longevity and growth of VideoLAN are symptomatic of this coherence. Indeed, the VideoLAN collective has never forked. To explain the success and continuation of such remote organizations, the social sciences examine their governance arrangements (O'Mahony and Ferraro, 2007). The strict technical conventions governing the writing of the source code imply the existence of strong social control among the programmers. The sociological and psychological literature uses the term “community” (Proulx and Latzko-Toth, 2005) to describe remote groups organizing on the Internet. In the context of the Ecole Centrale de Paris, the term "community" (Rheingold, 1993) describes the VideoLAN activities quite well. The students were a small population of individuals attached to each other by strong institutional and emotional links. The system of solidarity in the ECP building imposed daily social control on individuals. For these students, quitting the VideoLAN collective and the ECP would have been catastrophic for their personal identity. However, the relevance of the term “community” is limited to the ECP.

2.2 The Videolan collective

Social network analysis on the Internet has shown that the relationships in a group change and are built from different types of ties (Wellman, 1979), such as friendship, family, neighborhood or work. Our study shows that the group of developers was, at the beginning, more or less a set of personal networks of students in an engineering school. Their developments were crowned by institutional prizes². Afterwards, the digital collective became a more structured and hierarchical social network with a large and fickle set of contributors from different backgrounds. Hence the term "virtual community" seems wrong to account for the type of relationship created by participation in software development. Indeed, in the sociological sense of community (Tönnies, 1957), leaving the group is socially hard, because those who leave lose a part of their identity.

² The IBM Linux Challenge in 2001, the Apple Design Award in 2003.

The core contributors of VideoLAN have already been studied. Thomas Basset (Basset, 2005) produced a monograph on this group of contributors at the ECP between 2001 and 2003. That research highlights the paradox between the use of free licenses and the difficulty for information to flow in a group confined in a boarding school. In this study, the author presents the VideoLAN organization as a collegial social structure (Waters, 1989). In this network a strong technical component centralizes the expertise around a small clique of participants. The turnover of classmates tends to reinforce the specialization of some contributors, who each year develop new features.

Our study completes this previous research. Since 2006, the VideoLAN project has no longer been hosted by the Ecole Centrale de Paris³, and contributors from all backgrounds have contributed to the project. The maintenance of the activity has been accompanied by a partial rewriting of the source code in accordance with professional technical conventions. The annual renewal of contributors within the ECP had created inconsistencies in the source code: the successive experiments had rarely been updated by the following generations of students. The redesign of the code by professionals follows technical and legal agreements that allow contributors from the multimedia professions to make comments or contributions without additional cognitive costs.

Finally, the VideoLAN collective has gone through various steps in its organization. At the beginning the developers were students and they formed a sort of community, but it was recomposed each year. The activity was popular all around the world. Then the group became an international digital collective of people who work in the IT industry. With a social network analysis we will show that the rules of participation are not the same in the two organizations.

3 Methodology

To analyze commitment in the collective, we study the network of messages exchanged on a mailing list and interpret the results. To check the quantitative data we use qualitative knowledge (Seaman 1999) of the VideoLAN collective, built on interviews and observations during development days between 2010 and 2012. After presenting the method of data extraction, we clarify the meaning given to the degree centrality obtained by our protocol.

3.1 Data extraction

We study the exchange of messages between February 2001 and December 2011 in cohorts of 12 months⁴. These messages were exchanged by the participants of the list "VlcDevel", dedicated to the development of the media player VLC, the most famous project of the VideoLAN collective. The messages on the list concern the use of the software, source code changes or, more rarely, the distribution of rights between contributors. The archive retrieved is public⁵ and represents 90,860 messages exchanged between 2,944 different email addresses. Subscription to the list is open and free. To compose the corpus for each year, we take into account all the individuals who posted at least one message on the list. Every January 1st we restart our counting.

³ The growth in the number of VLC users increases the technical bandwidth costs.
⁴ Except for the first year, which counts 11 months.
⁵ http://mailman.videolan.org/pipermail/vlc-devel/

The use of mailing lists creates a relationship between the subscribers based on semi-private asynchronous exchanges. By sharing content, the contributors provide a reusable body of information (Gensollen, 2004) and enable its improvement. The subscribers to the list do not exclude each other from a discussion: the presence of one contributor does not reduce the possibility for another contributor to read the message at the same time. On the list, messages are written in English and unmoderated, and each thread concerns few subscribers, although all subscribers can read and participate in the discussion.
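The yearly cohorts described in this section can be illustrated with a minimal sketch. It is not the program used in the study, which the text does not reproduce; the (date, sender) tuple layout, the function name `yearly_cohorts` and the lower-casing of addresses are assumptions made for the example, and the sample messages are invented.

```python
from collections import defaultdict
from datetime import date


def yearly_cohorts(messages):
    """Group messages into calendar-year cohorts.

    `messages` is an iterable of (posting_date, sender_address) pairs
    (a hypothetical layout chosen for this sketch).
    Returns {year: {"messages": int, "senders": set of addresses}}.
    """
    cohorts = defaultdict(lambda: {"messages": 0, "senders": set()})
    for posted, sender in messages:
        cohort = cohorts[posted.year]  # counting restarts every January 1st
        cohort["messages"] += 1
        # Anyone who posted at least once belongs to that year's corpus.
        # Lower-casing the address is an assumption of this sketch.
        cohort["senders"].add(sender.lower())
    return dict(cohorts)


# Invented sample, not data from the VlcDevel archive.
sample = [
    (date(2001, 2, 14), "alice@example.org"),
    (date(2001, 11, 3), "bob@example.org"),
    (date(2002, 1, 1), "alice@example.org"),
    (date(2002, 6, 9), "Alice@example.org"),
]
cohorts = yearly_cohorts(sample)
```

Each cohort can then be analyzed as a separate network, so a contributor who is active in several years appears once in each of the corresponding cohorts.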

3.2 Study of degree centrality

After downloading the online archive, we converted it to a matrix format to reconstruct the yearly networks of participants in the mailing list⁶. To study the interactions in the discussions, we established the relationships between the participants and the different topics of discussion (threads). From this bimodal (two-mode) network, contributor / thread / contributor, we connected with each other all participants involved in the same thread, which yields a unimodal (one-mode) undirected network, contributor / contributor. The degree centrality (Freeman, 1979) obtained represents the number of contacts an individual contributor is connected with. A contact is identified by an email address and represents a human being or a bot. If no one responds to the message of an individual, the centrality of that participant is 0. If a contributor is connected to another contributor, their centrality increases by 1 for each new contributor contacted. If a contributor responds to their own message (a loop), their centrality increases once by 2 contacts.

This quantification of centrality in a sociotechnical network makes it possible to take into account the often ignored relationships between machines and human beings. Indeed, the mailing list contains a lot of computer-generated emails. By sending a large number of messages, the bots add to the complexity of the network. However, the bots depend on the activity of human beings, so we did not want to give them a disproportionate importance. Despite the large number of messages they distribute, the bots have a degree centrality of 2: human beings do not reply directly to these automatically generated threads, which are updates that follow other updates or series of error messages. Thus, the network nodes represent human actors (n = 2941) and bots (n = 3).
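The counting rule described above can be sketched as follows. This is an illustration under stated assumptions, not the data preparation program used in the study: the input layout of (thread, author) pairs, the function name `degree_centrality`, and the choice to detect a loop as "the same author posted more than once in a thread" are all hypothetical, and the sample data is invented.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(messages):
    """Degree per contributor from (thread_id, author) pairs.

    Two-mode data (contributor / thread) is projected onto a one-mode
    undirected network: authors who posted in the same thread are linked.
    Each distinct contact adds 1; a self-loop adds 2, counted once.
    """
    # thread -> author -> number of messages posted in that thread
    posts = defaultdict(lambda: defaultdict(int))
    for thread_id, author in messages:
        posts[thread_id][author] += 1

    contacts = {}      # author -> set of distinct other authors
    has_loop = set()   # authors who followed up on themselves (assumed rule)
    for authors in posts.values():
        for author, count in authors.items():
            contacts.setdefault(author, set())  # isolated authors keep degree 0
            if count > 1:
                has_loop.add(author)
        for a, b in combinations(authors, 2):   # link every pair in the thread
            contacts[a].add(b)
            contacts[b].add(a)

    return {author: len(others) + (2 if author in has_loop else 0)
            for author, others in contacts.items()}


# Invented sample: "buildbot" only ever follows up on its own thread.
sample = [
    ("t1", "alice"), ("t1", "bob"), ("t1", "carol"),
    ("t2", "alice"), ("t2", "bob"),
    ("t3", "buildbot"), ("t3", "buildbot"), ("t3", "buildbot"),
    ("t4", "dave"),
    ("t5", "carol"), ("t5", "carol"),
]
degrees = degree_centrality(sample)
```

In this toy network the bot ends with a degree of 2 however many messages it sends, which mirrors the treatment of bots in the text, while a participant whose single message receives no reply stays at 0.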

⁶ I thank Daniel Caillibaud for writing the data preparation program.

To study and quantify commitment in a mailing list, we decided that all messages had equivalent value. We did not detail the multiplexity of relationships. Thus, a “thank you” weighs as much as a longer message, and a bug report has the same weight as a bug fix. If all the messages are equal, all the contributors also have the same weight: a member of the association is as important as a non-official member. To preserve the centrality and hierarchy of the contributors, we cleaned the email addresses (Bird et al., 2006) as much as we could⁷, merging the various email addresses of the same contributor when they were readable. Thus, by counting all the contacts we preserve the qualitative nature of a contributor’s relationships. A contributor is more central if he is in contact with different people with whom he discusses various topics.

Our analysis focuses on the distribution of degree centrality in the group, that is to say the number of different contacts between peers that exist in the collective. This diversity reflects a part of the collective’s activity. Changes in the number of contacts between the contributors vary together with changes in the organization of the collective. We chose as the turning point the departure of the VideoLAN developers from the Ecole Centrale de Paris in 2006. After this period the technical rules governing the digital collective were renegotiated by the contributors. To observe the changes, we analyze the levels of commitment of the mailing list participants. By ranking the contributors' degree centrality each year, we study how the most active contributors organize themselves.

4 An OSS social network

By studying the distribution of centrality among the participants in the list of VLC developers, we find three groups of contributors often mentioned in the literature (Crowston and Howison, 2005) and by the developers themselves. In the center are the most active contributors. Around this core are less regular contributors. In the periphery we find the sophisticated users of the software. Our longitudinal study shows that the distribution of the contributors within these three groups varies in parallel with the changes in the collective. The rewriting of parts of the source code (since the departure of the group from the ECP) seems to have encouraged dialogue between the contributors by increasing the number of contacts among the active contributors. In addition, the transition from a student activity to an activity of IT professionals tends to increase the difference between the most central actors in the network and the least connected ones. Throughout the observed period the weakly linked periphery represents the largest share of the population present on the list.

⁷ Some contributors encrypt their address; others use symbols not supported by the archiving system.

4.1 The center of the network

In the center of the group, we distinguish a core of contributors with a large number of contacts. In our study we put in this class the contributors having between 7 and 90 contacts. The width of this interval is due to the growth of the group of developers. This evolution shows that contributing to an emerging open source project is different from contributing to an international one. This wide range indicates that the project went through different stages of maturity. This core of contributors represents between 8% and 13% of the population and centralizes 40 to 50% of the existing relationships in the network each year.

Over time the core's members have changed. However, turnover in this category slowed sharply after the departure from the ECP. Between 2007 and 2011, 15 contributors remained active in the community, while during the first five years (2001-2005) only six contributors were present for several years on the list, and among them only three were active each year. The regular change of contributors in the VideoLAN collective during the ECP period is related to the succession of student classes. This generational rotation underscores the student dimension of the software, related to the training of young engineers.

The rotation of participants is not related to a mathematical phenomenon of exclusion of one group by another. The number of subscribers is theoretically unlimited. An increase in one participant's number of contacts does not threaten the centrality of others. Instead, the arrival of new contributors tends to increase the centrality of the remaining individuals. The commitment and withdrawal of the contributors indicate that they are not driven by a common purpose or by a systematic search for prestige.

When the VideoLAN project was hosted by the Ecole Centrale de Paris (2001 to 2006) there was no significant variation between the participants with little or no contact and the contributors with a broad range of contacts. Between 2001 and 2006, the standard deviation of the number of contacts was stable at 3.5. However, the differences in the levels of commitment of the most central contributors have increased since 2007: the standard deviation has fluctuated between 4.5 and 7.5 after the departure of the ECP group.
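The yearly spread of contacts reported above can be computed along the following lines. This is only a sketch: the text does not say whether a population or a sample standard deviation was used, so the use of the population form (`statistics.pstdev`) is an assumption, as are the function name `contact_spread` and the toy figures.

```python
from statistics import pstdev


def contact_spread(degrees_by_year):
    """Return {year: population standard deviation of the number of contacts}.

    `degrees_by_year` maps a year to the list of degree values of that
    year's participants (hypothetical layout for this sketch).
    Years without participants are skipped.
    """
    return {year: pstdev(degrees)
            for year, degrees in degrees_by_year.items()
            if degrees}


# Invented toy figures, not data from the study.
toy = {
    2005: [2, 4, 4, 4, 5, 5, 7, 9],   # fairly homogeneous contacts
    2009: [0, 0, 0, 0, 20],           # one very central contributor
}
spread = contact_spread(toy)
```

A larger value for a given year means a wider gap between weakly connected participants and highly connected ones, which is how the increase after 2007 is read in the text.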

4.2 Active contributors

Around the core contributors, we defined a group of participants whose members had between 3 and 6 contacts. The group was stable over time and accounted for about 10% of the population during the ECP period and 12% in the next period. Between 2001 and 2006, this intermediate category represented 20% of the relationships in the network. From 2007, the proportion of existing relationships held by this middle circle has been between 25 and 30%. The increased involvement of its members corresponds to the period when the source code was reformatted to professional standards (Fig. 1 and 2). The technical standardization is linked with the increase in contributors' discussions built on common conventions (Favereau, Biencourt, et al. 2002) of computer programming. Thus, the transformations of the commitment device give rise to remarks from diverse stakeholders.

Figure 1. 2006 network of contributors to the VLCdevel list with a degree centrality score of at least 3 (N = 57, or 13% of the population in 2006).

Figure 2. 2009 network of contributors to the VLCdevel list with a degree centrality score of at least 3 (N = 95, or 17% of the population in 2009).


4.3 A wide fringe

In the network, the majority of contributors to the mailing list are not highly interconnected. Each year, more than 80% of the population of the list has had between 0 and 2 contacts. When the software was hosted at the ECP this population represented about 35% of network relations. This proportion decreased with the arrival of IT workers in the digital collective: these users have few or no contacts, but together they still account for 25% of network relations.

The low integration in the network is partly due to the fickleness of the contributors. Between 2001 and 2011 about 50% of the indexed email addresses had no contact with other participants of the list. In other words, each year half of the participants in the list received no response to their message and did not participate in another thread. This lack of relationship has different explanations. The original message may call for no response. Other threads may be opened in parallel on the same issue. Sometimes contributors do not know an answer to the proposed topic. This part of the network, with its high proportion of solitary individuals, sets open source developers apart from the definition of a "community". In fact, a community in the sociological sense is not "disconnected" from half of its members. These solitary participants are nevertheless important for the collective insofar as they relay bugs, ask questions on the use of the software and materialize the success of the software on the Internet. This continuous flow of information about the VLC software, produced by heterogeneous actors, gives a panoptic vision of the software and promotes the establishment of a shared and coherent representation of the contribution activity.

In the digital collective the independence of individuals is also expressed by the large number of self-directed messages (discussion loops). Contributors update threads in which they have already spoken. Each year, more than 90% of messages link the author with himself. This emphasizes both the involvement of contributors and the strong specialization in the software project. The original message can be left for review by other subscribers of the list; the contribution is then updated by successive improved versions. Loops show that mediated and asynchronous public dialogue is a paradoxical way to talk to oneself. This allows a glimpse of the relationship between individualistic practices and collective action on the Internet.
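The three classes used in this section (core, 7 contacts or more; active, 3 to 6; fringe, 0 to 2) and their shares can be sketched as below. The thresholds come from the text, with 90 read as the observed maximum rather than a cut-off. How the share of relationships was computed is not specified, so measuring it as each class's share of the summed degrees is an assumption of this sketch; the function names and the toy degrees are hypothetical.

```python
def classify(degree):
    """Map a degree value to the class names used in Section 4."""
    if degree >= 7:
        return "core"
    if degree >= 3:
        return "active"
    return "fringe"


def class_shares(degrees):
    """Return {class: (share of population, share of summed contacts)}.

    `degrees` maps a contributor to a degree value for one yearly cohort.
    The second share is an assumed proxy for "relationships held".
    """
    total_people = len(degrees)
    total_contacts = sum(degrees.values())
    shares = {}
    for name in ("core", "active", "fringe"):
        members = [d for d in degrees.values() if classify(d) == name]
        population = len(members) / total_people if total_people else 0.0
        contacts = sum(members) / total_contacts if total_contacts else 0.0
        shares[name] = (population, contacts)
    return shares


# Invented toy cohort of ten participants, not data from the study.
toy = {"a": 10, "b": 4, "c": 2, "d": 0, "e": 0,
       "f": 0, "g": 1, "h": 0, "i": 3, "j": 0}
shares = class_shares(toy)
```

In this toy cohort a single core member (10% of the participants) holds half of the summed contacts, while the fringe (70% of the participants) holds 15% of them; the real proportions are those reported in Sections 4.1 to 4.3.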

5 Conclusion and Discussion

The network of open source software developers is made up of three groups well identified by the scientific literature: core, active and fringe. The fringe contributors seem to be attracted by the advanced functionalities of the software, without engaging in a dialogue with the developers to improve the source code. In the two other groups, the contributors have established conventions marked by a technical language embedded in the source code. Across these three groups we analyzed the changes in the organization of the social network. At first the group of developers was composed of personal networks of students living in the same boarding school. This group formed a kind of community renewed each year. Afterwards developers were regrouped in an international social network of IT workers. During this second period, the developers formed a digital collective with a hierarchical structure and shared standards.

Our study shows that the mode of organization of a group depends on the context. Identifying changes in the collective's network by stages of maturity highlights a methodological point: if groups of open source developers change, how can they be compared? We believe it is necessary to write the history of each group to evaluate the heuristic scope of a comparison. Our text shows that the network of a community and the network of a digital collective do not have the same structural characteristics.

Bibliography

Armer, P., 1980. SHARE-A Eulogy to Cooperative Effort. Annals of the History of Computing 2, 122-129.
Basset, T., 2005. Coordination and social structures in an open source project: Videolan, in: Koch, S. (Ed.), Free/Open Source Software Development. Idea Group Pub.
Becker, H.S., 1960. Notes on the Concept of Commitment. American Journal of Sociology 66, 32-40.
Bert-Erboul, C., 2012. Dynamics and organization in a digital collective: the case of an open educative publishing project, in: Klaas-Jan Stol., C.M.S., Imed Hammouda., (Ed.), Department of Software Systems. Tampere University of Technology, pp. 37, 48.
Bird, C., Gourley, A., Devanbu, P., Gertz, M., Swaminathan, A., 2006. Mining email social networks, Proceedings of the 2006 international workshop on Mining software repositories. ACM, Shanghai, China, pp. 137-143.
Callon, M., 1986. Some elements of a sociology of translation: domestication of the scallops and the fishermen of St Brieuc Bay. Power, action and belief: A new sociology of knowledge 32, 196-233.
Crowston, K., Howison, J., 2005. The social structure of free and open source software development. First Monday 10.
Favereau, O., Biencourt, O., and Eymard-Duvernay, F., 2002. Where do markets come from? From (quality) conventions! in Conventions and Structures in Economic Organization: Markets, Networks, and Hierarchies.
Fillieule, O., 2005. Le désengagement militant. Belin.
Freeman, L.C., 1979. Centrality in social networks conceptual clarification. Social networks 1, 215-239.
Gensollen, M., 2004. Économie non rivale et communautés d'information. Réseaux 124, 141-206.
Kilamo, T., 2011. Essential Properties of Open Development Communities. Salvador, Brazil, 20.
Lakhani, K., Wolf, R., 2003. Why hackers do what they do: Understanding motivation and effort in free/open source software projects.
Levy, S., 1984. Hackers: Heroes of the Computer Revolution. O'Reilly Media.
O'Mahony, S., Ferraro, F., 2007. The emergence of governance in an open source community. Academy of Management Journal 50, 1079-1106.
Olson, M., 1965. The logic of collective action: public goods and the theory of groups. Harvard University Press.
Proulx, S., Latzko-Toth, G., 2005. Mapping the Virtual in Social Sciences: On the Category of "Virtual Community".
Rheingold, H., 1993. The virtual community: homesteading on the electronic frontier. MIT Press.
Seaman, C. B., 1999. Qualitative methods in empirical studies of software engineering. Software Engineering, IEEE Transactions on, 25(4), 557-572.
Stallman, R.M., Williams, S., Masutti, C., 2011. Richard Stallman et la révolution du logiciel libre: Une biographie autorisée - Une initiative Framasoft. Eyrolles.
Tilly, C., 1978. From mobilization to revolution. McGraw-Hill New York.
Tönnies, F., 1957. Community & society. Courier Dover Publications.
Waters, M., 1989. Collegiality, Bureaucratization, and Professionalization: A Weberian Analysis. American Journal of Sociology 94, 945-972.
Weber, S., 2004. The Success of Open Source. Harvard University Press.
Wellman, B., 1979. The community question: The intimate networks of East Yorkers. American Journal of Sociology, 1201-1231.

Open-platforms: Multiple case-studies in the mobile devices industry

Jose Teixeira University of Turku Turku Centre for Computer Science EIT ICT labs [email protected] WWW home page: http://www.jteixeira.eu/

Abstract. This doctoral research addresses the emergence of open-source based platforms, which are currently attaining notable relevance within the mobile devices industry. It aims at providing a better understanding of “how and why corporations follow open-source based platform strategies” within highly networked markets under constant technological evolution. Drawing on the constructs of multi-sided platforms (Rochet & Tirole, 2003; Hagiu & Wright, 2011) and on numerous studies of open-source software adoption such as Lerner and Tirole (2002), Feller and Fitzgerald (2002) and Weber (2004), this research also bridges business strategy in network economics (Gawer & Henderson, 2007; Gawer & Cusumano, 2008) with the psychological and anthropological concepts of sense of community (SoC) and communities of practice (CoP). The researcher follows a theory testing approach, where established theories developed in previous decades for the personal computer and operating systems are replicated in a completely new and emergent context. By taking a purely qualitative case-study approach along the lines of Eisenhardt (1989) and Yin (2008), the researcher developed a set of descriptive and in-depth case studies within the mobile devices industry, answering the lack of empirical research on strategies used by open-source platform providers. Naturally occurring data was collected from Internet s and semi-structured interviews were conducted within an industrial key player, with the goal of providing a framework and a theory that would extend what is known about open-source products to open-source platforms. Practitioners are then provided with better guidance for designing and implementing their networked technological strategies, to let their computer-based platforms succeed in highly competitive markets under network effects.

Keywords. OSS, FLOSS, Open-source, Platforms, Technological-strategy

Please cite as: J. Teixeira, 2013, Open-platforms: Multiple case-studies in the mobile devices industry, in: K. Stol, B. Lundell and G. Madey (Eds.) Proceedings of the OSS Doctoral Consortium.

1 Introduction

The author seeks to complement the existing body of theoretical knowledge on open-source software, drawn mostly from research on open-source software products, by turning his lens to software platforms. With a purposive focus on the telecommunications industry, the researcher studies the emergence of mobile open-source platforms such as Maemo, MeeGo, Android and Bada, which empower big corporations such as Google, Nokia, Intel and Samsung within the highly competitive mobile devices market. Computer-based platforms combine core components with complementary products and services habitually made by a variety of external entities (Gawer & Cusumano 2008). In high-technology and highly competitive markets, many firms follow platform-based strategies due to the impossibility of satisfying by themselves an exceedingly complex consumer group (Hagiu 2004).

2 Research questions

The aim of this research study is to investigate the strategies used by open-source platform providers. Its core objective is to provide a better understanding of how companies like Apple, Google and Nokia integrate open-source software components into the platforms that power their proprietary mobile devices. The research questions are:

• RQ1: What are the open-source software technological components integrated by the mobile devices vendors?
• RQ2: How does open-source software affect the competitive dynamics of the mobile devices industry?
• RQ3: How do rival platform vendors collaborate in the open-source arena?
• RQ4: Why do companies follow open-source platform-based strategies?

The research questions follow a What, How and Why logic. The researcher strategically addressed the What and How questions first, without rushing to answer the final Why question (RQ4).

3 Literature review

This research reviews multi-disciplinary literature, mostly within the information systems, economics and strategy research streams. The author sought to cover the current body of theoretical knowledge addressing the open-source software (OSS) phenomenon, as addressed by Lerner and Tirole (2002), Feller and Fitzgerald (2002) and Weber (2004); research addressing computer-based platforms and multi-sided platforms (Rochet & Tirole, 2003; Hagiu, 2004; Hagiu & Wright, 2011); and research bridging business strategy and network economics (Gawer & Henderson, 2007; Gawer & Cusumano, 2008).

Regarding the review of OSS literature, a crucial part of this research, the author identified several recent and comprehensive literature reviews addressing the OSS phenomenon (Stol & Babar 2009; Aksulu & Wade 2010; Hauge et al. 2010; Lindman 2011). A notable systematic literature review by Stol and Babar (2009), covering specialized conference proceedings, pointed out the heterogeneity and lack of empirical studies dealing with OSS in organizations. Another systematic literature review, by Hauge et al. (2010), confirmed the heterogeneity with which companies approach OSS and the lack of empirical research on OSS adoption in organizations. Lindman (2011) extended the two previously mentioned reviews in yet another systematic literature review covering both open-source journals (Hauge et al. 2010) and top Information Systems journals (Rainer & Miller 2005). He argues that the current body of knowledge emphasizes OSS community-driven development, with limited interest in OSS within organizations. Last but not least, a further literature review by Aksulu and Wade (2010) reflected on the state of open-source research, analyzed and categorized a wide set of open-source research, and proposed a framework to situate OSS research within a wider nomological network while proposing future directions for open-source research.

Despite the existence of these recent, comprehensive and systematic literature reviews on OSS, and in order to review theoretical knowledge on other relevant constructs such as computer-based platforms and multi-sided platforms (Rochet & Tirole, 2003; Hagiu, 2004; Hagiu & Wright, 2011), business strategy and network economics (Gawer & Henderson, 2007; Gawer & Cusumano, 2008), the author had to perform his own systematic literature review. From the beginning of the doctoral project, the author has systematically maintained a literature review bridging OSS with platforms. This in-progress literature review, based on search keywords and research databases, was initially defended at the Information Systems Research Seminar in Scandinavia (IRIS) in 2011, and submitted as a working paper to the European Conference on Information Systems (ECIS) in 2012. The author identified a set of emergent research, with a stronger focus on platforms than on products, that highlights the limited empirical research on strategies used by open-source platform providers in industry.

To conclude this chapter: from reviewing a very wide body of theoretical knowledge, the author identified a limited but emergent research stream addressing both OSS and platforms. If Stol and Babar (2009), Hauge et al. (2010) and Lindman (2011) argue, with a common voice, that current research lacks empirical studies bridging OSS with organizations, the author argues that current research lacks even more empirical studies bridging OSS platforms with organizations.

4 Theoretical background

This study builds on previous research around the concept of multi-sided platforms as introduced by Rochet and Tirole (2003) and further developed by Hagiu (2004), Rochet and Tirole (2006) and Boudreau (2008); the concept of computer-based platforms first coined by Pohl (1998); and key studies on OSS adoption (Wang & Wang 2001; Feller & Fitzgerald 2002; West 2003; Dedrick & West 2004). Given the strongly multidisciplinary character of the research, constructs from outside the perceived boundaries of information systems research were included, such as the Gawer and Cusumano (2008) strategy tool kit for firms pursuing platform-based operational models and the work of Katz and Shapiro (1994), who look at the role of network effects in the telecommunication industry in a purely network-economics context. From psychology, the theoretical propositions of sense of community (SoC) by McMillan and Chavis (1986) were adopted, as well as the concept of community of practice (CoP) as proposed by the cognitive anthropologists Lave and Wenger (1991).

5 Methodology and scope

5.1 Epistemological Ontologies and Method

The author, committed to an article-collection dissertation strategy, did not follow a single epistemological and ontological trajectory. Rather than choosing a particular research approach a priori, before the field work, the researcher adopted different approaches for each paper while pursuing epistemological consistency within each individual research project. The author took the stance of a pluralist (Mingers, 2004) employing multi-dimensional research strategies (Mason, 2006), that is, accepting and celebrating a wide variety of paradigms and research approaches on the same topic (Mingers, 2001), in this case the integration of OSS by platform vendors in the emergent mobile devices industry.

Driven by What, How and Why research questions, the approach of this research is mostly qualitative and most original articles are interpretative (Klein & Myers 1999; Walsham 1995). The research was conducted according to ethnographic principles that empower data collection with limited influence from the established body of theoretical knowledge (Myers 1999; Atkinson 2006). The author was more interested in grounding theories than in building variance-based models (Benbasat et al. 1987; Myers 1997; Markus & Robey 1988; Urquhart & Fernández 2013). By taking an ethnographic approach, the theoretical integration was performed a posteriori, after the field work and data collection. The researcher built his methodological awareness both by attending doctoral-level courses and seminars and by reviewing three books in the following order: Silverman (2009), Eriksson & Kovalainen (2008) and Myers (2002), all widely recognized methodology books on qualitative research applied to social sciences, business studies and Information Systems. Table 1 summarizes the research questions, methods and methodological sources driving each individual research project, as reported in more detail in the article collection bundled in the author's doctoral dissertation.

Table 1. Methods employed in each article/research-question

| Article | Question | Method | References |
|---|---|---|---|
| 1 | What do we know about open-source and platforms? | Literature review | Webster & Watson 2002; Järvinen 2008 |
| 2 | Measuring the openness of computer-based multi-sided platforms | Literature review | Webster & Watson 2002; Järvinen 2008 |
| 3 | RQ1 | Descriptive case-study | Yin 1989 |
| 4 | RQ2 | Descriptive case-study | Yin 1989; Eisenhardt 1989; Kozinets 2002 |
| 5 | RQ3 | Ethnographic case-study | Eisenhardt 1989; Myers 1999 |
| 6 | RQ4 | Interpretative case-study | Eisenhardt 1989; Darke et al. 1998 |


Table 2. Data-collected for each article/unit-of-analysis

| Article | Question | Unit of analysis | Collected data |
|---|---|---|---|
| Article 1 | What do we know about open-source and platforms? | Body of theoretical knowledge | Research papers retrieved from research databases indexing books, journals and conference proceedings. |
| Article 2 | Measuring the openness of computer-based multi-sided platforms | Body of theoretical knowledge | Research papers retrieved from research databases indexing books, journals and conference proceedings. |
| Article 3 | RQ1 | Mobile devices industry players controlling the iOS, Android and Maemo platforms. | Open-source software projects source-code and documentation |
| Article 4 | RQ2 | Mobile devices industry players controlling the iOS, Android and Maemo platforms. | Companies public financial information, companies-press releases, generalist press and specialized press on technology and mobile devices |
| Article 5 | RQ3 | The WebKit open-source software project | Open-source software projects bug-repositories and the change-log from an open-source software project version-control-system. |
| Article 6 | RQ4 | Major firm within industry steering an open-source based mobile platform | 8 semi-structured interviews: Software developers, Testers, Integrators, Marketing and Strategy |


Table 3. Data-analysis and theory-building approaches within each article/research-question

| Article | Collected data | Approach | Methodological references |
|---|---|---|---|
| 1 | Research papers retrieved from research databases indexing books, journals and conference proceedings. | Analysis of literature review | Webster & Watson 2002; Järvinen 2008; Vom Brocke et al. 2009; Okoli & Schabram 2010 |
| 2 | Research papers retrieved from research databases indexing books, journals and conference proceedings. | Analysis of literature review | Webster & Watson 2002; Järvinen 2008; Vom Brocke et al. 2009; Okoli & Schabram 2010 |
| 3 | Open-source software projects source-code and documentation | Multiple-case study; description of OSS components integrated by Apple, Nokia, Google | Yin 1989; Eisenhardt 1989 |
| 4 | Companies public financial information, companies-press releases, generalist press and specialized press on technology and mobile devices | Multiple-case study; three rich descriptions of the OSS strategies employed by Apple, Nokia, Google; cross-case analysis | Yin 1989 |
| 5 | Open-source software projects bug-repositories and the change-log from an open-source software project version-control-system. | Network Visualization; Figuralisation; Heterogeneous Network Analysis; Actor-Network Theory | Elias 1978; Callon 1986; Walsham 1997; Cambrosio et al. 2004 |
| 6 | Semi-structured interviews: Software developers, Testers, Integrators, Marketing and Strategy | Categorization; Grounded theory | Benbasat et al. 1987; Eisenhardt 1989; Darke et al. 1998; Urquhart & Fernández 2013 |

5.2 Data collection

Data was collected mostly from publicly available and naturally occurring qualitative data on the Internet, such as: listed companies' public financial information, company press releases, generalist press, specialized press on technology and mobile devices, open-source software projects' source code and documentation, open-source software projects' bug repositories and the change-log from an open-source software project's version-control system. In addition, to address the Why research question (RQ4), semi-structured interviews were conducted in a major firm within the mobile devices industry. Table 2 summarizes the data-collection procedures employed in each individual research project. More details are provided in each article bundled with the author's doctoral dissertation.

5.3 Data analysis and theory-building

Reflecting the pluralism employed in this research (Mingers 2001; Mingers 2004; Mason 2006), a multitude of qualitative data-analysis and theory-building approaches were employed. Table 3 provides more details on these approaches and the corresponding methodological sources that guided the research design. More detail is provided in each dissertation article.

References

Aksulu, Altay, and Michael Wade. “A Comprehensive Review and Synthesis of Open Source Research.” Journal of the Association for Information Systems 11, no. 11 (2010): 576–656.
Atkinson, Paul. “Rescuing Autoethnography.” Journal of Contemporary Ethnography 35, no. 4 (2006): 400–404.
Benbasat, Izak, David K. Goldstein, and Melissa Mead. “The Case Research Strategy in Studies of Information Systems.” MIS Quarterly (1987): 369–386.
Boudreau, Kevin. “Opening the Platform Vs. Opening the Complementary Good? The Effect on Product Innovation in Handheld Computing.” August 24, 2008.
Callon, Michel. “The Sociology of an Actor-network: The Case of the Electric Vehicle.” Mapping the Dynamics of Science and Technology 23 (1986).
Cambrosio, Alberto, Peter Keating, and Andrei Mogoutov. “Mapping Collaborative Work and Innovation in Biomedicine: A Computer-assisted Analysis of Antibody Reagent Workshops.” Social Studies of Science (2004): 325–364.
Darke, Peta, Graeme Shanks, and Marianne Broadbent. “Successfully Completing Case Study Research: Combining Rigour, Relevance and Pragmatism.” Information Systems Journal 8, no. 4 (1998): 273–289.
Dedrick, Jason, and Joel West. “An Exploratory Study into Open Source Platform Adoption.” In System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference On, 10 pp. IEEE, 2004.
Eisenhardt, Kathleen M. “Building Theories from Case Study Research.” Academy of Management Review (1989): 532–550.
Elias, Norbert. What Is Sociology. Columbia University Press, 1978.
Eriksson, Päivi, and Anne Kovalainen. Qualitative Methods in Business Research. SAGE Publications Limited, 2008.
Feller, Joseph, and Brian Fitzgerald. Understanding Open Source Software Development. Addison-Wesley London, 2002.
Gawer, Annabelle, and Michael A. Cusumano. “How Companies Become Platform Leaders.” MIT/Sloan Management Review 49 (2008).
Gawer, Annabelle, and Michael A. Cusumano. Platform Leadership. Harvard Business School Press Boston, 2002.
Gawer, Annabelle, and Rebecca Henderson. “Platform Owner Entry and Innovation in Complementary Markets: Evidence from Intel.” Journal of Economics & Management Strategy 16, no. 1 (2007): 1–34.
Hagiu, Andrei. “Japan’s High-Technology Computer-Based Industries: Software Platforms Anyone?” RIETI Column October 19th (2004).
Hagiu, Andrei, and Julian Wright. Multi-sided Platforms. Harvard Business School, 2011.
Hauge, Øyvind, Claudia Ayala, and Reidar Conradi. “Adoption of Open Source Software in Software-intensive Organizations – A Systematic Literature Review.” Information and Software Technology 52, no. 11 (2010): 1133–1154.
Järvinen, Pertti. “On Developing and Evaluating of the Literature Review.” In The 31st Information Systems Research Seminar in Scandinavia (Workshop 3), 2008.
Katz, Michael L., and Carl Shapiro. “Systems Competition and Network Effects.” The Journal of Economic Perspectives 8, no. 2 (1994): 93–115.
Klein, Heinz K., and Michael D. Myers. “A Set of Principles for Conducting and Evaluating Interpretive Field Studies in Information Systems.” MIS Quarterly (1999): 67–93.
Kozinets, Robert V. “The Field Behind the Screen: Using Netnography for Marketing Research in Online Communities.” Journal of Marketing Research (2002): 61–72.
Lave, Jean, and Etienne Wenger. Situated Learning: Legitimate Peripheral Participation. Cambridge University Press, 1991.
Lerner, Josh, and Jean Tirole. “Some Simple Economics of Open Source.” The Journal of Industrial Economics 50, no. 2 (2002): 197–234.
Lindman, Juho. Not Accidental Revolutionaries: Essays on Open Source Software Production and Organizational Change. Aalto University, School of Economics, Department of Information and Service Economy, 2011.
Markus, M Lynne, and Daniel Robey. “Information Technology and Organizational Change: Causal Structure in Theory and Research.” Management Science 34, no. 5 (1988): 583–598.
Mason, Jennifer. “Mixing Methods in a Qualitatively Driven Way.” Qualitative Research 6, no. 1 (2006): 9–25.
McMillan, David W., and David M. Chavis. “Sense of Community: A Definition and Theory.” Journal of Community Psychology 14, no. 1 (1986): 6–23.
Mingers, John. “Combining IS Research Methods: Towards a Pluralist Methodology.” Information Systems Research 12, no. 3 (2001): 240–259.
Mingers, John. “Real-izing Information Systems: Critical Realism as an Underpinning Philosophy for Information Systems.” Information and Organization 14, no. 2 (2004): 87–103.
Myers, Michael. “Investigating Information Systems with Ethnographic Research.” Communications of the AIS 2, no. 4es (1999): 1.
Myers, Michael D. “Qualitative Research in Information Systems.” Management Information Systems Quarterly 21 (1997): 241–242.
Myers, Michael D., and David Avison. Qualitative Research in Information Systems: A Reader. SAGE Publications, Incorporated, 2002.
Okoli, Chitu, and Kira Schabram. “A Guide to Conducting a Systematic Literature Review of Information Systems Research” (2010).
Pohl, Jens G. “The Future of Computing: Cyberspace.” Collaborative Agent Design (CAD) Research Center (1998): 35.
Rainer Jr, R. Kelly, and Mark D. Miller. “Examining Differences Across Journal Rankings.” Communications of the ACM 48, no. 2 (2005): 91–94.
Rochet, Jean-Charles, and Jean Tirole. “Platform Competition in Two sided Markets.” Journal of the European Economic Association 1, no. 4 (2003): 990–1029.
Rochet, Jean-Charles, and Jean Tirole. “Two sided Markets: a Progress Report.” The RAND Journal of Economics 37, no. 3 (2006): 645–667.
Silverman, David. Doing Qualitative Research. SAGE Publications Limited, 2009.
Stol, Klaas-Jan, and Muhammad Ali Babar. “Reporting Empirical Research in Open Source Software: The State of Practice.” In Open Source Ecosystems: Diverse Communities Interacting, 156–169. Springer, 2009.
Urquhart, Cathy, and Walter Fernández. “Using Grounded Theory Method in Information Systems: The Researcher as Blank Slate and Other Myths.” Journal of Information Technology (2013).
Vom Brocke, Jan, Alexander Simons, Bjoern Niehaves, Kai Riemer, Ralf Plattfaut, and Anne Cleven. “Reconstructing the Giant: On the Importance of Rigour in Documenting the Literature Search Process.” In Proceedings of the 17th European Conference on Information Systems, Verona, Italy, 2206–2217, 2009.
Walsham, Geoff. “Actor-network Theory and IS Research: Current Status and Future Prospects.” Information Systems and Qualitative Research 466 (1997): 80.
Walsham, Geoff. “Interpretive Case Studies in IS Research: Nature and Method.” European Journal of Information Systems 4, no. 2 (1995): 74–81.
Wang, Huaiqing, and Chen Wang. “Open Source Software Adoption: A Status Report.” Software, IEEE 18, no. 2 (2001): 90–95.
Weber, Steve. The Success of Open Source. Vol. 368. Cambridge Univ Press, 2004.
Webster, Jane, and Richard T. Watson. “Analyzing the Past to Prepare for the Future: Writing a Literature Review.” MIS Quarterly 26, no. 2 (2002).
West, Joel. “How Open Is Open Enough?: Melding Proprietary and Open Source Platform Strategies.” Research Policy 32, no. 7 (2003): 1259–1285.
Yin, Robert K. Case Study Research: Design and Methods. Vol. 5. SAGE Publications, Incorporated, 2008.

QA Practices and FLOSS Communities

Adina Barham Hitotsubashi University, Graduate School of Social Sciences, 2-1 Naka, Kunitachi, Tokyo, 186-8601, Japan

Abstract. As FLOSS software gains popularity, communities are realizing that a more rigorously controlled development process is necessary in order to produce high quality products. Such a process includes verification steps and formal quality assurance (QA) procedures, which are increasingly present in FLOSS development. These changes in development affect the community structure as new groups of contributors emerge. Although FLOSS communities have been the main focus of many studies, little research has been done on how this emerging ‘layer’ is integrated in the community structure and how it affects the overall dynamics. This study aims to start filling this gap by analyzing available communication data from communities that have formally adopted QA and by developing a set of hypotheses. The conclusions reached so far, although provisional, suggest that quality assurance activity may not be increasing steadily over time but may depend on other factors. In addition, the data collected points to a highly connected network that contains a large group of members communicating over various channels.

Keywords: quality assurance, test, social network analysis, information flow

1 Introduction

The development process behind FLOSS products has stirred a lot of interest because it seemed to contradict traditional software development by employing a more “bazaar”-like approach. However, despite the fact that many still believe a somewhat mythical description of FLOSS [1], communities have started adopting methodologies that were traditionally employed only in proprietary software development. The result is a new hybrid form that is impacting the software market by offering a whole new array of possibilities [2-6]. As the user base of FLOSS products increased, so did the demand for higher quality from these products. One of the consequences is a greater need for a more complex development process that includes rigorous verification. For example, formal quality assurance (QA) used to characterize proprietary software development, but it has been increasingly present within FLOSS. Because communities themselves represent the most important resource in deciding the success or failure of a FLOSS project [7], it is important to study how implementing such a new phase affects their structure and dynamics.

2 Background and Motivation

One of the first steps of this research consisted of assessing the presence of QA procedures within popular FLOSS projects [35]. For this purpose, the top 50 FLOSS software products ranked by number of downloads on www.ohloh.net were examined, and almost one third were found to have QA procedures present. QA was considered present if there were clear and easily identifiable references to its existence (for example, mailing lists, IRC channels, webpages, wikis and so on). For the next phase of this preliminary assessment, the top 100 software projects ranked by number of users were analyzed. The conclusion remained almost the same, as 27 projects had some form of QA present in the development process. In addition, almost half of these projects had dedicated communication channels for members performing QA tasks. The remaining projects either had channels for contributors to post test results or other resources dedicated to describing QA within the project, such as wikis or websites. The main conclusion that can be drawn from this preliminary research is that QA teams exist in communities where there are dedicated communication channels for members performing QA tasks. However, whether the members of these teams exclusively perform QA tasks is an issue that needs further study. Verification, and more broadly quality assurance under the FLOSS development model, is a topic that attracts a lot of interest within the academic community, as reflected in both papers [8-15] and conferences. This trend is also reflected in the interest that various organizations and governments have shown by starting initiatives related to quality within FLOSS, such as the Qualipso1 or Qualoss2 projects.
Because communities represent the driving force behind FLOSS projects, researchers have also focused on aspects such as structure and dynamics [16-18], communication patterns between core and periphery [19-21], or migration within the hierarchy of FLOSS projects [22]. Despite extensive research on both QA and communities, little research has sought to link QA with the rest of the community. This research aims to start filling that gap and to create a set of hypotheses that can be confirmed in later study.

1 Qualipso (Trust and Quality in Open Source Systems), http://www.qualipso.org/
2 Quality in Open Source Software, http://www.qualoss.org/

3 Research Questions

Q1: What QA activities are present in FLOSS development?

Software quality assurance represents a set of complex activities that are applied throughout the development process. According to Pressman [23], quality assurance includes activities such as competent analysis, design, coding, and testing, as well as formal technical reviews, a multitiered testing strategy, better control of software work products and the changes made to them, and the application of accepted software engineering standards. The question becomes whether within FLOSS development processes one can find the same types of QA activities as in proprietary software.

Q2: What are the characteristics of QA activities and who is performing them?

QA activities can vary in technical difficulty. For example, some users may provide automated test tools, which suggests that QA might be divided into two subgroups based on activity type. In the case of , peripheral members (i.e. members of the periphery as defined in the onion model of FLOSS communities) perform some QA tasks, such as posting bugs on the issue tracker, and the percentage is 20 to 25% [20]. It would be interesting to compare peripheral involvement in other case studies or with other QA-related activities.

Q3: How does a QA contributor fit into the community?

Although research has shown that the onion model is more complex than previously thought [22], [24], it is generally accepted that FLOSS communities can be split into four layers: passive users, active users, co-developers and core developers (passive and active users together form the periphery). This research aims to investigate how the emerging QA teams fit into FLOSS communities. More specifically, it asks whether QA is established as a separate category or whether members that belong to other layers perform QA-related tasks. Previous research has shown that participants ascend and descend the organizational hierarchy as well as move laterally to other tracks, but it has also been suggested that many QA tasks are assimilated by other roles [22]. It would be interesting to see whether all projects taken into consideration share the same structure with respect to QA contributors.

Q4: What are the communication patterns between QA members as well as with other project participants?

The goal of analyzing communication patterns between QA members is to find the central figures, or in other words members with high activity within the community and observe their evolution over time within the project. Another issue would be comparing central figures based on different data sets, for example central figures based on e-mailing activity and central figures based on issue tracker activity. Social networks are not static and the structure is continuously changing over time [25], which means that participant position within the network is dynamic. For this reason it would be useful to learn the path, or paths, which the central members took to get into that position. Furthermore, previous research has shown that information access correlates with productivity and participants who have better access to information are able to contribute more efficiently [26]. Hence it would be useful to know whether there are participants who control the information flow; such dependence on a few participants may be risky for the project’s long-term sustainability.
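The comparison of central figures across data sets can be illustrated with a minimal Python sketch. It assumes interactions have already been counted per pair of participants, and it uses weighted degree as a simple proxy for "high activity"; the text does not name a specific centrality measure, so this choice, the participant names and the counts are all hypothetical. Detecting participants who control information flow would likely need a path-based measure such as betweenness, which is not shown here.

```python
from collections import defaultdict

def weighted_degree(edges):
    """Sum the weights of all edges touching each participant."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)

def central_figures(edges, k=3):
    """Return the k participants with the highest weighted degree.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    degree = weighted_degree(edges)
    ranked = sorted(degree, key=lambda p: (-degree[p], p))
    return ranked[:k]

# Hypothetical interaction counts for two data sets of the same community.
mailing_list = {("ann", "bob"): 5, ("ann", "cy"): 3, ("bob", "dee"): 1}
issue_tracker = {("ann", "dee"): 4, ("bob", "dee"): 6, ("cy", "dee"): 2}

top_mail = central_figures(mailing_list, k=2)
top_issues = central_figures(issue_tracker, k=2)

# Participants who are central in both data sets.
shared = sorted(set(top_mail) & set(top_issues))
```

Running the same ranking on successive time windows of one data set would give a rough view of how a participant's position evolves over time.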

4 Data and Research Method

“The problem of quality management is not what people don't know about it. The problem is what they think they know” [27]. In other words, quality assurance (QA) is often misunderstood and is actually a process that requires a complex set of methodologies and procedures [23]. Considering the complexity of QA, an important step in this study was to define QA within the scope of this research. As a result, the following tentative QA definition was devised:

QA-related tasks comprise testing; contributing code to automated testing tools or any other test-related activity; triaging bugs or any other activity performed on the project's issue tracker; and participating in the dedicated QA communication channels.
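One way to operationalize this tentative definition is a simple predicate over activity records, sketched below in Python. The record schema (`author`, `kind`, `channel`) and all kind and channel names are hypothetical and chosen only for illustration; the actual data sets may be structured differently.

```python
# Hypothetical activity kinds covered by the tentative QA definition.
QA_KINDS = {"testing", "test_tool_commit", "bug_triage"}

# Hypothetical channel names: QA-dedicated channels plus the issue tracker,
# since any issue tracker activity counts as QA under the definition.
QA_CHANNELS = {"qa-list", "qa-irc", "issue-tracker"}

def is_qa_task(event):
    """Return True if an activity record matches the tentative QA definition."""
    return event.get("kind") in QA_KINDS or event.get("channel") in QA_CHANNELS

def qa_participants(events):
    """Collect the authors of all QA-related activity records."""
    return {event["author"] for event in events if is_qa_task(event)}

# Hypothetical sample records.
events = [
    {"author": "ann", "kind": "bug_triage", "channel": "issue-tracker"},
    {"author": "bob", "kind": "code_commit", "channel": "dev-list"},
    {"author": "cy", "kind": "email", "channel": "qa-list"},
]

participants = qa_participants(events)
```

Because the predicate only sees publicly recorded activity, it inherits the limitation that participants who do QA work outside these channels are not identified.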

However, defining QA raises a circularity issue because these activities emerged from analyzing publicly available data. Some project participants may perform QA activities other than the ones observed but not be active on communication channels. A first solution to this problem would be to look at the definition of QA within proprietary (traditional) development and compare it with the activities discovered within FLOSS communities. Another solution would be method triangulation: confirming the definition through interviews with key members of the community at a later stage in this research. In addition, as one of the premises of this research relies on the correct identification of participants performing QA tasks, it is very important to use additional methods to confirm QA participants. Interviews will therefore also be used to clarify whether there are private communication channels between community members.

This research uses five FLOSS projects as case studies in order to generate hypotheses about QA within FLOSS development. Whether case studies provide sufficient evidence for generalization is still a controversial topic. However, it has been agreed that case studies, besides being used for validating existing theory, can also provide enough insight to produce hypotheses that can be validated through other research methods [28]. A widely cited paper in FLOSS research that uses the case study approach is the study of Apache by Mockus et al. [29], in which a set of hypotheses resulted from examining the development process within the Apache server community. Similarly, this research will follow the observational path as a research framework, which means that the “goal is to collect a set of observations and to explain them in terms of a set of meaningful concepts” [30]. This goal will be achieved by starting with the substantive domain or area of interest, which is QA within FLOSS, and a technique, which is the case study methodology. For this purpose, the following communities were chosen as case studies:

• Mozilla -- The community behind Mozilla produces successful FLOSS products such as Firefox and Thunderbird. It is a mature community that is well known for stable releases and a high quality standard.
• Ubuntu Community -- The community behind Ubuntu produces an operating system (a Linux distribution). Like Mozilla, it is an established and mature community that is well known for stable releases and a high quality standard.
• Plone -- Plone is a content management system. As opposed to the Mozilla and Ubuntu communities, the Plone community is rather small, but it produces a very popular content management system.
• KDE -- the Plasma Desktop environment and many other products
• LibreOffice -- office software

Communication data was downloaded and stored in local PostgreSQL databases using Python scripts that perform “screen scraping” (web crawlers) or open source tools (for the mailing lists of the Plone community)³. In addition, www.ohloh.net⁴ was used for data cleaning and for clarifying contributor roles, as it contains code contributors’ usernames and nicknames. For example, a member might have more than one username or nickname, and www.ohloh.net was used to merge these usernames. However, at this point data cleaning was conducted only superficially, as the data download is not yet complete. An additional clean-up will be performed after all data sets have been downloaded into local databases.

Social network analysis techniques are employed because they allow easy discovery of communication patterns, community evolution, and user group affiliation and migration paths. In order to apply social network analysis techniques, communities will be represented as graphs. Project participants will be represented as nodes (vertices) while interactions will be represented as edges (arcs). The number of interactions between participants will be represented as the weight (value) of the graphs’ arcs. For the purpose of this research, interactions are defined as e-mails exchanged between community members, comments on the issue tracker or any kind of information exchange on a public communication channel. These interactions are defined using either quotation-based analysis [31] or thread-based analysis [16], [24]. Thread-based analysis assumes that relations are created between members who exchange messages under the same thread. Quotation-based analysis assumes that a relation is created between two members if one has quoted the other member’s message in their own message. Quotation-based analysis was used for mailing lists (except for the Mozilla mailing lists), while thread-based analysis was used for the Mozilla mailing lists and for issue tracker communication. Social networks are dynamic, with a structure that is continuously changing, and for this reason, as previous research has shown, it is important to consider time frames [24], [32-33].

3 https://github.com/MetricsGrimoire/MailingListStats
4 https://www.ohloh.net/
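The two ways of deriving weighted arcs described above can be illustrated with a minimal Python sketch. It is not the scripts used in this research: the function names, the input formats and, in particular, the rule that a poster is linked to every distinct earlier participant of the same thread are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def thread_based_arcs(messages):
    """Build weighted arcs from threaded messages.

    `messages` is an iterable of (thread_id, author) pairs in posting order.
    Assumption: each poster is linked to every distinct author who posted
    earlier in the same thread. The arc weight counts these interactions.
    """
    earlier_authors = defaultdict(list)  # thread_id -> authors seen so far
    arcs = Counter()                     # (source, target) -> weight
    for thread_id, author in messages:
        for earlier in earlier_authors[thread_id]:
            if earlier != author:        # drop loops (replies to oneself)
                arcs[(author, earlier)] += 1
        if author not in earlier_authors[thread_id]:
            earlier_authors[thread_id].append(author)
    return arcs


def quotation_based_arcs(quotes):
    """Build weighted arcs from (author, quoted_author) pairs."""
    arcs = Counter()
    for author, quoted in quotes:
        if author != quoted:             # drop self-quotations
            arcs[(author, quoted)] += 1
    return arcs


# Hypothetical toy data, not taken from any of the studied projects.
toy_thread = [("t1", "ana"), ("t1", "bob"), ("t1", "ana"), ("t1", "cy"), ("t1", "bob")]
print(thread_based_arcs(toy_thread))
print(quotation_based_arcs([("ana", "bob"), ("ana", "bob")]))
```

Both functions return a mapping from a directed pair of participants to the number of interactions, which corresponds to the arc values used in the networks discussed in the following sections.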

4.1 Mozilla

The Mozilla community was chosen as a first case study in order to help draw preliminary conclusions and create a framework for further research. Mailing list data was downloaded in July 2011. At that time, according to the Mozilla Quality Assurance (QMO) website⁵, there were 5 sub-teams: Web QA, Desktop Firefox, Browser Technologies, Automation and Services⁶. The Web QA, Desktop Firefox, Browser Technologies and Services teams used the mozilla.dev-quality mailing list, while the Automation team used the Mozmill developer mailing list. Issue tracker data was acquired in February – March, after which data cleaning was performed on both data sets by removing double posts, removing spam and normalising name formats. As expected, the QA mailing list was created much later than the issue tracker. For the purpose of analysing activity, the data should be normalised; however, if activity pertaining to a certain period of time is not taken into account, the migration of certain members between layers of the community might be missed. The mozilla.dev-quality mailing list data contains 2535 e-mails exchanged between February 2006 and June 2011, while the Mozmill developer mailing list data contains 1155 e-mails exchanged between October 2008 and July 2011. The traffic and the number of users are higher on the mozilla.dev-quality mailing list than on the Mozmill developer mailing list (see Table 1)⁷. This is to be expected, as the Mozmill developer list is addressed to more technically aware users: it is dedicated to an automated testing tool produced by the Mozilla community primarily to test their own products.

5 https://quality.mozilla.org/
6 The structure of the Mozilla Quality Assurance teams is dynamic: the Services team was later dropped and other changes have also taken place since the data was collected.

Table 1. Mozilla.dev-quality: 2006/2/17-2011/6/30, Mozmill developer: 2008/10/1-2011/7/21

                    Mozilla.dev-quality   Mozmill developer   Total
Topic                              1042                 313    1299
Messages                           2535                1155    3689
Thread initiators                   199                  47     233
Distinct author                     293                  61     332
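The Total column of Table 1 is smaller than the sum of the two lists, which the paper attributes to cross posting and to users belonging to both lists. Assuming that the Total column counts each cross-posted item or shared user only once, the size of the overlap follows from inclusion-exclusion. The short Python sketch below only restates the figures of Table 1; the variable and function names are hypothetical.

```python
# Counts from Table 1: (mozilla.dev-quality, Mozmill developer, Total).
TABLE_1 = {
    "Topic": (1042, 313, 1299),
    "Messages": (2535, 1155, 3689),
    "Thread initiators": (199, 47, 233),
    "Distinct author": (293, 61, 332),
}


def shared_between_lists(list_a, list_b, combined):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.

    Assumption: `combined` counts items present on both lists only once.
    """
    return list_a + list_b - combined


shared = {row: shared_between_lists(*counts) for row, counts in TABLE_1.items()}
for row, overlap in shared.items():
    print(f"{row}: {overlap} shared between the two lists")
```

Under this assumption, 22 distinct authors and 13 thread initiators would be active on both lists.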

The issue tracker (Bugzilla) data set covers all Mozilla products from 1998 to 2012 and contains 687,221 bugs with 5,834,507 associated comments. Bug ids range from 0 to 724,339; the collected bugs thus represent 94.87% of the id range. The remaining 5.13% were not collected because they were not publicly available or because of malformed HTML that could not be parsed. Approximately 4,400 distinct project members were identified as assigned to fix bugs. Without the data associated with code commits it is not safe to assume that these members were also the members who posted the bug fix. However, they can be considered as known code contributors within the community. These users are also active when it comes to posting bug comments as well as sending e-mails on the QA mailing lists. After cross-referencing members active on the mailing lists and code committers, 883 bugs were found, most of which pertain to Firefox. Of all the e-mails exchanged, 152 (4.12%) were sent by authors who had sent only one e-mail throughout the period taken into consideration for this research. On the other hand, 135,466 bugs (19.70%) were posted by members who had posted only one bug throughout the period taken into consideration.

Most activity has steadily increased over time; however, activity on the QA mailing lists does not seem to be linked to time progression. As a result, it would be useful to find out whether the activity peak in 2009 is linked to internal events or external factors. Developer activity includes bugs to which developers are assigned, comments posted on the issue tracker, as well as e-mails sent on the QA mailing lists. If we consider 11 e-mails (the average number of e-mails sent) to be the lower limit for active users, then one can conclude that only a small percentage of the community can be considered active, as only 16.8% of the users sent more than 11 e-mails and 17.69% of users received more than 11 replies. Following the same principle, only 4.39% of users show a higher than average activity by posting more than 39 comments, and 9.25% by posting more than 6 bugs.

7 The difference in the total is due to cross posting and users belonging to both lists
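The activity threshold used above (the average number of contributions per user as the lower limit for an active user) can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical and the per-user counts are invented toy values, not the Mozilla data.

```python
from statistics import mean


def share_above_average(counts):
    """Return (average, percentage of users strictly above the average).

    `counts` holds one number per user, e.g. e-mails sent by that user.
    """
    average = mean(counts)
    above = sum(1 for c in counts if c > average)
    return average, 100.0 * above / len(counts)


# Hypothetical e-mail counts for eight users (not the Mozilla data).
emails_per_user = [1, 1, 1, 2, 3, 4, 30, 46]
average, share = share_above_average(emails_per_user)
print(f"average = {average}, users above average = {share:.1f}%")
```

Because contribution counts of this kind are typically skewed by a few very active members, the average tends to lie well above the activity of most users, which is consistent with the small shares of active users reported above.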

Table 2. Member activity on a yearly basis

                 2006     2007     2008     2009     2010     2011
Comments       328846   335323   467087   528199   658030   703857
Bugs            42015    41995    56785    60880    78089    78896
E-mails           343      361      556     1307      739      384
Dev bugs       119571   123234   174742   177776   227123   226555
Dev Comments   258458   271679   375729   449539   541707   561853
Dev e-mails       196      286      343      953      500      264

In this phase of the research, because data collection and cleaning took longer than anticipated, social network analysis techniques could not be applied to the whole data set. Instead, interaction was analyzed between active members on the mailing list (more than 10 e-mails sent; 55 users) and 10 members with an average activity on the issue tracker (not among the top or bottom commentators). The resulting network does not depict relations between all QA members; its role is only to offer a sample of the interaction patterns within the community. After eliminating loops (replies to themselves), this sub-network had 1433 participants with 2593 connections; 933 of these connections were formed by more than one interaction. The average degree is 3.16, which means that the average number of connections a member has is approximately 3.

Fig.1. Social Network - Mailing list and 10 active members on the issue tracker

4.2 Ubuntu

Ubuntu was chosen as the next case study for this research as it is yet another mature and successful FLOSS project. As opposed to the Mozilla projects, Ubuntu has a dedicated QA team⁸ as well as a Bug Squad⁹. The Bug Squad’s tasks include assigning bugs to packages, ensuring that bug reports are adequate, finding duplicate bugs, replicating bugs and so on. The QA team dataset contains e-mails from October 2007 to August 2012, the Bug Squad team dataset contains e-mails from July 2006 to August 2012, and the Laptop testing team dataset contains e-mails from August 2005 to September 2012. In total, 7595 e-mails were exchanged on all three mailing lists. At this point of the research the mailing list data has not been cleaned, thus the results presented in this paper are only provisional.

Table 3. Member activity on a yearly basis

                         2005   2006   2007   2008   2009   2010   2011   2012   Total
QA mailing list             -      -     30    324    321    694    486    591    2446
Bug Squad mailing list      -    206    515    536    475   1232    633    280    3877
Laptop testing team       338    729    117      0     35     20     12     21    1272

If we consider 5 e-mails (the average number of e-mails sent per user) to be the lower limit for active users, then we can conclude that a small percentage of the community has a higher than average activity, as only 14.51% of participants had sent more than 5 e-mails. In this phase of the research, because the data has not been cleaned thoroughly, final conclusions regarding communication patterns between members cannot be drawn. However, a preliminary analysis was conducted using Pajek by integrating the three mailing lists. After eliminating loops (replies to themselves), this sub-network had 1086 participants with 2642 connections; 472 of these connections were formed by more than one interaction. The average degree is 4.8, which means that the average number of connections a member has is approximately 5. The network contains 30 components, of which 29 contain fewer than 3 members. In other words, there is a large connected group of 1019 members.
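The component structure reported above was obtained with Pajek. As an illustration of what such a computation involves, the following Python sketch finds the sizes of the connected components of a network with a breadth-first search. It is not the Pajek procedure itself: treating arcs as undirected (weak components), the function name and the toy edge list are all assumptions.

```python
from collections import defaultdict, deque


def component_sizes(edges):
    """Return the sizes of all connected components, largest first.

    `edges` is an iterable of (a, b) pairs; direction is ignored, so the
    result corresponds to weak components (an assumption of this sketch).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, sizes = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                      # breadth-first search from `start`
            node = queue.popleft()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical toy network: one group of three members and one pair.
print(component_sizes([("a", "b"), ("b", "c"), ("d", "e")]))
```

A network with one dominant component and many very small ones, as reported for Ubuntu, would show up here as one large first entry followed by a tail of small values.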

8 http://qa.ubuntu.com/ 9 https://wiki.ubuntu.com/BugSquad


Fig.2. Social Network – QA mailing list and Bug squad mailing list

4.3 Plone

Plone was also chosen as its development process includes a QA step and it is yet another successful project¹⁰. As opposed to Ubuntu and Mozilla, Plone has only one dedicated QA mailing list. The QA team has a webpage¹¹ where one can find basic information such as a description of its activities, communication channels and team leaders. In the case of Plone, QA activities include triaging new bugs, validating submitted patches, ensuring that new releases are usable and generally helping in the release process. In order to measure QA activity levels on more than one channel, issue tracker data as well as mailing list data were taken into account. Data was retrieved in December 2012 – January 2013. The issue tracker data contains a total of 13026 bugs with 55883 associated comments, while the mailing lists contain a total of 29525 e-mails, which were downloaded using MailingListStats¹², an open source tool. In addition, a list of contributors containing their names and the names they use in code repositories was downloaded from Ohloh.net. This list was used to perform data cleaning as follows: names contained in the issue tracker and the mailing lists were compared with names from code repositories and merged. A detail that can be observed from Table 4 is that there is no correlation between time progression and activity levels. In addition, there seems to be no correlation between the number of bugs opened and the comment activity level, which raises the question of what causes these spikes in activity levels.

10 http://plone.org/ 11 http://plone.org/community/teams/qa-team 12 https://github.com/MetricsGrimoire/MailingListStats
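The name-merging step described for the Plone data can be sketched in Python as follows. This is only an illustration of the idea, not the cleaning scripts used in this research: the alias table, the normalisation rule (lower-casing and collapsing whitespace) and all names are hypothetical.

```python
def canonical_name(name, alias_map):
    """Map a raw name or nickname to one canonical identity.

    The name is lower-cased and its whitespace collapsed before the lookup;
    names without an entry in `alias_map` are kept in normalised form.
    """
    key = " ".join(name.lower().split())
    return alias_map.get(key, key)


def merge_counts(counts, alias_map):
    """Sum per-name activity counts after merging aliases."""
    merged = {}
    for name, n in counts.items():
        person = canonical_name(name, alias_map)
        merged[person] = merged.get(person, 0) + n
    return merged


# Hypothetical alias table, e.g. derived from a contributor list that pairs
# real names with repository usernames.
aliases = {"jdoe": "jane doe", "jane d.": "jane doe"}
emails = {"Jane Doe": 3, "jdoe": 2, "Jane  D.": 1, "sam": 4}
print(merge_counts(emails, aliases))
```

Merging identities before building the network matters because an unmerged alias would appear as a separate vertex and would understate the activity of the person behind it.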

Table 4. Member activity on a yearly basis

              2002   2003   2004   2005   2006   2007   2008    2009   2010   2011   2012
Bugs           748   1503   1461   1349    944   1482   1336    1132   1391   1060    620
Comments      1102   3106   3322   4057   3636   5193   5592   11251   6297   4015   8312
E-mails          -      -      -    556   2792   4661   7615    6578   1241   3375   2707
QA e-mails       -      -      -      -      -      -      -       -      -     47     64

The QA mailing list activity started in 2011 and includes 41 members, of whom approximately 70% had sent only one e-mail. As can be seen from Table 5, most users are active on other mailing lists and even contribute code, which means that in the case of Plone, QA cannot be considered as a separate layer in the community.

Table 5. QA mailing activity – users that sent more than one e-mail

Author¹³         QA e-mails   Other e-mails   Lists   Bugs   Comments   Code Contributor
Eric S                   43             137       5     76        886   Yes
alan r                    8             152       5      0          0   Yes
David G                   6             190       6      0          0   Yes
Mindsways                 4               0       0      0          0   No
Dylan J                   4             264       8      0          0   Yes
Unrecognizable            3               2       2      0          0   No
Laurence R                3             132       7     40        170   Yes
American PM               3               0       0      0          0   No
Gil F                     2              26       2      0          0   No
robert r                  2               0       0      0          0   No
Christian L               2              37       3      0          0   No
Anthony E                 2               0       0      0          0   No
Elizabeth L               1              23       3      0        865   Yes

13 For layout purposes some e-mail author names were shortened. For the full table containing full names please contact the paper author.

An interesting finding after analysing the activity levels of QA list members is that the person listed as QA lead on the Plone webpage has sent only one e-mail using her name on the mailing list. To create the network graph, only members that created at least one relation were taken into consideration. The next step consisted of eliminating loops, that is, arcs starting from and pointing to the same vertex. An additional reduction was performed in order to remove vertices that had no connections with other vertices. The resulting network contains 3414 vertices connected by a total of 16042 arcs, of which 5093 have a value different from 1. This means that 10949 connections (68.25%) were created by only one interaction. One could conclude that the members involved are occasional contributors or part of the periphery. A further 30.67% of all arcs have values between 2 and 79, which means that arcs with values between 1 and 79 account for almost 99% of all arcs. The “heaviest” 10 arcs in the network have values between 1243 and 7781. The average degree is 9.39, which means that on average a person interacts with approximately 9 other people. The density is 0.0013, where the density represents the number of lines expressed as a proportion of the maximum possible number of lines (the greater the density, the tighter the network structure). To get an idea of how users that perform different tasks are distributed in the community structure, contributors were divided into the following four groups (clusters):

1. Code contributors and members of the QA mailing list
2. Code contributors but not members of the QA mailing list
3. Not code contributors but members of the QA mailing list
4. Not code contributors and not members of the QA mailing list

Table 6. Cluster member frequency

Cluster   Frequency   Frequency%   Representative
1                21         0.60   Hanno
2               344         9.94   Khink
3                16         0.46   Steve
4              3079        88.98   jonstahl

Table 6 presents the size of each cluster and the percentage of the whole community that it represents. It can easily be observed that the cluster containing members who are neither code contributors nor QA contributors is the largest, representing almost 89% of the whole community. On the other hand, people performing QA tasks appear to add up to only 1.06% of the community. At this phase of the research it is still unclear whether members of cluster 4 are periphery members that occasionally contribute or whether they have different roles in the community.
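The division into four clusters depends on two yes/no attributes per member: whether the member is a code contributor and whether the member is on the QA mailing list. A minimal Python sketch of this assignment and of the resulting frequency table is given below. The function names and the toy member list are hypothetical; the sketch does not reproduce the Plone data.

```python
from collections import Counter


def cluster_of(is_code_contributor, on_qa_list):
    """Return the cluster number (1-4) as defined in the list above."""
    if is_code_contributor:
        return 1 if on_qa_list else 2
    return 3 if on_qa_list else 4


def cluster_frequencies(members):
    """Map each cluster to (frequency, percentage of all members).

    `members` is an iterable of (is_code_contributor, on_qa_list) pairs.
    """
    counts = Counter(cluster_of(code, qa) for code, qa in members)
    total = sum(counts.values())
    return {c: (counts[c], 100.0 * counts[c] / total) for c in sorted(counts)}


# Hypothetical toy community of four members.
toy_members = [(True, True), (False, False), (False, False), (True, False)]
for cluster, (freq, pct) in cluster_frequencies(toy_members).items():
    print(f"cluster {cluster}: {freq} members ({pct:.2f}%)")
```

In this scheme the share of members performing QA tasks is the combined percentage of clusters 1 and 3.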

The following image represents an aggregated graph of the Plone network where each cluster is represented by a different color.

Fig.3. Social Network – Plone

To analyze the communication patterns only between users that are active on the QA mailing list, a sub-network containing only clusters 1 and 3 was extracted and isolated vertices were removed. The resulting network had 31 vertices linked by 133 arcs, of which 58 had a value greater than 1. The density has a value of 0.14 while the average degree is 8.58. The “heaviest” 10 arcs have values between 49 and 125. In order to portray an accurate image of the network evolution after a communication channel was dedicated to the QA team, social network analysis methods were applied using six-month time frames. The size of the network varied between 226 and 746 vertices, which is to be expected considering that many arcs were created after only one interaction. In addition, the highest average degree was 7.52 while the lowest was 4.29. The next step in this research is to apply smaller time frames and then compare the central figures in different time frames in order to describe how activity changes over time or, in other words, how the contribution level (interest) of contributors develops over time.
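The time-frame analysis described above can be sketched as follows in Python. The sketch assigns every interaction to a six-month frame and reports the number of vertices in each frame. How the frames were aligned in this research is not stated, so the use of calendar half-years is an assumption, as are the function names and the toy data.

```python
from datetime import date


def half_year(day):
    """Return the (year, half) frame of a date; half is 1 or 2.

    Assumption: frames are calendar half-years (January-June, July-December).
    """
    return (day.year, 1 if day.month <= 6 else 2)


def network_size_per_frame(interactions):
    """Count the distinct participants (vertices) in each six-month frame.

    `interactions` is an iterable of (date, source, target) triples.
    Loops (source equal to target) are ignored.
    """
    frames = {}
    for when, source, target in interactions:
        if source == target:
            continue
        frames.setdefault(half_year(when), set()).update((source, target))
    return {frame: len(people) for frame, people in sorted(frames.items())}


# Hypothetical toy interactions.
toy = [
    (date(2011, 3, 1), "a", "b"),
    (date(2011, 5, 9), "b", "c"),
    (date(2011, 8, 2), "a", "a"),
    (date(2011, 9, 1), "c", "d"),
]
print(network_size_per_frame(toy))
```

Smaller time frames, as planned for the next step, would only require a different frame function, for example one that returns the year and the month.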

4.4 Other Projects

In order to compare previous results with other communities, two other projects were chosen, namely KDE¹⁴ and LibreOffice¹⁵. The mailing lists associated with the QA teams of both projects were downloaded in January 2013.

Table 7. Activity levels on a yearly basis

Project Name   2011   2012   2013   Total
KDE               0    275      7     282
LibreOffice     806   2553      2    3361

Graphs associated with the social networks of both projects were created using methods similar to the ones used for the previously described communities. The KDE graph contains 48 vertices connected by 117 arcs, of which 35 have a value greater than 1. The density is 0.05 while the average degree is approximately 5. The LibreOffice graph contains 168 vertices connected by 868 arcs, of which 345 have a value greater than 1. The density is 0.03 while the average degree is approximately 10.

Fig.4. Social Network – KDE and LibreOffice QA mailing lists
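The two summary measures reported for these graphs can be computed directly from the number of vertices and arcs. The Python sketch below assumes that the average degree is twice the number of arcs divided by the number of vertices, and that the density is the number of arcs divided by the maximum number of arcs in a directed network without loops. These formulas are an assumption about how the values were obtained; they are not taken from the tool output.

```python
def average_degree(vertices, arcs):
    """Average degree, counting each arc once for each of its two endpoints."""
    return 2 * arcs / vertices


def density(vertices, arcs):
    """Arcs as a proportion of the n * (n - 1) possible arcs (no loops)."""
    return arcs / (vertices * (vertices - 1))


# Vertex and arc counts as reported for the two QA mailing list graphs.
for name, n, m in [("KDE", 48, 117), ("LibreOffice", 168, 868)]:
    print(f"{name}: average degree {average_degree(n, m):.3f}, "
          f"density {density(n, m):.4f}")
```

With the reported counts, these formulas give roughly 4.9 and 0.05 for KDE and roughly 10.3 and 0.03 for LibreOffice, in line with the values stated above.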

At this phase of the research no conclusion can be drawn regarding these two projects, as the data gathered represents only mailing list communication between QA members. As a next step, the list of contributors will be downloaded to the local database from ohloh.net and clusters will be created as in the Plone case study. In addition, other mailing lists and bug tracker data will be added.

14 http://community.kde.org/Getinvolved/Quality 15 http://www.libreoffice.org/get-involved/qa-testers/

5 Conclusions

Q1: What QA activities are present in FLOSS development?

A clear definition of QA activities has not yet been made. However, evidence such as the existence of a QA mailing list oriented to more technically aware users, in the case of Mozilla, suggests that there is more than one type of QA task. In addition, tasks performed by Ubuntu's Bug Squad may also be considered as QA activities, which indicates that a detailed classification is necessary. In the case of Plone there does not seem to be a clear separation between QA activity types.

Q2: What are the characteristics of QA activities and who is performing them?

Members of the QA team are active on other mailing lists and also contribute code. This aspect needs to be further studied in order to determine the percentage in which QA contributors perform non-QA activities. The activity of community members that “hit and run” is higher on the Issue Tracker than on the QA mailing list for Mozilla projects. This may suggest that QA contributors have a more sustained activity in the Mozilla community. Another difference is that issue tracker activity has shown an increase over time while mailing list data showed a peak level. In addition, Ubuntu mailing list activity and Plone mailing list and issue tracker activity showed similar spikes. This might suggest that activity levels may not be related to time progression but to other variables that need to be identified.

Q3: How does a QA contributor fit into the community?

In the case of Plone, QA is not a separate layer within the community, considering that the most active members have also contributed code to the project and were active on other mailing lists. However, it is possible that non-QA activities were performed in different time frames than the ones in which the contributors were part of the QA team. To clarify this aspect, it is necessary to compare these time frames in another study. Approximately 70% of QA mailing list members have sent only one e-mail and 30% have not been active on other channels. Because data cleaning was performed only superficially at this phase, a later reassessment is necessary. Furthermore, members who are active on the QA mailing list account for only 1.06% of the whole community. In the case of Mozilla, Ubuntu, KDE and LibreOffice it is not yet clear whether QA is established as a separate layer in the community, as more data needs to be collected. However, these teams represent a layer in the community considering that they have dedicated communication channels.

Q4: What are the communication patterns between QA members as well as with other project participants?

The data associated with Ubuntu, Mozilla, KDE and LibreOffice is incomplete and thus general conclusions cannot yet be drawn regarding communication patterns. However, based on the preliminary social network analysis, the conclusion is that there is a core of people that are highly active. In the case of Mozilla, the data collected so far points to the fact that there is a large team spanning both mailing lists and issue tracker. In addition, judging from the activity of QA members and code committers on the issue tracker it is safe to say that interaction with other community members has been increasing. In the case of Plone, the community as a whole seems to form a large component that spans both issue tracker and mailing list with the exception of some smaller sub-networks. This means that for both projects there is a lower risk of a few members controlling the information flow and jeopardizing communication.

6 Limitations and Further research

The purpose of this research is not to create a recipe for success that can be applied to all FLOSS projects, as every project has its unique characteristics. The goal is to start clarifying how and by whom QA is performed in FLOSS projects by creating a set of hypotheses that may be confirmed in future studies. Assessing whether the quality of FLOSS projects has improved after implementing QA is difficult, even though previous studies have attempted to create a methodology for measuring FLOSS quality [34], and is thus beyond the scope of this study. One of the limitations of this study is that community members might also use other communication channels that are not publicly available. In addition, at this point of the research data cleaning was performed only superficially, thus the results presented are only provisional. The next step of this research is integrating additional data, such as the Ubuntu contributor list and issue tracker data (the download is still in progress). Also, a thorough data cleaning is necessary before applying any additional social network analysis techniques. Follow-up interviews are necessary in order to explain data peaks and anomalies, clarify the private communication aspect, define QA activities and confirm the findings of this study. In addition, community analysis should use time frames for all the projects taken into consideration. Furthermore, community evolution should also be analyzed with time frames before the adoption of formal QA, in order to track down potential migration of members from one team to another.

7 References

1. Fitzgerald, B.: The Transformation of Open Source Software. In: MIS Quarterly (2006)
2. Sharma, S., Sugumaran, V., Rajagopalan, B.: A framework for creating hybrid-open source software communities. In: Information Systems Journal, vol. 12, pp. 7–25 (2002)
3. Shah, S.: Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. In: Management Science, vol. 52, no. 7, pp. 1000-1014 (2006)
4. Bonaccorsi, A., Giannangeli, S., Rossi, C.: Entry Strategies Under Competing Standards: Hybrid Business Models in the Open Source Software Industry. In: Management Science, vol. 52, no. 7, pp. 1085-1098 (2006)
5. Carillo, K., Okoli, C.: The Open Source Movement: a revolution in software development. In: Journal of Computer Information Systems, vol. 49, pp. 1-9 (2008)
6. Koru, G., Liu, H.: Identifying and characterizing change-prone classes in two large-scale open-source products. In: The Journal of Systems & Software, vol. 80, no. 1, pp. 63-73 (2007)
7. Kilamo, T.: Essential Properties of Open Development Communities. In: Proceedings of the OSS 2011 Doctoral Consortium (2011)
8. Halloran, T. J., Scherlis, W. L.: High quality and open source software practices. In: 2nd Workshop on Open Source Software Engineering (2002)
9. Hedberg, H., Iivari, N., Rajanen, M., Harjumaa, L.: Assuring Quality and Usability in Open Source Software Development. In: First International Workshop on Emerging Trends in FLOSS Research and Development, FLOSS’07, pp. 2-2 (2007)
10. Michlmayr, M., Hunt, F., Probert, D.: Quality practices and problems in free software projects. In: Proceedings of the First International Conference on Open Source Systems, pp. 24–28 (2005)
11. Schmidt, D. C., Porter, A.: Leveraging open-source communities to improve the quality & performance of open-source software. In: Proceedings of the 1st Workshop on Open Source Software Engineering (2001)
12. Chengalur-Smith I., Sidorova A., Daniel S.: Sustainability of Free/Libre Open Source Projects: A Longitudinal Study. In: JAIS 11 (2001)
13. Spinellis D., Gousios G., Karakoidas V., Louridas P., Adams P. J, Samoladas I., Stamelos I.: Evaluating the Quality of Open Source Software. In: Electronic Notes in Theoretical Computer Science, Volume 233, Proceedings of the International Workshop on Software Quality and Maintainability (2009)
14. Aberdour M.: Achieving Quality in Open Source Software. In: IEEE Software, pp. 58-64 (2007)
15. Zhao, L., Elbaum, S.: Quality assurance under the open source development model. In: Journal of Systems and Software - JSS, vol. 66, no. 1, pp. 65-75 (2003)
16. Crowston K., Howison, J.: The social structure of Free and Open Source software. In: First Monday, vol. 10, no. 2 (2004)
17. Mockus A., Fielding R. T, Herbsleb J. D.: Two Case Studies Of Open Source Software Development: Apache And Mozilla. In: ACM Transactions on Software Engineering and Methodology, vol. 11, no. 3, pp. 309–346 (2002)
18. K. Crowston et al.: Self-organization of teams for free/libre open source software development, Inform. Softw. Technol. (2007)
19. Oezbek C., Prechelt, L., Thiel F.: The Onion has Cancer: Some Social Network Analysis Visualizations of Open Source Project Communication. In: Psychology, Section 4, 4-9 (2010)
20. Masmoudi H., den Besten M. L., De Loupy C. and Dalle J. M.: 'Peeling the Onion': The Words and Actions that Distinguish Core from Periphery in Bug Reports and How Core and Periphery Interact Together. In: Fifth International Conference on Open Source Systems (2009)
21. Sowe, S., Samoladas, I., Stamelos, I., Angelis, L.: Are FLOSS developers committing to CVS/SVN as much as they are talking in mailing lists? Challenges for Integrating data from Multiple Repositories. In: 3rd International Workshop on Public Data about Software Development (2008)
22. Jensen, C., Scacchi, W.: Role Migration and Advancement Processes in OSSD Projects: A Comparative Case Study. In: Proceedings of the 29th international conference on Software Engineering, pp. 364-374 (2007)
23. Pressman, R. S.: Software Engineering: A Practitioner's Approach. McGraw-Hill Higher Education (2001)
24. Wiggins, A., Howison, J., Crowston, K.: Social dynamics of FLOSS team communication across channels. In: Proceedings of the Fourth International Conference on Open Source Systems (2008)
25. Watts, D. J., A.: Twenty-first century science. In: Nature, vol. 445, no. 7127, pp. 489-489 (2007)
26. Aral, S., Brynjolfsson, E., Van Alstyne, M.: Productivity Effects of Information Diffusion in E-mail Networks. In: Proceedings of ICIS 2007 (2007)
27. Crosby, P.: Quality Is Free. McGraw-Hill (1979)
28. Flyvbjerg, B.: Five misunderstandings about case-study research. In: Qualitative Inquiry, 219-245 (2006)
29. Mockus, A., Fielding, R., Herbsleb, J.: A case study of open source software development: the apache server. In: Proceedings of the 22nd international conference on Software engineering, ICSE '00, 263-272, New (2000)
30. Stol, K., Fitzgerald, B.: Uncovering theories in software engineering. In: 2nd Workshop on a General Theory of Software Engineering (GTSE) collocated with ICSE 2013 (2013)
31. F., Burkhardt J-M, Sack W.: Thematic coherence and quotation practices in OSS design-oriented online discussions. In: Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work (2005)
32. Howison, J., Wiggins, A., Crowston, K.: Validity Issues in the Use of Social Network Analysis for the Study of Online Communities. In: Journal of the Association for Information Systems (2012)
33. Christley S., Madey G.: Global and Temporal Analysis of Social Positions at SourceForge.net. In: The Third International Conference on Open Source Systems, IFIP WG 2.13, Limerick, Ireland (2007)
34. Shaikh, S., Cerone, A.: Towards a metric for Open Source Software Quality. ECEASST 20 (2009)
35. Barham, A.: The emergence of quality assurance in open source software development. In: Proceedings of the OSS 2011 Doctoral Consortium (2011)

Free/Libre Open Source Software users’ Local Face-to-Face Meetings in Career-Related Activities

Eunyoung Moon

School of Information, University of Texas at Austin, Austin, Texas
WWW home page: http://www.ischool.utexas.edu/~eymoon/

Abstract. While prior studies on Free/Libre Open Source Software (FLOSS) users have mainly looked at FLOSS users in the context of FLOSS communities, little is known about whether and how FLOSS communities play a role in the context of their users’ careers. This study takes an inductive approach to investigate phenomena that are emergent and poorly understood. As promising themes emerged from a pilot study on Drupal users’ local face-to-face meetings, the author proposes a research question and a research design. The proposed study plans to investigate local LAMP (Linux, Apache web server, MySQL, PHP/Perl/Python) users who attend local face-to-face meetings. The proposed study asks what role FLOSS users’ local face-to-face meetings play in the context of their career-related activities. It is expected that the proposed study will contribute to FLOSS research by giving new insight into FLOSS users in their careers, how FLOSS projects fit into local FLOSS users’ careers, and how to manage offline activities for FLOSS users.

1 Introduction

A majority of studies on Free/Libre Open Source Software (FLOSS) have looked at FLOSS users in the context of FLOSS projects. From this perspective, researchers studied what motivates FLOSS users to seek out communities [20] and to participate in FLOSS user group activities [2,14]. In those studies, researchers found that new FLOSS users seek out communities due to a need to solve problems [20] as well as specific personal motives [2,14]. A study on Apache web server user groups [14] found that participants who posted answers to questions in the Apache online help system gained direct learning benefits. Researchers studying a Linux user group [2] suggested that FLOSS users are motivated by a combination of social and psychological factors such as identification with the open source movement. In a similar vein, researchers have looked at how FLOSS users join FLOSS projects [13,22,25], how they are socialized in FLOSS communities [7], and why they stay in or leave FLOSS projects [8].

While prior studies have mainly looked at FLOSS users in the context of FLOSS projects, few studies [5,15] have investigated FLOSS communities from the perspective of their users’ lives. Crowston et al. (2007) investigated the role of face-to-face meetings in the rest of FLOSS developers’ lives by observing and interviewing FLOSS developers at several formal annual events such as Apachecon 2003 and 2004, as well as OSCon 2004. The authors of that study suggested that the value of face-to-face meetings within the FLOSS context lies in face-to-face socialization, which facilitates developers’ interactions through developing and maintaining social ties and helping build shared mental models. Furthermore, the authors found that FLOSS conferences helped FLOSS developers set aside time for the FLOSS projects.

Marlow and Dabbish (2013), in a recent study [15], investigated how potential employers utilize activity traces on GitHub for hiring software developers and how job seekers attempt to manage the impressions they make on GitHub as prospective employees. The authors of that study suggested that GitHub provides employers with detailed information for recruitment, for example, activity level in FLOSS projects, code styles, and project management abilities. Furthermore, it was found that job seekers attempted to promote their presence in order to supplement their job applications by providing a link to their GitHub account. Those findings indicate that GitHub is employed by FLOSS users in the context of their careers.

In sum, a majority of studies on FLOSS investigated FLOSS users mainly from the perspective of FLOSS projects rather than of their lives [5] or careers [15]. In the pilot study, however, the author identified FLOSS users’ career-related activities in FLOSS users’ local face-to-face meetings, based on qualitative data collected from multiple sources. While face-to-face meetings of FLOSS users appear to occur more today than in the past, there is little research that sheds light on them. The following section starts with a pilot study in which the promising themes emerge.

Please cite as: E. Moon, 2013, Free/Libre Open Source Software users’ Local Face-to-Face Meetings in Career-Related Activities, in: K. Stol, B. Lundell and G. Madey (Eds.) Proceedings of the OSS Doctoral Consortium.

2. A pilot study

As a pilot study, the author studied three types of local Drupal¹ user meetings: two regular meetups and the annual Local Drupal Day. The pilot ran from September to December 2012. The Drupal project was chosen because 1) the Drupal project is mature and 2) local Drupal meetups come in two types: a Newbie meetup for beginners and a Drupal user group (DUG) meetup for advanced users. The researcher therefore expected to meet a diverse range of local Drupal users.

¹ Drupal [6] is a free and open-source content management system written in PHP. Drupal was initially released in 2001 and the current version is 7.22. Its uses range from personal blogs to corporate, government, and university sites.

The local Drupal user group (DUG) meetup was created in 2009 by two big Drupal-based companies in a local city because the two companies wanted to get together and exchange ideas face-to-face. One of the two companies has supported the local weekly DUG meetups as well as local monthly Drupal events such as the local meeting on Local Drupal Day, which is an annual event where Drupal users gather in cities across the world. The weekly DUG meetups have been held in the open space of a local pizza store. In these weekly meetups, attendees sat around a table with their meals and shared their individual Drupal issues with other attendees. The monthly Drupal events have been held at a local co-working space where presentations and workshops can take place. The monthly events featured presentations on the latest or special issues in Drupal, with presenters standing at the front of the room. During or after a presentation, attendees asked the presenter questions and discussed the issues.

The local Drupal newbie meetup was created by an individual Drupal developer working at the local college. This meetup aims to help Drupal beginners learn the basics of Drupal, from installation to customization. The newbie meetup has been held in a seminar room of the local college and consisted of a presentation on the basics of Drupal as well as question-and-answer sessions.

In the observation, the author participated in local meetups and a local event as a Drupal beginner and a Drupal meetup member, spending a total of 10 hours. Approximately 60 members attended Local Drupal Day, 12 members attended the local Drupal newbie meetup, 5 members attended the local weekly DUG meetup at the first observation, and 4 members attended it at the second observation. In the course of observation, the author joined conversations among Local Drupal Day attendees regarding the local event itself. In the local meetups, the author also asked questions about the terminology used in Drupal, about third-party tools that help beginners start learning Drupal, about the projects that meeting attendees were developing, and about the role the meetup plays in their Drupal-based projects. While observing, the author wrote detailed field notes and audio-taped conversations when meeting attendees spoke too technically.
In addition, the author took photos of what local DUG meetup attendees worked on. The author also conducted two semi-structured interviews, one with the organizer of the Drupal newbie meetup and one with the organizer of the DUG meetup. The interview protocol was developed in advance; however, questions were modified and added depending on the interviewee’s interests. The interview questions mainly addressed the motivation to use or develop Drupal, overall information about local meetings, the motivation to attend other local meetings for Drupal users, and whether the interviewee has been involved in other FLOSS projects.

The author also collected a total of 273 meetup members’ online introductions from the local Drupal newbie meetup website and the local DUG meetup website. DUG meetup members’ introductions included answers to the questions: 1) what would you like to learn from and/or contribute to the local Drupal community?, 2) have you done a project in Drupal?, and 3) if so, describe the project. In addition, the author paid attention to the local Drupal meetups’ message boards, including a total of 52 messages posted by local commercial companies and organizations looking for Drupal developers. Several business cards the author received at Local Drupal Day were also collected as artifacts for the study.

To analyze the qualitative data collected, the author used ATLAS.ti [1], a qualitative data analysis software package, in the course of coding. In the 1st round of coding, the author identified and developed concepts line by line or paragraph by paragraph from field notes, interview transcripts, and artifacts. This stage of coding resulted in naming and categorizing phenomena. Then, in the 2nd round of coding, the author focused on the context of local face-to-face meeting attendees’ career-related activities. In this stage, the author made connections between a category and its subcategories. For example, the code career-related activities was developed in the 2nd round of coding by making connections among sub-codes developed in the 1st round of coding: job requirement, looking for a new job, the use of Drupal at workplace, and job advertisement. Because career-related activities emerged as a promising theme in the course of coding, the author plans to focus data collection on it.
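
The relationship between the two rounds of coding can be illustrated with a small sketch. It is only an illustration of the grouping step, not the procedure carried out in ATLAS.ti: the sub-code and theme names come from the pilot study, while the function name `group_by_theme` and the coded segments are hypothetical placeholders rather than study data.

```python
from collections import defaultdict

# First-round sub-codes named in the pilot study, each linked to the
# second-round theme they were grouped under.
THEME_OF = {
    "job requirement": "career-related activities",
    "looking for a new job": "career-related activities",
    "the use of Drupal at workplace": "career-related activities",
    "job advertisement": "career-related activities",
}


def group_by_theme(coded_segments):
    """Collect (source, sub_code) pairs under their second-round theme."""
    themes = defaultdict(lambda: defaultdict(list))
    for source, sub_code in coded_segments:
        # Sub-codes without a theme yet are kept visible, not dropped.
        theme = THEME_OF.get(sub_code, "uncategorized")
        themes[theme][sub_code].append(source)
    return {theme: dict(codes) for theme, codes in themes.items()}


# Hypothetical coded segments (placeholders, not the study's data).
segments = [
    ("field note A", "job requirement"),
    ("introduction B", "looking for a new job"),
    ("message C", "job advertisement"),
    ("introduction D", "looking for a new job"),
]
grouped = group_by_theme(segments)
```

Keeping the source label next to each sub-code makes it possible to trace a theme back to the field notes, introductions, or messages that support it.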

3. The Research Question emerges

In the field notes, interviews, and local Drupal meetup members’ online introductions, the author identified career-related activities in FLOSS users’ local meetings. For instance, local meetup attendees brought to the meetings specific problems they had encountered while developing content management systems for their own customers. In the meetups, attendees discussed their technical problems in search of solutions. For example, one attendee learned that the part he wanted to modify was hardcoded and that he would have to submit a patch or create a new module from scratch to complete his commercial consulting services. In such episodes, attendees also mentioned that they could not find answers by googling or by posting the problem in an online community. These episodes suggest that Drupal users’ local meetups are a place in which attendees share or solve technical issues arising from their job requirements. This theme also appeared in the analysis of local Drupal meetup members’ introductions. Furthermore, in the preliminary interviews, the organizer of the local Drupal newbie meetup noted that he uses Drupal because there are local communities in which somebody can help him. The organizer of the local DUG meetup also mentioned:

“I have talked about some problems, like I have to do this for client. Have you been using this module before? The same questions of other people come here to ask and I have the same question as well.”

A total of 64 Drupal meetup members specified their careers in their online introductions. In addition, 5 business cards the author received at local face-to-face meetings helped the author identify attendees’ careers. The analysis of those data showed that many of the attendees are employed locally. Their occupations included developer, system administrator, web designer, and company owner. From the analysis of online introductions, the themes looking for a new job and the use of Drupal at workplace emerged. The theme looking for a new job was defined as cases in which a member introduced himself or herself as a prospective employee seeking a new job involving Drupal. For example, one member wrote:

“(I’m a) Web designer with Drupal experience looking for work! Enjoyed being a part of the Neighboring town Drupal group from 2008-2012, but looking forward to getting to know the local Drupal users.”

Meetup members also introduced themselves as workers who use Drupal at their workplace, both on the Drupal meetup websites and at local DUG meetups, and mentioned what they had been working on with Drupal at work. In local DUG meetups, attendees introduced themselves as having developed websites for non-profit organizations, for online gamers, or for their own customers, or as having worked as administrators with Drupal. The following online introductions are examples of this case:

“New to the local city, have been doing some heavy duty Drupal implementation for the company I work for since late last year. We implemented the Drupal 7(version) site.”

“I'm a Linux systems administrator who's been developing Drupal sites for about a year.”

The analysis of online introductions showed that attendees included local employers. A total of 9 meetup members’ introductions were identified as coming from employers who were looking for Drupal developers. These introductions specified the technical skills needed for the job, information about the company, and contact details. An example of such an introduction is as follows:

“We are ‘local Marketing’, a marketing and development firm based in a beautiful local city. We're looking for LAMP/Drupal developers...please contact us!”

Local Drupal meetup members who work for local companies (e.g., the two big local Drupal-based companies), local non-profit organizations, local universities, local state agencies, or local staffing firms posted a total of 52 job advertisements on the message boards of the local Drupal meetup websites. These job advertisements described the work mode as either in-house or telecommuting and the position as either temporary contract work or regular full-time work. For example, one job advertisement stated that local Drupal developers were needed immediately to complete the development of two custom modules within the next 2 weeks. In contrast to this search for temporary workers, one of the two big Drupal-based local companies posted a total of 7 job advertisements for local full-time workers such as Drupal developers, Drupal web designers, and architects.

Two local DUG meetups were financially supported by a commercial company that was looking for senior software developers who know Drupal. In these meetups, attendees repeated the job description several times and exchanged information so that potentially interested developers could contact the company. In the middle of the conversation, attendees noted that the company’s main base is not in the US and started to talk about the Drupal-based local companies in terms of their technical interests, such as programming languages and operating systems. Furthermore, attendees shared their perspectives on the local context, including the weather, the city as a tech city, and its cultural features.

The themes job requirement, looking for a new job, the use of Drupal at workplace, and job advertisement emerged in the 1st round of coding, and the author connected them by creating the theme career-related activities in the 2nd round of coding. However, the sub-codes of career-related activities might vary from one FLOSS project to another. To focus on the promising theme of career-related activities, the author proposes the research question: What role do FLOSS users’ local face-to-face meetings play in the context of their career-related activities?

4. Research Design

4.1 Research Setting

To further investigate FLOSS users’ local face-to-face meetings in their career-related activities, the author proposes to collect data from FLOSS user groups in a local city in the US. In this study, local face-to-face meetings include active local FLOSS users’ meetups held on a regular basis, local FLOSS conferences, and other local FLOSS events. Based on the preliminary study of Drupal users, the author proposes to investigate other FLOSS user groups that meet the following criteria: 1) the FLOSS project is mature, 2) the use of the FLOSS is common at commercial software companies, and 3) the local FLOSS user group has held local face-to-face meetings on a regular basis. The user groups that best match these criteria are local meetings focused on LAMP (Linux, Apache web server, MySQL, and PHP, Python or Perl). Choosing multiple, though similar, cases should provide a basis for comparison and give more insight into the proposed study. The researcher will also ask participants to recommend other relevant meetings.

4.2 Data collection

4.2.1 Qualitative interviews

The author proposes to conduct qualitative interviews with a diverse range of FLOSS users who attend local face-to-face meetings. To recruit respondents from a variety of cases, the author will use two methods. First, the author will start with respondents who are members of local FLOSS meetups held on a regular basis and then ask each respondent to recommend others who attended the local face-to-face meetings and could be interviewed, a technique known as snowball sampling [23]. However, using only snowball sampling creates a risk that only participants from the same networks will be sampled. Accordingly, the author will also use theoretical sampling to mitigate the sampling bias that results from depending on a single set of respondents. The theoretical basis will be developed from the FLOSS survey [9]. In that survey, the authors reported the occupational backgrounds of FLOSS developers as software engineers, programmers, consultants, executives/marketing/product sales professionals, university staff, and students. Employment status will also be useful for constructing the sample. Following these two features, occupational background and employment status, the author will ask respondents to recommend individuals who fit this theoretical sample.

The author plans to develop the interview protocol around the following main questions: 1) what was the first or most recent local meeting the interviewee attended and what motivated him or her to attend it, 2) whether the interviewee has exchanged business cards with other attendees and, if so, what prompted this exchange and whether the interpersonal relationship has evolved, and 3) to what degree the interviewee works with FLOSS at the workplace and how this relates to attending local FLOSS users’ face-to-face meetings, compared to online venues. In addition, interviews with FLOSS users who are employers will include questions about how they look for FLOSS developers to recruit and whether and how the hiring process relates to local FLOSS users’ meetings such as local FLOSS conferences and local FLOSS users’ meetups.
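
The combination of snowball and theoretical sampling described above can be sketched as a referral walk that stops recruiting from an occupational category once it is sufficiently represented. This is a minimal sketch under stated assumptions: the function name `snowball_with_quota`, the per-category quota, and all respondent identifiers are hypothetical; only the occupational categories follow the FLOSS survey [9].

```python
from collections import deque


def snowball_with_quota(seeds, referrals, occupation, quota):
    """Follow referrals from interviewed respondents, capping each category.

    seeds      -- initial respondents (e.g., regular meetup members)
    referrals  -- dict: respondent -> list of people they recommend
    occupation -- dict: person -> occupational category
    quota      -- maximum number of interviews per category (an assumption)
    """
    selected = []
    counts = {}
    seen = set()
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if person in seen:
            continue
        seen.add(person)
        category = occupation[person]
        if counts.get(category, 0) >= quota:
            # Category already well represented: skip this person, so
            # their referrals are never requested either.
            continue
        counts[category] = counts.get(category, 0) + 1
        selected.append(person)
        # Only interviewed respondents can be asked for recommendations.
        queue.extend(referrals.get(person, []))
    return selected


# Hypothetical referral network (placeholders, not study data).
referrals = {"p1": ["p2", "p3"], "p2": ["p4"], "p3": ["p5"]}
occupation = {
    "p1": "programmer",
    "p2": "programmer",
    "p3": "consultant",
    "p4": "student",
    "p5": "programmer",
}
selected = snowball_with_quota(["p1"], referrals, occupation, quota=2)
```

In this toy network the third programmer is skipped once two programmers have been interviewed, which is the intended effect: referrals keep the sample reachable, while the cap keeps one network or occupation from dominating it.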

4.2.2 Observation

The author will collect data by observing local face-to-face meetings, taking notes that describe the settings, the activities that took place, the people who attended, and the meanings of what was observed [18]. The data collected from observation will enable the author to comprehend new dimensions of meeting attendees’ actions [11] that might not be captured in interviews, since respondents may be unaware of their own behaviors. Observing local FLOSS users’ face-to-face meetings will enable the author to understand the nature of each local FLOSS community as well as of individual meeting attendees. In these observations, the author will take detailed field notes and, with permission, record conversations when meeting attendees talk too technically for the observer’s immediate comprehension.

4.2.3 Public Documentation

The author will collect artifacts from FLOSS user groups’ websites as well as from local face-to-face meetings. On the local meetup websites, job advertisements from commercial companies and meetup members’ introductions are examples of artifacts for analysis. In addition, public messages among local FLOSS users’ meetup members will be analyzed. By collecting data from multiple sources, this study increases validity through triangulation, enabling the researcher both to confirm understandings drawn from individual sources of data and to discover aspects that a single source might not reveal.
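
One simple way to picture triangulation across interviews, observation, and public documentation is to record which kinds of sources support each theme. The sketch below is illustrative only: the function name `triangulate`, the threshold of two source types, and the evidence list are assumptions introduced here, not part of the proposed study.

```python
def triangulate(evidence, min_sources=2):
    """Map each theme to its source types and flag corroborated themes.

    evidence    -- iterable of (theme, source_type) pairs
    min_sources -- source types needed to count as corroborated
                   (a hypothetical threshold)
    """
    sources_by_theme = {}
    for theme, source_type in evidence:
        sources_by_theme.setdefault(theme, set()).add(source_type)
    return {
        theme: {
            "sources": sorted(sources),
            "corroborated": len(sources) >= min_sources,
        }
        for theme, sources in sources_by_theme.items()
    }


# Hypothetical evidence (placeholders, not study data).
evidence = [
    ("job advertisement", "public documentation"),
    ("job advertisement", "observation"),
    ("looking for a new job", "public documentation"),
]
summary = triangulate(evidence)
```

A theme supported by only one kind of source is not discarded; it simply marks a place where further interviews or observations could confirm or revise the interpretation.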

4.3 Data analysis and procedures

In the proposed study, the purpose of data analysis lies in discovering and understanding the relationships among FLOSS users who attend local FLOSS users’ face-to-face meetings and how FLOSS projects fit into their careers [10]. That is to say, the author moves towards explanation [19] by engaging in several coding processes [21]. In the 1st round of coding, the author labels phenomena and discovers themes from the labeled phenomena. In the 2nd round of coding, the author connects each theme with its sub-themes through a set of relationships indicating context, actions, and interactional strategies. In this stage, the data collected both in the pilot study and in the future study are put back together in new ways by making these connections. Accordingly, new themes are expected to emerge and existing themes are likely to be modified as the author collects qualitative data from multiple FLOSS users’ local meetings. The number of passes needed to connect a theme with its sub-themes cannot be fixed in advance, since it depends on the properties and dimensions of the contexts found in the collected data. After this process, a core theme will finally be formulated to explain the relationships among conceptual constructs or stories.

4.4 Future Plan for Validity

The author plans to discuss the findings with the key informants to strengthen the validity of this study, a technique known as “member checking” [11]. Specifically, the researcher plans to check her interpretations with them in the course of coding the qualitative data. In addition, the author plans to send the final write-up to the key informants.

The researcher will revisit the pertinent literature [21] on networking events for careers outside FLOSS [3,4,16,17,24]. Returning to this literature, the author will investigate in what ways findings on networking for careers [12] outside FLOSS hold, or fail to hold, in the context of FLOSS by pointing out the conditions or concepts specific to FLOSS communities.

5. Expected Contribution

The proposed study is expected to contribute to FLOSS research both theoretically and practically. Theoretically, it is expected to develop a way of thinking about FLOSS users in the context of their careers. A recent study on employers’ use of GitHub for recruiting and hiring software developers [15] suggested a new perspective on the use of GitHub, a collaboration tool for FLOSS projects, in the context of its users’ careers. The proposed study will provide a new perspective on 1) FLOSS projects in the context of FLOSS users’ careers and 2) how FLOSS projects fit into local FLOSS users’ careers. By investigating multiple FLOSS projects at local FLOSS events, the proposed study will give researchers an understanding of how experiences in FLOSS projects relate to FLOSS users’ job seeking, fulfilment of job requirements, and hiring processes. Practically, the proposed study is expected to further the understanding of FLOSS users’ offline activities. While few studies [2,5] have looked at what FLOSS participants do in their face-to-face meetings, the proposed study investigates the role that local FLOSS users’ face-to-face meetings play in their activities beyond socialization. The author expects the proposed study to provide FLOSS communities with insight that will be useful in managing their offline activities.

Acknowledgment

The author is grateful for the helpful comments of her doctoral advisor, James Howison, and for the encouragement of Joan Hughes.

References

1. Atlas.ti http://www.atlasti.com/index.html
2. Bagozzi and Dholakia (2006). Open source software user communities: A study of participation in Linux user groups. Management Science, 52(7), 1099-1115.
3. Bayer, P., Ross, S., & Topa, G. (2005). Place of work and place of residence: Informal hiring networks and labor market outcomes (No. w11019). National Bureau of Economic Research.
4. Constant, D., Sproull, L., & Kiesler, S. (1996). The kindness of strangers: The usefulness of electronic weak ties for technical advice. Organization Science, 7(2), 119-135.
5. Crowston, K., Howison, J., Masango, C., & Eseryel, U. Y. (2007). The role of face-to-face meetings in technology-supported self-organizing distributed teams. Professional Communication, IEEE Transactions on, 50(3), 185-203.
6. Drupal http://drupal.org accessed on May 17th, 2013
7. Ducheneaut, N. (2005). Socialization in an open source software community: A socio-technical analysis. Computer Supported Cooperative Work (CSCW), 14(4), 323-368.
8. Fang, Y. and Neufeld, D. (2009). Understanding sustained participation in open source software project. Journal of Management Information Systems, 25(4), 9-50.
9. Ghosh, R. A., Glott, R., Krieger, B., & Robles, G. (2002). Free/libre and open source software: Survey and study.
10. Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Aldine de Gruyter.
11. Glesne, C. (2011). Becoming qualitative researchers: An introduction (4th edition). Pearson.
12. Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 1360-1380.
13. Jensen, C., King, S., & Kuechler, V. (2011, January). Joining Free/Open source software communities: An analysis of newbies' first interactions on project mailing lists. In System Sciences (HICSS), 2011 44th Hawaii International Conference on (pp. 1-10). IEEE.
14. Lakhani and von Hippel (2003). How open source software works: “free” user-to-user assistance. Research Policy, 32(6), 923-943.
15. Marlow, J., & Dabbish, L. (2013, February). Activity traces and signals in software developer recruitment and hiring. In Proceedings of the 2013 conference on Computer supported cooperative work (pp. 145-156). ACM.
16. Marquis, C., & Battilana, J. (2009). Acting globally but thinking locally? The enduring influence of local communities on organizations. Research in Organizational Behavior, 29, 283-302.
17. Montgomery, J. D. (1992). Job search and network composition: Implications of the strength-of-weak-ties hypothesis. American Sociological Review, 586-596.
18. Patton, M. Q. (2002). Qualitative research and evaluation methods, 3rd Ed. Thousand Oaks: Sage.
19. Pentland, B. T. (1999). Building process theory with narrative: From description to explanation. Academy of Management Review, 24(4), 711-724.
20. Shah, S. (2003). Understanding the nature of participation & coordination in open and gated source software development communities. In Proceedings of the HBS-MIT Sloan Free/Open Source Software Conference.
21. Strauss, A. L., and Corbin, J. (1990). Basics of qualitative research. Newbury Park, CA: Sage Publications.
22. Von Krogh, G., Spaeth, S., & Lakhani, K. R. (2003). Community, joining, and specialization in open source software innovation: A case study. Research Policy, 32(7), 1217-1241.
23. Weiss, R. S. (2008). Learning from strangers: The art and method of qualitative interview studies. Free Press.
24. Wolff, H. G., & Moser, K. (2008). Effects of networking on career success: A longitudinal study. University of Erlangen-Nuremberg, Labor and Socio-Economic Research Center (LASER).
25. Ye and Kishida (2003). Toward an understanding of the motivation of open source software developers. In Software Engineering, 2003. Proceedings. 25th International Conference on (pp. 419-429). IEEE.
