Master Thesis

The influence of social network structure on the chance of success of Open Source software project communities

Name: Bart Vreugdenhil

University: RSM Erasmus University, Rotterdam, the Netherlands
MSc Program: Business Information Management, Business Administration

Coach: Drs. R. Smit
Department of Decision and Information Sciences

Co-reader: Prof.dr.ir. J. Dul
Department of Management of Technology and Innovation

Version 1.01 Final - March 31, 2009

Preface

The author declares that the text and work presented in this Master thesis are original and that no sources other than those mentioned in the text and its references have been used in creating the Master thesis.

The copyright of the Master thesis rests with the author. The author is responsible for its contents. RSM Erasmus University is only responsible for the educational coaching and beyond that cannot be held responsible for the content.

Cover logo © 2005-2009 Rotterdam School of Management, Erasmus University

Acknowledgments

I wish to thank, in no particular order, everyone who in one way or another supported and guided me through my Master Thesis research project.

First, I want to thank David Hinds, and his Major Professor Ronald M. Lee, for the inspiring dissertation which I used as a guideline for conducting research and writing my Thesis. Their work provided me with the solid foundation I needed to complete my research project. I also want to thank them for answering several questions I posed during this project.

Next, I want to thank the University of Notre Dame and all the people working on the SourceForge Research Data Archive. In particular, I want to thank Greg Madey, who granted me access to this vast archive of Open Source data.

Thirdly, I would like to thank all the scientists I have referred to in my Thesis. I spent many hours reading and studying their work, which provided me with well-founded theories and useful insights.

Fourthly, I am very grateful to my Master Thesis coach Ruud Smit and my co-reader Jan Dul, who coached and guided me during my Thesis project. Without their endless support and ideas I could not have successfully completed the final phase of my Master's degree.

I thank all professors and employees of the RSM Erasmus University Rotterdam of the last few years for their education and support activities.

Finally, I am grateful to my parents, my and my close friends. Without their support and their love, I would not have been able to complete this project.

Bart Vreugdenhil

Executive summary

Internet technology has had a huge impact on the way people communicate and exchange information. New forms of collective action and collaboration have arisen. One of these new forms is the development and organization of Open Source software. Open Source software is software which is freely redistributable and can be adapted to individual needs. Businesses, institutions and individuals have all recognized the potential of Open Source software, and each may contribute to Open Source projects for its own reasons.

Not much is known about the conditions which lead to the success of Open Source software projects (OSSPs). Current research is exploratory or descriptive in nature and has generally focused on large projects. However, it appears that most Open Source software is created by individuals and small teams.

Here, an attempt is made to find out why some Open Source projects succeed while others fail. As internet technology provides the infrastructure for project members to connect with each other, the focus here is on the (pattern of) interactions between the individuals of a project community. Based on social network theory, three constructs representing the community structure of Open Source software projects are investigated by means of social network analysis. Closure represents the density of the relationships in a project community, bridging represents the degree to which a community has relationships with other communities, and centrality takes the effect of project leaders on the community into account.

Surprisingly, the social network structure of an Open Source software project community has no significant relationship with community success. Therefore, various factors are proposed which may affect both success and the structural properties of Open Source software projects. Closure and success of an Open Source project community may be affected by the choice of software design, the use of software documentation and the existing etiquette, called netiquette. Bridging and success may be affected by the set of marketing activities and stakeholder management, whereas centrality and success may be affected by the adoption of accepted standards and tools, the very own Open Source including skilled developers and the fact that these developers are often users of their software as well.

Given the exploratory and limited research scope, it is plausible that a measurement problem exists: Open Source project communities may use substitute channels to communicate and exchange information and knowledge. Alternatively, the relationships between project members may be indirect, so that information and knowledge are (temporarily) stored, 'embedded', in the network and are therefore difficult to measure.

Four main conclusions can be drawn from the unexpected research findings. First, although the social network structure of an Open Source software project community has no significant relationship with community success here, this does not necessarily mean it is unimportant. Apparently, social network analysis alone cannot explain the factors affecting the chance of success of an OSSP community. Although relationships between OSSP members are important, the individuals (and their characteristics) establishing these relationships need to be taken into account as well.

Secondly, the Open Source software project community is a new kind of social entity, because current theory on virtual communities and on traditional teams and groups creating information and knowledge products cannot explain the exceptional performance levels OSSPs can achieve. Here, the premature theory of smart business networks is presented as a linking pin to further explore this kind of social entity.

Thirdly, although a premature theory is supplied to further describe the Open Source phenomenon, it cannot explain the difference in research findings between large and small OSSP communities. Here, an attempt is made to explain this difference. Whereas small OSSP communities can be conceived of as operating similarly to traditional groups and teams, although their mode of communication is electronic, large OSSP communities can be conceived of as virtual communities. However, it is also generally noted that large OSSP communities have onion-like structures, with a core of developers surrounded by a crowd of interested people. Thus, the difference between small and large OSSP communities is this crowd. Apparently, large OSSP communities are able to deal with this crowd without the disadvantages related to the management and organization of (the growth of) traditional teams and groups. The 'Long Tail', a popular description of the impact of the internet's infrastructure and technology on business models, is used to explain the difference in the basic principles of small and large OSSPs. Whereas large OSSPs, due to their popularity, are positioned in the hit market and are generally focused on community outcome, trying to optimally incorporate the 'wisdom of the crowds' effect to improve their software product, small OSSPs operate in niche markets and are generally focused on individual outcome.

Lastly, although researchers have not reached consensus on how to measure the success of OSSP communities, it is concluded here that the current set of indicators for measuring the success of OSSP communities is focused on the hit markets in which large OSSPs operate, and is not suitable for the endless variety of niche markets in which small OSSPs operate. A new approach is needed, in which success factors for hit markets may be focused on the level of community success, and success factors for niche markets may be focused on the level of individual success of project members.

Table of contents

1. INTRODUCTION
1.1. RESEARCH APPROACH
1.2. RESEARCH QUESTION
1.3. DEFINITIONS
1.4. MASTER THESIS STRUCTURE
2. LITERATURE
2.1. SOCIAL NETWORKS
2.1.1. New media and the network model
2.1.2. Social network analysis
2.1.3. Social network theory
2.1.4. Social capital theory
2.2. OPEN SOURCE SOFTWARE
2.2.1. Open Source as a form of software distribution
2.2.2. Open Source software project contributors
2.2.3. Open Source Software project communities
2.2.4. The organization of Open Source software projects
2.2.5. Success studies
2.2.6. Social network perspectives
2.3. GROUP PROCESSES AND WORK TEAMS
2.3.1. Organization of groups and work teams
2.3.2. Teams
2.3.3. Virtual teams
2.3.4. Success of groups and teams
2.3.5. Social network perspectives
2.4. VIRTUAL COMMUNITIES
2.4.1. Success of virtual communities
2.4.2. Social network perspectives
3. CONCEPTUAL MODEL AND PROPOSITIONS
3.1. CONCEPTUAL MODEL
3.2. RESEARCH CONSTRUCTS
3.2.1. Subgroups
3.2.2. Closure
3.2.3. Bridging
3.2.4. Leader Centrality
3.2.5. Constructs overview
3.3. SOCIAL NETWORK MODEL AND PROPOSITIONS
3.3.1. Group closure
3.3.2. Core closure
3.3.3. Peripheral two-mode closure
3.3.4. Core bridging
3.3.5. Administrator bridging
3.3.6. Administrator centrality
4. RESEARCH METHODOLOGY
4.1. STUDY DESIGN
4.1.1. Unit of analysis
4.1.2. Study population
4.1.3. Research method
4.2. RESEARCH SETTING
4.2.1. Data sources
4.3. VARIABLES
4.3.1. Dependent and control variables
4.3.2. Community success
4.3.3. Controls
4.3.4. Social network definitions
4.3.5. Subgroups
4.3.6. Social network variables
4.4. SOURCEFORGE.NET POPULATION
4.5. SOURCEFORGE.NET SAMPLE STRATEGY
4.5.1. SourceForge sample selection procedure
4.5.2. Overview of sample selection criteria
4.5.3. Data compilation
5. DATA ANALYSIS AND RESULTS
5.1. FINDINGS OF KRISHNAMURTHY
5.2. PRELIMINARY ANALYSES
5.2.1. Transformation of variables
5.2.2. Outlier assessment
5.2.3. Reduction of variables
5.3. DESCRIPTIVE AND CORRELATION STATISTICS
5.4. HYPOTHESES TESTING
5.4.1. Research hypotheses
5.4.2. Regression methods
5.5. TESTING RESULTS
5.5.1. Group Density
5.5.2. Core Density
5.5.3. Peripheral Two-Mode Density
5.5.4. Core Membership Degree
5.5.5. Administrator Membership Degree
5.5.6. Administrator Class Centrality
5.5.7. Project Rank
6. DISCUSSION
6.1. SUMMARY OF FINDINGS
6.1.1. Closure
6.1.2. Bridging
6.1.3. Leader Centrality
6.1.4. Project Rank
6.1.5. Abstract of findings
6.2. SUGGESTIONS
6.3. THE POSSIBILITY OF A MEASURING PROBLEM
7. CONCLUSIONS
7.1. RESEARCH CONCLUSIONS
7.2. STRATEGIC CONCLUSIONS
7.3. LIMITATIONS
7.4. RESEARCH FLAWS
7.5. RECOMMENDATIONS
8. REFERENCES
8.1. LITERATURE REFERENCES
8.2. TEXTUAL ANNOTATIONS
8.3. LIST OF FIGURES
8.4. TABLES
9. APPENDICES

Master Thesis

The influence of social network structure on the chance of success of Open Source software project communities

1. Introduction

Technology and the ubiquity of the internet have had a huge impact on the way people communicate and exchange information. The internet has reduced the high transaction costs connected with traditional communication, transportation and organization to a bare minimum. As a result, the internet has enabled new forms of collective action and collaboration.

One of these new forms is the development and organization of Open Source software. Open Source software is software which is freely redistributable and can readily be evolved and modified to fit changing needs (Raymond, 1998). Characteristically, Open Source software is developed by volunteers, and by employees of companies working on a non-profit basis, who operate from all around the world and work at their own pace on their own project tasks. In the second half of the 1990s, simultaneously with the rise of the Internet for the general public, Open Source software projects gained a lot of momentum. In those days, Open Source software projects flourished in reaction to dissatisfaction with proprietary software. The public sharing of the creation, ownership, and benefits of the Open Source software model is the antithesis of the Microsoft model (Perens, 2005). Since then, Open Source software projects (OSSPs) and their communities have achieved enormous success. Popular examples are Linux, the operating system of Linus Torvalds, and Apache, web server software, both known for successfully competing against Microsoft's closed source equivalents. Other examples are MySQL, a relational database management system with over 11 million installations, Python, a high-level programming language, and TYPO3, a widespread enterprise-level content management system.

Giants of industry do not know exactly how to respond to this Open Source movement, but some of them, including Sun Microsystems, IBM, Cisco and Hewlett-Packard, have identified the potential of Open Source software and are sponsoring these projects by donating money and resources, or by allowing employees to contribute to these projects during work time. Also, previously closed source software has been made open. Software projects such as Mozilla, an Internet browser, and Open Office, an alternative to Microsoft Office, have seen their success rise after the shift from a closed form of software distribution to Open Source software. Governments and organizations all over the world, including those of the USA, Germany and China, are actively supporting Open Source software and Open Standards (publicly available communication protocols) to maintain their neutrality and independence and to avoid being solely dependent on proprietary, commercial software.

Others have identified the benefits of Open Source software as well. Civilians, such as students, may opt for Open Source software to reduce their costs significantly. Currently, a simple computer system for basic uses such as typing, browsing the internet, listening to music and watching videos, equipped solely with Open Source software, is not surpassed by a system equipped with proprietary software. Companies do not want to be completely dependent on a few software suppliers and instead choose Open Source software to be more agile and flexible. In addition, it may reduce initial costs and can be adapted to specific needs immediately. Institutions and researchers may choose Open Source software because of its neutrality towards all the stakeholders they need to satisfy. In general, Open Source solutions for security issues are perceived as more secure than closed software solutions. Because the source code is publicly available, flaws can be detected more easily.

It becomes clear that Open Source projects may have not only a huge economic impact, but also an impact on the development of software, and even on organization and innovation. However, currently not much is known about the conditions which lead to the success of Open Source software projects. Most research on Open Source software, as with all relatively new phenomena, has been exploratory or descriptive in nature. Even now researchers have not reached consensus on how to define and measure the success of OSSPs and are unable to answer the question of why some OSSPs succeed while others fail (Crowston et al., 2006).

Most Open Source related research has focused on well-known, well-established Open Source examples which have large communities. However, most Open Source software is created by individuals and small teams (Krishnamurthy, 2002). In addition, Capiluppi et al. (2003) noted that most Open Source software projects hosted on the SourceForge platform were inactive, and that the pool of developers is a scarce resource that concentrates on very few projects, of which an even smaller number become a success. Finally, Capiluppi et al. (2003) concluded that very successful Open Source projects such as Linux and Apache are probably not the 'average' Open Source project.

To address the knowledge gap regarding the success of Open Source software projects, two research directions are explored further here. First, the focus is on small Open Source software projects. Although the majority of projects can be characterized as 'small', most research has ignored this kind of project. Second, various measures are explored to investigate the success of Open Source software projects, as researchers have still not reached consensus on this topic.

1.1. Research approach

There are two main reasons why the investigation of Open Source software projects is difficult. First, the Open Source phenomenon is relatively new, and so is the research on it. Second, the agile and complex behavior of Open Source project communities makes these entities difficult to investigate and measure. To overcome these difficulties a suitable research approach is essential; therefore a social network perspective is chosen. A social network perspective focuses on the structure of relationships between social entities, and the nature of that structure, rather than the attributes of these entities themselves (Wasserman and Faust, 1994). Stated otherwise, this research focuses on the relationships of, and between, developers within these OSSPs, rather than on specific characteristics of these individuals.

Few social network studies have been conducted on Open Source software projects. Healy and Schussman (2003) proposed researchers should attend more closely to the of the Open Source software community, as they stated that the huge gap between successful and unsuccessful projects is 'a real puzzle'. Hinds (2008) chose a social network perspective to further investigate the conditions which are associated with success in Open Source software project communities. A major advantage of this perspective is that it can be used as a research framework, or as a lead for other perspectives and research areas. Also, the social network perspective enables the precise definition of research constructs and success factors, which is needed to resolve current research issues before more detailed, in-depth research can take place.

The author of this research agrees with Hinds (2008) about the importance of the Open Source movement, its complex behavior and its influence on . Therefore, the research (in progress) of Hinds (2008), and the proceedings of this research (Hinds and Lee, 2008), are used as a guideline for this Master Thesis research. The research of Hinds (2008) is currently not only the most recent research on Open Source software development projects that takes a social network perspective; it is also clearly set out, well founded and usable for further research, as well as interesting and refreshing.

1.2. Research question

The main underlying research question is: 'Why do some Open Source project communities succeed while others fail?'. Taking a social network perspective into account to investigate the structure of these communities the research question can be formulated as:

'What is, if any, the influence of social network structure on the chance of success of Open Source software project communities?'

By answering this question, a fundamental contribution can be made to insight into the structure, management and organization of OSSP communities and their innovative development processes. In practice, software development communities, individuals, organizations, businesses and governments can use the research findings to learn about and improve the organization and strategy of (developmental) projects. Knowledge of the social structure of a network can be used in many disciplines of scientific research, in business models and in social and political fields, since this 'network model' is an organizational form which is used in many areas. Previously, economists have written about business modularity, biologists and mathematicians about swarm intelligence, computer experts about the semantic web, and so on, all with the 'network model' as a conceptual foundation.

1.3. Definitions

Before discussing the research-related conceptual foundations and literature, three main constructs are briefly set out to define the research scope.

(1) Open Source software project community

Open Source software is defined by Raymond (1998) as software which is freely redistributable and can readily be evolved and modified to fit changing needs. In addition, an Open Source software project is the total set of activities needed to develop a particular piece of Open Source software. The Open Source software project community is the community of the Open Source project, which consists of the population of individuals who contribute to the project. These individuals are called project community 'members', but are also referred to as developers, participants or actors. Open Source (OS) software is sometimes referred to as FOSS, F/OSS or FLOSS, which stands for Free / Libre and Open Source Software. When reference is made to an Open Source software project community, the community is limited to the individuals contributing to a particular project, and does not include all Open Source developers in general, who can be seen as a community of their own as well. An Open Source software project community is sometimes referred to as an Open Source software community or Open Source software development community.

(2) Social network structure

The social network structure refers to the structure of an Open Source software project community and includes (the pattern of) all interactions between the individuals of the community. These interactions, sometimes referred to as relationships or ties, are the ingroup ties. To a limited extent outgroup ties are researched as well. Outgroup ties include all interactions between individuals of a community and other individuals (of other Open Source software project communities) outside the community. This is discussed at a later stage.

The social network perspective does not take individual characteristics of the community members into account. To distinguish some characteristics of these individuals, 'subgroups' are introduced, which are also discussed at a later stage.

(3) Community success

Community success refers to the success of an Open Source software project community. Researchers are currently debating how to measure the success of Open Source software project communities. Success can be conceived of in many ways, and it is therefore difficult to define.

As a project refers to the set of activities needed to develop a particular piece of Open Source software, success is measured here along two dimensions. The first dimension is 'success as activity', which is associated with the level of activity of a particular OSSP community. The second dimension is 'success as output', which is associated with the amount of software produced. Other measures of community success, including the impact on society and the economic value or 'good will' of a project community, are not part of the research scope.

1.4. Master Thesis structure

This chapter introduced the unique character of Open Source software development and its impact on society. Subsequently, the social network research approach and the research question were proposed. Then, the definitions of three main research constructs were briefly introduced. The remaining chapters are briefly outlined below.

Chapter 2

A literature review is provided to set out the main research elements. A clear picture is given of the research topic and its theoretical foundations.

Chapter 3

Here, first the conceptual research model and its research constructs are introduced. Then, propositions are made on the basis of six research constructs.

Chapter 4

First, the study design is presented, after which the research setting is discussed. Secondly, the research variables are set out. Then, a closer look is taken at the research population, after which the sample strategy is proposed.

Chapter 5

This chapter includes the data analysis and its results. First, a closer look is taken at the sample descriptive statistics. Then, the preliminary analysis includes a transformation of variables, an outlier assessment, and a reduction of variables. Several descriptive statistics of are discussed as well. Finally, the hypotheses are tested and the testing results are presented on the basis of the research constructs.

Chapter 6

First, the research findings are briefly summarized, after which these findings are discussed.

Chapter 7

In this final chapter first research conclusions are drawn related to theoretical implications. Next, strategic conclusions are drawn related to practical implications. Finally, the research limitations and flaws are discussed, and recommendations for future research are proposed.

2. Literature

This chapter is divided into four subchapters in which the literature review is presented. First, theory related to social networks is discussed, after which the topic of Open Source software is set out. Then group processes and work teams are dealt with, and the chapter ends with a discussion of virtual communities.

2.1. Social networks

This subchapter deals with theory related to social networks. First, the social shaping of new media and the network model, and their current importance, are taken into account. Then, social network analysis and the social network perspective are introduced. Social network theory and its most important field, social capital theory, are briefly introduced as well.

2.1.1. New media and the network model

Lievrouw and Livingstone (2006) noted the social shaping and social consequences of the information and communication technology of new media. New media, that is, new tools used to store and deliver data and information, have become everyday technologies, thoroughly embedded and routinized in the societies where they are most widely used. The most important example is the rise of the internet. The infrastructure and technology of the internet have created endless new possibilities, as transaction costs for communication, organization, information transportation and so on are reduced to a minimum.

Two main forms of social shaping of new media can be identified. First, recombination is the continuous hybridization of both existing technologies and innovations in interconnected technical and institutional networks (Lievrouw and Livingstone, 2006). The second form of social shaping is described as the network metaphor.

‘…the point-to-point network has become … the archetypal form of contemporary social and technical organization … [it] denotes a broad, multiplex interconnection in which many points or nodes (persons, groups, machines, collections of information, organizations) are embedded. Links among nodes may be created or abandoned on an as-needed basis at any location in the system, and any node can be either a sender or a receiver of messages – or both.’ (Lievrouw and Livingstone, 2006)

The internet is becoming ubiquitous. It is a global network in which everybody and everything can interact instantly with each other, twenty-four-seven. Even if you are not an internet user it affects your life, as others surrounding you are affected by the internet. In fact, the internet is a huge social network. As Wellman (1996) states, 'when a computer network connects people, it is a social network. Just as a computer network is a set of machines connected by a set of cables, a social network is a set of people (or organizations or other social entities) connected by a set of socially-meaningful relationships.'

The crowd involved in these social computer networks is often referred to as a virtual community. Hinds and Lee (2008) offer a simple and suitable definition: a virtual community is a community in which the primary mode of interaction is electronic (online/virtual) and not face-to-face. These virtual communities have important social and technological implications. To investigate the structural properties of these virtual communities further, a social network perspective is taken and social network analysis is applied.

2.1.2. Social network analysis

Social network analysis is a set of methods and applications suitable for analyzing network data. Social network analysis is commonly used in a variety of scientific areas, such as studies of a social and behavioral nature, but also in fields such as marketing or economics. The social network perspective focuses on the structure of relationships between social entities, and the nature of that structure, rather than the attributes of these entities themselves (Wasserman and Faust, 1994). In this case the research focuses on the relationships of individuals within Open Source software projects, and on the relationships of individuals between OSSPs, rather than on specific characteristics and behavior of these individuals.

Although researchers have reached consensus on the central principles of social network analysis, alternatives do exist. Here, the book by Wasserman and Faust (1994), 'Social network analysis: methods and applications', is used as a reference guide. This book is well received among researchers conducting social network analysis in the Open Source area. In addition, Hinds (2008) refers to this book several times. Moreover, the book is edited by Mark Granovetter, who is well respected for his important and insightful research in the area of social networks.

Social network analysis is a distinct research perspective within the social and behavioral sciences, as the social network perspective is based on the assumption that relational concepts or processes between individuals are important. Wasserman and Faust (1994) noted four characterizing principles of social network analysis.

(1) Actors and their actions are viewed as interdependent rather than independent, autonomous units

(2) Relational ties (linkages) between actors are channels for transfer or 'flow' of resources (either material or nonmaterial)

(3) Network models focusing on individuals view the network structural environment as providing opportunities for or constraints on individual action

(4) Network models conceptualize structure (social, economic, political, and so forth) as lasting patterns of relations among actors

Social network analysis does not use the individual as its main unit of analysis, but an entity consisting of a collection of individuals and the ties between these individuals. Thus, the difference between a social network explanation and a non-network explanation of a process is the inclusion of concepts and information on the relationships among the studied individuals.
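To make this distinction concrete, the following minimal Python sketch contrasts an attribute-only description of individuals with a relational description of the same individuals. The names, the `commits` attribute and the helper `build_adjacency` are hypothetical and chosen only for illustration; they are not taken from the research data.

```python
from collections import defaultdict

# Non-network view: each individual is described in isolation (made-up values).
attributes = {
    "ann": {"commits": 40},
    "bob": {"commits": 5},
    "cat": {"commits": 12},
}

# Network view: the same individuals plus the ties between them.
ties = [("ann", "bob"), ("ann", "cat")]


def build_adjacency(ties):
    """Return, for every actor, the set of actors it is tied to (undirected)."""
    adjacency = defaultdict(set)
    for a, b in ties:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


network = build_adjacency(ties)
for actor in sorted(network):
    print(actor, "is tied to", sorted(network[actor]))
```

The attribute table alone cannot show that one actor connects the other two; that information only becomes available once the ties are part of the unit of analysis.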

2.1.3. Social network theory

Social network analysis is a product of social network theory. Social network theory has a long tradition and rich history, and it is popular among researchers working across a broad range of disciplines. It finds its roots in the social sciences, with influences from mathematical, statistical and computing methodology. In the last decade a 'new' science of social networks has arisen. It has been noted that in many real-world networks the number of network neighbors (the degree) follows a power-law distribution. Typically, the degree distribution is right-skewed with a 'heavy tail', meaning that a majority of nodes have less-than-average degree and a small fraction of hubs are many times better connected than average (Watts, 2004). Thus a small percentage of the nodes, the hubs, is responsible for the majority of the activity in a network. Networks with such a degree distribution are called scale-free networks, and many of them also show small-world properties.
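The right-skewed degree distribution described above can be illustrated with a minimal sketch. The toy network, the node names and the helper functions below are hypothetical and serve only as an illustration; a network of nine nodes cannot demonstrate a power law, but it does show how one hub pulls the mean degree above the degree of most nodes.

```python
from collections import Counter

# Hypothetical toy network (an assumption, not data from this research):
# one "hub" developer is tied to eight others, and dev1 and dev2 share a tie.
edges = [("hub", f"dev{i}") for i in range(1, 9)] + [("dev1", "dev2")]


def degrees(edge_list):
    """Return a dict mapping each node to its number of ties (its degree)."""
    deg = Counter()
    for a, b in edge_list:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def share_below_mean(deg):
    """Return the fraction of nodes whose degree is below the mean degree."""
    mean = sum(deg.values()) / len(deg)
    below = sum(1 for d in deg.values() if d < mean)
    return below / len(deg)


deg = degrees(edges)
mean_degree = sum(deg.values()) / len(deg)
# Degree distribution: maps a degree value to the number of nodes having it.
distribution = Counter(deg.values())

print("mean degree:", mean_degree)
print("degree distribution:", dict(distribution))
print("share of nodes below the mean:", round(share_below_mean(deg), 2))
```

In this sketch six of the nine nodes have a degree below the mean, while the single hub accounts for eight of the nine ties.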

In a popular account, Chris Anderson (2006) described the economic and social impact of the near-unlimited possibilities offered by the ubiquity of the internet, which he calls 'the long tail' effect. Now, niche markets can create big opportunities as the costs of search and distribution via the internet are reduced to a minimum.

2.1.4. Social capital theory

Probably the largest domain of social network theory is social capital theory. The main principle of social capital theory is that social networks represent value. The social ties in networks can be conceived of as pipelines for the flow of data, information, knowledge and other resources. The structure of a network, that is, the allocation of its social ties, is therefore important for the value of the network.

A wide variety of constructs exist to define and measure the value of social networks. Here, three basic constructs are used, which are further discussed in the next chapter. Closure represents the allocation of ties within a social network, bridging represents the allocation of network ties to other social networks, and centrality represents the influence of the social network initiators.

2.2. Open Source software

This subchapter deals with the Open Source phenomenon. Open Source can be perceived as a form of software distribution. First, its history is set out, after which the contributors to Open Source software projects are mapped out. With respect to the organization of Open Source software projects, their communities are described and the organizational metaphors in use are set out. Next, success studies and studies using a social network perspective are discussed.

2.2.1. Open Source as a form of software distribution

Although Open Source software only became popular in the second half of the 1990s, simultaneously with the rise of the Internet, this type of software has a long history. Strictly speaking, Open Source (OS) software is software released under a license conforming to the Open Source Definition. The Open Source Definition is derived from the Debian Free Software Guidelines and was composed by Bruce Perens for the purposes of the Open Source Initiative, which he co-founded with Eric Raymond in 1998.

After making the first preparations in 1983, Richard Stallman published the GNU Manifesto in 1985 to announce the development of a free operating system called GNU ('GNU's Not Unix') (GNU, 2009). Not long after the publication of the GNU Manifesto, Stallman started the Free Software Foundation, a non-profit organization for the purpose of spreading the free software philosophy. Through this Foundation Stallman could employ free software developers and provide a legal infrastructure for its activities. The free software philosophy was based on the essence of the hacker culture. Despite the term's current negative connotation in the newspapers, a hacker is actually a programmer who makes the computer do what he wants, whether the computer wants to or not (Graham, 2004). It was common practice for the earliest programmers to share any software that was developed. Everything was new, and programmers needed to learn a lot to improve their creations and gain knowledge. When hardware vendors began to dominate the software market, it became the norm to distribute proprietary software in binary code at a significant cost. The free software philosophy was a reaction to this domination.

One of the first licensing terms was the concept of copyleft. Stallman made this form of licensing popular in 1985. In 1989, the Free Software Foundation introduced the GNU General Public License (GPL) which is based on a number of implemented copylefts and other free licenses. The GPL was the first program-independent license, and therefore could be used in many ways.

As of today, there are many Open Source licenses around, for example the MIT License of the Massachusetts Institute of Technology, or the Academic Free License (AFL). Still, the GNU license, currently in its third version, is the most popular license used. The Open Source Initiative has summarized the Open Source Definition into ten points which need to be satisfied for software to be marked as Open Source (OSD, 2009).

2.2.2. Open Source software project contributors

Previously, Open Source was perceived as a gift economy (Raymond, 2000). In a gift economy volunteers contribute to an Open Source project without monetary benefits. Software developers use their spare time to participate in Open Source projects which are not related to their daily work. Their benefits are thus of an intangible nature, as participation helps them to do more creative and artistic work. However, the gift economy does not answer the question of why large corporations and businesses actively support Open Source software projects by contributing resources and programmers. Perens (2005) made an overview of common contributors to Open Source software projects. Hars and Ou (2001) have identified incentives for developers to participate in Open Source software projects. Appendix B provides a detailed overview of types of project contributors and their incentives to create Open Source software.

2.2.3. Open Source Software project communities

Raymond (2000) describes the start and growth of an Open Source software project community. An individual, or a small group of individuals, creates an initial functioning prototype of the software and makes it publicly available as Open Source software. Then, people gather around this prototype, all with their own reasons and goals, and work together to continue developing the software. Next, as the software becomes more usable, more people are attracted to the community and start developing as well. At this point a growth cycle has started which feeds both the community and the development of the software. As the community grows, it becomes more diverse and sustainable, and the value of its software increases accordingly.

In general, the development and communications structure of an Open Source software project is provided by a hosting platform. For example, a hosting platform such as SourceForge or Open Source Flash offers an extensive set of tools which enable the hosting and management of Open Source software projects. A wide variety of Open Source software projects is hosted on these platforms.

Various researchers have suggested a typology of projects to map out differences between these Open Source software projects. For example, from a software architectural perspective Raymond (2000) proposed a typology including three types of software, namely infrastructural software, middleware and application software. Another interesting perspective is taken by Ye et al. (2005), who propose a project typology based on the goals of the software developers. It includes (1) exploration-oriented projects, which attempt to create leading-edge solutions involving innovative approaches, (2) utility-oriented projects, which are directed towards filling a void in functionality, and (3) services-oriented projects, which are geared to maintaining stable code and providing ongoing services to a large group of stakeholders.

Similarly, typologies have been proposed to differentiate between stages of development of Open Source software projects. Based on software writing, Rothfuss (2002) identifies six categories of development status. This development lifecycle includes (1) planning, (2) pre-alpha, (3) alpha, (4) beta, (5) production/stable, and (6) mature. In practice, the SourceForge platform identifies an extra category for (0) inactive projects. Krishnamurthy (2002) and Capiluppi et al. (2003) found evidence that the majority of projects hosted on the SourceForge platform are inactive, or never get through the initial stages of the development lifecycle.

2.2.4. The organization of Open Source software projects

Various researchers and Open Source practitioners, including Crowston and Howison (2004), Raymond (1998) and Almarzouq et al. (2005), have identified the community structure of an Open Source software project as an onion-like structure that is based on the level of contribution. The core of the group, in general the smallest subgroup of the community, is responsible for the majority of the code development and effort contribution (Krishnamurthy, 2002; Crowston and Howison, 2006). The core is surrounded by co-developers, who, on occasion, modify or review code or fix bugs. The majority of the community is formed by users, who can be either active or passive. Whereas passive users only use the software, and thus contribute nothing directly to the project, active users usually contribute ideas, suggestions and bug reports.

These passive users are often referred to as free-riders (Crowston and Howison, 2004), lurkers or leechers. In other types of virtual communities, such as electronic communities of practice (Wasko and Faraj, 2000) and sharing and distributing platforms (Nonnecke and Preece, 2001), this is considered to be a problem, as it negatively affects the outcome of a group. However, Perens (2005) states that all Open Source users start out as free-riders. They download and try the software, and do not generally consider contributing to the software development. But when the time comes that they gain interest in the project or desire an additional feature, they might implement it themselves, and are then no longer free-riders. Here, free-riding is therefore not a burden for the project community.

Not much is known about leadership within Open Source software project communities. Open Source software projects are generally perceived as democratically operating entities. However, it is generally noted that project leadership is important to achieve success (Raymond, 2000).

Metaphors have been used to provide insight into the organization of Open Source projects. Open Source initiator Raymond (1998) noticed that Open Source software projects can be perceived as communities of developers and introduced the concept of cathedrals and bazaars. Most proprietary, commercial software is made similarly to how people build a cathedral: designed by smart individuals (architects) or small isolated groups, with no beta releases in between. Open Source software is made similarly to how bazaars work: the mixed approaches and agendas of many different people result in a coherent and stable system, with a lot of beta releases and testing in between.

Cox (1998) added two organizational metaphors, namely the town council and the clique. A clique is just like a ‘failing’ bazaar. What is supposed to function as a bazaar ends up as a clique. Here, a lot of ‘wannabe’ programmers pollute the transparency of the project and the ideas of the core group by giving their opinions instead of producing real code. As Cox says "They knew enough to know how it should be written but most of them couldn’t write 'hello world' in C."

The town council is an organizational metaphor with a better ratio of ‘wannabe’ programmers to real programmers. The core design is kept strictly in the hands of the real programmers, while the ‘wannabe’ programmers can be of use for less important project tasks. When a wannabe programmer becomes stuck, the chance is high that some of the other wannabe programmers know how to solve the problem. Functioning as a safety buffer, the noise in the project is turned into productivity.

2.2.5. Success studies

Various researchers have identified factors affecting the success of Open Source software projects. However, no one has addressed the question of success factors for specific projects in a systematic way (Hinds, 2008).

Although Open Source has commonly been regarded as work produced by a community of developers, several studies conducted on the SourceForge platform indicated otherwise. Based on a study of 100 mature Open Source software products, Krishnamurthy (2002) found that most programs were developed by individuals rather than communities. Next, most programs did not generate a lot of discussion on their projects' public forums. Products with more developers tend to be viewed and downloaded more often, and the number of developers associated with a project is unrelated to the age of the project. Finally, Krishnamurthy found that the larger the size of a project, the smaller the percentage of administrators.

In a similar way, Healy and Schussman (2003) found power-law distributions for Open Source software project activity measures, such as the number of developers, the number of downloads and the number of site views. They also found that different types of projects dominated different types of these measures. Healy and Schussman concluded that there is a real gap between the current state of theory and data, and that the question of what affects project success cannot yet be answered.

Stewart and Ammeter (2002) conducted an exploratory study considering factors influencing the level of vitality and popularity of Open Source projects. Here, vitality is an indicator of how much developer effort and attention is expended on a project, and popularity is an indicator of how much user attention is focused on a project. Surprisingly, they concluded the vitality of an Open Source software project is not affected by development status, sponsorship, and type of project (project category and target audience).

Crowston and Howison (2004), Howison and Conklin (2005) and Crowston et al. (2006) noted that in information systems research success is one of the most used dependent variables; however, Open Source related research often fails to conceptualize this important concept. Crowston et al. (2006) provide a detailed overview of success measures used in recent Open Source related research. They distinguish four types of success measures, namely (1) system creation, (2) system quality, (3) system use and (4) system consequences, and opt for a portfolio approach to success measurement with respect to Open Source software projects.

2.2.6. Social network perspectives

Just a few studies have used social network analysis to conduct research related to Open Source software projects and their communities. Even fewer studies have used a social network perspective as a framework for theory building, since most studies are of a descriptive and exploratory nature.

Madey et al. (2002) use social network analysis to explore the Open Source software development phenomenon. They investigated 39,000 projects, involving 33,000 developers, hosted on the SourceForge platform. By structurally mapping out relationships between software developers, they found power-law distributions for project sizes, cluster sizes of connected developers, and the number of projects joined by developers. Madey et al. (2002) conclude that Open Source software development can be modeled as self-organizing, collaborative social networks.

Gao et al. (2003) further explored the statistics and topological information of the Open Source software developers' collaboration network by extracting project evolution parameters while inspecting the network over a time period of two years. They investigated 50,000 projects involving 80,000 developers. Again, power-law distributions were found for the cluster distribution and the degree distribution, where the degree of a node is its number of ties. They also found that during this time frame the average degree of projects on SourceForge slightly increased, while the network diameter slightly decreased.

Xu et al. (2005) investigated the composition of the Open Source software community on the SourceForge platform and its collaboration mechanisms as well. Again, power-law distributions were found for social network properties, and indicators of small-world networks and scale-free behavior were found. They also concluded that weakly associated, but contributing, co-developers and active users may be an important success factor with respect to the development of Open Source software.

Crowston and Howison (2004) investigated the structure of 120 Open Source software projects hosted on SourceForge by measuring the interactions in the bug tracking systems of these projects. They found Open Source projects widely vary in their communications centralization, and suggested the onion-structures may be representative for the development structure of Open Source software projects, but are not representative for the communication structure.
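To make the notion of centralization concrete, the sketch below computes a degree centralization score for an undirected network. The measure actually used by Crowston and Howison is not specified here; Freeman's degree centralization is assumed as one common choice, and the function name and the example networks are hypothetical.

```python
def degree_centralization(edge_list):
    """Freeman degree centralization of an undirected network.

    Returns 1.0 for a perfect star (one actor holds all ties) and 0.0 when
    every actor has the same number of ties. Assumes a simple network: no
    duplicate ties, no self-ties, and every node appears in at least one tie.
    """
    deg = {}
    for a, b in edge_list:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    n = len(deg)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    max_deg = max(deg.values())
    # Sum of the differences between the most central node and every node,
    # divided by the largest value that sum can take (reached by a star).
    return sum(max_deg - d for d in deg.values()) / ((n - 1) * (n - 2))


# Hypothetical examples: a star-shaped and a ring-shaped communication network.
star = [("hub", leaf) for leaf in ("a", "b", "c", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]

print("star:", degree_centralization(star))
print("ring:", degree_centralization(ring))
```

Projects whose bug tracker interactions resemble the star would score close to 1, whereas projects in which everyone interacts about equally would score close to 0.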

2.3. Group processes and work teams

This subchapter deals with the theory of group processes and work teams, in social science often referred to as group dynamics. Group dynamics focuses on the nature of groups: the variables governing their formation and development, their structure, and their interrelationships with individuals, other groups, and the organizations within which they exist (Greenberg and Baron, 2003). The focus here is on the effectiveness of groups, as the point of interest is how groups may become successful. First, the organization of groups and work teams is set out. A team can be regarded as a special type of group. After the discussion of virtual teams, research related to the success of groups and teams and research using a social network perspective are presented.

2.3.1. Organization of groups and work teams

Much has been written about teams and work groups in an organizational setting. Social scientists have formally defined a group as a collection of two or more interacting individuals with a stable pattern of relationships between them who share common goals and who perceive themselves as being a group (Greenberg and Baron, 2003). There is a wide variety of groups, and people can join groups for different reasons. Greenberg and Baron (2003) distinguish several basic types of groups. First, groups can be formal or informal. Formal groups include command and task groups, whereas informal groups include interest groups and friendship groups. Based on Maslow's need hierarchy, they distinguish four main reasons why people join groups, namely (1) to satisfy mutual interests, (2) to achieve security, (3) to fill social needs, and (4) to fill a need for self-esteem.

Tuckman and Jensen (1977) have identified five stages that small groups go through during their development. These stages are (1) forming, (2) storming, (3) norming, (4) performing and (5) adjourning.

Greenberg and Baron (2003) provide an overview of four primary structural elements of groups, namely roles, norms, status and cohesiveness. Group members tend to play one or more specific roles in group interaction. A role is defined as the typical behaviors that characterize a person in a social context. Next, for a group to function properly, norms exist. Norms may be defined as generally agreed-upon informal rules that guide group members' behavior. Status is the relative social position or rank given to groups or group members by others. Finally, cohesiveness is the strength of group members' desires to remain part of their groups.

One of the major problems which may occur in groups is called social loafing, which is often referred to as 'free riding' of individuals. Group tasks, which involve the coordinated effort of multiple people, may be negatively affected by free riders. As the size of the group increases, group members may tend to put less individual effort into the group task.

2.3.2. Teams

Although the terms are sometimes used interchangeably in research, a team can be regarded as a special kind of group. Greenberg and Baron (2003) use the definition of Katzenbach and Smith (original 1993), who state that a team is a group whose members have complementary skills and are committed to a common purpose or set of performance goals for which they hold themselves mutually accountable. The difference between groups and teams with respect to their performance can be summarized in four points.

(1) Performance of groups depends on individual contributions, whereas the performance of teams depends on individual contributions and collective work products.
(2) Accountability for group outcomes rests on individual outcomes, whereas accountability for team outcomes rests on mutual outcomes.
(3) Group members are interested in common goals, whereas team members are interested in common goals and commitment to purpose.
(4) Groups are responsive to demands of management, whereas teams are responsive to self-imposed demands.

Tuckman and Jensen (1977) have categorized teams along five dimensions. The first dimension is purpose or mission. Work teams are concerned with products or services, whereas improvement teams are concerned with improving the effectiveness of processes. The second dimension is time. Whereas temporary teams exist for a finite period, permanent teams remain intact as long as the organization exists. The third dimension is the degree of autonomy. Whereas in work groups leaders make decisions for group members, in self-managed work teams the team members are free to make their own decisions. The fourth dimension is authority structure. Whereas intact teams work within their own specialty area, cross-functional teams consist of members from different specialties. The final dimension is physical presence. Members of physical teams are physically present, whereas members of virtual teams meet via electronic means.

Hackman (1987) identified four stages which work teams go through when they develop, namely (1) prework, (2) creating performance conditions, (3) forming and building the team, and (4) providing ongoing assistance.

2.3.3. Virtual teams

Information technology is providing the infrastructure necessary to support the development of new organizational forms, including virtual teams. Scientists have defined virtual teams as groups of geographically, organizationally and/or time dispersed workers brought together by information and telecommunication technologies to accomplish one or more organizational tasks (Powell et al., 2004).

Strader et al. (1998) have identified four phases virtual organizations go through during their development. These four phases are (1) identification, (2) formation, (3) operation and (4) termination.

2.3.4. Success of groups and teams

Much research exists on the effectiveness of groups and teams in the social, behavioral and economic sciences. Cohen and Bailey (1997) conducted a literature study of six years of research on teams and groups in organizational settings. As a result they made a general distinction between three dimensions of team effectiveness, and thus of success measures and indicators, namely (1) performance effectiveness assessed in terms of quantity and quality of outputs, (2) member attitudes, and (3) behavioral outcomes.

2.3.5. Social network perspectives

Taking social network theory into account, the organizational network structure affects team viability and team task performance (Balkundi and Harrison, 2006). Team viability involves the commitment of team members to stay together, whereas team task performance involves attaining the goals of a team. Team viability and team task performance can be conceived of as success measures of teams. Balkundi and Harrison (2006) noted that although these two dimensions of success are conceptually distinct, in reality there is a close connection and cross-correlation between team task performance and team viability. In addition, they found that teams with central leaders, and teams that are central in a network of other teams, tend to be better performers.

Carley (1995) compared hierarchies and centralized structures with democratic teams and decentralized structures. She found that hierarchies and centralized structures tend to exhibit lower performance than democratic or decentralized structures. Information loss, uncertainty absorption and information distortion are important causes. Moreover, the more levels in the hierarchy, the greater the level of information loss and distortion. However, Carley (1995) states that for simple tasks, simple structures such as decentralized and team-like structures perform better, whereas for more complex tasks, hierarchical and centralized structures perform better in the long run, although their response may be slower.

With respect to the organization of teams during information systems (or software) development, Yang and Tang (2004) researched 25 teams in a system analysis and design course. Four findings could be derived. (1) Group cohesion was positively related to overall performance, (2) group conflict indexes were not significantly correlated with overall performance, (3) group characteristics such as cohesion and conflict fluctuated in different team stages, though in later stages much less cohesion occurred and the advice network seemed to be very important. Finally, Yang and Tang concluded (4) that group structures seemed to be a critical factor for good performance.

2.4. Virtual communities

Open Source related research and Open Source initiators have often described the individuals participating in Open Source software projects as members of online, or virtual, communities. Hummel and Lechner (2002) state that Rheingold's definition of a virtual community is probably the best known. Rheingold (original 1993) defined virtual communities as social aggregations that emerge from the Net when enough people carry on those public discussions long enough, with sufficient human feeling, to form webs of personal relationships in cyberspace. From the view of computer-mediated communication, the most important elements of a virtual community are shared resources, common values and reciprocal behavior (Hummel and Lechner, 2002). Current research describes virtual communities as similar in many respects to traditional, offline communities.

Bughin and Hagel (2000) note that virtual communities are not simply communities, but are (becoming) a prominent business model of the world wide web, as these virtual communities can combine reach and selectivity based on user needs. In addition, they note virtual communities do not only fill a strategic niche, virtual communities also tend to have stronger operational performance than other business (to consumers) models. In respect to work teams De Souza and Preece (2004) identified several distinguishing characteristics of online communities.

- Many online communities exist mainly for social action as well as or rather than work.
- Online communities can involve large groups.
- Many online communities develop in an ad hoc way.
- Schedules and timeliness tend not to be a focal issue for most online communities.
- Participants in online communities are often widely distributed and may cross cultural and geographical divides.
- Many online communities exist on the internet and are open to a wide variety of people.
- The skills and knowledge of members may be very broad in some online communities.

According to Hinds (2008), Wenger (1998) identified two kinds of communities, namely communities of practice and communities of interest. Whereas communities of interest serve the purpose of informing interested people, communities of practice serve the purpose of creating, expanding and exchanging knowledge between people, based on expertise or passion for the topic of interest. Fischer (2001) adds that communities of interest bring together stakeholders from different communities of practice. Wasko and Faraj (2000) propose that knowledge can be considered a public good, owned and maintained by a community. When knowledge is considered a public good, knowledge exchange is motivated by moral obligation and community interest rather than by narrow self-interest.

Wenger (1998) identified five stages of development of communities of practice, namely (1) the potential stage, (2) the coalescing stage, (3) the active stage, (4) the dispersed stage, and (5) the memorable stage.

2.4.1. Success of virtual communities

In general, only a few studies have focused on the success of virtual communities. Success of virtual communities can be defined in many ways. Leimeister et al. (2004) observe differences between the definitions of success held by different stakeholder groups of virtual communities. Sangwan (2005) takes a 'uses and gratifications' perspective, i.e. she uses member need satisfaction as a proxy measure for virtual community success. Lin et al. (2007) propose a research model which relates sociability and usability factors to community success.

Crowston et al. (2004 and 2006) present an extensive set of measures that could be used to research the success of Open Source software projects. Hinds and Lee (2008) state it is plausible that success conditions are related to the community type of a virtual community and therefore can differ from each other.

2.4.2. Social network perspectives

Very limited research related to virtual communities has taken a social network perspective into account. However, several social network theory related studies and articles on communities of practice have been published by Teigland, Schenkel and Wasko. For example, Schenkel et al. (2000) have defined five structural properties which characterize communities of practice.

(1) connectedness: In a community of practice, every member is connected, directly or indirectly, to every other member
(2) graph-theoretic distance: Relative to organizational networks in general, communities of practice have shorter graph-theoretic distances between all pairs of members
(3) density: Relative to organizational networks in general, communities of practice have a greater density of ties
(4) core/periphery structure: Communities of practice have core/periphery structures rather than clique structures
(5) coreness: The greater an individual's participation in a community of practice, the greater his or her coreness
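The first three of these properties can be computed directly from a list of ties. The following is a minimal sketch, assuming an undirected network without isolated members; the function names and the five-member example community are hypothetical and are not taken from Schenkel et al. (2000).

```python
from collections import deque


def build_adjacency(edge_list):
    """Map every member to the set of members he or she is directly tied to."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, source):
    """Breadth-first search: graph-theoretic distance from source to each reachable member."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def is_connected(adj):
    """Connectedness: every member can reach every other member."""
    start = next(iter(adj))
    return len(distances_from(adj, start)) == len(adj)


def density(adj):
    """Density: number of existing ties divided by the number of possible ties."""
    n = len(adj)
    ties = sum(len(neighbors) for neighbors in adj.values()) / 2
    return ties / (n * (n - 1) / 2)


def mean_distance(adj):
    """Mean graph-theoretic distance over all pairs; assumes a connected network."""
    nodes = list(adj)
    total, pairs = 0, 0
    for i, a in enumerate(nodes):
        dist = distances_from(adj, a)
        for b in nodes[i + 1:]:
            total += dist[b]
            pairs += 1
    return total / pairs


# Hypothetical community: a, b and c form the core; d and e are peripheral.
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("b", "e")]
community = build_adjacency(ties)

print("connected:", is_connected(community))
print("density:", density(community))
print("mean distance:", mean_distance(community))
```

Comparing such values against those of an organizational network in general would be one way to check properties (2) and (3); core/periphery structure and coreness require more elaborate measures and are left out of this sketch.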

3. Conceptual model and propositions

This chapter deals with the conceptual model and propositions. First the conceptual model is presented, after which the research constructs are briefly set out. Finally, the social network model and the propositions are discussed.

3.1. Conceptual model

The conceptual model proposed is adapted from Hinds (2008) and is discussed below. Each part of the conceptual model is briefly explained.

Figure 1 - Conceptual model

Source: adapted from Hinds (2008)

Figure 1 shows the adapted conceptual model, in which community social network structure may affect community success. Various factors may moderate this relationship. Next, community success may have an effect on community impact, and external market factors may affect community impact as well. However, community impact and market factors are beyond the research scope. Below, each box of the conceptual research model is set out further.

Community social network structure

Community social network structure is made up of three constructs adapted from social network theory, namely bridging, closure and leader centrality, as introduced in Chapter 2 and further set out in the next subchapter.

Community success

Success can be measured in various ways, and in the area of Open Source software projects it is not clear how to measure success exactly. However, Grewal et al. (2003) noted the criteria for success of Open Source software projects should encompass both the technical achievements of a project, as well as indicators of market or commercial success, to be in line with information systems literature and R&D product development literature. Hinds (2008) sets out the concept of community success as a model consisting of two dimensions, namely success as output, and success as activity. Where success as output can be perceived as the quantity of produced software, success as activity can be perceived as the quantity of participation of community members. Thus, community success can be reflected as the effort of, and the performance of, an Open Source software project group. The effort of project members is reflected in success as output, and the performance of a group is reflected as success as activity.

As shown in Figure 1, the reciprocal relationship between success as output and success as activity is based on the assumption that producing more software will lead to greater community participation, and that increased community participation will tend to attract even more developers to produce more software. Another assumption is that community activity can be representative of software product quality, since it is plausible that software of high quality generates a higher level of community activity than software of low quality.

Because this research works with total numbers of success, these success factors need to be controlled for various other factors. For example, it is plausible that the age of a project affects the amount of output of a project simply because of its age, and not because of its project structure. Therefore, this factor needs to be controlled for. The next chapter explains further how success factors and their control mechanisms are applied in this research.

Moderating factors

Hinds (2008) recognizes various factors which may mediate the relationship between social network structure and community success, including project type, project maturity, process/task structure, community norms, and organizational environment, among others. This research recognizes these factors as well. However, it is suggested here that these factors are moderating factors rather than mediating factors. According to Baron and Kenny (1986), a moderating variable is one that influences the strength of a relationship between two other variables, whereas a mediating variable is one that explains the relationship between two other variables.

Due to the wide variety of Open Source software projects, it is impossible to take each moderating factor into account. Here, several plausible moderating factors are recognized. Subsequently, these factors are filtered from the research sample by selecting on various characteristics, in an attempt to obtain a homogeneous sample of Open Source software projects. Therefore the moderating factors are beyond the scope of the research in the conceptual research model. The next chapter discusses how these moderating factors are further taken into account.

Community impact and market factors

Community success is not only affected by community output and community activity; community impact and market factors can influence community success as well. An example of an influencing market factor could be the bankruptcy of a competitor, as a result of which users could eventually switch to your products. Another example of a market factor is the incorporation of your product into a broader infrastructure, for example providing the option to automatically install Open Office when installing the Sun Solaris Operating System. These factors are beyond the scope of this research and are therefore not included.

3.2. Research constructs

The introduced social network concepts are further set out as six social network constructs, namely group closure, core closure, peripheral two-mode closure, core bridging, administrator bridging, and administrator centrality, which are briefly discussed here. The 'group' concept is constructed via three subgroups, namely the administrator subgroup, the core subgroup and the peripheral subgroup. Each subgroup represents a type of developer which can be distinguished in an Open Source software development community. First, Hinds' (2008) development framework for the social network constructs is given, after which the subgroups and research constructs are discussed.


Figure 2 - Development framework for social network constructs

Source: Hinds (2008)

The development framework for social network constructs (Figure 2) provides an overview of the involved subgroups for each social network construct.

3.2.1. Subgroups

In studies of communities in general, and of Open Source software project communities as well, subgroups can be distinguished. Here, three important subgroups are identified. In contrast to teams, Open Source communities do have cores and peripheries. In general, leaders and highly active members form the core, and ordinary members form the periphery. For example, in online gaming communities such as World of Warcraft, guilds exist. A guild is a group of gamers who organize parties with each other to explore dungeons and other areas of this virtual world. The members of these guilds can generally be divided into three types of gamers. There is a guild leader, who is the (democratic) leader. Highly active members are called officers and are responsible for daily management, organizing parties and recruiting new members. Ordinary members contribute less, though they participate in parties and gather materials such as food and potions to play the game party-wise.

Below, each subgroup is described.

Administrators

Administrators are project leaders who take responsibility for monitoring and guiding the progress of the project, and who are recognized as such by group members.

Core developers

Core developers are lead developers who take responsibility for developing the project, and who are recognized as such by group members. Core tasks include writing functional and technical specifications, programming and testing. Thus, this type of developer has a significant influence on the development process of the Open Source software. Administrators, by definition, can be regarded as core developers as well. Sometimes, the administrator subgroup and the core subgroup are taken together and referred to as the 'core group'.

Peripheral developers

All other individuals who are somehow involved in the project, and who have contributed source code, posted on the project forum, or are actively involved in making bug reports, alpha and beta testing, posting requests, etcetera, are referred to as peripheral developers. Although peripheral developers are in general less involved in the project, their contribution should not be underestimated.

Chapter 4 explains further how these subgroups are applied in this research.

3.2.2. Closure

Closure, also sometimes called coreness, is the extent to which the individuals in a project community are connected with each other. High closure, a tight set of relationships, within a community will improve the utilization of resources, create cohesive (sub)groups, and support shared norms and trust (Burt, 2000). On the other hand, high closure can result in groupthink: during decision-making processes, group members try to minimize conflict and reach consensus without critically testing, analyzing and evaluating ideas. A classic example of groupthink is the decision to launch the space shuttle Challenger, which tragically exploded 73 seconds after the launch of its tenth mission (Weick, 1997). Although the computer systems indicated that the Challenger should not be launched under conditions far outside the experience base, NASA still launched. Groupthink occurred through three elements of structural design, namely an enacted work group culture, institutionalized organization, and a dispersion of information.

Closure is measured by 'density', which is defined as the number of observed relationships of an individual or subgroup divided by the total number of possible ties. The closure concept can be measured in different ways. Here, three useful types of closure are investigated.

In social network analysis, 'mode' is defined as a distinct set of entities on which the structural variables are measured. In a one-mode network all actors come from one set of actors, whereas in a two-mode network the network data set contains two sets of actors.

Group closure

Group closure is defined as the closure of the social network of informal ties within the total project community. Thus, the number of existing relationships between administrators, core developers and peripheral developers is divided by the total number of possible relationships.

Group closure is a one-mode network, as all subgroup participants are taken together and dealt with as one group.

Core closure

Core closure is constructed similarly to group closure, but only the core group is taken into account. Thus, the number of existing relationships between administrators and core developers (together known as the 'core group') is divided by the total number of possible relationships.

Core closure is a one-mode network as well, though the peripheral subgroup is not taken into account. The subgroup of administrators and the subgroup of core developers are dealt with as one group.
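The density definition used for group closure and core closure can be illustrated with a minimal sketch. It assumes that ties are undirected and binary (a tie either exists or not); the member names and ties are purely hypothetical and are not taken from the research data.

```python
def density(members, ties):
    """Observed undirected ties among `members` divided by the possible ties."""
    members = set(members)
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        return 0.0
    # Keep each tie once, and only if both endpoints belong to `members`.
    observed = {
        frozenset(tie)
        for tie in ties
        if len(set(tie)) == 2 and set(tie) <= members
    }
    return len(observed) / possible


# Hypothetical community (assumption, for illustration only).
admins = {"anna"}
core = {"bob", "chen"}
periphery = {"dana", "eli"}
ties = [
    ("anna", "bob"),
    ("anna", "chen"),
    ("bob", "chen"),
    ("bob", "dana"),
    ("chen", "eli"),
]

# Group closure: all three subgroups are dealt with as one group.
group_closure = density(admins | core | periphery, ties)
# Core closure: only the core group (administrators plus core developers).
core_closure = density(admins | core, ties)
print(group_closure, core_closure)
```

In this hypothetical community the core group is fully connected, while the community as a whole realizes only half of its possible ties.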

Peripheral two-mode closure

Two-mode closure is defined to take into account only the ties between members of one subgroup and the other members of the community. Ties which are internal to either the subgroup or the group of other community members are excluded. Thus, peripheral two-mode closure is defined as the extent (density) of ties between the peripheral subgroup and the rest of the community (the core group), relative to the possible ties between them.

Peripheral two-mode closure is, as the name already states, a two-mode network. The subgroup of administrators and the subgroup of core developers are taken together as a single subgroup, and the peripheral developers are taken as another single subgroup. This way, the network consists of two sets of actors, of which only the relations between those two sets are taken into account. Peripheral two-mode closure could also be named core two-mode closure, as the two are identical; only the perspective of the other subgroup is taken.
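As an illustration of the two-mode case, the sketch below counts only the ties that cross between the two sets of actors and divides them by the number of possible cross ties. It assumes undirected, binary ties and two disjoint sets; all names and ties are hypothetical.

```python
def two_mode_density(set_a, set_b, ties):
    """Observed ties between two disjoint actor sets divided by possible cross ties."""
    set_a, set_b = set(set_a), set(set_b)
    possible = len(set_a) * len(set_b)
    if possible == 0:
        return 0.0
    observed = set()
    for u, v in ties:
        # Ties internal to either set are ignored.
        if (u in set_a and v in set_b) or (u in set_b and v in set_a):
            observed.add(frozenset((u, v)))
    return len(observed) / possible


# Hypothetical community (assumption, for illustration only).
core_group = {"anna", "bob", "chen"}   # administrators plus core developers
periphery = {"dana", "eli"}
ties = [
    ("anna", "bob"),   # internal to the core group: excluded
    ("bob", "dana"),   # crosses the two sets: counted
    ("chen", "eli"),   # crosses the two sets: counted
    ("dana", "eli"),   # internal to the periphery: excluded
]

peripheral_closure = two_mode_density(periphery, core_group, ties)
core_perspective = two_mode_density(core_group, periphery, ties)
print(peripheral_closure, core_perspective)
```

Both calls return the same value, which reflects the remark that peripheral two-mode closure and core two-mode closure are identical measures seen from different perspectives.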

3.2.3. Bridging

Bridging ties are connections of social network members to other social networks. Extensive bridging ties will better facilitate access to resources such as information and knowledge, people and ideas, which can positively influence the network's success.

Granovetter (1983) noted the strength of weak ties: group members who have weak internal relationships with other group members, but who do have strong outgoing ties with members of other groups. These group members may contribute less to the group than other members, though their external resources can positively influence the success of the group. For example, when recruiting new group members, the members with strong internal ties may be very familiar with each other but know just a few outsiders. The members with weak ties within, but strong ties outside, the group may know suitable new members whom nobody else could have reached.

Membership bridging is the extent to which community members are connected to other Open Source software project communities through the membership network. These connections are called bridging ties, sometimes referred to as outgoing or outbound ties, and are measured by the average core affiliation degree. Average core affiliation degree is the total number of other Open Source software project communities with which a core member is affiliated as a participant, averaged over all core developer members of the focal community.
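The average core affiliation degree can be sketched as follows. The membership mapping, the project names and the function name are hypothetical; the sketch only assumes that for every core member the set of projects he or she participates in is known.

```python
def average_core_affiliation_degree(core_members, memberships, focal_project):
    """Mean number of other projects in which each core member participates."""
    core_members = list(core_members)
    if not core_members:
        return 0.0
    total = 0
    for member in core_members:
        # The focal project itself does not count as a bridging tie.
        other_projects = set(memberships.get(member, ())) - {focal_project}
        total += len(other_projects)
    return total / len(core_members)


# Hypothetical membership network (assumption, for illustration only).
memberships = {
    "anna": {"focal", "proj_a", "proj_b"},
    "bob": {"focal"},
    "chen": {"focal", "proj_a"},
}

core_bridging = average_core_affiliation_degree(
    ["anna", "bob", "chen"], memberships, "focal"
)
print(core_bridging)  # (2 + 0 + 1) / 3
```

Restricting the first argument to the administrators only would give the corresponding figure for the administrator subgroup.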

Here, two types of bridging are taken into account, namely core bridging and administrator bridging, which are discussed below. Peripheral bridging is not taken into account: as the contribution of peripheral members as developers of the project group is relatively limited, the effect of their bridging ties is assumed to be relatively limited as well.

Core Bridging

Core bridging is defined as the extensiveness of ties between members of the core group, thus the administrator and core subgroups combined, and members of other project communities, though excluding members who are already part of the initial project group.

Administrator Bridging


Administrator bridging is defined as the extensiveness of ties between the administrator subgroup and members of other project communities, though excluding members who are already part of the initial project group.

3.2.4. Leader Centrality

Leader centrality is defined as the extent to which a project leader is the pivot of the internal information flows of the project group. Based on Everett and Borgatti (1999), the core/periphery structure and the influence of leadership and leader centrality are further explored.

Administrator Centrality

Administrator centrality is defined as the centrality of the administrator (or administrator subgroup) with respect to the total project community.
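The definition above does not fix a particular centrality measure. Purely as an illustration, the sketch below assumes normalized degree centrality (the share of other community members an administrator is directly tied to), averaged over the administrator subgroup. This choice of measure, the function names and the example data are assumptions and not necessarily the operationalization used in this research.

```python
def degree_centrality(actor, members, ties):
    """Share of the other community members to whom `actor` is directly tied."""
    members = set(members)
    if actor not in members or len(members) < 2:
        return 0.0
    neighbours = set()
    for u, v in ties:
        if u == actor and v in members and v != actor:
            neighbours.add(v)
        elif v == actor and u in members and u != actor:
            neighbours.add(u)
    return len(neighbours) / (len(members) - 1)


def administrator_centrality(admins, members, ties):
    """Average normalized degree centrality over the administrator subgroup."""
    admins = list(admins)
    if not admins:
        return 0.0
    return sum(degree_centrality(a, members, ties) for a in admins) / len(admins)


# Hypothetical community (assumption, for illustration only).
members = {"anna", "bob", "chen", "dana", "eli"}
ties = [("anna", "bob"), ("anna", "chen"), ("bob", "dana")]

centrality = administrator_centrality({"anna"}, members, ties)
print(centrality)  # anna is tied to 2 of the 4 other members
```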

3.2.5. Constructs overview

Below, an overview of the relevant social network constructs is provided in Table 1. The definition of each construct is briefly presented and the relevant subgroups are shown as well.

Table 1 - Social network constructs overview

Construct | Definition | Relevant subgroup
Group Closure | Extent (density) of informal ties considering all possible connections between members of the project community | None
Core Closure | Extent (density) of informal ties considering only the possible connections between members of the core developer subgroup, excluding all other possible ties | Core subgroup
Peripheral Two-Mode Closure | Extent (density) of informal ties considering only the possible connection between peripheral subgroup members and the rest of the project community, and excluding all other ties | Peripheral subgroup, Core subgroup
Core Bridging | Extent of bridging ties, considering connections between members of the core subgroup and members of other project communities | Core subgroup
Administrator Bridging | Extent of bridging ties, considering connections between members of the administrator subgroup and members of other project communities | Administrator subgroup
Administrator Centrality | Central network position of the administrator or administrator subgroup in relation to the remainder of the project community | Administrator subgroup

Source: Hinds (2008)


3.3. Social network model and propositions

Here the social network model of community success is proposed in Figure 3. On the left side the six social network constructs are visualized. On the right side community success is presented. Each of these six constructs possibly has a relationship with community success, which results in six propositions. Below, these propositions are discussed on the basis of the theory presented in the previous chapter.

Figure 3 - Social network model of community success

Source: Hinds (2008)

3.3.1. Group closure

As group closure is the group 'tightness', it is expected that group closure influences the success of an OSSP community, but only up to a certain point. The digital infrastructure offers the benefit that not all one-to-one relationships in a network are necessary: one can easily ask someone else a question if the person actually needed does not respond. Still, a certain level of interaction, and thus of relationships, is necessary to fulfill the tasks of a project group, such as bug fixing, testing, and the coordination and distribution of work. Maintaining relationships with others has a certain cost, as one needs to invest at least a certain amount of time. Due to the nature of OSSPs, most developers are volunteers and have limited resources. Therefore it is not efficient to maintain all relationships in a network; it is more efficient to focus on the most interesting ones and to communicate with others via the forums, etcetera. For a group as a whole, it is similar: too many ties increase slack and become a burden. Therefore, the proposition is the following:


Proposition 1

Group closure of an OSSP community has an inverted-U relationship with community success. Community success is maximized at a moderate level of group closure.
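One common way to express an inverted-U relationship, used here only as an illustrative assumption, is a quadratic model: success = b0 + b1 * closure + b2 * closure^2. The relationship is an inverted U when b2 is negative and the turning point -b1 / (2 * b2) lies within the observed range of closure (density ranges from 0 to 1). The coefficients below are made-up numbers, not results of this research.

```python
def turning_point(b1, b2):
    """Closure level at which a quadratic success curve reaches its extreme."""
    if b2 == 0:
        raise ValueError("no turning point: the relationship is linear")
    return -b1 / (2 * b2)


def is_inverted_u(b1, b2, x_min=0.0, x_max=1.0):
    """True if the curve opens downwards and peaks inside (x_min, x_max)."""
    if b2 >= 0:
        return False
    return x_min < turning_point(b1, b2) < x_max


# Made-up coefficients (assumption, for illustration only).
b1, b2 = 4.0, -5.0
peak = turning_point(b1, b2)
has_inverted_u = is_inverted_u(b1, b2)
print(peak, has_inverted_u)
```

With these hypothetical coefficients, success would be maximized at a moderate closure level of 0.4, whereas a purely positive linear relationship (b2 equal to zero) would not qualify as an inverted U.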

3.3.2. Core closure

It is expected that the effect of core closure is similar to that of group closure. The only difference is that the peripheral subgroup is not taken into account. Given the higher activity of the core and administrator subgroups, core closure should be higher than group closure, though the shape of the effect itself is similar.

Proposition 2

Core closure of an OSSP community has an inverted-U relationship with community success. Community success is maximized at a moderate level of core closure.

3.3.3. Peripheral two-mode closure

It is expected that the effect of peripheral two-mode closure has a similar shape to that of core closure and group closure. The more involved the peripheral subgroup feels, the better its performance may be, though only up to a limit. If it costs the core group too much to maintain the relationships with the peripheral subgroup, performance may decrease because priorities shift from developing software to maintaining contacts, etcetera.

Proposition 3

Peripheral two-mode closure of an OSSP community has an inverted-U relationship with community success. Community success is maximized at a moderate level of peripheral two-mode closure.

3.3.4. Core bridging

It is expected that the effect of core bridging is mainly positive. Through a certain number of outgoing ties, the project group can benefit from external resources such as information and knowledge. Although too many outgoing ties can have a negative effect on the performance of a group member, this is limited to a single group member and has only a minimal effect on the group as a whole. Therefore, the core bridging effect is expected to be positive, rather than having an inverted-U shape.


Proposition 4

The core bridging extent of an OSSP community is positively related to community success.

3.3.5. Administrator bridging

It is expected that administrator bridging has a similar effect to core bridging. However, an administrator, or leader of a group, is expected to have more influence and effect within the group than the other core members. Therefore the administrator bridging effect may be larger than the core bridging effect.

Proposition 5

The administrator bridging extent of an OSSP community is positively related to community success.

3.3.6. Administrator centrality

For project leaders, a certain level of centrality is needed to actually be able to manage a project, though only up to a certain point. The project needs to be managed by coordinating work and distributing tasks, so a lack of centrality may lead to a chaotically managed project and therefore negatively affect group performance. On the other hand, if peripheral and core developers need to maintain too much contact with the administrators of the group, those subgroups may feel less committed to the group, which may negatively affect group performance as well.

Proposition 6

The administrator centrality of an OSSP community has an inverted-U relationship with community success. Community success is maximized at a moderate level of administrator centrality.


4. Research methodology

This chapter deals with the research methodology. First, the study design is briefly set out, after which the research setting is discussed. Then a subchapter is dedicated to the research variables, in which the dependent and control variables and the independent social network construct variables are discussed. Thereafter a closer look is taken at the SourceForge.net population. Finally, a sample strategy is set out.

4.1. Study design

A study design is chosen in which data are collected via several sources from a sample of a platform of Open Source software project communities. First, the unit of analysis, the study population and the research method are set out.

4.1.1. Unit of analysis

While the sample of a platform of Open Source software project communities can be regarded as a single community, it is not the research focus. Here, the main unit of analysis is the Open Source software project community, and not the platform enabling these communities. Thus each Open Source software project can be regarded as having a single community. This study defines an 'Open Source software project community' as all individuals who are contributing to a certain project group which is developing Open Source software.

4.1.2. Study population

As a wide variety of Open Source software projects exists, a study population of similar projects is defined. The project selection criteria are discussed in chapter 4.5.1. If a totally random sample of Open Source software project communities were taken, it would be difficult to measure the effect, if any, of structure on community success, and the sample would not provide the accuracy and reliability needed for this research.

Compared to Hinds (2008), a main difference is that Hinds uses a timeframe of two years of project development, whereas this research mainly uses a snapshot of only one month of data due to Master Thesis limitations. Hinds (2008) uses so-called 'early stage' projects which have only two years of history following their first release of executable software. This probably results in a more homogeneous group of projects, which has several consequences for the research judgments. This is further set out in chapter 4.5.1, where the sample selection procedure is discussed.

4.1.3. Research method

According to Dul and Hak (2008) the is the best approach for testing propositions with probabilistic relations. Due to the exploratory stage Open Source research is still in, and the wide variety and complexity of Open Source software projects and their communities, 'laboratory settings' are unknown. Therefore, a quasi-survey case study is conducted. First an Open Source software development platform is selected, after which a sample of project groups is selected. Although random sampling is known to be very important in any survey, it is not feasible due to the mentioned complexity and wide variety of Open Source software project communities. To overcome this problem a quasi-random sample, which can be treated as a 'sub-population', is extracted from the entire population of Open Source software project communities on the Open Source software development platform. This is further discussed in Chapter 4.5.

Another important fact is that the 'survey' is not set up and conducted by the author of this research. Rather than independently collecting data from project communities, existing data archives are used as primary data sources. Although independently gathering data has clear advantages, such as specific, detailed and suitable data, the Master Thesis timeframe does not allow this option due to the extensive amount of work involved.

The use of existing data records enables the investigation of a large number of project communities, which is not possible with the former option. Another advantage is that the data used are not based on perceptions of the individuals being researched, but on the real activity of project communities. In addition, the data have been used by various other researchers, whose work has demonstrated the accuracy and reliability of these data.

4.2. Research setting

The SourceForge.net platform is used as research setting. The SourceForge.net website (short: SF or SourceForge) is the world's largest hosting platform for Open Source software development communities. The SourceForge platform has an extensive set of tools which enables the hosting and management of Open Source software projects. Registration and use of the platform are free. A brief history of the SourceForge.net platform is provided in Appendix C.

Other Open Source hosting platforms could have been chosen, such as Freshmeat, Savannah, Open Source Flash, or one of many others [*.1]. The main reason for choosing SourceForge is that it is not only the largest platform, but also the platform for which most data are available for research purposes. Another advantage of choosing a single platform is that the independent software project communities may share common norms and values, as they have chosen SourceForge and not another development platform.

4.2.1. Data sources

All data come from the SourceForge.net hosting platform. However, three main data sources are used, namely (1) the SourceForge.net website, (2) the SourceForge Research Data Archive and (3) FLOSSmole. Here, these three data sources are briefly set out. As these three data sources have large archives of historical data, data from May 2008 are used, with any exceptions clearly marked. This means the data were gathered just before, or during, May 31, 2008. As in a bookkeeping system, the state at the end of May 31, 2008 is treated as identical to the state at the start of June 1, 2008.

(1) SourceForge.net website

First, data are gathered directly from the SourceForge.net Open Source software project community sites or pages. Screenshots of the SourceForge website are shown in Appendix E. One of the statistics pages from which data are gathered is shown as well.

(2) SourceForge Research Data Archive

Secondly, data of Open Source software project communities are extracted from the SourceForge Research Data Archive (SRDA), which is managed by the University of Notre Dame. Every month, the SRDA spiders the SourceForge.net website, which results in all kinds of information about project development statuses, members and forums, etcetera. In total, more than 100 different variables are collected. Every month roughly 25 Gigabyte (GB) of data are added to the data archive, which already exceeds 600 GB. For the purpose of this research the May 2008 data dump, called sf0508, is used. This data dump consists of 78 tables and has more than 100 relations between those tables. An overview of the Entity-Relationship Diagram of the May 2008 data dump is provided in Appendix F. An overview of core tables is given below.

Table 2 - Explanation of SRDA core tables

Name Core Table | Explanation
Groups | Information about the project (user) groups
Users | Personal information about the users of SourceForge
Artifact | Information about project 'artifacts'. An artifact is any object taken as a whole which can aid in the recovery of the source code.
Doc | Information about the documentation within a project
Forum | Information about the discussion forum within a project
Frs | Information about the File Release System of a project (e.g. final releases, beta versions, updates)
Job | Information about the jobs of users
Task | Information about project tasks

The SRDA is based on PostgreSQL, which is a powerful Open Source relational database system. It is not possible to simply navigate within this huge database. A query, a short piece of programming code, is needed to extract data from the database. The most important queries used are presented in Appendix G.
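To give an impression of what such a query looks like, the sketch below counts the number of members per project. It is not one of the queries from Appendix G: the table names `project_groups` and `project_members`, their columns and the example rows are hypothetical stand-ins, and an in-memory SQLite database replaces the PostgreSQL archive so that the example can be run without access to the SRDA.

```python
import sqlite3

# In-memory stand-in for the archive (assumption: the real SRDA runs on PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project_groups (group_id INTEGER PRIMARY KEY, group_name TEXT);
    CREATE TABLE project_members (user_id INTEGER, group_id INTEGER);
    INSERT INTO project_groups VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO project_members VALUES (10, 1), (11, 1), (12, 2);
    """
)

# Hypothetical query: number of registered members per project group.
query = """
    SELECT g.group_name, COUNT(m.user_id) AS members
    FROM project_groups AS g
    JOIN project_members AS m ON m.group_id = g.group_id
    GROUP BY g.group_id, g.group_name
    ORDER BY g.group_id
"""
rows = conn.execute(query).fetchall()
conn.close()

for group_name, members in rows:
    print(group_name, members)
```

A query of this shape joins a project table with a membership table, which is the kind of step needed to derive measures such as group size from the archive.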


(3) FLOSSmole

The third source of data is FLOSSmole (or OSSmole). FLOSS stands for Free/Libre/Open Source Software. Not everyone is simply allowed to spider Open Source community platforms. Security reasons, heavy traffic generation and bandwidth problems may be the causes of this limitation. In this case FLOSSmole can offer a solution. FLOSSmole, funded especially for research purposes, is a collaborative collection of Open Source project data covering platforms such as SourceForge.net, Freshmeat, Rubyforge and Objectweb. FLOSSmole aims to provide raw data and summary reports about Open Source projects, as well as to integrate data from other researchers. In addition, FLOSSmole provides some tools to gather data individually. Not all data can easily be extracted from the SRDA, and in these cases FLOSSmole may provide the correct data.

After every month, the spider activities are bundled and uploaded to the FLOSSmole website as soon as possible. Therefore FLOSSmole data of June 2008 are used, since they match the SRDA May 2008 data. The FLOSSmole June 2008 SourceForge collection consists of 19 separate files, which are provided in Appendix H.

4.3. Variables

Here, first the dependent and control variables are discussed, after which the social network variables are set out.

4.3.1. Dependent and control variables

First, the dependent variable community success is set out as several variables of the two proposed success dimensions, success as output and success as activity. Then, several control variables are presented to control for confounding effects on the relationship between the independent and the dependent variables.

4.3.2. Community success

Here, the community success variables are divided into the two proposed dimensions. First, the community output variables are discussed, and thereafter the community activity variables. Finally, one variable is a 'mixed' variable, and is therefore discussed separately.

When an administrator starts a project several tools and statistics of the project are measured. Here, several of these measures are used to measure success of an Open Source software project community.


(1) Community output variables

The community output variables include the number of trackers opened and the number of web hits generated.

Trackers opened

The tracker is a useful tool to keep track of the work which is done and which still needs to be done. Administrators and core leaders who are in charge can open trackers to provide work orders. Work orders can include rewriting code, adding a feature, or fixing a bug, etcetera. The number of trackers opened could be representative of the amount of work which is done by the project group, and therefore this variable can be regarded as a success as output measure. The data on the number of trackers opened come from the SourceForge website. Previous researchers who have used the number of trackers opened include Healy and Schusmann (2003) and Crowston et al. (2004).

Web hits

The number of web hits is the number of all hits the website of the project group receives. Each time a website visitor requests something, such as a page or a picture, from the server a hit is counted.

(2) Community activity variables

The community activity variables include the number of trackers closed, the number of software downloads and the number of page views.

Trackers closed

As trackers are opened, they need to be closed when the work is done. If the work is not done, or is never finished, the trackers could stay open forever. Therefore the number of trackers closed may be representative of the activity of a project community, and thus can be regarded as a success as activity variable. The data on the number of trackers closed come from the SourceForge website. Previous researchers who have used the number of trackers closed include Healy and Schusmann (2003) and Crowston et al. (2004).

Software downloads

The number of software downloads is the total number of software releases which are downloaded by people interested in these projects. The data on the number of software downloads come from the SRDA. Previous researchers who have used the number of software downloads include Healy and Schusmann (2003), Crowston et al. (2004) and Krishnamurthy (2002).


Page views

The number of page views is the number of times a project website is viewed. All the views of all project pages are accumulated. A project website may include a homepage with information about the project, a statistics page, several tool pages, and communication pages such as one or multiple forums. The number of page views differs slightly from the number of web hits: when a visitor visits the project website, his IP address is logged and a 'session ID' is generated to identify the visitor. When a visitor first clicks on page 1, then surfs to page 2, and then goes back to page 1, this counts as three clicks (web hits), but only as two page views, as the pages visited per session are only counted once. The data on the number of page views come from the SRDA. Previous researchers who have used the number of page views include Healy and Schusmann (2003).
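The difference between web hits and page views in the example above can be sketched as follows. The log format, a list of (session ID, page) pairs, is an assumption made for illustration, and requests for pictures and other page elements, which also count as web hits, are left out for simplicity.

```python
def count_hits_and_page_views(requests):
    """Return (web_hits, page_views) for a list of (session_id, page) requests."""
    requests = list(requests)
    web_hits = len(requests)        # every request counts as a hit
    page_views = len(set(requests)) # each page counts once per session
    return web_hits, page_views


# One visitor session: page 1, then page 2, then back to page 1.
clicks = [("session-1", "page1"), ("session-1", "page2"), ("session-1", "page1")]
hits, views = count_hits_and_page_views(clicks)
print(hits, views)  # 3 web hits, 2 page views
```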

(3) Mixed variable

This research takes the variable project rank into account, which combines community success as output and community success as activity, as it is made up of several individual community success variables.

Project rank

Project rank is a measure introduced by the SourceForge platform. Project rank is made up of three measures, namely the amount of traffic, the amount of development, and the amount of communication of a project group. The exact project rank formula can be found in Appendix M. The project rank is an indication of public interest in, and activity of, a project. This way the SourceForge community can see which projects are ‘hot’ and which are ‘not’. Here, project rank is used as a measure of success. The data on project rank come from FLOSSmole. No other research is known to have included project rank as a measure of success.

4.3.3. Controls

Studies of teams and work group performance have identified that the size of a group may affect the effectiveness of that group. Therefore, it is plausible to expect this effect in the area of Open Source software projects as well. In addition, the size of a group may affect the social network structure of that particular group. To control for these effects, several control variables are introduced, which are discussed below.

Group size

From social studies it is known group size may negatively affect the performance of a group. It is plausible this counts for Open Source project communities as well.

Core size

Similar to group size, as the core includes the most active developers, it is plausible core size may negatively affect the performance of a group as well.

Conversation volume

It is plausible the performance of a group is affected by the total conversation volume on the projects' public forums, rather than the structure of a group.

Number of threads

Similar to conversation volume, the number of threads may affect the performance of a group, rather than the structure of a group. A thread is a line of discussion on the projects’ public forum.

Project age

It is plausible the age of a project may affect the success of an Open Source software project. The longer a project exists, the more widespread this project may be. Next, the first-mover advantage, thus being the first with a new software program, or new feature, may affect the success of a community.

4.3.4. Social network definitions

Here, the network and subgroup concepts are applied to the SourceForge research setting, and the social network variables are defined. First, two network definitions are necessary to make use of the proposed network constructs, namely the conversational network and the project membership network.

Conversational network

A SourceForge project community has many ways to communicate. Unfortunately it is not possible to include all forms of communication, such as emails, instant messaging, virtual conferences, etcetera. Most of these forms of communication are not archived and are therefore difficult to investigate. Moreover, the nature of Open Source software communities is to be open and transparent. Therefore, the project's public forums are taken as the investigated conversational network. On the public forums registered members can post messages and react to each other, resulting in forum threads. Registered members form the nodes of the conversational network. When two members of a project group post in the same thread, a relationship between those two members, or a link between those two nodes, exists.
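The construction of the conversational network can be sketched as follows. This is a minimal illustration, not the Java program used in this research. It assumes forum data in the form of hypothetical (thread ID, member) pairs and treats ties as binary and undirected.

```python
from itertools import combinations

def build_conversation_network(posts):
    """posts: iterable of (thread_id, member) pairs. Returns (nodes, edges)."""
    members_by_thread = {}
    for thread_id, member in posts:
        members_by_thread.setdefault(thread_id, set()).add(member)
    nodes, edges = set(), set()
    for members in members_by_thread.values():
        nodes.update(members)
        # Every pair of members posting in the same thread is linked once.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return nodes, edges

# Hypothetical forum posts: anna and bob share thread t1, bob and carl share t2.
posts = [("t1", "anna"), ("t1", "bob"), ("t1", "anna"),
         ("t2", "bob"), ("t2", "carl"), ("t3", "dave")]
nodes, edges = build_conversation_network(posts)
print(sorted(edges))  # [('anna', 'bob'), ('bob', 'carl')]
```

Note that dave posts alone in thread t3, so he is a node without any link.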

Project membership network

As the conversational network is representative for the internal structure of a project group, the project membership network represents the connections of group members to members of other groups. Although SourceForge does not keep track of data suitable for measuring the project membership network, an alternative is found in 'cross-membership status'. A registered developer on the SourceForge platform can be part of multiple Open Source software project communities. If a registered developer is member of project a and b, and the investigated project is a, it is plausible project a has an external relationship with project b, due to the developers' dual membership.

This type of network consists of two types of nodes. First, the project member is defined as a member node. Second, there is a project node. The tie between the member node and its project node represents the internal tie. The tie between the member node and an external project represents the external tie.

4.3.5. Subgroups

In chapter 3, three relevant subgroups were distinguished, namely administrators, core developers and peripheral developers. Here, it is discussed how these subgroups are measured.

A project member is considered to be an administrator when this individual is formally registered on SourceForge as (one of) the administrator(s) of a particular project. In the SourceForge Research Data Archive these individuals are marked with an ‘A’. A project member is considered to be a core developer when this individual is formally registered on SourceForge as a member of the project. A project member is considered to be a peripheral developer when this individual has posted a message on the project's public forums.

By definition an administrator is a core developer as well. When the administrator subgroup and core subgroup are taken together, this is referred to as the core group. The core group and the peripheral developer subgroup are mutually exclusive. All individuals need to have posted at least one message on the projects’ public forum to be noted as active members.

4.3.6. Social network variables

Here the social network variables, derived from the social network constructs, are set out. The closure constructs are measured by density variables. The bridging constructs are measured by membership degree, and the centrality construct is measured by class centrality. All constructs are based on social network concepts as set out by Wasserman and Faust (1994), except class centrality, which is taken from Everett and Borgatti (1999).

Group density

Group density (GD) is the density of the total conversation network. The total conversation network is a one-mode network of group members (thus all three subgroups) of a project community and the relation is forum conversation.

Core density

Core density (CD) is the density of the core conversation network. The core conversation network is a one-mode network of core subgroup members of a project community and the relation is forum conversation.
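Group density and core density can be illustrated with a minimal sketch. It assumes a binary, undirected network in which each tie is stored once as a pair of member names, so that density is the number of observed ties divided by the n(n-1)/2 possible ties. The function names and the example data are hypothetical.

```python
def density(nodes, edges):
    """Density of a binary, undirected one-mode network."""
    n = len(nodes)
    if n < 2:
        return 0.0
    possible_ties = n * (n - 1) / 2
    return len(edges) / possible_ties

def subgroup_density(nodes, edges, subgroup):
    """Density of the network restricted to the members of one subgroup."""
    members = set(nodes) & set(subgroup)
    inner = {(a, b) for a, b in edges if a in members and b in members}
    return density(members, inner)

nodes = {"anna", "bob", "carl", "dave"}
edges = {("anna", "bob"), ("bob", "carl")}
core = {"anna", "bob"}

print(round(density(nodes, edges), 3))       # 0.333 (GD: 2 of 6 possible ties)
print(subgroup_density(nodes, edges, core))  # 1.0   (CD: 1 of 1 possible tie)
```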

Peripheral two-mode density

Peripheral two-mode density (PTD) is the density of the periphery-core conversation network. The periphery-core conversation network is a two-mode network where the mode-1 actors are members of the peripheral subgroup and the mode-2 actors are members of the core subgroup. Again, the relation is forum conversation. Forum conversation is only defined for pairs containing one core member and one peripheral member.
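A minimal sketch of the two-mode density follows. It assumes binary ties and mutually exclusive subgroups, so that the number of possible ties equals the number of peripheral members times the number of core members. Ties within the core or within the periphery are ignored. Names and data are hypothetical.

```python
def two_mode_density(edges, mode1, mode2):
    """Share of possible mode-1/mode-2 pairs that are actually tied."""
    mode1, mode2 = set(mode1), set(mode2)
    if not mode1 or not mode2:
        return 0.0
    cross_ties = {
        frozenset((a, b)) for a, b in edges
        if (a in mode1 and b in mode2) or (a in mode2 and b in mode1)
    }
    return len(cross_ties) / (len(mode1) * len(mode2))

edges = {("anna", "bob"), ("bob", "carl")}
periphery = {"carl", "dave"}   # mode-1 actors
core = {"anna", "bob"}         # mode-2 actors

# Only bob-carl crosses the two subgroups: 1 of 2 * 2 = 4 possible ties.
print(two_mode_density(edges, periphery, core))  # 0.25
```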

Core membership degree

Core membership degree (CMD) is the mean nodal degree for all actors in the 'core project membership network', which is an 'affiliation network' where the actors are core subgroup members of a project community, the events are SourceForge OSSPs, and the relation is project membership.

Administrator membership degree

Administrator membership degree (AMD) is the mean nodal degree for all actors in the 'administrator project membership network', which is an 'affiliation network' where the actors are administrator subgroup members of the focal project community, the events are SourceForge OSSPs, and the relation is project membership.
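Both membership degree variables reduce to the mean number of project memberships per subgroup member. The sketch below assumes a hypothetical mapping from developer to the set of projects he is registered with, and it assumes that the focal project itself is counted as one membership. Neither the data layout nor that counting choice is taken from the SRDA.

```python
def mean_membership_degree(memberships, actors):
    """Mean number of project memberships over the given actors.

    memberships: dict mapping developer -> set of project ids.
    """
    actors = list(actors)
    if not actors:
        return 0.0
    total = sum(len(memberships.get(actor, set())) for actor in actors)
    return total / len(actors)

memberships = {
    "anna": {"p1", "p2", "p3"},
    "bob": {"p1"},
    "carl": {"p1", "p4"},
}
core = ["anna", "bob", "carl"]
administrators = ["anna"]

print(mean_membership_degree(memberships, core))            # 2.0 (CMD)
print(mean_membership_degree(memberships, administrators))  # 3.0 (AMD)
```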

Administrator class centrality

Administrator class centrality (ACC) is the standardized actor degree centrality of the super-node in the 'administrator-other conversation network', which is a special type of two-mode network (Everett and Borgatti 1999). Here, the administrator subgroup members are represented as a single mode-1 'super-actor', the mode-2 actors are the other members of the project community, and the relation is forum conversation, which is only defined for pairs containing the single super-actor and a mode-2 actor.
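Administrator class centrality can be sketched as follows. The sketch assumes that standardization means dividing the degree of the super-node by the number of non-administrator members, which is how group degree centrality is commonly normalized. The function name and example data are hypothetical.

```python
def administrator_class_centrality(nodes, edges, administrators):
    """Share of non-administrators tied to at least one administrator."""
    admins = set(administrators)
    others = set(nodes) - admins
    if not others:
        return 0.0
    reached = set()
    for a, b in edges:
        # A tie to any administrator counts as one tie to the super-actor.
        if a in admins and b in others:
            reached.add(b)
        elif b in admins and a in others:
            reached.add(a)
    return len(reached) / len(others)

nodes = {"anna", "bob", "carl", "dave"}
edges = {("anna", "bob"), ("bob", "carl")}

# With anna as the only administrator, 1 of 3 other members is reached.
print(round(administrator_class_centrality(nodes, edges, {"anna"}), 3))  # 0.333
```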

4.4. SourceForge.net population

Before a sample is drawn, a closer look is taken at the SourceForge.net population. Here, a selection of information about the SourceForge population is presented. More information, such as the programming language used and the intended audience, can be found in Appendix I. Due to the exploratory nature of this research, the information presented in the appendix may provide leads for future research. As of May 2008 the three data archives show there are 153,843 projects listed on SourceForge (Source: FLOSSmole, file [17]). This is 12 percent fewer projects than SourceForge advertises on its website. This difference occurs because duplicate projects are only counted once in the data archives, in contrast to the SourceForge platform.

On the SourceForge platform projects are categorized in several groups including Development (41,923), Games/Entertainment (25,259), Multimedia (22,058), Database (9,605) and Networking (6,715), among others. These categories are then again split further into several subcategories of interest. As an example an overview is provided for the Games/Entertainment section in Table 3.

Table 3 - Overview of the SF Games/Entertainment section

Subsection                        Number of projects
Board Games                                    2,001
Card Games                                       684
Console-Based Games                              518
First Person Shooters                          1,509
Multi-User Dungeons (MUD)                      1,781
Puzzle Games                                   1,484
Real Time Strategy                             1,372
Role-Playing                                   4,355
Side-Scrolling / Arcade Games                  1,704
Simulation                                     2,344
Turn Based Strategy                            1,680
Further Uncategorized                          5,827
Games/Entertainment section                   25,259
Source: SourceForge.net, consulted April 10, 2008

Although the Games section states it hosts 25,259 projects, in fact it hosts just 15,812 projects, which accounts for 9 percent of all hosted projects. The reason for this difference is that a project can be placed in more than one category simultaneously.

Development status

SourceForge.net places projects into six categories based on their stage of product development. These six categories are (1) Planning, (2) Pre-Alpha, (3) Alpha, (4) Beta, (5) Production/Stable, and (6) Mature. Next to these six categories there is one additional category for (0) inactive projects. The stage of development is indicated by the administrators of the project groups themselves. 124,182 (80.7%) projects have filled in a stage of development. 11,582 (7.5%) projects have multiple development statuses, resulting in a total of 135,764 statuses. One reason for categorizing a project group into multiple development stages could be that a project group has multiple pieces of software in development. Another could be that a single piece of software is suitable for multiple operating systems, which results in several software versions. An overview of the development statuses of SourceForge Open Source software project groups is shown below in Table 4.

Table 4 - SourceForge projects development statuses

Development Status        Frequency   Percent   Cumulative percent
(1) Planning                 27,360      20.2                 20.2
(2) Pre-Alpha                21,549      15.9                 36.0
(3) Alpha                    23,439      17.3                 53.3
(4) Beta                     31,425      23.1                 76.4
(5) Production/Stable        26,055      19.2                 95.6
(6) Mature                    2,319       1.7                 97.3
(0) Inactive                  3,617       2.7                100.0
Total                       135,764     100.0
Source: FLOSSmole, file [4]

The projects are roughly equally distributed over the first five development statuses. Only 2.7% of all reported development statuses are inactive. Just a very small portion of all projects has a mature status, though this research focuses on projects in that particular stage of development. This is in line with Krishnamurthy (2002), who expected these products to have the best chance to build a community around them.

Developers per project

In the 60 days prior to June 2008, a total of 210,216 developers were actively involved in project participation. Another 85,532 (40.7%) duplicates were found, which means a large portion of developers is active in more than one project group. Table 5 presents the number of developers per project.

Table 5 - Developers per project

Developers per Project   Frequency   Percent   Cumulative percent
0                              761       0.5                  0.5
1                          105,660      68.7                 69.2
2-5                         39,801      25.9                 95.1
6-10                         5,023       3.3                 98.4
11 or more                   2,240       1.5                 99.8
Missing values                 358       0.2                100.0
Total                      153,843     100.0
Source: FLOSSmole, file [15]

The mean number of developers is 1.93 (Standard error of mean .009). The median and mode are both 1. The standard deviation is 3.478 (Variance 12.094). The maximum number of developers is 381. There are 295,569 developers for 153,485 projects (excluding 358 missing values).

Forum Posts

146,270 (95.1%) of the project groups have forums installed, with a total of 541,099 forums. The forum posts shown in Table 6 are totaled per project group; thus when a project group has multiple forums, the numbers of forum posts are summed. In two months SourceForge received 95,065 forum posts, which is a mean of 0.62 (standard error of mean .066) per project group.

Table 6 - Forum posts: sum of 60 days previous to June 2008

Forum Posts     Frequency   Percent   Cumulative percent
0                 150,016      97.7                 97.7
1 - 6               2,419       1.6                 99.3
7 - 12                371       0.2                 99.5
13 - 60               541       0.4                 99.8
61 and more           233       0.2                100.0
Total             153,580     100.0
Source: FLOSSmole, file [11]

Noticeably, the majority of all project groups did not receive any forum message in the last two months. The median and mode are both 0. The standard deviation is 25.987 (variance 675.341). Only 129 projects have more than 100 forum posts in the last 60 days. Of these projects only 16 have more than 1,000 forum posts, of which just 3 have more than 2,000 forum posts. The highest number of forum posts is 6,311, which is an average of 105.2 forum posts per day.

4.5. SourceForge.net sample strategy

Social network analysis rarely makes use of samples. Social network analysis focuses on relationships between individuals, and therefore these individuals cannot be sampled independently to be included as observations. For example, if an individual is selected, all other individuals who have a relationship with this individual need to be included in the sample as well. Thus, network methods tend to study populations as a whole, rather than using samples. However, examining the whole SourceForge population in detail is rather complicated and time consuming. Hanneman and Riddle (2005) describe a sample strategy, called the full network data collecting method, which is used here to collect a sample from the population.

With the full network data collecting method, information is collected of all ties between all nodes in an entire population. Several social network analyzing tools can only be used with a full network approach, which makes the full network method a powerful approach. Disadvantages of this approach are that it can be difficult to collect data and it can be very expensive.

Although it is impossible to investigate the entire SourceForge population, a subpopulation can be selected, after which this subpopulation is treated as if it is an entire population, thus every project group from this subpopulation is researched. This selection process is set out in the next paragraph.

4.5.1. SourceForge sample selection procedure

Here, the sample selection procedure is set out. To measure, and compare, the success of Open Source software development communities, similar project groups need to be selected. As the project public forums are used as the main source for researching social network structure characteristics, this is the first sample selection criterion.

Criterion: Forums

Of the 153,580 project groups 146,270 (95.1%) have installed at least one forum (total of 541,099 forums). As the descriptive characteristics of the SourceForge population have shown, not all project groups are very active on their forums. Social network structure can only be researched if some activity has taken place between the individuals of that particular group. Therefore all project groups which are taken into account in this research need to have at least 50 messages in total on their forums. This measure is in line with the research of Krishnamurthy (2002) and Hinds (2008), who both use this criterion as well.

Criterion: Public posting on the forums

The SourceForge sample is now reduced to 7,375 project groups with a total number of 12,025 forums. The next selection criterion is anonymous versus public posting on the forums. A project group has the ability to allow anonymous posting on their forums. An advantage of anonymous posting is that the barrier to entry for people to leave a message is much lower than with public posting, which requires registration. A disadvantage is that these forums generally attract more spam postings, and thus require more time to manage and keep clean. A major disadvantage for this research is that anonymous posting does not allow the identification of the people who posted a message. Project groups which allow anonymous posting are therefore eliminated from the SourceForge sample, as social network structure cannot be measured without identification of individuals. There are 4,866 projects with a total of 8,440 forums which allow anonymous posting and need to be eliminated, which leaves 2,509 project groups with a total of 3,585 forums which only allow public posting.

Criterion: Project development stage

Krishnamurthy (2002) only used projects in the mature stage of development, since he expected these products to have the best chance to build a community around them. As this research uses data with a limited timeframe, and the expectation of Krishnamurthy sounds plausible, this selection criterion is used. Now, the sample is further reduced to 130 projects with a total of 221 forums.

Criterion: Project core size

Krishnamurthy (2002) found that the vast majority of mature Open Source Software programs are developed by a small number of individuals. Crowston et al. (2006) only use projects with groups of at least 7 members, to ensure that enough research data on group communication is available.

As a selection criterion for the minimum number of forum messages is already set, which already implies some communication and thus a few developers, further selection restrictions are kept to a minimum. As a closer look is also taken at communication within the core group, i.e. the administrator and core developer subgroups taken together, this core group needs to have at least 2 individuals. Therefore groups with only 1 member in the core group are eliminated from the research. The same holds for groups with 0 members in their core group. Although such project groups have a mature development status, an empty core group could indicate an inactive group, of which the administrator abandoned the group without changing the status of development. Now the SourceForge sample is reduced to 96 project groups with a total of 168 public forums.

Hinds (2008) chose to eliminate all projects from his sample which included evidence of sponsors. Here, it is chosen to include these projects because very limited evidence of donations was found.

4.5.2. Overview of sample selection criteria

To summarize, a schematic of the selection criteria is given in Table 7. To guarantee the quality of the research, all research data is visually inspected, and if any data corruption is found, the project group is eliminated from the research sample, unless stated otherwise.

Table 7 - Overview of sample selection criteria

Criteria Category     Test Criteria ('Reject if')
Study Population      Development status is not set to 'mature'
Data Availability     No forum is available
                      A project group has less than 50 messages on the total of their forums
                      Administrators allow anonymous forum postings
                      Less than 2 core members are found
Data Integrity        Evidence is found of data corruption in available data
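The 'reject if' tests of Table 7 can be expressed as a simple filter. The sketch below is an illustration only: the record layout and field names are hypothetical and do not correspond to the actual FLOSSmole or SRDA tables, and the selection in this research was carried out with SPSS and Excel.

```python
from dataclasses import dataclass

@dataclass
class ProjectGroup:
    name: str
    development_status: str
    forum_count: int
    forum_messages: int
    allows_anonymous_posting: bool
    core_size: int
    data_corrupt: bool = False

def rejection_reasons(p):
    """Return the Table 7 criteria on which a project group is rejected."""
    reasons = []
    if p.development_status != "mature":
        reasons.append("not mature")
    if p.forum_count == 0:
        reasons.append("no forum")
    if p.forum_messages < 50:
        reasons.append("fewer than 50 messages")
    if p.allows_anonymous_posting:
        reasons.append("anonymous posting allowed")
    if p.core_size < 2:
        reasons.append("fewer than 2 core members")
    if p.data_corrupt:
        reasons.append("data corruption")
    return reasons

candidates = [
    ProjectGroup("alpha", "mature", 2, 120, False, 3),
    ProjectGroup("beta", "mature", 1, 30, False, 4),
    ProjectGroup("gamma", "beta", 1, 500, True, 1),
]
sample = [p.name for p in candidates if not rejection_reasons(p)]
print(sample)  # ['alpha']
```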

4.5.3. Data compilation

The data used come from three data sources. Data from FLOSSmole can be downloaded directly. For data from the SourceForge Research Data Archive it is necessary to submit a query to download a subset of the data. The most used queries are provided in Appendix G. FLOSSmole and SRDA data come in plain text, which is then imported into SPSS 16.0, a statistical software suite. Data from the SourceForge.net site is imported into Microsoft Excel 2007, a spreadsheet application. Data are checked for inconsistencies and corruption by the author of this research. Both Microsoft Excel 2007 and SPSS 16.0 are used to compile, manipulate and work with the data sets.

At a certain point, data compilation is needed to measure group closure, core closure, peripheral two-mode closure and leader centrality. A Java program has been written by coach drs. Ruud Smit to automate this process. The non-technical specifications for this program were delivered by the author of this research, though the program itself was 'outsourced' due to the author's lack of programming skills and knowledge. The program was checked for irregularities by the author by manually calculating these measures for several small project groups.

5. Data analysis and results

This chapter deals with the data analysis and its results. Before dealing with the actual data analysis, which includes the testing of the hypotheses and a presentation of the test results, the descriptive statistics of the sample are first compared to the surprising findings of Krishnamurthy (2002).

5.1. Findings of Krishnamurthy

Krishnamurthy (2002) conducted a limited study to find empirical evidence on whether Open Source software is mainly developed in caves, i.e. by lone producers, or by communities. Krishnamurthy took a sample of 100 mature SourceForge projects. Five interesting conclusions were found. Here, it is checked and discussed whether these conclusions are valid in the research sample as well.

Finding 1

The vast majority of mature OSS programs are developed by a small number of individuals.

Most research on Open Source Software projects has focused on large projects, and therefore the community model, and the notion that this is the common model for OSS development, was widely adopted. Krishnamurthy came to the conclusion that most of the projects were led by a pair of administrators and had a handful of core developers. The variety in the number of core developers was high. The largest project in his sample had just over 40 developers. Here, similar numbers are found, which are presented in Table 8.

Table 8 - Finding 1 of Krishnamurthy - small groups

                  Krishnamurthy (N = 100)          Research findings (N = 96)
                  Number of        Number of       Number of        Number of
                  administrators   developers      administrators   developers
Mean              2.21             6.61            1.74             4.82
Median            1                4               1                3
Mode              1                1               1                2
Minimum           1                1               0                1
Maximum           14               42              5                48
Std. deviation    1.91             8.24            1.03             6.27

Finding 2

Very few OSS products generate a lot of discussion. Most products do not generate too much discussion.

Although the sample selection of this research filtered out projects which generated very little discussion, this finding is true for this research as well.

Table 9 - Average generated discussion per day on the project public forums

                  Research findings (N = 96)
Mean              .8753
Median            .1450
Mode              .02*
Minimum           .02
Maximum           18.54
Std. deviation    2.44644
* Multiple modes exist. The smallest is shown

The sample projects were selected on at least some generated discussion, and thus have a slightly higher conversation volume than the projects investigated by Krishnamurthy. Although there are some exceptions where projects generate a lot of discussion, the finding remains that most projects do not generate much discussion. As shown in Table 9, the average generated discussion on the project public forums is less than one message per day. Even the project with the highest ratio has fewer than 20 messages per day.

Finding 3

Products with more developers tend to be viewed and downloaded more often.

To investigate this finding of Krishnamurthy two figures are presented. Figure 4 shows the number of page views versus the number of developers on the left, and the number of downloads versus the number of developers on the right.

Figure 4 - Page views (left) and software downloads (right) vs. number of core developers

Both graphs are similar in shape to Krishnamurthy's findings. However, the number of page views and the number of downloads are considerably lower here. Possibly, the calculation of these numbers differs from the method of Krishnamurthy. The actual correlation between the number of developers and page views is 0.725, and between the number of developers and downloads is 0.327. Both correlations are significant at the 0.01 level.

Finding 4

The number of developers working on an OSS program was unrelated to the release date.

It is plausible the older a project, the more developers it could have. More development time and natural growth of the group may influence the projects' group size. However, though some older projects (4 outliers) have significantly more developers than other projects, there is no evidence of a significant relationship between the age of a project (comparable with the release date) and the number of developers, as presented in Figure 5 (left).

Figure 5 - Project age (left) and percent of administrators (right) vs. the number of developers

Finding 5

A smaller percent of participants was assigned as project administrators in larger groups.

By nature of groups, it is expected the percent of administrators related to total number of developers goes down as the core size grows. Otherwise, the administrator function would not be very representative and possibly not very effective as project leader or (final) responsible person.

Although the research of Krishnamurthy (2002) is limited, it is important to note that the development of Open Source software differs significantly from the development of proprietary and other forms of software, as a lot of people still have misperceptions of the Open Source phenomenon.

5.2. Preliminary analyses

This subchapter deals with the preliminary analyses, which were performed before the actual regression analyses were conducted. The goal of the preliminary analyses is to get a quick insight into and overview of the sample, which helps to execute the actual, detailed regression analyses. First, the distributions of the variables were checked for normality. As a result, the dependent variables were log transformed. Secondly, several outlier tests were done. Based on the transformed dependent variables, univariate as well as multivariate procedures were performed to identify possible outliers. It was chosen not to exclude any case. Finally, a principal component analysis was performed to measure possible overlap of the dependent variables. As a result, the number of trackers opened and the number of trackers closed were eliminated from further research.

5.2.1. Transformation of variables

First, a normality test was executed. Each dependent variable was linearly regressed against each independent variable. The standardized residuals were checked for normality. These plots can be found in Appendix M. Normality is found when the standardized residuals are normally distributed. All 24 preliminary regression analyses indicate this is not the case. All plots of standardized residuals have high peaks and short tails.

Similarly, the probability-probability (P-P) plots were checked for normality. When the variables are normally distributed, the points should be approximately linear and follow the 45-degree line on the plot. The P-P plots can be found in Appendix O. These plots are all similarly shaped: they are linear with a low slope, and at the right end they bend upward. Again, no evidence of a normal distribution was found.

To overcome the problem of non-normality the dependent variables were log transformed. In a log transformation the natural logarithms of the values of the variables are used, rather than the original raw values. Normality was then checked by executing One-Sample Kolmogorov-Smirnov tests. The One-Sample Kolmogorov-Smirnov test compares the observed distribution function of a dependent variable with a (theoretical) normal distribution. The Kolmogorov-Smirnov Z-score tests the 'goodness-of-fit' of the two distributions. As a rule of thumb, a Z-score lower than 2 is read here as a deviation of less than 2 standard deviations from the theoretical normal distribution; in a normal distribution this holds in around 95% of all cases, and is thus acceptable. A Z-score between 2 and 3 occurs in around 5% of all cases in a normal distribution and is therefore questionable. A Z-score higher than 3 is unsatisfactory, as this occurs in only around 0.3% of all cases in a normal distribution.
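The effect of the log transformation on the Kolmogorov-Smirnov Z-score can be illustrated with a minimal sketch. It assumes that Z equals the square root of n times the largest distance D between the empirical distribution function and a normal distribution whose mean and standard deviation are estimated from the sample. The download counts are hypothetical and are not taken from the research sample.

```python
import math
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_z(values):
    """Kolmogorov-Smirnov Z against a normal fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # Compare the fitted CDF with the empirical step just after and before x.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return math.sqrt(n) * d

# Hypothetical, strongly right-skewed download counts.
raw = [3, 5, 8, 12, 20, 35, 60, 110, 250, 900, 4000, 15000]
logged = [math.log(v) for v in raw]

print(round(ks_z(raw), 2), round(ks_z(logged), 2))
print(ks_z(raw) > ks_z(logged))  # True: the transformed values fit better
```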

Appendix M includes the One-Sample Kolmogorov-Smirnov tests for the untransformed dependent variables, and the same tests for the natural log transformed dependent variables. The untransformed variables have unsatisfactorily high Z-scores, whereas the transformed variables have acceptable Z-scores. Because the One-Sample Kolmogorov-Smirnov test compares the observed distribution with a theoretical normal distribution, the power of the test is affected. Therefore a Monte Carlo method was used to check the significance of the Kolmogorov-Smirnov test. The Monte Carlo significance level, and the lower and upper bounds of these significances, are presented at the right side of both tables. A Monte Carlo method with 1,000,000 sampled tables was performed.

Thereafter the Quantile-Quantile (Q-Q) plots were inspected. A Q-Q plot is a graphical method for diagnosing differences between the probability distribution of a statistical population from which a random sample has been taken and a comparative distribution. Non-normality of the sample distribution can be examined with the Q-Q plot. In a Q-Q plot quantiles are matched. First, a set of quantiles is chosen, after which each quantile of the comparative distribution is plotted against the matched quantile of the sample. The closer the plotted points are to the 45-degree line, the more closely the two distributions match. Here, the sample and a theoretical normal distribution are matched. A look is also taken at the detrended Q-Q plot. The detrended Q-Q plot shows the differences between the observed values and the expected values of a normal distribution. When a distribution is normal, all points are clustered, with no pattern, around the horizontal zero-line.

The Q-Q plots and the detrended Q-Q plots are provided in Appendix N. The Q-Q plots of the untransformed dependent variables clearly deviate from the linear 45-degree line. The Q-Q plots of the natural log transformed dependent variables, on the other hand, approximately follow this 45-degree line. The detrended Q-Q plots of the untransformed dependent variables indicate non-normality as well, as the values are not clustered around the horizontal zero band. The values in the detrended Q-Q plots of the natural log transformed variables, despite some outliers, are clustered much closer, in random order, around this horizontal zero band. Deviations are relatively low, in contrast to the detrended Q-Q plots of the untransformed dependent variables.

After several exploratory test runs, it appeared that the control variable 'the number of threads' had a consistent Tolerance of just below .10. Apparently this is caused by the similarity of the control variables 'the number of threads' and 'conversation volume', and their relationship with each other. The low Tolerance value means the regression analysis has an allocation problem. Although R² (the measure of explanation) is high, the standard errors of the unstandardized coefficients of both control variables are high as well. Thus, though the combined contribution of the control variables to the explanation of the variation of the dependent variable is acceptable, the contribution of each individual control variable is statistically not reliable. In Figure 6 the correlation between the number of threads and conversation volume is provided. The left figure provides an overview; the right figure is zoomed in.

Figure 6 - The correlation between the number of threads and conversation volume

The correlation between the number of threads and conversation volume is very high: a Pearson correlation of .931 (significance .000) was measured. This high correlation between the two control variables appears to cause the multicollinearity. To overcome this problem, the control variable 'the number of threads' is eliminated from further research. An alternative would be a combined control variable, such as the number of threads divided by the conversation volume, as a density measure, but this alternative was not chosen for two reasons. First, there are already three density measures among the independent variables, which in one way or another are similar measures. Second, Hinds (2008) uses conversation volume as a control variable as well, and no multicollinearity or other problems occurred there.
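The link between a high pairwise correlation and a low Tolerance can be illustrated with a small calculation. Tolerance is 1 minus the R² obtained when one predictor is regressed on the other predictors; in the simplified case of only two predictors (an assumption made here for illustration) that R² equals the squared Pearson correlation. The function names are hypothetical.

```python
def tolerance_two_predictors(r):
    """Tolerance of a predictor when the only other predictor correlates r with it.

    With exactly two predictors the auxiliary R-squared equals r squared.
    """
    return 1.0 - r ** 2

def variance_inflation_factor(tolerance):
    """VIF is the reciprocal of Tolerance."""
    return 1.0 / tolerance

r_threads_volume = 0.931  # Pearson correlation reported in the text
tol = tolerance_two_predictors(r_threads_volume)
print(f"Tolerance = {tol:.3f}, VIF = {variance_inflation_factor(tol):.2f}")
```

Under this simplification the pairwise correlation alone yields a Tolerance of roughly .13. The value of just below .10 reported above is lower, presumably because the other control variables in the model overlap with the number of threads as well.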

For each of the research variables the skewness and the kurtosis were measured. These are provided in Table 11, which can be found in the next subchapter.

The skewness is a measure of the asymmetry of a probability distribution. A negative skew means the left tail of the distribution is longer than the right tail. The mass of the values is then positioned at the right side of the distribution, and there are relatively few low values. A positive skew is the opposite: the right tail of the distribution is longer than the left tail, and there are relatively few high values.

The kurtosis is a measure of the pointedness of a probability distribution. The higher the kurtosis, the more of the variance is due to infrequent extreme deviations. A high kurtosis means the probability distribution has a sharp peak and long, fat tails. A low kurtosis means the peak is flatter and more rounded, with shorter, thinner tails. Kurtosis is expressed here as excess kurtosis (relative to the normal distribution), so by definition its minimum is minus 2, while its maximum is (positively) infinite.
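Both measures can be sketched from the central moments of a variable. The version below uses the plain population (moment-based) definitions as an assumption; statistical packages typically apply small-sample corrections, so the values in Table 11 would not be reproduced exactly by this code. The example data are hypothetical.

```python
from statistics import fmean

def central_moment(values, order):
    """Population central moment of the given order."""
    m = fmean(values)
    return fmean([(v - m) ** order for v in values])

def skewness(values):
    """Third standardized moment: negative = long left tail, positive = long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# A symmetric two-point distribution attains the minimum excess kurtosis of -2.
two_point = [0, 1, 0, 1]
# Hypothetical right-skewed data: a few high values pull the right tail out.
right_skewed = [1, 1, 1, 2, 10]

print(skewness(two_point), excess_kurtosis(two_point))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The two-point example shows where the lower bound of minus 2 comes from: all values lie exactly one standard deviation from the mean, so there are no tails at all.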

Most of the research variables have a high skewness and high kurtosis, though some exceptions exist, such as project age. The natural transformed dependent variables have clearly more acceptable skewness and kurtosis levels.

5.2.2. Outlier assessment

Here, a closer look is taken at the possibility of outliers in the sample. Outliers are observations with extreme values on a research variable. In general, a value is identified as an outlier when it lies three or more standard deviations from the mean of that research variable. An outlier may have such an influence on the descriptive statistics, and thus on the possible relationship of this variable with other research variables, that it at least needs to be discussed separately, or eliminated from the research.
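The three-standard-deviation rule described above can be sketched as follows. The data and the function name are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return the values lying `threshold` or more standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)
    return [v for v in values if abs(v - m) >= threshold * s]

# Hypothetical download counts with one extreme project.
downloads = [120, 95, 130, 110, 105, 98, 115, 102, 125, 108,
             112, 99, 101, 118, 107, 5000]
print(flag_outliers(downloads))
```

Note that an extreme value inflates the standard deviation it is compared against, so in very small samples a single outlier may not reach the threshold of three standard deviations at all.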

Because an extensive selection method was used to take the sample from the SourceForge population, no outliers are eliminated from the research at this stage. Where necessary, outliers are discussed in later chapters.

5.2.3. Reduction of variables

A principal component analysis is used to measure possible overlap of the dependent variables. As a result, the number of variables may be reduced, or grouped.

Previously in this research, two dimensions were introduced, namely success as activity and success as output. Success as output includes the number of page views, the number of software downloads and the number of trackers opened. Success as activity includes the number of web hits and the number of trackers closed. The SourceForge project Rank is a complex variable and therefore not classified.

The principal component analysis transforms a number of possibly correlated variables into a smaller number of uncorrelated variables, called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The goal of the principal component analysis here is to test the plausibility of the introduced two-dimension model and to verify the classification of the dependent variables. Several test runs were performed on the log transformed dependent variables, applying a 'Varimax' rotation with 'Kaiser Normalization'. The first test run included all six dependent variables and resulted in two components: one component included project rank, and the other component the remaining dependent variables. Due to the complexity of project rank, this variable is left out of the two-dimension classification, though it is still taken into account as a dependent variable in further research.

The second test run produced two components, one component included the number of web hits, and the other component included the other four dependent variables. Now, the total variance explained was only 59.95%, and the component matrices and the scree plot recommended an extra component, as trackers opened and trackers closed 'floated' between the number of web hits on one side, and the number of page views and the number of software downloads on the other side.

The third run forced the use of three components. Now, the first component included the number of software downloads and the number of page views, the second component included the number of web hits, and the third component included the number of trackers opened and the number of trackers closed. However, the eigenvalues of the number of trackers opened and the number of trackers closed are close to zero. This indicates multiple equilibrium points between the two dimensions, which should not be possible. Moreover, the initial classification of trackers opened as success as output, and trackers closed as success as activity, no longer holds. Possibly, the correlation between the number of trackers opened and closed causes this inconsistency. The Pearson correlation between those variables is .898 (significance = .000), which is high, though plausible, as a tracker can only be closed after it is opened. Therefore, trackers opened and trackers closed are eliminated from the set of dependent variables, though Appendix I explores some descriptive statistics of these two variables.

The fourth run included the three remaining dependent variables. Now, the first component included the number of web hits, where the second component included the number of page views and the number of software downloads. Now, the scree plot and the component matrices indicated no further components were needed. The rotated component matrix for the accepted dependent variables is provided below.

Table 10 - Rotated component loading for accepted (log transformed) dependent variables

                      Component
Dependent variable    1       2
Web Hits              .254    .967
Software Downloads    .957    .223
Page Views            .941    .279
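How the variance is divided over unrotated principal components can be sketched from a correlation matrix alone, because the eigenvalues of that matrix are the variances carried by the components. The sketch below uses the Pearson correlations between the three accepted log transformed variables as reported in Table 12 (.930, .461 and .499). It is an illustration only: it uses a closed-form eigenvalue formula for symmetric 3x3 matrices, the function name is hypothetical, and it does not reproduce the Varimax rotation or the loadings of Table 10.

```python
import math

def symmetric_3x3_eigenvalues(a):
    """Eigenvalues (largest first) of a symmetric 3x3 matrix, trigonometric method."""
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    off_diagonal = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off_diagonal
    if p2 == 0.0:
        return [q, q, q]  # the matrix is a multiple of the identity
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding outside [-1, 1]
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [largest, middle, smallest]

# Correlations from Table 12, in the order Ln_SD, Ln_PV, Ln_Hits.
correlations = [
    [1.000, 0.930, 0.461],
    [0.930, 1.000, 0.499],
    [0.461, 0.499, 1.000],
]
eigenvalues = symmetric_3x3_eigenvalues(correlations)
for number, value in enumerate(eigenvalues, start=1):
    print(f"component {number}: eigenvalue {value:.3f}, share {value / 3:.1%}")
```

The eigenvalues sum to the number of variables, so dividing each by 3 gives the share of the total variance carried by that unrotated component.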


5.3. Descriptive and correlation statistics

Here, several descriptive and correlation statistics are discussed. An overview is provided in Table 11.

The average core size of a group is approximately 5, though the core size ranges from 1 to 48 developers. The administrator subgroup ranges from 0 to 5 members and the core developer subgroup from 0 to 47; the average number of administrators is approximately 2, and the average number of core developers approximately 3. The range of the number of peripheral developers is large, from 0 to almost 2,700, and almost identical to the range of the group size of an Open Source software project.

The control variables conversation volume and project age both have a large range. Project age ranges from almost 1 year to a maximum of approximately 8.5 years. The number of messages on the forums varies from 39 to almost 38,000, with an average of around 2,000 messages. The minimum of 39 messages is below the selection criterion of at least 50 messages in total on a project's forums; this difference is caused by the removal of spam and anonymous messages.

All of the success variables (software downloads, page views, web hits and project rank) have wide ranges and can differ considerably from each other within a project. The minimum value of 0 for the number of software downloads and the number of web hits is caused by these statistics not being available for all projects.

The number of web hits is highly correlated with the number of page views, and the number of software downloads is correlated with project rank. Therefore, although the actual figures differ slightly, it was chosen to work with these figures. The 96 sample projects generated a total of 14,551,793 web hits and 1,670,826 downloads, and the differences with the actual figures are marginal (less than 0.001%).

The descriptive statistics for the natural-log-transformed variables are provided as well. Here, it can be seen that the skewness and kurtosis are much closer to those of a normal distribution than for the non-transformed dependent variables. The community social network structure variables have large ranges as well. Four of the six constructs have a 0-to-1 index, of which core density, peripheral two-mode density and administrator class centrality all cover the maximum range from 0 to 1. Group density has a range from nearly 0 to .5. The core membership degree ranges from 1 to just over 15, whereas the range of the administrator membership degree is slightly larger, from 1 to over 22.

Table 12 provides the correlation matrix of the research variables. Several relationships can be noted. The highest Pearson correlation coefficient is noted between the natural-log-transformed page views and software download variables (.930, significance .000).

Table 11 - Descriptive statistics of research variables

Variable                               Unit            N   Min.    Max.       Mean        S.D.         Skewness  Kurtosis

Subgroups
(Pdev) Peripheral developers           # members       96  0       2,689      252.52      526.407      3.648     13.408
(Cdev) Core developers                 # members       96  0       47         3.08        5.996        5.033     31.936
(Addev) Administrators                 # members       96  0       5          1.74        1.028        1.316     1.717

Controls
(GS) Group Size                        # members       96  4       2,691      257.34      527.591      3.633     13.307
(CS) Core Size                         # members       96  1       48         4.82        6.268        4.501     25.905
(CV) Conversation Volume               # all posts     96  39      37,986     1,910.93    5,557.716    5.337     30.886
(AGE) Project Age                      # days          96  356     3,056      2,229.11    608.612      -.728     0.087

Community Success
(SD) Software Downloads                # all d/l       96  0       542,740    17,404.44   68,838.688   6.005     39.764
(PV) Page Views                        # all views     96  17      1,396,906  73,359.84   201,685.245  4.877     26.210
(HITS) Web Hits                        # all hits      96  0       4,054,972  151,581.18  505,150.923  5.898     40.364
(RANK) SourceForge Rank                # Rank (May)    96  23      94,076     11,927.48   18,941.491   2.475     6.043

Transformed Success Variables
(Ln_SD) Software Downloads             Ln # all d/l    94  1.0986  13.2044    6.9600      2.4779       -.020     .165
(Ln_PV) Page Views                     Ln # all views  96  2.8332  14.1498    9.2910      2.1448       -.230     .181
(Ln_HITS) Web Hits                     Ln # all hits   94  .6931   15.2155    9.3362      2.8559       -.805     .929
(Ln_RANK) SourceForge Rank             Ln # Rank       96  3.1355  11.4519    8.0711      1.9447       -.477     -.316

Community Social Network Structure
(GD) Group Density                     0-to-1          96  .0007   .500       .059        .081         3.111     11.784
(CD) Core Density                      0-to-1          79  .000    1.000      .538        .362         .043      -1.388
(PTD) Peripheral Two-Mode Density      0-to-1          95  .000    1.000      .276        .213         1.610     2.422
(CMD) Core Membership Degree           # projects      96  1.000   15.333     2.948       2.475        2.896     10.093
(AMD) Administrator Membership Degree  # projects      94  1.000   22.500     3.685       3.421        2.845     11.348
(ACC) Administrator Class Centrality   0-to-1          96  .000    1.000      .570        .277         -.287     -.945

* Because these figures were spidered from live projects, small timing differences may occur; some group size figures may therefore differ by at most 1 developer from the actual numbers.
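The natural-log transformation behind the transformed success variables can be sketched as follows. The raw minimum and maximum values are taken from Table 11. Dropping non-positive values is an assumption made here, as the natural logarithm is undefined at 0; this would be consistent with the lower N of 94 for the transformed software downloads and web hits, but the exact handling is not stated in the text.

```python
import math

def ln_transform(values):
    """Natural-log transform; non-positive values are dropped (assumption)."""
    return [math.log(v) for v in values if v > 0]

# Raw minimum and maximum values from Table 11.
page_views = [17, 1_396_906]
software_downloads = [0, 542_740]
sourceforge_rank = [23, 94_076]

ln_page_views = ln_transform(page_views)
ln_downloads = ln_transform(software_downloads)
ln_rank = ln_transform(sourceforge_rank)

print([round(x, 4) for x in ln_page_views])
print([round(x, 4) for x in ln_downloads])
print([round(x, 4) for x in ln_rank])
```

The transform compresses the long right tail: a raw range spanning several orders of magnitude becomes a range of roughly ten log units, which is why the skewness and kurtosis of the transformed variables are so much smaller.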

Table 12 - Correlation matrix of research variables

         GS       CS       CV       GD       CD       PTD      CMD      AMD      ACC      Ln_SD    Ln_PV    Ln_Hits  Ln_R.
GS
CS       .194*
         (.029)
CV       .834**   .145
         (.000)   (.079)
GD       -.290**  -.084    -.202*
         (.002)   (.209)   (.024)
CD       .069     -.278**  .128     .197*
         (.273)   (.007)   (.131)   (.041)
PTD      .268**   -.372    -.165    .333*    .484**
         (.004)   (.000)   (.055)   (.000)   (.000)
CMD      -.036    -.170*   .011     -.092    .240*    .089
         (.364)   (.049)   (.457)   (.186)   (.017)   (.194)
AMD      .074     .001     .125     -.110    .180     .010     .584**
         (.237)   (.496)   (.113)   (.144)   (.056)   (.462)   (.000)
ACC      -.267**  -.027    -.181*   .235*    .225*    .535**   -.107    -.047
         (.004)   (.395)   (.038)   (.011)   (.023)   (.000)   (.149)   (.325)
Ln_SD    .437**   .274**   .348**   -.447**  -.060    -.201*   .095     .111     -.192*
         (.000)   (.004)   (.000)   (.000)   (.300)   (.027)   (.181)   (.143)   (.032)
Ln_PV    .448**   .348**   .331**   -.498**  -.085    -.229*   .049     .121     -.165    .930**
         (.000)   (.000)   (.000)   (.000)   (.227)   (.013)   (.319)   (.120)   (.054)   (.000)
Ln_Hits  .331**   .235*    .199*    -.162    -.025    -.183*   .005     .063     -.235*   .461**   .499**
         (.001)   (.011)   (.027)   (.058)   (.414)   (.038)   (.480)   (.274)   (.011)   (.000)   (.000)
Ln_Rank  -.306**  -.396**  -.228*   .300**   .066     .103     -.136    -.162    -.096    -.750**  -.765**  -.452**
         (.001)   (.000)   (.013)   (.001)   (.282)   (.161)   (.093)   (.060)   (.177)   (.000)   (.000)   (.000)

* Correlation is significant at the 0.05 level (1-tailed)
** Correlation is significant at the 0.01 level (1-tailed)

5.4. Hypotheses testing

Here, the research hypotheses are presented first, after which the regression methods are discussed.

5.4.1. Research hypotheses

Here, the set of research hypotheses is presented. The six social network constructs and the four remaining dependent variables result in a set of 24 hypotheses.

Hypotheses 1 - Group Closure
1A The Group Density of an OSSP community has an inverted-U relationship with Software Downloads
1B The Group Density of an OSSP community has an inverted-U relationship with Page Views
1C The Group Density of an OSSP community has an inverted-U relationship with Web Hits
1D The Group Density of an OSSP community has a U-shaped relationship with Project Rank

Hypotheses 2 - Core Closure
2A The Core Density of an OSSP community has an inverted-U relationship with Software Downloads
2B The Core Density of an OSSP community has an inverted-U relationship with Page Views
2C The Core Density of an OSSP community has an inverted-U relationship with Web Hits
2D The Core Density of an OSSP community has a U-shaped relationship with Project Rank

Hypotheses 3 - Peripheral Two-Mode Closure
3A The Peripheral Two-Mode Density of an OSSP community has an inverted-U relationship with Software Downloads
3B The Peripheral Two-Mode Density of an OSSP community has an inverted-U relationship with Page Views
3C The Peripheral Two-Mode Density of an OSSP community has an inverted-U relationship with Web Hits
3D The Peripheral Two-Mode Density of an OSSP community has a U-shaped relationship with Project Rank

Hypotheses 4 - Core Bridging
4A The Core Membership Degree of an OSSP community is positively related with Software Downloads
4B The Core Membership Degree of an OSSP community is positively related with Page Views
4C The Core Membership Degree of an OSSP community is positively related with Web Hits
4D The Core Membership Degree of an OSSP community is negatively related with Project Rank

Hypotheses 5 - Administrator Bridging
5A The Administrator Membership Degree of an OSSP community is positively related with Software Downloads
5B The Administrator Membership Degree of an OSSP community is positively related with Page Views
5C The Administrator Membership Degree of an OSSP community is positively related with Web Hits
5D The Administrator Membership Degree of an OSSP community is negatively related with Project Rank

Hypotheses 6 - Administrator Centrality
6A The Administrator Class Centrality of an OSSP community has an inverted-U relationship with Software Downloads
6B The Administrator Class Centrality of an OSSP community has an inverted-U relationship with Page Views
6C The Administrator Class Centrality of an OSSP community has an inverted-U relationship with Web Hits
6D The Administrator Class Centrality of an OSSP community has a U-shaped relationship with Project Rank

5.4.2. Regression methods

A setup for testing similar to that of Hinds (2008) is used. The principal testing method is a multiple linear regression with ordinary least squares, carried out as a single three-step hierarchical regression test. First, for each hypothesis the dependent variable is regressed on the independent variable (Model 1). Secondly, the control variables are included and the regression test is repeated; this tests the possibility of a linear relationship (Model 2). The third test adds a transformation of the independent variable: the independent variable is mean-centered and squared (Model 3). This way the possibility of a quadratic (inverted-U or U-shaped) relationship can be tested.

The purpose of the control variables (group size, core size, conversation volume and project age) is to separate their effects from the effect of the independent variable. It is plausible that the control variables affect community success, and therefore this effect must be isolated.

Evidence of an inverted-U relationship is found when the unstandardized coefficient of the untransformed independent variable in Model 2 is positive, the unstandardized coefficient of the transformed independent variable in Model 3 is negative, and both are significant. In addition, the level of explained variance should increase significantly from Model 2 to Model 3.

Evidence of a U-shaped relationship is found when the unstandardized coefficient of the untransformed independent variable in Model 2 is negative, the unstandardized coefficient of the transformed independent variable in Model 3 is positive, and both are significant. In addition, the level of explained variance should increase significantly from Model 2 to Model 3.
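The three-step hierarchical regression can be sketched as follows. This is a minimal illustration and not the analysis performed in this research: the data are synthetic and deliberately constructed with an inverted-U shape, only one hypothetical control variable is used, significance tests are omitted, and ordinary least squares is solved through the normal equations in plain Python. The model numbering follows the description in this subchapter.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """Regress y on the predictor columns plus an intercept; return (coefficients, R-squared)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic, hypothetical data: success peaks at a moderate density (inverted-U).
n = 40
density = [i / (n - 1) for i in range(n)]
control = [((7 * i) % 11) / 10 for i in range(n)]
noise = [0.1 * (((3 * i) % 5) - 2) / 2 for i in range(n)]
success = [2.0 + 0.5 * c - 4.0 * (d - 0.5) ** 2 + e
           for c, d, e in zip(control, density, noise)]

# Mean-centered and squared independent variable for Model 3.
mean_density = sum(density) / n
density_sq = [(d - mean_density) ** 2 for d in density]

beta1, r2_model1 = ols([density], success)                       # Model 1
beta2, r2_model2 = ols([density, control], success)              # Model 2
beta3, r2_model3 = ols([density, control, density_sq], success)  # Model 3
delta_r2 = r2_model3 - r2_model2

print(f"squared-term coefficient: {beta3[3]:.3f}")
print(f"R2 model 2: {r2_model2:.3f}, R2 model 3: {r2_model3:.3f}, change: {delta_r2:.3f}")
```

In this synthetic case the coefficient of the squared term is negative and the explained variance rises from Model 2 to Model 3, which is the pattern that the criteria above associate with an inverted-U relationship.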


5.5. Testing results

Here, for each construct a summary of the results of the linear and quadratic regression tests is provided. This way, each hypothesis is tested. All the detailed regression analyses can be found in Appendix Q. Each regression summary table includes the unstandardized coefficients and standard errors, the standardized beta and its significance (p-value), and the adjusted R-squared (R²) as well as the change of R-squared (ΔR²). For the linear regressions the change of R-squared is the difference of R-squared between model 1 and model 2. For the quadratic regressions the change of R-squared is the difference of R-squared between model 2 and model 3.

Overall, the predictive values of all models were relatively solid. Between the models changes of values were minimal. For the linear regression the number of software downloads had an adjusted R-squared range from .201 to .324, the number of page views a range from .250 to .412, and the number of web hits a range from .150 to .190. The quadratic regressions were similar and the adjusted R-squared for the number of software downloads ranges from .201 to .364, the number of page views ranges from .244 to .452, and the number of web hits ranges from .144 to .182.

Project rank is discussed separately, after the summary regressions of the six constructs.

5.5.1. Group Density

For the linear regressions on group density, significant negative relationships were found for the number of software downloads (p = .000) and the number of page views (p = .000). A near-significant negative relationship was found for the number of web hits (p = .062). For the quadratic regressions on group density, near-significant relationships were found for the number of software downloads (p = .012) and the number of page views (p = .007). A summary of the regression on group density is provided in Table 13. The detailed regression analyses can be found in Appendix Q.

Table 13 - Summary of regression on group density

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H1a: Software downloads  -11.287         (2.764)   -.370         .000     .324     .121
H1b: Page Views          -11.200         (2.226)   -.421         .000     .412     .157
H1c: Web Hits            -7.053          (3.727)   -.187         .062     .190     .031

Quadratic regressions
H1a: Software downloads  .291            (.114)    .428          .012     .364     .045
H1b: Page Views          .250            (.091)    .420          .007     .452     .043
H1c: Web Hits            .067            (.149)    .081          .655     .182     .002

Controlling for group size, core size, conversation volume and age

5.5.2. Core Density

No significant or near-significant relationships were found for the linear and the quadratic regressions on core density. A summary of the regression on core density is provided in Table 14. The detailed regression analyses can be found in Appendix Q.

Table 14 - Summary of regression on core density

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H2a: Software downloads  -.404           (.803)    -.056         .617     .203     .016
H2b: Page Views          -.341           (.674)    -.054         .615     .250     .012
H2c: Web Hits            -.499           (.905)    -.064         .583     .150     .004

Quadratic regressions
H2a: Software downloads  .306            (.354)    .092          .390     .201     .008
H2b: Page Views          .184            (.300)    .063          .542     .244     .004
H2c: Web Hits            -.283           (.410)    -.077         .492     .144     .005

Controlling for group size, core size, conversation volume and age

5.5.3. Peripheral Two-Mode Density

No significant or near-significant relationships were found for the linear and the quadratic regressions on peripheral two-mode density. A summary of the regression on peripheral two-mode density is provided in Table 15. The detailed regression analyses can be found in Appendix Q.

Table 15 - Summary of regression on peripheral two-mode density

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H3a: Software downloads  -.594           (1.208)   -.053         .624     .201     .006
H3b: Page Views          -.455           (1.007)   -.046         .652     .255     .008
H3c: Web Hits            .053            (1.435)   .004          .971     .168     .011

Quadratic regressions
H3a: Software downloads  .182            (.189)    .155          .338     .201     .008
H3b: Page Views          .155            (.156)    .151          .652     .255     .008
H3c: Web Hits            .325            (.220)    .241          .144     .179     .020

Controlling for group size, core size, conversation volume and age


5.5.4. Core Membership Degree

No significant or near-significant relationships were found for the linear and the quadratic regressions on core membership degree. A summary of the regression on core membership degree is provided in Table 16. The detailed regression analyses can be found in Appendix Q.

Table 16 - Summary of regression on core membership degree

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H4a: Software downloads  .120            (.093)    .121          .199     .211     .014
H4b: Page Views          .116            (.077)    .134          .134     .265     .018
H4c: Web Hits            .021            (.111)    .018          .850     .157     .000

Quadratic regressions
H4a: Software downloads  -.035           (.129)    -.048         .787     .202     .001
H4b: Page Views          .019            (.106)    .030          .180     .257     .000
H4c: Web Hits            -.122           (.152)    -.145         .427     .154     .006

Controlling for group size, core size, conversation volume and age

5.5.5. Administrator Membership Degree

No significant relationships were found for the linear and the quadratic regressions on administrator membership degree. However, for the linear regression a very weak result was found for the number of page views (p = .099). A summary of the regression on administrator membership degree is provided in Table 17. The detailed regression analyses can be found in Appendix Q.

Table 17 - Summary of regression on administrator membership degree

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H5a: Software downloads  .091            (.067)    .127          .178     .213     .017
H5b: Page Views          .093            (.056)    .148          .099     .272     .025
H5c: Web Hits            .021            (.081)    .025          .796     .159     .003

Quadratic regressions
H5a: Software downloads  .058            (.109)    .085          .594     .206     .003
H5b: Page Views          .046            (.091)    .076          .616     .266     .002
H5c: Web Hits            -.093           (.131)    -.116         .482     .154     .005

Controlling for group size, core size, conversation volume and age


5.5.6. Administrator Class Centrality

No significant or near-significant relationships were found for the linear and the quadratic regressions on administrator class centrality. A summary of the regression on administrator class centrality is provided in Table 18. The detailed regression analyses can be found in Appendix Q.

Table 18 - Summary of regression on administrator class centrality

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H6a: Software downloads  -1.376          (.928)    -.156         .142     .215     .018
H6b: Page Views          -.902           (.781)    -.117         .251     .257     .010
H6c: Web Hits            1.309           (1.102)   .128          .238     .170     .013

Quadratic regressions
H6a: Software downloads  .080            (.116)    .068          .491     .211     .004
H6b: Page Views          .073            (.096)    .071          .451     .254     .005
H6c: Web Hits            -.064           (.136)    -.047         .238     .163     .002

Controlling for group size, core size, conversation volume and age

5.5.7. Project Rank

Due to the nature of project rank it is discussed separately. The correlation matrix in chapter 5.3 showed that project rank has significant negative relationships with the number of peripheral developers (-.203), the number of administrators (-.191), group size (-.205) and the number of page views (-.208). All correlations are significant at the .01 level.

Here, project rank is regressed on each of the social network constructs. Table 19 provides the results of the linear and quadratic regressions. First, project rank is natural-log transformed and regressed on the control variables (Model 1). Then, project rank is regressed on the control variables and the social network construct (Model 2). To check for the possibility of a U-shaped relationship, project rank is finally regressed on the control variables, the social network construct, and the mean-centered and squared social network construct (Model 3). The control variables are group size, core size, conversation volume and project age.


Table 19 - Summary of regressions of project rank on six social network constructs

                         Unstandardized  Standard  Standardized
                         coefficient     error     beta          p-value  Adj. R²  ΔR²
Linear regressions
H1d: GD                  5.117           (2.289)   .212          (.028)   .243     .040
H2d: CD                  -.222           (.600)    -.041         (.713)   .208     .015
H3d: PTD                 -.848           (.952)    -.094         (.376)   .210     .009
H4d: CMD                 -.172           (.071)    -.219         (.017)   .251     .047
H5d: AMD                 -.120           (.051)    -.212         (.021)   .251     .048
H6d: ACC                 -.962           (.727)    -.137         (.189)   .216     .015

Quadratic regressions
H1d: GD                  -.062           (.097)    -.115         (.524)   .238     .003
H2d: CD                  .055            (.268)    .022          (.838)   .197     .001
H3d: PTD                 -.237           (.146)    -.251         (.108)   .224     .022
H4d: CMD                 .118            (.097)    .205          (.226)   .255     .012
H5d: AMD                 .026            (.083)    .047          (.310)   .243     .001
H6d: ACC                 -.139           (.089)    -.149         (.120)   .229     .020

Controlling for group size, core size, conversation volume and age

The detailed regression analyses can be found in T. Table 19 shows only three near-significant relationships of project rank. Project rank has a positive, near-significant linear relationship with group density (p = .028), and negative, near-significant linear relationships with core membership degree (p = .017) and administrator membership degree (p = .021). None of the quadratic regressions of project rank are significant or near-significant.


6. Discussion

This chapter includes a summary of findings. First, for each main construct the findings are summarized: closure, bridging and leader centrality are discussed in succession, and project rank is discussed separately. Next, several suggestions are made regarding the research findings. Finally, the insignificance of structure in Open Source software projects is discussed.

6.1. Summary of findings

Here, for each main construct the findings are summarized. The closure construct deals with group density, core density and peripheral two-mode density. The bridging construct deals with core membership degree and administrator membership degree, and the centrality construct deals with administrator class centrality. Then, project rank is discussed separately. For each construct the results of the hypotheses are summarized in a table.

6.1.1. Closure

First, a table with a summary of test results for the closure hypotheses is presented. Table 20 shows, for each type of closure, namely group density (GD), core density (CD) and peripheral two-mode density (PTD), the hypothesized relations with each dependent variable. Based on the testing results, a suggested alternative relation is given where applicable.

Table 20 - Summary of test results for closure hypotheses

# Hyp.  Independent variable  Dependent variable  Success dimension  Hypothesized relation  Suggested alternative relation  Detailed results
H1a     GD                    Soft. Downloads     Activity           Inverted-U             Negative (p = .000)             Table Q.1
H1b     GD                    Page Views          Activity           Inverted-U             Negative (p = .000)             Table Q.2
H1c     GD                    Web Hits            Output             Inverted-U             Negative (p = .062)             Table Q.3

H2a     CD                    Soft. Downloads     Activity           Inverted-U             None                            Table Q.4
H2b     CD                    Page Views          Activity           Inverted-U             None                            Table Q.5
H2c     CD                    Web Hits            Output             Inverted-U             None                            Table Q.6

H3a     PTD                   Soft. Downloads     Activity           Inverted-U             None                            Table Q.7
H3b     PTD                   Page Views          Activity           Inverted-U             None                            Table Q.8
H3c     PTD                   Web Hits            Output             Inverted-U             None                            Table Q.9

For each hypothesis of the types of closure it was expected to find an inverted-U relationship. This means a low, and a high, level of density is coupled with a low success rate. And, a moderate level of density is coupled with a relatively high success rate. Here, it means an average level of density is coupled with respectively a high number of software downloads, page views and web hits.

For the quadratic regressions of software downloads (p = .012) and page views (p = .007) on group density, near-significant relations were found. However, for the linear regressions of software downloads (p = .000) and page views (p = .000) on group density, significant negative relationships were found, and the linear regression of web hits (p = .062) is near-significant for a negative relationship. Because the p-values of the linear regressions are lower than those of the quadratic regressions, the linear regressions were preferred over the quadratic regressions.

Rather than inverted-U shaped relationships, no significant or near-significant relationships were found for the regressions of the dependent variables on core density, neither for the linear nor for the quadratic regressions.

Similarly, the linear and quadratic regressions on peripheral two-mode density resulted in no significant or near-significant relationships.

It is somewhat surprising that peripheral two-mode density and core density have no significant negative relationships, while the regressions of software downloads and page views on group density do. It would be plausible for at least peripheral two-mode density to have a significant relationship with success, because in general the peripheral developer subgroup is much larger than the two other subgroups.

The regressions on core density and peripheral two-mode density are very similar to each other, whereas the regressions on group density deviate from these results.

If the closure hypotheses for software downloads and page views are compared with the results of Hinds (2008), several points can be noted. First, similar results are obtained for the linear (and quadratic) regressions of software downloads and page views on group density, which in Hinds's study also show strongly significant (p < .001) negative relationships. Next, Hinds finds a significant (p < .05) U-shaped relationship for the regression of page views on core density, and a near-significant (p = .067) negative relationship for the regression of software downloads on core density. Hinds also finds a near-significant (p = .092) linear negative effect for the regression of software downloads on peripheral two-mode density. Although this differs from the results above, as in Hinds's study group density has the highest correlations compared with core density and peripheral two-mode density. Moreover, out of 12 closure hypotheses Hinds found 8 relations with a negative slope. Here, 8 out of 9 relations had a negative slope, so the same conclusion can be drawn: closure has no beneficial effect on the success of an Open Source software project.

Apart from Hinds (2008), only one other study is known which took density in relation to Open Source software projects into account. Crowston and Howison (2006) researched Open Source software project structure on the basis of bug reports. Without taking success variables into account, they found a high negative correlation between project size and density, which is in line with the research findings here. However, the research of Crowston and Howison is rather limited and of an exploratory nature. Plausibly, it must be viewed as an unfinished working paper, as the authors recommend not to cite or quote this research.
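The negative relation between project size and density can be illustrated with a small calculation. The sketch below is not the measurement used in this study; it assumes the standard density of an undirected one-mode network (realised ties divided by possible ties) and the hypothetical simplification that every member maintains the same number of ties. The function names are chosen for illustration only.

```python
def density(n_nodes: int, n_ties: int) -> float:
    """Density of an undirected one-mode network: realised ties / possible ties."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1) / 2
    return n_ties / possible


def density_with_fixed_ties_per_member(n_nodes: int, ties_per_member: int) -> float:
    """Density when every member keeps the same number of ties (an assumption)."""
    # A member can have at most n_nodes - 1 ties.
    degree = min(ties_per_member, n_nodes - 1)
    # Each tie is shared by two members, hence the division by two.
    n_ties = n_nodes * degree // 2
    return density(n_nodes, n_ties)


if __name__ == "__main__":
    # Hypothetical project sizes; four ties per member is an arbitrary choice.
    for size in (5, 10, 50, 200):
        print(size, round(density_with_fixed_ties_per_member(size, 4), 3))
```

Under these assumptions the number of possible ties grows quadratically with group size while the number of realised ties grows only linearly, so density falls as a project grows even if individual members do not communicate less.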

6.1.2. Bridging

Table 21 provides a summary of the test results for the bridging hypotheses. Although positive relations were hypothesized for each of the dependent variables, no significant effects were found for the regressions on core membership degree and administrator membership degree. Only one near-significant positive relation (p = .099) was found for the linear regression of page views on administrator membership degree.

Table 21 - Summary of test results for bridging hypotheses

| # Hyp. | Independent variable | Dependent variable | Success dimension | Hypothesized relation | Suggested alternative relation | Detailed results |
|---|---|---|---|---|---|---|
| H4a | CMD | Soft. Downloads | Activity | Positive | None | Table Q.10 |
| H4b | CMD | Page Views | Activity | Positive | None | Table Q.11 |
| H4c | CMD | Web Hits | Output | Positive | None | Table Q.12 |
| H5a | AMD | Soft. Downloads | Activity | Positive | None | Table Q.13 |
| H5b | AMD | Page Views | Activity | Positive | Positive (p = .099) | Table Q.14 |
| H5c | AMD | Web Hits | Output | Positive | None | Table Q.15 |

It was expected that the higher the level of the bridging constructs, the more positive the effect on the success of an Open Source software project would be. The more ties administrators and core members have with other groups, taking the cost-of-ties effect into account, the more positive the expected effect on the success of the software project, as by having these ties a group has access to a wider variety and larger collection of information and knowledge. However, only one near-significant positive effect (p = .099) was found, for the linear regression of the number of page views on administrator membership degree.

Overall, the linear regressions on core membership degree and administrator membership degree were very similar, though the significance levels for administrator membership degree were slightly better. The regressions of the success-as-activity variables, software downloads and page views, came far closer to significance than those of the success-as-output variable, web hits. The quadratic regressions on core membership degree and administrator membership degree were not significant at all, though the regression of page views on core membership degree came far closer to significance than the other quadratic regressions. Even so, the linear regression of page views (p = .134) was dominant over the quadratic regression (p = .180).

Comparing the results of the regressions of page views and software downloads on administrator membership degree and core membership degree to the results of Hinds (2008), Hinds finds only a near-significant (p = .099) inverted-U relationship of software downloads with core membership degree, whereas here only a near-significant (p = .099) positive relationship was found for the regression of page views on administrator membership degree.

Again, only one other study is known which investigated bridging in relation to the success of Open Source software projects. Grewal et al. (2006) investigated three bridging-related constructs, which they refer to as network embeddedness. Structural embeddedness captures the extent to which an entity is entrenched in a network of relationships. Junctional embeddedness assesses the extent to which an entity connects to other entities. Positional embeddedness appraises the extent to which an entity is connected with other structurally embedded networks. Grewal et al. (2006) use the number of code commits and the number of downloads as measures of success. Although the methodological research setup differs, with respect to positional embeddedness Grewal et al. (2006) found a significant positive effect (p < .001) of core bridging (project embeddedness) on the number of code commits, and a near-significant negative relationship (p < .05) of code commits with administrator bridging (project administrator embeddedness). The effects on the number of downloads were much smaller, and not significant. Code commits were not measured here. For software downloads the results are consistent with those found here. Hinds (2008) found similar results for the effect of bridging on code commits. Remarkably, it seems safe to conclude that bridging does not significantly affect the success of an Open Source software project.

6.1.3. Leader Centrality

Table 22 provides a summary of the test results for the leader centrality hypotheses. No significant or near-significant effect was found for any of the linear and quadratic regressions on administrator class centrality (ACC). The linear and quadratic regressions of the success-as-activity variables, software downloads and page views, were very similar to each other, though the linear regressions came closer to significance than the quadratic regressions. Overall, they were not even near-significant. The linear regression of software downloads on administrator class centrality has the best significance level (p = .142). The linear and the quadratic regression of web hits on administrator class centrality have identical significance levels (p = .238).

Table 22 - Summary of test results for leader centrality hypotheses

| # Hyp. | Independent variable | Dependent variable | Success dimension | Hypothesized relation | Suggested alternative relation | Detailed results |
|---|---|---|---|---|---|---|
| H6a | ACC | Soft. Downloads | Activity | Inverted-U | None | Table Q.16 |
| H6b | ACC | Page Views | Activity | Inverted-U | None | Table Q.17 |
| H6c | ACC | Web Hits | Output | Inverted-U | None | Table Q.18 |


Although success was expected to have an inverted-U relationship with administrator class centrality, no such relationship was found.

Hinds (2008) was the first to explore the effect of the structural role of leader centrality on the success of Open Source software projects. Similar to this research, he did not find any evidence of an inverted-U relationship for the leader centrality hypotheses. However, Hinds (2008) found evidence of a near-significant positive effect (p < .10) of administrator class centrality on software releases, which is not researched here, and a weakly significant U-shaped relationship of administrator class centrality with page views (p < .05).

6.1.4. Project Rank

Table 23 provides a summary of the test results for the project rank hypotheses. For the density constructs no significant effects were found for core density and peripheral two-mode density. However, rather than a U-shaped relationship for group density, evidence was found of a weakly significant positive relationship (p = .028). For the bridging constructs negative effects were found, as hypothesized, though they are of very weak significance (p = .017 and p = .021). For the centrality construct no significant effect was found.

Table 23 - Summary of test results for project rank hypotheses

| # Hyp. | Independent variable | Dependent variable | Success dimension | Hypothesized relation | Suggested alternative relation | Detailed results |
|---|---|---|---|---|---|---|
| H1d | GD | Rank | - | U-shape | Positive (p = .028) | Table Q.19 |
| H2d | CD | Rank | - | U-shape | None | Table Q.20 |
| H3d | PTD | Rank | - | U-shape | None | Table Q.21 |
| H4d | CMD | Rank | - | Negative | Negative (p = .017) | Table Q.22 |
| H5d | AMD | Rank | - | Negative | Negative (p = .021) | Table Q.23 |
| H6d | ACC | Rank | - | U-shape | None | Table Q.24 |

No other study was found in which the project rank was used as a measure of success in relation to the structure of Open Source software projects.

6.1.5. Abstract of findings

Here a brief abstract of the research findings based on the hypotheses is presented.

Of the 18 hypotheses in which software downloads, page views and web hits were regressed on group density, core density, peripheral two-mode density, core membership degree, administrator membership degree and administrator class centrality, 2 hypotheses were very significant (p = .000), 2 hypotheses were near-significant (p < .10) and the other 14 hypotheses were not significant at all. Only one of the hypothesized relationships was actually found, though this linear positive effect of administrator membership degree on page views was only near-significant (p = .099).

Of the 6 hypotheses of project rank, 3 were weakly significant (p < .05): a positive relationship with group density, and negative relationships with core membership degree and with administrator membership degree.
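The verbal significance labels used in this chapter can be made explicit with a small helper. The sketch below is only an illustration: the cut-offs (p < .001, p < .05, p < .10) are read from the wording of this chapter and are an assumption about how the labels are applied, and the function name is hypothetical. The p-values in the example are the ones quoted in the sections above.

```python
def classify(p: float) -> str:
    """Map a p-value to the significance labels used in this chapter (assumed cut-offs)."""
    if p < 0.001:
        return "strongly significant"
    if p < 0.05:
        return "weakly significant"
    if p < 0.10:
        return "near-significant"
    return "not significant"


# p-values as quoted in the text of sections 6.1.2 to 6.1.4.
reported = {
    "H5b AMD -> page views (linear)": 0.099,
    "H1d GD -> rank": 0.028,
    "H4d CMD -> rank": 0.017,
    "H5d AMD -> rank": 0.021,
    "H6a ACC -> software downloads (linear)": 0.142,
    "H6c ACC -> web hits": 0.238,
}

if __name__ == "__main__":
    for hypothesis, p in reported.items():
        print(f"{hypothesis}: p = {p:.3f} -> {classify(p)}")
```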

Findings:

(1) In general, a negative relationship was found between the closure constructs and the success-as-activity variables. However, these relationships were only significant for group density.

(2) In general, no significant bridging effects on the chance of success of an Open Source project were found. Only a near-significant positive effect of administrator membership degree on page views was found.

(3) In general, no significant centrality effects on the chance of success of an Open Source project were found.

(4) Weak evidence was found of a negative bridging effect on the rank of an Open Source project, as well as of a weak positive effect of group density on the rank of a project.

In general, no strongly significant effects were found for any of the linear and quadratic regressions, except for the regressions of the success-as-activity variables on group density. None of the hypothesized relationships were supported by the research findings.

6.2. Suggestions

In this subchapter various explanations are proposed for the research findings. These explanations should be regarded as suggestions, as they are not tested in this study. Further research is needed to verify these suggestions.

Finding (1) In general, a negative relationship was found between the closure constructs and the success-as-activity variables. However, these relationships were only significant for group density.

A plausible explanation for the significant negative relations of software downloads and page views, and for the near-significant negative relation of web hits, is the 'clicking behavior' of developers as an effect of information disintegration.

When a project group has a low density, there is relatively little contact between the developers. As a result, developers are relatively isolated and need to figure out all kinds of things themselves, rather than asking other developers. Due to this information disintegration, the search for information and the lack of communication may result in more clicking behavior, as each developer needs to search for information individually. This clicking behavior, which serves to overcome the problem of information disintegration, thus affects the number of generated web hits as well as the number of page views and the number of software downloads.

However, this does not necessarily imply anything for the chance of success of a project group. Although clicking behavior may appear to affect output or activity positively, it does not. Instead, tasks may be performed by several developers independently of each other, rather than one developer performing the task and then sharing the results with the other developers.

A better explanation is that, rather than closure affecting the chance of project success, a 'third factor' directly affects both success and closure, whereby closure may be a mediating factor in this relationship. This is visualized in Figure 7.

Figure 7 - [Closure] Research findings (left) and new conceptual model (right)
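How a third factor can produce a negative association between closure and success without any direct effect of closure can be illustrated with a small simulation. The sketch below is purely hypothetical: the data are simulated, the effect sizes (0.7 and -0.7) are arbitrary, and no direct path from closure to success is included. It is not an analysis of the research sample.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_projects=2000, seed=42):
    """Correlation between closure and success when only a third factor links them."""
    rng = random.Random(seed)
    closure, success = [], []
    for _ in range(n_projects):
        # Hypothetical third factor, e.g. the quality of the software design.
        third = rng.gauss(0, 1)
        # The third factor lowers closure and raises success (assumed weights);
        # closure itself has no direct effect on success in this model.
        closure.append(-0.7 * third + rng.gauss(0, 1))
        success.append(0.7 * third + rng.gauss(0, 1))
    return pearson(closure, success)


if __name__ == "__main__":
    print(round(simulate(), 3))
```

In this simulated setting closure and success are negatively correlated although closure has no effect of its own, which is the pattern the new conceptual model proposes as a possibility.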

Here, three suggestions are proposed. Three factors can be distinguished when organizing a software project, each of which affects both the closure of a project and the success of the project. First, a software design is essential to develop software. Next, software documentation helps to guide the project. Third, a 'netiquette', a shared set of common norms and values, is used to communicate and exchange information with other project developers.

Software design

Good software design is essential for developing successful Open Source software. Hinds (2008) refers to MacCormack et al. (2006), who recognize that the modularity of software architecture is an important success factor for Open Source software projects.

Here, rather than referring to software architecture, the term software design is used. Whereas software architecture suits a developer's perspective, software design also implies a customer's perspective, although these groups may overlap. Software design includes ideas about what customers need (note: not what customers want). A measure of good design is how well it works for the customer (Graham, 2003).


The focus of a customer is more on the front-end of software and includes user friendliness, the user interface, etcetera, just as the Philips slogan 'Sense and simplicity' expresses (Philips, 2009). A developer focuses more on the back-end of software, which includes software modularity, scalability, clean and readable code, etcetera.

Thus, the performance of software depends on the perspective taken. From a customer's perspective, factors such as real-time behavior and ease of use are important performance indicators. A developer's perspective includes factors such as the use of memory space, processor time or network bandwidth.

Good software design enables developers to work more autonomously, which negatively affects closure. But it also contributes positively to the chance of success of a project, as a good software design is necessary for smoothly operating software.

If a closer look is taken at SourceForge, a trend toward lightweight, modular and platform-independent software is visible. Each characteristic may increase the chance of success of a software project.

The largest share of software programs developed by project groups is written to be Operating System independent (19.2%) (FLOSSmole, file [8]), which means the actual Operating System does not play a role when the software is running. Together with OS portable programs (i.e. programs that work on many Operating Systems), already 25% of all SourceForge projects are Operating System independent.

Next, a clear trend toward lightweight software is seen. A lot of programs are still written in 'heavy' languages such as C++ (16.4%) or C (13.7%) (FLOSSmole, file [7]), which can be explained by the fact that software, plug-ins or add-ons for software can be language independent as they are based on other (infrastructural) software. Nevertheless, a trend toward lightweight (and web) languages such as Java (20.0%), PHP (13.4%), or even Python (5.5%) is visible. Similarly, most projects which include a database environment choose a lightweight database, such as MySQL (32.0%), rather than heavyweight databases such as Oracle (2.4%) or Microsoft SQL Server (3.8%) (FLOSSmole, file [16]).

Comparably, for project user interfaces, web-based interfaces (26.8%) are now more popular than Win32 (16.6%) (FLOSSmole, file [1]), the Windows interface for which most software is (at least) written, as most people worldwide use Windows as their Operating System.

Software documentation

Software documentation includes a wide variety of different types of documentation, all useful to assist in managing and organizing a successful software project. For example, spec(ifications) writing includes the setting out of software specifications, specific features and requirements. When actually developing software, this type of documentation is made to keep the goals of the software clear. Architectural documentation includes the construction principles of a piece of software and its relation to the software environment. Modular design, the inclusion of libraries or plug-ins, and the relation to other software (infrastructure) are documented so that the software smoothly 'fits in'.

Technical documentation includes documentation of code, interfaces, and algorithms, etcetera. Documentation and remarks can also be placed in the programming code to make it readable for other developers, rather than distributing it in isolated documents. End-user documentation includes manuals, help-files, FAQs, how-to's, etcetera to enable end-users to actually use the software. Other documentation may include Open Source licenses or distribution terms, marketing plans, contact forms or anything else.

Poor documentation may increase the level of closure of a project, as more contact with the project's developers is needed to successfully create properly functioning software. In addition, more contact is needed with end-users, as they cannot find answers when they have problems with non-functioning software. As a result, developers need to spend more time solving these kinds of problems, which negatively affects the actual software writing, and thus negatively affects the quality of the software as well.

Thus, good software documentation prevents a lot of problems with developers and end-users, whereas poor-quality documentation may have the opposite effect. Moreover, this negative effect may be reinforced, as writing is part of the skill set software developers must have: poor-quality software documentation may indicate poor code-writing skills of software developers as well.

Netiquette

Hinds (2008) suggests that project rules play an important role in guiding the behavior of independent contributors to Open Source software projects, as these projects are less reliant on hierarchy and supervision than software development teams. Hinds adds that these project rules may be formally stated in a document, or may be informally stated in forum postings. The Open Source license is also part of these project rules.

Hinds' (2008) suggestion sounds plausible; it is therefore explored further in the research sample. First, a sample of 10 projects is drawn from the research sample. This is done by using the randInt command on a TI-83 graphing calculator. The 'randInt(1,96,10)' command results in a set of 10 uniformly distributed pseudorandom integers between 1 and 96, which correspond to the sample projects. The projects' websites, the forums and the software documentation were visually inspected for any evidence of formal, written project rules. No evidence was found. Next, the idea arose that only large projects would have written project rules, as these projects need to manage more people and organize more tasks. Therefore the 10 largest sample projects were visually inspected in the same manner. Here again, no evidence of formal project rules was found.
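The random selection of 10 out of 96 sample projects can be reproduced in software. The sketch below is an illustration in Python and not the procedure actually used, which relied on the calculator; the function names and the seed parameter are hypothetical. Two variants are shown because independent uniform draws can repeat a project number, whereas sampling without replacement always yields 10 distinct projects.

```python
import random


def draw_with_replacement(low, high, count, seed=None):
    """Independent uniform draws between low and high (inclusive); repeats are possible."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]


def draw_without_replacement(low, high, count, seed=None):
    """Uniform sample of distinct integers between low and high (inclusive)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(low, high + 1), count))


if __name__ == "__main__":
    # 96 sample projects, 10 to be inspected, as described above.
    print(draw_with_replacement(1, 96, 10))
    print(draw_without_replacement(1, 96, 10))
```

Passing a fixed seed makes the selection repeatable, which may help when the inspected projects need to be reported or re-inspected later.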

However, indications of the use of unwritten project rules and etiquette were found. On the Internet this is often referred to as netiquette, a contraction of the words network and etiquette. Etiquette guidelines exist for many actions, such as how to take over an (inactive) project, or how to request or propose new features. On the public project forums, forum administrators not only deal with spam messages; they also need to deal with messages which include rude language, and to close double postings or cross-postings (two identical postings on two different sub-forums to increase the response rate on a message).

Most Open Source licenses include netiquette on how to refer to code that is used and on how to use a project as a donor, if allowed (i.e. starting a project on the basis of an existing one); unashamedly copying code directly from proprietary software is prohibited. Netiquette is in line with other virtual examples, as it exists on discussion websites, virtual trading places, usenet and massively multiplayer online games, but is also in place for email usage or website coding.

Proper use of netiquette enables autonomy of software developers, and thus negatively affects closure. On the other hand, it enables smooth management and organization of software development, which increases the chance of success of an Open Source software project.

Finding (2) In general, no significant bridging effects on the chance of success of an Open Source project were found. Only a near-significant positive effect of administrator membership degree on page views was found.

It was hypothesized that bridging would positively affect the chance of success of an Open Source software project. Evidence of only one very weak positive relationship was found. However, based on previous literature it is still plausible to suggest that bridging does not directly affect the chance of success of an Open Source software project, but that a third factor is responsible for this effect, and that bridging may be a mediating factor in this relationship, as visualized in Figure 8.

Figure 8 - [Bridging] Research finding (left) and new conceptual model (right)


Here, bridging represents the establishment of relationships with other groups. These relationships can be two-way, as a project can establish outgoing relationships itself, or external individuals can establish incoming relationships. Therefore two perspectives can be taken into account, and thus two suggestions are proposed for this third factor. Based on business administration literature, marketing activities include the (outgoing) relationships from a project to others, whereas stakeholder management includes the (incoming) relationships of external persons, groups and organizations to a project.

Marketing activities

Marketing activities include all activities to get a 'grip' on the market, including advertising and distributing project information to other websites (outside the scope of Open Source projects, such as topic-related sites, news sites, blogs or distribution platforms), or offline activities to create awareness. Marketing activities are all plausible factors to increase the chance of success of a project. Bridging may be a mediating factor for these marketing activities.

Stakeholder management

Stakeholder management includes all activities to engage the right (external) people in the right way. Gaining support from external individuals may provide access to information, knowledge or monetary resources. Furthermore, building a reputation and gaining recognition are plausible factors to increase the chance of success of a project. Bridging may be a mediating factor for stakeholder management.

It is also plausible bridging activities of peripheral developers are more important than bridging activities of administrators and core developers. In general, the peripheral developer subgroup is much larger than the core group. Although the bridging effect of each peripheral developer may be less than the effect of a core developer, the combined activities of all peripheral developers may be of importance. Further research is needed to verify this suggestion. Unfortunately, here it was too difficult and too time-consuming to identify all external ties with other Open Source software projects of peripheral developers.

Finding (3) In general, no significant centrality effects on the chance of success of an Open Source project were found.

No evidence of a significant effect of administrator class centrality, and thus of leader centrality, was found. However, based on previous literature it is plausible that leader centrality does not directly affect the chance of success of an Open Source project, but that a third factor is responsible, and that centrality may play a mediating role, as can be seen in Figure 9.


Figure 9 - [Centrality] Research findings (left) and new conceptual model (right)

A plausible third factor is the reduced need for knowledge transfer in Open Source software projects. Below, various reasons for the reduced need for knowledge transfer within Open Source software projects are provided. This not only may affect the centrality of a group, but may affect the closure of a group as well. Both these social network constructs may be a mediating factor for the reduced need for knowledge transfer on the chance on success of an Open Source software project.

Accepted standards and tools

Open Source software development is generally characterized by the use of accepted standards and tools, and by a lightweight modular software design. Here these two points are briefly set out.

Open Source software projects often use a wide variety of accepted standards and tools, such as development methodologies, programming languages and coding standards, and support systems such as bug trackers and forums. As most developers are familiar with these standards and tools, there is no need to 'reinvent the wheel', and developers can jump into a project 'on-the-fly', no matter the stage of the project. This reduces the need for knowledge transfer in this area of Open Source software development.

A lightweight modular software design decreases the necessary knowledge transfers with other developers, as parts of software can be developed without the knowledge of other parts of the software.

Developer as user

Open Source software developers are often users of the software as well. The software is therefore tested by the developers themselves, which is often not the case in proprietary, commercial software development, and this reduces the need for knowledge transfer from external users.

Skilled developers

In general, project contributors are highly skilled developers. Often they have a background in the Information Technology sector or in science. This implies a reduced need for knowledge transfer, as developers can solve problems independently, and already have an extensive knowledge set. An overview of project contributors is provided in Appendix B.

Open Source culture

The Open Source phenomenon is characterized by its very own culture. There is a netiquette between developers which includes a level of respect for other developers and their reputation. Furthermore, their behavior includes a kind of familiarity, and leaves room for error and criticism. The management system of an Open Source software project is sometimes described as a meritocracy (Apache, 2009). A meritocracy is a system where responsibilities, tasks and roles are given based on the abilities and capabilities of participants. 'The right people in the right place' and their autonomous behavior reduce the need for knowledge transfer, as knowledge is already in the right spot in the social network, and the level of independence of developers dampens the effect as well.

Finding (4) Weak evidence was found of a negative bridging effect on the rank of an Open Source project, as well as of a weak positive effect of group density on the rank of a project.

First, an explanation is provided for the weakly significant negative relationship between administrator bridging and project rank. This relationship means that the more outgoing ties the administrator or core group has, the better the rank of a project (the lower the rank, the better). It is plausible that various bridging activities, such as project advertising or recruiting on other project forums, lead to more traffic to a project. When the number of web hits, page views, downloads, etcetera goes up, the rank improves, as project rank is a complex variable composed of three components, namely the amount of generated traffic, the development activity, and the communication activity. The project rank formula is provided in Appendix J.

Secondly, an explanation is provided for the weakly significant positive relationship between group density and project rank. This relationship means that the more tightly the members of a project community are connected with each other, the worse the project rank. Although a tight community appears to be representative of a lot of communication between project developers, and may result in high traffic and rapid development, the factor time is not included. Project rank is measured as the mean project rank during the month of May 2008. If the already established relationships between the project members are not used in this month, there is not much activity and therefore the rank may be poor, despite the high group density.

Thus, it is plausible to conclude that project rank is not a proper success measure to investigate the effect of the social network structure of Open Source software projects on the chance of success. Due to the composition of project rank it is difficult, or impossible, to measure the (individual) effect of the structural constructs on project success. Researchers may still opt to use project rank as a success measure, but it is suggested not to use this measure for structure-related issues.

Summary

Based on the three research findings concerning social network constructs a general implication of the effect of social network structure on the chance of success of an Open Source software project community can be drawn, which runs as follows:

The social network structure of an Open Source software project community has no significant relationship with community success.

6.3. The possibility of a measuring problem

The overall research conclusion is that the social network structure of an Open Source software project community has no significant relationship with community success. However, it is plausible that the social network constructs are not measured in the right way, and that the relationships between the structure of the community and the chance of success of the Open Source software project were therefore not significant. Here, two explanations are provided. First, it is plausible that project communication is established without the use of the (measured) network; thus substitutes exist. Secondly, it is possible that the communication, or information and knowledge exchange, is embedded in the network. These two options suggest that the level of exchange in a network may be higher than the measured levels of closure, bridging and centrality.

Social network substitutes

It is plausible software developers of Open Source projects may have more relationships with other project developers than is measured in this research. It is known various communication protocols exist which were not taken into account in this research, such as communication by email, instant messengers, (internet) telephony, or even by sending letters or face-to-face communication. It is plausible the projects' public forums are not the main communication channels for developing Open Source software.

The SourceForge development platform can be conceived of as an alternative social network as well. SourceForge has special recruitment pages for finding developers for projects. It may be that the members of the Open Source software development platform community are already tightly connected with each other. People may know each other from previous or other projects.

It is also plausible that software developers know each other from other exchange platforms, and communicate via these platforms. Software developers may be specialized in certain areas of work or research, and therefore share a history with each other, such as using similar programming languages, or they may be specialized in certain types of programs, such as web-based applications or security algorithms.

Social network embeddedness

Another possible source of a measuring problem concerning the social network structure of an Open Source software project is that the communication between project developers is plausibly not a direct relationship, but an indirect one, and is thus (temporarily) stored within the social network.

Even before the general rise of the Internet, Wiederhold (1991) concluded that in information systems a central role is assigned to modules that mediate between the users' workstations and data resources. Mediators contain the administrative and technical knowledge to create the information needed for decision-making.

Open Source software projects have access to a wide variety of tools, most of which can function as a mediating module. Bug reports, source code repositories, public forums, documentation, wikis, websites, and even programming languages and external data archives can have a mediating effect in transferring information and knowledge between project developers.

Thus, if the information and knowledge a developer needs can be retrieved from the (structural) environment he is working in, there is no incentive for contacting other project developers. These tools enable the indirect transferring of information and knowledge. On top of stored information and knowledge in the social network, new information and knowledge is created.


7. Conclusions

Here, the overall research conclusions are presented. The first subchapter deals with theoretical and methodological research conclusions, whereas the second subchapter deals with practical research conclusions. Then, the most important research limitations and research flaws are summed up, after which recommendations for future research directions are proposed.

7.1. Research conclusions First, a closer look is taken at the research conclusions of Hinds (2008). Hinds has conducted a similar research and the general research findings are comparable. However, a slight difference is can be remarked. Hinds concludes the social network structure of an Open Source software project community has an important effect on community success. In addition, Hinds (2008) finds the closure of an Open Source software project community is a condition or indicator of community success, but is not a driver or cause of such success. Here, the general research implication sounds the social network structure of an Open Source software project community has no significant relationship with community success. Although this relationship is not significant, it does not necessarily mean it is not of importance. Here, is tried to explain this difference in interpretation.

Conclusion 1 - Social network analysis cannot solely explain factors affecting the chance of success of an Open Source software project community

When individuals are working together, there is always a relationship between those individuals. Thus, a level of closure is always present in a project community. If the relationships are not a driver or cause of the success of a community, it is plausible the people themselves are of importance and need to be taken into account as well to explain the chance of success of an Open Source software project.

Thus, although social network analysis is of importance to describe and further explore structural properties of Open Source software project communities, it cannot solely explain factors affecting the chance of success of such a community.

If the individuals in a community are taken into account as well, and if this is related to the organizational literature on teams and work groups, a very general conceptual model can be made. This model is visualized in Figure 10.


Figure 10 - General conceptual model

This general model is as old as the literature on groups and work teams, and also fits Open Source software projects. The amount and quality of resources a project has affects the chance of success of that particular project. In addition, the allocation of resources, that is, the mechanics of how these resources are distributed and applied for the greatest possible future gain of the project, affects the chance of success of the project as well. As the allocation of resources is done by the project members, the allocation of resources can be regarded as a mediating factor in the relation between the amount and quality of resources of a project and its chance of succeeding.

Here, as Open Source software projects may be conceived of as producing information and knowledge products, people are the most important resources of a project, besides money, time, etcetera. The allocation of people may be conceived of as the 'management and organization' of a project. Raymond (2000) states the community is the source of all input that goes into the Open Source system, such as source code, requirements and bug reports. This has been measured by (proxies of) (in)direct relationships between project developers, and thus by the social network structure of an Open Source software project.

Proposition 1:

The quantity and quality of project members affect the chance of success of an Open Source software project community, and project management and organization may mediate this relationship.

Conclusion 2 - Paradigm disruption

Hinds (2008) suggests that, due to surprising results which do not match current theories, "the Open Source software project community is actually neither 'team' nor 'community' but is a new kind of social entity which is built upon a social-technical development process involving extensive interaction between humans and technical artifacts", and speaks of a paradigm disruption. After discussing this paradigm disruption, Hinds opts for a new theory, which is typically associated with disruption, and sets out a set of requirements for this theory and two propositions based on his research findings.

However, initial steps have already been taken to explore and describe these new social entities. Here, an existing preliminary theory is presented which may serve as the linking pin Hinds (2008) is looking for to describe and further explore this new kind of social entity and its performance. Vervest et al. (2006) introduce the concept of smart business networks. A smart business network can be defined as a "developing web of people and organizations, bound together in a dynamic and unpredictable way, creating smart outcomes from quickly (re-)configuring links between actors."

Rather than being a 'simple' virtual community, Open Source software projects can be set out as a type of 'smart business network'. Here, the concept of a smart business network is further set out in respect of an Open Source software project community. First, the word 'smart' implies there is some intelligence within this business network. Rather than relational ties being simple pipes to transport information, a smart network is able to distribute, store, assemble and modify information. And a complex digital network can be smart by improving the utility of information in multiple ways, which is synonymous with creating economic value, according to Vervest et al. (2006).

Open Source development project communities can be perceived as smart as well. Most projects have an extensive set of communication and sharing tools which enable the distribution, storage, assembly and modification of information. Examples include a Subversion system for version control of the software releases, and a bug tracker which keeps track of bugs, enabling daily builds of the source code. Forums, chat programs, messengers, etcetera, enable the easy exchange of information and knowledge. Forums and other repositories may be used as information archives as well.

The word 'business' in smart business network does not necessarily literally imply business. Having a shared vision, or (partial) goal, is the least necessary condition to do business, i.e. exchanging or trading information with each other. Thus without the necessity of a relationship between two actors, no (temporary) network can be formed. Open Source software project communities have similar beliefs. An Open Source project is only joined by a developer when his incentives are activated by the visions and goals of the project.

The 'network' of a smart business network is similar to the virtual social network used in this research. Previously, business literature identified 'pipelines' such as supply chains or supply trees, where logical steps followed upon each other. Now, it is a flexible network where everyone is able to communicate instantly with each other, rather than communicating via a particular sequence. And, according to Vervest et al. (2006), 'a business network is unlikely to be fully connected, but will be partially connected', which can be seen in Open Source software projects as well.

To sum up what smart business networks are, the following list of Vervest et al. (2006) is provided.

- a group of participating businesses, organizational entities or 'actors', that form the nodes
- linked together via one or more communication networks forming the links, or line, between the nodes
- with compatible goals
- interacting in novel ways
- perceived by each participant as increasing its own value
- sustainable over time as a network
- resilient if one or more businesses, nodes in the network, malfunctions

Next, Vervest et al. (2006) sum up which capabilities are seen in smart business networks.

- establishment of common understandings
- membership selection
- linking
- goal setting
- interaction
- risk and reward management
- continual improvement
- fault tolerance

Open Source software projects match the descriptions of smart business networks, and already have innovative ways of enabling the mentioned capabilities.

Proposition 2:

An Open Source software project community is a type of smart business network.

Conclusion 3 - The competitive advantage of Open Source software project communities

The paradigm disruption Hinds (2008) notes requires new theory to explain the Open Source phenomenon. Conclusion 2 provides a preliminary theory of a descriptive nature to explore this new kind of social entity. Here, an attempt is made to position Open Source software project communities in relation to the current literature, which was discussed in Chapter 2.

Hinds (2008) concludes "the OSSP community is actually neither 'team' nor 'community'". However, based on previous literature, it is suggested here that Open Source software project communities exhibit characteristics of both teams and communities. Below, an attempt is made to explain how Open Source software projects incorporate both entities and how they can benefit from this synergy.

As commonly noted in previous research, large Open Source software project communities have an onion-like structure that is based on the level of contribution (AlMarzouq et al., 2005). This onion-like structure differs from traditional management models involved in information and knowledge creation areas (Nonaka, 1994). However, Krishnamurthy (2002) noted most Open Source software projects are very small. Here, the absence of social structure implies a lack of community, and what is left is more like a virtual team, similar to traditional groups and work teams.

Wenger (1998) distinguishes two types of virtual communities. Where communities of practice serve the purpose of creating, expanding and exchanging knowledge between people, communities of interest serve the purpose of informing interested people. With respect to OSSP communities, small OSSPs can be conceived of as communities of practice, i.e. virtual teams which create knowledge products (software). The onion-like structure of large OSSP communities can be divided into two. The core group can be conceived of as a community of practice, as they are lead developers producing software, and the peripheral developers, i.e. the 'crowd' around the core, can be conceived of as a community of interest, as their project contribution is primarily based on interest in the software. The meritocracy may explain the organizational functioning of the onion-like structure, i.e. how peripheral developers may become core developers, and vice versa.

Thus, the main difference between a small and a large OSSP is the large crowd of interested people. Although a large OSSP may have a larger core, this core is still relatively small. The competitive advantage of a large OSSP lies in how the core deals with the crowd. As Linus' Law, formulated by Raymond (1998), states: 'given enough eyeballs, all bugs are shallow'. Thus, the larger the community of interest, the bigger the chance someone will notice and fix a problem. Surowiecki (2004) popularized this idea, which he called the wisdom of crowds. Large groups of people are smarter than an elite few, no matter how brilliant - better at solving problems, fostering innovation, coming to wise decisions, even predicting the future.

The main enabler of this competitive advantage is the ubiquity of the internet. The technology and its infrastructure raise (access to) people (and other resources), and their allocation, to new levels, and allow a project to be 'smart'. Open Source software projects are very scalable and, when they grow, they still manage to be flexible. Traditional groups and work teams are limited in the amount and quality of resources and in their allocation. As they grow, problems such as bureaucratic decision-making, institutionalization, group-thinking effects, free-riders, etcetera, may occur. Open Source software projects (probably up to a certain point) do not have these limitations. When one wants to join an Open Source software project, this is possible. In addition, one does not have to be the smartest of the community, as Open Source communities often work as a meritocracy. The more skills and contributions a developer has, the better their position in the project community.

This scale-free effect is described by a power-law distribution. For example, the theoretical value of virtual social networks is covered by Reed's Law (Reed, 2001). Reed's Law states the value of a social network increases exponentially, in proportion to $2^N$. Reed's Law builds upon Metcalfe's Law, which states the value of a network of pairwise connections grows in proportion to $N^2$. However, with respect to OSSPs, Metcalfe's Law underestimates the value of an OSSP because it takes only the community of interest into account, and Reed's Law overestimates the value of an OSSP because here the entire OSSP community is conceived of as a community of practice (Reed calls it a 'Group Forming Network'). Here, a new law is suggested which incorporates both laws, to measure the theoretical value of an Open Source software project community. The Master Thesis author calls this law the BOSS Law, "Bart's Open Source Software Law". The core value of an OSSP can be measured by Reed's Law, whereas the theoretical value of the periphery (i.e. the wisdom of the crowd-effect) can be measured by Metcalfe's Law. In formula: $E = 2^{N_p} + N_i^{2}$, where $E$ represents the theoretical economic value of an OSSP community, $N_p$ represents the number of individuals of the community of practice (i.e. core group), and $N_i$ represents the number of individuals of the community of interest (i.e. periphery/crowd). This law has important practical implications, because the current tendency is to overestimate the economic value of virtual social network related companies.
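The BOSS Law described above, read as E = 2^Np + Ni^2, can be illustrated with a minimal Python sketch. The function names and the example community sizes are hypothetical and serve only to show how the three laws compare for one split between core and periphery; they are not taken from the research data.

```python
def metcalfe_value(n: int) -> int:
    """Metcalfe's Law: value grows in proportion to N^2."""
    return n ** 2


def reed_value(n: int) -> int:
    """Reed's Law: value grows in proportion to 2^N."""
    return 2 ** n


def boss_value(n_practice: int, n_interest: int) -> int:
    """BOSS Law as stated in the text: E = 2^Np + Ni^2.

    n_practice: size of the community of practice (core group), Np
    n_interest: size of the community of interest (periphery), Ni
    """
    if n_practice < 0 or n_interest < 0:
        raise ValueError("community sizes must be non-negative")
    return reed_value(n_practice) + metcalfe_value(n_interest)


if __name__ == "__main__":
    # Hypothetical community: 15 core developers, 85 peripheral members.
    n_practice, n_interest = 15, 85
    total = n_practice + n_interest
    print("Metcalfe on whole community:", metcalfe_value(total))
    print("BOSS Law:", boss_value(n_practice, n_interest))
    print("Reed on whole community:", reed_value(total))
```

For these particular, hypothetical numbers the BOSS value lies between the Metcalfe and Reed values computed on the whole community.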

Bughin and Hagel (2000) note that virtual communities are not simply communities but are (becoming) a prominent business model of the world wide web, as these virtual communities can combine reach and selectivity based on user needs. Chris Anderson (2006) describes this 'Long Tail' effect of the Internet, visualized in Figure 11.

Figure 11 - The 'Long Tail'

Source: Adapted from Chris Anderson (2006)


As Anderson summarizes, "our culture and economy are increasingly shifting away from a focus on a relatively small number of hits (mainstream products and markets) at the head of the demand curve, and moving toward a huge number of niches in the tail. In an era without the constraints of physical shelf space and other bottlenecks of distribution, narrowly targeted goods and services can be as economically attractive as mainstream fare."

If this is applied to Open Source software project communities, three forces can be described to explain the rise of Open Source software. First, the tools to produce Open Source software are democratized, which means more software can be produced, and this lengthens the Tail. Secondly, the tools to distribute software packages are democratized as well, which means more access to niche software, and this fattens the Tail. Thirdly, Open Source software can connect supply and demand of software better than proprietary, commercial software, as developers are often users of the software as well, and this drives the software business from hits to niches.

Proposition 3:

An OSSP community can be conceived of as consisting of both a community of practice and a community of interest.

Proposition 4:

The theoretical value of an OSSP community can be measured by the BOSS Law $E = 2^{N_p} + N_i^{2}$

Conclusion 4 - The measuring problem

Previously, researchers have focused on large Open Source software project communities, and most measures, for example those proposed by Crowston and Howison (2006), are suited for these large OSSP communities. However, the majority of Open Source software is developed by small teams (Krishnamurthy, 2002) and, as seen here, most of the current measures are not suited to measure the success of these small teams. For example, projects were selected here on the criterion of a minimum of 50 project messages. However, close to 40% of the project sample did not actively use the bug tracker system, as can be seen in Appendix I.

If these findings are related to Chris Anderson's 'Long Tail' (2006), briefly set out in the previous conclusion, the often found power-law distributions can be explained. For example, Healy and Schussman (2003) found power-law distributions for the number of downloads and site views. When an Open Source software project is popular, this project is located in the head of the Long Tail (the 'hit' market). These software products will have a high number of downloads and site views. However, the majority of OSSPs is located in the long tail, and these measures are not the most important for these types of projects. In contrast to proprietary, commercial software, an Open Source software project may be viable even if there is only one developer or user.

Perens (2005) noted Open Source software is a difficult product to monetize, which could be a reason it is difficult to measure success of Open Source software projects. The success effect of traditional teams and work groups in an organizational setting is much easier to measure, as this outcome (goals or objectives) in one way or another, in general, can be (as a proxy) monetized.

Hars and Ou (2001) found two types of motivations account for participation in an Open Source project. The first category includes internal factors such as intrinsic motivation and altruism. The second category includes external rewards such as expected future returns and personal needs.

Here it is suggested that, for small Open Source software project communities, the focus of success must not be put on the outcome of a project in total, such as success as activity or success as output, but on individual success. For example, it is plausible that a lone developer working on an Open Source project hosted on an Open Source software platform completely fails with this project. The immature code of the software is not downloaded by others at all, and the project website is not visited. However, it is possible the developer still achieves success, by learning a new hack, improving his programming skills or applying an innovative algorithm, which can be used in future software projects. In addition, project rank may be an indicator of where a project is located on the Long Tail.

Proposition 5:

The more 'niche' an Open Source software project is, the more success factors need to be focused on the level of individual success. Thus, the more 'hit' an Open Source software project is, the more success factors need to be focused on the level of community success. Individual and community success factors are not mutually exclusive.

7.2. Strategic conclusions

It is difficult, due to the exploratory stage of Open Source research, the limited research setup, and the unexpected research findings, to set out useful strategic conclusions which can be applied in practice. Nevertheless, it can be argued the Open Source phenomenon is a viable alternative form of organization to create information and knowledge products. As Open Source software projects can be conceived of as a type of smart business network, the underlying principles of Open Source can also be applied to a wide variety of other areas of interest.


First, practitioners could (encourage the) use (of) a wide variety of existing, widely adopted tools and Open standards, as shown in the previous chapter, to eliminate the boundaries of traditional groups and work teams and to fully enable the potential of the technology and infrastructure of the internet. Modular design, documentation and netiquette may be of importance to enable this potential.

Secondly, this new organizational phenomenon could also be applied to create physical products, rather than completely virtual products such as software; initiatives have already been taken to create Open Source hardware. Contributors collaborate, sharing information and knowledge, to develop schematics and hardware designs, while the physical product, the hardware itself, can be produced as 'do it yourself' or by outsourcing (limited) production to a (small) hardware equipment producer.

Thirdly, business information (and knowledge) management is often noted as a core competence of an organization or company. However, it is important to make a distinction between types of information and knowledge. As Perens (2005) noted in the area of Open Source software, for most businesses software is not the profit-center. The technology enables the business, rather than being what the business makes money from. Thus, software is a cost-center which is unavoidable when doing business. Cost-center technology can be divided into two types, namely differentiating and non-differentiating. Now, if a translation is made to information and knowledge products in general, companies should be encouraged to shift their 'selfish-proprietary-perspective' to a more open and transparent 'peer-to-peer-perspective' as used by Open Source software project communities. Typical problems of this selfish perspective, such as the 'not-invented-here' and 'reinventing-the-wheel' syndromes, should be questioned. Although 'reinventing the wheel' may be an important part of constructing innovation and new ideas, and thus of creating information and knowledge, newcomers to Open Source projects, or new practitioners of smart business networks, can be supported by a wide variety of common tools and accepted standards. This way, the information and knowledge embedded in the projects' infrastructure functions as a safety net, and core members are not distracted from their main tasks.

However, this does not mean companies should simply peer with other companies at random. They need to carefully select possible network participants which have similar goals to create cost-center non-differentiating information and knowledge products. Thus, it is possible these companies are competitors, though they compete within the area of differentiating information and knowledge products, thus keeping differentiating information as a competitive advantage.

With respect to productive, and thus successful, projects and teams in the computer industry, the classic example of DeMarco and Lister (1999) makes visible the essential theme of partnership, as these peer-to-peer relationships can be characterized: 'that owning part of a good work somehow feels better than owning all of it.'

Fourthly, although via technology this new social entity enables new (organizational) performance levels which were not seen before, Davenport and Prusak (2000) already advised staying down-to-earth with respect to the internet infrastructure:

"What we must remember is that this new information technology is only the pipeline and storage system for knowledge exchange. It does not create knowledge and cannot guarantee or even promote knowledge generation or knowledge sharing in a corporate culture that doesn't favor those activities." and "The medium turns out not to be the message and does not even guarantee that there will be a message."

7.3. Limitations

Here, the most important limitations of the research are summed up. These limitations need to be taken into consideration when interpreting the research conclusions.

The first limitation stems from the choice of SourceForge as development platform. SourceForge is by far the largest Open Source software platform. Much data and statistics are available, which makes it an ideal area for conducting research. However, it is plausible projects hosted on SourceForge are more likely to succeed, as it is the most popular platform and more developers and other resources may be available.

It is also plausible developers on SourceForge are aware of the fact they are observed and therefore are more motivated and focused on achieving success. This is known as the Hawthorne effect (Nickels et al, 2002). On other development platforms it is less likely this effect will occur, though it can be much more difficult for researchers to gather data as most platforms do not keep extensive track of their activities like SourceForge.

Other limitations stem from the sample selection. A justifiable way was needed to reduce the research population to a sample of workable size. This method resulted in the use of mature projects only, and projects in another stage of product development were left out of the research. Next, the public project forums were chosen as the main point of view, though it appeared project communication and information and knowledge exchange occur via a wide variety of tools.

Due to Master Thesis research limitations, the time-frame of the gathered data was set to a fixed point in time, May 2008, rather than using a longitudinal study design. As a result, the interpretation of the data may deviate from the actual situation, as it is harder to identify anomalies or to correct for seasonal influences or other longitudinal effects. Next, as archival data was used as the primary resource, no 'fresh' data was used. It is commonly known that innovation and new technology change rapidly in the IT environment. New ideas and initiatives might be introduced within a range of 3 to 6 months, which makes the learning curve of Open Source project organization steep, so that current practice may already deviate from the archival data. Furthermore, the real-time and monthly spider activities of the SRDA on a large platform such as SourceForge resulted in minor differences between the variables, as hours passed and the forums had already changed before a spider activity was completed. This was already noticeable while conducting this research.

With respect to the research variables and investigated communication tools, only a limited number of success indicators was used, and possibly important ones may have been missed and left out of the research. Similarly, there are many tools and methods to conduct social network analysis. Here, only three structural constructs were taken into account, though alternatives are available and may be of importance. For example, the closure construct could also have been investigated in the software revision repositories, rather than solely focusing on the public project forums. The bridging construct was limited by data availability to the presence of external relationships, rather than the nature of these relationships. The centrality construct was restricted to the forum activities of the administrators. For example, it is possible to take management and organization activities of project administrators into account to measure centrality, such as the set of tools made available, or the amount of documentation and project guidelines.

7.4. Research flaws

Unfortunately, minor research flaws occurred which, for various reasons, could not be fixed in the process of conducting this research. These flaws are presented below.

In the sample selection procedure, first a look was taken at the SourceForge population, where projects were selected on (1) the availability of a forum, and (2) the absence of anonymous posting on the project public forums. Next, a minimum total of 50 messages was required, based on the research of Krishnamurthy (2002) and Hinds (2008), to research structure.

Unfortunately, it appeared spam messages were still included in the conversations on the project public forums. In the SourceForge Research Data Archive, when a forum message is identified as spam by one of the core group members, the code '100' is generated as member ID. These messages were manually eliminated from the projects' conversation volume by the author of this research. As a side effect, two projects dropped below the minimum of 50 forum messages, to 39 and 49 messages respectively. These projects were still included in the research sample.
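The spam correction described above can be sketched in Python. This is only an illustration under stated assumptions: the tuple layout (project ID, member ID) and the function names are hypothetical and do not reflect the actual SRDA schema; only the spam member ID 100 and the threshold of 50 messages come from the text.

```python
SPAM_MEMBER_ID = 100   # member ID marking spam, as described in the text
MIN_MESSAGES = 50      # minimum forum messages used in the sample selection


def count_non_spam(messages):
    """Count forum messages per project, ignoring messages flagged as spam.

    messages: iterable of (project_id, member_id) tuples (hypothetical layout).
    """
    counts = {}
    for project_id, member_id in messages:
        counts.setdefault(project_id, 0)
        if member_id != SPAM_MEMBER_ID:
            counts[project_id] += 1
    return counts


def below_threshold(counts, minimum=MIN_MESSAGES):
    """Return the projects whose non-spam message count is under the minimum."""
    return sorted(p for p, c in counts.items() if c < minimum)


if __name__ == "__main__":
    # Hypothetical data: project "a" has 52 messages, 3 of which are spam.
    messages = [("a", 1)] * 49 + [("a", SPAM_MEMBER_ID)] * 3 + [("b", 2)] * 50
    counts = count_non_spam(messages)
    print(counts)
    print(below_threshold(counts))
```

A check of this kind makes visible which projects fall below the selection criterion once spam is removed, so that the decision to keep or drop them can be made explicitly.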

Another selection criterion was a minimum of 1 administrator and a minimum of 2 developers in the core group (thus counting both administrators and core developers). However, it appeared a registered developer does not have to be active on the public forums, and is then not marked as a developer from the perspective of this research. Therefore these core developers and administrators were manually eliminated by the author of this research from the bridging variables, as these developers cannot transfer knowledge via the forums, since they are not active. However, due to minor time differences between the spider activities of several research variables of the SRDA, for a few projects the number of core developers and administrators was off by a maximum of 1 developer in relation to the actual number of active developers on the project public forums. For these projects the number of developers was manually corrected by the author of this research.

Although these two problems were solved properly, a side effect was that the minimum core size of a project was reduced to 1, and both the administrator and developer subgroup minimums were reduced to 0. Two projects did not have an active administrator. The largest problem was that 17 projects now had just 1 person in the core group, in all cases an administrator and no core developer. This resulted in a loss of 17 projects for the core density measure, thus 96 - 17 = 79 projects. These projects could have been eliminated from the research sample, or the selection criterion could have been adapted. It was chosen to do neither, as it was expected that their inclusion in the sample does not affect the general research findings.
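Why a core group of 1 person cannot contribute to the core density measure can be illustrated with a short Python sketch. It assumes the common definition of density for an undirected network, the number of actual ties divided by the n(n-1)/2 possible ties; whether this exact definition was used in this research is an assumption, and the function name and the core sizes in the example are hypothetical.

```python
from typing import Optional


def core_density(n_members: int, n_ties: int) -> Optional[float]:
    """Density of an undirected network: actual ties / possible ties.

    Returns None when fewer than 2 members are present, because no tie
    is possible and the ratio is undefined (assumed definition).
    """
    if n_members < 2:
        return None
    possible_ties = n_members * (n_members - 1) / 2
    return n_ties / possible_ties


if __name__ == "__main__":
    # 96 sample projects, of which 17 have a core group of 1 person
    # (counts from the text; the core size of 3 for the rest is hypothetical).
    core_sizes = [1] * 17 + [3] * 79
    defined = sum(core_density(n, 0) is not None for n in core_sizes)
    print(defined)  # number of projects with a defined core density
```

Under this assumption, the 17 single-person cores yield an undefined density, which leaves 79 of the 96 projects for the core density measure.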

7.5. Recommendations

A variety of recommendations for future research can be made. Here, several research directions are proposed.

First, it is plausible the success of an Open Source software project community is moderated by the type of the community or by the hosting platform. It is suggested to conduct similar research for Open Source software projects with different characteristics. For example, another Open Source platform may be chosen instead of SourceForge.net. Other research may focus on (moderating) characteristics including project development stage, intended audience, sponsored projects, spin-offs, previously inactive projects which are taken over, Open Source license types, etcetera. It is plausible projects with different characteristics include other types of people and have different ways of management and organization, and therefore the chance of success of these projects differs as well.

Next, the proposed conceptual model, in which structure does not directly affect the chance of success of an Open Source software project but a third factor is responsible for this effect, and in which structure may be a mediating factor for this relationship, needs to be further researched and tested.

With respect to social network analysis, which is used to study structural variables measured on actors in a set, more information about the collection of actors needs to be included in future research. Not only the possibility of relationships between actors is important; of similar importance is what exactly is exchanged via these relationships. The management and organization, the communication flows, and the information and knowledge exchange in these entities are crucial to understand the Open Source phenomenon.

Another important research area involves the evolution and natural growth of Open Source software projects. Product development lifecycles have been identified, and project size in relation to the age of a project, etcetera, has been measured. However, not much is known about marginal changes and activities during the project. It is known that internet activity (amount of traffic) cycles during the day and across the days of the week (AMS-IX, 2009). As a majority of Open Source contributors are volunteers, it is interesting to see in which way, if any, this affects, for example, the development speed of a project.

Another research direction which needs to be further explored is the direction which ultimately leads to a portfolio of Open Source software project success measures, as already initiated by Crowston et al. (2002). It is clear several of the current measures are not suitable for measuring success or structure of Open Source projects. There is a need for new, and improved, measures to explore the Open Source phenomenon further. It is suggested to include and explore individual success measures of project members.

It is suggested to first conduct more in-depth and detailed case studies to explore individual characteristics and motives of project members, and their management and organization on a project level. In the longer term, when Open Source measures are established and types of Open Source projects can be defined in detail, it is recommended to conduct laboratory experiments, such as agent-based models for Open Source networks, to explore these ecological issues further in a systematic way.

With respect to Open Source software project success measures in the longer term, new methods need to be developed which can measure (the effect of) the embeddedness of information and knowledge in a network, since this form of indirect communication between project members is now 'hidden'.

8. References

Here, the references from literature are provided first. Next, textual annotations are presented, followed by a list of figures and a list of tables.

8.1. Literature references

AlMarzouq, Mohammad, Zheng, Li, Rong, Guang, Grover, Varun, (2005), "Open Source: Concepts, Benefits, and Challenges", The Communications of the Association for Information Systems, Volume 16, 2005, Article 40

AMS-IX, (2009), "AMS-IX Traffic Statistics", Amsterdam Internet Exchange, http://www.ams-ix.net/technical/stats/ , last consulted March 2009

Anderson, Chris, (2006), "The Long tail: why the future of business is selling less of more", First edition, Hyperion Books, New York, ISBN: 1-4013-0237-8

Apache Software Foundation, (2009), "How the ASF works", http://www.apache.org/foundation/how-it-works.html#meritocracy , last consulted March 2009

Balkundi, Prasad, Harrison, David A., (2006), "Ties, leaders, and time in teams: strong inference about network structure's effects on team viability and performance", Accepted for publication at the Academy of Management Journal

Baron, Reuben M., and Kenny, David A., (1986),“The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations”, Journal of Personality and Social Psychology, Vol. 51, No.6, pp.1173-1182

Borgatti, Stephen P., Everett, Martin G., (1999), "Models of core/periphery structures", Social Networks, Vol. 21, pp.375-395

Borgatti, Stephen P., Everett, Martin G., (1999), "The centrality of groups and classes", to appear in Journal of Mathematical Sociology

Burt, Ronald S., "Structural Holes versus Network Closure as Social Capital", May 2000, University of Chicago and Institute Européen d'Administration d'Affaires (INSEAD), (Pre-print for a chapter in "Social Capital: Theory and Research", edited by Nan Lin, Karen S. Cook, and R.S. Burt. Aldine de Gruyter, 2001)

Bughin, Jacques, Hagel, John III, (2000), "The Operational Performance of Virtual Communities - Towards a Successful Business Model?", Electronic Markets, Volume 10 (4): pp.237-243

Capiluppi, Andrea, Lago, Patricia, Morisio, Maurizio, (2003), "Evidences in the evolution of OS project through Changelog Analyses", 3rd Workshop on Open Source Software Engineering, ICSE '03 International Conference on Software Engineering, Portland, Oregon, May 3-11, 2004, pp.19-24

Carley, Kathleen M., (1995), "Computational and Mathematical Organization Theory: Perspective and Directions", Computational and Mathematical Organization Theory, 1:1, pp.39-56

Crowston, Kevin, Annabi, Hala, Howison, James, Masango, Chengetai, (2004), “Towards A Portfolio of FLOSS Project Success Measures”, The 4th Workshop on Open Source Software Engineering, May 25, 2004, Edinburgh, Scotland

Crowston, Kevin and Howison, James, (2004), “The social structure of Free and Open Source software”, Syracuse FLOSS research working paper (submitted for review in November 2004), available at http://floss.syr.edu , last consulted March 01, 2009

Crowston, Kevin, Howison, James, (2006), "Hierarchy and centralization in Free and Open Source Software team communications", School of Information Studies, Syracuse University

Crowston, Kevin, Howison, James, Annabi, Hala (2006)(in press), “Information Systems Success in Free and Open Source Software Development: Theory and Measures”, Software Process: Improvement and Practice (Special Issue on Free/Open Source Software Processes) (Pre-print of publication scheduled for early 2006)

Cohen, Susan G., Bailey, Diane E., (1997), "What makes teams work: group effectiveness research from the shop floor to the executive suite", Journal of Management, 1997, Vol.23, No.3, pp.239-290

Cox, Alan, (1998), “Cathedrals, Bazaars and the Town Council”, Slashdot, 13 October 1998, available at http://slashdot.org/features/98/10/13/1423253.shtml , last consulted March 01, 2009

Davenport, Thomas H., Prusak, Lawrence, (2000), "Working knowledge: how organizations manage what they know", Excerpt of 'Working Knowledge: How Organizations Manage What They Know' by Thomas H. Davenport and Lawrence Prusak., available at http://www.acm.org/ubiquity/book/t_davenport_1.html , last consulted February 24, 2009

DeMarco, Tom, Lister, Timothy, (1999), "Peopleware: productive projects and teams", Second Edition, Dorset House Publishing, ISBN: 0-932633-43-9

De Souza, Clarisse Sieckenius, Preece, Jenny, (2004), "A framework for analyzing and understanding online communities", Interacting with Computers, The Interdisciplinary Journal of Human-Computer Interaction (accepted, in press)

Dul, Jan and Hak, Tony, (2008), "Case Study Methodology in Business Research", First edition 2008, Butterworth-Heinemann, Elsevier Ltd., United Kingdom, ISBN: 978-0-7506-8196-4

Fischer, Gerhard, (2001), "Communities of Interest: Learning through the Interaction of Multiple Knowledge Systems", IRIS'24, Norway

Gao, Yongqin, Freeh, Vince, Madey, Greg, (2003), "Analysis and Modeling of Open Source Software Community", available at http://www.nd.edu/~oss/Papers/NAACSOS_modeling.pdf , last consulted March 01, 2009

GNU, (2009), "The GNU Manifesto", Free Software Foundation, http://www.gnu.org/gnu/manifesto.html , last consulted March 01, 2009

Graham, Paul, (2003), "Design and Research", article can be found on http://www.paulgraham.com/desres.html, last consulted February 24, 2009

Graham, Paul, (2004), “Hackers & Painters – Big ideas from the computer age”, O’Reilly Media Inc., May 2004, First Edition, CA, United States of America, ISBN: 0-596-00662-4

Granovetter, Mark, (1983), “The Strength of Weak Ties: A Network Theory Revisited”, Sociological Theory, Vol. 1, pp. 201-233, American Sociological Association, 1983

Greenberg, Jerald, Baron, Robert A., (2002), "Behavior in Organizations - Understanding and Managing the Human Side of Work", Eighth Edition, Prentice Hall, Pearson Education, ISBN: 0-13-066491-X

Grewal, Rajdeep, Lilien, Gary L., Mallapragada, Girish, (2006), "Location, Location, Location: How Network Embeddedness Affects Project Success in Open Source Systems", Management Science, Vol. 52, No.7, July 2006, pp.1043-1056

Hanneman, Robert A., and Riddle, Mark, (2005), "Introduction to social network methods", Riverside, CA: University of California, Riverside, 2005, Published in digital form at http://faculty.ucr.edu/~hanneman , last consulted March 01, 2009

Hackman, J. Richard , (1987), "The design of work teams", excerpt from The handbook of organizational behavior, J.W. Lorsch, Englewood Cliffs, NJ, Prentice Hall, pp.315-342

Hars, Alexander, Ou, Shaosong, (2001), "Working for Free? - Motivations of Participating in Open Source Projects", Proceedings of the 34th Hawaii International Conference on System Sciences

Healy, Kieran and Schussman, Alan, (2003), "The Ecology of Open-Source Software Development", University of Arizona, January 23, 2003, available at http://opensource.mit.edu/papers/healyschussman.pdf , last consulted February 23, 2009

Hinds, David, (2008), "Social network structure as a critical success condition for Open Source software project communities", A dissertation submitted in partial fulfillment of the requirement for the degree of Doctor of Philosophy in Business Administration, Florida International University, Miami, Florida, 2008

Hinds, David, and Lee, Ronald M., (2008) “Social Network Structure as a Critical Success Condition for Virtual Communities”, Proceedings of the 41st Hawaii International Conference on System Sciences, Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2008

Howison, James and Conklin, Megan S., (2005), "OSSmole: A collaborative repository for FLOSS research data and analysis", January 19, 2005

Hummel, Johannes and Lechner, Ulrike, (2002), “Social Profiles of Virtual Communities”, Proceedings of the 35th Hawaii International Conference on System Sciences, Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2002

Krishnamurthy, Sandeep, (2002), “Cave or Community? – An empirical examination of 100 Mature Open Source Projects”, University of Washington, Bothell, United States of America, May 2002

Lievrouw, Leah A., Livingstone, Sonia, (2006), “The handbook of new media – updated student edition”, SAGE Publications Ltd., London, United Kingdom, ISBN 1-41720-1873-1

Leimeister, Jan Marco; Sidiras, Pascal; Krcmar, Helmut, (2004), “Success factors of virtual communities from the perspective of members and operators: an empirical study”, Proceedings of the 37th Hawaii International Conference on System Sciences, Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2004

Lin, Hui, Fan, Weiguo, Wallace, Linda, (2007), “An empirical study of web-based knowledge community success”, Proceedings of the 40th Hawaii International Conference on System Sciences, Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2007

MacCormack, Alan, Rusnak, John, Baldwin, Carliss, (2004), "Exploring the structure of complex software designs: an empirical study of Open Source and proprietary code", Harvard Business School Working Paper Number 050916, draft dated June 9th, 2005

Nickels, William G., McHugh, James M., McHugh, Susan M., (2002), "Understanding Business", International Edition 6, McGraw-Hill Higher Education, ISBN 0-07-232054-0

Nonaka, Ikujiro, (1994), "A Dynamic Theory of Organizational Knowledge Creation", Organization Science, Vol.5, No.1, February 1994, pp.14-37

Nonnecke, Blair, Preece, Jenny, (2001), “Why lurkers lurk”, Americas Conference on Information Systems 2001

Madey, Greg, Freeh, Vincent, Tynan, Renee, (2002), "The Open Source Software Development Phenomenon: An Analysis Based On Social Network Theory", Eighth Americas Conference on Information Systems 2002, pp.1806-1813

Maguire, James, (2007), "The SourceForge Story", October 2007, http://itmanagement.earthweb.com/cnews/article.php/12035_3705731 , last consulted February 8, 2009

OSD, (2009), "The Open Source Definition (Annotated)", Open Source Initiative, http://www.opensource.org/docs/definition.php , last consulted March 01, 2009

Perens, Bruce, (2005), "The Emerging Economic Paradigm of Open Source", Cyber Security Policy Research Institute, George Washington University

Philips, (2009), Royal Philips website, http://www.philips.com/global/index.page , last consulted March 01, 2009

Powell, Anne, Piccoli, Gabriele, Ives, Blake, (2004), "Virtual Teams: A Review of Current Literature and Directions for Future Research", The DATA BASE for Advances in Information Systems, Winter 2004 (Vol. 35, No. 1)

Raymond, Eric Steven, (1998), “The Cathedral and the Bazaar”, First Monday, volume 3, number 3 (March), available at http://www.firstmonday.org/issues/issue3_3/raymond/ , last consulted February 23, 2009

Raymond, Eric Steven, (2000), "Homesteading the Noosphere", version 3.0, article can be found on http://www.catb.org/~esr/writings/cathedral-bazaar/ , last consulted February 24, 2009

Reed, David P, (2001), "The Law of the Pack", Harvard Business Review, February 2001, reprint F0102c, pp.23-24

Rothfuss, Gregor J. (2002), "A framework for Open Source projects", Master Thesis in Computer Science, Department of Information Technology, University of Zurich, Zurich, November 12, 2002

Sangwan, Sunanda, (2005), “Virtual community success: A uses and gratifications perspective”, Proceedings of the 38th Hawaii International Conference on System Sciences, Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2005

Schenkel, Andrew and Teigland, Robin, (2008), “Improved organizational performance through communities of practice”, Journal of Knowledge Management, Volume 12, Number 1, 2008, pp.106-118

Spolsky, Joel (2005), "Hitting the High Notes", July 25, 2005 http://www.joelonsoftware.com/articles/HighNotes.html , last consulted February 8, 2009

Stewart, Katherine J., Ammeter, Tony, (2002), "An exploratory study of factors influencing the level of vitality and popularity of Open Source projects", Twenty-Third International Conference on Information Systems, 2002

Strader, Troy J., Lin, Fu-Ren, Shaw, Michael J., (1998), "Information infrastructure for electronic virtual organization management", Decision Support Systems, 23, 1998, pp.75-94

SourceForge.net, (2007), "VA Software Corporation Announces Name Change to SourceForge, Inc.", http://ir.corp.sourceforge.com/phoenix.zhtml?c=82629&p=irol-newsArticle&ID=1037965&highlight= , May 24, 2007, last consulted February 8, 2009

Surowiecki, James, (2004), "The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economics, societies and nations", First Anchor Books Edition, August 2005, ISBN: 0-385-72170-6

Tuckman, Bruce W., Jensen, Mary Ann C. (1977), "Stages of Small-Group Development Revisited", Group & Organization Studies, December 1977, 2, 4, pp.419-427

Vervest, Peter; Preiss, Kenneth; van Heck, Eric; Pau, Louis-François, (2004), “The Emergence of Smart Business Networks”, Journal of Information Technology, 2004

Weick, Karl, E., Book Review Symposium of “The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA”, (Vaughan, Diane, Chicago: University of Chicago Press, 1996, 575pp); Administrative Science Quarterly, June 1997, 42, 2, pg. 395-401

Wellman, Barry, (1996), “An Electronic Group is Virtually a Social Network”, September 1996 (Almost final version of Chapter 9 in Sara Kiesler, ed., ‘Culture of the Internet’, Hillsdale, NJ: Lawrence Erlbaum, 1997, pp. 197-205)

Watts, Duncan J., (2004), "The "New" Science of networks", Annual Reviews, Review in Advance 30, Department of Sociology, Columbia University, New York, March 9, 2004, pp.243-270

Wasko, M. McLure, Faraj, S., (2000), "'It is what one does': why people participate and help others in electronic communities of practice", Journal of Strategic Information Systems, Vol. 9, 2000, pp.155-173

Wasserman, Stanley, Faust, Katherine, (1994), "Social network analysis - methods and applications", Cambridge University Press, ISBN: 0-521-38707-8

Wenger, Etienne, (1998), "Communities of Practice: Learning as a Social System", Systems Thinker, June 1998

Wiederhold, Gio, (1991), "Mediators in the Architecture of Future Information Systems", Stanford University, September 1991, (an edited version was published in The IEEE Computer Magazine, March 1992)

Xu, Jin, Gao, Yongqin, Christley, Scott, Madey, Gregory, (2005), "A topological analysis of the Open Source software development community", Proceedings of the 38th Hawaii International Conference on System Sciences

Yang, Heng-Li, Tang, Jih-Hsin, (2004), "Team structure and team performance in IS development: a social network perspective", Information & Management, Vol. 41 (2004), pp.335-349

Ye, Yunwen, Kishida, Kouichi, Nishinaka, Yoshiyuki, Yamamoto, Yasuhiro, Nakakoji, Kumiyo, (2002), "Evolution Patterns of Open-Source Software Systems and Communities", International Workshop on Principles of Software Evolution (IWPSE2002), Orlando, FL, May 19-20, 2002

8.2. Textual annotations

[*1] Alternative Open Source hosting platforms

| Name | Web address | Remark |
|---|---|---|
| Freshmeat | http://freshmeat.net | also a SourceForge trademark |
| Savannah | http://savanna.gnu.org or http://savanna.nongnu.org | |
| Open Source Flash | http://osflash.org | |
| RubyForge | http://rubyforge.org | |
| Tigris.org | http://www.tigris.org | |
| BountySource | http://www.bountysource.com | |
| BerliOS | http://www.berlios.de | German and Spanish platform |
| JavaForge | http://www.javaforge.com | |

Websites data sources

| Name | Web address |
|---|---|
| SourceForge.net | http://www.sourceforge.net |
| SRDA | http://zerlot.cse.nd.edu/mediawiki/index.php?title=Main_Page |
| FLOSSmole | http://ossmole.sourceforge.net/ |

8.3. List of figures

Figure 1 - Conceptual model

Figure 2 - Development framework for social network constructs

Figure 3 - Social network model of community success

Figure 4 - Page views (left) and software downloads (right) vs. number of core developers

Figure 5 - Project age (left) and percent of administrators (right) vs. the number of developers

Figure 6 - The correlation between the number of threads and conversation volume

Figure 7 - [Closure] Research findings (left) and new conceptual model (right)

Figure 8 - [Bridging] Research finding (left) and new conceptual model (right)

Figure 9 - [Centrality] Research findings (left) and new conceptual model (right)

Figure 10 - General conceptual model

Figure 11 - The 'Long Tail'

8.4. Tables

Table 1 - Social network constructs overview

Table 2 - Explanation of SRDA core tables

Table 3 - Overview of the SF Games/Entertainment section

Table 4 - SourceForge projects development statuses

Table 5 - Developers per project

Table 6 - Forum posts: sum of 60 days previous to June 2008

Table 7 - Overview of sample selection criteria

Table 8 - Finding 1 of Krishnamurthy - small groups

Table 9 - Average generated discussion per day on the project public forums

Table 10 - Rotated component loading for accepted (log transformed) dependent variables

Table 11 - Descriptive statistics of research variables

Table 12 - Correlation matrix of research variables

Table 13 - Summary of regression on group density

Table 14 - Summary of regression on core density

Table 15 - Summary of regression on peripheral two-mode density

Table 16 - Summary of regression on core membership degree

Table 17 - Summary of regression on administrator membership degree

Table 18 - Summary of regression on administrator class centrality

Table 19 - Summary of regressions of project rank on six social network constructs

Table 20 - Summary of test results for closure hypotheses

Table 21 - Summary of test results for bridging hypotheses

Table 22 - Summary of test results for leader centrality hypotheses

Table 23 - Summary of test results for project rank hypotheses

9. Appendices

A. LIST OF KEYWORDS

B. OPEN SOURCE SOFTWARE CONTRIBUTORS

C. MEASUREMENT OF CONSTRUCTS

D. BRIEF HISTORY OF SOURCEFORGE.NET

E. SOURCEFORGE SCREENSHOTS

F. SRDA ENTITY-RELATIONSHIP DIAGRAM

G. QUERIES

H. FLOSSMOLE JUNE 2008 SOURCEFORGE COLLECTION

I. BUG TRACKERS

J. SOURCEFORGE PROJECT RANKING FORMULA

K. NORMALITY CHECKS

L. PP PLOTS

M. ONE-SAMPLE KOLMOGOROV-SMIRNOV TESTS

N. QQ PLOTS AND DETRENDED QQ PLOTS

O. HOMOSCEDASTICITY CHECKS

P. MULTICOLLINEARITY CHECKS

Q. DETAILED REGRESSION ANALYSES

A. List of keywords

| Keyword | Description |
|---|---|
| Administrators | Project leaders |
| Bridging | Bridging ties are connections of social network members to other social networks. Bridging is measured by affiliation degree |
| Centrality | The extent to which the project leaders are the pivot of the project group with respect to the internal information flows. Centrality is measured by leader centrality |
| Closure (coreness) | The extent to which the ties in a network are connected with each other. Closure is measured by density |
| Core developers | Lead developers |
| Core group | The administrator subgroup and the core developer subgroup taken together |
| Group | A collection of two or more interacting individuals with a stable pattern between them to share common goals and who perceive themselves as being a group (Greenberg and Baron, 2003) |
| Open Source software | Software which is freely redistributable and can readily be evolved and modified to fit changing need (Raymond, 1998) |
| Open Source software project (OSSP) | The total amount of activities needed to develop a particular piece of Open Source software |
| Open Source software project community | Population of individuals contributing to an Open Source software project |
| Peripheral developers | All individuals who are somehow involved in a project, and who are not project leaders or lead developers |
| Smart business network | A developing web of people and organizations, bound together in a dynamic and unpredictable way, creating smart outcomes from quickly (re)-configuring links between actors (Vervest et al., 2006) |
| Social network | A set of people (or organizations or other social entities) connected by a set of socially-meaningful relationships (Wellman, 1996) |
| Social network analysis | A set of methods and applications suitable for analyzing network data |
| Social network perspective | The social network perspective focuses on the structure of relationships between social entities, and the nature of that structure, rather than the attributes of these entities themselves (Wasserman and Faust, 1994) |
| Social network structure | (The pattern of) all interactions between individuals of a social network |
| Success as activity | The quantity of participation of OSSP community members |
| Success as output | The quantity of produced software of an OSSP community |
| Team | A group whose members have complementary skills and are committed to a common purpose or set of performance goals for which they hold themselves mutually accountable (Greenberg and Baron, 2003) |
| The Long Tail | A popular description by Chris Anderson (2006) of the impact of the internet's infrastructure and technology on business models |
| Virtual community | A community in which the primary mode of interaction is electronic (online/virtual) and not face-to-face (Hinds and Lee, 2008) |
| Wisdom of crowds | Large groups of people are smarter than an elite few, no matter how brilliant: better at solving problems, fostering innovation, coming to wise decisions, even predicting the future (Surowiecki, 2004) |

B. Open Source software contributors

Drawing on social science, some exploratory research has been conducted into the incentives of Open Source software project developers to actually contribute to a project. Hars and Ou (2001) identify two types of motivation for developers to participate in Open Source software projects. The first category of motivation includes internal factors such as intrinsic motivation and altruism. The second category of motivation includes external rewards such as expected future returns and personal needs.

To understand the incentives of Open Source software project contributors it is necessary to identify these contributors, as incentives may differ per (type of) person. Perens (2005) made an overview of common contributors to Open Source software projects. Generally, people think of 'volunteers' as contributors. But these volunteers are not ordinary people, as most have an above-average interest in, or close relationships with, the computer and information technology industry.

Here, the types of project contributors identified by Perens (2005) are discussed from the perspective of the types of motivation of Hars and Ou (2001), to provide insight into these types of people. Each of these types works differently within the economics of Open Source software, as each draws from different sources to fund its contributions.

Volunteers

The majority of Open Source software developers perceive contributing to a project as a hobby activity. However, Perens (2005) noted that volunteer is a more suitable term than hobbyist, as most contributors have professional ties with the information technology area. Motivations for contributing to a project are various and include using the project to develop (programming) skills or to become familiar with a particular programming language, distraction from the software programmed at work, fun, interest in the topic, etcetera.

Software developers are often compared to artists (Perens 2005, Graham 2004, Spolsky, 2005). This comparison holds not only in terms of abilities and skills, as Spolsky (2005) states 'Five Antonio Salieris won't produce Mozart's Requiem. Ever. Not if they work for 100 years', but also in terms of prestige. Open Source software volunteers just might want to show what they are capable of.

Academic contributors

A large part of the contributions to Open Source software projects is made by scientists and academic researchers. Scientific experiments and other research need to be repeatable and verifiable for other scientists, and Open Source makes software transparent. In addition, scientific research is often specialized, and therefore proprietary software is often not applicable. Another advantage is that by developing software via an Open Source model costs can be shared between the researchers and institutions.

Linux distributions

A common misperception of Open Source is that companies distributing Linux and other Open Source Operating Systems create these products. In fact, these companies bundle Open Source software into software packages based on an Open Source platform. Their profit comes from these integrating activities, and also from support activities such as fixing bugs and delivering support to customers of these packages.

Companies with a single Open Source program as product

Within this category of contributors a more specific distinction can be made. First, there are companies which have a mixed Open Source and proprietary licensing model. These companies sell identical software via two licenses, one commercial and one Open Source license. In general, the Open Source license may not be used for commercial purposes, or lacks support or additional software such as plug-ins or libraries. The Open Source license lowers the entry barriers to choosing this software. Secondly, there are companies which sell an Open Source product as their core product, and make money from proprietary additions. An Open Source software product or an Open Source infrastructure may be the foundation for creating proprietary software. Finally, there are companies which specialize in a single Open Source program accompanied by services. Although the Open Source software itself may be free, profit is made from a wide variety of services delivered to support this software.

Services businesses

Service businesses sell Open Source related services to other companies, including consultancy, support activities, integrating activities, bug fixes and software control, etcetera. Although these companies contribute to the development of Open Source software, their individual impact on Open Source software development is relatively small.

Hardware manufacturers

Hardware manufacturers develop software with the purpose of enabling the technology of their hardware products. In contrast to software, hardware is hard to copy. Previously, hardware manufacturers spent a lot of money developing proprietary software systems. Now, Linux and other Open Source Operating Systems are viable, and cheaper, alternatives to enable their hardware. Thus, software is not their profit center. In addition, profit can be made via support services such as consultancy, installation, and training.

End-user businesses

A lot of web companies can be perceived as end-user businesses of Open Source software. These companies use Open Source software to enable their web activities. By using Open Source software they can not only reduce costs; by adapting the Open Source software to their own needs they can also react quickly to market changes. Thus, they have more control over their software than with proprietary software, and have reduced their risk.

Governments

Ideally, governments exist for public benefit, and should therefore strive to remain neutral towards software vendors. Open Source software is a viable solution here, as the software is independent and transparent. In addition, software operating costs may be reduced because the development costs of Open Source software are generally shared. The transparency of the source code may also increase the trust of the general public.

C. Measurement of constructs

Here, an overview of the measurement of constructs is provided. For each construct, the appropriate variables are given, followed by the way each variable is measured.

| Construct | Variable | Measuring |
|---|---|---|
| Group Closure | Group Density | 2L / (g(g-1)), where L = # of ties, g = # of group members |
| Core Closure | Core Density | 2L / (c(c-1)), where L = # of ties, c = # of core group members |
| Peripheral Two-Mode Closure | Peripheral Two-Mode Density | Ls1 / (c * s1), where Ls1 = # of ties of core group with peripheral subgroup, s1 = # of peripheral subgroup members, c = # of core group members |
| Core Bridging | Core Membership Degree | d^c = Xc / s2, where Xc = # of outgroup ties of core group members, s2 = # of core subgroup members, d^c = core mean nodal degree (tie degree) |
| Administrator Bridging | Administrator Membership Degree | d^a = Xa / s3, where Xa = # of outgroup ties of administrators, s3 = # of administrators, d^a = administrator mean nodal degree (tie degree) |
| Leader Centrality | Administrator Class Centrality | Ls12 / (s3), where Ls12 = # of ties with peripheral + core subgroup, (s3) = administrator subgroup, where the size of the administrator subgroup is represented as '1'. Thus ties from s12 to multiple members of s3 are counted only once |
| Success | Software Downloads | Total number of software downloads |
| | Page Views | Total number of page views |
| | Web Hits | Total number of web hits |
| | Rank | Mean project Rank of May 2008 (Rank is measured bi-monthly) |
| | Trackers Opened | Number of trackers opened 60 days prior to June 2008 |
| | Trackers Closed | Number of trackers closed 60 days prior to June 2008 |
| Controlling for | Age | Age of a project in days |
| | Conversation Volume | Total number of a project's public forum messages |
| | Threads | Total number of a project's public forum threads |
| | Group Size | Group size of a project |
| | Core Size | Core size of a project |
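As an illustration of how the closure and centrality measures listed above could be computed, the following minimal Python sketch applies the formulas to a small toy project. The member names, the tie list and all helper functions are hypothetical and were not part of the research; ties are assumed to be undirected and unique, densities of groups with fewer than two members are assumed to be zero, and administrator class centrality is interpreted here as the number of non-administrators tied to at least one administrator. The bridging measures are left out because they require ties to other projects.

```python
# Illustrative sketch only: the toy project, its member names and the helper
# functions are hypothetical and are not taken from the thesis data.


def ties_within(ties, members):
    """Count undirected ties whose two endpoints both lie in `members`."""
    return sum(1 for a, b in ties if a in members and b in members)


def ties_between(ties, left, right):
    """Count undirected ties with one endpoint in `left` and one in `right`."""
    return sum(
        1
        for a, b in ties
        if (a in left and b in right) or (a in right and b in left)
    )


def density(num_ties, size):
    """One-mode density 2L / (n(n-1)); 0.0 when n < 2 (assumption)."""
    if size < 2:
        return 0.0
    return 2 * num_ties / (size * (size - 1))


def two_mode_density(num_cross_ties, core_size, peripheral_size):
    """Two-mode density Ls1 / (c * s1); 0.0 for an empty subgroup (assumption)."""
    if core_size == 0 or peripheral_size == 0:
        return 0.0
    return num_cross_ties / (core_size * peripheral_size)


def administrator_class_centrality(ties, administrators):
    """Number of non-administrators tied to at least one administrator.

    The administrator subgroup is treated as a single node, so a member
    tied to several administrators is counted only once.
    """
    reached = set()
    for a, b in ties:
        if a in administrators and b not in administrators:
            reached.add(b)
        elif b in administrators and a not in administrators:
            reached.add(a)
    return len(reached)


# Toy project: one administrator, two core developers, three peripheral members.
administrators = {"admin1"}
core_developers = {"dev1", "dev2"}
peripheral = {"user1", "user2", "user3"}
core_group = administrators | core_developers
group = core_group | peripheral

ties = [
    ("admin1", "dev1"),
    ("admin1", "dev2"),
    ("dev1", "user1"),
    ("admin1", "user1"),
    ("dev2", "user2"),
]

group_density = density(ties_within(ties, group), len(group))
core_density = density(ties_within(ties, core_group), len(core_group))
peripheral_density = two_mode_density(
    ties_between(ties, core_group, peripheral), len(core_group), len(peripheral)
)
admin_centrality = administrator_class_centrality(ties, administrators)

print(group_density, core_density, peripheral_density, admin_centrality)
```

In this toy project, five ties among six members are spread over 15 possible pairs, while the core group has two of its three possible internal ties, so the core is denser than the group as a whole.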

D. Brief history of SourceForge.net

SourceForge's roots go back to November 1999, though the roots of its creator VA Linux go even further back in time. In 1993, two graduate students of Stanford University founded VA Research. James Vera and Larry Augustin (note the first letters of their surnames) built and sold personal computers which were equipped with a pre-installed Open Source Operating System called Linux. Because these systems were much cheaper and more user-friendly than the then standard solution of heavy workstations equipped with the Unix Operating System, VA Research grew rapidly.

After 5 years of business, in 1998, VA Research had a market share of approximately 20 percent of the Linux hardware market, worldwide sales of over 100 million dollars, and a profit margin of 10 percent. In the beginning of 1999 VA Research merged with its number one competitor, Linux Hardware Solutions. As a result VA Research changed its name to VA Linux (Systems) (SourceForge, 2007).

In December 1999 VA Linux had its Initial Public Offering (IPO). On the day the company went public its stock rocketed from 30 dollars to nearly 240 dollars, a return of almost 700%, which still makes it one of the most successful IPOs ever (Maguire, 2007).

Just a few months before the IPO, VA Linux got the idea to start SourceForge.net. As Maguire (2007) sets out, VA Linux decided to launch a website to host the work of Open Source software developers, offering an extensive set of tools to support these developers as a totally free service. Within several weeks the SourceForge platform was developed by just seven developers, and it was launched in November 1999.

At the end of 1999 only a small group of people had joined SourceForge, resulting in a small number of projects. This is not surprising, as at that time Open Source software was generally known only to highly skilled technical people. But this changed. At the end of the following year, 2000, SourceForge had several thousand projects on its platform, and at the end of 2001 almost 30,000 projects were registered and active, with hundreds of projects being added per day. As of March 2008, according to SourceForge, it hosted 174,289 registered projects and had 1,827,322 registered users.

Although SourceForge is now highly successful, it almost did not survive the rapid growth of the platform and the lack of income. In the period from 2003 to 2005 the SourceForge site nearly broke down under heavy traffic. Since the original code was not very scalable, most functionality broke.

Surprisingly, at the beginning of 2006 VA Research regained its grip on SourceForge. The staff was increased to nearly 30 developers, and the infrastructure and most of the source code of the SourceForge site were systematically rewritten. From then on the platform flourished as never before.

Several Open Source software projects hosted on the SourceForge platform achieved enormous success. In April 2006, JBoss, an Open Source Java-based application server, was acquired by Red Hat for 350 million dollars (plus approximately 70 million dollars subject to the achievement of certain future performance metrics). The most popular success that was initially hosted on SourceForge is Zimbra, an Open Source collaboration suite that supports email and group calendars. In September 2007, Yahoo acquired Zimbra for a similar amount of approximately 350 million dollars. Another major success is SugarCRM, an Open Source customer relationship management program. Founded in April 2004, SugarCRM raised approximately 26 million dollars of venture capital in three rounds of investment between August 2004 and October 2005. In February 2008 it raised an additional 20 million dollars, for a total of 46 million dollars of funding.

E. SourceForge screenshots

The screenshots in this Appendix provide an overview of important parts of the SourceForge platform. All screenshots were taken on October 14, 2008. The CDex project is taken as an example (project Unix name: cdexos; project ID = 567). In addition to the source URL, a generic URL is provided for looking at other projects.

Figure E.1 - SourceForge platform homepage

Source URL: http://sourceforge.net

Figure E.2 - SourceForge project homepage

Source URL: http://sourceforge.net/projects/cdexos
Generic URL: http://sourceforge.net/projects/xxx where xxx is the project Unix name

Figure E.3 - SourceForge project tracker system overview

Source URL: http://sourceforge.net/tracker2/?group_id=567
Generic URL: http://sourceforge.net/tracker2/?group_id=xxx where xxx is the project ID

Figure E.4 - SourceForge project member page

Source URL: http://sourceforge.net/project/memberlist.php?group_id=567
Generic URL: http://sourceforge.net/project/memberlist.php?group_id=xxx where xxx is the project ID

Figure E.5 - SourceForge project forum topic listing

Source URL: http://sourceforge.net/project/memberlist.php?group_id=567
Generic URL: http://sourceforge.net/project/memberlist.php?group_id=xxx where xxx is the project ID

Figure E.6 - SourceForge project forum discussion text

Source URL: http://sourceforge.net/forum/forum.php?forum_id=1794
Generic URL: http://sourceforge.net/forum/forum.php?forum_id=xxx where xxx is one of the forum IDs of a project group

Figure E.7 - SourceForge project statistics

Source URL: http://sourceforge.net/project/stats/?group_id=567&ugn=cdexos
Generic URL: http://sourceforge.net/project/stats/?group_id=xxx&ugn=yyy where xxx is the project ID and yyy is the project Unix name

F. SRDA Entity-Relationship Diagram

Source: SourceForge Research Data Archive

G. Queries

Here, the most used and most important queries for retrieving information from the SourceForge Research Data Archive are provided. A query consists of two or three parts. After 'SELECT' the columns of a table are listed. After 'FROM' the table is named, and after 'WHERE' an optional restriction can be given.

List of all projects
SELECT DISTINCT b.group_forum_id, b.group_id, b.allow_anonymous
FROM sf0508.forum_group_list b

List of projects which have at least one forum enabled
SELECT DISTINCT a.*
FROM sf0508.groups a
WHERE a.use_forum = 1
ORDER BY a.group_id

List of all projects which allow anonymous posting on their forums
SELECT DISTINCT b.group_forum_id, b.group_id, b.allow_anonymous
FROM sf0508.forum_group_list b
WHERE b.allow_anonymous = 1

List of all projects which allow anonymous posting on their forums and have a minimum of 50 messages on their forums
SELECT DISTINCT b.group_forum_id, b.group_id, b.allow_anonymous, a.count
FROM sf0508.forum_group_list b, sf0508.forum_agg_msg_count a
WHERE a.group_forum_id = b.group_forum_id
  AND a.count > 50
  AND b.allow_anonymous = 1

List of all projects which do not allow anonymous posting on their forums
SELECT DISTINCT b.group_forum_id, b.group_id, b.allow_anonymous
FROM sf0508.forum_group_list b
WHERE b.allow_anonymous = 0

List of all projects excluding all anonymous forums
SELECT DISTINCT a.group_id, a.group_name, a.use_forum
FROM sf0508.groups a, sf0508.forum_group_list b
WHERE a.group_id = b.group_id
  AND b.allow_anonymous = 0
ORDER BY group_id

List of all projects which do not allow anonymous posting on their forums and have a minimum of 50 messages on their forums
SELECT DISTINCT b.group_forum_id, b.group_id, b.allow_anonymous, a.count
FROM sf0508.forum_group_list b, sf0508.forum_agg_msg_count a
WHERE a.group_forum_id = b.group_forum_id
  AND a.count > 50
  AND b.allow_anonymous = 0

Similar; extended version
SELECT DISTINCT a.group_id, a.unix_group_name, a.license, a.register_time, a.use_forum, c.count, d.user_id
FROM sf0508.groups a, sf0508.forum_group_list b, sf0508.forum_agg_msg_count c, sf0508.user_group d
WHERE a.group_id = b.group_id
  AND d.group_id = a.group_id
  AND b.group_forum_id = c.group_forum_id
  AND b.allow_anonymous = 0
  AND c.count > 50
ORDER BY a.group_id

Obtain all forum IDs of projects which successfully meet the selection criteria
SELECT DISTINCT a.group_id, b.group_forum_id, c.count
FROM sf0508.groups a, sf0508.forum_group_list b, sf0508.forum_agg_msg_count c, sf0508.trove_cat d
WHERE d.trove_cat_id = 12
  AND a.group_id = b.group_id
  AND b.group_forum_id = c.group_forum_id
  AND b.allow_anonymous = 0
  AND a.use_forum = 1
  AND c.count > 50
ORDER BY a.group_id
Criteria: uses a forum, no anonymous posting, a minimum of 50 forum messages, mature project stage

Group development status
SELECT DISTINCT a.unix_group_name, a.group_id, a.license, a.register_time, b.trove_group_id, b.trove_cat_id, c.shortname
FROM sf0508.groups a, sf0508.trove_group_link b, sf0508.trove_cat c
WHERE a.group_id = b.group_id
  AND b.trove_cat_id = c.trove_cat_id
  AND b.trove_cat_root = 6
ORDER BY a.group_id
trove_cat_root = 6 means 'development status', trove_cat_id = 6 means 'mature'. Deeplinking is also possible: d.trove_cat_id = 12 as 'mature development status'.

Information of core users per group
SELECT b.user_name, b.user_id, a.group_id, b.admin_flags
FROM sf0508.user_group a,
     (SELECT b.user_name, b.user_id, a.admin_flags
      FROM sf0508.user_group a, sf0508.users b
      WHERE group_id = XXX
        AND a.user_id = b.user_id
      ORDER BY b.user_name) AS b
WHERE a.user_id = b.user_id
ORDER BY b.user_name
(group_id = XXX, where XXX is the project ID)

Messages per group
SELECT DISTINCT b.group_forum_id, b.posted_by, b.subject, b.date, b.is_followup_to, b.thread_id, b.has_followups, b.most_recent_date
FROM sf0508.forum_group_list a, sf0508.forum b
WHERE (a.group_id = XXX)
  AND a.group_forum_id = b.group_forum_id
(a.group_id = XXX, where XXX is the project ID)
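To illustrate how the SELECT, FROM and WHERE parts of these queries work together, the sketch below runs one of the queries above against a small in-memory SQLite database. The table contents are invented for this illustration, the `sf0508.` schema prefix is dropped, and SQLite stands in for the database system behind the SRDA, which is not specified here.

```python
import sqlite3

# Toy stand-ins for two SRDA tables; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_group_list (group_forum_id INTEGER, group_id INTEGER, allow_anonymous INTEGER);
CREATE TABLE forum_agg_msg_count (group_forum_id INTEGER, count INTEGER);
INSERT INTO forum_group_list VALUES (1, 10, 0), (2, 10, 1), (3, 20, 0), (4, 30, 0);
INSERT INTO forum_agg_msg_count VALUES (1, 120), (2, 300), (3, 12), (4, 51);
""")

# Forums without anonymous posting and with more than 50 messages.
query = """
SELECT DISTINCT b.group_forum_id, b.group_id, b.allow_anonymous, a.count
FROM forum_group_list b, forum_agg_msg_count a
WHERE a.group_forum_id = b.group_forum_id
  AND a.count > 50
  AND b.allow_anonymous = 0
ORDER BY b.group_forum_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this toy data, forum 2 is excluded because it allows anonymous posting and forum 3 because it has too few messages. Note that the condition `count > 50` selects forums with strictly more than 50 messages.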

H. FLOSSmole June 2008 SourceForge collection

This appendix shows an overview of the FLOSSmole data files of June 2008. Although the 19 files are compressed into just 24.7MB, they represent 130.0MB of data in plain text format. The files are numbered only so that they can be referred to in the text of this Master Thesis.

FLOSSmole June 2008 SourceForge (SF) collection

#     Filename                                    Description
[1]   sfRawUserIntData2008-Jun.txt.bz2            SF Project User Interfaces
[2]   sfRawTrackerData2008-Jun.txt.bz2            SF Project Tracker Numbers
[3]   sfRawTopicData2008-Jun.txt.bz2              SF Project Topics
[4]   sfRawStatusData2008-Jun.txt.bz2             SF Project Statuses
[5]   sfRawRealUrlData2008-Jun.txt.bz2            SF Project URLs
[6]   sfRawRanksData2008-Jun.txt.bz2              SF Project Ranks
[7]   sfRawProgLangData2008-Jun.txt.bz2           SF Project Programming Languages
[8]   sfRawOpSysData2008-Jun.txt.bz2              SF Project Operating Systems
[9]   sfRawLicenseData2008-Jun.txt.bz2            SF Project License Types
[10]  sfRawIntAudData2008-Jun.txt.bz2             SF Project Intended Audiences
[11]  sfRawForumPosts2008-Jun.txt.bz2             SF Project Forum Post Numbers
[12]  sfRawDownloadsData2008-Jun.txt.bz2          SF Project Downloads
[13]  sfRawDonorsData2008-Jun.txt.bz2             SF Project Donors
[14]  sfRawDeveloperProjectData2008-Jun.txt.bz2   SF Developers on Projects
[15]  sfRawDeveloperData2008-Jun.txt.bz2          SF Developers
[16]  sfRawDbEnvData2008-Jun.txt.bz2              SF Project Database Environments
[17]  sfProjectList2008-Jun.txt.bz2               SF Project List
[18]  sfProjectInfo2008-Jun.txt.bz2               SF Project Information
[19]  sfProjectDesc2008-Jun.txt.bz2               SF Project Descriptions

Source: http://code.google.com/p/flossmole/downloads/list , last consulted February 27, 2009

If one would like to look up data on Open Source software hosting platforms for non-research purposes, it is suggested to use FLOSSmole, as these data are freely available, in contrast to the SourceForge Research Data Archive (SRDA), for which access first needs to be requested.

Here, the provided sample data are limited due to SRDA terms and conditions, and to preserve privacy of the SourceForge.net users.

I. Bug trackers

Here, a closer look is taken at the bug tracker system. The tracker system is a system where any project developer can report a bug by opening a tracker. A core developer or an administrator can close the tracker when the bug is solved or appropriate action is taken. Because these statistics need to be retrieved manually per month for every project group, which is an extensive job, the focus is first on the trackers of May 2008, parallel to the timeframe of the other research data.

Of the 96 projects, 93 (96.9%) have an active bug tracker system. A tracker system is registered as active when one of the project administrators has enabled the system and at least one tracker has been opened. To gain insight into the tracker system, several descriptive statistics are shown in Table I.1.

Table I.1 - SourceForge sample bug tracker statistics, May 2008

Trackers opened   Frequency      Trackers closed   Frequency
0                 63             0                 73
1                 8              1                 2
2-5               9              2-5               8
6-10              8              6-10              4
11+               5              11+               6

Total             93 projects    Total             93 projects

                      Trackers opened   Trackers closed
Mean                  2.32              2.05
Std. error of mean    .620              .624
Std. deviation        5.975             6.013
Variance              35.699            36.160
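The descriptive statistics in Table I.1 are linked by two standard identities: the standard error of the mean equals the standard deviation divided by the square root of N, and the variance equals the squared standard deviation. The sketch below checks the reported values against these identities; the variable and function names are chosen for this illustration only, and small differences are to be expected because the reported values are rounded.

```python
import math

N_PROJECTS = 93  # projects with an active tracker system (Table I.1)

# Values as reported in Table I.1.
reported = {
    "opened": {"sd": 5.975, "se": 0.620, "variance": 35.699},
    "closed": {"sd": 6.013, "se": 0.624, "variance": 36.160},
}


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


for name, row in reported.items():
    se = standard_error(row["sd"], N_PROJECTS)
    variance = row["sd"] ** 2
    print(f"{name}: SE = {se:.3f} (reported {row['se']:.3f}), "
          f"variance = {variance:.3f} (reported {row['variance']:.3f})")
```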

Noticeable is the inactivity of the majority of the trackers during May 2008: 63 (67.7%) projects did not open a tracker and 73 (78.5%) projects did not close a tracker during May. In comparison, Hinds (2008) found a high correlation (Pearson correlation = .86) between trackers opened and trackers closed during a two-year period. The May 2008 data have a much lower Pearson correlation coefficient of .547. It is therefore suspected that a longer timeframe is needed to make correct use of the tracker statistics. It is plausible that trackers are not closed in the same month in which they were opened, or that tracker activity is higher just after a new software release or major update is published. Thus, extended research was conducted to improve the accuracy of the tracker data, covering a timeframe of five months, from January 2008 to May 2008.

Table I.2 - SourceForge sample tracker statistics, January 2008 - May 2008

                  Trackers opened            Trackers closed
                  Mean     Std. deviation    Mean     Std. deviation
January 2008      3.04     10.634            2.63     8.720
February 2008     3.02     8.095             3.06     10.768
March 2008        3.11     8.503             3.22     10.794
April 2008        3.16     7.737             3.16     10.059
May 2008          2.32     5.975             2.05     6.013

Total             14.65    38.308            14.13    37.136

(N = 93 projects)

Table I.2 shows there is not much difference between trackers opened and trackers closed during the first five months of 2008. The Pearson correlation coefficient for the total of these five months is .898, which is much higher than for May 2008 alone. It is now plausible that there is indeed a logical connection between trackers opened and trackers closed: a tracker must be opened before it can be closed. Observation of these monthly statistics, however, reveals further problems. The data of May 2008 indicate that the majority of the project groups did not close or open a tracker at all. If these data are expanded to the five-month timeframe of 2008, it is noticeable that 36 (38.7%) projects have no tracker activity at all.
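For reference, the Pearson correlation coefficient used in this comparison can be computed as in the minimal sketch below. The per-project counts are hypothetical and serve only to show the calculation; they are not the thesis data, and the function name is an assumption of this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Hypothetical trackers opened and closed per project over one period.
opened = [0, 0, 1, 3, 7, 12]
closed = [0, 1, 0, 2, 9, 10]
r = pearson(opened, closed)
print(round(r, 3))
```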

Table I.3 - SourceForge sample tracker statistics, January 2008 - May 2008 (active projects)

                  Trackers opened            Trackers closed
                  Mean     Std. deviation    Mean     Std. deviation
January 2008      4.96     13.269            4.30     10.846
February 2008     4.93     9.903             5.00     13.440
March 2008        5.07     10.423            5.25     13.438
April 2008        5.16     9.373             5.16     12.480
May 2008          3.79     7.279             3.35     7.415

Total             23.89    46.754            23.05    45.346

(N = 57 projects)

Without the inactive projects, the monthly tracker statistics are much higher, as can be seen in Table I.3. Now, the Pearson correlation is .887, which is comparable with the previous statistic. It appears that, for unknown reasons, May 2008 is a slightly less active month than the other measured months of 2008.

Observation of the bug tracker statistics shows that some projects only close trackers when a software update is released, or close all opened trackers at a certain point in time. It is plausible that bugs are also reported via other communication channels, for example by email or a digital form. Observation shows that in various projects bugs and problems are reported via the project's public forums.

Here, the high tracker inactivity of projects and the high correlation between trackers opened and trackers closed make these variables unusable; they are therefore eliminated.

An alternative measure for future research may be the mean time between the opening and closing of a tracker. This can be an indication of the activity (response time) of the core group. Though this is not a perfect solution for measuring activity, it can be very useful together with other activity or success measures, especially for larger projects. Bug reports would then need to be identified, which unfortunately was not an option here. It should also be noted that bugs are not systematic errors, and therefore this alternative measure can only be used as an indication, as the number of bugs and the quality of bug repairs may vary heavily between projects. Sometimes bugs are not noticed at all, or only after a very long time. Furthermore, critical bugs may be solved faster than non-critical bugs, and a solution cannot always be found easily. New bugs may even be introduced by solving an old one.
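The alternative measure suggested above could be computed along the following lines. This is only a sketch: the record layout (pairs of opening and closing timestamps, with `None` for trackers that are still open) and all dates are assumptions made for this illustration, not a format provided by SourceForge.

```python
from datetime import datetime
from statistics import mean


def mean_days_to_close(trackers):
    """Mean number of days between opening and closing, ignoring open trackers."""
    durations = [
        (closed - opened).total_seconds() / 86400
        for opened, closed in trackers
        if closed is not None
    ]
    return mean(durations) if durations else None


# Hypothetical trackers of one project: (opened, closed or None).
trackers = [
    (datetime(2008, 1, 1), datetime(2008, 1, 4)),
    (datetime(2008, 1, 10), datetime(2008, 1, 11)),
    (datetime(2008, 2, 1), None),  # still open, so excluded from the mean
]
print(mean_days_to_close(trackers))
```

Excluding trackers that are still open biases the measure towards quickly solved bugs, which is one more reason to treat it as an indication only.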

J. SourceForge project ranking formula

Below the SourceForge project ranking formula is shown. The ranking formula is composed of three individual measures: traffic, development and communication.

SourceForge project ranking formula

Traffic:
(   (log(prior 7 days download total + 1) / log(highest all-project download total + 1))
  + (log(prior 7 days logo hits total + 1) / log(highest all-project logo hits + 1))
  + (log(prior 7 days site hits total + 1) / log(highest all-project site hits + 1))
) / 3

Development:
(   (log(prior 7 days cvs commit total + 1) / log(highest all-project total + 1))
  + ((100 - age of latest file release (in days, max 100)) / 100)
  + ((100 - days since last project administrator login (max 100)) / 100)
) / 3

Communication:
(   (log(prior 7 days Tracker submission count + 1) / log(highest all-project total + 1))
  + (log(prior 7 days ML post count + 1) / log(highest all-project total + 1))
  + (log(prior 7 days Forum post count + 1) / log(highest all-project total + 1))
) / 3

The Traffic component is only considered for projects with File Releases. The Communication component is only considered for projects that have categorized themselves within the Software Map (Trove categorization). Tools that use an all-time ranking instead of a weekly ranking, such as search results, will use an aggregate tool item count rather than the data from the past 7 days, as displayed above.

total = traffic + development + communication

Source: http://alexandria.wiki.sourceforge.net/Statistics , last consulted February 27, 2009
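The ranking formula above translates almost directly into code. The sketch below is one possible reading of it, not SourceForge's implementation: the function and parameter names are invented for this illustration, and the handling of a zero all-project maximum is an assumption, as the formula does not cover that case.

```python
import math


def log_share(value: float, highest: float) -> float:
    """log(value + 1) / log(highest + 1), the normalization used in every log term."""
    if highest <= 0:  # assumption: no activity on any project gives a zero score
        return 0.0
    return math.log(value + 1) / math.log(highest + 1)


def recency(days: float, cap: int = 100) -> float:
    """(100 - days) / 100 with days capped at 100, so the score lies in [0, 1]."""
    return (cap - min(days, cap)) / cap


def traffic(downloads, logo_hits, site_hits, max_downloads, max_logo_hits, max_site_hits):
    return (log_share(downloads, max_downloads)
            + log_share(logo_hits, max_logo_hits)
            + log_share(site_hits, max_site_hits)) / 3


def development(commits, max_commits, release_age_days, days_since_admin_login):
    return (log_share(commits, max_commits)
            + recency(release_age_days)
            + recency(days_since_admin_login)) / 3


def communication(tracker_items, ml_posts, forum_posts, max_tracker, max_ml, max_forum):
    return (log_share(tracker_items, max_tracker)
            + log_share(ml_posts, max_ml)
            + log_share(forum_posts, max_forum)) / 3


def total_score(traffic_score, development_score, communication_score):
    return traffic_score + development_score + communication_score


# Hypothetical weekly figures for one project and the all-project maxima.
score = total_score(
    traffic(500, 2000, 3000, 100000, 500000, 800000),
    development(12, 400, release_age_days=30, days_since_admin_login=2),
    communication(4, 10, 25, 300, 1500, 2000),
)
print(round(score, 3))
```

Because every term is normalized to the range from 0 to 1 and each component is a mean of three terms, the total score lies between 0 and 3.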

K. Normality checks

[Normality check plots. One page per independent variable (Group Density, Core Density, Peripheral Two-Mode Density, Core Membership Degree, Administrator Membership Degree, Administrator Class Centrality), each with six panels: Software Downloads, Page Views, Web Hits, Project Rank, Trackers Opened, Trackers Closed.]

L. PP plots

[PP plots. One page per independent variable (Group Density, Core Density, Peripheral Two-Mode Density, Core Membership Degree, Administrator Membership Degree, Administrator Class Centrality), each with six panels: Software Downloads, Page Views, Web Hits, Project Rank, Trackers Opened, Trackers Closed.]

M. One-Sample Kolmogorov-Smirnov tests

A Monte Carlo method with 1,000,000 sample tables was performed (starting seed: 1,993,510,611). For each dependent variable the Kolmogorov-Smirnov Z-scores are calculated. Table M.1 provides the results of the tests for the untransformed variables, whereas Table M.2 provides the results of the tests for the natural log-transformed variables. First, the Z-score and its significance level are provided. Then, the Monte Carlo significance level is shown, along with the lower and upper bounds of this significance level.

Table M.1 - One-Sample Kolmogorov-Smirnov tests for the untransformed dependent variables

Untransformed variables

Dependent variable    Kolmogorov-Smirnov Z-score   Sig. Level   MC Sig. Level   MC Lower Bound   MC Upper Bound
Software Downloads    3.921                        .000         .000            .000             .000
Page Views            3.508                        .000         .000            .000             .000
Web Hits              3.876                        .000         .000            .000             .000
Rank                  2.595                        .000         .000            .000             .000
Trackers Opened       3.386                        .000         .000            .000             .000
Trackers Closed       3.393                        .000         .000            .000             .000

Table M.2 - One-Sample Kolmogorov-Smirnov tests for the transformed dependent variables

Transformed variables

Dependent variable    Kolmogorov-Smirnov Z-score   Sig. Level   MC Sig. Level   MC Lower Bound   MC Upper Bound
Software Downloads    .682                         .741         .714            .713             .715
Page Views            .662                         .774         .747            .746             .748
Web Hits              1.315                        .063         .057            .057             .058
Rank                  .811                         .526         .500            .498             .501
Trackers Opened       .000                         .000         .000            .000             .000
Trackers Closed       .000                         .000         .000            .000             .000

In cases where the dependent variable had a zero ('0') value, the log transformation of that value would result in a missing value. In these cases the value of the log transformation was set to zero as well.
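The transformation rule described above and the Kolmogorov-Smirnov Z-score can be sketched as follows. The sketch assumes that the Z-score is the square root of N times the largest distance D between the empirical distribution and a normal distribution fitted with the sample mean and standard deviation, as is common in statistical packages. It does not reproduce the Monte Carlo significance levels, and the download counts are hypothetical.

```python
import math
from statistics import mean, stdev


def log_or_zero(value: float) -> float:
    """Natural log, with a zero value mapped to zero as described in the text."""
    return math.log(value) if value > 0 else 0.0


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_z(sample) -> float:
    """sqrt(n) * D, with D the largest gap between the empirical CDF and the
    normal CDF fitted with the sample mean and standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


# Hypothetical, heavily skewed download counts (not thesis data).
downloads = [0, 3, 8, 15, 40, 120, 900, 5000, 42000, 310000]
z_raw = ks_z(downloads)
z_log = ks_z([log_or_zero(v) for v in downloads])
print(round(z_raw, 3), round(z_log, 3))
```

One consequence of the rule is that a value of 0 and a value of 1 both map to 0 after transformation, since the natural log of 1 is also 0.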

N. QQ plots and detrended QQ plots

The first two pages of this Appendix provide the QQ plots and detrended QQ plots for the dependent variables. The last two pages of this Appendix provide the QQ plots and detrended QQ plots for the natural log-transformed dependent variables.

[QQ plots and detrended QQ plots. For both the untransformed and the natural log-transformed dependent variables, one QQ plot and one detrended QQ plot are given for each of: Software Downloads, Page Views, Web Hits, Project Rank, Trackers Opened, Trackers Closed.]

O. Homoscedasticity checks

Here, the homoscedasticity checks are provided.

[Homoscedasticity check plots. One page for the control variables and one page per independent variable (Group Density, Core Density, Peripheral Two-Mode Density, Core Membership Degree, Administrator Membership Degree, Administrator Class Centrality), each with three panels: Software Downloads, Page Views, Web Hits.]

P. Multicollinearity checks

This appendix provides an overview of the multicollinearity checks. The detailed regression analyses can be found in Appendix Q. Next to the Tolerance, its reciprocal, the Variance Inflation Factor (VIF), is shown.

$\text{Tolerance} = 1 - R^2_{x,\text{other}}$

$\text{VIF} = 1 / \text{Tolerance}$
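The relation between the two measures can be checked with a few lines of code. The sketch below uses the tolerance of Group Size in Model 1 of Hypothesis 1a as reported in this appendix; the function names are chosen for this illustration only.

```python
def tolerance(r_squared: float) -> float:
    """1 - R², where R² comes from regressing a predictor on the other predictors."""
    return 1.0 - r_squared


def vif(tol: float) -> float:
    """Variance Inflation Factor, the reciprocal of the tolerance."""
    return 1.0 / tol


# Tolerance of Group Size, Model 1, Hypothesis 1a (reported VIF: 3.458).
reported_tolerance = 0.289
implied_r_squared = 1.0 - reported_tolerance
print(round(implied_r_squared, 3), round(vif(reported_tolerance), 3))
```

The computed value differs slightly from the reported VIF because the reported tolerance is rounded to three decimals.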

Hypothesis 1a - Log-transformed software downloads regressed on group density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.458   .279       3.586   .263       3.809
Core Size                        .950       1.052   .950       1.053   .948       1.055
Con. Volume                      .300       3.338   .299       3.347   .297       3.363
Age                              .943       1.060   .918       1.089   .899       1.112
Group Density                                       .885       1.130   .220       4.540
Group Density (m-c and squared)                                        .246       4.072

Hypothesis 1b - Log-transformed page views regressed on group density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .278       3.591   .262       3.814
Core Size                        .950       1.053   .950       1.053   .947       1.056
Con. Volume                      .299       3.342   .298       3.351   .297       3.368
Age                              .945       1.059   .920       1.087   .902       1.109
Group Density                                       .887       1.128   .222       4.505
Group Density (m-c and squared)                                        .247       4.046

Hypothesis 1c - Log-transformed web hits regressed on group density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .290       3.451   .279       3.581   .263       3.809
Core Size                        .946       1.057   .946       1.057   .945       1.059
Con. Volume                      .300       3.336   .299       3.345   .297       3.363
Age                              .943       1.060   .920       1.087   .908       1.102
Group Density                                       .888       1.126   .240       4.162
Group Density (m-c and squared)                                        .267       3.739

Hypothesis 2a - Log-transformed software downloads regressed on core density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.458   .292       3.424   .292       3.425
Core Size                        .950       1.052   .893       1.120   .829       1.207
Con. Volume                      .300       3.338   .300       3.334   .299       3.341
Age                              .943       1.060   .875       1.143   .870       1.149
Core Density                                        .842       1.187   .842       1.187
Core Density (m-c and squared)                                         .915       1.093


Hypothesis 2b - Log-transformed page views regressed on core density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .292       3.427   .292       3.428
Core Size                        .950       1.053   .890       1.124   .826       1.211
Con. Volume                      .299       3.342   .300       3.334   .299       3.342
Age                              .945       1.059   .875       1.143   .870       1.149
Core Density                                        .841       1.189   .841       1.189
Core Density (m-c and squared)                                         .912       1.096

Hypothesis 2c - Log-transformed web hits regressed on core density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .290       3.451   .293       3.410   .293       3.410
Core Size                        .946       1.057   .889       1.125   .825       1.213
Con. Volume                      .300       3.336   .301       3.324   .300       3.333
Age                              .943       1.060   .872       1.147   .870       1.149
Core Density                                        .840       1.191   .839       1.191
Core Density (m-c and squared)                                         .911       1.097

Hypothesis 3a - Log-transformed software downloads regressed on peripheral two-mode density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.458   .282       3.547   .273       3.669
Core Size                        .950       1.052   .855       1.170   .791       1.264
Con. Volume                      .300       3.338   .298       3.352   .294       3.402
Age                              .943       1.060   .873       1.145   .837       1.195
PT Density                                          .757       1.321   .261       3.829
PT Density (m-c and squared)                                           .335       2.986

Hypothesis 3b - Log-transformed page views regressed on peripheral two-mode density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .281       3.554   .272       3.675
Core Size                        .950       1.053   .852       1.174   .787       1.271
Con. Volume                      .299       3.342   .298       3.357   .294       3.405
Age                              .945       1.059   .877       1.140   .845       1.184
PT Density                                          .756       1.322   .268       3.737
PT Density (m-c and squared)                                           .344       2.909


Hypothesis 3c - Log-transformed web hits regressed on peripheral two-mode density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .290       3.451   .282       3.540   .273       3.665
Core Size                        .946       1.057   .853       1.172   .787       1.270
Con. Volume                      .300       3.336   .299       3.350   .294       3.398
Age                              .943       1.060   .867       1.154   .828       1.207
PT Density                                          .750       1.334   .259       3.857
PT Density (m-c and squared)                                           .336       2.980

Hypothesis 4a - Log-transformed software downloads regressed on core membership degree

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.458   .289       3.465   .288       3.467
Core Size                        .950       1.052   .937       1.067   .937       1.067
Con. Volume                      .300       3.338   .299       3.339   .299       3.346
Age                              .943       1.060   .934       1.070   .888       1.126
CMD                                                 .975       1.025   .273       3.668
CMD (m-c and squared)                                                  .274       3.649

Hypothesis 4b - Log-transformed page views regressed on core membership degree

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .288       3.469   .288       3.470
Core Size                        .950       1.053   .939       1.066   .938       1.066
Con. Volume                      .299       3.342   .299       3.344   .298       3.351
Age                              .945       1.059   .937       1.068   .894       1.119
CMD                                                 .979       1.022   .278       3.597
CMD (m-c and squared)                                                  .279       3.588

Hypothesis 4c - Log-transformed web hits regressed on core membership degree

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .290       3.451   .289       3.458   .289       3.460
Core Size                        .946       1.057   .936       1.069   .935       1.069
Con. Volume                      .300       3.336   .300       3.337   .299       3.344
Age                              .943       1.060   .938       1.066   .900       1.111
CMD                                                 .981       1.020   .274       3.644
CMD (m-c and squared)                                                  .274       3.644


Hypothesis 5a - Log-transformed software downloads regressed on administrator membership degree

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.458   .288       3.469   .288       3.477
Core Size                        .950       1.052   .949       1.054   .944       1.060
Con. Volume                      .300       3.338   .297       3.365   .292       3.419
Age                              .943       1.060   .937       1.067   .896       1.116
AMD                                                 .987       1.104   .349       2.868
AMD (m-c and squared)                                                  .347       2.879

Hypothesis 5b - Log-transformed page views regressed on administrator membership degree

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .288       3.473   .287       3.479
Core Size                        .950       1.053   .949       1.054   .942       1.061
Con. Volume                      .299       3.342   .297       3.370   .292       3.422
Age                              .945       1.059   .939       1.065   .901       1.110
AMD                                                 .987       1.013   .353       2.837
AMD (m-c and squared)                                                  .351       2.848

Hypothesis 5c - Log-transformed web hits regressed on administrator membership degree

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .290       3.451   .289       3.462   .288       3.470
Core Size                        .946       1.057   .945       1.058   .938       1.066
Con. Volume                      .300       3.336   .297       3.363   .293       3.417
Age                              .943       1.060   .939       1.065   .905       1.105
AMD                                                 .988       1.012   .350       2.858
AMD (m-c and squared)                                                  .348       2.875

Hypothesis 6a - Log-transformed software downloads regressed on administrator class centrality

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.458   .283       3.536   .283       3.537
Core Size                        .950       1.052   .944       1.059   .931       1.074
Con. Volume                      .300       3.338   .299       3.340   .298       3.351
Age                              .943       1.060   .782       1.279   .776       1.288
ACC                                                 .765       1.307   .704       1.420
ACC (m-c and squared)                                                  .892       1.121


Hypothesis 6b - Log-transformed page views regressed on administrator class centrality

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .283       3.539   .282       3.541
Core Size                        .950       1.053   .944       1.060   .933       1.072
Con. Volume                      .299       3.342   .299       3.344   .298       3.356
Age                              .945       1.059   .782       1.280   .776       1.289
ACC                                                 .764       1.308   .706       1.416
ACC (m-c and squared)                                                  .898       1.114

Hypothesis 6c - Log-transformed web hits regressed on administrator class centrality

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .290       3.451   .284       3.526   .284       3.527
Core Size                        .946       1.057   .939       1.065   .929       1.057
Con. Volume                      .300       3.336   .300       3.337   .299       3.349
Age                              .943       1.060   .782       1.279   .775       1.291
ACC                                                 .768       1.303   .710       1.409
ACC (m-c and squared)                                                  .900       1.112

Hypothesis 1d - Log-transformed project rank regressed on group density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .278       3.591   .262       3.814
Core Size                        .950       1.053   .950       1.053   .947       1.056
Con. Volume                      .299       3.342   .298       3.351   .297       3.368
Age                              .945       1.059   .920       1.087   .902       1.109
Group Density                                       .887       1.128   .222       4.505
Group Density (m-c and squared)                                        .247       4.046

Hypothesis 2d - Log-transformed project rank regressed on core density

                                 Model 1            Model 2            Model 3
Variables                        Tolerance  VIF     Tolerance  VIF     Tolerance  VIF
Group Size                       .289       3.462   .292       3.428   .292       3.428
Core Size                        .950       1.053   .826       1.211   .826       1.211
Con. Volume                      .299       3.342   .299       3.342   .299       3.342
Age                              .945       1.059   .870       1.149   .870       1.149
Core Density                                        .841       1.189   .841       1.189
Core Density (m-c and squared)                      .912       1.096   .912       1.096


Hypothesis 3d - Log-transformed project rank regressed on peripheral two-mode density Model 1 Model 2 Model 3 Variables Tolerance VIF Tolerance VIF Tolerance VIF Group Size .289 3.462 .281 3.554 .272 3.675 Core Size .950 1.053 .852 1.174 .787 1.271 Con. Volume .299 3.342 .298 3.357 .294 3.405 Age .945 1.059 .877 1.140 .845 1.184 PT Density .756 1.322 .268 3.737 PT Density .344 2.909 (m-c and squared)

Hypothesis 4d - Log-transformed project rank regressed on core membership degree Model 1 Model 2 Model 3 Variables Tolerance VIF Tolerance VIF Tolerance VIF Group Size .289 3.462 .288 3.469 .288 3.470 Core Size .950 1.053 .939 1.066 .938 1.066 Con. Volume .299 3.342 .299 3.344 .298 3.351 Age .945 1.059 .937 1.068 .894 1.119 CMD .979 1.022 .278 3.597 CMD .279 3.588 (m-c and squared)

Hypothesis 5d - Log-transformed project rank regressed on administrator membership degree Model 1 Model 2 Model 3 Variables Tolerance VIF Tolerance VIF Tolerance VIF Group Size .289 3.462 .288 3.473 .287 3.479 Core Size .950 1.053 .949 1.054 .942 1.061 Con. Volume .299 3.342 .297 3.370 .292 3.422 Age .945 1.059 .939 1.065 .901 1.110 AMD .987 1.013 .353 2.837 AMD .351 2.848 (m-c and squared)

Hypothesis 6d - Log-transformed project rank regressed on administrator class centrality

                                 Model 1             Model 2             Model 3
Variables                        Tolerance  VIF      Tolerance  VIF      Tolerance  VIF
Group Size                       .289       3.462    .283       3.539    .282       3.541
Core Size                        .950       1.053    .944       1.060    .933       1.072
Con. Volume                      .299       3.342    .299       3.344    .298       3.356
Age                              .945       1.059    .782       1.280    .776       1.289
ACC                                                  .764       1.308    .706       1.416
ACC (m-c and squared)                                                    .898       1.114
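
The Tolerance and VIF columns in these tables carry the same information. By the standard definitions, the tolerance of a predictor is 1 − R² from regressing that predictor on the other predictors in the model, and the VIF is the reciprocal of the tolerance. The following minimal Python sketch illustrates this relationship with two tolerance values taken from the Hypothesis 1d table (Model 3). The function names are illustrative assumptions and not part of the original analysis; small deviations from the tabulated VIFs stem from rounding of the printed tolerances.

```python
def tolerance_from_r_squared(r_squared: float) -> float:
    """Tolerance of a predictor, given the R² of regressing it on the other predictors."""
    return 1.0 - r_squared


def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor as the reciprocal of the tolerance."""
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must lie in the interval (0, 1]")
    return 1.0 / tolerance


# Rounded tolerances from the Hypothesis 1d table, Model 3.
for name, tolerance in [("Group Size", 0.262), ("Group Density", 0.222)]:
    print(f"{name}: tolerance = {tolerance:.3f}, VIF = {vif_from_tolerance(tolerance):.3f}")
```

A lower tolerance (higher VIF) indicates that more of a predictor's variance is shared with the other predictors in the model.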


Q. Detailed regression analyses

This appendix provides the detailed regression analyses. For each hypothesis, the dependent variable is first regressed on the control variables only (Model 1). Next, the independent variable is added and the regression is repeated (Model 2). The third model additionally includes a transformation of the independent variable, which is mean-centered and squared (Model 3). The tables report the unstandardized coefficients (B, with the Std. Error in parentheses) rather than the standardized coefficients (Beta). Significant values are marked with asterisks '*'.
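
As an illustration of the transformation used in Model 3, the sketch below mean-centers a variable and squares the result. Centering before squaring is a common way to reduce the correlation between the linear and the squared term. The function name and the density values are hypothetical and serve only as an example; they are not taken from the thesis data.

```python
from statistics import mean


def mean_centered_square(values):
    """Subtract the sample mean from every value and square the difference."""
    centre = mean(values)
    return [(value - centre) ** 2 for value in values]


# Hypothetical density scores for four projects (illustrative only).
density = [0.10, 0.20, 0.30, 0.40]
density_squared = mean_centered_square(density)

for original, transformed in zip(density, density_squared):
    print(f"density = {original:.2f}, mean-centered and squared = {transformed:.4f}")
```

In a Model 3 regression, both the original variable and the transformed variable would enter as predictors alongside the control variables.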

Table Q.1 - Hypothesis 1a - Log-transformed software downloads regressed on group density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.004*** (.868)      8.343*** (.865)      9.133*** (.894)
Group Size                       .002** (.001)        .002* (.001)         .001 (.001)
Core Size                        .082* (.037)         .080* (.034)         .084* (.033)
Con. Volume                      -.000028 (.000)      -.000014 (.000)      -.000002 (.000)
Age                              .000 (.000)          .000 (.000)          .000* (.000)
Group Density                                         -11.287*** (2.764)   -23.228*** (5.373)
Group Density (m-c and squared)                                            .291* (.114)

R²                               .239                 .360                 .405
F-Statistic                      6.983***             9.905***             9.873***
Adjusted R²                      .205                 .324                 .364
ΔR²                                                   .121                 .045
Δ F-Statistic                                         2.922                -.032
N (projects)                     93                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001
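
The summary rows of Table Q.1 can be cross-checked with the textbook formulas for adjusted R² and the overall F-statistic. In this table, the ΔR² and Δ F-Statistic rows equal the differences between consecutive models. The sketch below is a hedged illustration, assuming that k counts the predictors excluding the constant; the function names are hypothetical. Because it starts from the rounded R² values printed in the table, the results match the reported figures only approximately.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R² for n observations and k predictors (constant excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-statistic of a regression with n observations and k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Rounded values from Table Q.1: (model, R², reported F, N, assumed number of predictors k).
models = [
    ("Model 1", 0.239, 6.983, 93, 4),
    ("Model 2", 0.360, 9.905, 93, 5),
    ("Model 3", 0.405, 9.873, 93, 6),
]

previous = None
for name, r2, f_reported, n, k in models:
    line = (f"{name}: adjusted R2 = {adjusted_r_squared(r2, n, k):.3f}, "
            f"F = {f_statistic(r2, n, k):.3f}")
    if previous is not None:
        # Differences between consecutive models, as in the delta rows of the table.
        line += (f", delta R2 = {r2 - previous[0]:.3f}, "
                 f"delta F = {f_reported - previous[1]:.3f}")
    print(line)
    previous = (r2, f_reported)
```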

Table Q.2 - Hypothesis 1b - Log-transformed page views regressed on group density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       9.220*** (.727)      10.538*** (.697)     11.205*** (.715)
Group Size                       .002** (.001)        .002** (.001)        .001 (.001)
Core Size                        .096** (.031)        .095*** (.028)       .098*** (.027)
Con. Volume                      -.000057 (.000)      -.000042 (.000)      -.000032 (.000)
Age                              .000 (.000)          .000** (.000)        .000* (.000)
Group Density                                         -11.200*** (2.226)   -21.413*** (4.295)
Group Density (m-c and squared)                                            .250** (.091)

R²                               .286                 .433                 .486
F-Statistic                      9.114***             14.304***            14.044***
Adjusted R²                      .255                 .412                 .452
ΔR²                                                   .157                 .043
Δ F-Statistic                                         5.190                -.260
N (projects)                     95                   95                   95

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.3 - Hypothesis 1c - Log-transformed web hits regressed on group density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       6.307*** (1.406)     7.082*** (1.110)     7.251*** (1.177)
Group Size                       .002* (.001)         .002* (.001)         .002 (.001)
Core Size                        .066 (.044)          .065 (.043)          .066 (.044)
Con. Volume                      -.000098 (.000)      -.000089 (.000)      -.000086 (.000)
Age                              .001* (.000)         .001 (.000)          .001 (.000)
Group Density                                         -7.053 (3.727)       -9.811 (7.200)
Group Density (m-c and squared)                                            .067 (.149)

R²                               .202                 .233                 .235
F-Statistic                      5.633***             5.353***             4.454***
Adjusted R²                      .166                 .190                 .182
ΔR²                                                   .031                 .002
Δ F-Statistic                                         -.280                -.899
N (projects)                     93                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.4 - Hypothesis 2a - Log-transformed software downloads regressed on core density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.004*** (.868)      7.100*** (1.285)     6.805*** (1.331)
Group Size                       .002** (.001)        .002* (.001)         .002 (.001)
Core Size                        .082* (.037)         .085* (.042)         .095 (.044)
Con. Volume                      -.000028 (.000)      -.000014 (.000)      -.000011 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
Core Density                                          -.404 (.803)         -.399 (.804)
Core Density (m-c and squared)                                             .306 (.354)

R²                               .239                 .255                 .263
F-Statistic                      6.983***             4.933***             4.221***
Adjusted R²                      .205                 .203                 .201
ΔR²                                                   .016                 .008
Δ F-Statistic                                         -2.050               -.712
N (projects)                     93                   77                   77

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.5 - Hypothesis 2b - Log-transformed page views regressed on core density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       9.220*** (.727)      9.385*** (1.089)     9.209*** (1.130)
Group Size                       .002** (.001)        .002** (.001)        .002** (.001)
Core Size                        .096** (.031)        .098** (.036)        .104** (.037)
Con. Volume                      -.000057 (.000)      -.000047 (.000)      -.000045 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
Core Density                                          -.341 (.674)         -.343 (.677)
Core Density (m-c and squared)                                             .184 (.300)

R²                               .286                 .298                 .302
F-Statistic                      9.114***             6.199***             5.184***
Adjusted R²                      .255                 .250                 .244
ΔR²                                                   .012                 .004
Δ F-Statistic                                         -2.915               -1.015
N (projects)                     95                   78                   78

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.6 - Hypothesis 2c - Log-transformed web hits regressed on core density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       6.307*** (1.046)     7.250*** (1.497)     7.553*** (1.566)
Group Size                       .002* (.001)         .002* (.001)         .002* (.001)
Core Size                        .066 (.044)          .054 (.048)          .045 (.050)
Con. Volume                      -.000098 (.000)      .000 (.000)          .000 (.000)
Age                              .001* (.000)         .001 (.001)          .001 (.001)
Core Density                                          -.499 (.905)         -.497 (.909)
Core Density (m-c and squared)                                             -.283 (.410)

R²                               .202                 .206                 .211
F-Statistic                      5.633***             3.679**              3.122**
Adjusted R²                      .166                 .150                 .144
ΔR²                                                   .004                 .005
Δ F-Statistic                                         -1.954               -.577
N (projects)                     93                   76                   76

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.7 - Hypothesis 3a - Log-transformed software downloads regressed on peripheral two-mode density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.004*** (.868)      7.276*** (1.070)     7.780*** (1.191)
Group Size                       .002** (.001)        .002* (.001)         .002* (.001)
Core Size                        .082* (.037)         .078* (.038)         .067 (.040)
Con. Volume                      -.000028 (.000)      -.000022 (.000)      -.000013 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
PT Density                                            -.594 (1.208)        -2.198 (2.058)
PT Density (m-c and squared)                                               .182 (.189)

R²                               .239                 .245                 .253
F-Statistic                      6.983***             5.635***             4.847***
Adjusted R²                      .205                 .201                 .201
ΔR²                                                   .006                 .008
Δ F-Statistic                                         -1.348               -.788
N (projects)                     93                   92                   92

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.8 - Hypothesis 3b - Log-transformed page views regressed on peripheral two-mode density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       9.220*** (.727)      9.423*** (.892)      9.847*** (.989)
Group Size                       .002** (.001)        .002** (.001)        .002** (.001)
Core Size                        .096** (.031)        .093** (.032)        .084* (.033)
Con. Volume                      -.000057 (.000)      -.000052 (.000)      -.000044 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
PT Density                                            -.455 (1.007)        -1.807 (1.693)
PT Density (m-c and squared)                                               .155 (.156)

R²                               .286                 .294                 .302
F-Statistic                      9.114***             7.429***             6.355***
Adjusted R²                      .255                 .255                 .255
ΔR²                                                   .008                 .008
Δ F-Statistic                                         -1.685               -1.074
N (projects)                     93                   94                   94

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.9 - Hypothesis 3c - Log-transformed web hits regressed on peripheral two-mode density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       6.307*** (1.046)     6.222*** (1.291)     7.158*** (1.430)
Group Size                       .002* (.001)         .002* (.001)         .002* (.001)
Core Size                        .066 (.044)          .069 (.045)          .050 (.047)
Con. Volume                      -.000098 (.000)      -.000094 (.000)      -.000078 (.000)
Age                              .001* (.000)         .001* (.000)         .001 (.000)
PT Density                                            .053 (1.435)         -2.842 (2.424)
PT Density (m-c and squared)                                               .325 (.220)

R²                               .202                 .213                 .233
F-Statistic                      5.633***             4.712***             4.343
Adjusted R²                      .166                 .168                 .179
ΔR²                                                   .011                 .020
Δ F-Statistic                                         -.921                -.369
N (projects)                     93                   92                   92

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.10 - Hypothesis 4a - Log-transformed software downloads regressed on core membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.004*** (.868)      6.716*** (.893)      6.683*** (.906)
Group Size                       .002** (.001)        .002** (.001)        .002** (.001)
Core Size                        .082* (.037)         .087* (.037)         .088* (.038)
Con. Volume                      -.000028 (.000)      -.000030 (.000)      -.000031 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
CMD                                                   .120 (.093)          .161 (.177)
CMD (m-c and squared)                                                      -.035 (.129)

R²                               .239                 .253                 .254
F-Statistic                      6.983***             5.964***             4.930***
Adjusted R²                      .205                 .211                 .202
ΔR²                                                   .014                 .001
Δ F-Statistic                                         -1.019               -1.034
N (projects)                     93                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.11 - Hypothesis 4b - Log-transformed page views regressed on core membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       9.220*** (.727)      8.943*** (.745)      8.961*** (.755)
Group Size                       .002** (.001)        .002*** (.001)       .002*** (.001)
Core Size                        .096** (.031)        .101** (.031)        .101** (.031)
Con. Volume                      -.000057 (.000)      -.00059 (.000)       -.00058 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
CMD                                                   .116 (.077)          .094 (.145)
CMD (m-c and squared)                                                      .019 (.106)

R²                               .286                 .304                 .304
F-Statistic                      9.114***             7.850***             6.477***
Adjusted R²                      .255                 .265                 .257
ΔR²                                                   .018                 .000
Δ F-Statistic                                         -1.264               -1.373
N (projects)                     95                   95                   95

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.12 - Hypothesis 4c - Log-transformed web hits regressed on core membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       6.307*** (1.046)     6.253*** (1.088)     6.124*** (1.103)
Group Size                       .002* (.001)         .002* (.001)         .002* (.001)
Core Size                        .066 (.044)          .067 (.044)          .067 (.045)
Con. Volume                      -.000098 (.000)      -.000098 (.000)      .000 (.000)
Age                              .001** (.000)        .001* (.000)         .001* (.000)
CMD                                                   .021 (.111)          .163 (.209)
CMD (m-c and squared)                                                      -.122 (.152)

R²                               .202                 .202                 .208
F-Statistic                      5.633***             4.465***             3.812**
Adjusted R²                      .166                 .157                 .154
ΔR²                                                   .000                 .006
Δ F-Statistic                                         -1.168               -.653
N (projects)                     93                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.13 - Hypothesis 5a - Log-transformed software downloads regressed on administrator membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.004*** (.868)      6.623 (.890)         6.638*** (.894)
Group Size                       .002** (.001)        .002** (.001)        .002** (.001)
Core Size                        .082* (.037)         .082* (.037)         .083* (.037)
Con. Volume                      -.000028 (.000)      -.000039 (.000)      -.000033 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
AMD                                                   .091 (.067)          .042 (.113)
AMD (m-c and squared)                                                      .058 (.109)

R²                               .239                 .256                 .259
F-Statistic                      6.983***             5.924***             4.944***
Adjusted R²                      .205                 .213                 .206
ΔR²                                                   .017                 .003
Δ F-Statistic                                         -1.059               -.980
N (projects)                     93                   91                   91

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.14 - Hypothesis 5b - Log-transformed page views regressed on administrator membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       9.220*** (.727)      8.868*** (.742)      8.878*** (.745)
Group Size                       .002** (.001)        .002*** (.001)       .002*** (.001)
Core Size                        .096** (.031)        .096** (.031)        .098** (.031)
Con. Volume                      -.000057 (.000)      -.000067 (.000)      -.000064 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
AMD                                                   .093 (.056)          .055 (.094)
AMD (m-c and squared)                                                      .046 (.091)

R²                               .286                 .311                 .313
F-Statistic                      9.114***             7.942***             6.605***
Adjusted R²                      .255                 .272                 .266
ΔR²                                                   .025                 .002
Δ F-Statistic                                         -1.172               -1.337
N (projects)                     95                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.15 - Hypothesis 5c - Log-transformed web hits regressed on administrator membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       6.307*** (1.046)     6.236*** (1.099)     6.202*** (1.103)
Group Size                       .002* (.001)         .002* (.001)         .002* (.001)
Core Size                        .066 (.044)          .067 (.045)          .064 (.045)
Con. Volume                      -.000098 (.000)      .000 (.000)          .000 (.000)
Age                              .001* (.000)         .001* (.000)         .001 (.000)
AMD                                                   .021 (.081)          .099 (.137)
AMD (m-c and squared)                                                      -.093 (.131)

R²                               .202                 .205                 .210
F-Statistic                      5.633***             4.436***             3.758**
Adjusted R²                      .166                 .159                 .154
ΔR²                                                   .003                 .005
Δ F-Statistic                                         -1.197               -.678
N (projects)                     93                   91                   91

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.16 - Hypothesis 6a - Log-transformed software downloads regressed on administrator class centrality

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.004*** (.868)      8.382*** (1.268)     8.040*** (1.365)
Group Size                       .002** (.001)        .002* (.001)         .002* (.001)
Core Size                        .082* (.037)         .086* (.037)         .089* (.037)
Con. Volume                      -.000028 (.000)      -.000026 (.000)      -.000029 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
ACC                                                   -1.376 (.928)        -1.187 (.970)
ACC (m-c and squared)                                                      .080 (.116)

R²                               .239                 .257                 .261
F-Statistic                      6.983***             6.101***             5.134***
Adjusted R²                      .205                 .215                 .211
ΔR²                                                   .018                 .004
Δ F-Statistic                                         -.882                -.967
N (projects)                     93                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.17 - Hypothesis 6b - Log-transformed page views regressed on administrator class centrality

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       9.220*** (.727)      10.123*** (1.067)    9.817*** (1.144)
Group Size                       .002** (.001)        .002** (.001)        .002** (.001)
Core Size                        .096** (.031)        .099** (.031)        .101** (.031)
Con. Volume                      -.000057 (.000)      -.000055 (.000)      -.000058 (.000)
Age                              .000 (.000)          .000 (.000)          .000 (.000)
ACC                                                   -.902 (.781)         -.732 (.815)
ACC (m-c and squared)                                                      .073 (.096)

R²                               .286                 .296                 .301
F-Statistic                      9.114***             7.584***             6.386***
Adjusted R²                      .255                 .257                 .254
ΔR²                                                   .010                 .005
Δ F-Statistic                                         -1.530               -1.198
N (projects)                     95                   95                   95

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.18 - Hypothesis 6c - Log-transformed web hits regressed on administrator class centrality

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       6.307*** (1.046)     4.990 (1.522)        5.268*** (1.637)
Group Size                       .002* (.001)         .002* (.001)         .002* (.001)
Core Size                        .066 (.044)          .061 (.044)          .059 (.044)
Con. Volume                      -.000098 (.000)      .000 (.000)          -.000098 (.000)
Age                              .001* (.000)         .001* (.001)         .001* (.001)
ACC                                                   1.309 (1.102)        1.159 (1.151)
ACC (m-c and squared)                                                      -.064 (.136)

R²                               .202                 .215                 .217
F-Statistic                      5.633***             4.810***             4.010***
Adjusted R²                      .166                 .170                 .163
ΔR²                                                   .013                 .002
Δ F-Statistic                                         -.823                -.800
N (projects)                     93                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.19 - Hypothesis 1d - Log-transformed project rank regressed on group density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.606*** (.678)      7.004*** (.717)      6.838*** (.764)
Group Size                       -.001* (.001)        -.001 (.001)         .000 (.001)
Core Size                        -.114*** (.029)      -.114*** (.028)      -.114*** (.029)
Con. Volume                      .000039 (.000)       .000032 (.000)       .000030 (.000)
Age                              .001 (.000)          .001* (.000)         .001* (.000)
Group Density                                         5.117* (2.289)       7.657 (4.590)
Group Density (m-c and squared)                                            -.062 (.097)

R²                               .243                 .283                 .286
F-Statistic                      7.310                7.104                5.949
Adjusted R²                      .210                 .243                 .238
ΔR²                                                   .040                 .003
Δ F-Statistic                                         -.206                -1.155
N (projects)                     95                   95                   95

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.20 - Hypothesis 2d - Log-transformed project rank regressed on core density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.606*** (.678)      7.501*** (.969)      7.449*** (1.008)
Group Size                       -.001* (.001)        -.001* (.001)        -.001* (.001)
Core Size                        -.114*** (.029)      -.116*** (.032)      -.114*** (.033)
Con. Volume                      .000039 (.000)       .000046 (.000)       .000047 (.000)
Age                              .001 (.000)          .001 (.000)          .001 (.000)
Core Density                                          -.222 (.600)         -.222 (.604)
Core Density (m-c and squared)                                             .055 (.268)

R²                               .243                 .258                 .259
F-Statistic                      7.310                5.089                4.192
Adjusted R²                      .210                 .208                 .197
ΔR²                                                   .015                 .001
Δ F-Statistic                                         -2.221               -.897
N (projects)                     95                   78                   78

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.21 - Hypothesis 3d - Log-transformed project rank regressed on peripheral two-mode density

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.606*** (.678)      8.088*** (.844)      7.439*** (.927)
Group Size                       -.001* (.001)        -.001* (.001)        -.001 (.001)
Core Size                        -.114*** (.029)      -.124*** (.030)      -.110*** (.031)
Con. Volume                      .000039 (.000)       .000041 (.000)       .000029 (.000)
Age                              .001 (.000)          .000 (.000)          .001 (.000)
PT Density                                            -.848 (.952)         1.221 (1.586)
PT Density (m-c and squared)                                               -.237 (.146)

R²                               .243                 .252                 .274
F-Statistic                      7.310                6.002                5.532
Adjusted R²                      .210                 .210                 .224
ΔR²                                                   .009                 .022
Δ F-Statistic                                         -1.308               -.470
N (projects)                     95                   94                   94

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.22 - Hypothesis 4d - Log-transformed project rank regressed on core membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.606*** (.678)      8.015*** (.682)      8.122*** (.685)
Group Size                       -.001* (.001)        -.001* (.001)        -.001* (.001)
Core Size                        -.114*** (.029)      -.122*** (.028)      -.122*** (.028)
Con. Volume                      .000039 (.000)       .000042 (.000)       .000045 (.000)
Age                              .001 (.000)          .001* (.000)         .001* (.000)
CMD                                                   -.172* (.071)        -.308* (.132)
CMD (m-c and squared)                                                      .118 (.097)

R²                               .243                 .290                 .302
F-Statistic                      7.310                7.354                6.410
Adjusted R²                      .210                 .251                 .255
ΔR²                                                   .047                 .012
Δ F-Statistic                                         .044                 -.944
N (projects)                     95                   95                   95

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001


Table Q.23 - Hypothesis 5d - Log-transformed project rank regressed on administrator membership degree

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.606*** (.678)      8.035*** (.680)      8.041*** (.684)
Group Size                       -.001* (.001)        -.001* (.001)        -.001* (.001)
Core Size                        -.114*** (.029)      -.115*** (.028)      -.114*** (.028)
Con. Volume                      .000039 (.000)       .000053 (.000)       .000055 (.000)
Age                              .001 (.000)          .001 (.000)          .001* (.000)
AMD                                                   -.120* (.051)        -.141 (.086)
AMD (m-c and squared)                                                      .026 (.083)

R²                               .243                 .291                 .292
F-Statistic                      7.310                7.225                5.976
Adjusted R²                      .210                 .251                 .243
ΔR²                                                   .048                 .001
Δ F-Statistic                                         -.085                -1.249
N (projects)                     95                   93                   93

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001

Table Q.24 - Hypothesis 6d - Log-transformed project rank regressed on administrator class centrality

                                 Model 1              Model 2              Model 3
Variables                        B (Std. Error)       B (Std. Error)       B (Std. Error)
(Constant)                       7.606*** (.678)      8.570*** (.994)      9.156*** (1.054)
Group Size                       -.001* (.001)        -.001* (.001)        -.001* (.001)
Core Size                        -.114*** (.029)      -.111*** (.029)      -.116*** (.029)
Con. Volume                      .000039 (.000)       .000041 (.000)       .000046 (.000)
Age                              .001 (.000)          .000 (.000)          .000 (.000)
ACC                                                   -.962 (.727)         -1.287 (.751)
ACC (m-c and squared)                                                      -.139 (.089)

R²                               .243                 .258                 .278
F-Statistic                      7.310                6.246                5.701
Adjusted R²                      .210                 .216                 .229
ΔR²                                                   .015                 .020
Δ F-Statistic                                         -1.064               -.545
N (projects)                     95                   95                   95

Controlling for group size, core size, conversation volume and age
* p < .05  ** p < .01  *** p < .001
