Reference Coupling: An Exploration of Inter-project Technical Dependencies and their Characteristics within Large Ecosystems

Kelly Blincoeᵃ,∗, Francis Harrisonᵇ, Navpreet Kaurᵇ, Daniela Damianᵇ

ᵃ University of Auckland, New Zealand
ᵇ University of Victoria, BC, Canada

Abstract

Context: Software projects often depend on other projects or are developed in tandem with other projects. Within such software ecosystems, knowledge of cross-project technical dependencies is important for 1) practitioners' understanding of the impact of their code changes and of their coordination needs within the ecosystem and 2) researchers exploring properties of software ecosystems based on these technical dependencies. However, identifying technical dependencies at the ecosystem level can be challenging.

Objective: In this paper, we describe Reference Coupling, a new method that uses solely the information in developers' online interactions to detect technical dependencies between projects. The method establishes dependencies through user-specified cross-references between projects. We then use the output of this method to explore the properties of large software ecosystems.

Method: We validate our method on two datasets: one from open-source projects hosted on GitHub and one commercial dataset of IBM projects. We manually analyze the identified dependencies, categorize them, and compare them to dependencies specified by the development team. We examine the types of projects involved in the identified ecosystems, the structure of the identified ecosystems, and how the ecosystems' structure compares with the social behaviour of project contributors and owners.

Results: We find that our Reference Coupling method often identifies technical dependencies between projects that are untracked by developers. We describe empirical insights about the characteristics of large software ecosystems. We find that most ecosystems are centered around one project and are interconnected with other ecosystems. By exploring the socio-technical alignment within the GitHub ecosystems, we also found that the project owners' social behaviour aligns well with the technical dependencies within the ecosystem, but the project contributors' social behaviour does not align with these dependencies.

Conclusions: We conclude with a discussion on future research that is enabled by our Reference Coupling method.

∗Corresponding author
Email addresses: [email protected] (Kelly Blincoe), [email protected] (Francis Harrison), [email protected] (Navpreet Kaur), [email protected] (Daniela Damian)

Preprint submitted to Elsevier February 2, 2019

1. Introduction

Software is not developed in isolation anymore. Whether open source or corporate-led, software development takes place within "a collection of software projects which are developed and which co-evolve together in the same environment", and which are referred to as software ecosystems [1]. Within such ecosystems, projects depend on one another [1], and yet becoming aware of such dependencies is not trivial. Identifying technical dependencies to external projects within the ecosystem is important for two main reasons. First, developers need to understand how their tasks and code changes impact other projects and who they need to coordinate their changes with at the ecosystem level [1, 2]. For open source ecosystems in particular, it is also important for attracting new contributors [1] since dependencies to other projects within an ecosystem are more likely to attract attention. Newcomers might decide to join the project based on a deeper knowledge of the structure of the system as well as of its external dependencies. Second, information about software ecosystems and technical dependencies within them enables further exploration and modeling of software ecosystems, an area currently understudied in the software engineering literature [3].

However, identifying technical dependencies between projects on a large scale has proven to be difficult [4]. Existing static dependency analysis approaches do not identify dependencies across projects. Methods for extracting external dependencies from a project's source code or configuration files have been proposed [1, 4, 5, 6, 7, 8, 9, 10, 11], but these approaches limit the types of dependencies detected to explicit relationships. Implicit relationships like dependencies on web services, operating systems, or hardware are not always visible in configuration files or source code. Further, methods that extract dependencies from source code require large amounts of memory and computation time, so they cannot be employed across a large set of projects [12]. Methods that are applied to configuration or build files are not memory- or computation-intensive, but such files are not always available or accurate since not all projects use a dependency manager.

Without a way to easily establish a comprehensive set of dependencies between projects, software practitioners are unable to quickly identify the external dependencies of a software project or understand where their project sits within a software ecosystem.

In this paper, we propose a new method, Reference Coupling, to detect cross-project dependencies that leverages solely the information in the developers' social interactions. The social aspects of a project and its surrounding ecosystem significantly influence the way in which the software project will evolve over time [13], and they cannot be ignored in the development of models, guidelines and best practices for the analysis and maintenance of software ecosystems' health [14]. Reference Coupling mines the references to other projects that developers make in their online interactions (referred to as cross-references henceforth).
We validated the method's ability to identify true cross-project technical dependencies by applying it to two large datasets of GitHub and IBM projects and comparing its results with manually identified cross-references in each dataset. We found that the Reference Coupling method does identify technical dependencies between projects, and we describe several properties of the ecosystems identified using this method. Reference Coupling is a novel complement to the existing code-based and configuration file-based methods, which on their own are insufficient to identify external dependencies within a project's ecosystem. Our method identifies ecosystems made up of projects within the same organization as well as projects outside of the organization for which there are technical dependencies. These dependencies are often hidden or even unknown to developers within a project.

Having identified external technical dependencies to projects in our datasets, we further used our Reference Coupling method with a popular community detection algorithm [15] to identify ecosystems across all GitHub-hosted projects and explore socio-technical aspects within these ecosystems. We found that the developers' socio-technical behavior within GitHub project ecosystems differs between the project owners and actual contributors. Our analysis illustrates the potential for further analysis of software ecosystems' health, something difficult to assess given the dynamic nature of relationships within the ecosystem, as well as the lack of a centralized management structure for overseeing the ecosystem's health and survival, which is typical of open source projects [14].

Our previous conference publication reported on some elements of this work [16]; however, this paper introduces numerous extensions to our work. Specifically, this paper extends our previous publication by:

• describing how to utilize the Reference Coupling method on a wide variety of software tools (compared to just GitHub in the conference publication).

• validating the Reference Coupling approach on a new dataset from a proprietary ecosystem of IBM projects.

• providing a more detailed understanding of the types of dependencies that are captured by the Reference Coupling method through the analysis and categorization of dependencies from the IBM ecosystem and a more detailed analysis of the dependencies on GitHub.

• extending the discussion to further describe how the Reference Coupling method can be used to support software developers and software engineering researchers.

The paper has also been significantly restructured to better describe our research methods, development, validation and application of the Reference Coupling technique. The rest of the paper is structured as follows: Section 2 provides an overview of related work in software ecosystems and dependency conceptualizations. The Reference Coupling method is described in Section 3. Our research methods are presented in Section 4. Our results are presented in Section 5. In Section 6, we summarize our findings and discuss open questions for future research. We provide a brief conclusion in Section 7.

2. Related Work

The term software ecosystem (SECO) has emerged as a paradigm to understand the dynamics and heterogeneity in collaborative software engineering. Unlike natural ecosystems, however, there is no common definition for SECO. Two different perspectives on SECOs have been identified in the literature, namely business-centric and platform-centric [3]. The business-centric definition refers to the holistic, business-oriented perspective of a SECO as a network of actors, organizations and companies [17, 18, 19]. The platform-centric perspective emphasizes the social and technical aspects of a set of software projects, technical platforms, and communities, in line with the work of [20, 21]. In our work, we take a platform-centric perspective to leverage and study the socio-technical relationships within ecosystems. The ecosystems we consider are not limited to only those projects in the same organization, but also include projects outside of the organization for which there are technical dependencies.

The software ecosystems that received the most attention in previous literature include those around Eclipse (e.g. [6, 22]), Ruby on Rails (e.g. [23, 24]), Apache (e.g. [7, 9]), and GNOME (e.g. [13]). Notable research developments also exist in the area of frameworks for analysis of Open Source Software Ecosystems (OSSECOs) (e.g. [21]), OSSECO health measurement (e.g. [25]), and tools for visualizing OSSECO projects (e.g. [26]).

Recent extensive literature surveys show a growing interest in studies of ecosystems, both in the domain of proprietary [27] as well as open source software development [3]. In the proprietary space, the focus has been on the organizational and business aspects of the ecosystems, with a clear lack of deeper investigations of technical and collaborative aspects of work [27]. In the open source software ecosystems space, the pressing research challenges include the development of methods and tools for ecosystem modelling and analysis, socio-technical theories to explain the interplay between the social and technical systems within ecosystems, as well as the diagnosis and monitoring of ecosystem quality and health [3].

An important step in this direction lies with methods for the identification of a project's external technical dependencies within its ecosystem, in order to study its structure and socio-technical aspects. Analysis of a project's source code is a common technique to identify technical dependencies within a project (intra-project). However, these techniques do not scale up well to identify dependencies between projects (inter-project). Lungu et al. [5] describe several methods for extracting inter-project dependencies by considering external method and class calls in a project's source code. However, when investigating a large number of projects, obtaining the source code for every project is not always feasible. Collecting source code data across an entire versioning system would require multiple terabytes of data and more than a year in processing time [12]. Ossher et al. [4] introduced a technique that analyzes import statements in source code to resolve inter-project dependencies. Businge and Serebrenik [6] employ a similar technique in their study of the Eclipse ecosystem. However, this technique still requires obtaining a large amount of source code and, therefore, requires a large amount of memory.
These techniques, therefore, are limited in the number of projects that can be studied.

Previous studies have also proposed ways to identify technical dependencies without relying on analysis of source code. One method is to identify technical dependencies by examining declared dependencies from a project's configuration files or its build files from a dependency management tool like Maven [1, 7, 8, 9, 10, 11]. However, not all projects declare dependencies in configuration files or employ a dependency manager, and, even for those that do, the data can be missing. Bavota et al. [9] found that this information was missing in 37% of releases in a study of the Apache project. Syeed et al. [23] extracted metadata on inter-project dependencies from the published specifications at .org in their study of the Ruby on Rails ecosystem. However, the specified dependencies may be out of date and the approach is specific to only projects that publish dependency specifications.

Considering the social aspects of software ecosystems is also important [3]. The work of Mens and his colleagues highlights the role that social aspects play in the future understanding and development of tools, prediction models, guidelines and best practices that allow ecosystem communities to improve upon their current practices [13]. Several other studies [28, 29] have explored ways to detect social connections in software ecosystems. These studies used community detection algorithms to detect communities across GitHub projects, focusing on relationships between developers. We use technical dependencies for community detection since the structure of an ecosystem is defined by its technical dependencies [5]. Thung et al. [30] constructed similar project-to-project networks for GitHub-hosted projects. In their networks, an edge between two projects represents a single developer contributing to both projects. Since developers can often work on multiple independent projects, sharing developers is not an indication of a technical dependency, and their network offers more of a social perspective.

In our approach, we propose a method for automatic identification of technical dependencies that does not rely on analyzing source code, but takes advantage of the cross-references that can be made in developers' social interactions.
These cross-references are user-specified between a pair of projects. They are made in comments on work items, pull requests, issues, and commits as developers coordinate and manage their work dependencies. With these identified technical dependencies we were then able to conduct a socio-technical analysis of the behavior of different project members, namely project owners and contributors in the GitHub ecosystem.

3. Reference Coupling

To identify dependencies between projects, we relied on comments made by the developers within one project's tasks, issues, pull requests and commits that cross-reference another project. Modern collaborative software development tools make it easy for developers to create links between projects within their comments. We call this conceptualization of dependencies between projects Reference Coupling.

To develop the Reference Coupling method, we manually examined cross-references between projects on several software development tools to understand how they could be automatically extracted. We examined comments that cross-reference other projects on the following popular open source software code hosting platforms, forges, and issue trackers: GitHub, GitLab, Bitbucket, SourceForge, and Jira. We also examined cross-references in a proprietary software ecosystem by examining the comments in an IBM set of products that together form the Rational solution for Collaborative Lifecycle Management (CLM).

We found that all of the open source software code hosting platforms, forges, and issue trackers we examined (GitHub, GitLab, Bitbucket, SourceForge, and Jira) employ Markdown languages, which allow plain text to be formatted in a lightweight way. When a Markdown language is employed and a user cross-references another project in a comment using the appropriate syntax, a link to the other project is automatically created, making it easier to navigate between the projects. Because these automatic links depend on a fixed syntax, cross-references to other projects are written in a standard format. GitHub, Bitbucket, and GitLab all extend the CommonMark specification [31], which was introduced to standardize Markdown implementations. Jira and SourceForge use their own Markdown languages.

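Table 1: Syntax of Cross-References

| Tool | Link To | Syntax: Project Identifier | Syntax: Artifact Type | Syntax: Artifact ID | Example |
|---|---|---|---|---|---|
| GitHub, GitLab, Bitbucket, and Jira | Issues/Pull Requests | OWNER/PROJECT | # | NUMBER | rails/rails#123 |
| GitHub, GitLab, Bitbucket, and Jira | Commits | OWNER/PROJECT | @ | SHA | twbs/bootstrap@6e2a82 |
| SourceForge | Issues | PROJECT[/SUBPROJECT] | bugs | NUMBER | allura:bugs:#123 |
| SourceForge | Issues | PROJECT[/SUBPROJECT] | features | #NUMBER | allura:features:#123 |
| SourceForge | Commits | PROJECT[/SUBPROJECT] | code | SHA | allura:code:3b9d48 |
| IBM CLM | Work Item | (none) | [work] item | NUMBER | work item 123 |
| IBM CLM | Work Item | (none) | task | NUMBER | task 456 |
| IBM CLM | Work Item | (none) | story | NUMBER | story 789 |
| IBM CLM | Work Item | (none) | defect | NUMBER | defect 123 |
| IBM CLM | Work Item | (none) | URL | NUMBER | ://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/123 |

*Text in [ ] is optional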

The one proprietary software tool ecosystem we reviewed, the IBM CLM suite, did not employ a Markdown language. However, it still allowed explicit links to be created between projects. When entering text in a comment, users can click on a button that allows them to insert a link to a work item. This opens a window in which the user can search for the work item(s) they wish to link to.

For all of these tools, whether they employ a Markdown language or not, the format of the cross-references in the comment follows the same high-level pattern: an optional project identifier followed by the artifact type and the artifact identifier. The project identifier is optional when all projects in the ecosystem share the same instance of the project management tool, resulting in unique artifact identifiers across all of the projects, as is the case in the IBM CLM ecosystem. Given the commonality of the syntax across all of these tools, such cross-references can be automatically extracted. Thus, our Reference Coupling method identifies dependencies by considering the following pattern in comments:

[Project Identifier] Artifact Type Artifact ID

The detailed syntax for the various tools and artifact types is shown in Table 1. If this method were to be implemented in a tool, these cross-references could be automatically stored in a separate database table when they are created, making the detection of dependencies nearly real-time. Post-hoc analysis, where cross-references have not been previously extracted and stored, could be done in O(n) time, where n is the number of comments to be analyzed, since one computation is required for each comment to examine whether there is a cross-reference to another project.
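As a rough illustration of the extraction step, the following Python sketch matches the GitHub-style rows of Table 1 (OWNER/PROJECT#NUMBER and OWNER/PROJECT@SHA) in a single pass over a comment. It is not the implementation used in the study: the regular expressions, the `CrossRef` type, and the function name `extract_cross_references` are assumptions made for this example, and the other tools in Table 1 would need their own patterns.

```python
import re
from typing import Iterator, NamedTuple


class CrossRef(NamedTuple):
    """Hypothetical record for one cross-reference found in a comment."""
    source: str    # project whose comment contains the reference
    target: str    # project being referenced
    kind: str      # "issue_or_pr" or "commit"
    artifact: str  # issue/PR number or (possibly abbreviated) commit SHA


# Assumed patterns for the GitHub-style syntax in Table 1.
# OWNER/PROJECT#NUMBER, e.g. rails/rails#123
ISSUE_REF = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)#(\d+)\b")
# OWNER/PROJECT@SHA, e.g. twbs/bootstrap@6e2a82
COMMIT_REF = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)@([0-9a-fA-F]{6,40})\b")


def extract_cross_references(source: str, comment: str) -> Iterator[CrossRef]:
    """Yield references to other projects; same-project references are skipped."""
    for kind, pattern in (("issue_or_pr", ISSUE_REF), ("commit", COMMIT_REF)):
        for match in pattern.finditer(comment):
            target, artifact = match.group(1), match.group(2)
            if target.lower() != source.lower():
                yield CrossRef(source, target, kind, artifact)


comment = ("Blocked by rails/rails#123; fixed upstream in "
           "twbs/bootstrap@6e2a82. See also me/app#7.")
refs = list(extract_cross_references("me/app", comment))
```

Each comment is scanned once per pattern, which is consistent with the linear cost in the number of comments noted above. The reference to `me/app#7` is dropped because it points back to the commenting project.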

4. Research Methodology

4.1. Research Questions

We validated the Reference Coupling method on two datasets: one from open-source projects hosted on GitHub and one commercial dataset of IBM projects. Our validation was guided by the following research questions:

RQ1a: Does the Reference Coupling method identify inter-project technical dependencies on GitHub issue, pull request, and commit comments? If so, what are the characteristics of these dependencies?

RQ1b: Does the Reference Coupling method identify inter-project technical dependencies on IBM work item comments? If so, what are the characteristics of these dependencies?

We then explored the characteristics of, as well as the socio-technical alignment within, the identified GitHub ecosystems by asking:

RQ2: What ecosystems exist across GitHub-hosted projects and what is their structure?

RQ3: Do the project owners' and contributors' social behaviours align with the technical dependencies?

4.2. Research Setting and Data Collection

To answer our research questions, we conducted an analysis of the comments and cross-references in the GitHub and IBM CLM projects.

4.2.1. GitHub

We obtained data from the GHTorrent [32] project, which provides a mirror of the GitHub API data. GHTorrent obtains its data by monitoring and recording GitHub events as they occur. We used the MySQL 2014-04-02 dataset to obtain information on the projects since this paper extends our previous conference publication, which used this dataset [16]. This dataset contains data on 2,399,526 repositories, 3,426,046 users, and their events, including commits, issues, pull requests and comments. We define a project as a repository and all of its forks, as recommended by Kalliamvakou et al. [33].

Since the MySQL database contains only the first 256 characters of comments, we obtained all comments from GHTorrent's main MongoDB database in May 2014. The MongoDB database contains the full text of all comments. These comments were downloaded and stored in a PostgreSQL database for analysis. No pre-processing was needed.

Using our Reference Coupling method, we identified 89,784 comments in the GitHub data with a cross-reference to another project.¹ There are 29,018 repositories (18,533 unique projects when forks are considered) that make a cross-reference to another project. While this is only a small portion of the total number of repositories in our dataset, this is expected since Kalliamvakou et al. [33] have found that the majority of the projects on GitHub are personal and inactive.
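The definition of a project as a repository together with all of its forks can be illustrated with a small Python sketch. It assumes a hypothetical mapping `forked_from` from each repository ID to the ID of the repository it was forked from (or `None` for a non-fork); the mapping, the IDs, and the function names are made up for the example and do not describe the scripts used in the study.

```python
def root_project(repo_id, forked_from):
    """Follow fork links until a repository that is not a fork is reached."""
    seen = set()  # guards against accidental cycles in the fork data
    while forked_from.get(repo_id) is not None and repo_id not in seen:
        seen.add(repo_id)
        repo_id = forked_from[repo_id]
    return repo_id


def count_projects(repo_ids, forked_from):
    """Count distinct projects when every fork is merged into its root."""
    return len({root_project(repo_id, forked_from) for repo_id in repo_ids})


# Hypothetical data: repositories 2 and 3 are (transitive) forks of 1.
forked_from = {1: None, 2: 1, 3: 2, 4: None}
n_projects = count_projects([1, 2, 3, 4], forked_from)
```

Under this grouping, four repositories collapse into two projects, which mirrors how the 29,018 cross-referencing repositories above reduce to 18,533 unique projects.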

4.2.2. IBM Collaborative Lifecycle Management (CLM)

We collected data from the products in the IBM Rational solution for Collaborative Lifecycle Management (CLM). CLM brings together requirements management, quality management, change and configuration management, and project planning and tracking on a common, uniform platform. CLM consists of a number of products including Rational Team Concert (RTC), Rational Quality Manager (RQM), Rational DOORS Next Generation (DNG), Rational Requirements Composer (RRC), Rational Software Architect (RSA), Rational Rhapsody and Rational Insight. The IBM CLM ecosystem is broken into 16 distinct projects.

¹ These cross-references and the scripts used to identify them are available in a replication package at https://doi.org/10.5281/zenodo.2555526

[Figure 1: Summary of Research Methods. Method Validation: RQ1a (GitHub), manual analysis of 200 random cross-references by two coders (technical dependency? if yes, type of dependency?); RQ1b (IBM), comparison of IBM developer-specified dependencies to cross-references, and manual analysis of 111 cross-references not declared as a dependency by developers. Method Application: RQ2, use cross-references to construct a technical dependency network, identify ecosystems using the Louvain community detection method, and characterize ecosystems by manual analysis of visualizations, network statistics, and project README files; RQ3, construct project-to-project networks based on the "following" and "starring" activity of the project owners and contributors, and compare the edge weights of these social networks with the edge weights from the dependency network.]

For each CLM project, we collected data on all work items created from 2005 to 2015. A work item is a task or an issue that must be attended to during development. The data was downloaded in XML format from the IBM REST API, converted into JSON, and stored in PostgreSQL tables. For each work item, we collected the metadata, history, and comments. No pre-processing was done on the data. The dataset consisted of 3,009 work items with a total of 17,708 comments. There were 635 cross-references to another project within those comments. All projects contain cross-references.

4.3. Research Methods

An overview of our research methods to answer each research question is shown in Figure 1 and described in detail in the following subsections.

4.3.1. Reference Coupling: Method Validation

To validate the Reference Coupling method, we used the method to identify cross-references between projects in both GitHub and the IBM CLM products. We automatically extracted these cross-references with pattern matching using Java regular expressions [34]. Since we are interested only in relationships between projects, we filtered the cross-references to ignore references within the same project.

RQ1a: Does the Reference Coupling method identify inter-project technical dependencies on GitHub issue, pull request, and commit comments? If so, what are the characteristics of these dependencies?

To verify that the cross-references to other projects made in comments on GitHub identified through the Reference Coupling method are a valid conceptualization of dependencies, we examined 200 random comments which were classified as dependencies using the Reference Coupling method since they cross-referenced another project. Different types of comments may be made on different artifacts. To ensure our analysis included the various types of cross-references that may occur, we ensured our randomly selected 200 comments were equally distributed across each of the following relationships: 1) a commit comment cross-references another commit, 2) a commit comment cross-references an issue or pull request, 3) an issue or pull request comment cross-references a commit, and 4) an issue or pull request comment cross-references another issue or pull request. Thus, there were 50 random comments selected from each of these types of cross-references.

This manual content analysis [35] was performed by two people familiar with software development practices. They classified a comment as a technical dependency if the comment described a work dependency, either direct or indirect, between the two projects.
For each dependency, they also noted if the dependency was direct (between the two projects) or indirect (both projects depend on a third project).

The same two people further examined the comments that were classified as technical dependencies to identify the types of dependencies that are identified using the Reference Coupling method. The dependencies were classified using common dependency types that can be declared in issue tracking systems:

• Duplicate: the issue/commits on the two projects are duplicates of each other. Within a project, duplicate issues would describe the same problem. Across projects, an example of a duplicate issue could be both projects have created issues to deal with a breaking API change from a shared dependency.

• Blocking: an issue/commit on one project is blocking work in the other project. For example, one project is waiting for the other project to release a promised API change before they can finalize a new feature that will be enabled by that API change.

• Resolving: an issue/commit on one project resolves an issue in the other project. For example, one project had security issues which it has inherited from a project it depends on. Once the other project fixes its security issues, the issue will be resolved in the dependent project as well.

• Affecting: an issue/commit on one project is impacted by an issue/commit on the other project. In other words, changes need to be made in the first project due to changes made in the other project. For example, a project deprecates an old API which would cause any projects using that old API to update to a more recent API.

When a dependency did not fit one of these categories, open coding was used to identify the type of dependency [36]. The cross-references that did not match one of the pre-defined dependency types were reviewed, and conceptually similar comments were grouped into categories. This resulted in two new dependency categories being introduced, Leveraging and Updating, which are described in Section 5. Each coder independently did the manual analysis and coding. Then the two coders met to discuss their results and try to come to a consensus. The coders were able to come to a consensus for all items after this discussion, resulting in 100% agreement. Table 2 shows the inter-coder reliability using Cohen's Kappa for each of the categories for the initial independent coding and the final agreed-upon codes. Cohen's Kappa was calculated using ReCal [37].

Table 2: Inter-coder Reliability: Cohen's Kappa

| Category | Initial Agreement | Initial Cohen's Kappa | Final Agreement | Final Cohen's Kappa |
|---|---|---|---|---|
| Existence of Dependency | 99% | 0.828 | 100% | 1 |
| Direct/Indirect | 97% | 0.807 | 100% | 1 |
| Affecting | 88.5% | 0.678 | 100% | 1 |
| Blocking | 98.5% | 0.911 | 100% | 1 |
| Duplicate | 96.5% | 0.895 | 100% | 1 |
| Leveraging | 98.5% | 0.660 | 100% | 1 |
| Resolving | 92% | 0.826 | 100% | 1 |
| Updating | 98.5% | 0.816 | 100% | 1 |

RQ1b: Does the Reference Coupling method identify inter-project technical dependencies on IBM work item comments? If so, what are the characteristics of these dependencies?

To validate that the Reference Coupling method identifies true technical dependencies between the IBM projects, we compared the dependencies identified using the Reference Coupling method to the developer-declared dependencies. For each work item, the metadata contains information on dependencies between the work items captured by the developers along with the type of dependency. IBM developers can choose between 26 established dependency classifications such as Depends On, Blocks, Duplicate Of, and Resolves.² Of the 26 classifications, only 14 are used by the IBM developers in our dataset. These 14 dependency classifications are described in Table 3. We grouped the classifications into types, since there are pairs of classifications which represent reciprocal relationships.

Table 3: Types of Dependencies

| Dependency Type | Classification | Description |
|---|---|---|
| Blocking | Blocks | The work item blocks work item X |
| Blocking | Depends on | The work item depends on work item X |
| Resolving | Resolves | The work item resolves work item X |
| Resolving | Resolved by | The work item is resolved by work item X |
| Duplicate | Duplicated by | The work item is duplicated by work item X |
| Duplicate | Duplicate of | The work item is a duplicate of work item X |
| Affecting | Affected by Defect | A work item is affected by a defect |
| Affecting | Affects Item | The work item impacts plan item X |
| Parent/Child | Parent | The work item is a parent of work item X |
| Parent/Child | Children | The work item is a child of work item X |
| Related | Related | The work item has a general relationship with work item X |
| Related | Related Change Request | The work item is related to a change request item |
| Planning | Contributes to | The work item contributes to work item X |
| Planning | Tracks | The work item tracks work item X |

We analyzed these developer-specified dependencies to see how many were also identified by the Reference Coupling method. In addition, we manually analysed the cases where the Reference Coupling method identified a dependency, but the IBM developers did not indicate the dependency within the work item. We analysed these cases to determine if the Reference Coupling method identifies cases of true technical dependencies that had not been marked as such by the IBM developers. In the case that the manual analysis revealed a technical dependency, we also categorized the type of dependency using the same categories as used by the IBM developers to understand what types of dependencies Reference Coupling captures.

² https://jazz.net/help-dev/clm/index.jsp?topic=%2Fcom..team.concert.sdk.doc%2Ftopics%2Fr link domains.
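For readers unfamiliar with the inter-coder reliability statistic reported in Table 2, the following Python sketch computes Cohen's kappa for two coders as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's label frequencies. The study used ReCal [37] for this computation; the function name and the toy labels below are assumptions made only for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items on which the coders gave the same label.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Toy example (not data from the study): 1 = dependency, 0 = no dependency.
coder_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
coder_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(coder_a, coder_b)
```

In this toy example the coders agree on 8 of 10 items (p_o = 0.8) while chance agreement is p_e = 0.52, so kappa is about 0.58. Because kappa corrects for chance agreement, a category that is rarely assigned can show high raw agreement alongside a noticeably lower kappa.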

4.3.2. Reference Coupling: Method Application

RQ2: What ecosystems exist across GitHub-hosted projects and what is their structure?

To illustrate the applicability of our method, we constructed a network of the technical dependency relationships established through Reference Coupling as described in Section 3. The Dependency Network is defined as a directed graph $G_d = \langle V, E \rangle$. The set of vertices, denoted by $V$, is all GitHub projects involved in at least one cross-reference. There are 18,533 projects in this set. The set of edges, denoted by $E$, is a set of node pairs $E(V) = \{(x, y) \mid x, y \in V\}$. If the project represented by node $x_i$ cross-referenced the project represented by node $y_j$, there is a directed edge from $x_i$ to $y_j$. The weight of each edge is the count of cross-references for the pair of projects. To capture only the stronger dependencies, we kept an edge between two nodes only if the pair of projects had been cross-referenced two or more times.

It is important to note that this directed graph captures the direction of the cross-referencing comments and not the direction of the dependencies that those comments imply. A project could cross-reference another project because it is blocked by that project or because it is blocking that project. The nuances of the dependency direction are not captured by our method.

To identify ecosystems across projects hosted on GitHub, we used the popular Louvain community detection method [15] on the Dependency Network established through Reference Coupling. The Louvain method is a greedy optimization method that aims to partition a network into communities of densely connected nodes and optimize the modularity of the network. Modularity is defined as “the number of edges falling within [communities] minus the expected number in an equivalent network with edges placed at random [38].” High modularity scores indicate that there are dense connections within the communities but sparse connections across communities, suggesting that the partition captures the community structure of the network well. When high modularity scores are obtained, the communities have significant real-world meaning [15]. The Louvain method consists of two steps. It first optimizes modularity locally by looking for small communities. Then it aggregates the nodes in each small community and builds a new network with these aggregated nodes. It iterates on these two steps until the modularity can no longer be improved. The Louvain method has been reported to outperform other community detection methods in terms of both the modularity that is achieved and the computation time [15].
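The network construction and the modularity measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, cross-references are assumed to be counted per ordered (source, target) pair, and modularity is computed on the undirected, weighted projection of the network for a partition that is supplied by the caller (the Louvain search itself is not implemented here).

```python
from collections import Counter, defaultdict


def build_dependency_network(cross_refs, min_weight=2):
    """Return {(source, target): weight}, keeping pairs referenced at least min_weight times.

    cross_refs is an iterable of (source_project, target_project) tuples, one per
    cross-referencing comment. Counting per ordered pair is an assumption.
    """
    counts = Counter((src, dst) for src, dst in cross_refs if src != dst)
    return {edge: weight for edge, weight in counts.items() if weight >= min_weight}


def modularity(edges, community_of):
    """Weighted modularity of a given partition, ignoring edge direction.

    Uses Q = sum over communities c of [ w_in(c) / m - (deg(c) / (2 m))^2 ], where m is
    the total edge weight, w_in(c) the weight of edges inside c, and deg(c) the summed
    weighted degree of the nodes in c.
    """
    total = sum(edges.values())
    if total == 0:
        raise ValueError("modularity is undefined for a network without edges")
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (source, target), weight in edges.items():
        degree[community_of[source]] += weight
        degree[community_of[target]] += weight
        if community_of[source] == community_of[target]:
            internal[community_of[source]] += weight
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


if __name__ == "__main__":
    # Hypothetical cross-references between made-up projects.
    refs = [("a/x", "b/y"), ("a/x", "b/y"), ("b/y", "c/z"),
            ("c/z", "d/w"), ("c/z", "d/w"), ("c/z", "d/w")]
    network = build_dependency_network(refs)
    partition = {"a/x": 0, "b/y": 0, "c/z": 1, "d/w": 1}
    print(network)
    print(modularity(network, partition))
```

In this toy input the single reference from b/y to c/z is dropped by the two-or-more filter, which leaves two disconnected pairs of projects.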
In our network, the identified communities represent sets of projects densely connected by technical dependencies. Since dependencies that exist between projects define the structure of an ecosystem [5], these communities represent software ecosystems. To identify properties of the identified ecosystems, we:

• Analyzed visualizations of the Dependency Network. Visualizations of each ecosystem detected by the Louvain community detection method (as described above) were manually reviewed. We used the Gephi [39] graphing tool to create these visualizations. One of the authors inspected these visualizations of the network to visually identify patterns. The identified patterns were cross-checked by two of the other authors.

• Computed network statistics for each of the ecosystems, such as in-degree and out-degree of the nodes.

• Examined the types of projects involved in the ecosystems by reviewing the GitHub README files of the most well-connected project node in each of the ecosystems.
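As a rough illustration of the second and third steps, the sketch below computes in-degree and out-degree for every node and then picks one project per ecosystem to inspect. It assumes the network is stored as a dictionary keyed by (source, target) pairs and that a community assignment is already available. Using in-degree to choose the most well-connected project is a simplification made here for illustration; the function names are hypothetical.

```python
from collections import Counter


def degree_statistics(edges):
    """Return (in_degree, out_degree) counters for a directed edge dictionary."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


def central_projects(in_degree, community_of):
    """Pick, for each community, the project with the highest in-degree.

    Ties are resolved in favour of the project encountered first.
    """
    best = {}
    for project, community in community_of.items():
        current = best.get(community)
        if current is None or in_degree[project] > in_degree[current]:
            best[community] = project
    return best


if __name__ == "__main__":
    # Hypothetical edges: two projects reference a hub, which references a third.
    edges = {("a/app", "h/hub"): 2, ("b/app", "h/hub"): 3, ("h/hub", "c/lib"): 2}
    communities = {"a/app": 0, "b/app": 0, "h/hub": 0, "c/lib": 0}
    in_deg, out_deg = degree_statistics(edges)
    print(central_projects(in_deg, communities))
```

The README of each project returned by `central_projects` would then be the one to review by hand.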

RQ3: Do the project owners’ and contributors’ social behaviours align with the technical dependencies?

To complement our investigation of technical dependencies and connectedness of projects in GitHub, we also sought to understand the social behaviour of project owners/contributors in relation to the ecosystems we identified. We studied two of GitHub’s social relationships, following users and starring projects. On GitHub, users can follow other users to receive notifications on their activity and star a repository to bookmark it or indicate interest in the project. To understand how the social behaviour of project owners/contributors relates to the identified ecosystems, we examine the alignment between social and technical connections between the projects.

To answer RQ3, we construct project-to-project networks based on the following and starring activity of the project owners and contributors.

Project Owners. We constructed two networks using the following and starring relationships by considering the actions of the project owners. The Owner Stars Network, $G_{os} = \langle V, E \rangle$, and the Owner Follows Network, $G_{of} = \langle V, E \rangle$, are both undirected graphs whose set of vertices is all GitHub projects involved in at least one cross-reference. For the Owner Follows Network, there is an edge between nodes $x_i$ and $y_j$ if the owner of project $x_i$ follows the owner of project $y_j$. There is an edge between $x_i$ and $y_j$ in the Owner Stars Network if an owner of any project in our dataset has starred both project $x_i$ and project $y_j$.

Project Contributors. We constructed two additional networks using these following and starring relationships by considering the actions of the project contributors (users who have made commits on the project or are members of the project). The Contributor Stars Network, $G_{cs} = \langle V, E \rangle$, and the Contributor Follows Network, $G_{cf} = \langle V, E \rangle$, are also undirected graphs whose set of vertices is all GitHub projects involved in at least one cross-reference. The Contributor Follows Network has an edge between nodes $x_i$ and $y_j$ if a contributor of project $x_i$ follows a contributor of project $y_j$. The Contributor Stars Network has an edge between $x_i$ and $y_j$ if a contributor to any project in our dataset has starred both project $x_i$ and project $y_j$.

To compare the social connections with the technical dependencies, we compare the edge weights of these four social networks with the edge weights of the Dependency Network constructed to answer RQ2. Pearson correlations were used since the data was normally distributed. The edge weights of these networks represent the following:

• Dependency Network $G_d$: Number of technical dependencies, measured through Reference Coupling, between the two project nodes.

• Owner Follows Network $G_{of}$: 0 if neither project owner follows the other, 1 if one project owner follows the other project owner, and 2 if both project owners follow each other.

• Owner Stars Network $G_{os}$: Number of project owners who have starred both projects.

• Contributor Follows Network $G_{cf}$: Number of contributors with following relationships for the pair of projects.

• Contributor Stars Network $G_{cs}$: Number of project contributors who have starred both projects.
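The comparison of edge weights can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, project pairs are keyed by their sorted names, and treating a pair that is missing from one network as weight 0 is an assumption made here, since the alignment procedure is not spelled out above.

```python
import math


def pair_key(project_a, project_b):
    """Order-independent key for a pair of projects."""
    return tuple(sorted((project_a, project_b)))


def owner_follows_weight(owner_a, owner_b, follows):
    """0 if neither owner follows the other, 1 if one does, 2 if both do.

    follows is a set of (follower, followed) login pairs.
    """
    return int((owner_a, owner_b) in follows) + int((owner_b, owner_a) in follows)


def aligned_weights(dependency, social):
    """Return two weight lists over the union of project pairs (missing pairs count as 0)."""
    pairs = sorted(set(dependency) | set(social))
    return ([dependency.get(p, 0) for p in pairs],
            [social.get(p, 0) for p in pairs])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical weights for three made-up projects.
    dependency = {pair_key("a/x", "b/y"): 4, pair_key("b/y", "c/z"): 2}
    owner_follows = {pair_key("a/x", "b/y"): 2, pair_key("a/x", "c/z"): 1}
    dep_w, soc_w = aligned_weights(dependency, owner_follows)
    print(pearson(dep_w, soc_w))
```

The same `aligned_weights` and `pearson` calls would be repeated for each of the four social networks against the Dependency Network.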

5. Results

5.1. Method Validation on GitHub Data

RQ1a: Does the Reference Coupling method identify inter-project technical dependencies on GitHub issue, pull request, and commit comments? If so, what are the characteristics of these dependencies?

Of the 200 examined cross-references, we obtained 96.5% precision as 193 were found by the two manual coders to be true technical dependencies. We are unable to calculate recall of our method since we do not have a ground truth set of all dependencies. Of these 193, 176 (91%) were direct dependencies between the two projects and 17 (9%) were indirect dependencies where the two projects both depend on a third project.

Dependency between the two projects. The most common type of dependency found was a direct technical dependency between the two projects. An example of a direct technical dependency is when an issue created in one project depends on a fix/update in another project. Another example is when a project needs to be updated based on changes made in another project.

Below we provide three examples of cross-reference comments that are indicative of direct technical dependencies between the two projects. Project names follow the pattern user/repository where user is the owner’s GitHub login and repository is the name of the project repository.

Issue #449 on the sensu/sensu project describes an issue that is the result of the interaction between the sensu/sensu code and the ruby-amqp/amq-client code. The comment references a commit on the ruby-amqp/amq-client project that fixes the issue.

“I verified that the problem is still the one referenced in ruby-amqp/amq-client#14. This fix is not merged with amq-client’s ‘0.9.x-stable’ branch. This is why I am still hitting it. The commit ruby-amqp/amq-client@60f1c59 is the fix but it resides only in the master branch.”

Issue #8 on the tsujigiri/axiom project notes that changes must be made to the code base to allow an upgrade to the latest release of the ninenines/cowboy project.

“Upgrade Cowboy: After Cowboy 0.6.1 Cowboy’s http_req record was made opaque and can not be used directly anymore. I didn’t really have the time yet to look into it, but it looks like we just need to remove all references to the record from the documentation and add directions on how to access cowboy_req:req() via the cowboy_req functions. See ninenines/cowboy#266 and ninenines/cowboy#267.”

Commit 81bbbec21c04b6392f6892f7735243387d295337 on the joyent/node project closes isaacs/node-graceful-fs issue #6, which describes a problem in the isaacs/node-graceful-fs code stemming from the use of joyent/node. GitHub allows automatic closure of issues through commit comments, even when the commit is in a different repository.3

“This fixes isaacs/node-graceful-fs#6.”
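The cross-reference forms that appear in the three comments above (user/repository#number for issues and pull requests, user/repository@sha for commits) can be picked out of comment text with a regular expression. The sketch below is only an illustration of the idea and is not the authors' implementation: the pattern, the function name, and the rule for skipping references to the commenting project itself are assumptions made here.

```python
import re

# user/repository followed by "#<number>" or "@<abbreviated or full commit hash>".
CROSS_REF = re.compile(
    r"(?P<project>[A-Za-z0-9-]+/[A-Za-z0-9_.-]+)"
    r"(?:#(?P<number>\d+)|@(?P<sha>[0-9a-f]{7,40}))\b"
)


def find_cross_references(comment, own_project):
    """Return (project, kind, identifier) tuples for references to other projects."""
    references = []
    for match in CROSS_REF.finditer(comment):
        project = match.group("project")
        if project.lower() == own_project.lower():
            continue  # a reference to the same project is not a cross-reference
        if match.group("number") is not None:
            references.append((project, "issue_or_pull_request", match.group("number")))
        else:
            references.append((project, "commit", match.group("sha")))
    return references


if __name__ == "__main__":
    comment = ("I verified that the problem is still the one referenced in "
               "ruby-amqp/amq-client#14. The commit ruby-amqp/amq-client@60f1c59 "
               "is the fix but it resides only in the master branch.")
    for reference in find_cross_references(comment, "sensu/sensu"):
        print(reference)
```

Each extracted tuple corresponds to one cross-reference from the commenting project to the referenced project, which is the unit counted when weighting edges in the Dependency Network.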

Both projects depend on a third project. We also identified some cases where the comments describe a dependency on a third project that is not cross-referenced. For example, everzet/capifony’s pull request #376 cross-references composer/composer’s issue #1453, but the problem stems from the use of the /symfony project. After identifying the source of the problem, a new issue (#411) is created on the everzet/capifony project that identifies the changes that need to be made to the way the symfony environment is set so that the composer/composer code executes correctly.

“As described in #376 capifony should execute composer with the right symfony environment set. Currently, with --no-scripts option removed in #376, composer is always executing symfony scripts with default dev environment.”

3https://github.com/blog/1439-closing-issues-across-repositories

Table 4: Types of Dependencies Identified by Reference Coupling

| Dependency Type | Count (Ratio) |
|---|---|
| Resolving | 74 (38.3%) |
| Duplicate | 43 (22.3%) |
| Affecting | 43 (22.3%) |
| Blocking | 18 (9.3%) |
| Updating | 9 (4.7%) |
| Leveraging | 6 (3.1%) |

5.1.1. Dependency Categories

Most (178 of 193 or 92%) of the dependencies could be assigned to one of the four pre-existing dependency types (duplicate, blocking, resolving or affecting). The remaining cross-references were examined and open coding was used to generate additional dependency categories. Only two new categories were created by the two manual coders, both of which are specific to cross-project dependencies:

• Leveraging: the two projects depend on a third project and both experience the same issue due to this shared dependency. One of the projects leverages a solution to this problem that has been generated by the other project.

• Updating: one of the projects depends on the other project and is updating to a more recent version of the other project.

Table 4 shows the breakdown of how often each dependency type appeared in our data. Resolving is the most common dependency type.

Answer to RQ1a: The Reference Coupling method does identify inter-project technical dependencies on GitHub pull request, issue and commit comments; 96.5% of manually analyzed comments revealed technical dependencies between the cross-referenced projects. Of these, most are direct dependencies between the two projects, but some are a shared dependency on a third project. We further classified these dependencies and found that the Reference Coupling method identifies a variety of dependency types. The most common type of dependency identified is Resolving, where a change in one project resolves an issue in the other project.

Table 5: Types of Dependencies Identified by Both Reference Coupling and IBM Developers.

| Dependency Type | Count |
|---|---|
| Duplicate | 57 |
| Related | 42 |
| Parent/Child | 28 |
| Affecting | 10 |
| Blocking | 6 |
| Planning | 2 |
| Resolving | 1 |

5.2. Validating Method on IBM Data

RQ1b: Does the Reference Coupling method identify inter-project technical dependencies on IBM work item comments? If so, what are the characteristics of these dependencies?

The Reference Coupling method identified dependencies between 2108 pairs of work items, 635 of which are inter-project dependencies. Of the inter-project dependencies, 146 (23%) were marked as technical dependencies by the IBM developers. Table 5 shows the types of dependencies that were identified by both Reference Coupling and the IBM developers. The most common types of dependencies that were identified by both the Reference Coupling method and the IBM developers are duplicates, related, and parent/child relationships.

However, there were 489 inter-project dependencies identified by the Reference Coupling method which were not marked as dependencies by the IBM developers. To validate that our Reference Coupling method was, in fact, identifying dependencies, despite the fact that not all were marked as such within the repository, two coders familiar with software development practices reviewed a random sample of 111 of the 489 work items and manually assessed if the Reference Coupling method identified a valid technical dependency. The two coders achieved a 95% inter-coder reliability. They discussed any cases where their coding did not align and came to a consensus. Of the 111 work items, all were found to have dependencies with other projects by the coders.

The coders also categorized these 111 dependencies using the dependency types in Table 3. They also used an additional type (Unknown) to label dependencies that did not easily fit into one of the 26 pre-established dependency types. Table 6 shows the types of dependencies that were evident among the pairs of work items. The most common dependency type that was identified by our Reference Coupling method but was not marked as a dependency by the IBM developers was ‘Affected by Defect’. For example, this comment describes how a defect, which is associated with a different project, caused problems on the current project.

“The Validate is not defined error from comment 2 was fixed in defect 162650.”

Another common dependency type identified by Reference Coupling but not flagged as a dependency by the developers is ‘Related’. For example,

“This is related to item 125838.”

The links between projects are not as clearly seen in the textual comments in IBM CLM, since the project associated with the work item is only available in the metadata on the work item. In these cases, it would be quite easy for developers to be unaware that a coordination need with another project exists since the fact that these work items are part of another project is not apparent.

Table 6: Types of Dependencies Not Identified by IBM Developers.

| Dependency Type | Count |
|---|---|
| Affecting | 73 |
| Related | 30 |
| Planning | 1 |
| Blocking | 1 |
| Unknown | 6 |

Answer to RQ1b: The Reference Coupling method does identify inter-project technical dependencies on IBM work item comments. It identifies a wide variety of dependency types. The most common type of dependency that is found with the Reference Coupling method but missed by the IBM developers is the ‘Affecting’ category.

5.3. Applying Reference Coupling to identify and examine GitHub ecosystems

5.3.1. Ecosystem Identification

RQ2: What ecosystems exist across GitHub-hosted projects and what is their structure?

Figure 3 shows the full Dependency Network, though for visibility we only display nodes with degree of 3 or greater. As visible on the graph, most of the nodes (10,484 of 18,533 projects or 57%) are part of the largest connected component (commonly referred to as the giant component [40]), which is the largest subgraph in which every node is connected to every other node by some path.

Figure 3: All GitHub projects with cross-references. The largest connected component (or giant component) is easily identified as the well-connected subgraph appearing in the center of the graph.

The connected components isolated from the giant component are primarily comprised of same-owner communities in which all nodes in the connected component are projects owned by the same GitHub user or organization. For example, the second largest connected component is comprised of 65 nodes, of which all but two are owned by GitHub user deathcap.4 Most of the nodes isolated from the giant component are connected to only a small number of nodes. In fact, 75% of nodes not in the giant component are connected to only one other node.

4https://github.com/deathcap

Figure 2 shows boxplots for the number of unique cross-references for each project which cross-references at least one other project. These boxplots show that while some projects make cross-references to many other projects, most projects have only a small number of other projects which they cross-reference.

Figure 2: Number of unique cross-referenced projects for all projects that make at least one cross-reference to another project on GitHub. (Boxplot panels: With Outliers; Outliers Removed.)

Since we are most interested in studying the popular GitHub ecosystems, we focus our analysis on the interconnected part of the network or the giant component. Figure 4 shows the giant component. The colors of the nodes represent communities as detected by the Louvain method. We obtained a modularity score of 0.913 (modularity has a maximum possible value of 1). This high modularity score indicates that the detected communities are much more tightly connected by technical dependencies than would appear in a random graph.

There were 43 ecosystems identified in this network. Nodes are sized according to their authority to display the nodes that are more prominent in each ecosystem. When a node has a high number of cross-reference relationships pointing to it, it has a high authority value [41]. Table 7 shows the most well-connected project node (highest Authority value) in each of the ecosystems.

Properties of GitHub Ecosystems

Ecosystems revolve around one central project. As depicted in Figure 4, each ecosystem appears to revolve around one main project. In Table 7, the most well-connected project node in each ecosystem is listed along with a description

symfony/TwigBundle copiousfreetime/hitimes Joshua-Anderson/travis-core nipy/PySurfer symfony/FrameworkBundle symfony-cmf/CoreBundle gibiansky/IHaskell SciTools/biggus fussl/passme thetallgrassnet/thetallgrass.net eurodev/symfony mattwildig/sinatra.github.com tiennou/nested_form blakewatters/TransitionKit netdna/bootstrap-cdn matplotlib/matplotlibpydata/pandas SciTools/iris php/php-src spree/spree_social yabawock/bootstrap-sass-rails ernie/ransack alexaitken/guard-konacha lastguest/Twig-extensions gregkare/jekyll rtfd/readthedocs.org rafaelfranca/simple_form-bootstrap sympy/sympy magiclabs/alchemy_cms elia/activeadmin-mongoid travis-ci/travis-assets sympy/sympy-live JuliaStats/Distributions.jl flyboarder/AirOS sensiolabs/SensioFrameworkExtraBundle ianwhite/orm_adapter justinfrench/formtastic schmittjoh/JMSDebuggingBundle webpack/enhanced-require minrk/ipython_extensions JuliaLang/libuv SciTools/iris-sample-dataalphagov/panopticon cloudhead/lesscss.org symfony-cmf/ContentBundle symfony/Form symfony-cmf/Testing dmathieu/travis-web henrikhodne/travis-core cordoval/symfony-standard dart-lang/dartlang.org eugeneo/brackets JuliaLang/julialang.github.com jimhester/knitrBootstrap gordonslondon/FOSUserBundle travis-ci/travis-web alphagov/travel-advice-publisher MaxCDN/bootstrap-cdn codahale/bcrypt-ruby gregbell/arbre projecthydra/sufia autozimu/julia_zh_cn genemu/GenemuFormBundle ranmocy/guard-rails starsquare/Buzz bsmr-ruby/travis-boxes JuliaLang/ZMQ.jl tkf/--notebook WouterJ/symfony nex3/sass jm81/dm-devise travis-ci/travis-ci.github.com jlong/sass-twitter-bootstrap JuliaStats/DataArrays.jl mongodb/mongo-php-driver travis-ci/travis-cirossant/ipycache gotcha/ipdb yyuu/pyenv JuliaStats/GLM.jl uq-eresearch/aorra twitter/recessrkh/rack-protection chriseppstein/compass Kodowa/Light-Table-Playground alphagov/publisher loopj/-tokeninput travis-ci/docs-travis-ci-com SciTools/iris-test-data troopjs/troopjs-contrib-browser JuliaOpt/JuMP.jl leafo/scssphp henrikhodne/travis-cookbooks 
ipython/nbconvert troopjs/troopjs-net karmi/retire agrobbin/active_admin krlmlr/r-travis stevengj/PyCall.jl njx/brackets-bower twbs/bootstrap projectblacklight/blacklight alphagov/govuk_content_api swcarpentry/website travis-ci/travis-api travis-ci/blog-travis-ci-com JuliaLang/julia-ipython /MinkBrowserKitDriver ericam/susy travis-ci/travis-images xp-framework/rest alphagov/gds-api-adapters travis-ci/travis-build HarlanH/DataFrames.jl thomas-mcdonald/bootstrap-sasscablegram/guard-rails doctorjnupe/cookbook-elasticsearch JuliaStats/DataFrames.jl cfjedimaster/brackets-jshint xp-framework/xp-language todc/todc-bootstrap travis-ci/travis JuliaLang/Cairo.jl ipython/ipython-components alphagov/maslow rouge8/Font-Awesome FortAwesome/Font-Awesome alphagov/govuk_need_api Asquera/elasticsearch-rake-taskselasticsearch/elasticsearch adobe-research/theseus ipython-contrib/IPython-notebook-extensions sul-dlss/spotlight besanek/PresenterTester JuliaLang/IJulia.jl caarlos0/bootstrap-sass travis-ci/travis-worker ipython/ipython JuliaDSP/DSP.jl alphagov/rummager garethr/vagrantboxes-heroku JuliaLang/openlibm troopjs/troopjs-data ruflin/Elastica minrk/ipython adobe/brackets-updates mjgallag/meteor-bootstrap-3 twalpole/sass madebymany/sir-trevor-js travis-ci/travis-coretravis-ci/travis-cookbooks alphagov/frontend inkling/Subliminal ivanov/ipython troopjs/troopjs-utils lusis/chef-kibana adobe/brackets-shell xp-framework/xp-framework guard/listen zeromq/zeromq4-x JuliaLang/METADATA.jl marijnh/CodeMirror alphagov/whitehall ivanov/-ipython thekid/xp-experiments alphagov/signonotron2 sparklemotion/ swcarpentry/site xp-framework/xp-runners middleman/middleman-guides alphagov/design-principles porada/middleman-autoprefixer elasticsearch/kibana yakaz/elasticsearch-action-updatebyquery /padrino-framework swcarpentry/boot-camps IcecaveStudios/archer JuliaLang/juliadcjones/Gadfly.jl guard/guard travis-ci/travis-boxes adobe/brackets.io ipython/nbviewer s-mage/rush stevengj/PyPlot.jl 
adobe/brackets-registry xp-lang/ troopjs/troopjs-composer NUBIC/ncs_mdes zeromq/pyzmq craigcitro/r-travis thekid/stomp vtjnash/ODE.jl karbarcca/Datetime.jl troopjs/troopjs middleman/middleman jlong/sass-bootstrap jgm/pandoc-templates numpy/numpydoc xianyi/OpenBLAS troopjs/troopjs-requirejs YorickPeterse/oga swcarpentry/bc phpBB-Blog/phpBB-Blog-for-3.1 toivoh/Debug.jl adobe/brackets-app alphagov/feedback elasticsearch/elasticsearch-ruby msysgit/Git-Cheetah mikey179/xp-framework steveluscher/sass-brunch FabienPennequin/scalastic openlayers/openlayers.github.io lindahua/NumericExtensions.jl alphagov/govuk_frontend_toolkit thekid/xp-runners ivanov/IHaskell laktek/jQuery-Smart-Auto-Complete middleman/middleman-blog padrino/padrino-recipes jgm/pandoc alphagov/government-service-design-manual adobe/brackets-phonegap yakaz/elasticsearch-query-indices2416 travis-ci/dpl railsinstaller/railsinstaller-nix andrew/node-sass marcodaniel/mdframed jsdoc3/jsdoc3.github.com minrk/zguide andrioni/MPFI.jl JuliaLang/IterativeSolvers.jl xp-framework/rfc PaulKinlan/yeoman.io bryanwb/chef-ark JuliaLang/ODE.jl xp-framework/core openlayers/ol3 JuliaStats/StatsBase.jl adobe/brackets zjhiphop/troopjs-core padrino/padrino-docs troopjs/troopjs-browser middleman/middleman-sprockets Mange/roadie elasticsearch/cookbook-elasticsearch loladiro/REPL.jl xp-framework/remote BanzaiMan/travis-api zeromq/libzmq alphagov/business-support-finder msysgit/msysgit tanmaykm/DataFrames.jl alphagov/static alphagov/smart-answers gettalong/kramdown msysgit/git adobe/brackets-edge-web-fonts peterflynn/svg-preview embarkmobile/android-maven-example nose-devs/nose thehogfather/brackets-code-folding bhollis/middleman-sprockets nanoc/nanoc aaronbushnell/generator-tmproject travis-ci/travis-tasks lukesarnacki/travis-web gildegoma/travis-images roryk/ipython-cluster-helper pao/Monads.jl alphagov/slimmer troopjs/troopjs-core mojombo/jekyll tomaz/appledoc chapmanb/homebrew-cbl timholy/HDF5.jl Behat/Mink 
yeoman/yeoman-generator-list vangdfang/git marijnh/tern_for_vim troopjs/troopjs-bundle ilpaijin/generator-chromeapp distler/syntax arq5x/gemini le717/brackets-html-skeleton phlipper/chef-monit chapmanb/bcbio-nextgen IainNZ/MathProg.jl thekid/xp-contrib jekyll/help davidavdav/NamedArray thekid/xp-language alphagov/transformation-dashboard sleeper/generator-backbone hadley/devtools alphagov/ci-puppet troopjs/troopjs-jquery brainstorm/ipython-cluster-helper JuliaLang/DataStructures.jl hcatlin/sass-spec simonster/MAT.jl alphagov/prototyping jekyll/jekyll-import drublic/css-modal IcecaveStudios/woodhouse chapmanb/gemini marijnh/tern ngbp/ng-boilerplate lkcampbell/brackets-ruler geoadmin/mf-chsdi3 robotology/gazebo_yarp_plugins hcatlin/libsass porterjamesj/ipython camptocamp/c2cgeoportal simonster/DataFrames.jl adobe/CodeMirror2 frozenice-/yeoman-generator-list fgrehm/-lxc romanbsd/fast-stemmer Shopify/liquid jsdoc3/jsdoc chapmanb/cloudbiolinux camptocamp/cgxp robotology/yarp thekid/xp-framework imathis/octopress eztierney/tern xp-framework/xp-contrib alphagov/licence-finder openlayers/openlayers romanzenka/cloudbiolinux heuermh/cloudbiolinux peterflynn/brackets-commands-guide jekyll/jekyll Behat/MinkSelenium2Driver rosa-abf/vagrantboxes-heroku yeoman/yeoman.ionicksp/generator-bbb rstudio/shiny jts/sga gds-operations/puppet-graphite troopjs/troopjs-todos joshdmiller/ng-boilerplate marijnh/acorn hcatlin/sassc TwP/directory_watcher sleuthkit/sleuthkit libLAS/libLAS simonster/HDF5.jl github/pages-gem rstudio/ggvis geoadmin/mf-geoadmin3 ruby-/ruby-llvm sjackman/homebrew chapmanb/bcbio-nextgen-vm benlemasurier/stormfs fgrehm/bindler ivantsepp/jekyll dtao/safe_yaml SLaks/SLaks.Blog mcuadros/homebrew-hhvm intimidate/generator-webapp SLaks/WebEssentials2013 wking/email-issue-reply-testing oclint/oclint phusion/baseimage-docker mozilla/rhino brandon-fryslie/ember-rest.coffee tombh/deis-cookbook Behat/MinkSeleniumDriver Behat/MinkSahiDriver yeoman/yo openlayers/cla 
alphagov/planneralphagov/puppet-jenkins Homebrew/linuxbrew dotcloud/gordon tianon/dockerfilesmarijnh/tern_for_sublime net-ssh/net-ssh madskristensen/vswebessentials.com /angular.js defunkt/hub sjackman/abyss Behat/SahiClient yeoman/yeoman Homebrew/homebrew-boneyard Valloric/YouCompleteMe iml/homebrew-science WinRb/vagrant-windows nicksp/generator-angular zedshaw/mongrel2 ultravideo/kvazaar am11/WebEssentials2013 am11/CssSorter s-u/Cairo npm/npmjs.org adamv/homebrew-alt fgrehm/vagrant-cachier yeoman/generator-angular am11/zencoding coreos/coreos-vagrant saltstack/salty-vagrant madskristensen/CssSorter bhollis/maruku njh/redstore marijnh/CodeMirror2 Behat/MinkExtension SirVer/ultisnips hegemonic/catharsis swig/swig opdemand/deis anba/rhino cebe/markdown rabbit-shocker/rabbiter tianon/brew Supacoco/generators madskristensen/zencoding mojombo/github-flavored-markdown aik099/qa-tools mdevils/node-jscs vmg/redcarpet josegonzalez/homebrew-php angular-ui/ui-utils phoenixsong6/homebrew-php dotcloud/docker-registry kykyev/almond schisamo/vagrant-omnibus yeoman/grunt-usemin Homebrew/homebrew-scienceHomebrew/homebrew carsomyr/rbenv-bundler lovanwubing/r.js cambridge-healthcare/dockerfiles jlertle/generator-ember skivvies/angular.js alessandro277/homebrew-php RiotGames/buff-config kevin-smets/angular-bleed Behat/MinkZombieDriver stof/MinkBrowserKitDriver berkshelf/solve teropa/build-your-own- chevex/homebrew yeoman/generator-karma dotcloud/docker-py rolandwalker/homebrew-cask jawshooah/homebrew lra/mackup opdemand/deis-cookbook yeoman/generators palantir/tslint /camlp4 less/less.js chrmoritz/homebrew-binary npm/npm-registry-mock Parallels/vagrant-parallels madskristensen/WebEssentials2013 koraktor/steam-condenser-phpcmus/cmus josh/brew-pip Homebrew/homebrew-binaryzimbatm/homebrew-versions npm/read-package-json yeoman/generator lyridious/homebrew Homebrew/homebrew-dupes dotcloud/docker jrburke/require-cs berkshelf/vagrant-berkshelf schweikert/fping kr/pty 
kasperisager/vanilla-bootstrap angular/angular.dart jrburke/r.js angular-ui/bootstrap karma-runner/grunt-karma mroth/lolcommits Homebrew/homebrew-versions steeve/boot2docker mxcl/homebrew yanniks/linuxbrew RiotGames/ridley Sparks-Creative-Limited/angular.js npm/npm-www realworldocaml/examples mistydemeo/tigerbrew keithw/mosh jpetazzo/dind progrium/dokku mitchellh/vagrant angular-ui/angular-ui-docs angular-ui/ui-router heroku/heroku-buildpack-nodejs petermichaux/maria yeoman/generator-webapp ocamllabs/ocaml-ctypes scraperwiki/tang rodyhaddad/angular.js koraktor/steam-condenser-ruby yeoman/generator-backbone PromyLOPh/pianobar berkshelf/berkshelf /knockout avsm/mirage-tcpip jrburke/requirejs passy/generator-canjs ichuan/bower-angular-latest koraktor/steam-condenserBrewTestBot/homebrewprotomouse/homebrew-versions npm/read-installed Homebrew/homebrew-headonly jrburke/almond netj/homebrew-boneyard snkinard/steam-condenser-java requirejs/xrayquire RiotGames/berkshelf npm/npm drewwells/lolsrv boot2docker/boot2docker Homebrew/homebrew-games phinze/homebrew-cask stucki/docker-cyanogenmod progrium/buildstep getlantern/statshub getlantern/lantern-uihall5714/angular-ui-bootstrap nomiddlename/log4js-node ocaml/opam-repository dsheets/ocaml-sodium yeoman/generator-mocha getlantern/www.getlantern.org caskroom/homebrew-versions Incubaid/baardskeerder mitchellh/vagrant-rackspace ocaml/opam dotcloud/stackbrew progrium/pluginhook fengmk2/node-gyp noflo/noflo.github.io getlantern/laeproxy mletterle/rust-http mirage/mirage-www ocamllabs/rwo-comments RiotGames/berkshelf-api mitchellh/vagrant-aws Kroisse/rust-mustache sdegutis/zephyros avsm/ocaml-cohttp RiotGames/vagrant-berkshelf vojtajina/testacular karma-runner/karma JeremyLetang/rust-sfml elixir-lang/elixir-lang.github.com gkz/prelude-ls ocamllabs/opamfu SlexAxton/require-handlebars-plugin gruntjs/grunt-contrib-requirejs ForbesLindesay/component-website forresto/dataflow-noflo rlidwka/yapm koraktor/steam-condenser-java 
realworldocaml/scripts kevinwallace/qemu-dockerboot2docker/boot2docker-cli angular-ui/angular-ui.github.com dennisreimann/ioctocat pivotal/pivotal_workstation janestreet/bin_prot getlantern/lantern caskroom/homebrew-fonts dbashford/mimosa-server ivaynberg/select2 getlantern/lantern_aws rlane/glfw-rs tknerr/vagrant-plugin-bundler paulmillr/chokidar evanlucas/npm-registry-mock g2p/blocks jonnor/microflo dbashford/mimosa-copy AngryLawyer/rust-sdl2 samoht/opam-repository tmatilai/vagrant-proxyconf Nami-Doc/coco bflad/chef-docker adamfisk/LittleProxy kbhomes/google-music-mac mirage/mirage assaf/zombie pivotal-sprout/sprout dsheets/opam-repository kmcallister/servolosttime/steam-condenser-php samoht/htcaml dbashford/mimosa-combine michaelficarra/cscodegen satyr/coco ocaml/opam2web component/component dbashford/mimosa-coffeescript xforty/vagrant-drupaldotless-de/vagrant-vbguest noflo/fbp getlantern/lantern-common mzabaluev/grust poise/python angular-ui/ui-map paul90/Smallest-Federated-Wiki TooTallNate/node snrs/sonorous avsm/mirage-www dbashford/mimosajs.com mbrubeck/servo adridu59/rust-tuts pypa/pip pypa/virtualenv nicoulaj/vagrant-maven-plugin brunch/uglify-js-brunch OCamlPro/opam paylogic/pip-accel qqueue/html5chan mirage/opam-repository dbashford/logmimosa noflo/noflo-runtime-base getlantern/lantern-controller fredrikbonander/karma protobox/protobox angular-ui/angular-ui OCamlPro/opam2web vincent-hugot/iTeML mirage/mirage-tcpip dbashford/mimosa-lint gkz/LiveScript kmcallister/html5 /ansible clutchski/coffeelint crabtw/rust-bindgen component/builder2.js jroweboy/rust-mustache dbashford/mimosa-sass avsm/homebrew Constellation/escodegen Kroisse/rust-http avsm/mirage MatthewMueller/cheerio noflo/noflo iangreenleaf/brunch-with-chaplin graydon/rust-www avsm/melange dbashford/mimosa-require mozilla/rust genesis/ fb55/CSSselect BlueSpire/Durandal trustmaster/goflow brunch/brunch LearnBoost/socket.ioTooTallNate/node-gyp avsm/mirage-platform jck/uhdl fvdm/nodejs-youtube 
chris-morgan/rust-http dbashford/skelmimosa protobox/protobox-web kevva/image-min mirage/opam-repo-dev dbashford/mimosa-web-package component/builder.js paulmillr/brunch-with-chaplin michaelficarra/CoffeeScriptRedux dsc/coco jashkenas/coffee-script bjz/glfw-rs rust-lang/rfcs CocoaPods/Xcodeproj dbashford/mimosa-bower chris-morgan/ncurses-rs mreid-moz/s3funnel jfirebaugh/openstreetmap-website chaplinjs/chaplin Constellation/esmangle capdevc/flycheck OCamlPro/opam-repository aredridel/html5 dbashford/mimosa-less noflo/noflo-ui alexcrichton/rust sebcrozet/kiss3d avsm/ocaml-github osmlab/openstreetmap-website dbashford/mimosa Polymer/more-elements xforty/chef- fb55/htmlparser2 mapnik/mapnik-packaging ionrock/cachecontrol dbashford/mimosa-minify meemoo/dataflow gruntjs/grunt-contrib-imagemin tenerd/CoffeeScriptRedux isaacs/cluster-master wanderview/node-tap hhatto/autopep8 jtriley/scp.py MayhemYDG/4chan-x dmajda/pegjs zargony/rust-fuse dbashford/mimosa-client-jade-static boto/boto Polymer/HTMLImports bnoordhuis/node-buffertools CocoaPods/guides..org osmlab/editor-imagery-index MicahChalmer/rust-fuse CocoaPods/CocoaPods greasemonkey/greasemonkey SimonSapin/servo mozilla/servo CocoaPods/cocoapods-integration-specs dbashford/mimosa-live-reload janestreet/core bnoordhuis/node CocoaPods/cocoapods-downloader the-grid/the-graph OpenStreetMap/iD Polymer/docs gravitystorm/openstreetmap-carto dbashford/mimosa-server-reload rvagg/nan alexcrichton/libuv dbashford/mimosa-minify-css kchmck/vim-coffee-script humdedum/homebrew ContinuumIO/anaconda-ec2 noflo/dataflow-noflo documentcloud/backbone discoproject/odisco jtriley/StarCluster Polymer/projects rstacruz/js2coffee brson/rust-sdl EricDunsworth/theme-gcwu-fegc tmpvar/jsdomdbashford/mimosa-testem-require bestiejs/lodash Nami-Doc/LiveScript tUrG0n/nconf sfackler/rust-postgrescadencemarseille/rust-pcre tomhughes/openstreetmap-website mapbox/carto 4chan/4chan-JS mapnik/mapnik pnorman/openstreetmap-carto mapbox/tilemill jcrocholl/pep8 
OpenUserJs/OpenUserJS.org indutny/libuv joyent/libuv AFNetworking/AFNetworking mapbox/millstone Polymer/PointerEventsPolymer/PointerGestures NV/CSSOM dbashford/mimosa-import-source gruntjs/grunt-contrib-coffee willthames/ansible nodejitsu/haibu-carapace pjackson28/theme-base noflo/noflo-runtime dbashford/mimosa-iced-coffeescript felixge/node-formidable mozilla-servo/rust-httpmozilla-servo/rust-geom CocoaPods/Core EricDunsworth/GCWeb Polymer/platform documentcloud/underscore arashpayan/appirater kanso/couchtypes LightTable/LightTable cjdelisle/cjdns marionettejs/backbone.marionette gruntjs/grunt-contrib-internal saghul/pyuv bnoordhuis/node-iconv zerebubuth/openstreetmap-cgimap developmentseed/bones systemed/iD mapbox/tilemill-win-launcher gruntjs/grunt-contrib-qunit mapnik/carto-parser mapbox/tilelive-mapnik GoogleCloudPlatform/gsutil bnoordhuis/libuv featurist/spawn-cmd joyent/node-website HipByte/motion-cocoapods hotosm/HDM-CartoCSS Polymer/ShadowDOM RestKit/RestKit EricDunsworth/theme-gc-intranet CocoaPods/Specs wet-boew/theme-base Polymer/todomvc gruntjs/grunt-contrib-handlebars mozilla-servo/rust-mozjs joyent/nodenode-inspector/node-inspectorlaverdet/node-fibers shama/grunt mapbox/tilelive.js Polymer/polymer isaacs/node-graceful-fs pjackson28/theme-ogpl gruntjs/grunt-contrib-uglify glaszig/SZTextView wyuenho/backgrid-filter gravitystorm/mapnik-reference gruntjs/grunt-contrib-copy georgethomas/node-restify shirayuki/openstreetmap-websiteopenstreetmap/openstreetmap-website JustinTulloss/zeromq.node mapnik/mapnik-reference wet-boew/bookmark pjackson28/GCWeb gruntjs/grunt-contrib-jst joyent/http-parser isaacs/libuv edgecase/ECSlidingViewController Polymer/CustomElements kanso/kanso mcavage/node-fast mapnik/node-mapnik gruntjs/grunt-contrib-jade Uncodin/bypass isaacs/read-package-json wet-boew/wet-boew-xsl jlaws/JLAddressBook davidbkemp/node-gyp plepe/pgmapcss malfaux/http-parser barmstro/web-animations-tools developmentseed/node-get jashkenas/backbone 
bnoguchi/hooks-js bajtos/libuv cowboy/grunt jwarkentin/node-monkey wet-boew/wet-boew noflo/noflo-gestures vtsvang/grunt-contrib-nodeunit mapnik/mapnik-support DieBuche/openstreetmap-website agnat/node_mdns andreacremaschi/SpatialDBKit jzaefferer/jquery-validation creationix/nvm isaacs/npm gruntjs/grunt-contrib pjackson28/theme-gc-intranet developmentseed/node-sqlite3 mapbox/wax wet-boew/theme-gc-intranet cowboy/node-exit developmentseed/node-blend gruntjs/gruntjs.com rogerwang/blink osmlab/openstreetmap-upcoming-features wet-boew/theme-ogpl Polymer/polymer-ui-elements substack/node-mkdirp trondn/libuv robertkowalski/osenv openstreetmap/mod_tile gruntjs/grunt-contrib-watchmongodb/node-mongodb-native d3/d3-geo-projection Polymer/labs wyuenho/backgrid jbcrail/Haywire meryn/normalize-package-data gruntjs/grunt mapbox/tm2 gruntjs/grunt-contrib-sass gruntjs/grunt-contrib-compass mbostock/d3 mbostock/smash mapbox/carmen wet-boew/GCWeb Polymer/toolkit-ui nodejitsu/node-http-proxy isaacs/npmconf benmills/robotskirt robertkowalski/normalize-package-data wet-boew/theme-gcwu-fegc gruntjs/grunt-contrib-clean mapbox/tilelive-vector vtsvang/grunt-contrib-uglify mikeal/request pjackson28/wet-boew

shama/grunt-hub yohanboniface/Leaflet.Storage wet-boew/wet-boew-styleguide shama/gaze v8/v8 isaacs/npm-www zcbenz/ d3/d3-plugins square/crossfilter the-grid/the-behavior rbranson/node-ffi rogerwang/chromium.src wpreul/node-fork NickQiZhu/dc.js jquery/jquerymobile.com scottgonzalez/grunt-wordpress gruntjs/grunt-initLearnBoost/mongoose KiNgMaR/libuv isaacs/npmjs.org jquery/jquery-mobile gruntjs/grunt-cli pjackson28/theme-gcwu-fegc Polymer/polymer-elements einaros/ws rogerwang/node- gruntjs/grunt-contrib-stylus ShaunK/node-soap jquery/api.jquerymobile.com mapbox/mapbox.js jquery/demos.jquerymobile.com shawnbot/aight nodejitsu/forever yohanboniface/django-leaflet-storage jmeas/grunt jquery/blog.jquery.com-theme timmywil/grunt-bowercopy Circadio/circadio-issues gruntjs/grunt-contrib-htmlmin seebees/node-1 vjeux/jDataView Leaflet/Leaflet jquery/jquery.org LearnBoost/stylus gruntjs/grunt-contrib-livereload visionmedia/express brianc/node-postgres rogerwang/WebKit_trimmed isaacs/npm-registry-mock wyuenho/backgrid-select-all johnmdonahue/npmjs.org isaacs/rimraf gruntjs/grunt-contrib-compress mbostock/topojson RedWolves/jquery-wp-content isaacs/nopt Wizcorp/distribute mapbox/markers.js gruntjs/grunt-contrib-yuidoc kartena/Proj4Leaflet

gruntjs/grunt-contrib-jshint whs/chromium.src jquery/web-base-template jquery/events.jquery.org naholyr/doggybag leobalter/jquery-wp-content senchalabs/connect rogerwang/node JamesMGreene/qunit-assert-html Leaflet/Leaflet.markercluster aheckmann/mquery isaacs/osenv isaacs/read-installed visionmedia/nib gruntjs/grunt-contrib-nodeunit isaacs/init-package-json davey3000/chromium jquery/jquery-wp-content yeoman/grunt-regarde jquery/jquery.com cowboy/node-findup-sync jshint/jshint visionmedia/expresso mapbox/node-sqlite3 gulpjs/gulp mbostock/us-atlas jquery/api.qunitjs.com eclipsesource/jshint-eclipse visionmedia/expressjs.com jquery/jquery-release jquery/plugins.jquery.com JamesMGreene/qunit zcbenz/node stereobooster/jshintrb ecomfe/edp-package Leaflet/Leaflet.draw robrich/orchestrator jquery/learn.jquery.com ctalkington/node-archiver tkellen/node-liftoff ielgnaw/edp-doctor springmeyer/node-pre-gyp jquery/2012-dev-summit Mithgol/node-pre-gyp cosm/jshintrb jquery/jquery jquery/qunitjs.com yyx990803/npm jquery/api.jquery.com jquery/qunit nodejitsu/forever-monitor bower/bower terinjokes/gulp-uglify gulpjs/gulp-util acdha/openseadragon caolan/async jshint/site douglasduteil/ui-utils mapbox/node-pre-gyp fraction/fraction ecomfe/edp jquery/testswarm scottgonzalez/jquery leobalter/qunitjs.comjzaefferer/node-testswarm strongloop/loopback bower/decompress-zip strongloop/strong-supervisor ecomfe/edp-doctor jquery/jquery-ui cujojs/rest jquery/contribute.jquery.org kof/node-qunit ecomfe/edp-webserver strongloop/loopback-datasource-jugglerstrongloop/loopback-workspace yatskevich/grunt-bower-task stream-utils/raw-body

ecomfe/edp-build jquery/grunt-jquery-content jquery/bugs.jquery.com strongloop/loopback-angular ecomfe/edp-lint jquery/sizzle jquery/api.jqueryui.com cowboy/wesbos gruntjs/grunt-lib-phantomjs jquery/qunit-reporter-junit ecomfe/edp-bcs bower/bower.github.io strongloop/strong-remoting ecomfe/edp-core ecomfe/edp-project jquery/jqueryui.com

strongloop/sls-sample-app ecomfe/edp-test kswedberg/grunt-jquery-content

yellapuhari/web-base-template

ecomfe/edp-codegen ecomfe/edpx-ub-ria bdowling/jquery-ui bower/registryObvious/phantomjs ecomfe/edp-add ielgnaw/edp jquery/download.jqueryui.com ecomfe/edp-minify

of the project, the number of stars the project has, the size of the associated ecosystem, and the node’s degree. Each of these projects has a higher in-degree than out-degree with the exception of the mxcl/homebrew project. On the other hand, low-degree project nodes are four times as likely to depend on another project as they are to have a project depend on them. This shows that ecosystems are being formed around a central project with the other projects in the ecosystem mostly depending on that central project. This results in a star pattern. The twbs/bootstrap ego network (Figure 5) clearly depicts this pattern within the graph.

Figure 4: Ecosystems in the largest connected component of GitHub-hosted projects. Project names follow the pattern user/repository where user is the owner’s GitHub login and repository is the name of the project repository.

Table 7: Ecosystems in GitHub. Details of the most well-connected node in each ecosystem.

| Project | Description | Stars | Ecosystem Size | Degree (in,out) |
|---|---|---|---|---|
| joyent/node | Framework | 39,373 | 10.08% | 69 (53,16) |
| symfony/symfony | Framework | 10,985 | 8.46% | 93 (53,40) |
| rails/rails | Framework | 29,744 | 7.92% | 93 (65,28) |
| JuliaLang/julia | | 5,531 | 6.74% | 51 (35,16) |
| rubygems/rubygems | Package Manager | 1,304 | 6.04% | 22 (14,8) |
| mxcl/homebrew | Package Manager | 13,723 | 3.94% | 48 (21,27) |
| zendframework/zf2 | Framework | 5,841 | 3.88% | 72 (65,7) |
| travis-ci/travis-ci | Development Tool ( Platform) | 3,693 | 3.50% | 70 (54,16) |
| wet-boew/wet-boew | Framework | 688 | 3.34% | 19 (15,4) |
| twbs/bootstrap | Framework | 41,828 | 3.29% | 9 (9,0) |
| dbashford/mimosa | Development Tool (Browser development) | 472 | 2.43% | 25 (20,5) |
| h5bp/html5-boilerplate | Framework | 31,926 | 2.37% | 19 (15,4) |
| mitchellh/vagrant | Framework | 9,274 | 2.10% | 23 (15,8) |
| libgit2/libgit2 | Library | 5,161 | 2.05% | 20 (11,9) |
| Behat/Mink | Development Tool (Testing) | 673 | 1.99% | 13 (9,4) |
| OCamlPro/opam | Package Manager | 118 | 1.89% | 9 (8,1) |
| basho/riak | Database | 2,520 | 1.83% | 27 (18,9) |
| Polymer/polymer | Library | 8,787 | 1.83% | 16 (11,5) |
| mapnik/mapnik | Development Tool (Toolkit for developing mapping applications) | 1,003 | 1.78% | 20 (12,8) |
| mozilla/rust | Programming language | 5,604 | 1.78% | 36 (29,7) |
| alphagov/static | Other (GOV.UK static files/resources) | 67 | 1.73% | 13 (10,3) |
| adobe/brackets | Development Tool (code editor) | 23,921 | 1.46% | 26 (16,10) |
| CocoaPods/CocoaPods | Development Tool (dependency manager) | 5,711 | 1.46% | 14 (9,5) |
| yeoman/yeoman | Development Tool (web development tools) | 7,246 | 1.46% | 18 (13,5) |
| angular/angular.js | Framework | 42,950 | 1.40% | 12 (8,4) |
| dotcloud/docker | Development Tool (application container engine) | 14,270 | 1.35% | 24 (19,5) |
| emberjs/ember.js | Framework | 14,185 | 1.29% | 20 (12,8) |
| owncloud/core | Other (personal cloud storage tool) | 3,222 | 1.19% | 26 (13,13) |
| typhoeus/typhoeus | Library | 2,465 | 1.19% | 6 (4,2) |
| facebook/hhvm | Other () | 11,506 | 1.08% | 15 (10,5) |
| celluloid/celluloid | Framework | 2,855 | 0.86% | 9 (6,3) |
| xp-framework/rfc | Framework | 0 | 0.86% | 16 (14,2) |
| rogerwang/node-webkit | Framework | 19,737 | 0.86% | 16 (11,5) |
| ecomfe/edp | Development Tool (front-end development platform) | 264 | 0.86% | 18 (15,3) |
| kennethreitz/requests | Library | 13,812 | 0.81% | 13 (10,3) |
| documentcloud/underscore | Library | 7,135 | 0.81% | 6 (4,2) |
| middleman/middleman | Development Tool (website generator) | 4,179 | 0.75% | 8 (5,3) |
| elasticsearch/elasticsearch | Other (search and analytics tool) | 10,700 | 0.70% | 11 (11,0) |
| chapmanb/bcbio-nextgen | Other (RNA-seq analysis tool) | 173 | 0.59% | 10 (9,1) |
| wp-cli/wp-cli | Development Tool (command line interface for WordPress) | 1,968 | 0.59% | 13 (9,4) |
| cucumber/cucumber | Development Tool (Testing) | 5,142 | 0.49% | 7 (4,3) |
| jsdoc3/jsdoc | Development Tool (API documentation generator) | 2,909 | 0.49% | 6 (3,3) |
| propelorm/Propel | Development Tool (Object-Relational Mapping) | 893 | 0.49% | 7 (7,0) |
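To make the in-degree and out-degree comparison concrete, the following is a minimal sketch of how the Degree (in,out) values could be derived from a list of dependency edges. It assumes each edge is a pair (dependent project, project depended upon), so that a high in-degree marks a project that many others depend on. That edge direction, the function names, and the toy project names are illustrative assumptions, not taken from the paper's tooling.

```python
from collections import defaultdict


def degree_table(edges):
    """Map each node to (in_degree, out_degree).

    Assumption: an edge (a, b) means project a depends on project b.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for src, dst in set(edges):  # count each distinct edge once
        outdeg[src] += 1
        indeg[dst] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n]) for n in nodes}


def most_connected(edges):
    """Return the node with the highest total degree (ties broken by name)."""
    table = degree_table(edges)
    return max(table, key=lambda n: (sum(table[n]), n))


# Hypothetical toy ecosystem: three projects depend on hub/core,
# which itself depends on one library.
edges = [
    ("a/plugin", "hub/core"),
    ("b/theme", "hub/core"),
    ("c/cdn", "hub/core"),
    ("hub/core", "d/lib"),
]

table = degree_table(edges)
print(table["hub/core"])      # (3, 1)
print(most_connected(edges))  # hub/core
```

In this toy graph the central node has a higher in-degree than out-degree, while every peripheral node that depends on it has out-degree 1 and in-degree 0, which is the star pattern described above.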

Predominant type of ecosystems is software development support. Interestingly, nearly all of the ecosystems are centered around projects whose purpose is to support software development, such as frameworks, libraries and programming languages. In fact, as shown in Table 8, of the 43 ecosystems, there are only 5 whose purpose is not to support software development. The 14 software development tools include a testing tool, a continuous integration platform, and an API documentation generator. The type of each ecosystem is also shown in Table 7.

Figure 5: twbs/bootstrap Ego Network. Portraying a sample star pattern in the network.

Table 8: Ecosystem Types. Nearly all support software development.

| Type | Count |
|---|---|
| Software Development Tool | 14 |
| Framework | 13 |
| Library | 5 |
| Package Manager | 3 |
| Programming Language | 2 |
| Database | 1 |
| Other | 5 |

Ecosystems are interconnected. The graph in Figure 3 shows two types of communities that occur in GitHub-hosted projects, those that are part of the largest connected component and those that are isolated from the largest connected component. The majority of project nodes, 10,484 or 57%, are involved in the largest connected component, indicating that many ecosystems are connected to each other across the projects in our Dependency Network. The next biggest connected component in the graph is only 65 nodes, indicating that the ecosystems that are isolated are small and have not attracted public attention. Figure 4 displays the interconnected part of the network, and the connections between the ecosystems are apparent. As an example, Figure 6 shows the rubygems/rubygems ego network, clearly depicting its connection to the rails/rails project. This is not surprising, since the rubygems project is a package management framework for the Ruby programming language and rails/rails is a framework written in Ruby. There is a direct connection between the rubygems/rubygems and rails/rails nodes. In addition, there are projects, like carlhuda/bundler and airblade/paper_trail, which connect the two projects.
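The two graph operations used in this paragraph, finding connected components and extracting an ego network, can be sketched as follows. This is a hedged illustration only: it treats dependency edges as undirected when grouping nodes, and it assumes an ego network of radius 1 that keeps ties among the neighbours. The function names and toy projects are hypothetical and do not come from the paper.

```python
from collections import defaultdict, deque


def undirected_adjacency(edges):
    """Build a neighbour map that ignores edge direction."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected_components(edges):
    """Return the weakly connected components as sets, largest first."""
    adj = undirected_adjacency(edges)
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first traversal from the start node
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def ego_network(edges, ego):
    """Edges among the ego and its direct neighbours (radius 1, an assumption)."""
    adj = undirected_adjacency(edges)
    keep = adj[ego] | {ego}
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Hypothetical toy data: two linked ecosystems (x and y) and one isolated pair (z).
edges = [
    ("x/a", "x/core"),
    ("x/b", "x/core"),
    ("x/a", "x/b"),
    ("x/core", "y/core"),
    ("y/c", "y/core"),
    ("z/solo", "z/lib"),
]

components = connected_components(edges)
total_nodes = sum(len(c) for c in components)
share = len(components[0]) / total_nodes  # fraction of nodes in the largest component
print(len(components[0]), total_nodes, round(share, 2))
print(ego_network(edges, "y/core"))
```

The share computed at the end corresponds to the kind of figure reported above for the largest connected component, here on toy data rather than the GitHub dataset.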

Answer to RQ2: The Reference Coupling method can be used to identify and examine ecosystems. The predominant type of ecosystems on GitHub is software development support. Ecosystems tend to revolve around one central project and be interconnected to other ecosystems.

5.3.2. Investigation of Socio-Technical Alignment within the Ecosystems

RQ3: Do the project owners’ and contributors’ social behaviours align with the technical dependencies?

Project Owners: Table 9 shows strong, positive correlations between the technical dependencies and the social behaviour of the owners. Along with these strong correlations, Figure 7 shows a pronounced star pattern in the Owner Follows Network. This indicates that the project owners in an ecosystem tend to follow the owner of the central repository.

Project Contributors. As shown in Table 10, the social behaviour of project contributors does not align with the technical dependencies. This indicates that, while the project owners seem to follow the right people and are aware of the right projects based on the technical dependencies that exist in the ecosystem, the social behaviour of project contributors is not aligned with project dependencies. Figure 8 shows the Contributor Follows Network. As shown, the structure is quite different than the Dependency Network. Communities do not have one central project and the network is much more densely connected.

Figure 6: rubygems/rubygems Ego Network. Portraying connections between ecosystems.

Table 9: Project Owners: correlations between technical dependencies and social behaviour.

                                        Pearson Correlation    p-value
Technical Dependencies and Following    0.91                   <0.001
Technical Dependencies and Stars        0.79                   <0.001

Figure 7: The Owner Follows Network, G_of.

Table 10: Project Contributors: correlations between technical dependencies and social behaviour.

                                        Pearson Correlation    p-value
Technical Dependencies and Following    0.0002                 0.98
Technical Dependencies and Stars        0.001                  0.88
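For readers unfamiliar with the statistic reported in Tables 9 and 10, the following minimal sketch computes a Pearson correlation coefficient. How the two variables are constructed is defined elsewhere in the paper; here we simply assume, for illustration, one 0/1 indicator per project pair for "a technical dependency exists" and one for "the owner follows the other owner". The data are hypothetical, and the sketch does not compute a p-value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicators, one entry per project pair (assumption, not study data).
has_dependency = [1, 1, 0, 0, 1, 0]
owner_follows = [1, 1, 0, 0, 0, 0]
r = pearson(has_dependency, owner_follows)
print(round(r, 3))
```

A value near 1, as in Table 9, means the two indicators rise and fall together across pairs; a value near 0, as in Table 10, means that knowing one says almost nothing about the other.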

Answer to RQ3: The Reference Coupling method can be useful for other investigations of software ecosystems where identification of inter-project technical dependencies is needed. We investigated social behaviour in software ecosystems. We found that the project owners’ social behaviours do align with the technical dependencies, but the project contributors’ social behaviours do not align with the technical dependencies.

Figure 8: The Contributor Follows Network, G_cf.

6. Discussion

The method we proposed in this paper, Reference Coupling, identifies cross-references to other projects within a project’s ecosystem in the comments made by developers on project artifacts like issues, commits, pull requests or work items. We showed that the method outputs are a valid conceptualization of technical dependencies by analyzing the content of these cross-references and comparing the cross-references to dependency relationships identified by the development team.

Our method adds to the important, but scarce, research that leverages the social aspects of work within software ecosystems [13]. Reference Coupling detects technical dependencies that may not manifest themselves in source code by identifying tasks, issues, pull requests, or commits that rely on another project, and, therefore, it can identify dependencies not identified by other methods. Our results show that the cross-references identify many different types of dependencies including duplicates, affecting, blocking, and related relationships.

Our method is analogous to the logical coupling method proposed by Gall et al. [42], which detects dependencies within a project, except that our method operates at the ecosystem level. Where logical coupling detects dependencies when artifacts have been worked on together, our method detects dependencies when issues, pull requests or commits have been worked on in conjunction with another project (as evidenced through user-specified cross-references). Thus, the dependencies established through our method are those that are logical.

Limiting the detected dependencies to those that are logical is important when using those dependencies to identify ecosystems. Methods that detect technical dependencies between projects through analysis of code or configuration files may not be best suited for identifying software ecosystems. For example, when one project uses another project, it does not necessarily mean the two software projects are evolving together in the same environment, especially when the dependency is to an established, off-the-shelf software package. Thus, identifying all relationships that manifest in the source code or configuration files may result in dependencies that are not important for the identification of ecosystems.

For the GitHub projects, we used Reference Coupling to identify and explore ecosystems. To do this, we used a popular community detection algorithm [15] on the dependency network, which identifies clusters of nodes densely connected by technical dependencies. These detected communities represent software ecosystems.
Through analysis of the resulting ecosystems found in GitHub-hosted projects, we showed that the ecosystems are centered mostly around projects that support software development through developing frameworks and toolkits. The predominant structure of the ecosystems is a star where one central project is the hub of the ecosystem.

Our method also allows for the identification of dependencies across extremely large sets of projects. For example, we ran our method on all public projects hosted on GitHub. Other methods that detect dependencies are limited to analyzing a given project or set of projects. Analyzing the dependencies of a popular project through its source code or configuration files to identify its ecosystem would not identify projects that rely on that project. We saw that most ecosystems across the GitHub-hosted projects are centered around one main project and many projects depend on that project without a reciprocal relationship. These relationships would be missed if only the dependencies of the main project were studied to identify its ecosystem. Since the ecosystems are not always well-defined, it would be impossible to know which other projects to consider for analysis. Thus, our method is better suited to identifying ecosystems since it is not limited in the number of projects it can analyze.
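To make the discussion above concrete, the following sketch extracts user-specified cross-references from comment text and counts incoming dependencies per project. It assumes GitHub's `owner/repo#number` and `owner/repo@sha` reference forms; the regular expression is an illustrative approximation and not the expression used in our implementation, and the function name and sample comments are hypothetical.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption): owner/repo#123 or owner/repo@<7-40 hex chars>.
CROSS_REF = re.compile(
    r"(?<![\w./-])"  # do not start in the middle of a word, path or URL
    r"(?P<owner>[A-Za-z0-9][A-Za-z0-9-]*)/(?P<repo>[A-Za-z0-9._-]+)"
    r"(?:#(?P<number>\d+)|@(?P<sha>[0-9a-f]{7,40}))\b"
)


def dependency_edges(comments):
    """Return a set of (source_project, target_project) pairs.

    `comments` is an iterable of (project, comment_text) records.
    References that point back to the commenting project are skipped.
    """
    edges = set()
    for project, text in comments:
        for match in CROSS_REF.finditer(text):
            target = f"{match['owner']}/{match['repo']}"
            if target.lower() != project.lower():
                edges.add((project, target))
    return edges


# Hypothetical comments, for illustration only.
comments = [
    ("carlhuda/bundler",
     "Blocked by rails/rails#1234; see rubygems/rubygems@abcdef1 for the fix."),
    ("airblade/paper_trail", "Same failure as rails/rails#1234."),
    ("rails/rails", "Duplicate of #42, also tracked in rails/rails#40."),
]
edges = dependency_edges(comments)

# Incoming edges reveal the projects that others rely on (candidate hubs).
in_degree = Counter(target for _, target in edges)
print(sorted(edges))
print(in_degree.most_common())
```

Because edges are collected from the referencing side, projects that depend on a hub are found even though nothing in the hub's own code or configuration mentions them, which is the asymmetry described above.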

6.1. A Research Agenda

The ability to easily identify technical dependencies between a large set of projects opens the door for many interesting avenues of research.

Socio-technical analysis. Studies that have examined how communication aligns with dependencies across projects have been limited to well-defined ecosystems where dependency information is publicized in some way. For example, dependencies can be made available through a project’s configuration files, build files, or through publicly available dependency specifications [1, 7, 8, 9, 10, 23, 11]. Our method allows the identification of technical dependencies more broadly across projects and opens the door to continuing the study of socio-technical alignment across a larger set of projects and their stakeholders.

In this study, we found that when dependencies exist between a pair of projects, the project owners tend to be following the owner of the other project. Conway was the first to describe the possibility of an alignment between social connections and technical dependencies in software engineering projects, commonly referred to as Conway’s law [43]. The transparent nature of GitHub could encourage technical connections between projects by providing an awareness of activity across projects. An interesting future research question is understanding how and when these technical dependencies and social connections came to exist. Did the social connections exist first and result in a technical dependency or did the technical dependency exist first and result in a social connection? If the social connections existed first, what was the driver behind the creation of the technical dependency? Perhaps, the awareness of the other project, enabled through GitHub’s notifications, was enough to spur a technical dependency indicating that GitHub’s transparency is changing the of OSS projects. These research questions could be investigated in future research.

While the project owners’ social behaviours (following users and starring projects) aligned with the technical dependencies in our study, we did not witness such an alignment for all project contributors. The follower network of project contributors showed that there were no clear central projects and communities were densely connected. This is in contrast to the technical dependency network. These results align with recent research that found that the reasons behind following others extend beyond project coordination needs [44]. Future work should investigate the usefulness of following others for coordination purposes.

It is also worth studying in more detail the coordination needs of developers on OSS projects. Perhaps the mere existence of a technical dependency does not imply a coordination need, especially given the transparent environment of GitHub. Our previous work [45] began this investigation, but coordination needs at the ecosystem level are also worthy of investigation.

Ecosystem emergence and evolution. The most prominent nodes in Figure 4 are not always the most popular projects on GitHub when considering the number of stars each project has. In fact, the two projects with the most stars, angular/angular.js and twbs/bootstrap, have significantly smaller ecosystem size and lower degree than other projects. Future work can investigate how and why ecosystems emerge and why some projects become popular without growing a large ecosystem. Such a study could include a temporal analysis of the composition of the ecosystem and density of connections together with a temporal analysis of project history information such as number of contributors, forks, stars, etc. It would also be worth triangulating results with other information on important project events now commonly available through blogs and wikis. Such a study on the evolution of ecosystems can be a first step in understanding when and why projects accumulate an ecosystem. Previous research has examined ecosystem growth [46], but this analysis was focused on the size of the code base (measured in lines of code) of all projects within an organization. We propose that future research expand this view by also considering the growth of an ecosystem in terms of the number of projects it comprises, as determined by technical dependencies.

Ecosystem size and strength of connections and project success. On many open source projects, volunteers are crucial to project success, as the projects rely on them to submit new features and fix bugs. As a project accumulates more projects in its ecosystem, it is also likely to receive more contributions, as developers on dependent projects will be more likely to fix bugs that they encounter through their dependency. Future research could investigate this relationship to identify if the size of a project’s ecosystem is a good predictor of various project health and success metrics like the number of contributions it receives or the number of forks it has.

Automatic detection of inter-project dependencies. Another avenue for future research is creating tools to make developers aware of their technical dependencies outside of their own project. It is important for developers to know who they need to coordinate with across the ecosystem and to understand how their tasks fit into the big picture. For example, when a developer on GitHub creates a cross-reference to an issue on another project, it could be useful for the developer to be made aware of other projects that have also cross-referenced that same issue. The issue may be causing problems in many projects and those projects could minimize duplicated effort by being more aware of each other. Such a tool could increase awareness of coordination needs that extend outside project boundaries. For example, in Figure 9, we show a prototype we developed that improves GitHub Flavored Markdown by allowing users to click on the “Get Suggestions” button to see a list of other comments from other projects that have made a reference to the same pull request, issue or commit. If there is a large number of other projects that share the dependency, natural language processing techniques could be used to summarize the information. This will allow developers to easily review the details from the other projects that share the same dependency. Similarly, other tools, including those used at IBM, could be modified to help developers become aware of inter-project dependencies that are automatically detected by our Reference Coupling method. Reference Coupling could be used to automatically create dependency relationships between issues or work items in a project’s issue tracking system.

Figure 9: A proposed improvement to GitHub Flavored Markdown, which would not only create a link to the pull request, issue or commit referenced in the comment, but would also allow users to click to see what other comments in other projects have also made the same reference.
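The lookup behind such a tool can be sketched as an index from each referenced artifact to the projects that mention it. This is a minimal illustration under assumed data shapes; the function names and sample records are hypothetical, and it is not the implementation of the prototype shown in Figure 9.

```python
from collections import defaultdict


def build_reference_index(cross_references):
    """Map each referenced artifact to the set of projects that mention it.

    `cross_references` is an iterable of (source_project, target_artifact)
    pairs, e.g. ("carlhuda/bundler", "rails/rails#1234").
    """
    index = defaultdict(set)
    for source_project, target_artifact in cross_references:
        index[target_artifact].add(source_project)
    return index


def get_suggestions(index, target_artifact, current_project):
    """Other projects that cross-referenced the same artifact, sorted by name."""
    return sorted(index.get(target_artifact, set()) - {current_project})


# Hypothetical cross-references, for illustration only.
cross_references = [
    ("carlhuda/bundler", "rails/rails#1234"),
    ("airblade/paper_trail", "rails/rails#1234"),
    ("jruby/warbler", "rails/rails#1234"),
    ("jruby/warbler", "rubygems/rubygems#9"),
]
index = build_reference_index(cross_references)
suggestions = get_suggestions(index, "rails/rails#1234", "carlhuda/bundler")
print(suggestions)
```

Keying the index on the full artifact identifier rather than on the target project keeps the suggestions specific to one shared issue, pull request or commit, which is the granularity at which duplicated effort would arise.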

Automatic detection of inter-project dependencies can also be used by software engineering researchers in future unrelated studies. For example, a study of the effects of multi-tasking across multiple projects could include an analysis of the dependencies that exist between the projects to better understand the reasons for multi-tasking or the amount of context switching that occurs. Many other software engineering research studies can benefit from an easy way to identify inter-project dependencies.

Automatic detection of ecosystems. With the ability to automatically detect individual technical dependencies between projects, it would also be useful to automatically detect and visualize ecosystems. Such a tool could help developers gain a better view of the ecosystem surrounding their project. It could also help researchers in future studies on software ecosystems.

6.2. Threats to Validity

One threat stems from our selection of the GHTorrent dataset to obtain the GitHub data. GHTorrent may not be a full copy of all GitHub data [32]. Nevertheless, it is a best-effort approach that has been widely accepted in the research community as evidenced by its inclusion as the dataset for the MSR 2013 Mining Challenge [47] and the many recent papers that utilize its data in their analysis.

The GHTorrent dataset used in our analysis is from 2014. The structure and dynamics of ecosystems could have changed since this dataset was captured. Future studies should reevaluate ecosystems using more recent datasets. Such a study can also consider ecosystem evolution since our results provide a snapshot that can be compared against. In this direction, Zhang et al. have used our Reference Coupling method to identify the GitHub projects in the Rails ecosystem [48]. They found that developers tend to make more cross-references to other projects over time.

Our manual exploration of cross-reference comments illustrates a variety of types of technical dependencies found in GitHub cross-references, but these results cannot be generalized. While we achieved saturation in our results, our results could be impacted by selection bias. To mitigate this, we ensured an equal number of comments for each source (commit, issue, pull request) were included in our sample. However, the types of dependencies identified seem reasonable for any software project. Future work can continue this investigation by examining the content of cross-reference comments across a wide range of projects and code hosting environments.

Cross-references could appear more frequently when there is a higher number of shared contributors between the two projects. We have not controlled for this since the presence of shared contributors does not diminish the existence of the dependencies. However, it could mean that dependencies on projects where there are no shared contributors are not easily found using this method. Future work should investigate this.

7. Conclusion

In this paper, we proposed a new method for detecting technical dependencies between projects, called Reference Coupling, which utilizes user-specified cross-references between projects. We validated this method on datasets from GitHub and IBM. We found that Reference Coupling identifies many dependencies which appear to be untracked by developers. The most common type of dependency that is untracked by developers but found with Reference Coupling is ‘Affected by defect’, which indicates that a task is impacted in some way by another defect. In these cases, especially, it could be useful for other developers to be aware of other projects that are impacted by the same defect. This could allow these projects to coordinate their efforts in creating workarounds or negotiating completion of the defect removal. The Reference Coupling method enables tools to be developed that can help developers become aware of these types of shared dependencies.

We also used our Reference Coupling method to identify ecosystems in GitHub-hosted projects by using an existing community detection algorithm to identify densely connected clusters of projects. Through an analysis of the identified ecosystems, we find that most ecosystems are centered around a single project. While small, unpopular ecosystems remain isolated, most ecosystems are interconnected. The isolated ecosystems tend to contain projects owned by the same GitHub user or organization. The popular ecosystems are mostly centered around tools that support software development.

Our Reference Coupling method opens the door for future research in software ecosystems including studying the socio-technical relationships, evolution, health and success of ecosystems.

Acknowledgment

This work was partly funded by NSERC Canada. Thanks to Sunny Wang and Diksha Sharma for their assistance in the manual content analysis.

References

[1] M. F. Lungu, Reverse engineering software ecosystems, Ph.D. thesis, University of Lugano (2009).

[2] D. Cubranic, G. C. Murphy, J. Singer, K. S. Booth, Hipikat: A project memory for software development, IEEE Transactions on Software Engineering 31 (6) (2005) 446–465.

[3] O. Franco-Bedoya, D. Ameller, D. Costal, X. Franch, Open source software ecosystems: A systematic mapping, Information & Software Technology 91 (2017) 160–185. doi:10.1016/j.infsof.2017.07.007. URL https://doi.org/10.1016/j.infsof.2017.07.007

[4] J. Ossher, S. Bajracharya, C. Lopes, Automated dependency resolution for open source software, in: Proceedings of 7th Working Conference on Mining Software Repositories, IEEE, 2010, pp. 130–140.

[5] M. Lungu, R. Robbes, M. Lanza, Recovering inter-project dependencies in software ecosystems, in: Proceedings of the International Conference on Automated Software Engineering, ACM, 2010, pp. 309–312.

[6] J. Businge, A. Serebrenik, M. van den Brand, Survival of Eclipse third-party plug-ins, in: Proceedings of 28th International Conference on Software Maintenance, IEEE, 2012, pp. 368–377.

[7] F. W. Santana, C. M. L. Werner, Towards the analysis of software projects dependencies: An exploratory visual study of software ecosystems, in: Proceedings of International Workshop on Software Ecosystems, Citeseer, 2013, pp. 7–18.

[8] J. M. Gonzalez-Barahona, G. Robles, M. Michlmayr, J. J. Amor, D. M. German, Macro-level software evolution: A case study of a large software compilation, Empirical Software Engineering 14 (3) (2009) 262–285.

[9] G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, S. Panichella, How the Apache community upgrades dependencies: An evolutionary study, Empirical Software Engineering (2014) 1–43.

[10] D. M. German, J. M. Gonzalez-Barahona, G. Robles, A model to understand the building and running inter-dependencies of software, in: Proceedings of 14th Working Conference on Reverse Engineering, IEEE, 2007, pp. 140–149.

[11] S. Raemaekers, A. van Deursen, J. Visser, Measuring software library stability through historical version analysis, in: Software Maintenance (ICSM), 2012 28th IEEE International Conference on, IEEE, 2012, pp. 378–387.

[12] A. Mockus, Amassing and indexing a large sample of version control systems: Towards the census of public source code history, in: Proceedings of 6th Working Conference on Mining Software Repositories, IEEE, 2009, pp. 11–20.

[13] T. Mens, M. Goeminne, Analysing the evolution of social aspects of open source software ecosystems, in: S. Jansen, J. Bosch, P. R. J. Campbell, F. Ahmed (Eds.), Proceedings of the Third International Workshop on Software Ecosystems, Brussels, Belgium, June 7th, 2011, Vol. 746 of CEUR Workshop Proceedings, CEUR-WS.org, 2011, pp. 1–14. URL http://ceur-ws.org/Vol-746/IWSECO2011-1-InvitedPaper-MensGoeminne.pdf

[14] T. Mens, B. Adams, J. Marsan, Towards an interdisciplinary, socio-technical analysis of software ecosystems health, in: Proceedings of the 16th edition of the BElgian-NEtherlands software eVOLution symposium, Antwerp, Belgium, December 4-5, 2017, 2017, pp. 7–9. URL http://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_2.pdf

[15] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10) (2008) P10008.

[16] K. Blincoe, F. Harrison, D. Damian, Ecosystems in GitHub and a method for ecosystem identification using reference coupling, in: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, 2015, pp. 202–211.

[17] J. Bosch, From software product lines to software ecosystems, in: Software Product Lines, 13th International Conference, SPLC 2009, San Francisco, California, USA, August 24-28, 2009, Proceedings, 2009, pp. 111–119. doi:10.1145/1753235.1753251. URL http://doi.acm.org/10.1145/1753235.1753251

[18] S. Jansen, A. Finkelstein, S. Brinkkemper, A sense of community: A research agenda for software ecosystems, in: 31st International Conference on Software Engineering, ICSE 2009, May 16-24, 2009, Vancouver, Canada, Companion Volume, 2009, pp. 187–190. doi:10.1109/ICSE-COMPANION.2009.5070978. URL https://doi.org/10.1109/ICSE-COMPANION.2009.5070978

[19] K. Manikas, Supporting the evolution of research in software ecosystems: Reviewing the empirical literature, in: Software Business - 7th International Conference, ICSOB 2016, Ljubljana, Slovenia, June 13-14, 2016, Proceedings, 2016, pp. 63–78. doi:10.1007/978-3-319-40515-5_5.

[20] M. Lungu, Towards reverse engineering software ecosystems, in: 24th IEEE International Conference on Software Maintenance (ICSM 2008), September 28 - October 4, 2008, Beijing, China, 2008, pp. 428–431. doi:10.1109/ICSM.2008.4658096. URL https://doi.org/10.1109/ICSM.2008.4658096

[21] M. Goeminne, T. Mens, A framework for analysing and visualising open source software ecosystems, in: Proceedings of the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE), Antwerp, Belgium, September 20-21, 2010, 2010, pp. 42–47. doi:10.1145/1862372.1862384.

[22] D. Dhungana, I. Groher, E. Schludermann, S. Biffl, Software ecosystems vs. natural ecosystems: Learning from the ingenious mind of nature, in: Proceedings of the 4th European Conference on Software Architecture: Companion Volume, ACM, 2010, pp. 96–102.

[23] M. Syeed, K. M. Hansen, I. Hammouda, K. Manikas, Socio-technical congruence in the Ruby ecosystem, in: Proceedings of The International Symposium on Open Collaboration, ACM, 2014, p. 2.

[24] J. Kabbedijk, S. Jansen, Steering insight: An exploration of the Ruby software ecosystem, in: Software Business, Springer, 2011, pp. 44–55.

[25] S. Jansen, Measuring the health of open source software ecosystems: Beyond the scope of project health, Information & Software Technology 56 (11) (2014) 1508–1519. doi:10.1016/j.infsof.2014.04.006.

[26] M. Lungu, J. Malnati, M. Lanza, Visualizing Gnome with the small project observatory, in: M. W. Godfrey, J. Whitehead (Eds.), Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009 (Co-located with ICSE), Vancouver, BC, Canada, May 16-17, 2009, Proceedings, IEEE Computer Society, 2009, pp. 103–106. doi:10.1109/MSR.2009.5069487. URL https://doi.org/10.1109/MSR.2009.5069487

[27] K. Manikas, K. M. Hansen, Software ecosystems–a systematic literature review, Journal of Systems and Software 86 (5) (2013) 1294–1306.

[28] S. Syed, S. Jansen, On clusters in open source ecosystems, in: Proceedings of International Workshop on Software Ecosystems, Citeseer, 2013, pp. 19–32.

[29] Y. Yu, G. Yin, H. Wang, T. Wang, Exploring the patterns of social behavior in GitHub, in: Proceedings of the 1st International Workshop on Crowd-based Software Development Methods and Technologies, ACM, 2014, pp. 31–36.

[30] F. Thung, T. F. Bissyandé, D. Lo, L. Jiang, Network structure of social coding in GitHub, in: Proceedings of 17th European Conference on Software Maintenance and Reengineering, IEEE, 2013, pp. 323–326.

[31] J. MacFarlane, CommonMark spec, 2017. URL http://spec.commonmark.org/0.25

[32] G. Gousios, D. Spinellis, GHTorrent: GitHub’s data from a firehose, in: Proceedings of the 9th Working Conference on Mining Software Repositories, IEEE, 2012, pp. 12–21.

[33] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, D. Damian, The promises and perils of mining GitHub, in: Proceedings of the 11th Working Conference on Mining Software Repositories, ACM, 2014, pp. 92–101.

[34] Oracle, Java platform standard edition 7 api specification: Package java.util.regex, https://docs.oracle.com/javase/7/docs/api/java/util/regex/package-summary.html, accessed 29 November 2016.

[35] K. A. Neuendorf, The Content Analysis Guidebook, Sage, 2016.

[36] J. Corbin, A. Strauss, Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory, Sage, 2008.

[37] D. Freelon, ReCal OIR: Ordinal, interval, and ratio intercoder reliability as a web service, International Journal of Internet Science 8 (1).

[38] M. E. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences 103 (23) (2006) 8577–8582.

[39] M. Bastian, S. Heymann, M. Jacomy, et al., Gephi: an open source software for exploring and manipulating networks, Proceedings of International AAAI Conference on Web and Social Media 8 (2009) 361–362.

[40] M. Molloy, B. Reed, Critical subgraphs of a random graph, The Electronic Journal of Combinatorics 6 (R35) (1999) 2.

[41] J. M. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM 46 (5) (1999) 604–632.

[42] H. Gall, K. Hajek, M. Jazayeri, Detection of logical coupling based on product release history, in: Proceedings of International Conference on Software Maintenance, IEEE, 1998, pp. 190–198.

[43] M. E. Conway, How do committees invent, Datamation 14 (4) (1968) 28–31.

[44] K. Blincoe, D. Damian, Implicit coordination supported by GitHub: A case study of the rails oss project, in: Open Source Systems: Adoption and Impact, Springer, 2015, pp. 35–44.

[45] K. Blincoe, G. Valetto, D. Damian, Do all task dependencies require coordination? the role of task properties in identifying critical coordination needs in software projects, in: Proceedings of the 9th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ACM, 2013, pp. 213–223.

[46] J. W. Paulson, G. Succi, A. Eberlein, An empirical study of open-source and closed-source software products, IEEE Transactions on Software Engineering 30 (4) (2004) 246–256. doi:10.1109/TSE.2004.1274044.

[47] G. Gousios, The GHTorrent dataset and tool suite, in: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, 2013, pp. 233–236.

[48] Y. Zhang, Y. Yu, H. Wang, B. Vasilescu, V. Filkov, Within-ecosystem issue linking: a large-scale study of rails, in: Proceedings of the 7th International Workshop on Software Mining, ACM, 2018, pp. 12–19.
