Using Centrality Measures to Identify Key Members of an Innovation Collaboration Network
John Cardente
Group 43

1 Introduction

Maintaining an innovative culture is critical to the ongoing success and growth of an established company. It is the key to overcoming threats from market condition changes, competitors, and disruptive technologies. A critical component of building an innovative culture is identifying key innovators and providing them with the support needed to improve their skills and expand their influence [5]. Identifying these innovators can be a significant challenge, especially in large organizations.

Prior research provides evidence that leading scientists and innovators occupy structurally significant locations in collaboration network graphs. If true, then such high-value individuals should be discoverable using network analysis techniques. Centrality measures are frequently used to describe a node's relative importance within a network graph. This paper's primary hypothesis is that centrality measures can be used to identify important participants within collaboration networks.

To test this hypothesis, this paper examines a collaboration network of participants in innovation contests held by EMC Corporation in 2010, 2011, and 2012. These annual contests invite employees to submit creative ideas, individually or in teams, that solve difficult technical problems, create new or improved products, or increase the company's operational efficiency. They attract the company's leading innovators and promote collaborations that span engineering teams, business units, and geographic locations. If effective, using the collaboration network and centrality measures to discover top innovators could be a valuable tool for managing the company's talent and culture.

The rest of this paper is organized as follows. Section 2 reviews relevant prior work. Section 3 describes the network dataset and graph models used to represent it. Section 4 compares the network's attributes to other collaboration networks analyzed in the literature. Section 5 presents centrality measures that may be effective in identifying key innovators. Section 6 evaluates the effectiveness of these centrality measures in identifying key participants in the EMC collaboration network. Section 7 summarizes the findings.

2 Previous Work

Coulon [6] provides a comprehensive survey of research literature using network analysis techniques to examine innovation theories and frameworks. He observes the frequent use of degree, closeness, and betweenness centrality measures to describe the structure of innovation networks. In [1] [2], Newman examines scientific paper author collaboration networks and introduces a weighted closeness metric that appears to effectively identify prominent scientists. Krebs [12] uses centrality measures to identify influential members of the 9/11 terrorist association network. Fleming [15] highlights the importance of "brokers" and "gatekeepers" with particular structural attributes in innovation networks. Similarly, Whelan [5] discusses the importance of "scouts" and "connectors". Ahuja [17] examines a collaboration network of patent co-inventors and finds that a large number of nodes with low cluster coefficients, called "spanners" or "brokers", do not increase an organization's innovative productivity. Various papers [19] [14] [16] examine how small-world attributes contribute to the innovative capacity of inventor and artistic networks.

In [3] [4], Freeman presents the fundamental centrality measures and interprets them in the context of human social networks. White and Smyth [18] provide algorithms for determining relative node importance in networks. Borgatti [10] identifies two classes of key network players, maximally connected and articulation nodes, and discusses methods for selecting the top k nodes of each. Ortiz-Arroyo [8] builds on [10] and presents the use of entropy measures for identifying sets of key nodes. Poulin [13] presents a cumulative nomination centrality measure for multi-component networks. Everett [7] presents centrality measures for network sub-groups and 2-mode networks.

3 Data and Models

This paper analyzes data collected from the 2010, 2011, and 2012 EMC Innovation Contests. It was extracted from the database back-end of the web portal used by employees to enter contest submissions. The data specifies the employees responsible for each contest entry. This analysis only considers employees who collaborated with one or more other employees on at least one entry. The rationale is that prolific lone innovators are much easier to identify than those embedded within a collaboration network.

Two collaboration graphs are created from this data. The first is a single-edged undirected graph representing the collaborative relationships between individual contest participants. Participants are represented as nodes. Two nodes are connected if the associated participants submitted at least one contest entry together. Node and edge weights are optionally used to reflect importance. Detailed discussions of node and edge weights are provided in Section 5. Figure 1 is a simple illustration of the collaboration network.

Figure 1: Example undirected collaboration graph. Nodes represent contest participants. Edges represent collaborations. Example node and edge weights indicate the number of contest entries and collaborations respectively.

The second graph is a single-edged undirected graph between the maximal cliques in the participant graph. It represents the collaborative relationships between teams of innovators. Since teams often participate in the contests as a single unit, this graph is essentially a dimensionally reduced form of the first graph without the edges between team members. Two cliques are connected if they have at least one member in common. While community detection techniques often require larger overlaps between cliques [24] to form an edge, a single-node overlap threshold is used in this case to capture relationships involving two-member teams. Node and edge weights are optionally used to reflect importance. Detailed discussions of node and edge weights are provided in Section 5. The clique collaboration network has a structure similar to the participant network.

4 Network Attributes

Table 1 provides summary statistics for the innovation contest data.

Table 1: Innovation network summary statistics.

Statistic                Value
Participants             3185
Collaborators            1822
Submissions              5317
Group Submissions        1275
Collaborations           16668
Placing Participants     408
Placing Collaborators    372

The data in Table 1 indicates that 57% of contest participants collaborated with others. However, only 24% of the submissions were from teams. The number of collaborations is large but counts each collaboration between participants individually (i.e. non-unique collaborations). Of the 3185 participants, 408 placed in a contest by either winning or being a finalist for an award. The majority of those winners and finalists, 91.2%, participated in teams.

Table 2 lists the attributes of an unweighted single-edged undirected graph created from the participant collaborations.

Table 2: Participant collaboration graph attributes.

Attribute                      Value
Nodes                          1822
Edges                          5590
Avg. Degree                    6.1361
Diameter                       17
Cluster Coef                   0.8357
Average Path Length            5.9198
Components                     259
Giant Component Size           868
Next Largest Component Size    38
Placing Nodes in GC            269
Degree Distribution α          1.78

The diameter reflects the longest path across all components and ignores missing paths between unconnected nodes. The average path length also ignores missing paths.

The innovation network has attributes similar to other collaboration networks [1] [2]. It contains a giant component containing 48% of all nodes and 72% of the placing collaborating participants. The next largest component is substantially smaller, containing approximately 2% of the nodes. The average path length, 5.92, is short relative to the diameter, 17. The clustering coefficient is high, 0.84. The network's degree distribution follows a power-law distribution with α ≈ 1.78. Together, these attributes indicate that the participant collaboration network is a scale-free, small-world network suitable for analysis in the same manner as other collaboration networks.

Table 3 provides the participant collaboration graph's global centrality scores.

Table 3: Participant collaboration graph centralities.

Centrality               Value
Degree                   0.0184
Degree (GC only)         0.0606
Closeness (GC only)      0.1895
Betweenness (GC only)    0.4619

In all cases, a value close to 1 indicates that the network is tightly organized around a relatively small number of nodes with high values of the associated centrality metric. Although none of the values are very close to 1, the closeness and betweenness scores suggest the possibility of structurally important nodes.

Tables 4 and 5 provide the network attributes for the clique collaboration network. These attributes indicate that the clique collaboration network is also a scale-free, small-world network similar to other collaboration networks.

Table 4: Clique collaboration graph attributes.

Attribute                      Value
Nodes                          637
Edges                          3606
Avg. Degree                    11.3218
Diameter                       16
Cluster Coef                   0.7044
Average Path Length            5.3102
Components                     64
Giant Component Size           439
Next Largest Component Size    38
Placing Nodes in GC            195
Degree Distribution α          1.63

Table 5: Clique collaboration graph centralities.

Centrality               Value
Degree                   0.0910
Degree (GC only)         0.1230
Closeness (GC only)      0.1950
Betweenness (GC only)    0.4390

5 Centrality Measures

5.1 Definitions

Let $G$ be an undirected single-edged graph comprising a set of nodes $N$ and a set of edges $E$. $A$ is an adjacency matrix such that $a_{ij} = 1$ if $e_{ij} \in E$ and $0$ otherwise. Let $W$ be an edge weight matrix such that $w_{ij} > 0$ if $a_{ij} = 1$. $T_i$ is the set of nodes reachable from node $i$. The shortest distance between two nodes $\{i, j\}$ is given by $d(i, j)$. The shortest path between two nodes $\{i, j\}$ is represented as $S_{ij}$. The number of shortest paths between two nodes $\{i, j\}$ is $\sigma_{ij}$. The number of shortest paths between two nodes $\{i, j\}$ that include node $u$ is $\sigma_{ij}(u)$.
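To make this notation concrete, the following minimal Python sketch computes the reachable set, the shortest distance, and the two shortest-path counts on a small toy graph. It is an illustration only and not the implementation used in this paper: the graph `ADJ` and the helper names `bfs_counts` and `paths_through` are hypothetical, distances are unweighted hop counts (the weight matrix W is ignored), and the reachable set is assumed to exclude the source node itself.

```python
from collections import deque

# Hypothetical toy graph as an adjacency list (undirected, single-edged).
# Nodes a..e form one component; f and g form a separate one.
ADJ = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "f": ["g"],
    "g": ["f"],
}


def bfs_counts(adj, source):
    """Breadth-first search from source over an unweighted graph.

    Returns (dist, sigma) where, for every node v reachable from source,
    dist[v] is the hop-count distance d(source, v) and sigma[v] is the
    number of distinct shortest paths from source to v.
    """
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                # First time w is seen: it lies one hop beyond v.
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                # Every shortest path to v extends to a shortest path to w.
                sigma[w] += sigma[v]
    return dist, sigma


def paths_through(adj, i, j, u):
    """Number of shortest i-j paths that include node u (u distinct from i, j)."""
    dist_i, sigma_i = bfs_counts(adj, i)
    dist_u, sigma_u = bfs_counts(adj, u)
    if j not in dist_i or u not in dist_i or j not in dist_u:
        return 0  # j or u is unreachable from i
    if dist_i[u] + dist_u[j] != dist_i[j]:
        return 0  # u does not lie on any shortest i-j path
    return sigma_i[u] * sigma_u[j]


dist_a, sigma_a = bfs_counts(ADJ, "a")
reachable_from_a = set(dist_a) - {"a"}  # reachable set, source excluded (assumption)
```

On this toy graph, the nodes reachable from a are b, c, d, and e; d(a, e) = 3; there are two shortest paths from a to e; one of them includes b, and both include d (`paths_through(ADJ, "a", "e", "d")` returns 2).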