Using Centrality Measures to Identify Key Members of an Innovation Collaboration Network

John Cardente∗ Group 43

∗[email protected]

1 Introduction

Maintaining an innovative culture is critical to the ongoing success and growth of an established company. It is the key to overcoming threats from market condition changes, competitors, and disruptive technologies. A critical component to building an innovative culture is identifying key innovators and providing them with the support needed to improve their skills and expand their influence [5]. Identifying these innovators can be a significant challenge, especially in large organizations.

Prior research provides evidence that leading scientists, innovators, etc., occupy structurally significant locations in collaboration network graphs. If true, then such high-value individuals should be discoverable using network analysis techniques. Centrality measures are frequently used to describe a node's relative importance within a network graph. This paper's primary hypothesis is that centrality measures can be used to identify important participants within collaboration networks.

To test this hypothesis, this paper examines a collaboration network of participants in innovation contests held by EMC Corporation in 2010, 2011, and 2012. These annual contests invite employees to submit creative ideas, individually or in teams, that solve difficult technical problems, create new or improved products, or increase the company's operational efficiency. They attract the company's leading innovators and promote collaborations that span engineering teams, business units, and geographic locations. If effective, using the collaboration network and centrality measures to discover top innovators could be a valuable tool for managing the company's talent and culture.

The rest of this paper is organized as follows. Section 2 reviews relevant prior work. Section 3 describes the network dataset and graph models used to represent it. Section 4 compares the network's attributes to other collaboration networks analyzed in the literature. Section 5 presents centrality measures that may be effective in identifying key innovators. Section 6 evaluates the effectiveness of these centrality measures in identifying key participants in the EMC collaboration network. Section 7 summarizes the findings.

2 Previous Work

Coulon [6] provides a comprehensive survey of research literature using network analysis techniques to examine innovation theories and frameworks. He observes the frequent use of degree, closeness, and betweenness centrality measures to describe the structure of innovation networks. In [1] [2], Newman examines scientific paper author collaboration networks and introduces a weighted closeness metric that appears to effectively identify prominent scientists. Krebs [12] uses centrality measures to identify influential members of the 9/11 terrorist association network. Fleming [15] highlights the importance of "brokers" and "gatekeepers" with particular structural attributes in innovation networks. Similarly, Whelan [5] discusses the importance of "scouts" and "connectors". Ahuja [17] examines a collaboration network of patent co-inventors and finds that a large number of nodes with low cluster coefficients, called "spanners" or "brokers", does not increase an organization's innovative productivity. Various papers [19] [14] [16] examine how small-world attributes contribute to the innovative capacity of inventor and artistic networks. In [3] [4], Freeman presents the fundamental centrality measures and interprets them in the context of

human social networks. White and Smyth [18] provide algorithms for determining relative node importance in networks. Borgatti [10] identifies two classes of key network players, maximally connected and articulation nodes, and discusses methods for selecting the top k nodes of each. Ortiz-Arroyo [8] builds on [10] and presents the use of entropy measures for identifying sets of key nodes. Poulin [13] presents a cumulative nomination centrality measure for multi-component networks. Everett [7] presents centrality measures for network sub-groups and 2-mode networks.

3 Data and Models

This paper analyzes data collected from the 2010, 2011, and 2012 EMC Innovation Contests. It was extracted from the database back-end of the web portal used by employees to enter contest submissions. The data specifies the employees responsible for each contest entry. This analysis only considers employees that collaborated with one or more other employees on at least one entry. The rationale is that prolific lone innovators are much easier to identify than those embedded within a collaboration network.

Two collaboration graphs are created from this data. The first is a single-edged undirected graph representing the collaborative relationships between individual contest participants. Participants are represented as nodes. Two nodes are connected if the associated participants submitted at least one contest entry together. Node and edge weights are optionally used to reflect importance. Detailed discussions of node and edge weights are provided in Section 5. Figure 1 is a simple illustration of the collaboration network.

Figure 1: Example undirected collaboration graph. Nodes represent contest participants. Edges represent collaborations. Example node and edge weights indicate the number of contest entries and collaborations respectively.

The second graph is a single-edged undirected graph between the maximal cliques in the participant graph. It represents the collaborative relationships between teams of innovators. Since teams often participate in the contests as a single unit, this graph is essentially a dimensionally reduced form of the first graph without the edges between team members. Two cliques are connected if they have at least one member in common. While community detection techniques often require larger overlaps between cliques [24] to form an edge, a single-node overlap threshold is used in this case to capture relationships involving two-member teams. Node and edge weights are optionally used to reflect importance. Detailed discussions of node and edge weights are provided in Section 5. The clique collaboration network has a structure similar to the participant network.

4 Network Attributes

Table 1 provides summary statistics for the innovation contest data.

Table 1: Innovation network summary statistics.

Statistic               Value
Participants            3185
Collaborators           1822
Submissions             5317
Group Submissions       1275
Collaborations          16668
Placing Participants    408
Placing Collaborators   372

The data in Table 1 indicates that 57% of contest participants collaborated with others; however, only 24% of the submissions were from teams. The number of collaborations is large but counts each collaboration between participants individually (i.e. non-unique collaborations). Of the 3185 participants, 408 placed in a contest by either winning or being a finalist for an award. The majority of those winners and finalists, 91.2%, participated in teams.

Table 2 lists the attributes of an unweighted single-edged undirected graph created from the participant collaborations.

Table 2: Participant collaboration graph attributes.

Attribute                      Value
Nodes                          1822
Edges                          5590
Avg. Degree                    6.1361
Diameter                       17
Cluster Coef                   0.8357
Average Path Length            5.9198
Components                     259
Giant Component Size           868
Next Largest Component Size    38
Placing Nodes in GC            269
Degree Distribution α          1.78

The diameter reflects the longest path across all components and ignores missing paths between unconnected nodes. The average path length also ignores missing paths.

The innovation network has attributes similar to other collaboration networks [1] [2]. It contains a giant component containing 48% of all nodes and 72% of the placing collaborating participants. The next largest component is substantially smaller, approximately 2% of all nodes. The average path length, 5.92, is short relative to the diameter, 17. The clustering coefficient is high, 0.84. The network's degree distribution follows a power-law distribution with α ≈ 1.78. Together, these attributes indicate that the participant collaboration network is a scale-free, small-world network suitable for analysis in the same manner as other collaboration networks.

Table 3 provides the participant collaboration graph's global centrality scores.

Table 3: Participant collaboration graph centralities.

Centrality               Value
Degree                   0.0184
Degree (GC only)         0.0606
Closeness (GC only)      0.1895
Betweenness (GC only)    0.4619

In all cases, a value close to 1 indicates that the network is tightly organized around a relatively small number of nodes with high values of the associated centrality metric. Although none of the values are very close to 1, the closeness and betweenness scores suggest the possibility of structurally important nodes.

Tables 4 and 5 provide the network attributes for the clique collaboration network. These attributes indicate that the clique collaboration network is also a scale-free, small-world network similar to other collaboration networks.

Table 4: Clique collaboration graph attributes.

Attribute                      Value
Nodes                          637
Edges                          3606
Avg. Degree                    11.3218
Diameter                       16
Cluster Coef                   0.7044
Average Path Length            5.3102
Components                     64
Giant Component Size           439
Next Largest Component Size    38
Placing Nodes in GC            195
Degree Distribution α          1.63

Table 5: Clique collaboration graph centralities.

Centrality               Value
Degree                   0.0910
Degree (GC only)         0.1230
Closeness (GC only)      0.1950
Betweenness (GC only)    0.4390

5 Centrality Measures

5.1 Definitions

Let $G$ be an undirected single-edged graph comprised of the set $N$ of nodes and the set $E$ of edges. $A$ is an adjacency matrix such that $a_{ij} = 1$ if $e_{ij} \in E$ and 0 otherwise. Let $W$ be an edge weight matrix such that $w_{ij}$ is positive if $a_{ij} = 1$. $T_i$ is the set of nodes reachable from node $i$. The shortest distance between two nodes $\{i, j\}$ is given by $d(i, j)$. The shortest path between two nodes $\{i, j\}$ is represented as $S_{ij}$. The number of shortest paths between two nodes $\{i, j\}$ is $\sigma_{ij}$. The number of shortest paths between two nodes $\{i, j\}$ that include node $u$ is $\sigma_{ij}(u)$. $K$ represents the set of entries across

the three innovation contests. Let $K_i$ be the set of entries by node $i$, so that $|K_i|$ is the node's number of entries.

5.2 Degree

Degree centrality is a function of a node's number of edges. It reflects the node's communication potential within the network [3]. For this investigation, a node's Degree centrality is defined as,

\[ D_i = \sum_{j \neq i} a_{ij} \]

5.3 Closeness

Closeness centrality is a function of a node's average distance to all other nodes in the network. It represents the node's level of communication independence [3]. In this paper, closeness is defined as,

\[ C_i = \frac{1}{\sum_{j \in T_i} d(i, j) + \sum_{u \notin T_i} |N|} \]

The second sum assigns a distance of $|N|$ to every node that cannot be reached from node $i$.

5.4 Betweenness

Betweenness is a function of the number of shortest paths that pass through a node. It is regarded as a measure of a node's control over communication flow [3]. For this investigation, betweenness is defined as,

\[ B_i = \sum_{s,t \in G,\, s \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}} \]

5.5 Weighted Degree, Closeness, and Betweenness

Newman [2] argues that collaborative relationships in small teams are stronger than those in large teams. To capture this effect, he proposes weighting each collaboration based on the number of participants. Let $\chi_i^k = 1$ if participant $i$ collaborated on innovation contest entry $k$. The weight contributed to the edges between the collaborators of $k$ is then,

\[ w_k = \frac{1}{\sum_{i \in N} \chi_i^k - 1} \]

This weight is inversely proportional to team size and reduces the significance of collaborations with many participants. The accumulated weight of an edge is the sum of its weights across all entries.

\[ w_{ij} = \sum_{k \in K} \chi_i^k \chi_j^k w_k \]

The accumulated edge weight is proportional to the strength of the associated collaborative relationship: larger values indicate stronger relationships. For distance based metrics, the inverse of this weight can be used to yield a high cost for weak relationships.

The following equations use this weight to define weighted variants of Degree, Closeness, and Betweenness centralities.

\[ DW_i = \sum_{j \in N} a_{ij} w_{ij} \]

\[ CW_i = \frac{1}{\sum_{j \in T_i} \sum_{e \in S_{ij}} \frac{1}{w_e} + \sum_{u \notin T_i} |N|} \]

\[ BW_i = \sum_{s,t \in G,\, s \neq t} \frac{\sigma^w_{st}(i)}{\sigma^w_{st}} \]

where $\sigma^w_{st}$ and $\sigma^w_{st}(i)$ select shortest paths based on weighted distances, as in $CW$.

The weighted degree centrality uses the weight directly and is equivalent to a node's number of entries. Being distance measures, the weighted closeness and betweenness measures use the inverted weight to reduce the distance between nodes with strong relationships.

5.6 PageRank

PageRank is a link analysis algorithm that estimates a node's importance based on the importance of the nodes that link to it. It was designed to improve the accuracy of web searches and led to the founding of Google [21]. PageRank is a recursive algorithm where nodes accumulate PageRank scores and pass fractional shares to their neighbors. The basic algorithm is defined as,

\[ PR_i = \beta \sum_{j \neq i} \frac{a_{ij}}{D_j} PR_j + (1 - \beta) \frac{1}{|N|} \]

where $PR_j$ and $D_j$ are the PageRank and degree of node $j$ respectively. The $\beta$ parameter and second term form a random teleportation feature that ensures the total PageRank is conserved between iterations and doesn't get accumulated in edge cycles. The algorithm has been shown to converge to an equilibrium of per-node PageRank scores after a finite number of iterations.

Although originally designed for directed networks, PageRank can be applied to undirected graphs [22]. For this investigation, the PageRank algorithm is modified to use Newman's weights introduced in the last section by replacing the $1/D_j$ term with a weighted fractional share as follows,

\[ PRW_i = \beta \sum_{j \neq i} \frac{a_{ij} w_{ji}}{\sum_{u \neq j} a_{uj} w_{ju}} PR_j + (1 - \beta) \frac{1}{|N|} \]

The modified algorithm divides the node's PageRank value by the sum of the edge weights to create a minimum unit of PageRank to pass to its neighbors. It then multiplies the minimal unit by the associated edge weights to determine the actual amount of PageRank passed to each neighbor. The result is that nodes contribute a greater proportion of their PageRank to the neighbors with which they have the strongest relationships.

PageRank can also be modified to give preferential consideration to a subset of nodes. This is done by replacing the $1/|N|$ teleportation probability with a non-uniform per-node probability, $p_i$, with the constraint $\sum_{i \in N} p_i = 1$ [23]. The vector of teleportation node probabilities is referred to as the Personalized Preference Vector (PPV). For this investigation, the PPV is set to the ratio of the node entry counts to the global entry count.

\[ PRWP_i = \beta \sum_{j \neq i} \frac{a_{ij} w_{ji}}{\sum_{u \neq j} a_{uj} w_{ju}} PR_j + (1 - \beta) \frac{|K_i|}{|K|} \]

This PPV gives preference to the contest participants that submitted the most entries.

5.7 Entropy Sensitivity

In [9] [10], Borgatti et al. note that selecting the nodes with the top k centralities isn't the same as selecting the k most important nodes. This is due in part to redundancies between top ranking nodes. For example, two nodes connecting large sub-components will both have high betweenness centralities. However, the removal of one of the nodes negates the importance of the other. Therefore, selecting both nodes is inefficient as their dependent relationship makes them redundant. To overcome this problem, Borgatti et al. present algorithms to identify optimal sets of key nodes.

In [8], Ortiz-Arroyo builds on [9] [10] by introducing an entropy based algorithm to identify sets of key players. Let $\theta_i$ be a centrality score for node $i$. The entropy of a graph $G$ is then given by,

\[ H(G) = -\sum_{i \in G} \theta_i \log_2 \theta_i \]

Using this definition, Ortiz-Arroyo's algorithm consists of removing each node from the graph, calculating the resulting sub-graph's entropy, and selecting the k nodes that caused the greatest decrease in graph entropy. Algorithm 1 provides the pseudo code for finding the set $B_k$ of the k most important nodes in a graph based on centrality measure $\theta$.

Algorithm 1 Ortiz-Arroyo's entropy based algorithm for finding sets of key nodes in a graph

    $H(G) = -\sum_{v \in G} \theta_v \log_2 \theta_v$
    for $v_i \in G$ do
        $G' \subset G$ such that $v_i \notin G'$ and $u \in G'$ for all $u \neq v_i \in G$
        $H_i(G') = -\sum_{u \in G'} \theta_u \log_2 \theta_u$
        $\delta_i = H(G) - H_i(G')$
    end for
    $B_k = \max(\{\delta_1, \delta_2, \ldots, \delta_n\}, k)$

To accompany this algorithm, Ortiz-Arroyo defines two centrality measures,

\[ \chi(v) = \frac{D_v}{2 \, |E|} \]

\[ \gamma(v) = \frac{\sum_{t \in T_v} \sigma_{vt}}{\sum_{s,t \in G,\, s \neq t} \sigma_{st}} \]

$\chi(v)$ is called the connectivity probability distribution and, being based on degree, reflects a node's communication potential. $\gamma(v)$ is called the centrality probability distribution; it reflects a node's ability to reach other nodes in the graph. The associated graph entropy equations are,

\[ H_{CO}(G) = -\sum_{i \in G} \chi_i \log_2 \chi_i \]

\[ H_{CN}(G) = -\sum_{i \in G} \gamma_i \log_2 \gamma_i \]

Using $H_{CO}(G)$ with the algorithm defined above identifies the nodes that most affect the graph's density when removed. In practical terms, these nodes serve as the optimal points for seeding information into the network. When used with $H_{CN}(G)$, the algorithm identifies the nodes that most affect the graph's connectivity. These nodes serve as the critical bridges that keep the network connected.

In addition to using Ortiz-Arroyo's entropies, this paper introduces two extensions that take into consideration the Newman weights discussed above.

\[ \chi^w(v) = \frac{DW_v}{2 \sum_{u \in N} DW_u} \]

\[ H_{COW}(G) = -\sum_{i \in G} \chi^w_i \log_2 \chi^w_i \]

\[ \gamma^w(v) = \frac{\sum_{t \in T_v} d_w(v, t)}{\sum_{s,t \in G,\, s \neq t} d_w(s, t)} \]

\[ H_{CNW}(G) = -\sum_{i \in G} \gamma^w_i \log_2 \gamma^w_i \]

where $d_w(s, t)$ is the shortest distance between nodes $\{s, t\}$ based on the Newman distance weights.

5.8 Cumulative Nominations

The centrality measures defined thus far don't explicitly consider that graphs can consist of multiple connected components of various sizes. Poulin [13] recognizes this and provides a centrality measure suitable for comparing nodes between components.

The core of Poulin's algorithm consists of an iterative voting process similar to PageRank's.

\[ p_i^{n+1} = \frac{p_i^n + \sum_{j \neq i} a_{ij} p_j^n}{\sum_{u \in N} \left[ p_u^n + \sum_{j \neq u} a_{uj} p_j^n \right]} \]

where $p_i^0 = 1/|N|$. This process is iterated over the nodes in a single component until $\max_i |p_i^n - p_i^{n-1}|$ falls below a tolerance $\epsilon$. The final cumulative nomination of a node is given by,

\[ CN(i) = \lim_{n \to \infty} p_i^n \]

A growth rate for the amount of new nominations received by node $i$ is defined as,

\[ CNGR(i) = \lim_{n \to \infty} \left( 1 + \frac{\sum_{j \neq i} a_{ij} p_j^n}{p_i^n} \right) \]

These measures are combined into a multi-component cumulated nominations centrality index suitable for comparing nodes between components,

\[ MCN(i) = CN(i) \, CNGR(i) \, \frac{C_{CSize}(i)}{|N|} \]

where $C_{CSize}(i)$ is the relative size of the connected component containing node $i$.

5.9 Group Centrality

Everett et al. [7] describe methods for establishing the centrality of groups of nodes. For this investigation, the primary groups of interest are the teams that submitted innovation contest entries together. Since these teams are cliques within the participant collaboration network, this paper takes the simpler approach of considering the collaboration network between cliques described in Section 3.

6 Evaluation

6.1 Computations

Each of the centrality measures in Section 5 was computed for the participant and clique collaboration graphs described in Section 3. The results for each centrality measure were then normalized to between 0 and 1 by subtracting the minimum and subsequently dividing by the maximum. For all of the centrality measures a higher score is better.
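To make the computation step concrete for one measure, the sketch below accumulates the team-size edge weights of Section 5.5 (each entry with n collaborators contributes 1/(n − 1) to every edge among them), sums them into the weighted degree, and then rescales the scores to the range 0 to 1 by subtracting the minimum and dividing by the resulting maximum. This is an illustrative assumption about how such a pipeline might look, not the code used for this analysis; the function names and `toy_entries` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def newman_edge_weights(entries):
    """Accumulate w_ij: each entry adds 1 / (team size - 1) to its edges."""
    weights = defaultdict(float)
    for team in entries:
        members = sorted(set(team))
        if len(members) < 2:
            continue  # lone submissions create no edges
        w_k = 1.0 / (len(members) - 1)
        for i, j in combinations(members, 2):
            weights[(i, j)] += w_k
    return dict(weights)


def weighted_degree(weights):
    """DW_i: the sum of accumulated edge weights incident to node i."""
    dw = defaultdict(float)
    for (i, j), w in weights.items():
        dw[i] += w
        dw[j] += w
    return dict(dw)


def min_max_normalize(scores):
    """Subtract the minimum, then divide by the maximum of the shifted scores."""
    lo = min(scores.values())
    shifted = {node: s - lo for node, s in scores.items()}
    hi = max(shifted.values())
    if hi == 0:
        return {node: 0.0 for node in shifted}  # all scores were identical
    return {node: s / hi for node, s in shifted.items()}


# Toy data (illustrative only): a three-person entry and two two-person entries.
toy_entries = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
weights = newman_edge_weights(toy_entries)
dw = weighted_degree(weights)
normalized = min_max_normalize(dw)
```

In the toy data, each participant's weighted degree equals the number of group entries they took part in (2 for a, b, and c; 1 for d), which matches the observation in Section 5.5 that the weighted degree is equivalent to a node's number of entries.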

6.2 Empirical Cumulative Distributions

To be useful in selecting important nodes, a centrality measure must have a skewed probability distribution. Figure 2 plots the empirical cumulative distribution functions for the centrality results of both graphs. Both sets of ECDF plots are similar with all of the centrality scores exhibiting a degree of skew. Most of the measures are skewed towards either the high or low range. The Closeness (C) and Weighted Closeness (CW) measures appear to have a bimodal distribution with modes at the high and low extremes. Likewise, the Weighted PageRank (PRW) has a bimodal distribution for the clique collaboration graph. The Degree (D) and Weighted Degree (DW) ECDF curves are more gradual than the others, suggesting that these measures may be less effective at differentiating nodes.

Based on the ECDFs, all of the centrality measures except perhaps D and DW appear to be candidates for identifying important network nodes.

6.3 Relative Rankings

Using multiple centrality measures is only effective if they identify different kinds of important nodes. To determine the uniqueness of each centrality measure, their relative node rankings are compared.

For each measure, the nodes are sorted by their centrality score in decreasing order and secondarily by the associated participant's ID. Figures 4 and 5 compare the rank orderings for the participant and clique graphs respectively. The lower triangle of both plots provides the Spearman correlation coefficient between each pair of measures. The upper triangle plots show the percentage of nodes in both ranking sets on the y axis as the rank depth increases on the x axis.

For the participant network, high correlations are observed between the measures B:BW, C:CW, C:HCN, C:HCNW, CW:HCNW, and HCN:HCNW. The rank agreement plots confirm the correlations. Highly correlated measures (e.g. B:BW) have rank agreement curves that remain in the higher percentages over the range of ranks. The other measures exhibit an almost linear relationship, suggesting that the rankings prioritize different nodes and only come into agreement as the set size grows.

The clique network exhibits high correlations between B:BW, C:CW, D:DW, and HCN:HCNW, each pair being the unweighted and weighted variant of the same measure. The agreement plots again support the correlation results.

Based on these findings, the remainder of this analysis only considers the DW, CW, BW, HCNW, HCO, HCOW, MCN, PRW, and PRWP measures.

6.4 Precision and Recall

Each innovation contest involves a rigorous peer-review process to select the finalists and winners. These outcomes provide some indication of the most innovative teams and participants.

The contest outcomes can be combined with the node rank orderings to create Precision-Recall curves for each centrality measure. For this analysis, precision and recall at rank depth k are defined as,

\[ W \subset G, \ \forall u \in W, \ u = \text{winner or finalist} \]

\[ S_k \subset G, \ \forall u \in S_k, \ \mathrm{rank}(u) \le k \]

\[ P_k = \frac{|S_k \cap W|}{|S_k|} \]

\[ R_k = \frac{|S_k \cap W|}{|W|} \]

Figure 3 provides the resulting precision-recall curves.

Ideal P-R curves reach the upper-right corner where precision and recall are both 100%. None of the curves in Figure 3 meet this goal. Generally speaking, all of the participant P-R curves start with high precision at low recall and trend downward as recall increases. This suggests that at least some finalists are ranked highly by the centrality measures. It's unlikely that all of the placing participants are top innovators, therefore it seems reasonable that only a subset are ranked highly.

The clique P-R curves exhibit a very different behavior. They are all relatively flat and fail to reach a precision above 40%. This poorer performance may be an indication that too much information was lost during the consolidation from participants to cliques.
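The precision and recall definitions of Section 6.4 can be sketched in a few lines. The example below is only illustrative: the function name, the `scores` and `placed` inputs, and the toy values are assumptions, and ties are broken by node ID as in the ranking procedure of Section 6.3.

```python
def precision_recall_at_k(scores, placed, k):
    """Return (P_k, R_k) for the top-k nodes of one centrality measure.

    scores: dict mapping node id to centrality score (higher is better)
    placed: set of node ids that were winners or finalists (the set W)
    k:      rank depth, a positive integer
    """
    if k <= 0 or not placed:
        raise ValueError("k must be positive and placed must be non-empty")
    # Sort by decreasing score, then by node id to break ties.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    top_k = set(ranked[:k])              # the set S_k
    hits = len(top_k & placed)           # |S_k intersect W|
    return hits / len(top_k), hits / len(placed)


# Toy data (illustrative only).
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.8, "d": 0.1}
toy_placed = {"a", "c"}
p2, r2 = precision_recall_at_k(toy_scores, toy_placed, 2)
p3, r3 = precision_recall_at_k(toy_scores, toy_placed, 3)
```

Evaluating the function for every k from 1 to the number of nodes yields the points of one precision-recall curve.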

6.5 Clustering

One shortcoming of Precision-Recall curves is that they rely on accurate classifications. Although the contest outcomes allow the classification of some contest participants, they may not be sufficient for identifying all key innovators. It's possible that some of the highly ranked non-placing nodes are actually key innovators that, for various reasons, haven't been selected as a finalist or winner.

The incomplete labeling of top innovators motivates the use of an unsupervised k-means clustering algorithm to identify groups of nodes with similar centrality properties. If this paper's hypothesis that top innovators have abnormal centrality attributes is true, then clustering should collect these nodes together into one or more groups.

Figure 6 provides results from fitting a k-means model with k = 6 clusters to the participant and clique collaboration graphs. In both cases, the value of k was chosen by testing a range of values and selecting the smallest k that yielded a within-sum-of-squares close to the asymptotic minimum. The bar plots show how the number of placing nodes in each cluster compares to the cluster's size and overall number of placing nodes. The horizontal line in each plot represents the "% of Placed" level for a uniform distribution of placing nodes across the clusters.

For the participant network, the clusters {2, 4, 5} stand out for having a higher proportion of internal to external placing nodes. Table 6 provides the sizes and centroids for the participant graph clusters. Bolded values represent the largest centroid value for each centrality measure. The data reveals that clusters {2, 4, 5} are the smallest, suggesting that they contain the most anomalous nodes. In particular, cluster four's centroid has the largest value for a number of centrality measures. Cluster two appears to contain the nodes with the highest MCN centrality score. The unique nature of the nodes in cluster five is less clear but may stem from these nodes having close PageRank and Weighted PageRank scores. Overall, the presence of three relatively small clusters with centroids having high values of different centrality scores is encouraging.

Table 7 shows how the top 20 ranked nodes for each centrality measure are distributed across the clusters. It confirms that the k-means algorithm didn't just select the top nodes for each measure.

Table 6: Participant k-means cluster details.

Cluster   1      2      3      4      5      6
Size      639    31     893    35     61     163
BW        0.00   0.03   0.00   0.16   0.00   0.02
CW        0.99   0.98   0.00   1.00   0.01   1.00
DW        0.01   0.02   0.01   0.44   0.03   0.11
HCNW      0.03   0.03   0.00   0.08   0.00   0.07
HCO       0.78   0.42   0.80   0.63   0.83   0.75
HCOW      0.98   0.98   0.99   0.94   0.98   0.99
MCN       0.00   0.90   0.01   0.03   0.01   0.01
PRW       0.06   0.20   0.11   0.43   0.18   0.21
PRWP      0.03   0.08   0.03   0.31   0.17   0.13

Table 7: Distribution of top 20 ranked nodes for each centrality measure across participant k-means clusters.

Method    1      2      3      4      5      6
BW        1      2      0      13     0      4
CW        0      1      0      12     0      7
DW        0      0      0      20     0      0
HCNW      1      0      0      5      0      14
HCO       0      0      0      2      6      12
HCOW      1      0      7      9      1      2
MCN       0      20     0      0      0      0
PRW       0      2      0      17     0      1
PRWP      0      0      0      14     2      4

The clique network chart in Figure 6 also shows three interesting clusters, {4, 5, 6}. Table 8 provides the details. Here again, the three smallest clusters were identified. Two of these clusters, 4 and 6, have centroids with high values in different centrality measures. Table 9 shows that clusters {4, 6} consist of top ranked nodes from a mix of centrality measures. Cluster 5, however, selected the top two ranked Weighted Betweenness nodes. These results are also encouraging.

Table 8: Clique k-means cluster details.

Cluster   1      2      3      4      5      6
Size      211    198    128    36     21     43
BW        0.00   0.00   0.03   0.12   0.03   0.04
CW        0.99   0.00   0.99   1.00   1.00   1.00
DW        0.04   0.04   0.16   0.49   0.29   0.61
HCNW      0.05   0.00   0.08   0.06   0.04   0.04
HCO       0.68   0.67   0.80   0.83   0.43   0.16
HCOW      0.96   0.96   0.98   0.90   0.97   0.99
MCN       0.02   0.01   0.01   0.02   0.61   0.89
PRW       0.44   0.44   0.44   0.44   0.44   0.44
PRWP      0.05   0.12   0.17   0.46   0.12   0.25

Table 9: Distribution of top 20 ranked nodes for each centrality measure across clique k-means clusters.

Method    1      2      3      4      5      6
BW        0      0      8      7      2      3
CW        0      0      0      6      0      14
DW        0      0      0      3      0      17
HCNW      4      0      12     3      0      1
HCO       0      1      15     4      0      0
HCOW      3      6      9      0      0      2
MCN       0      0      0      0      0      20
PRW       0      20     0      0      0      0
PRWP      0      4      0      15     0      1

6.6 Visual Validation

Figures 7 and 8 plot the network diagrams for the participant and clique collaboration graphs respectively. In both plots, the nodes belonging to the anomalous clusters identified by the k-means analysis are highlighted.

The location of the highlighted nodes in Figure 7 provides strong evidence that clustering against the centrality measures identified significant nodes within the participant collaboration network. Particularly surprising is the identification of nodes in the smaller components which might have otherwise been ignored. The clique network graph provides similar confidence.

For comparison, Figures 9 and 10 plot the two collaboration networks with the top 20 ranked nodes for each centrality measure highlighted. Although subjective, a comparison between the cluster and top-ranked graphs suggests that the k-means approach identified a more novel set of nodes within the networks.

7 Conclusion

Prior research provides evidence that leading scientists, innovators, etc., occupy structurally significant locations in collaboration network graphs. This paper's primary hypothesis was that centrality measures can be used to identify important participants within collaboration networks. To test this hypothesis, this paper examined a collaboration network of participants in innovation contests held by EMC Corporation in 2010, 2011, and 2012. Two collaboration graphs were created from this data. A wide variety of point centrality measures were computed and examined. The centrality measures were found to have skewed distributions, suggesting they were suitable for identifying anomalous nodes. The outcomes from the innovation contests were used to determine if previously identified top innovators were ranked highly by the measures. To overcome the incomplete labeling problem, unsupervised cluster models were fit to the centrality scores, which identified small groups of nodes with anomalous centrality attributes. Plots of the collaboration networks with the anomalous nodes highlighted strongly suggest that the centrality measures and clustering models successfully identified significant participants and teams within the collaboration networks. A comparison to collaboration network plots with top-ranked nodes highlighted indicates that the k-means approach selected novel sets of nodes for consideration.

While the subjectivity of identifying key innovators prevents conclusive results, this paper's findings provide compelling evidence for using innovation contests and centrality measures for identifying innovative employees. Using these techniques, organizations such as EMC can take explicit steps to grow and retain key employees and ensure a healthy innovative culture.

[Figure 2 panels, shown for both the Participant and Clique graphs: B, BW, C, CW, D, DW, HCN, HCNW, HCO, HCOW, MCN, PRW, PRWP; vertical axis from 0.00 to 1.00.]

Figure 2: Centrality ECDFs for the participant and clique collaboration networks.

[Figure 3 panels, shown for both the Participant and Clique graphs: DW, CW, BW, PRW, PRWP, MCN, HCO, HCNW, HCOW; x axis: Recall; y axis: Precision.]

Figure 3: Precision-Recall curves for the participant and clique networks.

[Figure 4, lower triangle: Spearman rank correlation coefficients between measures. Upper triangle rank agreement plots are not reproduced.]

        B      BW     C      CW     D      DW     HCN    HCNW   HCO    HCOW   MCN    PRW
BW      0.95
C       0.45   0.40
CW      0.46   0.42   0.95
D       0.48   0.44   0.71   0.60
DW      0.68   0.66   0.48   0.51   0.56
HCN     0.49   0.47   0.92   0.92   0.61   0.46
HCNW    0.38   0.37   0.83   0.83   0.54   0.32   0.95
HCO     0.15   0.19   −0.17  −0.13  0.18   0.12   −0.05  −0.02
HCOW    0.33   0.33   0.01   0.09   −0.14  0.43   0.04   −0.03  −0.07
MCN     0.02   0.00   −0.21  −0.32  0.17   −0.04  −0.36  −0.43  0.17   −0.11
PRW     0.43   0.43   −0.08  −0.11  0.41   0.54   −0.10  −0.18  0.28   0.20   0.33
PRWP    0.50   0.48   0.22   0.24   0.26   0.61   0.21   0.11   −0.03  0.54   0.02   0.49

Figure 4: Participant network node rank correlation and agreement plots.

[Figure 5, lower triangle: Spearman rank correlation coefficients between measures. Upper triangle rank agreement plots are not reproduced.]

        B      BW     C      CW     D      DW     HCN    HCNW   HCO    HCOW   MCN    PRW
BW      0.88
C       0.66   0.50
CW      0.58   0.48   0.93
D       0.71   0.56   0.79   0.74
DW      0.63   0.53   0.68   0.71   0.88
HCN     0.60   0.55   0.72   0.72   0.54   0.46
HCNW    0.44   0.41   0.54   0.49   0.38   0.25   0.88
HCO     0.37   0.39   0.15   0.16   0.38   0.36   0.34   0.34
HCOW    0.28   0.31   0.17   0.24   0.20   0.34   0.12   0.03   0.13
MCN     0.19   0.13   0.14   0.02   0.28   0.23   −0.32  −0.39  −0.28  0.06
PRW     0.03   0.03   0.02   0.02   0.03   0.03   0.02   0.02   0.04   −0.02  −0.01
PRWP    0.45   0.41   0.29   0.36   0.59   0.80   0.11   −0.09  0.36   0.38   0.25   0.01

Figure 5: Clique network node rank correlation and agreement plots.

[Figure: two panels (Participant, Clique); x-axis: Cluster (1 to 6); y-axis: percentage; series: Cluster Composition, % is Placed, % of Placed.]

Figure 6: Composition of K-means clusters.
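The three series named in Figure 6 can be read as per-cluster shares. The sketch below assumes one plausible interpretation that the source does not spell out here: "Cluster Composition" is the share of all nodes that fall in a cluster, "% is Placed" is the share of a cluster's nodes that are placed, and "% of Placed" is the share of all placed nodes that fall in the cluster. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cluster_summary(labels, placed):
    """Summarize clusters given labels[i] (cluster id) and placed[i] (bool)."""
    if len(labels) != len(placed):
        raise ValueError("labels and placed must have the same length")
    sizes = Counter(labels)
    placed_in = Counter(c for c, p in zip(labels, placed) if p)
    total_nodes = len(labels)
    total_placed = sum(placed_in.values())
    summary = {}
    for c in sorted(sizes):
        summary[c] = {
            # assumed meaning of "Cluster Composition"
            "composition": sizes[c] / total_nodes,
            # assumed meaning of "% is Placed"
            "is_placed": placed_in[c] / sizes[c],
            # assumed meaning of "% of Placed"
            "of_placed": placed_in[c] / total_placed if total_placed else 0.0,
        }
    return summary


# Hypothetical cluster assignments and placed flags for four nodes
labels = [1, 1, 1, 2]
placed = [True, False, False, True]
for cluster, row in cluster_summary(labels, placed).items():
    print(cluster, row)
```

Under this reading, a small cluster with a high "% is Placed" value is one in which placed participants are concentrated.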

[Figure: network graph; legend highlights clusters 2, 4, and 5.]

Figure 7: Participant collaboration network graph with anomalous clusters highlighted.

[Figure: network graph; legend highlights clusters 4, 5, and 6.]

Figure 8: Clique collaboration network graph with anomalous clusters highlighted.

[Figure: network graph.]

Figure 9: Participant collaboration network graph with top 20 ranked nodes for each centrality measure highlighted.
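Selecting the top-ranked nodes highlighted in Figure 9 amounts to scoring every node and keeping the k highest. As a minimal sketch, the code below uses plain normalized degree centrality as a stand-in for any of the measures; the edge list and the function name are hypothetical, and ties are broken by node label only to make the output deterministic.

```python
def top_k_by_degree(edges, k):
    """Return the k nodes with the highest normalized degree centrality."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    # normalized degree: number of distinct neighbors divided by (n - 1)
    scores = {
        node: (len(adj) / (n - 1) if n > 1 else 0.0)
        for node, adj in neighbors.items()
    }
    ranked = sorted(scores, key=lambda node: (-scores[node], str(node)))
    return [(node, scores[node]) for node in ranked[:k]]


# Hypothetical undirected collaboration edges
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(top_k_by_degree(edges, 2))
```

Swapping in another scoring function while keeping the same sort-and-slice step would give the corresponding top-k list for that measure.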

[Figure: network graph.]

Figure 10: Clique collaboration network graph with top 20 ranked nodes for each centrality measure highlighted.

References

[1] Mark E. J. Newman. Scientific Collaboration Networks. Proc. Natl. Acad. Sci. USA, 98(2):404-409, January 2001.

[2] Mark E. J. Newman. Scientific Collaboration Networks: II. Shortest paths, weighted networks, and centrality. Physical Review E, 64:016132, 2001.

[3] Linton C. Freeman. Centrality in social networks: conceptual clarification. Social Networks, 1:215-239, 1979.

[4] Linton C. Freeman. A set of measures of centrality based on betweenness. Sociometry (1977): 35-41.

[5] E. Whelan, S. Parise, J. de Valk, and R. Aalbers. Creating Employee Networks that Deliver Open Innovation. MIT SLOAN Management Review, Fall 2011 Vol.53 No.1. Reprint number 53108.

[6] F. Coulon. The use of analysis in innovation research: A literature review. Unpublished paper. Lund University, Lund, Sweden (2005).

[7] M. G. Everett, and S. P. Borgatti. Extending centrality. Models and methods in social network analysis, 35.1 (2005): 57-76.

[8] D. Oritz-Arroyo. Discovering Sets of Key Players in Social Networks. In Computational Social Network Analysis, (2010), pp. 27-47.

[9] M. G. Everett, and S. P. Borgatti. The centrality of groups and classes. Journal of Mathematical Sociology, Vol. 23, No. 3. (1999), pp. 181-201.

[10] S. P. Borgatti. The key player problem. Dynamic social network modeling and analysis: Workshop summary and papers. National Academy Press, 2003.

[11] Stephen P. Borgatti, and Martin G Everett. Network analysis of 2-mode data. Social Networks, 19(3): 243-269.

[12] V. Krebs. Uncloaking terrorist networks. First Monday, 2002 7(4).

[13] R. Poulin, M-C. Boily, and B. R. Masse. Dynamical systems to define centrality in social networks. Social networks, 22.3 (2000): 187-220.

[14] L. Fleming, C. King, and A. I. Juda. Small worlds and regional innovation. Organization Science, 18.6 (2007): 938-954.

[15] E. Gudrais. Innovation At the Intersection. Harvard Magazine, May-June 2010.

[16] B. Uzzi, and J. Spiro. Collaboration and Creativity: The Small World Problem. American Journal of Sociology, 111.2 (2005): 447-504.

[17] G. Ahuja. Collaboration networks, structural holes, and innovation: A longitudinal study. Administrative science quarterly, 45.3 (2000): 425-455.

[18] Scott White, and Padhraic Smyth. Algorithms for estimating relative importance in networks. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.

[19] J. Singh. Collaborative networks as determinants of knowledge diffusion patterns. Management science, 51.5 (2005): 756-770.

[20] R. Cross, R. D. Martin, and L. M. Weiss. Mapping the Value of Employee Collaboration. The McKinsey Quarterly, 2006, Vol 3.

[21] S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proc. 7th International World Wide Web Conference, 1998.

[22] V. Grolmusz. A Note on the PageRank of Undirected Graphs. CoRR, 2012, abs/1205.1960.

[23] T. H. Haveliwala. Topic-sensitive PageRank. Proceedings of the 11th international conference on World Wide Web (WWW 02). 2002, 517-526.

[24] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping of complex networks in nature and society. Nature, 2005, Volume 435, Issue 7043, pp. 814-818.
