As Strong as the Weakest Link: Mining Diverse Cliques in Weighted Graphs Appendix

Petko Bogdanov1, Ben Baumer2, Prithwish Basu3, Amotz Bar-Noy4, and Ambuj K. Singh1

1 University of California, Santa Barbara, CA 93106, USA, {petko,ambuj}@cs.ucsb.edu
2 Smith College, Northampton, MA 01063, USA, [email protected]
3 Raytheon BBN Technologies, 10 Moulton St., Cambridge, MA 02138, USA, [email protected]
4 The City University of New York, New York, NY 10016–4309, USA, [email protected]

1 The choice of group score

We model a group as a collection of pairwise interactions. Ideally, we would also consider higher-order interactions, such as subgroups of size 3, 4, or more, and incorporate them into our model. However, this approach has two limitations:

1. Although graphs may not be the ideal framework for modeling higher-order interactions, and hypergraphs or simplicial complexes [3, 15, 7] may be the preferred approach, algorithms for the latter settings are computationally demanding.
2. To validate the developed theories, one would need empirical datasets featuring a sufficient number of instances in which a subgroup has interacted, and such data is hard to find for higher-order interactions.

To elaborate on the second point, data at the subgroup level becomes either sparser (in the case of sports) or is mostly unavailable (in the case of protein/gene interactions). For bigger subgroups in sports there are fewer games (observations) in which the same group participates, limiting one's ability to measure subgroup performance with high statistical confidence. In the case of gene networks, major technologies allow for testing only pairwise interactions, while the overall goal is to understand the system at a complex/pathway level.

We address both of the above challenges by requiring that all pairwise interactions/performances are strong (penalizing groups in which the minimum pairwise interaction is low). In a sense, this is a weaker condition that seeks to approximate the group performance. Adopting this approximate condition based on the weakest link allows for scalable solutions and general methods that are applicable to multiple domains regardless of the scarcity of data at the subgroup level. In the gene interaction scenario, we show empirically that the mined cliques involve genes of similar function and hence are likely to capture complexes/pathways. Our weakest link scoring function also correlates well with sports subgroup performance, making it an acceptable approximation.

Arguably, higher-order relation modeling (based on hypergraphs and simplicial complexes) or optimizing different connectivity criteria (e.g., a good spanning tree) is more critical in "heterogeneous" teams where there exists a leadership/hierarchical structure within the group, e.g., in military platoons or terrorist cells. In such scenarios, the strength of every pairwise connection may not be critical, since the organizational hierarchy is specifically designed to streamline communication among its members; e.g., a weak link between a general and a private, or between a terrorist cell leader and a terrorist, does not threaten the effectiveness of the unit as a whole. Conversely, the addition of a dynamic leader to an otherwise poorly performing group might make the group highly effective even if the leader does not have a strong pairwise relationship with each member of the group.

For non-hierarchical ("flat") group structures, the topic we focus on in this paper, it may not be necessary to score each individual higher-order subgroup separately, as the score function is unlikely to be discontinuous with the addition of a node, as it might be in the terrorist cell example. Hence, we focus on modeling the strength of non-hierarchical "flat" subgroups based on our weakest link scoring scheme.

2 Proofs

2.1 NP-completeness (Theorem 4.1)

For any scoring function s() that maps a graph substructure to a non-negative real number, the decision problem corresponding to mDkC, namely: “Is there a set of m substructures A, each of size k, such that ds(A) ≥ B for some positive number B,” is NP-complete.

Proof. We show NP-completeness for the special case α = 0 by a reduction from the Set Cover problem. An instance of Set Cover consists of a family S of subsets of a finite universe U and an integer m. The problem asks whether there is a subfamily S' ⊂ S, |S'| ≤ m, that covers the whole universe. From an instance of Set Cover we construct a graph G = (U, E, w) such that the node set corresponds to the elements of the universe U, and the edge set is constructed by adding an edge between every two elements (u_i, u_j) that are both included in one of the sets in S. All weights are set to 1 and the decision threshold B is set to |U|/k. If there is a Set Cover solution of size m or less, then the corresponding set of cliques A achieves the maximal ds(A) = |U|/k (if the Set Cover solution has fewer than m sets, one may need to pad it with extra cliques, which will not change the score). The implication in the opposite direction holds as well: if there is a set of k-cliques that achieves a score of |U|/k (i.e., the cliques include all nodes in the graph), then the corresponding sets in the Set Cover instance cover the universe. QED
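The graph construction used in the reduction can be illustrated with a minimal Python sketch. The function names `set_cover_to_graph` and `decision_threshold`, and the representation of edges as `frozenset` pairs, are choices made for this illustration only and do not come from the paper.

```python
from itertools import combinations


def set_cover_to_graph(universe, subsets):
    """Build the unit-weight graph described in the reduction.

    Nodes are the universe elements; two elements are joined by an edge
    of weight 1 whenever they appear together in at least one subset.
    """
    nodes = set(universe)
    weights = {}
    for subset in subsets:
        for u, v in combinations(sorted(subset), 2):
            weights[frozenset((u, v))] = 1.0
    return nodes, weights


def decision_threshold(universe, k):
    """Threshold B = |U| / k used in the decision problem (alpha = 0)."""
    return len(set(universe)) / k
```

For example, the sets {1, 2, 3} and {3, 4} over the universe {1, 2, 3, 4} yield the four edges {1,2}, {1,3}, {2,3}, and {3,4}; the pair {1, 4} shares no set and therefore gets no edge.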

2.2 Monotonicity and submodularity (Theorem 4.2)

If k and α are fixed, the diversity score function ds(A) is:

– Monotonic, i.e., for any A ⊆ B, ds(A) ≤ ds(B)
– Sub-modular, i.e., for any sets A, B, ds(A) + ds(B) ≥ ds(A ∪ B) + ds(A ∩ B)

Proof. To see monotonicity, suppose that A ⊆ B, so that A ∪ D = B, where D is some possibly empty set. Then,

$$
ds(B) = ds(A \cup D)
= \frac{\alpha}{k} \sum_{C \in A \cup D} s(C) + \frac{1-\alpha}{k} \Big| \bigcup_{C \in A \cup D} C \Big|
\;\geq\; \frac{\alpha}{k} \sum_{C \in A} s(C) + \frac{1-\alpha}{k} \Big| \bigcup_{C \in A} C \Big|
= ds(A),
$$

where the inequality is justified by the fact that s(C) ≥ 0 for any set C, and that the union over A ∪ D contains the union over A.

Sub-modularity follows from a similar argument. For any two sets A and B, we have that:

$$
\begin{aligned}
ds(A) + ds(B)
&= \frac{\alpha}{k} \sum_{C \in A} s(C) + \frac{1-\alpha}{k} \Big| \bigcup_{C \in A} C \Big|
 + \frac{\alpha}{k} \sum_{C \in B} s(C) + \frac{1-\alpha}{k} \Big| \bigcup_{C \in B} C \Big| \\
&\geq \frac{\alpha}{k} \Big[ \sum_{C \in A \cup B} s(C) + \sum_{C \in A \cap B} s(C) \Big]
 + \frac{1-\alpha}{k} \Big[ \Big| \bigcup_{C \in A \cup B} C \Big| + \Big| \bigcup_{C \in A \cap B} C \Big| \Big] \\
&= ds(A \cup B) + ds(A \cap B).
\end{aligned}
$$

This follows from elementary set theory: if a clique C is in either A or B but not the other, its nodes get counted once on the LHS of the inequality and once on the RHS (in the A ∪ B term). Conversely, if C is in both A and B, then its nodes get counted twice on the LHS (once in each term) and twice on the RHS (once in each term). QED
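As an informal sanity check of the two properties, the following Python sketch evaluates ds on every pair of sub-families of a small clique collection. It assumes ds has the form used in the proof, a score sum and a node-coverage term each scaled by 1/k; the helper names (`ds`, `all_subsets`, `check_properties`) and the toy scores are hypothetical. A brute-force check on one instance illustrates the theorem but is of course not a substitute for the proof.

```python
from itertools import combinations


def ds(cliques, s, k, alpha):
    """Diversity score: (alpha/k) * sum of clique scores
    plus ((1 - alpha)/k) * number of distinct nodes covered."""
    covered = set().union(*cliques)
    return (alpha / k) * sum(s[c] for c in cliques) + ((1 - alpha) / k) * len(covered)


def all_subsets(items):
    """Return every sub-family of the given cliques as a frozenset."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def check_properties(s, k, alpha, tol=1e-12):
    """Brute-force test of monotonicity and submodularity on all pairs."""
    families = all_subsets(s)
    for A in families:
        for B in families:
            if A <= B and ds(A, s, k, alpha) > ds(B, s, k, alpha) + tol:
                return False  # monotonicity violated
            lhs = ds(A, s, k, alpha) + ds(B, s, k, alpha)
            rhs = ds(A | B, s, k, alpha) + ds(A & B, s, k, alpha)
            if lhs + tol < rhs:
                return False  # submodularity violated
    return True


# Toy instance: three 3-cliques with hypothetical weakest-link scores.
scores = {
    frozenset("abc"): 0.9,
    frozenset("bcd"): 0.4,
    frozenset("def"): 0.7,
}
```

Calling `check_properties(scores, 3, 0.5)` on this instance returns `True`.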

2.3 Maximum score improvement (Theorem 5.1)

Let C, |C| ≤ k, be a clique of size not exceeding k. The maximum improvement of the ds score when adding any size-k super-clique of C to a clique set A is bounded by:

$$
\delta(A, C) = ds(A \cup C) - ds(A) = \alpha \min_{u,v \in C} w(u, v) + (1 - \alpha)\, \frac{k - \big| \big( \bigcup_{B \in A} B \big) \cap C \big|}{k},
$$

where, in the diversity part, the set (∪_{B∈A} B) ∩ C is the intersection of the nodes included in A and the nodes in C.

Proof. The highest-scoring super-clique of C of size k (if one exists) can improve the score part by no more than α min_{u,v∈C} w(u, v), since this bound optimistically assumes that no edge of lower weight than those already in C will be added in the super-clique. Similarly, the bound on the diversity part is the largest possible, since it assumes that none of the yet-unobserved nodes in the k-completion of C are contained in A. QED
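The bound of Theorem 5.1 is straightforward to evaluate for a partial clique. The sketch below is one possible reading, not the authors' implementation: the function name `delta_bound` and the `frozenset`-keyed weight dictionary are assumptions, and so is the convention that a single-node C (which has no edges) uses the optimistic weight 1.0, given that edge weights lie in [0, 1].

```python
from itertools import combinations


def delta_bound(A, C, w, k, alpha):
    """Upper bound on the ds improvement of any size-k super-clique of C.

    A     : iterable of cliques (sets of nodes) already selected
    C     : partial clique (set of nodes), |C| <= k
    w     : dict mapping frozenset({u, v}) -> edge weight in [0, 1]
    """
    covered = set().union(*A)
    pairs = list(combinations(sorted(C), 2))
    # Assumption: with no edge in C yet, use the optimistic weight 1.0.
    weakest = min(w[frozenset(p)] for p in pairs) if pairs else 1.0
    overlap = len(covered & set(C))
    return alpha * weakest + (1 - alpha) * (k - overlap) / k
```

For instance, with k = 3, α = 0.5, a selected clique {a, x, y}, and a partial clique {a, b} whose only edge has weight 0.8, the score part contributes 0.5 · 0.8 and the diversity part contributes 0.5 · (3 − 1)/3.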

3 Data sources and preparation

Team-based data. These data sets contain whole-team performance without an explicit measure of pairwise interaction strength. We compute pairwise interactions using a generic scoring model that captures performance significance. We consider the number of games won (highly viewed movies) for each pair of teammates who appeared in at least one game (movie). The weight (between 0 and 1) on each edge is based on the probability of observing the number of wins in the games in which the pair participated. We assume that the true success probability of a pair is fixed over a fixed time period. The expected pair winning percentage is estimated from a k-NN lookup within historical background data for similar pairs (based on personal success rate), and the p-value of the current observation is used to define the edge score (the score is simply 1 − p-value, as lower p-values correspond to more significant observations).

NBA: This data set contains play-by-play data for 7,139 games from the past six seasons (2005–2012) of the National Basketball Association (NBA) (footnote 5). We use the two most recent seasons (2010–2012) as a testing set, while the background set consists of games in the four previous seasons. The generic weight function for each edge is derived from the p-value of observing the number of wins for the pair of teammates (over all games in the testing set that included both players), using the performance of similarly successful players in the background set as a null model. In order to alleviate issues with free-riders, this data was pre-filtered so that a minimum of 10 minutes of playing time in each game was required.

MLB: Play-by-play data for 14,577 games from the past eleven regular seasons (2000–2011) of Major League Baseball (MLB) was parsed (footnote 6). Similar to the basketball data, the two most recent seasons (2010–2011) were used as the testing set, and the generic edge scoring function was used.

IMDB: Data for 4,054 Hollywood movies going back to 1915 is available from the IMDB (footnote 7). Each movie also contains a cast and crew list and a user-generated IMDB rating. Among all movies with at least 10,000 user ratings on IMDB, a movie was said to be successful if more than the median number of voters (27,900 in this case) had rated it. Participation was limited to actors, actresses, directors, producers, writers, editors, and cinematographers who were listed among the top 20 in the credits.

Pair-wise data.

YeastNet: We obtain a gene interaction graph from [5]. Nodes correspond to genes and gene products, while edges measure the level of interaction, computed as the absolute value of the interaction score from [5].

Stocks: We use the end-of-day stock price for all stock symbols for a period of 2 years (2008–2009) (footnote 8). We compute a correlation graph over every pair of stock symbol time series, where the weight on an edge is the absolute value of the Pearson correlation coefficient. The graph is thresholded, keeping edges with correlation greater than 0.5.

5 http://basketballvalue.com/
6 http://www.retrosheet.org/
7 http://www.imdb.org/
8 http://www.cs.brown.edu/~pavlo/pennystocks/
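The construction of the Stocks correlation graph can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the preprocessing code used for the experiments: the names `pearson` and `correlation_graph` are hypothetical, the threshold is assumed to apply to the absolute correlation, and every series is assumed to be non-constant and of equal length.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_graph(series, threshold=0.5):
    """Return {frozenset({a, b}): |corr|} for pairs above the threshold.

    series: dict mapping a symbol name to its list of prices.
    """
    edges = {}
    for a, b in combinations(sorted(series), 2):
        weight = abs(pearson(series[a], series[b]))
        if weight > threshold:
            edges[frozenset((a, b))] = weight
    return edges
```

With this convention, strongly anti-correlated symbols are also connected, since the edge weight is the absolute value of the coefficient.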

4 Discovered subgroups in real-world networks and discussion of results

Table 1 lists the top 10 significant performance pairs in the NBA data set. The highest-scoring pair, D. Gibson and O. Casspi of the Cleveland Cavaliers, each played at least 10 minutes in 32 games, winning 15 of them. This modest record is highly unlikely given the composition of their team, which had an estimated winning probability of just 0.2 (based on our historical background model). Assuming this prior probability of 0.2, by winning 15 of 32 games the Cavaliers performed particularly well relative to expectations over the time period.

Player 1          Player 2          w    g    ŷ      p-value
Gibson, Daniel    Casspi, Omri      15   32   0.198  0.001
Simmons, Bobby    Young, Nick       8    8    0.417  0.001
Gibson, Daniel    Gee, Alonzo       24   59   0.224  0.001
Uzoh, Ben         Davis, Ed         6    7    0.251  0.001
Uzoh, Ben         Anderson, Alan    6    7    0.251  0.001
Sessions, Ramon   Casspi, Omri      17   39   0.212  0.001
Jamison, Antawn   Casspi, Omri      22   57   0.211  0.002
Gibson, Daniel    Irving, Kyrie     13   29   0.198  0.002
Casspi, Omri      Gee, Alonzo       21   54   0.211  0.002
Hollins, Ryan     Casspi, Omri      8    16   0.174  0.003

Table 1. Top performing NBA pairs (w: games won, g: games played together, ŷ: estimated winning probability from the background model).
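To make the edge scoring concrete, the sketch below computes a p-value for a pair's record and converts it to an edge weight of 1 − p-value. It assumes a one-sided binomial test with a fixed success probability ŷ; the source states only that the pair's success probability is assumed fixed, so the exact test is an assumption here, and the function names are hypothetical.

```python
from math import comb


def upper_tail_p_value(wins, games, p_win):
    """P(X >= wins) for X ~ Binomial(games, p_win)."""
    return sum(
        comb(games, i) * p_win**i * (1 - p_win) ** (games - i)
        for i in range(wins, games + 1)
    )


def edge_weight(wins, games, p_win):
    """Edge score defined as 1 - p-value (higher means more significant)."""
    return 1.0 - upper_tail_p_value(wins, games, p_win)
```

Under this assumption, a pair with 8 wins in 8 games and ŷ = 0.417 has a p-value of 0.417^8 ≈ 0.0009, which is consistent with the rounded value of 0.001 listed in Table 1.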

However, the data in Table 1 includes only the highest-significance pairs and disregards diversity. The list consists almost exclusively of members of the Cavaliers and the Toronto Raptors, and there is high overlap among the pairs (4 players occupy 12 of the 20 listings). In addition, a free-rider effect is likely present, since most observers would attribute the improvement of the Cavaliers to the addition of Kyrie Irving, the first overall pick in the 2011 NBA draft and a nearly unanimous selection as the 2011-2012 NBA Rookie of the Year.

When we consider groups of larger size and incorporate diversity, the reported results agree to a greater extent with conventional wisdom. Within the best-scoring triples in the NBA (Table 2), we find a trio of particular interest (ranked third), comprised of (i) the aforementioned Irving; (ii) T. Thompson, the fourth overall pick in the draft; and (iii) A. Jamison, the team's most decorated veteran player. The team's fortunes improved considerably late in the season when Thompson joined Irving and Jamison in the starting lineup, and our method was able to identify this trio for their unexpectedly strong performance.

Additional validation for DiCliQ comes from the presence of Dirk Nowitzki, Tyson Chandler, and DeShawn Stevenson of the 2011 champion Dallas Mavericks. Although the Mavericks had been perennial contenders for a decade with their German superstar Nowitzki in tow, they had never made the NBA Finals. The addition of Chandler, a defensive standout who became their starting center, was widely credited as being the acquisition that pushed them over the top. Even more interesting is the presence of Stevenson, a quality reserve who started every game of the Finals and defended opposing forward LeBron James to a series of sub-par performances. Unlike many traditional performance metrics, our method is ignorant of individual statistics, and thus implicitly weights defensive and offensive contributions equally.

Notably absent from this list are famed trios such as Paul Pierce, Kevin Garnett, and Ray Allen of the Boston Celtics, and LeBron James, Dwyane Wade, and Chris Bosh of the Miami Heat. In both cases, a history of high winning percentages raises the bar for statistical significance, making it even more difficult to achieve a low p-value.

ds(A)     Player 1           Player 2            Player 3            w    g    ŷ      p-value
0.997751  Casspi, Omri       Gibson, Daniel      Gee, Alonzo         15   32   0.198  0.001
0.995061  Uzoh, Ben          Gray, Aaron         Anderson, Alan      5    6    0.253  0.005
0.991847  Jamison, Antawn    Thompson, Tristan   Irving, Kyrie       15   40   0.201  0.008
0.977488  Felton, Raymond    Forbes, Gary        Martin, Kenyon      6    6    0.526  0.021
0.973096  Jeffries, Jared    Williams, Shelden   Walker, Bill        4    4    0.405  0.027
0.970631  Hollins, Ryan      Harangody, Luke     Sessions, Ramon     8    17   0.225  0.022
0.96946   Bayless, Jerryd    Okafor, Emeka       Thornton, Brown     4    4    0.418  0.031
0.965735  Dooling, Keyon     Mbah a Moute, Luc   Salmons, John       27   63   0.305  0.025
0.952345  Hawes, Spencer     Meeks, Jodie        Iguodala, Andre     52   87   0.472  0.013
0.946515  Ellis, Monta       Udoh, Ekpe          Lee, David          38   79   0.384  0.049
0.943947  Gibson, Taj        Asik, Omer          Rose, Derrick       53   64   0.704  0.017
0.941515  Evans, Reggie      Johnson, Amir       Bargnani, Andrea    8    19   0.236  0.058
0.931264  Cunningham, Dante  Speights, Marreese  Mayo, O.J.          31   43   0.571  0.032
0.930877  Young, Nick        Foye, Randy         Paul, Chris         14   20   0.461  0.027
0.927572  Hayes, Chuck       Thomas, Isaiah      Outlaw, Travis      7    13   0.242  0.020
0.921864  Nowitzki, Dirk     Chandler, Tyson     Stevenson, DeShawn  37   43   0.665  0.003
0.92149   Ginobili, Manu     Parker, Tony        Diaw, Boris         11   12   0.692  0.076
0.915663  Brown, Shannon     Blake, Steve        Caracter, Derrick   5    5    0.610  0.084
0.90676   Fields, Landry     Shumpert, Iman      Smith, J.R.         17   26   0.475  0.052
0.902117  Parker, Anthony    Samuels, Samardo    Varejao, Anderson   4    7    0.172  0.020

Table 2. Output of DiCliQ (3, 20, 0.5): Several triples (Rows 1, 3, 6, and 20) are members of the Cleveland Cavaliers.

Of course, there may also be lurking variables at play. The performance of the New York Knicks trio of Landry Fields, Iman Shumpert and J.R. Smith may provide an example. Smith joined the team shortly after the sensational ascension of the point guard Jeremy Lin, who was not identified by DiCliQ. It may be the case that our method is too crude to isolate the contribution of Lin, but identifies his teammates instead.

When applied to MLB, one interesting triple obtained by DiCliQ includes M. Ramirez, C. Blake, and H. Kuo of the Los Angeles Dodgers. While Kuo was an effective relief pitcher for the team, the Dodgers acquired both Blake (who became the starting third baseman) and Ramirez (starting left fielder) in late July of the 2008 season. Ramirez promptly went on an offensive tear, winning NL Player of the Month for August, and led the Dodgers to the playoffs. With an otherwise unremarkable lineup, the Dodgers won 13 of the 14 games in the testing set in which these three players appeared.

Although most people in the IMDB data set have participated in relatively few movies, the large number of participants in each movie makes the network large and sparse. Among the actual pairs who have collaborated in at least five movies since 2000, several well-known couples are apparent. In particular, Johnny Depp and Helena Bonham Carter, as well as Ben Stiller and Owen Wilson, stand out. Also present are several people connected to the Harry Potter franchise, most notably lead actor Daniel Radcliffe and author J.K. Rowling. However, given the sparseness of the network, there are very few triples that have appeared in a non-trivial number of movies in the testing set. Nevertheless, along with screenwriter Steven Kloves, Radcliffe and Rowling formed the second-highest-scoring triple in the top 20 identified by DiCliQ, with seven consecutive successful movies. Also discovered are the actors Tobin Bell, Costas Mandylor, and Betsy Russell from the Saw franchise, with five consecutive successful movies.

5 Extended discussion on parameter setting

As argued in the main paper, it is hard to pre-select a universally good clique size for any application. When no prior domain knowledge is available, this parameter can be varied to aid exploratory analysis of a new dataset at hand. Interesting values of k will be those at which the solution changes substantially (i.e., cliques of size k do not tend to include the mined cliques of size k−1), since these constitute interesting phase transitions in the results. Note that similar challenges also exist in other widely used data mining techniques, such as pre-selecting the number of clusters in clustering algorithms.

The diversity weight, as demonstrated in the experimental section, may affect the result. It provides control over diversity versus clique strength (Fig. 3a, BUDiC curve in the main paper). Again, for a new dataset one can find critical values at which the result set changes qualitatively and focus on those critical values for analysis (Fig. 3b in the main paper).

An appropriate number of mined cliques m can be estimated based on the delta increase of the score for increasing m (assuming fixed k and α). As one increases m, the solution score improvement will gradually diminish (either due to high overlap among newly added cliques or due to low score). One can adopt a fixed threshold cut-off based on the statistical p-value of this delta increase. To obtain an empirical distribution of delta score increases, one can randomize the network (exchanging scores on edges) and select the cut-off score increase corresponding to p-value = 0.01 (a typical statistical cut-off).
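The procedure for choosing m can be sketched as follows. This is a hedged illustration of the idea, not a prescribed implementation: the helper names are hypothetical, the null distribution of score increases is assumed to have been collected by running the miner on weight-shuffled copies of the network, and the empirical p-value is taken to be the fraction of null increases that are at least as large as the observed one.

```python
import random


def shuffle_edge_weights(weights, rng):
    """Randomize the network by exchanging the scores on its edges."""
    edges = list(weights)
    values = list(weights.values())
    rng.shuffle(values)
    return dict(zip(edges, values))


def empirical_p_value(gain, null_gains):
    """Fraction of null score increases that are >= the observed one."""
    return sum(1 for g in null_gains if g >= gain) / len(null_gains)


def choose_m(observed_gains, null_gains, p_cutoff=0.01):
    """Count the leading score increases that remain significant.

    observed_gains: delta score of the 1st, 2nd, ... clique added greedily.
    """
    m = 0
    for gain in observed_gains:
        if empirical_p_value(gain, null_gains) > p_cutoff:
            break
        m += 1
    return m
```

Because greedy score increases diminish as m grows, stopping at the first non-significant increase yields a single cut-off value of m.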

6 Details on competing techniques

6.1 iMDV

We implement and compare DiCliQ against an iterative version, termed iMDV, of the MaDSolver heuristic for weighted cliques by Bandyopadhyay et al. [2]. MaDSolver is a heuristic for finding the highest-cardinality clique with score exceeding a threshold δ. The score in [2] is defined as the minimum average weight of a node's adjacent edges, where the minimum is taken over all nodes in the clique. In addition, the original method searches for only one clique. We augment the method to search for multiple cliques by repeated applications of MaDSolver, while removing the edges of the mined cliques. We also change the clique score definition to the weakest link (MaDSolver can be applied "as is" for the scoring scheme based on the weakest edge).
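The difference between the two clique scores can be seen on a small example. The sketch below is illustrative only: the function names are hypothetical, cliques are assumed to have at least two nodes, and the score of [2] is interpreted here as averaging only over edges inside the clique, which is an assumption about the original definition.

```python
from itertools import combinations


def weakest_link_score(clique, w):
    """Minimum edge weight over all pairs in the clique."""
    return min(w[frozenset(p)] for p in combinations(sorted(clique), 2))


def min_average_score(clique, w):
    """Minimum, over nodes, of the average weight of the node's
    edges to the other clique members."""
    nodes = sorted(clique)
    return min(
        sum(w[frozenset((u, v))] for v in nodes if v != u) / (len(nodes) - 1)
        for u in nodes
    )


# Toy triangle with one weak edge (hypothetical weights).
w = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.9,
    frozenset(("b", "c")): 0.1,
}
```

On this triangle the weakest-link score is 0.1, while the minimum average score is 0.5, so averaging can mask a single weak pairwise interaction that the weakest-link score exposes.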

6.2 CFinder

We obtained the implementation of CFinder [10, 1, 13] from the authors' website (footnote 9). When running CFinder for the purpose of comparison in the purity experiment, we choose the clique size k = 6. When applied with its default parameters, the method does not scale to the YeastNet network due to its all-k-cliques enumeration phase. Without additional filtering, its memory footprint exceeds 6GB. We followed the recommendations of the authors on the CFinder software website and applied a clique intensity threshold of 0.6 and a minimum edge weight threshold of 0.15. With these settings, CFinder returns 37 communities of size 6 or larger. We re-sample communities larger than size 6 to obtain multiple m = 37, k = 6 solutions and average their behavior to enable a fair comparison to BUDiC. The standard deviation of the purity of all these instantiations is less than one percent of the average value (refer to the error bar in the purity experiment figure).

6.3 Random

In the purity experiment (Fig. 3(b) in the main manuscript), we compare the purity of the competing techniques to a control method that we call Random. It samples nodes uniformly to compile groups that match the parameters of the competing techniques in the experiments (k = 6, m = 37). We then measure and report the average purity (and its standard deviation) over 100 solution instantiations based on the random sampling.
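A minimal sketch of the Random control is given below. The purity measure itself is defined in the main manuscript and is therefore passed in as a callable; the names `random_groups` and `random_baseline`, the seed handling, and the choice of sampling nodes without replacement within each group are assumptions made for this illustration.

```python
import random
import statistics


def random_groups(nodes, k, m, rng):
    """Sample m groups of k distinct nodes each, uniformly at random."""
    pool = sorted(nodes)
    return [frozenset(rng.sample(pool, k)) for _ in range(m)]


def random_baseline(nodes, purity, k=6, m=37, runs=100, seed=0):
    """Mean and standard deviation of purity over repeated random solutions.

    purity: callable taking a list of groups and returning a number.
    """
    rng = random.Random(seed)
    values = [purity(random_groups(nodes, k, m, rng)) for _ in range(runs)]
    return statistics.mean(values), statistics.stdev(values)
```

Fixing the seed makes the reported average and standard deviation reproducible across runs.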

9 www.cfinder.org

6.4 Baseline

In the scalability analysis we compare to a naive Baseline method. Baseline also obtains a (1−1/e)-approximation for mDkC, since it implements the same greedy strategy as DiCliQ. It enumerates all possible cliques of the desired size k and then greedily (based on the best ds() improvement) compiles a result set of size m. While Baseline is feasible for small sparse networks (up to |V| = 500) and small values of k, the clique enumeration step quickly becomes a bottleneck as the input size increases, due to its combinatorial nature. It fails to complete in less than 4 hours on all but our smallest network, from the NBA.
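The Baseline strategy can be sketched as exhaustive enumeration followed by greedy selection. The code below is a simplified illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, k ≥ 2 is assumed, the clique score is the weakest link, and the marginal gain assumes ds in the form (α/k) · (sum of clique scores) + ((1 − α)/k) · (number of covered nodes).

```python
from itertools import combinations


def enumerate_k_cliques(nodes, w, k):
    """Yield every k-subset of nodes whose pairs are all edges in w."""
    for group in combinations(sorted(nodes), k):
        if all(frozenset(p) in w for p in combinations(group, 2)):
            yield frozenset(group)


def greedy_baseline(nodes, w, k, m, alpha):
    """Greedily pick up to m k-cliques by best ds improvement."""
    candidates = list(enumerate_k_cliques(nodes, w, k))
    # Weakest-link score of each candidate clique.
    score = {
        c: min(w[frozenset(p)] for p in combinations(sorted(c), 2))
        for c in candidates
    }
    chosen, covered = [], set()
    for _ in range(min(m, len(candidates))):
        def gain(c):
            # Marginal ds gain: clique score plus newly covered nodes.
            return (alpha / k) * score[c] + ((1 - alpha) / k) * len(c - covered)
        best = max((c for c in candidates if c not in chosen), key=gain)
        chosen.append(best)
        covered |= best
    return chosen
```

The enumeration loop inspects all k-subsets of the node set, which is what makes this approach impractical beyond small inputs.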

7 Applications beyond teams and gene complexes

Our methods can be applied to other scenarios in which clique analysis is important, but the discovered cliques are not necessarily interacting agents. For example, in the analysis of financial markets and in near-duplicate detection in digital image/video libraries such as Flickr and YouTube, it may be difficult to accurately observe "group success" or the strength of pairwise interactions, but there is a desire to isolate "high-scoring" subgroups of entities based on user-defined pairwise similarity functions. In our experimental section we have applied the algorithms to a dataset of stock price correlations.

8 More related work

Studies on group performance in both sports and education are also abundant. Group cohesion was shown to be positively correlated with the group's performance and social density [11, 17]. "Fit" among teammates was also explored by Oliver et al. [9] as a measure of team performance. While these studies corroborate our "weakest link" hypothesis, they do not propose computational methods for efficient discovery of subgroups. In sports analysis, the structure of the player interaction network has been used to quantify individual and whole-team performance [6, 12]. Similar to us, the authors of the latter hypothesize that strong connectivity is related to good performance. However, they focus on individual performance measured as centrality, as opposed to the discovery of higher-order core subgroups, which is our goal.

Quantifying the collective intelligence of groups, or merely understanding the dynamics of groups of individuals, is challenging. Woolley et al. [18] found evidence for a collective intelligence factor that was not strongly correlated with the average or maximum intelligence level of the members of the group, but rather with measures that affect the way they interact, such as their social sensitivity, distribution in turn-taking, and proportion of females. Cebrian et al. [4] developed the notion of the collective potential of a population as the expected value of an individual over time. Olguín and Pentland [8] constructed sociometric badges capable of detecting human interactions, and used them to study well-performing teams, including the ability to predict group outcomes.

References

[1] B. Adamcsek, G. Palla, I. J. Farkas, I. Derenyi, and T. Vicsek. CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics, 22(8):1021–1023, 2006.
[2] S. Bandyopadhyay and M. Bhattacharyya. Mining the largest dense vertexlet in a weighted scale-free graph. Fundam. Inform., 96(1-2):1–25, 2009.
[3] M. Brinkmeier, J. Werner, and S. Recknagel. Communities in graphs and hypergraphs. In CIKM, 2007.
[4] M. Cebrián, M. Lahiri, N. Oliver, and A. Pentland. Measuring the collective potential of populations from dynamic social interaction data. J. Sel. Topics Signal Processing, 4(4):677–686, 2010.
[5] M. Costanzo, A. Baryshnikova, J. Bellay, Y. Kim, E. Spear, C. Sevier, H. Ding, J. Koh, K. Toufighi, S. Mostafavi, et al. The genetic landscape of a cell. Science, 327(5964):425–431, 2010.
[6] J. Duch, J. Waitzman, and L. Amaral. Quantifying the performance of individual players in a team activity. PloS one, 5(6):e10937, 2010.
[7] T. Moore, R. Drost, P. Basu, R. Ramanathan, and A. Swami. Analyzing collaboration networks using simplicial complexes: A case study. In INFOCOM WKSHPS, pages 238–243, 2012.
[8] D. Olguín and A. Pentland. Assessing group performance from collective behavior. In Proc. of the CSCW, volume 10, 2010.
[9] D. Oliver and M. Fienen. Importance of teammate fit: Frescoball example. J. of Quantitative Analysis in Sports, 5(1), January 2009.
[10] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814–818, 2005.
[11] F. Peterson. Predicting group performance using cohesion and social network density: A comparative analysis. Master's thesis, Air Force Inst. of Technology, 2007.
[12] J. Piette, L. Pham, and S. Anand. Evaluating basketball player performance via statistical network modeling. In MIT Sloan Sports Analytics Conference, May 2011.
[13] P. Pollner, G. Palla, and T. Vicsek. Parallel clustering with CFinder. Parallel Process. Lett., 2, 2012.
[14] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3-900051-07-0.
[15] R. Ramanathan, A. Bar-Noy, P. Basu, M. Johnson, W. Ren, A. Swami, and Q. Zhao. Beyond graphs: Capturing groups in networks. In INFOCOM WKSHPS, 2011.
[16] D. Sarkar. Lattice: Multivariate Data Visualization with R. Springer, New York, 2008. ISBN 978-0-387-75968-5.
[17] M. Turner, A. Pratkanis, P. Probasco, and C. Leve. Threat, cohesion, and group effectiveness: Testing a social identity maintenance perspective on groupthink. Journal of Personality and Social Psychology, 63(5):781–796, 1992.
[18] A. Woolley, C. Chabris, A. Pentland, N. Hashmi, and T. Malone. Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004):686–688, 2010.