<<

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

Closeness for Networks with Overlapping Community Structure

Mateusz K. Tarkowski1 and Piotr Szczepanski´ 2 Talal Rahwan3 and Tomasz P. Michalak1,4 and Michael Wooldridge1 2Department of Computer Science, University of Oxford, United Kingdom 2 Warsaw University of Technology, Poland 3Masdar Institute of Science and Technology, United Arab Emirates 4Institute of Informatics, University of Warsaw, Poland

Abstract One aspect of networks that has been largely ignored in the literature on centrality is the fact that certain real-life Certain real-life networks have a community structure in which communities overlap. For example, a typical bus net- networks have a predefined community structure. In public work includes bus stops (nodes), which belong to one or more transportation networks, for example, bus stops are typically bus lines (communities) that often overlap. Clearly, it is im- grouped by the bus lines (or routes) that visit them. In coau- portant to take this information into account when measur- thorship networks, the various venues where authors pub- ing the centrality of a bus stop—how important it is to the lish can be interpreted as communities (Szczepanski,´ Micha- functioning of the network. For example, if a certain stop be- lak, and Wooldridge 2014). In social networks, individuals comes inaccessible, the impact will depend in part on the bus grouped by similar interests can be thought of as members lines that visit it. However, existing centrality measures do not of a community. Clearly for such networks, it is desirable take such information into account. Our aim is to bridge this to have a centrality measure that accounts for the prede- gap. We begin by developing a new game-theoretic solution fined community structure. Yet, to the best of our knowl- concept, which we call the Configuration semivalue, in order to have greater flexibility in modelling the community struc- edge, only one such measure has been developed to date ture compared to previous solution concepts from coopera- (Szczepanski,´ Michalak, and Wooldridge 2014), which ex- tive game theory. We then use the new concept as a building tends centrality to networks with community struc- block to construct the first extension of Closeness centrality ture. Despite this recent development, one important aspect to networks with community structure (overlapping or other- of real-life networks remains missing from existing central- wise). Despite the computational complexity inherited from ity measures: the ability to consider overlapping communi- the Configuration semivalue, we show that the corresponding ties. Take social networks, for example, where such overlaps extension of Closeness centrality can be computed in poly- are widespread due to the various affiliations and interests nomial time. We empirically evaluate this measure and our of the individuals involved (Kelley et al. 2012). Likewise, in algorithm that computes it by analysing the Warsaw public our example of transportation networks, a bus stop may be transportation network. on the route of multiple (i.e., overlapping) bus lines. If such a stop becomes inaccessible, then all the bus lines that visit Introduction it would no longer function properly. As such, the impor- One of the key problems in involves identi- tance of a bus stop clearly depends (at least partially) on the fying the most important (or central) nodes (Freeman 1979; importance of the bus lines to which it belongs. Dezso˝ and Barabasi´ 2002; Keinan et al. 2004; Page et al. In an attempt to define a centrality measure that accounts 1999). The four best-known centrality measures are De- for overlapping communities, we focused on game-theoretic gree, Betweenness, Closeness and Eigenvector centrality measures.1 The inspiration behind this line of re- (Bonacich 1972; Freeman 1979), each of which views cen- search comes from solution concepts in cooperative game trality from a different perspective, focusing on certain traits theory. In essence, given a set of players, a cooperative so- that make nodes important, or, “central,” to the function- lution concept typically defines a payoff for each player by ing of a network (Brandes and Erlebach 2005; Koschutzki comparing his or her contribution to the various groups of et al. 2005). Our focus in this paper is on Closeness cen- players (more on this in the next section). The rich repos- trality. This measure considers the important nodes to be itory of solution concepts has been extensively refined and those that are relatively close to all other nodes in the expanded over the past decades, making it an ideal toolkit network: the closer a node is to the others, the higher for quantifying the importance of individuals in a setting its centrality. Closeness centrality has many applications, where those individuals co-exist and operate in groups. In from coauthorship networks (Yan and Ding 2009), through the context of game-theoretic network centrality, the indi- tourism (Shih 2006), to social networks (Barabasi 2003; Karinthy 2006). 1See www.game-theoretic-centrality.com and Copyright c 2016, Association for the Advancement of Artificial www.network-centrality.com for an overview of this line Intelligence (www.aaai.org). All rights reserved. of research.

622 Solution Concept Degree Closeness Betweenness Game-Theoretic Concepts Shapley value 22 3 A cooperative game consists of a set of players N = Semivalues 44 3 {1, 2,...,n} and a characteristic function ν :2N → R 5 Owen value This Paper Open such that ν(∅)=0. This function assigns to each coali- 5 Coalitional semivalues This Paper Open tion of players its payoff (i.e., an indication of its perfor- Configuration value Open This Paper Open Configuration semivalues Open This Paper Open mance). We will henceforth refer to a game simply by ν.A coalition structure, CS = {Q1,Q2,...,Qm}, is a partition Table 1: The table outlines the papers that used various so- of N into disjoint coalitions. One of the key questions in lution concepts to extend Degree, Closeness, or Betweenness cooperative game theory is the following: Given a game ν centralities, and computed them in polynomial time. and a coalition structure CS that the players have formed, how do we divide the payoff of each coalition among its viduals correspond to the nodes of the network, while the members? In this context, assuming that CS = {N}, i.e., groups correspond to the subgraphs of the network. With assuming that the players have formed the grand coalition, this mapping, any solution concept from cooperative game Shapley (1953) proposed a solution concept—now known theory can be readily applied as a centrality measure, ex- as the Shapley value—to fairly divide the payoff from co- cept for one remaining obstacle: computing the solution con- operation among the players. Banzhaf (1965) proposed an- cept. In fact, most solution concepts are inherently hard to other solution concept—now known as the Banzhaf index— compute (Chalkiadakis, Elkind, and Wooldridge 2011). For- which is similar to the Shapley value except for a subtle tunately, however, in the context of network centrality, we difference in the way contributions are weighed. To gener- typically focus on a certain closed-form formula, specify- alize the aforementioned solution concepts, Weber (1979) ing how each group is evaluated. In certain cases, fixing proposed semivalues—a family of solution concepts that in- the group-evaluation function makes it possible to obtain cludes both the Shapley value and Banzhaf index. Formally, a polynomial-time algorithm that computes the resulting, let MC (C, i)=ν(C ∪{i}) − ν(C) be the marginal con- game-theoretic centrality measure. tribution of player i to coalition C ⊆ N \{i}. Denoting by With this in mind, we propose the first extension of Close- β(k) the probability that any player makes a marginal con- ness centrality to networks with either overlapping or non- tribution to a coalition of size k, the semivalue of i is: overlapping community structure. To this end, we propose |N|−1   four game-theoretic variants of Closeness centrality, three of k ψi(ν)= β(k)E MC (C ,i) , (1) which are based on existing solution concepts, namely: the k=0 Owen value (Owen 1977), the configuration value (Albizuri, k Aurrecoechea, and Zarzuelo 2006), and the coalitional semi- where C is a random variable over subsets of size k chosen value (Szczepanski,´ Michalak, and Wooldridge 2014). The from the set N \{i} with uniform probability, and E is the fourth and most general variant is based on a new solution expected value operator for this variable. The Shapley value concept proposed in this paper, which generalises the afore- and Banzhaf index are two semivalues, defined by β(k)= |N|−1 |N|−1 mentioned three concepts, and offers greater flexibility in 1/|N| and β(k)= k /2 , respectively. modelling the underlying community structure. We call it Importantly, all semivalues assume that CS = {N}, i.e., the Configuration semivalue. that the grand coalition is formed. To relax this assumption, Crucially, all of the aforementioned solution concepts are Owen (1977) introduced a solution concept—now known hard to compute given an arbitrary group-evaluation func- as the Owen value—that divides the payoff of any a pri- tion. Nevertheless, for the purpose of extending Closeness ori coalition structure CS. Now, when CS = {N} or centrality, we propose polynomial-time algorithms for com- CS = {{i}}i∈N , the Owen value is equivalent to the Shap- puting the corresponding game-theoretic extension. This re- ley value. As such, the Owen value is a generalization of the sult fills several gaps in the computational-complexity anal- Shapley value; one that does not generalize the β function ysis of game-theoretic centrality measures (see Table 1). (as semivalues do), but rather generalizes the assumed coali- Finally, to demonstrate the applicability of our approach, tion structure CS. Another step in this line of research was we apply it to the Warsaw public transportation network, taken by Szczepanski,´ Michalak, and Wooldridge (2014), identifying the most central stops and routes therein, from who proposed a generalisation combining both the Owen the perspective of Closeness centrality. value and semivalues; they called it coalitional semivalues. Formally, given a coalition structure, CS, and discrete prob- ability distributions: β : {0,...,|CS|−1}→[0, 1] and Basic Notation and Definitions αj : {0,...,|Qj|−1}→[0, 1] for all j ∈{1,...,|CS|}, coalitional semivalues are defined by: In the following two subsections, we introduce the relevant game-theoretic, and graph-theoretic concepts, respectively. |CS|−1 |Qj |−1    k l γi(ν, CS)= β(k) αj(l) E MC T ∪ C ,i 2 Michalak et al. (2013b) k=0 l=0 3 Szczepanski,´ Michalak, and Rahwan (2012, 2016) (2) 4 Szczepanski´ et al. (2015) where Qj is the coalition in CS that player i belongs to, k 5 Szczepanski,´ Michalak, and Wooldridge (2014) T is a random variable over subsets of size k chosen from

623 l (a) (c) CS \{Qj} with uniform probability, C is a random vari- v able over subsets of size l chosen from Qj \{i} with uni- v1 v3 1 v4 form probability, and E is the expected value operator. The coalitional semivalue is equivalent to the Owen value when (b) (d) v v β(k)=1/|CS| and αj(l)=1/|Qj|, ∀j ∈{1,...,|CS|}. 1 v5 1 v8

None of the solution concepts discussed thus far considers v3 v7 v9 v4 overlapping coalitions. To address this issue, Albizuri, Aur- v v v v recoechea, and Zarzuelo (2006) generalised the Owen value 2 6 2 10 to situations where the a priori coalition structure CS con- Figure 1: Sample networks. In (c) and (d), communities are tains overlapping coalitions; they called this generalisation highlighted by same-coloured edges. the Configuration value. Formally, it is defined as follows, where Ti = {j : j ∈ N and Qj ∈ CS and i ∈ Qj}: is to others, whereas the Shapley value-based variant evalu-        ates the role that a node plays in bringing other nodes closer χi(ν, CS)= λMC T ∪ C, i , together. To illustrate this difference, consider networks (a) T ⊆CS j∈Ti C⊆Qj and (b) from Figure 1. Here, according to harmonic Close- T ∩T =∅ i∈C i ness, v1 is relatively more important in (b) than in (a) since |T |!(|CS|−|T |−1)! |C|!(|Qj |−|C|−1)! where λ = (3) it is close to more nodes in (b) than in (a). In contrast, ac- |CS|! |Qj |! cording to the Shapley-value based Closeness, v1 is actually more important in (a), since the removal of v1 from (a) has a greater impact on the distances between the other nodes, Graph-Theoretic Concepts compared to the removal of v1 from (b). A network is a graph, G =(V,E), comprised of a set of We present in Table 2 nodes V = {v0,v1,...,vn−1} and a set of edges E ⊆ Harmonic SV-based the harmonic and Shap- V × V . A path is simply a chain of connected nodes. The ley value closeness rank- dist 1. v7, v9 v7, v9 ings for Figure 1 (d).6 between two nodes, s and t, denoted by (s, t) 2. v1,v2 v1,v2 is the length of the shortest path between the two (we as- 3. v5, v6, v8, v10 v3, v4 The Configuration value sume that dist(v, v)=0). Given a node v ∈ V and a set of 4. v3, v4 v5, v6, v8, v10 closeness ranking is as nodes C ⊆ V , we say that dist(C, v) is equal to the mini- follows: v9, v7, v1, v6, Table 2: Harmonic closeness mum distance between any node u ∈ C and v (this implies v3, v8, v5, v2, v10, v4. that if v ∈ C then dist(C, v)=0). and Shapley value rankings The configuration value Closeness centrality quantifies the importance of nodes for Figure 1 (d). makes use of community based on their average distance to other nodes (Freeman information, promoting nodes v9, v1, v6 and v3. This rank- ing is also more fine-grained (i.e., there are no ties), because 1979). In its most general form, it is formulated as follows: it draws upon community information, which is different for closeness(v)= f(dist(v, u)), most nodes in the example. u∈V where the function f : N → R determines how the distance Our Centrality Measures influences the centrality. When f(k)=k, we obtain the classical Closeness centrality, where the smaller the value, As stated earlier, no centrality measures to date can readily the more central the node. In this paper, we focus on an al- be applied to networks with overlapping community struc- ternative formulation, where f(k)=1/k and throughout the tures. An example is depicted in networks (c) and (d) from 1 Figure 1. Specifically, in network (c), nodes v3 and v4 are paper 0 =0. The resulting centrality is known as harmonic centrality (Boldi and Vigna 2013). With this modification, symmetric except that v3 belongs to a seemingly-important the greater the value, the more central the node (which is in community; one that connects the two parts of the network. line with most centrality measures). Arguably, when taking this additional information into con- Everett and Borgatti (1999) introduced group Closeness sideration, v3 should be considered more important than v4. centrality, which extends the notion of Closeness to groups Moving on to network (d), node v1 belongs to more com- munities than v2, and the communities of v1 seem to be of nodes as follows:  dist equally important, if not more important, than those of v2. νc(C)= f( (C, u)). (4) This could mean, for example, that more bus, tram or train u∈(V \C) routes visit (and rely on) the bus stop v1 than the bus stop v2, Building upon this formula, the first game-theoretic ex- implying that v1 should be more central once the underlying tension of Closeness centrality was introduced by (Micha- community structure is taken into consideration. lak et al. 2013b). In particular, the authors defined a game Configuration Semivalues: We now propose a family of in which the players are the nodes of the network, and the solution concepts, which we believe to be the most general characteristic function is νc. The centrality of each node was then determined using the Shapley value. The result- 6Ranking does not capture the subtlety of valuations. For exam- ing game-theoretic centrality measure is called the Shap- ple, harmonic closeness ranks v7 as slightly more important than ley value-based Closeness centrality. Roughly speaking, the v1, however Shapley value closeness ranks it as much more impor- harmonic Closeness centrality evaluates how close a node tant.

624 of its kind to date, as it not only allows for an arbitrary β, α Proof of Theorem 1: Starting from Equation (5), which and CS, but also allows for overlapping coalitions. We call defines the configuration semivalue, let us first replace the it Configuration semivalues, and define it as follows, where arbitrary characteristic function therein, i.e., ν, with that of k T is a random variable over subsets of size k of CS \Ti group Closeness centrality (defined in Equation 4). We get: and E is the expected value operator:  |CS|−1 |Qj |−1   |Q |−1 φv(ν, CS)= αj (l)β(k) |CS|−1  j      k l j∈T k=0 l=0 k l φi(ν, CS)= β(k) αj (l) E MC T ∪ C ,i v T ⊆CS\Tv C ⊆Qj \{v} |T k|=k |Cl|=l k=0 j∈Ti l=0 (5) f(dist( T k ∪ Cl ∪{v},u)) − f(dist( T k ∪ Cl,u)) u∈V    . In particular, given a community structure CS in which |CS|−1 |Qj |−1 k l communities do not overlap, this family of solution concepts is equivalent to coalitional semivalues.GivenCS = {N},it Although this may seem inconsequential, our next step is to rearrange the summation over u and bring it to the forefront: is equivalent to semivalues. Further restrictions on β and αj (as discussed in the preliminaries) lead to the Shapley value,   |CS|−1 |Qj |−1   the Banzhaf index, or the Owen value. Compared to the con- φv(ν, CS)= αj (l) u∈V j∈T k=0 l=0 k l figuration value, our configuration semivalues offer greater v T ⊆CS\Tv C ⊆Qj \{v} control over the contributions of players, due to a probabil- |T k|=k |Cl|=l ity distribution over the number of communities to which f(dist( T k ∪ Cl ∪{v},u)) − f(dist( T k ∪ Cl,u)) β(k)    . a player contributes and a distribution over the number of |CS|−1 |Qj |−1 nodes from his own community that he can contribute to. In k l applications such as countterrorism (Lindelauf, Hamers, and Next, for v ∈ Qj we will split the equation into: Husslage 2013; Michalak et al. 2013a), this can represent the   + expected size of an attack on a network or the number of tar- MC k,l(v, u, j)= αj (l)β(k) k l geted communities. T ⊆CS\Tv C ⊆Qj \{v} |T k|=k |Cl|=l Configuration Semivalue Community Index: Whereas to f(dist( T k ∪ Cl ∪{v},u)) measure the importance of a community using the Owen    , and (6) |CS|−1 |Qj |−1 value it suffices to sum up the power of the nodes comprising k l it, this is not the case for the Configuration value. In particu-   lar, the power of a node may be the result of its membership − MC k,l(v, u, j)= αj (l)β(k) to many communities. For this reason, the distinction must k l T ⊆CS\Tv C ⊆Qj \{v} be made as to which of the player’s marginal contributions |T k|=k |Cl|=l are made because of which community. To this end, we pro- f(dist( T k ∪ Cl,u)) pose the following measure of community strength:    , (7) |CS|−1 |Qj |−1 |CS|−1 |Qj |−1     k l  k l CPj (ν, CS)= β(k) αj (l) E MC T ∪ C ,i , k l with the additional constraint on T and C such that: i∈Qj k=0 l=0 k l k l l k dist( T ∪ C ,u)) = dist( T ∪ C ∪{v},u)). (8) where C ⊆ Qj \{i} and T ⊆ CS \ Qj are random vari- ables. Although an axiomatic characterisation of this com- We can now state the following: munity index is outside the scope of this paper, we mention   |CS|−1 |Qj |−1 that in the case of the Configuration value, the index is ef- CS ficient, and in the case of the Owen value, it is equal to the φv(ν, )= sum of the Owen values of the members of a community. u∈V j∈Tv k=0 l=0 + − Configuration Semivalue Closeness Centrality: Our ex- MC k,l(v, u, j) − MC k,l(v, u, j). (9) tension of closeness centrality (which accounts for overlap- The constraint in Equation (8) simply allows us to avoid ping and non-overlapping community structures) involves redundant computations (the contribution in the opposite using our Configuration semivalue (see Equation 5) with the case is trivially zero, since by entering such a coalition, v characteristic function for group Closeness centrality (see does not change the distance to u). The remainder of the Equation 4). More formally, it is: φv(νc,CS). proof will focus on computing Equations (6) and (7). We will first focus on Equation (6). Algorithms Note that due to the constraint in Equation (8), we can be l We show that any configuration semivalue of group Close- sure that f(dist( Q∈T k Q ∪ C ∪{v},u)) = f(dist(u, v)), ness centrality is computable in polynomial time (Theo- since by entering the coalition, v must have brought it closer rem 1), and obtain better time complexity for the configu- to u. Thus, it suffices to count the number of coalitions from l ration value (Theorem 2). Qj (of size l)—C —and coalitions of communities (of size k l Theorem 1 Any configuration semivalue of group Close- k)—T —such that dist( Q∈T k Q ∪ C ,u)) > dist(v, u), ness centrality for all nodes in a weighted graph G = since only then Equation (8) will hold. To do this, let us in- (V,E,ω) can be computed in O(|V |4|CS|) time. troduce the following notation:

625 • Com∼d(u)={Q : Q ∈ M and dist(Q, u) ∼ d)}; • j { ∈ dist ∼ } Sketch of Proof of Theorem 2: The key idea, is to use an Nod∼d(u)= s : s Cj and (s, u) d) , alternate formula for the configuration value: where ∼ will be one of <, >, ≤, ≥ or =. These sets are fairly    simple to precompute, and will allow us to count the number φv(ν, CS)= j∈Tv Π∈Π(CS) π∈Π(Q ) of required coalitions. In this particular case, we will use j Com>dist(u,v) to count the number of communities farther (dist(Π ∪ ∪{ } )) − (dist(Π ∪ )) u∈V f |Qj π|v v ,u f |Qj π|v,u j , from u than v. Similarly, Nod>dist(u,v) counts the number |CS|!|Qj |! of nodes in the community Qj that are farther from u than Com>dist(u,v) where Π(X) refers to a permutation of the set X, and the distance from u to v. Altogether, there are k Nodj  Π|x is the set of elements preceding x in the permu- >dist(u,v) coalitions of communities and l coalitions from tation Π. Next, we group computations for both k and MC + MC + Qj satisfying the requirements. Let d = dist(u, v). Finally: l as follows: (v, u, j)= k,l k,l(v, u, j) and − − j MC (v, u, j, d)= k,l MC k,l(v, u, j, d). By counting the + Nod>d Com>d MCk,l(v, u, j)= f(dist(v, u)). permutations that satisfy the constraints from Theorem 1, we l k reach the following: |CS| | | As for Equation (7), we divide computations as follows: MC + f(d)( )!( Qj )!  (v, u, j)= j , and − − Nod≤d(u)Com≤d(u) MC k,l(v, u, j)= MC k,l(v, u, j, d), where d MC −(v, u, j, d)=f(d)     − MC (v, u, j, d)= αj(l)β(k) (|CS|)!(|Qj|)! (|CS|)!(|Qj|)! k,l − .  k l j j T ⊆CS\Tv C ⊆Qj \{v} Nodd, we have the following: computes marginal Configuration Semivalue Closeness  j  contributions by mov- − Com≥d(u) Nod≥d(u) ing backwards (largest 24 MC k,l(v, u, j, d)= k l to smallest) through 20   j the possible distances, 16 Com>d(u) Nod>d(u) − . and also uses dy- 12 k l Time (s) namic programming. 8

To explain this, we first compute the number of coalitions of The data computed 4 communities of size k that are at distance d or farther from thus-far is held in the 0 u. We do the same for coalitions of nodes from Qj.How- variable prev val and 0 50 100 150 200 250 Graph size ever, we must take away the number of occurrences when no accumulated in φv. l k Figure 2: Empirical running nodes from C and communities from T are at distance d s merge(a, b) is a times for Algorithms 2 and 3. from u, in order to satisfy the constraint from Equation (10). function that takes − Computing all MC k,l(v, u, j) variables is the most time- two sorted sets (descending order) and returns a sorted set, consuming, as it takes O(|V |4|CS|) time. This may be such that any items in b that are smaller than the smallest − item in a are not included. Finally, the array CP must be counter-intuitive, since computing all MC k,l(v, u, j, d) vari- | |5|CS| mentioned, which collects the power of communities as a ables would imply O( V ), but dynamic programming whole. If a marginal contribution is made by a node through eliminates this need. Precomputations are implemented in Qj CPj  community , then is incremented by the value of the Algorithm 1 and marginal contributions in Algorithm 2. contribution. Theorem 2 The configuration value of group Closeness We conclude by noting that the configuration value is a centrality for all nodes in a weighted graph G =(V,E,ω) generalisation of the Shapley value. Algorithm 3 can com- can be computed in O(|V |2[log(|V |)+|CS|]+|V ||E|) time. pute the Shapley value-based Closeness centrality (Michalak

626 et al. 2013b) with the same time-complexity as the best cur- Algorithm 1: Precomputations for Configuration Semi- rently known algorithm (computing distances between all value Closeness. nodes limits computation time). Empirical running times are G =(V,E,ω) presented in Figure 2. input : Graph , Closeness function f : R → R, Overlapping Community Structure CS, Probability distribution functions rank Harmonic SV-based CV-based β :0, 1,...|V |−1 → R, ∀ | |− → R 1. Swietokszyska´ Politechnika Centrum 0≤j≤|CS|−1αj :0, 1,... Qj 1 2. Ratusz PKP Falenica Dw.Wilenski´ output: Configuration Semivalue 3. Ron.Starzynskiego´ Erazma z Zakr. PKP Falenica 1 dist[V,V ] ← Johnson(G, ω); 4. Dw.Wilenski´ PKP Rados´c´ Swietokszyska´ 2 for v ∈ V do 3 COM[v] ←∅; Table 3: Top four hubs in the Warsaw transportation net- work, according to different Closeness measures. 4 for v ∈ V do 5 φv ← 0; 6 distances[v] ← empty ordered set; node Harmonic SV CV lines 7 c dists[v] ← empty ordered set; Centrum 234 0.23 4.96 68 8 for Qj ∈ CS do Swietokszyska´ 235 0.50 4.22 29 9 CP[j] ← 0; com dist[j][v] ←∞; Politechnika 224 1.04 3.07 17 10 g dists[j][v] ← empty ordered set; PKP Falenica 108 1.00 4.91 8 11 for u ∈ Qj do PKP Rados´c´ 122 0.80 3.45 8 12 COM[u] ← COM[u] ∪ Qj; Table 4: Properties of hubs in Warsaw. 13 com dist[j][v] ← min(dist(v, u), com dist[j][v]); 14 distances[v] ← distances[v] ∪u, dist[u, v]; Warsaw Public Transportation Network 15 c dists[v] ← c dists[v] ∪Qj, com dist[j][v]; Algorithms 1, 2 and 3 were implemented in Java. The 16 sort desc(c dists[v]); sort desc(distances[v]); Configuration-based Closeness centrality (i.e., CV-based ∈ Closeness centrality) was used to analyse the Warsaw public 17 for u V do  ∈ transportation network.7 Edge-weights were defined as the 18 for v, d distances[u] do ∈ average travel times between nodes. In total, the weighted 19 for Qj COM[v] do ← ∪  network consists of 1425 nodes, 2135 edges and 380 com- 20 g dists[j][u] g dists[j][u] v, d ; munities formed by bus lines, trams, and underground and suburbia trains. We note that the ranking of nodes according 21 for u ∈ V do to CV-based Closeness differs significantly from the Har- 22 m1 ← largest distance in c dists[u]; prev ← m1; monic Closeness and Shapley value-based (i.e., SV-based) 23 Com>prev(u) ← 0; Com≥prev(u) ← 0; Closeness rankings. We present the four most prominent 24 for Qj,d∈c dists[u] do nodes according to these centralities in Table 3. 25 m2 ← largest distance in g dists[j][u]; Configuration-based Closeness ranks Centrum (or “city 26 if m2 >m1 then ← ← center”) as most important. This result is intuitive and makes 27 Com>m2(u) 0; Com≥m2(u) 0; sense, since this stop is a large hub in downtown War- 28 if d = prev then saw, where passengers can choose from 68 bus, tram and 29 Com>d(u) ← Com≥prev(u); train connections, and one metro line (see Table 4). The top 30 Com≥d ← Com≥prev(u); prev = d; ranked nodes according to the other centrality measures are 31 Com≥d(u) ← Com≥d(u)+1; also near the city center, but provide less connections that 32 prevnod ← largest distance in distances[u]; are not as important as those in the city center. j ← j ← The most surprising rankings are for PKP Falenica and 33 Nod>prevnod(u) 0; Nod≥prevnod(u) 0; PKP Rados´c´, which are railway stations far from downtown 34 for v, dnod∈distances[u] do Warsaw. They are important because—for certain source- 35 if dnod = prevnod then j j destination pairs—it is almost impossible to find routes that 36 Nod>dnod(u) ← Nod≥prevnod(u); omit these stations. Additionally, trains play an important j j 37 Nod≥dnod(u) ← Nod≥prevnod(u); role in shortening the travel time between distant nodes, 38 prevnod = dnod; since they are the fastest means of transportation. Interestingly, the fifth node according to CV-based Close- 39 if Qj ∈ COM(v) then j j ness, Płowiecka, is not in the top ten according to the other 40 Nod≥dnod(u) ← Nod≥dnod(u)+1; measures. Harmonic Closeness misses the fact that this stop is not easily replaced. Shapley value-based Closeness misses 41 for u ∈ V do 42 remove repeated distances(c dists[u]); 7The data, experiment results and programs can be downloaded from https://github.com/szczep/gtna.

627 Algorithm 2: Efficient Algorithm for Configuration Algorithm 3: Efficient Algorithm for Configuration Semivalue Closeness (continued from Algorithm 1). Value Closeness (continued from Algorithm 1). 1 for u ∈ V do 1 for u ∈ V do 2 for Qj ∈ CS,k ∈ [0, |CS|),l ∈ [0, |Qj |) do 2 for Qj ∈ CS do − − 3 prev d ←−1; prev val ← 0; MC ← 0; 3 prev d ←−1; prev val ← 0; MC ← 0; 4 for x, d∈s merge(g dists[j][u],c dists[u]) do 4 for x, d∈s merge(g dists[j][u],c dists[u]) do 5 if x ∈ V and prev d = −1 then 5 if x ∈ V and prev d = −1 then 6 Com≥d(u) ← Com≥prev d(u); 6 Com≤d(u) ← Com≤prev d(u); 7 Com>d(u) ← Com>prev d(u); 7 Comd(u) >d ( ) k l f d ; (Nodj (u))(Com (u)) ; ≤d ≤d

11 if x ∈ V then 11 if x ∈ V then  Nodj (u) + f(d) + Com>d(u) >d 12 MC ← 12 MC ← f(d) ; (Nodj (u))(Com (u)) ; k l ≤d ≤d MC +−MC − + − 13 φx ← φx + β(k)αj (l) |Q |−1 ; 13 φx ← φx + MC − MC ; (|CS|−1)( j ) k l + − 14 j ← j + MC − MC MC +−MC − CP CP ; 14 CPj ← CPj + β(k)αj (l) ; (|CS|−1)(|Qj |−1) k l

We conducted an experiment on a YouTube social net- that the bus lines that stop at Płowiecka are very impor- work with ground-truth communities (Mislove et al. 2007) tant. Even if the stop is omitted, a traveller must often in order to see the impact of overlapping communities on centrality. We chose the 80 first communities from a list of still use bus lines that visit Płowiecka for further travel. As 8 for communities—unsurprisingly—long routes that bring in the 5000 most important ones and studied the sub-network commuters from all of Warsaw (including the metro line consisting of these communities. Figure 6 shows the four M1) are ranked as most important. most important nodes. Harmonic centrality indicates that node 5274 is topologically most central, whereas SV-based Harmonic Closeness centrality indicates that it also brings other nodes closer to- centrality prioritises the Line Community Index gether. The configuration value ranking promotes node 5, topologically most cen- since it belongs to important communities, and it is im- tral nodes. The SV-based 401 4.35 portant for bringing nodes within these communities closer centrality considers a new ZS1 3.68 together. Moreover, it brings other communities closer to- dimension, promoting irre- 409 3.61 M1 3.51 gether, which is imperative for inter-community information placeable nodes. We go one transfer. step further and provide a Table 5: Important lines. new CV-based measure that promotes hubs with powerful connections, while taking into account the previous two Conclusions and Future Work considerations. Importantly, only our measure is able to We have developed a general solution concept, namely the detect the most central and popular hub in Warsaw. Configuration semivalue, that encompasses both coalitional semivalues (Szczepanski,´ Michalak, and Wooldridge 2014) Information Diffusion in Social Networks and the configuration value (Albizuri, Aurrecoechea, and Zarzuelo 2006). We have used this value in order to de- Borgatti (2006) advocated velop the first network centrality measure that accounts for the use of group closeness rank Harmonic SV CV an overlapping community structure. We based our central- centrality for information ity on the notion of Closeness, developed polynomial-time diffusion. However, this 1. 5274 5274 5 2. 311 35879 823 algorithms for its computation and used it to analyse the approach does not yield a 3. 404 12367 72 Warsaw public transportation network. This research also comprehensive ranking of 4. 727 27977 11287 fills a gap in the complexity analysis of game-theoretic cen- nodes, is computationally trality measures. An interesting direction for future work is intractable and does not Table 6: Top four hubs in to complete the missing entries in Table 1. Finally, although account for communities. the Youtube subnetwork. the configuration value has been axiomatised, configuration Recently, Lin et al. (2015) semivalues in general and community indices need further noted that communities are important for information dif- study. fusion, since intra-community diffusion is much faster than inter-community diffusion. 8Available at https://snap.stanford.edu/data/com-Youtube.html

628 Acknowledgements Lin, S.; Hu, Q.; Wang, G.; and Yu, P. 2015. Understand- Tomasz Michalak and Michael Wooldridge were sup- ing community effects on information diffusion. In Cao, T.; ported by the European Research Council under Ad- Lim, E.-P.; Zhou, Z.-H.; Ho, T.-B.; Cheung, D.; and Mo- vanced Grant 291528 (“RACE”). This work was also sup- toda, H., eds., Advances in Knowledge Discovery and Data ported by the Polish National Science Centre grant DEC- Mining, volume 9077 of Lecture Notes in Computer Science. 2013/09/D/ST6/03920. Springer International Publishing. 82–95. Lindelauf, R.; Hamers, H.; and Husslage, B. 2013. Cooper- References ative game theoretic centrality analysis of terrorist networks: The cases of jemaah islamiyah and al qaeda. EUR J OPER Albizuri, M.; Aurrecoechea, J.; and Zarzuelo, J. 2006. Con- RES 229(1):230–238. figuration values: Extensions of the coalitional owen value. Michalak, T. P.; Rahwan, T.; Szczepanski,´ P. L.; Skibski, GAME ECON BEHAV 57(1):1 – 17. O.; Narayanam, R.; Wooldridge, M. J.; and Jennings, N. R. Banzhaf, J. F. 1965. Weighted Voting Doesn’t Work: A 2013a. Computational analysis of connectivity games with Mathematical Analysis. RUTGERS LAW REV 19:317–343. applications to the investigation of terrorist networks. IJ- Barabasi, A.-L. 2003. Linked: How Everything Is Connected CAI’13. to Everything Else and What It Means. Plume. Michalak, T.; Aaditha, K. V.; Szczepanski,´ P. L.; Ravindran, Boldi, P., and Vigna, S. 2013. Axioms for centrality. CoRR B.; and Jennings, N. R. 2013b. Efficient computation of the abs/1308.2140. shapley value for game-theoretic network centrality. JAIR 46:607–650. Bonacich, P. 1972. Factoring and weighting approaches to status scores and identification. J MATH SOCIOL Mislove, A.; Marcon, M.; Gummadi, K. P.; Druschel, P.; and 2(1):113–120. Bhattacharjee, B. 2007. Measurement and Analysis of On- line Social Networks. In IMC’07. Borgatti, S. 2006. Identifying sets of key players in a . COMPUT MQTH ORGAN TH 12(1):21–34. Owen, G. 1977. Values of games with a priori unions. In Henn, R., and Moeschlin, O., eds., Mathematical Eco- Brandes, U., and Erlebach, T. 2005. Network Analysis: nomics and Game Theory, volume 141 of Lecture Notes in Methodological Foundations (Lecture Notes in Computer Economics and Mathematical Systems. Springer Berlin Hei- Science). Secaucus, NJ, USA: Springer-Verlag New York, delberg. 76–88. Inc. Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1999. Chalkiadakis, G.; Elkind, E.; and Wooldridge, M. 2011. The pagerank citation ranking: Bringing order to the web. Computational aspects of cooperative game theory. Mor- Technical Report 1999-66, Stanford InfoLab. gan & Claypool Publishers. Shapley, L. S. 1953. A value for n-person games. In Kuhn, Dezso,˝ Z., and Barabasi,´ A.-L. 2002. Halting viruses in H., and Tucker, A., eds., In Contributions to the Theory of scale-free networks. PHYS REV E 65:055103. Games, volume II. Princeton University Press. 307–317. Everett, M. G., and Borgatti, S. P. 1999. The centrality of Shih, H.-Y. 2006. Network characteristics of drive tourism groups and classes. J MATH SOCIOL 23(3):181–201. destinations: An application of network analysis in tourism. Freeman, L. 1979. Centrality in social networks: Conceptual TOURISM MANAGE 27(5):1029 – 1039. clarification. SOC NETWORKS 1(3):215–239. Szczepanski,´ P. L.; Tarkowski, M. K.; Michalak, T. P.; Har- Johnson, D. B. 1977. Efficient algorithms for shortest paths renstein, P.; and Wooldridge, M. 2015. Efficient computa- in sparse networks. JACM24(1):1–13. tion of semivalues for game-theoretic network centrality. In AAAI’15, 461–469. Karinthy, F. 2006. Chain-links. In M. Newman, A.-L. B., and Watts, D., eds., The Structure and Dynamics of Net- Szczepanski,´ P. L.; Michalak, T.; and Rahwan, T. 2012. A works. Princeton University Press. 21–26. new approach to based on the shap- ley value. In AAMAS’12, 239–246. Keinan, A.; Hilgetag, C. C.; Meilijson, I.; and Ruppin, E. 2004. Casual localization of neural function: the shapley Szczepanski,´ P. L.; Michalak, T. P.; and Rahwan, T. 2016. value method. NEUROCOMPUTING 58-60(0):215–222. Efficient algorithms for game-theoretic betweenness central- ity. ARTIFICIAL INTELLIGENCE 231:39 – 63. Kelley, S.; Goldberg, M.; Magdon-Ismail, M.; Mertsalov, Szczepanski,´ P. L.; Michalak, T. P.; and Wooldridge, M. K.; and Wallace, A. 2012. Defining and discovering com- 2014. A centrality measure for networks with community munities in social networks. In Thai, M. T., and Parda- structure based on a generalization of the owen value. In los, P. M., eds., Handbook of Optimization in Complex Net- ECAI. works, volume 57 of Springer Optimization and Its Applica- tions. Springer US. 139–168. Weber, R. J. 1979. Subjectivity in the Valuation of Games. Cowles Foundation Discussion Papers 515, Cowles Founda- Koschutzki, D.; Lehmann, K.; Peeters, L.; Richter, S.; tion for Research in Economics, Yale University. Tenfelde-Podehl, D.; and Zlotowski, O. 2005. Centrality in- dices. In Brandes, U., and Erlebach, T., eds., Network Anal- Yan, E., and Ding, Y. 2009. Applying centrality measures ysis, volume 3418 of Lecture Notes in Computer Science. to impact analysis: A coauthorship network analysis. JAM Springer Berlin Heidelberg. 16–61. SOC INF SCI TEC 60(10):2107–2118.

629