Modeling and Measuring Graph Similarity: The Case for Centrality Distance
Matthieu Roy, Stefan Schmid, Gilles Trédan
To cite this version: Matthieu Roy, Stefan Schmid, Gilles Trédan. Modeling and Measuring Graph Similarity: The Case for Centrality Distance. FOMC 2014, 10th ACM International Workshop on Foundations of Mobile Computing, Aug 2014, Philadelphia, United States. pp.53. hal-01010901

HAL Id: hal-01010901
https://hal.archives-ouvertes.fr/hal-01010901
Submitted on 20 Jun 2014

Matthieu Roy(1,2), Stefan Schmid(1,3,4), Gilles Trédan(1,2)
1 CNRS, LAAS, 7 avenue du colonel Roche, F-31400 Toulouse, France
2 Univ de Toulouse, LAAS, F-31400 Toulouse, France
3 Univ de Toulouse, INP, LAAS, F-31400 Toulouse, France
4 TU Berlin & T-Labs, Berlin, Germany

ABSTRACT
The study of the topological structure of complex networks has fascinated researchers for several decades, and today we have a fairly good understanding of the types and recurring characteristics of many different complex networks. However, surprisingly little is known today about models to compare complex graphs and quantitatively measure their similarity.

This paper proposes a natural similarity measure for complex networks: centrality distance, the difference between two graphs with respect to a given node centrality. Centrality distances allow us to take into account the specific roles of the different nodes in the network, and have many interesting applications. As a case study, we consider the closeness centrality in more detail, and show that closeness centrality distance can be used to effectively distinguish between randomly generated and actual evolutionary paths of two dynamic social networks.

Categories and Subject Descriptors
J.4 [Computer Applications]: Social and Behavioral Sciences

General Terms
Algorithms

Keywords
Complex Networks; Graph Similarity; Centrality; Dynamics; Link Prediction

1. INTRODUCTION
How similar are two graphs $G_1$ and $G_2$? Surprisingly, today we do not have good measures to answer this question. In graph theory, the canonical measure to compare two graphs is the Graph Edit Distance (GED) [8]: informally, the GED $d_{GED}(G_1,G_2)$ between two graphs $G_1$ and $G_2$ is defined as the minimal number of graph edit operations needed to transform $G_1$ into $G_2$. The specific set of allowed graph edit operations depends on the context, but typically includes some form of node and link insertions and deletions.

While graph edit distance metrics play an important role in computer graphics and are widely applied to pattern analysis and recognition, we argue that the graph edit distance is not well suited for measuring similarities between natural and complex networks. The graphs at a given graph edit distance $d$ from a graph $G$ are very diverse and seemingly unrelated: the characteristic structure of $G$ is lost.

A good similarity measure can have many important applications. For instance, a similarity measure can be a fundamental tool for the study of dynamic networks, answering questions like: Do these two complex networks have a common ancestor? Or: What is a likely successor network for a given network? While the topological properties of complex networks have fascinated researchers for many decades (e.g., their connectivity [1, 14], their constituting motifs [13], their clustering [18] or community patterns [3]), today we do not have a good understanding of their dynamics over time.

Our Contributions. This paper initiates the study of graph similarity measures for complex networks which go beyond simple graph edit distances. In particular, we introduce the notion of centrality distance $d_C(G_1,G_2)$, a graph similarity measure based on a node centrality $C$. We argue that centrality-based distances are attractive similarity measures as they are naturally node-oriented. This stands in contrast to, e.g., classic graph-isomorphism-based measures, which apply only to anonymous graphs; in the context of dynamic complex networks, nodes typically do represent real objects and are not anonymous!

We observe that the classic graph edit distance can be seen as a special case of centrality distance: the graph edit distance is equivalent to the centrality distance where $C$ is simply the degree centrality, henceforth referred to as the degree distance $d_{DC}$. We then discuss alternative centrality distances and, as a case study, explore the closeness distance $d_{CC}$ (based on closeness centrality) in more detail. In particular, we show that closeness distance has interesting applications in the domain of dynamic network prediction. As a proof of concept, we consider two dynamic social networks: (1) an evolving network representing human mobility during a cocktail party, and (2) a Facebook-like Online Social Network (OSN) evolving over time. We show that actual evolutionary paths are far from random from the perspective of closeness centrality distance, in the sense that the distance variation along evolutionary paths is low. This can be exploited to distinguish between fake and actual evolutionary paths with high probability.

Examples. To motivate the need for graph similarity measures, let us consider two simple examples.

Example 1.1 (Local/Global Scenario). We consider three graphs $G_1$, $G_2$, $G_3$ over five nodes $\{v_1, v_2, \ldots, v_5\}$: $G_1$ is a line, where $v_i$ and $v_{i+1}$ are connected for $1 \le i \le 4$; $G_2$ is a cycle, i.e., $G_1$ with an additional link $\{v_1, v_5\}$; and $G_3$ is $G_1$ with an additional link $\{v_2, v_4\}$.

In this example, we first observe that $G_2$ and $G_3$ have the same graph edit distance to $G_1$: $d_{GED}(G_1,G_2) = d_{GED}(G_1,G_3) = 1$, as each contains exactly one additional edge. However, in a social network context, one would intuitively expect $G_3$ to be closer to $G_1$ than $G_2$ is. For example, in a friendship network a short-range "triadic closure" [10] link may be more likely to emerge than a long-range link: friends of friends may be more likely to become friends themselves in the future. Moreover, local changes are also more likely in mobile environments (e.g., under bounded human mobility and speed). As we will see, the centrality distance concept introduced in this paper can capture such differences.

Example 1.2 (Evolution Scenario). As a second artificial and very simple example, in this paper we will consider two graphs $G_1$ and $G_2$, where $G_1$ is a line topology and $G_2$ is a "shell network", shown in Figure 1. We ask the question: what is the most likely evolutionary path that would lead from the $G_1$ topology to $G_2$?

Note that the graph edit distance does not provide us with any information about the likely evolutionary paths from $G_1$ to $G_2$, i.e., about the order of the edge insertions: there are many possible orders in which the missing links can be added to $G_1$, and from the GED's perspective these orders do not differ in any way. In reality, however, we often have some expectations on how a graph may have evolved between the two given snapshots $G_1$ and $G_2$. For example, applying the triadic closure principle to our example, we would expect the missing links to be introduced one by one, from left to right. A similar evolution may also be predicted by a temporal preferential attachment

Centralities are a common way to characterize complex networks and their vertices. Frequently studied centralities include the degree centrality (DC), the betweenness centrality (BC) and the closeness centrality (CC), among many others. A node is DC-central if it has many edges: the degree centrality is simply the node degree. A node is BC-central if it lies on many shortest paths: the betweenness centrality is the number of shortest paths going through the node. A node is CC-central if it is close to many other nodes: the closeness centrality measures the inverse of the distances to all other nodes. Formally:

1. Degree Centrality: For any node $v \in V(G)$ of a network $G$, let $\Gamma(v)$ be the set of neighbors of node $v$: $\Gamma(v) = \{w \in V \text{ s.t. } \{v,w\} \in E\}$. The degree centrality DC of a node $v \in V$ is defined as $DC(G,v) = |\Gamma(v)|$.

2. Betweenness Centrality: For any pair $(v,w) \in V(G)^2$, let $\sigma(v,w)$ be the total number of different shortest paths between $v$ and $w$, and let $\sigma_x(v,w)$ be the number of shortest paths between $v$ and $w$ that pass through $x \in V$. The betweenness centrality BC of a node $v \in V$ is defined as $BC(G,v) = \sum_{x,w \in V} \sigma_v(x,w)/\sigma(x,w)$.
As a slight variation from the classic definition, we assume that a node is on its own shortest paths: $\forall (v,w) \in V^2,\ v \neq w:\ \sigma_v(v,w)/\sigma(v,w) = 1$. We adopt the convention $\forall v \in V:\ \sigma_v(v,v)/\sigma(v,v) = 0$. The reason for this variation will become clear in the next section.

3. Closeness Centrality: The closeness centrality CC of a node $v \in V$ is defined as $CC(G,v) = \sum_{w \in V \setminus \{v\}} 2^{-d(v,w)}$.

By convention, we define the centrality of a node with no edges to be 0.