Hiding in Social Networks


Hiding in Social Networks

University of Warsaw, Faculty of Mathematics, Informatics and Mechanics

Marcin Waniek

PhD dissertation

Supervisor: dr hab. Piotr Faliszewski, Department of Computer Science, AGH University of Science and Technology

Auxiliary Supervisor: dr Tomasz Michalak, Institute of Informatics, University of Warsaw

March 2017

Author's declaration: I hereby declare that this dissertation is my own work. March 15, 2017. Marcin Waniek

Supervisor's declaration: The dissertation is ready to be reviewed. March 15, 2017. dr hab. Piotr Faliszewski

Auxiliary Supervisor's declaration: The dissertation is ready to be reviewed. March 15, 2017. dr Tomasz Michalak

Abstract

The Internet and social media have fuelled enormous interest in social network analysis. New tools continue to be developed and used to analyse our personal connections. This raises privacy concerns that are likely to intensify in the future. With this in mind, we ask the question: can individuals or groups actively manage their connections to evade social network analysis tools? By addressing this question, the general public may better protect their privacy, oppressed activist groups may better conceal their existence, and security agencies may better understand how terrorists escape detection.

In this dissertation we consider hiding from three different types of social network analysis tools. First, we study how an individual node or a group of nodes can evade analysis based on centrality measures without compromising its ability to participate in the network's activities. In the second study, we investigate how a community can avoid being identified by a community detection algorithm as a closely cooperating group of nodes. In the third study, we analyse how the presence of a particular edge in a network can be hidden from link prediction algorithms. For each of the considered problems we analyse its computational complexity and prove that it is usually NP-hard. However, we also provide polynomial-time heuristic solutions that turn out to be effective in practice. We test our algorithms on a number of real-life and artificially generated network datasets.

Keywords: Social networks; Hiding in networks; Centrality measures; Influence models; Community detection; Link prediction; Computational complexity; Heuristic solutions.

ACM classification: Security and privacy → Social network security and privacy. Theory of computation → Social networks. Theory of computation → Problems, reductions and completeness.

Streszczenie (Polish abstract, translated)

The Internet and social media have caused an enormous growth of interest in methods of social network analysis. Ever more advanced tools serve to analyse our connections with other people. This raises serious privacy concerns. With this in mind, we consider the following question: can a member, or a group of members, of a social network actively manage their connections so as to avoid detection by social network analysis tools? Answering this question would allow Internet users to better protect their privacy, activist groups to better conceal their activity, and security agencies to better understand how terrorist and criminal organizations can avoid detection.

In this work we consider hiding from three different social network analysis tools. First, we study how a single node or a group of nodes can avoid detection by centrality measures while remaining able to take part in the network's activities. Second, we analyse how a group of nodes can avoid being identified by community detection algorithms. Third and finally, we study how the existence of a particular edge in the network can be hidden from link prediction algorithms. We analyse the computational complexity of the considered problems and prove that most of them are NP-hard. Nevertheless, we also present polynomial-time heuristic solutions that turn out to be effective in practice. We test our algorithms on a range of networks, both real-life and randomly generated.

Acknowledgments

I would like to express my deepest gratitude to Tomasz Michalak, my mentor and supervisor, who taught me almost all I know about research (with the rest coming from the other people mentioned on this page). I would not be where I am today without him. I am thankful to Piotr Faliszewski, who helped me put together this dissertation and greatly improved the quality of my writing. I am grateful to Talal Rahwan, discussions with whom influenced many parts of this dissertation and whose hospitality made a faraway country feel like home. I would also like to thank all my other co-authors in this and other lines of research: Michael Wooldridge, Long Tran-Thanh, Agata Nieścieruk, Nicholas Jennings, Marcin Bielecki, Joanna Tyrowicz and Krzysztof Makarski. I have learned a lot from all of them. Finally, I would like to give thanks to my family and my friends, especially to my mom, my dad, and my brother Paweł. Finishing this dissertation would not have been possible without their constant love and support.

My work was supported by the Polish National Science Centre grant Hiding in Social Networks, no. 2015/17/N/ST6/03686.

Contents

1 Introduction
  1.1 Motivation
  1.2 Organization and Results
  1.3 Related Work
    1.3.1 Link Recommendation
    1.3.2 Sensitivity Analysis
    1.3.3 Dark Networks Analysis
  1.4 Publications
2 Preliminaries
  2.1 Basic Network Notation
  2.2 Datasets
  2.3 Computational Complexity
    2.3.1 The Turing Machine
    2.3.2 Complexity Classes and Reduction Process
    2.3.3 NP-Complete Problems
3 Disguising Centrality of a Node
  3.1 Introduction
  3.2 Preliminaries
    3.2.1 Centrality Measures
    3.2.2 Models of Influence
  3.3 Problem Definitions
  3.4 Complexity Analysis
  3.5 Heuristic Solution
    3.5.1 The ROAM Heuristic
    3.5.2 Configuring the ROAM Heuristic
    3.5.3 Experimental Results
  3.6 Concluding Remarks
4 Hiding Leaders
  4.1 Introduction
  4.2 Problem Definition
  4.3 Complexity Analysis
  4.4 Constructing a Network
  4.5 Concluding Remarks
5 Hiding Communities
  5.1 Introduction
  5.2 Preliminaries
  5.3 Minimizing Modularity
  5.4 Measure of Concealment
  5.5 Heuristic Solution
    5.5.1 The DICE Heuristic
    5.5.2 Experimental Results
  5.6 Concluding Remarks
6 Evading Link Prediction
  6.1 Introduction
  6.2 Preliminaries
  6.3 Problem Definition
  6.4 Complexity Analysis
  6.5 Heuristic Solution
    6.5.1 The Effects of Adding or Removing an Edge
    6.5.2 The OTC Heuristic
    6.5.3 The CTR Heuristic
    6.5.4 Experimental Design
    6.5.5 Simulation Results
  6.6 Concluding Remarks
7 Conclusions
Bibliography
A ROAM Simulation Results
B DICE Simulation Results
C Remainder of the Proof of Lemma 1
D An Efficient Implementation of the OTC Algorithm
E Priority Queue Implementation of the CTR Algorithm
F OTC and CTR Simulation Results

Chapter 1. Introduction

In this chapter we introduce the issue of hiding in social networks and outline the organization of this dissertation. Next, we present the body of related literature. Finally, we list our publications that contain elements of this dissertation.

1.1 Motivation

Recent years have led to an increased interest in the analysis of social networks [131]. Applications of this wide body of research vary from organizing massive viral-marketing campaigns [31], through the analysis of outbreaks of infectious diseases [56], all the way up to fighting global criminal and terrorist organizations [98, 141]. Such a variety of problems has given rise to various social network analysis tools and techniques. Examples of prominent types of tools include centrality measures, community detection algorithms, and link prediction algorithms.

Centrality measures [12] are tools to identify key nodes in a network, be it the leader of a terrorist cell [80], the financial institution responsible for a financial crisis [82], or the most important airport in the world-wide air transportation structure [60]. Community detection algorithms [51] can extract communities of closely cooperating individuals from network data, e.g., associates of a known criminal [49], climate indices related to seasonal rainfall [14], or protein complexes in an interaction diagram [112]. Link prediction algorithms [57] can assess the probabilities of the existence of hidden or not-yet-formed edges, either to detect missing connections between mafia members [15], to discover interactions between proteins in biological networks [26], or to recommend to a customer a product that she might be interested in buying [37, 88].

Clearly, many situations where social network analysis tools are applied can be described as adversarial settings. In other words, members of the networks that are being analysed may be interested in falsifying the results of such analysis, e.g., by modifying the structure of their social connections. At the same time, however, nearly all
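As a concrete illustration of the three families of tools named in the Motivation, here is a minimal sketch (added for this page, not code from the dissertation; it assumes the standard networkx Python library) that runs one representative of each family on a toy network:

```python
# Illustration only (not from the dissertation); assumes networkx.
import networkx as nx

G = nx.karate_club_graph()  # classic toy social network

# 1. Centrality measures: rank nodes, here by betweenness centrality.
centrality = nx.betweenness_centrality(G)
print("most central node:", max(centrality, key=centrality.get))

# 2. Community detection: greedy modularity maximization.
communities = nx.algorithms.community.greedy_modularity_communities(G)
print("communities found:", len(communities))

# 3. Link prediction: Jaccard scores for currently absent edges.
u, v, score = max(nx.jaccard_coefficient(G), key=lambda t: t[2])
print(f"most likely missing edge: ({u}, {v}), score {score:.2f}")
```

Each call stands in for a whole family of methods; the dissertation studies how network members can rewire their connections so that exactly these kinds of outputs become misleading.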
Recommended publications
  • A Geometric Approach to Betweenness
A GEOMETRIC APPROACH TO BETWEENNESS. Benny Chor and Madhu Sudan. Abstract: An input to the betweenness problem contains m constraints over n real variables (points). Each constraint consists of three points, where one of the points is specified to lie inside the interval defined by the other two. The order of the other two points (i.e., which one is the largest and which one is the smallest) is not specified. This problem comes up in questions related to physical mapping in molecular biology. In 1979, Opatrny showed that the problem of deciding whether the n points can be totally ordered while satisfying the m betweenness constraints is NP-complete. Furthermore, the problem is MAX SNP-complete, and for every ε > 0, finding a total order that satisfies at least (47/48 + ε) of the m constraints is NP-hard, even if all the constraints are satisfiable. It is easy to find an ordering of the points that satisfies 1/3 of the m constraints (e.g., by choosing the ordering at random). This paper presents a polynomial time algorithm that either determines that there is no feasible solution or finds a total order that satisfies at least 1/2 of the m constraints. The algorithm translates the problem into a set of quadratic inequalities and solves a semidefinite relaxation of them in R^n. The n solution points are then projected on a random line through the origin. The claimed performance guarantee is shown using simple geometric properties of the SDP solution. Key words: approximation algorithm, semidefinite programming, NP-completeness, computational biology.
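A toy sketch of the problem just defined (my own illustration, not Chor and Sudan's SDP algorithm): count how many betweenness constraints a total order satisfies. A uniformly random order satisfies each constraint with probability 1/3, matching the naive baseline mentioned in the abstract.

```python
# Toy betweenness-constraint checker (illustration, not the paper's method).
import random

def satisfied(order, constraints):
    """Count constraints (a, b, c) with b strictly between a and c."""
    pos = {x: i for i, x in enumerate(order)}
    return sum(pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
               for a, b, c in constraints)

points = list(range(6))
constraints = [(0, 1, 2), (3, 1, 5), (2, 4, 0)]
random.shuffle(points)  # the random 1/3-baseline ordering
print(satisfied(points, constraints), "of", len(constraints), "satisfied")
```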
  • Approximating Cumulative Pebbling Cost Is Unique Games Hard
Approximating Cumulative Pebbling Cost Is Unique Games Hard. Jeremiah Blocki, Department of Computer Science, Purdue University, West Lafayette, IN, USA, https://www.cs.purdue.edu/homes/jblocki, [email protected]. Seunghoon Lee, Department of Computer Science, Purdue University, West Lafayette, IN, USA, https://www.cs.purdue.edu/homes/lee2856, [email protected]. Samson Zhou, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, https://samsonzhou.github.io/, [email protected]. Abstract: The cumulative pebbling complexity of a directed acyclic graph G is defined as cc(G) = min_P Σ_i |P_i|, where the minimum is taken over all legal (parallel) black pebblings of G and |P_i| denotes the number of pebbles on the graph during round i. Intuitively, cc(G) captures the amortized Space-Time complexity of pebbling m copies of G in parallel. The cumulative pebbling complexity of a graph G is of particular interest in the field of cryptography, as cc(G) is tightly related to the amortized Area-Time complexity of the Data-Independent Memory-Hard Function (iMHF) f_{G,H} [7] defined using a constant-indegree directed acyclic graph (DAG) G and a random oracle H(·). A secure iMHF should have amortized Space-Time complexity as high as possible, e.g., to deter a brute-force password attacker who wants to find x such that f_{G,H}(x) = h. Thus, to analyze the (in)security of a candidate iMHF f_{G,H}, it is crucial to estimate the value cc(G), but currently, upper and lower bounds for leading iMHF candidates differ by several orders of magnitude. Blocki and Zhou recently showed that it is NP-Hard to compute cc(G), but their techniques do not even rule out an efficient (1 + ε)-approximation algorithm for any constant ε > 0.
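The definition of cc(G) can be made concrete with a short sketch (not from the paper; assumed black-pebbling rule: in round i a pebble may be placed on v only if every parent of v was pebbled in round i−1, and pebbles may always be removed). It validates one candidate pebbling and sums |P_i| over rounds, which upper-bounds cc(G), the minimum over all legal pebblings:

```python
# Toy cumulative-cost checker for one parallel black pebbling (illustration).
def cumulative_cost(parents, rounds):
    prev = set()
    for P in map(set, rounds):
        for v in P - prev:  # newly placed pebbles
            if not parents[v] <= prev:
                raise ValueError(f"illegal move: parents of {v} not pebbled")
        prev = P
    return sum(len(P) for P in rounds)

# Path DAG 0 -> 1 -> 2, pebbled front to back.
parents = {0: set(), 1: {0}, 2: {1}}
rounds = [{0}, {0, 1}, {1, 2}]
print(cumulative_cost(parents, rounds))  # |P_1|+|P_2|+|P_3| = 1+2+2 = 5
```

The hardness result above says that the optimization this bounds, minimizing over all legal pebblings, is not just NP-hard to solve but Unique-Games-hard to approximate.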
  • Improving the Betweenness Centrality of a Node by Adding Links
Improving the Betweenness Centrality of a Node by Adding Links. ELISABETTA BERGAMINI, Karlsruhe Institute of Technology, Germany; PIERLUIGI CRESCENZI, University of Florence, Italy; GIANLORENZO D'ANGELO, Gran Sasso Science Institute (GSSI), Italy; HENNING MEYERHENKE, Institute of Computer Science, University of Cologne, Germany; LORENZO SEVERINI, ISI Foundation, Italy; YLLKA VELAJ, University of Chieti-Pescara, Italy. Betweenness is a well-known centrality measure that ranks the nodes according to their participation in the shortest paths of a network. In several scenarios, having a high betweenness can have a positive impact on the node itself. Hence, in this paper we consider the problem of determining how much a vertex can increase its centrality by creating a limited amount of new edges incident to it. In particular, we study the problem of maximizing the betweenness score of a given node – Maximum Betweenness Improvement (MBI) – and that of maximizing the ranking of a given node – Maximum Ranking Improvement (MRI). We show that MBI cannot be approximated in polynomial time within a factor (1 − 1/(2e)) and that MRI does not admit any polynomial-time constant factor approximation algorithm, both unless P = NP. We then propose a simple greedy approximation algorithm for MBI with an almost tight approximation ratio and we test its performance on several real-world networks. We experimentally show that our algorithm highly increases both the betweenness score and the ranking of a given node and that it outperforms several competitive baselines. To speed up the computation of our greedy algorithm, we also propose a new dynamic algorithm for updating the betweenness of one node after an edge insertion, which might be of independent interest.
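The greedy approach described above can be sketched as follows (a simplified re-implementation of the generic greedy idea using networkx, not the authors' code, and without their dynamic-update speedup):

```python
# Greedy edge addition to boost a target node's betweenness (illustration).
import networkx as nx

def greedy_improve(G, target, k):
    for _ in range(k):
        candidates = set(G) - set(G[target]) - {target}
        def gain(u):  # betweenness of target if edge (target, u) were added
            H = G.copy()
            H.add_edge(target, u)
            return nx.betweenness_centrality(H)[target]
        G.add_edge(target, max(candidates, key=gain))
    return G

G = greedy_improve(nx.path_graph(8), 0, 2)
print(sorted(G[0]))  # neighbours of node 0 after adding 2 edges
```

The paper's dynamic betweenness-update algorithm targets exactly the bottleneck visible here: each candidate evaluation recomputes betweenness from scratch.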
  • Variations on an Ordering Theme with Constraints
Variations on an Ordering Theme with Constraints. Walter Guttmann and Markus Maucher, Fakultät für Informatik, Universität Ulm, 89069 Ulm, Germany, [email protected] · [email protected]. Abstract: We investigate the problem of finding a total order of a finite set that satisfies various local ordering constraints. Depending on the admitted constraints, we provide an efficient algorithm or prove NP-completeness. We discuss several generalisations and systematically classify the problems. Key words: total ordering, NP-completeness, computational complexity, betweenness, cyclic ordering, topological sorting. 1 Introduction: An instance of the betweenness problem is given by a finite set A and a collection C of triples from A, with the task to decide if there is a total order < of A such that for each (a, b, c) ∈ C, either a < b < c or c < b < a [1, problem MS1]. The betweenness problem is NP-complete [2]. Applications arise, for example, in the design of circuits and in computational biology [2, 3]. Similarly, the cyclic ordering problem asks for a total order < of A such that for each (a, b, c) ∈ C, either a < b < c or b < c < a or c < a < b [1, problem MS2]. The cyclic ordering problem, too, is NP-complete [4]. Applications arise, for example, in qualitative spatial reasoning [5]. On the other hand, if a < b < c or a < c < b is allowed, the problem can be solved with linear time complexity by topological sorting [6, Sect. 2.2.3] (see the sketch after this abstract). Yet another choice, namely c < a or c < b, is needed to model an object-relational mapping problem described in Sect.
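A sketch of the easy case flagged above (assumption: a constraint (a, b, c) permits a < b < c or a < c < b, so a must merely precede both b and c; the standard-library graphlib does the topological sort):

```python
# Each triple reduces to two precedence constraints; topological sort solves it.
from graphlib import TopologicalSorter

def order(items, triples):
    ts = TopologicalSorter({x: set() for x in items})
    for a, b, c in triples:
        ts.add(b, a)  # a before b
        ts.add(c, a)  # a before c
    return list(ts.static_order())  # raises CycleError if unsatisfiable

print(order("abcd", [("a", "b", "c"), ("b", "c", "d")]))
```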
  • Comparing the Speed and Accuracy of Approaches to Betweenness Centrality Approximation
Matta et al. Comput Soc Netw (2019) 6:2, https://doi.org/10.1186/s40649-019-0062-5. RESEARCH, Open Access. Comparing the speed and accuracy of approaches to betweenness centrality approximation. John Matta (Southern Illinois University Edwardsville, Edwardsville, IL, USA; correspondence: [email protected]), Gunes Ercal and Koushik Sinha. Abstract. Background: Many algorithms require doing a large number of betweenness centrality calculations quickly, and accommodating this need is an active open research area. There are many different ideas and approaches to speeding up these calculations, and it is difficult to know which approach will work best in practical situations. Methods: The current study attempts to judge performance of betweenness centrality approximation algorithms by running them under conditions that practitioners are likely to experience. For several approaches to approximation, we run two tests, clustering and immunization, on identical hardware, along with a process to determine appropriate parameters. This allows an across-the-board comparison of techniques based on the dimensions of speed and accuracy of results. Results: Overall, the speed of betweenness centrality can be reduced several orders of magnitude by using approximation algorithms. We find that the speeds of individual algorithms can vary widely based on input parameters. The fastest algorithms utilize parallelism, either in the form of multi-core processing or GPUs. Interestingly, getting fast results does not require an expensive GPU. Conclusions: The methodology presented here can guide the selection of a betweenness centrality approximation algorithm depending on a practitioner's needs and can also be used to compare new methods to existing ones.
  • Label Cover Reductions for Unconditional Approximation Hardness of Constraint Satisfaction
Label Cover Reductions for Unconditional Approximation Hardness of Constraint Satisfaction. CENNY WENNER. Doctoral Thesis, Stockholm, Sweden 2014. Stockholm University, ISBN 978-91-7447-997-3, SE-10691 Stockholm. Academic dissertation which, by permission of Stockholm University, is presented for public examination for the degree of Doctor of Philosophy in Computer Science on Thursday the 21st of October, 2014, at 13:15 in lecture hall F3, Sing Sing, KTH Royal Institute of Technology, Lindstedtsvägen 26, Stockholm. © Cenny Wenner, October, 2014. Printed by: Universitetsservice US AB. Abstract: Combinatorial optimization problems include such common tasks as finding the quickest route to work, scheduling a maximum number of jobs to specialists, and placing bus stops so as to minimize commuter times. We restrict our attention to problems in which one is given a collection of constraints on variables with the objective of finding an assignment satisfying as many constraints as possible, also known as Constraint Satisfaction Problems (CSPs). With only a few exceptions, most CSPs are NP-hard to solve optimally and we instead consider multiplicative-factor approximations. In particular, a solution is a factor-c approximation if the number of constraints satisfied by the solution is at least c times the optimal number. The contributions of this thesis consist of new results demonstrating the approximation limits of CSPs in various settings. The first paper studies the hardness of ordering CSPs. In these problems, one is given constraints on how items should be ordered relative to one another, with the objective of ordering the given items from first to last so as to satisfy as many constraints as possible.
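A toy illustration (mine, not from the thesis) of the factor-c definition above, on a tiny ordering CSP where a constraint (a, b) means "a comes before b":

```python
# Factor-c approximation: (#constraints satisfied) / (optimum over all orders).
from itertools import permutations

def satisfied(order, constraints):
    pos = {x: i for i, x in enumerate(order)}
    return sum(pos[a] < pos[b] for a, b in constraints)

items = "abc"
constraints = [("a", "b"), ("b", "c"), ("c", "a")]  # cyclic: optimum is 2
opt = max(satisfied(p, constraints) for p in permutations(items))
print(satisfied(("c", "a", "b"), constraints) / opt)  # 2/2 = 1.0
```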
  • Scaling Betweenness Approximation to Billions of Edges by MPI-Based Adaptive Sampling
Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling. Alexander van der Grinten and Henning Meyerhenke, Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany, [email protected], [email protected]. Abstract: Betweenness centrality is one of the most popular vertex centrality measures in network analysis. Hence, many (sequential and parallel) algorithms to compute or approximate betweenness have been devised. Recent algorithmic advances have made it possible to approximate betweenness very efficiently on shared-memory architectures. Yet, the best shared-memory algorithms can still take hours of running time for large graphs, especially for graphs with a high diameter or when a small relative error is required. In this work, we present an MPI-based generalization of the state-of-the-art shared-memory algorithm for betweenness approximation. This algorithm is based on adaptive sampling; our parallelization strategy can be applied in the same manner to adaptive sampling algorithms for other problems. … Moreover, recent theoretical results suggest that it is unlikely that a subcubic algorithm (w.r.t. |V|) exists to compute betweenness exactly on arbitrary graphs [1]. Thus, for large graphs beyond, say, 5M vertices and 100M edges, exact betweenness algorithms can be seen as hardly practical, not even parallel and distributed ones (see Section II). For betweenness approximation, however, fast practical algorithms exist. The fastest known betweenness approximation algorithm is the KADABRA algorithm by Borassi and Natale [7]. This algorithm uses sampling to obtain a probabilistic guarantee on the quality of the approximation:
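The sampling idea behind such approximation algorithms can be illustrated with a much simpler fixed-sample estimator (a toy sketch, not KADABRA: no adaptive stopping rule and no error guarantee; assumes networkx): sample vertex pairs, pick a random shortest path for each, and credit its interior vertices.

```python
# Toy sampling-based betweenness estimator (illustration only).
import random
import networkx as nx

def approx_betweenness(G, samples=1000, seed=0):
    rng = random.Random(seed)
    nodes = list(G)
    score = dict.fromkeys(nodes, 0.0)
    for _ in range(samples):
        s, t = rng.sample(nodes, 2)
        try:
            paths = list(nx.all_shortest_paths(G, s, t))
        except nx.NetworkXNoPath:
            continue
        for v in rng.choice(paths)[1:-1]:  # interior vertices only
            score[v] += 1.0 / samples
    return score

G = nx.erdos_renyi_graph(200, 0.05, seed=1)
est = approx_betweenness(G)
print("highest estimated betweenness:", max(est, key=est.get))
```

Adaptive schemes such as KADABRA replace the fixed sample count with a data-dependent stopping rule; the MPI work above parallelizes that adaptive scheme.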
  • Constrained Ordering
Constrained Ordering. Walter Guttmann and Markus Maucher, Fakultät für Informatik, Universität Ulm, 2005.12.01. Abstract: We investigate the problem of finding a total order of a finite set that satisfies various local ordering constraints. Depending on the admitted constraints, we provide an efficient algorithm or prove NP-completeness. To this end, we define a reduction technique and discuss its properties. Key words: total ordering, NP-completeness, computational complexity, betweenness, cyclic ordering, topological sorting. 1 Introduction: An instance of the betweenness problem is given by a finite set A and a collection C of triples from A, and one has to decide if there is a total order < of A such that for each (a, b, c) ∈ C, either a < b < c or c < b < a [GJ79, problem MS1]. The betweenness problem is NP-complete [Opa79]. Applications arise, for example, in the design of circuits and in computational biology [Opa79, CS98]. Similarly, the cyclic ordering problem asks for a total order < of A such that for each (a, b, c) ∈ C, either a < b < c or b < c < a or c < a < b [GJ79, problem MS2]. The cyclic ordering problem, too, is NP-complete [GM77]. Applications arise, for example, in qualitative spatial reasoning [IC00]. On the other hand, if a < b < c or a < c < b is allowed, the problem can be solved with linear time complexity by topological sorting [Knu97, Section 2.2.3]. Yet another choice, namely c < a or c < b, is needed to model an object-relational mapping problem described in Section 2. The reader is invited to think about the time complexity of this problem before reading the solution.
  • UC Berkeley Electronic Theses and Dissertations
UC Berkeley Electronic Theses and Dissertations. Title: On the Power of Lasserre SDP Hierarchy. Permalink: https://escholarship.org/uc/item/1c16b6c5. Author: Tan, Ning. Publication Date: 2015. Peer reviewed | Thesis/dissertation. eScholarship.org, Powered by the California Digital Library, University of California. On the Power of Lasserre SDP Hierarchy, by Ning Tan. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley. Committee in charge: Professor Prasad Raghavendra, Chair; Professor Satish Rao; Professor Nikhil Srivastava. Fall 2015. Copyright 2015 by Ning Tan. Abstract: Constraint Satisfaction Problems (CSPs) are a class of fundamental combinatorial optimization problems that have been extensively studied in the field of approximation algorithms and hardness of approximation. In a groundbreaking result, Raghavendra showed that, assuming the Unique Games Conjecture, the best polynomial-time approximation algorithms for all Max CSPs are given by a family of basic standard SDP relaxations. With the Unique Games Conjecture remaining one of the most important open questions in the field of Theoretical Computer Science, it is natural to ask whether hypothetically stronger SDP relaxations would be able to achieve better approximation ratios for Max CSPs and their variants. In this work, we study the power of the Lasserre/Sum-of-Squares SDP hierarchy. The first part of this work focuses on using the Lasserre/Sum-of-Squares SDP hierarchy to achieve better approximation ratios for certain CSPs with global cardinality constraints.
  • Almost Linear-Time Algorithms for Adaptive Betweenness Centrality Using Hypergraph Sketches
Almost Linear-Time Algorithms for Adaptive Betweenness Centrality using Hypergraph Sketches. Yuichi Yoshida, National Institute of Informatics, and Preferred Infrastructure, Inc., [email protected]. ABSTRACT: Betweenness centrality measures the importance of a vertex by quantifying the number of times it acts as a midpoint of the shortest paths between other vertices. This measure is widely used in network analysis. In many applications, we wish to choose the k vertices with the maximum adaptive betweenness centrality, which is the betweenness centrality without considering the shortest paths that have been taken into account by already-chosen vertices. All previous methods are designed to compute the betweenness centrality in a fixed graph. Thus, to solve the above task, we have to run these methods k times. In this paper, we present a method that directly solves the task, with an almost linear runtime no matter how large the value of k. … have been used to define its centrality, such as its degree, distance to other vertices, and its eigenvalue [4]. In this paper, we consider centralities based on shortest paths. The most famous such centrality is the (shortest-path) betweenness centrality, which quantifies the number of times a vertex acts as a midpoint of the shortest paths between other vertices [12]. Betweenness centrality has been extensively used in network analysis, such as for measuring lethality in biological networks [17, 10], studying sexual networks and AIDS [20], and identifying key actors in terrorist networks [18, 9]. Betweenness centrality is also used as a primary routine for clustering and community identification in real-world networks [23, 15]. We also study the (shortest-path) coverage centrality, which
  • Measuring Network Centrality Using Hypergraphs
Measuring Network Centrality Using Hypergraphs. Sanjukta Roy, Chennai Mathematical Institute, [email protected]; Balaraman Ravindran, Indian Institute of Technology Madras, [email protected]. ABSTRACT: Networks abstracted as graphs lose some information related to the super-dyadic relation among the nodes. We find natural occurrence of hyperedges in co-authorship, co-citation, social networks, e-mail networks, weblog networks, etc. Treating these networks as hypergraphs preserves the super-dyadic relations. But the primal graph or Gaifman graph associated with hypergraphs converts each hyperedge to a clique, losing again the n-ary relationship among nodes. We aim to measure Shapley value based centrality on these networks without losing the super-dyadic information. For this purpose, we use co-operative games on a single graph representation of a hypergraph such that the Shapley value can be computed efficiently [1]. … real-world systems. For example, if we model a collaboration network as a simple graph, we would have to reduce the interaction into a set of two-way interactions, and if there are multiple actors collaborating on the same project, the representation will be a clique on all those actors. As we will show, we lose information about the set of actors collaborating on the same project in the clique construction. A natural way to model these interactions is through a hypergraph. In a simple graph, an edge is represented by a pair of vertices, whereas a hyperedge is a non-empty subset of vertices. Multi-way interactions, known as super-dyadic relations, characterize many real-world applications and there has been
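The clique construction, and the information loss it causes, can be seen in a few lines (an illustrative sketch assuming networkx, not the authors' code): one three-way hyperedge and three pairwise edges yield the same primal graph.

```python
# Primal/Gaifman graph of a hypergraph: expand each hyperedge to a clique.
from itertools import combinations
import networkx as nx

def primal_graph(hyperedges):
    G = nx.Graph()
    for e in hyperedges:
        G.add_nodes_from(e)
        G.add_edges_from(combinations(e, 2))  # hyperedge -> clique
    return G

h1 = [{1, 2, 3}]               # one three-way collaboration
h2 = [{1, 2}, {2, 3}, {1, 3}]  # three pairwise collaborations
print(nx.is_isomorphic(primal_graph(h1), primal_graph(h2)))  # True
```

The True output is exactly the loss the abstract describes: after clique expansion, a genuine three-way collaboration is indistinguishable from three pairwise ones.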