Networkx Reference Release 1.9

Total Page:16

File Type:pdf, Size:1020Kb

Networkx Reference Release 1.9 NetworkX Reference Release 1.9 Aric Hagberg, Dan Schult, Pieter Swart June 21, 2014 CONTENTS 1 Overview 1 1.1 Who uses NetworkX?..........................................1 1.2 Goals...................................................1 1.3 The Python programming language...................................1 1.4 Free software...............................................2 1.5 History..................................................2 2 Introduction 3 2.1 NetworkX Basics.............................................3 2.2 Nodes and Edges.............................................4 3 Graph types 9 3.1 Which graph class should I use?.....................................9 3.2 Basic graph types.............................................9 4 Algorithms 127 4.1 Approximation.............................................. 127 4.2 Assortativity............................................... 132 4.3 Bipartite................................................. 141 4.4 Blockmodeling.............................................. 161 4.5 Boundary................................................. 162 4.6 Centrality................................................. 163 4.7 Chordal.................................................. 184 4.8 Clique.................................................. 187 4.9 Clustering................................................ 190 4.10 Communities............................................... 193 4.11 Components............................................... 194 4.12 Connectivity............................................... 208 4.13 Cores................................................... 225 4.14 Cycles.................................................. 229 4.15 Directed Acyclic Graphs......................................... 231 4.16 Distance Measures............................................ 234 4.17 Distance-Regular Graphs......................................... 236 4.18 Eulerian.................................................. 237 4.19 Flows................................................... 239 4.20 Graphical degree sequence........................................ 263 4.21 Hierarchy................................................. 267 4.22 Isolates.................................................. 268 4.23 Isomorphism............................................... 269 i 4.24 Link Analysis............................................... 282 4.25 Link Prediction.............................................. 289 4.26 Matching................................................. 296 4.27 Maximal independent set......................................... 297 4.28 Minimum Spanning Tree......................................... 298 4.29 Operators................................................. 299 4.30 Rich Club................................................. 307 4.31 Shortest Paths.............................................. 308 4.32 Simple Paths............................................... 326 4.33 Swap................................................... 327 4.34 Traversal................................................. 328 4.35 Tree.................................................... 335 4.36 Vitality.................................................. 337 5 Functions 339 5.1 Graph................................................... 339 5.2 Nodes................................................... 341 5.3 Edges................................................... 342 5.4 Attributes................................................. 343 5.5 Freezing graph structure......................................... 345 6 Graph generators 347 6.1 Atlas................................................... 347 6.2 Classic.................................................. 347 6.3 Small................................................... 352 6.4 Random Graphs............................................. 356 6.5 Degree Sequence............................................. 365 6.6 Random Clustered............................................ 371 6.7 Directed................................................. 372 6.8 Geometric................................................ 375 6.9 Hybrid.................................................. 379 6.10 Bipartite................................................. 379 6.11 Line Graph................................................ 383 6.12 Ego Graph................................................ 385 6.13 Stochastic................................................. 386 6.14 Intersection................................................ 386 6.15 Social Networks............................................. 388 7 Linear algebra 389 7.1 Graph Matrix............................................... 389 7.2 Laplacian Matrix............................................. 391 7.3 Spectrum................................................. 393 7.4 Algebraic Connectivity.......................................... 394 7.5 Attribute Matrices............................................ 397 8 Converting to and from other data formats 403 8.1 To NetworkX Graph........................................... 403 8.2 Dictionaries................................................ 404 8.3 Lists................................................... 405 8.4 Numpy.................................................. 406 8.5 Scipy................................................... 411 9 Reading and writing graphs 415 9.1 Adjacency List.............................................. 415 9.2 Multiline Adjacency List......................................... 419 ii 9.3 Edge List................................................. 422 9.4 GEXF................................................... 429 9.5 GML................................................... 431 9.6 Pickle................................................... 434 9.7 GraphML................................................. 435 9.8 JSON................................................... 437 9.9 LEDA................................................... 442 9.10 YAML.................................................. 443 9.11 SparseGraph6.............................................. 444 9.12 Pajek................................................... 450 9.13 GIS Shapefile............................................... 452 10 Drawing 455 10.1 Matplotlib................................................ 455 10.2 Graphviz AGraph (dot).......................................... 464 10.3 Graphviz with pydot........................................... 467 10.4 Graph Layout............................................... 469 11 Exceptions 475 12 Utilities 477 12.1 Helper Functions............................................. 477 12.2 Data Structures and Algorithms..................................... 478 12.3 Random Sequence Generators...................................... 478 12.4 Decorators................................................ 481 12.5 Cuthill-Mckee Ordering......................................... 482 12.6 Context Managers............................................ 484 13 License 487 14 Citing 489 15 Credits 491 16 Glossary 493 Bibliography 495 Python Module Index 505 Index 507 iii iv CHAPTER ONE OVERVIEW NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks. With NetworkX you can load and store networks in standard and nonstandard data formats, generate many types of random and classic networks, analyze network structure, build network models, design new network algorithms, draw networks, and much more. 1.1 Who uses NetworkX? The potential audience for NetworkX includes mathematicians, physicists, biologists, computer scientists, and social scientists. Good reviews of the state-of-the-art in the science of complex networks are presented in Albert and Barabási [BA02], Newman [Newman03], and Dorogovtsev and Mendes [DM03]. See also the classic texts [Bollobas01], [Diestel97] and [West01] for graph theoretic results and terminology. For basic graph algorithms, we recommend the texts of Sedgewick, e.g. [Sedgewick01] and [Sedgewick02] and the survey of Brandes and Erlebach [BE05]. 1.2 Goals NetworkX is intended to provide • tools for the study of the structure and dynamics of social, biological, and infrastructure networks, • a standard programming interface and graph implementation that is suitable for many applications, • a rapid development environment for collaborative, multidisciplinary projects, • an interface to existing numerical algorithms and code written in C, C++, and FORTRAN, • the ability to painlessly slurp in large nonstandard data sets. 1.3 The Python programming language Python is a powerful programming language that allows simple and flexible representations of networks, and clear and concise expressions of network algorithms (and other algorithms too). Python has a vibrant and growing ecosystem of packages that NetworkX uses to provide more features such as numerical linear algebra and drawing. In addition Python is also an excellent “glue” language for putting together pieces of software from other languages which allows reuse of legacy code and engineering of high-performance algorithms [Langtangen04]. Equally important, Python is free, well-supported, and a joy to use. 1 NetworkX Reference, Release 1.9 In order to make the most out of NetworkX you will want to know how to write basic programs in Python. Among the many guides to Python, we recommend the documentation at http://www.python.org and the text by Alex Martelli [Martelli03]. 1.4 Free software NetworkX is free software; you
Recommended publications
  • Networkx: Network Analysis with Python
    NetworkX: Network Analysis with Python Salvatore Scellato Full tutorial presented at the XXX SunBelt Conference “NetworkX introduction: Hacking social networks using the Python programming language” by Aric Hagberg & Drew Conway Outline 1. Introduction to NetworkX 2. Getting started with Python and NetworkX 3. Basic network analysis 4. Writing your own code 5. You are ready for your project! 1. Introduction to NetworkX. Introduction to NetworkX - network analysis Vast amounts of network data are being generated and collected • Sociology: web pages, mobile phones, social networks • Technology: Internet routers, vehicular flows, power grids How can we analyze this networks? Introduction to NetworkX - Python awesomeness Introduction to NetworkX “Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks.” • Data structures for representing many types of networks, or graphs • Nodes can be any (hashable) Python object, edges can contain arbitrary data • Flexibility ideal for representing networks found in many different fields • Easy to install on multiple platforms • Online up-to-date documentation • First public release in April 2005 Introduction to NetworkX - design requirements • Tool to study the structure and dynamics of social, biological, and infrastructure networks • Ease-of-use and rapid development in a collaborative, multidisciplinary environment • Easy to learn, easy to teach • Open-source tool base that can easily grow in a multidisciplinary environment with non-expert users
    [Show full text]
  • 1 Hamiltonian Path
    6.S078 Fine-Grained Algorithms and Complexity MIT Lecture 17: Algorithms for Finding Long Paths (Part 1) November 2, 2020 In this lecture and the next, we will introduce a number of algorithmic techniques used in exponential-time and FPT algorithms, through the lens of one parametric problem: Definition 0.1 (k-Path) Given a directed graph G = (V; E) and parameter k, is there a simple path1 in G of length ≥ k? Already for this simple-to-state problem, there are quite a few radically different approaches to solving it faster; we will show you some of them. We’ll see algorithms for the case of k = n (Hamiltonian Path) and then we’ll turn to “parameterizing” these algorithms so they work for all k. A number of papers in bioinformatics have used quick algorithms for k-Path and related problems to analyze various networks that arise in biology (some references are [SIKS05, ADH+08, YLRS+09]). In the following, we always denote the number of vertices jV j in our given graph G = (V; E) by n, and the number of edges jEj by m. We often associate the set of vertices V with the set [n] := f1; : : : ; ng. 1 Hamiltonian Path Before discussing k-Path, it will be useful to first discuss algorithms for the famous NP-complete Hamiltonian path problem, which is the special case where k = n. Essentially all algorithms we discuss here can be adapted to obtain algorithms for k-Path! The naive algorithm for Hamiltonian Path takes time about n! = 2Θ(n log n) to try all possible permutations of the nodes (which can also be adapted to get an O?(k!)-time algorithm for k-Path, as we’ll see).
    [Show full text]
  • Basic Models and Questions in Statistical Network Analysis
    Basic models and questions in statistical network analysis Mikl´osZ. R´acz ∗ with S´ebastienBubeck y September 12, 2016 Abstract Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as P´olya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more. Outline: • Lecture 1: A primer on exact recovery in the general stochastic block model. • Lecture 2: Estimating the dimension of a random geometric graph on a high-dimensional sphere. • Lecture 3: Introduction to entropic central limit theorems and a proof of the fundamental limits of dimension estimation in random geometric graphs. • Lectures 4 & 5: Confidence sets for the root in uniform and preferential attachment trees. Acknowledgements These notes were prepared for a minicourse presented at University of Washington during June 6{10, 2016, and at the XX Brazilian School of Probability held at the S~aoCarlos campus of Universidade de S~aoPaulo during July 4{9, 2016. We thank the organizers of the Brazilian School of Probability, Paulo Faria da Veiga, Roberto Imbuzeiro Oliveira, Leandro Pimentel, and Luiz Renato Fontes, for inviting us to present a minicourse on this topic. We also thank Sham Kakade, Anna Karlin, and Marina Meila for help with organizing at University of Washington.
    [Show full text]
  • Networkx Reference Release 1.9.1
    NetworkX Reference Release 1.9.1 Aric Hagberg, Dan Schult, Pieter Swart September 20, 2014 CONTENTS 1 Overview 1 1.1 Who uses NetworkX?..........................................1 1.2 Goals...................................................1 1.3 The Python programming language...................................1 1.4 Free software...............................................2 1.5 History..................................................2 2 Introduction 3 2.1 NetworkX Basics.............................................3 2.2 Nodes and Edges.............................................4 3 Graph types 9 3.1 Which graph class should I use?.....................................9 3.2 Basic graph types.............................................9 4 Algorithms 127 4.1 Approximation.............................................. 127 4.2 Assortativity............................................... 132 4.3 Bipartite................................................. 141 4.4 Blockmodeling.............................................. 161 4.5 Boundary................................................. 162 4.6 Centrality................................................. 163 4.7 Chordal.................................................. 184 4.8 Clique.................................................. 187 4.9 Clustering................................................ 190 4.10 Communities............................................... 193 4.11 Components............................................... 194 4.12 Connectivity..............................................
    [Show full text]
  • Latent Distance Estimation for Random Geometric Graphs ∗
    Latent Distance Estimation for Random Geometric Graphs ∗ Ernesto Araya Valdivia Laboratoire de Math´ematiquesd'Orsay (LMO) Universit´eParis-Sud 91405 Orsay Cedex France Yohann De Castro Ecole des Ponts ParisTech-CERMICS 6 et 8 avenue Blaise Pascal, Cit´eDescartes Champs sur Marne, 77455 Marne la Vall´ee,Cedex 2 France Abstract: Random geometric graphs are a popular choice for a latent points generative model for networks. Their definition is based on a sample of n points X1;X2; ··· ;Xn on d−1 the Euclidean sphere S which represents the latent positions of nodes of the network. The connection probabilities between the nodes are determined by an unknown function (referred to as the \link" function) evaluated at the distance between the latent points. We introduce a spectral estimator of the pairwise distance between latent points and we prove that its rate of convergence is the same as the nonparametric estimation of a d−1 function on S , up to a logarithmic factor. In addition, we provide an efficient spectral algorithm to compute this estimator without any knowledge on the nonparametric link function. As a byproduct, our method can also consistently estimate the dimension d of the latent space. MSC 2010 subject classifications: Primary 68Q32; secondary 60F99, 68T01. Keywords and phrases: Graphon model, Random Geometric Graph, Latent distances estimation, Latent position graph, Spectral methods. 1. Introduction Random geometric graph (RGG) models have received attention lately as alternative to some simpler yet unrealistic models as the ubiquitous Erd¨os-R´enyi model [11]. They are generative latent point models for graphs, where it is assumed that each node has associated a latent d point in a metric space (usually the Euclidean unit sphere or the unit cube in R ) and the connection probability between two nodes depends on the position of their associated latent points.
    [Show full text]
  • K-Path Centrality: a New Centrality Measure in Social Networks
    Centrality Metrics in Social Network Analysis K-path: A New Centrality Metric Experiments Summary K-Path Centrality: A New Centrality Measure in Social Networks Adriana Iamnitchi University of South Florida joint work with Tharaka Alahakoon, Rahul Tripathi, Nicolas Kourtellis and Ramanuja Simha Adriana Iamnitchi K-Path Centrality: A New Centrality Measure in Social Networks 1 of 23 Centrality Metrics in Social Network Analysis Centrality Metrics Overview K-path: A New Centrality Metric Betweenness Centrality Experiments Applications Summary Computing Betweenness Centrality Centrality Metrics in Social Network Analysis Betweenness Centrality - how much a node controls the flow between any other two nodes Closeness Centrality - the extent a node is near all other nodes Degree Centrality - the number of ties to other nodes Eigenvector Centrality - the relative importance of a node Adriana Iamnitchi K-Path Centrality: A New Centrality Measure in Social Networks 2 of 23 Centrality Metrics in Social Network Analysis Centrality Metrics Overview K-path: A New Centrality Metric Betweenness Centrality Experiments Applications Summary Computing Betweenness Centrality Betweenness Centrality measures the extent to which a node lies on the shortest path between two other nodes betweennes CB (v) of a vertex v is the summation over all pairs of nodes of the fractional shortest paths going through v. Definition (Betweenness Centrality) For every vertex v 2 V of a weighted graph G(V ; E), the betweenness centrality CB (v) of v is defined by X X σst (v) CB
    [Show full text]
  • The On-Line Shortest Path Problem Under Partial Monitoring
    Journal of Machine Learning Research 8 (2007) 2369-2403 Submitted 4/07; Revised 8/07; Published 10/07 The On-Line Shortest Path Problem Under Partial Monitoring Andras´ Gyor¨ gy [email protected] Machine Learning Research Group Computer and Automation Research Institute Hungarian Academy of Sciences Kende u. 13-17, Budapest, Hungary, H-1111 Tamas´ Linder [email protected] Department of Mathematics and Statistics Queen’s University, Kingston, Ontario Canada K7L 3N6 Gabor´ Lugosi [email protected] ICREA and Department of Economics Universitat Pompeu Fabra Ramon Trias Fargas 25-27 08005 Barcelona, Spain Gyor¨ gy Ottucsak´ [email protected] Department of Computer Science and Information Theory Budapest University of Technology and Economics Magyar Tudosok´ Kor¨ utja´ 2. Budapest, Hungary, H-1117 Editor: Leslie Pack Kaelbling Abstract The on-line shortest path problem is considered under various models of partial monitoring. Given a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, a decision maker has to choose in each round of a game a path between two distinguished vertices such that the loss of the chosen path (defined as the sum of the weights of its composing edges) be as small as possible. In a setting generalizing the multi-armed bandit problem, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this problem, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched off-line to the entire sequence of the edge weights, by a quantity that is proportional to 1=pn and depends only polynomially on the number of edges of the graph.
    [Show full text]
  • The Domination Number of On-Line Social Networks and Random Geometric Graphs?
    The domination number of on-line social networks and random geometric graphs? Anthony Bonato1, Marc Lozier1, Dieter Mitsche2, Xavier P´erez-Gim´enez1, and Pawe lPra lat1 1 Ryerson University, Toronto, Canada, [email protected], [email protected], [email protected], [email protected] 2 Universit´ede Nice Sophia-Antipolis, Nice, France [email protected] Abstract. We consider the domination number for on-line social networks, both in a stochastic net- work model, and for real-world, networked data. Asymptotic sublinear bounds are rigorously derived for the domination number of graphs generated by the memoryless geometric protean random graph model. We establish sublinear bounds for the domination number of graphs in the Facebook 100 data set, and these bounds are well-correlated with those predicted by the stochastic model. In addition, we derive the asymptotic value of the domination number in classical random geometric graphs. 1. Introduction On-line social networks (or OSNs) such as Facebook have emerged as a hot topic within the network science community. Several studies suggest OSNs satisfy many properties in common with other complex networks, such as: power-law degree distributions [2, 13], high local cluster- ing [36], constant [36] or even shrinking diameter with network size [23], densification [23], and localized information flow bottlenecks [12, 24]. Several models were designed to simulate these properties [19, 20], and one model that rigorously asymptotically captures all these properties is the geometric protean model (GEO-P) [5{7] (see [16, 25, 30, 31] for models where various ranking schemes were first used, and which inspired the GEO-P model).
    [Show full text]
  • An Efficient Reachability Indexing Scheme for Large Directed Graphs
    1 Path-Tree: An Efficient Reachability Indexing Scheme for Large Directed Graphs RUOMING JIN, Kent State University NING RUAN, Kent State University YANG XIANG, The Ohio State University HAIXUN WANG, Microsoft Research Asia Reachability query is one of the fundamental queries in graph database. The main idea behind answering reachability queries is to assign vertices with certain labels such that the reachability between any two vertices can be determined by the labeling information. Though several approaches have been proposed for building these reachability labels, it remains open issues on how to handle increasingly large number of vertices in real world graphs, and how to find the best tradeoff among the labeling size, the query answering time, and the construction time. In this paper, we introduce a novel graph structure, referred to as path- tree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We show path-tree can be generalized to chain-tree which theoretically can has smaller labeling cost. On top of path-tree and chain-tree index, we also introduce a new compression scheme which groups vertices with similar labels together to further reduce the labeling size. In addition, we also propose an efficient incremental update algorithm for dynamic index maintenance. Finally, we demonstrate both analytically and empirically the effectiveness and efficiency of our new approaches. Categories and Subject Descriptors: H.2.8 [Database management]: Database Applications—graph index- ing and querying General Terms: Performance Additional Key Words and Phrases: Graph indexing, reachability queries, transitive closure, path-tree cover, maximal directed spanning tree ACM Reference Format: Jin, R., Ruan, N., Xiang, Y., and Wang, H.
    [Show full text]
  • Clique Colourings of Geometric Graphs
    CLIQUE COLOURINGS OF GEOMETRIC GRAPHS COLIN MCDIARMID, DIETER MITSCHE, AND PAWELPRA LAT Abstract. A clique colouring of a graph is a colouring of the vertices such that no maximal clique is monochromatic (ignoring isolated vertices). The least number of colours in such a colouring is the clique chromatic number. Given n points x1;:::; xn in the plane, and a threshold r > 0, the corre- sponding geometric graph has vertex set fv1; : : : ; vng, and distinct vi and vj are adjacent when the Euclidean distance between xi and xj is at most r. We investigate the clique chromatic number of such graphs. We first show that the clique chromatic number is at most 9 for any geometric graph in the plane, and briefly consider geometric graphs in higher dimensions. Then we study the asymptotic behaviour of the clique chromatic number for the random geometric graph G(n; r) in the plane, where n random points are independently and uniformly distributed in a suitable square. We see that as r increases from 0, with high probability the clique chromatic number is 1 for very small r, then 2 for small r, then at least 3 for larger r, and finally drops back to 2. 1. Introduction and main results In this section we introduce clique colourings and geometric graphs; and we present our main results, on clique colourings of deterministic and random geometric graphs. Recall that a proper colouring of a graph is a labeling of its vertices with colours such that no two vertices sharing the same edge have the same colour; and the smallest number of colours in a proper colouring of a graph G = (V; E) is its chromatic number, denoted by χ(G).
    [Show full text]
  • Betweenness Centrality in Dense Random Geometric Networks
    Betweenness Centrality in Dense Random Geometric Networks Alexander P. Giles Orestis Georgiou Carl P. Dettmann School of Mathematics Toshiba Telecommunications Research Laboratory School of Mathematics University of Bristol 32 Queen Square University of Bristol Bristol, UK, BS8 1TW Bristol, UK, BS1 4ND Bristol, UK, BS8 1TW [email protected] [email protected] [email protected] Abstract—Random geometric networks are mathematical are routed in a multi-hop fashion between any two nodes that structures consisting of a set of nodes placed randomly within wish to communicate, allowing much larger, more flexible d a bounded set V ⊆ R mutually coupled with a probability de- networks (due to the lack of pre-established infrastructure or pendent on their Euclidean separation, and are the classic model used within the expanding field of ad-hoc wireless networks. In the need to be within range of a switch). Commercial ad hoc order to rank the importance of the network’s communicating networks are nowadays realised under Wi-Fi Direct standards, nodes, we consider the well established ‘betweenness’ centrality enabling device-to-device (D2D) offloading in LTE cellular measure (quantifying how often a node is on a shortest path of networks [5]. links between any pair of nodes), providing an analytic treatment This new diversity in machine betweenness needs to be of betweenness within a random graph model by deriving a closed form expression for the expected betweenness of a node understood, and moreover can be harnessed in at least three placed within a dense random geometric network formed inside separate ways: historically, in 2005 Gupta et al.
    [Show full text]
  • Robustness of Community Detection to Random Geometric Perturbations
    Robustness of Community Detection to Random Geometric Perturbations Sandrine Péché Vianney Perchet LPSM, University Paris Diderot Crest, ENSAE & Criteo AI Lab [email protected] [email protected] Abstract We consider the stochastic block model where connection between vertices is perturbed by some latent (and unobserved) random geometric graph. The objective is to prove that spectral methods are robust to this type of noise, even if they are agnostic to the presence (or not) of the random graph. We provide explicit regimes where the second eigenvector of the adjacency matrix is highly correlated to the true community vector (and therefore when weak/exact recovery is possible). This is possible thanks to a detailed analysis of the spectrum of the latent random graph, of its own interest. Introduction In a d-dimensional random geometric graph, N vertices are assigned random coordinates in Rd, and only points close enough to each other are connected by an edge. Random geometric graphs are used to model complex networks such as social networks, the world wide web and so on. We refer to [19]- and references therein - for a comprehensive introduction to random geometric graphs. On the other hand, in social networks, users are more likely to connect if they belong to some specific community (groups of friends, political party, etc.). This has motivated the introduction of the stochastic block models (see the recent survey [1] and the more recent breakthrough [5] for more details), where in the simplest case, each of the N vertices belongs to one (and only one) of the two communities that are present in the network.
    [Show full text]