Random Vector Graphs

A PRACTICAL GUIDE TO DRAWING AND COMPUTING WITH COMPLEX NETWORKS

ROSS M. RICHARDSON

Abstract. These notes are an attempt to document various graph drawing resources which I have compiled in the course of my work. They are available to anyone who is interested, and I welcome comments and suggestions. They are very much a work in progress, and you are advised to check the last change date. I offer no warranty, implicit or explicit, and I make no claim as to the relevance of this information to your own computer system.

Last changed: September 28, 2006

Contents

1. Prerequisites
2. Graph Formats
   2.1. A Rogues Gallery
   2.2. Conversion
3. Graph Computing
   3.1. NetworkX
   3.2. Boost Graph Library
4. Graph Drawing
   4.1. Algorithms
   4.2. Presentation
   4.3. Worked Examples
   4.4. A Sample Drawing Code
5. Degree Distributions
   5.1. Visualization
   5.2. Powerlaw Exponent
   5.3. An example
6. A Sample Project
Acknowledgments
7. Appendix: Datasources
References

The reader who enjoys presentations might enjoy a talk I gave on many of these topics. The relevant PDF file can be obtained at http://www.math.ucsd.edu/~rmrichar/talks/graph_drawing_talk.pdf (warning: 5MB).

1. Prerequisites

This guide assumes the reader is sufficiently familiar with computers and computer programming to comprehend the code and procedures contained herein. The author does not intend in any way for this guide to serve as a method of instruction for learning these skills. However, for the reader already familiar with computer programming, we do hope to provide enough examples that the reader feels comfortable tinkering on their own. The documentation also assumes that the reader has access to the tools on her own system; this guide makes no effort to explain their installation.
For members of Fan Chung’s research group, I will do my best to make sure this guide corresponds to the software currently installed on math107.

This guide includes code in Python, C++, and Maple.

The mathematical content in this guide is minimal, and should not be distracting to anyone for whom these notes might be of interest. That said, someone not familiar with the basic terminology of graph theory would do well to have a reference on hand. I suggest [14].

2. Graph Formats

For almost every graph tool out there, there is some sort of graph file format. Sadly, few apply generally to a large swath of graph drawing contexts. When discussing specific tools that require proprietary formats, we shall discuss the relevant formats. Here we just present a rogues gallery to help you quickly identify those files you come across in the wild. We also discuss some strategies for converting between the formats, which is often the most time-consuming task in any computing project.

2.1. A Rogues Gallery.

2.1.1. GraphXML. This is a newer format, based on XML (eXtensible Markup Language). It should not be confused with the custom XML format used in Lincoln Lu’s graph tools. We don’t currently have tools to use this format, but it is easy to recognize if you come across it. The basic syntax is simple; a sample is given in figure 1.

    <?xml version="1.0"?>
    <!DOCTYPE GraphXML SYSTEM "file:GraphXML.dtd">
    <GraphXML>
      <graph>
        <node name="first" />
        <node name="second" />
        <edge source="first" target="second" />
      </graph>
    </GraphXML>

Figure 1. A GraphXML file.

For further reference, see [1].

2.1.2. Lincoln’s XML Format. This is an evolving format that I don’t feel very capable of documenting. Questions should be directed to Lincoln [2] or me if you have some reason to use this format. Files which begin:

    <?xml version="1.0"?>
    <graph><node id="v100"><point x="3.21" y = "9.18"></node>...

are probably in Lincoln’s format.

2.1.3. Large Graph Layout. This is the input format to the Large Graph Layout drawing engine. Proper documentation can be found at the Large Graph Layout web site, found in the references [3]. LGL actually accepts a number of different file formats. The first of these, the .ncol file format, is a simple two-column file delimited by whitespace. Thus, to place edges between Paul and Endre and between Endre and Laszlo, a file would contain the lines:

    Paul Endre 3.2
    Endre Laszlo 4.5

Note here the optional edge weight following the two endpoints. An .lgl file is somewhat different. It lists each vertex first, followed by its neighbors. Thus, the same relations would be represented as follows:

    # Endre
    Paul 3.2
    Laszlo 4.5

There are a few caveats to this file format. For usage, please see the section on Large Graph Layout, or read the documentation found in the references.

2.1.4. Walrus. This is a strange one. If you see something akin to figure 2 you should back away very slowly. The file format is quite complicated. I refer the interested reader to the project site [4]. Good luck.¹

    Graph
    {
      ### metadata ###
      @name="IMDB1";
      @description=;
      @numNodes=2798;
      @numLinks=11135;
      @numPaths=0;
      @numPathLinks=0;
      ### structural data ###
      @links=[
        { 712; 0; },
        { 0; 735; },
        { 0; 2499; },
        { 0; 2744; },
        { 1; 2; },
        { 1; 942; },
        ...

Figure 2. Beware the Walrus.

    graph SD {
      OceanBeach -- PacificBeach [pos="1.0, 2.0"];
      PacificBeach -- LaJolla [pos="1.0, -3.0"];
      LaJolla -- ScrippsRanch [pos="-2.5, 1.2"];
      Hillcrest -- OceanBeach [pos="0.3,4.0"];
      MissionBeach -- OceanBeach [pos="2.7,4.2"];
      NationalCity;
    }

Figure 3. A simple DOT file.

2.1.5. Matlab. A .mat file is not human-readable. Say you have a file labeled graph.mat. In Matlab™² (or Octave), issue the following:

    >> load graph.mat
    >> whos
      Name      Size      Bytes  Class
      X         3x3          72  double array
    Grand total is 9 elements using 72 bytes
    >>

Here we see that graph.mat contained a 3 by 3 array labeled X.
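If an array such as X holds the adjacency matrix of a graph, its edges can be written out in the .ncol format of section 2.1.3. The following Python sketch shows one way this might be done. It assumes a symmetric matrix that is already available as nested lists (loading the .mat file itself is not shown) and that a zero entry means "no edge". The function name matrix_to_ncol, the sample matrix, and the vertex labels are hypothetical and chosen only for illustration.

```python
def matrix_to_ncol(matrix, labels):
    """Return .ncol lines ("u v weight") for a symmetric adjacency matrix.

    Assumption: matrix[i][j] == 0 means there is no edge between i and j.
    Only the upper triangle is read, so each undirected edge appears once
    and diagonal entries (self-loops) are ignored.
    """
    lines = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            weight = matrix[i][j]
            if weight != 0:
                lines.append(f"{labels[i]} {labels[j]} {weight}")
    return lines


# Hypothetical 3 x 3 weighted adjacency matrix, reusing the names from 2.1.3.
X = [
    [0.0, 3.2, 0.0],
    [3.2, 0.0, 4.5],
    [0.0, 4.5, 0.0],
]
ncol_lines = matrix_to_ncol(X, ["Paul", "Endre", "Laszlo"])
print("\n".join(ncol_lines))
```

Reading only the upper triangle is a design choice for undirected graphs; a directed graph would need every off-diagonal entry.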
Matlab is not explicitly a graph format, but very often graphs are stored as adjacency matrices or lists. The use of Matlab in this context is well outside the scope of the present section.

2.1.6. DOT. The DOT format, which comes from the AT&T Graphviz collection, is by now the default choice for graph storage and manipulation. This is due to two factors: the widespread use of AT&T’s Graphviz tools, and the generality and extensibility of the format itself. The format itself is easily recognized. We present an example in figure 3.

Both undirected and directed graphs are supported. The format allows for arbitrary attributes to be associated at the node, edge, and graph level, though there is a set of attributes which are standardized for use with the Graphviz tools.

I strongly urge all users who are looking for a format in which to store their data to consider DOT. The advantages are many: it is human readable, standardized, popular, and highly extensible (only GraphXML is more extensible). The primary disadvantage, which is shared by all non-trivial formats, is that a full parser is required to read and manipulate DOT files. Luckily, there are a number of pre-written parsers, including the newly available pygraphviz parser (an add-on to the NetworkX package). Full documentation for the DOT format is available at the Graphviz project site [5].

¹ I do have some code which allows me to convert into this format from Lincoln’s format. I will release said code if asked.
² A registered trademark of The MathWorks.

2.2. Conversion. TBD

3. Graph Computing

Graph computing, in the context of this section, refers to an integrated system or library for manipulating graphs. There is a large collection of such systems; indeed, the digital representation of a graph is an all-too-common project for beginning computer science undergraduates.
For our purposes, we focus only on those systems which:
(1) are very general
(2) contain a large number of primitive algorithms
(3) are well documented and actively supported

In my assessment, there are two systems which meet these requirements. One, the Boost Graph Library, is a C++/Python library which has been around for a few years. This library seeks to be a generic set of data structures and algorithms suitable for constructing robust graph algorithms. The other, NetworkX, is a pure Python package with a host of tools which reflect recent trends in complex network research, and which puts an emphasis on interactivity.

In what follows, we give a general overview of the two packages, and assess their suitability to various computing tasks. We go over some of the basics of their operation, and provide a sample application of each.

3.1. NetworkX. According to their project description:

“NetworkX (NX) is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.”

More precisely, NetworkX is a collection of complex network tools (many already in existence) which are collected in one place and given a more or less common Python interface. This is not unlike the SAGE project [6] in computational number theory. The project itself is due to Aric Hagberg at LANL, and it is currently under very active development.

3.1.1. Strengths and Weaknesses. There are a number of features to recommend the NetworkX package. NetworkX is accessed through a simple Python interface, which is useful for a number of reasons.
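As a small illustration of that interface, the sketch below builds the weighted three-vertex graph used as the LGL example in section 2.1.3 and runs a few basic queries on it. It assumes a reasonably current NetworkX release; the package has changed considerably since these notes were written, so some names may differ in older versions.

```python
import networkx as nx

# Build the weighted graph used as the LGL example in section 2.1.3.
G = nx.Graph()
G.add_edge("Paul", "Endre", weight=3.2)
G.add_edge("Endre", "Laszlo", weight=4.5)

# Basic queries: sizes, degrees, and an (unweighted) shortest path.
num_nodes = G.number_of_nodes()
num_edges = G.number_of_edges()
degrees = dict(G.degree())
path = nx.shortest_path(G, "Paul", "Laszlo")

print(num_nodes, num_edges)
print(degrees)
print(path)
```

Vertices are created implicitly by add_edge, and any hashable Python object (here, strings) can serve as a vertex label.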
