Large-Scale Network Analysis

Gábor Csárdi
[email protected]
Department of Biophysics, KFKI Research Institute for Nuclear and Particle Physics of the Hungarian Academy of Sciences, Budapest, Hungary
Currently at Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland

Outline
1. The igraph R package
2. What can you do with large graphs?
3. Some unique igraph features
4. Rapid prototyping

http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html

The igraph software package
• R package, Python extension and C library.
• Under active development.
• Free for academic and commercial use (GPL). "Standing on the shoulders of giants."
• State of the art data structures and algorithms, works well with large graphs.

How LARGE?
• Well, it depends on what you want to calculate.
• Just to create and manipulate it, it is enough if it fits into memory.
• How do I know that it fits into memory?
• igraph (typically) needs 32 bytes per edge and 16 bytes per vertex.
• A graph with one million vertices and ten million edges needs about 320 Mbytes for its edges, plus 16 Mbytes for its vertices.

Installation

    install.packages("igraph")

It is really that simple. Isn't it?

You might also need

    install.packages("digest")
    install.packages("rgl")

How to follow this "lecture"?
1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy & paste everything into your R session. You can skip any example if you wish to.
2. You type in everything I type in.
3. Sit back and watch. You can download the slides/code anyway.

The igraph data model
• Binary relation (= edges) between elements of a set (= vertices).
• If the pairs are unordered, then the graph is undirected:
    vertices = {A, B, C, D, E}
    edges = ({A, B}, {A, C}, {B, C}, {C, E})
• Otherwise it is directed:
    vertices = {A, B, C, D, E}
    edges = ((A, B), (A, C), (B, C), (C, E))

Vertex and edge ids
• Vertices are always numbered from 0.
• Numbering is contiguous, from 0 to n−1.
• We have to translate vertex names to ids:
    V = {A, B, C, D, E}
    E = ((A, B), (A, C), (B, C), (C, E))
    A = 0, B = 1, C = 2, D = 3, E = 4

    library(igraph)
    g <- graph( c(0,1, 0,2, 1,2, 2,4), n=5 )
    g
    g2 <- graph( c(0,1, 0,2, 1,2, 2,4),
                 n=5, dir=FALSE )
    g2

Working with igraph graphs

    ## How to decide what kind of object a variable refers to
    class(g2)
    class(1)
    class("foobar")

    ## Is this object an igraph graph?
    is.igraph(g)
    is.igraph(1:10)

    ## Summary, number of vertices, edges
    summary(g)
    vcount(g)
    ecount(g)

    ## Is the graph directed?
    is.directed(g)
    is.directed(g2)
    is.directed(1:10)

    ## Convert from directed to undirected
    as.undirected(g)

    ## And back
    as.directed(as.undirected(g))

The igraph data model, multiple edges
• igraph can handle multi-graphs:
    V = {A, B, C, D, E}
    E = ((A, B), (A, B), (A, C), (B, C), (C, E))

    g <- graph( c(0,1, 0,1, 0,2, 1,2, 2,4), n=5 )
    g

The igraph data model, loop edges
• igraph can handle loop edges:
    V = {A, B, C, D, E}
    E = ((A, A), (A, B), (A, C), (B, C), (C, E))

    g <- graph( c(0,0, 0,1, 0,2, 1,2, 2,4), n=5 )
    g

The igraph data model, what cannot be represented
• "Mixed" graphs, with undirected and directed edges.
• Hypergraphs.
• No direct support for bipartite (two-mode) graphs.
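
The translation from vertex names to ids described under "Vertex and edge ids" can be sketched in base R. This is only an illustration under stated assumptions: the helper names_to_ids is hypothetical (it is not an igraph function), and it follows the 0-based numbering used in these slides.

```r
# Hypothetical helper, not part of igraph: translate symbolic edges into
# 0-based integer ids, following the numbering convention of the slides.
vertex_names <- c("A", "B", "C", "D", "E")
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "E"))

names_to_ids <- function(edges, vertex_names) {
  # match() returns 1-based positions, so shift them down by one
  ids <- match(edges, vertex_names) - 1L
  if (anyNA(ids)) stop("edge refers to an unknown vertex name")
  # Interleave the endpoints as from1, to1, from2, to2, ...
  as.vector(t(matrix(ids, ncol = 2)))
}

id_vector <- names_to_ids(edges, vertex_names)
print(id_vector)
# [1] 0 1 0 2 1 2 2 4
```

The result has the same layout as the vector passed to graph() in the example above. igraph releases newer than these slides number vertices from 1 in R, in which case the subtraction would presumably be dropped.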

Naming vertices

    g <- graph.ring(10)
    V(g)$name <- letters[1:10]
    V(g)$name
    g
    print(g, v=TRUE)

Creating graphs, the formula interface

    # A simple undirected graph
    g <- graph.formula( Alice-Bob-Cecil-Alice,
                        Daniel-Cecil-Eugene, Cecil-Gordon )

    # Another undirected graph, ":" notation
    g2 <- graph.formula( Alice-Bob:Cecil:Daniel,
                         Cecil:Daniel-Eugene:Gordon )

    # A directed graph
    g3 <- graph.formula( Alice +-+ Bob --+ Cecil
                         +-- Daniel, Eugene --+ Gordon:Helen )

    # A graph with isolated vertices
    g4 <- graph.formula( Alice -- Bob -- Daniel,
                         Cecil:Gordon, Helen )

    # "Arrows" can be arbitrarily long
    g5 <- graph.formula( Alice +---------+ Bob )

Creating graphs, from edge lists and adjacency matrices

    ## From edge lists
    el <- cbind( c(0, 0, 1, 2),
                 c(1, 2, 2, 4) )
    g <- graph.edgelist(el)
    g

    ## Symbolic edge lists
    el <- cbind( c("Alice", "Alice", "Bob", "Cecil"),
                 c("Bob", "Cecil", "Cecil", "Ed") )
    g <- graph.edgelist(el)
    g
    summary(g)

    ## Adjacency matrices
    A <- matrix(sample(0:1, 100, replace=TRUE), 10, 10)
    g <- graph.adjacency(A)

Creating graphs, from data frames

    source("http://cneurocvs.rmki.kfki.hu/igraph/plus.R")
    vertices <- read.csv("judicial.csv")
    edges <- read.table("allcites.txt")
    jg <- graph.data.frame(edges, vertices=vertices, dir=TRUE)

Visualizing graphs
• plot: Uses traditional R graphics, non-interactive, 2d. Publication quality plots in all formats R supports.

    g <- barabasi.game(100, m=1)
    g <- simplify(g)
    igraph.par("plot.layout",
               layout.fruchterman.reingold)
    plot(g, vertex.size=3, vertex.label=NA,
         edge.arrow.size=0.6)

• tkplot: Uses Tcl/Tk via the tcltk package, interactive, 2d.
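
How the symbolic edge list and the adjacency matrix from "Creating graphs, from edge lists and adjacency matrices" relate to each other can be sketched in base R, without igraph. The vertex ordering (first appearance, reading the edge list row by row) is an assumption of this sketch and is not claimed to be what graph.edgelist does internally.

```r
# Base R only; igraph is not needed for this illustration.
el <- cbind(c("Alice", "Alice", "Bob", "Cecil"),
            c("Bob", "Cecil", "Cecil", "Ed"))

# Vertex names in order of first appearance, reading the edge list row by row
# (an assumption of this sketch).
vertex_names <- unique(as.vector(t(el)))

# Directed adjacency matrix: A[i, j] counts the edges from vertex i to vertex j.
A <- matrix(0L, length(vertex_names), length(vertex_names),
            dimnames = list(vertex_names, vertex_names))
for (k in seq_len(nrow(el))) {
  A[el[k, 1], el[k, 2]] <- A[el[k, 1], el[k, 2]] + 1L
}
print(A)
```

Counting edges instead of storing 0/1 keeps multiple edges between the same pair of vertices visible in the matrix.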
