Large-Scale Network Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Large-scale network analysis G´abor Cs´ardi [email protected] Department of Biophysics, KFKI Research Institute for Nuclear and Particle Physics of the Hungarian Academy of Sciences, Budapest, Hungary Currently at Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland Outline 1. The igraph R package 2. What can you do with large graphs? 3. Some unique igraph features 4. Rapid prototyping http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 2 The igraph software package • R package, Python extension and C library. The igraph software package • R package, Python extension and C library. • Under active development. The igraph software package • R package, Python extension and C library. • Under active development. • Free for academic and commercial use (GPL).\Standing on the shoulder of giants." The igraph software package • R package, Python extension and C library. • Under active development. • Free for academic and commercial use (GPL).\Standing on the shoulder of giants." • State of the art data structures and algorithms, works well with large graphs. http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 3 How LARGE? • Well, it depends what you want to calculate. How LARGE? • Well, it depends what you want to calculate. • Just to create and manipulate it, it is enough if it fits into the memory. How LARGE? • Well, it depends what you want to calculate. • Just to create and manipulate it, it is enough if it fits into the memory. • How do I know that it fits into the memory? How LARGE? • Well, it depends what you want to calculate. • Just to create and manipulate it, it is enough if it fits into the memory. • How do I know that it fits into the memory? • igraph (typically) needs 32 bytes per edge and 16 bytes per vertex. How LARGE? • Well, it depends what you want to calculate. • Just to create and manipulate it, it is enough if it fits into the memory. • How do I know that it fits into the memory? • igraph (typically) needs 32 bytes per edge and 16 bytes per vertex. • A graph with one million vertices and ten million edges needs about 320 Mbytes. http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 4 Installation 1 install.packages("igraph") It is really that simple. Isn't it? Installation 1 install.packages("igraph") It is really that simple. Isn't it? You might also need 1 install.packages("digest") 2 install.packages("rgl") http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 5 How to follow this\lecture"? 1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy & paste everything into your R session. You can skip any example if you wish to. How to follow this\lecture"? 1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy & paste everything into your R session. You can skip any example if you wish to. 2. You type in everything I type in. How to follow this\lecture"? 1. Go to http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html and copy & paste everything into your R session. You can skip any example if you wish to. 2. You type in everything I type in. 3. Sit back and watch. You can download the slides/code anyway. http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 6 The igraph data model • Binary relation (=edges) between elements of a set (=vertices). The igraph data model • Binary relation (=edges) between elements of a set (=vertices). • If the pairs are unordered, then the graph is undirected: B A C vertices = fA; B; C; D; Eg E edges = (fA; Bg; fA; Cg; fB; Cg; fC; Eg): D The igraph data model • Binary relation (=edges) between elements of a set (=vertices). • If the pairs are unordered, then the graph is undirected: B A C vertices = fA; B; C; D; Eg E edges = (fA; Bg; fA; Cg; fB; Cg; fC; Eg): D • Otherwise it is directed: B A C vertices = fA; B; C; D; Eg E edges = ((A; B); (A; C); (B; C); (C; E)): D http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 7 Vertex and edge ids • Vertices are always numbered from 0. • Numbering is continual, form 0 to n−1. Vertex and edge ids • Vertices are always numbered from 0. • Numbering is continual, form 0 to n−1. • We have to translate vertex names to ids: V = fA; B; C; D; Eg E = ((A; B); (A; C); (B; C); (C; E)): A = 0;B = 1;C = 2;D = 3;E = 4: Vertex and edge ids • Vertices are always numbered from 0. • Numbering is continual, form 0 to n−1. • We have to translate vertex names to ids: V = fA; B; C; D; Eg E = ((A; B); (A; C); (B; C); (C; E)): A = 0;B = 1;C = 2;D = 3;E = 4: 1 library(igraph) 2 g <- graph( c(0,1, 0,2, 1,2, 2,4), n=5 ) 3 g 4 g2 <- graph( c(0,1, 0,2, 1,2, 2,4), 5 n=5, dir=FALSE ) 6 g2 http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 8 Working with igraph graphs 1 ## How to decide what kind of object a variable refers to 2 class(g2) 3 class(1) 4 class("foobar") 5 6 ## Is this object an igraph graph? 7 is.igraph(g) 8 is.igraph(1:10) 9 10 ## Summary, number of vertices, edges 11 summary(g) 12 vcount(g) 13 ecount(g) 14 15 ## Is the graph directed? 16 is.directed(g) 17 is.directed(g2) 18 is.directed(1:10) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 9 Working with igraph graphs 1 ## Convert from directed to undirected 2 as.undirected(g) 3 4 ## And back 5 as.directed(as.undirected(g)) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 10 The igraph data model, multiple edges • igraph can handle multi-graphs: V = fA; B; C; D; Eg E = ((AB); (AB); (AC); (BC); (CE)): 1 g <- graph( c(0,1,0,1, 0,2, 1,2, 3,4), n=5 ) 2 g http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 11 The igraph data model, loop edges • igraph can handle loop-edges: V = fA; B; C; D; Eg E = ((AA); (AB); (AC); (BC); (CE)): 1 g <- graph( c(0,0,0,1, 0,2, 1,2, 3,4), n=5 ) 2 g http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 12 The igraph data model, what cannot be represented • \Mixed"graphs, with undirected and directed edges. • Hypergraphs. • No direct support for bipartite (two-mode) graphs. http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 13 Naming vertices 1 g <- graph.ring(10) 2 V(g)$name <- letters[1:10] 3 V(g)$name 4 g 5 print(g, v=T) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 14 Creating graphs, the formula interface 1 # A simple undirected graph 2 g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) Creating graphs, the formula interface 1 # A simple undirected graph 2 g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) 1 # Another undirected graph, ":" notation 2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) Creating graphs, the formula interface 1 # A simple undirected graph 2 g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) 1 # Another undirected graph, ":" notation 2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) 1 # A directed graph 2 g3 <- graph.formula( Alice +-+ Bob --+ Cecil 3 +-- Daniel, Eugene --+ Gordon:Helen ) Creating graphs, the formula interface 1 # A simple undirected graph 2 g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) 1 # Another undirected graph, ":" notation 2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) 1 # A directed graph 2 g3 <- graph.formula( Alice +-+ Bob --+ Cecil 3 +-- Daniel, Eugene --+ Gordon:Helen ) 1 # A graph with isolate vertices 2 g4 <- graph.formula( Alice -- Bob -- Daniel, 3 Cecil:Gordon, Helen ) Creating graphs, the formula interface 1 # A simple undirected graph 2 g <- graph.formula( Alice-Bob-Cecil-Alice, 3 Daniel-Cecil-Eugene, Cecil-Gordon ) 1 # Another undirected graph, ":" notation 2 g2 <- graph.formula( Alice-Bob:Cecil:Daniel, 3 Cecil:Daniel-Eugene:Gordon ) 1 # A directed graph 2 g3 <- graph.formula( Alice +-+ Bob --+ Cecil 3 +-- Daniel, Eugene --+ Gordon:Helen ) 1 # A graph with isolate vertices 2 g4 <- graph.formula( Alice -- Bob -- Daniel, 3 Cecil:Gordon, Helen ) 1 # "Arrows" can be arbitrarily long 2 g5 <- graph.formula( Alice +---------+ Bob ) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 15 Creating graphs, from edge lists and adjacency matrices 1 ## From edge lists 2 el <- cbind( c(0, 0, 1, 2), 3 c(1, 2, 2, 4) ) 4 g <- graph.edgelist(el) 5 g 6 7 ## Symbolic edge lists 8 el <- cbind( c("Alice", "Alice", "Bob", "Cecil"), 9 c("Bob", "Cecil", "Cecil", "Ed") ) 10 g <- graph.edgelist(el) 11 g 12 summary(g) 13 14 ## Adjacency matrices 15 A <- matrix(sample(0:1, 100, rep=TRUE), 10, 10) 16 g <- graph.adjacency(A) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 16 Creating graphs, from data frames 1 source("http://cneurocvs.rmki.kfki.hu/igraph/plus.R") 2 vertices <- read.csv("judicial.csv") 3 edges <- read.table("allcites.txt") 4 jg <- graph.data.frame(edges, vertices=vertices, dir=TRUE) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 17 Visualizing graphs • plot Uses traditional R graphics, non-interactive, 2d. Publication quality plots in all formats R supports. Visualizing graphs • plot Uses traditional R graphics, non-interactive, 2d. Publication quality plots in all formats R supports. 1 g <- barabasi.game(100, m=1) 2 g <- simplify(g) 3 igraph.par("plot.layout", 4 layout.fruchterman.reingold) 5 plot(g, vertex.size=3, vertex.label=NA, 6 edge.arrow.size=0.6) http://cneurocvs.rmki.kfki.hu/igraph/NIPS2008.html 18 Visualizing graphs • tkplot Uses Tcl/Tk via the tcltk package, interactive, 2d.