Graph Patterns and Visualization

Contents

Graph cores...... 1 Dyads and triads...... 1 Motifs...... 2 Visualisation...... 5 Mixing patterns...... 21 ...... 21 Assortativity measures...... 22 Basic network analysis pipeline...... 23

Graph cores

A k-core is the largest subgraph such that each is connected to at least k others in subset. Every vertex in k-core has a ki ≥ k. (k + 1)-core is always subgraph of k-core. The core number of a vertex is the highest order of a core that contains this vertex.

Dyads and triads

Dyad is a pair of vertices and possible relational ties between them: * mutual

• asymmetric

• null (non-existent)

Triad is a subgraph of three vertices and possible ties between them. Triad census :16 isomorphism classes. D - down, U - up, T - transitive, C - cyclic. Mutual dyads | assymetric dyads | null dyads.

1 Motifs

Motifs are often defined as recurrent and statistically significant sub-graphs or patterns. Here, we consider motifs as sub-graphs of a given graph, which are isomorphic to defined sample. Motifs are not induced subgraphs, i.e. they do not contain all the graph edges between selected vertices. Motifs appear in a network more frequently than in a comparable random network

• calculate the number of occurrences of a sub graph • evaluate the significance

For G0 subgraph (motif candidate) of G,

0 0 0 FG(G ) − µR(G ) Zscore(G ) = 0 σR(G ) R - , µ - mean frequency, σ -standard deviation.

2 3 4 Let’s define a simple sample motif and calculate, if it is often in our graph:

#install.packages("rgl") library('igraph') sample1 = graph(c(c(1,2), c(2,3))) plot(sample1) isoclass_num = graph.isoclass(sample1) # defining number of corresponding isoclass motifs3[isoclass_num] # returns number of motifs #Here is our old friend: g = graph.famous("Zachary") motifs3 = graph.motifs(g, size =3) #Due to high computational time (isomorphism checks), `graph.motifs` are #implemented for graps of sizes 3 and 4 only. However, we can easy check numbers #for all motifs of size 4 to find more frequent patterns: motifs4 = graph.motifs(g, size =4) #The most frequent pattern is fifth. Let's draw it. plot(graph.isocreate(g,size=4, number=5)) #Not what we're looking for.. Second most frequent (seventh): plot(graph.isocreate(g,size=3, number=7))

That is certainly better.

Visualisation

There are currently three different functions in the igraph package which can draw graph in various ways:

• plot.igraph does simple non-interactive 2D plotting to R devices. Actually it is an implementation of the plot generic function, so you can write plot(graph) instead of plot.igraph(graph). As it used the standard R devices it supports every output format for which R has an output device. The list is quite impressing: PostScript, PDF files, XFig files, SVG files, JPG, PNG and of course you can plot to the screen as well using the default devices, or the good-looking anti-aliased Cairo device. See plot.igraph for some more information.

• tkplot does interactive 2D plotting using the tcltk package. It can only handle graphs of moderate size, a thousend vertices is probably already too many. Some parameters of the plotted graph can be

5 changed interactively after issuing the tkplot command: the position, color and size of the vertices and the color and width of the edges. See tkplot for details.

• rglplot is an experimental function to draw graphs in 3D using OpenGL. See rglplot for some more information.

Let’s draw a graph-ring using different three methods. library(igraph) library(rgl) g <- graph.ring(10) g$layout <- layout.circle plot(g)

6 4 3

5 2

6 1

7 10

8 9

#doesn't work in R Markdown tkplot(g)

## Loading required package: tcltk

## [1] 1 rglplot(g)

Layout

Either a function or a numeric matrix. It specifies how the vertices will be placed on the plot. Let’s demonstrate how it works on some graph. For example graph which based on barabashi model. g <- barabasi.game(50)

7 • layout.auto - tries to choose an appropriate layout function for the supplied graph, and uses that to generate the layout. plot(g, layout=layout.auto, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

8 47 26 44 35 28 21 27 6 22 14 25 39 50 2 11 1945 7 12 13 43 10 245 4620 4 9 29 1 38 34 42 36 3 174933 37 3123 30 8 41 48 32 15 16 18 40

• layout.random - simply places the vertices randomly on a square. plot(g, layout=layout.random, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

9 49 43 1 26 41 6 4 48 36 28 50 11 29 178 3 30 1415 2 23 2712 42 46 4721 945 38 22 35 24 4434 3237 39 40 10 187 33 1331 19 5 25 16 20

• layout.circle - places the vertices on a unit circle equidistantly. plot(g, layout=layout.circle, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

10 15141312 1716 1110 18 9 19 8 20 7 21 6 22 5 23 4 24 3 25 2 26 1 27 50 28 49 29 48 30 47 31 46 32 45 33 44 3435 4243 363738394041

• layout.sphere - places the vertices (approximately) uniformly on the surface of a sphere, this is thus a 3d layout. plot(g, layout=layout.sphere, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

11 2032 1931 2133 9 42 18 43 8 30 10 41 2234 2 48 7 17 44 49 29 40 11 3 501 2335 47 6 16 45 28 12 4 39 46 5 2436 15 13 38 27 37 14 25 26

• layout.fruchterman.reingold uses a force-based algorithm proposed by Fruchterman and Reingold. plot(g, layout=layout.fruchterman.reingold, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

12 16 47 40 18 44 15 26 32 30 21 50 8 37 42 3 43 11 10 38 45 36 14 4 29 27 34 5 17 1 19 23 48 49 46 6 2 24 3320 35 9 22 25 31 28 7 41 12 39 13

• layout.kamada.kawai is another force based algorithm. plot(g, layout=layout.kamada.kawai, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

13 47 26 44 21 35 13 14 12 6 25 41 7 39 2 4531 32 18 22 17 38 28 5 9 46 8 15 40 2349 1 24 36 19 3320 16 48 3 29 37 4 30 42 10 34 11 27 50 43

• layout.spring is a spring embedder algorithm. plot(g, layout=layout.spring, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

14 30 6 28 32 36 41 13 21 37 44 3 14 33 2247 381920 4248 18 401534 429498 35 26 39 4524469 1625 517 1 11 2 7 43 31 10 23 5027

12

• layout.fruchterman.reingold.grid is similar to layout.fruchterman.reingold but repelling force is calculated only between vertices that are closer to each other than a limit, so it is faster. plot(g, layout=layout.fruchterman.reingold.grid, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

15 13 47 39 35 12 28 44 7 6 22 21 48 26 14 2 37 25 3823 36 32 20 5 30 40 8 491 33 3 15 45 17 46 9 24 19 16 18 4 31 29 34 10 42 41 50 11 43 27

• layout.lgl is for large connected graphs, it is similar to the layout generator of the Large Graph Layout software http://lgl.sourceforge.net/. plot(g, layout=layout.lgl, vertex.size=4, vertex.label.dist=0.5, vertex.color="red", edge.arrow.size=0.5)

16 13 12 39 43 7 34 47 44 35 25 27 6 11 21 14 42 10 2 50 26 37 38 4 22 28 36 29 49 24 32 45 31 48 18 19 1 23 17 15 8 20 46 41 33 16 3 5 40 9 30

• layout.graphopt is a port of the graphopt layout algorithm by Michael Schmuhl. graphopt version 0.4.1 was rewritten in C and the support for layers was removed (might be added later) and a code was a bit reorganized to avoid some unneccessary steps is the node charge (see below) is zero.

• layout.svd is a currently experimental layout function based on singular value decomposition. • layout.norm normalizes a layout, it linearly transforms each coordinate separately to fit into the given limits.

• layout.drl is another force-driven layout generator, it is suitable for quite large graphs. • layout.reingold.tilford generates a tree-like layout, so it is mainly for tree.

Highlight components

Let’s make different color for each graph g <- erdos.renyi.game(100,1/100) l <- layout.fruchterman.reingold(g) op = par(mfrow = c(1,2)) plot(g, layout=l, vertex.size=5, vertex.label=NA) comps <- clusters(g)$membership colbar <- rainbow(max(comps)+1)

17 V(g)$color <- colbar[comps+1] plot(g, layout=l, vertex.size=5, vertex.label=NA)

Highlight communities in graph

Let’s make different color for each community g <- graph.full(5) %du% graph.full(5) %du% graph.full(5) g <- add.edges(g, c(1,6,1,11,6,11)) op = par(mfrow = c(1,2)) plot(g, layout = layout.kamada.kawai) com <- spinglass.community(g, spins=5) V(g)$color <- com$membership+1 g <- set.graph.attribute(g, "layout", layout.kamada.kawai(g)) plot(g, vertex.label.dist=1.5)

18 3 2 3 7 9 4 2 12 5 10 14 1 5 1 6 4 11 15 8 13 11 6 14 8 15 7 9 12 13 10

par(op)

Trees Visualization plot(graph.tree(50,2))

19 41 40 4544 43 33 20 32 22 21 42 34 16 10 46 11 23 35 17 8 5 47 4 2 38 19 9 39 1 18 48 37 3 24 36 6 12 49 7 13 25 14 50 28 15 2627 29 30 31

We can use layout = layout.reingold.tilford to draw tree

plot(graph.tree(50,2), vertex.size=3, vertex.label=NA, layout=layout.reingold.tilford)

20 tkplot(graph.tree(50,2, mode="undirected"), vertex.size=10, vertex.color="green")

## [1] 2

Mixing patterns

• Assortative mixing, “like links with like”, attributed of connected nodes tend to be more similar than if there were no such edge • Disassortative mixing, “like links with dislike”, attributed of connected nodes tend to be less similar than if there were no such edge Examples: • assortative mixing - in social networks political beliefs, obesity, race • disassortative mixing - dating network, food web (predator/prey), economic networks (produc- ers/consumers)

Assortative Mixing

Assortative Mixing coefficient shows whether nodes with the same attribute values tend to form connections. Download Caltech friendship network. Inspect nodes attributes and compute assortativity coefficients with assortativity function

Assortative mixing by node degree, xi ← ki − 1:

21   P kikj ij Aij − 2m kikj r =   P kikj ij kiδij − 2m kikj

Political polarization on Twitter: political retweet network, red color - “right-learning” users, blue color - “left learning” users.

Assortativity measures

Discrete mixing by categorical attribute (ci -label: color, gender, ethnicity). How much more often do attributes match across edges than expected at random? Assortativity coefficient:

  P kikj Q ij Aij − 2m δ(ci, cj) C = = k k Qmax P i j 2m − ij 2m δ(ci, cj)

22 Mixing by scalar properties, scalar value attribute (age, income, number of friends). Correlation of values across edges. Assortativity coefficient:

  P kikj cov ij Aij − 2m xixj r = = var P kikj ij(kiδij − 2m )xixj # Your code here g <- read.graph(file = 'Caltech.gml', format = 'gml') assortativity.nominal(graph = g, types = V(g)$dorm+1, directed =F)

## [1] 0.3491531

Basic network analysis pipeline

The basic pipeline for exploratory graph analysis consists of:

Loading

See Class 1.

Cleaning

After loading graph with all necessary attributes, it is often recommended to:

• delete empty nodes:

g = delete.vertices(g, degree(g) ==0)

• delete self-cycles and multiple edges - to make graph simple. simplify does all the work; some parameters could be tuned:

simplify(g)

## IGRAPH U--- 769 16656 -- ## + attr: id (v/n), status (v/n), gender (v/n), magor (v/n), ## sndmagor (v/n), dorm (v/n), year (v/n), school (v/n)

# is.simple(simplify(g, remove.multiple=FALSE))

Obtaining main characteristics

See Classes 2-4.

Clustering

See Class 5.

Visualization

See previous section + Classes 1, 4.

23