NETWORK SCIENCE
Graphs and Networks
Prof. Marcello Pelillo Ca’ Foscari University of Venice
a.y. 2016/17 Section 1
The Bridges of Konigsberg Drawing Curves with a Single Stroke… Königsberg (today’s Kaliningrad, Russia) Konigsberg’s People
Immanuel Kant (1724 – 1804)
Gustav Kirchhoff (1824 – 1887)
David Hilbert (1862 – 1943) THE BRIDGES OF KONIGSBERG
Can one walk across the seven bridges and never cross the same bridge twice?
Network Science: Graph Theory
h p://www.numericana.com/answer/graphs.htm THE BRIDGES OF KONIGSBERG
Can one walk across the seven bridges and never cross the same bridge twice?
1735: Euler’s theorem:
(a) If a graph has more than two nodes of odd degree, there is no path.
(b) If a graph is connected and has no odd degree nodes, or two such vertices, it has at least one path.
Euler’s solution is considered to be the first theorem in graph theory.
Network Science: Graph Theory
h p://www.numericana.com/answer/graphs.htm The Bridges Today A “Local” Variation of Euler’s Problem Graphs and networks after the “bridges”
• Laws of electrical circuitry (G. Kirchhoff, 1845) • Molecular structure in chemistry (A. Cayley, 1874) • Network representa on of social interac ons (J. Moreno, 1930) • Power grids (1910) • Telecommunica ons and the Internet (1960) • Google (1997), Facebook (2004), Twi er (2006), . . . Section 2
Networks and graphs COMPONENTS OF A COMPLEX SYSTEM
§ components: nodes, vertices N
§ interactions: links, edges L
§ system: network, graph (N,L)
Network Science: Graph Theory NETWORKS OR GRAPHS?
network often refers to real systems • www, • social network • metabolic network.
Language: (Network, node, link)
graph: mathematical representation of a network • web graph, • social graph (a Facebook term)
Language: (Graph, vertex, edge)
We will try to make this distinction whenever it is appropriate, but in most cases we will use the two terms interchangeably.
Network Science: Graph Theory A COMMON LANGUAGE
N=4 L=4
Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION
The choice of the proper network representation determines our ability to use network theory successfully.
In some cases there is a unique, unambiguous representation. In other cases, the representation is by no means unique.
For example, the way we assign the links between a group of individuals will determine the nature of the question we can study.
Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION
If you connect individuals that work with each other, you will explore the professional network.
Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION
If you connect those that have a romantic and sexual relationship, you will be exploring the sexual networks.
Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION
If you connect individuals based on their first name (all Peters connected to each other), you will be exploring what?
It is a network, nevertheless.
Network Science: Graph Theory UNDIRECTED VS. DIRECTED NETWORKS
Undirected Directed
Links: undirected (symmetrical) Links: directed (arcs).
Graph: Digraph = directed graph:
L A D M B F An undirected C link is the I superposition of D two opposite directed links. G B E G H A C F Undirected links : Directed links : coauthorship links URLs on the www Actor network phone calls protein interactions metabolic reactions
Network Science: Graph Theory Section 2.2 Reference Networks
NETWORK NODES LINKS DIRECTED N L k UNDIRECTED
Internet Routers Internet connections Undirected 192,244 609,066 6.33 WWW Webpages Links Directed 325,729 1,497,134 4.60 Power Grid Power plants, transformers Cables Undirected 4,941 6,594 2.67 Mobile Phone Calls Subscribers Calls Directed 36,595 91,826 2.51 Email Email addresses Emails Directed 57,194 103,731 1.81
Science Collaboration Scientists Co-authorship Undirected 23,133 93,439 8.08 Actor Network Actors Co-acting Undirected 702,388 29,397,908 83.71 Citation Network Paper Citations Directed 449,673 4,689,479 10.43 E. Coli Metabolism Metabolites Chemical reactions Directed 1,039 5,802 5.58 Protein Interactions Proteins Binding interactions Undirected 2,018 2,930 2.90 Section 2.3
Degree, Average Degree and Degree Distribution NODE DEGREES
Node degree: the number of links connected to the node. A
kA =1 kB = 4 B Undirected
In directed networks we can define an in-degree and out-degree. D B C The (total) degree is the sum of in- and out-degree.
in out E G kC = 2 kC = 1 kC = 3 A Directed F Source: a node with kin= 0; Sink: a node with kout= 0. SECTION 2.3 DEGREE, AVERAGE DEGREE, AND DEGREE DISTRIBUTION
SECTION 2.3 A key property of each node is its degree, representing the number of links it has to other nodes. The degree can represent the number of mobile phone contacts an individual has in the call graph (i.e. the number of dif- BOX 2.2 DEGREE, AVERAGEferent individuals the personDEGREE, has talked to), or the number of citations a BRIEF STATISTICS REVIEW research paper gets in the citation network. AND DEGREE DISTRIBUTION Four key quantities characterize Degree a sample of N values x , ... , x : th 1 N We denote with ki the degree of the i node in the network. For exam- ple, for the undirected networks shown in Figure 2.2 we have k =2, k =3, 1 2 Average (mean): k3=2, k4=1. In an undirected network the total number of links, L, can be expressed as the sum of the node degrees: xx++…+ x 1 N = 12 N = N x ∑ xi NN= 1 (2.1) i 1 Lk = ∑ i . 2 i=1 The nth moment: Here the 1/2 factor corrects for the fact that in the sum (2.1) each link is
A key property of each node is its degreecounted, representing twice. For the example, number the of linkA BIT connecting OF STATISTICS the nodes 2 and 4 in Figure nn n N n xx12++…+ x N 1 n links it has to other nodes. The degree can2.2 represent will be counted the number once of in mobile the degree of node 1 and once in the degree of x = = ∑ xi NNi=1 phone contacts an individual has in the nodecall graph 4. (i.e. the number of dif- BOX 2.2 ferent individuals the person has talked to), or the number of citations a BRIEF STATISTICS REVIEW research paper gets in the citation network. Standard deviation: Average Degree An important property of a network is itsFour average key quantities degree (BOX characterize 2.2), which Degree N 1 2 for an undirected network is a sample of N values x1, ... , xN : σ = − We denote with k the degree of the ith node in the network. For exam- xi∑()xx i N = N i 1 ple, for the undirected networks shown in Figure 2.2 we have k =2, k =3, 12L 1 2 . (2.2) k ≡ ∑ ki = Average (mean): k =2, k =1. In an undirected network the total number of links, L, can be N i=1 N 3 4 Distribution of x: expressed as the sum of the node degrees: xx++…+ x 1 N In directed networks we distinguish between= 12 incoming degreeN = , k in, rep- N x ∑ixi 1 NNi=1 1 = (2.1) p = δ . Lk ∑resentingi . the number of links that point to node i, and outgoing degree, x ∑ xx, i 2 = N i i 1 k out, representing the number of links that point from node i to other i The nth moment: Here the 1/2 factor corrects for the fact that in the sum (2.1) each link is where p follows nodes. Finally, a node’s total degree, ki, is given by x counted twice. For example, the link connecting the nodes 2 and 4 in Figure nn n N in out xx++…+ x 1 n .= 12 N = n (2.3) 2.2 will be counted once in the degree of node 1 and once in the degree of kkii=+kxi ∑ xi NNi=1 node 4. For example, on the WWW the number of pages a given document points to represents its outgoing degree, Standardkout, and deviation the number: of docu- Average Degree ments that point to it represents its incoming degree, k . The total number An important property of a network is its average degree (BOX 2.2), which in Network Science: Graph Theory N for an undirected network is 1 2 σ xi= ∑()xx− GRAPH THEORY N = 8 N i 1 12L . (2.2) k ≡ ∑ ki = N i=1 N Distribution of x: In directed networks we distinguish between incoming degree, k in, rep- i 1 = δ . resenting the number of links that point to node i, and outgoing degree, px ∑ xx, i N i out ki , representing the number of links that point from node i to other where p follows nodes. Finally, a node’s total degree, ki, is given by x in out. (2.3) kkii=+ki For example, on the WWW the number of pages a given document points to represents its outgoing degree, kout, and the number of docu- ments that point to it represents its incoming degree, kin. The total number
GRAPH THEORY 8 AVERAGE DEGREE
1 N 2L k k j ≡ ∑ i k ≡ N i=1 N
i
N – the number of nodes in the graph Undirected
D 1 N 1 N B k in k in , k out k out , k in k out C ≡ ∑ i ≡ ∑ i = N i=1 N i=1
E A L Directed k ≡ F N
Network Science: Graph Theory Average Degree
NETWORK NODES LINKS DIRECTED N L k UNDIRECTED
Internet Routers Internet connections Undirected 192,244 609,066 6.33 WWW Webpages Links Directed 325,729 1,497,134 4.60 Power Grid Power plants, transformers Cables Undirected 4,941 6,594 2.67 Mobile Phone Calls Subscribers Calls Directed 36,595 91,826 2.51 Email Email addresses Emails Directed 57,194 103,731 1.81
Science Collaboration Scientists Co-authorship Undirected 23,133 93,439 8.08 Actor Network Actors Co-acting Undirected 702,388 29,397,908 83.71 Citation Network Paper Citations Directed 449,673 4,689,479 10.43 E. Coli Metabolism Metabolites Chemical reactions Directed 1,039 5,802 5.58 Protein Interactions Proteins Binding interactions Undirected 2,018 2,930 2.90
Network Science: Graph Theory DEGREE DISTRIBUTION
Degree distribution P(k): probability that a randomly chosen node has degree k
Nk = # nodes with degree k
P(k) = Nk / N ➔ plot The degree distribution has taken a central role in net- work theory following the discovery of scale-free networks (Barabási & Albert, 1999). Another reason for its impor- tance is that the calculation of most network properties re- quires us to know pk. For example, the average degree of a network can be written as ∞
kk = ∑ pk k=0 We will see in the coming chapters that the precise func- tional form of pk determines many network phenomena, from network robustness to the spread of viruses.
DEGREE DISTRIBUTION
Image 2.4a Image 2.4b Degree distribution.
The degree distribution is defined as the pk = Nk /N ratio, where Nk denotes In many real networks, the node degree can vary considerably. For exam- the number of k-degree nodes in a network. For the network in (a) we ple, as the degree distribution (a) indicates, the degrees of the proteins in have N = 4 and p1 = 1/4 (one of the four nodes has degree k1 = 1), p2 = the protein interaction network shown in (b) vary between k=0 (isolated
1/2 (two nodes have k3 = k4 = 2), and p3 = 1/4 (as k2 = 3). As we lack nodes nodes) and k=92, which is the degree of the largest node, called a hub. with degree k > 3, pk = 0 for any k > 3. Panel (b) shows the degree distri- There are also wide differences in the number of nodes with different bution of a one dimensional lattice. As each node has the same degree k = degrees: as (a) shows, almost half of the nodes have degree one (i.e.
2, the degree distribution is a Kronecker’s delta function pk = (k - 2). p1=0.48), while there is only one copy of the biggest node, hence p92 = 1/ N=0.0005. (c) The degree distribution is often shown on a so-called log-
log plot, in which we either plot log pk in function of log k, or, as we did in (c), we use logarithmic axes.
DEGREE, AVERAGE DEGREE, AND DEGREE DISTRIBUTION | 29 Section 2.4
Adjacency matrix ADJACENCY MATRIX
4 4
3 3 2 2 1 1
Aij=1 if there is a link between node i and j Aij=0 if nodes i and j are not connected to each other.
0101 0000 1001 1001 A = Aij = 0 1 ij 0 00011 0001 B 1110C B 1000C B C B C Note@ that for a directed graphA (right) the matrix is@ not symmetric. A
Aij =1if there is a link pointing from node j and i if there is no link pointing from j to i. Aij =0 Network Science: Graph Theory ADJACENCY MATRIX AND NODE DEGREES
⎛ 0 1 0 1⎞ N k = A 4 ⎜ ⎟ i ∑ ij 1 0 0 1 j =1 A = ⎜ ⎟ ij ⎜ 0 0 0 1⎟ N 3 k A 2 ⎜ ⎟ j = ∑ ij ⎝ 1 1 1 0⎠ i=1 1
Aij = A ji 1 N 1 N
Undirected L = k = A 2 ∑ i 2 ∑ ij Aii = 0 i=1 ij