Quick viewing(Text Mode)

Network Science

Network Science

Class 2: (Ch2)

Albert-László Barabási with Emma K. Towlson. Sebastian Ruf, Michael Danziger, and Louis Shekhtman

www.BarabasiLab.com Section 1

The Bridges of Konigsberg THE BRIDGES OF KONIGSBERG

Can one walk across the seven bridges and never cross the same bridge twice?

Network Science: Graph Theory THE BRIDGES OF KONIGSBERG

Can one walk across the seven bridges and never cross the same bridge twice?

1735: Euler’s theorem:

a) If a graph has more than two nodes of odd , there is no .

b) If a graph is connected and has no odd degree nodes, it has at least one path. Network Science: Graph Theory Section 2

Networks and graphs COMPONENTS OF A COMPLEX SYSTEM

. components: nodes, vertices N

. interactions: links, edges L

. system: network, graph (N,L)

Network Science: Graph Theory NETWORKS OR GRAPHS?

network often refers to real systems • www, • • metabolic network.

Language: (Network, node, link) graph: mathematical representation of a network • web graph, • social graph (a Facebook term)

Language: (Graph, , edge) We will try to make this distinction whenever it is appropriate, but in most cases we will use the two terms interchangeably.

Network Science: Graph Theory A COMMON LANGUAGE

N=4 L=4

Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION

The choice of the proper network representation determines our ability to use successfully.

In some cases there is a unique, unambiguous representation. In other cases, the representation is by no means unique.

For example, the way we assign the links between a group of individuals will determine the nature of the question we can study.

Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION

If you connect individuals that work with each other, you will explore the professional network.

Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION

If you connect those that have a romantic and sexual relationship, you will be exploring the sexual networks.

Network Science: Graph Theory CHOOSING A PROPER REPRESENTATION

If you connect individuals based on their first name (all Peters connected to each other), you will be exploring what?

It is a network, nevertheless.

Network Science: Graph Theory Section 2.3

Degree, Average Degree and NODE DEGREES

d Node degree: the number of links connected to the node.

e A t c e r

i kA =1 kB = 4

d B n U

In directed networks we can define an in-degree and out-degree. B D d C e The (total) degree is the sum of in- and out-degree. t c

e in out

r E kC  2 kC  1 kC  3

i G A D F

Source: a node with kin= 0; Sink: a node with kout= 0. A BIT OF STATISTICS

Network Science: Graph Theory AVERAGE DEGREE d N e 1 2L t k  ki c j  k º N i1 e N r i i d n N – the number of nodes in the graph U

D 1 N 1 N B k in  k in , k out  k out , k in  k out d C  i  i N i1 N i1 e t c

e E r i A L D k º F N

Network Science: Graph Theory

Average Degree

Network Science: Graph Theory DEGREE DISTRIBUTION

Degree distribution P(k): probability that a randomly chosen node has degree k

Nk = # nodes with degree k

P(k) = Nk / N ➔ plot DEGREE DISTRIBUTION DEGREE DISTRIBUTION

Discrete Representation: pk is the probability that a node has degree k.

Continuum Description: p(k) is the pdf of the degrees, where

k2 ò p(k)dk k1

represents the probability that a node’s degree is between k1 and k2.

Normalization condition: ¥ ¥

å pk =1 ò p(k)dk =1 0 K min

where Kmin is the minimal degree in the network.

Network Science: Graph Theory UNDIRECTED VS. DIRECTED NETWORKS Undirected Directed

Links: undirected (symmetrical) Links: directed (arcs).

Graph: Digraph = : L A D F M B An undirected C link is the I superposition of D two opposite directed links. B G E G H A C F Undirected links : Directed links : coauthorship links URLs on the www Actor network phone calls protein interactions metabolic reactions

Network Science: Graph Theory Section 2.2 Reference Networks Question 4

Q4: Adjacency Matrices Section 2.4

Adjacency matrix ADJACENCY MATRIX

4 4

3 3 2 2 1 1

Aij=1 if there is a link between node i and j

Aij=0 if nodes i and j are not connected to each other.

Note that for a directed graph (right) the matrix is not symmetric. if there is a link pointing from node j and i

Network Science: Graph Theory if there is no link pointing from j to i. ADJACENCY MATRIX AND NODE DEGREES æ 0 1 0 1ö N 4 ki = åAij d ç ÷ 1 0 0 1 j =1 e ç ÷ t A = ij N c ç 0 0 0 1÷

e 3 ç ÷ k j = å Aij r 2 i è 1 1 1 0ø i=1 1 d

n Aij = A ji 1 N 1 N U L = åki = å Aij A 0 2 2 ii = i=1 ij

æ 0 0 0 0 ö ç ÷ d 4 1 0 0 1 e A = ç ÷ t ij ç ÷ N c 0 0 0 1 out

e ç ÷ k j = åAij r è 1 0 0 0 ø

i 3 2 i=1

D N N N 1 in out L = åki = åk j = å Aij Aij ¹ A ji i=1 j=1 i, j Aii = 0

ADJACENCY MATRIX

a e a b c d e f g h a 0 1 0 0 1 0 1 0 b 1 0 1 0 0 0 0 1 c 0 1 0 1 0 1 1 0 d 0 0 1 0 1 0 0 0 h b d e 1 0 0 1 0 0 0 0 f 0 0 1 0 0 0 1 0 g 1 0 1 0 0 0 0 0 f h 0 1 0 0 0 0 0 0

g c

Network Science: Graph Theory ADJACENCY MATRICES ARE SPARSE

Network Science: Graph Theory Section 4

Real networks are sparse

The maximum number of links a network æ Nö N(N -1) of N nodes can have is: Lmax = ç ÷ = è 2 ø 2

A graph with degree L=Lmax is called a complete graph, and its average degree is =N-1

Network Science: Graph Theory REAL NETWORKS ARE SPARSE

Most networks observed in real systems are sparse:

L << Lmax or <

6 12 WWW (ND Sample): N=325,729; L=1.4 10 Lmax=10 =4.51 7 Protein (S. Cerevisiae): N= 1,870; L=4,470 Lmax=10 =2.39 5 10 Coauthorship (Math): N= 70,975; L=2 10 Lmax=3 10 =3.9 6 13 Movie Actors: N=212,250; L=6 10 Lmax=1.8 10 =28.78

(Source: Albert, Barabasi, RMP2002) Network Science: Graph Theory ADJACENCY MATRICES ARE SPARSE

Network Science: Graph Theory METCALFE’S LAW

The maximum number of links a network æ Nö N(N -1) of N nodes can have is: Lmax = ç ÷ = è 2 ø 2

Network Science: Graph Theory

Section 2.6

WEIGHTED AND UNWEIGHTED NETWORKS WEIGHTED AND UNWEIGHTED NETWORKS METCALFE’S LAW

The maximum number of links a network æ Nö N(N -1) of N nodes can have is: Lmax = ç ÷ = è 2 ø 2

Network Science: Graph Theory

Section 2.7

BIPARTITE NETWORKS BIPARTITE GRAPHS bipartite graph (or bigraph) is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets.

Examples:

Hollywood actor network Collaboration networks Disease network (diseasome)

Network Science: Graph Theory GENE NETWORK – DISEASE NETWORK

DISEASOME PHENOME

GENOME

Gene network Disease network

Goh, Cusick, Valle, Childs, Vidal & Barabási, PNAS (2007) Network Science: Graph Theory HUMAN DISEASE NETWORK Ingredient-Flavor Bipartite Network

Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási Flavor network and the principles Network Science: Graph Theory of food pairing , Scientific Reports 196, (2011).

Section 2.8

PATHOLOGY PATHS

A path is a sequence of nodes in which each node is adjacent to the next one

Pi0,in of length n between nodes i0 and in is an ordered collection of n+1 nodes and n links

Pn = {i0,i1,i2,...,in} Pn = {(i0 ,i1),(i1,i2 ),(i2 ,i3 ),...,(in-1,in )}

• In a directed network, the path can follow only the direction of an arrow.

Network Science: Graph Theory DISTANCE IN A GRAPH Shortest Path, Path

B The distance (shortest path, geodesic path) between two

A nodes is defined as the number of edges along the shortest path connecting them.

C D *If the two nodes are disconnected, the distance is infinity.

B In directed graphs each path needs to follow the direction of A the arrows. Thus in a digraph the distance from node A to B (on an AB path) is generally different from the distance from node B to A C D (on a BA path).

Network Science: Graph Theory NUMBER OF PATHS BETWEEN TWO NODES Adjacency Matrix

Nij, number of paths between any two nodes i and j:

Length n=1: If there is a link between i and j, then Aij=1 and Aij=0 otherwise.

Length n=2: If there is a path of length two between i and j, then AikAkj=1, and AikAkj=0 otherwise. The number of paths of length 2: N N (2) = A A = [A2 ] ij å ik kj ij k =1

Length n: In general, if there is a path of length n between i and j, then Aik…Alj=1

and Aik…Alj=0 otherwise. The number of paths of length n between i and j is* N (n) = [An ] ij ij

*holds for both directed and undirected networks.

Network Science: Graph Theory FINDING DISTANCES: BREADTH FIRST SEARCH

Distance between node 0 and node 4:

1. Start at 0.

3 4 3 2 4 3 2 1 01 1 3 4

2 3 3 4 1 4

4 3 4 4 2 2

3

Network Science: Graph Theory FINDING DISTANCES: BREADTH FIRST SEARCH

Distance between node 0 and node 4: 1. Start at 0. 2. Find the nodes adjacent to 1. Mark them as at distance 1. Put them in a queue.

3 4 3 2 4 3 2 1 10 1 3 4

2 3 3 4 1 4

4 3 4 4 2 2

3

Network Science: Graph Theory FINDING DISTANCES: BREADTH FIRST SEARCH

Distance between node 0 and node 4: 1. Start at 0. 2. Find the nodes adjacent to 0. Mark them as at distance 1. Put them in a queue. 3. Take the first node out of the queue. Find the unmarked nodes adjacent to it in the graph. Mark them with the label of 2. Put them in the queue.

3 4 3 2 4 3 2 1 01 1 3 4

2 3 3 4 1 4

4 3 4 4 2 2

3

Network Science: Graph Theory FINDING DISTANCES: BREADTH FIRST SEARCH

Distance between node 0 and node 4:

1. Repeat until you find node 4 or there are no more nodes in the queue. 2. The distance between 0 and 4 is the label of 4 or, if 4 does not have a label, infinity.

3 4 3 2 4 3 2 1 0 1 3 4

2 3 3 4 1 4

4 3 4 4 2 2

3

Network Science: Graph Theory NETWORK DIAMETER AND AVERAGE DISTANCE

Diameter: dmax the maximum distance between any pair of nodes in the graph.

Average path length/distance, , for a connected graph:

where d is the distance from node i to node j 1 ij d º ådij 2 L max i, j ¹i

In an undirected graph dij =dji , so we only need to count them once:

1 d º ådij Lmax i, j>i

Network Science: Graph Theory

PATHOLOGY: summary

Shortest Path 1

2 5

3 4

A path with the shortest length between two nodes (distance).

Network Science: Graph Theory PATHOLOGY: summary

Diameter Average Path Length

1 1

2 5 2 5

3 4 3 4

The length of a longest The average length of the shortest shortest path in a graph paths for all pairs of nodes. Network Science: Graph Theory PATHOLOGY: summary

Cycle Self-avoiding Path 1 1

2 5 2 5

3 4 3 4

A path with the same start A path that does not intersect and end node. itself.

Network Science: Graph Theory PATHOLOGY: summary

Eulerian Path 1 1

2 5 2 5

3 4 3 4

A path that traverses each A path that visits each link exactly once. node exactly once.

Network Science: Graph Theory Section 2.9

CONNECTEDNESS CONNECTIVITY OF UNDIRECTED GRAPHS

Connected (undirected) graph: any two vertices can be joined by a path. A disconnected graph is made up by two or more connected components.

B B A A Largest : Giant Component

C D F C D F F F The rest: Isolates G G

Bridge: if we erase it, the graph becomes disconnected.

Network Science: Graph Theory CONNECTIVITY OF UNDIRECTED GRAPHS Adjacency Matrix

The adjacency matrix of a network with several components can be written in a block- diagonal form, so that nonzero elements are confined to squares, with all other elements being zero:

Network Science: Graph Theory CONNECTIVITY OF DIRECTED GRAPHS

Strongly connected directed graph: has a path from each node to every other node and vice versa (e.g. AB path and BA path). Weakly connected directed graph: it is connected if we disregard the edge directions.

Strongly connected components can be identified, but not every node is part of a nontrivial strongly connected component.

B E A B F

A

D E C D C G F G In-component: nodes that can reach the scc, Out-component: nodes that can be reached from the scc. Network Science: Graph Theory Section 2.9 Section 10

Clustering coefficient CLUSTERING COEFFICIENT

Clustering coefficient: what fraction of your neighbors are connected?

Node i with degree ki

Ci in [0,1]

Watts & Strogatz, Nature 1998. Network Science: Graph Theory CLUSTERING COEFFICIENT

Clustering coefficient: what fraction of your neighbors are connected?

Node i with degree ki

Ci in [0,1]

Watts & Strogatz, Nature 1998. Network Science: Graph Theory Section 11

summary THREE CENTRAL QUANTITIES IN NETWORK SCIENCE

Degree distribution: P(k)

Path length:

Clustering coefficient:

Network Science: Graph Theory GRAPHOLOGY 1 Undirected Directed 4 4 1 1

2 2 3 3 æ 0 1 1 0ö æ 0 1 0 0ö ç ÷ ç ÷ 1 0 1 1 0 0 1 1 A = ç ÷ A = ç ÷ ij ç 1 1 0 0÷ ij ç 1 0 0 0÷ ç ÷ ç ÷ è 0 1 0 0ø è 0 0 0 0ø A = 0 A = A ii ij ji A = 0 A ¹ A N ii ij ji 1 2L N L L = å Aij < k >= L = A < k >= 2 N å ij i, j=1 i, j=1 N

Actor network, protein-protein interactions WWW, citation networks Network Science: Graph Theory

GRAPHOLOGY 2 Unweighted Weighted (undirected) 4 (undirected) 4 1 1

2 2 3 3

æ 0 1 1 0ö æ 0 2 0.5 0ö ç ÷ ç ÷ 1 0 1 1 2 0 1 4 A = ç ÷ A = ç ÷ ij ç 1 1 0 0÷ ij ç 0.5 1 0 0÷ ç ÷ ç ÷ è 0 1 0 0ø è 0 4 0 0ø

Aii = 0 Aij = A ji Aii = 0 Aij = A ji 1 N 2L 1 N 2L L = A < k >= L = nonzero(A ) < k >= å ij å ij 2 i, j=1 N 2 i, j=1 N

protein-protein interactions, www Call Graph, metabolic networks Network Science: Graph Theory

GRAPHOLOGY 3 Self-interactions Multigraph 4 (undirected) 4 1 1

2 2 3 3 æ 1 1 1 0ö æ 0 2 1 0ö ç ÷ ç ÷ 1 0 1 1 2 0 1 3 A = ç ÷ A = ç ÷ ij ç 1 1 0 0÷ ij ç 1 1 0 0÷ ç ÷ ç ÷ è 0 1 0 1ø è 0 3 0 0ø

Aii ¹ 0 Aij = A ji Aii = 0 Aij = A ji 1 N N 1 N 2L L = å Aij + å Aii ? L = ånonzero(Aij ) < k >= 2 i, j=1,i¹ j i=1 2 i, j=1 N

Protein interaction network, www Social networks, collaboration networks Network Science: Graph Theory

GRAPHOLOGY 4 Complete Graph (undirected) 4 1

2 3

æ 0 1 1 1ö ç ÷ 1 0 1 1 A = ç ÷ ij ç 1 1 0 1÷ ç ÷ è 1 1 1 0ø

Aii = 0 Ai¹ j =1 N(N -1) L = L = < k >= N -1 max 2

Network Science: Graph Theory

GRAPHOLOGY: Real networks can have multiple characteristics

WWW > directed multigraph with self-interactions

Protein Interactions > undirected unweighted with self-interactions

Collaboration network > undirected multigraph or weighted

Mobile phone calls > directed, weighted

Facebook Friendship links > undirected, unweighted

Network Science: Graph Theory THREE CENTRAL QUANTITIES IN NETWORK SCIENCE

A. Degree distribution: pk B. Path length: C. Clustering coefficient:

Network Science: Graph Theory GENOME

protein-gene interactions

PROTEOME

protein-protein interactions

METABOLISM

Bio-chemical reactions

Citrate Cycle Metabolic Network Protein Interactions A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

Network Science: Graph Theory A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

Undirected network N=2,018 proteins as nodes L=2,930 binding interactions as links. Average degree =2.90.

Not connected: 185 components the largest (giant component) 1,647 nodes

Network Science: Graph Theory A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

pk is the probability that a node has degree k.

Nk = # nodes with degree k

pk = Nk / N

Network Science: Graph Theory A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

dmax=14

=5.61

Network Science: Graph Theory A CASE STUDY: PROTEIN-PROTEIN INTERACTION NETWORK

=0.12

Network Science: Graph Theory