NETWORK THEORY

Dr. Alioune Ngom, University of Windsor

Winter 2013

What is a Network?

A network is a mathematical structure composed of points connected by lines.

Network Theory <-> Graph Theory

Network <-> Graph

Nodes <-> Vertices (points)

Links <-> Edges (lines)

F. Harary, Graph Theory, Addison Wesley, Reading, MA, 1969.
Gross & Yellen, Handbook of Graph Theory, CRC Press, Boca Raton, FL, 2004.

A network can be built for any functional system: System vs. Parts = Networks vs. Nodes

Networks As Graphs

• Networks can be undirected or directed, depending on whether the interaction between two neighboring nodes proceeds in both directions or in only one of them, respectively.

[Figure: undirected and directed networks on nodes 1-6]

• The specificity of network nodes and links can be quantitatively characterized by weights.

[Figure: weighted networks, including an edge-weighted network]

Networks As Graphs - 2

A network can be connected (presented by a single component) or disconnected (presented by several disjoint components).

[Figure: a connected and a disconnected network]

Networks having no cycles are termed trees. The more cycles the network has, the more complex it is.

[Figure: trees and cyclic graphs]

Networks As Graphs - 3: Some Basic Types of Graphs

Paths

Stars

Cycles

Complete Graphs

Bipartite Graphs

Air Transportation Network

[Figure: a fragment of the air transportation network (Melburn, 2004)]

Biological Networks

A. Intra-Cellular Networks
• Protein interaction networks
• Metabolic networks
• Signaling networks
• Regulatory networks
• Composite networks
• Networks of modules, functional networks
• Disease networks
B. Inter-Cellular Networks
• Neural networks
C. Organ and Tissue Networks
D. Ecological Networks
E. Evolution Network

(L-A Barabasi)

[Figure: layers of cellular networks, with labels: miRNA regulation?, protein-gene interactions, protein-protein interactions, metabolism, bio-chemical reactions, citrate]

The Protein Network of Drosophila

[Figure: CuraGen Corporation, Science, 2003]

Metabolic Networks

[Figure source: ExPASy]

Apoptosis Pathway - 1

Apoptosis is a mechanism of controlled cell death, critically important in many biological processes.

[Figure: apoptosis pathway, with labels FAS-L, FAS-R, FADD, CASP8, CASP10, membrane, death protein, activator, Death-Inducing Signaling Complex (DISC), initiator caspases, executor caspases (CASP3, CASP6, CASP7), heterodimer DFF (DFF45/DFF40), cleavage of caspase substrates, start of DNA fragmentation]

D. Bonchev, L.B. Kier, C. Cheng, Lecture Series on Computer and Computational Sciences 6, 581-591 (2006).

Gene Regulation Networks

The Longevity Gene-Protein Network (LGPN)

[Figure: C. elegans; T. Witten, D. Bonchev]

Network of Interacting Pathways (NIP)

[Figure: 381 organisms; A. Mazurie, D. Bonchev, G.A. Buck, 2007]

Functional Networks

Yeast: 1400 proteins, 232 complexes, nine functional groups of complexes

[Figure: network of the nine functional groups (Cell Cycle; Cell Polarity & Structure; Transcription/DNA Maintenance/Chromatin Structure; Intermediate and Energy Metabolism; Signaling; Membrane Biogenesis & Turnover; RNA Metabolism; Protein Synthesis and Turnover; Protein/RNA Transport), annotated with the number of protein complexes, number of proteins, and number of shared proteins. Data: A.-M. Gavin et al. (2002) Nature 415, 141-147]

D. Bonchev, Chemistry & Biodiversity 1(2004)312-326

Summary

• All complex networks in nature and technology have common features.

• They differ considerably from random networks of the same size.

• By studying network structure and dynamics, and by using comparative network analysis, one can get answers to important biological questions.

Some Fundamental Biological Questions to Answer

(i) Which interactions and groups of interactions are likely to have equivalent functions across species?

(ii) Based on these similarities, can we predict new functional information about proteins and interactions that are poorly characterized?

(iii) What do these relationships tell us about the evolution of proteins, networks and whole species?

(iv) How can the noise in biological data be reduced, i.e., which interactions represent true binding events? A false-positive interaction is unlikely to be reproduced across the interaction maps of multiple species.

Why Study Networks?

• It is increasingly recognized that complex systems cannot be described in a reductionist view.

• Understanding the behavior of such systems starts with understanding the topology of the corresponding network.

• Topological information is fundamental in constructing realistic models for the function of the network.

Properties of Biological Networks

• Large network comparison is computationally hard due to the NP-completeness of the underlying subgraph isomorphism problem:

• Given 2 graphs G and H as input, determine whether G contains a subgraph that is isomorphic to H.

• Thus, network comparisons rely on easily computable heuristics (approximate solutions), called "network properties".

• Network properties can roughly & historically be divided into two categories:

1. Global network properties: give an overall view of the network, but might not be detailed enough to capture complex topological characteristics of large networks.

2. Local network properties: more detailed network descriptors which usually encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary.

Biological Network Properties

• Scale-free: power-law degree distribution ("rich get richer")

• Small world: a small average path length (mean shortest node-to-node path)

• Robustness: resilient, with strong resistance to failure under random attacks, but vulnerable to targeted attacks

• Hierarchical: a large clustering coefficient (how many of a node's neighbors are connected to each other)

Global Network Properties

Readings: Chapter 3 of "Analysis of Biological Networks" by Junker and Schreiber

• Global network measures: 1) P(k), 2) average clustering coefficient, 3) clustering spectrum, 4) network diameter, 5) average diameter, 6) mean path length, 7) spectrum of shortest path lengths, 8) …, 9) …, etc.

Global Network Properties - Degree Distribution

[Figure: a node x with deg(x) = 5]

Definitions:

• The degree of a node is the number of edges incident to the node.

• The average degree of a network is the average of the degrees over all nodes in the network.

However, the average degree might not be representative, since the distribution of degrees might be skewed.
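A minimal Python sketch of these two definitions, assuming the network is stored as a dictionary that maps each node to the set of its neighbours (an illustrative representation, not one prescribed by the slides). The star below is hypothetical; it reproduces deg(x) = 5 and shows how a single hub pulls the degree statistics apart.

```python
def degree(adj, x):
    """deg(x): the number of edges incident to x (simple undirected graph)."""
    return len(adj[x])


def average_degree(adj):
    """Average of deg(x) over all nodes; equals 2|E| / |V| for a simple graph."""
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


# Hypothetical star: node "x" joined to five leaves, so deg(x) = 5.
leaves = ("l1", "l2", "l3", "l4", "l5")
star = {"x": set(leaves)}
for leaf in leaves:
    star[leaf] = {"x"}

print(degree(star, "x"))     # 5
print(average_degree(star))  # 10 / 6, about 1.67
```

The average (about 1.67) describes neither the hub (degree 5) nor the leaves (degree 1) well, which is the skew problem mentioned above.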

Global Network Properties – Degree Distribution

• Let P(k) be the fraction of nodes of degree k in the network. The degree distribution is the distribution of P(k) over all k.

• P(k) can be understood as the probability that a node has degree k.

• The degree distribution is the probability distribution function P(k), which gives the probability that a randomly selected node has degree k.
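Computing P(k) from an adjacency structure is a matter of counting. A minimal sketch, assuming the same dictionary-of-neighbour-sets representation and a hypothetical four-node path as input:

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)}: the fraction of nodes whose degree is k."""
    n = len(adj)
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: count / n for k, count in sorted(counts.items())}


# Hypothetical path 1-2-3-4: two nodes of degree 1 and two of degree 2.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(degree_distribution(path))  # {1: 0.5, 2: 0.5}
```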

Degree Distribution

[Figure: number of nodes having degree k (up to 10) and the corresponding P(k) (up to 1) for degrees 1 to 4, for a network in which all nodes have the same degree: a single peak]

Any randomness in the network will broaden the shape of this peak.

[Figure: number of nodes having degree k (values 4 and 2) and the corresponding P(k) (values 0.5 and 0.25) for degrees 1 to 4]

Degree Distribution

Poisson's distribution: $P(k) = \frac{e^{-\langle k \rangle} \langle k \rangle^{k}}{k!}$, where ⟨k⟩ is the average degree

e = 2.71828..., the base of natural logarithms

The degree distribution of random graphs follows Poisson's distribution.

[Figure: degree distribution P(k) vs. connectivity k]

$P(k) \sim k^{-\gamma}$ (power-law distribution)

The degree distribution of many biological networks follows a power-law distribution.

A power-law distribution is a straight line on a log-log plot.

Degree distributions

f_k = fraction of nodes with degree k = probability that a randomly selected node has degree k

[Figure: f_k vs. degree k]

• Why measure the degree distribution? The degree distribution is a "fingerprint" of the network: it allows us to characterize its structure in general terms.

Degree distribution from a random network

What if we constructed a network by adding edges between proteins at random?

[Figure: log-log plot of frequency vs. node degree]

Properties:
• highly concentrated around the mean
• the probability of very high-degree nodes is exponentially small
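One way to see these properties is to build such a random network directly. The sketch below assumes the G(n, p) construction (every pair of nodes linked independently with probability p; the parameter values are illustrative) and compares the empirical degree distribution with the Poisson distribution P(k) = e^(-<k>) <k>^k / k! that it approaches for large n.

```python
import math
import random
from collections import Counter


def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 node pairs is linked independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def poisson_pmf(k, mean):
    """Poisson probability of degree k when the average degree is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


n, p = 2000, 0.004                      # illustrative values; expected <k> = p(n - 1), about 8
g = erdos_renyi(n, p, seed=1)
degrees = [len(neighbours) for neighbours in g.values()]
mean_k = sum(degrees) / n
counts = Counter(degrees)

print("k  empirical  Poisson")
for k in range(17):
    print(k, round(counts.get(k, 0) / n, 4), round(poisson_pmf(k, mean_k), 4))
print("largest degree:", max(degrees))  # stays close to the mean: no hubs
```

The two columns track each other closely, and the largest degree stays within a few standard deviations of the mean, which is the "no hubs" behaviour described above.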

Barabasi, Oltvai. Network biology: understanding the cell's functional organization. Nature Reviews Genetics 5, 101-113 (2004).

What about the degree distribution of real networks?

[Figure: a random network vs. the yeast 2-hybrid interaction network; Hawoong Jeong et al., Lethality and centrality in protein networks, Nature 411, 41-42 (2001)]

What about other types of real networks?

[Figure: degree distributions of several real networks compared with a random network]

Conclusion: many real networks have the same fingerprint! [Newman, 2003]

Global Network Properties – Scale-Freeness 1) Degree Distribution

• Example:

[Figure: a scale-free degree distribution (log-log plot)]

• Here $P(k) \sim k^{-\gamma}$, where often $2 \le \gamma < 3$. This is a power-law, heavy-tailed distribution.
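Because log P(k) = log C - γ log k is a straight line in log-log coordinates, a quick way to estimate γ is a least-squares line through the log-log points. This is known to be statistically crude for real, noisy data, where maximum-likelihood estimators are generally preferred. The sketch below assumes noise-free, illustrative data generated from P(k) ∝ k^-2.5, so the fit should recover 2.5.

```python
import math


def loglog_fit(pk):
    """Least-squares fit of log P(k) = -gamma * log k + log C; returns (gamma, C).

    `pk` maps degree k >= 1 to P(k) > 0 (degrees with P(k) = 0 must be dropped,
    because log 0 is undefined).
    """
    xs = [math.log(k) for k in pk]
    ys = [math.log(v) for v in pk.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope, math.exp(intercept)


# Illustrative, noise-free data following P(k) = C * k**-2.5 for k = 1..50.
z = sum(k ** -2.5 for k in range(1, 51))
pk = {k: k ** -2.5 / z for k in range(1, 51)}
gamma, C = loglog_fit(pk)
print(round(gamma, 3))  # 2.5
```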

• Networks with power-law degree distributions are called scale-free networks. In them, most of the nodes are of low degree, but there is a small number of highly-linked nodes (nodes of high degree) called "hubs."

WHAT DOES SCALE FREE REALLY MEAN, ANYWAY?

P(k) is the probability of each degree k.

For scale free: $P(k) \sim k^{-\gamma}$

What happens for small vs. large γ?

Random vs. Scale-free

• Erdos-Renyi: start with N nodes and connect each pair with equal probability p.

• Scale-free: add nodes incrementally. New nodes connect to each existing node I with probability proportional to its degree:

$\Pi(k_I) = \frac{k_I}{\sum_J k_J}$

Scale-free networks have small average path lengths, ~ log(log N); this is called the "small world" effect.

Global Network Properties – Small-World Network

• Most nodes are not neighbors of one another, but most nodes can be reached from every other node by a small number of hops:
  • small average path length
  • any node can be reached within a small number of edges, 4-5 hops.

• Random networks: mean path length ~ log N (N = number of nodes)

• Scale-free networks: "ultra-small-world", mean path length ~ log(log N)

Global Properties -- Scale-Free Network (or, Power Law Network)

• PREFERENTIAL ATTACHMENT on growth: the probability that a new vertex will be connected to vertex i depends on the connectivity of that vertex (vertex degree):

$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$ (the Barabási-Albert [BA] model)
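Preferential attachment is easy to simulate. The sketch below is a simplified Barabási-Albert-style generator, not a reference implementation: the starting graph (a complete graph on m + 1 nodes), the parameter values, and the rule of redrawing duplicate targets are assumptions made for brevity. The list `ends` holds every node once per incident edge, so each uniform draw from it selects node i with probability k_i / Σ_j k_j.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph by preferential attachment (simplified BA-style sketch).

    Starts from a complete graph on m + 1 nodes; each new node links to m distinct
    existing nodes, drawn with probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: {u for u in range(m + 1) if u != v} for v in range(m + 1)}
    # Every node appears in `ends` once per incident edge.
    ends = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:          # redraw duplicates so the m targets are distinct
            chosen.add(rng.choice(ends))
        adj[new] = set()
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            ends.extend((new, old))
    return adj


g = barabasi_albert(2000, 2, seed=1)
degrees = sorted((len(neighbours) for neighbours in g.values()), reverse=True)
print("five largest degrees:", degrees[:5])
print("average degree:", sum(degrees) / len(degrees))
```

With an average degree of about 4, the largest degrees typically come out many times larger than the average: these are the hubs that a G(n, p) graph of the same density does not produce.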

[Figure: degree distributions of (a) random networks (ER model, WS model) and (b) power-law networks (actors, power grid, WWW)]

Power Law Network (Scale Free)

• The probability of finding a highly connected node decreases as a power of k, $P(k) \sim k^{-\gamma}$, i.e., much more slowly than exponentially.

Global Network Properties – Degree Distribution

• Another example:

[Figure: a network with a Poisson degree distribution, for which the average degree is meaningful]

Here P(k) is a Poisson distribution.

Global Network Properties – Degree Distribution

• However, the degree distribution (and global properties in general) is a weak predictor of network structure. Illustration:

G and H are of the same size (i.e., |G| = |H|: they have the same number of nodes and edges) and they have the same degree distribution, but G and H have very different topologies (i.e., graph structures).
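This weakness can be checked on a small pair of hypothetical graphs (not the G and H pictured on the slide, which are not reproduced here): a 6-cycle and two disjoint triangles both have 6 nodes, 6 edges and every degree equal to 2, yet one is connected and triangle-free while the other is disconnected and made of triangles. The sketch assumes graphs stored as dicts of neighbour sets.

```python
def degree_sequence(adj):
    return sorted(len(neighbours) for neighbours in adj.values())


def triangle_count(adj):
    # Each triangle is found 6 times: from each of its 3 corners, in both neighbour orders.
    return sum(1 for v in adj for u in adj[v] for w in adj[u] if w in adj[v]) // 6


def component_count(adj):
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v] - seen:
                seen.add(u)
                stack.append(u)
    return components


G = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}                      # a 6-cycle
H = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}    # two triangles

print(degree_sequence(G) == degree_sequence(H))  # True: same degree distribution
print(triangle_count(G), triangle_count(H))      # 0 2
print(component_count(G), component_count(H))    # 1 2
```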

Examples:

[Figure: example graphs G and H]

Global Network Properties – Scale-Free Network

• Power-law degree distributions & small-world phenomena are also observed in:
  • communication networks
  • web graphs
  • research citation networks
  • social networks

• Classical Erdos-Renyi-type random graphs do not exhibit these properties:
  • links between pairs of a fixed set of nodes are picked uniformly
  • the maximum degree grows logarithmically with network size
  • there are no hubs to make short connections between nodes

"Scale-free" networks

• Many real networks have a power-law degree distribution:

$\log p(k) = -\alpha \log k + \log C$

[Figure: frequency vs. degree, and log frequency vs. log degree (a straight line with slope -α)]

• α: power-law exponent (typically 2 ≤ α ≤ 3)

Summary: Random vs. scale-free network

[Figure source: Albert L. Barabasi, 2002, LINKED: The New Science of Networks]

Attack Tolerance

• Complex systems maintain their basic functions even under errors and failures (e.g., mutations in a cell; router breakdowns in a communication network).

[Figure: node failure]

Attack Tolerance

• Robust: for γ < 3, removing nodes at random does not break the network into islands.

• Very resistant to random attacks, but attacks targeting key nodes (hubs) are more dangerous.
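The contrast between random failures and targeted attacks can be made concrete by measuring the largest connected component after deleting nodes. The star below is a deliberately extreme, hypothetical hub-and-spoke network chosen so the outcome is easy to check by hand; on a real scale-free network one would instead delete a growing fraction of random nodes versus highest-degree nodes and compare the two curves.

```python
def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)          # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best


# Hypothetical hub-dominated network: a star with hub 0 and leaves 1..20.
star = {0: set(range(1, 21))}
star.update({leaf: {0} for leaf in range(1, 21)})

print(largest_component_size(star))               # 21: intact network
print(largest_component_size(star, {5, 12, 17}))  # 18: three random leaves fail, the rest stay connected
print(largest_component_size(star, {0}))          # 1: removing the hub leaves 20 isolated nodes
```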

Research debates…
• Assortative vs. disassortative mixing of degrees:
  – Do high-degree nodes interact with high-degree nodes?
  – Measured by:
    – the Pearson correlation coefficient between the degrees of adjacent vertices
    – the average neighbor degree, then averaged over all nodes of degree k
• Structural robustness and attack tolerance:
  – "Robust, yet fragile"
• Scale-free degree distribution:
  – "Party" vs. "date" hubs
    • J.D. Han et al., Nature, 430:88-93, 2004
  – Bias in the data collection: sampling?
    • M. Stumpf et al., PNAS, 102:4221-4224, 2005
    • J. Han et al., Nature Biotechnology, 23:839-844, 2005
• High-degree nodes:
  – Essential
    • H. Jeong et al., Nature 411, 2001
  – Disease/cancer genes
    • Jonsson and Bates, …, 22(18), 2006
    • Goh et al., PNAS, 104(21), 2007

Global Network Properties – Clustering Coefficient

• Definition:

clustering coefficient Cv of a node v:

Cv = |E(N(v))| / (max possible number of edges in N(v)), where N(v) is the neighborhood of v, i.e., all nodes adjacent to v, and E(N(v)) is the set of edges between nodes of N(v).

Cv can be viewed as the probability that two neighbors of v are connected.

Thus 0 ≤ Cv ≤ 1.

By definition, Cv = 0 for a vertex v of degree 0 or 1.
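A direct translation of this definition into Python, as a sketch that assumes a simple undirected graph stored as a dict mapping each node to the set of its neighbours. The example edges are hypothetical, chosen to reproduce the numbers of the worked example on the next slide (|N(v)| = 4 with 3 edges among the neighbours).

```python
def clustering(adj, v):
    """C_v = |E(N(v))| / (k choose 2), with C_v = 0 when deg(v) < 2."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Every edge inside N(v) is seen twice, once from each end point.
    links = sum(1 for u in neighbours for w in adj[u] if w in neighbours) // 2
    return links / (k * (k - 1) / 2)


# Hypothetical edges: v (node 0) has neighbours 1-4, and exactly three edges
# (1-2, 2-3, 3-4) join those neighbours.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(clustering(adj, 0))  # 3 / 6 = 0.5
```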

Global Network Properties – Clustering Coefficient

The density of the network surrounding node I, characterized as the number of triangles through I; it is related to network modularity.

$C_I = \frac{2 n_I}{k_I (k_I - 1)} = \frac{n_I}{\binom{k_I}{2}}$

where n_I = # edges between node I's neighbors, k_I = # of neighbors of I, and $\binom{k_I}{2}$ ("k choose 2") is the number of possible edges among those neighbors.

C(k) = avg. clustering coefficient for nodes of degree k

Global Network Properties – Clustering Coefficient

• Example:

[Figure: node v and its neighbors 1, 2, 3, 4]

• |N(v)| = 4, since there are 4 nodes in N(v), i.e., N(v) = {1, 2, 3, 4}

• |E(N(v))| = 3, since there are 3 edges between nodes in N(v)

• The max possible number of edges between nodes in N(v) is choose(4,2) = 6.

• Therefore Cv = 3/6 = 1/2

Global Network Properties – Clustering Coefficient

• Definition: the average clustering coefficient, C, of a network is the average of Cv over all the nodes v ∈ V.

• In real-world networks: if A is connected to B and B is connected to C, then it is highly probable that A is also connected to C.

• It reflects the completeness of a network.

Clustering coefficient

$C_i = \frac{2E_i}{k_i(k_i - 1)}, \qquad C = \frac{1}{N}\sum_{i=1}^{N} C_i$

k_i = # of neighbors of node i

E_i = # of edges among the neighbors of node i

[Figure: example graph on nodes a, b, c, d, e, f]


C_a = 2·1/(2·1) = 1
C_b = 2·1/(2·1) = 1
C_c = 2·1/(3·2) = 0.333
C_d = 2·1/(3·2) = 0.333
C_e = 2·1/(2·1) = 1
C_f = 2·1/(2·1) = 1

Total = 4.667, so C = 4.667/6 ≈ 0.778

Clustering coefficient

By studying the average clustering C(k) of nodes with a given degree k, information about the actual modular organization can be extracted.

C(1) = 0
C(2) = (C_a + C_b + C_e + C_f)/4 = 1
C(3) = (C_c + C_d)/2 = 0.333

Clustering coefficient


For most of the known biological networks, the average clustering coefficient C(k) of nodes of degree k follows a power law:

$C(k) \sim k^{-\gamma}$

[Figure: power-law distribution]

Global Network Properties – Clustering Spectrum

• Definition: the clustering spectrum, C(k), is the distribution of the average clustering coefficients of all nodes of degree k in the network, over all k.
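The average clustering coefficient and the spectrum of the six-node example a-f can be reproduced in a few lines. This sketch assumes the example graph has the edge list a-b, a-c, b-c, c-d, d-e, d-f, e-f; the figure itself is not reproduced here, so the list is a reconstruction chosen to match every C_v value and distance quoted on the slides.

```python
from collections import defaultdict

# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering(adj, v):
    """C_v = 2 E_v / (k_v (k_v - 1)); 0 for nodes of degree 0 or 1."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    e_v = sum(1 for u in adj[v] for w in adj[u] if w in adj[v]) // 2
    return 2 * e_v / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering(adj, v) for v in adj) / len(adj)


def clustering_spectrum(adj):
    """C(k): mean C_v over the nodes of degree k, for every degree k that occurs."""
    by_degree = defaultdict(list)
    for v in adj:
        by_degree[len(adj[v])].append(clustering(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}


adj = adjacency(EDGES)
print(round(average_clustering(adj), 4))  # 0.7778
print(clustering_spectrum(adj))           # C(2) = 1.0, C(3) = 0.333...
```

Degrees that do not occur in the graph (such as k = 1 here) are simply absent from the returned dictionary.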

Example:

[Figure: example clustering spectrum]

2) and 3) Clustering Coefficient and Spectrum

• Cv – clustering coefficient of node v (for the example graph G): CA = 1/1 = 1, CB = 1/3 = 0.33, CC = 0, CD = 2/10 = 0.2, …

• C = Avg. clust. coefficient of the whole network

= avg {Cv over all nodes v of G}

• C(k) – Avg. clust. coefficient of all nodes of degree k

E.g.: C(2) = (CA + CC)/2 = (1+0)/2 = 0.5

=> Clustering spectrum

[Figure: example clustering spectrum (not for G)]

We need to evaluate whether the value of C (or any other property) is statistically significant.

Global Network Properties -- Diameter

• Definition: the distance between two nodes is the smallest number of links that have to be traversed to get from one node to the other.

• Definition: the shortest path is the path that achieves that distance.

• Definition: the diameter is the maximum length over all shortest paths in the network.

• Definition: the average network diameter is the average of shortest path lengths over all pairs of nodes in a network.

Global Network Properties -- Spectrum of shortest path lengths

• Definition: let S(d) be the fraction of node pairs that are at distance d. The spectrum of shortest path lengths is the distribution of S(d) over d. Example:

[Figure: example spectrum of shortest path lengths]

4) and 5) Average Diameter and Spectrum of Shortest Path Lengths

• Distance between a pair of nodes u and v (in the example graph G):
D(u,v) = min {length of all paths between u and v} = min {3, 4, 3, 2} = 2 = dist(u,v)

• Average diameter of the whole network:
D = avg {D(u,v) for all pairs of nodes {u,v} in G}

• Spectrum of the shortest path lengths

[Figure: example spectrum (not for G)]

Average Path length

The distance between nodes u and v, called d(u,v), is the least length of a path from u to v. d(a,e) = ?

[Figure: example graph on nodes a, b, c, d, e, f]


• Length of path a-b-c-d-f-e is 5
• Length of path a-c-d-f-e is 4
• Length of path a-c-d-e is 3

The minimum length of a path from a to e is 3, and therefore d(a,e) = 3.

Average Path length

The average path length L of a network is defined as the mean distance between all pairs of nodes.

There are 6 nodes, and therefore $\binom{6}{2} = \frac{6!}{2!\,4!} = 15$ distinct pairs, for example (a,b), (a,c), …, (e,f).

We have to calculate the distance between each of these 15 pairs and average them.
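Breadth-first search (BFS) automates this bookkeeping: in an unweighted graph Dijkstra's algorithm reduces to BFS, so one BFS per node yields all pairwise distances. The sketch assumes a connected graph and uses an edge list for the six-node example that is reconstructed from the distances on these slides (the figure is not reproduced here): a-b, a-c, b-c, c-d, d-e, d-f, e-f.

```python
from collections import defaultdict, deque
from itertools import combinations

# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def bfs_distances(adj, source):
    """Shortest-path distances (number of edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


adj = adjacency(EDGES)
dist = {v: bfs_distances(adj, v) for v in adj}
pairs = list(combinations(sorted(adj), 2))             # the 15 unordered pairs (a,b), ..., (e,f)
total = sum(dist[u][v] for u, v in pairs)
print(len(pairs), total, total / len(pairs))           # 15 27 1.8
print("diameter:", max(dist[u][v] for u, v in pairs))  # 3
```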

a to b: 1
a to c: 1
a to d: 2
a to e: 3
a to f: 3
…
Total length over all 15 pairs = 27, so L = 27/15 = 1.8.

The average path length of most real networks is small.

Average Path length

Finding the average path length is not easy when the network is big enough. Even finding the shortest path between any two nodes takes work. A well-known algorithm is: E.W. Dijkstra, "A note on two problems in connection with graphs", Numerische Mathematik, Vol. 1, 1959, 269-271. Dijkstra's algorithm can be found in almost every book on graph theory. There are other algorithms for finding shortest paths between all pairs of nodes.

Diameter

The distance between nodes u and v, called d(u,v), is the least length of a path from u to v. The longest of the distances between any two nodes is called the diameter.

a to b: 1, a to c: 1, a to d: 2, a to e: 3, a to f: 3, … (15 pairs in total)

The diameter of this graph is 3.

Eccentricity And Radius

The eccentricity of a node u is the maximum of the distances from u to any other node in the graph. The radius of a graph is the minimum of the eccentricity values among all the nodes of the graph.

a to b: 1, a to c: 1, a to d: 2, a to e: 3, a to f: 3; therefore the eccentricity of node a is 3.

The radius of this graph is 2.

Global Network Properties -- Centralities

(Readings: Chapter 3 of "Analysis of Biological Networks" by Junker and Schreiber)

• Rank nodes according to their "topological importance"

• Definition: centrality quantifies the topological importance of a node (edge) in a network. There are many different types of centralities.

• Different types of centralities: degree centrality, eccentricity centrality, subgraph centrality, …

• Software tools: Visone (social nets) and CentiBiN (biological nets)

Global Network Properties -- Centralities

• Definitions:

1. Degree centrality, Cd(v): nodes with a large number of neighbors (i.e., edges) have high degree centrality. Therefore, we have Cd(v) = deg(v).

Example of a use of degree centrality: in PPI networks, nodes with high degree centrality are considered to be "biologically important" (e.g., essential/lethal, mutated in cancer…).

2. Closeness centrality, Cc(v): nodes with short paths to all other nodes in the network have high closeness centrality:

$C_c(v) = \frac{1}{\sum_{u \in V} \mathrm{dist}(u,v)}$
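Both centralities can be computed for the six-node example graph (edge list a-b, a-c, b-c, c-d, d-e, d-f, e-f, reconstructed from the earlier slides as an assumption). This sketch uses the unnormalised closeness formula above; some tools instead multiply it by (n - 1), which does not change the ranking.

```python
from collections import defaultdict, deque

# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def degree_centrality(adj, v):
    return len(adj[v])


def closeness_centrality(adj, v):
    """C_c(v) = 1 / sum of dist(u, v) over all nodes u; assumes a connected graph."""
    return 1 / sum(bfs_distances(adj, v).values())


adj = adjacency(EDGES)
for v in sorted(adj):
    print(v, degree_centrality(adj, v), round(closeness_centrality(adj, v), 3))
```

Nodes c and d come out on top under both measures (degree 3, closeness 1/7), while a, b, e and f all have degree 2 and closeness 1/10.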

Global Network Properties -- Node Centralities

• Definitions:

3. Betweenness centrality, Cb(v): nodes (or edges) which occur in many of the shortest paths have high betweenness centrality.

$C_b(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$

The sum runs over all pairs of nodes s and t other than v, and each term is the fraction of the shortest s-t paths that pass through v:

σst(v) = the number of shortest paths from s to t that pass through v
σst = the number of all shortest paths from s to t (they may or may not pass through node v)
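For anything beyond toy graphs, the σ counts are usually obtained with Brandes' algorithm (2001), which needs one BFS per source plus a backward accumulation pass instead of enumerating paths. The sketch below assumes an unweighted, undirected graph and sums over unordered pairs {s, t}, which is why the final scores are halved; normalisation conventions differ between tools. The example uses the reconstructed six-node graph a-f (an assumption, as the figure is not reproduced here).

```python
from collections import deque


def betweenness(adj):
    """Brandes-style betweenness for an unweighted, undirected graph.

    Returns C_b(v) = sum over unordered pairs {s, t} (s != v != t) of sigma_st(v) / sigma_st.
    """
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                          # nodes in the order BFS reaches them
        preds = {v: [] for v in adj}        # predecessors of v on shortest s-v paths
        sigma = dict.fromkeys(adj, 0)       # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:             # first visit to w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every unordered pair was processed once from each end point.
    return {v: c / 2 for v, c in cb.items()}


# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(betweenness(adj))  # c and d: 6.0 each; every other node: 0.0
```

In this graph every shortest path is unique, so c and d each lie on 6 of the 15 shortest paths; with ties (for example in a 4-cycle) the fractions σst(v)/σst become 1/2.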

Global Network Properties -- Node Centralities

• Definitions:

4. Eccentricity centrality, Ce(v): nodes whose farthest node is close have high eccentricity centrality. The eccentricity of a node v is defined as ecc(v) = max_{u ∈ V} dist(u,v), i.e., the maximum shortest path length from node v to all other nodes u in V.

Eccentricity centrality of a node v: $C_e(v) = \frac{1}{ecc(v)}$

Thus, central nodes have higher Ce since they have lower ecc.

There exist many other definitions of node centralities.
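Eccentricity, radius and diameter all fall out of one BFS per node. The sketch below assumes a connected graph, the same reconstructed edge list for the six-node example (a-b, a-c, b-c, c-d, d-e, d-f, e-f), and Ce(v) = 1/ecc(v), which matches the statement that nodes with lower ecc have higher Ce.

```python
from collections import defaultdict, deque

# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def eccentricity(adj, v):
    """ecc(v): largest shortest-path distance from v (assumes a connected graph)."""
    return max(bfs_distances(adj, v).values())


adj = adjacency(EDGES)
ecc = {v: eccentricity(adj, v) for v in sorted(adj)}
centrality = {v: 1 / e for v, e in ecc.items()}   # assumed form Ce(v) = 1 / ecc(v)
print(ecc)                                        # c and d: 2; a, b, e, f: 3
print("radius:", min(ecc.values()), "diameter:", max(ecc.values()))  # 2 and 3
print(centrality)
```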

Global Network Properties -- Node Centralities

• Example:

[Figure: example network on nodes A-J]

Rank (highest to lowest) | Degree  | Closeness | Betweenness
1                        | D       | F, G      | H
2                        | F, G    | D, H      | F, G
3                        | A, B    | A, B      | I
4                        | C, E, H | C, E      | D
5                        | I       | I         | A, B
6                        | J       | J         | C, E, J

• You need to know how to compute these centralities (and all other network properties) by hand on small networks.

• For large real-world networks, you could use software, e.g., CentiBiN: http://centibin.ipk-gatersleben.de/

Local Network Properties

(Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)

• They encompass a larger number of constraints, thus reducing the degrees of freedom in which networks (and nodes) being compared can vary.

• How do we show that two networks are different?

• How do we show that they are the same?

• How do we quantify the level of similarity?


1) Network motifs
2) Graphlets
Two network comparison measures based on graphlets:
2.1) Relative Graphlet Frequency Distance between two networks
2.2) Graphlet Degree Distribution Agreement between two networks

Subgraphs

Consider a graph G = (V,E). The graph G' = (V',E') is a subgraph of G if V' and E' are subsets of V and E, respectively.

[Figure: graph G on nodes a-f and two of its subgraphs]

Induced Subgraphs

An induced subgraph of a graph G on a subset S of nodes of G is obtained by taking S and all edges of G having both end-points in S.

[Figure: induced subgraphs of G for S = {a, b, c} and S = {c, d, f}]

Graphlets

Graphlets are small, connected, non-isomorphic induced subgraphs of large networks (T. Milenkovic, J. Lai, and N. Przulj, GraphCrunch: A Tool for Large Network Analyses, BMC Bioinformatics, 9:70, January 30, 2008).

Partial subgraphs/Motifs

A partial subgraph of a graph G on a subset S of nodes of G is obtained by taking S and some of the edges in G having both end-points in S. They are sometimes called edge subgraphs.
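The difference between induced and partial subgraphs shows up clearly in code. This sketch assumes the reconstructed edge list for the example graph G (a-b, a-c, b-c, c-d, d-e, d-f, e-f) and computes the two induced subgraphs named above.

```python
from collections import defaultdict

# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def induced_subgraph(adj, nodes):
    """Keep `nodes` and every edge of the graph whose two end points are both in `nodes`."""
    keep = set(nodes)
    return {v: adj[v] & keep for v in keep}


def edge_list(adj):
    """Sorted list of undirected edges, each written as (smaller, larger)."""
    return sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})


adj = adjacency(EDGES)
print(edge_list(induced_subgraph(adj, {"a", "b", "c"})))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(edge_list(induced_subgraph(adj, {"c", "d", "f"})))  # [('c', 'd'), ('d', 'f')]
```

A partial subgraph on S = {a, b, c} may keep any subset of these three edges (for example only a-b and b-c), whereas the induced subgraph must keep all of them; this is the distinction between motifs (partial) and graphlets (induced).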

[Figure: a partial subgraph of G for S = {a, b, c}]

Partial subgraphs/Motifs

Nicholas M. Luscombe, M. Madan Babu, Haiyuan Yu, Michael Snyder, Sarah A. Teichmann & Mark Gerstein, "Genomic analysis of regulatory network dynamics reveals large topological changes", Nature, Vol. 431, 2004

[Figure: the motifs SIM, MIM and FFL]

SIM = single input motif, MIM = multiple input motif, FFL = feed-forward loop. This paper searched for these motifs in the transcriptional regulatory network of Saccharomyces cerevisiae.

Local Network Properties -- Network Motifs

(Uri Alon's group, 2002-2004)

• Definition: a network motif is a small over-represented partial subgraph of a real network.

Here, over-represented means that it is over-represented when compared to networks generated by a random graph model.

Problem: what is expected at random, i.e., which network "null model" should be used to identify motifs?

Local Network Properties -- Network Motifs

Example of a random graph model:
• Erdos-Renyi (ER) random graphs – Definition:
  • a graph on n nodes (for some positive integer n)
  • edges are added between pairs of nodes uniformly at random with the same probability p

ER graphs usually have only a small number of dense (in terms of number of edges) subgraphs: there will be no regions in the network that have a large density of edges. Why?

Local Network Properties -- Network Motifs

Example:

If motifs are identified by comparing the data with ER model networks, every dense subgraph would come up as a motif, because such subgraphs (almost) never occur in the ER model networks.

Network motifs (Uri Alon's group, '02-'04)

• Small subgraphs that are overrepresented in a network when compared to randomized networks

• Network motifs:
  – Reflect the underlying evolutionary processes that generated the network
  – Carry functional information
  – Define superfamilies of networks

[Figure: feed-forward loop]

• Zi is the statistical significance of subgraph i; SPi is a vector of numbers in 0-1

• But:
  – Functionally important but not statistically significant patterns could be missed
  – The choice of the appropriate null model is crucial, especially across "families"


• Further caveats:
  – Random graphs with the same in- and out-degree distribution as the data might not be the best network null model
  – Motifs are partial subgraphs, while we use induced ones to understand network structure
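A minimal sketch of the over-representation test for one motif, the feed-forward loop (X->Y, Y->Z, X->Z), computing Z = (N_real - mean(N_rand)) / std(N_rand). The directed network is stored as a dict mapping each node to the set of nodes it regulates (an assumed representation). For brevity, the null model here only preserves the number of nodes and edges; as the caveats above stress, motif studies usually use randomized networks that keep each node's in- and out-degree, and that choice changes the Z-score.

```python
import math
import random
import statistics


def count_ffl(succ):
    """Count feed-forward loops X->Y, Y->Z, X->Z over distinct nodes X, Y, Z.

    Extra edges among X, Y, Z are allowed, i.e. the loop is counted as a
    partial subgraph, as network motifs are.
    """
    count = 0
    for x, x_targets in succ.items():
        for y in x_targets:
            if y == x:
                continue
            for z in succ.get(y, ()):
                if z != x and z != y and z in x_targets:
                    count += 1
    return count


def random_digraph(nodes, n_edges, rng):
    """Simplest null model (an assumption): same nodes and edge count, edges placed uniformly."""
    pairs = [(u, v) for u in nodes for v in nodes if u != v]
    succ = {v: set() for v in nodes}
    for u, v in rng.sample(pairs, n_edges):
        succ[u].add(v)
    return succ


def ffl_z_score(succ, samples=1000, seed=0):
    rng = random.Random(seed)
    nodes = list(succ)
    n_edges = sum(len(targets) for targets in succ.values())
    real = count_ffl(succ)
    rand = [count_ffl(random_digraph(nodes, n_edges, rng)) for _ in range(samples)]
    mu, sd = statistics.mean(rand), statistics.pstdev(rand)
    if sd == 0:
        return 0.0 if real == mu else math.copysign(math.inf, real - mu)
    return (real - mu) / sd


# Hypothetical regulatory network with two feed-forward loops (0->1->2 and 3->4->5).
succ = {0: {1, 2}, 1: {2}, 2: set(), 3: {4, 5}, 4: {5}, 5: set(), 6: {0, 3}}
print(count_ffl(succ))  # 2
print(round(ffl_z_score(succ, samples=500), 2))
```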

Local Network Properties -- Network Motifs

Example: Feed-forward loop

Shen-Orr, Milo, Mangan, and Alon, "Network motifs in the transcriptional regulation network of Escherichia coli," Nature Genetics, 2002

Network motifs (Uri Alon's group, '02-'04): http://www.weizmann.ac.il/mcb/UriAlon/

Also, see Pajek, MAVisto, and FANMOD.

Graphlets (Przulj group, '04-'08)

Different from network motifs:
• induced subgraphs
• of any frequency (they don't need to be over-represented)

N. Przulj, D. G. Corneil, and I. Jurisica, "Modeling Interactome: Scale Free or Geometric?," Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.

2.1) Relative Graphlet Frequency (RGF) distance between networks G and H:
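The RGF distance compares how often each graphlet occurs relative to the total graphlet count. As defined by Przulj et al. (2004), it is D(G,H) = Σ_i |F_i(G) - F_i(H)|, summed over the 29 graphlets on 3 to 5 nodes, with F_i(G) = -log(N_i(G)/T(G)), where N_i(G) is the number of graphlets of type i in G and T(G) is the total count. To stay short, the sketch below makes a simplifying assumption and uses only the two 3-node graphlets (the induced path and the triangle), so its values are not comparable to published RGF distances. G is the six-node example graph (edge list reconstructed from the earlier slides) and H is a hypothetical four-node "diamond".

```python
import math
from collections import defaultdict


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def three_node_graphlet_counts(adj):
    """[N_path, N_triangle]: counts of induced 3-node paths and of triangles."""
    # Each triangle is found 6 times: once per corner, in both neighbour orders.
    triangles = sum(1 for v in adj for u in adj[v] for w in adj[u] if w in adj[v]) // 6
    # Every pair of neighbours of v is either an induced path centred at v or part of
    # a triangle; each triangle accounts for three such pairs.
    neighbour_pairs = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return [neighbour_pairs - 3 * triangles, triangles]


def rgf_distance(adj_g, adj_h):
    """Sum of |F_i(G) - F_i(H)| with F_i = -log(N_i / T); every N_i must be > 0."""
    def f(adj):
        counts = three_node_graphlet_counts(adj)
        total = sum(counts)
        return [-math.log(n / total) for n in counts]
    return sum(abs(x - y) for x, y in zip(f(adj_g), f(adj_h)))


G = adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")])
H = adjacency([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])   # all edges on 4 nodes except 0-3
print(three_node_graphlet_counts(G), three_node_graphlet_counts(H))  # [4, 2] [2, 2]
print(rgf_distance(G, H))  # log 2, about 0.693
```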


Graphlet Degree Distributions

Generalize node degree

N. Przulj, "Biological Network Comparison Using Graphlet Degree Distribution," ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.

Network structure vs. biological function & disease

Graphlet Degree (GD) vectors, or “node signatures”
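The graphlet degree vector of a node v counts, for each automorphism orbit of each graphlet, how many times v touches that orbit; the full vector has 73 entries (orbits 0 to 72 of the 2- to 5-node graphlets). As a simplifying assumption, the sketch below computes only the first four entries, which involve 2- and 3-node graphlets: orbit 0 (an end of an edge, i.e. the degree), orbits 1 and 2 (an end and the middle of an induced 3-node path), and orbit 3 (a node of a triangle). It reuses the reconstructed six-node example graph a-f.

```python
from collections import defaultdict

# Reconstructed (assumed) edge list of the six-node example graph a-f.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def gdv_orbits_0_to_3(adj, v):
    """First four graphlet degree vector entries of node v.

    orbit 0: edges touching v (its degree)
    orbit 1: induced 3-node paths with v at an end
    orbit 2: induced 3-node paths with v in the middle
    orbit 3: triangles containing v
    """
    k = len(adj[v])
    triangles = sum(1 for u in adj[v] for w in adj[u] if w in adj[v]) // 2
    middle = k * (k - 1) // 2 - triangles
    # Walks v-u-w with w != v; those where w is also adjacent to v close a triangle
    # (each triangle is reached twice), so they are subtracted.
    end = sum(len(adj[u]) - 1 for u in adj[v]) - 2 * triangles
    return [k, end, middle, triangles]


adj = adjacency(EDGES)
for v in sorted(adj):
    print(v, gdv_orbits_0_to_3(adj, v))
```

Nodes c and d receive identical truncated signatures, as do a, b, e and f, reflecting the symmetry of the example graph.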

T. Milenkovic and N. Przulj, "Uncovering Biological Network Function via Graphlet Degree Signatures", Cancer Informatics, vol. 4, pg. 257-273, 2008.

Similarity measure between "node signature" vectors

Signature (or GDV) Similarity Measure between nodes u and v:
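A sketch of the signature similarity as we understand it from Milenkovic and Przulj (2008): for each orbit i, D_i(u,v) = w_i · |log(u_i + 1) - log(v_i + 1)| / log(max(u_i, v_i) + 2); then D(u,v) = Σ_i D_i / Σ_i w_i and S(u,v) = 1 - D(u,v). The paper derives the orbit weights w_i from dependencies between orbits; here they are passed in by the caller, and the uniform weights and four-entry vectors in the example are placeholders for illustration, so please check the paper before relying on exact values.

```python
import math


def signature_similarity(u, v, weights):
    """S(u, v) = 1 - D(u, v) for two graphlet degree vectors of equal length."""
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    distance = 0.0
    for ui, vi, wi in zip(u, v, weights):
        # The logs damp the effect of very large orbit counts; the +2 keeps the
        # denominator positive and each term below w_i.
        distance += wi * abs(math.log(ui + 1) - math.log(vi + 1)) / math.log(max(ui, vi) + 2)
    return 1 - distance / sum(weights)


uniform = [1.0, 1.0, 1.0, 1.0]                                   # placeholder weights
print(signature_similarity([3, 2, 2, 1], [3, 2, 2, 1], uniform))  # 1.0: identical signatures
print(round(signature_similarity([2, 1, 0, 1], [3, 2, 2, 1], uniform), 3))
```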


T. Milenković and N. Pržulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, 2008:6 257-273, 2008 (Highly Visible).

[Figure: signature similarities between yeast proteins (YBR095C, SMD1, PMA1, RPO26, SMB1), with values of 40% and 90%*; *statistically significant threshold at ~85%]

Use this and other techniques to link network structure with biological function:

• cluster the nodes of a network using their "signature similarity"
  • various clustering methods introduced in previous lectures can be used
  • the obtained clusters are statistically significantly enriched with: a particular biological function, or membership in the same protein complexes, or the same sub-cellular localization, tissue coexpression, involvement in pathways, diseases...

• predict the function of uncharacterized proteins in the clusters

Generalize Degree Distribution of a network

The degree distribution measures: • the number of nodes “touching” k edges for each value of k

N. Przulj, "Biological Network Comparison Using Graphlet Degree Distribution," Bioinformatics, vol. 23, pg. e177-e183, 2007.

[Figure: definition of the GDD distance between networks G and H; the distance is divided by sqrt(2) (to make it between 0 and 1)]

This is called the Graphlet Degree Distribution (GDD) Agreement between networks G and H.

Software that implements many of these network properties and compares networks with respect to them: GraphCrunch, http://bio-nets.doc.ic.ac.uk/graphcrunch/ (see also GraphCrunch 2: http://bio-nets.doc.ic.ac.uk/graphcrunch2/)
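A sketch of the GDD agreement for a single orbit j, following the definition in Przulj (2007) as we understand it: d^j(k) is the number of nodes touching orbit j exactly k times (k ≥ 1); it is scaled to S^j(k) = d^j(k)/k and normalised by T^j = Σ_k S^j(k) to give N^j(k); the distance is D^j(G,H) = (1/√2) · sqrt(Σ_k (N_G^j(k) - N_H^j(k))²); the agreement is A^j = 1 - D^j; and the overall GDD agreement averages A^j over all 73 orbits. The example feeds in orbit 0 only (plain node degrees) for hypothetical graphs.

```python
import math
from collections import Counter


def normalised_gdd(orbit_counts):
    """N^j(k) for one orbit j; needs at least one node with a positive count."""
    d = Counter(c for c in orbit_counts if c > 0)      # d^j(k) for k >= 1
    scaled = {k: n / k for k, n in d.items()}           # S^j(k) = d^j(k) / k
    total = sum(scaled.values())                        # T^j
    return {k: s / total for k, s in scaled.items()}    # N^j(k)


def gdd_agreement(counts_g, counts_h):
    """A^j(G, H) = 1 - D^j(G, H), where D^j is the Euclidean distance divided by sqrt(2)."""
    n_g, n_h = normalised_gdd(counts_g), normalised_gdd(counts_h)
    keys = set(n_g) | set(n_h)
    distance = math.sqrt(sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in keys))
    return 1 - distance / math.sqrt(2)


cycle6 = [2] * 6                 # orbit-0 counts (degrees) of a 6-cycle
two_triangles = [2] * 6          # two disjoint triangles: same degrees
star = [5, 1, 1, 1, 1, 1]        # hub with five leaves
print(gdd_agreement(cycle6, two_triangles))  # 1.0: identical orbit-0 distributions
print(round(gdd_agreement(cycle6, star), 3))
```

The perfect agreement between the 6-cycle and the two triangles shows why a single orbit is not enough: orbit 3 (triangles) would separate these graphs, which is why the full measure combines all orbits.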