
Overview of

MAE 298, Spring 2009, Lecture 1

Prof. Raissa D’Souza, University of California, Davis

Example social networks (Immunology; viral marketing; alliances/policy)

(M. E. J. Newman)

The Internet (Robustness to failure; optimizing future growth; testing protocols on sample topologies)

(H. Burch and B. Cheswick)

A typical web domain (Web search/organization and growth; centralized vs. decentralized protocols)

(M. E. J. Newman)

The airline network (Optimization; dynamic external demands)

(Continental Airlines)

The power grid (Mitigating failure; distributed sources)

(M. E. J. Newman)

Biology: Networks at many levels (Control mechanisms / drug design / gene therapy / biomarkers of disease)

Cellular networks:

(Figure: cellular network layers GENOME, PROTEOME and METABOLISM; labels: protein-gene interactions, protein-protein interactions, bio-chemical reactions, citrate.)

• Genome, Proteome: Dandekar Lab

• Metabolome: Fiehn Lab

• Data integration: BIOshare (Lin, Genome Center)

• Network structure / search for biomarkers: D’Souza

Software systems (Highly evolvable, modular, robust to mutation, exhibit punctuated equilibrium)

Open-source software as a “systems” paradigm.

Networks:

• Function calls

• Email communication

• Socio-Technical congruence

(Bird, Devanbu, D’Souza, Filkov, Saul, Wen)

Networks: Physical, Biological, Social

• Geometric versus virtual (Internet versus WWW).

• Natural /spontaneously arising versus engineered /built.

• Each network optimizes something unique.

• Identifying similarities and fundamental differences can guide future design/understanding.

• Interplay of topology and function?

• Unifying features:
  – Broad heterogeneity in node degree.
  – Small worlds (diameter ∼ log(N)).

Explosion of work and tools

• R, Graphviz, Pajek, igraph, Network Workbench, NetworkX, Netdraw, UCInet, Bioconductor, Ubigraph, ...

National Academy of Sciences / National Research Council Study (2005)

“all our modern critical infrastructure relies on networks... too much emphasis on specific applications/jargon/disciplinary stovepipes... need a cross-cutting science of networks... Research for the 21st century”

How do we represent a network as a mathematical object?

NETWORK TOPOLOGY

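Connectivity matrix, M:

$$M_{ij} = \begin{cases} 1 & \text{if an edge exists between } i \text{ and } j, \\ 0 & \text{otherwise.} \end{cases}$$

$$M = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$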

Node degree is the number of links.

Typical measures of network topology

• Degree distribution (fraction of nodes with degree k, for all k)

• Clustering coefficient (fraction of triangles in the graph/transitivity: Are my friends friends with each other?)
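Both the global clustering coefficient (transitivity) and its per-node version can be computed directly from neighbor sets. A minimal sketch, assuming an undirected network without self-loops; the example `nbrs` is a hypothetical encoding of the network given by M above with its diagonal dropped (edges 0-1, 0-2, 0-3, 1-3, 3-4), and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical example: the network from M above without self-loops, as neighbor sets.
nbrs = [{1, 2, 3}, {0, 3}, {0}, {0, 1, 4}, {3}]


def links_among_neighbors(nbrs, i):
    """Number of edges that exist between pairs of i's neighbors."""
    return sum(1 for u, v in combinations(nbrs[i], 2) if v in nbrs[u])


def local_clustering(nbrs, i):
    """c_i = links among neighbors / (k_i choose 2); set to 0 when k_i < 2 (a common convention)."""
    k = len(nbrs[i])
    if k < 2:
        return 0.0
    return links_among_neighbors(nbrs, i) / (k * (k - 1) / 2)


def transitivity(nbrs):
    """Global clustering = 3 x (# triangles) / (# connected triples).

    Summing links_among_neighbors over all nodes counts every triangle 3 times
    (once from each corner), so that sum already equals 3 x (# triangles).
    """
    closed = sum(links_among_neighbors(nbrs, i) for i in range(len(nbrs)))
    triples = sum(len(s) * (len(s) - 1) // 2 for s in nbrs)
    return closed / triples if triples else 0.0


print([round(local_clustering(nbrs, i), 3) for i in range(len(nbrs))])  # [0.333, 1.0, 0.0, 0.333, 0.0]
print(transitivity(nbrs))  # 0.428571... (= 3/7: one triangle, seven connected triples)
```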

Also a local measure: for each node i, c_i is the number of connections existing between i's neighbors divided by the total number of possible connections between them.

Typical measures of network topology, cont.

• Diameter (greatest shortest-path distance between any two connected nodes). “Small world” if d ∼ log N and strong clustering (Watts and Strogatz, Nature 393, 1998).
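For an unweighted network the diameter can be found with one breadth-first search (BFS) per node, taking the largest distance seen. A minimal sketch, assuming neighbor sets as input; the example network and the ring are hypothetical illustrations, not data from the lecture.

```python
from collections import deque


def bfs_distances(nbrs, source):
    """Shortest-path (hop) distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(nbrs):
    """Greatest shortest-path distance over all pairs of connected nodes (one BFS per node)."""
    return max(max(bfs_distances(nbrs, s).values()) for s in range(len(nbrs)))


# Hypothetical example: the network from M above without self-loops.
nbrs = [{1, 2, 3}, {0, 3}, {0}, {0, 1, 4}, {3}]
print(diameter(nbrs))  # 3  (e.g., node 2 -> 0 -> 3 -> 4)

# A ring of N nodes has diameter N // 2, growing linearly in N rather than like log N.
ring = [{(i - 1) % 12, (i + 1) % 12} for i in range(12)]
print(diameter(ring))  # 6
```

Because BFS only returns reachable nodes, pairs in different components are ignored, matching the "connected nodes" qualifier in the definition.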

• Betweenness (fraction of shortest paths passing through a node, i.e., is a node a bottleneck for flow?)

Typical measures of network topology, cont.

• Assortative/dissortative mixing (Are nodes with similar attributes more or less likely to link to each other? Mixing by node degree common. Also, in social networks mixing by gender and race.)
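Mixing by node degree is commonly quantified by Newman's degree assortativity coefficient r, the Pearson correlation between the degrees found at the two ends of an edge. A minimal sketch, assuming an undirected edge list with no self-loops or duplicate edges; `degree_assortativity` is an illustrative name, not a library function.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    Each undirected edge is counted in both directions so the two ends play
    symmetric roles. r > 0: assortative (similar degrees link); r < 0: disassortative.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    pairs = [(deg[u], deg[v]) for u, v in edges] + [(deg[v], deg[u]) for u, v in edges]
    n = len(pairs)
    mean = sum(x for x, _ in pairs) / n
    cov = sum(x * y for x, y in pairs) / n - mean ** 2
    var = sum(x * x for x, _ in pairs) / n - mean ** 2
    if var <= 1e-12:
        raise ValueError("all edge ends have the same degree; r is undefined")
    return cov / var


# A star (hub linked only to leaves) is perfectly disassortative.
print(degree_assortativity([(0, 1), (0, 2), (0, 3)]))            # -1.0
print(round(degree_assortativity([(0, 1), (1, 2), (2, 3)]), 3))  # -0.5 (path of 4 nodes)
```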

(Example of assortative mixing by race: friendship network of high-school students, White, African American and Other.)

Network Activity: FLOWS on NETWORKS (Spread of disease, data, materials transport/flow)

Random walk on the network has state transition matrix, P:

$$P = \begin{pmatrix} 1/4 & 1/3 & 1/2 & 1/4 & 0 \\ 1/4 & 1/3 & 0 & 1/4 & 0 \\ 1/4 & 0 & 1/2 & 0 & 0 \\ 1/4 & 1/3 & 0 & 1/4 & 1/2 \\ 0 & 0 & 0 & 1/4 & 1/2 \end{pmatrix}$$

The eigenvalues and eigenvectors convey much information (Markov chains, spectral gap). The random walk on the WWW is the basis of “PageRank”.
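The steady state of the walk is the eigenvector of P with eigenvalue 1, and a simple way to find it is power iteration: apply P repeatedly to any starting distribution. A minimal sketch on the example network, assuming the walk is connected and aperiodic (the self-loops in M guarantee aperiodicity); for an undirected network the result should be proportional to degree. Google's PageRank additionally mixes in random jumps to arbitrary pages (a damping factor), which this sketch omits.

```python
# Connectivity matrix M from the example (diagonal 1s are self-loops).
M = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
n = len(M)
d = [sum(M[i][j] for i in range(n)) for j in range(n)]        # degrees (column sums)
P = [[M[i][j] / d[j] for j in range(n)] for i in range(n)]     # P[i][j]: prob. of j -> i


def power_iteration(P, steps=1000):
    """Apply pi <- P pi repeatedly from the uniform distribution.

    The error shrinks roughly like |lambda_2|**steps, so a larger spectral gap
    (1 - |lambda_2|) means faster convergence to the steady state.
    """
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(steps):
        pi = [sum(P[i][j] * pi[j] for j in range(m)) for i in range(m)]
    return pi


pi = power_iteration(P)
print([round(x, 4) for x in pi])            # [0.2667, 0.2, 0.1333, 0.2667, 0.1333]
print([round(di / sum(d), 4) for di in d])  # same values: d_i / sum(d)
```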

PageRank of a node is the steady-state random-walk occupancy probability.

Example eigen-technique: (Political Books 2004)

(M. Girvan and M. E. J. Newman)

The “classic” random graph, G(N, p) (The Null Model)

• P. Erdős and A. Rényi, “On random graphs”, Publ. Math. Debrecen, 1959.

• P. Erdős and A. Rényi, “On the evolution of random graphs”, Publ. Math. Inst. Hungar. Acad. Sci., 1960.

• E. N. Gilbert, “Random graphs”, Annals of Mathematical Statistics, 1959.

• Start with N isolated vertices.

• Add random edges one at a time; N(N − 1)/2 total edges are possible.

• After E edges, the probability p of any given edge is p = 2E/[N(N − 1)].

What does the resulting graph look like? (Typical member of the ensemble.)

(Figure: typical realizations with N = 300.)

(Panels: p = 1/400 = 0.0025 and p = 1/200 = 0.005.)

Emergence of a “giant component”

• p_c = 1/N.

• p < p_c: C_max ∼ log(N), where C_max is the size of the largest connected component.

• p > p_c: C_max ∼ A · N.

(Average node degree t = pN, so t_c = 1.)
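The transition can be seen numerically by sampling G(N, p) at average degree t = pN below and above t_c = 1 and measuring C_max with a union-find structure. A minimal sketch under stated assumptions: it samples each possible edge independently with probability p rather than adding exactly E edges one at a time (a closely related ensemble), and N = 1000 and the seed are arbitrary illustrative choices.

```python
import random


def gnp_edges(n, p, rng):
    """Sample G(N, p): each of the N(N-1)/2 possible edges is present independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def largest_component_size(n, edges):
    """Size C_max of the largest connected component, using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def giant_fraction(n, t, seed=0):
    """C_max / N for one sample of G(N, p) with average degree t, i.e. p = t / N."""
    rng = random.Random(seed)
    return largest_component_size(n, gnp_edges(n, t / n, rng)) / n


for t in (0.5, 1.0, 2.0):
    print(t, giant_fraction(1000, t, seed=1))
```

Below t_c the largest component holds only a small fraction of the nodes; above it, one component holds a finite fraction (roughly 80% at t = 2). Exact values vary with the seed.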

Branching process (Galton-Watson); “”-like at t_c = 1.

Is connectivity a good thing?

• Communication, transportation networks

• Spreading of a virus (human or computer)

Random graphs as real-world networks?

• What about degree distribution, clustering, ...?
  – As shown later, Erdős–Rényi yields a Poisson degree distribution, but “configuration” models work around this.
  – Still need null models to match other properties.

• “Network Analysis in the Social Sciences”, S. P. Borgatti, A. Mehra, D. J. Brass, G. Labianca, Science 323, 892-895, 2009.

  – Why would a real network look like a random one?
  – Local properties of nodes and edges, not statistics of the network.

• Developing the correct null models?

Degree distribution of “real-world” networks

Extremely broad range of node degree observed: from biological, to technological, to social networks.

Typical distribution in node degree

The “Internet” (Faloutsos, Faloutsos and Faloutsos, SIGCOMM 1999): $p(k) \sim k^{-2.16}$

“Who-is-Who” network (Szendrői and Csányi): $p(k) = c\,k^{-\gamma} e^{-\alpha k}$

(Figure: log-log degree distributions of four Internet data sets with fitted power laws: 971108, exponent −2.156; 980410, −2.164; 981205, −2.203; routes, −2.486.)

• Small data sets: power laws vs. other similar distributions?

• What is the “Internet”, and at what level? (e.g., vs. AS)

Power law with exponential tail

Ubiquitous empirical measurements:

Systems with $p(x) \sim x^{-B} \exp(-x/C)$:

• Full protein-interaction map of Drosophila: B = 1.20, C = 0.038

• High-confidence protein-interaction map of Drosophila: B = 1.26, C = 0.27

• Gene-flow/hybridization network of plants as a function of spatial distance: B = 0.75, C = 105 m

• Earthquake magnitude: B = 1.35 to 1.7, C ∼ 10^21 N·m

• Avalanche size of ferromagnetic materials: B = 1.2 to 1.4, C = L^1.4

• ArXiv co-author network: B = 1.3, C = 53

• MEDLINE co-author network: B = 2.1, C ∼ 5800

• PNAS paper citation network: B = 0.49, C = 4.21

What is a power law?

(Also called a “Pareto Distribution” in statistics).

$$p_k \sim k^{-\gamma}$$

$$\ln p_k \sim -\gamma \ln k$$

(Figure: a power law p(k) versus k on log-log axes, k from 1 to 10000; it appears as a straight line.)

Power Laws versus Bell Curves: “Heavy tails”

• Power law distribution: $p_k \sim k^{-\gamma}$.

• Gaussian distribution: $p_k \sim \exp(-k^2/2\sigma^2)$.

(Figure: the two distributions compared on linear axes, k from 0 to 500, and on log-log axes.)
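One way to see what a heavy tail does is to truncate $p_k \propto k^{-\gamma}$ at a cutoff k_max and watch the first two moments as the cutoff grows. A minimal sketch, assuming a discrete power law on k ≥ 1; `truncated_moments` is an illustrative name, and γ = 2.5 is chosen to fall in the 2 < γ < 3 range.

```python
def truncated_moments(gamma, kmax):
    """Mean and second moment of p_k proportional to k**(-gamma), restricted to 1 <= k <= kmax."""
    ks = range(1, kmax + 1)
    weights = [k ** -gamma for k in ks]
    z = sum(weights)                                   # normalization constant
    mean = sum(k * w for k, w in zip(ks, weights)) / z
    second = sum(k * k * w for k, w in zip(ks, weights)) / z
    return mean, second


for kmax in (10**3, 10**4, 10**5):
    mean, second = truncated_moments(2.5, kmax)
    print(kmax, round(mean, 3), round(second, 1))
# With gamma = 2.5 the mean settles down as kmax grows, while the second moment
# (and hence the variance) keeps growing, roughly like sqrt(kmax).
```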

If 1 < γ < 2, both the mean and the variance diverge (→ ∞). If 2 < γ < 3, the mean is finite but the variance diverges.

Degree distribution and Network Growth Models

• Heterogeneity in real networks.

• Concentrated, Poisson distribution in Erdős–Rényi:

  – Probability to connect to a given set of k nodes is $p^k$.

  – Probability to be disconnected from the remaining (n − k) is $(1 - p)^{n-k}$.

  – Probability for a node to have degree k follows a binomial distribution:

$$p_k = \binom{n}{k} p^k (1 - p)^{n-k}.$$
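The binomial formula can be evaluated directly and compared with a Poisson distribution of the same mean np, which is its limit for large n with np held fixed; this is why the Erdős–Rényi degree distribution is described as Poisson. A minimal sketch using only the standard library; n = 1000 and p = 0.005 are hypothetical values.

```python
from math import comb, exp, factorial


def binomial_pk(n, p, k):
    """Probability that a node has degree k when each of n possible links exists with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pk(mean, k):
    """Poisson probability with the given mean (large-n, fixed-np limit of the binomial)."""
    return exp(-mean) * mean**k / factorial(k)


n, p = 1000, 0.005  # hypothetical values: mean degree np = 5
for k in range(0, 11, 2):
    print(k, round(binomial_pk(n, p, k), 4), round(poisson_pk(n * p, k), 4))
```

The two columns agree to about three decimal places, and both are sharply concentrated around the mean, unlike the heavy-tailed distributions above.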

• Seek alternate mechanisms...

Known Mechanisms for Power Laws

• Phase transitions (singularities)

• Random multiplicative processes (fragmentation)

• Combination of exponentials (e.g., word frequencies)

• Preferential / proportional attachment (Pólya 1923, Yule 1925, Zipf 1949, Simon 1955, Price 1976, Barabási and Albert 1999)

Attractiveness is proportional to size: $\frac{ds}{dt} \propto s$

• Add in saturation [Amaral 2000, Börner 2004] and get power laws with exponential decay.

Origins of preferential attachment

• 1923: Pólya, urn models.

• 1925: Yule, explain genetic diversity.

• 1949: Zipf, distribution of city sizes (1/f).

• 1955: Simon, distribution of wealth in economies.

(“The rich get richer”).

• [Interesting note: in sociology this is referred to as the Matthew effect, after the biblical edict “For to every one that hath shall be given ...” (Matthew 25:29).]

Preferential attachment in networks

• D. J. de S. Price, “Networks of scientific papers” Science, 1965. First observation of power laws in a network context. Studied paper co-citation network.

• D. J. de S. Price, “A general theory of bibliometric and other cumulative advantage processes” J. Amer. Soc. Info. Sci., 1976.

The rate at which a paper gains citations is proportional to the number it already has. (The probability of learning of a paper is proportional to the number of references to it that currently exist.)

• A.-L. Barabási and R. Albert, “Emergence of Scaling in Random Networks” Science, 1999. (Citations: ∼1000 in 2006; ∼2000 in 2007; ∼3400 in April 2008; ∼4300 in Feb 2009)

(Together with Watts and Strogatz, “Collective dynamics of ‘small-world’ networks”, Nature 1998, this paper launched a flurry of activity.)

Preferential Attachment random graphs:

• A discrete time process.

• Start with single isolated node.

• At each time step, a new node arrives.

• Probability incoming node attaches to a particular node of degree j:

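$$P_{ij} \propto j.$$

Explicitly, since each arriving node brings m links (so the total degree at time t is 2mt):

$$P_{ij} = \frac{j}{\sum_l d_l} = \frac{j}{2mt}.$$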
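The attachment rule can be simulated with a standard trick: keep a list in which every node appears once per unit of degree, so a uniform pick from that list selects a node with probability proportional to its degree. A minimal sketch with m = 1; as a labeled assumption it starts from a single edge 0-1 rather than one isolated node (whose degree 0 would make j/Σd undefined for the first arrival), and the node count and seed are arbitrary. The measured degree fractions can be compared with $p_k = 2m(m+1)/[k(k+1)(k+2)]$, which for m = 1 gives 2/3 for k = 1.

```python
import random
from collections import Counter


def preferential_attachment_degrees(n_nodes, seed=0):
    """Grow a network by linear preferential attachment with m = 1 link per arriving node.

    Assumption (not from the lecture): start from a single edge 0-1 rather than one
    isolated node, because an isolated node has degree 0 and j / sum(d) would be 0/0.
    """
    rng = random.Random(seed)
    degree = [1, 1]
    endpoints = [0, 1]  # node i appears here degree[i] times
    for new in range(2, n_nodes):
        # A uniform pick from the endpoint list selects node j with probability
        # degree[j] / sum(degree), i.e. P_ij proportional to j.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints.extend([target, new])
    return degree


deg = preferential_attachment_degrees(50_000)
counts = Counter(deg)
for k in (1, 2, 3, 4):
    theory = 4 / (k * (k + 1) * (k + 2))  # 2m(m+1) / (k(k+1)(k+2)) with m = 1
    print(k, round(counts[k] / len(deg), 4), round(theory, 4))
```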

• We are interested in the limit of large graph size.

Rate equations (a typical analysis tool)

(Let $n_{k,t}$ ≡ number of nodes of degree k at time t, and $n_t$ ≡ total number of nodes at time t; note $n_t = t$.)

For each arriving link:

$$n_{k,t+1} = n_{k,t} + \frac{k-1}{2mt}\, n_{k-1,t} - \frac{k}{2mt}\, n_{k,t}$$

Probability: $p_{k,t} = n_{k,t}/n_t$

Assume steady state: $p_{k,t} \to p_k$.

Recursion for $p_m$

$$p_k = \frac{(k-1)(k-2)\cdots(m)}{(k+2)(k+1)\cdots(m+3)}\, p_m = \frac{m(m+1)(m+2)}{(k+2)(k+1)k} \cdot \frac{2}{m+2}, \quad \text{using } p_m = \frac{2}{m+2}$$

$$p_k = \frac{2m(m+1)}{(k+2)(k+1)k}$$

For $k \gg 1$:

$$p_k \sim k^{-3}$$
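The closed form can be checked exactly with rational arithmetic: it satisfies the boundary value $p_m = 2/(m+2)$, the recursion $p_k/p_{k-1} = (k-1)/(k+2)$, normalization (the partial sums telescope to $1 - m(m+1)/[(K+1)(K+2)]$), and $k^3 p_k \to 2m(m+1)$. A minimal sketch; m = 3 and the cutoffs are hypothetical choices, and `pa_pk` is an illustrative name.

```python
from fractions import Fraction


def pa_pk(k, m):
    """Steady-state p_k = 2m(m+1) / (k(k+1)(k+2)) for k >= m, as an exact fraction."""
    return Fraction(2 * m * (m + 1), k * (k + 1) * (k + 2))


m = 3  # hypothetical number of links per arriving node

# Boundary condition that starts the recursion: p_m = 2/(m+2).
print(pa_pk(m, m) == Fraction(2, m + 2))  # True

# Recursion p_k / p_{k-1} = (k-1)/(k+2) for every k > m.
print(all(pa_pk(k, m) / pa_pk(k - 1, m) == Fraction(k - 1, k + 2) for k in range(m + 1, 500)))  # True

# Normalization: the partial sum up to K telescopes to 1 - m(m+1)/((K+1)(K+2)), which tends to 1.
K = 1000
print(sum(pa_pk(k, m) for k in range(m, K + 1)) == 1 - Fraction(m * (m + 1), (K + 1) * (K + 2)))  # True

# Tail: k^3 p_k approaches 2m(m+1) = 24, i.e. p_k ~ k^-3.
k = 10**6
print(round(float(k**3 * pa_pk(k, m)), 3))  # 24.0
```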

Get a power law with γ = 3.

Concepts covered today

• Social, physical and biological networks

• Simple network metrics (recapped next page)

• Random walks on networks

• Random graphs

• Phase transitions in connectivity

• Preferential attachment and network growth

• Next time: Robustness, Internet structure, optimization, biological networks.

Outstanding challenges

• How do we connect network structure to function?

  – Degree
  – Clustering Coefficient
  – Motifs
  – Betweenness Centrality
  – Assortativity
  – Flow and transport
  – Growth/evolution mechanisms.

• Interacting networks

• Strategic interactions / Game theory on networks