
Math Searching HomophilyLaw SearchingandPredicting Reduction Cascading Validation Questions

Homophily Law of Networks: Principles, Methods and Experiments

LI Angsheng

Institute of Software, Chinese Academy of Sciences

SAS, Isaac Newton Institute, 11 Jan. 2012

Abstract

In this talk, I will introduce our new discovery of a law of networks: the homophily property ensures that a real network satisfies the small community phenomenon, including: 1) the power law distribution, 2) the small diameter property, 3) a significant fraction of nodes are contained in some small communities, 4) nodes within a small community share something in common (the colors in our model), 5) a small community contains a few representatives, and 6) nodes within a small community satisfy the power law. I will also introduce applications of the small community phenomenon and the homophily law of networks to new searching and predicting algorithms in real networks.

Outline

1. Graph in the information age
2. Clustering/community finding - past decade
3. Challenges - next decade
4. Mathematical definitions
5. Small community phenomenon
6. Homophily law
7. Local and searching
8. Local reductions
9. Cascading/giant cascading
10. Small core

Scale free graphs

1) 1999, Barabási, Albert, Science: scale-free property of a web network
2) 2002, Barabási, Albert, Reviews of Modern Physics
3) 2000, Kumar et al.: in- and out-degrees of a web crawl

Later on:
– networks
– telephone call graph
– US power grid
– Hollywood graph of actors
– food web
– protein-protein interaction networks
– email
– collaboration
– citation, · · ·

Power law

G = (V, E), ∃β > 0: the number of nodes of degree k in G is proportional to $1/k^{\beta}$.

Figure: Degree distribution of COND-MAT (N = 21363); log-log plot of the number of nodes against degree

PA

Figure: Degree distribution of the PA model (N = 20000); log-log plot of the number of nodes against degree

History

1. 1926, Lotka: chemical abstracts from 1907 - 1916; the number of authors who published k papers is proportional to $1/k^{2}$
2. 1932, Zipf: the number of English words that are used k times is proportional to $1/k$
3. 1949, Yule: explained the reason why the power law exists, having idea of the
4. 1955, Simon: first simple model of the PA, leading to the power law

Small world

1. 1967, Stanley Milgram: six degrees of separation, found by social experiments
2. 1999, reasoning on the diameter of the web: on average, the diameter for 200 million nodes is 19, and for 1.5 million nodes is 16.
3. 1998, Watts, Strogatz, Nature: a 1-dimensional model of graphs with small diameter, poly(log n).
4. 2000, Kleinberg: a d-dimensional model of graphs satisfying the small world phenomenon, in which a short path can be found locally.

Summary

1. Both the power law and the small world were found in social studies by experiments
2. Both can be studied mathematically in networks
3. The two laws are shared by most real world networks
4. Perhaps the major achievement of the past decade is the discovery that networks are a universal representation for complex systems in a number of subjects, including both natural and social sciences (commented by Barabási, 2009, in his Science review article).

Challenges

2009, Barabási, Science paper, reviewing the past decade

The state of the art: there are almost as many dynamics as there are real world networks.

Questions
1) Are there uniform approaches to the dynamics of networks in general?
2) Why does the failure of a few nodes cause the failure of the whole network? With applications in:
– environments
– society
– economics
– technology
– internet etc.
3) Can a network be robust and secure?

Clustering

Hypothesis: Structures are essential to the dynamics of networks

First step: clustering to understand the structures

Two approaches:
1. – physical approach, implicit definition
2. Graph partitioning
– algorithm dependent
– balanced, so large
– disjoint

Summary - community finding

• Graph partitioning algorithms have been successfully used in finding large communities of networks – the project of finding large communities is done

Observations
1. real communities are small
2. communities are overlapping
3. need a mathematical definition of community

Intuition

Figure: Two sets (with dark background) have the same value 1/11, while the left one is more community-like

Overview

I. Local approach
II. Global approach

Our study consists of:
• Mathematical theory
• Algorithms
• Experiments
• Applications

Mathematical definition

Definition. For S ⊆ V with $\mathrm{vol}(S) \le \frac{1}{2}\,\mathrm{vol}(G)$, the conductance of S is defined as

$$\Phi(S) = \frac{e(S, \bar{S})}{\mathrm{vol}(S)}.$$

Definition. Given a graph G = (V, E), a set S ⊂ V is an (α, β)-community of G if

$$\Phi(S) \le \frac{\alpha}{|S|^{\beta}}.$$

Moreover, if $|S| = O((\ln n)^{\gamma})$, where n = |V|, then we say that S is an (α, β, γ)-community of G. In this case, we say that (α, β, γ) is the local dimension of G.

Small community phenomenon
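The conductance and (α, β)-community definitions stated just before this slide can be checked numerically on a toy graph. The following is a minimal sketch, not the authors' code: it assumes a simple undirected graph stored as a dictionary of neighbor sets, and the names `conductance`, `is_community` and the toy graph `adj` are illustrative only.

```python
def volume(adj, nodes):
    """vol(S): sum of the degrees of the nodes in S."""
    return sum(len(adj[v]) for v in nodes)


def conductance(adj, S):
    """Phi(S) = e(S, complement of S) / vol(S) for a simple undirected graph."""
    S = set(S)
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    return cut / volume(adj, S)


def is_community(adj, S, alpha, beta):
    """True if S meets the (alpha, beta)-community bound Phi(S) <= alpha / |S|^beta."""
    return conductance(adj, S) <= alpha / len(S) ** beta


# Toy graph (an assumption for illustration): two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
S = {0, 1, 2}                        # vol(S) = 7 <= vol(G) / 2 = 7
print(conductance(adj, S))           # one cut edge over volume 7
print(is_community(adj, S, 1.0, 1.0))
```

Here the left triangle has a single outgoing edge, so its conductance is 1/7 and it passes the bound for α = 1, β = 1 but fails it for a much smaller α.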

Let {Gn} be a sequence of networks evolved from some network model. We say that {Gn} satisfies the small community phenomenon if there are constants α, β, γ such that, for almost all n, a significant fraction of the nodes of Gn are contained in some (α, β, γ)-communities.

Erdős-Rényi model

Theorem. If p = ω(n) ln n/n, where ω(n) → ∞ arbitrarily slowly, then with high probability a graph G in G(n, p) does not contain an (α, β, γ)-community for any β > 0 and any γ > 0.

Ravasz-Barabási hierarchical model

Theorem. For a graph Gt generated from the stochastic Ravasz-Barabási hierarchical model, with high probability, every node is contained in an (α, β, γ)-community for some γ > 0 and $\beta = \min\{1, \log_5 \frac{1}{p}\}$.

Preferential attachment model

Theorem. With high probability, for a graph Gd,n in the preferential attachment model with d ≥ 2 and 0 < β ≤ 2, there is no (α, β)-community in Gd,n.

Geometric preferential attachment model

Theorem. For Gn generated from the Geometric Preferential Attachment model, with high probability, each vertex in Gn is contained in an (α, β)-community of size $n^{\epsilon}$, where 0 < β, ε < 1/2.

d-dimensional small world model

Theorem. In the d-dimensional small world model G, with high probability:
(1) if r < d, there is no proper community for an arbitrary node;
(2) if r = d, there exist weak (α, β)-communities of size $(\ln n)^{c_1}$ for every node, where β < 1, c1 > 0;
(3) if 2d ≥ r > d, there exist (α, r − 1, c2)-communities for every node, where c2 > 1; and
(4) if r > 2d, there exist (α, 1, 1)-communities for every node.

Small community model

A hybrid model built on the following rules:
(i) A new node v links to an existing node u with probability proportional to both the degree d(u) and $(\mathrm{dist}(u, v))^{-\beta}$ for some constant β,
(ii) A new local edge is added with probability inversely proportional to the distance between the two nodes, and
(iii) A new global edge is added with probability proportional to both the degrees and $(\mathrm{dist}(u, v))^{-\gamma}$ for some constant γ.

Theorem
(1) The global edges satisfy the power law distribution,
(2) The whole network satisfies the small community phenomenon, and
(3) The whole network satisfies the small world phenomenon.

Approximation of the PageRank

With Jiankou Li, Pan Peng

Community (v, α, β, κ, ε)
• Compute an ε-approximate personalized PageRank vector p = prκ(χv − r) by invoking ApproximatePR (v, κ, ε).
• Order the vertices into the sequence v1, · · · , vn by swapping. Define Si = {v1, · · · , vi} for each i ∈ [1, |supp(p)|].
• (The first local optimal strategy) For each i ∈ [1, |supp(p)|], check whether Si is the first locally optimal set that satisfies both community conditions. If yes, output Si and stop.
• Output the empty set.

Determining local dimension
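The Community (v, α, β, κ, ε) procedure above can be illustrated with a small sketch. It is not the authors' implementation and it rests on several assumptions: ApproximatePR is replaced by plain power iteration, vertices are ordered by decreasing p(v)/deg(v), the "both conditions" are read as the conductance bound Φ(S) ≤ α/|S|^β plus a size cap `max_size`, and "locally optimal" is read as "the next prefix does not have smaller conductance". All function names are hypothetical.

```python
def personalized_pagerank(adj, seed, teleport=0.15, iters=100):
    """Power iteration for p = teleport * chi_seed + (1 - teleport) * p * W."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += teleport
        for u, mass in p.items():
            if mass > 0.0:
                share = (1.0 - teleport) * mass / len(adj[u])
                for w in adj[u]:
                    nxt[w] += share
        p = nxt
    return p


def conductance(adj, S):
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    return cut / sum(len(adj[u]) for u in S)


def first_local_optimum(adj, seed, alpha, beta, max_size):
    """Return the first prefix S_i that meets the bounds and has no better next prefix."""
    p = personalized_pagerank(adj, seed)
    order = sorted((v for v in adj if p[v] > 0.0),
                   key=lambda v: p[v] / len(adj[v]), reverse=True)
    # Only proper subsets of the vertex set are considered.
    sizes = range(1, min(len(order), len(adj) - 1) + 1)
    prefixes = [set(order[:i]) for i in sizes]
    phis = [conductance(adj, S) for S in prefixes]
    for i, S in enumerate(prefixes):
        meets_bounds = len(S) <= max_size and phis[i] <= alpha / len(S) ** beta
        no_better_extension = i + 1 == len(prefixes) or phis[i + 1] >= phis[i]
        if meets_bounds and no_better_extension:
            return S
    return set()


# Toy graph (an assumption for illustration): two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(first_local_optimum(adj, seed=0, alpha=1.0, beta=1.0, max_size=4))
```

On this toy graph the prefixes {0} and {0, 1} meet the conductance bound but each has a better immediate extension, so the sketch stops at the triangle containing the seed.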

Determine (α, β, γ)
• Set α = c1 and choose β0 such that $c_1/|\ln n|^{\beta_0} = c_2$.
• For each β from β0 to 1 with increment 0.01:
– Run Compute (α0, β) from all possible vertices to obtain the corresponding size-fraction curve;
– Determine $CSV_{\alpha,\beta}$ and compute $f(CSV_{c_1,\beta})$.
• Output β, γ such that $f(CSV_{c_1,\beta})$ reaches the maximum and $(\ln n)^{\gamma} = CSV_{c_1,\beta}$.

The first local optimal strategy

One useful point is the first local optimal strategy in the choice of community.

Figure: The size-fraction curve on Collaboration networks (fraction of nodes against community size; series: cond_mat, astro, grqc, hepph, hepth)

Figure: The size-fraction curve on citation networks (fraction of nodes against community size; series: cit_hepph, cit_hepth)

Figure: The size-fraction curve on Slashdot networks (fraction of nodes; series: slash0811, slash0922, slash0811−2, slash0922−2)

Figure: The size-fraction curve on Epinions networks (fraction of nodes against community size; series: epinions)

Figure: The size-fraction curve on email networks (fraction of nodes against community size; series: email_enron, email_euall)

Figure: The size-fraction curve on Wikivote network with c1 = 0.5, c2 = 0.48 (fraction of nodes against community size; series: social_wikivote)

Figure: The size-fraction curves on the giant connected component of G(n, p) graphs with n = 10000 and different p values (fraction of nodes against community size; series: p = 0.0002, 0.0003, 0.0004, 0.0005)

Figure: The size-fraction curves of the WS small world model with n = 10000 and different p values (fraction of nodes against community size; series: p = 0.18, 0.2, 0.22, 0.24, 0.26, 0.28, 0.3)

Figure: The size-fraction curves on the benchmark graphs (fraction of nodes against community size; series: football, LFR benchmark)

Benchmark

conference         O_Alg      N_Alg
Western Athletic   0.471405   0.843274
Sun Belt           0.370479   0.412393
Big East           0.478091   1
Atlantic Coast     0.480384   1
Big Twelve         0.561951   1
Big Ten            0.663325   1
Independents       0.291111   0.23094
Conference USA     0.580948   0.948683
Mountain West      0.417029   1
Mid-American       0.72111    1
Southeastern       0.707107   1
Pacific Ten        0.471405   1

Table: The comparison of our community detecting algorithm with the traditional one

Extension problem of graph coloring

1972, Thomas Schelling (Nobel laureate in economics) model: a geometric model that shows how global patterns of spatial segregation arise from the effect of local operations - social experiments, no theory.

With Peng Zhang: Let G be a graph, φ be a constant, and g be a coloring of the vertices of G. Given v ∈ V, we say that v is satisfied if there are φ · d(v) neighbors of v in G which share the same color with v.

Question: Given a partial coloring f of the nodes of G, find an extension g of f such that the number of satisfied nodes of G is maximized.
1. The question is NP-hard.
2. There is a polynomial time approximation algorithm with approximation ratio $\Omega(\Delta^{-3})$, where ∆ is the maximum degree of G.

Homophily model of networks
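The graph coloring extension problem stated just before this slide can be made concrete with a short sketch. Two assumptions are made here and are not taken from the talk: "satisfied" is read as "at least φ · d(v) neighbors share v's color", and the extension is found by exhaustive search, which is only feasible on tiny graphs and is not the approximation algorithm mentioned above. The names `satisfied_nodes` and `best_extension` are illustrative.

```python
from itertools import product


def satisfied_nodes(adj, coloring, phi):
    """Nodes with at least phi * d(v) neighbors of their own color."""
    result = set()
    for v, nbrs in adj.items():
        same = sum(1 for u in nbrs if coloring[u] == coloring[v])
        if same >= phi * len(nbrs):
            result.add(v)
    return result


def best_extension(adj, partial, colors, phi):
    """Brute-force search over all extensions of a partial coloring."""
    free = [v for v in adj if v not in partial]
    best, best_count = None, -1
    for choice in product(colors, repeat=len(free)):
        g = dict(partial)
        g.update(zip(free, choice))
        count = len(satisfied_nodes(adj, g, phi))
        if count > best_count:
            best, best_count = g, count
    return best, best_count


# Toy instance (an assumption for illustration): the path 0-1-2-3 with its endpoints pre-colored.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
partial = {0: "red", 3: "blue"}
g, count = best_extension(adj, partial, ("red", "blue"), phi=0.5)
print(g, count)
```

On the path, coloring node 1 like node 0 and node 2 like node 3 satisfies all four nodes, which matches the intuition that segregated colorings maximize satisfaction.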

With Jiankou Li, Yicheng Pan, and Pan Peng:

1. At time 2, we are given an initial graph G2. Each of the two nodes has d − 1 self-loops.
2. For i = 3, 4, ..., n, at time i, add a new node v such that
   2.1 Let p be in [0, 1]. With probability p, v chooses a new color, κ say. Add an edge (u, v), where u is chosen with probability proportional to the degrees of nodes in G_{i−1}. We say that v is the seed node of the color κ.
   2.2 With probability 1 − p, define the color of v to be the one, κ say, uniformly chosen from the existing colors in G_{i−1}.
3. For every seed node v, we remove all the self-loops on v, and replace them by a random (d − 1)-regular graph.

Homophily Theorem of Networks

Let $p = \log^{-c} i$. Then, with probability 1 − o(1):
(1) For every color κ, the induced subgraph of the homochromatic set Sκ is connected, and satisfies the power law degree distribution.
(2) G obeys the power law degree distribution.
(3) The average node to node distance is O(log n).
(4) Every community of the same color, κ say, has a seed node, which is interpreted as the representative of the community.
(5) A 1 − o(1) fraction of the nodes of G have a homochromatic set that is an (α, 0.8/(c + 1), c + 1)-community.

Homophily Law

The Homophily Theorem implies the following Homophily Law of Networks:
(1) A real network has a local structure and information captured by the small community phenomenon.
(2) The small community phenomenon of a network is (approximately) described by the conclusions of the Homophily Theorem.

Common Principle

Why do most real networks exhibit the small community phenomenon?

Homophily law The homophily property ensures that a network satisfies the small community phenomenon.

Why? A Chinese saying:

People sharing the same tastes come together, materials are grouped by categories.

New methods (In progress)

New methods we are able to build based on the small community phenomenon and the homophily law:
1. Function prediction of unknown proteins from the PPI networks
2. (With Jiankou Li) Defining a property associated with a small community, using properties of nodes, for example keywords of papers in the citation networks and classification of products in the product networks.
3. (With Jiankou Li) New searching algorithms
– high dimensional searching
– the local optimum choice law: the first satisfactory solution that has no better immediate extension is the local optimum choice in searching in real world networks (a new law in searching)
– multi-color searching, a practical challenge to us.

Finding missing keywords

We study Arxiv HEP-TH (high energy physics theory). It is a citation graph from the e-print arXiv and covers all the citations within a dataset of 27,770 papers with 352,807 edges. If paper i cites paper j, then the graph contains a directed edge from i to j. Each paper in the network comes with its title, abstract, publication journal, and publication date. There are 1214 papers from the total 27400 papers for which keywords were listed by their authors. Our goal is to use this information to predict and confirm keywords for the remaining papers in the network.

Local dimension

n       α      β      γ      f      $\tilde{d}$   $\tilde{d}_i$
27400   0.33   0.04   2.91   0.67   25.7          16.6

$\tilde{d}_o$   $\tilde{s}$   $\tilde{k}_1$   $\tilde{k}_2$   $\tilde{p}_s$   $\tilde{p}_r$   $p_{\max}$
44.3            225           22              124             1214            219             1819

Table: 1

Algorithm F

For each community, C say, suppose that K1, K2, · · · , Kl are all the known keywords among papers in the community C.
1. Let i ≤ l be a number.
2. Suppose that k1, k2, · · · , ki are the i most popular keywords among all the known keywords of papers in C.
3. Given a paper P in C for which no keywords are listed in the network, for each j ≤ i, if kj appears in either the title or the abstract of paper P, then we say that kj is a predicted and confirmed keyword for P.
4. Let r be the number of keywords that are predicted and confirmed for P. In this case, we say that r keywords are predicted and confirmed for paper P.

Application of the homophily law
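Algorithm F, stated just before this slide, is simple enough to sketch directly. The version below is an illustration under stated assumptions rather than the authors' code: keywords are matched as case-insensitive substrings of the concatenated title and abstract, ties in popularity are broken by first occurrence, and the function name `predict_keywords` and the toy data are hypothetical.

```python
from collections import Counter


def predict_keywords(community, known_keywords, text, i):
    """For each paper without listed keywords, return the top-i community
    keywords that occur in its title or abstract.

    community      -- iterable of paper ids in the community C
    known_keywords -- dict: paper id -> list of author-listed keywords
    text           -- dict: paper id -> title and abstract as one string
    i              -- number of most popular keywords to use
    """
    counts = Counter(k for paper in community
                     for k in known_keywords.get(paper, []))
    top = [k for k, _ in counts.most_common(i)]
    predicted = {}
    for paper in community:
        if paper in known_keywords:
            continue  # keywords already listed by the authors
        body = text[paper].lower()
        predicted[paper] = [k for k in top if k.lower() in body]
    return predicted


# Toy community (invented for illustration, not data from the talk).
community = [1, 2, 3, 4]
known_keywords = {1: ["string theory", "duality"], 2: ["string theory"]}
text = {
    3: "A note on duality in string theory",
    4: "Lattice gauge computations",
}
result = predict_keywords(community, known_keywords, text, i=2)
print(result)                                     # confirmed keywords per paper
print({p: len(ks) for p, ks in result.items()})   # the number r for each paper
```

The length of each returned list is the number r of predicted and confirmed keywords in step 4.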

The homophily law implies that a community has a short list of keywords which represents the features of the community very well. Figure 2 shows how the number of papers whose keywords are predicted and confirmed changes with the number of community keywords used, that is, the most popular i keywords among all known keywords of papers in C, for some i. It shows that most communities indeed have a short list of representative keywords.

Small community phenomenon

Figure: size-fraction curve on citation networks (node fraction against community size)

Homophily law

Figure: the keywords prediction curve (number of papers whose keywords are confirmed against number of keywords of communities)

Keywords confirmation

i\r   1      2      3      4      5+     total
5     5286   2979   1639   768    606    11279
10    4701   3605   2429   1407   1633   13795
15    4360   3627   2671   1798   2330   14829
20    3953   3467   2853   1999   3049   15397
25    3666   3301   2909   2116   3609   15721
30    3344   3169   2934   2223   4185   16015
35    3199   3116   2952   2238   4438   16151
40    3081   3044   2922   2255   4695   16239
45    2987   2992   2850   2321   4892   16333
50    2869   2915   2770   2340   5221   16453
all   2336   2587   2560   2348   5351   16842

Table: Keywords prediction

Algorithm

With Wei Zhang, Jiankou Li, Yicheng Pan, and Pan Peng: Given a network G, our overall approach is as follows:

1. Let G1 = G.

2. Suppose that G1, G2, · · · , Gl is the sequence of local reductions of G such that
– for each i, Gi ≤LR Gi+1,
– Gl is small and "normal", which will be defined or discovered later in the experiments, and
– Gi+1 preserves the giant cascading phenomenon of Gi.

3. Study the "similarity" of Gi+1 to Gi for each i in the reduction sequence in step 2.

Local reduction - HEP-PH

The local reductions are given in Table 4.

Graph (nodes)   C             N
G1′ (11204)     7310, 3, 1    1, 1, 3891
G2′ (7310)      4668, 1       1, 2462
G3′ (4668)      3008, 1       1, 1660
G4′ (3008)      2114, 1       1, 894
G5′ (2114)      572, 1        1, 1542
G6′ (572)       339, 1        1, 133

Table: Local reductions of HEP-PH (C: component size; N: number of components)

Local dimension - HEP-PH

     N       LBC     α       β      γ      fraction
G1   11204   0.184   0.204   0.05   2.47   54.3%
G2   7310    0.333   0.37    0.05   2.4    50.3%
G3   4668    0.454   0.504   0.05   2.72   64.6%
G4   3008    0.51    0.566   0.05   2.67   55.4%
G5   2114    0.801   0.89    0.05   2.63   70%
G6   572     0.828   0.92    0.05   3.44   52.3%
G7   339

Table: Local dimensions of HEP-PH.

HEP-PH, core

The last two graphs, G6 and G7, are already small; they are given in Figure 15.

Figure: Network structure of HEP-PH (G6 and G7)

Graph (nodes)   C                  N
G1′ (21363)     14289, 20, 12, 1   1, 1, 1, 7042
G2′ (14289)     8918, 2, 1         1, 1, 5359
G3′ (8918)      6102, 1            1, 2816
G4′ (6102)      4340, 1            1, 1762
G5′ (4340)      2618, 1            1, 1722
G6′ (2618)      1093, 1            1, 1525
G7′ (1093)      463, 1             1, 630

COND-MAT, local dimension

Local dimensions of the reduction are given in Table 7.

     N       LBC     α      β      γ      fraction
G1   21363   0.27    0.3    0.05   2.38   52.8%
G2   14289   0.297   0.33   0.05   2.5    50.9%
G3   8918    0.36    0.4    0.05   2.54   58.5%
G4   6102    0.207   0.23   0.05   3.51   55.7%
G5   4340    0.144   0.16   0.05   3.94   50.9%
G6   2618    0.18    0.2    0.05   3.68   59.9%
G7   1093    0.27    0.3    0.05   3.72   89.9%
G8   463

Table: Local dimensions of COND-MAT network

COND-MAT, core

The last two graphs G7 and G8 are described in Figure 16.

Figure: Network structure of COND-MAT (G7 and G8)

Graph (nodes)   C             N
G1′ (17903)     13006, 2, 1   1, 2, 4893
G2′ (13006)     10513, 1      1, 2493
G3′ (10513)     8055, 1       1, 2458
G4′ (8055)      6138, 1       1, 1917
G5′ (6138)      4412, 1       1, 1726
G6′ (4412)      3089, 2, 1    1, 1, 1321
G7′ (3089)      985, 2, 1     1, 1, 2102
G8′ (985)       562, 1

ASTRO-PH, local dimension

The local dimensions of the reductions are shown in Table 9.

     N       LBC     α      β      γ      fraction
G1   17903   0.288   0.32   0.04   2.45   52.7%
G2   13006   0.477   0.53   0.05   4.21   54.1%
G3   10513   0.549   0.61   0.05   4.16   66.5%
G4   8055    0.576   0.64   0.05   4.09   63.6%
G5   6138    0.603   0.67   0.05   2.57   60.4%
G6   4412    0.621   0.69   0.05   2.06   54.7%
G7   3089    0.621   0.69   0.05   2.61   89.2%
G8   985     0.324   0.36   0.05   2.94   50.6%
G9   562

Table: Local dimensions of ASTRO-PH

ASTRO-PH, core

We also choose the largest component of the ASTRO-PH network as G1. After 8 rounds, as shown in Table 9, we obtain the network sequence {G1, G2, · · · , G9}. G8 and G9 are shown in Figure 17.

Figure: Network structure of ASTRO-PH (G8 and G9)

HEP-TH, local reduction

The local reductions are given in Table 10.

Graph (nodes)   C            N
G1′ (8638)      5056, 2, 1   1, 3, 3576
G2′ (5056)      2936, 2, 1   1, 2, 2116
G3′ (2936)      1529, 2, 1   1, 2, 1403
G4′ (1529)      459, 3, 1    1, 1, 1067

Table: Local reductions of HEP-TH (C: component size; N: number of components)

HEP-TH, local dimension

The local dimensions of the reductions are given in Table 11.

     N      LBC     α      β      γ      fraction
G1   8638   0.167   0.18   0.05   2.3    50.3%
G2   5056   0.228   0.25   0.05   2.72   54.1%
G3   2936   0.291   0.32   0.05   2.69   53.2%
G4   1529   0.315   0.35   0.05   2.55   50.6%
G5   459

Table: Local dimension of HEP-TH

HEP-TH, core

G4 and G5 are shown in Figure 18.

Figure: Network structure of HEP-TH (G4 and G5)

Cascading

Definition. Let G = (V, E) be a graph, φ be a constant in (0, 1), and S be a subset of V. We define a set T ⊆ V recursively as follows:
(i) Set T ← S.
(ii) We say that a node v ∈ V is active, or injured, if v ∈ T.
(iii) For every v ∈ V, if at least φ × dv neighbors of v are active, then v becomes active, or injured, and enters T, where dv is the degree of v in G.
(iv) Recursively run step (iii) until no new node can be added to T.

Let T be the final set defined as above. We say that T is the φ-cascading of S in G, denoted by $c_{\phi}^{G}(S) = T$.

Giant cascading
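The φ-cascading definition stated just before this slide translates almost line by line into code. The sketch below is a minimal illustration, assuming a simple undirected graph stored as a dictionary of neighbor sets; the function name `cascade` and the toy graph are not from the talk.

```python
def cascade(adj, seeds, phi):
    """Return the phi-cascading of `seeds`: the closure under the rule that a
    node becomes active once at least phi * degree of its neighbors are active."""
    active = set(seeds)                      # step (i): T <- S
    changed = True
    while changed:                           # step (iv): repeat until stable
        changed = False
        for v, nbrs in adj.items():
            if v in active or not nbrs:
                continue
            active_nbrs = sum(1 for u in nbrs if u in active)
            if active_nbrs >= phi * len(nbrs):   # step (iii)
                active.add(v)
                changed = True
    return active


# Toy graph (an assumption for illustration): two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(cascade(adj, {0, 1}, phi=0.5))   # stays inside the first triangle
print(cascade(adj, {0, 1}, phi=0.3))   # spreads across the bridge
```

Because activation is monotone, the order in which nodes are examined does not change the final set. On the toy graph a threshold of 0.5 confines the cascade to the seed's triangle, while 0.3 lets it cross the single bridge edge and reach every node.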

Definition. Let Gn = (Vn, En) be a family of graphs, and φ be a constant in (0, 1). We say that {Gn} satisfies the φ-giant cascading phenomenon if there exist a subset Un ⊂ Vn and constants c and d such that, for sufficiently large n, the following property holds: with probability p > 1/2, a random set S of nodes over Un of size at most $(\log n)^{c}$ satisfies $|c_{\phi}^{G}(S)| \ge d \times n$, where n is the number of nodes in Vn.

Cascading simulation

For every Gi = (Vi, Ei) and some given φ, we implement the following cascading simulation.
1. Fix a set size |S|.
2. Randomly pick a node set S of size |S| in Vi and compute T = cφ(S) in G1. (Recall Definition 9. The key point is that we select S in Gi, but calculate T in G1, since our goal is to examine the influence of nodes of Gi in G1.)
3. Repeat step 2 a sufficient number of times, say 100 times, and calculate the average of |T|.

HEP-PH
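The three simulation steps stated just before this slide can be sketched as follows. This is an illustration under assumptions, not the experimental code behind the figures: the seed set is drawn uniformly without replacement from the node set of Gi, the cascade is then run on the adjacency of G1, and the names `average_cascade_size`, `nodes_gi` and the toy graph are hypothetical.

```python
import random


def cascade(adj, seeds, phi):
    """phi-cascading of `seeds` in the graph `adj` (dict: node -> neighbor set)."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in adj.items():
            if v not in active and nbrs:
                if sum(1 for u in nbrs if u in active) >= phi * len(nbrs):
                    active.add(v)
                    changed = True
    return active


def average_cascade_size(adj_g1, nodes_gi, set_size, phi, trials=100, rng=None):
    """Average |T| over `trials` random seed sets drawn from V_i, cascaded in G_1."""
    if rng is None:
        rng = random.Random(0)       # fixed seed so runs are repeatable
    pool = sorted(nodes_gi)
    total = 0
    for _ in range(trials):
        seeds = rng.sample(pool, set_size)            # step 2: pick S inside G_i
        total += len(cascade(adj_g1, seeds, phi))     # ... but cascade in G_1
    return total / trials                             # step 3: average of |T|


# Toy stand-in for G_1 (two triangles joined by the edge 2-3); {2, 3} plays the role of V_i.
adj_g1 = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(average_cascade_size(adj_g1, {2, 3}, set_size=2, phi=0.5))
```

Passing an explicit `rng` keeps repeated experiments comparable across the graphs G1, G2, and so on.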

Figure: The cascading curves of HEP-PH

φ-transition

Figure: The φ-transition curves of HEP-PH

COND-MAT

Figure: The cascading curves of COND-MAT

Figure: The φ-transition curves of COND-MAT

ASTRO-PH

Figure: The cascading curves of ASTRO-PH

Figure: The φ-transition curves of ASTRO-PH

HEP-TH

Figure: The cascading curves of HEP-TH

Figure: The φ-transition curves of HEP-TH

Benchmark

Figure: American college football network (G1) and its supporter graph (G2)

More issues

1. Ranking by the local reductions
2. Characterization of the core or supporter graphs
3. Why does a network have a core?
4. Semantics of the core
5. Semantics of the small communities

Robustness

Definition. Given G, we say that G is robust if there exists a (small) φ0 such that, for all φ ≥ φ0 and for some α,

$$\Pr_{S,\,|S|=k}\left[\,|\mathrm{cas}_{\phi}^{G}(S)| \le (k \cdot \log n)^{\alpha}\,\right] = 1 - o(1).$$

In this case, we say that α is the robustness ratio of G.

Conjecture. Small community phenomena guarantee the robustness of a network.

Experiments: YES ! (with Xiaohui Zhang, Jiankou Li, Wei Zhang)

Theory: Open, and challenging! (One of the most important open problems in the local theory.) Need a new theory to resolve the problem.

Grand Challenges

1. Security - both theory and experiments
2. To develop a global theory
3. Both local and global approaches to exploring the true mechanisms of complex networks in various disciplines
4. To extend the theories to general complex systems, such as economic networks, in which games are involved