Homophily Law of Networks: Principles, Methods and Experiments

Math, Searching, Homophily Law, Searching and Predicting, Reduction, Cascading, Validation, Questions

LI Angsheng
Institute of Software, Chinese Academy of Sciences
SAS, Isaac Newton Institute
11 Jan. 2012

Abstract

In this talk, I will introduce our new discovery, the homophily law of networks: the homophily property ensures that a real network satisfies the small community phenomenon, which includes:
1) the power law degree distribution,
2) the small diameter property,
3) a significant fraction of nodes is contained in some small communities,
4) nodes within a small community share something in common (the colors in our model),
5) a small community contains a few representatives, and
6) nodes within a small community satisfy the power law degree distribution.
I will also introduce applications of the small community phenomenon and the homophily law of networks to new searching, predicting and ranking algorithms for real networks.

Outline

1. Graph in the information age
2. Clustering/community finding - past decade
3. Challenges - next decade
4. Mathematical definitions
5. Small community phenomenon
6. Homophily law
7. Local dimension and searching
8. Local reductions
9. Cascading/giant cascading
10. Small core

Scale free graphs

1) 1999, Barabási, Albert, Science: scale-free property of a web network
2) 2002, Barabási, Albert, Reviews of Modern Physics
3) Kumar et al. 2000, in- and out-degrees of a web crawl

Later on:
– internet networks
– telephone call graph
– US power grid
– Hollywood graph of actors
– food web
– protein-protein interaction networks
– email
– collaboration
– citation, ...

Power law

$G = (V, E)$, $\exists \beta > 0$: the number of nodes of degree $k$ in $G$ is proportional to $1/k^{\beta}$.

[Figure: Degree distribution of COND-MAT (N = 21363); number of nodes against degree, log-scale axes]

PA

[Figure: Degree distribution of PA model (N = 20000); number of nodes against degree, log-scale axes]

History

1. 1926, Lotka: Chemical Abstracts from 1907 - 1916; the number of authors who published $k$ papers is proportional to $1/k^{2}$
2. 1932, Zipf: the number of English words that are used $k$ times is proportional to $1/k$
3. 1949, Yule: explained why the power law exists, with the idea of preferential attachment
4. 1955, Simon: first simple model of PA, leading to a power law

Small world

1. 1967, Stanley Milgram: six degrees of separation, social experiments
2. 1999, reasoning on the diameter of the web: on average, the diameter is 19 for 200 million nodes and 16 for 1.5 million nodes
3. 1998, Watts, Strogatz, Nature: a 1-dimensional model of graphs with small diameter, poly(log n)
4. 2000, Kleinberg: a d-dimensional model of graphs satisfying the small world phenomenon, in which a short path can be found locally

Summary

1. Both the power law and the small world were found in social studies by experiments
2. Both can be studied mathematically in networks
3. The two laws are shared by most real-world networks
4. Perhaps the major achievement of network science in the past decade is the discovery that networks are a universal representation of complex systems in a number of subjects, including both natural and social sciences (as commented by Barabási, 2009, in his Science review article)

Challenges

2009, Barabási, Science paper, reviewing the past decade

The state of the art: there are almost as many dynamics as there are real-world networks.

Questions
1) Are there uniform approaches to the dynamics of networks in general?
2) Why does the failure of a few nodes cause the failure of the whole network? With applications in:
– environments
– society
– economics
– technology
– internet etc.
3) Can networks be robust and secure?

Clustering

Hypothesis: structures are essential to the dynamics of networks
First step: clustering to understand the structures
Two approaches:
1. Modularity
– physical approach, implicit definition
2. Graph partitioning
– algorithm dependent
– balanced, so large
– disjoint

Summary - community finding

• Graph partitioning algorithms have been successfully used in finding large communities of networks
– the project of finding large communities is done

Observations
1. real communities are small
2. communities are overlapping
3. need a mathematical definition of community

Intuition

Figure: Two vertex sets (with dark background) have the same conductance value 1/11, while the left one is more community-like

Overview

I. Local approach
II. Global approach

Our study consists of:
• Mathematical theory
• Algorithms
• Experiments
• Applications

Mathematical definition

Definition 1. For $S \subseteq V$ with $\mathrm{vol}(S) \le \frac{1}{2}\,\mathrm{vol}(G)$, the conductance of $S$ is defined as:

$$\Phi(S) = \frac{e(S, \bar{S})}{\mathrm{vol}(S)}.$$

Definition. Given a graph $G = (V, E)$, a set $S \subset V$ is an $(\alpha, \beta)$-community of $G$ if

$$\Phi(S) \le \frac{\alpha}{|S|^{\beta}}.$$

Moreover, if $|S| = O((\ln n)^{\gamma})$, then we say that $S$ is an $(\alpha, \beta, \gamma)$-community of $G$, where $n = |V|$. In this case, we say that $(\alpha, \beta, \gamma)$ is the local dimension of $G$.

Small community phenomenon

Let $\{G_n\}$ be a sequence of networks evolved from some network model. We say that $\{G_n\}$ satisfies the small community phenomenon if there are constants $\alpha, \beta, \gamma$ such that, for almost all $n$, a significant fraction of the nodes of $G_n$ are contained in some $(\alpha, \beta, \gamma)$-communities.

Erdős-Rényi model

Theorem. If $p = \omega(n) \ln n / n$, where $\omega(n) \to \infty$ arbitrarily slowly, then with high probability a random graph $G$ in $G(n, p)$ does not contain an $(\alpha, \beta, \gamma)$-community for any $\beta > 0$ and any $\gamma > 0$.
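As a concrete illustration of the conductance and $(\alpha, \beta)$-community definitions given earlier, here is a minimal Python sketch. It assumes an undirected simple graph stored as a dictionary of neighbor sets; the function names and the toy graph (two triangles joined by one edge) are hypothetical and are not taken from the talk.

```python
def volume(adj, nodes):
    """vol(nodes): the sum of the degrees of the given nodes."""
    return sum(len(adj[v]) for v in nodes)


def conductance(adj, S):
    """Phi(S) = e(S, complement of S) / vol(S), for vol(S) <= vol(G) / 2."""
    S = set(S)
    vol_S = volume(adj, S)
    vol_G = volume(adj, adj)  # iterating a dict yields its keys, i.e. all nodes
    if vol_S == 0 or 2 * vol_S > vol_G:
        raise ValueError("need 0 < vol(S) <= vol(G) / 2")
    # Count edges with exactly one endpoint in S.
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    return cut / vol_S


def is_community(adj, S, alpha, beta):
    """Check the (alpha, beta)-community condition Phi(S) <= alpha / |S|^beta."""
    return conductance(adj, S) <= alpha / len(S) ** beta


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi = conductance(adj, {0, 1, 2})  # one cut edge, vol(S) = 7
```

For the left triangle, one edge leaves the set and the degrees sum to 7, so the conductance is 1/7; the set passes the test for $\alpha = 1, \beta = 1$ (threshold 1/3) and fails it for $\alpha = 0.3, \beta = 1$ (threshold 0.1). The size condition $|S| = O((\ln n)^{\gamma})$ is asymptotic and is deliberately left out of this sketch.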
Ravasz-Barabási hierarchical model

Theorem. For a graph $G_t$ generated from the stochastic Ravasz-Barabási hierarchical model, with high probability, every node is contained in an $(\alpha, \beta, \gamma)$-community for some $\gamma > 0$ and

$$\beta = \min\left\{1, \log_5 \frac{1}{p}\right\}.$$

Preferential attachment model

Theorem. With high probability, for a graph $G_{d,n}$ in the preferential attachment model with $d \ge 2$ and $0 < \beta \le 2$, there is no $(\alpha, \beta)$-community in $G_{d,n}$.

Geometric preferential attachment model

Theorem. For $G_n$ generated from the geometric preferential attachment model, with high probability, each vertex in $G_n$ is contained in an $(\alpha, \beta)$-community of size $n^{\epsilon}$, where $0 < \beta, \epsilon < 1/2$.

d-dimensional small world model

Theorem. In the d-dimensional small world model $G$, with high probability:
(1) if $r < d$, there is no proper community for an arbitrary node;
(2) if $r = d$, there exist weak $(\alpha, \beta)$-communities of size $n / (\ln n)^{c_1}$ for every node, where $\beta < 1$, $c_1 > 0$;
(3) if $2d \ge r > d$, there exist $(\alpha, r - 1, c_2)$-communities for every node, where $c_2 > 1$; and
(4) if $r > 2d$, there exist $(\alpha, 1, 1)$-communities for every node.

Small community model

A hybrid social network built on the following rules:
(i) A new node $v$ links to an existing node $u$ with probability proportional to both the degree $d(u)$ and $\mathrm{dist}(u, v)^{-\beta}$ for some constant $\beta$,
(ii) A new local edge is added with probability inversely proportional to the distance between the two nodes, and
(iii) A new global edge is added with probability proportional to both the degrees and $\mathrm{dist}(u, v)^{-\gamma}$ for some constant $\gamma$.
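Rule (i) can be read as a weighted sampling step. The sketch below is only an illustration under assumptions the slides do not state: nodes are placed at distinct points in the plane, dist is the Euclidean distance, and the function name and the example values are hypothetical.

```python
import math


def attachment_probabilities(new_pos, positions, degrees, beta):
    """Normalize the weights d(u) * dist(u, v)^(-beta) over existing nodes u.

    Assumption (not from the talk): each node has a planar position and
    dist is Euclidean. The new node must not coincide with an existing
    node, since a zero distance has no finite weight.
    """
    weights = {
        u: degrees[u] * math.dist(new_pos, positions[u]) ** (-beta)
        for u in positions
    }
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}


# Hypothetical example: "a" is close with low degree, "b" is farther with high degree.
positions = {"a": (1.0, 0.0), "b": (0.0, 2.0)}
degrees = {"a": 1, "b": 4}
probs = attachment_probabilities((0.0, 0.0), positions, degrees, beta=2.0)
```

With $\beta = 2$, node "a" has weight $1 \cdot 1^{-2} = 1$ and node "b" has weight $4 \cdot 2^{-2} = 1$, so each is chosen with probability 1/2: a fourfold degree advantage is exactly cancelled by a twofold distance. A target could then be drawn with `random.choices(list(probs), weights=list(probs.values()))`.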
Theorem.
(1) The global edges satisfy the power law distribution,
(2) The whole network satisfies the small community phenomenon, and
(3) The whole network satisfies the small world phenomenon.

Approximation of the PageRank

With Jiankou Li, Pan Peng

Community $(v, \alpha, \beta, \kappa, \epsilon)$
• Compute an $\epsilon$-approximation personalized PageRank vector $p = \mathrm{pr}_{\kappa}(\chi_v - r)$ by invoking ApproximatePR $(v, \kappa, \epsilon)$.
• Order the vertices into the sequence $v_1, \cdots, v_n$ by swapping. Define $S_i = \{v_1, \cdots, v_i\}$ for each $i \in [1, |\mathrm{supp}(p)|]$.
• (The first local optimal strategy) For
