Graph Embedding and Reasoning

Graph Embedding and Reasoning Jie Tang Department of Computer Science and Technology Tsinghua University The slides can be downloaded at http://keg.cs.tsinghua.edu.cn/jietang 1 Networked World • 2 billion MAU • >777 million trans. (alipay) • 26.4 billion minutes/day • 200 billion on 11/11 • 320 million MAU • 350 million users • Peak: 143K tweets/s • influencing our daily life • 700 million MAU • ~200 million MAU • 95 million pics/day • 70 minutes/user/day • 300 million MAU •QQ: 860 million MAU • 30 minutes/user/day • WeChat: 1 billion MAU 2 Mining Big Graphs/Networks • An information/social graph is made up of a set of individuals/entities (“nodes”) tied by one or more interdependency (“edges”), such as friendship. Mining big networks: “A field is emerging that leverages the capacity to collect and analyze data at a scale that may reveal patterns of individual and group behaviors.” [1] 1. David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Alber-Laszlo Barabasi, et al. Computational Social Science. Science 2009. 3 Challenges Info. Space + Social Space Challenges Info. big Space dynamic Interaction Social hetero Space geneous Interaction 1. J. Scott. (1991, 2000, 2012). Social network analysis: A handbook. 2. D. Easley and J. Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press, 2010. 4 Social & Information Network Analysis o Graph Neural Networks o Deep Learning for Networks Recent Trend: o High-Order Networks [Benson et al.] 2015~2019 Deep Learning for Networks o Info. vs. Social Networks (Twitter) [Kwak et al.] o Signed Networks [Leskovec et al.] o Semantic Social Networks [Tang et al.] 2010~2014 o Graph Evolution [Leskovec et al.] o Four Deg. Of Separation [Backstrom et al.] o 3 Deg. Of Influence [Christakis & Fowler] The 1st decade of the 21st o Structural Diversity [Ugander et al.] o Social Influence Analysis [Tang et al.] o Computational Social Science [Watts] 2005~2009 o Six Deg. Of Separation [Leskovec & Horvitz] Century: o Network Embedding [Perozzi et al.] o Network Heterogeneity [Sun & Han] More Computer & Data o Network Embedding [Tang & Liu] Scientists o Influence Max’n [Domingos & Kempe et al.] 2000~2004 o Computer Social Science [Lazer et al.] o Community Detection [Girvan & Newman] o Network Motifs [Milo et al.] 1998/9 o Small Worlds [Watts & Strogatz] o Link Prediction [Liben-Nowell & Kleinberg] o Scale Free [Barabasi & Albert] The Late 20th Century: 1997 o Power Law [Faloutsos 3] o HITS [Kleinberg] o PageRank [Page & Brin] 1992 o Structural Hole [Burt] CS & Physics o Hyperlink Vector Voting [Li] o Dunbar’s Number [Dunbar] 1970s o The Strength Of Weak Tie [Granovetter] o Small Worlds [Migram] 1960s th 1950s o Homophily [Lazarsfeld & Merton] The 20 Century: o Random Graph [Erdos, Renyi, Gilbert] o Balance Theory [Heider et al.] o Degree Sequence [Tuttle, Havel, Hakami] Sociology & Anthropology 1930s o Sociogram [Moreno] 5 Recent Trend d-dimensional vector, d<<|V| Representation Learning/ Graph Embedding 0.8 0.2 0.3 … 0.0 0.0 Users with the same label are located in the d-dimensional space closer than those with different labels e.g., node classification label2 label1 6 Example1: Open Academic Graph —how can we benefit from graph embedding? Open Academic Graph (OAG) is a large knowledge graph by linking two billion-scale academic graphs: Microsoft Academic Graph (MAG) and AMiner. 240M Authors 220M Publications Event • 189M publications Open • 113M researchers Academic • 8M concepts Graph • 8M IP accesses @ Open data knowledge intelligence Academic 25K Institutions Society 664K Fields of Studies 50K Venues Tsinghua University AMiner Microsoft Academic Graph Linking large-scale heterogeneous academic graphs https://www.openacademic.ai/oag/ 1. F. Zhang, X. Liu, J. Tang, Y. Dong, P. Yao, J. Zhang, X. Gu, Y. Wang, B. Shao, R. Li and K. Wang. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs. KDD'19. 7 Results Our methods based on graph embeddings ** Code&Data available at https://github.com/zfjsail/OAG 8 Example2: Social Prediction —how can we benefit from graph embedding? Example: Online Forum E E D D F F H v H v1 v2 B B C A C A Who are more likely to be “active”, v1 or v2? e.g., active = “dropout an online forum” Active neighbor Inactive neighbor v User to be influenced 1. J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang. DeepInf: Social Influence Prediction with Deep Learning. KDD'18. 9 Dropout Prediction • Dropout prediction with influence – Problem: dropout prediction – Game data: King of Glory (王者荣耀) With Influence w/o influence Top k #message #success ratio #message #success ratio 1 3996761 1953709 48.88% 6617662 2732167 41.29% 2 2567279 1272037 49.55% 9756330 4116895 42.20% 3 1449256 727728 50.21% 10537994 4474236 42.46% 4 767239 389588 50.78% 9891868 4255347 43.02% 5 3997251 2024859 50.66% 15695743 6589022 41.98% 10 Outline • Representation Learning for Graphs – Unified graph embedding as matrix factorization – Scalable graph embedding – Fast graph embedding • Extensions and Reasoning – Dynamic – Heterogeneous – Reasoning • Conclusion and Q&A 11 Representation Learning on Networks 푺 = 푓(푨) RLN by Matrix achieve better accuracy Input: Output: Adjacency Matrix Sparsify 푺 Scalable RLN Vectors 푨 풁 handle 100M graph Fast RLN 풁 = 푓(풁′) offer 10-400X speedups 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 12 DeepWalk SkipGram with Random walk One example RW path Hierarchical softmax v3 v4 v4 v3 v1 v5 v6 v 1 v5 v2 v6 Hierarchical softmax 1. B. Perozzi, R. Al-Rfou, and S. Skiena. 2014. Deepwalk: Online learning of social representations. KDD, 701–710. 13 Later… • LINE[1]: explicitly preserves both first- order and second-order proximities. • PTE[2]: learn heterogeneous text network embedding via a semi- supervised manner. • Node2vec[3]: use a biased random walk to better explore node’s neighborhood. 1. J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. 2015. Line: Large-scale information network embedding. WWW, 1067–1077. 2. J. Tang, M. Qu, and Q. Mei. 2015. Pte: Predictive text embedding through large-scale heterogeneous text networks. KDD, 1165–1174. 3. A. Grover and J. Leskovec. 2016. node2vec: Scalable feature learning for networks. KDD, 855–864. 14 Questions • What are the fundamentals underlying the different models? or • Can we unify the different network embedding approaches? 15 Unifying DeepWalk, LINE, PTE, and node2vec into Matrix Forms 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 16 Starting with DeepWalk 17 DeepWalk Algorithm 18 Skip-gram with Negative Sampling • SGNS maintains a multiset 퓓 that counts the occurrence of each word-context pair (푤, 푐) • Objective 푏# 푤 #(푐) ℒ = ෍ ෍(# 푤, 푐 log 푔 푥푇 푥 + log 푔(−푥푇 푥 )) 푤 푐 |풟| 푤 푐 푤 푐 where xw and xc are d-dimenational vector • For sufficiently large dimension d, the objective above is equivalent to factorizing the PMI matrix[1] #(푤, 푐)|풟| log 푏#(푤)#(푐) 1. Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014 19 DeepWalk 20 Understanding random walk + skip gram Suppose the multiset 풟 is constructed based on random #(푤,푐)|풟| walk on graphs, can we interpret 푙표푔 with graph 푏#(푤)#(푐) structures? 21 Understanding random walk + skip gram • Partition the multiset 풟 into several sub-multisets according to the way in which each node and its context appear in a random walk node sequence. • More formally, for 푟 = 1, 2, ⋯ , 푇, we define Distinguish direction and distance 22 Understanding random walk + skip gram 23 Understanding random walk + skip gram 24 Understanding random walk + skip gram the length of random walk 퐿 → ∞ 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 25 Understanding random walk + skip gram the length of random walk 퐿 → ∞ 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 26 Understanding random walk + skip gram the length of random walk 퐿 → ∞ 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 27 Understanding random walk + skip gram the length of random walk 퐿 → ∞ 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 28 29 DeepWalk is factorizing… 푤푖−2 푤푖−1 푤푖 푤푖+1 푤푖+2 DeepWalk is asymptotically and implicitly factorizing 푣표푙 퐺 = ෍ ෍ 퐴푖푗 푖 푗 푨 Adjacency matrix b: #negative samples 푫 Degree matrix T: context window size 30 LINE 31 PTE 32 node2vec — 2nd Order Random Walk 33 Unifying DeepWalk, LINE, PTE, and node2vec into Matrix Forms 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 34 • Can we directly factorize the derived matrix? 35 NetMF: explicitly factorizing the DW matrix 푤푖−2 Matrix 푤푖−1 푤푖 Factorization푤푖+1 푤푖+2 DeepWalk is asymptotically and implicitly factorizing 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. WSDM’18. The most cited paper in WSDM’18 as of May 2019 36 Explicitly factorize the matrix * approximate D-1/2AD-1/2 with T its top-h eigenpairs Uh흠hUh * decompose using Arnoldi algorithm[1] * decompose using Arnoldi algorithm[1] 1.

Load more