Link Prediction Through Deep Learning Supplementary Materials

LINK PREDICTION THROUGH DEEP LEARNING SUPPLEMENTARY MATERIALS XU-WEN WANG1, YIZE CHEN2, and YANG-YU LIU1;3 Supplementary Text Figs. S1 to S8 Tables S1 to S2 References (52-143) Date: August 30, 2018 1Channing Division of Network Medicine, Brigham and Women’s Hospital, and Harvard Medical School, Boston, MA 02115, USA 2Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA 3Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA. 1 2 X.-W. WANG, Y.-Z. CHEN, AND Y.-Y. LIU Section 1: Existing link prediction algorithms ................... 3 1.1. Similarity-based algorithms.......................... 3 1.2. Maximum likelihood algorithms........................ 5 1.3. Other algorithms................................ 7 Section 2: Performance metrics of link prediction methods ............ 9 2.1. AUC...................................... 9 2.2. Precision.................................... 11 Section 3: Deep generative models .......................... 13 3.1. Variational Autoencoder............................ 13 3.2. Generative Adversarial Networks....................... 14 Section 4: DGM-based link prediction ........................ 17 4.1 VAE-based link prediction........................... 17 4.2 GANs-based link prediction.......................... 17 Section 5: Node relabeling algorithms ........................ 22 5.1 Node relabeling algorithm based on Louvain community detection . 22 5.2 Node relabeling algorithm based on multiple resolution modular detection . 23 5.3 Consensus-based node relabeling algorithm.................. 23 Section 6: Brief description of real dataset ..................... 25 Supplementary References ............................... 28 Supplementary Figures ................................. 32 LINK PREDICTION THROUGH DEEP LEARNING 3 1. EXISTING LINK PREDICTION ALGORITHMS A real-world network can be mathematically represented by a graph G(V; E), where V = f1; 2;:::;Ng is the node set and E ⊆ V × V is the link set. A link is a node pair (i; j) with i; j 2 V, representing certain interaction, association or physical connection between nodes i and j. Link prediction aims to infer the missing links or predict future links between currently unconnected nodes based on the observed links [52, 53, 54]. Many algorithms have been de- veloped to solve the link prediction problem [55, 56, 57, 58]. Here we briefly describe some classical link prediction algorithms. 1.1. Similarity-based algorithms. For similarity-based algorithms, each non-observed node pair (i; j) is assigned a similarity score sij. Higher score is assumed to represent higher link existence probability. The similarity score can be defined in many different ways. (1) Common Neighbors. The common neighbors algorithm quantifies the overlap or similarity of two nodes as follows [59]: (S1) sij = jΓ(i) \ Γ(j)j; where Γ(i) denotes the set of neighbors of node i, \ denotes the intersection of two sets and jXj denotes the cardinality or size of set X. (2) Jaccard Index. The Jaccard index measures the overlap of two nodes with normaliza- tion [60]: jΓ(i) \ Γ(j)j (S2) s = ; ij jΓ(i) [ Γ(j)j where [ denotes the union of two sets. (3) Preferential Attachment Index. This index assumes that the existent likelihood of a link between two nodes is proportional to the product of their degrees [61]: (S3) sij = ki × kj; where ki is the degree of node i. (4) Resource Allocation Index. This index is based on a resource allocation process between pair of nodes [62, 63]. Consider a node pair (i; j), the similarity between i and j is defined as 4 X.-W. WANG, Y.-Z. CHEN, AND Y.-Y. LIU the amount of resource j received from i through their common neighbors: X 1 (S4) sij = : km m2Γ(i)\Γ(j) Here we assume each common neighbor has a unit of resource and will equally distribute among all its neighbors. (5) Katz Index. The Katz index is based on a weighted sum over the collection of all paths connecting nodes i and j: 1 X l l (S5) sij = β (A )ij; l=1 where β is a damping factor that gives the shorter paths more weights, and A is the adjacency matrix of the network. The N × N similarity matrix S = (sij) can be written in a compact form [64]: (S6) S = (I − βA)−1 − I; where I is the identity matrix. The damping factor β is a free parameter and should be lower than the reciprocal of the absolute value of the largest eigenvalue jλmaxj of A (in our calculations, we choose β = 0:5=jλmaxj). (6) Average Commute Time. The average commute time index is motivated by a random walk process on the network [58]: 1 (S7) sij = + + + ; lii + ljj − 2lij + where lij is the entry of pseudoinverse of the Laplacian matrix L ≡ D − A, where D = diagfk1; k2; ··· ; kN g is the degree matrix and A is the adjacency matrix. (7) Network embedding. Network embedding aims to represent each node i 2 V into a low- dimensional space Rd by using the proximity between nodes [65]. After embedding the nodes in the network into a low-dimensional space, we can directly calculate the distance between any two nodes in the transformed space, which can then be used in many downstream analy- sis, such as multi-label classification, networks reconstruction as well as link prediction. There are many existing network ambedding methods, such as Structural Deep Networks Embedding LINK PREDICTION THROUGH DEEP LEARNING 5 (SDNE) [66], LINE [67], DeepWalk [68], GraRep [69], and LE [70]. Though network embedding methods can efficiently perform link prediction for large-scale networks, the embedding process itself will cause information loss, which might affect the performance of link prediction. In addition, for sparse network, embedding methods cannot provide the representations of isolate nodes since no attribute is available. We choose DeepWalk to compare with our method and the code is downloaded from github: https://github.com/phanein/deepwalk. (8) Non-negative matrix factorization. Suppose the adjacency matrix A of a network is also non-negative where each node is represented by the corresponding column. Then non-negative matrix factorization (NMF) is aiming to find two non-negative matrices UN×k and Vk×N so that [71, 72]: (S8) A = UV; where N is the network size and k < N is the dimension of latent space. Then, similar to the network embedding methods, we can calculate the distance (similarity) between any two nodes in the latent space and realize the task of link prediction [73]. Other similarity-based algorithms, such as SimRank [74], Random Walks [75], Random Walks with Restart [76], Negated Shortest Path [77] can also be used for link prediction. 1.2. Maximum likelihood algorithms. The maximum likelihood algorithms assume that real networks have some structure, i.e., hierarchical or community structure. The goal of these algorithms is to select model parameters that can maximize the likelihood of the observed structure. (9) Stochastic Block Model. As one of the most general network models, the stochastic block model (SBM) assumes that nodes are partitioned into groups and the probability of two nodes are connected depends solely on the groups they belong to [78, 79]. The SBM assumes that a link with higher reliability has higher existent probability, and the reliability of a link is defined as [80]: lO + 1 ! L 1 X σiσj (S9) Rij = exp[−H(P )]; Z rσ σ + 2 P 2P i j where P represents the partition space of all possible partitions, σi is the group that node i P lO σ σ belongs to the partition , σiσj is the number of links between groups i and j in the observed 6 X.-W. WANG, Y.-Z. CHEN, AND Y.-Y. LIU network, rσ σ is the maximum possible number of links between them, the function H(P ) ≡ i j 0 1 P rαβ P [ln(rαβ + 1) + ln @ A], and Z ≡ exp[−H(P )]. In practice, we can use the α≤β O P 2P lαβ Metropolis algorithms to sample relevant partitions that significantly contribute to the sum over the partition space P. This allows us to calculate the link reliability efficiently. (10) Hierarchical Structure Model. Many real networks have hierarchical structure, which can be represented by a dendrogram D. One can assign a probability pr to each internal node r 0 of D. Then the connecting probability of a pair of leaves is given by pr0 , where r is the lowest common ancestor of these two leaves. Denote Er as the number of edges in the network whose endpoints have r as their lowest common ancestor in the dendrogram D. Let Lr and Rr be the number of leaves in the left and right subtrees rooted at r, respectively. Then the likelihood of D associated with a set of probabilities fprg is given by [81]: Y Er LrRr−Er (S10) L(D; fprg) = pr (1 − pr) : r2D For a specific D, the probabilities fprg that maximize L(D; fprg) are edges between the two subtrees of r that are present in the network: Er (S11) fprg = : LrRr Evaluating the likelihood L(D; fprg) at this maximum yields Y p¯r 1−p¯r LrRr (S12) L(D) = p¯r (1 − p¯r) : r2D One can use the Markov chain Monte Carlo (MCMC) method to sample a large number of dendrorgams D with probability proportional to their likelihood L(D). For each pair of unconnected leaves i and j, we calculate the connection probability pij for each D, and then calculate the average hpiji over all the sampled dendrograms. The hpiji value yields the existent probability of the link between nodes i and j. For each nonexistent link or node pair (i; j), we calculate the average connecting probability hpiji over all sampled dendrograms and the node pairs with highest hpiji are missing links. Since the SBM-based link prediction method introduced in Ref [80] has demonstrated better LINK PREDICTION THROUGH DEEP LEARNING 7 performance than this hierarchical structure model, in this work we chose the former as a rep- resentative maximum likelihood algorithm to compare with our DGM-based link prediction method.

Load more