24th European Conference on Artificial Intelligence - ECAI 2020, Santiago de Compostela, Spain

Link Prediction by Analyzing Common Neighbors Based Subgraphs using Convolutional Neural Network

Kumaran Ragunathan, Kalyani Selvarajah and Ziad Kobti

School of Computer Science, University of Windsor, ON, Canada. Email: [email protected], [email protected], [email protected]

Abstract. Link prediction (LP) in social networks is the task of inferring whether a link is likely to be formed in the future. Social networks (SN) are ubiquitous and come in different types, such as human interaction and protein-protein networks. LP uses heuristic methods, including common neighbors and resource allocation, to find the formation of future links. These heuristics are sensitive to different types of social networks: certain heuristics work well for some SN types but not for others, and selecting the appropriate heuristic method for an SN type is often a trial and error process. Recent ground-breaking methods, WLNM and SEAL, demonstrated that this selection process can be automated for the different types of SN. While these methods are promising, in some types of SN they still suffer from low accuracy. The objective of this paper is to address this weakness by introducing a novel framework called PLACN that incorporates the analysis of the common neighbors of the nodes on a target link and a combination of heuristic features through a convolutional neural network. PLACN is driven by a new method to efficiently extract the subgraphs for a target link based on the common neighbors. Another novelty is the method for labeling subgraphs based on the average hop and average weight. Furthermore, we introduce a method to evaluate the approximate number of nodes in the subgraph. Our model converts link prediction into an image classification problem and uses a convolutional neural network. We tested our model on seven real-world networks and compared it against traditional LP methods as well as two recent state-of-the-art methods based on subgraphs. Our results outperformed those LP methods, reaching above 96% AUC on the benchmark SNs.
1 INTRODUCTION

Link prediction (LP) is of great interest due to its practical value in real-world applications such as e-commerce and friend recommendations, as well as networks for discovering future collaborators. The LP problem aims to predict the links that are expected to occur or not occur in the future. While LP has been discussed over the last two decades, the work of Jon Kleinberg and David Liben-Nowell has had a significant impact on this topic [23], drawing a great deal of attention in recent years. The traditional approaches include heuristic methods such as Common Neighbors (CN) [28], Adamic-Adar (AA) [2], and Resource Allocation (RA) [41]. Other approaches include supervised learning methods such as SVM, bagging, and Naive Bayes [5].

Although many sophisticated methods for LP have been proposed, we have identified that for certain types of networks there exist simple heuristic methods, or combinations of such heuristics, that can produce more accurate results. A given heuristic will not perform with the same accuracy in every type of network, because these methods work on patterns extracted from the network topology, which may vary from one SN to another. This is a significant drawback of heuristic methods, and selecting the appropriate heuristic for an SN type is often a trial and error process. The Weisfeiler-Lehman Neural Machine (WLNM) [39] proposed a solution to find the appropriate methods automatically, based on the subgraph extracted in the neighborhood of a link. WLNM is considered a state-of-the-art link prediction method based on its high accuracy.

WLNM used high-order heuristics, such as the Katz index [17] and PageRank [23], to achieve significant accuracy. However, this requires a large number of hops, which spans the enclosing subgraph to the entire network and requires additional computation time and memory. To overcome this issue, SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction) proposed a method to learn general graph structure features from local enclosing subgraphs using a Graph Neural Network (GNN) [40]. SEAL derived a γ-decaying theory to prove that a small number of hops is enough to extract the high-order heuristics, and it achieved better accuracy than WLNM. However, we discover that SEAL also has several shortcomings.

Both our theoretical and empirical results motivate us to model a new link prediction framework that fixes these shortcomings of SEAL. First, SEAL constructs the h-hop enclosing subgraph for a pair of nodes (i, j) from the network G = (V, E) as the set of nodes in i's and j's neighborhoods up to h hops, which may or may not include the nodes essential for predicting the target link between i and j. In our proposed model, PLACN, described below, we construct subgraphs based on the common neighbors of nodes i and j in different orders, where i and j belong to the target link. These subgraphs include a high number of essential nodes, as common neighbors are shared by both i and j. Second, SEAL evaluates a feature vector X for i and j with various heuristic features and feeds it, together with the adjacency matrix A of the enclosing subgraph, to the GNN. SEAL thus considers heuristic features only for the nodes i and j on the target link and ignores the heuristic features of all other nodes in the subgraph, even though the heuristic features of the entire subgraph have a large impact on introducing a new link between any two nodes. As our contribution, we consider the heuristic features of the entire subgraph to incorporate the impact of every node in it.

Moreover, labeling each subgraph is a necessary task to handle the ordering of graph nodes, which keeps all subgraphs consistent. SEAL did not handle subgraph labeling well when common neighbors were in the subgraph; we discuss this further in the subgraph labeling section. In PLACN, we introduce a new labeling method based on both the number of hops and the weight of links to overcome this labeling inconsistency problem.

As our framework gives special importance to the heuristic features of the common neighbors of the target nodes to predict the links, we named our model PLACN (Predicting Links by Analyzing Common Neighbors).

We summarize our contributions as follows:

1. We propose a new method for constructing subgraphs;
2. We introduce a new theory to find the number of nodes K to include in a subgraph;
3. We propose a new labeling algorithm to maintain a consistent sequence in the adjacency matrix;
4. We extract heuristic features of the nodes of the entire subgraph and model a new link prediction framework based on a Convolutional Neural Network (ConvNet);
5. We demonstrate that PLACN outperforms all the heuristic methods as well as the state-of-the-art methods WLNM and SEAL.

2 Related Work

Link Prediction (LP) is a generic task for analyzing network data that appears in many applications, including recommending friends [7, 38], co-author recommendation [6, 10, 15], and website link prediction [16, 26, 27]. In SNs, it has received significant attention after the innovative work by Liben-Nowell and Kleinberg (2007). Many of these applications rely on similarity measures. Generally, most approaches to the LP problem can be categorized into supervised [5] and unsupervised [23] frameworks. The unsupervised methods include topological feature-based measures such as CN [28], JC [23], the Katz index [17], and PageRank [23]. The supervised learning methods include SVM [5], decision trees [35], and supervised random walks [7], and focus more on individual nodes than network topologies.

There are a few approaches in the literature based on subgraph features. The authors in [12] built several feature subsets according to their characteristics, evaluated classifier performance, and used several machine-learning algorithms on each of these subsets for the LP problem. The authors in [24] proposed a method based on a restrictive representation of the local topological embedding of two nodes, describing their relationship in terms of their common membership in all possible subgraphs. More recently, WLNM [39] initiated the idea of using deep learning to learn patterns from subgraphs to predict links. It outperformed heuristic methods and other well-known approaches and is considered a state-of-the-art method. Later, Zhang et al. [40] proposed SEAL, the model closest to WLNM, and identified several drawbacks of WLNM that needed to be resolved. SEAL proposed a new approach for subgraph extraction based on hop numbers, together with a subgraph labeling scheme; the authors claimed that their method solved the drawbacks of WLNM and achieved better accuracy. However, SEAL also has several drawbacks in terms of subgraph extraction and labeling. Neither SEAL nor WLNM properly analyzes the importance of common neighbors, which highly influence link occurrence. Therefore, we propose the PLACN model to overcome those drawbacks, with a new approach to analyzing common neighbors.

Figure 1. The representation of subgraph extraction: the given graph G (the left figure) and the resultant extracted subgraph (the right figure).

3 PRELIMINARIES

3.1 Problem Definition

We are given an undirected, weighted graph G(V, E) at a particular time t, representing the topological structure of a social network, in which V is the set of vertices, E is the set of links, and the weight is based on the interaction between any two nodes u and v. $A = (a_{i,j})_{N \times N}$ represents the adjacency matrix of the graph G, where $a_{i,j} > 0$ if i and j interact with each other and $a_{i,j} = 0$ otherwise. We aim to build a model using a ConvNet over various heuristic features to predict whether a connection between u and v has a high probability of existing in the near future at time $t_0 (> t)$.

3.2 Heuristic Methods for Link Prediction

Generally, heuristic link prediction methods are based on topological structures and can be neighbor-based, path-based, or random-walk based [36]. The neighbor-based methods include Common Neighbors (CN) [28], Jaccard Coefficient (JC), Adamic-Adar (AA) [2], Preferential Attachment (PA) [8], and Resource Allocation (RA) [41], while other topological heuristic methods, sometimes called higher-order heuristics, include the Katz index [17], PageRank [23], and PropFlow [25].

In our model, we consider only the following neighbor-based (low-order) heuristics, where Γ(i) and Γ(j) denote the neighbor sets of vertices i and j respectively (a short code sketch follows the list):

- Common Neighbors: $|\Gamma(i) \cap \Gamma(j)|$
- Jaccard Coefficient: $\frac{|\Gamma(i) \cap \Gamma(j)|}{|\Gamma(i) \cup \Gamma(j)|}$
- Adamic-Adar: $\sum_{k \in \Gamma(i) \cap \Gamma(j)} \frac{1}{\log|\Gamma(k)|}$
- Preferential Attachment: $|\Gamma(i)| \cdot |\Gamma(j)|$
- Resource Allocation: $\sum_{k \in \Gamma(i) \cap \Gamma(j)} \frac{1}{|\Gamma(k)|}$
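To make these definitions concrete, the following minimal Python sketch computes the five scores for a candidate node pair. The use of NetworkX and the function name are our own choices for illustration; the paper does not prescribe an implementation:

```python
import math
import networkx as nx

def neighbor_heuristics(G: nx.Graph, i, j) -> dict:
    """Five low-order heuristic scores for a candidate link (i, j)."""
    gi, gj = set(G.neighbors(i)), set(G.neighbors(j))
    common, union = gi & gj, gi | gj
    return {
        "CN": len(common),
        "JC": len(common) / len(union) if union else 0.0,
        # skip degree-1 neighbors to avoid division by log(1) = 0
        "AA": sum(1.0 / math.log(G.degree(k)) for k in common if G.degree(k) > 1),
        "PA": len(gi) * len(gj),
        "RA": sum(1.0 / G.degree(k) for k in common),
    }

# Example on a built-in toy graph:
G = nx.karate_club_graph()
print(neighbor_heuristics(G, 0, 33))
```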
3.3 Convolutional Neural Network

A convolutional neural network (ConvNet) is comprised of one or more convolutional layers followed by one or more fully connected layers. ConvNets are well known for image recognition [20, 14, 22] and image classification [18, 21, 30]. We use a ConvNet in our framework to predict future links: because the architecture of a ConvNet is designed to take advantage of the 2D structure of input images, the heuristic feature matrices built from the subgraph adjacency matrices can be treated as 2D structures in our model. The input of our model is of size K × K × r, where K is the number of nodes in the subgraph and r is the number of heuristic features considered in our model.

Figure 2. The representation of labeling: the subgraph extracted for the target link AB (the leftmost figure), the step-by-step labeling process, and the final labeled subgraph (the rightmost figure).

Algorithm 1 Subgraph Extraction

Input: Target link Eij, graph G(V, E), and expected number of nodes in the subgraph K.
Output: Subgraph ⟨S⟩ for the link Eij.

1: NK = {i, j}
2: Ntemp = {}
3: h = 1  ▷ order of the neighborhood
4: while |NK| < K do
5:   Ntemp = Γ^h(i) ∩ Γ^h(j)
6:   NK = NK ∪ Ntemp
7:   h ← h + 1
8: end while
9: ⟨S⟩ = subgraphG(NK)
10: return ⟨S⟩
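As an illustration of Algorithm 1, here is a minimal Python/NetworkX sketch; the library choice, helper name, and the safety bound on h are ours, as the paper specifies only the pseudocode:

```python
import networkx as nx

def extract_subgraph(G: nx.Graph, i, j, K: int) -> nx.Graph:
    """Gather common neighbors of i and j order by order until at least K nodes."""
    dist_i = nx.single_source_shortest_path_length(G, i)
    dist_j = nx.single_source_shortest_path_length(G, j)
    nodes = {i, j}
    h, max_h = 1, max(dist_i.values(), default=0)
    while len(nodes) < K and h <= max_h:
        # h-th order common neighbors: exactly h hops from both i and j
        nodes |= {v for v, d in dist_i.items() if d == h and dist_j.get(v) == h}
        h += 1
    return G.subgraph(nodes).copy()
```

On the example of figure 1 with K = 6, this collects {A, B}, then the first-order common neighbors {2, 6} and the second-order common neighbors {5, 7}.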

4 PLACN MODEL The above relationship can be mathematically formulated as;

In this section, we describe our novel PLACN framework. Since ConvNets perform well in image classification, we utilize this characteristic in our framework and convert link prediction into an image classification problem, where positive and negative links are the two classes. PLACN includes the following main steps:

1. Evaluating the sub-graphing factor K of the subgraph.
2. Extracting the subgraph ⟨S⟩ for a target link.
3. Labeling the extracted subgraph.
4. Constructing feature matrices.
5. Training the convolutional neural network.

4.1 Number of Nodes in a Subgraph

Generally, SNs have various topological structures: in some networks vertices are densely connected, while in others they are sparsely connected. Accordingly, the average node degree of an undirected graph, 2|E|/|V|, changes across networks. PLACN focuses strongly on subgraphs of common neighbors of the target link, so an optimal number of nodes K for the subgraphs should be decided rather than defined by the user. Defining an optimal number of nodes has two primary purposes. First, it reduces the computational cost, because subgraphs are extracted with the optimal number of nodes. Second, it ties the number of nodes to the network's properties. In a given SN G, the number of neighbors is proportional to the average node degree, but the node degree is based only on first-hop neighbors, and we also need information from high-order common neighbors (CN). Because we cannot consider the full set of high-order CN, we examine a fraction of it using the K value. Therefore, we define K from both quantities, the average node degree and its fraction, as

$$K \approx \text{AvgNodeDegree} + \text{AvgNodeDegree} \times \text{NetworkDensity} \approx \text{AvgNodeDegree}\,(1 + \text{NetworkDensity})$$

This relationship can be mathematically formulated as

$$K \approx \left\lceil \frac{2|E|}{|V|} \left( 1 + \frac{2|E|}{|V|(|V|-1)} \right) \right\rceil \tag{1}$$

where 2|E|/|V| is the average node degree and 2|E|/(|V|(|V|−1)) is the density of the given network. Since the computed value might not be an integer, we take its ceiling.

We test our formula on various datasets and verify empirically that the calculated K is an optimal value for achieving better AUC (Area Under Curve) in most cases with minimum computational cost. We test our model with different K values and plot the resulting AUC; figure 6 shows the empirical results for various K values on different networks. We discuss further details in the experimental results section.
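Equation (1) translates directly to code; a short sketch (the function name is ours), checked against the Table 1 statistics:

```python
import math

def subgraphing_factor(num_nodes: int, num_edges: int) -> int:
    """Equation (1): K = ceil(avg_degree * (1 + density))."""
    avg_degree = 2 * num_edges / num_nodes
    density = 2 * num_edges / (num_nodes * (num_nodes - 1))
    return math.ceil(avg_degree * (1 + density))

# USAir from Table 1: |V| = 332, |E| = 2126, average degree 12.81
print(subgraphing_factor(332, 2126))  # -> 14
# Section 5.4 further sets K = 5 whenever the computed K < 5.
```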

Algorithm 2 Subgraph Labeling

Input: Node list NK, target link Eij, subgraph ⟨S⟩.
Output: Ordered node list OK.

1: OK = {i, j}
2: RK = NK − {i, j}
3: M⟨k : ⟨wavg, havg⟩⟩ ← map for node information
4: for all v ∈ RK do
5:   hi = min(dist(v, i))
6:   hj = min(dist(v, j))
7:   wavg^v = (1/2) ((1/hi) Σ_{p=0}^{hi} w_p^i + (1/hj) Σ_{p=0}^{hj} w_p^j)
8:   havg^v = (1/2)(hi + hj)
9:   M ← (v : (1/wavg^v, havg^v))
10: end for
11: sort the map based on havg
12: sort the map based on wavg for equal havg
13: for v in M do
14:   OK ← OK ∪ v
15: end for
16: if |OK| > K then
17:   while |OK| > K do
18:     remove nodes from the bottom
19:   end while
20: end if
21: return ⟨OK⟩

4.2 Subgraph Extraction

Subgraph extraction is the process of collecting the nodes related to the target link. SEAL constructs the subgraph from the h-hop neighbors of the target nodes i and j. The full set of nodes of such a subgraph might not have a great influence on both nodes i and j; at the same time, only a few of them might be common to both. This is a major drawback of SEAL. Moreover, as SEAL relies heavily on the number of hops h, the number of nodes in the subgraph grows quickly as the hop number increases, which leads to memory and computation cost issues.

Instead of constructing the subgraphs based on hops, PLACN introduces an approach that constructs the subgraph only from the common neighbors of the nodes i and j, i.e., from the set of nodes that interact with both i and j. The number of nodes K in the subgraph ⟨S⟩ can be estimated approximately, as explained in the previous section. The process of extracting ⟨S⟩ is described in algorithm 1.
In algorithm 1, for a given link between i and j in graph G, we first collect the first-order common neighbors Γ¹(i) ∩ Γ¹(j) into a node list NK. Then we iteratively add the vertices in (Γ²(i) ∩ Γ²(j)), (Γ³(i) ∩ Γ³(j)), ..., until |NK| ≥ K, where Γ^p(q) denotes the p-th order neighbors of node q.

At the end of algorithm 1, the number of nodes in NK might be exactly K, or |NK| > K. However, maintaining a consistent size for every subgraph is crucial for training the ConvNet, so we remove nodes from NK until |NK| = K, following a strategy we explain in the subgraph labeling section. As a result, every subgraph has size K.

For example, as shown in figure 1, suppose we extract the subgraph for the target link AB with subgraph size K = 6. The first-order common neighbors are {2, 6} and the second-order common neighbors are {5, 7}, so the extracted subgraph for the target link AB has the vertex list {A, B, 2, 5, 6, 7}.

Note that if the two target nodes are in disconnected components of the network, the subgraph still contains both the source and destination nodes, so it is never empty. In that case we calculate the heuristics for the source and destination nodes, and the information for the other (K − 2) nodes is zero in the feature matrices.

4.3 Subgraph Labeling

Subgraph labeling is the process of ordering the nodes in a sequence. This process is essential to maintain the information uniformly among all subgraphs, because the models read the data sequentially. The method used in SEAL, Double-Radius Node Labeling, labels the subgraph nodes based on the numbers of hops to nodes i and j. This leads to a problem: common neighbors of the same order receive the same label, so subgraphs containing the same common neighbors can be labeled in different ways, and the adjacency matrices of the subgraphs become inconsistent. In PLACN, we propose a new strategy to label subgraphs consistently, as follows.

Algorithm 2 describes the overall process of subgraph labeling. After the extraction of the subgraph, the nodes i and j, which belong to the target link, take the first two positions of an ordered vertex list OK. For each remaining node in RK, where RK = NK − {i, j}, we calculate the average hop and average path weight with respect to i and j. We first evaluate the minimum numbers of hops (i.e., the shortest paths) hi and hj from each node of RK to i and j respectively. Then, using equation (2), we calculate the average weight for each node in RK:

$$w_{avg}^{v} = \frac{1}{2}\left(\frac{1}{h_i}\sum_{p=0}^{h_i} w_p^{i} + \frac{1}{h_j}\sum_{p=0}^{h_j} w_p^{j}\right) \tag{2}$$

where $\sum_{p=0}^{h_i} w_p^{i}$ is the total weight of the shortest path between node v ∈ RK and i. Since each node in RK has considerable influence on the link ij, we combine the average weights of the shortest paths from i and j. Similarly, we calculate the average hop for each node in RK using equation (3):

$$h_{avg}^{v} = \frac{1}{2}(h_i + h_j) \tag{3}$$

At this point, every node in RK has both an average weight (wavg) and an average hop (havg) value. We first order the nodes by their havg values. If any two nodes x, y ∈ RK have the same average hop, that is, $h_{avg}^{x} = h_{avg}^{y}$, we order them by wavg in descending order and store them in the ordered list OK. As discussed above, the adjacency matrix of each subgraph must maintain the same size K; therefore, if |OK| > K, we discard nodes from the bottom of the list until |OK| = K.
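Under our reading of equations (2) and (3), Algorithm 2 can be sketched as follows in Python/NetworkX. The helper names are ours, and we assume edge weights sit in a "weight" attribute and that "shortest path" means fewest hops, with weights summed along that path:

```python
import networkx as nx

def hops_and_weight(G: nx.Graph, src, dst):
    """Hop count and total edge weight along a fewest-hop path from src to dst."""
    path = nx.shortest_path(G, src, dst)
    total = sum(G[u][v].get("weight", 1.0) for u, v in zip(path, path[1:]))
    return len(path) - 1, total

def label_subgraph(G: nx.Graph, nodes_k, i, j, K: int) -> list:
    """Order nodes: targets first, then ascending h_avg, ties broken by descending w_avg."""
    keyed = []
    for v in set(nodes_k) - {i, j}:
        h_i, w_i = hops_and_weight(G, v, i)
        h_j, w_j = hops_and_weight(G, v, j)
        h_avg = (h_i + h_j) / 2                # equation (3)
        w_avg = (w_i / h_i + w_j / h_j) / 2    # equation (2)
        keyed.append((v, h_avg, w_avg))
    keyed.sort(key=lambda t: (t[1], -t[2]))    # hop ascending, weight descending
    ordered = [i, j] + [v for v, _, _ in keyed]
    return ordered[:K]  # discard nodes from the bottom if |O_K| > K
```

On the figure 2 example, nodes with h_avg = 1.0 (C, F, H) precede those with h_avg = 1.5 (D, E), and ties within each group are resolved by descending w_avg.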

Figure 4. The architecture of PLACN model.

For example, figure 2 illustrates the labeling process for the subgraph extracted for the target link AB. The leftmost figure shows the extracted graph for the target link AB, so we label the nodes of the target link AB as 1 and 2 in the next diagram. The remaining nodes show the calculated average hop and average weight. The third diagram shows the labeling based on the average hop, where $h_{avg}^{D} = h_{avg}^{E} = 1.5$ and $h_{avg}^{F} = h_{avg}^{H} = h_{avg}^{C} = 1.0$. We then check wavg, with {C, F, H} = {2.5, 2.0, 3.0} and {D, E} = {1.75, 1.5}, and label the nodes as shown in the rightmost figure. Eventually, every node in the subgraph has a unique label.

Figure 3. The representation of constructing feature matrices: the labeled subgraph (the leftmost figure) for a positive target link, and an enlarged representation of an adjacency matrix of weights (the rightmost figure).

4.4 Constructing Feature Matrices

As discussed in the preliminaries section, we consider five heuristic features: CN, JC, AA, PA, and RA. The adjacency matrices of the subgraph can be represented as $A^{CN}_{(ij)}, A^{JC}_{(ij)}, A^{AA}_{(ij)}, A^{PA}_{(ij)}$ and $A^{RA}_{(ij)}$, ordered in the sequence of the names of the heuristic methods. In addition, we include an adjacency matrix of the graph weights, whose diagonal values are zeros. SEAL considers only this adjacency matrix without the weights, together with a feature vector for the target link only; PLACN instead constructs the heuristic feature matrices for every vertex in the subgraph, using information from the entire graph G. As illustrated in figure 3, the middle figure represents sample adjacency matrices for the labeled subgraph, and the red box in the rightmost figure indicates the target link and its value. We always assign zero to a positive target link in the adjacency matrix of the weighted graph: when we test the PLACN model, positive links should not contain any information about the link's existence, as PLACN needs to learn both positive and negative links without that information.

4.5 Training the Convolutional Neural Network

In the sections above we discussed subgraph extraction for the target link, subgraph labeling, and the construction of adjacency matrices for the heuristic features of the subgraph. Following this, PLACN incorporates a ConvNet as a classifier. In order to train the classifier, we construct a dataset from all existing links, where node pairs i, j ∈ V such that (i, j) ∈ E form the positive link class. To avoid a class imbalance problem, we use a downsampling technique to create the negative link class: we randomly sample node pairs i, j ∈ V such that (i, j) ∉ E, in the same number as the positive links.

In the next step of PLACN, the adjacency matrices are normalized between 0 and 1 for each layer. Figure 5 exhibits a graphical representation of randomly selected normalized data from the USAir network. We then apply the convolution operation and average pooling, and flatten the matrices to feed into a fully connected neural network. Figure 4 illustrates the overall architecture of the PLACN model.

Since this is a binary classification problem, we use the binary cross-entropy loss function in the ConvNet:

$$BCE = -\sum_{i=1}^{c'=2} t_i \log(f(s_i)) = -t_1 \log(f(s_1)) - (1 - t_1)\log(1 - f(s_1))$$

where $c'$ is the number of classes, and $s_1$ and $t_1$ are the score and the ground-truth label for the class $c_1$.

Table 1. The statistical information of each real-world network.

Dataset | |V| | |E| | Avg node degree
USAir | 332 | 2126 | 12.81
NS | 1589 | 2742 | 3.45
PB | 1222 | 16714 | 27.36
Yeast | 2375 | 11693 | 9.85
C.ele | 297 | 2148 | 14.46
Power | 4941 | 6594 | 2.67
Router | 5022 | 6258 | 2.49
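To make the construction of Sections 4.4-4.5 concrete, this sketch (our own illustration, reusing neighbor_heuristics from the Section 3.2 sketch; per-subgraph min-max scaling is one plausible reading of "normalized between 0 and 1 for each layer") assembles the K × K × 6 input tensor:

```python
import numpy as np

def build_feature_tensor(G, ordered_nodes) -> np.ndarray:
    """Stack CN/JC/AA/PA/RA matrices plus the weight matrix into K x K x 6."""
    K = len(ordered_nodes)
    names = ["CN", "JC", "AA", "PA", "RA"]
    tensor = np.zeros((K, K, 6))
    for a, u in enumerate(ordered_nodes):
        for b, v in enumerate(ordered_nodes):
            if u == v:
                continue  # diagonals stay zero
            scores = neighbor_heuristics(G, u, v)
            for c, name in enumerate(names):
                tensor[a, b, c] = scores[name]
            if G.has_edge(u, v):
                tensor[a, b, 5] = G[u][v].get("weight", 1.0)
    # positions 0 and 1 hold the target nodes i and j after labeling:
    tensor[0, 1, 5] = tensor[1, 0, 5] = 0.0   # hide the target link's existence
    for c in range(6):                        # normalize each layer to [0, 1]
        m = tensor[:, :, c].max()
        if m > 0:
            tensor[:, :, c] /= m
    return tensor
```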

Figure 5. The representation of randomly selected data from USAir network.

5 EXPERIMENTAL RESULTS

We conduct experiments with real-world networks to evaluate PLACN, using AUC (Area Under Curve) as the evaluation metric.

5.1 Datasets

We use seven real-world datasets in our experiments: USAir is a transportation network of US airlines [9]; NS is a co-authorship network of researchers in network science [29]; PB is US political web-blogger connectivity data from 2005 [1]; Yeast is a network of interactions between yeast proteins [34]; C.ele is the neural network of the neurons in C. elegans, a type of worm [37]; Power is the power grid of the western US [37]; and Router is a router-level map of the Internet [32].

Table 1 summarizes the statistical information of each network. We split each dataset into a training set (90%) and a testing set (10%) from both the positive and negative link classes.

5.2 Comparing with Other Methods

We first compare PLACN with seven popular heuristic methods based on topological structures:

- Common Neighbors (CN): a similarity-based method that counts the number of neighbors the two nodes have in common [28].
- Jaccard Coefficient (JC): the same concept as CN, but in a normalized form of common neighbors.
- Preferential Attachment (PA): a measure indicating that new links are more likely to connect higher-degree nodes than lower-degree ones [8].
- Adamic-Adar (AA): a similarity measure between two nodes that weights rarer neighbors more heavily [2].
- Resource Allocation (RA): similar to AA, but RA punishes high-degree common neighbors more heavily than AA does [41].
- Katz index: considers all paths between two nodes and weighs shorter ones more heavily [17].
- PageRank (PR): ranks a node in a graph proportionally to the likelihood that the node will be reached through a random walk on the graph [23].

For Katz, we set the damping factor β to 0.001 [39]; for PageRank, we set the damping factor d to 0.85 [11]. Then, we compare with the two state-of-the-art methods that use subgraphs, WLNM and SEAL. For WLNM, we use the implementation at https://github.com/muhanzhang/LinkPrediction; for SEAL, we use the source code at https://github.com/muhanzhang/SEAL.

We then compare PLACN with four state-of-the-art latent feature methods:

- Stochastic Block Model (SBM): an adaptation of mixture modeling to relational data [4].
- node2vec (N2V): a word-to-vector approach with biased random walks [13].
- LINE: a network embedding method with an objective function that preserves both first-order and second-order proximities [33].
- Variational Graph Auto-Encoder (VGAE): a framework for unsupervised learning on graph-structured data based on the variational auto-encoder [19], which also uses subgraphs in its model.

For MF, we use the libFM software [31]. For SBM, we use the implementation of [3] at http://tuvalu.santafe.edu/~aaronc/wsbm/ with 12 latent groups. For N2V, we use the implementation at https://github.com/eliorc/node2vec; for LINE, the implementation at https://github.com/tangjianpku/LINE; and for VGAE, the source code at https://github.com/tkipf/gae. For node2vec and LINE, we use the same settings as described in SEAL.

5.3 Experimental Setup

For the proposed PLACN model we test the seven real-world networks with the calculated optimal K value of each network, as discussed in the previous section. In the ConvNet we use one convolutional layer with 32 filters of size 4 × 4, followed by average pooling of size 2 × 2. The subsequent fully connected neural network uses two hidden layers with 128 neurons each. We train the neural network for 50 epochs. During training, we use 10% of the training set for validation and save the model with the lowest validation loss as the best model. Finally, we use the saved model to predict the test data. We repeat this process 10 times and calculate the average AUC (µAUC) and standard deviation (σAUC).
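For illustration, here is a minimal Keras sketch consistent with the architecture described above (one 4 × 4 convolution with 32 filters, 2 × 2 average pooling, two 128-neuron hidden layers, binary cross-entropy); the optimizer is our assumption, since the paper does not name one:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_placn_classifier(K: int, r: int = 6) -> tf.keras.Model:
    """ConvNet classifier over K x K x r feature tensors (r = 5 heuristics + weights)."""
    model = models.Sequential([
        layers.Conv2D(32, (4, 4), activation="relu", input_shape=(K, K, r)),
        layers.AveragePooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",  # assumption: optimizer not specified in the paper
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model

# e.g., USAir with the K = 14 computed from equation (1):
model = build_placn_classifier(K=14)
model.summary()
```

Training with a 10% validation split and checkpointing on the lowest validation loss matches the protocol described above.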

Dataset | CN | JC | PA | AA | RA | Katz | PR | WLNM | SEAL | PLACN
USAir | 94.04±1.02 | 89.82±1.49 | 89.76±1.23 | 94.91±1.28 | 95.79±0.98 | 92.92±1.34 | 93.96±1.17 | 95.95±1.10 | 96.33±0.82 | 98.36±0.24
NS | 93.94±1.08 | 94.37±0.95 | 69.05±1.93 | 94.76±1.03 | 94.35±0.99 | 95.02±0.94 | 94.89±1.08 | 98.61±0.49 | 98.91±0.32 | 99.53±0.13
PB | 92.08±0.31 | 87.39±0.46 | 90.28±0.37 | 92.30±0.39 | 92.51±0.26 | 92.89±0.45 | 93.60±0.38 | 93.49±0.47 | 94.70±0.34 | 96.67±0.44
Yeast | 89.41±0.59 | 89.32±0.66 | 82.17±1.20 | 89.31±0.59 | 89.45±0.72 | 92.29±0.58 | 92.76±0.55 | 95.62±0.52 | 97.96±0.47 | 98.87±0.30
C.ele | 84.97±1.72 | 80.19±1.47 | 74.83±1.94 | 87.01±1.38 | 87.49±1.41 | 86.42±1.71 | 90.32±1.52 | 86.18±1.72 | 90.15±0.86 | 96.08±0.26
Power | 58.86±0.75 | 58.79±0.83 | 44.37±1.00 | 58.83±0.92 | 58.81±0.84 | 65.42±1.51 | 66.03±1.59 | 84.76±0.98 | 88.50±1.17 | 98.78±0.38
Router | 56.58±0.41 | 56.39±0.46 | 47.62±1.51 | 56.39±0.51 | 56.39±0.51 | 38.62±1.42 | 38.82±1.31 | 94.41±0.88 | 96.45±0.96 | 98.40±0.38

Table 2. Comparison of AUC with the heuristic methods, WLNM, and SEAL.

5.4 Results

Overall, PLACN achieves a significant improvement over the other methods on networks of different characteristics. In terms of AUC, PLACN performs remarkably better on the Power and Router networks, on which the baseline and latent methods achieve below 80% and the recent state-of-the-art methods, WLNM and SEAL, achieve up to 85% and 90% respectively. Moreover, PLACN achieves an AUC higher than 95% on all tested networks. Tables 2 and 3 compare the other methods with PLACN.

Table 3. Comparison of AUC with the latent feature methods.

Dataset | SBM | N2V | LINE | VGAE | PLACN
USAir | 94.85±1.14 | 91.44±1.78 | 81.47±10.71 | 89.28±1.99 | 98.36±0.24
NS | 92.30±2.26 | 91.52±1.28 | 80.63±1.90 | 94.04±1.64 | 99.53±0.13
PB | 93.90±0.42 | 85.79±0.78 | 76.95±2.76 | 90.70±0.53 | 96.67±0.44
Yeast | 91.41±0.60 | 93.67±0.46 | 87.45±3.33 | 93.88±0.2 | 98.87±0.30
C.ele | 86.48±2.60 | 84.11±1.27 | 69.21±3.14 | 81.80±2.18 | 96.08±0.26
Power | 66.57±2.05 | 76.22±0.92 | 55.63±1.47 | 71.20±1.65 | 98.78±0.38
Router | 85.65±1.93 | 65.46±0.86 | 67.15±2.10 | 61.51±1.22 | 98.40±0.38

Figure 6. The empirical results of testing various K values on different networks (AUC as a function of K).

Table 2 reports the AUC performance in the form µAUC ± σAUC. For all seven networks we calculated the K value and tested the AUC while varying K for the extracted subgraphs, as shown in figure 6. Beyond the calculated K values, the AUC curves tend to converge in most cases. This pattern shows that the calculated K is the minimal optimal value for the different networks, so our formula for K is suitable in most cases. However, if the calculated K < 5, we set K = 5, so that the subgraph contains at least 3 common neighbors.

When comparing the complexity of our algorithms, we evaluate four aspects: subgraph extraction, subgraph node labeling, feature extraction, and the classifier. In terms of subgraph extraction, the complexity of our algorithm (O(n)) equals SEAL's. In graph labeling, SEAL uses a hash function, which leads to duplicate labels for common neighbors of the same order; our algorithm uses sorting based on the average hop and weight, which has complexity O(K log K) (where K is the sub-graphing factor). Even though our labeling method requires sorting, it solves a key issue, namely the inconsistency of subgraphs. While SEAL includes high-order heuristics such as Katz, PageRank, and SimRank, which have complexity O(n³), PLACN does not include them, and yet PLACN is able to outperform with only the low-order heuristics. Finally, SEAL uses DGCNN as its classifier, whose sort-pooling aggregation layer requires additional computation time. Our model reduces the cost of calculating heuristics in the classification process and trades off that cost in labeling to keep consistency. PLACN can therefore be considered to have the same complexity as SEAL.

6 Conclusions

Link prediction is currently a popular research field and is used in many application domains. Although heuristic methods are simple and show good results, intensive guesswork is needed to manually match the right heuristic to the corresponding network type, since the same heuristic will not always perform well on different networks. The state-of-the-art approaches failed to analyze the impact of common nodes on the target link. We implemented a novel framework, PLACN, that solves these issues with a new subgraph extraction algorithm and a new node labeling algorithm, learning from different networks automatically. We also derived an equation for the sub-graphing factor K, which helps determine the subgraph size rather than relying on a user-defined value. These characteristics give our framework the autonomy to learn and adopt the topological patterns of different networks. We tested our model against six state-of-the-art methods and the baseline methods. The results showed that PLACN performs significantly better than the state-of-the-art methods as well as the baseline methods, setting new state-of-the-art performance. PLACN also opens new paths for further research on recommendation systems and knowledge graph completion.

Acknowledgment

This research work was supported by International Business Machines (IBM); experiments were conducted on a high performance IBM Power System S822LC Linux Server.

REFERENCES

[1] Robert Ackland et al., 'Mapping the US political blogosphere: Are conservative bloggers more prominent?', in BlogTalk Downunder 2005 Conference, Sydney, (2005).
[2] Lada A. Adamic and Eytan Adar, 'Friends and neighbors on the web', Social Networks, 25(3), 211–230, (2003).
[3] Christopher Aicher, Abigail Z. Jacobs, and Aaron Clauset, 'Learning latent block structure in weighted networks', Journal of Complex Networks, 3(2), 221–248, (2014).
[4] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing, 'Mixed membership stochastic blockmodels', Journal of Machine Learning Research, 9(Sep), 1981–2014, (2008).
[5] Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, and Mohammed Zaki, 'Link prediction using supervised learning', in SDM06: Workshop on Link Analysis, Counter-terrorism and Security, (2006).
[6] Soner Aydın, Link Prediction Models for Recommendation in Academic Collaboration Network of Turkey, Ph.D. dissertation, 2017.
[7] Lars Backstrom and Jure Leskovec, 'Supervised random walks: predicting and recommending links in social networks', in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 635–644. ACM, (2011).
[8] Albert-László Barabási and Réka Albert, 'Emergence of scaling in random networks', Science, 286(5439), 509–512, (1999).
[9] V. Batagelj and A. Mrvar, 'Pajek datasets, http://vlado.fmf.uni-lj.si/pub/networks/data/mix', USAir97.net, (2006).
[10] Michele A. Brandão, Mirella M. Moro, Giseli Rabello Lopes, and José P. M. Oliveira, 'Using link semantics to recommend collaborations in academic social networks', in Proceedings of the 22nd International Conference on World Wide Web, pp. 833–840. ACM, (2013).
[11] Sergey Brin and Lawrence Page, 'The anatomy of a large-scale hypertextual web search engine', Computer Networks and ISDN Systems, 30(1-7), 107–117, (1998).
[12] Michael Fire, Lena Tenenboim, Ofrit Lesser, Rami Puzis, Lior Rokach, and Yuval Elovici, 'Link prediction in social networks using computationally efficient topological features', in 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pp. 73–80. IEEE, (2011).
[13] Aditya Grover and Jure Leskovec, 'node2vec: Scalable feature learning for networks', in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM, (2016).
[14] Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen, 'Convolutional neural network architectures for matching natural language sentences', in Advances in Neural Information Processing Systems, pp. 2042–2050, (2014).
[15] Shiping Huang, Yong Tang, Feiyi Tang, and Jianguo Li, 'Link prediction based on time-varied weight in co-authorship network', in Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on, pp. 706–709. IEEE, (2014).
[16] Shantha Jayalal, Chris Hawksley, and Pearl Brereton, 'Website link prediction using a Markov chain model based on multiple time periods', International Journal of Web Engineering and Technology, 3(3), 271–287, (2007).
[17] Leo Katz, 'A new status index derived from sociometric analysis', Psychometrika, 18(1), 39–43, (1953).
[18] Yoon Kim, 'Convolutional neural networks for sentence classification', arXiv preprint arXiv:1408.5882, (2014).
[19] Thomas N. Kipf and Max Welling, 'Variational graph auto-encoders', arXiv preprint arXiv:1611.07308, (2016).
[20] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back, 'Face recognition: A convolutional neural-network approach', IEEE Transactions on Neural Networks, 8(1), 98–113, (1997).
[21] Gil Levi and Tal Hassner, 'Age and gender classification using convolutional neural networks', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 34–42, (2015).
[22] Ming Liang and Xiaolin Hu, 'Recurrent convolutional neural network for object recognition', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3367–3375, (2015).
[23] David Liben-Nowell and Jon Kleinberg, 'The link-prediction problem for social networks', Journal of the American Society for Information Science and Technology, 58(7), 1019–1031, (2007).
[24] Ryan N. Lichtenwalter and Nitesh V. Chawla, 'Vertex collocation profiles: subgraph counting for link analysis and prediction', in Proceedings of the 21st International Conference on World Wide Web, pp. 1019–1028. ACM, (2012).
[25] Ryan N. Lichtenwalter, Jake T. Lussier, and Nitesh V. Chawla, 'New perspectives and methods in link prediction', in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 243–252. ACM, (2010).
[26] V. Valli Mayil, 'Web navigation path pattern prediction using first order Markov model and depth first evaluation', International Journal of Computer Applications (0975-8887), 45(16), (2012).
[27] Megha Mishra, Harsha Dubey, and Vishnu Kumar Mishra, 'Page navigation prediction based on longest common subsequence and apriori algorithm', i-manager's Journal on Cloud Computing, 3(3), 27, (2016).
[28] Mark E. J. Newman, 'Clustering and preferential attachment in growing networks', Physical Review E, 64(2), 025102, (2001).
[29] Mark E. J. Newman, 'Finding community structure in networks using the eigenvectors of matrices', Physical Review E, 74(3), 036104, (2006).
[30] M. E. Paoletti, J. M. Haut, J. Plaza, and A. Plaza, 'A new deep convolutional neural network for fast hyperspectral image classification', ISPRS Journal of Photogrammetry and Remote Sensing, 145, 120–147, (2018).
[31] Steffen Rendle, 'Factorization machines with libFM', ACM Transactions on Intelligent Systems and Technology (TIST), 3(3), 57, (2012).
[32] Neil Spring, Ratul Mahajan, and David Wetherall, 'Measuring ISP topologies with Rocketfuel', in ACM SIGCOMM Computer Communication Review, volume 32, pp. 133–145. ACM, (2002).
[33] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei, 'LINE: Large-scale information network embedding', in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. International World Wide Web Conferences Steering Committee, (2015).
[34] Christian von Mering, Roland Krause, Berend Snel, Michael Cornell, Stephen G. Oliver, Stanley Fields, and Peer Bork, 'Comparative assessment of large-scale data sets of protein–protein interactions', Nature, 417(6887), 399, (2002).
[35] Dashun Wang, Dino Pedreschi, Chaoming Song, Fosca Giannotti, and Albert-László Barabási, 'Human mobility, social ties, and link prediction', in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1100–1108. ACM, (2011).
[36] Peng Wang, BaoWen Xu, YuRong Wu, and XiaoYu Zhou, 'Link prediction in social networks: the state-of-the-art', Science China Information Sciences, 58(1), 1–38, (2015).
[37] Duncan J. Watts and Steven H. Strogatz, 'Collective dynamics of "small-world" networks', Nature, 393(6684), 440, (1998).
[38] Shuang-Hong Yang, Alexander J. Smola, Bo Long, Hongyuan Zha, and Yi Chang, 'Friend or frenemy?: predicting signed ties in social networks', in Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 555–564. ACM, (2012).
[39] Muhan Zhang and Yixin Chen, 'Weisfeiler-Lehman neural machine for link prediction', in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 575–583. ACM, (2017).
[40] Muhan Zhang and Yixin Chen, 'Link prediction based on graph neural networks', in Advances in Neural Information Processing Systems, pp. 5165–5175, (2018).
[41] Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang, 'Predicting missing links via local information', The European Physical Journal B, 71(4), 623–630, (2009).