Directed Graph Convolutional Network
Directed Graph Convolutional Network

Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, Andrew Lim∗
National University of Singapore
[email protected], [email protected], [email protected], [email protected], [email protected]

arXiv:2004.13970v1 [cs.LG] 29 Apr 2020

∗Corresponding author

ABSTRACT

Graph Convolutional Networks (GCNs) have been widely used due to their outstanding performance in processing graph-structured data. However, the restriction to undirected graphs limits their application scope. In this paper, we extend spectral-based graph convolution to directed graphs by using first- and second-order proximity, which can not only retain the connection properties of the directed graph, but also expand the receptive field of the convolution operation. A new GCN model, called DGCN, is then designed to learn representations on the directed graph, leveraging both the first- and second-order proximity information. We empirically show that the first- and second-order proximity used in DGCN can encode more useful information from the graph and help achieve better performance when generalized to other models. Moreover, extensive experiments on citation networks and co-purchase datasets demonstrate the superiority of our model against the state-of-the-art methods.

KEYWORDS

Graph Neural Networks; Semi-Supervised Learning; Proximity

1 INTRODUCTION

Graph structure is a common form of data. Graphs have a very strong ability to represent complex structures and can easily express entities and their relationships. Graph Convolutional Networks (GCNs) [4, 7, 8, 11, 27, 33] are a variant of CNNs on graphs that effectively learn underlying pairwise relations among data vertices. We can divide GCNs into two categories: spectral-based [4, 11, 15] and spatial-based [7, 27]. A representative spectral-based GCN [11] stacks multiple layers of first-order spectral filters and learns graph representations using a nonlinear activation function, while spatial-based approaches design neighborhood feature aggregation schemes to achieve graph convolutions [32, 35]. Various GCNs have achieved significant improvements on benchmark datasets [7, 11, 27]. Such breakthroughs have promoted the exploration of variant networks: the Attention-based Graph Neural Network (AGNN) [26] uses the attention mechanism to replace the propagation layers in order to learn dynamic neighborhood information, and FastGCN [3] interprets graph convolutions as integral transforms of embedding functions while using batch training to speed up. The positive effects of different GCN variants have greatly promoted their use in various task fields, including but not limited to social networks [3], quantum chemistry [16], text classification [34] and image recognition [30].

One of the main reasons that a GCN can achieve such good results on a graph is that it makes full use of the structure of the graph. It captures rich information from the neighborhood of object nodes through the links, instead of being limited to a specific distance range. General GCN models provide a neighborhood aggregation scheme for each node to obtain a representation vector, and then learn a mapping function to predict the node attributes [9]. Since spatial-based methods need to traverse surrounding nodes when aggregating features, they usually add significant overhead to computation and memory usage [31]. On the contrary, spectral-based methods use matrix multiplication instead of traversal search, which greatly improves the training speed. Thus, in this paper, we focus mainly on the spectral-based method.

Although the above methods have achieved improvements in many aspects, there are two major shortcomings of existing spectral-based GCN methods.

First, spectral-based GCNs only apply to undirected graphs [32]. For directed graphs, the only workaround is to relax the graph structure from directed to undirected by symmetrizing the adjacency matrix. In this way we can obtain a positive semi-definite Laplacian matrix, but at the same time we lose the unique structure of the directed graph. For example, in a citation graph, later published articles can cite earlier published articles, but the reverse is not true; this is a unique time-series relationship. If we transform the citation graph into an undirected graph, we lose this part of the information. Although we can represent the original directed graph in another form, such as a temporal graph learned by a combination of Recurrent Neural Networks (RNNs) and GCNs [20], we still want to extract more structural features from the original graph without adding extra components.

Second, in most existing spectral-based GCN models, only 1-hop node features are taken into account during each convolution operation (using the same adjacency matrix), which means they only capture the first-order information between vertices. It is natural to consider direct links when extracting local features, but first-order links alone are not always enough. In some real-world graph data, many legitimate relationships are not encoded via first-order links [25]. For instance, people in social networking communities share common interests, but they do not necessarily contact each other. In other words, the features we get through first-order proximity are likely to be insufficient. Although we can obtain more information by stacking multiple layers of GCNs, multi-layer networks introduce more trainable parameters, making them prone to overfitting when the label rate is small, or requiring extra labeled information, which is not label efficient [14]. Therefore, we need a more efficient feature extraction method.
Conference’17, July 2017, Washington, DC, USA Zekun Tong, Yuxuan Liang, Changsheng Sun, David S. Rosenblum, and Andrew Lim

(3) We experiment with semi-supervised classification tasks on various real-world datasets, which validates the effectiveness of first- and second-order proximity and the improvements obtained by DGCN over other models. We will release our code for public use soon.

Figure 1: A simple weighted directed graph example. Line width indicates the weight of the edges. Node v1 has first-order proximity with v3. It also has second-order proximity with v2, because they have the shared neighbors {v4, v5, v6}. Both v2's and v3's features should be considered when aggregating v1's feature.

To address these issues, we leverage second-order proximity between vertices as a complement to existing methods, drawing inspiration from the hub and authority web model [12, 37]. By using second-order proximity, the directional features of the directed graph can be retained. Additionally, the receptive field of the graph convolution can be expanded to extract more features. Different from first-order proximity, judging whether two nodes have second-order proximity does not require a path between them: it suffices that they share common neighbors. In other words, nodes with more shared neighbors have stronger second-order proximity in the graph. This general notion also appears in sociology [6], psychology [22] and daily life: people who have a lot of common friends are more likely to be friends. A simple example is shown in Figure 1. When considering the neighborhood feature aggregation of v1 in a convolution layer, we need to consider v3, because it has a first-order connection with v1, but we also need to aggregate v2's features, due to its high second-order similarity with v1 from sharing three common neighbors.

In this paper, we present a new spectral-based model on directed graphs, the Directed Graph Convolutional Network (DGCN), which utilizes first- and second-order proximity to extract graph information. We not only consider basic first-order proximity to obtain neighborhood information, but also take second-order proximity into account as a complement for the sparsity of first-order information.

2 PRELIMINARIES

In this section, we first clarify the terminology and preliminary knowledge of our study, and then present our task in detail. In particular, we use bold capital letters (e.g., X) and bold lowercase letters (e.g., x) to denote matrices and vectors, respectively. We use non-bold letters (e.g., x) to represent scalars and Greek letters (e.g., λ) as parameters.

2.1 Formulation

Definition 1 (Directed Graph [21]). A general graph with n vertices or nodes is defined as G = (V, E), where V = {v1, ..., vn} is the vertex set and E ⊂ {1, ..., n} × {1, ..., n} is the edge set. Each edge e ∈ E is an ordered pair e = (i, j) between vertices vi and vj. If an edge e is associated with a weight Wi,j > 0, the graph is weighted; when the pair (i, j) ∉ E, we set Wi,j = 0. A graph is directed when (u, v) and (v, u) are distinct edges, so that in general Wi,j ≠ Wj,i. Here, we treat an undirected graph as a directed graph in which each undirected edge is replaced by two directed edges with opposite directions and equal weights, and a binary graph as a weighted graph whose edge weights only take the values 0 or 1. Besides, we only consider non-negative weights.

Furthermore, to measure our model's ability to extract surrounding information, we define two indicators [9, 29] on the directed graph: Feature Smoothness, which is used to evaluate how much surrounding information we can obtain, and Label Smoothness, which evaluates how useful the obtained information is.

Definition 2 (First- & second-order edges in a directed graph). Given a directed graph G = (V, E) and an ordered pair (i, j) with i, j ∈ V: if (i, j) ∈ E, we say the ordered pair (i, j) is a first-order edge. If there exists any vertex k ∈ V satisfying (k, i), (k, j) ∈ E or (i, k), (j, k) ∈ E, we define the ordered pair (i, j) as a second-order edge.
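Definition 2 can be checked directly: (i, j) is a second-order edge when some vertex k points to both i and j, or both i and j point to k. A minimal sketch (function names, the toy edge set, and its edge directions are our own assumptions for illustration, echoing the shared neighbors {v4, v5, v6} of Figure 1):

```python
def first_order(E, i, j):
    """(i, j) is a first-order edge iff the ordered pair is in the edge set."""
    return (i, j) in E

def second_order(E, i, j, vertices):
    """(i, j) is a second-order edge iff some vertex k points to both
    i and j, or both i and j point to k (Definition 2)."""
    return any((k, i) in E and (k, j) in E or
               (i, k) in E and (j, k) in E
               for k in vertices)

# Toy graph in the spirit of Figure 1: v4, v5, v6 each point to both
# v1 and v2, and v3 points to v1.
E = {(4, 1), (4, 2), (5, 1), (5, 2), (6, 1), (6, 2), (3, 1)}
V = range(1, 7)

# v1 and v2 share three common neighbors, so (v1, v2) is a
# second-order edge even though there is no direct link between them.
assert second_order(E, 1, 2, V) and not first_order(E, 1, 2)
```

Note that the check scans every k ∈ V; the paper's matrix formulation achieves the same effect for all pairs at once via products of the (normalized) adjacency matrix with its transpose.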