Relational Graph Convolutional Networks: A Closer Look
Thiviyan Thanapalasingam a,*, Lucas van Berkel a, Peter Bloem b and Paul Groth a

a Informatics Institute, University of Amsterdam, 1098 XH Amsterdam, The Netherlands
b Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands

* Corresponding author. E-mail: [email protected].

Abstract. In this paper, we describe a reproduction of the Relational Graph Convolutional Network (RGCN). Using our reproduction, we explain the intuition behind the model. Our reproduction results empirically validate the correctness of our implementations using benchmark Knowledge Graph datasets on node classification and link prediction tasks. Our explanation provides a friendly understanding of the different components of the RGCN for both users and researchers extending the RGCN approach. Furthermore, we introduce two new configurations of the RGCN that are more parameter efficient. The code and datasets are available at https://github.com/thiviyanT/torch-rgcn.

Keywords: Relational Graphs, Graph Convolutional Network, Representation Learning, Knowledge Graphs

1. Introduction

Knowledge Graphs are graph-structured knowledge bases, representing entities and relations between pairs of entities [1]. They have become critical for large-scale information systems, supporting tasks ranging from question answering to search [2]. The ability to perform statistical relational learning over such data enables new links, properties, and types to be inferred [1], and performing this at a large scale is fundamental for the advancement of the Semantic Web. Additionally, models that can be applied to Knowledge Graphs can also be applied to Relational Database Management Systems, because there is a one-to-one mapping between the two [3].

Relational Graph Convolutional Networks (RGCNs) [4] are message passing frameworks for learning valuable latent features of relational graphs. RGCNs have been widely adopted for combining Knowledge Graphs with machine learning applications;1 their uses include Knowledge Graph refinement [5], soft query answering [6], and logical reasoning [7]. The original reference code for the RGCN is built on older platforms that are no longer supported, and other reproductions [8, 9] are incomplete.2 Existing descriptions of the model are mathematically dense, with little exposition, and assume prior knowledge of geometric deep learning.

1 The original paper has received over 1200 citations.
2 At the time of writing, we are aware of additional implementations in PyTorch Geometric [9], which reproduces only the RGCN layer, and in Deep Graph Library [10], which provides the node classification and link prediction models. However, they are not focused on scientific reproducibility.

We reproduced the model in PyTorch, a widely-used framework for deep learning [11]. Using our reproduction, we provide a thorough and friendly description of the model for a wider audience. The contributions of this paper are as follows:

1. A reproduction of the experiments described in the original paper by Schlichtkrull et al. [4];
2. A new, publicly available PyTorch implementation of the model, called Torch-RGCN;
3. A description of the subtleties of the RGCN that are crucial for its understanding, use, and efficient implementation;
4. New configurations of the model that are parameter efficient.

The rest of the paper is organized as follows. We begin with an overview of related work in Section 2. Then, we introduce the RGCN in Section 3, followed by a description of our reimplementation (Section 4). We then describe the reproduction of the node classification (Section 5) and link prediction (Section 6) models, with the associated experiments. These sections include the model variants mentioned above. In Section 7, we discuss the lessons learned from reproducing the original paper and the implications of our findings. Finally, we discuss our results and conclude.

2. Related Work

Machine learning over Knowledge Graphs involves learning low-dimensional continuous vector representations, called Knowledge Graph Embeddings. Here, we briefly review the relevant literature.

Commonly, Knowledge Graph Embedding (KGE) models are developed and tested on link prediction tasks. Two key examples of KGE models are TransE [12], in which relations are modelled as translation operations applied to low-dimensional entity embeddings, and DistMult [13], in which the likelihood of a triple is quantified by a multiplicative scoring function (an illustrative sketch of both scoring functions is given at the end of this section). Ruffinelli et al. [14] provide an overview of the most popular KGE methods and their relative performance.

The RGCN is a type of Knowledge Graph Embedding model. However, RGCNs differ from traditional link predictors, such as TransE and DistMult, because RGCNs explicitly use a node's neighborhood information to learn vector representations for downstream tasks [15].

Besides RGCNs, there are other graph embedding models for relational data. Relational Graph Attention Networks [16] use self-attention layers to learn attention weights for the edges in relational graphs, but yield similar, or in some cases worse, performance compared to the RGCN. Heterogeneous Information Networks (HINs) exploit meta-paths (sequences of node types and edge types that model particular relationships) to learn low-dimensional representations of networks [17]. HINs do not use message passing, and their expressivity depends on the selected meta-paths.

Beyond node classification and link prediction, RGCNs have other practical uses. For example, Daza et al. [6] explored RGCNs for soft query answering by embedding queries structured as relational graphs, which typically consist of a few nodes. Recently, Hsu et al. [18] showed that the RGCN can be used to extract dependency trees from natural language.
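To make the two scoring functions mentioned above concrete, the following minimal PyTorch sketch shows how TransE and DistMult score a batch of triples (s, r, o). It is illustrative only: the function names, the choice of the L1 norm, and the toy embeddings are our own assumptions and are not taken from the cited implementations [12, 13].

    import torch

    def transe_score(e_s, e_r, e_o):
        # TransE [12]: a relation is a translation in embedding space, so for a
        # true triple the subject embedding plus the relation embedding should
        # land close to the object embedding. Higher (less negative) = more plausible.
        return -torch.norm(e_s + e_r - e_o, p=1, dim=-1)

    def distmult_score(e_s, e_r, e_o):
        # DistMult [13]: a multiplicative (tri-linear) score. Higher = more plausible.
        return torch.sum(e_s * e_r * e_o, dim=-1)

    # Toy usage: random 50-dimensional embeddings for a batch of 3 triples.
    e_s, e_r, e_o = torch.randn(3, 50), torch.randn(3, 50), torch.randn(3, 50)
    print(transe_score(e_s, e_r, e_o).shape)    # torch.Size([3])
    print(distmult_score(e_s, e_r, e_o).shape)  # torch.Size([3])

In both models, the embeddings are trained so that observed triples score higher than corrupted (negatively sampled) ones; the two differ only in the scoring function applied to the embeddings.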
3. Relational Graph Convolutional Network

In [4], Schlichtkrull et al. introduced the RGCN as a convolution operation that performs message passing on multi-relational graphs. In this section, we explain the Graph Convolutional Network [19] and how it can be extended to relational graphs. We describe message passing in terms of matrix multiplications and explain the intuition behind these operations.

3.1. Message Passing

We begin by describing the basic Graph Convolutional Network (GCN) layer for directed graphs.3 This will serve as the basis for the RGCN in Section 3.2.

3 The original Graph Convolutional Network [19] operates over undirected graphs.

The GCN [19] is a graph-to-graph layer that takes a set of vectors representing nodes as input, together with the structure of the graph, and generates a new collection of representations for the nodes in the graph. A directed graph is defined as $G = (V, E)$, where $V$ is a set of vertices (nodes) and $E$ is a set of tuples $\langle i, j \rangle$, each indicating the presence of a directed edge pointing from node $i$ to node $j$.

Fig. 1. A schematic diagram of message passing in a directed graph with 6 nodes. $h_i$ is a vector that represents the node embedding of the node $i$ (in orange). $h_i^{(l)}$ and $h_i^{(l+1)}$ show the node embedding before and after the message passing step, respectively. The neighboring nodes are labelled from a to e. (Panels: Input, Message Passing, Output; edge types: data edges, added edges, incoming edges, self-looping edges.)

Equation 1 shows the message passing rule of a single-layer GCN for an undirected graph $G$:

$$H = \sigma(AXW) \qquad (1)$$

Here, $X$ is a node feature matrix, $W$ represents the weight parameters, and $\sigma$ is a non-linear activation function. $A$ is a matrix computed by row-normalizing the adjacency matrix of the graph $G$. The row normalization ensures that the scale of the node feature vectors does not change significantly during message passing. The node feature matrix $X$ indicates the presence or absence of particular features on each node.

Typically, more than a single convolutional layer is required to capture the complexity of large graphs. In these cases, graph convolution layers are stacked one after another, so that the output of the preceding layer $H^{(l-1)}$ is used as the input to the current layer $H^{(l)}$, as shown in Equation 2. In our work, we use the superscript $l$ to denote the current layer.
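As an illustration of Equation 1, the following is a minimal dense-matrix sketch of a GCN layer in PyTorch. This is a sketch under our own assumptions, not the Torch-RGCN implementation: the class and variable names are ours, ReLU is one common choice for $\sigma$, and we assume the stacking rule referenced as Equation 2 takes the standard form $H^{(l)} = \sigma(A H^{(l-1)} W^{(l)})$. A practical implementation would use sparse matrix operations for large graphs.

    import torch
    import torch.nn.functional as F

    def row_normalize(adj):
        # Divide each row of the adjacency matrix by its row sum, so that message
        # passing averages, rather than sums, the feature vectors of neighbours.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # guard against zero rows
        return adj / deg

    class GCNLayer(torch.nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = torch.nn.Parameter(torch.empty(in_dim, out_dim))
            torch.nn.init.xavier_uniform_(self.weight)

        def forward(self, a_norm, x):
            # Equation 1: H = sigma(A X W), with A the row-normalized adjacency.
            return F.relu(a_norm @ x @ self.weight)

    # Toy usage: a directed graph with 6 nodes and one-hot node features.
    adj = torch.zeros(6, 6)
    for i, j in [(0, 1), (1, 2), (2, 0), (3, 0), (4, 0), (5, 4)]:
        adj[i, j] = 1.0
    a_norm = row_normalize(adj)
    x = torch.eye(6)  # featureless nodes: identity matrix as one-hot features

    # Stacking two layers: the output of the first layer is the input to the second.
    layer1, layer2 = GCNLayer(6, 16), GCNLayer(16, 4)
    h = layer2(a_norm, layer1(a_norm, x))
    print(h.shape)  # torch.Size([6, 4])

Note that each layer mixes every node's features with those of its direct neighbours, so stacking $l$ layers lets information propagate over paths of length $l$ in the graph.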