Semantic Web 0 (0) 1
IOS Press

Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction – Two Sides of the Same Coin?

Jan Portisch a,b, Nicolas Heist a and Heiko Paulheim a,*

a Data and Web Science Group, University of Mannheim, Germany
E-mails: [email protected], [email protected], [email protected]
b SAP SE, Germany
E-mail: [email protected]

* Corresponding author. E-mail: [email protected]

1570-0844 © IOS Press and the authors. All rights reserved.

Abstract. Knowledge graph embeddings, i.e., projections of entities and relations to lower-dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. Both lines of research have been pursued rather in isolation from each other so far, each with its own benchmarks and evaluation methodologies. In this paper, we argue that both tasks are actually related, and we show that the first family of approaches can also be used for the second task and vice versa. In two series of experiments, we provide a comparison of both families of approaches on both tasks, which, to the best of our knowledge, has not been done so far. Furthermore, we discuss the differences in the similarity functions evoked by the different embedding approaches.

Keywords: Knowledge Graph Embedding, Link Prediction, Data Mining, RDF2vec

1. Introduction

In the recent past, the topic of knowledge graph embedding – i.e., projecting entities and relations in a knowledge graph into a numerical vector space – has gained a lot of traction. An often-cited survey from 2017 [1] already lists 25 approaches, with new models being proposed almost every month, as depicted in Fig. 1.

[Fig. 1. Publications with "Knowledge Graph Embedding" in their title or abstract, created with dimensions.ai]

Even more remarkably, two mostly disjoint strands of research have emerged in that vivid area. The first family of research works focuses mostly on link prediction [2], i.e., the approaches are evaluated in a knowledge graph refinement setting [3]. The optimization goal here is to distinguish correct from incorrect triples in the knowledge graph as well as possible. Evaluations of this kind of approach are always conducted within the knowledge graph, using the existing knowledge graph assertions as ground truth.

A second strand of research focuses on the embedding of entities in the knowledge graph for downstream tasks outside the knowledge graph, which often come from the data mining field – hence, we coin this family of approaches embeddings for data mining. Examples include the prediction of external variables for entities in a knowledge graph [4], information retrieval backed by a knowledge graph [5], and the usage of a knowledge graph in content-based recommender systems [6]. In those cases, the optimization goal is to create an embedding space which reflects semantic similarity as well as possible (e.g., in a recommender system, items similar to those the user is interested in should be recommended). Evaluations here are always conducted outside the knowledge graph, based on external ground truth.

In this paper, we want to look at the commonalities and differences of the two approaches. We look at two of the most basic and well-known approaches of both strands, i.e., TransE [7] and RDF2vec [4], and analyze and compare their optimization goals in a simple example. Moreover, we analyze the performance of approaches from both families in the respective other evaluation setup: we explore the usage of link-prediction-based embeddings for downstream tasks based on similarity, and we propose a link prediction method based on RDF2vec. From those experiments, we derive a set of insights into the differences between the two families of methods, and a few recommendations on which kind of approach should be used in which setting.

2. Related Work

As pointed out above, the number of works on knowledge graph embedding is legion, and enumerating them all in this section would go beyond the scope of this paper. However, there have already been quite a few survey articles.

The first strand of research works – i.e., knowledge graph embeddings for link prediction – has been covered in different surveys, such as [1] and, more recently, [8], [9], and [10]. The structure of those reviews is similar, as they distinguish different families of approaches: translational distance models [1] or geometric models [9] treat link prediction as a geometric task, i.e., they project the graph into a vector space so that a translation operation defined for a relation r, applied to a head h, yields a result close to the tail t.

The second family among the link prediction embeddings are semantic matching [1], matrix factorization, or tensor decomposition [9] models. Here, a knowledge graph is represented as a three-dimensional tensor, which is decomposed into smaller matrices or tensors. The reconstruction operation can then be used for link prediction.

The third and youngest family among the link prediction embeddings is based on deep learning and graph neural networks. Here, neural network architectures such as convolutional neural networks, capsule networks, or recurrent neural networks are adapted to work with knowledge graphs [9].

While most of those approaches only consider graphs with nodes and edges, most knowledge graphs also contain literals, e.g., strings and numeric values. [11] presents a survey of approaches which take such literal information into account; it is also one of the few review articles which considers embedding methods from both research strands.

Link prediction is typically evaluated on a set of standard datasets and uses a within-KG protocol, where the triples in the knowledge graph are divided into a training, testing, and validation set. Prediction accuracy is then assessed on the validation set. Datasets commonly used for the evaluation are FB15k, which is a subset of Freebase, and WN18, which is derived from WordNet [7]. Since it has been remarked that those datasets contain too many simple inferences due to inverse relations, the more challenging variants FB15k-237 [12] and WN18RR [13] have been proposed. More recently, evaluation sets based on larger knowledge graphs, such as YAGO3-10 [13] and DBpedia50k/DBpedia500k [14], have been introduced.

The second strand of research works, focusing on embeddings for downstream tasks (which often come from the domain of data mining), has not been reviewed as extensively, and the number of works in this area is still smaller. One of the more comprehensive evaluations is presented in [15], which is also one of the rare works that includes approaches from both strands in a common evaluation. It shows that at least the three link prediction methods used – namely TransE, TransR, and TransH – perform worse on downstream tasks than approaches developed specifically to optimize for entity similarity in the embedding space.

For the evaluation of entity embeddings optimized for entity similarity, there are quite a few use cases at hand. The authors in [16] list a number of tasks, including classification and regression of entities based on external ground truth variables, entity clustering, as well as identifying semantically related entities.

Works which explicitly compare approaches from both research strands are still rare. In [17], an in-KG scenario, i.e., the detection and correction of erroneous links, is considered. The authors compare RDF2vec (with an additional classification layer) to TransE and DistMult on the link prediction task. The results are mixed: while RDF2vec outperforms TransE and DistMult …

3.1. Data Mining is based on Similarity

Predictive data mining tasks predict classes or numerical values for instances. A typical application is recommender systems: given the set of items (e.g., movies, books) a user liked, predict whether s/he likes a particular other item (i.e., make a binary prediction: yes/no), or, given the (numerical) ratings a user gave to items in the past, predict the rating s/he will give to other instances.
