Towards Understanding the Geometry of Knowledge Graph Embeddings
Chandrahas, Aditya Sharma, Partha Talukdar
Indian Institute of Science
[email protected], [email protected], [email protected]

Abstract

Knowledge Graph (KG) embedding has emerged as a very active area of research over the last few years, resulting in the development of several embedding methods. These KG embedding methods represent KG entities and relations as vectors in a high-dimensional space. Despite the popularity and effectiveness of KG embeddings in various tasks (e.g., link prediction), the geometric understanding of such embeddings (i.e., the arrangement of entity and relation vectors in the vector space) remains unexplored – we fill this gap in this paper. We initiate a study to analyze the geometry of KG embeddings and correlate it with task performance and other hyperparameters. To the best of our knowledge, this is the first study of its kind. Through extensive experiments on real-world datasets, we discover several insights. For example, we find that there are sharp differences between the geometry of embeddings learnt by different classes of KG embedding methods. We hope that this initial study will inspire other follow-up research on this important but unexplored problem.

1 Introduction

Knowledge Graphs (KGs) are multi-relational graphs where nodes represent entities and typed edges represent relationships among entities. Recent research in this area has resulted in the development of several large KGs, such as NELL (Mitchell et al., 2015), YAGO (Suchanek et al., 2007), and Freebase (Bollacker et al., 2008), among others. These KGs contain thousands of predicates (e.g., person, city, mayorOf(person, city), etc.) and millions of triples involving such predicates, e.g., (Bill de Blasio, mayorOf, New York City).

The problem of learning embeddings for Knowledge Graphs has received significant attention in recent years, with several methods being proposed (Bordes et al., 2013; Lin et al., 2015; Nguyen et al., 2016; Nickel et al., 2016; Trouillon et al., 2016). These methods represent entities and relations in a KG as vectors in a high-dimensional space. These vectors can then be used for various tasks, such as link prediction, entity classification, etc. Starting with TransE (Bordes et al., 2013), there have been many KG embedding methods, such as TransH (Wang et al., 2014), TransR (Lin et al., 2015) and STransE (Nguyen et al., 2016), which represent relations as translation vectors from head entities to tail entities. These are additive models, as the vectors interact via addition and subtraction. Other KG embedding models, such as DistMult (Yang et al., 2014), HolE (Nickel et al., 2016), and ComplEx (Trouillon et al., 2016), are multiplicative, as the likelihood of an entity-relation-entity triple is quantified by a multiplicative score function. All these methods employ a score function for distinguishing correct triples from incorrect ones.

In spite of the existence of many KG embedding methods, our understanding of the geometry and structure of such embeddings is very shallow. A recent work (Mimno and Thompson, 2017) analyzed the geometry of word embeddings; however, the problem of analyzing the geometry of KG embeddings is still unexplored – we fill this important gap. In this paper, we analyze the geometry of such vectors in terms of their lengths and conicity, which, as defined in Section 4, describe their positions and orientations in the vector space. We later study the effects of model type and training hyperparameters on the geometry of KG embeddings and correlate geometry with performance.

We make the following contributions:

• We initiate a study to analyze the geometry of various Knowledge Graph (KG) embeddings. To the best of our knowledge, this is the first study of its kind. We also formalize various metrics which can be used to study the geometry of a set of vectors.

• Through extensive analysis, we discover several interesting insights about the geometry of KG embeddings. For example, we find systematic differences between the geometries of embeddings learned by additive and multiplicative KG embedding methods.

• We also study the relationship between geometric attributes and predictive performance of the embeddings, resulting in several new insights. For example, in the case of multiplicative models, we observe that for entity vectors generated with a fixed number of negative samples, lower conicity (as defined in Section 4) or higher average vector length leads to higher performance.

Source code of all the analysis tools developed as part of this paper is available at https://github.com/malllabiisc/kg-geometry. We hope that these resources will enable one to quickly analyze the geometry of any KG embedding, and potentially other embeddings as well.
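Since the formal definitions of these metrics appear in Section 4, which is not part of this excerpt, the following NumPy sketch should be read as a rough illustration only: it assumes conicity is measured as the average cosine similarity between each vector in a set and the set's mean vector, and the function names and toy data are ours.

```python
import numpy as np

def conicity(vectors):
    """Average cosine similarity between each vector and the mean
    vector of the set; `vectors` is an (n, d) array of n embeddings."""
    mean = vectors.mean(axis=0)
    cosines = vectors @ mean / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(mean))
    return float(cosines.mean())

def average_length(vectors):
    """Average L2 norm of the embedding vectors."""
    return float(np.linalg.norm(vectors, axis=1).mean())

# Isotropic Gaussian vectors point in all directions (conicity near 0);
# shifting them away from the origin concentrates them in a narrow cone
# (conicity near 1) and increases their average length.
rng = np.random.default_rng(0)
spread = rng.normal(size=(1000, 50))
coned = spread + 5.0
print(conicity(spread), average_length(spread))  # ~0.0, ~7
print(conicity(coned), average_length(coned))    # ~0.98, ~36
```

Under this assumption, a high-conicity set of vectors is clustered in a narrow cone around its mean direction, while conicity near zero indicates vectors spread across many directions.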
2 Related Work

In spite of the extensive and growing literature on both KG and non-KG embedding methods, very little attention has been paid towards understanding the geometry of the learned embeddings. A recent work (Mimno and Thompson, 2017) is an exception, which addresses this problem in the context of word vectors. This work revealed a surprising correlation between word vector geometry and the number of negative samples used during training. Instead of word vectors, in this paper we focus on understanding the geometry of KG embeddings. In spite of this difference, the insights we discover in this paper generalize some of the observations of Mimno and Thompson (2017). Please see Section 6.2 for more details.

Since KGs contain only positive triples, negative sampling has been used for training KG embeddings. The effect of the number of negative samples on KG embedding performance was studied by Toutanova et al. (2015). In this paper, we study the effect of the number of negative samples on KG embedding geometry as well as performance.

In addition to the additive and multiplicative KG embedding methods already mentioned in Section 1, there is another set of methods where the entity and relation vectors interact via a neural network. Examples of methods in this category include NTN (Socher et al., 2013), CONV (Toutanova et al., 2015), ConvE (Dettmers et al., 2017), R-GCN (Schlichtkrull et al., 2017), ER-MLP (Dong et al., 2014) and ER-MLP-2n (Ravishankar et al., 2017). Due to space limitations, in this paper we restrict our scope to the analysis of the geometry of additive and multiplicative KG embedding models only, and leave the analysis of the geometry of neural network-based methods as part of future work.

3 Overview of KG Embedding Methods

For our analysis, we consider six representative KG embedding methods: TransE (Bordes et al., 2013), TransR (Lin et al., 2015), STransE (Nguyen et al., 2016), DistMult (Yang et al., 2014), HolE (Nickel et al., 2016) and ComplEx (Trouillon et al., 2016). We refer to TransE, TransR and STransE as additive methods because they learn embeddings by modeling relations as translation vectors from one entity to another, which results in vectors interacting via the addition operation during training. On the other hand, we refer to DistMult, HolE and ComplEx as multiplicative methods, as they quantify the likelihood of a triple belonging to the KG through a multiplicative score function. The score functions optimized by these methods are summarized in Table 1.

Notation: Let $G = (E, R, T)$ be a Knowledge Graph (KG), where $E$ is the set of entities, $R$ is the set of relations and $T \subset E \times R \times E$ is the set of triples stored in the graph. Most of the KG embedding methods learn vectors $\mathbf{e} \in \mathbb{R}^{d_e}$ for $e \in E$, and $\mathbf{r} \in \mathbb{R}^{d_r}$ for $r \in R$. Some methods also learn projection matrices $M_r \in \mathbb{R}^{d_r \times d_e}$ for relations. The correctness of a triple is evaluated using a model-specific score function $\sigma : E \times R \times E \to \mathbb{R}$. For learning the embeddings, a loss function $\mathcal{L}(T, T', \theta)$, defined over a set of positive triples $T$, a set of (sampled) negative triples $T'$, and the parameters $\theta$, is optimized.

We use small italic characters (e.g., $h$, $r$) to represent entities and relations, and corresponding bold characters to represent their vector embeddings (e.g., $\mathbf{h}$, $\mathbf{r}$). We use bold capital characters (e.g., $\mathbf{V}$) to represent a set of vectors. Matrices are represented by capital italic characters (e.g., $M$).

Type            Model                              Score Function $\sigma(h, r, t)$
-----------------------------------------------------------------------------------
Additive        TransE (Bordes et al., 2013)       $-\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_1$
                TransR (Lin et al., 2015)          $-\|M_r\mathbf{h} + \mathbf{r} - M_r\mathbf{t}\|_1$
                STransE (Nguyen et al., 2016)      $-\|M_r^1\mathbf{h} + \mathbf{r} - M_r^2\mathbf{t}\|_1$
Multiplicative  DistMult (Yang et al., 2014)       $\mathbf{r}^\top(\mathbf{h} \odot \mathbf{t})$
                HolE (Nickel et al., 2016)         $\mathbf{r}^\top(\mathbf{h} \star \mathbf{t})$
                ComplEx (Trouillon et al., 2016)   $\mathrm{Re}(\mathbf{r}^\top(\mathbf{h} \odot \bar{\mathbf{t}}))$

Table 1: Summary of various Knowledge Graph (KG) embedding methods used in the paper. Please see Section 3 for more details.
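To make the score functions in Table 1 concrete, here is a minimal NumPy sketch; the formulas follow the table, but the function names, the FFT route for HolE's circular correlation, and the toy dimensions are our own illustration rather than the authors' implementation.

```python
import numpy as np

# Additive score functions (cf. Table 1); all use the L1 norm.

def score_stranse(h, r, t, M1, M2):
    # STransE: -||M_r^1 h + r - M_r^2 t||_1
    return -np.abs(M1 @ h + r - M2 @ t).sum()

def score_transe(h, r, t):
    # TransE: STransE with both projection matrices set to the identity.
    return -np.abs(h + r - t).sum()

def score_transr(h, r, t, Mr):
    # TransR: STransE with M_r^1 = M_r^2 = M_r.
    return score_stranse(h, r, t, Mr, Mr)

# Multiplicative score functions (cf. Table 1).

def score_distmult(h, r, t):
    # DistMult: r^T (h ⊙ t), where ⊙ is the elementwise product.
    return r @ (h * t)

def score_hole(h, r, t):
    # HolE: r^T (h ⋆ t); circular correlation computed via the FFT
    # identity  h ⋆ t = IFFT(conj(FFT(h)) * FFT(t)).
    return r @ np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(t)).real

def score_complex(h, r, t):
    # ComplEx: Re(r^T (h ⊙ conj(t))) with complex-valued vectors.
    return (r * h * np.conj(t)).sum().real

# Toy usage with random embeddings.
d_e, d_r = 6, 4
h, r, t = (np.random.randn(d_e) for _ in range(3))
M1, M2 = np.random.randn(d_r, d_e), np.random.randn(d_r, d_e)
r_rel = np.random.randn(d_r)  # relation vector living in relation space
print(score_transe(h, r, t), score_stranse(h, r_rel, t, M1, M2))
print(score_distmult(h, r, t), score_hole(h, r, t))
hc, rc, tc = (np.random.randn(d_e) + 1j * np.random.randn(d_e) for _ in range(3))
print(score_complex(hc, rc, tc))
```

The FFT form of circular correlation is an O(d log d) equivalent of the O(d^2) sum $[\mathbf{h} \star \mathbf{t}]_k = \sum_i h_i\, t_{(k+i) \bmod d}$; either form gives the same HolE score.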
3.1 Additive KG Embedding Methods

This is the set of methods where entity and relation vectors interact via additive operations. The score function for these models can be expressed as below:

    $\sigma(h, r, t) = -\|M_r^1\mathbf{h} + \mathbf{r} - M_r^2\mathbf{t}\|_1$    (1)

where $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d_e}$ and $\mathbf{r} \in \mathbb{R}^{d_r}$ are the vectors for the head entity, tail entity and relation, respectively, and $M_r^1, M_r^2 \in \mathbb{R}^{d_r \times d_e}$ are projection matrices from the entity space to the relation space. TransE and TransR are special cases of STransE with $M_r^1 = M_r^2 = I_d$ and $M_r^1 = M_r^2 = M_r$, respectively.

3.2 Multiplicative KG Embedding Methods

This is the set of methods where the vectors interact via multiplicative operations (usually the dot product). The score function for these models can be expressed as

    $\sigma(h, r, t) = \mathbf{r}^\top f(\mathbf{h}, \mathbf{t})$    (2)

where $\mathbf{h}, \mathbf{t}, \mathbf{r} \in \mathbb{F}^d$ are the vectors for the head entity, tail entity and relation, respectively ($\mathbb{F}$ is $\mathbb{R}$ for DistMult and HolE, and $\mathbb{C}$ for ComplEx).
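The loss $\mathcal{L}(T, T', \theta)$ introduced in the notation of Section 3 is left abstract in this excerpt. Below is a hedged sketch of one common concrete choice, the margin-based ranking loss over corrupted triples used by TransE-style additive models; the helper names (`corrupt`, `margin_loss`) and the toy data are hypothetical, and the sketch only illustrates how the sampled negatives $T'$ enter training.

```python
import random
import numpy as np

rng = random.Random(0)

def corrupt(triple, entities, n):
    """Sample n negative triples by replacing the head or the tail with
    a random entity (real implementations typically also filter out
    corruptions that happen to be valid triples in the KG)."""
    h, r, t = triple
    negs = []
    for _ in range(n):
        e = rng.choice(entities)
        negs.append((e, r, t) if rng.random() < 0.5 else (h, r, e))
    return negs

def margin_loss(score, positives, entities, n_neg=1, margin=1.0):
    """Margin ranking loss over positive triples T and sampled negatives T'.
    `score(h, r, t)` maps a triple of ids to a real-valued score, e.g., an
    embedding lookup followed by one of the Table 1 score functions."""
    total = 0.0
    for pos in positives:
        for neg in corrupt(pos, entities, n_neg):
            # Hinge: pay a cost when the positive triple does not beat
            # the negative triple by at least the margin.
            total += max(0.0, margin - score(*pos) + score(*neg))
    return total

# Toy usage with TransE-style embeddings (hypothetical data).
ents, rels = ["a", "b", "c"], ["likes"]
E = {e: np.random.randn(4) for e in ents}
R = {r: np.random.randn(4) for r in rels}
transe = lambda h, r, t: -np.abs(E[h] + R[r] - E[t]).sum()
print(margin_loss(transe, [("a", "likes", "b")], ents, n_neg=4))
```

The number of negatives drawn per positive triple (`n_neg` here) is exactly the negative-sample hyperparameter whose effect on geometry and performance this paper studies.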