Translating Embeddings for Knowledge Graph Completion with Relation Attention Mechanism
Wei Qian†, Cong Fu†, Yu Zhu†, Deng Cai†‡∗, Xiaofei He§†
†State Key Lab of CAD&CG, College of Computer Science, Zhejiang University, China
‡Alibaba-Zhejiang University Joint Institute of Frontier Technologies
§Fabu Inc., Hangzhou, China
∗corresponding author

Abstract

Knowledge graph embedding is an essential problem in knowledge extraction. Recently, translation-based embedding models (e.g., TransE) have received increasing attention. These methods interpret a relation between entities as a translation from the head entity to the tail entity and achieve promising performance on knowledge graph completion. Previous researchers attempt to transform the entity embeddings with respect to the given relation for distinguishability. They also assume that this relation-related transformation should act as an attention mechanism, i.e., that it should focus on only a part of the attributes. However, we found that previous methods fail to create such an attention mechanism, because they ignore the hierarchical routine of human cognition. When predicting whether a relation holds between two entities, people first check the categories of the entities, and then focus on fine-grained relation-related attributes to make the decision. In other words, attention should take effect on entities already filtered by the right category. In this paper, we propose a novel knowledge graph embedding method named TransAt that simultaneously learns the translation-based embeddings, relation-related categories of entities, and relation-related attention. Extensive experiments show that our approach significantly outperforms state-of-the-art methods on public datasets and that our method can learn the true attention, which varies among relations.

1 Introduction

Knowledge graphs [Bakker, 1987] are directed graphs consisting of entities as nodes and relations as edges. Typically, a knowledge graph encodes structured information about millions of entities and billions of relational facts, but it is still far from complete. The purpose of knowledge graph completion is therefore to predict new relations between entities based on the existing knowledge graph.

Recently, embedding-based methods [Zhu et al., 2016], which encode each object of a knowledge graph into a continuous vector space, have shown powerful results, and this kind of approach has become increasingly popular. Among these methods, translation-based methods are the most popular for their simplicity and effectiveness, and they obtain state-of-the-art link prediction performance. Inspired by the success of word embedding [Mikolov et al., 2013; Zou et al., 2013], TransE [Bordes et al., 2013] learns vector embeddings for both entities and relations. For a triple $(h, r, t)$, which means that head entity $h$ has relation $r$ with tail entity $t$, the basic idea of TransE is that the triple induces a functional relation corresponding to a translation of the entity embeddings in the vector space $\mathbb{R}^k$, that is, $h + r \approx t$.
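To make the translation principle concrete, here is a minimal NumPy sketch (our illustration, not the authors' code; the embeddings are random placeholders) that scores a triple by the distance between $h + r$ and $t$:

```python
import numpy as np

def transe_score(h, r, t, norm=2):
    """Plausibility score under the translation assumption h + r ≈ t.

    Lower is better; `norm` selects the L1 or L2 distance.
    """
    return np.linalg.norm(h + r - t, ord=norm)

# Toy usage with random k-dimensional embeddings (k = 50 here).
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))
print(transe_score(h, r, t))      # unrelated random triple: large score
print(transe_score(h, r, h + r))  # exact translation: score 0.0
```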
TransE has a variety of problems, and a series of works have been proposed to make up for its shortcomings, such as TransH [Wang et al., 2014], TransR [Lin et al., 2015], and TranSparse [Ji et al., 2016]. Since TransE cannot model one-to-many, many-to-one, and many-to-many relations, TransH addresses this issue by introducing relation-related projection vectors and projecting entities onto relation-related hyperplanes. However, different relations may focus on just a few different attributes of entities, so TransR introduces a relation-related transformation matrix $\mathbf{M}_r$ that transforms entity vectors into different relation spaces. Still, relations are heterogeneous and imbalanced, so TranSparse addresses this issue by introducing more complex relation-related transformation matrices.

We believe that human cognition of a relation follows a hierarchical routine and that categories exist among entities. As shown in Figure 2, we first analyze whether the category of a candidate entity is reasonable based on some attributes of the entity, and then determine whether the specific relationship is valid based on other attributes. However, no matter how previous methods transform the entity embeddings given the corresponding relation, they finish cognition in one step. As a consequence, they must take some non-relation-related (but possibly category-related) attributes into consideration, since they need to analyze categories in the relationship discrimination step, and they cannot focus on just a few relation-related attributes of entities. We also verify this experimentally: Figure 1 shows some examples of transformation matrices learned by TransR. They are all approximately the identity matrix, which means their attention mechanisms do not work.

In this paper, we propose a knowledge graph embedding method with an attention mechanism, named TransAt (Translation with Attention). The basic idea of TransAt is illustrated in Figure 2. We hold the view that deciding whether the relationship described by a triple is true is a two-stage process. In the first stage, we check whether the categories of the head and tail entities make sense with respect to the given relation. Then, for the possible compositions, we consider whether the relationship holds based on the relation-related dimensions (attributes). Thus, we use the following function to evaluate a triple $(h, r, t)$:

$$f_r(h, t) = \begin{cases} \|P_r(h) + r - P_r(t)\| & h \in H_r,\ t \in T_r \\ \infty & \text{otherwise} \end{cases}$$

where $h, t$ are the two entity vectors, $r$ is the relation vector, $H_r$ ($T_r$) is the set of capable head (tail) candidates for relation $r$, and $P_r$ is a projection that keeps only the dimensions related to $r$.
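A minimal sketch of this two-stage score, assuming $P_r$ is realized as a 0/1 mask over dimensions and that the category test is supplied as boolean flags (both are our simplifications for illustration, not the exact formulation of the model):

```python
import numpy as np

def transat_score(h, r, t, p_r, h_in_Hr, t_in_Tr, norm=1):
    """Two-stage score sketched from the formula above.

    Stage 1: if h is not a capable head (h not in H_r) or t is not a
    capable tail (t not in T_r), reject the triple with an infinite score.
    Stage 2: apply the relation-related projection P_r, modeled here as
    a 0/1 mask `p_r` that zeroes out unrelated dimensions, then score
    the translation h + r ≈ t on the remaining dimensions.
    """
    if not (h_in_Hr and t_in_Tr):
        return np.inf
    return np.linalg.norm(p_r * h + r - p_r * t, ord=norm)

# Example: with 50-dimensional embeddings, a P_r attending to the first
# 10 dimensions would be p_r = np.concatenate([np.ones(10), np.zeros(40)]).
```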
We evaluate our models on the tasks of link prediction and triple classification on public benchmark datasets: WN18, FB15k, WN11, and FB13. Experimental results show that our approach significantly outperforms state-of-the-art models.

It is worthwhile to highlight the main contributions of this paper:

• We point out that previous approaches cannot learn a true attention mechanism, and we analyze the reasons;
• We propose a method that genuinely pays attention to a part of the dimensions;
• We propose an asymmetric mechanism that can further improve performance;
• Our experimental results significantly outperform state-of-the-art methods.

2 Related Work

Notation. We briefly introduce the mathematical notation used in this paper. We denote a triplet by $(h, r, t)$, its embeddings by bold lower-case letters $\mathbf{h}$, $\mathbf{r}$, $\mathbf{t}$, and matrices by bold upper-case letters such as $\mathbf{M}$. The score function is denoted by $f_r(h, t)$.

Figure 1: Transformation matrices learned by TransR. These are randomly selected transformation matrices learned by TransR on FB15k. Each matrix entry is multiplied by 255, and each matrix is shown as a grayscale image; the numbers below the images are the relation IDs. All the transformation matrices are similar and approximately equal to the identity matrix, which means TransR fails to learn transformations that focus on only part of the dimensions.

2.1 Translation-based Models

TransE [Bordes et al., 2013] holds the view that the functional relation induced by a triple $(h, r, t)$ corresponds to a translation of the entity embeddings. Specifically, the entities $h$ and $t$ are learned as low-dimensional embeddings, and their relationship $r$ is represented as a translation in the embedding space, i.e., $h + r \approx t$. Hence, the score function is $f_r(h, t) = \|h + r - t\|^2_{L1/L2}$. TransE is effective and applies well to irreflexive and one-to-one relations. However, since the representation of an entity is the same whatever relation it is involved in, TransE has problems with reflexive and many-to-one/one-to-many/many-to-many relations [Wang et al., 2014].

TransH [Wang et al., 2014] tries to overcome these problems by allowing an entity to have distributed representations when it is involved in different relations. Specifically, given a triplet $(h, r, t)$, the entity embeddings $\mathbf{h}$ and $\mathbf{t}$ are first projected onto the relation-specific hyperplane to obtain their projections $\mathbf{h}_{\perp r}$ and $\mathbf{t}_{\perp r}$; the translation embedding $\mathbf{d}_r$ is simply placed within the hyperplane. Its score function is $f_r(h, t) = \|\mathbf{h}_{\perp r} + \mathbf{d}_r - \mathbf{t}_{\perp r}\|^2_{L1/L2}$.

TransR/CTransR [Lin et al., 2015] proposes that an entity may have multiple attributes and that various relations may focus on different attributes of entities. TransE and TransH use a common embedding space for both entities and relations, which is insufficient for modeling this. To address the issue, TransR/CTransR models entities and relations in different embedding spaces, i.e., the entity space and the relation spaces, and performs the translation in the relation space. It sets a transfer matrix $\mathbf{M}_r$ for each relation $r$ to map entity embeddings into the relation space. The score function is $f_r(h, t) = \|\mathbf{M}_r\mathbf{h} + \mathbf{r} - \mathbf{M}_r\mathbf{t}\|^2_{L1/L2}$.

TransD [Ji et al., 2015] considers the diversity of not only relations but also entities. In addition, it has fewer parameters and replaces matrix-vector multiplication with vector-vector multiplication for each entity-relation pair, which is more scalable and can be applied to large-scale graphs.

KG2E [He et al., 2015b] switches to density-based embedding, in contrast to the previous translation models, which use point-based embeddings. Point-based embeddings ignore the fact that different entities and relations may carry different (un)certainties. To address this, KG2E uses Gaussian embeddings to model the uncertainty of the data.
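For a side-by-side view of the surveyed models, here is a hedged NumPy sketch of the three score functions quoted above (our reconstruction for illustration, not reference implementations; the hyperplane normal w_r is assumed to be unit-length, as in the TransH paper):

```python
import numpy as np

# Squared-L2 variant of the ||.||^2_{L1/L2} convention quoted above;
# lower scores mean more plausible triples.

def transe(h, r, t):
    # TransE: ||h + r - t||^2, a pure translation in one shared space.
    return np.sum((h + r - t) ** 2)

def transh(h, r, t, w_r, d_r):
    # TransH: project h and t onto the hyperplane with unit normal w_r,
    # then translate by d_r within that hyperplane.
    h_p = h - np.dot(w_r, h) * w_r
    t_p = t - np.dot(w_r, t) * w_r
    return np.sum((h_p + d_r - t_p) ** 2)

def transr(h, r, t, M_r):
    # TransR: map entity vectors into the relation space with the
    # transfer matrix M_r before translating.
    return np.sum((M_r @ h + r - M_r @ t) ** 2)
```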