Group Representation Theory for Knowledge Graph Embedding

Group Representation Theory for Knowledge Graph Embedding Chen Cai, Yunfeng Cai, Mingming Sun, Zhiqiang Xu Cognitive Computing Lab Baidu Research No.10 Xibeiwang East Road, Beijing, 10085, China [email protected], {caiyunfeng,sunmingming01,xuzhiqiang04}@baidu.com Abstract Knowledge1 graph embedding has recently become a popular way to model relations and infer missing links. In this paper, we present a group theoretical perspective of knowledge graph embedding, connecting previous methods with different group actions. Furthermore, by utilizing Schur’s lemma from group representation theory, we show that the state-of-the-art embedding method RotatE has the capacity to model relations from any finite Abelian group. 1 Introduction Knowledge graphs are collections of factual triples, where each triple (h; r; t) represents a relation r between a head entity h and a tail entity t. Examples of real-world knowledge graphs include Freebase, Yago [13], and WordNet [9]. Knowledge graphs are potentially useful for a variety of applications such as question-answering [6], information retrieval [18], recommender systems [22], and natural language processing [20]. Research on knowledge graphs is attracting growing interests in both academia and industry communities. Since knowledge graphs are usually incomplete, a fundamental problem for knowledge graphs is predicting the missing links. Recently, extensive studies have been done on learning low-dimensional representations of entities and relations for missing link prediction (a.k.a. knowledge graph embedding). These methods have been shown to be scalable and effective compared to previous methods [2, 8, 10, 12, 15, 17]. The general intuition of these methods is to model and infer the connectivity patterns in knowledge graphs according to the observed knowledge facts. For example, some relations are symmetric (e.g., marriage) while others are anti-symmetric (e.g., filiation); some relations are the inverse of other relations (e.g., hypernym and hyponym); and some relations may be composed by others (e.g., my mother’s husband is my father). It is critical to find ways to model and infer these patterns, i.e., symmetry/anti-symmetry, inversion, and composition, from the observed facts in order to predict missing links. We argue that relation patterns, symmetry/anti-symmetry, inversion, and composition, have natural correspondence to notions in group theory. Many existing embedding models can be interpreted as modeling relations as elements of a group acting on a properly chosen space. We believe that the algebraic (group) perspective of knowledge graph embedding is interesting and worth investigating as an attempt to shed light on the power and limitation of knowledge graph embedding. Moreover, it complements existing statistical approaches [11,24] and deepens understanding. Our main contributions in this work can be summarized as follows: • We revisit knowledge graph embedding from the perspective of group representation theory. To the best of our knowledge, this connection has not been made explicitly before. b731afe6-XeAuIkRK1GnSwCkLSQI1eeCZYQrXRz49U6CKDw==-500ddfd78926 1Chen Cai’s work was conducted while interning with Baidu Research from May to the end of August 2019. Chen Cai is a graduate student from the Department of Computer Science & Engineering, Ohio State University. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. • We provide a self-contained introduction of concepts and examples from group theory and group representation theory. • We characterize the representation power of a recent embedding method, i.e., RotatE. By utilizing tools from group representation theory, we show that RotatE has the capacity to represent any finite Abelian groups. 2 Background The general methodology of Knowledge Graph Embedding(KGE) is to define a score function for triples. Formally, letting E denote the set of entities and R the set of relations, a knowledge graph is a collection of factual triples (h; r; t), where h; t 2 E and r 2 R. Since entity embeddings are usually represented as vectors, the score function takes the form f(h; r; t), where h and t are head and tail entity embeddings. We list the score functions for a few popular embedding methods as below: • TransE [1]: f(h; r; t) = −kh + r − tk, h; r; t 2 Rk. • TorusE [4]: f(h; r; t) = −kh + r − tk, h; r; t 2 Tk, where Tk is a k-dimensional torus. • DistMul [21]: f(h; r; t) = hT diag(r)t, h; r; t 2 Rk. • ComplEx [16]: f(h; r; t) = Re(hT diag(r)t¯), h; r; t 2 Ck. d • RotatE [14]: f(h; r; t) = −kh ◦ r − tk, h; r; t 2 C ; krik = 1, where ◦ denotes Hadamard product. Most models can capture only a portion of relation patterns. For example, TransE represents each relation as a translation from source entities and target entities, and thus implicitly models inversion and composition of relations. However, it cannot model symmetric relations. ComplEx [7] extends DistMult by introducing complex embeddings so as to better model asymmetric relations, but it cannot infer the composition pattern. Particularly, RotatE is the first model able to cover all the relation patterns. We thus mainly focus on this model in this paper. 2.1 Relation Pattern We formally define common relation patterns that many embedding methods try to model in knowledge graphs. Definition 2.1. A relation clause r is symmetric if 8x; y; r(y; x) =) r(y; x): A clause in this form is a symmetry pattern. Definition 2.2. A relation clause r is anti-symmetric if 8x; y; r(x; y) =):r(y; x): A clause in this form is an anti-symmetry pattern. Definition 2.3. A relation r1 is inverse to r2 if 8x; y; r2(x; y) =) r1(y; x): A clause in this form is an inversion pattern. Definition 2.4. A relation r1 is composed of relation r2 and relation r3 if 8x; y; z; r1(x; y) ^ r2(y; z) =) r3(x; z). A clause in this form is a composition pattern. Definition 2.5. A relation r1 the subrelation of r2 if 8x; y; r1(x; y) =) r2(x; y): A clause in this form is a subrelation pattern. 3 Group and Relation Modeling Due to space limitation, we only provide the definition of group here. Readers who are not familiar with group theory are encouraged to read the appendix, where a self-contained introduction of group theory is given. Definition 3.1. Group. Let G be a set. We say that G is a group with law of composition ? if the b731afe6-XeAuIkRK1GnSwCkLSQI1eeCZYQrXRz49U6CKDw==-500ddfd78926 following axioms hold: (G1, closure of ?): the assignment (g; h) ! g ? h 2 G, defines a function G × G ! G. We call g ? h the product of g and h. 2 (G2, existence of identity): there exists a eG 2 G such that for every g 2 G, we have g ? eG = g = eG ? g. We call eG an identity of G. (G3, existence of inverse): for every g 2 G, there exists an h 2 G such that h ? g = eG = g ? h. We call such h an inverse of g. (G4, associativity): for any g; h; k 2 G we have (g ? h) ? k = g ? (h ? k). As we can see, the definitions of inversion and composition pattern of the relation are similar to G3 and G4 in the definition of the group. A symmetry pattern can be modeled as a group element of order 2. (In the case of RotatE, it corresponds to the rotation of magnitude π.) We summarize the correspondence in Table 1. For embedding models like TransE, TorusE, RotatE, and DihEdral [19], relations are modeled as elements of groups acting on certain spaces. In particular, TransE models relations as elements of the translation group acting on Rn. TorusE models relations as elements of the translation group acting on the high dimensional torus Tn. RotatE models relations as elements of the group of diagonal complex matrices (where each entry is of length 1) acting on Cn. DihEdral models relations as elements of the Dihedral group acting on Rn. Table 1: The correspondence between relation modeling and group theory. Relation Group Example −1 Symmetry r(x; y) =) r(y; x) gr = gr Marriage −1 Antisymmetry r(x; y) =):r(y; x) gr 6= gr Filiation r (x; y) =) r (y; x) g = g−1 Inverse 1 2 r1 r2 Hypernym and hyponym Composition r1(x; y) ^ r2(y; z) =) r3(x; z) gr1 ? gr2 = gr3 Mother’s husband is father Subrelation r1(x; y) =) r2(x; y) - Father of and parent of Limitation: Combining composition and subrelation patterns, cases like r1(x; y) ^ r1(y; z) =) r1(x; z) would happen. An example of this type of relation pattern can be sibling of or coworker with. However, in the group formulation, gr1 ? gr1 = gr1 . Therefore, gr1 is modeled as the identity in the group. Due to the uniqueness of identity, all relations that exhibit the above pattern will be modeled as the same element in the group. This certainly limits the modeling capacity and is shared by all models that can be formulated under the group action perspective. 4 Main Theorem Assuming that readers are familiar with concepts defined in the appendix, we are now ready to present two lemmas and the main theorem. Lemma 4.1. Schur’s Lemma. The only self-isomorphisms of a finite dimensional irreducible representation of a group G on the complex vector space are given by scalar multiplication. Proof. Fix an isomorphism φ : V ! V of complex representations of G. The linear map φ must have an eigenvalue λ over C, and also some non-zero eigenvector v. But then the G-linear map (because both φ(x) and λx are both linear maps) V ! V x ! φ(x) − λx has the vector v in its kernel, which is again a representation of G.

Load more