
HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Anson Bastos (IIT Hyderabad, India), Kuldeep Singh (Cerence GmbH and Zerotha Research, Germany), Abhishek Nadgeri (RWTH Aachen and Zerotha Research, Germany), Saeedeh Shekarpour (University of Dayton, USA), Isaiah Onando Mulang' (IBM Research and Zerotha Research, Kenya), Johannes Hoffart (Goldman Sachs, Germany)

ABSTRACT
Recently, several Knowledge Graph Embedding (KGE) approaches have been devised to represent entities and relations in a dense vector space and are employed in downstream tasks such as link prediction. A few KGE techniques address interpretability, i.e., mapping the connectivity patterns of the relations (symmetric/asymmetric, inverse, and composition) to a geometric interpretation such as rotation. Other approaches model the representations in a higher dimensional space such as four-dimensional space (4D) to enhance the ability to infer the connectivity patterns (i.e., expressiveness). However, modeling relations and entities in a 4D space often comes at the cost of interpretability. This paper proposes HopfE, a novel KGE approach aiming to achieve interpretability of inferred relations in the four-dimensional space. We first model the structural embeddings in 3D Euclidean space and view the relation operator as an SO(3) rotation. Next, we map the entity embedding vector from the 3D Euclidean space to a 4D hypersphere using the inverse Hopf Fibration, in which we embed the semantic information from the KG ontology. Thus, HopfE considers the structural and semantic properties of the entities without losing expressivity and interpretability. Our empirical results on four well-known benchmarks achieve state-of-the-art performance for the KG completion task.

KEYWORDS
Representation learning, Embedding, Knowledge Graph

ACM Reference Format:
Anson Bastos, Kuldeep Singh, Abhishek Nadgeri, Saeedeh Shekarpour, Isaiah Onando Mulang', and Johannes Hoffart. 2021. HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), November 1-5, 2021, Virtual Event, Australia. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/XXXXXXXXXX

1 INTRODUCTION
Publicly available Knowledge Graphs (KGs) (e.g., Freebase [4] and Wikidata [41]) expose an interlinked representation of structured knowledge for several information extraction tasks such as entity linking, relation extraction, and question answering [2, 23, 34]. However, publishing ever-growing KGs is often confronted with incompleteness, inconsistency, and inaccuracy [15]. Knowledge Graph Embedding (KGE) methods have been proposed to overcome these challenges by learning entity and relation representations (i.e., embeddings) in vector spaces that capture latent structures in the data [45]. A typical KGE approach follows a two-step process [11, 42]: 1) determining the form of the entity and relation representations, and 2) using a transformation function to learn entity and relation representations in a vector space for calculating the plausibility of KG triples via a scoring function. Intuitively, KGE approaches consider a set of original "connectivity patterns" in the KGs and learn representations that are approximately able to reason over the given patterns (the expressiveness property of a KGE model [43]). For example, for connectivity patterns such as symmetric relations (e.g., sibling), anti-symmetric relations (e.g., father), inverse relations (e.g., husband and wife), or compositional relations (e.g., mother's brother is uncle), expressiveness means the ability to hold these patterns in a vector space.

Limitations of state-of-the-art KGE Models: KGs are openly available, and in addition they are largely explainable because of their explicit semantics and structure (e.g., direct and labeled relationships between neighboring entities). In contrast, KG embeddings are often referred to as a sub-symbolic representation since they illustrate elements in the vector space, losing the original interpretability deduced from logic or geometric interpretations [29]. A similar issue has been observed in a few initial KGE methods [5, 21, 45] that lack interpretability, i.e., these models fail to link the connectivity patterns to a geometric interpretation such as rotation [35]. For example, in inverse relations, the angle of rotation is negative; in composition, the rotation angle is an addition (r1 + r2 = r3), etc. Other KGE approaches such as RotatE [35] and NagE [48] provide interpretability by introducing rotations in two-dimensional (2D) and three-dimensional (3D) spaces respectively, to preserve the connectivity patterns. Conversely, QuatE [50] imposes a transformation on the entity representations by quaternion multiplication with the relation representation. QuatE aims for higher dimensional expressiveness by performing entity and relational transformation in hypercomplex space (4D in this case) [50].
Quaternions enable expressive modeling in four-dimensional space and have more degrees of freedom than transformations in the complex plane [49, 50, 54]. Albeit expressive and empirically powerful, QuatE fails to provide explicit and understandable geometric interpretations due to its dependency on unit quaternion-based transformations [22, 48]. Hence, it is an open research question how a KGE approach can maintain geometric interpretability and simultaneously achieve the model's expressiveness in four-dimensional space.

Figure 1 (panels: (a) NagE, (b) QuatE, (c) HopfE step 1: Rotation, (d) HopfE step 2: Fibration): For a given triple in three dimensions (i, j, k), NagE provides geometric interpretability by rotating the entities in a 3D space (Figure (a)). Conversely, QuatE (Figure (b)) achieves higher-order expressiveness by transforming the entities using the relational quaternion in 4D space, although it loses geometric interpretability. Here, the entities are stereo-graphically projected into 3D for understanding. HopfE first performs a rotation in 3D followed by a fibration to 4D, as shown in Figures (c) and (d). This allows retaining geometric interpretability in 4D as entities are now depicted as fibers. In Figure (d), the stereo-graphic projection of two fibrations is shown before performing the rotations.

Proposed Hypothesis: The essential research gap to achieve interpretability in four-dimensional space for a KGE model motivates our work. QuatE transforms one quaternion (i.e., the entity representation) to another in the 4D hypersphere S^3 [50]. Unlike QuatE, we hypothesize that it is viable to map a three-dimensional point to a circle in the four-dimensional space. Our rationale for these choices is: 1) the relational rotations in the 3D sphere S^2 are already proven to be interpretable (i.e., [9, 48]). However, transformations as done in QuatE lack geometric interpretability [22]. Hence, performing relation rotations in S^2 will allow us to inherit these proven interpretable properties.
2) Mapping the entity representation of S^2 to a circle in a higher dimension can illustrate these representations as fibers [46] (i.e., manifolds) for maintaining geometric interpretations while providing the expressiveness of 4D embeddings (cf., Figure 1). In other words, by representing entities as fibers, we can aim for a direct mapping between rotations in S^2 and transformations of fibers in S^3, without compromising interpretability in four-dimensional space. 3) By converting a point of S^2 to a circle in S^3, we aim to semantically enrich entity embeddings through the KG ontology (entity aliases, entity types, descriptions, and entity labels as textual literals [11]) encoded onto the points on the fiber.

Approach: In this paper, we present HopfE, a novel KGE approach aiming to increase the expressiveness in higher dimensional space while maintaining geometric interpretability (Figure 1). Specifically, in HopfE, we first rotate the relational embeddings in the 3D Euclidean space using quaternion [39] rotations. In the next step, the point (i.e., the entity representation) in the 3D sphere is converted to a circle on the 4D hypersphere using the inverse Hopf Fibration [14]. We propose to use the extra dimension gained by the Hopf mapping to also represent the entity attributes for semantically enriching the embeddings.

Contributions. The key contributions of this work are: 1) Our work (theoretically and empirically) legitimizes the Hopf Fibration's application in KG embeddings. To the best of our knowledge, we are the first to incorporate the Hopf Fibration mapping in any representation learning approach. 2) We bridge the important gap between geometric interpretability and four-dimensional expressiveness of a knowledge graph embedding method. 3) We provide a way to induce entity semantics in 4D, keeping the rotational interpretability intact. 4) Our empirical results achieve comparable state-of-the-art performance on four well-known KG completion datasets.

The structure of the paper is as follows: Section 2 reviews the related work. Section 3 formalizes the problem. Section 4 describes the theoretical foundations and Section 5 describes our proposed approach. Section 6 describes the experiment setup. Our results are discussed in Section 7. We conclude in Section 8.

2 RELATED WORK
Considering KGE is an extensively studied topic in the literature [1, 6, 9, 16, 52], we stick to the work closely related to our approach. Previous works on KG embeddings are primarily divided into translation and semantic matching models [42]. Translational models use distance-based scoring functions which are often based on learning a translation from the head entity to the tail entity. The first translation-based approach was TransE [5], which models the composition of the entities and relations as e1 + r = e2. The second category (i.e., semantic matching) uses similarity-based scoring functions. The basic idea is to measure the plausibility of KG triples by matching the latent semantics of entities and relations expressed in their vector space representations. DistMult [47] is a semantic matching model where the entities are represented as vectors, the relations are diagonal matrices, and the composition is a bilinear product of the entities and relations. ComplEx [38] is a modification of DistMult where the parameters are in the complex space. RotatE [35] was the first to propose the translation as a rotation of the entities in the complex plane for geometric interpretability. RotatE models the rotation in the 2D plane and has the symmetric, antisymmetric, composition, and inversion properties. This rotation, however, induces a strict compositional dependence between the relations. Yang et al. [48] introduced NagE, which provides a group-theoretic perspective of rotations similar to RotatE by using non-Abelian (SO(3) and SU(2)) groups. These groups give a geometrical interpretation of rotations in 3D, and being non-commutative groups, they solve the problem of strict compositionality.
QuatE [50] has modeled the relational rotations in the quaternion four-dimensional space, thus making the model more expressive. QuatE also studied that increasing the spatial dimensions further, such as to octonions, does not increase performance compared to modeling relations and entities in 4D. We position our work at the intersection of the QuatE and NagE approaches (cf. Figure 1). In contrast with QuatE, which does the transformation using quaternion multiplication, we perform SO(3) rotations using the relation quaternion to inherit the geometric interpretability of NagE. The inverse Hopf Fibration lets us perform the dimensional mapping, which has found applications in diverse domains such as rigid body mechanics [25] and quantum information theory [28]. Furthermore, there are extensive works for inducing schema and ontology information into the embedding models, such as entity types [20, 53], entity descriptions [44], and literals [11]. We also aim to induce entity semantic properties (textual literals) into HopfE. Unlike existing methods [11] that concatenate the semantic and latent embeddings, we would like to look at the latent space for representing entity attributes.

3 PROBLEM FORMULATION
We define a KG as a tuple KG = (E, R, T+), where E denotes the set of entities (vertices), R is the set of relations (edges), and T+ ⊆ E × R × E is the set of all triples. A triple τ = (e_h, r_ht, e_t) ∈ T+ indicates that, for the relation r_ht ∈ R, e_h is the head entity (origin of the relation) while e_t is the tail entity. Since a KG is a multigraph, e_h = e_t may hold and |{r_{e_h,e_t}}| ≥ 0 for any two entities. We define the tuple (A^e, τ^e) = φ(e) obtained from a context retrieval function φ that returns, for any given entity e: the set A^e, which is the set of all attributes of the entity such as aliases, descriptions, and type; and τ^e ⊂ T+, the set of all triples with head at e. The KB completion task (KBC) predicts the entity pairs ⟨e_i, e_j⟩ in the Knowledge Graph that have a relation r_ij ∈ R between them. Similar to other researchers [48, 50], we view KBC as a ranking task. In addition, we also aim to model KG contextual information to improve the ranking.
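To make the formalization concrete, the following minimal Python sketch mirrors the tuple KG = (E, R, T+) and the context retrieval function φ. The class and method names are illustrative assumptions, not the authors' implementation.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["head", "relation", "tail"])

class KG:
    """A knowledge graph as the tuple (E, R, T+), plus entity attributes A^e."""
    def __init__(self, triples, attributes):
        self.triples = set(triples)                           # T+ ⊆ E × R × E
        self.entities = ({t.head for t in self.triples} |
                         {t.tail for t in self.triples})      # E
        self.relations = {t.relation for t in self.triples}   # R
        self.attributes = attributes                          # e -> A^e (aliases, type, ...)

    def phi(self, e):
        """Context retrieval: returns (A^e, tau^e) for a given entity e."""
        a_e = self.attributes.get(e, [])
        tau_e = {t for t in self.triples if t.head == e}
        return a_e, tau_e

kg = KG([Triple("Berlin", "capital_of", "Germany")],
        {"Berlin": ["Berlin, Germany", "capital and largest city of Germany"]})
print(kg.phi("Berlin"))
```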
4 THEORETICAL FOUNDATIONS
4.1 Quaternions and Rotations in 3D
Quaternions [12] are a numerical representation of vectors in four dimensions. A quaternion can be considered as an ordered tuple (a, b, c, d) of numbers, represented by a + bi + cj + dk, where a is the real component and b, c, d are the imaginary components. The norm [32] of the quaternion a + bi + cj + dk can be represented by sqrt(a^2 + b^2 + c^2 + d^2). From hereon, "*" is the multiplication operation, "⊙" is the element-wise product, and "⊗" denotes the Hamilton product. In the following, we describe the details of rotation in the 3D sphere.

Theorem 4.1. For a pure quaternion v = p1 i + p2 j + p3 k in 3D, whose real component is zero, the quaternion r = a + bi + cj + dk with unit norm gives a rotational map r v r^{-1}, where the axis of rotation is denoted by bi + cj + dk and the angle of rotation by 2 arccos(a).

Proof. We divide the proof into three parts, where we prove that the norm of v is preserved, that (b, c, d) is the eigenvector of the rotational map, and that the angle of rotation can be given in terms of a. The rotational map R_r(v) = r v r^{-1} has the norm ∥R_r(v)∥ = ∥r v r^{-1}∥ = ∥r∥ * ∥v∥ * ∥r^{-1}∥. We have assumed r to have unit norm; its inverse will also have unit norm. Hence, the norm of v is maintained. To prove that (b, c, d) is the eigenvector of the rotational map, it is sufficient to show that the rotation of bi + cj + dk under the rotation map results in a scaled version of the vector:

R_r(bi + cj + dk) = (a + bi + cj + dk) ⊗ (bi + cj + dk) ⊗ (a - bi - cj - dk) / (a^2 + b^2 + c^2 + d^2) = bi + cj + dk

Hence we obtain the same vector under the scaling operation. Now we prove that the angle of rotation can be given in terms of a. We can consider a vector perpendicular to the axis vector bi + cj + dk. Let w = ci - bj be this vector, without loss of generality. Then its rotation is given by:

R_r(w) = (a + bi + cj + dk) ⊗ (ci - bj) ⊗ (a - bi - cj - dk)
       = ((ac + db)i + (-ab + dc)j - (b^2 + c^2)k) ⊗ (a - bi - cj - dk)
       = (c(a^2 - b^2 - c^2 - d^2) + 2adb)i + (-b(a^2 - b^2 - c^2 - d^2) + 2adc)j + (-2ab^2 - 2ac^2)k

The cosine of the angle between w and R_r(w) is given by

cos θ = w · R_r(w) / ∥w∥^2 = a^2 - b^2 - c^2 - d^2 = 2a^2 - 1,  i.e.,  θ = 2 arccos(a)

If b = c = 0, we take w = i to get the same result. Thus, the rotational map of a + bi + cj + dk encodes the axis and angle of rotation. □
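The following numerical sketch (ours, not the paper's code) checks the claims of Theorem 4.1: the map r v r^{-1} preserves norms and fixes the axis (b, c, d), where r is built from an angle θ and a unit axis so that its real part is cos(θ/2) = cos(2 arccos(a)/2).

```python
# Numeric check of Theorem 4.1 with numpy only.
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions given as (a, b, c, d) arrays."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def rotate(r, v3):
    """Apply R_r(v) = r ⊗ v ⊗ r^c to a 3D point embedded as a pure quaternion."""
    v = np.array([0.0, *v3])
    r_conj = r * np.array([1.0, -1.0, -1.0, -1.0])   # unit norm: conjugate = inverse
    return qmul(qmul(r, v), r_conj)[1:]

theta, axis = 0.8, np.array([1.0, 2.0, 2.0]) / 3.0   # rotation angle and unit axis
r = np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * axis])

v = np.array([0.3, -0.5, 0.7])
w = rotate(r, v)
print(np.isclose(np.linalg.norm(w), np.linalg.norm(v)))  # norm preserved: True
print(np.allclose(rotate(r, axis), axis))                # axis (b, c, d) is fixed: True
```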
4.2 Hopf Fibration
Two spaces are said to be homotopic if there exists a set of continuous deformations from one space to another [3]. For example, a solid circular disc is homotopic to a point, as it could be deformed along the radial lines to a point.

Definition 4.2 (Homotopy Lifting). Given a map π : E -> B between two topological spaces E and B, and a topological space X, we say that π has the homotopy lifting property with respect to X if, for a homotopy g : X × [0, 1] -> B, there exists a homotopy f : X × [0, 1] -> E such that π ∘ f = g.

Definition 4.3 (Fibration). A fibration is a map f : E -> B between two topological spaces E and B such that the homotopy lifting property is satisfied for all spaces.

First introduced in 1931, the Hopf Fibration is a mapping from S^3 to S^2 [14, 24]. For the quaternion r = a + bi + cj + dk, this map is given by

M(r) = (a^2 + b^2 - c^2 - d^2, 2(ad + bc), 2(bd - ac))   (1)

This is equivalent to applying the rotational map R_r to the point (1, 0, 0). The Hopf map M(r) assigns all r in S^3 to P in S^2, such that R_r would rotate (1, 0, 0) to P. We can also verify that multiplying e^{it} = cos(t) + i * sin(t) into this map does not change the rotation. The inverse Hopf map gives the pre-image of the point P by:

Lemma 4.1. For a point P = (p1, p2, p3), the inverse Hopf map is given by M^{-1}(P) = r' e^{it}, where 0 ≤ t ≤ 2π and r' can be given by one of the below equations:

r' = (1/sqrt(2(1 + p1))) ((1 + p1)i + p2 j + p3 k)   (2)

OR

r' = sqrt((1 + p1)/2) * (1 - p3 i/(1 + p1) + p2 k/(1 + p1))   (3)

Proof. We begin by noting that (from Theorem 4.1)

M(r' e^{it}) = (r' e^{it}) ⊗ i ⊗ (r' e^{it})^{-1}

Substituting for r' e^{it} from Equation 2 we get

r' ⊗ e^{it} = (1/sqrt(2(1 + p1))) ((1 + p1)i + p2 j + p3 k) ⊗ (cos t + i sin t)
            = (1/sqrt(2(1 + p1))) (-sin t (1 + p1) + cos t (1 + p1) i + (p2 cos t + p3 sin t) j + (p3 cos t - p2 sin t) k)

Similarly, we get the conjugate of r' ⊗ e^{it} as

(r' ⊗ e^{it})^c = (1/sqrt(2(1 + p1))) (-sin t (1 + p1) - cos t (1 + p1) i - (p2 cos t + p3 sin t) j - (p3 cos t - p2 sin t) k)

The Hopf map of the set of points on r' ⊗ e^{it} is the point in S^2 that is obtained by the rotation of (1, 0, 0) using any of these quaternions:

M(r' e^{it}) = (r' e^{it}) ⊗ i ⊗ (r' e^{it})^c = p1 i + p2 j + p3 k = P

therefore M^{-1}(P) = r' e^{it} (the same argument applies to Equation 3). □

A point on the 3D sphere is mapped to a circle by Equation 2. Furthermore, a set of points is mapped to a group of linked rings, and a circle in S^2 is mapped to a torus in S^3, as can be seen in Figure 2. The property of linked circles is stated in Lemma 4.2.

Lemma 4.2. Every pair of fibers formed by the Hopf map is linked to each other, forming a set of linked circles.

Figure 2: We show the Hopf fibers of points on S^2 visualized [19] using the stereo-graphic projection from S^3 to S^2. The first row from the top illustrates (left to right): a point, two points, discrete points, an arc, and a circle in 3D space. The second row (left to right) illustrates the projections of the corresponding fibers in 3D. The third row (left to right) shows the set of points on the grid circle of S^2 and its corresponding top view, side view, and isometric views, respectively.
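As a quick numerical check of Equation (1) and Lemma 4.1 (an illustrative sketch under our own naming), every point r' ⊗ e^{it} on the fiber of P should map back to P under the Hopf map M, for any value of the fiber parameter t:

```python
# Verify that the inverse Hopf map of Equation (2) is a right inverse of M.
import numpy as np

def hopf_map(r):
    """Equation (1): M(r) for a quaternion r = (a, b, c, d) on S^3."""
    a, b, c, d = r
    return np.array([a*a + b*b - c*c - d*d, 2*(a*d + b*c), 2*(b*d - a*c)])

def fiber_point(P, t):
    """Equation (2) times e^{it}: r' ⊗ (cos t + i sin t), written in closed form."""
    p1, p2, p3 = P
    x, y, z = np.array([1.0 + p1, p2, p3]) / np.sqrt(2 * (1 + p1))
    c, s = np.cos(t), np.sin(t)
    return np.array([-x*s, x*c, y*c + z*s, z*c - y*s])

P = np.array([0.36, 0.48, 0.80])                        # a point on S^2
for t in (0.0, 1.0, 2.5):
    print(np.allclose(hopf_map(fiber_point(P, t)), P))  # True for every t
```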

4.3 Optimal Transport
The aim of optimal transport [27] is to minimize the cost of transport from one (probability) measure to another. The well-known Wasserstein Distance [40] defines the optimal transport plan to move an amount of matter from one location to another. We rely on the principle of optimal transport considering we aim to map entity representations from one plane to another hyperplane.

Definition 4.4. Let p ∈ [1, ∞) and c : R^n × R^n -> [0, ∞) be the cost of transporting the measure μ to ν; then the p-th Wasserstein distance between the measures is given by

W_p(μ, ν) = inf_{γ ∈ Π(μ,ν)} ( ∫_{R^n × R^n} c(x, y)^p ∂γ )^{1/p}   (4)

where Π is the set of transport plans between the measures μ and ν.

Theorem 4.5. The minimum of the euclidean distance between the points of the Hopf fibers of two points in the δ vicinity in S^2 is bounded from above by O(δ).

Proof. Consider a point P = (p1, p2, p3) in S^2 and another point P' = (p1', p2', p3') such that ∥P - P'∥ = δ. From Lemma 4.1 we know that the fiber of P is given by (1/sqrt(2(1 + p1))) ((1 + p1)i + p2 j + p3 k) ⊗ e^{it} and the fiber of P' is given by (1/sqrt(2(1 + p1'))) ((1 + p1')i + p2' j + p3' k) ⊗ e^{it'}, where t and t' are the respective parameters. Writing k = 1/sqrt(2(1 + p1)) and k' = 1/sqrt(2(1 + p1')) for the normalizing factors, the square of the euclidean distance D between any two points of the fibers is given by the l2 norm as below:

D = (-k sin(t)(1 + p1) + k' sin(t')(1 + p1'))^2
  + (k cos(t)(1 + p1) - k' cos(t')(1 + p1'))^2
  + (k (cos(t)p2 + sin(t)p3) - k' (cos(t')p2' + sin(t')p3'))^2
  + (k (cos(t)p3 - sin(t)p2) - k' (cos(t')p3' - sin(t')p2'))^2

We then proceed by taking the derivatives of D with respect to t and t' and equating them to 0 for the optimal solution:

∂D/∂t = 0,  ∂D/∂t' = 0

Solving for t and t' at δ ≈ 0 gives

t = δ p1 (p3 - p2) / ((1 + p1)(p3^2 + p2^2)),  t' = δ (p3 - p2) / ((1 + p1)(p3^2 + p2^2))

Substituting these values in D, we get D ≤ O(δ^2). □

From Theorem 4.5 we can deduce that the minimum distance between the fibers is a monotone increasing function of the distance between their corresponding points. Therefore:

Corollary 4.1. For any set of entities, if there is a configuration satisfying the relations between them in S^2, then there will also be a configuration on the fibers that satisfies the concerning relations.

Hence, from Corollary 4.1, the triples expressed in the lower dimension are the same as in the higher dimension after the mapping. Now we want to show that the higher dimensional space can represent more triples. We state the corresponding Lemma:

Lemma 4.3. The triple set represented by rotations in S^2 is not equal to the set of triples that can be represented in S^3 after the fibration.

Proof. Let τ2 be the set of triples that can be represented in S^2 and τ3 be the set of triples that can be expressed in S^3 after the fibration (mapping). We have to show that τ3 ⊈ τ2. For this, it is enough to show that even a single configuration of triples in τ3 is not in τ2. Consider three entities e1, e2 and e3 with a relation r1 existing between them such that (e1, r1, e2), (e2, r1, e3), (e1, r1, e3) ∈ τ+ and (e3, r1, e2), (e2, r1, e1), (e3, r1, e1) ∈ τ-, where τ+ and τ- are the sets of positive and negative triples respectively. Let E1, E2, E3 ∈ R^3 be the representations of these entities in S^2 and let R1 ∈ H (H is the Hamiltonian space) be the relation quaternion. Thus we have:

R1 ⊗ E1 ⊗ R1^{-1} = E2;  R1 ⊗ E2 ⊗ R1^{-1} = E3;  R1 ⊗ E1 ⊗ R1^{-1} = E3

This implies that E1 = E2 = E3 and the axis of rotation is along these vectors. However, then we would also have:

R1 ⊗ E2 ⊗ R1^{-1} = E1;  R1 ⊗ E3 ⊗ R1^{-1} = E2;  R1 ⊗ E3 ⊗ R1^{-1} = E1

which is a contradiction, so the relational operator in S^2 cannot represent the triples' set. In S^3, from the above analysis, we have each of these entities' fibers coinciding with each other. We could now select A1^1, A1^2 ∈ C, A2^1, A2^2 ∈ C and A3^1 ∈ C as the attributes for e1, e2 and e3 respectively. We define R^c (∈ C) = e^{iθ} to be the rotational operator in the complex plane C for the relation. Setting A1^1 = e^{iθ1}, A1^2 = A2^1 = e^{iθ2} and A2^2 = A3^1 = e^{iθ3}, where θ2 - θ1 = θ3 - θ2 = θ, we can see that the set of triples (both positive and negative) can be represented by the mapping in S^3. □

Theorem 4.6. The set of triples that can be represented by rotations in S^2 is a strict subset of the set of triples that can be represented in S^3 after the fibration.

Proof. Let τ2 be the set of triples that can be represented in S^2 and τ3 be the set of triples that can be expressed in S^3 after the fibration (mapping). We have to show that τ2 ⊂ τ3. From Corollary 4.1 we have τ2 ⊆ τ3, and from Lemma 4.3 we have τ3 ⊈ τ2. Thus, we can deduce that τ2 ⊂ τ3, and hence the proof. □

5 OUR APPROACH
5.1 Modeling rotations using Quaternions
As noted in Theorem 4.1, the unit quaternions are isomorphic to the SU(2) group, which in turn is homomorphic to the SO(3) rotation group (also observed by NagE [48]). This allows us to use quaternions for rotations in the 3D Euclidean space as below:

w' = q ⊗ w ⊗ q^c   (5)

where q is the quaternion, q^c is the inverse (conjugate) of q, w is the vector in 3D space, and w' is the rotated vector. In other words, q is the dimension-wise relation quaternion, w is the dimension-wise entity representation in 3D space, and w' is the rotated entity representation in the same space. If q = a + bi + cj + dk, then q^c = (a - bi - cj - dk) / sqrt(a^2 + b^2 + c^2 + d^2). Reusing SO(3) rotations from NagE in this step helps us inherit its geometric interpretability for the completion of our approach.
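A small sketch of Equation (5) applied dimension-wise. The shapes (k independent 3D components per entity and one relation quaternion per dimension, normalized before rotating) are our reading of the setup, not the released implementation:

```python
# Dimension-wise SO(3) rotation of an entity by its relation quaternions.
import numpy as np

def quat_rotate(q, v):
    """w' = q ⊗ w ⊗ q^c for a unit quaternion q = (a, u) and a 3D vector v."""
    a, u = q[0], q[1:]
    return v + 2 * a * np.cross(u, v) + 2 * np.cross(u, np.cross(u, v))

def relational_rotation(entity, relation):
    """entity: (k, 3); relation: (k, 4) unnormalized quaternions -> (k, 3)."""
    q = relation / np.linalg.norm(relation, axis=-1, keepdims=True)   # q^c = inverse
    return np.stack([quat_rotate(q[d], entity[d]) for d in range(entity.shape[0])])

rng = np.random.default_rng(0)
e_h, w_r = rng.normal(size=(5, 3)), rng.normal(size=(5, 4))
rotated = relational_rotation(e_h, w_r)
print(np.allclose(np.linalg.norm(rotated, axis=-1),
                  np.linalg.norm(e_h, axis=-1)))   # rotations preserve norms: True
```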
5.2 Homotopy Lifting using the Inverse Hopf Map
In this step, we aim to achieve expressiveness in the higher dimensional space, i.e., the 4D hypersphere, as our main contribution. The inverse map with the equations described in Lemma 4.1 lifts the point (the rotated entity representation from Equation 5) in S^2 to a set of points (i.e., a fiber) lying in a higher-dimensional space. Once both entities are mapped as individual fibers in the 4D hypersphere, it is possible to ground connectivity patterns in rotations as done by NagE and RotatE in the lower dimensional spaces. Each point on the fiber has the same geometric property for a given entity. We parametrize the entities to fix the location of these sets of points on the fiber. Hence, we now have a bunch of points on the fiber for each entity, which could represent the entity's semantic attributes. Formally, we define a set of parameters t for each entity corresponding to its semantic attributes A^e (cf., Section 3). These parameters could be used to find the set of points on the fiber as below:

e_i = r' ⊗ (cos(t) + sin(t)i)   (6)

where e_i is the entity quaternion per dimension and r' is obtained from either Equation 2 or 3. From Corollary 4.1, we obtain that Equation 6, which takes the representation to a higher dimension, does not cause a loss in expressivity.

Claim 5.1. Translation methods, in the euclidean space, require at least n - 1 dimensions to model an n-clique relational pattern between n entities.

A direct consequence of Claim 5.1 is that, to model dense relations, we would need a dimension proportionate to the number of entities. We propose an increase in the model expressivity by increasing the entity attribute size while keeping the dimensions the same. The entity attribute size (i.e., the number of heads) is |A^e|. While increasing the number of heads will make the model more expressive, it may also increase the training latency. In the ablation results, we study the relation between model expressivity and the attribute heads for a fixed time interval (cf., Section 7.1).

5.3 Inducing Semantics
Once the point is mapped to the fiber using the inverse Hopf Fibration, we propose to learn a set of points that would inform about the various semantics/attributes of the entity. One possible way for the operation would be to have the points spread out on the circle (fiber) based on the semantic attributes of the entity. Taking inspiration from [11, 53], we use the textual literals widely available for an entity in the standard KGs, namely: entity description, Instance of (type of entity, such as human), Alias (alternative names), and Surface forms (label of the entity). Specifically, we use the CBOW model [26] to aggregate the embeddings of the tokens in each semantic attribute. These embeddings could be initialized from a standard model such as GloVe [31]. The parameter t in Equation 6 is learned from the semantics. We also tried other methods, such as initializing the parameters with the semantic embeddings and using an MLP to learn the final representation after concatenating the structural and semantic embeddings, which did not perform better.
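The sketch below illustrates Equation (6) and Section 5.3 under assumed shapes: an entity's attribute heads are spread along its Hopf fiber through learned phases t, e.g., one phase per CBOW-aggregated literal. The function name and initialization are ours:

```python
# Spread |A^e| attribute heads along the fiber of a rotated entity point P.
import numpy as np

def attribute_heads(P, t):
    """Points r' ⊗ (cos t + i sin t) on the fiber of P, one per entry of t."""
    p1, p2, p3 = P
    x, y, z = np.array([1.0 + p1, p2, p3]) / np.sqrt(2 * (1 + p1))
    c, s = np.cos(t), np.sin(t)
    return np.stack([-x*s, x*c, y*c + z*s, z*c - y*s], axis=-1)

P = np.array([0.36, 0.48, 0.80])     # rotated entity point on S^2
t = np.array([0.1, 1.3, 2.7, 4.0])   # learned phases (e.g., from label, alias,
                                     # description, and type literals)
print(attribute_heads(P, t).shape)   # (4, 4): four 4D points on one fiber
```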
5.4 Score function and Optimization Objective
In the previous sections, we learned that lifting the dimensions of an entity takes us to the fiber (circle) in S^3. For two entities, we would have two linked circles enriched with entity semantics. Now, we also need to find the distance between the attributes of two entities lying on the linked circles. This could be done, for example, by considering the corresponding attributes of the two entities or by taking a pairwise distance between all combinations of the attributes. That would be computationally expensive. Hence, we propose another variant where we find an optimal matching between the attributes and then take the pairwise distance between these matched attributes. Table 1 provides a holistic overview of KGE details. To find an optimal matching between the sets of entity attributes, we use the Wasserstein Distance of Equation 4; in practice, we have used the Sinkhorn iterations for faster computation of the Wasserstein distances [7]. Let Π be the optimal matching between the attributes h_i ∈ E_h and t_j ∈ E_t of an entity pair h and t respectively, where E_h and E_t are the multisets corresponding to e_h and e_t in higher dimensions. The score function is:

f_r(h, r_ht, t) = Σ_{[i,j] ∈ Π} (1/2) ( ∥M(O(h_i, r_ht)) - M(t_j)∥ + ∥M(O(t_j, r_ht)) - M(h_i)∥ )   (7)

where h_i, t_j are the attributes (semantic properties) of the head and tail entities respectively, and r_ht is the relation. O represents the rotation operation on each dimension of the entity, i.e., O(e_d, r_d) = r_d ⊗ e_d ⊗ r̄_d, where r_d ∈ H and e_d ∈ R^3. Similarly, M represents the inverse Hopf map to the fiber in S^3, per dimension. In practice, for larger attribute sizes, computing the Wasserstein distance between two sets of points could be computationally expensive. Therefore, we define a variant that considers the minimum distance between the two sets' attributes. The score function transforms to:

f_r(h, r_ht, t) = inf_{h_i ∈ E_h, t_j ∈ E_t} (1/2) ( ∥M(O(h_i, r_ht)) - M(t_j)∥ + ∥M(O(t_j, r_ht)) - M(h_i)∥ )   (8)

The loss function aims to minimize the score function for the positive triples and maximize it for the negative triples. More specifically, we use a self-adversarial negative sampling loss similar to RotatE [35], as below:

L = - Σ_{(h, r_ht, t) ∈ τ} log σ(γ - f_r(h, r_ht, t)) - Σ_{(h-, r_ht, t-) ∈ τ-} p(h-, r_ht, t-) log σ(-(γ - f_r(h-, r_ht, t-)))

where τ is the set of positive triples, τ- is the set of negative triples, γ is a margin hyperparameter, and p is the probability given by the model to the negative sample.
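To illustrate the matching step, the hedged sketch below computes an entropic transport plan with plain Sinkhorn iterations [7], then evaluates both a matched cost in the spirit of Equation (7) and the cheaper minimum-distance variant of Equation (8). M and O are assumed to have been applied already, and all names and shapes are ours, not the released code:

```python
import numpy as np

def sinkhorn_plan(C, reg=0.5, iters=100):
    """Entropic OT plan between uniform histograms, via Sinkhorn iterations."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]            # transport plan Π

def score(head_pts, tail_pts):
    """head_pts, tail_pts: (n, 4) and (m, 4) fiber points, i.e., M(O(h_i, r))
    and M(t_j) already applied; returns the matched and the min-variant cost."""
    C = np.linalg.norm(head_pts[:, None, :] - tail_pts[None, :, :], axis=-1)
    plan = sinkhorn_plan(C)
    return (plan * C).sum(), C.min()              # Eq. (7)-style, Eq. (8)-style

rng = np.random.default_rng(0)
print(score(rng.normal(size=(4, 4)), rng.normal(size=(4, 4))))
```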

| Model | Scoring function | Parameters | Time |
|---|---|---|---|
| TransE [5] | ∥E_h + W_r - E_t∥ | E_h, W_r, E_t ∈ R^k | O(k) |
| DistMult [47] | ⟨W_r, E_h, E_t⟩ | E_h, W_r, E_t ∈ R^k | O(k) |
| ComplEx [38] | Re(⟨W_r, E_h, E_t⟩) | E_h, W_r, E_t ∈ C^k | O(k) |
| RotatE [35] | ∥E_h ⊙ W_r - E_t∥ | E_h, W_r, E_t ∈ C^k, ∥W_r∥ = 1 | O(k) |
| QuatE [50] | E_h ⊗ W_r · E_t | E_h, W_r, E_t ∈ H^k | O(k) |
| HopfE (ours) | ∥M(W_r ⊗ E_h ⊗ W̄_r, A^h) - M(E_t, A^t)∥ | E_h, E_t ∈ (R^3)^k, W_r ∈ H^k, A^h, A^t ∈ C^k | O(k) |

Table 1: KGE Overview. W_r is the relation representation, E_h, E_t are the entity representations, M is the inverse Hopf map.

6 EXPERIMENT SETUP
Datasets: We executed experiments on the widely used benchmark datasets WN18, WN18RR, FB15k237 and YAGO3-10 [35, 37]. The WN18 dataset is extracted from WordNet, which contains lexical relations between English words. WN18RR removes the inverse relations in the WN18 dataset. FB15k237 was created from the Freebase knowledge base and does not contain inverse relations. The YAGO3-10 dataset is obtained from Wikipedia, Wikidata, and Geonames. Table 3 summarizes detailed statistics of all datasets.

| Dataset | #Entities | #Relations | % N to N | % N to 1 | % 1 to 1 | % 1 to N |
|---|---|---|---|---|---|---|
| WN18 | 40943 | 36 | 0.00 | 59.65 | 40.22 | 0.12 |
| WN18RR | 40943 | 22 | 0.03 | 74.59 | 25.33 | 0.04 |
| FB15K237 | 14541 | 474 | 4.93 | 91.26 | 3.63 | 0.17 |
| YAGO3-10 | 123182 | 74 | 57.12 | 41.10 | 1.65 | 0.12 |

Table 3: Dataset statistics including the fraction of relation types.

Evaluation Metrics: Following the widely adopted metrics, we report the Mean Rank (MR), Mean Reciprocal Rank (MRR) and the cut-off hit ratio (Hits@N, N=1,3,10) metrics on all the datasets. MR reports the average rank of all the correct entities. MRR is the average of the inverse rank of the correct entities. Hits@N evaluates the ratio of correct entity predictions among the top N predicted results.

Implementation: The Adam optimizer [17] is used for learning, with a learning rate of 0.1 and an exponentially decaying learning schedule with a decay rate of 0.1. The parameters are initialized using the He initialization [13]. We perform a hyperparameter search on the embedding dimension d ∈ {50, 100, 200, 500}, batch size b ∈ {256, 512, 1024}, negative samples n_neg ∈ {16, 32, 62, 128, 256, 512}, negative adversarial sampling temperature α ∈ {0.5, 1.0} and the margin parameter γ ∈ {6, 9, 12, 24}. We release the code in a public GitHub repository: https://github.com/ansonb/HopfE. We consider two variants of our approach: 1) HopfE, which does not use any semantic properties, and 2) HopfE+Semantics, which has entity attributes encoded on the fibers (cf., Section 5.3).

Baselines: We compare with the state-of-the-art link prediction methods mainly employing translational/semantic matching techniques, since our approach is built upon them. Specifically, we compare with TransE [5], DistMult [47], ComplEx [38], RotatE [35], NagE [48], and QuatE [50]. We extend 3D rotations similar to NagE; however, our hyperparameters are restricted due to hardware limitations. We inherit results and experiment settings from [48, 50] for WN18, WN18RR, and FB15K237. We ignore the alternative settings and hyperparameters provided by [33, 36]. On the YAGO3-10 dataset, results are taken from RotatE. To keep the experiment settings the same, the configuration of QuatE with N3 regularization and reciprocal learning has been omitted from our comparison, considering these are optimizations extraneous to the model and we do not use them.
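For reference, the hyperparameter search space above can be written as a small grid configuration; the training driver is hypothetical and only sketched in a comment:

```python
# The Section 6 search space as a config sketch (values from the paper).
from itertools import product

SEARCH_SPACE = {
    "dim":        [50, 100, 200, 500],
    "batch_size": [256, 512, 1024],
    "n_neg":      [16, 32, 62, 128, 256, 512],
    "alpha":      [0.5, 1.0],       # adversarial sampling temperature
    "gamma":      [6, 9, 12, 24],   # margin
}

for values in product(*SEARCH_SPACE.values()):
    config = dict(zip(SEARCH_SPACE.keys(), values))
    # train_and_validate(config)   # hypothetical driver; Adam, lr=0.1, decay 0.1
    break
print(config)
```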
7 RESULTS
Our evaluation results are shown in Table 2 and Table 4. On the WN18 dataset, the majority relation patterns comprise symmetry/antisymmetry and inversion. Our model, which combines both properties (geometric interpretability and 4D expressiveness), achieves comparable results to the baselines. On the WN18RR dataset, the main relation pattern is the symmetry pattern, and the translation method (TransE) shows limited performance because of its inability to infer symmetric patterns. QuatE reports slightly higher results than NagE, and our model achieves the best results on all metrics (except Hits@1). Similar to WN18RR, the FB15K237 dataset does not contain inverse relations, and the majority relation pattern is composition. We see that NagE performs better than QuatE on the FB15K237 dataset due to its ability to infer non-abelian hyper relations. Except for MR, our model achieves better results than the previous baselines, illustrating the ability of HopfE in handling composition relations. On the YAGO3-10 dataset, triples mostly contain a human's descriptive attributes, such as marital status, profession, citizenship, and gender. Our model (HopfE) clearly outperforms the previous baseline (RotatE) on all metrics.

| Model | MR | MRR | Hits@1 | Hits@3 | Hits@10 | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [5] | - | 0.495 | 0.113 | 0.888 | 0.943 | 3384 | 0.226 | - | - | 0.501 |
| DistMult [47] | 655 | 0.797 | - | - | 0.946 | 5110 | 0.43 | 0.39 | 0.44 | 0.49 |
| ComplEx [38] | - | 0.941 | 0.936 | 0.945 | 0.947 | 5261 | 0.44 | 0.41 | 0.46 | 0.51 |
| RotatE [35] | 309 | 0.949 | 0.944 | 0.952 | 0.959 | 3340 | 0.476 | 0.428 | 0.492 | 0.571 |
| NagE [48] | - | 0.950 | 0.944 | 0.953 | 0.960 | - | 0.477 | 0.432 | 0.493 | 0.574 |
| QuatE [50] | 388 | 0.949 | 0.941 | 0.954 | 0.960 | 3472 | 0.481 | 0.436 | 0.500 | 0.564 |
| HopfE | 245 | 0.949 | 0.938 | 0.954 | 0.960 | 2885 | 0.472 | 0.413 | 0.500 | 0.586 |
| HopfE + Semantics | 99 | 0.945 | 0.934 | 0.951 | 0.961 | 2884 | 0.482 | 0.433 | 0.500 | 0.579 |

Table 2: Evaluation metrics on the WN18 (left block) and WN18RR (right block) datasets. Best results are in bold and second best is underlined.
Across all datasets, we observe that adding semantics does not necessarily result in a gain in performance, and the behavior varies per dataset.

| Model | MR | MRR | Hits@1 | Hits@3 | Hits@10 | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE [5] | 357 | 0.294 | - | - | 0.465 | - | - | - | - | - |
| DistMult [47] | 254 | 0.241 | 0.155 | 0.263 | 0.419 | 5926 | 0.340 | 0.240 | 0.380 | 0.540 |
| ComplEx [38] | 339 | 0.247 | 0.158 | 0.275 | 0.428 | 6351 | 0.360 | 0.260 | 0.400 | 0.550 |
| RotatE [35] | 177 | 0.338 | 0.241 | 0.375 | 0.533 | 1767 | 0.495 | 0.402 | 0.550 | 0.670 |
| NagE [48] | - | 0.340 | 0.244 | 0.378 | 0.530 | - | - | - | - | - |
| QuatE [50] | 176 | 0.311 | 0.221 | 0.342 | 0.495 | - | - | - | - | - |
| HopfE | 212 | 0.343 | 0.247 | 0.379 | 0.534 | 1077 | 0.529 | 0.438 | 0.586 | 0.695 |
| HopfE + Semantics | 199 | 0.330 | 0.230 | 0.373 | 0.528 | 1260 | 0.518 | 0.429 | 0.570 | 0.678 |

Table 4: Evaluation metrics on the FB15K237 (left block) and YAGO3-10 (right block) datasets. Best results are in bold and second best is underlined.

7.1 Ablation Studies
Effect of Adding Semantics: To understand the impact of adding textual literals, we conduct an ablation on the attribute information induced in the model. We add each entity attribute to study its individual impact. The results for the YAGO3-10 dataset are reported in Table 5 because this dataset provides both textual and numeric literals. We observe that different semantic information corresponds to variations in the results. Also, we note that adding some semantic attributes causes a drop in performance, similar to the decline in performance in Table 4 when semantics are collectively added. Furthermore, the encoding of numeric literals is a limitation of our approach. This is possibly due to the implementation choice we made for encoding literals using character and word embeddings. Hence, there are two open questions for future research: 1) how a model can choose semantic information dynamically, and 2) how to incorporate numeric literals in this setting for a higher performance gain, as observed in alternative approaches such as [10, 18].

| Model | MR | MRR | Hits@1 | Hits@10 |
|---|---|---|---|---|
| HopfE + Label | 1260 | 0.518 | 0.429 | 0.678 |
| HopfE + Alias | 1281 | 0.525 | 0.44 | 0.684 |
| HopfE + Instance | 1215 | 0.518 | 0.428 | 0.68 |
| HopfE + Desc | 1264 | 0.521 | 0.436 | 0.677 |
| HopfE + Numeric | 2261 | 0.441 | 0.357 | 0.598 |

Table 5: Ablation study of inducing individual literals on the YAGO3-10 dataset. Best results are in bold.

Effect of Hopf Fibration: Our rationale here is to understand the empirical advantage of the inverse Hopf Fibration. Hence, we created a variant of HopfE named "HopfE(w/o Hopf)" that only contains the rotation in the 3D sphere (Section 5.1) and does not include the mapping from 3D to 4D using the Hopf Fibration, compromising higher dimensional expressiveness. For the same, we report results on individual relational patterns. The results are in Table 7. We observe that the performance of HopfE is higher compared to the "HopfE(w/o Hopf)" setting for three types of relations. On WN18, the fraction of 1-to-N links is 0.04 percent. Hence, HopfE faces challenges in learning the corresponding mapping into higher dimensions. Overall, our experiment justifies that increasing dimensional expressivity does not compromise the overall average empirical results.
| Model | N to N MRR | N to N Hits@1 | N to 1 MRR | N to 1 Hits@1 | 1 to N MRR | 1 to N Hits@1 | 1 to 1 MRR | 1 to 1 Hits@1 |
|---|---|---|---|---|---|---|---|---|
| HopfE(w/o Hopf) | 0.917 | 0.889 | 0.187 | 0.112 | 0.259 | 0.175 | 0.865 | 0.762 |
| HopfE | 0.939 | 0.931 | 0.195 | 0.123 | 0.206 | 0.136 | 0.951 | 0.929 |
| HopfE + Semantics | 0.941 | 0.932 | 0.193 | 0.124 | 0.253 | 0.167 | 0.929 | 0.881 |

Table 7: Comparison of HopfE with a setting (HopfE(w/o Hopf)) where we omit the Hopf mapping, on the WN18 dataset. Inducing the Hopf Fibration for dimensional expressivity shows a clear empirical edge on the majority relation types. Best values in bold.

Geometric interpretation: Having established the importance of the Hopf Fibration in the previous experiment, we now study whether the geometric interpretability of rotations in S^2 is maintained after training with the fibrations on the WN18 dataset. We analyze the inverse relations has_part and part_of and the compositional relations derivationally_related_form (r1 in Figure 3), hypernym (r2 in Figure 3), and synset_domain_topic_of (r3 in Figure 3). As illustrated in Figure 3, the histogram of the sum of the rotation angles of the inverse relations is concentrated at 0 radians, and for the compositional relations the difference between the angles is concentrated around 0 radians, as expected, maintaining geometric interpretability.

Figure 3: Analysis of the geometric properties of the inverse and compositional relations. Panels: (a) φ_has_part + φ_part_of, (b) ∥r_has_part∥ * ∥r_has_part∥, (c) φ(r1 ⊗ r2) - φ(r1), (d) ∥r_part_of∥, (e) ∥r_has_part∥, (f) φ(r1 ⊗ r2) - φ(r3).

Relation-wise performance: The previous ablation study established that the geometric interpretability remains intact for HopfE. Now, we study expressiveness in a higher dimension (in addition to Table 2 and Table 4). We perform a relation-wise performance analysis on the WN18RR dataset to understand whether HopfE can maintain expressiveness compared to rotational methods that operate in lower dimensions. We borrow the experimental settings from RotatE [35]. As seen in Table 6, in comparison to the RotatE baseline (considering NagE skips this analysis), the HopfE model can perform better on these relations. The observed behavior implies that when we aim for geometric interpretation using the Hopf Fibration, the performance does not decrease for any relation.

| Relation | RotatE | HopfE |
|---|---|---|
| derivationally_related_form | 0.947 | 0.956 |
| also_see | 0.585 | 0.626 |
| verb_group | 0.943 | 0.943 |
| similar_to | 1 | 1 |
| hypernym | 0.148 | 0.161 |
| instance_hypernym | 0.318 | 0.347 |
| member_meronym | 0.232 | 0.257 |
| synset_domain_topic_of | 0.341 | 0.405 |
| has_part | 0.184 | 0.199 |
| member_of_domain_usage | 0.318 | 0.34 |
| member_of_domain_region | 0.2 | 0.392 |

Table 6: MRR of individual relations of the WN18RR dataset.

Studying the effect of the number of heads: In a real-world scenario, all KG entities will not have the same number of properties, and a few of them may require a large number of attributes to model the KG ontology. In this experiment, we aim to understand how adding more entity attributes for modeling dense relations impacts performance (cf., Claim 5.1). Hence, we study the effect of increasing the number of heads on the prediction performance using a synthetic dataset, because such a dataset allows us to vary the number of attributes of the entities. Specifically, we generate an Erdős-Rényi graph [8] with an average degree varying from 10 to 100 in increments of 10. We use 1, 5, and 10 heads per entity per fiber (dimension). Also, note that since these are random graphs, we evaluate the performance on the train set, as the performance on the test set is expected to be low in this setting. The dimensions (=10) and other common parameters are kept the same in all the models. The models are not all trained to convergence but for a fixed number of iterations. Figure 4a illustrates that increasing the degree of the nodes in the data in general results in a drop in the results, which is as expected. Another observation is that the setting with five heads can perform better than the one with ten heads at higher degrees. The observation possibly indicates that the latter model may require more training to fit better. A similar observation can be gleaned from Figure 4b, where we study the effect of increasing the number of heads from 1 to 10 for a graph with an average degree per node of 100. We see that the metric improves from 1 to 5 heads and then begins to drop. We conclude that the number of heads, along with the training steps, is a hyperparameter to be tuned for better results.

Figure 4: Effect of attribute heads on the performance (panels (a) and (b)).
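A sketch of the synthetic setup described above, under our assumptions: networkx for the Erdős-Rényi generation and a single relation type (the paper does not publish this generator):

```python
# Generate Erdős–Rényi KGs with a controlled average degree.
import networkx as nx

def synthetic_kg(n_entities=1000, avg_degree=10, seed=0):
    p = avg_degree / (n_entities - 1)        # E[degree] = p * (n - 1)
    g = nx.gnp_random_graph(n_entities, p, seed=seed)
    return [(u, "r0", v) for u, v in g.edges()]

for avg_degree in range(10, 101, 10):        # degrees 10..100 in steps of 10
    triples = synthetic_kg(avg_degree=avg_degree)
    print(avg_degree, len(triples))          # ≈ n * avg_degree / 2 triples
```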

8 CONCLUSION
Our central research question was to study whether a KGE approach can achieve geometric interpretability along with the expressiveness of four-dimensional space. The proposed theoretical foundations and the associated empirical results of HopfE show that it is possible to maintain geometric interpretability and simultaneously increase the model's expressiveness in four-dimensional space. Furthermore, HopfE enables the encoding of entity attributes into a fibration in a higher dimension, enriching the embeddings with the semantic information prevalent in real-world KGs. Our ablation studies systematically support several choices we made in the paper's scope, such as the use of the Hopf Fibration, the effect of the number of heads, and the individual impact of entity semantic properties. Besides providing empirical evidence complementing the theoretical foundations (Sections 4 and 5), our results open up important questions for future research: 1) In four-dimensional space, how can the semantic attributes (an essential aspect of KGE [30]) be utilized more optimally to positively impact the link prediction performance and induce semantic class hierarchies similar to [51]? 2) How can logic on the attributes (or description logic) be utilized for fine-grained reasoning and logical interpretability of a KGE model in 4D? We point readers to these open research directions as viable next steps.

REFERENCES
[1] Carl Allen, Ivana Balazevic, and Timothy Hospedales. 2021. Interpreting knowledge graph relation representation from word embeddings. In International Conference on Learning Representations.
[2] Anson Bastos, Abhishek Nadgeri, Kuldeep Singh, Isaiah Onando Mulang, Saeedeh Shekarpour, Johannes Hoffart, and Manohar Kaul. 2021. RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network. In Proceedings of The Web Conference (WWW).
[3] Hans J. Baues and Hans Joachim Baues. 1989. Algebraic homotopy. Vol. 15. Cambridge University Press.
[4] Kurt D. Bollacker, Robert P. Cook, and Patrick Tufts. 2007. Freebase: A Shared Database of Structured General Human Knowledge. In AAAI.
[5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NeurIPS. 1-9.
[6] Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. 2020. Low-Dimensional Hyperbolic Knowledge Graph Embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 6901-6914.
[7] Marco Cuturi. 2013. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems 26 (2013), 2292-2300.
[8] Paul Erdös and Alfréd Rényi. 2011. On the evolution of random graphs. In The Structure and Dynamics of Networks. Princeton University Press, 38-82.
[9] Chang Gao, Chengjie Sun, Lili Shan, Lei Lin, and Mingjiang Wang. 2020. Rotate3D: Representing Relations as Rotations in Three-Dimensional Space for Knowledge Graph Embedding. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 385-394.
[10] Alberto Garcia-Duran and Mathias Niepert. 2017. KBLRN: End-to-end learning of knowledge base representations with latent, relational, and numerical features. arXiv preprint arXiv:1709.04676 (2017).
[11] Genet Asefa Gesese, Russa Biswas, Mehwish Alam, and Harald Sack. 2019. A survey on knowledge graph embeddings with literals: Which model links better literal-ly? Semantic Web Preprint (2019), 1-31.
[12] William Rowan Hamilton. 1844. LXXVIII. On quaternions; or on a new system of imaginaries in Algebra: To the editors of the Philosophical Magazine and Journal. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 25, 169 (1844), 489-495.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026-1034.
[14] H. Hopf. 1931. About the mapping of the three-dimensional sphere onto the spherical surface. Math. Ann. 104 (1931), 637-665.
[15] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.
[16] Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and Philip S. Yu. 2021. A survey on knowledge graphs: Representation, acquisition and applications. IEEE Transactions on Neural Networks and Learning Systems (2021).
[17] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR.
[18] Agustinus Kristiadi, Mohammad Asif Khan, Denis Lukovnikov, Jens Lehmann, and Asja Fischer. 2019. Incorporating literals into knowledge graph embeddings. In International Semantic Web Conference. Springer, 347-363.
[19] Samuel J. Li. n.d. Hopf Fibration Visualization. https://samuelj.li/hopf-fibration/
[20] Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2016. Knowledge representation learning with entities, attributes and relations. ethnicity 1 (2016), 41-52.
[21] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
[22] Haonan Lu and Hailin Hu. 2020. DensE: An Enhanced Non-Abelian Group Representation for Knowledge Graph Embedding. arXiv preprint arXiv:2008.04548 (2020).
[23] Xiaolu Lu, Soumajit Pramanik, Rishiraj Saha Roy, Abdalghani Abujabal, Yafang Wang, and Gerhard Weikum. 2019. Answering complex questions by joining multi-document evidence with quasi knowledge graphs. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 105-114.
[24] David W. Lyons. 2003. An elementary introduction to the Hopf fibration. Mathematics Magazine 76, 2 (2003), 87-98.
[25] Jerrold E. Marsden and Tudor S. Ratiu. 1995. Introduction to mechanics and symmetry. Physics Today 48, 12 (1995), 65.
[26] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013).
[27] Gaspard Monge. 1781. Memoir on the theory of cuttings and embankments. Histoire de l'Académie Royale des Sciences de Paris (1781).
[28] Rémy Mosseri and Rossen Dandoloff. 2001. Geometry of entangled states, Bloch spheres and Hopf fibrations. Journal of Physics A: Mathematical and General 34, 47 (2001), 10243.
[29] Matteo Palmonari and Pasquale Minervini. 2020. Knowledge graph embeddings and explainable AI. Knowledge Graphs for Explainable Artificial Intelligence: Foundations, Applications and Challenges 47 (2020), 49.
[30] Heiko Paulheim. 2018. Make embeddings semantic again!. In International Semantic Web Conference (P&D/Industry/BlueSky).
[31] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 1532-1543.
[32] Charles Chapman Pugh and C. C. Pugh. 2002. Real mathematical analysis. Vol. 2011. Springer.
[33] Daniel Ruffinelli, Samuel Broscheit, and Rainer Gemulla. 2019. You can teach an old dog new tricks! On training knowledge graph embeddings. In International Conference on Learning Representations.
[34] Ahmad Sakor, Kuldeep Singh, Anery Patel, and Maria-Esther Vidal. 2020. Falcon 2.0: An entity and relation linking tool over Wikidata. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3141-3148.
[35] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2018. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In International Conference on Learning Representations.
[36] Zhiqing Sun, Shikhar Vashishth, Soumya Sanyal, Partha Talukdar, and Yiming Yang. 2020. A Re-evaluation of Knowledge Graph Completion Methods. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 5516-5522.
[37] Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality. 57-66.
[38] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. PMLR, 2071-2080.
[39] Leandra Vicci. 2001. Quaternions and rotations in 3-space: The algebra and its geometric interpretation. Chapel Hill, NC, USA, Tech. Rep. (2001).
[40] Cédric Villani. 2008. Optimal transport: old and new. Vol. 338. Springer Science & Business Media.
[41] Denny Vrandecic. 2012. Wikidata: a new platform for collaborative data collection. In Proceedings of the 21st World Wide Web Conference, WWW 2012 (Companion Volume). 1063-1064.
[42] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724-2743.
[43] Yanjie Wang, Rainer Gemulla, and Hui Li. 2018. On multi-relational link prediction with bilinear models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[44] Zhigang Wang, Juanzi Li, Zhiyuan Liu, and Jie Tang. 2016. Text-enhanced representation learning for knowledge graph. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 4-17.
[45] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 28.
[46] Hassler Whitney. 1940. On the theory of sphere-bundles. Proceedings of the National Academy of Sciences of the United States of America 26, 2 (1940), 148.
[47] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In 3rd International Conference on Learning Representations, ICLR.
[48] Tong Yang, Long Sha, and Pengyu Hong. 2020. NagE: Non-Abelian group embedding for knowledge graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1735-1742.
[49] Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Hui, and Jie Fu. 2020. Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters. In International Conference on Learning Representations.
[50] Shuai Zhang, Yi Tay, Lina Yao, and Qi Liu. 2019. Quaternion Knowledge Graph Embeddings. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 2731-2741.
[51] Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, and Jie Wang. 2020. Learning hierarchy-aware knowledge graph embeddings for link prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 3065-3072.
[52] Feng Zhao, Tao Xu, Langjunqing Jin, and Hai Jin. 2020. Convolutional Network Embedding of Text-enhanced Representation for Knowledge Graph Completion. IEEE Internet of Things Journal (2020).
[53] Yu Zhao, Ruobing Xie, Kang Liu, Xiaojie Wang, et al. 2020. Connecting Embeddings for Knowledge Graph Entity Typing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 6419-6428.
[54] Xuanyu Zhu, Yi Xu, Hongteng Xu, and Changjian Chen. 2018. Quaternion convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV). 631-647.