HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations
Anson Bastos (IIT Hyderabad, India), Kuldeep Singh (Cerence GmbH and Zerotha Research, Germany), Abhishek Nadgeri (RWTH Aachen and Zerotha Research, Germany), Saeedeh Shekarpour (University of Dayton, USA), Isaiah Onando Mulang' (IBM Research and Zerotha Research, Kenya), Johannes Hoffart (Goldman Sachs, Germany)

ABSTRACT

Recently, several Knowledge Graph Embedding (KGE) approaches have been devised to represent entities and relations in a dense vector space and employed in downstream tasks such as link prediction. A few KGE techniques address interpretability, i.e., mapping the connectivity patterns of the relations (symmetric/asymmetric, inverse, and composition) to a geometric interpretation such as rotation. Other approaches model the representations in a higher-dimensional space such as four-dimensional space (4D) to enhance the ability to infer the connectivity patterns (i.e., expressiveness). However, modeling relations and entities in a 4D space often comes at the cost of interpretability. This paper proposes HopfE, a novel KGE approach aiming to achieve interpretability of inferred relations in the four-dimensional space. We first model the structural embeddings in 3D Euclidean space and view the relation operator as an SO(3) rotation. Next, we map the entity embedding vector from the 3D Euclidean space to a 4D hypersphere using the inverse Hopf Fibration, in which we embed the semantic information from the KG ontology. Thus, HopfE considers the structural and semantic properties of the entities without losing expressivity and interpretability. Our empirical results on four well-known benchmarks achieve state-of-the-art performance for the KG completion task.

KEYWORDS

Representation learning, Embedding, Knowledge Graph

1 INTRODUCTION

Publicly available Knowledge Graphs (KGs) (e.g., Freebase [4] and Wikidata [41]) expose an interlinked representation of structured knowledge for several information extraction tasks such as entity linking, relation extraction, and question answering [2, 23, 34]. However, publishing ever-growing KGs is often confronted with incompleteness, inconsistency, and inaccuracy [15]. Knowledge Graph Embedding (KGE) methods have been proposed to overcome these challenges by learning entity and relation representations (i.e., embeddings) in vector spaces, capturing latent structures in the data [45]. A typical KGE approach follows a two-step process [11, 42]: 1) determining the form of the entity and relation representations, and 2) using a transformation function to learn those representations in a vector space such that a scoring function can calculate the plausibility of KG triples. Intuitively, KGE approaches consider a set of original "connectivity patterns" in the KGs and learn representations that are approximately able to reason over the given patterns (the expressiveness property of a KGE model [43]). For example, for connectivity patterns such as symmetric relations (e.g., sibling), anti-symmetric relations (e.g., father), inverse relations (e.g., husband and wife), or compositional relations (e.g., mother's brother is uncle), expressiveness means that these patterns hold in the vector space.

Limitations of state-of-the-art KGE models: KGs are openly available and, in addition, largely explainable because of their explicit semantics and structure (e.g., direct and labeled relationships between neighboring entities).
In contrast, KG embeddings are often referred to as sub-symbolic representations since they illustrate elements in the vector space, losing the original interpretability deduced from logic or geometric interpretations [29]. A similar issue has been observed in a few initial KGE methods [5, 21, 45] that lack interpretability, i.e., these models fail to link the connectivity patterns to a geometric interpretation such as rotation [35]. For example, for inverse relations the angle of rotation is negated; for composition, the rotation angles add (r1 + r2 = r3), etc. Other KGE approaches such as RotatE [35] and NagE [48] provide interpretability by introducing rotations in two-dimensional (2D) and three-dimensional (3D) spaces, respectively, to preserve the connectivity patterns. Conversely, QuatE [50] imposes a transformation on the entity representations by quaternion multiplication with the relation representation. QuatE aims for higher-dimensional expressiveness by performing entity and relational transformation.

ACM Reference Format: Anson Bastos, Kuldeep Singh, Abhishek Nadgeri, Saeedeh Shekarpour, Isaiah Onando Mulang', and Johannes Hoffart. 2021. HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), November 1–5, 2021, Virtual Event, Australia. ACM, New York, NY, USA, 11 pages.

Figure 1: For a given triple [...]. Panels: (a) NagE; (b) QuatE; (c) HopfE step 1: Rotation; (d) HopfE step 2: Fibration.
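To make the rotation-based interpretation mentioned above concrete, here is a small sketch in the spirit of RotatE's complex-plane view (our own illustration, not code from any of the cited papers): an inverse relation corresponds to negating the rotation angle, and composition corresponds to adding angles.

```python
import numpy as np

def apply_rotation(entity, theta):
    """RotatE-style relation: rotate a complex entity embedding by angle theta."""
    return entity * np.exp(1j * theta)

e = np.exp(1j * 0.3)              # a unit-modulus entity embedding (one dimension)

r, r_inv = 0.7, -0.7              # inverse relations: negated angle
assert np.isclose(apply_rotation(apply_rotation(e, r), r_inv), e)

r1, r2 = 0.4, 0.5                 # composition: r3 = r1 + r2
r3 = r1 + r2
assert np.isclose(apply_rotation(apply_rotation(e, r1), r2),
                  apply_rotation(e, r3))
```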
Corollary 4.1. For any set of entities, if there is a configuration satisfying the relations between them in S², then there will also be a configuration on the fibers that satisfies the concerning relations.

Hence, from Corollary 4.1, the triples expressed in the lower dimension are the same as in the higher dimension after the mapping. Now we want to show that the higher-dimensional space can represent more triples. We state the corresponding Lemma:

Lemma 4.3. The triple set represented by rotations in S² is not equal to the set of triples that can be represented in S³ after the fibration.

Proof. Let τ₂ be the set of triples that can be represented in S² and τ₃ be the set of triples that can be expressed in S³ after the fibration (mapping). We have to show that τ₃ ⊈ τ₂. For this, it is enough to show that even a single configuration of triples in τ₃ is not in τ₂. Consider three entities e₁, e₂ and e₃ with relation r₁ existing between them such that (e₁, r₁, e₂), (e₂, r₁, e₃), (e₁, r₁, e₃) ∈ τ⁺ and (e₃, r₁, e₂), (e₂, r₁, e₁), (e₃, r₁, e₁) ∈ τ⁻, where τ⁺ and τ⁻ are the sets of positive and negative triples respectively. Let E₁, E₂, E₃ ∈ ℝ³ be the representations of these entities in S² and let R₁ ∈ ℍ (ℍ is the Hamiltonian space) be the relation quaternion. Thus we have

R₁ ⊗ E₁ ⊗ R₁⁻¹ = E₂;  R₁ ⊗ E₂ ⊗ R₁⁻¹ = E₃;  R₁ ⊗ E₁ ⊗ R₁⁻¹ = E₃.

This implies that E₁ = E₂ = E₃ and the axis of rotation is along these vectors. However, then we would also have

R₁ ⊗ E₂ ⊗ R₁⁻¹ = E₁;  R₁ ⊗ E₃ ⊗ R₁⁻¹ = E₂;  R₁ ⊗ E₃ ⊗ R₁⁻¹ = E₁,

which is a contradiction, so the relational operator in S² cannot represent the triples' set. In S³, from the above analysis, we have each of these entities' fibers coinciding with each other. We could now select A₁¹, A₁² ∈ ℂ, A₂¹, A₂² ∈ ℂ and A₃¹ ∈ ℂ as the attributes for e₁, e₂ and e₃ respectively. We define R (∈ ℂ) = e^{iθ} to be the rotational operator in the complex plane ℂ for the relation. Setting A₁¹ = e^{iθ₁}, A₁² = A₂¹ = e^{iθ₂} and A₂² = A₃¹ = e^{iθ₃}, where θ₂ − θ₁ = θ₃ − θ₂ = θ, we can see that the set of triples (both positive and negative) can be represented by the mapping in S³. □

Theorem 4.6. The set of triples that can be represented by rotations in S² is a strict subset of the set of triples that can be represented in S³ after the fibration.

Proof. Let τ₂ be the set of triples that can be represented in S² and τ₃ be the set of triples that can be expressed in S³ after the fibration (mapping). We have to show that τ₂ ⊂ τ₃. From Corollary 4.1 we have τ₂ ⊆ τ₃, and from Lemma 4.3 we have τ₃ ⊈ τ₂. Thus, we can deduce that τ₂ ⊂ τ₃, and hence the proof. □

5 OUR APPROACH

5.1 Modeling rotations using Quaternions

As noted in Theorem 4.1, the unit quaternion group is isomorphic to the SU(2) group, which in turn is homomorphic to the SO(3) rotation group (also observed by NagE [48]). This allows us to use quaternions for rotations in the 3D Euclidean space as below:

w′ = q ⊗ w ⊗ q̄   (5)

where q is the quaternion, q̄ is the inverse of q, w is the vector in 3D space, and w′ is the rotated vector. In other words, q is the dimension-wise relation quaternion, w is the dimension-wise entity representation in 3D space, and w′ is the rotated entity representation in the same space. If q = a + bi + cj + dk, then q̄ = (a − bi − cj − dk) / √(a² + b² + c² + d²). Reusing SO(3) rotations from NagE in this step helps us inherit its geometric interpretability for the completion of our approach.
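To make Equation (5) concrete, the following is a minimal NumPy sketch of quaternion-based SO(3) rotation; the helper names (quat_mul, rotate) are ours, not from the HopfE codebase, and the relation quaternion is normalized so that conjugation acts as a pure rotation.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions p = (a, b, c, d) and q = (e, f, g, h)."""
    a, b, c, d = p
    e, f, g, h = q
    return np.array([
        a * e - b * f - c * g - d * h,
        a * f + b * e + c * h - d * g,
        a * g - b * h + c * e + d * f,
        a * h + b * g - c * f + d * e,
    ])

def rotate(q, w):
    """Eq. (5): w' = q ⊗ w ⊗ q̄, with w embedded as the pure quaternion (0, w)."""
    q = q / np.linalg.norm(q)                      # unit quaternion -> SO(3) rotation
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    w_quat = np.concatenate(([0.0], w))
    return quat_mul(quat_mul(q, w_quat), q_conj)[1:]  # drop the scalar part

# A 90-degree rotation about the z-axis maps the x-axis to the y-axis:
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(rotate(q, np.array([1.0, 0.0, 0.0])))        # ~ [0, 1, 0]
```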
5.2 Homotopy Lifting using Inverse Hopf Map

In this step, we aim to achieve expressiveness in a higher-dimensional space, i.e., the 4D hypersphere, as our main contribution. The inverse map with the equations described in Lemma 4.1 lifts the point (the rotated entity representation from Equation 5) in S² to a set of points (i.e., a fiber) lying in a higher-dimensional space. Once both entities are mapped as individual fibers in the 4D hypersphere, it is possible to ground connectivity patterns in rotations as done by NagE and RotatE in the lower-dimensional spaces. Each point on the fiber has the same geometric property for a given entity. We parametrize the entities to fix the location of these sets of points on the fiber. Hence, we now have a set of points on the fiber for each entity, which could represent the entity's semantic attributes. Formally, we define a set of parameters t for each entity corresponding to its semantic attributes Aₑ (cf. Section 3). These parameters could be used to find the set of points on the fiber as below:

eᵢ′ = r ⊗ (cos(t) + sin(t)i)   (6)

where eᵢ′ is the entity quaternion per dimension and r is obtained from either Equation 2 or 3. From Corollary 4.1, we obtain that Equation 6, which takes the representation to a higher dimension, does not cause a loss in expressivity (an illustrative numerical sketch of this lifting follows Table 1).

Claim 5.1. Translation methods, in the Euclidean space, require at least n − 1 dimensions to model an n-clique relational pattern between n entities.

A direct consequence of Claim 5.1 is that, to model dense relations, we would need a dimension proportionate to the number of entities. We propose an increase in the model expressivity by increasing the entity attribute size while keeping the dimensions the same. The entity attribute size (i.e., the number of heads) is |Aₑ|. While increasing the number of heads makes the model more expressive, it may also increase the training latency. In the ablation results, we study the relation between model expressivity and the attribute heads for a fixed time interval (cf. Section 7.1).

5.3 Inducing Semantics

Once the point is mapped to the fiber using the inverse Hopf Fibration, we propose to learn a set of points that would inform about the various semantics/attributes of the entity. One possible way would be to have the points spread out on the circle (fiber) based on the semantic attributes of the entity. Taking inspiration from [11, 53], we use the textual literals widely available for an entity in the standard KGs, namely: entity description, Instance of (type of entity, such as human), Alias (alternative names), and Surface forms (label of the entity). Specifically, we use the CBOW model [26] to aggregate the embeddings of the tokens in each semantic attribute. These embeddings could be initialized from a standard model such as GloVe [31]. The parameter t in Equation 6 is learned from the semantics. We also tried other methods, such as initializing the parameters with the semantic embeddings and using an MLP to learn the final representation after concatenating the structural and semantic embeddings, which did not perform better.

5.4 Score function and Optimization Objective

In the previous sections, we learned that lifting the dimensions of an entity takes us to its fiber (circle) in S³. For two entities, we would have two linked circles enriched with entity semantics. Now, we also need to find the distance between the attributes of the two entities lying on the linked circles. This could be done, for example, by considering the corresponding attributes of the two entities, or by taking a pairwise distance between all combinations of the attributes; the latter would be computationally expensive. Hence, we propose another variant where we find an optimal matching between the attributes and then take the pairwise distance between these matched attributes. Table 1 provides a holistic overview of KGE models. To find an optimal matching between the sets of entity attributes, we use the Wasserstein distance using Equation 4. In practice, we use Sinkhorn iterations for faster computation of the Wasserstein distances [7]. Let Π be the optimal matching between the attributes hᵢ ∈ E_h and t_j ∈ E_t of an entity pair h and t respectively. The score function is¹:

f_r(h, r_ht, t) = Σ_{[i,j]∈Π} ½ ( ‖M(O(hᵢ, r_ht)) − M(t_j)‖ + ‖M(O(t_j, r_ht)) − M(hᵢ)‖ )   (7)

where hᵢ, t_j are the attributes (semantic properties) of the head and tail entities respectively, and r_ht is the relation. O represents the rotation operation on each dimension of the entity, i.e., O(e_d, r_d) = r_d ⊗ e_d ⊗ r̄_d, where r_d ∈ ℍ and e_d ∈ ℝ³. Similarly, M represents the inverse Hopf map to the fiber in S³, per dimension. In practice, for larger attribute sizes, computing the Wasserstein distance between sets of points could be computationally expensive. Therefore, we define a variant that considers the minimum distance between the two sets' attributes. The score function transforms to:

f_r(h, r_ht, t) = inf_{hᵢ∈E_h, t_j∈E_t} ½ ( ‖M(O(hᵢ, r_ht)) − M(t_j)‖ + ‖M(O(t_j, r_ht)) − M(hᵢ)‖ )   (8)

The loss function aims to minimize the score function for the positive triples and maximize it for the negative triples. More specifically, we use a self-adversarial negative sampling loss similar to RotatE [35] as below:

L = − Σ_{(h, r_ht, t) ∈ τ} log σ(γ + f_r(h, r_ht, t)) − Σ_{(h⁻, r_ht, t⁻) ∈ τ⁻} p(h⁻, r_ht, t⁻) log σ(−(γ + f_r(h⁻, r_ht, t⁻)))

where τ is the set of positive triples, τ⁻ is the set of negative triples, γ is a margin hyperparameter, and p is the probability the model assigns to the negative sample.

¹ E_h and E_t are the multisets corresponding to e_h and e_t in higher dimensions.

6 EXPERIMENT SETUP

Datasets: We execute experiments on the widely used benchmark datasets WN18, WN18RR, FB15k237 and YAGO3-10 [35, 37]. The WN18 dataset is extracted from WordNet, which contains lexical relations between English words. WN18RR removes the inverse relations in the WN18 dataset. FB15k237 was created from the Freebase knowledge base and does not contain inverse relations. The YAGO3-10 dataset is obtained from Wikipedia, Wikidata, and Geonames. Table 3 summarizes detailed statistics of all datasets.

Model | Scoring function | Parameters | O_time
TransE [5] | ‖E_h + W_r − E_t‖ | E_h, W_r, E_t ∈ ℝᵏ | O(k)
DistMult [47] | ⟨W_r, E_h, E_t⟩ | E_h, W_r, E_t ∈ ℝᵏ | O(k)
ComplEx [38] | Re(⟨W_r, E_h, Ē_t⟩) | E_h, W_r, E_t ∈ ℂᵏ | O(k)
RotatE [35] | ‖E_h ⊙ W_r − E_t‖ | E_h, W_r, E_t ∈ ℂᵏ, |W_r| = 1 | O(k)
QuatE [50] | E_h ⊗ W_r · E_t | E_h, W_r, E_t ∈ ℍᵏ | O(k)
HopfE (ours) | ‖M(W_r ⊗ E_h ⊗ W̄_r, Aʰ) − M(E_t, Aᵗ)‖ | E_h, E_t ∈ (ℝ³)ᵏ, W_r ∈ ℍᵏ, Aʰ, Aᵗ ∈ ℂᵏ | O(k)

Table 1: KGE overview. W_r is the relation representation, E_h and E_t are the entity representations, and M is the inverse Hopf map.
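Since the explicit inverse-map equations of Lemma 4.1 are not reproduced in this excerpt, the following sketch uses a standard textbook convention for the Hopf map [24] to illustrate what M does: it lifts a point on S² to a circle of points in S³, one point per attribute angle t, in the spirit of Equation (6). The function names and the chosen convention are our assumptions, not the HopfE implementation.

```python
import numpy as np

def hopf(q):
    """Hopf map S^3 -> S^2, convention pi(z1, z2) = (|z1|^2 - |z2|^2,
    2*Re(z1*conj(z2)), 2*Im(z1*conj(z2))), with q = (Re z1, Im z1, Re z2, Im z2)."""
    z1, z2 = q[0] + 1j * q[1], q[2] + 1j * q[3]
    w = z1 * np.conj(z2)
    return np.array([abs(z1) ** 2 - abs(z2) ** 2, 2 * w.real, 2 * w.imag])

def inverse_hopf(p, t):
    """Lift p = (x, y, z) on S^2 to the point of its fiber at angle t.
    The fiber {(e^{it} z1, e^{it} z2)} is a circle in S^3; undefined at x = -1."""
    x, y, z = p
    phase = np.exp(1j * t)
    z1 = np.sqrt((1.0 + x) / 2.0) * phase
    z2 = (y - 1j * z) / np.sqrt(2.0 * (1.0 + x)) * phase
    return np.array([z1.real, z1.imag, z2.real, z2.imag])

# Every attribute angle t yields a distinct 4D point that projects back to p,
# so semantics can be spread along the fiber without moving the 3D structure.
p = np.array([0.6, 0.0, 0.8])
fiber = [inverse_hopf(p, t) for t in np.linspace(0, 2 * np.pi, 8, endpoint=False)]
assert all(np.allclose(hopf(f), p) for f in fiber)
```

A minimum-distance score in the spirit of Equation (8) would then compare such fiber points for the rotated head attributes against the tail's fiber points and take the smallest pairwise distance.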
Model | WN18: MR / MRR / Hits@1 / Hits@3 / Hits@10 | WN18RR: MR / MRR / Hits@1 / Hits@3 / Hits@10
TransE [5] | – / 0.495 / 0.113 / 0.888 / 0.943 | 3384 / 0.226 / – / – / 0.501
DistMult [47] | 655 / 0.797 / – / – / 0.946 | 5110 / 0.43 / 0.39 / 0.44 / 0.49
ComplEx [38] | – / 0.941 / 0.936 / 0.945 / 0.947 | 5261 / 0.44 / 0.41 / 0.46 / 0.51
RotatE [35] | 309 / 0.949 / 0.944 / 0.952 / 0.959 | 3340 / 0.476 / 0.428 / 0.492 / 0.571
NagE [48] | – / 0.950 / 0.944 / 0.953 / 0.960 | – / 0.477 / 0.432 / 0.493 / 0.574
QuatE [50] | 388 / 0.949 / 0.941 / 0.954 / 0.960 | 3472 / 0.481 / 0.436 / 0.500 / 0.564
HopfE | 245 / 0.949 / 0.938 / 0.954 / 0.960 | 2885 / 0.472 / 0.413 / 0.500 / 0.586
HopfE + Semantics | 99 / 0.945 / 0.934 / 0.951 / 0.961 | 2884 / 0.482 / 0.433 / 0.500 / 0.579

Table 2: Evaluation metrics on the WN18 and WN18RR datasets. Best results are in bold and second best is underlined.
Dataset | #Entities | #Relations | % N-to-N | % N-to-1 | % 1-to-1 | % 1-to-N
WN18 | 40943 | 36 | 0.00 | 59.65 | 40.22 | 0.12
WN18RR | 40943 | 22 | 0.03 | 74.59 | 25.33 | 0.04
FB15K237 | 14541 | 474 | 4.93 | 91.26 | 3.63 | 0.17
YAGO3-10 | 123182 | 74 | 57.12 | 41.10 | 1.65 | 0.12

Table 3: Dataset statistics, including the fraction of relation types.

Evaluation Metrics: Following the widely adopted metrics, we report Mean Rank (MR), Mean Reciprocal Rank (MRR), and the cut-off hit ratio (Hits@N, N = 1, 3, 10) on all datasets. MR reports the average rank over all the correct entities. MRR is the average of the inverse ranks of the correct entities. Hits@N evaluates the ratio of correct entity predictions among the top N predicted results (a short computational sketch follows Table 4).

Implementation: The Adam optimizer [17] is used for learning, with a learning rate of 0.1 and an exponentially decaying learning-rate schedule with a decay rate of 0.1. The parameters are initialized using He initialization [13]. We perform a hyperparameter search on the embedding dimension d ∈ {50, 100, 200, 500}, batch size b ∈ {256, 512, 1024}, number of negative samples n_neg ∈ {16, 32, 62, 128, 256, 512}, negative adversarial sampling temperature α ∈ {0.5, 1.0}, and margin parameter γ ∈ {6, 9, 12, 24}. We release the code on a public GitHub repository². We consider two variants of our approach: 1) HopfE, which does not use any semantic properties, and 2) HopfE+Semantics, which has entity attributes encoded on the fibers (cf. Section 5.3).

Baselines: We compare with state-of-the-art link prediction methods that mainly employ translational/semantic-matching techniques, since our approach is built upon them. Specifically, we compare with TransE [5], DistMult [47], ComplEx [38], RotatE [35], NagE [48], and QuatE [50]. We extend 3D rotations similar to NagE; however, our hyperparameters are restricted due to hardware limitations. We inherit results and experiment settings from [48, 50] for WN18, WN18RR, and FB15K237. We ignore the alternative settings and hyperparameters provided by [33, 36]. On the YAGO dataset, results are taken from RotatE. To keep the experiment settings the same, the configuration of QuatE with N3 regularization and reciprocal learning has been omitted from our comparison, considering that these are optimizations extraneous to the model and we do not use them.

7 RESULTS

Our evaluation results are shown in Table 2 and Table 4. On the WN18 dataset, the majority relation patterns comprise symmetry/antisymmetry and inversion. Our model, which combines both properties (geometric interpretability and 4D expressiveness), achieves results comparable to the baselines. On the WN18RR dataset, the main relation pattern is symmetry, and the translation method (TransE) shows limited performance because of its inability to infer symmetric patterns. QuatE reports slightly higher results than NagE, and our model achieves the best results on all metrics (except Hits@1). Similar to WN18RR, the FB15K237 dataset does not contain inverse relations, and the majority relation pattern is composition. We see that NagE performs better than QuatE on the FB15K237 dataset due to its ability to infer non-abelian hyper-relations. Except for MR, our model achieves better results than the previous baselines, illustrating the ability of HopfE to handle composition relations. On the YAGO3-10 dataset, triples mostly contain a human's descriptive attributes, such as marital status, profession, citizenship, and gender. Our model (HopfE) clearly outperforms the previous baseline (RotatE) on all metrics.
Across all datasets, we observe that adding semantics does not necessarily result in a performance gain, and the behavior varies per dataset.

² https://github.com/ansonb/HopfE

Model | FB15K237: MR / MRR / Hits@1 / Hits@3 / Hits@10 | YAGO3-10: MR / MRR / Hits@1 / Hits@3 / Hits@10
TransE [5] | 357 / 0.294 / – / – / 0.465 | – / – / – / – / –
DistMult [47] | 254 / 0.241 / 0.155 / 0.263 / 0.419 | 5926 / 0.340 / 0.240 / 0.380 / 0.540
ComplEx [38] | 339 / 0.247 / 0.158 / 0.275 / 0.428 | 6351 / 0.360 / 0.260 / 0.400 / 0.550
RotatE [35] | 177 / 0.338 / 0.241 / 0.375 / 0.533 | 1767 / 0.495 / 0.402 / 0.550 / 0.670
NagE [48] | – / 0.340 / 0.244 / 0.378 / 0.530 | – / – / – / – / –
QuatE [50] | 176 / 0.311 / 0.221 / 0.342 / 0.495 | – / – / – / – / –
HopfE | 212 / 0.343 / 0.247 / 0.379 / 0.534 | 1077 / 0.529 / 0.438 / 0.586 / 0.695
HopfE + Semantics | 199 / 0.330 / 0.230 / 0.373 / 0.528 | 1260 / 0.518 / 0.429 / 0.570 / 0.678

Table 4: Evaluation metrics on the FB15K237 and YAGO3-10 datasets. Best results are in bold and second best is underlined.
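As a reference for the reported numbers, here is a minimal sketch (our own, not from the HopfE evaluation code) of how MR, MRR, and Hits@N are computed from the rank that each test triple's correct entity obtains among all candidate entities.

```python
import numpy as np

def ranking_metrics(ranks, ns=(1, 3, 10)):
    """ranks: 1-based rank of the correct entity for each test triple."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {
        "MR": ranks.mean(),           # mean rank (lower is better)
        "MRR": (1.0 / ranks).mean(),  # mean reciprocal rank (higher is better)
    }
    for n in ns:
        metrics[f"Hits@{n}"] = (ranks <= n).mean()
    return metrics

print(ranking_metrics([1, 2, 1, 15, 4]))
# {'MR': 4.6, 'MRR': ~0.563, 'Hits@1': 0.4, 'Hits@3': 0.6, 'Hits@10': 0.8}
```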
7.1 Ablation Studies

Effect of Adding Semantics: To understand the impact of adding textual literals, we conduct an ablation on the attribute information induced in the model. We add each entity attribute individually to study its impact. The results for the YAGO3-10 dataset are reported in Table 5, because this dataset provides both textual and numeric literals. We observe that different semantic information corresponds to variations in the results. We also note that adding some semantic attributes causes a drop in performance, similar to the decline in performance in Table 4 when semantics are added collectively. Furthermore, the encoding of numeric literals is a limitation of our approach. This is possibly due to the implementation choice we made of encoding literals using character and word embeddings. Hence, there are two open questions for future research: 1) how a model can choose semantic information dynamically, and 2) how to incorporate numeric literals in this setting for a higher performance gain, as observed in alternative approaches such as [10, 18].

Model | MR | MRR | Hits@1 | Hits@10
HopfE + Label | 1260 | 0.518 | 0.429 | 0.678
HopfE + Alias | 1281 | 0.525 | 0.44 | 0.684
HopfE + Instance | 1215 | 0.518 | 0.428 | 0.68
HopfE + Desc | 1264 | 0.521 | 0.436 | 0.677
HopfE + Numeric | 2261 | 0.441 | 0.357 | 0.598

Table 5: Ablation study of inducing individual literals on the YAGO3-10 dataset. Best results are in bold.

Effect of Hopf Fibration: Our rationale here is to understand the empirical advantage of the inverse Hopf Fibration. Hence, we created a variant of HopfE named "HopfE (w/o Hopf)" that only contains the rotation in the 3D sphere (Section 5.1) and does not include the mapping from 3D to 4D using the Hopf Fibration, compromising higher-dimensional expressiveness. For this variant, we report results on individual relational patterns in Table 7. We observe that the performance of HopfE is higher compared to the "HopfE (w/o Hopf)" setting for three types of relations. On WN18, the fraction of 1-to-N links is 0.04 percent; hence, HopfE faces challenges in learning the corresponding mapping into higher dimensions. Overall, our experiment justifies that increasing dimensional expressivity does not compromise the overall average empirical results.

Model | N-to-N: MRR / Hits@1 | N-to-1: MRR / Hits@1 | 1-to-N: MRR / Hits@1 | 1-to-1: MRR / Hits@1
HopfE (w/o Hopf) | 0.917 / 0.889 | 0.187 / 0.112 | 0.259 / 0.175 | 0.865 / 0.762
HopfE | 0.939 / 0.931 | 0.195 / 0.123 | 0.206 / 0.136 | 0.951 / 0.929
HopfE + Semantics | 0.941 / 0.932 | 0.193 / 0.124 | 0.253 / 0.167 | 0.929 / 0.881

Table 7: Comparison of HopfE with a setting (HopfE (w/o Hopf)) where we omit the Hopf mapping, on the WN18 dataset. Inducing the Hopf Fibration for dimensional expressivity shows a clear empirical edge on the majority relation types. Best values in bold.

Geometric interpretation: Having established the importance of the Hopf Fibration in the previous experiment, we now study whether the geometric interpretability of rotations in S² is maintained after training with the Fibrations on the WN18 dataset. We analyze the inverse relations has_part and part_of, and the compositional relations derivationally_related_form (r1 in Figure 3), hypernym (r2 in Figure 3), and synset_domain_topic_of (r3 in Figure 3). As illustrated in Figure 3, the histogram of the sum of the rotation angles of the inverse relations is concentrated at 0 rad, and for the compositional relations the difference between the angles is concentrated around 0 rad, as expected, maintaining geometric interpretability.

Figure 3: Analysis of the geometric properties of the inverse and compositional relations. Panels: (a) φ_has_part + φ_part_of; (b) ‖r_has_part‖ ∗ ‖r_has_part‖; (c) φ(r1 ⊗ r2) − φ(r1); (d) ‖r_part_of‖; (e) ‖r_has_part‖; (f) φ(r1 ⊗ r2) − φ(r3).

Relation-wise performance: The previous ablation established that geometric interpretability remains intact for HopfE. Now, we study expressiveness in a higher dimension (in addition to Tables 2 and 4). We perform a relation-wise performance analysis on the WN18RR dataset to understand whether HopfE maintains expressiveness compared to rotational methods in lower dimensions. We borrow the experimental settings from RotatE [35]. As seen in Table 6, in comparison to the RotatE baseline (NagE skips this analysis), the HopfE model performs better on these relations. The observed behavior implies that when we aim for geometric interpretation using the Hopf Fibration, the performance does not decrease for any relation.

Relation | RotatE | HopfE
derivationally_related_form | 0.947 | 0.956
also_see | 0.585 | 0.626
verb_group | 0.943 | 0.943
similar_to | 1 | 1
hypernym | 0.148 | 0.161
instance_hypernym | 0.318 | 0.347
member_meronym | 0.232 | 0.257
synset_domain_topic_of | 0.341 | 0.405
has_part | 0.184 | 0.199
member_of_domain_usage | 0.318 | 0.34
member_of_domain_region | 0.2 | 0.392

Table 6: MRR of individual relations of the WN18RR dataset.

Studying the effect of the number of heads: In a real-world scenario, not all KG entities have the same number of properties, and a few of them may require a large number of attributes to model the KG ontology. In this experiment, we aim to understand how adding more entity attributes for modeling dense relations impacts performance (cf. Claim 5.1). Hence, we study the effect of increasing the number of heads on the prediction performance using a synthetic dataset, because it allows us to vary the number of attributes of the entities (a small data-generation sketch follows Figure 4). Specifically, we generate Erdős–Rényi graphs [8] with an average degree varying from 10 to 100 in increments of 10. We use 1, 5, and 10 heads per entity per fiber (dimension). Also, note that since these are random graphs, we evaluate performance on the training set, as test-set performance is expected to be low in this setting. The dimension (= 10) and other common parameters are kept the same in all models. The models are not all trained to convergence but for a fixed number of iterations. Figure 4a illustrates that increasing the node degree in the data generally results in a performance drop, which is expected. Another observation is that the setting with five heads can perform better than the one with ten heads at higher degrees; this possibly indicates that the latter model may require more training to fit better. A similar observation can be gleaned from Figure 4b, where we study the effect of increasing the number of heads from 1 to 10 for a graph with an average degree per node of 100. We see that the metric improves from 1 to 5 heads and then begins to drop. We conclude that the number of heads, along with the number of training steps, is a hyperparameter to be tuned for better results.

Figure 4: Effect of attribute heads on the performance. Panels: (a), (b).
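The following is a minimal sketch of the synthetic-data setup described above, assuming NetworkX for graph generation and a single synthetic relation; the target average degree is obtained by choosing the edge probability p = degree / (n − 1). The construction details (node count, relation naming) are our assumptions for illustration, not the paper's exact generator.

```python
import networkx as nx

def synthetic_triples(n_entities=1000, avg_degree=10, seed=0):
    """Erdős–Rényi graph G(n, p) with expected average degree avg_degree,
    exported as (head, relation, tail) triples over a single relation."""
    p = avg_degree / (n_entities - 1)
    g = nx.gnp_random_graph(n_entities, p, seed=seed)
    return [(h, "r0", t) for h, t in g.edges()]

# One graph per degree setting, as in the ablation (degrees 10, 20, ..., 100):
for degree in range(10, 101, 10):
    triples = synthetic_triples(avg_degree=degree)
    print(degree, len(triples))  # edge count grows roughly as n * degree / 2
```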
8 CONCLUSION

Our central research question was to study whether a KGE approach can achieve geometric interpretability along with the expressiveness of four-dimensional space. The proposed theoretical foundations and the associated empirical results of HopfE show that it is possible to maintain geometric interpretability and simultaneously increase the model's expressiveness in four-dimensional space. Furthermore, HopfE enables encoding the entity attributes on a fibration in a higher dimension, enriching the embeddings with the semantic information prevalent in real-world KGs. Our ablation studies systematically support several choices we made within the paper's scope, such as the use of the Hopf Fibration, the effect of the number of heads, and the individual impact of entity semantic properties. Besides providing empirical evidence complementing the theoretical foundations (Sections 4 and 5), our results open up important questions for future research: 1) In four-dimensional space, how can the semantic attributes (an essential aspect of KGE [30]) be utilized more optimally to positively impact link prediction performance and induce semantic class hierarchies similar to [51]? 2) How can logic on the attributes (or description logic) be utilized for fine-grained reasoning and logical interpretability of a KGE model in 4D? We point readers to these open research directions as viable next steps.

REFERENCES
[1] Carl Allen, I. Balazevic, and Timothy Hospedales. 2021. Interpreting knowledge graph relation representation from word embeddings. In International Conference on Learning Representations.
[2] Anson Bastos, Abhishek Nadgeri, Kuldeep Singh, Isaiah Onando Mulang', Saeedeh Shekarpour, Johannes Hoffart, and Manohar Kaul. 2021. RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network. In Proceedings of The Web Conference (WWW).
[3] Hans Joachim Baues. 1989. Algebraic homotopy. Vol. 15. Cambridge University Press.
[4] Kurt D. Bollacker, Robert P. Cook, and Patrick Tufts. 2007. Freebase: A Shared Database of Structured General Human Knowledge. In AAAI.
[5] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NeurIPS. 1–9.
[6] Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. 2020. Low-Dimensional Hyperbolic Knowledge Graph Embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 6901–6914.
[7] Marco Cuturi. 2013. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems 26 (2013), 2292–2300.
[8] Paul Erdős and Alfréd Rényi. 2011. On the evolution of random graphs. In The Structure and Dynamics of Networks. Princeton University Press, 38–82.
[9] Chang Gao, Chengjie Sun, Lili Shan, Lei Lin, and Mingjiang Wang. 2020. Rotate3D: Representing Relations as Rotations in Three-Dimensional Space for Knowledge Graph Embedding. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 385–394.
[10] Alberto Garcia-Duran and Mathias Niepert. 2017. KBLRN: End-to-end learning of knowledge base representations with latent, relational, and numerical features. arXiv preprint arXiv:1709.04676 (2017).
[11] Genet Asefa Gesese, Russa Biswas, Mehwish Alam, and Harald Sack. 2019. A survey on knowledge graph embeddings with literals: Which model links better literal-ly? Semantic Web Preprint (2019), 1–31.
[12] William Rowan Hamilton. 1844. LXXVIII. On quaternions; or on a new system of imaginaries in Algebra: To the editors of the Philosophical Magazine and Journal. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 25, 169 (1844), 489–495.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026–1034.
[14] H. Hopf. 1931. On the mapping of the three-dimensional sphere onto the spherical surface. Math. Ann. 104 (1931), 637–665.
[15] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30.
[16] Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and Philip S. Yu. 2021. A survey on knowledge graphs: Representation, acquisition and applications. IEEE Transactions on Neural Networks and Learning Systems (2021).
[17] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR.
[18] Agustinus Kristiadi, Mohammad Asif Khan, Denis Lukovnikov, Jens Lehmann, and Asja Fischer. 2019. Incorporating literals into knowledge graph embeddings. In International Semantic Web Conference. Springer, 347–363.
[19] Samuel J. Li. n.d. Hopf Fibration Visualization. https://samuelj.li/hopf-fibration/
[20] Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2016. Knowledge representation learning with entities, attributes and relations. In IJCAI.
[21] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
[22] Haonan Lu and Hailin Hu. 2020. DensE: An Enhanced Non-Abelian Group Representation for Knowledge Graph Embedding. arXiv preprint arXiv:2008.04548 (2020).
[23] Xiaolu Lu, Soumajit Pramanik, Rishiraj Saha Roy, Abdalghani Abujabal, Yafang Wang, and Gerhard Weikum. 2019. Answering complex questions by joining multi-document evidence with quasi knowledge graphs. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 105–114.
[24] David W. Lyons. 2003. An elementary introduction to the Hopf fibration. Mathematics Magazine 76, 2 (2003), 87–98.
[25] Jerrold E. Marsden and Tudor S. Ratiu. 1995. Introduction to mechanics and symmetry. Physics Today 48, 12 (1995), 65.
[26] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781 (2013).
[27] Gaspard Monge. 1781. Memoir on the theory of cuttings and embankments. Histoire de l'Académie Royale des Sciences de Paris (1781).
[28] Rémy Mosseri and Rossen Dandoloff. 2001. Geometry of entangled states, Bloch spheres and Hopf fibrations. Journal of Physics A: Mathematical and General 34, 47 (2001), 10243.
[29] Matteo Palmonari and Pasquale Minervini. 2020. Knowledge graph embeddings and explainable AI. Knowledge Graphs for Explainable Artificial Intelligence: Foundations, Applications and Challenges 47 (2020), 49.
[30] Heiko Paulheim. 2018. Make embeddings semantic again! In International Semantic Web Conference (P&D/Industry/BlueSky).
[31] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 1532–1543.
[32] Charles Chapman Pugh. 2002. Real mathematical analysis. Springer.
[33] Daniel Ruffinelli, Samuel Broscheit, and Rainer Gemulla. 2019. You can teach an old dog new tricks! On training knowledge graph embeddings. In International Conference on Learning Representations.
[34] Ahmad Sakor, Kuldeep Singh, Anery Patel, and Maria-Esther Vidal. 2020. Falcon 2.0: An entity and relation linking tool over Wikidata. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 3141–3148.
[35] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2018. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In International Conference on Learning Representations.
[36] Zhiqing Sun, Shikhar Vashishth, Soumya Sanyal, Partha Talukdar, and Yiming Yang. 2020. A Re-evaluation of Knowledge Graph Completion Methods. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 5516–5522.
[37] Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality. 57–66.
[38] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning. PMLR, 2071–2080.
[39] Leandra Vicci. 2001. Quaternions and rotations in 3-space: The algebra and its geometric interpretation. Tech. Rep., Chapel Hill, NC, USA (2001).
[40] Cédric Villani. 2008. Optimal transport: old and new. Vol. 338. Springer Science & Business Media.
[41] Denny Vrandecic. 2012. Wikidata: a new platform for collaborative data collection. In Proceedings of the 21st World Wide Web Conference, WWW 2012 (Companion Volume). 1063–1064.
[42] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724–2743.
[43] Yanjie Wang, Rainer Gemulla, and Hui Li. 2018. On multi-relational link prediction with bilinear models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
[44] Zhigang Wang, Juanzi Li, Zhiyuan Liu, and Jie Tang. 2016. Text-enhanced representation learning for knowledge graph. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 4–17.
[45] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 28.
[46] Hassler Whitney. 1940. On the theory of sphere-bundles. Proceedings of the National Academy of Sciences of the United States of America 26, 2 (1940), 148.
[47] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In 3rd International Conference on Learning Representations, ICLR.
[48] Tong Yang, Long Sha, and Pengyu Hong. 2020. NagE: Non-abelian group embedding for knowledge graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1735–1742.
[49] Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Hui, and Jie Fu. 2020. Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters. In International Conference on Learning Representations.
[50] Shuai Zhang, Yi Tay, Lina Yao, and Qi Liu. 2019. Quaternion Knowledge Graph Embeddings. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 2731–2741.
[51] Zhanqiu Zhang, Jianyu Cai, Yongdong Zhang, and Jie Wang. 2020. Learning hierarchy-aware knowledge graph embeddings for link prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 3065–3072.
[52] Feng Zhao, Tao Xu, Langjunqing Jin, and Hai Jin. 2020. Convolutional Network Embedding of Text-enhanced Representation for Knowledge Graph Completion. IEEE Internet of Things Journal (2020).
[53] Yu Zhao, Ruobing Xie, Kang Liu, Xiaojie Wang, et al. 2020. Connecting Embeddings for Knowledge Graph Entity Typing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 6419–6428.
[54] Xuanyu Zhu, Yi Xu, Hongteng Xu, and Changjian Chen. 2018. Quaternion convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV). 631–647.