
Hyperbolic Entailment Cones

A. Geodesics in the Hyperboloid Model

The hyperboloid model is $(\mathbb{H}^n, \langle\cdot,\cdot\rangle_1)$, where $\mathbb{H}^n := \{x \in \mathbb{R}^{n,1} : \langle x, x\rangle_1 = -1,\ x_0 > 0\}$. The hyperboloid model can be viewed extrinsically as embedded in the pseudo-Riemannian manifold $(\mathbb{R}^{n,1}, \langle\cdot,\cdot\rangle_1)$, which induces its metric. The Minkowski metric tensor $g^{\mathbb{R}^{n,1}}$ of signature $(n, 1)$ has the components

$$g^{\mathbb{R}^{n,1}} = \begin{pmatrix} -1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}$$

The associated inner product is $\langle x, y\rangle_1 := -x_0 y_0 + \sum_{i=1}^{n} x_i y_i$. Note that the hyperboloid model is a Riemannian manifold because the quadratic form associated with $g^{\mathbb{H}}$ is positive definite.

In the extrinsic view, the tangent space at a point $x \in \mathbb{H}^n$ can be described as $T_x\mathbb{H}^n = \{v \in \mathbb{R}^{n,1} : \langle v, x\rangle_1 = 0\}$. See Robbin & Salamon (2011); Parkkonen (2013).

Geodesics of $\mathbb{H}^n$ are given by the following theorem (Eq. (6.4.10) in Robbin & Salamon (2011)):

Theorem 6. Let $x \in \mathbb{H}^n$ and $v \in T_x\mathbb{H}^n$ such that $\langle v, v\rangle_1 = 1$. The unique unit-speed geodesic $\phi_{x,v} : [0, 1] \to \mathbb{H}^n$ with $\phi_{x,v}(0) = x$ and $\dot\phi_{x,v}(0) = v$ is

$$\phi_{x,v}(t) = x\cosh(t) + v\sinh(t). \tag{38}$$

B. Proof of Theorem 1

Proof. From Theorem 6, Appendix A, we know the expression of the unit-speed geodesics of the hyperboloid model $\mathbb{H}^n$. We can use the Egregium theorem to project the geodesics of $\mathbb{H}^n$ to the geodesics of $\mathbb{D}^n$. We can do that because we know an isometry $\psi : \mathbb{D}^n \to \mathbb{H}^n$ between the two spaces:

$$\psi(x) := (\lambda_x - 1,\ \lambda_x x), \qquad \psi^{-1}(x_0, x') = \frac{x'}{1 + x_0} \tag{39}$$

Formally, let $x \in \mathbb{D}^n$, $v \in T_x\mathbb{D}^n$ with $g^{\mathbb{D}}(v, v) = 1$. Also, let $\gamma : [0, 1] \to \mathbb{D}^n$ be the unique unit-speed geodesic in $\mathbb{D}^n$ with $\gamma(0) = x$ and $\dot\gamma(0) = v$. Then, by the Egregium theorem, $\phi := \psi \circ \gamma$ is also a unit-speed geodesic in $\mathbb{H}^n$. From Theorem 6, we have that $\phi(t) = x'\cosh(t) + v'\sinh(t)$, for some $x' \in \mathbb{H}^n$, $v' \in T_{x'}\mathbb{H}^n$. One derives their expressions:

$$x' = \psi \circ \gamma(0) = (\lambda_x - 1,\ \lambda_x x) \tag{40}$$

$$v' = \dot\phi(0) = \frac{\partial \psi(y_0, y)}{\partial y}\bigg|_{\gamma(0)} \dot\gamma(0) = \begin{pmatrix} \lambda_x^2 \langle x, v\rangle \\ \lambda_x^2 \langle x, v\rangle x + \lambda_x v \end{pmatrix}$$

Inverting once again, $\gamma(t) = \psi^{-1} \circ \phi(t)$, one gets the closed-form expression for $\gamma$ stated in the theorem.

One can sanity check that the formula from Theorem 1 indeed satisfies the conditions:

• $d_{\mathbb{D}}(\gamma(0), \gamma(t)) = t,\ \forall t \in [0, 1]$
• $\gamma(0) = x$
• $\dot\gamma(0) = v$
• $\lim_{t\to\infty} \gamma(t) := \gamma(\infty) \in \partial\mathbb{D}^n$

C. Proof of Corollary 1.1

Proof. Denote $u = \frac{1}{\sqrt{g^{\mathbb{D}}_x(v, v)}}\, v$. Using the notations from Thm. 1, one has $\exp_x(v) = \gamma_{x,u}\big(\sqrt{g^{\mathbb{D}}_x(v, v)}\big)$. Using Eqs. 3 and 6, one derives the result.

D. Proof of Corollary 1.2

Proof. For any geodesic $\gamma_{x,v}(t)$, consider the plane spanned by the vectors $x$ and $v$. Then, from Thm. 1, this plane contains all the points of $\gamma_{x,v}(t)$, i.e.

$$\{\gamma_{x,v}(t) : t \in \mathbb{R}\} \subseteq \{ax + bv : a, b \in \mathbb{R}\} \tag{41}$$

E. Proof of Lemma 2

Proof. Assume the contrary and let $x \in \mathbb{D}^n \setminus \{0\}$ s.t. $\psi(\|x\|) > \frac{\pi}{2}$. We will show that transitivity implies that

$$\forall x' \in \partial S_x^{\psi(x)} : \ \psi(\|x'\|) \le \frac{\pi}{2} \tag{42}$$

If the above is true, by moving $x'$ on any arbitrary (continuous) curve on the cone border $\partial S_x^{\psi(x)}$ that ends in $x$, one will get a contradiction due to the continuity of $\psi(\|\cdot\|)$.

We now prove the remaining fact, namely Eq. 42. Let $x' \in \partial S_x^{\psi(x)}$ be arbitrary. Also, let $y \in \partial S_x^{\psi(x)}$ be any arbitrary point on the geodesic half-line connecting $x$ with $x'$ starting from $x'$ (i.e. excluding the segment from $x$ to $x'$). Moreover, let $z$ be any arbitrary point on the spoke through $x'$ radiating from $x'$, namely $z \in A_{x'}$ (notation from Eq. 15). Then, based on the properties of hyperbolic angles discussed before (based on Eq. 8), the angles $\angle yx'z$ and $\angle zx'x$ are well-defined.

From Cor. 1.2 we know that the points $O, x, x', y, z$ are coplanar. We denote this plane by $P$. Furthermore, the metric of the Poincaré ball is conformal with the Euclidean metric. Given these two facts, we derive that

$$\angle yx'z + \angle zx'x = \angle(yx'x) = \pi \tag{43}$$

thus

$$\min(\angle yx'z,\ \angle zx'x) \le \frac{\pi}{2} \tag{44}$$

It only remains to prove that

$$\angle yx'z \ge \psi(x') \quad \& \quad \angle zx'x \ge \psi(x') \tag{45}$$

Indeed, assume w.l.o.g. that $\angle yx'z < \psi(x')$. Since $\angle yx'z < \psi(x')$, there exists a point $t$ in the plane $P$ such that

$$\angle Oxt < \angle Oxy \quad \& \quad \psi(x') \ge \angle tx'z > \angle yx'z \tag{46}$$

Then, clearly, $t \in S_{x'}^{\psi(x')}$, and also $t \notin S_x^{\psi(x)}$, which contradicts the transitivity property (Eq. 20).

F. Proof of Theorem 3

Proof. We first need to prove the following fact:

Lemma 7. Transitivity implies that for all $x \in \mathbb{D}^n \setminus \{0\}$, $\forall x' \in \partial S_x^{\psi(x)}$:

$$\sin(\psi(\|x'\|)) \sinh(\|x'\|_{\mathbb{D}}) \le \sin(\psi(\|x\|)) \sinh(\|x\|_{\mathbb{D}}). \tag{47}$$

Proof. We will use the exact same figure and notations of points $y, z$ as in the proof of Lemma 2. In addition, we assume w.l.o.g. that

$$\angle yx'z \le \frac{\pi}{2} \tag{48}$$

Further, let $b \in \partial\mathbb{D}^n$ be the intersection of the spoke through $x$ with the border of $\mathbb{D}^n$. Following the same argument as in the proof of Lemma 2, one proves Eq. 45, which gives:

$$\angle yx'z \ge \psi(x') \tag{49}$$

In addition, the angle at $x'$ between the geodesics $xy$ and $Oz$ can be written in two ways:

$$\angle Ox'x = \angle yx'z \tag{50}$$

Since $x' \in \partial S_x^{\psi(x)}$, one proves

$$\angle Oxx' = \pi - \angle x'xb = \pi - \psi(x) \tag{51}$$

We apply the hyperbolic law of sines (Eq. 10) in the hyperbolic triangle $Oxx'$:

$$\frac{\sin(\angle Oxx')}{\sinh(d_{\mathbb{D}}(O, x'))} = \frac{\sin(\angle Ox'x)}{\sinh(d_{\mathbb{D}}(O, x))} \tag{52}$$

Putting together Eqs. 48, 49, 50, 51, 52, and using the fact that $\sin(\cdot)$ is an increasing function on $[0, \frac{\pi}{2}]$, we derive the conclusion of this helper lemma.

We now return to the proof of our theorem. Consider any arbitrary $r, r' \in (0, 1) \cap \mathrm{Dom}(\psi)$ with $r < r'$. Then, we claim that it is enough to prove that

$$\exists x \in \mathbb{D}^n,\ x' \in \partial S_x^{\psi(x)} \ \text{s.t.} \ \|x\| = r,\ \|x'\| = r' \tag{53}$$

Indeed, if the above is true, then one can use Eq. 5, i.e.

$$\sinh(\|x\|_{\mathbb{D}}) = \sinh\left(\ln\left(\frac{1 + r}{1 - r}\right)\right) = \frac{2r}{1 - r^2} \tag{54}$$

and apply Lemma 7 to derive

$$h(r') \le h(r) \tag{55}$$

which is enough for proving the non-increasing property of the function $h$.

We are only left to prove Eq. 53. Let $x \in \mathbb{D}^n$ be arbitrary s.t. $\|x\| = r$. Also, consider any arbitrary geodesic $\gamma_{x,v} : \mathbb{R}_+ \to \partial S_x^{\psi(x)}$ that takes values on the cone border, i.e. $\angle(v, x) = \psi(x)$. We know that

$$\|\gamma_{x,v}(0)\| = \|x\| = r \tag{56}$$

and that this geodesic "ends" on the ball's border $\partial\mathbb{D}^n$, i.e.

$$\left\|\lim_{t\to\infty} \gamma_{x,v}(t)\right\| = 1 \tag{57}$$

Thus, because the function $\|\gamma_{x,v}(\cdot)\|$ is continuous, we obtain that for any $r' \in (r, 1)$ there exists a $t' \in \mathbb{R}_+$ s.t. $\|\gamma_{x,v}(t')\| = r'$. By setting $x' := \gamma_{x,v}(t') \in \partial S_x^{\psi(x)}$ we obtain the desired result.

G. Proof of Theorem 5

Proof. For any $y \in S_x^{\psi(x)}$, the axial symmetry property implies that $\pi - \angle Oxy \le \psi(x)$. Applying the hyperbolic cosine law in the triangle $Oxy$ and writing the above angle inequality in terms of the cosines of the two angles, one gets

$$\cos \angle Oxy = \frac{-\cosh(\|y\|_{\mathbb{D}}) + \cosh(\|x\|_{\mathbb{D}}) \cosh(d_{\mathbb{D}}(x, y))}{\sinh(\|x\|_{\mathbb{D}}) \sinh(d_{\mathbb{D}}(x, y))} \tag{58}$$

Eq. 28 is then derived from the above by an algebraic reformulation.
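As a numerical sanity check of Eq. 58, the following minimal sketch compares the cosine given by the hyperbolic cosine law with the angle measured directly in the Poincaré ball. It relies on two assumptions that are not stated in this appendix: the standard closed form of the Poincaré distance, and the fact that the initial direction of the geodesic from $x$ to $y$ is parallel to the Möbius sum $(-x) \oplus y$. Because the metric is conformal, the hyperbolic angle at $x$ is then taken to be the Euclidean angle between these directions. All function names are illustrative.

```python
import numpy as np

def mobius_add(x, y):
    # Moebius addition in the Poincare ball (assumed standard formula).
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den

def poincare_dist(x, y):
    # Closed-form Poincare distance d_D(x, y) (assumed standard formula).
    diff2 = np.dot(x - y, x - y)
    return np.arccosh(1 + 2 * diff2 / ((1 - np.dot(x, x)) * (1 - np.dot(y, y))))

def cos_angle_direct(x, y):
    # Angle at x between the geodesic towards the origin (direction -x)
    # and the geodesic towards y (direction parallel to (-x) (+) y).
    to_origin = -x
    to_y = mobius_add(-x, y)
    return np.dot(to_origin, to_y) / (np.linalg.norm(to_origin) * np.linalg.norm(to_y))

def cos_angle_eq58(x, y):
    # Right-hand side of Eq. 58, with ||.||_D = d_D(O, .).
    origin = np.zeros_like(x)
    nx, ny = poincare_dist(origin, x), poincare_dist(origin, y)
    dxy = poincare_dist(x, y)
    return (-np.cosh(ny) + np.cosh(nx) * np.cosh(dxy)) / (np.sinh(nx) * np.sinh(dxy))

# Two arbitrary points strictly inside the unit ball.
x = np.array([0.3, 0.1, -0.2])
y = np.array([0.5, -0.4, 0.1])
print(cos_angle_direct(x, y), cos_angle_eq58(x, y))
```

The two printed values should agree up to floating-point error for any pair of distinct points with $x \neq 0$ inside the ball.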

H. Training Details

For all methods except Order embeddings, we observe that initialization is very important. Being able to properly disentangle embeddings from different subparts of the graph in the initial learning stage is essential in order to train good-quality models. We conjecture that initialization is hard because these models are trained to minimize highly non-convex loss functions. In practice, we obtain our best results when initializing the embeddings corresponding to the hyperbolic cones using the Poincaré embeddings pre-trained for 100 epochs. The embeddings for the Euclidean cones are initialized using Simple Euclidean embeddings, also pre-trained for 100 epochs. For the Simple Euclidean embeddings and Poincaré embeddings, we find the burn-in strategy of (Nickel & Kiela, 2017) to be essential for a good initial disentanglement. We also observe that the Poincaré embeddings are heavily collapsed to the unit ball border (as also pictured in Fig. 3), and so we rescale them by a factor of 0.7 before starting the training of the hyperbolic cones.

Each model is trained for 200 epochs after the initialization stage, except for Order embeddings, which are trained for 500 epochs. During training, 10 negative edges are generated per positive edge by randomly corrupting one of its end points. We use a batch size of 10 for all models. For both cone models we use a margin of $\gamma = 0.01$.

All Euclidean models and baselines are trained using stochastic gradient descent. For the hyperbolic models, we do not find significant empirical improvements when using full Riemannian optimization instead of approximating it with a retraction map as done in (Nickel & Kiela, 2017). We thus use the retraction approximation since it is faster. For the cone models, we always project outside of the $\epsilon$-ball centered on the origin during learning, as constrained by Eq. 26 and its Euclidean version. For both we use $\epsilon = 0.1$. A learning rate of 1e-4 is used for both Euclidean and hyperbolic cone models.
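The per-step operations described above can be sketched as follows. This is a hedged illustration rather than the training code used for the reported results: the helper names, the upper norm cap `MAX_NORM`, and the uniform choice of which end point to corrupt are assumptions. The retraction step uses the rescaled Euclidean gradient update commonly attributed to (Nickel & Kiela, 2017), which is likewise assumed here. The constants 0.7, 0.1, 10, and 1e-4 are taken from the text.

```python
import numpy as np

EPS_INNER = 0.1       # radius of the excluded ball around the origin (from the text)
MAX_NORM = 1 - 1e-5   # assumed numerical cap keeping points strictly inside the unit ball
LEARNING_RATE = 1e-4  # learning rate for the cone models (from the text)

def rescale_pretrained(emb, factor=0.7):
    # Pull pre-trained Poincare embeddings away from the unit ball border.
    return factor * emb

def project_to_shell(emb, eps=EPS_INNER, max_norm=MAX_NORM):
    # Clip every row norm into [eps, max_norm]; exact zero rows have no
    # direction and are left unchanged in this sketch.
    norms = np.maximum(np.linalg.norm(emb, axis=-1, keepdims=True), 1e-12)
    return emb * (np.clip(norms, eps, max_norm) / norms)

def retraction_step(emb, euclid_grad, lr=LEARNING_RATE):
    # Retraction-based approximation of Riemannian SGD in the Poincare ball:
    # rescale the Euclidean gradient by (1 - ||x||^2)^2 / 4, step, then project.
    scale = (1 - np.sum(emb ** 2, axis=-1, keepdims=True)) ** 2 / 4
    return project_to_shell(emb - lr * scale * euclid_grad)

def corrupt_edge(edge, num_nodes, rng, num_neg=10):
    # Generate negatives by replacing one end point of a positive edge.
    # Simplification: a sampled negative may coincide with a true edge.
    u, v = edge
    negatives = []
    for _ in range(num_neg):
        w = int(rng.integers(num_nodes))
        negatives.append((w, v) if rng.random() < 0.5 else (u, w))
    return negatives

rng = np.random.default_rng(0)
emb = rescale_pretrained(rng.uniform(-0.5, 0.5, size=(5, 2)))
emb = retraction_step(emb, euclid_grad=np.ones_like(emb))
print(np.linalg.norm(emb, axis=-1))
print(corrupt_edge((1, 2), num_nodes=5, rng=rng))
```

In a full training loop, the Euclidean gradient would come from the margin-based cone loss evaluated on a batch of 10 positive edges and their negatives. That loss is defined in the main text and is not reproduced here.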