
Directed Graph Embeddings in Pseudo-Riemannian Manifolds

Aaron Sim¹  Maciej Wiatrak¹  Angus Brayne¹  Paidí Creed¹  Saee Paliwal¹

¹BenevolentAI, London, UK. Correspondence to: Aaron Sim.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.

1. Introduction

Representation learning of symbolic objects is a central area of focus in machine learning. Alongside the design of deep learning architectures and general learning algorithms, incorporating the right level of inductive biases is key to efficiently building faithful and generalisable entity and relational embeddings (Battaglia et al., 2018).

In graph representation learning, the embedding space geometry itself encodes many such inductive biases, even in the simplest of spaces. For instance, vertices embedded as points in Euclidean manifolds, with inter-node distance guiding graph traversal and link probabilities (Grover & Leskovec, 2016; Perozzi et al., 2014), carry the underlying assumption of homophily, with node similarity as a metric function.

The growing recognition that Euclidean geometry lacks the flexibility to encode complex relationships on large graphs at tractable ranks, without loss of information (Nickel et al., 2011; Bouchard et al., 2015), has spawned numerous embedding models with non-Euclidean geometries. Examples range from complex manifolds for simultaneously encoding symmetric and anti-symmetric relations (Trouillon et al., 2016), to statistical manifolds for representing uncertainty (Vilnis & McCallum, 2015).

One key development was the introduction of hyperbolic embeddings for representation learning (Nickel & Kiela, 2017; 2018). The ability to uncover latent graph hierarchies was applied to directed acyclic graph (DAG) structures in methods like Hyperbolic Entailment Cones (Ganea et al., 2018), building upon Order-Embeddings (Vendrov et al., 2016), and Hyperbolic Disk embeddings (Suzuki et al., 2019), with the latter achieving good performance on complex DAGs with exponentially growing numbers of ancestors and descendants. Further extensions include hyperbolic generalisations of learning algorithms (Sala et al., 2018), product manifolds (Gu et al., 2018), and the inclusion of hyperbolic isometries (Chami et al., 2020).

While these methods continue to capture more complex graph structures, they are largely limited to DAGs with transitive relations, thus failing to represent many naturally occurring graphs, where cycles and non-transitive relations are common features.

In this paper we introduce pseudo-Riemannian embeddings of both DAGs and graphs with cycles. Together with a novel likelihood function with explicitly broken isometries, we are able to represent a wider suite of graph structures. In summary, the model makes the following contributions to graph representation learning, which we expand in more detail below:

• The ability to disentangle semantic and edge-based similarities using the distinction of space- and timelike separation of nodes in pseudo-Riemannian manifolds.

• The ability to capture directed cycles by introducing a compact timelike embedding dimension. Here we consider Minkowski spacetime with a circle time dimension and anti-de Sitter spacetime.

• The ability to represent chains in a directed graph that flexibly violate local transitivity. We achieve this by way of a novel edge probability function that decays, asymmetrically, into the past and future timelike directions.

We illustrate the aforementioned features of our model by conducting a series of experiments on several small, simulated toy networks. Because of our emphasis on graph topology, we focus on the standard graph embedding challenge of link prediction. Link prediction is the task of inferring the missing edges of, and often solely from, a partially observed graph (Nickel et al., 2015). Premised on the assumption that the structures of real-world graphs emerge from underlying mechanistic latent models (e.g. a biological evolutionary process responsible for the growth of a protein-protein interaction network, linguistic rules informing a language graph, etc.), performance on the link prediction task hinges critically on one's ability to render expressive graph representations, which pseudo-Riemannian embeddings allow for beyond existing embedding methods.

With this in mind, we highlight the quality of pseudo-Riemannian embeddings over Euclidean and hyperbolic embeddings in link prediction experiments using both synthetic protein-protein interaction (PPI) networks and the DREAM5 gold standard emulations of causal gene regulatory networks. Additionally, we show that our method has comparable performance to DAG-specific methods such as Disk Embeddings on the WordNet link prediction benchmark. Finally, we explore the ability of anti-de Sitter embeddings to further capture unique graph structures by exploiting critical features of the manifold, such as its intrinsic S¹ × R^N topology for representing directed cycles of different lengths.

1.1. Related Work

The disadvantages of Euclidean geometry compared to Minkowski spacetime for graph representation learning were first highlighted in Sun et al. (2015). This was followed by Clough & Evans (2017), who explore DAG representations in Lorentzian spacetime, borrowing ideas from causal set theory (Bombelli et al., 1987). More broadly, the asymptotic equivalence between complex networks and the large-scale causal structure of de Sitter spacetime was proposed and studied in Krioukov et al. (2012). Our work is notably conceptually similar to the hyperbolic disk embedding approach (Suzuki et al., 2019), which embeds a set of symbolic objects with a partial order relation ⪯ as generalized formal disks in a quasi-metric space (X, d). A formal disk (x, r) ∈ X × R is defined by a center x ∈ X and a radius r ∈ R.¹ Inclusion of disks defines a partial order on formal disks, which enables a natural representation of partially ordered sets as sets of formal disks.

These approaches all retain the partial-order transitivity assumption, where squared distances decrease monotonically into the future and past. We relax that assumption in our work, alongside considering graphs with cycles and manifolds other than Minkowski spacetime.

Pseudo-Riemannian manifold optimization was formalized in Gao et al. (2018) and specialized in Law & Stam (2020) to general quadric surfaces in Lorentzian manifolds, which includes anti-de Sitter spacetime as a special case.

For the remainder of the paper, nodes are points on a manifold M, the probabilities of edges are functions of the node coordinates, and the challenge is to infer the optimal embeddings via pseudo-Riemannian SGD on the node coordinates.

¹Suzuki et al. (2019) generalize the standard definition of a formal disk/ball to allow for negative radii.

2. Background

In this section we provide a very brief overview of the relevant topics in differential geometry.²

²For a longer introduction, see Robbin & Salamon (2013) and Isham (1999).

2.1. Riemannian Manifold Optimization

The key difference between gradient-based optimization of smooth functions f on Euclidean vs. non-Euclidean manifolds is that for the latter, the trivial isomorphism, for any p ∈ M, between a manifold M and the tangent space T_pM no longer holds in general. In particular, the stochastic gradient descent (SGD) update step p′ ← p − λ∇f|_p for learning rate λ and gradient ∇f is generalized in two areas (Bonnabel, 2013):

First, ∇f is replaced with the Riemannian gradient vector field

    ∇f → grad f := g^{-1} df,    (1)

where g^{-1} : T*_pM → T_pM is the inverse of the positive definite metric g, and df is the differential one-form. Second, the exponential map exp_p : T_pM → M generalizes the vector space addition in the update equation. For any v_p ∈ T_pM, the first-order Taylor expansion is

    f(exp_p(v_p)) ≈ f(p) + g(grad f|_p, v_p),    (2)

from which we infer that grad f defines the direction of steepest descent, i.e. the Riemannian-SGD (RSGD) update step is simply

    p′ ← exp_p(−λ grad f|_p).    (3)

The classes of manifolds considered here all have analytic expressions for the exponential map (see below). The curves traced out by exp_p(t v_p) for t ∈ R are called geodesics, the generalisation of straight lines to curved manifolds.
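To make eqs. (1)-(3) concrete, here is a minimal sketch of a single RSGD step in JAX, the framework used for the models in this paper. The manifold interface (the callables metric_inv and exp_map) and all function names are our own illustrative choices, not part of any released codebase.

    import jax
    import jax.numpy as jnp

    def rsgd_step(f, p, lr, metric_inv, exp_map):
        df = jax.grad(f)(p)            # components of the differential one-form df
        grad_f = metric_inv(p) @ df    # eq. (1): grad f = g^{-1} df
        return exp_map(p, -lr * grad_f)  # eq. (3): p' = exp_p(-lr * grad f|_p)

    # Euclidean space recovers plain SGD: g is the identity, exp_p(v) = p + v.
    def euclidean_step(f, p, lr):
        return rsgd_step(f, p, lr,
                         metric_inv=lambda p: jnp.eye(p.shape[0]),
                         exp_map=lambda p, v: p + v)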

2.2. Pseudo-Riemannian Extension

A pseudo-Riemannian (or, equivalently, semi-Riemannian) manifold is a manifold where g is non-degenerate but no longer positive definite. If g is diagonal with ±1 entries, it is a Lorentzian manifold. If g has just one negative eigenvalue, it is commonly called a spacetime manifold. A vector v_p is labelled timelike if g(v_p, v_p) is negative, spacelike if positive, and lightlike or null if zero.

It was first noted in Gao et al. (2018) that grad f is not a guaranteed ascent direction when optimizing f on pseudo-Riemannian manifolds, because its squared norm is no longer strictly positive (see eq. (2)). For diagonal coordinate charts one can simply perform a Wick rotation (Visser, 2017; Gao et al., 2018) to their Riemannian counterpart and apply eq. (3) legitimately; in all other cases, additional steps are required to reintroduce the guarantee (see eq. (11) in Section 2.3.2).

2.3. Example Manifolds

2.3.1. MINKOWSKI SPACETIME

Minkowski spacetime is the simplest Lorentzian manifold, with metric g = diag(−1, 1, ..., 1). The N + 1 coordinate functions are (x_0, x_1, ..., x_N) ≡ (x_0, x), with x_0 the time coordinate. The squared distance s² between two points with coordinates x and y is

    s² = −(x_0 − y_0)² + Σ_{i=1}^{N} (x_i − y_i)².    (4)

Because the metric is diagonal and flat, the RSGD update is made with the simple map

    exp_p(v_p) = p + v_p,    (5)

where v_p is the vector with components (v_p)^i = δ^{ij} (df_p)_j, with δ^{ij} the trivial Euclidean metric.

2.3.2. ANTI-DE SITTER SPACETIME

The (N+1)-dimensional anti-de Sitter spacetime (Bengtsson, 2018) AdS_N can be defined as an embedding in an (N+2)-dimensional Lorentzian manifold (L, g_L) with two 'time' coordinates. Given the canonical coordinates (x_{−1}, x_0, x_1, ..., x_N) ≡ (x_{−1}, x_0, x), AdS_N is the quadric surface defined by g_L(x, x) ≡ ⟨x, x⟩_L = −1, or in full,

    −x_{−1}² − x_0² + x_1² + ··· + x_N² = −1.    (6)

Another useful set of coordinates involves the polar reparameterization (x_{−1}, x_0) → (r, θ):

    x_{−1} = r sin θ,  x_0 = r cos θ,    (7)

where by simple substitution

    r ≡ r(x) = (1 + Σ_{i=1}^{N} x_i²)^{1/2}.    (8)

We define a circle time coordinate to be the arc length

    t := rθ = { r sin⁻¹(x_{−1}/r),        x_0 ≥ 0,
              { r(π − sin⁻¹(x_{−1}/r)),   x_0 < 0,    (9)

with x-dependent periodicity t ∼ t + 2πr(x).

The canonical coordinates and metric g_L are not intrinsic to the manifold, so we must treat the pseudo-Riemannian gradient from eq. (1) with the projection operator Π_p : T_pL → T_p AdS_N, defined for any v_p ∈ T_pL to be (Robbin & Salamon, 2013)

    Π_p v_p = v_p + g_L(v_p, p) p.    (10)

Furthermore, as Law & Stam (2020) proved recently for general quadric surfaces, one can sidestep the need to perform the computationally expensive Gram-Schmidt orthogonalization by implementing a double projection, i.e. we take

    ζ_p = Π_p( g_L^{-1} Π_p( g_L^{-1} df ) )    (11)

as our guaranteed descent tangent vector.

The squared distance s² between p, q ∈ AdS_N is given by

    s² = { −|cos⁻¹(−⟨p, q⟩_L)|²,     −1 < ⟨p, q⟩_L ≤ 1,
         { (cosh⁻¹(−⟨p, q⟩_L))²,     ⟨p, q⟩_L < −1,
         { 0,                        ⟨p, q⟩_L = −1,
         { −π²,                      ⟨p, q⟩_L > 1,    (12)

where the first three cases are the squared geodesic distances between time-, space-, and lightlike separated points respectively. For ⟨p, q⟩_L > 1 there are no geodesics connecting the points. However, we require a smooth loss function in s² with complete coverage for all p, q pairs, so we set s² to its value at the timelike limit ⟨p, q⟩_L = 1.

The exponential map is given by

    exp_p(ζ_p) = { cos(‖ζ_p‖) p + sin(‖ζ_p‖) ζ_p/‖ζ_p‖,
                 { cosh(‖ζ_p‖) p + sinh(‖ζ_p‖) ζ_p/‖ζ_p‖,    (13)
                 { p + ζ_p,

again for time-, space-, and lightlike ζ_p respectively, where ‖ζ_p‖ ≡ √|g_L(ζ_p, ζ_p)|.
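The following is a minimal JAX sketch of the geometric ingredients above: the Minkowski squared distance (eq. (4)), the ambient Lorentzian inner product of eq. (6), the tangent projection (eq. (10)), the double-projected descent vector following our reading of eq. (11), and the AdS exponential map (eq. (13)). Coordinate conventions follow the text (array index 0 is x_{−1}, index 1 is x_0); the function names and the small eps guard are our own.

    import jax.numpy as jnp

    def minkowski_sq_dist(x, y):
        # eq. (4): first coordinate is time, metric diag(-1, 1, ..., 1)
        d = x - y
        return -d[0] ** 2 + jnp.sum(d[1:] ** 2)

    def lorentz_inner(u, v):
        # <u, v>_L with two 'time' coordinates (x_{-1}, x_0), as in eq. (6)
        return -u[0] * v[0] - u[1] * v[1] + jnp.sum(u[2:] * v[2:])

    def project(p, v):
        # eq. (10): Pi_p v = v + <v, p>_L p, projecting onto T_p AdS_N
        return v + lorentz_inner(v, p) * p

    def descent_vector(p, df):
        # eq. (11): double projection of Law & Stam (2020); applying g_L^{-1}
        # flips the sign of the two time components
        g_inv = jnp.concatenate([-jnp.ones(2), jnp.ones(df.shape[0] - 2)])
        return project(p, g_inv * project(p, g_inv * df))

    def ads_exp(p, zeta, eps=1e-9):
        # eq. (13), branching on the causal character of zeta
        sq = lorentz_inner(zeta, zeta)
        n = jnp.sqrt(jnp.abs(sq)) + eps
        timelike = jnp.cos(n) * p + jnp.sin(n) * zeta / n
        spacelike = jnp.cosh(n) * p + jnp.sinh(n) * zeta / n
        return jnp.where(sq < 0, timelike, jnp.where(sq > 0, spacelike, p + zeta))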
3. Pseudo-Riemannian Embedding Models

In this section we describe the key components of our embedding model, starting with the definition of a probability function for a directed edge between any two nodes in a graph.

3.1. Triple Fermi-Dirac Function

The Fermi-Dirac (FD) distribution function³

    F_{(τ,r,α)}(x) := 1 / (e^{(αx − r)/τ} + 1),    (14)

with x ∈ R and parameters τ, r ≥ 0 and 0 ≤ α ≤ 1, is used to calculate the probabilities of undirected graph edges as functions of node embedding distances (Krioukov et al., 2010; Nickel & Kiela, 2017). For general directed graphs one needs to specify a preferred direction in the embedding space.⁴ This was the approach taken in Suzuki et al. (2019), using the radius coordinate, and our method follows this same principle.

³The case of α = 1 is the usual parameterization.

⁴There are situations where the probability function itself is isotropic but the features of the graph display something analogous to spontaneous symmetry breaking, where the optimal node embeddings identify a natural direction. This is indeed the case for tree-like graphs and hyperbolic embeddings, where the direction of exponential volume growth lines up with the exponential growth in the number of nodes as one goes up the tree. For general directed graphs, however, an explicit symmetry-breaking term in the function is needed in our case.

For spacetime manifolds, the time dimension is the natural coordinate indicating the preferred direction. For two points p, q ∈ M, we propose the following distribution F, which we refer to as the Triple Fermi-Dirac (TFD) function:

    F_{(τ1,τ2,α,r,k)}(p, q) := k (F1 F2 F3)^{1/3},    (15)

where k > 0 is a tunable scaling factor and

    F1 := F_{(τ1,r,1)}(s²),    (16)
    F2 := F_{(τ2,0,1)}(−Δt),    (17)
    F3 := F_{(τ2,0,α)}(Δt),    (18)

are three FD distribution terms. Here τ1, τ2, r and α are the parameters of the component FD terms (14), s² is the squared geodesic distance between p and q, and Δt is the time coordinate difference, which in the case of Minkowski spacetime is simply x_0(q) − x_0(p). It is easy to see that for k ≤ 1, F(p, q) is a valid probability value between 0 and 1.

The motivations behind eq. (15) can be understood by way of a series of simple plots of F and its component FD terms over a range of spacetime intervals in 2D Minkowski spacetime. In Figure 1 we fix p at the origin and plot F as a function of q. Concretely, we demonstrate in the remainder of this section that the TFD function combines with the pseudo-Riemannian metric structure to encode three specific inductive biases inherent in general directed graphs: 1. the co-existence of semantic and edge-based similarities, 2. the varying levels of transitivity, and 3. the presence of graph cycles.

Figure 1. A-C: The TFD function F (eq. (15)) in 2D Minkowski spacetime for varying α. D: The FD function F1 (eq. (16)) in Minkowski space. E: F1 in Euclidean space. F: F in Euclidean space. Blue to red corresponds to probabilities from 0 to 1.

3.1.1. SEMANTIC SIMILARITY VS. DIRECT NEIGHBORS

A pair of graph vertices x, y can be similar in two ways. They could define an edge, in which case they are simply neighbors, or they could have overlapping neighbor sets N_x and N_y. However, in Riemannian manifold graph embedding models, where node distances determine edge probabilities, a high degree of this latter semantic similarity suggests that x and y are in close proximity, especially if both N_x and N_y are themselves sparsely connected (Sun et al., 2015). This can be in conflict with the absence of an edge between the vertex pair. A schematic of this structure is shown in Figure 2A. For example, in sporting events this can occur when two members of a team compete against a largely overlapping set of competitors but would never be pitted against one another.

Embedding graphs in a spacetime resolves this inconsistency: for a pair of vertices that do not share an edge there is no constraint on the Jaccard index (N_x ∩ N_y)/(N_x ∪ N_y). This claim can be verified by examining the contribution from the first FD term F1 (16) in the TFD function. As shown in Figure 1D, F1 is low when x and y are spacelike separated and high when they are within each other's past and future lightcones. Our claim can then be rephrased in geometric terms by stating that two spacelike separated points (hence low edge probability) can have an arbitrarily large overlap in their lightcones (and hence in their neighbor sets).

3.1.2. DIRECTED NON-TRANSITIVE RELATIONS

Simply using F1 for edge probabilities is problematic in two ways. First, the past and future are indistinguishable (Figure 1D), and hence so are the probabilities for both edge directions of any node pair. Second, lightcones, being cones, impose a strict partial ordering on the graph. Specifically, any univariate function of s² that assigns, by way of future lightcone inclusion, a high probability to the directed chain

    p1 → p2 → p3    (19)

will always predict the transitive relation p1 → p3 with a higher probability than either p1 → p2 or p2 → p3 (see Figure 3A). This is a highly restrictive condition imposed by a naive graph embedding on a pseudo-Riemannian manifold. In real-world networks such as social networks, these friend-of-a-friend links are often missing, violating this constraint.

We tackle both the need for edge anti-symmetries and for flexible transitive closure constraints via the other two FD terms, F2 and F3 (eqs. (17) and (18)). Both terms introduce exponential decays into the respective past and future timelike directions, thereby breaking the strict partial order. Crucially, they are asymmetric due to the parameter α in F3, which introduces different decay rates. As shown in Figure 1A, the TFD function has a local maximum in the near timelike future. α controls the level of transitivity in the inferred graph embeddings, as we see in Figures 1A-C. Euclidean disk embeddings (Suzuki et al., 2019) can be viewed as approximately equivalent to the α = 0 case in flat Minkowski spacetime, where the region of high edge probability extends far out into the future lightcone (see Appendix A for a precise statement of this equivalence).

We note that the TFD function is well-defined on Riemannian spaces, as long as one designates a coordinate to take the role of the time dimension (see Figure 1E-F).

3.1.3. GRAPH CYCLES

Another feature of the TFD probabilistic model in (15) is that the lightcone boundaries are soft transitions that can be modified by adjusting the temperature hyperparameters τ1 and τ2. Specifically, short (in a coordinate-value sense) directed links into the past or in spacelike directions have probabilities close to 1/2, as can be verified by a Taylor expansion of eq. (15) around p = q = 0 (see Figure 1D).

This feature alone does not constitute a sufficient basis to promote pseudo-Riemannian embeddings with the TFD distribution as a model for cyclic graph representations (e.g. the toy example in Figure 4A). Embedding long directed cycles in this way is not efficient. For example, a length-n chain with O(n²) possible (and missing) transitive edges would have a similar embedding pattern to a fully-connected clique. This distinction is important when modeling real-world systems, such as gene regulation, in which important information is encoded in sparse networks with cycles (Leclerc, 2008). For this we turn our attention to the global topology of the manifold.
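As a concrete reference for eqs. (14)-(18), here is a minimal JAX sketch of the FD term and the TFD edge probability. The manifold-specific inputs sq_dist (eq. (4) or (12)) and dt are assumed to be computed separately; all function names are ours.

    import jax.numpy as jnp

    def fermi_dirac(x, tau, r=0.0, alpha=1.0):
        # eq. (14): F_{(tau, r, alpha)}(x) = 1 / (exp((alpha*x - r)/tau) + 1)
        return 1.0 / (jnp.exp((alpha * x - r) / tau) + 1.0)

    def tfd(sq_dist, dt, tau1, tau2, alpha, r=0.0, k=1.0):
        # eqs. (15)-(18): scaled geometric mean of three FD terms
        f1 = fermi_dirac(sq_dist, tau1, r=r)     # lightcone-inclusion term, eq. (16)
        f2 = fermi_dirac(-dt, tau2)              # decay into the past, eq. (17)
        f3 = fermi_dirac(dt, tau2, alpha=alpha)  # slower decay into the future, eq. (18)
        return k * (f1 * f2 * f3) ** (1.0 / 3.0)

For Minkowski spacetime, sq_dist is eq. (4) and dt = x_0(q) − x_0(p); with α < 1 the future decay in f3 is slower than the past decay in f2, which is what breaks the edge-direction symmetry.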

3.2. Cylinder Topology: S¹ × R^N

The problem with embedding cycles via a model that favors future-timelike directed links is the unavoidable occurrence of at least one low-probability past-directed edge. Here we propose a circular time dimension as a global topological solution. We consider two such pseudo-Riemannian manifolds with an S¹ × R^N topology: a modified cylindrical Minkowski spacetime and anti-de Sitter spacetime.

3.2.1. TFD FUNCTION ON CYLINDRICAL MINKOWSKI SPACETIME

For an (N+1)-dimensional spacetime with coordinates (x_0, x), we construct our cylinder by identifying

    x_0 ∼ x_0 + nC,    (20)

for some circumference C > 0 and n ∈ Z. To ensure a smooth TFD function on the cylinder we define a wrapped TFD function as

    F̃(p, q) := Σ_{n=−∞}^{∞} F(p, q^{(n)}),    (21)

where x_0(q^{(n)}) ≡ x_0(q) + nC, with all other coordinates equal. Unlike the case of the wrapped normal distribution, there is no (known) closed-form expression for F̃. However, provided α > 0, the exponentially decreasing probabilities from F2 and F3 into the far past and future time directions respectively enable one to approximate the infinite sum in eq. (21) by truncating to a finite sum over integers from −m to m. The scaling factor k in eq. (15) can be used to ensure max F̃ ≤ 1.

In this topology, the concept of space- and timelike separated points is purely a local feature, as there will be multiple timelike paths between any two points on the manifold.

3.2.2. TFD FUNCTION ON ANTI-DE SITTER SPACETIME

Unlike Minkowski spacetime, AdS_N already has an intrinsic S¹ × R^N topology.

To adapt the TFD distribution to AdS space, we adopt polar coordinates and let the angular difference between p and q be

    θ_pq := θ_q − θ_p,    (22)

where θ_p and θ_q correspond to the polar angle θ from eq. (7). Then we define

    Δt_pq := r_q θ_pq.    (23)

Similar to the cylindrical Minkowski case in eq. (21), we modify F such that for two points p and q we have

    F̃(p, q) = Σ_n F(p, q^{(n)}),    (24)

where the q^{(n)} ∈ AdS_N have coordinates identical to q except that

    Δt^{(n)}_pq := Δt_pq + 2πn r_q.    (25)

The wrapped TFD distribution is then identical to eq. (21), with Δt_pq as the time difference variable Δt in the F2 and F3 components (eqs. (17) and (18) respectively).
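A sketch of the truncated wrapped sum in eqs. (21) and (24)-(25), continuing the tfd helper from the sketch above. The truncation level m and the ambient-coordinate convention (x_{−1}, x_0, x) for AdS points are the assumptions here.

    import jax.numpy as jnp

    def wrapped_tfd_minkowski(p, q, C, m, **tfd_params):
        # eq. (21), truncated to n = -m..m: shift x_0(q) by n*C and re-evaluate
        total = 0.0
        for n in range(-m, m + 1):
            dq = q.at[0].add(n * C)                      # q^{(n)}: wind the time coordinate
            d = dq - p
            sq_dist = -d[0] ** 2 + jnp.sum(d[1:] ** 2)   # eq. (4) on the winding
            total = total + tfd(sq_dist, d[0], **tfd_params)
        return total

    def ads_time_diff(p, q, n=0):
        # eqs. (22)-(25): Delta t_pq^{(n)} = r_q (theta_q - theta_p) + 2*pi*n*r_q,
        # with r and theta from the polar reparameterization (7)-(8)
        r_q = jnp.sqrt(q[0] ** 2 + q[1] ** 2)
        theta_p = jnp.arctan2(p[0], p[1])
        theta_q = jnp.arctan2(q[0], q[1])
        return r_q * (theta_q - theta_p) + 2.0 * jnp.pi * n * r_q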

4. Experiments

In this section we evaluate the quality of pseudo-Riemannian embeddings via a series of graph reconstruction and link prediction experiments. Using small synthetic graphs, we begin with a demonstration of the model's ability to encode the particular set of graph-specific features outlined in Section 3. We then run the model on a series of benchmarks and an ablation study to characterize the utility of these embeddings in downstream applications. For this second group of experiments, we rely on the following three classes of directed graph datasets.

4.1. Datasets and Training Details

Duplication Divergence Model: A two-parameter model that simulates the growth and evolution of large protein-protein interaction networks (Ispolatov et al., 2005). Depending on the topology of the initial seed graphs, the final graph is either a DAG or a directed graph with cycles.

DREAM5: Gold standard edges from a genome-scale network inference challenge, comprising a set of gene regulatory networks across organisms and an in silico example (Marbach et al., 2012). These networks contain a relatively small number of cycles.

WordNet: An acyclic, hierarchical, tree-like network of nouns, each with relatively few ancestors and many descendants. We consider networks with different proportions of transitive closure. We use the same train / validation / test split as in Suzuki et al. (2019) and Ganea et al. (2018).

In all our experiments, we seek to minimize the negative log-likelihood (NLL) loss based on the probabilities (14) or (15). We fix a negative sampling ratio of 4 throughout. Similar to Nickel & Kiela (2017), we initialize our embeddings in a small random patch near the origin (x = (1, 0, ..., 0) for AdS) and perform a burn-in phase of several epochs with the learning rate scaled by a factor of 0.01. All of our models and baselines are run across various dimensions to ensure our methods do not display higher performance only at a certain dimensionality level. We provide full details of each experiment, including the hyperparameter sweeps and average runtimes, alongside a fuller description of the datasets, in Appendix B. The models in this paper were implemented in JAX (Bradbury et al., 2018).

Figure 2. A: Tri-partite graph of an unconnected node pair with a large number of common predecessors and successors. B: Euclidean embeddings, with an inferred edge probability of 0.50 between the node pair. C: Minkowski embeddings, with an inferred edge probability of 0.01.
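A minimal sketch of the training objective just described: a Bernoulli NLL over observed edges plus negatively sampled non-edges, with the edge probability supplied by the TFD function. The batching scheme and the edge_prob interface (a callable mapping batches of coordinate pairs to probabilities) are our own simplifications.

    import jax
    import jax.numpy as jnp

    def nll_loss(params, pos_pairs, neg_pairs, edge_prob, eps=1e-7):
        # params: (num_nodes, dim) array of node coordinates
        # pos_pairs / neg_pairs: (B, 2) and (4*B, 2) index arrays (neg. ratio 4)
        p_pos = edge_prob(params[pos_pairs[:, 0]], params[pos_pairs[:, 1]])
        p_neg = edge_prob(params[neg_pairs[:, 0]], params[neg_pairs[:, 1]])
        return -(jnp.sum(jnp.log(p_pos + eps))
                 + jnp.sum(jnp.log(1.0 - p_neg + eps)))

    # The differential df of eq. (1) is then one jax.grad away:
    # df = jax.grad(nll_loss)(params, pos_pairs, neg_pairs, edge_prob)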

4.2. Graph Feature Encoding

Node pairs with arbitrarily large overlapping neighbor sets can remain unconnected. We construct a graph consisting of an unconnected node pair and two large sets of unconnected, common predecessors and successors (Figure 2A). The Euclidean embedding model trained with the FD edge probability function cannot reconcile the high semantic similarity of the node pair with the absence of an edge; we see that the pair of nodes are unavoidably closer to each other than to the majority of their neighbor connections (Figure 2B). On the contrary, the Minkowski model effectively encodes the tri-partite graph as three sets of spacelike separated points with a high degree of overlap in the lightcones of the nodes in the pair (Figure 2C).

F can be tuned to encode networks with varying proportions of transitive relations. We construct a simple 10-node chain and a fully transitive variant to explore our ability to tune the TFD loss to best capture graph transitive structure, using the α parameter to vary probability decay in the time direction. We use two α values, α = 0.001 and α = 0.075, corresponding to the heatmaps shown in Figures 1A and C. In Figure 3, we see that smaller values of α are able to more successfully capture full transitivity. In the case of a simple chain, setting α = 0.001 results in a higher probability being assigned to negative edges, compared to the α = 0.075 case.

Figure 3. Performance of Minkowski embeddings for varying α values in the TFD function on two example graphs. A: Schematic of a three-node graph, with green edges indicating a simple chain and orange edges indicating a full transitive closure. B: 10-node chain. C: 10-node fully transitive DAG. Black and blue markers indicate positive and negative directed edges respectively. Also annotated is the NLL of the graph/model pair.

Graph cycles wrap around the S¹ dimension. We embed a simple five-node loop (Figure 4A) in 2D Minkowski (B), cylindrical Minkowski (C), and anti-de Sitter (D) spacetimes. In all three manifold embeddings, we recover the true node ordering, as reflected in the ordered time coordinates. However, for the Minkowski manifold there is one positive edge (1 → 2 in this case) that unavoidably gets incorrectly assigned a very low probability, as indicated in Figure 4E. On the contrary, the S¹ time dimension of cylindrical Minkowski and AdS spacetime ensures that all edges have high probabilities, thereby demonstrating the utility of the circle time dimension. We note that all the pseudo-Riemannian embedding models, including the (non-cylindrical) Minkowski case, outperform the Euclidean (NLL: 17.24) and hyperboloid (NLL: 13.89) models (not shown in Figure 4).

Figure 4. A: Illustration of a 5-node chain used as training data. B-D: 5-node chain embedding on various manifolds. E: Edge probabilities for the 5-node cycle using Minkowski (Min.), cylindrical Minkowski (Cyl. Min.) and AdS embeddings. The five positive edges are indicated as black dots while the 15 negative edges are the colored dots.

4.3. Cyclic Graph Inference

We perform link prediction on the Duplication Divergence Model network and the DREAM5 in silico⁵ dataset. The results on randomly held-out positive edges for a range of embedding models of various dimensions are presented in Table 1. Here we see that for both datasets pseudo-Riemannian embeddings significantly outperform the Euclidean and hyperboloid embedding baselines across all dimensions, with the cylindrical Minkowski model achieving the best result.

⁵For full results on the DREAM5 networks for Escherichia coli and Saccharomyces cerevisiae please refer to Appendix B. The conclusions there are similar to those of the main text experiment.

Next we perform an ablation study to examine the relative contributions of 1. the pseudo-Riemannian geometry, as reflected in the presence of negative squared distances, 2. the global cylindrical topology, and 3. the TFD vs. FD likelihood model. The results are presented in Figure 5. We see that all three components combine to produce the superior performance of the anti-de Sitter and cylindrical Minkowski manifold embeddings, with the S¹ topology arguably being the biggest contributor.

Figure 5. The average precision values of link prediction on the Duplication Divergence Model graph across manifold / likelihood model combinations and embedding dimensions, over multiple random seeds. M1: Euclidean + TFD, M2: Hyperboloid + FD, M3: Euclidean + FD, M4: cylindrical Euclidean + TFD, M5: Minkowski + TFD, M6: Anti-de Sitter + TFD, M7: cylindrical Minkowski + TFD.

4.4. DAG Inference

We test our method on the WordNet dataset, a DAG that has been used in prior work on DAG inference (Nickel & Kiela, 2017; Ganea et al., 2018; Suzuki et al., 2019).

Table 1. Link prediction for directed cyclic graphs. We show the median average precision (AP) percentages across 20 random initializations on a held-out test set, calculated separately for different embedding dimensions d. The top-performing model for each dimension is the cylindrical Minkowski + TFD model throughout. For reference, the asymptotic random baseline AP is 20%.

                                 Duplication Divergence Model  |      DREAM5: in silico
                              d=3   d=5   d=10  d=50  d=100    | d=3   d=5   d=10  d=50  d=100
Euclidean + FD                37.8  39.4  39.0  38.9  38.9     | 29.4  32.9  39.7  39.8  34.8
Hyperboloid + FD              36.3  37.5  38.2  38.2  38.1     | 28.8  46.8  50.8  50.9  52.5
Minkowski + TFD               43.7  47.5  48.5  48.5  48.5     | 36.3  43.1  51.2  57.7  58.0
Anti-de Sitter + TFD          50.1  52.4  56.2  56.3  56.8     | 38.1  45.2  51.9  55.6  56.0
Cylindrical Minkowski + TFD   55.8  61.6  65.3  65.7  65.6     | 41.0  48.4  56.3  58.9  61.0

Table 2 shows the performance of the pseudo-Riemannian approach on the link prediction task, benchmarked against a set of competitive methods (results taken from Suzuki et al. (2019)). We break down the methods into flat- and curved-space embedding models. Our approach outperforms all of the other flat-space as well as some curved-space methods, including Poincaré embeddings (Nickel & Kiela, 2017). Furthermore, the Minkowski embedding model achieves competitive performance with the hyperbolic and spherical approaches. These methods are well suited to representing the hierarchical relationships central to WordNet, so this result shows that pseudo-Riemannian models are highly capable of representing hierarchies by encoding edge direction. We show that the representational power of a special case of the TFD probability on flat Minkowski spacetime is similar to that of Euclidean Disk Embeddings (see Appendix A). The difference in performance between Euclidean Disk Embeddings and our model on this task could be due to the additional flexibility allowed by the TFD probability function or to differences in the resulting optimization problem, something that could be further explored in future work.

Table 2. F1 percentage score on the test data of WordNet. The best flat-space performance (top half) for each dataset/embedding dimension combination has been highlighted in gray and the best overall is in bold. The benchmark methods' results were taken from Suzuki et al. (2019). The full results table is presented in Appendix B.

                                           d=5          |      d=10
Transitive Closure Percentage           25%    50%      |  25%    50%
Minkowski + TFD (Ours)                  86.2   92.1     |  89.3   94.4
Order Emb. (Vendrov et al., 2016)       75.9   82.1     |  79.4   84.1
Euclidean Disk (Suzuki et al., 2019)    42.5   45.1     |  65.8   72.0
Spherical Disk (Suzuki et al., 2019)    90.5   93.4     |  91.5   93.9
Hyperbolic EC (Ganea et al., 2018)      87.1   92.8     |  90.8   93.8
Hyperbolic Disk (Suzuki et al., 2019)   81.3   83.1     |  90.5   94.2
Poincare Emb. (Nickel & Kiela, 2017)    78.3   83.9     |  82.1   85.4

5. Discussion and Future Work

While Riemannian manifold embeddings have made significant advances in representing complex graph topologies, there are still areas in which these methods fall short, due largely to the inherent constraints of the metric spaces used. In this paper, we demonstrated how pseudo-Riemannian embeddings can effectively address some of the open challenges in graph representation learning, summarized in the following key results. Firstly, we are able to successfully represent directed cyclic graphs using the novel Triple Fermi-Dirac probability function and the cylindrical topology of the manifolds, and to disambiguate direct neighbors from functionally similar but unlinked nodes, as demonstrated in a series of experiments on synthetic datasets. Also, Minkowski embeddings strongly outperform Riemannian baselines on a link prediction task on directed cyclic graphs, and achieve results comparable with state-of-the-art methods on DAGs across various dimensions. In this latter case, applying our approach gave us the flexibility to lift the constraints of transitivity due to the temporal decay of the TFD probability function. Using these characteristics, we demonstrate superior performance on a number of directed cyclic graphs: the duplication-divergence model and a set of three DREAM5 gold standard gene regulatory networks.

There are two areas for further work. First, in addition to further validation of the pseudo-Riemannian optimization procedure introduced in Law & Stam (2020), one can experiment with AdS coordinates different from the ones proposed in Section 3.2.2, or extend the methods in this work to more general classes of pseudo-Riemannian manifolds. Second, we note that the field most closely related to link prediction is network inference: inferring causal networks from observational and interventional data is a critical problem with a wide range of applications. In biology, for example, robust and accurate inference of gene regulatory networks from gene expression data is a long-standing grand challenge in systems biology. In this case, limited ground-truth connectivity is usually already available as a prior, so a logical extension of the model presented here would be a hybrid model that combines pseudo-Riemannian-based link prediction with metric space network inference.

Acknowledgements

We would like to thank Craig Glastonbury for helpful discussions and suggestions on the manuscript.

References

Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A., Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N., Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y., and Pascanu, R. Relational inductive biases, deep learning, and graph networks. arXiv, 2018.

Bengtsson, I. Anti-de Sitter Space, 2018. URL http://3dhouse.se/ingemar/relteori/Kurs.pdf.

Bombelli, L., Lee, J., Meyer, D., and Sorkin, R. D. Space-time as a causal set. Physical Review Letters, 59(5):521, 1987.

Bonnabel, S. Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013.

Bouchard, G., Singh, S., and Trouillon, T. On approximate reasoning capabilities of low-rank vector spaces. AAAI Spring Symposia, 2015.

Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang, Q. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.

Chami, I., Wolf, A., Juan, D.-C., Sala, F., Ravi, S., and Ré, C. Low-dimensional hyperbolic knowledge graph embeddings. arXiv preprint arXiv:2005.00545, 2020.

Clough, J. R. and Evans, T. S. Embedding graphs in Lorentzian spacetime. PLOS ONE, 12(11):e0187301, 2017.

Ganea, O., Becigneul, G., and Hofmann, T. Hyperbolic entailment cones for learning hierarchical embeddings. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, pp. 1646–1655, Stockholmsmässan, Stockholm, Sweden, 2018.

Gao, T., Lim, L.-H., and Ye, K. Semi-Riemannian manifold optimization. arXiv, 2018.

Grover, A. and Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, 2016.

Gu, A., Sala, F., Gunel, B., and Ré, C. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2018.

Isham, C. J. Modern Differential Geometry for Physicists, volume 61. World Scientific Publishing Company, 1999.

Ispolatov, I., Krapivsky, P. L., and Yuryev, A. Duplication-divergence model of protein interaction network. Physical Review E, 71:061911, 2005.

Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., and Boguñá, M. Hyperbolic geometry of complex networks. Physical Review E, 82:036106, 2010.

Krioukov, D., Kitsak, M., Sinkovits, R. S., Rideout, D., Meyer, D., and Boguñá, M. Network cosmology. Scientific Reports, 2(1):1–6, 2012.

Law, M. T. and Stam, J. Ultrahyperbolic representation learning. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems 33, 2020.

Leclerc, R. D. Survival of the sparsest: robust gene networks are parsimonious. Molecular Systems Biology, 4(1):213, 2008.

Marbach, D., Costello, J. C., Küffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., Allison, K. R., Aderhold, A., Bonneau, R., Chen, Y., Collins, J. J., Cordero, F., Crane, M., Dondelinger, F., Drton, M., Esposito, R., Foygel, R., de la Fuente, A., Gertheiss, J., Geurts, P., Greenfield, A., Grzegorczyk, M., Haury, A.-C., Holmes, B., Hothorn, T., Husmeier, D., Huynh-Thu, V. A., Irrthum, A., Karlebach, G., Lèbre, S., Leo, V. D., Madar, A., Mani, S., Mordelet, F., Ostrer, H., Ouyang, Z., Pandya, R., Petri, T., Pinna, A., Poultney, C. S., Rezny, S., Ruskin, H. J., Saeys, Y., Shamir, R., Sîrbu, A., Song, M., Soranzo, N., Statnikov, A., Vera-Licona, P., Vert, J.-P., Visconti, A., Wang, H., Wehenkel, L., Windhager, L., Zhang, Y., Zimmer, R., Kellis, M., and Stolovitzky, G. Wisdom of crowds for robust gene network inference. Nature Methods, 9(8):796–804, 2012.

Nickel, M. and Kiela, D. Poincaré embeddings for learning hierarchical representations. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 6338–6347, 2017.

Nickel, M. and Kiela, D. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80, pp. 3779–3788, 2018.

Nickel, M., Tresp, V., and Kriegel, H.-P. A three-way model for collective learning on multi-relational data. In Getoor, L. and Scheffer, T. (eds.), Proceedings of the 28th International Conference on Machine Learning, pp. 809–816, 2011.

Nickel, M., Jiang, X., and Tresp, V. Reducing the rank in relational factorization models by including observable patterns. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 27, pp. 1179–1187, 2014.

Nickel, M., Murphy, K., Tresp, V., and Gabrilovich, E. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2015.

Perozzi, B., Al-Rfou, R., and Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, 2014.

Robbin, J. W. and Salamon, D. A. Introduction to differential geometry, March 2013.

Sala, F., De Sa, C., Gu, A., and Ré, C. Representation tradeoffs for hyperbolic embeddings. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 4460–4469, 2018.

Sun, K., Wang, J., Kalousis, A., and Marchand-Maillet, S. Space-time local embeddings. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 28, pp. 100–108, 2015.

Suzuki, R., Takahama, R., and Onoda, S. Hyperbolic disk embeddings for directed acyclic graphs. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, pp. 6066–6075, 2019.

Trouillon, T., Welbl, J., Riedel, S., Gaussier, E., and Bouchard, G. Complex embeddings for simple link prediction. In Balcan, M. F. and Weinberger, K. Q. (eds.), Proceedings of the 33rd International Conference on Machine Learning, pp. 2071–2080, 2016.

Vendrov, I., Kiros, R., Fidler, S., and Urtasun, R. Order-embeddings of images and language. In Bengio, Y. and LeCun, Y. (eds.), 4th International Conference on Learning Representations, 2016.

Vilnis, L. and McCallum, A. Word representations via Gaussian embedding. In Bengio, Y. and LeCun, Y. (eds.), 3rd International Conference on Learning Representations, 2015.

Visser, M. How to Wick rotate generic curved spacetime. arXiv, 2017.

A. Relationship with Disk Embeddings

We prove the following result, relating Euclidean Disk Embeddings (Suzuki et al., 2019) to the Triple Fermi-Dirac (TFD) model in Minkowski spacetime.

Proposition 1. Let p = (x, u) and q = (y, v) denote elements of R^n × R, let D be the Euclidean distance between x and y, and let T = v − u be the time difference. Then, for the choice of TFD function parameters α = 0, r = 0, k = 1, we have F(p, q) ≥ 1/2 if and only if

    D ≤ ( τ1 log( (3 − e^{−T/τ2}) / (1 + e^{−T/τ2}) ) + T² )^{1/2}.    (26)

Proof. From the definition of the Triple Fermi-Dirac probability on Minkowski spacetime we have

    F(p, q) = ( 1/(e^{(D² − T²)/τ1} + 1) · 1/(e^{−T/τ2} + 1) · 1/2 )^{1/3}.

Then,

    F(p, q) ≥ 1/2
        ⟺ 1/(e^{(D² − T²)/τ1} + 1) · 1/(e^{−T/τ2} + 1) ≥ 1/4
        ⟺ e^{(D² − T²)/τ1} ≤ (3 − e^{−T/τ2}) / (1 + e^{−T/τ2})
        ⟺ D ≤ ( τ1 log( (3 − e^{−T/τ2}) / (1 + e^{−T/τ2}) ) + T² )^{1/2}.

The set of points p and q from R^n × R that satisfies the condition for inclusion of Euclidean disks, which determines the embeddings that are connected by directed edges, is given by D ≤ T. For T ≥ 0, every pair p and q which satisfies D ≤ T also satisfies (26); hence the set of points p, q ∈ R^n × R which correspond to a directed edge in the Euclidean Disk Embeddings model is strictly contained in the set of pairs of points which have F(p, q) ≥ 1/2 under the Triple Fermi-Dirac probability function on (flat) Minkowski spacetime. Moreover, the difference between these two sets is a small neighbourhood of the Minkowski lightcone, with the size of this set dependent on the parameters τ1 and τ2. Figure 6 illustrates this in the R × R case for τ1 = τ2 = 0.05 (the parameters used by the model in the experiments on the WordNet dataset).

Figure 6. Boundaries of the regions containing points q ∈ R × R with F(p, q) ≥ 1/2 (red) and with D ≤ T (blue), for p = (0, 0).
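As a quick numerical sanity check of Proposition 1 (not part of the paper's experiments; all names and parameter values below are our own), one can verify that the TFD value crosses 1/2 exactly at the boundary of eq. (26):

    import jax.numpy as jnp

    def tfd_flat(D, T, tau1, tau2):
        # TFD with alpha = 0, r = 0, k = 1 on flat Minkowski spacetime: F3 = 1/2
        f1 = 1.0 / (jnp.exp((D**2 - T**2) / tau1) + 1.0)
        f2 = 1.0 / (jnp.exp(-T / tau2) + 1.0)
        return (f1 * f2 * 0.5) ** (1.0 / 3.0)

    tau1 = tau2 = 0.05
    T = 0.3
    # boundary value of eq. (26)
    D_star = jnp.sqrt(tau1 * jnp.log((3 - jnp.exp(-T / tau2))
                                     / (1 + jnp.exp(-T / tau2))) + T**2)
    print(tfd_flat(D_star, T, tau1, tau2))                 # ~0.5, on the boundary
    print(tfd_flat(D_star - 0.01, T, tau1, tau2) > 0.5)    # True: inside the region
    print(tfd_flat(D_star + 0.01, T, tau1, tau2) < 0.5)    # True: outside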

Table 3. Statistics of datasets used in the experiments.

Dataset                   Nodes    Edges    Cyclic
Duplication Divergence    100      1,026    True
DREAM5: In silico         1,565    4,012    True
DREAM5: E. coli           1,081    2,066    True
DREAM5: S. cerevisiae     1,994    3,940    True
WordNet                   82,115   743,086  False

B. Experiment Details

B.1. Graph Datasets

This section describes in detail the experiments conducted in our work, including dataset and evaluation specifications. A summary of the datasets is provided in Table 3.

B.2. Hyperparameters and Training Details

The model hyperparameters for the experiments on the Duplication Divergence Model and DREAM5 datasets are given in Table 6. We performed a Cartesian product grid search and selected the optimal values based on the median average precision on the test set over three random initializations. We implemented a simple linear learning-rate decay schedule to a final rate of a quarter of the initial rate λ; hence, although we selected the best test set performance across all the training run epochs, the maximum number of epochs affects the performance as it determines the rate of learning-rate decay.

The hyperparameter tuning approach for the WordNet experiments was performed differently, as it includes an F1 score threshold selection. This is described separately in Appendix B.2.3 below.

B.2.1. DUPLICATION DIVERGENCE MODEL

The duplication divergence model is a simple two-parameter model of the evolution of protein-protein interaction networks (Ispolatov et al., 2005). Starting from a small, fully-connected, directed graph of n_i nodes, the network is grown to size n_f as follows. First we duplicate a random node from the set of existing nodes. Next, the duplicated node is linked to the neighbors of the original node with probability p1, and to the original itself with probability p2. This step is repeated (n_f − n_i) times. Because the model only includes duplicated edges with no extra random attachments, it is easily shown that if the seed network is a directed acyclic graph (DAG), the network remains a DAG throughout its evolution. We generate one cyclic graph from an initial 3-node seed graph, with parameters (n_i, n_f, p1, p2) = (3, 100, 0.7, 0.7), and perform a single random 85/15 train-test split for all experiments. A sketch of this generative process is given after Section B.3 below.

B.2.2. DREAM5

Traditionally a network inference problem, the DREAM5 challenge (Marbach et al., 2012) is a set of tasks with corresponding datasets. The goal of the challenge and each task is to infer genome-scale transcriptional regulatory networks from gene-expression microarray datasets. We use only the gold standard edges from three DREAM5 gene regulatory networks for our use case: In silico, Escherichia coli, and Saccharomyces cerevisiae. The fourth, Staphylococcus aureus, network from the DREAM5 challenge was not considered here due to the issues in evaluation described in Marbach et al. (2012). We extract only the positive-regulatory nodes from the remaining three networks while omitting the gene-expression data itself. We limit ourselves to positive-regulation edges as our work focuses on single edge-type graphs. The extracted node pairs form a directed graph, where an edge between nodes (i, j) represents a regulatory relation between two genes. Each network is then randomly split into train and test sets, following an 85/15 split. Although the networks are cyclic, the number of cycles in each of them is relatively low, which may account for the similarity in performance of the (non-cylindrical) Minkowski embeddings with embeddings on cylindrical Minkowski and AdS (e.g. the E. coli dataset experiments in Table 5). Similar to the Duplication Divergence model above, we evaluate the performance using the average precision on the test set.

B.2.3. WORDNET

To further evaluate the embeddings of directed acyclic graphs (DAGs), we use the WordNet (Miller, 1998) database of the noun hierarchy. The WordNet dataset is an example of a tree-like network with a low number of ancestors and a high number of descendants. To ensure fair comparison we use the same dataset and split as in Suzuki et al. (2019), proposed in Ganea et al. (2018). During training and evaluation, we use nodes (i, j) ∈ T connected by an edge such that c_i ⪯ c_j as positive examples. As the dataset consists only of positive examples, we randomly sampled a set of non-connected negative pairs for each positive pair. For evaluation, we use an F1 score for a binary classification assessing whether a given pair of nodes (i, j) is connected by a directed edge in the graph. Similarly to Suzuki et al. (2019), we compute the results based on the percentage of transitive closure of the graph.

As WordNet is a DAG, we restrict our attention to the model based on flat Minkowski spacetime. Here, the key hyperparameters are the learning rate and the values of α, τ1 and τ2 (we set r = 0 and k = 1 in these experiments). We perform a Cartesian product grid search and select the optimal values based on the mean F1 score on the validation set over two random initializations (seeds), using 50 epochs and batch size 50. The threshold for the F1 score was chosen by line search and tuned. The grid of values is as follows, with the optimal values given in bold: λ ∈ {0.01, 0.02, 0.1}, α ∈ {0., 0.0075, 0.075}, τ1 ∈ {0.005, 0.05, 1.} and τ2 ∈ {0.005, 0.05, 1.}. We then trained a model with the selected parameters for 150 epochs using 5 different random seeds to generate the results reported in Table 4. Here, we report the test set F1 score using a threshold selected via line search on the validation set.

B.3. Model Complexity and Runtimes

The model complexity is similar to standard (Euclidean) embedding models. For the pseudo-Riemannian manifold examples in this paper, the computation of the pseudo-Riemannian SGD vector from the JAX autodiff-computed value for the differential df scales linearly with embedding dimension. The approximation of the infinite sum in the wrapped TFD function introduces an O(m) complexity factor, where m is the number of windings one uses in the approximation (see eq. (21) in the main text).

The cyclic graph link prediction experiments were performed as CPU-only single-process runs. The longest runtime for the 100-dimensional AdS manifold embedding model was ~19 sec/epoch for the Duplication Divergence model graph and ~112 sec/epoch for the in silico DREAM5 dataset. The equivalent runtimes for the other manifolds are approximately an order of magnitude shorter.

The WordNet experiments were performed on NVIDIA V100 GPUs. The runtimes were largely similar for the cases with embedding dimension 5 or 10, with the proportion of the transitive closure included in the training data (i.e. the training data size) being the main factor determining runtime in our experiments. These varied from ~60 sec/epoch when 0% was included to ~280 sec/epoch when 50% was included.

Table 4. F1 percentage score on the test data of WordNet. The best flat-space performance (top half) for each dataset/embedding dimension combination has its background highlighted in gray, and the best overall is highlighted in bold. The benchmark methods' results were taken from Suzuki et al. (2019). For the results of our method, we report the median together with the standard deviation across seeds (N=5).

                                                        d=5                                    |                 d=10
Transitive Closure Percentage        0%          10%         25%         50%                   | 0%          10%        25%         50%
Minkowski + TFD (Ours)               21.2 ± 0.5  77.8 ± 0.1  86.2 ± 0.7  92.1 ± 0.3            | 24 ± 0.5    82 ± 1.2   89.3 ± 0.7  94.4 ± 0.1
Order Emb. (Vendrov et al., 2016)    34.4        70.6        75.9        82.1                  | 43.0        69.7       79.4        84.1
Euclidean Disk (Suzuki et al., 2019) 35.6        38.9        42.5        45.1                  | 45.6        54.0       65.8        72.0
Spherical Disk (Suzuki et al., 2019) 37.5        84.8        90.5        93.4                  | 42.0        86.4       91.5        93.9
Hyperbolic Disk (Suzuki et al., 2019) 32.9       69.1        81.3        83.1                  | 36.5        79.7       90.5        94.2
Hyperbolic EC (Ganea et al., 2018)   29.2        80.0        87.1        92.8                  | 32.4        84.9       90.8        93.8
Poincare Emb. (Nickel & Kiela, 2017) 28.1        69.4        78.3        83.9                  | 29.0        71.5       82.1        85.4

Table 5. Link prediction for directed cyclic graphs with embedding dimension d. Reported are the median average precision (AP) percentages with standard deviation across seeds (N=20), calculated on a held-out test set for varying embedding dimension. The top-performing model for each dataset and dimension was annotated in bold in the original. For reference, the random baseline AP is 20%.

Duplication Divergence
  Embedding dimension            3           5           10          50          100
  Euclidean + FD                 37.8 ± 2.8  39.4 ± 2.4  39.0 ± 1.9  38.9 ± 1.9  38.9 ± 1.9
  Hyperboloid + FD               36.3 ± 2.2  37.5 ± 2.4  38.2 ± 2.3  38.2 ± 2.4  38.1 ± 2.3
  Minkowski + TFD                43.7 ± 2.2  47.5 ± 2.5  48.5 ± 3.7  48.5 ± 3.7  48.5 ± 3.7
  Anti-de Sitter + TFD           50.1 ± 3.2  52.4 ± 3.3  56.2 ± 3.2  56.3 ± 3.1  56.8 ± 3.0
  Cylindrical Minkowski + TFD    55.8 ± 3.6  61.6 ± 4.8  65.3 ± 4.1  65.7 ± 3.1  65.6 ± 3.2

DREAM5: in silico
  Euclidean + FD                 29.4 ± 2.1  32.9 ± 2.5  39.7 ± 1.8  39.8 ± 1.6  34.8 ± 1.1
  Hyperboloid + FD               28.8 ± 5.5  46.8 ± 4.6  50.8 ± 7.4  50.9 ± 1.5  52.5 ± 1.5
  Minkowski + TFD                36.3 ± 2.3  43.1 ± 3.1  51.2 ± 3.0  57.7 ± 2.8  58.0 ± 2.7
  Anti-de Sitter + TFD           38.1 ± 4.8  45.2 ± 2.3  51.9 ± 5.2  55.6 ± 4.2  56.0 ± 3.4
  Cylindrical Minkowski + TFD    41.0 ± 3.6  48.4 ± 7.3  56.3 ± 8.4  58.9 ± 2.9  61.0 ± 1.9

DREAM5: E. coli
  Euclidean + FD                 33.0 ± 3.9  34.2 ± 3.4  40.2 ± 4.3  44.5 ± 2.6  49.0 ± 3.2
  Hyperboloid + FD               43.4 ± 4.1  47.2 ± 3.3  52.7 ± 1.9  53.6 ± 1.4  50.6 ± 0.7
  Minkowski + TFD                51.0 ± 4.0  58.4 ± 2.3  63.4 ± 3.6  67.7 ± 2.7  68.2 ± 2.4
  Anti-de Sitter + TFD           42.7 ± 3.7  56.5 ± 2.6  61.8 ± 6.8  63.3 ± 4.8  63.0 ± 7.5
  Cylindrical Minkowski + TFD    50.3 ± 3.3  56.8 ± 3.4  62.3 ± 3.3  65.8 ± 3.4  63.2 ± 2.4

DREAM5: S. cerevisiae
  Euclidean + FD                 33.0 ± 2.7  34.2 ± 2.8  40.2 ± 3.3  44.5 ± 3.5  49.0 ± 2.0
  Hyperboloid + FD               29.2 ± 2.5  37.9 ± 1.3  46.5 ± 1.6  48.8 ± 1.4  47.9 ± 1.2
  Minkowski + TFD                34.7 ± 2.2  38.6 ± 1.9  46.4 ± 3.1  52.7 ± 3.0  54.0 ± 2.5
  Anti-de Sitter + TFD           37.2 ± 3.2  41.3 ± 1.5  44.9 ± 2.5  47.5 ± 3.1  49.4 ± 3.3
  Cylindrical Minkowski + TFD    37.4 ± 3.2  42.7 ± 2.3  46.8 ± 3.5  53.4 ± 2.2  54.6 ± 2.1

Table 6. Cyclic graph inference hyperparameters. Optimal values are in bold.* 

*In a situation where the optimal value was dependent on the embedding dimension, we highlight all the values that recorded best performance on at least one embedding dimension.

[The body of this table did not survive the text extraction. The grid covered the circumference (cylindrical manifolds only), τ1, τ2, λ, r, batch size, α, and maximum epochs, for each dataset/manifold combination: the Duplication Divergence Model and the DREAM5 In Silico, E. Coli, and S. Cerevisiae networks, each across Euclidean, Hyperboloid, Minkowski, Cylindrical Minkowski, and AdS manifolds.]