Fast Left-Kan Extensions Using the Chase

David I. Spivak · Ryan Wisnesky
August 30, 2021

Abstract We show how any chase algorithm from relational database theory that computes initial models on finite-limit theories can be used to compute left-Kan extensions of set-valued functors, and prove that the core chase and the parallel chase both compute initial models on finite-limit theories. We also describe an optimized implementation of the parallel chase specialized to left-Kan computations that achieves an order-of-magnitude improvement in our performance benchmarks compared to the next fastest left-Kan extension algorithm we are aware of.

Keywords Computational category theory · left-Kan extensions · the Chase · Data migration · Data integration

1 Introduction

Left-Kan extensions [8] are used for many purposes in automated reasoning: to enumerate the elements of finitely presented algebraic structures such as monoids; to construct semi-decision procedures for Thue (equational) systems; to compute the cosets of groups; to compute the orbits of a group action; to compute quotients of sets by equivalence relations; and more. Left-Kan extensions are described category-theoretically, and we assume a knowledge of category theory [3] in this paper, but see the next section for a review.

Let C and D be categories and F : C → D a functor. Given a functor J : D → Set, where D → Set (also written Set^D) is the category of functors from D to the category of sets, Set, we define Δ_F(J) : C → Set := J ∘ F, and think of Δ_F as a functor from D → Set to C → Set. Δ_F has a left adjoint, which we write as Σ_F, taking functors in C → Set to functors in D → Set. That is, natural transformations Σ_F(I) → J correspond bijectively, naturally in I and J, to natural transformations I → Δ_F(J). Given a functor I : C → Set, the functor Σ_F(I) : D → Set is called the left-Kan extension [8] of I along F.
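One use listed above is computing the quotient of a set by an equivalence relation. That computation is itself a left-Kan extension. Take the following setup:

- C is the category with two objects R and X and two parallel non-identity arrows s, t : R → X.
- D is the terminal category, which has a single object.
- F is the unique functor C → D.

Then Σ_F(I) at the single object is the colimit of I. This is the coequalizer of I(s), I(t) : I(R) → I(X), namely I(X) modulo the equivalence relation generated by I(s)(r) ~ I(t)(r).

The minimal Python sketch below computes only this special case, using a union-find structure. It is not the general left-Kan algorithm discussed in this paper. The following are hypothetical, illustrative assumptions:

- the names `UnionFind` and `coequalize`;
- encoding I(X) and I(R) as Python sets;
- encoding I(s) and I(t) as dicts.

```python
class UnionFind:
    """Disjoint-set forest over a finite set of hashable elements."""

    def __init__(self, elems):
        self.parent = {e: e for e in elems}

    def find(self, x):
        # Walk to the root, halving the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coequalize(X, R, s, t):
    """Coequalizer of s, t : R -> X in Set (hypothetical helper).

    X, R: finite sets standing in for I(X) and I(R).
    s, t: dicts standing in for the functions I(s) and I(t).
    Returns (classes, eta): the equivalence classes of X under the
    relation generated by s(r) ~ t(r), and the quotient map eta,
    which sends each x to a canonical representative of its class.
    """
    uf = UnionFind(X)
    for r in R:
        uf.union(s[r], t[r])
    eta = {x: uf.find(x) for x in X}
    classes = {}
    for x, rep in eta.items():
        classes.setdefault(rep, set()).add(x)
    return [frozenset(c) for c in classes.values()], eta


if __name__ == "__main__":
    X = {1, 2, 3, 4, 5}
    R = {"a", "b"}
    s = {"a": 1, "b": 2}
    t = {"a": 2, "b": 3}
    classes, eta = coequalize(X, R, s, t)
    print(sorted(sorted(c) for c in classes))  # [[1, 2, 3], [4], [5]]
```

In this sketch, `eta` plays the role of the X-component of the unit I → Δ_F(Σ_F(I)). The property `eta[s[r]] == eta[t[r]]` for every r is exactly the coequalizing condition.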
Left-Kan extensions always exist, up to unique isomorphism, but they need not be finite (i.e., Σ_F(I)(d) may have infinite cardinality for some object d ∈ D, even when I(c) has finite cardinality for every object c ∈ C). In this paper we describe how to compute finite left-Kan extensions when C, D, and F are finitely presented and I is finite, a semi-computable problem originally solved in [8] and significantly improved upon in [7].

Both authors at Conexus AI.

1.1 Motivation

Our interest in left-Kan extensions comes from their use in data migration [24, 28, 25], where C and D represent database schemas, F represents a “schema mapping” [16] defining a translation from schema C to D, and I represents an input C-database (often called an instance) that we wish to migrate to schema D. Our implementation of the fastest left-Kan algorithm we knew of from the existing literature [7] was impractical for large input instances. Yet it bore a striking operational resemblance to an algorithm from relational database theory known as the chase [10], which is also used to solve data migration problems and for which efficient implementations are known [5]. The chase takes a set of formulae F in a fragment of first-order logic known by several names:

– to logicians, as existential Horn logic [10];
– to category theorists, as regular logic [22];
– to database theorists, as embedded dependencies [10];
– to topologists, as lifting problems [27].

From these formulae it constructs an F-model chase_F(I) that is weakly initial among other such “F-repairs” of I: it admits a homomorphism, not necessarily unique, to every other such repair.
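To make the chase concrete, here is a minimal Python sketch of a naive restricted (“standard”) chase. Its scope is deliberately narrow:

- It handles only tuple-generating dependencies, i.e. formulae of the form body(x) → ∃y head(x, y).
- It handles no equality-generating dependencies.
- It is neither the core chase nor the parallel chase studied in this paper.

The names `Var`, `Null`, `match`, and `chase` are hypothetical choices made for illustration. So is the encoding of facts as (relation, tuple) pairs.

The sketch works as follows:

- A trigger fires only when its head is not already satisfied.
- Existential variables are filled with fresh labelled nulls.
- A `max_rounds` guard stops runs that do not terminate.

When the loop does stop, every dependency holds in the result. For tuple-generating dependencies, a terminating chase of this kind is known to yield a universal model, i.e. one that is weakly initial in the sense described above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    """A variable occurring in a dependency."""
    name: str


@dataclass(frozen=True)
class Null:
    """A labelled null invented by the chase for an existential variable."""
    ident: int


def match(atoms, facts, env):
    """Yield every extension of env that maps all atoms into facts."""
    if not atoms:
        yield env
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for frel, fargs in facts:
        if frel != rel or len(fargs) != len(args):
            continue
        new = dict(env)
        ok = True
        for a, v in zip(args, fargs):
            if isinstance(a, Var):
                # Bind an unbound variable, or check an existing binding.
                if new.setdefault(a, v) != v:
                    ok = False
                    break
            elif a != v:  # constants must match exactly
                ok = False
                break
        if ok:
            yield from match(rest, facts, new)


def chase(facts, tgds, max_rounds=100):
    """Restricted chase; tgds is a list of (body_atoms, head_atoms) pairs."""
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        added = set()
        for body, head in tgds:
            for env in list(match(body, facts, {})):
                # Skip triggers whose head is already satisfied.
                if next(match(head, facts | added, env), None) is not None:
                    continue
                ext = dict(env)
                for _, args in head:
                    for a in args:
                        if isinstance(a, Var) and a not in ext:
                            ext[a] = Null(next(fresh))  # existential witness
                for rel, args in head:
                    added.add((rel, tuple(ext[a] if isinstance(a, Var) else a
                                          for a in args)))
        if not added:
            return facts
        facts |= added
    raise RuntimeError("chase did not terminate within max_rounds")
```

For example, chase three Emp facts (two employees in "cs", one in "math") with the single rule Emp(e, d) → ∃m Dept(d, m). The result gains exactly one Dept fact per distinct department, each with a fresh null as its second column. This happens because the second "cs" trigger finds its head already satisfied.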
1.2 Contributions

In this paper, we:
– show how any chase algorithm that computes initial models on finite-limit theories [1] (regular theories where every ∃ quantifier is read as “exists-unique”) can be used to compute left-Kan extensions of set-valued functors; and,
– prove that the core chase [10] and the parallel chase [10] compute initial models on finite-limit theories; and,
– describe an optimized left-Kan extension algorithm, inspired by the parallel chase, that achieves an order-of-magnitude improvement in our performance benchmarks compared to the next fastest left-Kan extension algorithm we are aware of [7].

1.3 Outline

This paper is structured as follows:

– In the next section we review category theory [3] and then describe a running example of a left-Kan extension.
– In section 2 we describe how to compute left-Kan extensions using chases that produce initial models on finite-limit theories, and we prove that the core chase and the parallel chase construct initial models on finite-limit theories.
– In section 3 we describe our particular left-Kan algorithm implementation, compare it to the algorithm in [7], and provide experimental performance results.
– We conclude in section 4 by discussing the differences between the chase as used in relational database theory and as used in this paper.

We assume knowledge of formal logic and algebraic specification at the level of [2]. Knowledge of left-Kan extensions at the level of [8] and of the chase at the level of [10] is helpful.

1.4 Review of Category Theory

In this section, we review standard definitions and results from category theory [3]. A category C is an axiomatically defined algebraic structure similar to a group, ring, or monoid.
It consists of:
– a set (or class), Ob(C), the members of which we call objects, and
– for every two objects c1, c2, a set (or class) C(c1, c2), the members of which we call morphisms (or arrows) from c1 to c2, and
– for every three objects c1, c2, c3, a function ∘_{c1,c2,c3} : C(c2, c3) × C(c1, c2) → C(c1, c3), which we call composition, and
– for every object c, an arrow id_c ∈ C(c, c), which we call the identity for c.

We may write f : c1 → c2 instead of f ∈ C(c1, c2), and drop object subscripts on id and ∘ when doing so does not create ambiguity. The sets and functions above must obey axioms stating that ∘ is associative and id is its unit:

\[ \mathrm{id} \circ f = f \qquad f \circ \mathrm{id} = f \qquad f \circ (g \circ h) = (f \circ g) \circ h \]

Suppose f : c1 → c2 and g : c2 → c1 satisfy f ∘ g = id and g ∘ f = id. Then f and g are said to be inverse to each other, and each is called an isomorphism. We may write f ; g or f.g to indicate g ∘ f, and write c ∈ C to indicate c ∈ Ob(C) when it is clear that c is an object.

An example category is the category of sets and total functions, Set. Its objects are all the “small” sets in some set theory, such as ZFC. Its morphisms X → Y are the total functions from X to Y, represented as sets. The isomorphisms of Set are exactly the bijections. Programming languages often form categories, with types as objects and programs taking inputs of type t1 and returning outputs of type t2 as morphisms t1 → t2.

A relational database schema consisting of single-part foreign keys and single-part unique identifiers also forms a category, say C. We may then consider C-databases as functors C → Set [24]. These have the pleasant property that natural transformations of such functors correspond exactly to the C-database homomorphisms in the sense of relational database theory [10], bearing in mind some caveats discussed in the conclusion to this paper.

A functor F : C → D between categories C and D consists of:
– a function F : Ob(C) → Ob(D), and
– for every c1, c2 ∈ Ob(C), a function F_{c1,c2} : C(c1, c2) → D(F(c1), F(c2)), where we may omit object subscripts when they can be inferred,
such that

\[ F(\mathrm{id}_c) = \mathrm{id}_{F(c)} \qquad F(f \circ g) = F(f) \circ F(g). \]

A natural transformation h : F → G between functors F, G : C → D consists of a family of morphisms h_c : F(c) → G(c), indexed by objects c ∈ C and called the components of h. These must satisfy h_{c2} ∘ F(f) = G(f) ∘ h_{c1} for every f : c1 → c2 in C. A natural transformation is called a natural isomorphism when all of its components are isomorphisms. Given (small [3]) categories C and D, the functors from C to D form a category, C → D (also written D^C), whose morphisms are natural transformations. Small categories and the functors between them likewise form a category in the obvious way.

The family of equations defining a natural transformation may be rendered as a commutative diagram:

\[
\begin{array}{ccc}
F(c_1) & \xrightarrow{F(f)} & F(c_2) \\
\downarrow{\scriptstyle h_{c_1}} & & \downarrow{\scriptstyle h_{c_2}} \\
G(c_1) & \xrightarrow{G(f)} & G(c_2)
\end{array}
\]

Such a diagram asserts that composites of morphisms along any two paths with the same start object and the same end object are equal in D. Here there are two such paths, east-then-south and south-then-east. Their equality is exactly the naturality equation h_{c2} ∘ F(f) = G(f) ∘ h_{c1}.

Because categories are algebraic objects, they can be defined by generators and relations (or, as we like to say, generators and equations), in a manner similar to, e.g., groups. Formally, let G be a directed multigraph with nodes N and edges E. A path from node n1 to node nk is a possibly 0-length list of consecutive (composable) edges

\[ n_1 \xrightarrow{e_1} n_2 \xrightarrow{e_2} \cdots \xrightarrow{e_{k-1}} n_k. \]

An equation p = q (from node n1 to node nk) is a pair of paths p and q, each from n1 to nk. Let Eq be a set of equations, and let p and q be paths, each from n1 to nk. We say that p and q are equivalent according to Eq if p can be rewritten into q (in the sense of a Thue system [3]) using Eq together with:

– the usual axioms of equality (reflexivity, symmetry, transitivity), and
– congruence: if p = q, then p.e = q.e and e.p = e.q, whenever these composites are defined.
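The notion of path equivalence above can be explored mechanically. The minimal Python sketch below does a breadth-first search over paths. It applies each equation of Eq in either direction to any subpath, which builds in symmetry and two-sided congruence. It then reports whether q is reachable from p.

The word problem for finitely presented monoids (one-object categories) is undecidable in general. This is therefore only a semi-decision procedure, bounded by a maximum path length. It returns:

- True if it finds q;
- False if it exhausts the finite set of paths equivalent to p without hitting the bound;
- None if the bound cut the search short.

The following are hypothetical choices made for illustration:

- the helper names `target`, `rewrites`, and `equivalent`;
- edges given as a dict from edge name to (source, target);
- a path given as (start node, tuple of edge names) in diagrammatic order;
- an equation given as (source node, lhs, rhs).

```python
from collections import deque


def target(path, edges):
    """End node of a path given as (start_node, tuple_of_edge_names)."""
    node, es = path
    for e in es:
        node = edges[e][1]
    return node


def rewrites(path, edges, equations):
    """Yield every path obtained from `path` by one rewrite step."""
    start, es = path
    nodes = [start]
    for e in es:
        nodes.append(edges[e][1])  # nodes[i] is where edge es[i] starts
    for node, lhs, rhs in equations:
        for a, b in ((lhs, rhs), (rhs, lhs)):  # use each equation both ways
            for i in range(len(es) + 1):
                # The node check matters when `a` is empty (an identity).
                if nodes[i] == node and es[i:i + len(a)] == a:
                    yield (start, es[:i] + b + es[i + len(a):])


def equivalent(p, q, edges, equations, max_len=10):
    """True, False, or None (unknown: the length bound was hit)."""
    if p[0] != q[0] or target(p, edges) != target(q, edges):
        return False  # paths with different endpoints are never equal
    seen, queue, truncated = {p}, deque([p]), False
    while queue:
        cur = queue.popleft()
        if cur == q:
            return True
        for nxt in rewrites(cur, edges, equations):
            if len(nxt[1]) > max_len:
                truncated = True  # part of the class lies beyond the bound
            elif nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None if truncated else False
```

For instance, take edges f : A → B and g : B → A with the single equation f.g = id_A, encoded as ("A", ("f", "g"), ()). Then the call equivalent(("A", ("f", "g", "f")), ("A", ("f",)), edges, eqs) returns True after a single rewrite.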
