
Category Theory as a Unifying Database Formalism

ABSTRACT

Database theory developed from mathematical logic and set theory, but many formalisms have been introduced over time to tackle specific extensions of these core theories. This paper makes the case for using category theory as a unifying formalism for databases. We argue that category theory can capture, in a uniform and succinct way, both the basic relational model (e.g., schemas, constraints, queries, updates) and its many extensions (e.g., aggregates, transactions, triggers, schema mapping, pivoting, anonymization, and semi-structured data). We also show how existing results from the large corpus of theorems developed within pure mathematics can be naturally imported, and how automated proof assistants such as COQ could be leveraged. Finally, we observe that category theory has been successfully applied to other areas of computer science (e.g., programming languages). This gives us hope about the feasibility of our attempt, and suggests that by sharing a rigorous formal footing we could enable fascinating cross-area analysis and theories.

1. INTRODUCTION

DS: Much of the theory below would work out if we used only finite schemas (only a finite number of non-equivalent paths) and allowed only finitely many rows in each table. Would you prefer this? If so, should we still call the category of states on $\mathcal{C}$ "$\mathcal{C}$-Set", or should we replace Set with something like Fin or FSet?

The field of database research developed at the boundary of active industrial development and solid formal theories. Here, more than in any other field, this duality has produced a fascinating interplay between system builders and theoreticians. The flow of innovation has not always been a one-way transfer from new advances in theory to corresponding systems. Instead, it has been an intricate back and forth, where systems were often built before the theoretical implications were fully understood.

The core of database theory has developed from rather old and well-understood mathematics: mathematical logic and set theory. From this common starting point the database community has built a host of extensions. These were needed to provide formal underpinnings for a large number of new functionalities and new solutions to problems arising from the system development side. For example, consider the various formal advances regarding schema mapping, aggregates and user-defined functions, deductive databases, incomplete data and the management of nulls, database recovery and concurrency control, and real-time databases. DS: are we proposing to handle each of these?

Precise formal models, rooted in formal logic, have been provided for each of these areas. Even so, it often remains hard to understand precisely how different formalizations correspond to one another. At the very least, gaining this understanding requires a significant time investment. The task is often daunting, which makes it harder to operate across the boundaries of different theories. In turn, this acts as a disincentive to anyone trying to close the gap.

A similar problem faced the mathematics community in the first half of the 20th century. Mathematics was subdivided into many subfields, each with its own jargon and way of doing business. In order to advance, mathematicians needed a unifying language and formalism. Category theory was invented in the 1940s by Mac Lane and Eilenberg to strengthen the connection between topology and algebra, but it quickly spread to neighboring fields. By providing a precise language for comparing different universes of discourse, category theory has been a unifying force for mathematics.

CC: next two paragraphs need smoothing

Category theory is no stranger to computer science. It has been remarkably successful in formalizing the semantics of programming languages []. In fact, it has also been applied many times to database theory in the past. These attempts did not catch on in mainstream applications, perhaps because those models tried to be too faithful to the relational model. These formalisms did not seem to offer enough advantages to justify the learning curve, and so some database theorists developed a slight aversion to category theory. CC: We need to check who's in the PC, and avoid making enemies!

In this paper, however, we argue that database theory and category theory are naturally compatible. In fact, a basic database schema and a category are roughly the same thing. Once a simple dictionary is set up, classical category-theoretic results and constructions capture many of the formal results that appear in the database literature.

In particular, we apply categories and functors to uniformly represent schemas, integrity constraints, database states, queries, and updates, and we show that this approach is not only natural but enlightening. We also show, with a simple online demo, that simple algorithms can translate SQL schemas (with primary keys and foreign keys) and queries into the formalism of categories and functors.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

… by other communities, such as the theoretical programming language community, there is an interesting opportunity to bridge results and to enable theories and analyses that span different areas; as an example, consider security models.

DS: This paper can serve as a very fast introduction to databases for anyone familiar with computer science. Database theorists should note the economy of the definitions contained here.

Contributions. In summary, this paper makes the following contributions:

• We showcase the expressivity of category theory in modeling classical database problems, many of which required significant extensions of the relational model, such as schema mapping, transactions, and user-defined aggregates.
• We show that existing results in DB theory can be proved using category theory.
• We then argue that by adopting this formalism we can also: (i) inherit results from the large corpus of theorems produced in pure category theory research, and (ii) leverage automated proof assistants such as COQ.
• Finally, category theory has already been embraced by other communities, such as the theoretical programming language community. This creates an interesting opportunity to bridge results and to enable theories and analyses that span different areas.

The rest of this paper is organized as follows:
• Section 2 provides a crash course in category theory.
• Section 3 presents the basic application of category theory to databases that we propose.
• Section 4 shows how some of the classical database results can be proved in this new framework.
• Section 5 showcases the power of category theory beyond classical results on a simple case, and argues that much more can be done.
• Section 6 summarizes related work.
• Section ?? summarizes our conclusions.

2. BACKGROUND

This section provides the reader with a crash course in category theory. More details can be found in [?, ?, ?, ?]. Readers already familiar with the definitions of category and functor, as well as the category Set of sets, can safely skip to Section 2.1. The goal of this section is to give database researchers a core introduction to category theory. For this reason we slightly abuse our notation, and we postpone detailed comments on some set-theoretic issues (e.g., the distinction between sets and classes) to Appendix B.

Intuitively, a category is a multi-graph (i.e., a graph that can have multiple edges between the same two nodes) together with an equivalence relation on finite paths (i.e., we can declare two paths between the same nodes to be equivalent).

Mathematicians often use categories to formalize rather

Definition 2.0.1. A category $\mathcal{C}$ consists of the following components:
1. A set of objects $\mathrm{Ob}_{\mathcal{C}}$. Each element $x \in \mathrm{Ob}_{\mathcal{C}}$ is called an object of $\mathcal{C}$.
2. For each $x, y \in \mathrm{Ob}_{\mathcal{C}}$, a set $\mathrm{Arr}_{\mathcal{C}}(x, y)$. Each element $f \in \mathrm{Arr}_{\mathcal{C}}(x, y)$ is called an arrow from $x$ to $y$ in $\mathcal{C}$, and is denoted as $f : x \to y$ or as $x \xrightarrow{f} y$.
3. For each object $x \in \mathrm{Ob}_{\mathcal{C}}$, a chosen arrow $\mathrm{id}_x \in \mathrm{Arr}_{\mathcal{C}}(x, x)$ called the identity arrow on $x$.
4. For each $x, y, z \in \mathrm{Ob}_{\mathcal{C}}$, a function
$$\mathrm{comp}_{\mathcal{C}} : \mathrm{Arr}_{\mathcal{C}}(x, y) \times \mathrm{Arr}_{\mathcal{C}}(y, z) \to \mathrm{Arr}_{\mathcal{C}}(x, z)$$
called the composition law for $\mathcal{C}$. We denote $\mathrm{comp}_{\mathcal{C}}(f, g)$ simply by $f \bullet g$.

To be a category, the above components must also satisfy the following properties:

Identity law: For each arrow $f : x \to y$, the following equations hold:
$$\mathrm{id}_x \bullet f = f \quad \text{and} \quad f \bullet \mathrm{id}_y = f.$$

Associative law: Given a sequence of composable arrows $w \xrightarrow{f} x \xrightarrow{g} y \xrightarrow{h} z$, the following equation holds:
$$f \bullet (g \bullet h) = (f \bullet g) \bullet h.$$

CC: I pulled this triangle comment from the definition.

Consider the triangle formed by the arrows $f : x \to y$, $g : y \to z$, and $h : x \to z$. We say the triangle commutes iff
$$f \bullet g = \mathrm{comp}_{\mathcal{C}}(f, g) = h.$$
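Definition 2.0.1 can be made concrete for categories small enough to list in full. The Python sketch below assumes a finite category given by explicit tables:

• a set of objects;
• a dictionary of named non-identity arrows, each with its source and target;
• a composition table written in the diagrammatic order $f \bullet g$ used above.

The class name `FiniteCategory`, the `id_x` naming scheme, and the method names are hypothetical and chosen only for illustration. They are not the paper's formalism or any existing library. `check_laws` tests two things: that composition is defined and well-typed on every composable pair, and that the identity and associative laws hold. `commutes` tests the triangle condition $f \bullet g = h$.

```python
from itertools import product


class FiniteCategory:
    """Hypothetical helper: a finite category given by explicit tables.

    Composition uses diagrammatic order, as in Definition 2.0.1:
    compose(f, g) is f • g, defined when target(f) == source(g).
    """

    def __init__(self, objects, arrows, composites):
        # objects: names of objects; arrows: name -> (source, target) for the
        # non-identity arrows; composites: (f, g) -> name of the arrow f • g.
        self.objects = set(objects)
        self.arrows = dict(arrows)
        self.identity = {}
        for x in self.objects:
            self.identity[x] = f"id_{x}"
            self.arrows[f"id_{x}"] = (x, x)
        # Pre-fill the composites forced by the identity law, then add the given
        # ones (a given entry may override; check_laws will catch a violation).
        self.table = {}
        for f, (src, tgt) in self.arrows.items():
            self.table[(self.identity[src], f)] = f
            self.table[(f, self.identity[tgt])] = f
        self.table.update(composites)

    def composable(self, f, g):
        return self.arrows[f][1] == self.arrows[g][0]

    def compose(self, f, g):
        """Return f • g, or raise ValueError if f and g are not composable."""
        if not self.composable(f, g):
            raise ValueError(f"{f} and {g} are not composable")
        return self.table[(f, g)]

    def check_laws(self):
        """True iff composition is total and well-typed, unital, and associative."""
        for f, g in product(self.arrows, repeat=2):
            if not self.composable(f, g):
                continue
            h = self.table.get((f, g))
            # f • g must exist and go from source(f) to target(g).
            if self.arrows.get(h) != (self.arrows[f][0], self.arrows[g][1]):
                return False
        for f, (src, tgt) in self.arrows.items():
            # Identity law: id_src • f = f and f • id_tgt = f.
            if self.compose(self.identity[src], f) != f or self.compose(f, self.identity[tgt]) != f:
                return False
        for f, g, h in product(self.arrows, repeat=3):
            # Associative law on every composable sequence f, g, h.
            if self.composable(f, g) and self.composable(g, h):
                if self.compose(f, self.compose(g, h)) != self.compose(self.compose(f, g), h):
                    return False
        return True

    def commutes(self, f, g, h):
        """The triangle f: x -> y, g: y -> z, h: x -> z commutes iff f • g = h."""
        return self.compose(f, g) == h


# The triangle from the text: f: x -> y, g: y -> z, h: x -> z, with f • g = h.
tri = FiniteCategory(
    objects=["x", "y", "z"],
    arrows={"f": ("x", "y"), "g": ("y", "z"), "h": ("x", "z")},
    composites={("f", "g"): "h"},
)
print(tri.check_laws())             # True
print(tri.commutes("f", "g", "h"))  # True

# Same graph plus a parallel arrow k: x -> z, now chosen as the composite f • g.
tri2 = FiniteCategory(
    objects=["x", "y", "z"],
    arrows={"f": ("x", "y"), "g": ("y", "z"), "h": ("x", "z"), "k": ("x", "z")},
    composites={("f", "g"): "k"},
)
print(tri2.check_laws())             # True: still a category
print(tri2.commutes("f", "g", "h"))  # False: f • g is k, not h
```

The second example shows that commuting is a property of a particular category, not of the underlying graph. Both categories are legitimate, but only the first identifies the path $f \bullet g$ with $h$.

Storing composition as a finite table only works when the category has finitely many arrows. A schema graph with a cycle, such as a hypothetical arrow manager: Employee -> Employee, freely generates infinitely many paths. A path equation such as $\text{manager} \bullet \text{manager} = \text{manager}$ (also hypothetical) cuts the category back down to finitely many arrows. This relates to the DS note on restricting to finite schemas.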