Relational Databases and Indexed Categories

Robert Rosebrugh and R. J. Wood

ABSTRACT. A description of relational databases in categorical terminology given here has as intended application the study of database dynamics; in particular we view (i) updates as database objects in a suitable category indexed by a topos; (ii) L-fuzzy databases as database objects in sheaves. Indexed categories are constructed to model the databases on a fixed family of domains and also all databases for a varying family of domains. Further, we show that the process of constructing the relational completion of a relational database is a monad in a category of functors.

Research partially supported by grants from NSERC Canada. Diagrams typeset using Catmac. This paper is in final form and no version of it will be submitted for publication elsewhere.

Introduction

We use the term relation for a subobject of a finite product of objects in a category. Following the relational database literature, we use the term domain for an object of the ambient category, and warn readers that these are not the ordered objects which go by the name domain elsewhere in theoretical Computer Science. A relational database, as defined by E. F. Codd, is first of all a family of relations or tables on a family of domains. A heavily used example of a domain is the set of character strings over an alphabet. Thus domains should be logically permitted to be infinite, though in practice they are always finite sets, e.g. character strings up to a fixed maximum length. The theory of databases as families of relations views domains simply as discrete objects. We adopt that point of view for this paper, though the domains of practice usually have at least an order structure.

A very brief example will serve to illustrate the concepts mentioned so far. We introduce three domains, name, address and phone, which can each be viewed as sets of character strings satisfying appropriate constraints. An example of a database on this family of domains is the family of two relations addressbook and phonebook, where addressbook is a subobject of name × address and phonebook is a subobject of name × phone.

Clearly the storage and manipulation of databases is an important part of computing practice. The theory of relational databases is well developed, and the relational model for databases is now the most widely implemented. Earlier database paradigms (network and hierarchical) are still found in many older systems. They are not as amenable to theoretical treatment, do not provide a portable conceptual structure, and are of decreasing interest. Moreover, there is active current research on enhancements and extensions of the relational model. Current editions of the texts by Date or Ullman contain pointers to this work.

The theory of families we use is the theory of indexed categories as studied by Paré and Schumacher. Indexed categories are a widely used categorical tool, but have only begun to be explicitly used in theoretical computer science relatively recently.

The relational algebra of relational database theory involves operations which are set-theoretic, and other operations which can be defined by a language involving only constants, variables of domain or relation type, and equality. An objective of this article is to construct the relational completion of a database as the action of a monad, so that relationally complete databases are algebras for this monad.

The next section gives some examples and then the construction of a required family. After that we describe databases as families of relations in an $\mathbf{S}$-indexed category $\mathcal{A}$ and construct an indexed category of databases for a fixed family of domains. We then return to examples, including updates and fuzzy databases. A further section considers the effect of varying domains and attributes, and finds an indexed category of all databases in an indexed category. We then construct the relational completion monad. We find that the endofunctor part of the monad is an endofunctor on the fibration which arises
from the indexed category of databases. Finally, we observe that relationally complete databases are categories of relations.

The Setting

We will freely use the notion of indexed category, so we first describe the basic language of indexed categories. We begin with a base category $\mathbf{S}$, which is required to have finite limits. Moreover, for our description of database objects, $\mathbf{S}$ must allow construction of free monoids. It suffices to assume that $\mathbf{S}$ is an elementary topos with natural numbers object $N$. Appropriate examples of $\mathbf{S}$ include the category of sets and functions, $\mathbf{set}$, any topos of diagrams or presheaves, or any Grothendieck topos. A topos which will interest us below is $\mathbf{set}^{\mathbf{2}}$, the topos whose objects are functions in $\mathbf{set}$ and whose arrows are commutative squares.

An $\mathbf{S}$-indexed category $\mathcal{A}$ is given by a category $\mathcal{A}^I$ for each object $I$ in $\mathbf{S}$ and a functor $\alpha^*\colon \mathcal{A}^I \to \mathcal{A}^J$ for each arrow $\alpha\colon J \to I$ in $\mathbf{S}$. These substitution functors are subject to isomorphisms making them compatible with identities and composition in $\mathbf{S}$ and coherent with associativity. For example, if $\beta\colon K \to J$ is also in $\mathbf{S}$ then there is a canonical isomorphism $(\alpha\beta)^* \cong \beta^*\alpha^*$. For a complete description see Paré and Schumacher.

We will often want $\mathcal{A}$ to be just $\mathbf{S}$ with a canonical indexed structure. We denote it by $\mathcal{S}$, with $\mathcal{S}^I$ defined to be the slice category $\mathbf{S}/I$ and the required substitutions defined by pullbacks. We detail two examples of $\mathcal{S}$ now.

Example. When we take $\mathbf{S}$ to be the category $\mathbf{set}$, we find that the $\mathbf{set}$-indexed category of sets has, for any set $I$, ordinary $I$-indexed families of sets as its $I$-indexed families. This follows since $\mathbf{set}/I$ has functions with codomain $I$ as objects. Such a function $x\colon X \to I$, say, may be identified with a family of sets $(X_i)_{i \in I}$ defined by $X_i = x^{-1}(i)$, and conversely. In fact any category is $\mathbf{set}$-indexed, again taking $I$-indexed families to be just ordinary families of objects.

Example. When $\mathbf{S}$ is $\mathbf{set}^{\mathbf{2}}$ we get a more interesting indexing. The indexing objects are now functions in $\mathbf{set}$, e.g. $I\colon I_0 \to I_1$, and an $I$-indexed family $x\colon X \to I$, being an arrow of $\mathbf{set}^{\mathbf{2}}$, is a pair of functions $x_0\colon X_0 \to I_0$, $x_1\colon X_1 \to I_1$ making a commutative square in $\mathbf{set}$: $x_1 X = I x_0$. Substitutions are defined by pullbacks, which are computed pointwise.

Example. Another example of a $\mathbf{set}^{\mathbf{2}}$-indexed category arises when we allow the object $X$ above to be replaced by a partial function, which we will denote $X\colon X_0 \supseteq X_d \to X_1$. Thus when $I = 1$ we obtain the category whose objects are partial functions and whose morphisms from $X$ to $Y$, say, are pairs of functions $f_0\colon X_0 \to Y_0$ and $f_1\colon X_1 \to Y_1$ such that the restriction of $f_0$ to $X_d$ factors through $Y_d$ by $f_d$, and $Y f_d = f_1 X$. An $I$-indexed family is a pair $x_0\colon X_0 \to I_0$, $x_1\colon X_1 \to I_1$ so that $x_1 X = I x_d$, with $x_d$ the restriction of $x_0$ to $X_d$. A morphism in $I$-indexed families is a pair $(f_0, f_1)$ of functions, compatible with the maps to $I$, so that in

$$\begin{array}{ccccc}
X_0 & \supseteq & X_d & \xrightarrow{\;X\;} & X_1 \\
\Big\downarrow{\scriptstyle f_0} & & & & \Big\downarrow{\scriptstyle f_1} \\
Y_0 & \supseteq & Y_d & \xrightarrow{\;Y\;} & Y_1 \\
\Big\downarrow{\scriptstyle y_0} & & & & \Big\downarrow{\scriptstyle y_1} \\
I_0 & & \xrightarrow{\;I\;} & & I_1
\end{array}$$

$f_0$ restricts to $f_d\colon X_d \to Y_d$ and $f_1 X = Y f_d$. Substitution is still accomplished by pointwise pullback, including on the domain of full definition $X_d$. We denote the resulting indexed category by $\mathbf{set}^{pf}$.

We want to define a relational database to be a $J$-indexed family of relations in $\mathcal{A}$ on some $I$-indexed family of domains, say $A$. A central feature of indexed category theory is that it identifies a $J$-indexed family of structures as a structure in the category of $J$-families; e.g. a $J$-indexed family of groups is a group in $J$-families. Similarly, a $J$-indexed family of relations is a single relation in the category $\mathcal{A}^J$ of $J$-indexed families. To define a relational database in $\mathcal{A}$ we need to be able to say when a $J$-indexed family of relations is a family of subobjects of finite products of domains in $A$. In order to make this requirement precise we will need some notation and some hypotheses on $\mathcal{A}$. The remainder of this section provides this background.

For an object $I$ of $\mathbf{S}$ we denote the free monoid on $I$ by $M(I)$. Henceforth we assume that $M(I)$ exists in $\mathbf{S}$. It is well known that $M(I)$ exists in any topos with a natural numbers object. We also need to assume, and do so for the remainder of this paper, that $\mathcal{A}$ has finite products. This requires that each $\mathcal{A}^I$ has finite products preserved by the $\alpha^*$. If $A$ is an
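object of $\mathcal{A}^I$ we will need the $M(I)$-indexed family of finite products of members of $A$, denoted $P(A)$. When $\mathbf{S}$ is $\mathbf{set}$ the family desired has as fibre over a word $w = i_1 i_2 \cdots i_k \in M(I)$ the finite product whose description is $A_{i_1} \times A_{i_2} \times \cdots \times A_{i_k}$. We conclude this section by finding sufficient conditions for the existence of $P(A)$.

Under suitable hypotheses the required family of finite products can be constructed as a solution to a recursion problem for the indexed functor "crossing with $A$". We recall that for $\mathbf{S}$-indexed categories $\mathcal{A}$ and $\mathcal{B}$, an indexed functor $F\colon \mathcal{A} \to \mathcal{B}$ is a family of functors $F^I\colon \mathcal{A}^I \to \mathcal{B}^I$, one for each $I$ in $\mathbf{S}$. Further, for any arrow $\alpha\colon J \to I$ in $\mathbf{S}$ we must have the squares

$$\begin{array}{ccc}
\mathcal{A}^I & \xrightarrow{\;F^I\;} & \mathcal{B}^I \\
{\scriptstyle \alpha^*}\Big\downarrow & & \Big\downarrow{\scriptstyle \alpha^*} \\
\mathcal{A}^J & \xrightarrow{\;F^J\;} & \mathcal{B}^J
\end{array}$$

commuting up to coherent isomorphism.

A recursion problem on $\mathcal{A}$ is a pair $(C, \Phi)$ with $\Phi$ an indexed endofunctor of $\mathcal{A}$ and $C$ in $\mathcal{A}^1$. The recursion problem $(C, \Phi)$ has a solution if there is an object $\bar{C}$ in $\mathcal{A}^N$ such that $0^*\bar{C} \cong C$ and $s^*\bar{C} \cong \Phi^N\bar{C}$, where $0\colon 1 \to N$ and $s\colon N \to N$ are the zero and successor arrows of the natural numbers object.

An hypothesis we shall need on $\mathcal{A}$ is that it has an indexed functor $E\colon \mathcal{A} \to \mathcal{S}$ with small fibres. An indexed functor $E$ has small fibres when the objects of $\mathcal{A}^I$ whose image under $E^I$ is a given object of $\mathcal{S}^I$ form a family indexed by an object of $\mathbf{S}/I$. An indexed functor with codomain $\mathcal{S}$ and small fibres has previously been called an e-functor. The name refers to "elements", since the idea is that $E^I$ gives a very rough idea of the cardinality of an object in $\mathcal{A}^I$. Examples include the identity functor on $\mathcal{S}$ and forgetful functors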
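
When the base category is the category of sets, the constructions of this section can be made concrete. The following Python sketch is only an illustration under that assumption: it shows a function into I viewed as a family of fibres, substitution along a function J → I, the fibre of P(A) over a word, and the introductory addressbook/phonebook example checked as a family of relations. All function names, the sample data, and the `schema` dictionary (which records a word of domain names for each relation) are hypothetical and are not taken from the paper.

```python
from itertools import product


def fibres(x, index_set):
    """View a function x: X -> I (a dict element -> index) as the family X_i = x^{-1}(i)."""
    return {i: {e for e, xi in x.items() if xi == i} for i in index_set}


def substitute(alpha, family):
    """Substitution along alpha: J -> I (a dict): the fibre over j is family[alpha(j)].

    For sets this agrees, up to isomorphism, with pulling back along alpha.
    """
    return {j: family[i] for j, i in alpha.items()}


def product_fibre(family, word):
    """Fibre of P(A) over a word w = i_1 ... i_k: the product A_{i_1} x ... x A_{i_k}.

    The empty word gives the one-element set {()}, i.e. the empty product.
    """
    return set(product(*(family[i] for i in word)))


def is_database(database, schema, family):
    """Check that every relation is a subset of the product named by its word."""
    return all(set(rel) <= product_fibre(family, schema[j])
               for j, rel in database.items())


# Hypothetical toy data: an I-indexed family of domains with I = {name, address, phone}.
domains = {
    "name": {"ann", "bob"},
    "address": {"1 Elm St", "2 Oak Ave"},
    "phone": {"555-0100", "555-0101"},
}

# Assumed bookkeeping: each relation name is sent to a word over I.
schema = {
    "addressbook": ("name", "address"),
    "phonebook": ("name", "phone"),
}

database = {
    "addressbook": {("ann", "1 Elm St"), ("bob", "2 Oak Ave")},
    "phonebook": {("ann", "555-0100")},
}
```

With these definitions, `is_database(database, schema, domains)` holds for the sample data, while a tuple such as `("ann", "555-0100")` placed in `addressbook` would be rejected because it does not lie in name × address.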
