
MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming

Valter Uotila, Jiaheng Lu (University of Helsinki); Dieter Gawlick, Zhen Hua Liu, Souripriya Das (Oracle Corporation); Gregory Pogossiants (SATS Technologies)

ABSTRACT
The variety of data is one of the important issues in the era of Big Data. The data are naturally organized in different formats and models, including structured data, semi-structured data, and unstructured data. Prior research has envisioned an approach to abstract multi-model data with a schema category and an instance category by using category theory. In this paper, we demonstrate a system, called MultiCategory, which processes multi-model queries based on category theory and functional programming. This demo is centered around four main scenarios to show a tangible system. First, we show how to build a schema category and an instance category by loading different models of data, including relational, XML, key-value, and graph data. Second, we show a few examples of query processing by using the functional programming language Haskell. Third, we demo the flexible outputs with different models of data for the same input query.
Fourth, to better understand the category theoretical structure behind the queries, we offer a variety of graphical hooks to explore and visualize queries as graphs with respect to the schema category, as well as the query processing procedure with Haskell.

arXiv:2109.00929v1 [cs.DB] 30 Aug 2021

1 INTRODUCTION
The variety of data is one of the most important issues that modern data management systems face in coping with the challenge of Big Data. In many applications, data sources are naturally organized in different formats and models, including structured, semi-structured, and unstructured data. To address the challenge of variety, multi-model databases have begun to emerge: they manage multi-model data together on a single database platform, with a fully integrated backend that handles the demands for performance and scalability [3].

Let us consider an example of a multi-model data environment. Figure 1 illustrates an e-commerce application that contains customers, a social network, and order information in four distinct data models. The property graph data carry information about mutual relationships between the customers, i.e., who knows whom, and some customer properties such as name and credit limit. The geographic location of customers is stored in a relational table. In XML documents, each order has an ID and a sequence of ordered products, each of which includes a product number, name, and price. The fourth type of data, key/value pairs, contains the relations between different data sets. In a typical application like customer-360-view, database users need to analyze the information from these four data sources together to enable a holistic analysis of customer behavior.

[Figure 1: A multi-model data environment. Panels: relational data (table locations with columns locationId, cityName, country, e.g., 10 Helsinki Finland and 11 Beijing China); graph data (customers Mary, Erica, and Bob with name, creditLimit, and location properties, connected by knows edges); XML data (orders and products, e.g., order 34e5e79 containing product 2343f, Toy, price 10); key-value data (located, linking Location and Customer; ordered and orderedBy, linking Customer and Order).]

Category theory was developed by mathematicians in the 1940s and has been successfully applied in many areas of science, including computer science. Recent research initiatives have applied category theory to the database area. In particular, Spivak [6, 7] used a schema category and an instance functor to model relational databases. Liu et al. [2] promoted category theory as a new mathematical foundation for reasoning about declarative constructions and transformations between various data models.

While previous works have envisioned the theoretical significance of modeling and managing data with category theory, this demonstration presents a proof-of-concept implementation of MultiCategory, a system that supports multi-model query processing based on category theory. The core parts of the system are written in the functional programming language Haskell, which is widely recognized to have a strong connection to category theory. The data storage framework of MultiCategory is built on the concepts of schema and instance categories [6], and the query processing structure is based on catamorphisms and foldable data structures [1]. With these key properties, we create a system that consistently integrates relational, hierarchical, and graph data models, and we show how category theory offers valuable perspectives on multi-model query representation and processing.

In brief, the demonstration of MultiCategory offers the following to the audience:
• category theoretical and functional programming oriented methods of querying and accessing multi-model data with a unified schema;
• a unified query language endowed with Haskell's lambda expressions, allowing users to submit one query that accesses different models of data seamlessly;
• the flexibility to output the same result in different data models, which gives users the opportunity to exploit the same data in different representations;
• an interactive visualizer of the schema and instance categories, as well as of the query processing procedure, to help users better understand the theoretical structure behind the queries.

In our demo, attendees are welcome to compose their own queries, following the syntax of our query language, to search the multi-model datasets. The source code of this system is available on GitHub [8], and the demo video can be watched online on YouTube [9].

2 PRELIMINARIES
In this section, we first review the mathematical definition of a category [4], followed by descriptions of schema and instance categories, which are influenced by [6, 7].

Definition 2.1.
A category $\mathcal{C}$ consists of a collection of objects denoted by $\mathrm{Ob}(\mathcal{C})$ and a collection of morphisms denoted by $\mathrm{Hom}(\mathcal{C})$. For each morphism $f \in \mathrm{Hom}(\mathcal{C})$ there exists an object $A \in \mathrm{Ob}(\mathcal{C})$ that is the domain of $f$ and an object $B \in \mathrm{Ob}(\mathcal{C})$ that is the target of $f$. In this case we write $f \colon A \to B$. We require that all defined compositions of morphisms are included in $\mathcal{C}$: if $f \colon A \to B \in \mathrm{Hom}(\mathcal{C})$ and $g \colon B \to C \in \mathrm{Hom}(\mathcal{C})$, then $g \circ f \colon A \to C \in \mathrm{Hom}(\mathcal{C})$. We assume that the composition operation is associative and that for every object $A \in \mathrm{Ob}(\mathcal{C})$ there exists an identity morphism $\mathrm{id}_A \colon A \to A$ such that $f \circ \mathrm{id}_A = f$ and $\mathrm{id}_A \circ f = f$ whenever the composition is defined.

Informally speaking, we can understand a category as a graph endowed with the composition rule.

Figure 2 constructs a unified schema category, which represents the schema information of the multi-model data environment in Figure 1. Conceptually, objects in a schema category are data types of two kinds: (1) predefined data types, such as string, integer, rational, and boolean; and (2) entities, such as customers and products. Morphisms are defined as typed functions between these data types; for example, a customer is located in a certain location, and an order is ordered by a customer. Furthermore, it is important to note that a schema category presents a single unified view of different models of data. Based on this view, we develop a unified query mechanism that processes different models of data seamlessly.

An instance category models how the concrete data instances are stored. Each object of the schema category is mapped to the corresponding typed Haskell data structure in the instance category (see Figure 2).

[Figure 2: An example of a category theoretical construction. Panels: the schema category, with entity objects Customer, Order, Product, and Location, predefined data types Integer, String, and Boolean, and morphisms such as orderedBy, located, knows(customer1, customer2), contains(order, product), and attributes (customerId, customerName, creditLimit, orderId, productId, productName, productPrice, locationId, zipCode, city, address, country); the instance functor with collection constructor functors; and the instance category, with a hierarchical instance of Orders, a graph of Customer instances, sets of Location and Product instances, sets of Integers and Strings, and {True, False}.]

In schema and instance categories, we can follow any path to form a well-defined function between the start node and the end node of the path, thanks to the composition rule in Definition 2.1. For example, one morphism (edge) in the instance category states that customer $A$ makes order $B$, and another morphism states that order $B$ includes product $C$. By the composition rule, there is a well-defined morphism stating that customer $A$ buys product $C$. This compositionality property is important for guaranteeing the correctness of programs that traverse multiple data models.

3 SYSTEM OVERVIEW
In this section, we provide an overview of MultiCategory's architecture, query language, and query processing mechanism. For more details about the technical solutions, a tutorial, and an installation guide, see MultiCategory's documentation on GitHub [8].

Figure 3 depicts the architecture of MultiCategory, which consists of the frontend and the backend. In particular, the frontend creates a web interface and data visualizations for relational data, […]
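A minimal Haskell sketch can make the path composition of Section 2 and the fold-based query processing more concrete. It is not MultiCategory's code or query syntax: the record types, the toy data (invented for illustration), and the functions `makes`, `includes`, `buys`, `totalSpent`, and `distinctProducts` are all hypothetical. As an assumption of this sketch, one-to-many morphisms such as "customer makes order" are modeled as Kleisli arrows `a -> [b]` in the list monad and composed with `>=>` from `Control.Monad`; `pure` plays the role of the identity morphism and the monad laws give associativity, so these arrows satisfy Definition 2.1. The query `totalSpent` is a `foldr` with a lambda expression, and `foldr` is the catamorphism of lists.

```haskell
import Control.Monad ((>=>))
import Data.List (nub)

-- Hypothetical objects of a schema-category fragment (cf. Figure 2),
-- modeled as Haskell record types.
data Customer = Customer { customerId :: Int, customerName :: String }
  deriving (Eq, Show)

data Product = Product { productId :: String, productPrice :: Int }
  deriving (Eq, Show)

data Order = Order
  { orderId   :: String
  , orderedBy :: Int        -- morphism Order -> Customer, stored as a customer id
  , contains  :: [Product]  -- one-to-many morphism Order -> Product
  } deriving (Eq, Show)

-- Toy instance data, invented for illustration only.
customers :: [Customer]
customers = [Customer 1 "Mary", Customer 2 "Bob"]

orders :: [Order]
orders =
  [ Order "o1" 1 [Product "p1" 10, Product "p2" 25]
  , Order "o2" 1 [Product "p1" 10]
  , Order "o3" 2 [Product "p3" 40]
  ]

-- One-to-many morphisms as Kleisli arrows into the list monad.
makes :: Customer -> [Order]
makes c = filter (\o -> orderedBy o == customerId c) orders

includes :: Order -> [Product]
includes = contains

-- Path composition Customer -> Order -> Product:
-- "A makes B" and "B includes C" compose to "A buys C".
buys :: Customer -> [Product]
buys = makes >=> includes

-- Fold-based query with a lambda: total amount spent by a customer.
totalSpent :: Customer -> Int
totalSpent c = foldr (\p acc -> productPrice p + acc) 0 (buys c)

-- Distinct product ids bought by a customer, in first-seen order.
distinctProducts :: Customer -> [String]
distinctProducts = nub . map productId . buys
```

For example, loading the block into GHCi and evaluating `totalSpent (customers !! 0)` yields 45, the sum of the prices of the three products reached along Mary's path from Customer through Order to Product. Using Kleisli composition keeps each traversal step a separate, reusable morphism, which mirrors the idea that any path in the category yields a well-defined composite.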