MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming

Valter Uotila, Jiaheng Lu
University of Helsinki
[email protected]

Dieter Gawlick, Zhen Hua Liu, Souripriya Das
Oracle Corporation
[email protected]

Gregory Pogossiants
SATS Technologies
[email protected]

ABSTRACT

The variety of data is one of the important issues in the era of Big Data. The data are naturally organized in different formats and models, including structured data, semi-structured data, and unstructured data. Prior research has envisioned an approach to abstract multi-model data with a schema category and an instance category by using category theory. In this paper, we demonstrate a system, called MultiCategory, which processes multi-model queries based on category theory and functional programming. This demo is centered around four main scenarios to show a tangible system. First, we show how to build a schema category and an instance category by loading different models of data, including relational, XML, key-value, and graph data. Second, we show a few examples of query processing by using the functional programming language Haskell. Third, we demo the flexible outputs with different models of data for the same input query. Fourth, to better understand the category theoretical structure behind the queries, we offer a variety of graphical hooks to explore and visualize queries as graphs with respect to the schema category, as well as the query processing procedure with Haskell.

PVLDB Reference Format:
Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming. PVLDB, 14(12): 2663 - 2666, 2021.
doi:10.14778/3476311.3476314

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://multicategory.github.io/.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 12 ISSN 2150-8097.
doi:10.14778/3476311.3476314

1 INTRODUCTION

The variety of data is one of the most important issues in modern data management systems to cope with the challenge of Big Data. In many applications, data sources are naturally organized in different formats and models, including structured data, semi-structured data, and unstructured data. To address the challenge of variety, multi-model databases have begun to emerge with a single database platform to manage multi-model data together, with a fully integrated backend to handle the demands for performance and scalability [3].

Let us consider an example of a multi-model data environment. Figure 1 illustrates an E-commerce application, which contains customers, a social network, and order information in four distinct data models. The property graph data bear information about mutual relationships between the customers, i.e. who knows whom, and some customer properties such as name and credit limit. The geographic location of customers is stored in a relational table. In XML documents, each order has an ID and a sequence of ordered products, each of which includes product number, name, and price. The fourth type of data, key/value pairs, contains the relations between different data sets. In a typical application like customer-360-view, database users need to analyze the information from these four different data sources together to enable a holistic analysis of customer behaviors.

Figure 1: A multi-model data environment
(Panels: relational data, a locations table with columns locationId, cityName, country and rows (10, Helsinki, Finland) and (11, Beijing, China); graph data, customers with name, creditLimit, and location properties connected by knows edges; XML data, orders and products with OrderNo, ProductNo, ProductName, and Price elements; key-value data, located: Location to Customer, ordered, and orderedBy: Order to Customer.)

Category theory was developed by mathematicians in the 1940s and has been successfully applied in many areas of science including computer science. Recent research initiatives have applied category theory to the database area. In particular, Spivak [6, 7] used a schema category and an instance functor to model relational databases. Liu et al. [2] promoted category theory to play the role of the new mathematical foundation to reason about declarative constructions and transformations between various data models.

While the previous works have envisioned the theoretical significance of modeling and managing data with category theory, this demonstration showcases a proof-of-concept implementation of MultiCategory, a system to support multi-model query processing based on category theory. The core parts of the system have been coded with the functional programming language Haskell, which is widely recognized to have a strong connection to category theory. The data storing framework of MultiCategory is established on the concepts of schema and instance categories [6], and the query processing structure is based on catamorphism and foldable data structures [1]. With these key properties, we can create a system that has a consistent integration with relational, hierarchical, and graph data models and we show how category theory can be used to achieve valuable perspectives for multi-model query representation and processing.

In brief, the demonstration of MultiCategory offers the following to the audience:

• category theoretical and functional programming oriented methods of querying and accessing multi-model data with a unified schema;
• a unified query language endowed with Haskell's lambda expressions allowing the users to submit one query to access different models of data seamlessly;
• the flexibility to output the same result with different models, which provides the users an opportunity to exploit the same data with different representations;
• an interactive visualizer for the schema and instance categories, as well as the query processing procedure, to help users better understand the theoretical structure behind the queries.

In our demo, attendees are welcome to compose their queries that follow the syntax of our query language to search the multi-model datasets. The source code of this system is available on GitHub [8] and the demo video can be watched online on YouTube [9].

Figure 2: An example of a category theoretical construction
(Schema category: objects Order, Customer, Product, Location, Integer, String, Boolean; morphisms include orderId, orderedBy, customerId, customerName, creditLimit, productId, productName, productPrice, locationId, zipCode, city, address, country, located, knows(customer1, customer2), and contains(order, product). The instance functor, defined through collection constructor functors, maps Order to a hierarchical instance of Orders, Customer to a graph of Customer instances, Location to a set of Location instances, Product to a set of Product instances, Integer to a set of Integers, String to a set of Strings, and Boolean to { True, False }. The instance category carries the same morphisms between these images.)

2 PRELIMINARIES

In this section, we first review the mathematical definition of a category [4], followed by the descriptions of schema and instance categories which are influenced by [6, 7].

Definition 2.1. A category $\mathcal{C}$ consists of a collection of objects denoted by $\mathrm{Obj}(\mathcal{C})$ and a collection of morphisms denoted by $\mathrm{Hom}(\mathcal{C})$. For each morphism $f \in \mathrm{Hom}(\mathcal{C})$ there exists an object $A \in \mathrm{Obj}(\mathcal{C})$ that is a domain of $f$ and an object $B \in \mathrm{Obj}(\mathcal{C})$ that is a target of $f$. In this case we denote $f : A \to B$. We require that all the defined compositions of morphisms are included in $\mathcal{C}$: if $f : A \to B \in \mathrm{Hom}(\mathcal{C})$ and $g : B \to C \in \mathrm{Hom}(\mathcal{C})$, then $g \circ f : A \to C \in \mathrm{Hom}(\mathcal{C})$. We assume that the composition operation is associative and that for every object $A \in \mathrm{Obj}(\mathcal{C})$ there exists an identity morphism $\mathrm{id}_A : A \to A$ so that $f \circ \mathrm{id}_A = f$ and $\mathrm{id}_A \circ f =$ [...]

[...] unified view for different models of data. Based on this view, we develop a unified query mechanism to process different models of data seamlessly.

An instance category models how the concrete data instances are stored. Each object of the schema category is mapped to the corresponding typed Haskell data structure in the instance category (see Figure 2). Each morphism in the schema category is mapped to a concrete Haskell function in the instance category. The mapping between these categories is called an instance functor which is defined on objects by collection constructor functors [1]. As shown in our demo, queries are formulated based on the schema category and the answers are retrieved from the instance category based on the instance functor and the collection constructor functors.

In schema and instance categories, we can follow any path to
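
As a rough illustration of the ideas described above (schema objects mapped to typed Haskell data structures, schema morphisms mapped to Haskell functions, paths followed by composing morphisms, and queries expressed as folds), the following minimal sketch may help. It is not MultiCategory's actual data model or query language: the type and function names (`homeLocation`, `located`, `customerCountry`, `selectWhere`), the `customerId` values, and the sample rows are assumptions made for this example, only loosely following Figure 1.

```haskell
import Data.List (find)

-- Hypothetical instance-category objects: plain Haskell records standing in
-- for the schema objects Location and Customer.
data Location = Location
  { locationId :: Int
  , cityName   :: String
  , country    :: String
  } deriving (Eq, Show)

data Customer = Customer
  { customerId   :: Int     -- assumed identifier, not taken from the paper
  , customerName :: String
  , creditLimit  :: Int
  , homeLocation :: Int     -- refers to a locationId in the relational table
  } deriving (Eq, Show)

-- Illustrative sample data (values are assumptions for this sketch).
locations :: [Location]
locations =
  [ Location 10 "Helsinki" "Finland"
  , Location 11 "Beijing"  "China"
  ]

customers :: [Customer]
customers =
  [ Customer 1 "Mary"  5000 10
  , Customer 2 "Erica" 8000 10
  , Customer 3 "Bob"   4000 11
  ]

-- The schema morphism located : Customer -> Location, realised here as a
-- lookup. Maybe is used because the lookup can fail on dangling references.
located :: Customer -> Maybe Location
located c = find ((== homeLocation c) . locationId) locations

-- Following a path Customer -> Location -> String by composing morphisms.
customerCountry :: Customer -> Maybe String
customerCountry = fmap country . located

-- A selection query written as a right fold, so it works over any Foldable
-- container (lists, maps, trees), not only lists.
selectWhere :: Foldable t => (a -> Bool) -> t a -> [a]
selectWhere p = foldr (\x acc -> if p x then x : acc else acc) []
```

Loaded into GHCi, `map customerName (selectWhere ((> 4500) . creditLimit) customers)` evaluates to `["Mary","Erica"]` on the sample data. Because `selectWhere` is written against `Foldable`, the same query could in principle be run over a different container holding the same records, which is one way to read the paper's remark that query processing is based on foldable data structures.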
