What Is a Multi-Model Database and Why Use It?
An ArangoDB White Paper (April 2020)

Creating software applications is very much like life itself. You never know what might happen tomorrow or two months from now. We commit to technology choices based on assumptions about the future. Choosing the right database for an application is fraught with difficulty, because it involves committing up front to a particular data model, a commitment that unpredictably changing requirements can easily invalidate. Making the right choice involves prioritizing capabilities and making compromises based on what you know at the time, what you think might happen, the skills and resources you have at the time, and a lot of luck.

If your use case requires relational, document, and graph capabilities, do you pick three different kinds of database technologies or do you pick the one you think is the most dominant? Often compromises lead to multiple kinds of databases in the same project, resulting in operational friction, data inconsistency, redundancy, and latency. Picking the wrong database model can be very costly.

What if you could future-proof your database choice? Native multi-model databases insulate you from requirements changes and data model lock-in. This white paper explains what multi-model databases are, why it makes sense to use them, and illustrates how to apply them using an aircraft fleet management use case.

Copyright ArangoDB Inc.

Table of Contents
1 What is a native multi-model database?
2 Why native multi-model?
3 Data modeling with native multi-model databases
3.1 Aircraft fleet maintenance: A case study
3.1.1 A data model for an aircraft fleet
3.1.2 Queries for aircraft fleet maintenance
3.1.3 Using multi-model querying
3.2 Lessons learned for data modeling
4 Further use cases for native multi-model databases

1 What is a native multi-model database?

Polyglot persistence has been becoming the new normal [zdnet 2020] since the explosion of single-model NoSQL databases in 2010. However, the proliferation of different kinds of databases and the orchestration needed to keep the resulting systems-of-systems in sync can become extremely complex, burdensome, and expensive. By contrast, native multi-model databases, like ArangoDB, allow polyglot persistence in the same datastore. This eliminates the complex data orchestration, data transformation, and diversity of database knowledge needed to support polyglot persistence.

Native multi-model databases, like ArangoDB, support multiple data models (e.g., key-value, document, graph, SQL) seamlessly (at the same time) in one core system using a single unified query language. This definition excludes databases that support one model at a time and need to be switched between data models. It also excludes databases that use views to create facades for different data models (layered multi-model). In a true multi-model database, the different models coexist as first-class citizens and are interoperable in a multi-model query language.

So what is a native multi-model database? In short, a native multi-model database has one core and one query language, but multiple data models. A native multi-model database is a combination of several data stores in one system.
You can store data as key/value pairs, graphs, or documents and can access your data seamlessly with one declarative query language, combining different models in a single query. You can build high-performance applications and scale horizontally using all data models to their full extent.

Figure 1: A native multi-model database has multiple coexisting data models accessed via one query language.

2 Why native multi-model?

Over the past decade, software architects have realized that there are benefits to leveraging a variety of data models for different parts of the persistence layer of software projects. As a result, polyglot persistence has become increasingly popular. However, the earlier applications of polyglot persistence combined multiple single-model databases in one software system. In this earlier approach, you would use a relational database to persist structured tabular data; a document store for unstructured object-like data; a key/value store for a hash table; and a graph database for highly linked referential data. The combination of multiple single-model databases in the same project often leads to operational friction (more complicated deployment, more frequent upgrades) as well as data consistency and duplication issues.

A single native multi-model database provides a superior solution for the following three reasons:

Firstly, most use cases require diverse data models. Choosing a single data model forces you either to force-fit the other data models into the chosen one or to use multiple databases accompanied by increasingly byzantine orchestration logic, high latencies, and data consistency issues. A multi-model database allows you to avoid both.

Secondly, requirements do change over time, and multi-model databases allow you to adapt to these changes easily without having to “rip and replace” databases.
Thirdly, database proliferation within enterprises is costly: it can require expensive specialized skills as well as complex orchestrations and transformations to keep data in sync and consistent. Multi-model databases allow enterprises to replace multiple systems while avoiding external orchestration and transformation.

Figure 2: Tables, documents, graphs, and key/value pairs: different data models.

How do native multi-model databases work? How do key/value pairs, documents, and graphs coexist seamlessly? The answer is very simple. Documents in a document collection usually have a unique primary key that encodes document identity, which makes a document store into a key/value store, where the keys are strings and the values are JSON documents. The fact that the values are JSON does not impose a performance penalty, but offers a good amount of flexibility. The graph data model can be implemented by storing a JSON document for each vertex and a JSON document for each edge. The edges are kept in special edge collections that ensure that every edge has _from and _to attributes, which reference the starting and ending vertices of the edge and thereby also give the direction of the relationship.

Having unified the data for the three data models in this way, it remains to implement a common query language that allows users to express document queries, key/value lookups, “graphy queries,” and arbitrary combinations of these. By “graphy queries,” we mean queries that involve the particular connectivity features coming from the edges, e.g.
● Graph Traversals
● Shortest Path
● Pattern Matching

A Pattern Matching query in a multi-model database identifies all paths that satisfy an arbitrarily complex combination of conditions. These conditions are composed of conditions on each single document or edge and conditions on the overall layout created by these objects.

After all the theory, let’s dive into an actual use case, aircraft fleet management, including data modeling aspects and actual multi-model queries.

3 Data modeling with native multi-model databases

3.1 Aircraft fleet maintenance: A case study

One area to which the flexibility of a native multi-model database is extremely well suited is the management of large amounts of hierarchical data, such as in an aircraft fleet. An aircraft fleet consists of several types of aircraft, and a typical aircraft is composed of several million parts: from assemblies and subassemblies to individual components. One can think of an aircraft as a collection of parts flying in formation, where the “formation” is a hierarchy of “items”.

To organize the maintenance of a fleet of aircraft, one has to manage a multitude of data at different levels in this hierarchy. Parts and components have:
● names
● serial numbers
● manufacturer information
● maintenance intervals
● maintenance dates
● information about subcontractors
● links to manuals and documentation
● contact persons
● warranty and service contract information,
to name but a few. This data for each item is attached to nodes in an aircraft’s bill-of-materials hierarchy.

It is important to note that there may be three component hierarchies for an aircraft: as-designed, as-built, and as-maintained. In this paper we focus on the as-maintained component hierarchy of an aircraft. The difference is that in an as-maintained hierarchy we see the aircraft in terms of replaceable and repairable components. In addition, the configuration of the aircraft, and therefore the hierarchy, will change over time as parts are replaced or refurbished on the individual aircraft.

This data is tracked in order to provide information and answer questions. Questions can include but are not limited to the following examples:
● What are all the component parts in a given part?
● Given a (broken) part, what is the smallest component of the aircraft that contains the part and for which there is a maintenance procedure?
● Which parts of this aircraft need maintenance next week?

3.1.1 A data model for an aircraft fleet

How do we model the data about our aircraft fleet using a multi-model database? A multi-model database provides a number of ways of accomplishing this. However, one approach that is very natural and offers fast query performance is to use a JSON document to hold the data for each item in the hierarchy and to represent the hierarchical relationships between the item documents as a graph.
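
To make this modeling approach concrete, the following is a minimal in-memory sketch in Python. It does not use ArangoDB or any database driver. The collection name "components", the item names, the serial numbers, and the choice that edges point from the containing item to the contained item are all assumptions made for illustration. It shows one JSON-like document per item, one edge document per parent/child relationship with _from and _to attributes, a key/value-style lookup, and a simple traversal that answers the first question above (all component parts in a given part).

```python
from collections import defaultdict

# Hypothetical item documents, one per node in the as-maintained hierarchy.
# Each is keyed by a document id of the form "collection/key";
# "components" is an assumed collection name.
items = {
    "components/aircraft-1": {"name": "Aircraft 1", "serial": "AC-0001"},
    "components/engine-1": {"name": "Engine", "serial": "EN-0042"},
    "components/turbine-1": {"name": "Turbine", "serial": "TU-0007"},
    "components/blade-1": {"name": "Turbine blade", "serial": "BL-0913"},
    "components/wing-1": {"name": "Left wing", "serial": "WG-0002"},
}

# Hypothetical edge documents. Assumption: an edge points from the
# containing item (_from) to the contained item (_to).
edges = [
    {"_from": "components/aircraft-1", "_to": "components/engine-1"},
    {"_from": "components/aircraft-1", "_to": "components/wing-1"},
    {"_from": "components/engine-1", "_to": "components/turbine-1"},
    {"_from": "components/turbine-1", "_to": "components/blade-1"},
]


def sub_components(start_id):
    """Return the ids of all items below start_id in the hierarchy."""
    # Index the edges by their starting vertex.
    children = defaultdict(list)
    for edge in edges:
        children[edge["_from"]].append(edge["_to"])

    # Depth-first traversal along outbound edges.
    found, seen, stack = [], set(), [start_id]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


# Key/value-style lookup: fetch one document by its id.
engine = items["components/engine-1"]

# Graph-style query: all parts contained in the engine.
engine_parts = sub_components("components/engine-1")
```

In a native multi-model database, the lookup and the traversal would be expressed in the single query language rather than in application code; the sketch only illustrates how the same documents serve both access patterns.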