Transactional Processing for Polyglot Persistence

Transactional Processing for Polyglot Persistence Ricardo Jimenez-Peris ∗, Marta Patino-Martinez˜ y, Ivan´ Brondino y and Valerio Vianello y ∗LeanXcale Madrid Email: [email protected] y Universidad Politecnica´ de Madrid Spain Email:mpatino,ibrondino,vvianello@fi.upm.es Abstract—NoSQL data stores have emerged in last years as a data stores are non-transactional or at most allow transactions solution to provide scalability and flexibility in data modelling updating a single row. This fact creates many difficulties in for operational databases. These data stores have proven that the advent of failures and/or concurrent updates. It has been they are better suited for some kinds of problems than relational databases. In order to scale, they relaxed the properties reported that BitCoin was stolen a lot of money because they 1 provided by relational databases, mainly transactions. However, were using NoSQL technology . A hacker ran in parallel transactional semantics is still needed by most applications. In massive transfers of money over a set of accounts, since there this paper we describe how CoherentPaaS provides scalable was no transactional isolation, the hacker was able to transfer holistic transactions across SQL and NoSQL data stores such as the original amount many times before the first transfer of document-oriented data stores, key-value data stores and graph databases. money was able to update the value of the account to zero. Index Terms—NoSQL, databases, transaction processing. The lack of atomicity also causes many problems when multiple rows should be updated as part of a business op- I. INTRODUCTION eration, a failure in the middle will result in an inconsistent database since only a fraction of the rows would be updated. Companies have traditionally used transactional SQL For instance, if a key-value data store such as HBase is databases to keep their operational data. In the last few years, used in a banking application, if there is a failure during a with the advent of NoSQL, many companies are embracing money transfer between two accounts it might happen that the new NoSQL data stores. These data stores include docu- withdrawal from the origin account completes but the deposit ment oriented data stores, such as MongoDB [1], key-value in the destination account is not performed. This scenario will data stores, such as Apache HBase [2] and graph databases, lead to an inconsistent database. such as Neo4J [3] and Sparksee [4]. Companies use NoSQL The above problems become more acute in a Polyglot technology due to different reasons. In some cases, because Persistence environment. It is very typical to store structured the flexibility of the data models that enables to store semi- data in an operational SQL database and the semi-structured structured data that has a high degree of variability that makes part of the data on a NoSQL data store such as a key-value too hard to store them on relational databases oriented towards or document-oriented data store. However, both pieces of data fully structured data. This is for instance the case for key- are part of an integral conceptual business operation. A failure value data stores that are schemaless and enable to store tuples after one data store is updated and before a second data store arbitrary schemas. In some other cases, it is because of the is updated results in an inconsistent logical database (the set need of a different query language or API to access the data in of data stores being used). a more comfortable manner. For instance, document-oriented The CoherentPaaS FP7 European project [6] is addressing data stores such as MongoDB offer an API that enables to the problems that companies are facing when they use Polyglot query semi-structure documents in a flexible and comfortable Persistence. This paper addresses the set of pains related to way. Sometimes it is due to efficiency as happens with graph the lack of transactional ACID properties within and across databases. Graph databases enable to perform traversals of a data stores in these Polyglot environments. CoherentPaaS is graph with a single invocation to the graph database what leveraging the ultra-scalable transactional technology from avoids a myriad of exchanges between the client and server LeanXcale [7] to bring scalable transactional processing to side needed in a relational database to make a graph traversal. Polyglot Persistence environments and to NoSQL data stores. Another reason why NoSQL solutions have been adopted is The rest of paper presents an overview of CoherentPaaS due to some of them satisfy the scalability requirements of (Section II), then transactions are presented in Section ??. some applications that traditional transactional SQL databases Finally, we present an overview on how transactions have fail to fulfill. been scaled and the modifications needed on the data stores This trend has resulted in a situation where many companies in order to provide transactional semantics and transactions now store their operational data in a diverse set of technologies across different data stores (Section IV). both SQL and NoSQL leading to the so-called Polyglot Persis- tence [5]. Companies with Polyglot Persistence environments 1http://hackingdistributed.com/2014/04/06/another-one-bites-the-dust- are facing now new challenges. The first one is that NoSQL flexcoin/ PaaS Management constrains the concurrency of updates. The issue is that read Network and write operations of concurrent transactions conflict that is, Monitoring Machine to Machine Applications Elastic Cloud Manager Deployer an arbitrary predicate can conflict with any concurrent update Cloud Applications Cloud Common Query Language Engine on the same table. When the predicate is not performed over Transactions NO SQL indexed columns the only way to guarantee serializability is Document Cloud DS Cloud Data Stores Cloud Data GraphDB Cloud DS by locking the whole table what prevents doing updates over Key-Value CDS Local Txn Mngs Like - the table concurrently with the predicate reads. SQL Column-oriented CDS Cloud Data Stores Cloud Data In-memory DB Concurrency Different isolation levels have been proposed. For instance, Controllers SQL scalable DB CEP Oriented the ANSI isolation levels enable to reduce the number of - CEP CEP CEP Commit Processing Sequencers Table Complex Event Complex Event Processing conflicts by reducing the level of isolation. Unfortunately, Loggers the ANSI isolation levels below serializability can exhibit LeanXcale Transaction Managemer serious anomalies in most applications that results in data Fig. 1. CoherenPaaS components inconsistencies. Fortunately, in 1995 a new isolation level was proposed called Snapshot Isolation (SI) [?] that is very close to serializability but, it avoids the conflicts between reads and II. COHERENTPAAS writes, therefore avoiding on one hand the bottleneck between CoherentPaaS project goal is to reduce the effort required predicate reads and updates, and on the other hand, avoiding to build and increase the quality of the cloud applications the interference between long read queries and updates that using multiple cloud data management technologies via a also result in highly constraining the potential concurrency single query language, a uniform programming model, and across transactions. ACID-based global transactional semantics. Figure 1 shows Snapshot isolation provides for any running transaction a the components of the transactional management system, photograph or snapshot of the committed state of the database LeanXcale, the common query language, CloudMdsQL [10], as of the start time of the transaction. In this way, it provides used to access all the data stores using a common program- a high level of isolation since a running transaction does not ming language, and the different types of data stores that are observe any changes for concurrent transactions that commit. integrated in CoherentPaaS. All types of data stores are being Snapshot isolation can be enforced by using two rules: the read considered in CoherentPaaS, SQL databases, columnar, graph rule and the write rule. The read rule ensures that a transaction data stores and document oriented data stores. observes all the updates of the transactions committed before the transactions starts. The write rule guarantees that no two III. TRANSACTIONS concurrent transactions can update the same data item and Transactions are the most convenient way to guarantee data commit. That is, at least one of them should be rollback. In consistency in the advent of failures and concurrent accesses. this way, in snapshot isolation only write-write conflicts can They provide the so-called ACID properties: Atomicity, Con- happen, but never read-write conflicts. There are two ways sistency, Isolation and Durability. Atomicity provides all-or- for implementing the write rule: first updater wins or first nothing semantics in the advent of failures. That is, in the committer wins. If there are two concurrent transactions trying advent of failures either all the updates in a transaction succeed to update the same item, the one that updates first the item will or the final effect is as none of them have been performed. commit if the first updater rule is implemented. The second Consistency is enforced by the application. Basically, a trans- updater will abort. If the first committer rule is implemented, action that starts with a consistent database should produce the transaction that commits in the first place is the one that a new consistent database,

Transactional Processing for Polyglot Persistence

Management of Polyglot Persistent Environments for Low Latency Data Access in Big Data

Nosql Distilled: a Brief Guide to the Emerging World of Polyglot Persistence

Polyglot Persistency an Adaptive Data Layer for Digital Systems Table of Contents

Multi-Model Database Management Systems - a Look Forward Zhen Hua Liu 1 , Jiaheng Lu2, Dieter Gawlick1, Heli Helskyaho2,3

Noxperanto: Crowdsourced Polyglot Persistence Antonio Maccioni, Orlando Cassano, Yongming Luo, Juan Castrejón, Genoveva Vargas-Solar

PDDM: a Database Design Method for Polyglot Persistence

A Big Data Roadmap for the DB2 Professional

Multi-Model Databases: a New Journey to Handle the Variety of Data

Nosql Distilled: a Brief Guide to the Emerging World of Polyglot

Polyglot Persistence: Handling Multiple Datastores

Polyglot Database Architectures = Polyglot Challenges

POLYGLOT PERSISTENCE for ADDRESSING DATA MANAGEMENT on the CLOUD Genoveva Vargas-Solar, CNRS, LIG-LAFMIA, Juan Carlos Castrejon, U