A Drop-In Middleware for Serializable DB Clustering Across Geo-Distributed Sites

A Drop-in Middleware for Serializable DB Clustering across Geo-distributed Sites Enrique Saurez1, Bharath Balasubramanian2, Richard Schlichting3 Brendan Tschaen2 Shankaranarayanan Puzhavakath Narayanan,2 Zhe Huang2, Umakishore Ramachandran1 Georgia Institute of Technology1 AT&T Labs - Research2 United States Naval Academy3 [email protected], [email protected], [email protected], [email protected], fsnarayanan, [email protected], [email protected] ABSTRACT formance needs of clients.1 However, many of these services Many geo-distributed services at web-scale companies still use databases (DBs) like MariaDB [54] and PostgreSQL [50] rely on databases (DBs) primarily optimized for single-site that are primarily optimized for single site deployments even performance. At AT&T this is exemplified by services in the when they have clustering solutions. For eample, in Mari- network control plane that rely on third-party software that aDB Galera [28] synchronous clustering [29] all replicas are uses DBs like MariaDB and PostgreSQL, which do not pro- updated on each commit, which is prohibitively expensive vide strict serializability across sites without a significant across sites with WAN latencies on the order of hundreds performance impact. Moreover, it is often impractical for of milliseconds. Similarly, in PostgreSQL master-slave [49] these services to re-purpose their code to use newer DBs op- clustering, requests from all sites are sent to a single master timized for geo-distribution. In this paper, a novel drop-in replica, compromising on performance and availability. solution for DB clustering across sites called Metric is pre- Although new geo-distributed DBs have been developed sented that can be used by services without changing a single that improve the performance of cross-site transactionality line of code. Metric leverages the single-site performance of (e.g., Spanner [19], CockroachDB [17], TiDB [48]), it is often an existing service's DB and combines it with a cross-site impractical to re-purpose the code of services tied to spe- clustering solution based on an entry-consistent redo log cific DBs to use these new solutions. This is especially true that is specifically tailored for geo-distribution. Detailed when the existing service involves third-party software. For correctness arguments are presented and extensive evalu- example, AT&T's multi-site Service Orchestrator (SO [42]) ations with various benchmarks show that Metric outper- that deploys complex virtual network functions (VNFs) re- forms other solutions for the access patterns in our produc- lies on a third-party business process engine Camunda [8] tion use-cases where service replicas access different tables that maintains state in MariaDB. Similarly, AT&T's Data on different sites. In particular, Metric achieves up to 56% Collection, Analytics and Events service (DCAE [21]) relies on a third-party tool called Cloudify [16] that uses Post- less latency and 5.2x higher throughput than MariaDB and 2 PostgreSQL clustering, and up to 90% less latency and 26x greSQL. While middleware for DB clustering does exist, it higher throughput than CockroachDB and TiDB, systems does not provide multi-master strict serializibility [15, 34] that are designed to support geo-distribution. and/or requires extensive annotation of service code [30]. In this paper, we present a novel solution called Metric PVLDB Reference Format: that serves as a replacement for existing DB clustering so- Enrique Saurez, Bharath Balasubramanian, Richard Schlichting, lutions. The primary challenge in designing such a system Brendan Tschaen, Shankaranarayanan Puzhavakath Narayanan, is to satisfy all of the following goals simultaneously: Zhe Huang, Umakishore Ramachandran. A Drop-in Middle- ware for Serializable DB Clustering across Geo-distributed Sites. • Require no changes or annotations of the service code PVLDB, 13(12): 3340-3353, 2020. or its DB aside from turning off the latter's default DOI: https://doi.org/10.14778/3415478.3415555 clustering solution across sites; in other words, it 1. INTRODUCTION should serve as a drop-in solution. • Provide the service or user of the middleware the ab- Services built by AT&T and other web-scale companies straction of a replicated multi-master DB across sites, such as Google and Amazon are often deployed across geo- where all replicas can concurrently process requests. distributed sites to satisfy the locality, availability, and per- • Guarantee that all transactions are strictly serializable [47, 33]. This work is licensed under the Creative Commons Attribution- • Build a system that outperforms a service's existing NonCommercial-NoDerivatives 4.0 International License. To view a copy DB clustering solution, in terms of the end-to-end la- of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For tency for transaction execution and throughput. any use beyond those covered by this license, obtain permission by emailing 1 [email protected]. Copyright is held by the owner/author(s). Publication rights A site is a data center at a physical location connected with other sites through a wide-area-network (WAN). licensed to the VLDB Endowment. 2 Proceedings of the VLDB Endowment, Vol. 13, No. 12 Both SO and the DCAE are part of AT&T's network con- ISSN 2150-8097. trol plane, which is open-sourced through the Open Network DOI: https://doi.org/10.14778/3415478.3415555 Automation Platform (ONAP) effort [44]. micro-benchmarks, use-case workloads, and TPC-C workloads. For the access patterns described above where service replicas access different tables on different sites, Metric achieves up to 56% less latency and 5.2x higher throughput than MariaDB's Galera synchronous clustering solution and PostgreSQL's master-slave clustering solution. Metric also outperforms DBs optimized for geo-distribution on the same access patterns, and achieves up to 90% less latency and 26.2x higher throughput than CockroachDB and TiDB. We also evaluated Metric for access patterns that devi- Figure 1: Overview of a Metric deployment where in- ate from the expected workload, where service replicas fre- stances of Metric and the SQL DB are deployed on each site quently access the same tables across sites. As expected, with a geo-distributed redo log deployed across the sites. Metric's performance drops relative to the other solutions Service replicas issue requests to the Metric process closest mentioned above, which demonstrate up to 90% less latency to them. and 32x higher throughput than Metric. We present several mitigation strategies in §9 as future work. • Support new DBs easily with minimal additions or In summary, this paper makes these contributions: changes to the middleware. • A novel approach to providing drop-in DB clustering Metric achieves these goals through a novel design that across sites supported by detailed correctness argu- leverages the single-site guarantees of the service's existing ments showing strict serializability (§3, §4). DB coupled with the use of an entry-consistent (EC) key- • An implementation [25] with clustering support for value store [5, 3] to maintain a geo-distributed redo log of MariaDB and PostgreSQL that is being deployed in DB records. The EC store provides critical functionality in production for multiple use-cases and that is open- the form of fault-tolerant lock-based critical sections that sourced through ONAP (§5, §6). are used by Metric to obtain table-level locks that guaran- • Experimental results validating Metric's effectiveness tee exclusive access to the latest values of records accessed (§7). by a given transaction. Figure 1 illustrates this approach, where a Metric process executes the operations in a trans- A previous Metric workshop paper oriented towards edge action locally on a DB, with just one round-trip per trans- use-cases [57] presents some of the initial design ideas related action across sites to commit modified records in the EC to the system. These include how the ownership API can log. This is operationally not only much more efficient than be exposed to a client or service, and an approach for guar- anteeing transactionality only to the owner of certain tables both MariaDB synchronous and PostgreSQL master-slave 3 clustering, but is also similar to optimized geo-distributed in the DB. While we retain the name for legacy reasons, DBs [19, 17], despite the drop-in nature of the Metric so- this paper significantly extends these concepts to encom- lution. Further, Metric makes effective use of the EC pass strict serializability guarantees, and presents complete store's higher level abstraction of critical sections that han- correctness arguments, details of an implementation, and an dle failures to build a redo log. The above-mentioned geo- experimental evaluation. distributed DBs design their redo log from first principles| 2. ARCHITECTURE AND OVERVIEW a complex error prone process, especially considering the Architecture. Metric provides the abstraction of a repli- wider array of failures in geo-distributed systems. cated geo-distributed DB that can be accessed by applica- A key aspect of Metric's drop-in solution is that it sup- tions implementing higher-level services. As shown in Fig- ports general SQL queries and parses the query to determine ure 1, each application is generally composed of multiple automatically the potentially impacted tables over which to service replicas that are hosted on different sites for locality, acquire table-level locks. For our use-cases such as DCAE availability, and fault-tolerance. Metric itself is also geo- and SO, transactions are naturally partitioned across ser- distributed, with one replica per site. Service replicas sub- vice replicas and have no overlap in the DB records they mit transactions to the closest Metric process, usually at the access, meaning that a given table is usually accessed only same site and often on the same machine. A Metric process by processes within a specific site. For example, SO replicas is in turn associated with an instance of a SQL DB, referred typically deploy different VNFs, with each replica modify- to as the Metric process's local DB.

A Drop-In Middleware for Serializable DB Clustering Across Geo-Distributed Sites

Concurrency Control Basics

Cache Serializability: Reducing Inconsistency in Edge Transactions

A Theory of Global Concurrency Control in Multidatabase Systems

Chapter 14: Concurrency Control

Analysis and Comparison of Concurrency Control Techniques

Where We Are Snapshot Isolation Snapshot Isolation

An Evaluation of Distributed Concurrency Control

Relative Serializability: an Approach for Relaxing the Atomicityoftransactions

Serializability and Heterogeneous Trust from Two Phase Commit to Blockchains

4. Distributed Concurrency Control 4

Serializable Snapshot Isolation for Replicated Databases in High-Update Scenarios

Postgresql Concurrency Issues 1