Cloud-Native Database Systems at Alibaba: Opportunities and Challenges

Feifei Li, Alibaba Group, [email protected]

ABSTRACT

Cloud-native databases become increasingly important in the era of cloud computing, due to the needs for elasticity and on-demand usage by various applications. These challenges from cloud applications present new opportunities for cloud-native databases that cannot be fully addressed by traditional on-premise enterprise database systems. A cloud-native database leverages software-hardware co-design to explore accelerations offered by new hardware such as RDMA and NVM, and kernel-bypassing protocols such as DPDK. Meanwhile, new design architectures, such as shared storage, enable a cloud-native database to decouple computation from storage and provide excellent elasticity. For highly concurrent workloads that require horizontal scalability, a cloud-native database can leverage a shared-nothing layer to provide distributed query and transaction processing. Applications also require cloud-native databases to offer high availability through distributed consensus protocols.

At Alibaba, we have explored a suite of technologies to design cloud-native database systems. Our storage engine, X-Engine, and our distributed file system, PolarFS, improve both write and read throughputs by using an LSM-tree design and self-adapted separation of hot and cold data records. Based on these efforts, we have designed and implemented POLARDB and its distributed version POLARDB-X, which has successfully supported the extreme transaction workloads during the 2018 Global Shopping Festival on November 11, 2018, and achieved commercial success on Alibaba Cloud. We have also designed an OLAP system called AnalyticDB (ADB in short) for enabling real-time interactive data analytics for big data. We have explored a self-driving database platform to achieve autoscaling and intelligent database management. We report key technologies and lessons learned to highlight the technical challenges and opportunities for cloud-native database systems at Alibaba.

PVLDB Reference Format:
Feifei Li. Cloud-Native Database Systems at Alibaba: Opportunities and Challenges. PVLDB, 12(12): 2263-2272, 2019.
DOI: https://doi.org/10.14778/3352063.3352141

Figure 1: More than 100x increase in TPS along with stable response time observed during Alibaba's Singles' Day Shopping Festival in 2018. (The plot shows normalized TPS and response time in ms from 23:30 to 00:30.)

1. INTRODUCTION

With more and more applications and systems moving to the cloud, cloud-native database systems start to gain wide support and popularity. Cloud database services provided by cloud service vendors, such as AWS, Microsoft Azure, Alibaba Cloud, and Google Cloud, have contributed to the development of cloud-native databases. As a result, in recent years the market share of cloud databases has been growing rapidly. More and more enterprises and organizations have migrated their businesses from on-premise data centers to the cloud. The cloud platforms provide high elasticity, stringent service-level agreements (SLA) to ensure reliability, and easy manageability with reduced operational cost. Cloud databases play a key role in supporting cloud-based businesses. They become the central hub that connects underlying resources (IaaS) to various applications (SaaS), making them a key system for the cloud.

At the Alibaba group, database systems need to support a rich and complex business ecosystem that spans entertainment and digital media, e-commerce and e-payment, and various new retail and o2o (offline to online) business operations. During the 2018 Singles' Day Global Shopping Festival (Nov. 11, 2018), Alibaba's databases processed up to 491,000 sales transactions per second, which translates to more than 70 million database transactions per second. The traditional on-premise deployment of databases is not able to catch up with the complexity of such business operations, due to the needs for elasticity, scalability, and manageability. For example, as shown in Figure 1, the TPS suddenly increases in the first second of the Singles' Day Shopping Festival, and is about 122 times higher than that of the previous second. When we simply deploy MySQL or PostgreSQL on a cloud instance store with a local SSD and a high-I/O VM, the resulting database instance has limited capacity that is not suitable for providing scalable database services.


It cannot survive underlying disk drive failures, and the database instance has to manage data replication for reliability. In addition, the instance uses a general-purpose file system, such as ext4 or XFS. When using low-I/O-latency hardware like RDMA or PCIe SSD, the message-passing cost between kernel space and user space may quickly saturate the throughput. In contrast, databases with a cloud-native design (such as decoupling of computation and storage, and autoscaling) are more appealing; they are able to provision more compute and storage capacity, and provide faster recovery and lower cost [30, 7].

There are also other essential capabilities that are of critical importance for cloud-native databases: multi-model support for heterogeneous data sources and diverse query interfaces; autonomy and intelligence that automatically manage and tune database instances to reduce the cost of manual operations; software-hardware co-design that leverages the advantages of high-performance hardware; and high availability that meets stringent SLA needs (e.g., RPO=0 with very small RTO). With these designs in mind, cloud-native databases have gained rapid growth for cloud-based deployment.

In this paper, we report the recent progress in building cloud-native enterprise databases on Alibaba Cloud [1], which also support the entire business operations within the Alibaba group from its various business units (from entertainment to e-commerce to logistics). To cover a wide variety of application needs, we have provided a broad spectrum of database systems and tools, as shown in Figure 3. In particular, we have developed POLARDB, a shared-storage OLTP database that provisions 100TB of storage capacity and 1 million QPS per processing node. To further scale out the capacity, we have developed POLARDB-X, a distributed OLTP database that integrates shared-nothing and shared-storage designs. We have also developed AnalyticDB, an OLAP database as a next-generation data warehouse for high-concurrency, low-latency, and real-time analytical queries at PB scale. To manage numerous database instances hosted on our cloud, we have built SDDP, an autonomous database operation platform that automatically manages instances and tunes performance with minimal DBA involvement. There are nearly half a million database instances running on Alibaba Cloud (from both our cloud customers and various business units within the Alibaba group). Both POLARDB and AnalyticDB have gained rapid growth in usage from a broad range of business sectors, including e-commerce, finance, media and entertainment, education, new retail, and others. These cloud database systems and technologies have successfully served both the complex business ecosystem within the Alibaba group and many external enterprise customers.

Figure 2: Different database system architectures (single instance with local disks, shared storage, and shared nothing; scale-up vs. scale-out).

2. ARCHITECTURES FOR DATABASE SYSTEMS AT ALIBABA

Depending on what is being shared, there are three popular architectures that we have explored at Alibaba for building our database systems, as illustrated in Figure 2. The first category is single instance, which is the most common architecture used by mainstream databases. In this model, all processes in a database share processor cores, main memory space and local disks (i.e., on a single machine). Such an architecture facilitates and eases intra-system communication and coordination.

With the rapid growth in both the amounts of data and the peak workloads encountered by giant Internet enterprises, such as Google, Amazon, Microsoft and Alibaba, it has been observed that the single-instance architecture has inherent limitations. The capacity of a single machine fails to meet ever-increasing business demands. Therefore, the shared storage architecture was proposed, represented by AWS Aurora [30] and Alibaba POLARDB [5]. In this model, the underlying storage layer (usually consisting of multiple nodes) is decoupled, and each data record in storage can be accessed by any upper database kernel running on any node. By exploiting a fast network such as RDMA, a database can interact with the shared distributed storage layer the same way as with a single (shared) local disk. On top of this shared storage, we can easily launch multiple compute nodes to create replicas of a single database, having an identical view on the same data. Therefore, requests can be distributed to different (read-only) nodes for parallel processing. However, to avoid write conflicts and to avoid the complexity of dealing with distributed transaction processing and distributed commits, there is usually a single node that processes all write requests (e.g., INSERT, UPDATE, DELETE) to a database. This architecture enables dynamic adjustment of query capacity on demand by changing the number of read-only nodes. It is also feasible to enable writes to multiple nodes (i.e., multi-master) to expand write capacity, but this usually requires complex coordination mechanisms and consensus protocols [6, 12, 16, 17, 23, 34].

The shared-storage architecture also has its own limitations. First, low-latency data transmission cannot always be guaranteed between compute and storage nodes. For messages transmitted across switches, data centers or even regions, the transmission time will be significantly amplified, especially compared with a local RDMA network. Second, the number of read-only nodes supported for a single database is limited. When the number of nodes reaches a certain scale, massive requests will be blocked, making accesses to remote storage prohibitively expensive and unaffordable. Therefore, a practical limit is to have roughly up to a dozen read-only nodes. To address this issue, we need a shared nothing architecture. In this model, a logical database is divided into multiple shards, each of which is assigned to a node. These nodes can be placed and replicated in different data centers and regions. A representative implementation of this architecture is Google Spanner [7], which uses GPS and atomic clocks to achieve replica consistency and transactional consistency across regions. On Alibaba Cloud, we build POLARDB-X, which extends POLARDB and explores the benefits of building a shared-nothing system on top of multiple databases, each with a shared distributed storage.
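To make the shared-nothing idea concrete, the sketch below shows one simple way a logical database could be partitioned by hashing a sharding key to a fixed number of shards, each owned by a different node. This is only an illustrative sketch; the node names and the hash-based placement are assumptions, not how POLARDB-X actually assigns shards.

```python
import hashlib

# Illustrative only: a static table of shard owners, one node per shard.
SHARD_NODES = ["node-0", "node-1", "node-2", "node-3"]

def shard_of(sharding_key: str, num_shards: int = len(SHARD_NODES)) -> int:
    """Map a sharding key (e.g., a user id) to a shard by hashing."""
    digest = hashlib.md5(sharding_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def route(sharding_key: str) -> str:
    """Return the node that owns the shard for this key."""
    return SHARD_NODES[shard_of(sharding_key)]

# Statements touching a single key go to exactly one node; statements spanning
# keys on different shards need distributed transaction coordination.
print(route("user:1001"), route("user:2002"))
```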
Note that this hybrid of shared-nothing and shared-storage architectures brings some particular benefits. We can apply sharding at the top level, but assign many nodes to a shard (instead of one node per shard). Beneath this shard, a shared storage can be accessed by these nodes. The benefit of this hybrid architecture is that it mitigates the drawbacks of having too many small shards. In particular, it helps to ease the procedure of shard re-balancing, and reduces the probability of cross-shard transactions (and hence the amount of expensive distributed commits). Meanwhile, it enables excellent horizontal scalability. This hybrid architecture, which takes advantage of both shared nothing and shared storage, is a promising direction explored by our database design in POLARDB-X.

3. OTHER KEY FEATURES OF ALIBABA DATABASE SYSTEMS

In addition to exploring different system architectures, there are other key features, driven by Alibaba's complex business applications, that have been taken into consideration during the design of Alibaba's database systems.

3.1 Multi-Model Analysis

An important application scenario at Alibaba is to support multi-model analysis, which consists of two aspects: southbound and northbound multi-model access. Southbound multi-model access indicates that the underlying storage supports different data formats and data sources. The stored data can be either structured or non-structured, e.g., graph, vector and document storage. The database then provides a unified query interface, such as a SQL or SQL-like interface, to query and access various types of data sources and formats, forming a data lake service. Northbound multi-model access indicates that a single data model and format (e.g., a key-value model in most cases) is used to store all structured, semi-structured and unstructured data in a single database. On top of this single storage model, the database then supports multiple query interfaces, such as SQL, SPARQL and GQL, depending on the application needs. Microsoft CosmosDB [9] is a representative system of this kind.

In addition to addressing our internal business operation needs, being able to support multi-model analysis is also an essential requirement for cloud database services. Many cloud applications need to collect large volumes of data from heterogeneous sources and conduct federated analysis to link different sources and reveal business insights (i.e., southbound multi-model access). On the other hand, a cloud database (e.g., a large KV store such as HBase) is often a central data repository accessed by multiple applications with various application needs. They may prefer to use different query interfaces due to usability and efficiency, which is where northbound multi-model analysis is needed.

3.2 Autonomy and Intelligence

Given the large number of database instances to be managed and the complex workloads faced by the database systems at Alibaba, making the database operation platform more autonomous and intelligent is an essential requirement. With more than hundreds of thousands of live database instances running on our platform, it is infeasible to rely on conventional DBA-based manual operation, tuning, and maintenance in a per-instance manner. There exist many opportunities for supporting autonomous operations [8, 18, 19, 21, 24, 26, 29] from the perspective of both database kernels and the underlying operation platform. With that in mind, we are committed to building a self-driving database platform (SDDP) with capabilities of self-detection, self-decision, self-recovery and self-optimization. Consider self-optimization as an example: various modules in a database kernel (e.g., indexing, the query optimizer, and buffer pool management) are to be enhanced by adopting machine learning techniques, so that these modules can adaptively optimize for dynamic workloads. However, making them both effective and efficient inside a database kernel is a challenging task, due to the high cost of training and inference of machine learning models. On the other hand, self-detection, self-decision and self-recovery target improving the efficiency and effectiveness of the database operation platform. There are several key challenges, such as how to automatically inspect instance status and detect abnormal behaviors, and how to make a correct decision to repair an error within a short reaction time once it is detected.

3.3 Software-Hardware Co-Design

Another key subject of innovation for Alibaba's database systems is to explore and take advantage of the fast development and innovation in the hardware space. As with any other system, our goal is to design and implement our database systems so that they are able to use limited system hardware resources in a safe and efficient manner. This objective means that the systems must pay attention to the constant change and improvement in hardware properties so that they can leverage the advantages of innovative new hardware features. As a performance-critical system, a database system needs to fully utilize available resources to execute queries and transactions robustly and efficiently. As new hardware properties are constantly improving, it is unwise to simply follow existing database designs and expect that they will maximize performance on new hardware. For example, the performance of a sophisticated database like MySQL running directly on RDMA-enabled distributed storage is significantly worse than that on local PCIe SSDs with the same CPU and memory configurations, which requires a careful re-design [5]. Hence, the opportunities brought by new hardware are important considerations in designing Alibaba's database systems. For example, we have extensively explored and integrated new hardware technologies such as RDMA, NVM, GPU/FPGA, and NVMe SSD.
[Figure 3 overview: Alibaba's enterprise cloud database offerings, spanning Alibaba proprietary and open-source/third-party systems. Utility services: DTS (data transmission and inter-replica synchronization), DMS (data management and DevOps), DBS (database backup), ADAM (database and application migration), HDM (hybrid-cloud database management), and DBAdvisor (autonomous diagnosis/optimization). OLTP: POLARDB-X (distributed relational database with auto scale-out, cross-datacenter availability and high concurrency), POLARDB (cloud-native, shared-storage relational database with hardware consciousness and dynamic elasticity), and managed open-source/third-party engines (MySQL/PG/MSSQL/MariaDB/PPAS). OLAP: AnalyticDB (PB-scale, real-time, high-concurrency analytical database), Data Lake Analytics (serverless federated analytics), and TSDB (time-series and spatial-temporal database). NoSQL: GraphDB, Redis (cache service), HBase + X-Pack (distributed wide-column, multi-model database with analytics), and MongoDB (document database). These sit on the cloud database operation platform, an end-to-end monitoring and analysis platform, and the storage infrastructure: PANGU/PolarStore (virtualized block store) and DBFS/PolarFS (high-performance distributed file systems).]

Figure 3: Database systems and services at Alibaba and Alibaba Cloud.

3.4 High Availability

High availability is one of the fundamental requirements for database systems at Alibaba, to ensure zero downtime for our business operations and our cloud customers, as most enterprise customers are intolerant of interruptions to their businesses. One canonical solution for high availability is replication, which can be done at the granularity of a database instance, table, or table shard. The widely used primary-backup and three-way replication schemes are competent in most scenarios. For banking and finance sectors that need higher availability, four or more replicas might be enforced, which are usually placed at different data centers (availability zones) and regions in order to survive large-area failures (such as network failures and data center outages).

In the adoption of replication, the data consistency between replicas must be carefully handled. The CAP theorem concludes that only two out of the three properties, consistency, availability and partition tolerance, can be satisfied. At Alibaba, we design and implement our database systems with 'C' (consistency) and 'P' (partition tolerance) in mind, and ensure high availability with a customized parallel Paxos protocol called X-Paxos, which ensures that we can still deliver an extremely high level of availability of up to 99.999%. X-Paxos implements and optimizes sophisticated replication techniques and consensus protocols, and ensures data consistency and availability via logs.

4. ALIBABA CLOUD-NATIVE DATABASES

In this section, we share our recent progress in building cloud-native database systems at Alibaba. A complete overview of database systems and products at Alibaba and on Alibaba Cloud is summarized in Figure 3. We focus on the discussion of POLARDB (a shared-storage OLTP database) and its distributed version POLARDB-X (a sharded shared-nothing OLTP database built on top of POLARDB), AnalyticDB (a real-time interactive OLAP database), and SDDP (an autonomous database operation platform).

4.1 POLARDB: cloud-native OLTP database

POLARDB is a relational database system built based on AliSQL (a fork of MySQL/InnoDB) [2], and is available as a database service on Alibaba Cloud. POLARDB follows a cloud-native architecture that provides high elasticity, high volume and high concurrency. In addition, POLARDB is fully compatible with MySQL and PostgreSQL, which helps customers to conduct transparent and smooth business application migrations.

Figure 4: Architecture of POLARDB (applications on ECS cloud servers connect through a read/write splitter with load balancing to a primary DB server and read-only replicas; each node's user-space file system and data router/cache access shared data chunk servers over RDMA, coordinated by the Parallel-Raft protocol on serverless storage).

4.1.1 System Design

POLARDB follows the shared-storage architecture, as shown in Figure 4. It consists of three layers: a PolarProxy acting as a unified access portal, a multi-node database cluster, and a distributed shared file system, PolarFS. PolarProxy is a distributed stateless proxy cluster with self-adaptive capacity. It integrates the resources of multiple computation nodes and provides a unified portal for applications to access. Its dynamic scale-out capability enables agile increase/decrease of nodes. The database nodes in POLARDB are divided into two types, i.e., a primary node and many read-only (RO) nodes. The primary node can handle both read and write queries, while RO nodes only process read queries. Both primary and RO nodes share redo log files and data files, which are managed by PolarFS (Section 4.1.2), a distributed file system with ultra-low latency, high throughput and high availability.
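The following sketch illustrates the read/write splitting idea behind such a proxy layer: write statements go to the single primary endpoint, while reads are balanced across read-only replicas. It is a minimal illustration rather than PolarProxy's implementation; the endpoint names and the simple statement classifier are assumptions.

```python
import itertools

# Hypothetical endpoints; in a real deployment these would come from service discovery.
PRIMARY = "polardb-primary:3306"
READ_ONLY = ["polardb-ro-1:3306", "polardb-ro-2:3306", "polardb-ro-3:3306"]
_ro_cycle = itertools.cycle(READ_ONLY)

WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

def route_statement(sql: str, in_transaction: bool = False) -> str:
    """Pick an endpoint for a SQL statement.

    Writes (and anything inside an explicit read-write transaction) must go to
    the primary; plain reads can be spread over read-only replicas.
    """
    stmt = sql.lstrip().upper()
    if in_transaction or stmt.startswith(WRITE_PREFIXES):
        return PRIMARY
    if stmt.startswith("SELECT"):
        return next(_ro_cycle)      # round-robin load balancing across RO nodes
    return PRIMARY                  # be conservative for anything unrecognized

print(route_statement("SELECT * FROM orders WHERE id = 42"))
print(route_statement("UPDATE orders SET state = 'paid' WHERE id = 42"))
```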

Such an architecture has several distinctive advantages. First, the compute and storage resources are decoupled. Compute and storage nodes can use different types of server hardware and can be customized separately. For example, the compute nodes no longer need to consider the ratio of memory size to disk capacity, which is highly dependent on the application scenario and hard to predict. Second, it breaks the limitations of single-node databases (e.g., MySQL, PostgreSQL). Disks on storage nodes form a single storage pool, which reduces the risk of fragmentation, usage imbalance, and space wastage. The capacity and throughput of a storage cluster can scale out transparently. POLARDB is able to provision 100TB of storage capacity and achieve 1 million QPS per node. Third, since data are all stored on the storage cluster, there is no local persistent state on compute nodes, making it easier and faster to perform database migration. Data reliability can also be improved because of the data replication and other high-availability features provided by PolarFS.

Apart from POLARDB, other cloud database services can also benefit from this architecture. First, databases can build on a more secure and easily scalable environment based on virtualization techniques, such as Xen [3], KVM [13] or Docker [20]. Second, some key features of databases, such as multiple read-only instances and checkpoints, can be easily achieved since back-end storage clusters provide fast I/O, data sharing, and snapshots.

4.1.2 PolarFS and PolarStore

Data storage technology continues to change at a rapid pace, and current cloud platforms have trouble taking full advantage of emerging hardware standards such as RDMA and NVMe SSD. For instance, some widely used open-source distributed file systems, such as HDFS [4] and Ceph [31], are found to have much higher latency than local disks. When the latest PCIe SSDs are used, the performance gap can even reach orders of magnitude. The performance of relational databases like MySQL running directly on such distributed storage is significantly worse than that on local PCIe SSDs with the same CPU and memory configurations.

To this end, we build PolarFS [5] as the shared storage layer for POLARDB. It is a distributed file system built on top of PolarStore (a shared distributed storage based on an RDMA network), offering ultra-low latency, high throughput and high availability via the following mechanisms. First, PolarFS takes full advantage of emerging hardware such as RDMA and NVMe SSD, and implements a lightweight network stack and I/O stack in user space to avoid trapping into the kernel and dealing with kernel locks. Second, PolarFS provides a POSIX-like file system API, which is intended to be compiled into the database process and replace the file system interfaces provided by the operating system, so that the whole I/O path can be kept in user space. Third, the I/O model of PolarFS's data plane is also designed to eliminate locks and avoid context switches on the critical data path. All unnecessary memory copies are eliminated, while RDMA is heavily utilized to transfer data between main memory and RDMA NICs/NVMe disks. With all these features, the end-to-end latency of PolarFS has been reduced drastically, being quite close to that of a local file system on SSD.

As node failures in a large POLARDB cluster are common, a consensus protocol is needed to ensure that all committed modifications will not get lost in corner cases. Replicas should always reach agreement and become bitwise identical. In PolarFS, we first used Raft [23], a variant of the Paxos family [17, 16], which is easier to implement and widely used by many distributed systems. However, when Raft was applied, we found that it seriously impedes the I/O scalability of PolarFS where low-latency NVMe SSDs and RDMA are used (whose latencies are on the order of tens of microseconds). Therefore, we developed ParallelRaft, an enhanced consensus protocol based on Raft, which allows out-of-order log acknowledging, committing and applying, while letting PolarFS comply with traditional I/O semantics. With this protocol, parallel I/O concurrency has been significantly improved.

In summary, PolarFS supports POLARDB with the following features: (1) PolarFS can synchronize modifications of file metadata (e.g., file truncation or expansion, file creation or deletion) from primary nodes to RO nodes, so that all changes are visible to RO nodes. (2) PolarFS ensures that concurrent modifications to file metadata are serialized, so that the file system itself is consistent across all database nodes. (3) In case of a network partition, two or more nodes might act as primary nodes writing shared files concurrently. PolarFS can ensure that only the real primary node is served successfully, preventing data corruption. More technical details can be found in [5].

Figure 5: Architecture of POLARDB-X (X-SQL requests enter a distributed relational database service with a SQL parser, SQL optimizer, SQL router and transaction manager, which routes plans to multiple POLARDB instances, each with a plan executor, transaction service and X-Engine, on top of PolarFS and PolarStore).

4.2 POLARDB-X: distributed OLTP database

POLARDB scales well for up to tens of nodes (due to the limitation of the underlying RDMA network), but this is not sufficient to support highly concurrent workloads over massive amounts of data and transactions, such as those found on the Singles' Day Shopping Festival. Hence, we have extended POLARDB and built POLARDB-X, a distributed shared-nothing OLTP database that enables horizontal scale-out by combining shared-storage and shared-nothing architectures. The benefit of this design, as compared to a standard shared-nothing architecture using a single-node instance on each shard, is that each shard can now afford to store and process much more data and transactions due to the scale-up capability introduced by the shared-storage architecture. As a result, for the same amount of data and/or transaction processing needs, the hybrid architecture needs far fewer shards compared to a standard shared-nothing system; this in turn reduces the chances of dealing with complex and expensive distributed transaction processing and distributed commits. As a result, it supports highly concurrent transactions over massive data, and ensures cross-AZ and cross-region transaction consistency through the parallel Paxos protocol X-Paxos.
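As an illustration of this hybrid layout (assumed names and shard counts; not the actual POLARDB-X routing code), each logical shard below is served by an entire POLARDB cluster rather than a single node, so a statement is first mapped to a shard and then to that cluster's primary or a read-only node:

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

@dataclass
class PolarDBCluster:
    """One shard = one POLARDB cluster: a primary plus read-only nodes on shared storage."""
    primary: str
    read_only: List[str] = field(default_factory=list)

# Hypothetical layout: 4 shards, each backed by a multi-node cluster.
SHARDS = [
    PolarDBCluster("shard0-primary", ["shard0-ro1", "shard0-ro2"]),
    PolarDBCluster("shard1-primary", ["shard1-ro1"]),
    PolarDBCluster("shard2-primary", ["shard2-ro1", "shard2-ro2"]),
    PolarDBCluster("shard3-primary", ["shard3-ro1"]),
]

def shard_for(key: str) -> PolarDBCluster:
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

def route(keys: List[str], is_write: bool) -> List[str]:
    """Route a statement given the sharding keys it touches.

    Single-shard statements go to one cluster; multi-shard writes would need
    distributed transaction coordination (e.g., two-phase commit), which the
    hybrid design tries to make rare by keeping the number of shards small.
    """
    clusters = {id(c): c for c in (shard_for(k) for k in keys)}.values()
    if is_write:
        return [c.primary for c in clusters]
    return [c.read_only[0] if c.read_only else c.primary for c in clusters]

print(route(["user:1001"], is_write=True))                 # one shard, one primary
print(route(["user:1001", "user:9099"], is_write=True))    # may span multiple shards
```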
4.2.1 System Design

Figure 5 shows the architecture of POLARDB-X, in which relational data is partitioned into multiple POLARDB instances and managed by a distributed relational database service (DRDS). DRDS takes in SQL queries or transactions, parses and optimizes their plans, and finally routes them to the corresponding POLARDB instances for execution. As discussed previously, each POLARDB instance consists of one primary node and multiple read-only nodes. Each read node serves as a replica of the primary node, sharing the same storage residing on PolarFS, which in turn sits on PolarStore, Alibaba's block storage system. Inside a POLARDB node, there is a plan executor for query plans pushed from the DRDS, a transaction service for transaction processing, and X-Engine, Alibaba's LSM-tree based OLTP storage engine.

4.2.2 X-Engine

We find that, when handling such transactions at Alibaba and for our big enterprise customers, three key challenges have to be addressed: (1) The tsunami problem: there is a drastic increase in transactions at the kickoff of major sales and promotional events (e.g., there was a 122-time spike on Alibaba's Singles' Day Global Shopping Festival), which puts tremendous pressure on the underlying database. (2) The flood discharge problem: large amounts of hot records can easily overwhelm system buffers, which blocks subsequent transactions if buffers cannot be flushed quickly. (3) The fast-moving current problem: due to the large number of promotion events that last over short time periods, quick shifts of record "temperatures" (i.e., hot, warm, cold) occur frequently, which drastically lowers the cache hit ratio.

We build X-Engine [10] to tackle the above challenges faced by Alibaba's e-commerce platform, because a significant part of transaction processing performance boils down to how efficiently data can be made durable and retrieved from storage. X-Engine processes most requests in main memory by exploiting the thread-level parallelism (TLP) of multi-core processors, decouples writes from transactions to make them asynchronous, and decomposes a long write path into multiple stages in a pipeline in order to increase the overall throughput. To address the flood discharge problem, X-Engine exploits a tiered storage approach to move records among different tiers, taking advantage of a refined LSM-tree structure [22, 25] and optimized compaction algorithms. We also apply FPGA offloading on compactions. Finally, to address the fast-moving current problem, we introduce a multi-version metadata index which is updated in a copy-on-write fashion to accelerate point lookups in the tiered storage regardless of data temperatures.

Figure 6: Architecture of X-Engine (writes go through redo logs and active/immutable memtables in the hot data tier; flushes feed a warm/cold data tier of extents in data files on NVM/SSD/HDD, with FPGA-accelerated compactions; reads are served by caches and indexes).

Figure 6 shows the architecture of X-Engine. X-Engine partitions each table into multiple sub-tables, and maintains an LSM-tree, the associated metasnapshots and indexes for each sub-table. X-Engine contains one redo log per database instance. Each LSM-tree consists of a hot data tier residing in main memory and a warm/cold data tier residing on NVM/SSD/HDD (further partitioned into different levels), where the terms hot, warm, and cold refer to data temperatures, representing the ideal access frequencies of data that should be placed in the corresponding tier. The hot data tier contains an active memtable and multiple immutable memtables, which are skiplists storing recently inserted records, and caches to buffer hot records. The warm/cold data tier organizes data in a tree-like structure, with each level of the tree storing a sorted sequence of extents. An extent packages blocks of records as well as their associated filters and indexes.

X-Engine exploits redo logs, metasnapshots, and indexes to support multi-version concurrency control (MVCC) for transaction processing. Each metasnapshot has a metadata index that tracks all memtables and extents in all levels of the tree in the snapshot. One or multiple neighboring levels of the tree form a tier to be stored on NVM, SSD, and HDD, respectively. Each sub-table in X-Engine has its own hot, warm and cold data tiers (i.e., LSM-trees), storing records in a row-oriented format. We design multi-version memtables to store records with different versions to support MVCC. On disk, the metadata indexes track all the versions of records stored in extents. More technical details can be found in [10].
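The tiered design hinges on estimating extent temperature and pushing the coldest extents downward during compaction. The toy sketch below captures that idea only; the extent structure, window size, and victim count are illustrative assumptions, not X-Engine internals.

```python
import time
from dataclasses import dataclass, field

WINDOW_SECONDS = 3600            # assumed recent-access window
VICTIMS_PER_COMPACTION = 500     # e.g., "say 500 extents" as in Section 4.4.3

@dataclass
class Extent:
    extent_id: int
    level: int
    access_times: list = field(default_factory=list)  # recent access timestamps

    def temperature(self, now: float) -> float:
        """Access frequency within the recent window; lower means colder."""
        cutoff = now - WINDOW_SECONDS
        return sum(1 for t in self.access_times if t >= cutoff)

def pick_compaction_victims(extents, now=None):
    """Pick the coldest extents to push to a deeper (colder) level."""
    now = now or time.time()
    ranked = sorted(extents, key=lambda e: e.temperature(now))
    return ranked[:VICTIMS_PER_COMPACTION]
```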
4.3 AnalyticDB: real-time OLAP data warehouse

AnalyticDB is a real-time OLAP database system designed for high-concurrency, low-latency, and real-time analytical queries at PB scale. It has been running on deployments ranging from as few as 3 nodes to more than 2,000 physical machines, and is provided as a database service on Alibaba Cloud. It serves enterprise customers from a wide range of business sectors, including e-commerce, fintech, logistics, public transit, meteorological analysis, entertainment, etc., as well as internal business operations within Alibaba Group.

Recent works [28, 14, 15, 32, 11] have summarized the main challenges of designing an OLAP system as achieving low query latency, data freshness, flexibility, low cost, high scalability, and availability. Compared to these works, large-scale analytical workloads from our application scenarios elevate AnalyticDB to an even larger scale: 10PB+ of data, hundreds of thousands of tables and trillions of rows, which presents significant challenges to the design and implementation of AnalyticDB: 1) Today's users face more complicated analytics scenarios than ever before, but still have high expectations for low query latency. Though queries from different applications are diverse and complex, they often do not tolerate queries that take a long time. 2) Emerging complex analysis tends to combine different types of queries and data. More than half of our users' data has a complex data type, such as text, JSON string, or vector. A practical database should be able to efficiently support queries on heterogeneous data with complex types. 3) While processing real-time queries with low latency, the system also needs to handle tens of millions of online write requests per second. Traditional designs that read and write data in the same process path are no longer well suited for this case. Careful designs that balance query latency, write throughput and data visibility must be taken into consideration.

To address these challenges, we build AnalyticDB with several novel designs. First, AnalyticDB embeds an efficient and effective index engine. In this engine, indexes are built on all columns in each table for significant performance gains on ad-hoc complex queries. We further propose a runtime filter-ratio-based index path selection mechanism to avoid the performance slow-down caused by index abuse. Since it is prohibitively expensive to update large indexes in the critical path, indexes are asynchronously built during off-peak periods. We also maintain a lightweight sorted index to minimize the impact of asynchronous index building on queries involving incremental data (i.e., data written after the current round of index building has started).

Second, we design the underlying storage layout to support hybrid row-column storage for structured data and other data with complex types. In particular, we utilize fast sequential disk I/Os, so that the overhead is acceptable under either OLAP-style or point-lookup workloads. We further incorporate complex data types in the storage (including indexes) to provide the capability of searching resources (i.e., JSON, vector, text) together with structured data.

Third, in order to support both high-throughput writes and low-latency queries, our system follows an architecture that decouples reads and writes, i.e., they are served by write nodes and read nodes respectively. These two types of nodes are isolated from each other and hence can scale independently. In particular, write nodes persist write requests to Pangu (a reliable distributed storage on Alibaba Cloud). To ensure data freshness when serving queries, a version verification mechanism is applied on read nodes, so that previous writes processed on write nodes are visible.

Fourth, to further improve query latency and concurrency, we enhance the optimizer and execution engine in AnalyticDB to fully utilize the advantages of our storage and indexes. Specifically, we propose a storage-aware SQL optimization mechanism that generates optimal execution plans according to the storage characteristics, and an efficient real-time sampling technique for cardinality estimation in the cost-based optimizer. Besides, we design a high-performance vectorized execution engine for the hybrid storage that improves the efficiency of computationally intensive analytical queries.

Figure 7: Architecture of AnalyticDB (coordinators dispatch INSERT/DELETE and SELECT requests arriving over JDBC/ODBC to write nodes and read nodes with SSD caches; Fuxi provides computation resources, and data persists in the Pangu distributed storage system).

Figure 7 shows the system architecture. There are mainly three types of nodes in AnalyticDB, i.e., coordinator, write node and read node. The coordinator collects requests (both writes and queries) from client connections, and dispatches them to the corresponding write and read nodes. The write nodes are responsible for processing writes (such as INSERT, DELETE, UPDATE) and flushing SQL statements into Pangu for persistence. Read nodes are responsible for handling queries (such as SELECT). In this manner, write and read nodes are decoupled from each other. Fuxi (a resource manager and job scheduler on Alibaba Cloud) utilizes available resources in all these nodes to provide computation workers for asynchronous task execution. In addition, AnalyticDB provides a general-purpose, pipeline-mode execution engine that runs on computation workers. Data flows through the system in units of column blocks from the storage to the client. All data processing is in memory and is pipelined between different stages across the network. This pipelined workflow enables AnalyticDB to serve users' complex queries with high throughput and low latency. More technical details can be found in [33].
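One way such a version verification mechanism can be pictured is sketched below (a hedged illustration with assumed APIs, not AnalyticDB's implementation): write nodes advance a per-partition version, and a read node only serves a query once it has applied data up to the version observed at dispatch time.

```python
import time

class WriteNode:
    """Tracks the latest committed version per partition (illustrative only)."""
    def __init__(self):
        self.versions = {}

    def write(self, partition: str) -> int:
        self.versions[partition] = self.versions.get(partition, 0) + 1
        return self.versions[partition]

class ReadNode:
    """Serves a read only after it has caught up to the requested version."""
    def __init__(self):
        self.applied = {}

    def apply(self, partition: str, version: int):
        self.applied[partition] = max(self.applied.get(partition, 0), version)

    def read(self, partition: str, required_version: int, timeout_s: float = 1.0):
        deadline = time.time() + timeout_s
        while self.applied.get(partition, 0) < required_version:
            if time.time() > deadline:
                raise TimeoutError("replica not fresh enough for this query")
            time.sleep(0.01)   # wait for asynchronous replication to catch up
        return f"rows of {partition} @ version {self.applied[partition]}"

writer, reader = WriteNode(), ReadNode()
v = writer.write("orders_p0")        # a write advances the partition's version
reader.apply("orders_p0", v)         # the read node ingests the new data asynchronously
print(reader.read("orders_p0", v))   # the query is served only once version v is visible
```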
4.4 SDDP: Self-Driving Database Platform

To manage numerous database instances on Alibaba Cloud, we have built an autonomous database management platform called SDDP (Self-Driving Database Platform). This platform collects real-time statistics from all running database instances, and uses machine learning and statistical methods to tune instances and provision resources.

Figure 8: Architecture of SDDP (database instances feed SQL and metric collection into the Hybrid Cloud Database Management layer, which stores them in a TSDB and drives parameter tuning, SQL throttling, resource scheduling, status detection, auto recovery and A/B testing; machine learning jobs such as hot/cold identification, anomaly detection, the ML optimizer, cache prefetching, tuning models and auto sharding are handed by the Controller to the Model Training Platform).

4.4.1 System Design

Figure 8 shows the architecture of SDDP. The Hybrid Cloud Database Management (HDM) layer collects SQL statements and metrics from database instances and stores them in a time-series database (TSDB). Meanwhile, HDM detects database status and synchronizes this information with the Controller. Database status is normally changed by DDL operations. For example, we can use ALTER TABLE t1 HOTCOLD = 'SMART' to set a table to SMART mode and separate hot/cold data automatically (Section 4.4.3). HDM also assigns the Controller to drive machine learning tasks, such as parameter tuning (Section 4.4.2), resource scheduling and anomaly detection. The Controller schedules these tasks to the Model Training Platform (MTP). MTP retrieves data, including SQL and metrics, from the TSDB and uses different modules to complete the corresponding jobs. The results are transferred back to HDM by the Controller and applied to database instances.

Figure 9: Workflow and overall architecture of iBTune.

4.4.2 Buffer Size Tuning

The buffer pool is a critical resource for an OLTP database, serving as a data caching space to guarantee desirable system performance. Empirical studies on Alibaba's OLTP database clusters with more than 10,000 instances show that the buffer pool consumes on average 98% of the memory space on each instance. Existing buffer pool configurations are almost unanimously based on database administrators' (DBAs') experience and often take a fixed number of recommended values. This manual process is neither efficient nor effective, and is not even feasible for large cloud clusters, especially when the workload may dynamically change on individual database instances.

To this end, we build iBTune [27], individualized buffer tuning, to automatically reduce the buffer size for any individual database instance while still maintaining the quality of service for its response time, without relying on DBAs to set forth a pre-defined level. We utilize the relationship between miss ratios and buffer pool sizes to optimize the memory allocation. Our models leverage information from similar instances. Meanwhile, we design a novel pairwise deep neural network that uses features from measurements on pairs of instances to predict the upper bounds of the response times. To date, iBTune has been deployed on SDDP and applied to more than 10,000 database instances. We have successfully reduced the memory consumption by more than 17% (at least 27TB) while still satisfying the required quality of service for our diverse business applications.

Figure 9 presents an overview of iBTune's architecture and workflow. There are four key components: data collection, data processing, decision making, and execution. The iBTune workflow forms a closed cycle, since data is first collected from the DBMS kernel, processed and used for training, and the resulting models are then applied to the DBMS again. In data collection, we use customized agents to collect various database metrics and logs from the DBMS; more than 1,000 metrics are collected. The agent sits outside the DBMS to avoid unnecessary performance overhead to the DBMS kernel. All metrics and logs are collected at one-second granularity and fed into a message queue. In data processing, a stream processing system reads data from the message queue and performs certain data manipulation/standardization operations, such as normalization, aggregation and log transformation. After that, the processed metrics and logs are stored in a distributed data store for analysis and model training. In decision making, we propose a novel method to predict the response time (RT) and compute the new buffer pool (BP) size. If the predicted RT meets the requirement, the computed new BP size is sent to the execution component, which contains an action planner and a scheduler. To process a large number of database instances, the action planner aims to make a globally efficient and non-conflicting execution plan for tens of thousands of actions. It includes priority settings among different action categories, action merging for the same instance, action conflict detection and resolution, canary executing strategies, and so on. It finally outputs several action sequences to the action scheduler. More technical details can be found in [27].
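The core resizing idea, picking the smallest buffer pool whose predicted miss ratio keeps the predicted response time within its tolerated upper bound, can be sketched as follows. The miss-ratio and response-time models here are simple stand-ins, not iBTune's actual learned models.

```python
def predict_miss_ratio(bp_size_gb: float, workload_footprint_gb: float) -> float:
    """Stand-in miss-ratio curve: misses shrink as the buffer pool grows."""
    if bp_size_gb >= workload_footprint_gb:
        return 0.001
    return min(1.0, 0.001 + (1 - bp_size_gb / workload_footprint_gb))

def predict_response_time_ms(miss_ratio: float,
                             hit_latency_ms: float = 0.2,
                             miss_latency_ms: float = 2.0) -> float:
    """Stand-in RT model: a weighted mix of buffer hits and disk reads."""
    return (1 - miss_ratio) * hit_latency_ms + miss_ratio * miss_latency_ms

def recommend_bp_size(current_gb: float, footprint_gb: float, rt_upper_bound_ms: float,
                      candidates=None) -> float:
    """Return the smallest candidate size (not above current) that still meets the RT bound."""
    candidates = sorted(candidates or [c for c in (4, 8, 16, 32, 64, 96, 128) if c <= current_gb])
    for size in candidates:
        rt = predict_response_time_ms(predict_miss_ratio(size, footprint_gb))
        if rt <= rt_upper_bound_ms:
            return size
    return current_gb  # keep the current size if nothing smaller is safe

print(recommend_bp_size(current_gb=128, footprint_gb=40, rt_upper_bound_ms=0.5))
```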
4.4.3 Other Autonomous Scenarios

Besides buffer size tuning, we have explored other autonomous designs as well, e.g., slow-SQL optimization, space reduction, hot/cold separation, the ML optimizer, and failure detection and recovery. Taking hot/cold separation as an example, the levels in X-Engine (Section 4.2.2) are differentiated by the temperature of data (extents). An extent's temperature is calculated from its access frequency in a recent window. When a compaction is performed, X-Engine selects the coldest extents up to a number specified by a threshold, say 500 extents, and pushes these extents to the deeper level during compaction. By doing so, X-Engine keeps warm extents in upper levels and cold extents in deeper levels. But this method cannot handle dynamic workloads well. For example, when the current access frequency of an extent is low, this algorithm will treat the extent as cold data, even though it might become hot in the near future. To this end, we have investigated machine learning based algorithms to identify the proper temperature of an extent. The intuition is that, in addition to the extent level, we also use the row level (record) as a granularity to infer temperature (warm/cold). If a record has never been accessed in a recent window, it is identified as cold; otherwise, it is considered warm. As a result, temperature identification is translated into a binary classification problem and can be solved with a classification model, such as a random forest or a neural network based approach.
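A minimal sketch of framing record-temperature prediction as binary classification is shown below. The features and labels are invented for illustration; they are not the features used in production.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Invented per-record features: [accesses in last hour, hours since last access,
# accesses in last day, size in KB]. Label: 1 = warm (accessed recently), 0 = cold.
rng = np.random.default_rng(0)
X = rng.random((5000, 4)) * np.array([50, 72, 500, 64])
y = ((X[:, 0] > 5) | (X[:, 1] < 6)).astype(int)   # synthetic labeling rule

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Predicted warm records could then be kept in upper (hotter) LSM levels at compaction time.
new_records = np.array([[12.0, 1.0, 120.0, 8.0],    # recently and frequently accessed
                        [0.0, 60.0, 2.0, 8.0]])     # essentially untouched
print(clf.predict(new_records))                     # e.g., [1 0] -> warm, cold
```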
5. APPLICATIONS AND OPERATIONS

There have been nearly half a million database instances running on Alibaba Cloud, supporting both internal business operations within the Alibaba group and external customers' business applications. By leveraging our cloud-native database systems, we have successfully served a large number of complex application scenarios.

POLARDB. POLARDB has seen rapid growth in adoption on Alibaba Cloud. It serves many leading companies in different business sectors, such as fintech, gaming and entertainment, education, and multimedia. Many applications choose to migrate to POLARDB due to the limited transaction processing rate supported by their self-deployed databases. For example, an application experienced 5x latency increases and frequent transaction failures on MySQL instances during peak hours. POLARDB helps to keep all transaction latencies within 1 second and improves peak throughput by 3x. In addition, one POLARDB instance is able to sustain the same query performance as a 5-node replicated MySQL cluster, and reduces the work needed from experienced DBAs. In most cases, it reduces the total cost of ownership (TCO) on databases by 50-80%.

POLARDB-X. POLARDB-X has been applied to serve many performance-critical and cost-sensitive businesses at Alibaba. For example, at the start of the Singles' Day Global Shopping Festival in 2018, we handled a 122x increase in transactions, processing up to 491,000 sales transactions per second, which translates to more than 70 million database transactions per second. To this end, more than 2,000 nodes of POLARDB-X have been deployed online. The fast-rising cost of administering OLTP databases and maintaining the underlying servers has been a major challenge for Alibaba as the GMV (Gross Merchandise Volume) grows rapidly. To reduce such cost, we have replaced MySQL with POLARDB-X for many of Alibaba's businesses, which leveraged downgraded hardware (e.g., with fewer CPU cores and smaller storage volumes) while sustaining the same level of QPS/TPS. In many cases, we have managed to reduce the total cost of ownership on databases by up to 50%.

AnalyticDB. AnalyticDB has been running on more than 2,000 nodes on Alibaba Cloud. It serves applications from a wide range of business sectors, such as e-commerce, fintech, logistics, public transit, and entertainment. Based on AnalyticDB, we have extended an end-to-end solution that covers the entire analysis pipeline from data acquisition to data visualization. Our customers are able to build their online analytic services seamlessly while reducing total cost compared against other solutions. AnalyticDB helps applications to better utilize their data: instant multi-dimensional analysis and business exploration over tera-scale data can be completed in milliseconds; users can define and launch their analytic tasks by invoking built-in functions and modules; the end-to-end latency for visualizing newly acquired data is reduced to less than one minute in many cases; and no manual operations from customers are required to maintain the service.

SDDP. SDDP has been used to manage tens of thousands of database instances at Alibaba. We have successfully saved large amounts of cluster resources through the use of many autonomous and intelligent modules: the SQL optimization module has detected and optimized 27.4 million inefficient SQL requests; the space optimization module has freed 2.26PB of storage space (e.g., by de-fragmentation and removal of useless indexes/tables); the iBTune module has saved 27TB of memory space while sustaining the service quality; and the global workload scheduling module has increased the disk utilization ratio from 46% to 63%.

6. CONCLUSION

Cloud-native database systems are an increasingly important subject of research and development. They introduce numerous new technical challenges and open many new opportunities to the database community and industry. In this paper, we have shared experiences and lessons learned at Alibaba in advancing cloud-native database techniques and building cloud-native database systems that have supported the complex and rich business operations both within and external to the Alibaba group, based on the Alibaba cloud. Given the rapid growth of moving to the cloud, cloud-native database systems will be even more critical in the road ahead and will open new and exciting research directions and engineering challenges to support the next generation of cloud applications.

7. ACKNOWLEDGMENT

This work and the systems and techniques presented in the paper are the collective efforts and contributions of the entire Alibaba Cloud database team (officially known as the database products business unit under the Alibaba cloud intelligence group), as well as the database and storage lab under the DAMO academy (a research institute at Alibaba).

8. REFERENCES

[1] Alibaba Group. Alibaba Cloud. https://www.alibabacloud.com.
[2] Alibaba Group. AliSQL. https://github.com/alibaba/AliSQL.
[3] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. In ACM SIGOPS Operating Systems Review, volume 37, pages 164-177. ACM, 2003.
[4] D. Borthakur et al. HDFS architecture guide. Hadoop Apache Project, 53, 2008.
[5] W. Cao, Z. Liu, P. Wang, S. Chen, C. Zhu, S. Zheng, Y. Wang, and G. Ma. PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database. PVLDB, 11(12):1849-1862, 2018.
[6] T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: an engineering perspective. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, pages 398-407. ACM, 2007.
[7] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google's globally distributed database. ACM Transactions on Computer Systems (TOCS), 31(3):8, 2013.
[8] S. Das, F. Li, V. R. Narasayya, and A. C. König. Automated demand-driven resource scaling in relational database-as-a-service. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD '16, pages 1923-1934, New York, NY, USA, 2016. ACM.
[9] J. R. Guay Paz. Introduction to Azure Cosmos DB, pages 1-23. Apress, Berkeley, CA, 2018.
[10] G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-Engine: An optimized storage engine for large-scale e-commerce transaction processing. In SIGMOD. ACM, 2019.
[11] J.-F. Im, K. Gopalakrishna, S. Subramaniam, M. Shrivastava, A. Tumbde, X. Jiang, J. Dai, S. Lee, N. Pawar, J. Li, et al. Pinot: Realtime OLAP for 530 million users. In SIGMOD, pages 583-594. ACM, 2018.
[12] J. Kirsch and Y. Amir. Paxos for system builders: An overview. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware, page 3. ACM, 2008.
[13] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori. KVM: the Linux virtual machine monitor. In Proceedings of the Linux Symposium, volume 1, pages 225-230, 2007.
[14] M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, et al. Impala: A modern, open-source SQL engine for Hadoop. In CIDR, volume 1, page 9, 2015.
[15] A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandiver, L. Doshi, and C. Bear. The Vertica analytic database: C-Store 7 years later. PVLDB, 5(12):1790-1801, 2012.
[16] L. Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133-169, 1998.
[17] L. Lamport et al. Paxos made simple. ACM SIGACT News, 32(4):18-25, 2001.
[18] Z. L. Li, M. C.-J. Liang, W. He, L. Zhu, W. Dai, J. Jiang, and G. Sun. Metis: Robustly tuning tail latencies of cloud systems. In ATC (USENIX Annual Technical Conference), July 2018.
[19] L. Ma, D. Van Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon. Query-based workload forecasting for self-driving database management systems. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 631-645, New York, NY, USA, 2018. ACM.
[20] D. Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239):2, 2014.
[21] D. Narayanan, E. Thereska, and A. Ailamaki. Continuous resource monitoring for self-predicting DBMS. In 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, pages 239-248, Sept 2005.
[22] P. O'Neil, E. Cheng, D. Gawlick, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4):351-385, 1996.
[23] D. Ongaro and J. K. Ousterhout. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference, pages 305-319, 2014.
[24] A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. Mowry, M. Perron, I. Quah, S. Santurkar, A. Tomasic, S. Toor, D. V. Aken, Z. Wang, Y. Wu, R. Xian, and T. Zhang. Self-driving database management systems. In Proceedings of the 2017 Conference on Innovative Data Systems Research, CIDR '17, 2017.
[25] R. Sears and R. Ramakrishnan. bLSM: A general purpose log structured merge tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 217-228, New York, NY, USA, 2012. ACM.
[26] R. Taft, N. El-Sayed, M. Serafini, Y. Lu, A. Aboulnaga, M. Stonebraker, R. Mayerhofer, and F. Andrade. P-Store: An elastic database system with predictive provisioning. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 205-219, New York, NY, USA, 2018. ACM.
[27] J. Tan, T. Zhang, F. Li, J. Chen, Q. Zheng, P. Zhang, H. Qiao, Y. Shi, W. Cao, and R. Zhang. iBTune: Individualized buffer tuning for large-scale cloud databases. PVLDB, 12(12):1221-1234, 2019.
[28] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a map-reduce framework. PVLDB, 2(2):1626-1629, 2009.
[29] D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 1009-1024, New York, NY, USA, 2017. ACM.
[30] A. Verbitski, A. Gupta, D. Saha, M. Brahmadesam, K. Gupta, R. Mittal, S. Krishnamurthy, S. Maurice, T. Kharatishvili, and X. Bao. Amazon Aurora: Design considerations for high throughput cloud-native relational databases. In Proceedings of the 2017 ACM International Conference on Management of Data, pages 1041-1052. ACM, 2017.
[31] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 307-320. USENIX Association, 2006.
[32] F. Yang, E. Tschetter, X. Léauté, N. Ray, G. Merlino, and D. Ganguli. Druid: A real-time analytical data store. In SIGMOD, pages 157-168. ACM, 2014.
[33] C. Zhan, M. Su, C. Wei, X. Peng, L. Lin, S. Wang, Z. Chen, F. Li, Y. Pang, F. Zheng, and C. Chai. AnalyticDB: Real-time OLAP database system at Alibaba Cloud. PVLDB, 12(12), 2019.
[34] J. Zheng, Q. Lin, J. Xu, C. Wei, C. Zeng, P. Yang, and Y. Zhang. PaxosStore: high-availability storage made practical in WeChat. PVLDB, 10(12):1730-1741, 2017.
