A Centralized Ledger Database for Universal Audit and Verification
Total Page:16
File Type:pdf, Size:1020Kb
LedgerDB: A Centralized Ledger Database for Universal Audit and Verification Xinying Yangy, Yuan Zhangy, Sheng Wangx, Benquan Yuy, Feifei Lix, Yize Liy, Wenyuan Yany yAnt Financial Services Group xAlibaba Group fxinying.yang,yuenzhang.zy,sh.wang,benquan.ybq,lifeifei,yize.lyz,[email protected] ABSTRACT certain consensus protocol (e.g., PoW [32], PBFT [14], Hon- The emergence of Blockchain has attracted widespread at- eyBadgerBFT [28]). Decentralization is a fundamental basis tention. However, we observe that in practice, many ap- for blockchain systems, including both permissionless (e.g., plications on permissioned blockchains do not benefit from Bitcoin, Ethereum [21]) and permissioned (e.g., Hyperledger the decentralized architecture. When decentralized architec- Fabric [6], Corda [11], Quorum [31]) systems. ture is used but not required, system performance is often A permissionless blockchain usually offers its cryptocur- restricted, resulting in low throughput, high latency, and rency to incentivize participants, which benefits from the significant storage overhead. Hence, we propose LedgerDB decentralized ecosystem. However, in permissioned block- on Alibaba Cloud, which is a centralized ledger database chains, it has not been shown that the decentralized archi- with tamper-evidence and non-repudiation features similar tecture is indispensable, although they have been adopted to blockchain, and provides strong auditability. LedgerDB in many scenarios (such as IP protection, supply chain, and has much higher throughput compared to blockchains. It merchandise provenance). Interestingly, many applications offers stronger auditability by adopting a TSA two-way peg deploy all their blockchain nodes on a BaaS (Blockchain- protocol, which prevents malicious behaviors from both users as-a-Service) environment maintained by a single service and service providers. LedgerDB supports verifiable data provider (e.g., Microsoft Azure [8] and Alibaba Cloud [4]). removals demanded by many real-world applications, which This instead leads to a centralized infrastructure. As there are able to remove obsolete records for storage saving and is still no widespread of truly decentralized deployments in hide records for regulatory purpose, without compromis- permissioned blockchain use cases, many advocates are veer- ing its verifiability. Experimental evaluation shows that ing back to re-examine the necessity of its decentralization. LedgerDB's throughput is 80× higher than state-of-the-art Gartner predicts that at least 20% of projects envisioned to permissioned blockchain (i.e., Hyperledger Fabric). Many run on permissioned blockchains will instead run on central- blockchain customers (e.g., IP protection and supply chain) ized auditable ledgers by 2021 [23]. on Alibaba Cloud have switched to LedgerDB for its high To reform decentralization in permissioned blockchains, throughput, low latency, strong auditability, and ease of use. we propose LedgerDB { a ledger database that provides tamper-evidence and non-repudiation features in a central- PVLDB Reference Format: ized manner, which realizes both strong auditability and Xinying Yang, Yuan Zhang, Sheng Wang, Benquan Yu, Feifei Li, high performance. We call it a centralized ledger database Yize Li, and Wenyuan Yan. LedgerDB: A Centralized Ledger (CLD) system, which pertains to centralized ledger technol- Database for Universal Audit and Verification. PVLDB, 13(12): ogy (CLT) that is opposed to decentralized ledger technology 3138-3151, 2020. DOI: https://doi.org/10.14778/3415478.3415540 (DLT) in blockchain. LedgerDB is developed and widely adopted at Ant Financial from Alibaba Group. Its major designs are inspired by limitations in conventional permis- 1. INTRODUCTION sioned blockchains [6, 26, 11, 31, 18, 29, 30] and ledger Since the launch of Bitcoin [32], blockchain has been in- databases [7, 40]. troduced to many applications as a new data collaboration First, we observe that many use cases of permissioned paradigm. It enables applications to be operated by mu- blockchains (e.g., provenance and notarization) only seek for tually distrusting participants. A blockchain system im- the tamper resistance from cryptography-protected struc- plements hash-entangled data blocks to provide a tamper- tures, such as hashed chains and Merkle trees [27]. Often- evident ledger with bulks of transactions committed by a times, none of the other features (e.g., decentralization and smart contract) are utilized. In this case, a simplified au- ditable ledger service from a credible central authority with This work is licensed under the Creative Commons Attribution- essential data manipulation support is sufficient. It tends NonCommercial-NoDerivatives 4.0 International License. To view a copy to be more appealing against blockchains, due to its high of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For throughput, low latency, and ease of use. any use beyond those covered by this license, obtain permission by emailing Second, the threat model of permissioned blockchain is [email protected]. Copyright is held by the owner/author(s). Publication rights inadequate for real-world audit. For external (i.e., third- licensed to the VLDB Endowment. party) auditors beyond its consortium, data replication guar- Proceedings of the VLDB Endowment, Vol. 13, No. 12 ISSN 2150-8097. ded by consensus protocols seems deceptive. For example, DOI: https://doi.org/10.14778/3415478.3415540 3138 Table 1: Key comparisons between LedgerDB and other systems. Throughput Auditability Removal Non-Repudiation Provenance System (max TPS) external third party peg capability purge occult server-side client-side native clue LedgerDB 100K+ 3 TSA 3 strong 3 3 3 3 3 QLDB [7] 1K+ 7 7 7 weak 7 7 7 7 7 Hyperledger [6] 1K+ 7 7 7 weak 7 7 3 3 7 ProvenDB [40] 10K+ 7 Bitcoin 3 medium 7 3 7 7 7 Factom [43] 10+ 3 Bitcoin 3 strong 7 7 3 3 7 with the endorsement from all peers, they are able to fabri- strong auditability, broad verification coverage, low stor- cate fake evidence (e.g., modify a transaction from ledger) age cost, as well as high throughput. against external auditors without being noticed. What is • To ensure strong auditability, we adopt non-repudiable worse, the partial endorsement from a majority is already signings and a TSA peg protocol to prevent malicious be- sufficient to conduct above deception, if others fail to provide haviors (defined in § 2.2) from users and LSP. original chain proofs to external auditor. As a consequence, • We provide verifiable data removal operators to resolve permissioned blockchains only retain limited auditability. storage overhead and regulatory issues. They help remove Third, though some centralized auditable ledger databases obsolete records (purge) or hide record content (occult) emerge as alternatives to permissoned blockchains, they how- without compromising verifiability. We also provide na- ever excessively assume a trustful service provider. For ex- tive primitives (clue) to facilitate the tracking of applicati- ample, AWS discloses QLDB [7, 13] that leverages an im- on-level data provenance. mutable and verifiable ledger; Oracle announces its blockcha- • To sustain high throughput for write-intensive scenarios, in table [35]. These auditable ledgers require that the ledger we enhance LedgerDB with tailored components, such as service provider (LSP) is fully trusted, such that historical an execute-commit-index transaction workflow, a batch records can be archived and verified at the server side. It re- accumulated Merkle-tree (bAMT ), and a clue index (cSL). mains unknown whether the returned records are tampered Our experimental result shows that LedgerDB achieves by LSP, especially when it colludes with a certain user. excellent throughput (i.e., 300,000 TPS), which is 80× Fourth, the immutability1 provided by existing systems and 30× higher than Hyperledger Fabric and QLDB re- implies that each piece of committed data is permanently spectively. disclosed on the ledger as the tamper evidence, which indeed The rest of this paper is organized as follows. We discuss causes prohibitive storage overhead and regulatory issues in limitations in current solutions and our design goals in § 2. real-world applications. Many blockchain users wish to clean We overview LedgerDB in § 3, and then present our detailed up obsolete records for storage savings, since it will other- designs (i.e., journal management, verification mechanism, wise become unaffordable as data volume grows. In terms and provenance support) in § 4, § 5 and § 6 respectively. of regulatory issues, those records violating law or morality Use cases and real-world practices are provided in § 7. We (e.g., upload someone's residential address without prior ap- provide experimental evaluation in § 8 and related work in proval) cannot be erased once it has been committed, which § 9, before we conclude in § 10. is undesirable in many applications. To address these issues, LedgerDB adopts a centralized ar- chitecture and a stateless journal storage model, rather than 2. PRELIMINARIES AND DESIGN GOALS tightly-coupled transaction state model in most blockchain We work on the design of a ledger database that supports systems [21, 30], to achieve high system throughput. It blockchain-like (especially permissioned blockchain) applica- adopts multi-granularity verification and non-repudiation pr- tions using a centralized architecture. It records immutable otocols, which prevents malicious behaviors from (colluded) application transactions