chainifyDB: How to get rid of your Blockchain and use your DBMS instead

Felix Schuhknecht (JGU Mainz) schuhknecht@uni-mainz.de
Ankur Sharma (Saarland Informatics Campus) [email protected]
Jens Dittrich (Saarland Informatics Campus) [email protected]
Divya Agrawal* (Celonis GmbH) [email protected]

*Work done while being employed at Saarland Informatics Campus.

ABSTRACT

Today's permissioned blockchain systems come in a stand-alone fashion and require the users to integrate yet another full-fledged system into their already complex data management landscape. This seems odd, as blockchain systems and traditional DBMSs share considerable parts of their processing stack. Therefore, rather than replacing the established infrastructure, we advocate to "chainify" existing DBMSs by installing a lightweight blockchain layer on top. Unfortunately, this task is challenging: Users might have different black-box DBMSs in use, which potentially behave differently. To deal with such setups, we introduce a new processing model called Whatever-Voting (WV), pronounced [weave]. It allows us to create a highly flexible permissioned blockchain layer coined chainifyDB that (a) is centered around bullet-proof database technology, (b) can be easily integrated into an existing heterogeneous database landscape, (c) is able to recover deviating organizations, and (d) adds only up to 8.5% of overhead on the underlying database systems while providing an up to 6x higher throughput than Hyperledger Fabric.

1. INTRODUCTION

A blockchain can be defined as an immutable ledger for recording transactions, maintained within a distributed network of mutually untrusting peers. The peers execute a consensus protocol to validate transactions, group them into blocks, and build a hash chain over the blocks. Blockchains may execute arbitrary, programmable transaction logic in the form of smart contracts. A smart contract functions as a trusted distributed application and gains its security from the blockchain and the underlying consensus among the peers [8].

Many descriptions of "blockchain technology" read as if they describe an unexplored data management planet in a far distant galaxy. To bring things back to earth, in this paper, we will take an old researcher's attitude: What if all of this could be realized using our good old relational database systems? With this in mind, we can translate the above foreign language into the following: We form a network of DBMSs, where each DBMS keeps a replica of the database. When executing transactions, we ensure that all replicas change in the same way. If a database deviates for whatever reason, we try to recover it. As we build upon established DBMSs, we can easily integrate our layer into an existing IT-landscape and build upon the powerful features of these systems: SQL, the relational model, and high transaction processing performance.

Feature                          | DBMS | Blockchain | chainifyDB
Well-integrated in IT-landscape  |  ✓   |     ✗      |     ✓
Convenience and accessibility    |  ✓   |     ✗      |     ✓
Robustness via recovery          |  ✓   |     ✗      |     ✓
High performance                 |  ✓   |     ✗      |     ✓
Sync. in untrusted environments  |  ✗   |     ✓      |     ✓

Table 1: chainifyDB combines the best of both worlds.

Thus, instead of reinventing the wheel, we advocate to simply combine the best of both worlds. Table 1 lists the database features we build upon as well as the blockchain features we add by chainifying our good old relational DBMSs.

1.1 Permissioned Blockchain Systems (PBS)

Blockchain systems are required in situations where multiple organizations, that potentially distrust each other, independently maintain a shared state.
This situation is for example present in the medical area: When multiple doctors treat a patient, they independently keep a patient file and record their treatments therein. Without a blockchain system, the following problems can arise in such a situation:

Missing data synchronization. Each of the doctors has their own local version of the patient data, without properly synchronizing it with the data of all other doctors in charge. Thus, differences in the form of errors or missing data can appear over time.

Missing transaction synchronization. Each of the doctors treats the patient independently, without synchronizing his or her intended treatment with the plans of the other doctors. This renders their treatment in the best case inefficient, in the worst case harmful.

A so-called permissioned blockchain system (PBS) seems to provide exactly what the doctors need: a way of sharing and managing data across a set of known but independent organizations. However, if this is the case, why don't we see permissioned blockchains everywhere? The following numbers show how severe the need is: 86% of treatment errors are actually caused by missing or faulty information in patient files [2]. The cost caused by double treatment and redundant information is estimated at 1950 USD per hospital patient [2]. In total, 210 billion USD per year go to unnecessary or needlessly expensive care [4] in the US.

The reason lies in that although PBSs solve these two problems, they actually introduce various new problems at the same time: First of all, the typical PBS comes as yet another stand-alone system. As such, it is extremely hard to integrate into an existing data management landscape. E.g. our doctors have a large database of patient files already, which must be migrated to the new blockchain system. Furthermore, PBSs such as Fabric are neither based on SQL nor do they use a relational engine underneath. The entire system is designed from the point of view of a distributed system rather than from the point of view of a database system. This is very unfortunate, as this way many problems are either solved unsatisfactorily or get reinvented along the way. For instance, recent work has shown that Fabric can basically be considered a distributed version of MVCC [23]. Additionally, PBSs such as Hyperledger Fabric do not provide any recovery mechanism. This is really unfortunate, as it implies that if any node deviates, it implicitly leaves the network forever.

1.2 Contributions

As a consequence of these observations, we introduce chainifyDB, which solves all aforementioned problems. We make the following contributions:

1. Whatever-Voting model (WV). Instead of reaching consensus on the order of transactions before the execution, our WV-model reaches consensus on the state change generated by the execution. We do so by using a pluggable voting-based algorithm [18]. This allows us to form a private blockchain network out of potentially different DBMSs. In particular, it allows us to treat these DBMSs in a completely black-box fashion, i.e. we do not have to access or modify the code of the used system in any way.

2. Synchronized transactions without a middleman. chainifyDB keeps local databases, which are maintained by individual organizations, synchronized. Although these organizations potentially do not trust each other, we ensure that transaction processing is reflected equally across all databases in the network. chainifyDB achieves this without introducing a middleman.

3. Built on established DBMS technology. chainifyDB reuses the already existing execution engine and query language of the underlying DBMS. If the DBMS is a SQL-based relational system, then chainifyDB communicates via SQL as well.

4. Robustness via recovery. chainifyDB provides efficient mechanisms to recover from any form of state deviation. Deviation can be caused by crashes and hardware problems, but also by malicious and/or non-agreeing behavior.

5. Extensive experimental evaluation of chainifyDB. In comparison with the comparable state-of-the-art permissioned blockchain systems Fabric [8] and Fabric++ [23], chainifyDB achieves an up to 6x higher throughput. We show that chainifyDB is able to fully utilize the performance of the underlying database systems and demonstrate its recovery capabilities experimentally.

2. WHATEVER-VOTING MODEL

In summary, we "chainify" existing data management infrastructures. This creates a set of interesting challenges: As mentioned, we have to assume that different DBMSs come together in a single heterogeneous network. Although these systems all understand SQL, they might execute the very same transaction slightly differently. Thus, we cannot rely on each organization executing every transaction in exactly the same way. Further, we have to treat the DBMSs as complete black-boxes, which cannot be adapted in any way. Connected to this, we cannot rely on the underlying storage and execution engine providing features tailored towards our system. This is drastically different to classical PBSs. For example, Fabric requires the computation of read/write-sets as a side-product of transaction execution — something a black-box DBMS does not provide. Additionally, we have to assume that any component of the system can potentially behave in a malicious way.

To address these challenges, we argue that the consensus phase must be pushed towards the end of the processing pipeline. Instead of reaching consensus only on which transaction to execute, we advocate to reach consensus on the actual state changes performed by the transactions. By this, we are able to detect any deviating behavior that might have happened during execution. In fact, this enables the organizations to implement whatever they want there. The result is a new, highly flexible processing model we call the Whatever-Voting model (WV). It has the following two phases:

(1) Whatever-phase. Each organization does whatever it deems necessary to pass the V-phase later on. As a side-product, each organization produces a digest of its behavior of the W-phase.

(2) Voting-phase. We perform a voting round on the digests of the W-phase to check whether agreement can be reached on them with respect to an agreement policy. Only if agreement is reached, the state changes are committed to a ledger by the individual organizations. If an organization is non-agreeing, it can try to recover. If no agreement is reached at all, all organizations can try to recover.

This WV model allows us to design and implement our highly flexible permissioned blockchain system: chainifyDB.
It supports different transaction processing systems across organizations to build a heterogeneous blockchain network. Note that WV is also more powerful than classical 2PC- or 3PC-style protocols in the sense that they still assume deterministic behavior of the organizations without looking back at the performed state changes. In WV, we simply do not care about whether organizations claim to be prepared for a transaction and what they claim to do. In contrast, we solely look at the outcome. In summary, we measure the outcome rather than the promises.

Algorithm 1 V-phase
 1: procedure V(Digest D1, ..., Digest Dk, Agreement Policy P = {C1, C2, ...})
 2:   for each Combination C in P do
 3:     boolean agreement = true
 4:     Digest Dcons = D_{C.p1}
 5:     for Digest D = D_{C.p2} to D_{C.pl} do
 6:       if Dcons ≠ D then
 7:         agreement = false
 8:         break
 9:     if agreement then return Dcons as Dagreed
10:   return no agreement reached
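To make the control flow of Algorithm 1 concrete, the following Python sketch implements the same combination-by-combination test. The function name v_phase and the representation of digests as byte strings are our assumptions for illustration, not part of chainifyDB's actual codebase:

    from typing import Dict, List, Optional

    def v_phase(digests: Dict[int, bytes],
                policy: List[List[int]]) -> Optional[bytes]:
        # Test the combinations of the agreement policy one by one,
        # exactly as Algorithm 1 does.
        for combination in policy:
            d_cons = digests[combination[0]]
            # Agreement on this combination holds if every listed
            # organization produced the identical digest.
            if all(digests[org] == d_cons for org in combination[1:]):
                return d_cons  # returned as D_agreed
        return None  # no agreement reached

    # Example: P = {{1, 2}, {1, 3}} from the text. Organizations 1 and 3
    # agree, organization 2 deviates, so the second combination matches.
    digests = {1: b"3B50", 2: b"F68D", 3: b"3B50"}
    assert v_phase(digests, [[1, 2], [1, 3]]) == b"3B50"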

Figure 1 presents a visualization of the entire model.

Figure 1: Round-based processing of the WV model.

Let O1, ..., Ok be k independent organizations in the network. They process one transaction per round. Currently, we are in round t.

By default, our model treats the W-phase as a blackbox and makes no assumptions on its internal behavior. The only requirement is that, as a side-product of executing the W-phase Wl of organization Ol in round t, a digest Dl,t must be produced. The idea of this digest is to externally represent the impact of the changes that happened in the W-phase. For example, the digest could be a cryptographic hash-code of all updates that were performed. The k digests D1,t to Dk,t are now passed to the V-phase to check whether an agreement can be reached on them. Our V-phase supports the usage of arbitrary vote-based or consensus mechanisms, such as Paxos or PBFT, depending on the required guarantees. To showcase our system, we use a lightweight vote-based algorithm [18]. We believe this allows for a fair comparison with Fabric, which relies on a trusted ordering service.

In addition to the digests, an agreement policy P is passed to the V-phase. It specifies which organizations are expected to agree on their digests. If the digests are equal for at least one of the combinations {C1, C2, ...}, then agreement is reached. For example, P = {{1, 2}, {1, 3}} specifies that either the organizations O1 and O2 or the organizations O1 and O3 must have the same digest to reach agreement. The logic of the V-phase is shown in Algorithm 1. It essentially tests the passed combinations one by one and tries to reach agreement on one combination.

If no agreement is reached, then no one is allowed to start the next round. In such a case, all organizations try to recover. If agreement on a digest is reached, it is returned as Dagreed,t. Now, each organization Ol whose digest equals the agreement digest (Dl,t = Dagreed,t) is allowed to proceed to the next round t + 1. If the locally computed digest differs from the agreement digest, then the organization must not proceed but can try to recover, as described in the next section.

2.1 Whatever Recovery

As described, our model allows us to treat the W-phase as a complete blackbox. This is possible as whether we may proceed is determined solely by whether an agreement is reached on the produced digests of the W-phase, and not by its internal behavior. However, if we have no information about the internal behavior of the W-phase available, then we have very limited options in recovering from the situation that the system as a whole or an individual organization cannot proceed to the next round. To tackle this problem, we allow the W-phase to optionally expose information about its internal behavior to enable a more sophisticated recovery.

For example, we could have the very valuable information available that the W-phase maintains and modifies a local database. Thus, the database captures the current state Sl,t of organization Ol after round t. Having access to the state enables recovery via checkpointing, as visualized in Figure 2: Let us assume that agreement on the digest Dagreed,t was reached, but the digest Dk,t of organization Ok is different. If organization Ok creates a checkpoint every second round in the form of a snapshot of the state, then it can try to recover by replaying the two intermediate transactions on the most recent checkpoint before t, namely Sk,t−2. This leads to a potentially new state S'k,t with a corresponding new digest D'k,t. If the new digest D'k,t now matches Dagreed,t, recovery was successful.

Figure 2: Checkpoint-based recovery of organization Ok. The state is first restored from the most recent checkpoint Sk,t−2 earlier than t and then replayed till round t, resulting in S'k,t.
3. CHAINIFYDB

In this section, we present chainifyDB and how it instantiates the WV model. As described in the introduction, the core idea of chainifyDB is to equip established infrastructures, which already consist of several database management systems, with blockchain functionality as a layer on top. The challenge is that these infrastructures can be highly heterogeneous, i.e. every participant potentially runs a different DBMS and each system can interpret a certain transaction differently. As a result, the digests across participants might differ. Note that per round, chainifyDB does not process only a single transaction, but a batch of transactions. This is a performance optimization in order to reduce the amount of overall network communication.

The W-phase of chainifyDB looks as follows: First, it takes a batch of proposed transactions and orders them in a TransactionBlock. Precisely, we want to establish an order that is globally valid, i.e. all organizations should receive the same order of transactions. The simplest way of implementing this is to host a dedicated orderer for this task. This (untrusted) orderer forms a TransactionBlock of proposed transactions, which is then distributed to all other organizations for execution. Of course, a malicious orderer could manipulate transactions or send different blocks to different organizations. However, this is not an issue in our WV-model: In the V-phase, we are able to catch all malicious or deviating behavior that might have happened earlier in the W-phase. This is in strong contrast to other models, which have to rely on the orderer being trustworthy or at least byzantine fault tolerant. Each transaction of the TransactionBlock is now executed against the local relational database. Obviously, this execution potentially updates the database. As a side-product of execution, we produce the digest Dl,t, as described in the following.

3.1 From Digests to LedgerBlocks

In chainifyDB we assume SQL-99 compliant relational DBMSs to keep the state at each organization. Thus, we can utilize SQL-99 triggers to realize a vendor-independent versioning mechanism that specifically versions the data of chainifyDB in the form of a digest table. The digest table is computed per TransactionBlock. We instrument every shared table in our system with a trigger mechanism to automatically populate this digest as follows: for every tuple changed by an INSERT, UPDATE, or DELETE statement, we create a corresponding digest tuple. It captures the primary key of the changed tuple, a counter of the change, the hash as the digest of the values of the tuple after it was changed (for a delete: its state before removing it from the table), and the type of change that was performed. Figure 3 shows how to process a block of an update, delete, and insert transaction.

Figure 3: Logical per-block digest computation.

Although the digest table captures all changes done to a table by the last TransactionBlock, it does not represent the digest yet. The actual digest is represented in the form of a so-called LedgerBlock, which consists of the following fields: (1) TA_list: The list of transactions that were part of this block. Each transaction is represented by its SQL code. (2) TA_successful: A bitlist flagging the successfully executed transactions. (3) hash_digest: A hash of the logical contents of the digest table. In our case, this is a SHA256 hash over the hash-values present in the digest table. (4) hash_previous: A hash value of the entire contents of the previous LedgerBlock appended to the ledger, again in the form of a SHA256 hash. This backward chaining of hashes allows anyone to verify the integrity of the ledger.

This LedgerBlock leaves the W-phase as digest and enters the V-phase to determine whether agreement can be reached. As mentioned before, we use a lightweight voting algorithm [18] in the V-phase of chainifyDB. Other consensus mechanisms, like Paxos or PBFT, could be applied here as well.
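The following Python sketch assembles such a LedgerBlock. The field names follow the text; serializing the previous block via JSON and the exact hashing granularity are our assumptions:

    import hashlib
    import json
    from typing import List

    def ledger_block(ta_list: List[str], ta_successful: List[bool],
                     digest_hashes: List[bytes], previous_block: dict) -> dict:
        # Build the four LedgerBlock fields described above.
        return {
            "TA_list": ta_list,                  # transactions as SQL strings
            "TA_successful": ta_successful,      # bitlist of successful TAs
            "hash_digest": hashlib.sha256(       # SHA256 over the digest table
                b"".join(digest_hashes)).hexdigest(),
            "hash_previous": hashlib.sha256(     # backward chaining of blocks
                json.dumps(previous_block, sort_keys=True).encode()).hexdigest(),
        }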
3.2 Logical Checkpointing and Recovery

If, after the V-phase, an organization is non-agreeing or no agreement is reached at all, a checkpointing-based recovery of the organization, respectively of all organizations, happens. In chainifyDB, we create a checkpoint by creating a snapshot of the database after every k blocks, where k is configurable. The snapshot is created for all tables that were changed since the last consistent snapshot. Snapshots are created on the SQL-level, either through a non-maintained materialized view or by a CREATE TABLE command. Creating such a snapshot is surprisingly fast: Snapshotting the accounts table with 1,000,000 rows, which is part of the Smallbank [5] benchmark used in our experimental evaluation, takes only 827ms in total on our machine of type 2 (see Section 4) running PostgreSQL.
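For a PostgreSQL organization, such a SQL-level snapshot could be created and restored as sketched below. The driver choice (psycopg2) and the snapshot naming scheme, mirroring the Foo_block_44 name visible in Figure 4, are our assumptions:

    import psycopg2  # assuming a PostgreSQL backend; driver choice is ours

    def checkpoint(conn, table: str, block_id: int) -> None:
        # Create a SQL-level snapshot of a shared table.
        with conn.cursor() as cur:
            cur.execute(f"CREATE TABLE {table}_block_{block_id} AS TABLE {table}")
        conn.commit()

    def restore(conn, table: str, block_id: int) -> None:
        # Restore the table from a snapshot before replaying the history.
        with conn.cursor() as cur:
            cur.execute(f"TRUNCATE {table}")
            cur.execute(f"INSERT INTO {table} "
                        f"SELECT * FROM {table}_block_{block_id}")
        conn.commit()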

TA1 TA1 TA1 TA1 TA1 TA1 TA1

TA2 TA2 TA2 TA2 TA2 TA2 TA2 Foo Foo Foo TA3 TA3 TA3 TA3 TA3 TA3 TA3 PK A B C PK A B C PK A B C 1 5 q 1.3 1 5 q 1.3 1 5 q 1.3 2 8 a 4.2 2 8 a 4.2 2 8 a 4.2 4 35 z 8.8 4 42 z 8.8 4 42 z 8.8 5 12 q 3.2 5 12 r 3.2 5 12 r 3.2 Foo_block_44 7 9 d 0.2 7 9 d 0.2 non- 7 9 d 0.2 8 11 c 3.3 Foo_block_41 8 12 x 5.4 PK A B C consenting 8 12 x 5.4 PK A B C 1 5 q 1.3 ledger 2 8 a 4.2 block snapshot 1 5 q 1.3 snapshot 2 8 a 4.2 4 42 z 8.8 restore 4 35 z 8.8 5 12 r 3.2 from local 5 12 q 3.2 7 9 d 0.2 copy 7 9 d 0.2 8 12 x 5.4 8 11 c 3.3

Figure 4: chainifyDB’s recovery using checkpoints. As block 46 is non-agreeing it has to enter the recovery phase. It tries to recover using the most recent local checkpoint. If this would fail, it would try to recover from an older checkpoint. the previously mentioned strategies and is inspired by the can occur in the graph. parallel transaction execution proposed in [24] and relates to Let us extend the example from our Semantic Analysis the ideas of [15, 12, 22]. When a block of transactions is re- Phase and let us assume, that T1 has been added to the ceived by the execute-subphase, we first identify all existing dependency graph already. By inspecting T2 we can deter- conflict dependencies between transactions. This allows us mine that PK[42, inf] overlaps with PK[100,100] of T1. As to form mini-batches of transactions, that can be executed T2 is an update transaction, there is a conflict between T2 safely in parallel, as they have no conflicting dependencies. and T1 and add a dependency edge from T1 to T2. Figure5 presents an example dependency graph for 9 transactions. T4 (3) Executing the Dependency Graph. Finally, we can exe- cute the transactions in parallel. To do so, we perform topo- T6 T8 logical sorting, i.e. we traverse the execution stages of the T1 T2 graph, that are implicitly given by the dependencies. Our T9 T3 T5 T7 example graph in Figure5 has four stages in total. Within Stage 1 Stage 2 Stage 3 Stage 4 one stage, all transactions can be executed in parallel, as no dependencies exist between those transactions. Figure 5: A topological sort of the dependency graph with The actual parallel execution on the underlying database k = 9 transactions yielding four execution stages. system is realized using k client connections to the DBMS. To execute the transactions within an execution stage in Let us see in detail how it works. The process can be parallel, the k clients pick transactions from the stage and decomposed into three phases: (1) Semantic Analysis. First, submit them to the underlying system. As our method for a block of transactions, we do a semantic analysis of is conflict free, it guarantees the generation of serializable each transaction. Effectively, this means parsing the SQL schedules. Therefore we can basically switch off concurrency statements and extracting the read and write set of each control on the underlying system. This can be done by set- transaction. These read and write sets are realized as in- ting the isolation level of the underlying system to the lowest tervals on the accessed columns to support capturing both support level (e.g. READ COMMITTED for PostgreSQL) point query and range query accesses. For instance, assume or if possible completely turned off to get the best perfor- the following two single-statement transactions: mance. T1 :UPDATE Foo SET A = 5 WHERE PK = 1 0 0 ; T2 :UPDATE Foo SET A = 8 WHERE PK > 4 2 ; 3.4 Running Example The extracted intervals for these transactions are: Let us now discuss the concrete system architecture at the running example of Figure6. chainifyDB consists of T1: A is updated where PK is in [100,100] three loosely coupled components: ChainifyServer, Execu- T2: A is updated where PK is in [42,infinity] tionServer, and CommitServer. In Step 1, the client creates (2) Creating the Dependency Graph. With the intervals at a Proposal from a SQL transaction. This Proposal contains hand, we can create the dependency graph for the block. 
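The staging logic of phases (1)-(3) can be sketched as follows. For brevity, this sketch models only write-write conflicts on a single column and represents each transaction by its extracted write interval; function names are ours:

    from typing import List, Tuple

    Interval = Tuple[float, float]  # write interval on a column, e.g. Foo.PK

    def overlaps(a: Interval, b: Interval) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    def build_stages(writes: List[Interval]) -> List[List[int]]:
        # Place every transaction of the block into the earliest execution
        # stage behind all transactions it conflicts with. Since transactions
        # are inserted in block order, no cycles can occur.
        stage_of: List[int] = []
        for i, w in enumerate(writes):
            deps = [stage_of[j] for j in range(i) if overlaps(writes[j], w)]
            stage_of.append(max(deps) + 1 if deps else 0)
        stages: List[List[int]] = [[] for _ in range(max(stage_of, default=-1) + 1)]
        for i, s in enumerate(stage_of):
            stages[s].append(i)
        return stages

    # T1 writes PK in [100, 100], T2 writes PK in [42, inf): the intervals
    # overlap, so T2 must run in the stage after T1.
    print(build_stages([(100.0, 100.0), (42.0, float("inf"))]))  # [[0], [1]]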
3.4 Running Example

Let us now discuss the concrete system architecture using the running example of Figure 6. chainifyDB consists of three loosely coupled components: ChainifyServer, ExecutionServer, and CommitServer.

In Step 1, the client creates a Proposal from a SQL transaction. This Proposal contains the original transaction as a SQL string along with metadata information, such as the client ID and a signature of the transaction. The client then sends the Proposal to the ChainifyServer of O1.

In Step 2, the ChainifyServer early aborts all Proposals which have no chance of reaching agreement later on. To do so, the ChainifyServer of O1 first accesses the authentication policy of the client, the public key of the client, and the agreement policy. The agreement policy P in our example is that all three organizations have to agree on the proposal. If the ChainifyServer of O1 successfully authenticates the client and the verification of the Proposal is successful, it forwards the Proposal to O2 and O3, as specified in the agreement policy. Every ChainifyServer in the network now tests whether the Proposal has a chance of reaching agreement later on (e.g. by checking local constraints) and prepares a signed EarlyAbortResponse, which contains whether the organization wants to early abort the proposal or not, as well as the Proposal itself. The ChainifyServers of O2 and O3 then send their EarlyAbortResponse to the ChainifyServer of O1.

In Step 3, as all three organizations agreed upon the Proposal, the ChainifyServer of O1 creates a ChainedTransaction from the original Proposal and the three EarlyAbortResponses, and sends it to the pluggable and potentially untrusted OrderingService, which can be distributed as well. The OrderingService then queues this ChainedTransaction.

In Step 4, when a certain amount of ChainedTransactions are queuing, the OrderingService produces a TransactionBlock from these ChainedTransactions. The dispatch service of the OrderingService then forwards the TransactionBlock to every ExecutionServer in the network.

In Step 5, the ExecutionServer of each organization now computes the near-optimal execution graph and executes the ChainedTransactions in the TransactionBlock in parallel. As a side-product, the ExecutionServer computes the LedgerBlock as digest.

In Step 6, the ExecutionServer forwards the LedgerBlock to the local CommitServer. In Step 7, the CommitServers perform the agreement round according to the agreement policy. This involves all three CommitServers. In Step 8, as they reach agreement, each CommitServer generates the corresponding LedgerBlock and appends it to its ledger.

Figure 6: Architecture of chainifyDB.

4. EXPERIMENTAL EVALUATION

To evaluate chainifyDB we use the following setup and workload:

Type 1 (small): Two quad-core Intel Xeon CPU E5-2407 running at 2.2 GHz, equipped with 48GB of DDR3 RAM.

Type 2 (large): Two hexa-core Intel Xeon CPU X5690 running at 3.47 GHz, equipped with 192GB of DDR3 RAM.

Unless stated otherwise, we use a heterogeneous network consisting of three independent organizations O1, O2, and O3. As chainifyDB will be employed within a permissioned environment between a relatively small number of organizations, which know each other, we evaluate its behavior for three organizations. Organization O1 owns two machines of type 1, where PostgreSQL 11.2 is running on one of these machines. Organization O2 owns two machines of type 1 as well, but MySQL 8.0.18 is running on one of them. Finally, organization O3 owns two machines of type 2, where again PostgreSQL is running on one of the machines. The individual components of chainifyDB, as described in Section 3.4, are automatically distributed across the two machines of each organization. Additionally, there is a dedicated machine of type 2 that represents the client firing transactions to chainifyDB, as well as one that solely runs the ordering service.

As agreement policy, we configure two different options: In the first option Any-2, at least two out of our three organizations have to produce the same digest to reach agreement. In the second option All-3, agreement is reached only if all three organizations produce the same digest. Empirical evaluation revealed a block size of 4096 transactions to be a good fit. Of course, we also activate parallel transaction execution as described in Section 3.3.

As workload we use transactions from Smallbank [5]. We create 100,000 users with a checking account and a savings account each and initialize them with random balances. The workload consists of the four transactions TransactSavings, DepositChecking, SendPayment, and WriteCheck. During a single run, we repeatedly fire these four transactions at a fire rate of 4096 transactions per client, where we uniformly pick one of the transactions in a random fashion. For each picked transaction, we determine the accounts to access based on a Zipfian distribution with an s-value of 1.1 and a v-value of 1.
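Expressed in the combination notation of Section 2, the two policies look as follows. This encoding, matching our earlier v_phase() sketch with organization ids 1-3, is our own illustration:

    # The two agreement policies of this setup as combination sets:
    ANY_2 = [[1, 2], [1, 3], [2, 3]]  # any two matching digests suffice
    ALL_3 = [[1, 2, 3]]               # all three digests must match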
per actions arcadFbi+.W s h mlbn okodfol- workload distribution. Smallbank Zipf the evaluate a use we We Additionally, lowing Fabric++. clients. and of Fabric number varying with for ChainifyDB icy of Throughput (a) Avg. Throughput of Successful Transactions [TPS] tetruhu o Smallbank, for throughput the 2 Table in show also We compar- By make: can we observation more one is There is there that observe also can we ChainifyDB, Regarding 1000 2000 3000 4000 5000 0 ChainifyDB (Any-2) Fabric 4. Section in described as setup heterogeneous the for transactions successful of Throughput 7: Figure 3 6 ChainifyDB (All-3) Fabric++ Number ofclients Any-2 12 and All-3 24 pol- ayn ubro let.Tesm okoda nFig- in OLTP-bench as that distribution. workload Note uniform OLTP-bench. same a using follows The fired is for 7(a) clients. PostgreSQL ure of and number MySQL standalone varying of Throughput (b) Avg. Throughput of Successful Transactions [TPS] 1000 2000 3000 4000 5000 6000 ftm,w aulyijc nudt otetbeo or- of table the to update amount an certain inject a manually after ganization we Then, time, workload. of transactions Smallbank with network the chainifyDB of our sustain we First, progressing. continue see it. to from able and recover whether is see network organization and and the organization detect one one down to bring we of able Afterwards, is force- database chainifyDB we the whether First, ways: corrupt different fully two in network chainifyDB Recovery and Robustness 4.2 a and for Zipf a transactions distribution. following successful uniform Smallbank of under (Any-2) throughput ChainifyDB Average 2: Table etayoeacrigt h n- policy: Any-2 and the recovery pro- to with the according stops anymore also recovery ment This starts organization checkpoint. and of recent gression round most agreement the to the from applied in been deviation has the update the after observe Shortly organization than can faster We much each progress organizations checkpoint. blocks, the committed local that five a Every creates the organizations On IDs. block block. or- each responding three for all commit where of setup, the Fig- On homogeneous in PostgreSQL. a run Additionally, ganizations test we setup. 8(b), heterogeneous ure our for Any-2 zations the under progress have to then able from be organizations policy. it to two agreement removing remaining reach by The to organization a simulate network. one we update the Finally, of this database. failure perform the complete in not table directly do the by externally modifying we but transaction, that chainifyDB a Note through deviation. the Uniform Zipf Distribution 0 rcsl,w aetefloigstpfrti experiment: this for setup following the have we Precisely, our disturb will we capabilities, recovery the test To evsaietepors falorgani- all of progress the visualize we 8(a), Figure In 3 O 1 29TPS 2279 TPS 2757 Clients 3 n e o fast how see and MySQL O 2 stofrbhn.A onas soon As behind. far too is 6 O O Number ofclients 1 3 80TPS 3840 TPS 3676 Clients 6 as , and O O 3 O 3 sntal orahagree- reach to able not is hc u PostgreSQL, run which , 1 PostgreSQL y 12 x sal orcvrfrom recover to able is ai,w lttecor- the plot we -axis, 74TPS 5774 TPS 4709 Clients 12 ai,w lttetime the plot we -axis, O 2 unn MySQL. 

(a) Heterogeneous Setup (2x PostgreSQL, 1x MySQL). (b) Homogeneous Setup (3x PostgreSQL).
Figure 8: Robustness and recovery of chainifyDB under the Any-2 agreement policy.

As soon as O1 recovers, which takes around 17 seconds, O3 also restarts progressing, as agreement can be reached again. Both O1 and O3 progress until we let O3 fail. Now, O1 cannot progress anymore, as O3 is not reachable and O2 is still too far behind due to its slow database system running underneath. Thus, O1 halts until O2 has caught up. As soon as this is the case, both O1 and O2 continue progressing at the speed of the slower organization, namely O2.

In Figure 8(b), we retry this experiment on a homogeneous setup, where all organizations run PostgreSQL. Thus, this time there is no drastically slower organization throttling the network. Again, at a certain point in time, we externally corrupt the database of organization O1 by performing an update, and O1 starts to recover from the most recent checkpoint. In contrast to the previous experiment, this time the recovery of O1 does not negatively influence any other organization: O2 and O3 can still reach agreement under the Any-2 policy and continue progressing, as neither of the two is behind the other. Recovery itself takes only around 4 seconds this time, and in this case, another organization is ready to perform an agreement round right after recovery. When O3 fails, O2 has to halt processing for a short amount of time, as O1 has to catch up.

In summary, this experiment shows (a) that we can detect deviation and recover from it, (b) that the network can progress in the presence of failures, (c) that all organizations respect the agreement policy at all times, and (d) that recovery neither penalizes the individual organizations nor the entire network.

4.3 Cost Breakdown

In Section 4.1, we have seen the end-to-end performance of chainifyDB. In the following, we want to analyze how much the individual components contribute to this. Precisely, we want to investigate: (a) The cost of all cryptographic computation, such as signing and validation, that is happening at several stages (see Section 3.4 for details). (b) The impact of parallel transaction execution on the underlying database system (see Section 3.3).

Figure 9: Cost breakdown of chainifyDB.

Figure 9 shows the results. We can observe that the overhead caused by cryptography is surprisingly small. Under the Any-2 policy, activating all cryptographic components decreases the throughput only by 7% for parallel execution. Under the All-3 policy, the decrease is only 8.5%. While our cryptographic components have a negligible negative impact, our parallel transaction execution obviously has a very positive one. With activated cryptography, parallel transaction execution improves the throughput by up to 5x.

4.4 Varying Blocksize

Finally, let us inspect the impact of the blocksize, which is an important configuration parameter in any blockchain system. We vary the blocksize from 256 transactions per block in logarithmic steps up to 4096 transactions per block and report the average throughput of successful transactions.

Figure 10: The effect of varying the blocksize.

Figure 10 shows the results. We can see that both under the Any-2 and the All-3 policy, the throughput increases with the blocksize. This increase is mainly caused by our parallel transaction execution mechanism, which analyzes the whole block of transactions and schedules them for parallel conflict-free execution.

4.5 Varying the Number of Organizations

Adding a large number of organizations to the chainifyDB network requires a significant amount of computing resources, making the evaluation of the whole system a bit tricky. However, if we look closely, only one component in chainifyDB's transaction pipeline, namely the V-phase, involves synchronous communication between the organizations. Thus, to evaluate the scaling capabilities with the number of organizations, we simulate the computational and network-related efforts required in the V-phase from the perspective of one organization Oi. For each vote, the organization creates a signed request, asynchronously simulates the voting procedure for k − 1 organizations by waiting for 2 · l(Oi, Oj) seconds (round-trip latency), where l(Oi, Oj) is the network latency between two organizations Oi and Oj. It then verifies the k − 1 cryptographic signatures of the responses and finally compares the local digest with the received digest.
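The core of this simulation can be expressed in a few lines of Python. This is an illustrative sketch with latency values and helper names of our choosing; the cryptographic work (signing, verification, digest comparison) is left out:

    import asyncio
    import random
    import time

    async def simulated_vote(rtt: float) -> None:
        # One remote organization: wait for a full round trip 2 * l(Oi, Oj).
        await asyncio.sleep(2 * rtt)

    async def voting_round(latencies) -> float:
        # Simulate the V-phase from the perspective of one organization:
        # all k-1 votes are awaited asynchronously.
        start = time.perf_counter()
        await asyncio.gather(*(simulated_vote(l) for l in latencies))
        return time.perf_counter() - start

    # k - 1 = 15 remote organizations with round-trip latencies of 10-150 ms.
    lat = [random.uniform(0.01, 0.15) for _ in range(15)]
    print(asyncio.run(voting_round(lat)))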
In Figure 11, we plot the average latency (five runs) of the voting round while varying the number of organizations under three different agreement policies. We evaluate the V-phase under real-world latencies of AWS regions [1]. For each measurement, we choose a source organization located in the Central-Europe AWS region. All remaining organizations are selected uniformly from a set of 15 AWS regions from the US, Europe, and Asia.

Figure 11: Latency of the V-phase from the perspective of one organization, which performs a voting round with n − 1 other organizations. We show three different policies, where 50%+1, 66.67%, and 100% of the organizations have to agree.

As we can see in Figure 11, for all three evaluated policies, the latency remains stable until the network contains 8192 organizations. This is because up to this point, the V-phase is dominated by the network communication, not the computational effort. If we increase the size of the network further, the computational effort becomes dominant, increasing the latency of the V-phase linearly with the network size. As the size of a typical permissioned blockchain network is significantly smaller than 8192 organizations, the V-phase adds negligible overhead to the transaction pipeline.

5. RELATED WORK

There are other projects that explore the intersection of databases and blockchains. In [21], the authors extend PostgreSQL with blockchain features. This results in a "blockchain relational database", which is capable of performing trusted transactions between multiple PostgreSQL instances. While this project is clearly a step in the right direction, it does not go far enough: In contrast to chainifyDB, the proposed system still comes in a stand-alone fashion and forces the users to integrate yet another new system into their infrastructure. Further, they heavily modify the internals of PostgreSQL. We intended to compare chainifyDB against this system; however, the source code was not available.

BlockchainDB [13, 14] follows exactly the opposite approach of chainifyDB: It installs a database layer on top of an existing blockchain, such as Ethereum [3] or Hyperledger Fabric [8]. BlockchainDB basically provides a front end to the underlying permissioned blockchain. In addition, the authors introduce a sharding mechanism which allows them to scale if a relatively large number of organizations is involved. Overall, while this concept eases the use of blockchain systems, it does not ease the integration of blockchain features into established infrastructures.

ChainSQL [20] takes the open-source blockchain system Ripple [9] and integrates relational and NoSQL databases into the storage backend. This enables it to run SQL-style respectively JSON-style transactions. However, similar to [13, 14], by integrating database systems into the heavyweight blockchain system Ripple, the transaction processing performance is limited to that of Ripple — thus overshadowing the high performance of the underlying database systems.

Another project at the intersection of databases and blockchains is Veritas [16]. This visionary paper also proposes to extend existing database systems by blockchain features; however, it focuses on a cloud infrastructure. To synchronize instances, it utilizes log shipping. Therefore, this solution requires the underlying database system to provide log shipping in the first place and disallows the connection of different database systems in one network if their log shipping mechanisms are not compatible with each other.

BigchainDB [6] combines the blockchain framework Tendermint [7] with the document store MongoDB and therefore extends it with blockchain features. In contrast to chainifyDB, the system is shipped in a stand-alone fashion and uses the MongoDB interface.

Apart from works aiming at closing the gap between databases and blockchains from an architectural perspective, there is a considerable amount of research improving the performance of permissioned blockchains, in particular for Fabric [17, 10, 19]. Finally, in our own recent previous work [23] we explored in depth to which degree the performance issues of Fabric can be overcome if we keep Fabric as the central component. The lessons from that work made us drop that idea and highly motivated us to stick with DBMSs as the powerhouses — and write this paper.

6. CONCLUSION

We introduced a highly flexible processing model for permissioned blockchain systems called the Whatever-Voting model, which avoids making assumptions on the precise behavior of organizations by reaching agreement on the outcome instead of the promise.
To showcase the strengths of WV, we proposed chainifyDB, an implementation of a blockchain layer which is able to chainify arbitrary DBMSs and connect them in a network. We discussed how recovery of deviating organizations can be performed. In an extensive experimental evaluation, we showed that chainifyDB does not only offer a 6x higher throughput than comparable baselines, but also introduces a robust recovery mechanism, which grants organizations the chance to participate in transaction processing again.

Acknowledgements

This research is funded by the German Federal Ministry of Education and Research (BMBF).

7. REFERENCES

[1] AWS latencies: https://medium.com/@sachinkagarwal/public-cloud-inter-region-network-latency-as-heat-maps-134e22a5ff19.
[2] CBInsights: https://cbinsights.com/research/report/blockchain-technology-healthcare-disruption/.
[3] Ethereum: https://github.com/ethereum/wiki/wiki/white-paper.
[4] https://www.scientificamerican.com/article/unnecessary-tests-and-treatment-explain-why-health-care-costs-so-much/.
[5] Smallbank: http://hstore.cs.brown.edu/documentation/deployment/benchmarks/smallbank/.
[6] BigchainDB: https://www.bigchaindb.com, 2019.
[7] Tendermint: https://tendermint.com/, 2019.
[8] E. Androulaki, A. Barger, V. Bortnikov, et al. Hyperledger Fabric: A distributed operating system for permissioned blockchains. CoRR, abs/1801.10228, 2018.
[9] F. Armknecht, G. O. Karame, A. Mandal, F. Youssef, and E. Zenner. Ripple: Overview and outlook. In TRUST, volume 9229 of Lecture Notes in Computer Science, pages 163–180. Springer, 2015.
[10] H. Dang, T. T. A. Dinh, D. Loghin, E. Chang, Q. Lin, and B. C. Ooi. Towards scaling blockchain systems via sharding. In SIGMOD Conference, pages 123–140. ACM, 2019.
[11] D. Difallah, A. Pavlo, C. Curino, and P. Cudré-Mauroux. OLTP-Bench: An extensible testbed for benchmarking relational databases. PVLDB, 7(4):277–288, 2013.
[12] B. Ding, L. Kot, and J. Gehrke. Improving optimistic concurrency control through transaction batching and operation reordering. PVLDB, 12(2):169–182, 2018.
[13] M. El-Hindi, C. Binnig, A. Arasu, D. Kossmann, and R. Ramamurthy. BlockchainDB - a shared database on blockchains. PVLDB, 12(11):1597–1608, 2019.
[14] M. El-Hindi, M. Heyden, C. Binnig, R. Ramamurthy, A. Arasu, and D. Kossmann. BlockchainDB - towards a shared database on blockchains. In SIGMOD Conference, pages 1905–1908. ACM, 2019.
[15] J. M. Faleiro, D. Abadi, and J. M. Hellerstein. High performance transactions via early write visibility. PVLDB, 10(5):613–624, 2017.
[16] J. Gehrke, L. Allen, P. Antonopoulos, A. Arasu, J. Hammer, J. Hunter, R. Kaushik, D. Kossmann, R. Ramamurthy, S. T. V. Setty, J. Szymaszek, A. van Renen, J. Lee, and R. Venkatesan. Veritas: Shared verifiable databases and tables in the cloud. In CIDR. www.cidrdb.org, 2019.
[17] C. Gorenflo, S. Lee, L. Golab, and S. Keshav. FastFabric: Scaling Hyperledger Fabric to 20,000 transactions per second. In IEEE ICBC, pages 455–463. IEEE, 2019.
[18] S. Haber and W. S. Stornetta. How to time-stamp a digital document. In A. Menezes and S. A. Vanstone, editors, Advances in Cryptology - CRYPTO '90, volume 537 of Lecture Notes in Computer Science, pages 437–455. Springer, 1990.
[19] H. Meir, A. Barger, and Y. Manevich. Increasing concurrency in Hyperledger Fabric. In SYSTOR, page 179. ACM, 2019.
[20] M. Muzammal et al. Renovating blockchain with distributed databases: An open source system. Future Generation Comp. Syst., 90:105–117, 2019.
[21] S. Nathan, C. Govindarajan, A. Saraf, M. Sethi, and P. Jayachandran. Blockchain meets database: Design and implementation of a blockchain relational database. PVLDB, 12(11):1539–1552, 2019.
[22] T. M. Qadah and M. Sadoghi. QueCC: A queue-oriented, control-free concurrency architecture. In Middleware, pages 13–25. ACM, 2018.
[23] A. Sharma, F. M. Schuhknecht, D. Agrawal, and J. Dittrich. Blurring the lines between blockchains and database systems: the case of Hyperledger Fabric. In SIGMOD Conference, pages 105–122. ACM, 2019.
[24] C. Yao, D. Agrawal, G. Chen, Q. Lin, B. C. Ooi, W. Wong, and M. Zhang. Exploiting single-threaded model in multi-core in-memory systems. IEEE Trans. Knowl. Data Eng., 28(10):2635–2650, 2016.