Protocols for Building Secure and Scalable Decentralized Applications

A Dissertation

Presented to the Faculty of the Graduate School

of Cornell University

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Kai Mast

December 2020

© 2020 Kai Mast
ALL RIGHTS RESERVED

Protocols for Building Secure and Scalable Decentralized Applications

Kai Mast, Ph.D.

Cornell University 2020

Decentralized ledger technologies distribute data and execution across a public peer-to-peer network, which allows for more democratic governance of distributed systems and enables tolerating Byzantine failures. However, current protocols for such decentralized ledgers are limited in performance, as they require every participant of the protocol to execute and validate every operation. Because of this, systems such as Bitcoin or Ethereum are limited in their throughput to around 10 transactions per second. Additionally, current implementations provide virtually no privacy to individual users, which precludes decentralized ledgers from being used in many real-world applications. This thesis analyzes the scalability and privacy limitations of current protocols and discusses means to improve them in detail. It then presents two novel protocols for building decentralized ledgers, describes their implementation, and evaluates their performance under realistic workloads.

First, it introduces BitWeave, a blockchain protocol enabling parallel transaction validation and serialization while maintaining the same safety and liveness guarantees provided by Bitcoin. BitWeave partitions the system's workload across multiple distinct shards, each of which then executes transactions mostly independently, while allowing for serializable cross-shard transactions.

Second, it discusses DataPods, a database architecture and programming abstraction that combines the safety properties of decentralized systems with the scalability and confidentiality of centralized systems. Each data pod is akin to a conventional database instance with the addition of enabling users to detect and resolve misbehavior with the help of a global ledger. Further, data pods are interoperable with each other through federated transactions, enable confidentiality of data, and allow users to migrate their data in case of failure.

BIOGRAPHICAL SKETCH

Kai Mast was born in Böblingen, Germany. He received his Bachelor of Science degree from the University of Bamberg in 2014. While working towards his undergraduate degree, he researched general artificial intelligence with Dietrich Dörner and peer-to-peer networks with Udo R. Krieger. He wrote his undergraduate thesis on on-chip networks while visiting Intel Mobile Communications. His interest in peer-to-peer networks led him to pursue a Ph.D. at Cornell University, working with Emin Gün Sirer. During his time at Cornell, his work revolved around secure database management systems and blockchain protocols. He spent the summer of 2016 at Microsoft Research Cambridge working with Anthony Rowstron and the summer of 2019 at U.C. Berkeley working with Dawn Song. For his minor studies in sociology, Kai Mast researched political alignment in social networks with Yongren Shi and Michael W. Macy. He received his Master of Science degree from Cornell in 2017 and his doctorate in 2020.

ACKNOWLEDGEMENTS

There are many people without whom I would not have been able to make this journey.

During my undergraduate studies, Philipp Eittenberger and Dietrich Dörner were the first to encourage me to conduct research. Additionally, Udo Krieger and Todor Mladenov were amazing mentors during my time at the University of Bamberg.

My adviser Gün Sirer always helped me improve my research, coding, and technical writing skills. All members of the Cornell Community, and in particular the Cornell System Lab, have been great friends and colleagues. This includes, but is not limited to,

Ayush, Deniz, Edward, Efe, Ethan, Florian, Harjasleen, Jack, Kevin, Matthew, Natacha, Robert, Shir, Soumya, Ted, Tom, Vera, Vlad, Xanda, and Yunhao. A particular highlight of my doctoral studies were the undergraduates I had the pleasure to work with: Lequn,

Charles, Aaron, and Arzu. Internships have been an invaluable experience during both my undergraduate and graduate studies. I want to thank everybody at Elektrobit Automotive, Siemens

Healthcare, Intel Mobile Communications, and Microsoft Research whom I had the chance to interact with. Here, I want to especially thank Andreas Pokorny, who helped me find and secure my first internship.

Finally, I want to thank my family, my friends, my housemates, and, especially, my partner Sophie for all their support during the ups and downs of graduate school.

TABLE OF CONTENTS

1 Introduction 1
  1.1 Motivation ...... 1
  1.2 Decentralized Ledger Abstraction ...... 3
    1.2.1 Consistency ...... 4
    1.2.2 Immutability ...... 5
    1.2.3 Auditability ...... 6
  1.3 Decentralized Ledger Technologies ...... 6
    1.3.1 Assumptions and Attack Model ...... 7
    1.3.2 Sybil Detection ...... 7
    1.3.3 Committee-Based Consensus ...... 8
    1.3.4 Nakamoto Consensus ...... 9
    1.3.5 Bottlenecks ...... 11
  1.4 Existing Approaches for Scaling Blockchains ...... 13
    1.4.1 Off-Chain Protocols ...... 13
    1.4.2 Sharding Blockchains ...... 13
  1.5 Challenges in Sharding Blockchains ...... 14
    1.5.1 Maintaining Safety ...... 14
    1.5.2 Ensuring Consistency ...... 15
    1.5.3 Maintaining Decentralization ...... 15
    1.5.4 Providing Sound Incentive Mechanisms ...... 16
  1.6 Thesis Contributions ...... 16
    1.6.1 BitWeave: Audit-based Sharding for Blockchains ...... 17
    1.6.2 DataPods: Federated Decentralized Databases ...... 17
  1.7 Thesis Outline ...... 18

2 Abstractions for Scalable Decentralized Applications 19
  2.1 Transaction Fees and Digital Payments ...... 19
  2.2 Existing Data Models for Decentralized Ledgers ...... 19
    2.2.1 The UTXO Model ...... 20
    2.2.2 The Accounts Model ...... 22
  2.3 Smart Contracts ...... 24
    2.3.1 Limitations of Smart Contracts ...... 26
  2.4 Concurrent Decentralized Applications ...... 26
    2.4.1 Objects and Object Types ...... 27
    2.4.2 Application Functions ...... 28
    2.4.3 Reservations ...... 28
    2.4.4 Implementation ...... 30
  2.5 Chapter Summary ...... 30

3 BitWeave: Audit-based Sharding for Blockchains 31
  3.1 Foundation: Bitcoin-NG ...... 33
  3.2 Blockchain Structure ...... 33
  3.3 Consensus Abstraction ...... 35
  3.4 Roles in BitWeave ...... 36
    3.4.1 Epoch Leaders ...... 36
    3.4.2 Shard Followers ...... 36
    3.4.3 Shard Commanders ...... 37
  3.5 Transaction Processing Overview ...... 37
    3.5.1 Reservations ...... 38
    3.5.2 Commits and Aborts ...... 39
    3.5.3 Efficient Cross-Shard Communication ...... 39
    3.5.4 Transaction Fees and Miner Rewards ...... 40
  3.6 Fault-Tolerant Transaction Processing ...... 42
    3.6.1 Detecting Fraud ...... 43
    3.6.2 Fraud-Proofs ...... 44
    3.6.3 Incentivizing Fraud-Finding Behavior ...... 45
    3.6.4 Ensuring Shard Availability ...... 46
    3.6.5 Adaptive Confirmation Intervals ...... 48
  3.7 Correctness ...... 49
    3.7.1 Safety ...... 51
    3.7.2 Liveness ...... 53
  3.8 Case Studies ...... 54
    3.8.1 Applying BitWeave to Ethereum ...... 54
    3.8.2 Applying BitWeave to Bitcoin ...... 56
  3.9 Implementation Details ...... 57
    3.9.1 Reducing Transaction Footprint ...... 58
    3.9.2 Block Size and Epoch Length ...... 58
    3.9.3 Congestion Control ...... 59
  3.10 Experimental Evaluation ...... 60
    3.10.1 How well does BitWeave’s overall throughput scale? ...... 63
    3.10.2 How does sharding affect the transaction footprint? ...... 63
    3.10.3 What is the overhead generated by cross-shard messages? ...... 64
    3.10.4 How well does the protocol handle failures? ...... 64
  3.11 Discussion and Open Problems ...... 65
    3.11.1 Shortening CHALLENGE Periods ...... 65
    3.11.2 Reducing Chain Size ...... 66
    3.11.3 Adapting to Changing Workloads ...... 67
  3.12 Chapter Summary ...... 68

4 DataPods: Federated Decentralized Databases 69
  4.1 Foundation: Trusted Execution Environments ...... 70
  4.2 The DataPods Architecture ...... 72
    4.2.1 Assumptions and Attack Model ...... 74

    4.2.2 Global Ledger Abstraction ...... 74
    4.2.3 Application Programming Interface ...... 75
    4.2.4 Authenticated Private Storage ...... 77
    4.2.5 Secure Function Evaluation ...... 81
    4.2.6 Detecting Data Pod Failure ...... 81
    4.2.7 Economic Incentives ...... 82
  4.3 Application Case Studies ...... 84
    4.3.1 Federated Social Networking ...... 84
    4.3.2 Non-Tangible Assets ...... 86
    4.3.3 Decentralized Machine Learning ...... 86
  4.4 Federated Transactions ...... 88
    4.4.1 Transaction Chopping ...... 88
    4.4.2 Transaction Lifecycle ...... 90
    4.4.3 Fault-Tolerance ...... 94
    4.4.4 Federated Transaction Fees ...... 96
    4.4.5 Concluding Example ...... 98
  4.5 User Data Migration ...... 99
  4.6 Correctness ...... 101
    4.6.1 Safety ...... 102
    4.6.2 Liveness ...... 103
  4.7 Implementation Details ...... 104
  4.8 Experimental Evaluation ...... 105
    4.8.1 Application Benchmark ...... 105
    4.8.2 Microbenchmarks ...... 106
  4.9 Discussion and Open Problems ...... 109
    4.9.1 Sybil Attacks ...... 109
    4.9.2 Global Objects ...... 110
    4.9.3 Software-Based Attacks on TEEs ...... 110
    4.9.4 Handling Offline Clients ...... 111
  4.10 Chapter Summary ...... 112

5 Related Work 113
  5.1 Consensus Protocols and State Machine Replication ...... 113
  5.2 Accountability for Distributed and Decentralized Systems ...... 114
    5.2.1 Audit Mechanisms ...... 114
    5.2.2 Encrypted Databases ...... 115
    5.2.3 Trusted Execution and Decentralized Ledgers ...... 116
  5.3 Scaling Decentralized Systems ...... 117
    5.3.1 Minor Changes to Nakamoto Consensus ...... 117
    5.3.2 Sharding Decentralized Systems ...... 119
    5.3.3 Scaling On-Disk Storage ...... 123
    5.3.4 Federated Chains ...... 123
    5.3.5 Sidechains and Off-Chain Mechanisms ...... 124
  5.4 Emerging Blockchain Protocols ...... 125

6 Conclusion 127

Bibliography 128

CHAPTER 1

Introduction

In this thesis, we present protocols for building scalable decentralized applications and the principles that motivate their design.

1.1 Motivation

The Internet plays a major role in modern human life, and its governance is therefore of high importance. Unfortunately, a few corporations currently control most critical parts of the World Wide Web's infrastructure, such as social networks, digital payment systems, and instant messaging. As a consequence, they have the power to shape online discourse and, through that, society at large. Additionally, online services can infer sensitive information, such as political alignment, not only from personal data but from meta-data as well [92]. The fact that users are dependent on these services, have no say in how services operate, and are not able to easily switch service providers exacerbates this problem.

Federated protocols aim to overcome some of these obstacles by allowing independent organizations to cooperate, but they still require trust in these organizations. E-Mail [39], ActivityPub [16], Matrix [36], and XMPP [33] allow independent servers to selectively exchange data. This means that not all users have to trust the same entity but can choose which node in the network to trust their data with. Similarly, DNS consists of a hierarchy of independent servers that each manage a separate part of the domain namespace. Fundamentally, federation by itself simply offers a choice of providers to the user, but does not provide any incentives for the providers to engage in desired behaviors.

(a) Centralized: All users rely on a single node to provide a particular service. (b) Federated: Each node provides a similar service and nodes selectively interact with each other through a common protocol. (c) Decentralized: An ensemble of nodes collectively provides a service in a manner that prohibits misbehavior.

Figure 1.1: Overview of the centralized, federated, and decentralized architectures. Unlike the former two, the decentralized architecture allows clients to interact with the network as a whole and, thus, to be fully independent of a specific service provider.

Blockchains [73, 103], or more broadly decentralized ledgers, enable applications to execute across a trustless peer-to-peer infrastructure. We consider a system decentralized if individual nodes cannot influence its execution as long as they do not control a threshold of the network. This means that decentralized architectures protect against malicious adversaries in addition to simple crash failures. As a result, decentralized ledgers allow online services to operate without reliance on a trusted party. Figure 1.1 outlines this stark contrast to previous architectures, where each user's data is under the full control of a single organization.

While blockchain protocols are a promising technology in the abstract, they fall short in critical ways. For example, the Ethereum blockchain has roughly the processing power of a portable calculator, or about 35k floating-point operations per second¹.

The culprit for these limitations is that decentralization requires massive replication of computation and data.

¹With the current gas limit, Ethereum can do about two million floating-point multiplications per block, which are published about once a minute [30].

This massive replication results in high computation, communication, and storage overheads, which, in turn, hurt throughput and latency.

Additionally, protocols for decentralized execution can only guarantee limited privacy to their users and usually cannot enforce confidentiality of the data they process. Because data and computation are replicated across a global network, it is much easier for an attacker to gain access to data compared to a centralized setting. Most users, however, expect that applications will process their personal data sensitively. As a result, many real-world applications currently cannot be deployed easily in a decentralized setting.

This thesis focuses on improving scalability and privacy guarantees for real-world decentralized applications. Scalability in this context means that protocols can accommodate the workloads of demanding applications. Further, deployment in a real-world environment requires protocols to provide proper economic incentives so that all rational network participants behave as expected.

1.2 Decentralized Ledger Abstraction

Each decentralized architecture, in essence, provides the abstraction of an append-only ledger with semantics that go beyond the mere storage of data and execution of programs. These semantics are key to building applications with high integrity in a decentralized setting. For the rest of this thesis, we will refer to this abstraction as decentralized ledgers and the underlying protocols as decentralized ledger technologies (DLTs).

We extend the formalism of Adya [1], which defines a database D consisting of a history H_D of transactions and a set of objects O_D, each associated with a totally-ordered set of object versions. Each transaction is a set of operations applied to a particular object, such as a read, write, or append. Each object's version history initially consists only of the ⊥ value, indicating that the object has not been created yet.

Transactions affecting the same object(s) and their operations can be ordered with respect to each other. We say a transaction T precedes another transaction T′ if it appears earlier in the database's history, denoted as T ← T′. This relationship is transitive, i.e., if T1 ← T2 and T2 ← T3 hold, then T1 ← T3 holds as well. Similarly, we say an operation op precedes another operation op′ if the object version it accesses precedes that of op′. Transactions affecting two disjoint sets of objects may not be able to be ordered with respect to each other, denoted as T ↔ T′. Similarly, operations affecting two distinct objects cannot be ordered with respect to each other, denoted as op ↔ op′.

1.2.1 Consistency

Like many conventional database management systems, decentralized ledgers allow enforcing application-specific constraints on the data and provide strict serializability for all operations. Serializability ensures that all transactions execute atomically, i.e., in a serial order or one equivalent to a serial order. In other words, if two transactions T1 and T2 are applied to two distinct objects, they must be applied in the same order to both objects.

Strict serializability extends this notion with a real-time order: if T1 started before T2, its operations should also be applied before those of T2 (see Equation 1.1). This, in turn, not only ensures the integrity of a system's state but also makes it much easier for developers to build applications, because they do not have to reason about concurrency.

∀ T1, T2 ∈ H_D, ∀ op1 ∈ T1, op2 ∈ T2. T1 ← T2 ⇒ op1 ← op2 ∨ op1 ↔ op2    (1.1)

1.2.2 Immutability

Decentralized ledgers are eidetic: they maintain a record of all transactions ever processed by the system. From this record, any past state of the system can be regenerated and inspected. As a result, the ledger can serve as a notary or an impartial witness by providing a reliable record of past information.

Formally, successfully applying a transaction T to a database D yields a new database

D′ with T appended to its history. Similarly, the version history of each object modified by T will be appended with its new version.

We then define immutability as a constraint on the allowed state transitions from D to D′. Thus, if a database state D predates another state D′, i.e., D ← D′, all of its transactions and object versions are contained in D′. This means the successor state can only add new transactions and object versions and not remove or reorder them, as denoted in Equation 1.2.

∀ D, D′. D ← D′ ⇒ H_D ⊆ H_D′    (1.2)

Immutable systems are thus, unlike the term immutability suggests, able to change their state, but only allow state transitions that extend the state without removing existing information. Further, they might enforce other data policies to ensure the

integrity of a particular application. For example, a cryptocurrency usually wants to ensure that no transaction output is spent more than once.

1.2.3 Auditability

Auditability enables participants to join the network at any point in time and verify all state relevant to them up to the current point, without having to trust a particular remote party. Formally, we say there exists a publicly-available function verify(D, D′) that certifies that a transition from D to D′ is valid. Auditors can then recreate and verify the entire system execution by verifying all database state transitions, starting with the initial state consisting of an empty transaction history.
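To make this concrete, the following Python sketch replays a sequence of database states and certifies each transition. The Database class and the verify_transition callback are hypothetical stand-ins for a concrete DLT's state representation and its verify function; neither name comes from the thesis.

from dataclasses import dataclass, field

@dataclass
class Database:
    history: list = field(default_factory=list)  # ordered transaction history

def audit(transitions, verify_transition):
    # Start from the initial state: an empty transaction history.
    state = Database()
    for next_state in transitions:
        # verify(D, D') from the text: certify that this transition is valid.
        if not verify_transition(state, next_state):
            raise ValueError("invalid state transition detected")
        state = next_state
    return state  # the fully verified current state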

1.3 Decentralized Ledger Technologies

At the core of DLTs lie consensus protocols, which are used for state machine replication (SMR) to maintain a unified database. The definition of a state machine comprises a set of potential states the machine can be in and a set of admissible state transitions that allow moving from one state to another. SMR decides which state transitions to perform and replicates this decision across all participants of the protocol. As a result, all non-faulty participants maintain the same state at any point in time.

Most consensus protocols, while varying greatly in their implementation, are leader-based. This class of protocols first appoints a particular node to be leader (sometimes called a primary), which then proposes state transitions to the system. These state transitions are then subject to approval by the rest of the network. The existence of a singular leader ensures that transactions are proposed in an order that ensures serializability.

Finally, leader-based protocols can react to failures or bad performance at any point in time by appointing a different entity to be leader.

1.3.1 Assumptions and Attack Model

Distributed ledgers are designed to be resilient against Byzantine failures, a model that encompasses both benign failures and those caused by malicious intent. A Byzantine actor may want to change the network's behavior to their advantage or break the network entirely. To achieve this, attackers may issue invalid or conflicting messages, and delay or hide communication. Correct nodes, on the other hand, follow the protocol as prescribed.

To ensure that correct nodes faithfully follow the protocol, distributed ledger protocols typically assume that the majority of network participants behave rationally and provide incentives for these rational actors to advance the protocol. These incentives can take the form of direct payments, where parties that process transactions receive compensation in the form of block rewards or transaction fees. Further, incentives can be based around collateral, where parties that misbehave are penalized financially.

1.3.2 Sybil Detection

An open network inherently has to deal with Sybil attacks, where a single entity creates multiple identities to gain more control over the system. These attacks are feasible because, without a trusted third party, there is no straightforward way to authenticate user identities.

Consensus protocols rely on either computational barriers or stake to prevent such

Sybil attacks. Stake-based systems manage membership information as part of their protocol. In committee-based consensus protocols, stake usually is binary, which means only members of the committee are allowed to vote and each committee member has the same voting power. Here, each change to the committee must be approved by all participants. Recent protocols have introduced the notion of variable stake, often bound to how much cryptocurrency a certain party holds. In this setting, cryptocurrency can be passed on to other participants to dynamically reallocate voting power.

Systems that rely on computational barriers for Sybil detection do not manage any form of global membership information. Instead, participants have to perform a certain task to become, or have the chance to become, leader. Most commonly, this task involves solving a cryptographic puzzle, where an input to a hash function has to be found such that the function's output is below a specified threshold.

These particular cryptographic puzzles are better known as Proof of Work (PoW) [26]. The underlying intuition is simple: every attempt to solve the puzzle requires a constant amount of computation and the chance to solve the puzzle is independent of earlier attempts. Proof of Work thus provides a very reliable means of Sybil detection, albeit a very wasteful one.
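The following Python sketch illustrates such a puzzle: it searches for a nonce whose SHA-256 hash, combined with a block header, falls below a threshold. The header bytes and the threshold value are illustrative assumptions, not parameters of any particular deployment.

import hashlib

def mine(header: bytes, threshold: int) -> int:
    """Search for a nonce such that SHA-256(header || nonce) < threshold."""
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < threshold:
            return nonce  # a valid Proof-of-Work solution
        nonce += 1

# A deliberately easy example threshold (roughly 16 leading zero bits):
# solution = mine(b"example block header", 2 ** 240)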

1.3.3 Committee-Based Consensus

Classical consensus protocols achieve state machine replication among a fixed set, or committee, of nodes. They were first introduced by Leslie Lamport [55], among others [25, 74, 89]. These protocols now form the foundation for most fault-tolerant applications. For example, a web service might be implemented across three data centers. If one of the data centers fails, the consensus protocol ensures that operation can continue

by shifting computation to the other two data centers.

While consensus protocols were initially intended to tolerate only benign failures, the introduction of Byzantine fault-tolerant consensus protocols allowed for more complex use cases. For example, a node in the committee might not simply become unavailable but encounter a software bug that makes it behave in ways not originally intended by the software developers. While such failures might be much more unlikely than a crash, it is still important to be resilient against them for safety-critical applications.

Recently, committee-based Byzantine fault-tolerant consensus protocols, such as Practical Byzantine Fault-Tolerance (PBFT) [12], have received new attention in the context of decentralized ledgers. Because these protocols can protect not only against software bugs or hardware failures, but also against a malicious human adversary controlling a subset of the committee, they are suitable for implementing applications where mutually-distrusting parties are trying to agree on a consistent state.

1.3.4 Nakamoto Consensus

The Bitcoin paper introduced Nakamoto consensus, a consensus protocol that builds on top of Proof-of-Work (PoW). There are two core differences between the protocols described so far and Nakamoto consensus. First, the use of PoW allows it to be a public protocol that allows participants to join and leave at any point in time. To take part in the protocol, one does not have to register with some global mechanism, but merely starts attempting to solve the crypto-puzzle. Second, Nakamoto consensus operates non-deterministically, where the current state is known to be agreed upon by the global network with some high, but not absolute, probability.

9 Figure 1.2: Sketched structure of a Bitcoin-like blockchain. The currently winning chain is highlighted.

Systems based on Nakamoto consensus rely on Gossip protocols [24] to broadcast messages, such as transactions or blocks, because they execute across a peer-to-peer network with no pre-defined topology or membership. Instead of being connected to the full network, participants of a Gossip protocol only talk to a few peers. When receiving a new message, they forward it to all their peers. To make gossip efficient, participants usually keep track of which messages they already sent to or received from a particular peer. As a result, messages eventually spread to the entire network, without the network being fully connected.
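The sketch below illustrates this forwarding logic under simplifying assumptions: it tracks seen messages globally rather than per peer, and the class and method names are hypothetical.

class GossipNode:
    def __init__(self):
        self.peers = []    # small, partial view of the network
        self.seen = set()  # ids of messages already sent or received

    def receive(self, msg_id, payload):
        if msg_id in self.seen:
            return  # duplicate: drop instead of forwarding again
        self.seen.add(msg_id)
        for peer in self.peers:
            peer.receive(msg_id, payload)  # forward to every known peer

# Even a sparsely connected topology delivers the message everywhere:
# a, b, c = GossipNode(), GossipNode(), GossipNode()
# a.peers, b.peers, c.peers = [b], [a, c], [b]
# a.receive("tx-1", b"payload")  # reaches c via b, without a knowing c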

Nakamoto consensus performs leader election using PoW through a process called mining. Once a party has solved the cryptographic puzzle, they forward their solution in the form of a block to the network to become leader. Instead of proposing transactions after becoming leader, they directly include a set of serialized transactions in the published block. Once participants start mining, their chance of becoming leader is directly proportional to the processing power available to them, because each attempt to solve the crypto-puzzle is independent of past attempts.

Figure 1.2 outlines how, at a particular point in time, there might be multiple competing chains. Here, while the prefix of the chain is considered stable and abandoned

forks have been removed, at the head of the chain multiple forks are competing to become the longest chain.

Protocols based on Nakamoto consensus usually assume a strong bound on the network latency. This ensures that a block will be visible to all network participants after some fixed time. More concretely, systems like Bitcoin assume that this bound is about five minutes.

1.3.5 Bottlenecks

We now discuss the major bottlenecks of blockchains: execution, verification, and communication. Essentially, electing protocol leaders and ordering transactions in a globally-replicated manner requires massive replication of both data and computation. Thus, what the network can process as a whole is limited by the fact that every participant needs to process, forward, and execute all transactions.

Execution

Transactions in decentralized ledger systems differ significantly from those in conventional database systems. Every participant of the protocol maintains its local state in an authenticated data structure to be able to verify and process future blocks. In particular, DLT nodes usually calculate and store some form of hash tree of the state, and every block contains the root hash of the current state. These hash trees can be used both to verify blocks and to provide succinct proofs of some substate of the system. Executing transactions in such an authenticated manner requires more computation and storage. This is one of the reasons why systems such as Ethereum employ a limit on how many computational steps a block can contain (the “gas limit”). Previous work has

demonstrated that an improved storage engine can mitigate this bottleneck to some extent [78].

Verification

Blockchains rely on digital signatures to ensure the correctness and authenticity of messages. Intuitively, checking every transaction request and block generates a high computational workload, as digital signatures are rather complex to verify. Increasing the number of transactions in some time interval thus significantly increases the burden for every node in the network to participate in the protocol.

Communication

Finally, for every node to be able to process every block and transaction, all transactions and blocks must be propagated to the entire network. Intuitively, this creates a high network communication overhead. Decentralized ledgers usually execute across a geo-distributed peer-to-peer network. Here, a larger state that needs to be synchronized will further increase the already considerable propagation latencies. Even worse, scalability mechanisms may harm decentralization, a key promise of decentralized ledgers. For example, a naive attempt at increasing the throughput of a ledger is a higher block frequency or block size. Either will cause a higher propagation delay of messages and, in turn, increase the likelihood of forks. Additionally, bigger block sizes raise the CPU and storage requirements for nodes participating in the network. This problem is especially salient for new nodes joining the network that need to verify all blocks in the chain before processing new transactions. As a result, only participants with strong hardware that are well-connected may participate in the protocol, causing a more centralized network layout.

1.4 Existing Approaches for Scaling Blockchains

1.4.1 Off-Chain Protocols

A prominent line of work for improving the performance of DLT systems are so-called “Layer 2” systems. Such systems build on top of an existing DLT and improve its performance.

The most common examples of Layer 2 solutions are payment and state channels [80, 60, 70], which allow building networks of peer-to-peer relationships to process certain operations without the involvement of the global ledger.

Payment channels lock funds on the global ledger and facilitate fast transactions between parties through an off-chain protocol. Only the amount locked on the base chain is allowed to be exchanged in these systems, and a tally of balances is kept for when it is time to settle.

On settling, the amount apportioned to the settler, as denoted by her balance in the sidechain, is unlocked on the main chain and returned to the settler.

State channels extend this scheme from cryptocurrency funds to arbitrary state.

Similarly, systems such as Plasma [79] or Arbitrum [42] maintain authenticated data structures outside the main ledger and solely rely on it in case of failures.

1.4.2 Sharding Blockchains

Sharding aims to address all three scalability bottlenecks, but realizing a sound and

effective sharding protocol faces significant challenges. In essence, sharding allows every participant to process only a subset of all transactions in the network. Ideally, this allows the throughput of the system to scale linearly without increasing the burden

on any particular participant.

Previous work usually implements sharding in the following way [99]. Some mechanism keeps track of a set of identities, e.g., by examining the last k miners of a PoW chain [48]. The protocol then assigns each shard some subset of these nodes. Each shard then locally runs a consensus mechanism, such as PBFT [12], and a distributed transaction protocol, such as two-phase commit, handles cross-shard transactions. Finally, some scheme is in place to periodically “merge” the state of all shards.
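Any such design also needs a deterministic mapping from data to shards. The following Python sketch shows one common choice, hashing account identifiers, with a hypothetical shard count; it illustrates the general approach rather than the assignment scheme of any specific protocol.

import hashlib

NUM_SHARDS = 16  # hypothetical shard count

def shard_of(account_id: bytes) -> int:
    """Deterministically map an account to a shard by hashing its identifier."""
    digest = hashlib.sha256(account_id).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def shards_of(tx_accounts) -> set:
    """The set of shards that must cooperate to process a transaction."""
    return {shard_of(a) for a in tx_accounts}

# A transaction touching accounts on different shards requires a
# cross-shard protocol (e.g., two-phase commit) among these shards:
# shards_of([b"alice", b"bob"])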

1.5 Challenges in Sharding Blockchains

Blockchain enthusiasts have long hoped that sharding will solve the scalability problem, yet so far no sharding protocol has been deployed in a real-world setting. The reasons for this are complex but, at a high level, sharding decentralized ledgers faces four major challenges: reduced safety, loss of network decentralization, reduced consistency, and lack of economic incentives.

1.5.1 Maintaining Safety

The essence of decentralized ledgers is that they protect some application, e.g., a cryptocurrency, against a strong Byzantine adversary. A basic requirement for protecting against such an adversary is to have a Sybil-detection mechanism, which is usually based on how much stake an entity has or how much computational work is done.

For example, a core assumption in Bitcoin is that less than 50% (or 25% in some cases) of the entire network is controlled by adversaries. A shard intuitively has much less total stake (or computational work) than the system as a whole. Thus, some mechanism must be in place to ensure that a single shard is as safe as the network as a

whole.

1.5.2 Ensuring Consistency

In systems such as Ethereum, a serial order of transactions is enforced to ensure consistent updates to contract states. Unfortunately, enforcing a total order for all transactions is very difficult if shards execute mostly independently. To ensure consistency, a sharding protocol needs some mechanism to consistently apply transactions to multiple shards. Protocols for distributed transactions, which ensure consistent and atomic updates across shards, have been explored extensively in the systems community. However, adapting such protocols to a permissionless setting with Byzantine failures is a challenge.

1.5.3 Maintaining Decentralization

So far we have discussed decentralization as an abstract system property. More concretely, ensuring decentralization means keeping the burden of joining the network and participating in the consensus protocol low. Ideally, anybody with a computing device should be able to join the system.

Even current non-sharded systems that promise decentralization are not very decentralized in practice. For example, Bitcoin and Ethereum are controlled by only a handful of entities [37]. The underlying reason for this is that decentralization often conflicts with the goal of scalability. If a network supports many participants of varying locations and processing power, data takes longer to propagate across the network.

Some sharding protocols ensure the safety of their sharding scheme by sacrificing decentralization to some extent. For example, Monoxide [100] requires a large set of miners to process all shards. As a result, the protocol relies on nodes with access to large amounts of processing power, such as data centers, to function correctly.

1.5.4 Providing Sound Incentive Mechanisms

Miners (or stakers) participate in a consensus protocol because they receive some monetary reward or want to secure the value of their assets stored on the ledger. Bitcoin and Ethereum have fairly straightforward incentive mechanisms, where the miner of a new block gets some new currency and transaction fees as a reward.

Incentive mechanisms tend to get more complicated when introducing sharding, as there may exist distinct shard chains and transactions can execute across multiple shards. For example, OmniLedger, while providing a safe sharding protocol, does not provide sound incentives for the large set of validators required to power the protocol.

1.6 Thesis Contributions

This thesis presents two novel protocols for building scalable decentralized applications: BitWeave and DataPods. The former is a “Layer 1” solution that increases scalability through sharding, while the latter proposes a semi-decentralized architecture based on secure hardware for “Layer 2” protocols.

1.6.1 BitWeave: Audit-based Sharding for Blockchains

We first introduce BitWeave, a secure and scalable sharding protocol for public blockchains.

BitWeave enables shards to execute mostly independently and relies on audit mechanisms to ensure correctness. The protocol distinguishes between the main chain, which mostly processes meta-information, such as who is responsible for which shard, and shard chains, which contain the actual transaction information. Shards are not delegated to a trusted third party or a committee of nodes but are instead serialized by an untrusted node: the shard commander. Shard participants can rectify the failure or misbehavior of shard commanders using the main chain. In the common case, this allows for high throughput, while safety is guaranteed during Byzantine failures.

The key contribution of BitWeave is that it provides scalability without harming the decentralization or safety of the protocol. Instead of assigning a certain fraction of the mining power (or stake) to each shard, all mining power remains at the main chain. The amount of information processed by the main chain is kept minimal, which allows nodes to participate in the consensus protocol virtually independently of their computing power. This, in turn, avoids centralization around a few powerful entities.

1.6.2 DataPods: Federated Decentralized Databases

We then introduce DataPods, an architecture for trustless, federated databases based on secure hardware. Here, all user data is stored in a secured database instance, while a global ledger tracks meta-data, provides a public-key infrastructure, and enables economic incentives via a cryptocurrency. Data pods operate mostly without the involvement of this global ledger, while still providing the same safety and liveness guarantees as conventional decentralized systems. Users can execute applications

17 across multiple data pods using federated transactions and move their data between pods in case of failures.

The key advantage of data pods over other off-chain approaches is that they support dynamically changing workloads, churn of users, confidential computation, and complex application logic at the same time. This is achieved by relying on a combination of trusted hardware and audit mechanisms.

1.7 Thesis Outline

The rest of this thesis describes the underlying principles and implementation of two novel protocols and evaluates them against the three design goals of decentralization, scalability, and confidentiality.

In Chapter 2, we discuss application and data abstractions that enable building scalable decentralized protocols, such as BitWeave and DataPods. The thesis then covers BitWeave (Chapter 3) and DataPods (Chapter 4) in detail. Finally, it gives an overview of related work (Chapter 5) before concluding (Chapter 6).

CHAPTER 2

Abstractions for Scalable Decentralized Applications

In this chapter, we introduce programming and data models used by decentralized ledgers and the data structures they rely on.

2.1 Transaction Fees and Digital Payments

At the core of each decentralized infrastructure is a digital currency that enables economic incentives. In essence, every participant holds some amount of currency that can be used to reward other participants in the network for work they perform, or to provide some notion of deposits. Further, this currency can also be used as a regular form of payment, e.g., when paying a vendor.

Transaction fees are one of the main incentive mechanisms driving decentralized ledgers. At a high level, users pay protocol participants for including their transactions. These fees depend on the urgency of a particular transaction and the current congestion of the network. Users set the transaction fee when issuing a new request, and protocol leaders decide whether to include the transaction or not. This results in a fee market that adapts to the current supply and demand.

2.2 Existing Data Models for Decentralized Ledgers

Decentralized ledgers require a different data model than conventional databases, because they execute in a trustless environment. Here, each user is associated with a set of cryptographic keys and must sign off on transactions spending their cryptocurrency with those keys.

Figure 2.1: Sketch of the UTXO model. Each transaction consumes one or multiple unspent transaction outputs and generates at least one new transaction output. Outputs are owned by a particular public key.

Nodes participating in a decentralized protocol need to prove that a transaction has been signed off by a particular set of users for it to be deemed valid.

Additionally, transactions issued by a client might be conflicting. For example, Alice might request to send $5 each to Bob and Claire, but have only $6 in her account.

Data models in decentralized ledgers are focused around the notion of payments and cryptocurrencies, as this was their initial application and cryptocurrencies are still the basis for incentive mechanisms in almost every DLT. We discuss the two most common ones: UTXO and Accounts.

2.2.1 The UTXO Model

Bitcoin represents a user’s account balance as a set of Unspent Transaction Outputs

(UTXOs). Transactions in the UTXO model work similarly to a voucher system in which some input vouchers are exchanged for new vouchers of the same or lesser value. Figure 2.1 outlines how transactions consume UTXOs (the unspent outputs of a previous transaction) and produce new UTXOs. Note that in a real system, some of a transaction's input would go towards a transaction fee.

Figure 2.2: Sketch of a Merkle tree and an associated proof. To prove the authenticity of Object 3 against the root A, we only need to provide a branch leading from A to the object.

Bitcoin, like many other DLTs, relies on Merkle hash trees [69] to provide authentication of the blockchain's state. Merkle trees can be generated for any arbitrary set of objects. To do this, these objects are first arranged in some pre-defined order. The tree is then constructed by recursively combining k hashes, where k is some branching factor of the tree, and generating a new hash value from the resulting value. A Merkle proof then allows verifying an object's state against the root of the tree without having access to the entire tree. The proof is the particular branch from the object to the tree root. The verifier just recomputes and checks the correctness of every hash in the branch to ensure the proof's integrity. These proofs are virtually impossible to forge, as it is very hard to find collisions, i.e., two input values that map to the same output value, for cryptographic hash functions [88].
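For a binary Merkle tree (branching factor k = 2), the verification step can be sketched as follows in Python; the encoding of the proof as (sibling hash, position) pairs is an illustrative assumption.

import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(obj: bytes, proof, root: bytes) -> bool:
    """Check a proof against the tree root without access to the full tree.

    `proof` is the branch from the object to the root, given as
    (sibling_hash, sibling_is_left) pairs; the verifier recomputes
    every hash in the branch and compares the result to the root.
    """
    current = sha256(obj)
    for sibling, sibling_is_left in proof:
        pair = sibling + current if sibling_is_left else current + sibling
        current = sha256(pair)
    return current == root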

The key advantage of the UTXO model is that one can succinctly prove the existence of an unspent transaction output. Each block in Bitcoin contains the Merkle root of the current UTXO set, which allows verifying the current state without each block

having to contain the entire set. Protocol participants just locally compute the current state by executing all previous blocks, generate the hash tree, and then verify the root against the public chain. Additionally, third parties that do not maintain the entire state of the blockchain, so-called “light clients”, can verify the existence of a particular UTXO by verifying a Merkle proof.

The UTXO model significantly reduces the complexity of the data model that transactions execute on, but limits storing custom data. Participants in this protocol merely have to maintain the UTXO set to track the state of the blockchain, and processing a transaction only involves adding and removing UTXOs to and from the set. As a result, platforms that are focused mostly on monetary transactions, such as Bitcoin or ZCash, often still rely on the UTXO model due to its simplicity.

2.2.2 The Accounts Model

Ethereum, in contrast to Bitcoin, relies on a data model focused around the notion of accounts. Intuitively, an account has a non-negative balance and can be owned by a particular user. Additionally, accounts can hold other data as well, which enables more complex applications.

Decentralized ledgers implementing the accounts model must provide additional measures for preventing double-spending and other conflicting transactions. First, instead of merely verifying the existence of a particular UTXO, the transaction must be verified against the account's state to ensure that applying it to the account will not violate consistency constraints. Second, as long as there are sufficient funds in the spending account, a malicious node might use the same request to issue multiple transactions. To address this, Ethereum transaction requests contain a unique number, or nonce, and the protocol admits only one transaction for each combination of account identifier and nonce.

Figure 2.3: Simplified outline of a Patricia tree. Data is stored in a compacted trie structure and each tree node has an associated hash value generated from its children.
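A minimal Python sketch of the nonce-based admission logic just described, assuming a hypothetical in-memory representation of accounts as (balance, next nonce) pairs:

def admit(accounts, tx):
    """Admit a transfer only if it carries the expected nonce and funds.

    `accounts` maps an account id to a (balance, next_nonce) pair, so the
    same signed request can never be applied twice.
    """
    balance, next_nonce = accounts[tx["sender"]]
    if tx["nonce"] != next_nonce:
        return False  # replayed or out-of-order request
    if tx["amount"] > balance:
        return False  # would violate the non-negative balance constraint
    accounts[tx["sender"]] = (balance - tx["amount"], next_nonce + 1)
    recv_balance, recv_nonce = accounts.get(tx["receiver"], (0, 0))
    accounts[tx["receiver"]] = (recv_balance + tx["amount"], recv_nonce)
    return True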

One drawback of the accounts model is that the authentication of state is more complex. Here, DLT nodes usually maintain three Merkle trees per block instead of just one as in the UTXO model. The first hash tree provides information about the resulting state of the system, the second hash tree represents the set of all transactions contained in this block, and the third hash tree represents the set of all changes.

In Ethereum and most other account-based systems, state is represented in the form of Patricia Merkle trees [71]. Patricia trees have two key advantages over conventional Merkle trees: there exists a maximum bound on their height and updates are relatively inexpensive. This is achieved by storing data in a compact trie structure, as outlined in Figure 2.3. Unlike updating a conventional Merkle tree, where a new entry might reorder the set and require rebuilding the entire tree, inserting new values into a Patricia tree only requires updating the affected subtree. The limited height is achieved by using

hash values as object keys, which are guaranteed to be a certain length.

2.3 Smart Contracts

The client-server model, where applications perform computation locally and then write the resulting state to the database, does not apply to the decentralized setting. Using the client-server model, Byzantine actors could attempt to store invalid application results in the globally replicated ledger and, thus, violate consistency. To prevent such attacks, DLTs provide means to execute arbitrary applications directly on the ledger itself, similar to stored procedures in conventional database systems.

Ethereum introduced the notion of Smart Contracts, stateful programs that are stored and executed entirely on the decentralized ledger. While previous systems already provided some notion of programmability, such as Bitcoin Script, Ethereum smart contracts were the first to provide full Turing completeness and, as a result, the possibility to support arbitrary programs. Smart contracts are usually written in a high-level language, such as Solidity, and then compiled to byte code, such as EVM byte code or WebAssembly, before being stored and executed on the ledger.

Smart contracts reside at a particular address on the blockchain, analogous to how each user's account is assigned an address. Contracts may hold currency and contain a key-value store to store arbitrary data. Users call functions of a smart contract by issuing a transaction that contains a function call and some amount of currency to pay for the computation.

Similar to transaction fees, a transaction request contains a certain amount of cryptocurrency to pay for computation. In Ethereum, this budget consists of “gas”, a unit representing a computational step, and a maximum amount of gas that can be spent. Each computational step then has a cost in gas, paid from the transaction request's budget. If the transaction runs out of gas during execution, it aborts.

pragma solidity ^0.4.0;

contract BasicToken {
    // maps each account address to its token balance
    mapping (address => uint256) public tokens;

    constructor(address gen_account, uint256 gen_amount) public {
        tokens[gen_account] = gen_amount;
    }

    function transfer(address src, address dst, uint256 amount) public {
        require(tokens[src] >= amount);

        tokens[src] -= amount;
        tokens[dst] += amount;
    }
}

Listing 2.1: A simple token implemented as a Solidity contract

Contracts can modify their local state directly while executing or invoke functions of other contracts. The latter allows reusing existing code and enables interaction between different applications. For example, one can implement a custom token on top of Ethereum that can be used as a form of payment by other contracts. Listing 2.1 outlines a simplified implementation of such a token, where the contract manages a mapping from accounts to token balances and tokens can be moved between accounts using the transfer call.

25 2.3.1 Limitations of Smart Contracts

First, because smart contracts do not execute or reside on one particular machine, but on the entire decentralized network, their execution is somewhat limited. Smart contracts must execute in a deterministic fashion to ensure their outcome is identical on every honest participant of the protocol. This can be achieved trivially by executing all transactions in serial order, as done in Ethereum, but gets considerably harder when introducing hardware concurrency.

Second, because the entirety of the contract resides at a particular address, sharding smart contracts is more difficult. Attempts such as Ethereum 2.0 [35] shard by account identifier, which means the entirety of a contract is located at a single shard. This can cause problems when a particular contract becomes very popular.

2.4 Concurrent Decentralized Applications

We now introduce the notion of Concurrent Decentralized Applications (CDAs), a new data model designed to enable scalability of computation, support sharding of data, and allow enforcing access policies. At a high level, CDAs combine the advantages of the UTXO and the accounts models: they support small objects that can be modified independently, while still providing full programmability.

A ledger employing the CDA abstraction processes a set of applications, each consisting of object types and associated functions. Applications are defined similarly to smart contracts in conventional blockchains, but can execute concurrently without violating strong consistency guarantees. This is achieved by separating the application's state into semi-independent objects and supporting serializable transactions on top of these objects.

type Account = {
    "messages": Map<ObjectId, List<String>>
}

pub fn create_account(uname: String) -> ObjectId:
    return db.new("Account", {uname, messages: {}})

pub fn send_message(src: ObjectId, dst: ObjectId, msg: String):
    # only allow if the caller owns the source account
    assert.owns_object(src)

    db.list_append(src, ["messages", dst], msg)
    db.list_append(dst, ["messages", src], msg)

Listing 2.2: Implementation of a simple instant messaging application as a CDA

2.4.1 Objects and Object Types

Applications operate on a global key-value store that stores application-specific objects. The underlying DLT may shard or replicate this key-value store to increase its scalability. Each object is a structured set of data similar to those in document stores, e.g., MongoDB. Objects are collections of attribute-value pairs, where attributes have types such as lists, dictionaries, strings, binary data, and numeric values.

Like a UTXO, each object of a CDA application is owned by a particular public key.

This ownership scheme enables defining who can invoke functions on an object and allows augmenting CDAs with access control mechanisms. This makes it possible, for instance, to allow only the owner of a particular account to spend funds from it.

Object types define how user data will be organized, as well as application-specific consistency constraints. To achieve this, they enforce the high-level structure of each object that is an instantiation of this specific type. This enforcement is implemented as a set of fields that each instance of the type must contain and additional predicates defined on these fields. For example, a token application may define an account type that must contain a non-negative balance field.
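A minimal Python sketch of such type enforcement, assuming a hypothetical encoding of object types as required fields plus predicates:

ACCOUNT_TYPE = {
    "fields": {"balance": int},
    "predicates": [lambda obj: obj["balance"] >= 0],  # non-negative balance
}

def check_object(obj: dict, obj_type: dict) -> bool:
    """Reject objects that are missing required fields, have a field of the
    wrong type, or violate one of the type's predicates."""
    for name, expected in obj_type["fields"].items():
        if name not in obj or not isinstance(obj[name], expected):
            return False
    return all(pred(obj) for pred in obj_type["predicates"])

# check_object({"balance": 10}, ACCOUNT_TYPE)  -> True
# check_object({"balance": -5}, ACCOUNT_TYPE)  -> False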

2.4.2 Application Functions

Applications encode functionality as a set of functions that are exposed to users and other applications. These functions interact with a database interface that allows creating new objects and modifying existing ones. Here, the underlying DLT provides means to perform these updates in an atomic and isolated manner. A naive way to achieve this is to execute transactions serially, like in Ethereum. Fortunately, the small granularity of objects allows for more fine-grained locking and, as a result, increased concurrency.

Listing 2.2 sketches how we can implement the token application from before as a CDA. Here, the application defines the object type of a token account as the granularity at which data is stored by the application. The application leverages the built-in ownership mechanism to ensure tokens are only transferred by the owner. Note that, instead of directly accessing the token data, the application interacts with a database interface that abstracts away the concurrency control.

2.4.3 Reservations

We now introduce the notion of reservations, a primitive that enables high concurrency when processing transactions in a CDA. At a high level, reservations allow for

greater concurrency over traditional locking by allowing multiple transactions to hold locks on an object in some cases. For example, more than one transaction can spend from the same account as long as the account's balance is sufficient. Each reservation is bound to one specific operation that is intended to be applied to a single data object. Increasing concurrency by composing reservations is crucial in the decentralized setting, as transactions here usually have high latency and traditional locking would therefore limit throughput significantly.

Reservations are expressed as predicates on the input and output states of a particular operation. Thus, they can be represented as a pair (a; b) of precondition a and post-condition b. As an example, money transfers require two different pairs of predicates: (balance ≥ i; balance ← balance − i) on the spending account and (; balance ← balance + i), with an empty precondition, on the receiving account, where i is the amount transferred.

DLTs following the CDA model maintain a stable state, consisting of all committed and validated operations, and a pending state, consisting of all reservations of uncommitted transactions and transactions that are not yet validated. To manage the pending state, ledgers maintain a reservation set in the form of a Patricia Merkle tree. Leaves of the reservation tree can either represent entire objects or subfields of some object.

To admit a new reservation, it must be checked both against the reservation set and the stable state of the ledger, as the ledger's state must remain valid independent of whether any of the pending transactions eventually abort or commit. For numeric types, for example account balances, this can be implemented by tracking the pending state as an interval. A new transaction can then be checked against that interval. For example, we can guarantee an integer is positive if the lower bound of the tracked interval is positive. Operations affecting coordination-free datatypes, such as append-only sets, can avoid these checks entirely and rely on invariant confluence instead [6].
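The interval-tracking idea can be sketched as follows in Python for a single numeric field. The class and method names are hypothetical, and the commits and aborts of pending transactions, which would shrink the interval again, are omitted for brevity.

class NumericReservations:
    """Track the pending state of a numeric field as an interval [low, high].

    `low` is the value if every pending debit commits; `high` is the value
    if every pending credit commits. A debit is admitted only if the
    non-negativity invariant holds regardless of later commits or aborts.
    """

    def __init__(self, stable_value: int):
        self.low = stable_value
        self.high = stable_value

    def reserve_debit(self, amount: int) -> bool:
        if self.low - amount < 0:
            return False  # could leave the balance negative
        self.low -= amount
        return True

    def reserve_credit(self, amount: int) -> None:
        self.high += amount  # credits can never violate the invariant

# Two debits of 3 against a stable balance of 5: the first is admitted,
# the second rejected, without locking the account for either transaction.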

29 2.4.4 Implementation

While implementations may vary, ideally, DLTs using the CDA model maintain four authenticated data structures. First, nodes maintain a hash tree for the current state. This hash tree contains all currently existing objects, as well as all applications that are currently registered with the blockchain. Further, the set of all newly accepted transactions and their modifications to the state are each encoded in a dedicated hash tree, as in the Accounts model.

Finally, nodes maintain an authenticated data structure representing the currently held reservations. This allows proving, for cross-shard transactions, that a certain object is locked. In particular, a shard can generate a Merkle proof that a reservation is held at a certain point in time and send it to another shard.

2.5 Chapter Summary

This chapter gave an overview of the existing data models in DLTs: UTXOs and accounts. It then described how the CDA model, combined with reservations, enables scalable decentralized applications. CDAs allow fine-grained locking of data in a decentralized application. Reservations make it possible to avoid serial execution in cases where the particular order of certain operations does not matter. In the next two chapters, we describe protocols that build on top of these abstractions.

CHAPTER 3

BitWeave: Audit-based Sharding for Blockchains

This chapter introduces BitWeave [66], a scalable transaction sharding protocol that is directly applicable to Bitcoin, Ethereum, and other public and private blockchain systems. At a high level, sharding allows breaking the collective workload of a system into smaller workloads that can be processed mostly independently. BitWeave implements sharding by building on the insight from Bitcoin-NG [31] that “mining” a block in traditional Nakamoto consensus performs two separable tasks: leader election and transaction serialization. The BitWeave protocol leverages this to differentiate between protocol leaders, which perform consensus, and shard commanders, which serialize transactions.

BitWeave introduces a novel sharding protocol design that allows for concurrent serialization of transactions without sacrificing the security guarantees afforded by traditional Nakamoto consensus. It does so by allowing the network to self-organize into pools of nodes that concurrently process transactions. Since most blockchains today use account-based transaction models, we describe how BitWeave parallelizes transaction processing under this model. However, BitWeave can easily be applied to other transaction models as well, such as UTXO-based transactions.

BitWeave enables sharding by supporting multiple concurrent shard chains, each responsible for maintaining a distinct subset of all accounts. A transaction may operate on account data that exists in multiple shards, in which case all involved shards process the transaction in tandem. To enable scalability, BitWeave is designed to minimize the number of shards included in a transaction and the amount of communication needed between shards.

(a) Bitcoin: Each block contains both leader information and transaction data

(b) Bitcoin-NG: Here two distinct types of blocks exist. Keyblocks contain leader information and microblocks contain transaction data.

(c) BitWeave: This scheme extends Bitcoin-NG with multiple concurrent shard chains.

Figure 3.1: A high-level comparison of three protocols: Bitcoin, Bitcoin-NG, and BitWeave. Each color denotes a different participant in the system. The circular blocks with keys represent key blocks, and the “Tx” labeled blocks are transaction carrying blocks.

Figure 3.1 sketches this design and demonstrates how BitWeave compares to the existing protocols Bitcoin and Bitcoin-NG. While existing protocols provide a single chain, our protocol allows for concurrent chains that are eventually merged using so-called keyblocks. This enables higher throughput while still securing all transaction data through a single verified chain. The rest of this section describes this design in more detail.

3.1 Foundation: Bitcoin-NG

While in most protocols the LEADER and ORDER tasks are bundled together in one process, Bitcoin-NG [31], on which BitWeave builds, breaks down the process of mining in traditional Nakamoto consensus into its constituent processes to increase throughput. The Bitcoin-NG LEADER process proceeds as follows: Miners solve a PoW puzzle and broadcast a special block called a keyblock with the solution to the rest of the network, signaling their status as the protocol leader. At that point, the winning miner performs an ORDER process by grouping transactions into microblocks and broadcasting them into the network. The entity that mined the most recent keyblock creates and broadcasts microblocks for as long as they are the leader. The flow of transactions is limited solely by the network speed and by how quickly the leading miner can sequence them.

While Bitcoin-NG improves throughput over the conventional Bitcoin protocol, it is still limited to the bandwidth of a single entity executing the ORDER process. In addition, a single high-throughput chain harms decentralization, as every participant of the protocol needs to possess the processing power and network bandwidth to process the chain in its entirety. BitWeave addresses these limitations by accommodating separate microblock chains that can each be ordered by a distinct entity.

3.2 Blockchain Structure

In general, a participant of a permissionless blockchain system may invoke certain system behaviors by sending a transaction request to the network. The network nodes that receive the request keep it in their local storage until they have verified it and included it on the chain or have discarded it for being invalid. In BitWeave, these

transaction requests contain a source and target account and may contain an amount to be transferred between the two accounts, a function invocation, or both. The issuer of the transaction, i.e., the owner of the source account, further signs the transaction so that other participants can verify the authenticity of the request.

Figure 3.2: In BitWeave, nodes only maintain the full state of shards they follow, while tracking the headers of all shards' blocks.

BitWeave supports many concurrent shards, each of which serializes transactions by bundling them into microblocks and publishing them onto their own distinct chain.

These microblock chains are periodically joined by a keyblock, which establishes an order between microblocks of different shards. BitWeave nodes leverage the ordering provided by keyblocks to establish a happens-before invariant in transaction processing – specifically, they ensure that all shards approve of a proposed transaction with high certainty before it is committed. As a result, each shard in BitWeave establishes a total order among its microblocks, while there exists only a partial ordering between microblocks across different shards.

Figure 3.2 outlines how BitWeave allows different nodes to track only the transaction data of shards they follow. While the exact layout of a microblock depends

on the implementation details of the specific protocol instantiation, it always has the following structure: The header contains meta-information and is cryptographically signed, and the payload includes transaction data. Unlike Bitcoin-NG, the same transaction might appear in the payload of multiple microblocks, representing different stages of the execution of the transaction. The payload of a microblock contains a sequence of transactions, each followed by a flag corresponding to the specific action regarding the transaction, such as COMMIT or RESERVE. In the following section, we differentiate between shard chains, which contain serialized transactions, and the main chain, which contains fraud proofs and delegation information. While the former is prone to fraud, the latter is protected by Proof-of-Work (or a similar mechanism). We formalize our definition of fraud in Section 3.6.

3.3 Consensus Abstraction

BitWeave provides a generalized mechanism that is independent of the underlying consensus protocol. This allows BitWeave to leverage advancements in consensus protocols that are unrelated to sharding. Abstractly, consensus protocols for decentralized ledgers agree on the order of an append-only log that contains transactions with some payload.

The underlying blockchain protocol must be able to support the semantics of a cryptocurrency and provide a notion of epochs to indicate the passage of time. A cryptocurrency is necessary for mechanisms that incentivize participants to behave correctly. A new epoch can be indicated by the mining of a new block in Nakamoto-based systems or the admission of a new batch of transactions in a committee-based protocol.

3.4 Roles in BitWeave

Participants in the BitWeave network can hold one or more of the following roles: epoch leaders, shard commanders, and shard followers.

3.4.1 Epoch Leaders

Epoch leaders decide which microblocks from the previous epoch are included on the main chain, delegate shard commandership, and handle recovery after detecting shard misbehavior. Epoch leaders are elected through some LEADER process, such as Proof-of-Work or Proof-of-Stake, and publish a key block to the blockchain to denote their leadership. Epoch leaders only maintain meta-information about the chain state – all regular transactions are handled by shard commanders and shard followers.

3.4.2 Shard Followers

Shard followers maintain state for a particular shard and process its microblock chain. Each shard follower thus maintains a subset of the system-wide chain state. Unlike other roles, shard followers are not appointed explicitly by the protocol. Instead, any participant in the network may opt to follow a shard’s chain and validate operations on it. When followers detect shard commander misbehavior, they report it by issuing a fraud-proof to the network.

3.4.3 Shard Commanders

A shard commander is a particular shard follower that is appointed to process transactions related to a specific shard. Shard commanders perform the ORDER process for their shard by serializing transactions into microblocks until a new key block is mined. When a key block is mined, the epoch leader names one shard commander per shard, identified by their public key.

3.5 Transaction Processing Overview

Designing a transaction protocol for a sharded blockchain reduces to two fundamental challenges: dividing work among shards and ensuring atomicity of cross-shard transactions. At a high level, BitWeave implements a replicated state machine, and a sound transaction processing mechanism is necessary so that the system always remains in a consistent state. Transactions in BitWeave are thus required to be serializable.

We first describe how a system's transaction workload is mapped to shards. Transactions in BitWeave are composed of a non-empty set of operations, where each operation represents a modification to data and applies to exactly one account. Additionally, we assume that there exists a function that derives the set of operations from a transaction. For example, a money transfer consists of a set of decrements on the source account(s) and a set of increments on the destination account(s). Because of the sharded nature of the BitWeave blockchain, these operations may take place on different shards.

Each shard in BitWeave is responsible for maintaining a subset of all accounts and for processing operations affecting those accounts. The protocol further defines a

consistent hash function [43] that provides a mapping from accounts to shards. Since transactions generally operate on more than one account, it is often the case that several shards are involved in processing a transaction.
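As an illustration, a consistent-hash ring over shards, in the spirit of [43], could look as follows; the hash function and the number of points per shard are illustrative choices, not values from the BitWeave implementation:

```cpp
// Minimal consistent-hash ring mapping account identifiers to shards.
// std::hash stands in for a proper cryptographic hash.
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <string>

class ShardRing {
    std::map<size_t, uint32_t> ring_; // hash point -> shard id
public:
    ShardRing(uint32_t num_shards, uint32_t points_per_shard = 16) {
        // Each shard owns several points on the ring to even out the load.
        for (uint32_t s = 0; s < num_shards; ++s)
            for (uint32_t p = 0; p < points_per_shard; ++p)
                ring_[std::hash<std::string>{}(std::to_string(s) + "/" +
                                               std::to_string(p))] = s;
    }
    // An account maps to the first shard point at or after its hash,
    // wrapping around at the end of the ring.
    uint32_t shard_for(const std::string& account) const {
        auto it = ring_.lower_bound(std::hash<std::string>{}(account));
        return it == ring_.end() ? ring_.begin()->second : it->second;
    }
};

int main() {
    ShardRing ring(8);
    std::cout << ring.shard_for("alice") << " " << ring.shard_for("bob") << "\n";
}
```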

The second challenge of sharding is ensuring that all transactions are executed atomically: all shards participating in a transaction's execution must unanimously decide whether or not to execute the transaction, and that execution needs to happen in lockstep across all participating shards. For single-shard transactions, this is trivial because there is a total ordering of transactions within an individual microblock chain. Therefore, single-shard transactions are simply included on their respective chains, just as in Bitcoin-NG.

3.5.1 Reservations

After clients issue a transaction request to all shards involved in the transaction, these shards decide whether or not to include the transaction in their chain. Shards check the transaction against their stable and pending states as described in Section 2.4.3 in order to include a reservation on their chain. If the check is successful, shards then update their pending state to reflect the inclusion of that reservation.

A reservation for a particular transaction represents a commitment to finalize it. After a shard has included a transaction's reservation in its chain, it must participate in the second phase of the transaction.

3.5.2 Commits and Aborts

Shards finalize transactions by including a corresponding ABORT- or COMMIT-entry on their chain. Commits denote that all affected shards have included the required reservations on their chain. Once a commit has been included in the chain, shards release the associated reservations and modify the chain state accordingly.

If a transaction does not acquire all required reservations in time, shards issue ABORT operations for the transaction. Once the chain includes the abort, shards release all associated reservations without modifying the chain state.

3.5.3 Efficient Cross-Shard Communication

BitWeave enables lightweight message passing between shards by only forwarding relevant reservations from one shard to another. This allows nodes in the system to verify a partial view of the total state: Nodes solely follow the full chains of shards they are interested in and rely on authenticated messages from other shards to reason about cross-shard transactions.

The protocol supports two different kinds of microblocks to facilitate cross-shard transactions: message blocks and transaction blocks. A message block concisely summarizes the results from the previous epoch for other shards. Transaction blocks admit new transactions to the shard's chain or commit currently pending transactions.

Shard commanders must publish a message block at the beginning of each epoch, summarizing the last epoch. The message block contains a hash of the new state and multiple payloads, each containing a set of messages for a specific shard. A message consists of the transaction's identifier and a Merkle proof certifying that a particular

reservation was acquired. Figure 3.3 outlines this structure.

Figure 3.3: Sketch of how message blocks capture changes to the shard state: Shard one locks Tx1, which covers shards one and two, and Tx2, which covers shards one and three. Thus, shard one sends a proof of reservation for Tx1 to shard two and for Tx2 to shard three, respectively.
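For concreteness, the structure described above can be sketched as follows; the field names are hypothetical and the layout is simplified relative to any wire format, assuming 32-byte digests:

```cpp
// Illustrative layout of a message block, simplified from the prose above.
#include <array>
#include <cstdint>
#include <vector>

using Hash = std::array<uint8_t, 32>;

struct Message {
    Hash tx_id;                    // identifier of the cross-shard transaction
    std::vector<Hash> merkle_path; // proof that the reservation is on-chain
};

struct MessageBlock {
    Hash state_root; // hash of the shard state after the last epoch
    struct Payload {
        uint32_t target_shard;         // shard these messages are destined for
        std::vector<Message> messages; // one entry per relevant reservation
    };
    std::vector<Payload> payloads;
};

int main() { MessageBlock mb{}; (void)mb; }
```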

Nodes in BitWeave connect to multiple relay networks to efficiently implement this scheme. Namely, there is one main network that propagates block headers and key blocks, and multiple shard-specific networks that propagate transaction data.

Nodes then advertise their subscriptions to specific shards upon connecting to other peers and ensure they are connected to a sufficiently large number of peers for each network.

3.5.4 Transaction Fees and Miner Rewards

A core component of Bitcoin is its built-in incentive mechanisms for network participants, namely rewards for mining new blocks and transaction fees for validating transactions. While the reward scheme for newly mined key blocks can be directly derived from Bitcoin's mechanism, BitWeave transaction fees require a more complex scheme

due to the existence of shards and microblocks. BitWeave treats transaction fees for each shard independently to reduce the amount of communication required during transaction execution. Transaction requests thus specify fees on a per-shard basis, which also allows accommodating congested shards.

BitWeave splits fees three ways for each operation on a shard's chain to maximize throughput. The intuition behind this is an extension of the proof provided by Eyal et al. [31]. Their work showed that fees should be split 40/60 between two consecutive epoch leaders to incentivize the current leader to include as many blocks as possible from the previous epoch. BitWeave extends this scheme by allowing the current epoch leader and the current shard commander to split the first 40% of the fee; the exact ratio is determined between the specific shard commander and the epoch leader during shard delegation.
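The following sketch works through this split for a hypothetical fee of 1,000 units, assuming, purely for illustration, that the commander negotiated half of the first 40%:

```cpp
// Worked example of the three-way fee split described above. The 50/50
// division of the first 40% is a hypothetical delegation-time agreement.
#include <cstdint>
#include <iostream>

struct FeeSplit { int64_t commander, current_leader, next_leader; };

FeeSplit split_fee(int64_t fee, double commander_share_of_40 = 0.5) {
    int64_t first = static_cast<int64_t>(fee * 0.40); // split with commander
    int64_t cmd   = static_cast<int64_t>(first * commander_share_of_40);
    return { cmd, first - cmd, fee - first }; // remaining 60% to the next leader
}

int main() {
    FeeSplit s = split_fee(1000);
    std::cout << s.commander << " " << s.current_leader << " "
              << s.next_leader << "\n"; // 200 200 600
}
```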

The main chain keeps track of a small set of accounts for actors involved in leader election and transaction processing. For nodes to be able to become miners or shard commanders, they need to create a globally viewable account. For miners, this account is solely used for rewards, while shard commanders need to put up a deposit to start their tenure as commander. Nodes either collect funds by mining blocks or by transferring funds from a shard's account to create such a globally visible balance. Similarly, shard followers need to maintain funds on the main chain to be able to issue fraud-proofs (Section 3.6.2) and availability wagers (Section 3.6.4).

Miners do not parse the entire content of shard blocks to extract fees but merely the block header, which contains the accumulated fees. Thus, miners rely on shard followers to validate the correctness of the fees specified in the shard block headers. To make this scheme secure, funds created from fees are locked until the associated transaction is validated.

Figure 3.4: Lifetime of a cross-shard transaction: The transaction is not considered finalized until all reservations and commits have been included and validated on all shard chains.

3.6 Fault-Tolerant Transaction Processing

The BitWeave protocol assigns an overall time bound to each transaction, the confirmation interval, to address Byzantine failures, such as malicious nodes or network partitions. This confirmation interval allows ample time to verify a transaction's correctness before it is applied to its corresponding shards' state. A transaction is thus atomic: it does not modify state until it has passed through the confirmation interval on all shards, and all locked resources are released if the transaction is not confirmed in time or aborts.

Figure 3.4 sketches how confirmation intervals are broken down for a cross-shard transaction. As the protocol is an extension of two-phase commit, we separate the confirmation interval into a reservation phase, in which the required resources for the transaction are locked via reservations, and a finalize phase, in which the transaction is confirmed or aborted. Each phase proceeds in two periods to account for Byzantine behavior: the ISSUE period, in which shard commanders issue acknowledgments on-chain to signify that shard-specific operations were performed, and the CHALLENGE period, in which Byzantine behavior from the ISSUE period is amended through a fraud-proof and rollback mechanism (Section 3.6.2) carried out by shard followers. To ensure that shards are coordinated on which period of the confirmation interval is occurring, each period is measured in a certain number of epochs from the key-block height at which the transaction was created by the client, the length of which is mandated by the protocol.

3.6.1 Detecting Fraud

BitWeave depends on shard followers to aid in validating transactions and detecting fraud. Reservations and commits cannot be immediately assumed to be valid, as malicious behavior during the ISSUE period of each phase is resolved during its corresponding CHALLENGE period through fraud-proofs submitted by shard followers. Followers are motivated to participate in this behavior through built-in financial incentives. Correctness can be maintained by the presence of a few honest shard followers and does not require a majority of honest shard participants.

The length of the CHALLENGE period must be set to a sufficiently large number of epochs, so that an issued reservation, commit, or abort is guaranteed to be correct after its CHALLENGE period has passed. Instantiations of the BitWeave protocol thus set the length of this period such that it is virtually impossible for a sequence of malicious epoch leaders to be active during its entire duration and to tolerate network propagation latencies. To determine the length of the CHALLENGE period, the protocol sets time bounds for the propagation delay t_p and the validation delay t_v, which is the period it takes for at least one honest epoch leader to be available. The protocol then sets the length of the CHALLENGE period as a function of these parameters: t_challenge = t_v + 2·t_p. Additionally, the ISSUE period should be at least as long as t_p, to avoid unnecessary aborts.

A concrete implementation, based on Bitcoin-NG, sets those values to t_p = 3 and t_v = 10. For a validation delay of ten epochs, the probability that at least one honest epoch leader exists is very high. Assuming, like previous work, that at most 25% of the mining power (or stake) is controlled by malicious parties, the following holds.

Pr[“# honest leaders > 0”] = 1 − Pr[“# honest leaders = 0”] = 1 − 0.25^k ≈ 1 − 9.54 · 10^−7 ≈ 1, where k = t_v = 10.

The residual failure probability is lower than the probability of the chain's last k blocks being rewritten by a malicious actor, based on the proof in the original Bitcoin paper. Bitcoin, like BitWeave, assumes a tight network propagation bound to ensure timely convergence of the chain. This is a realistic assumption, as long-lived global network partitions are highly unlikely.

3.6.2 Fraud-Proofs

At a high level, fraud-proofs are messages containing references to discrepancies in a shard's chain. The protocol differentiates between three different kinds of fraud-proofs, each relying on a set of Merkle proofs to concisely demonstrate their correctness. First, a fraud-proof is raised if a shard block is inconsistent with its header. This can be the case if any of the Merkle roots are not consistent with the state stored in the block's body. Second, if a transaction block contains a reservation, commit, or abort that is not consistent with the shard's state, a fraud-proof is raised. This can be demonstrated by making the fraud-proof contain a snapshot of the involved object(s) and reservation(s). Third, a fraud-proof is raised if a message block contains an invalid message or misses a message. The former can be shown by demonstrating that the Merkle proof associated with the message is incorrect, while the latter is shown by pointing to a commit of the previous epoch(s) that does not have a corresponding message.
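These three kinds can be summarized in a small sketch; the payload types below are simplified placeholders of our own choosing, not the protocol's actual encoding:

```cpp
// Sketch of the three fraud-proof kinds described above, modeled as a
// tagged union.
#include <array>
#include <cstdint>
#include <variant>
#include <vector>

using Hash = std::array<uint8_t, 32>;
using MerkleProof = std::vector<Hash>;

// (1) A block body that is inconsistent with the Merkle roots in its header.
struct InconsistentHeader { Hash block_id; MerkleProof body_proof; };

// (2) A reservation/commit/abort that conflicts with the shard state,
//     demonstrated via snapshots of the involved objects and reservations.
struct InconsistentOperation { Hash tx_id; MerkleProof object_snapshot; };

// (3) A message block with an invalid or missing message; the missing case
//     points at a commit of a previous epoch without a corresponding message.
struct BadMessage { Hash message_block_id; Hash offending_commit; };

using FraudProof =
    std::variant<InconsistentHeader, InconsistentOperation, BadMessage>;

int main() { FraudProof fp = BadMessage{}; (void)fp; }
```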

Fraud-proofs are submitted to the epoch leader and enable it to correct a shard's state by reverting the history of the shard to the state right before the conflicting block, thus nullifying the fraudulent transaction. Having epoch leaders process fraud-proofs ensures that the malicious behavior is rectified on-chain and that every party will see the correction, preventing nodes from operating on an invalid shard state.

Fraud-proofs, once included in the main chain, override shard state. Depending on the kind of fraud, either a specific block is invalidated, a certain operation is undone, or the contents of a message are modified. The previously described CHALLENGE periods ensure that rollbacks do not affect transactions that are considered committed and valid.

3.6.3 Incentivizing Fraud-Finding Behavior

BitWeave's incentive mechanism is designed to reward fraud finders and punish misbehaving shard commanders. Nodes pay a deposit to become a shard commander, which is withheld in case they misbehave. If a shard follower detects misbehavior and successfully submits a fraud-proof, the misbehaving commander's deposit is forfeited and collected by that particular follower.

Further, BitWeave employs two mechanisms to ensure that rewards are not stolen from honest validators. First, the system relies on cryptographic commitments so that followers do not have to send fraud-proofs in plaintext. Followers issue commitments in the form of a cryptographic hash of the fraud-proof, not the proof itself. Once the main chain includes a commitment, followers reveal the fraud-proof's content to collect the owed reward, unless an earlier fraud-proof has already collected the reward.
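A minimal commit-reveal flow might look as follows; std::hash stands in for a cryptographic hash such as SHA-256 purely to keep the sketch self-contained:

```cpp
// Minimal commit-reveal flow for fraud-proofs. A real implementation would
// hash the serialized proof with a cryptographic function.
#include <functional>
#include <iostream>
#include <string>

// Phase 1: the follower publishes only a commitment to the proof.
size_t commit(const std::string& serialized_proof) {
    return std::hash<std::string>{}(serialized_proof);
}

// Phase 2: once the commitment is on the main chain, the follower reveals
// the proof; anyone can check it against the earlier commitment.
bool verify_reveal(size_t commitment, const std::string& revealed_proof) {
    return commit(revealed_proof) == commitment;
}

int main() {
    std::string proof = "fraud-proof bytes";
    size_t c = commit(proof);             // goes on-chain first
    std::cout << verify_reveal(c, proof); // 1: reward can be claimed
}
```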

Like regular transactions, fraud-proofs include a fee to prevent denial-of-service attacks. Fees for fraud-proofs render it infeasible for an adversary to issue many invalid fraud-proofs. Because fraud-proofs rely on cryptographic commitments, the epoch leader cannot at first distinguish between valid and invalid fraud-proofs; however, it can detect whether the issuing party has enough funds to pay for the fraud-proof transaction. Epoch leaders include fraud-proofs independent of their validity and collect the associated transaction fee, which limits the number of invalid fraud-proofs that can be issued.

3.6.4 Ensuring Shard Availability

The BitWeave protocol relies on probabilistic sampling and availability wagers to ensure liveness, similar to how fraud-proofs guarantee safety.

Honest miners rely on sampling to ensure they only include shard blocks on the main chain for which the payload is available. First, shard commanders encode their blocks using Reed-Solomon error-correcting codes, so that parts of a shard block's payload can be sampled efficiently. Then, instead of parsing the entirety of every block, miners sample a small fraction of it from the network and reject all blocks where that sample is unavailable. Al-Bassam et al. [4] demonstrated that in this setting querying as little as 1% of a block's content is sufficient to tell with very high probability whether it is available or not; we refer to their work for a more detailed description of this sampling scheme.
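To give a feel for the numbers, the following back-of-the-envelope sketch assumes, as is typical for a rate-1/2 erasure code, that an adversary must withhold at least half of a block's chunks to make it unrecoverable; it is an illustration of the principle, not the scheme of [4]:

```cpp
// Each uniformly sampled chunk hits a withheld chunk with probability >= 1/2,
// so s samples miss the withholding with probability at most (1/2)^s.
#include <cmath>
#include <iostream>

// Smallest number of samples such that an unavailable block evades
// detection with probability below `target`.
int samples_needed(double target) {
    return static_cast<int>(std::ceil(std::log2(1.0 / target)));
}

int main() {
    // ~20 samples already push the miss probability below one in a million,
    // a tiny fraction of a block with tens of thousands of chunks.
    std::cout << samples_needed(1e-6) << "\n"; // 20
}
```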

If a malicious miner includes an unavailable block, followers issue an availability wager to the main chain, requesting a specific block's payload to be published. A shard follower that issues an availability wager must attach funds that represent their confidence that a commander is misbehaving, and multiple wagers from different shard followers can be attached to the same block. If the commander does not reveal the block after a set time, the challenging follower is rewarded the wager amount, half taken from the commander's deposit and the rest taken from the miner that included the unavailable block.

The protocol requires followers to pool funds to ensure no unnecessary wagers are processed and to provide a shard availability heuristic for followers. Upon detecting a potentially unavailable block, a shard follower issues a wager with some funds attached. Wagers must reach a certain threshold of funds before epoch leaders may include them in the chain. When the wager propagates through the network, other followers can either opt to attach more funds to the wager or reveal the block payload to the network if they have access to it. Eventually, the wager will either collect enough funds to be included in a key block or expire because the block was made available.

If the commander reveals the block in time, the follower's wager is instead burned and the protocol proceeds without change, which ensures that shard commanders do not profit from withholding blocks. Thus, a rational shard commander will always make their block payloads available as soon as possible to ensure their block headers are included in the main chain.

3.6.5 Adaptive Confirmation Intervals

The guarantee that there exists at least one honest epoch leader during each CHALLENGE period is not sufficient for all cases. In particular, a single key block might not be able to hold all fraud-proofs and availability wagers currently in the network. This problem is exacerbated by the fact that BitWeave makes no assumptions about the correctness of the shard commander. In the worst case, a single malicious actor could be in control of many shards, which would result in the main chain not including fraud-proofs before a transaction's challenge or validation period expires.

The BitWeave protocol is designed to extend a transaction's confirmation interval in the case of fraud, using global challenge extensions, to allow for enough time to reconcile the chain state. Key blocks contain a flag that indicates whether all fraud-proofs have been processed. If this flag indicates unprocessed fraud-proofs, a global challenge extension is issued, which prolongs the CHALLENGE periods of all pending transactions by the validation delay t_v. The latter ensures another honest leader is available in time to process the remaining fraud-proofs. Availability wagers similarly extend the CHALLENGE periods of the affected shard to allow time for the newly revealed block to propagate or for the chain to roll back. This means that an excess of fraud-proofs will stall the overall throughput of the chain, but safety of the protocol is always maintained.

Similarly, BitWeave accommodates shards with excessive workloads through FINALIZE extensions, which lengthen the finalize-ISSUE period for currently pending transactions of that shard. In other words, followers of a shard do not count an epoch towards the finalize-ISSUE period of a transaction if there are still transactions to be committed on the shard or if a shard's epoch is empty, i.e., no message block is published for that epoch. This means the confirmation interval of all pending transactions is extended to ensure there is ample time to validate all commit and abort messages.

3.7 Correctness

We provide a proof sketch demonstrating that the BitWeave protocol upholds safety and liveness, provided the assumptions of Section 1.3.1 are not violated and the underlying consensus protocol behaves as specified in Section 3.3. In particular, we assume that there is a known bound on the network latency t_p and that the consensus protocol has at least one honest leader within the validation delay t_v. The CHALLENGE period is then set to at least 2·t_p + t_v. Additionally, we assume each shard has at least one honest follower and the underlying consensus protocol does not violate safety or liveness.

Our proofs for safety and liveness rely on the following two lemmas:

Lemma 1: The number of challenges to a particular transaction, in the form of potential availability wagers, fraud-proofs, global challenge extensions, and shard finalize extensions, is guaranteed to be finite.

Proof. We first show that only a finite number of availability wagers can affect a particular transaction. Availability wagers can only affect a transaction during its ISSUE period, which is a fixed number of epochs. Since epochs are finite, the number of blocks contained in the ISSUE period is finite. A single transaction has at most two issue phases, at most one message per shard, and will be involved in a finite number of shards. Thus, the number of blocks involved in a shard's ISSUE period is finite, which means that the number of potential availability wagers involving this transaction is guaranteed to be finite.

Recall that global challenge extensions are issued when the main chain receives too many fraud-proofs, and that shard finalize extensions are issued when the number of commits (or aborts) to be processed exceeds the shard's capabilities. Global challenge extensions and shard finalize extensions merely extend the current epoch and do not allow any new transactions. Since shards (and the main chain) are both assumed to eventually have an honest commander (or leader) who will continue processing, the number of global challenge extensions and shard finalize extensions is finite. It follows that only a finite number of fraud-proofs can be issued, as all must be issued during the global challenge and shard finalize periods.

Lemma 2: Any given shard will eventually accept new transactions.

Proof. BitWeave incentivizes shards to accept new transactions through the use of fees. Shards are financially incentivized to include transactions in their chain in order to collect their fees as revenue. As there is no base block reward for microblocks, shard commanders that fail to include any new transactions will fail to make any profit from their work. Therefore, any shard whose shard commander is rational and honest will accept new transactions.

A rational epoch leader is incentivized to replace a slow or non-responsive shard commander. If a shard does not process transactions, the leader that appointed the shard commander loses revenue. As a result, any rational epoch leader will be incentivized to pick responsive shard commanders and replace unresponsive ones. A rational epoch leader will eventually be chosen due to the assumptions made on the underlying consensus protocol.

Combining the two facts above, we see that eventually a rational epoch leader will be chosen, and such a leader will ensure that shard commanders are rational and honest. Consequently, any given shard will eventually accept new transactions.

3.7.1 Safety

BitWeave guarantees that (1) particular shards do not violate consistency and that (2) cross-shard transactions are atomic and consistent. The concrete consistency constraint is defined by the particular application. For example, for cryptocurrencies, BitWeave guarantees that account balances are always non-negative.

While a particular shard commander might misbehave, such misbehavior will be overwritten by fraud-proofs on the main chain before the CHALLENGE period has passed. As described before, a shard's stable state is guaranteed to be safe, while the pending state might exhibit inconsistencies. That means that only after the CHALLENGE period has expired can clients expect the outcome of a transaction to be consistent and immutable. Transactions that are not yet fully executed or validated are considered internal state of the protocol and should not be propagated to the end-user.

Proof. To prove safety, we first show that all cross-shard communication happens within a particular time interval that is pre-set by the protocol. Then, we show that any particular shard, given the right messages and waiting for the appropriate periods, will not violate consistency and will decide on whether the transaction should commit.

A transaction starts by reserving the objects that it will modify. A valid reservation must be included before the beginning of the associated reserve-CHALLENGE period, and a CHALLENGE period is at least 2·t_p + t_v epochs long. Shards must include a message block at the beginning of every epoch. If they do not, the particular epoch does not count towards a CHALLENGE period. It follows from Lemma 2 that the number of epochs without message blocks is finite. Note that, if the next message block after a reservation does not include a message for that reservation, the associated fraud-proof serves as a message instead. We thus see that for every reservation, a message is generated at least 2·t_p + t_v epochs before the end of the associated CHALLENGE period.

After the reservations are made, the transaction is committed or aborted in the corresponding commit-ISSUE phase. We now show that messages sent between shards are guaranteed to arrive before the corresponding commit-ISSUE phase begins. Shard followers and commanders parse the main chain for message blocks. Message block headers contain the hash and shard identifier for each of its payloads. As a result, shard followers and leaders of other shards can always identify the existence of a message relevant to them. They can then request the message or resort to an availability wager if it is not available. As with transaction blocks, the availability wager extends the challenge period, ensuring that the block's payload will arrive in time.

Thus far, we have shown that all reservations will be sent and received by all shards before the corresponding commit-ISSUE phase. This proves that all cross-shard transactions are atomic. We now must show that single-shard and cross-shard transactions do not violate consistency.

In particular, we show that for every operation on a shard chain, there exists a finite value k such that once the entry is buried k key blocks deep, it is guaranteed to be consistent. Recall that every operation, e.g., the commit of a transaction, is recorded in a shard's microblock. A microblock is only considered part of the chain after it, or one of its successors, has been referenced by a key block. Shard followers track the main chain for block headers. Additionally, they parse all availability wagers and fraud-proofs and adjust their state if they affect shards they follow.

For every operation that is part of some transaction T, k is at least 2·t_p + t_v, the minimum length of a regular CHALLENGE period. If there is a header for a shard block and a shard follower has not received its payload after t_p, they issue an availability wager.

The availability wager will then be propagated to the network in at most t_p. If a valid availability wager extends T's confirmation interval by a = 3·t_p + t_v, the length of the CHALLENGE period plus an additional round-trip time to allow the shard to propagate the block's payload, we also increase k by a. It follows from Lemma 1 that the number of possible availability-wager extensions of this form is finite.

Once a shard follower has received a microblock's payload, they verify it. When they receive the payload, there is at least t_p of the challenge period remaining. Thus, if the follower finds inconsistencies, sufficient time remains to generate a fraud-proof, to send it to the main chain, and for the main chain to propagate it to all participants of the protocol. A fraud-proof invalidates the affected entries. Invalidated entries are always consistent with the shard's state, as they do not modify it.

This completes the proof for safety.

3.7.2 Liveness

BitWeave guarantees that it will eventually decide to either commit or reject a transaction (progress) and that it will not reject all transactions (non-triviality).

Proof. We see that transactions are finalized, and either committed or rejected, within a finite amount of time by applying Lemmas 1 and 2. As there is only a finite number of possible extensions to a transaction and shards do not generate empty epochs indefinitely, all parts of a transaction will eventually be processed by each involved shard.

We apply Lemma 2 to prove non-triviality. Shard commanders that do not generate empty epochs either include commit and abort messages for currently pending transactions or reservations for new transactions. Assuming the number of pending transactions is finite, shard commanders will eventually accept new transactions.

3.8 Case Studies

BitWeave applies to different consensus protocols and data models; we describe the latter in this section. Consensus protocols, both permissioned [12] and permissionless [73], that provide the abstraction of leader election can implement BitWeave's main chain. Because adapting BitWeave to different consensus protocols is trivial, this section merely discusses the data models of two popular systems based on Nakamoto consensus: Bitcoin and Ethereum.

3.8.1 Applying BitWeave to Ethereum

Ethereum introduced the notion of smart contracts, which allow arbitrary programs to be executed as part of a blockchain protocol. Smart contracts can be viewed as a form of stored procedures, where each invocation results in the execution of a transaction. Usually, smart contracts are compiled from a high-level language to some form of assembly that can execute in a lightweight environment, e.g., the Ethereum Virtual Machine (EVM).

BitWeave already supports Ethereum's account model and can be extended to support smart contract execution. Clients prepare such transactions by executing smart contracts on their local state. From this, they derive a set of reads and writes that are mapped to BitWeave operations. While pessimistic locking might be more efficient in a geo-distributed setting, optimistic concurrency control (OCC) keeps the complexity of the protocol low and still performs well in the absence of high contention around certain data objects. Further, note that while end-to-end latency in BitWeave is high, the latency between client execution and reservation at the shard can be kept fairly low.

We sketch how this scheme works using ERC-20 [34] token transfers as an example. ERC-20 is a standard that allows developers to implement their own transferable tokens on the Ethereum blockchain. In essence, these tokens can be transferred between users and exchanged against Ethereum's currency, ether. ERC-20 tokens are implemented by a single smart contract that tracks a mapping from tokens to users. To buy tokens from another party, one needs to transfer funds to the account of that party. The remote party then invokes the ERC-20 smart contract to transfer the requested tokens. This three-way transfer must execute atomically to ensure nobody loses their ether or tokens without compensation during the process.

A token transfer between two parties, Alice and Bob, is implemented using a transaction that is applied to the two parties' accounts and the token's smart contract. The transfer of ether is implemented as before: one reservation locks funds on Alice's account and another reservation ensures that Bob's account exists. The transaction is then extended with a third reservation on the smart contract that locks Bob's tokens. Once all three reservations are applied, the transaction can commit, and the respective account and smart contract states are updated.
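The following sketch lists the three reservations behind such a purchase, assuming Alice buys tokens from Bob; the identifiers and the string-based predicate encoding are illustrative only:

```cpp
// The three reservations behind the token purchase above, written as
// (precondition; effect) pairs in the style of Section 2.4.
#include <iostream>
#include <string>
#include <vector>

struct Op { std::string target, precondition, effect; };

int main() {
    long ether = 10;
    std::vector<Op> transfer = {
        // lock ether on Alice's account
        {"account:alice", "balance >= " + std::to_string(ether),
         "balance -= " + std::to_string(ether)},
        // ensure Bob's account exists and can receive the funds
        {"account:bob", "exists", "balance += " + std::to_string(ether)},
        // lock Bob's tokens inside the ERC-20 contract state
        {"contract:erc20", "tokens[bob] >= amount",
         "tokens[bob] -= amount; tokens[alice] += amount"},
    };
    // The transaction commits only once all three reservations are on-chain.
    for (const auto& op : transfer)
        std::cout << op.target << ": {" << op.precondition << "; "
                  << op.effect << "}\n";
}
```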

Number of Transactions (Sep 2018 – Mar 2019): 49,413,279
Mean Number of Inputs/Tx: 2.248
Median Number of Inputs/Tx: 1.0
Standard Deviation: 11.083

Figure 3.5: Statistics of UTXO inputs to Bitcoin transactions from September 1, 2018 to March 1, 2019. Most Bitcoin transactions have only one input.

3.8.2 Applying BitWeave to Bitcoin

Thus far, we have described BitWeave under the assumption of an account-based transaction model. Now we show that it is easy to extend BitWeave to accommodate a transaction model based on UTXOs, the same transaction model used in Bitcoin.

While in the account model data is sharded by account id, in the UTXO model we can shard directly by transaction outputs. A new transaction t with input UTXOs u_1, u_2, ..., u_n is assigned to the shards responsible for the transactions that produced its inputs. To keep the number of shards involved in a transaction low, we then pick the identifiers of the transaction outputs such that they map to shards that contain one or more of the transaction's inputs.
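A minimal sketch of this assignment rule, with a placeholder hash function standing in for the protocol's consistent hash, might look as follows:

```cpp
// A transaction is processed by the shards that produced its inputs, and
// its output identifiers are chosen so they hash back into one of those
// shards. shard_of() is a simplistic stand-in for the consistent hash.
#include <cstdint>
#include <iostream>
#include <set>
#include <vector>

constexpr uint32_t kNumShards = 16;
uint32_t shard_of(uint64_t utxo_id) { return utxo_id % kNumShards; } // placeholder

// Shards involved in spending the given inputs.
std::set<uint32_t> input_shards(const std::vector<uint64_t>& inputs) {
    std::set<uint32_t> shards;
    for (uint64_t u : inputs) shards.insert(shard_of(u));
    return shards;
}

// Pick an output identifier that lands on a shard already involved,
// so the output does not drag in an additional shard.
uint64_t pick_output_id(uint64_t candidate, const std::set<uint32_t>& involved) {
    while (!involved.count(shard_of(candidate))) ++candidate;
    return candidate;
}

int main() {
    std::vector<uint64_t> inputs = {17, 33}; // both map to shard 1
    auto involved = input_shards(inputs);
    std::cout << pick_output_id(100, involved) << "\n"; // 113 -> shard 1
}
```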

We surveyed the BTC transaction history for a six-month snapshot, from September 1, 2018, to March 1, 2019, which includes a total of 49,413,279 transactions. Figure 3.5 shows this data in more detail, from which we can infer that the majority of recent BTC transactions (almost 80%) take only one input, and the mean number of inputs to a transaction in the surveyed period is 2.248. Therefore, given this empirical data and the sharding approach proposed above, the number of shards involved in an average BitWeave-BTC transaction is expected to be less than 3.248 (assuming an average transaction's inputs are all from different shards and are distinct from the transaction's output shard).

3.9 Implementation Details

BitWeave is implemented in about 10k lines of C++ code. The implementation is heavily optimized for concurrency, so that multiple cores can be leveraged to process multiple shards or verify multiple signatures at the same time. BitWeave's implementation can be broken down into two distinct parts: the global ledger logic and the shard logic.

The global ledger logic keeps track of the currently winning chain and notifies shards about the start of a new epoch. When a new epoch starts, some shard blocks might need to be undone in case they are “chopped off” by the new key block. If the currently winning chain changes, several epochs might need to be rolled back and re-executed, depending on how far back the change reaches. The shard logic keeps track of all account data, currently pending transactions, and all held reservations.

Nodes can then decide to start mining on the global chain, to follow a particular set of shards, and to request to be commander for certain shards. Each of these three tasks is maintained as a separate state machine to increase modularity. This allows, for example, running a non-mining node that sequences shards or a mining node that does not follow any shard.

3.9.1 Reducing Transaction Footprint

In a naive implementation, each transaction block would contain a sequence of complete transaction requests. This would generate a lot of redundant data, as cross-shard transactions may appear up to 2k times on the blockchain, where k is the number of shards involved. Additionally, because transaction requests are issued by sending them to the blockchain network, nodes have most likely already received a transaction request before it is included in a shard's chain.

The BitWeave prototype thus splits the transfer of shard information from the transfer of transaction requests. In this particular instantiation, transaction blocks merely contain the identifiers of the transactions and a hash of each request's content. The latter is needed because a malicious client might issue different transactions with the same identifier. Clients then retrieve the concrete transaction request if needed.
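A compact block entry under this scheme could be sketched as follows, assuming 32-byte digests; the field names are hypothetical:

```cpp
// Compact transaction-block entry: instead of the full request, only the
// transaction identifier and a hash of the request's content are stored,
// so equivocating clients can be detected.
#include <array>
#include <cstdint>

using Hash = std::array<uint8_t, 32>;

enum class Action : uint8_t { RESERVE, COMMIT, ABORT };

struct BlockEntry {
    uint64_t tx_id;    // identifier chosen by the client
    Hash content_hash; // binds the id to one specific request body
    Action action;     // stage of the transaction this entry represents
};

int main() { BlockEntry e{}; (void)e; }
```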

3.9.2 Block Size and Epoch Length

Choosing a transaction block size for BitWeave is challenging. BitWeave splits block body and header to reduce overall communication but requires all nodes to verify all block headers, so a small block size would impose higher verification overheads on nodes. Verifying just the block header requires verifying a cryptographic signature, which is expensive. Increasing the size of transaction blocks imposes lower verification overheads but increases the propagation delay of blocks, which in turn reduces the chance of blocks being included by the next epoch leader. Some cryptocurrencies support transaction block sizes from one Megabyte (Bitcoin) to 32 Megabytes (Bitcoin Cash), while others opted to allow for variable block size limits depending on the workload (Monero).

BitWeave allows for blocks up to one Megabyte, which has already proven to be an efficient block size in Bitcoin-NG [31]. Assuming a 20-byte account identifier and SHA-256 hashes, this allows for up to 20,000 operations per transaction block. Our shard commander implementation starts issuing transaction blocks once at least 100 operations are queued up or at least one second has passed since the last transaction block. We found this to be the best compromise between keeping the number of microblocks lost to forks low and keeping the total number of blocks low during periods of high workload.

Likewise, shortening the epoch length reduces confirmation times but also threatens the stability of the system due to increased network disagreement. Bitcoin and Bitcoin-NG have an epoch length of ten minutes, while Ethereum epochs are only 10 seconds long. BitWeave's prototype enforces an epoch length of one minute to keep transaction confirmation times low while also avoiding unnecessary forks.

3.9.3 Congestion Control

Shard commanders might generate blocks for outdated epochs because they have not yet processed the most recent key blocks. In such a case, they would generate a potentially large number of orphaned microblocks. We observed this effect when evaluating the initial implementation of the shard commander logic.

The BitWeave implementation addresses this by introducing a simple congestion control scheme. At the beginning of an epoch, if a shard commander notices that a large fraction of its blocks from the previous epoch have been discarded, it halts block generation for a few seconds to make sure it catches up with the main chain. If such instances repeat, it exponentially increases the timeout to accommodate the congestion. In our implementation, we observed the best results for an initial timeout of 10 seconds and a growth rate of 1.5. This scheme is not enforced by the protocol but helps to increase the shard commander's revenue. Thus, shard commanders are incentivized to implement some variant of it.
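A minimal sketch of this heuristic, using the parameters above and an assumed threshold of 50% discarded blocks as the trigger (the text only says “a large fraction”), might look as follows:

```cpp
// Congestion-control backoff: 10 s initial timeout, growth rate 1.5,
// triggered when an assumed 50% of the previous epoch's blocks were lost.
#include <iostream>

class CongestionControl {
    double timeout_s_ = 0.0;
public:
    // Called at the start of each epoch with the fraction of the previous
    // epoch's microblocks that were discarded by the new key block.
    double on_epoch_start(double discarded_fraction) {
        if (discarded_fraction > 0.5)                 // commander fell behind
            timeout_s_ = timeout_s_ == 0.0 ? 10.0 : timeout_s_ * 1.5;
        else
            timeout_s_ = 0.0;                         // caught up again
        return timeout_s_; // seconds to pause block generation
    }
};

int main() {
    CongestionControl cc;
    std::cout << cc.on_epoch_start(0.8) << "\n"; // 10
    std::cout << cc.on_epoch_start(0.9) << "\n"; // 15
    std::cout << cc.on_epoch_start(0.1) << "\n"; // 0
}
```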

3.10 Experimental Evaluation

Figure 3.6: Scalability of BitWeave with an increasing number of shard commanders (throughput in thousands of transactions per second versus the number of shard leaders).

We simulate a geo-distributed blockchain network for all experiments to obtain a realistic assessment of the BitWeave protocol. All nodes are connected to a relay network, similar to how existing blockchain networks are connected through relays such as FIBRE [18]. The network simulates four different geographic regions with up to 100 ms latency between them. We run 200 miners with equal mining power to ensure that a realistic number of forks is generated. Note that current blockchain systems usually have about 10 to 20 mining entities with enough mining power to compete in the consensus protocol [37].

Figure 3.7: Total rate of operations (prepare, commit, abort) with an increasing number of shard commanders (throughput in operations per second and transaction complexity in operations per transaction versus the number of shard leaders).

Figure 3.8: Overhead generated by sharding: we increase the number of logical shards while keeping the number of shard commanders constant (throughput in transactions per second versus the number of logical shards).

Each experimental run executes for about one hour to accommodate BitWeave's long-lived transactions. We then evaluate a period of 30 epochs in the middle of the run, so that the measurement contains a mixture of reservations, commits, and aborts.

We configure a test setup that is bottlenecked on the processing power of the shard commanders to assess the scalability of BitWeave. For each shard commander, we allocate a distinct x1e.xlarge instance on Amazon EC2, which has 4 virtual CPU cores and 122 Gigabytes of RAM. The large memory requirement is not a limitation of the protocol, but a shortcoming of the prototype implementation, which keeps all pending transactions in memory. A paging mechanism would reduce the memory footprint of the implementation significantly.

In this experimental setup, we increase the number of logical shards and the rate at which new transactions arrive in the network along with the number of physical shard commanders. In particular, we double the number of shard commanders at every configuration. Additionally, we set the number of logical shards in the experiment equal to the number of shard commanders to get a fair comparison, as each logical shard introduces more work for processing and executing transactions.

An excess of transaction requests creates additional verification work for transactions that will not be processed in time, while a low number of transaction requests results in the throughput of the system not being fully utilized. We thus set the rate of incoming transactions such that the CPUs on each shard are fully utilized. Clients issue a steady sequence of transactions sending money from the client's account to some other uniformly sampled account. We increase the workload with the number of shards by increasing the number of client machines issuing requests. This ensures that clients are not the bottleneck.

3.10.1 How well does BitWeave's overall throughput scale?

Figure 3.6 shows the throughput of the blockchain network as a function of the number of shard commanders. We evaluate the network with up to 8 shards, where the network can process over 9,000 transactions per second. In particular, throughput scales linearly at a constant rate of about 1.6. Note that executing BitWeave with a single shard is equivalent to the behavior of the Bitcoin-NG protocol.

From these results, we can conclude that, assuming the existence of an efficient relay network, BitWeave's throughput is limited solely by the processing power of the shard commanders. More concretely, our experiments revealed that the major bottleneck occurs when transactions are issued and have to be verified by the nodes they execute on. While there is a theoretical upper bound on the possible scalability, reached when verifying block headers generates a significant overhead, we expect this limit to be far from reach for current configurations. Further, the computing power of the machines running the shard commanders is rather modest, and more efficient machines would probably yield even better overall throughput. Finally, the network bandwidth utilized by any blockchain node in any configuration was strictly below 20 Mbit/s. Thus, BitWeave can scale to support realistic workloads in a real-life environment.

3.10.2 How does sharding affect the transaction footprint?

Figure 3.7 demonstrates how transaction complexity, i.e., the number of operations involved in a single transaction, increases with a higher number of shards. If the number of shards is increased, more transactions execute across multiple shards. A cross-shard transaction consists of four operations (two reservations and two commits) and potentially more if a transaction needs to be aborted and resubmitted. Because of this, the transaction complexity stabilizes at slightly above four operations, while the number of total operations keeps doubling when the number of shards doubles.

3.10.3 What is the overhead generated by cross-shard messages?

To determine the overhead of cross-shard messages on throughput, we conduct a microbenchmark in which a single shard commander executes a varying number of shards. Figure 3.8 shows the result. While more shards allow for more concurrency, they also generate more metadata on the blockchain. This overhead is especially salient when moving from one to two shards. After that, the penalty stays roughly constant, as all transactions touch at most two shards.

3.10.4 How well does the protocol handle failures?

Figure 3.9: Performance with regard to fraud-proofs. We run a single shard and gradually increase the frequency of transaction blocks that are invalidated (throughput in operations per second versus fraud-proofs per second).

We evaluated a microbenchmark that simulates shard failures to determine the overhead introduced by applying fraud-proofs to the chain. In this experiment, we keep the client workload constant while introducing a steady sequence of fraud-proofs into the network and observe how they affect performance. Here, each fraud-proof invalidates an entire transaction block (i.e., hundreds of operations) to simulate large-scale failures. We further evaluate performance with a single shard, so that shard commander failures have the highest impact.

Figure 3.9 shows that performance degradation is negligible during failures, as in a real-world setting the fraud rate would most likely not exceed one failure a second. The intuition behind this result is that discarded operations do not necessarily cause an abort, but extend the transaction's challenge period and allow the new shard commander to resubmit the operation. Because performance is mostly dominated by verifying transactions, shard failures come with a rather small penalty on throughput, generated by the re-issuing of operations and the increased key block size.

3.11 Discussion and Open Problems

3.11.1 Shortening CHALLENGE Periods

Recent work has investigated how to reduce confirmation times in Bitcoin. ByzCoin [48] reduces confirmation times by establishing a set of validating nodes. Members of the validator committee are selected by observing which entities mined the last n blocks on some identity chain based on PoW. These validators then issue blocks using a conventional quorum-based consensus protocol. Thunderella [76] makes this scheme more resilient against churn, i.e., validators leaving and rejoining the network. The ByzCoin approach is directly applicable to BitWeave: here, a committee generates key blocks that are guaranteed to be benign, which reduces validation time from 6 epochs to just one.

Another way to reduce CHALLENGE periods significantly is to tighten the bound on network propagation delays through more efficient relay networks. Currently, BitWeave assumes a network propagation delay of about three minutes. BloXroute [46] is one approach to reduce delays through a centralized but untrusted relay network. Similar to solutions layered on top of a blockchain, these “layer 0” solutions are orthogonal to BitWeave.

3.11.2 Reducing Chain Size

A common problem with blockchains is that they continuously grow in size, which results in a large storage overhead for nodes participating in the protocol. This problem is exacerbated in high-throughput blockchains, such as BitWeave, as significantly more transactions are admitted to the chain.

BitWeave can enable nodes to reduce storage size significantly through on-chain snapshots. Snapshots allow nodes to track a lightweight representation of the blockchain's stable prefix. Shard commanders regularly publish a snapshot that encapsulates the current state of the shard. Once the snapshot has been validated and is buried deep enough in the chain to be considered stable, nodes discard all microblocks of that shard up until the snapshot. They merely maintain a representation of the main chain that verifies that the snapshot has been generated by an authorized source and that no fraud-proofs have been raised against it. Assuming the number of accounts stays the same, this stops the blockchain from growing linearly with the number of transactions and bounds its size to a constant.

3.11.3 Adapting to Changing Workloads

The workload of a blockchain varies, which can affect BitWeave's performance and safety. For example, when a token goes on sale, many clients may issue transactions for a specific smart contract. This is a problem because one shard commander might not be able to process all the incoming requests. On the other hand, if the overall number of transactions, and thus of active participants in the network, declines, there may not be a sufficient number of validators per shard to ensure safety.

The BitWeave protocol can be extended to support a varying number of shards to address this. As in conventional database systems, we differentiate between virtual shards, whose number is fixed, and logical shards, each of which contains one or more virtual shards [94]. Each logical shard is then assigned to a single validator, as before. Nodes that get assigned to a different shard leverage snapshots so that they do not have to parse the entire shard's chain.

We propose a load-balancer function as part of the BitWeave protocol that follows a similar mechanism as the difficulty adjustment in Bitcoin. While the difficulty adjustment in Bitcoin is a function of epoch length, the load balancer in BitWeave is a function of both the number of transactions in and the length of the last epoch. The function deterministically defines how many logical shards are supposed to exist and how the virtual shards are assigned to them. A lower number of logical shards reduces the amount of concurrency in the system while allowing for easier validation of shard commanders.
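A minimal sketch of such a deterministic load-balancer function is given below. The target rate, the number of virtual shards, and the function names are illustrative assumptions rather than protocol constants; the only requirement is that every node derives the same shard layout from the statistics of the last epoch.

import math

NUM_VIRTUAL_SHARDS = 256  # fixed for the lifetime of the chain (illustrative)
TARGET_TX_RATE = 100      # transactions per second per logical shard (illustrative)

def num_logical_shards(txs_last_epoch: int, epoch_length_secs: int) -> int:
    # Scale the shard count with the observed transaction rate.
    rate = txs_last_epoch / max(epoch_length_secs, 1)
    wanted = max(1, math.ceil(rate / TARGET_TX_RATE))
    return min(wanted, NUM_VIRTUAL_SHARDS)

def assign_virtual_shards(n_logical: int) -> dict:
    # Deterministically map every virtual shard to a logical shard.
    return {v: v % n_logical for v in range(NUM_VIRTUAL_SHARDS)}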

3.12 Chapter Summary

This chapter introduced BitWeave, a blockchain protocol that scales linearly with the number of shards while maintaining Byzantine fault-tolerance. BitWeave's core design goal is to allow small entities to participate in the consensus mechanism, arguably the key property of public blockchains. The protocol further provides sound incentive mechanisms and the same security as Bitcoin. Unlike previous solutions, the protocol does not dilute mining power and supports a fully decentralized network. Our experimental evaluation of BitWeave shows that it can support realistic workloads and is flexible enough to be applied to a wide variety of blockchains, including Bitcoin and Ethereum.

CHAPTER 4

DataPods: Federated Decentralized Databases

BitWeave is one promising attempt to scale blockchains; however, it still suffers from massive replication of data and lacks confidentiality for user data. To address these limitations we need a more radical redesign of the blockchain architecture. First, not all data should need to be globally replicated across a peer-to-peer network. Users should instead be able to pick where their data is stored. Second, user data must be confidential. This is a requirement for enabling many applications that rely on secure computation and for users to trust the system with their personal information.

This chapter introduces DataPods [65, 63, 64], which are trustless, federated databases. Data pods operate mostly without the involvement of a globally replicated ledger, while still providing the same safety and liveness guarantees. Here, the global ledger only tracks meta-data, provides a public-key infrastructure, and enables economic incentives via a cryptocurrency.

Unlike previous mechanisms [70, 27, 29, 79, 42], data pods support dynamically changing workloads, churn of users, confidential computation, and complex application logic. Data pods provide the following four features to achieve this:

• Federated Transactions: Independent data pod instances can interact with each other through federated serializable transactions. In the common case, these transactions operate without the involvement of the main chain. This, in turn, enables scalable applications that operate across organizational boundaries.

• Portable User Data: Users choose which pod to store their data at and can migrate to another data pod at any point in time. This prevents them from being locked into a business relationship with a specific entity forever.

• State Witnesses: Witnesses certify the state of a data pod at a certain point in time. These certificates are generated as a result of successful transaction execution and can be verified using the global ledger even if the data pod itself becomes unavailable.

• Secure Function Evaluation: Data pods shield user data from unprivileged access and certify correct execution of applications, while guaranteeing the integrity of computation.

The rest of this chapter describes these features in more detail and discusses the applications they enable. Section 4.1 gives a brief overview of trusted execution environments. Section 4.2 discusses the data pod architecture and Section 4.3 presents three applications that data pods enable. We then discuss federated transactions in Section 4.4 and user data migration in Section 4.5. Section 4.6 sketches the correctness of the protocol. Next, we describe our prototype implementation in Section 4.7, which is then evaluated in Section 4.8. Finally, we discuss open problems and future work (Section 4.9).

4.1 Foundation: Trusted Execution Environments

Trusted hardware, which is specialized hardware protecting data and execution, can serve a similar function as a trusted third party. Dedicated secure co-processors and trusted platform modules (TPMs) were among the first mechanisms to realize trusted computation. Here, specialized hardware provides some level of tamper-resistance against a strong adversary that has physical access to the machine in question. Usually, this hardware has built-in cryptographic features that allow signing and encrypting data. A drawback of this design is that such dedicated hardware usually has low processing power and, therefore, only supports a limited set of applications.

Figure 4.1: Comparison of two distinct TEE architectures: Intel SGX (left) provides each application with its own TEE, while ARM TrustZone (right) provides one "secure world" for all trusted applications.

Trusted Execution Environments (TEEs) are a new approach for trusted computation that are shipped as a feature on consumer CPUs. As a result, TEEs are available on many devices and can leverage the full processing power of the particular CPU. Intel Software Guard eXtensions (SGX) is a TEE that is available on many modern desktops and servers. Similarly, many mobile devices provide the notion of a TEE through ARM TrustZone. Finally, research is ongoing on open-source TEEs, such as the Keystone Architecture [57].

TEEs, while executing on the same processor, are shielded from other applications and even the operating system itself [20]. This is achieved by providing a specially protected memory region that can only be accessed while executing enclave code. Additionally, TEEs are protected by disabling certain processor features, such as those intended for debugging, while executing them. Figure 4.1 outlines two different designs for TEEs: the protected memory region can either be allocated on a per-enclave basis (Intel SGX) or as a "secure world" where multiple trusted applications and a trusted operating system co-exist (ARM TrustZone).

TEEs such as Intel SGX can prove remotely that they are functioning correctly and have not been compromised, a feature known as remote attestation. Remote parties initially connect to an enclave using an unsecured network connection. They then perform a TLS key exchange to establish a shared secret for secure communication. As part of this process, they verify the enclave's signature and extract a "quote" to assess the integrity of the executable. In the case of SGX, an Intel-approved quoting enclave checks the TEE for correctness and returns the result to the remote party. Additionally, the system checks whether the CPU in use has been blacklisted by the vendor, e.g., because it is known to be faulty. As a result, the established TLS key is only known by the enclave and the remote party, and the remote party has a guarantee that the enclave is authentic.
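The following sketch outlines the client side of such a handshake. verify_quote is a placeholder for the vendor-specific quote check (e.g., querying Intel's quoting infrastructure and the CPU blacklist), and the key-derivation details are illustrative; in SGX, the enclave additionally binds its public key into the quote's report data.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey, X25519PublicKey)
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def verify_quote(quote: bytes, expected_measurement: bytes) -> bool:
    raise NotImplementedError  # vendor-specific; stubbed in this sketch

def attest_and_derive_key(enclave_pub: bytes, quote: bytes,
                          expected_measurement: bytes) -> bytes:
    # 1. Check that the quote attests the expected binary on genuine hardware.
    if not verify_quote(quote, expected_measurement):
        raise ValueError("enclave attestation failed")
    # 2. Diffie-Hellman exchange; the derived key is known only to this client
    #    and the code running inside the enclave.
    client_priv = X25519PrivateKey.generate()
    shared = client_priv.exchange(X25519PublicKey.from_public_bytes(enclave_pub))
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=b"pod-session").derive(shared)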

We assume that every data pod implementation has access to a TEE that, as long as the CPU itself is not tampered with, provides confidential execution and integrity.

More concretely, we assume it protects application data from third-party access, that parties cannot influence the execution of the TEE, except for how many CPU cycles are allocated to it, and that it provides remote attestation.

4.2 The DataPods Architecture

The DataPods architecture enables building efficient and secure decentralized applications. First, it provides mechanisms to execute program calls in parallel. Second, it allows for application-specific consistency constraints and access policies. This section defines these abstractions in detail.

Each data pod is a database instance maintaining a local state and associated ledger separate from the global state. Data pods provide similar performance as conventional databases, while enforcing semantics specific to decentralized applications. The state of each data pod is anchored to the main chain to ensure liveness. In essence, clients resort to using the global consensus protocol in case the data pod misbehaves or becomes unavailable.

Figure 4.2: Overview of the Data Pod Architecture: A TEE guards data and computation from unwanted access. Hashes of the resulting state are stored on the global chain to ensure availability.

4.2.1 Assumptions and Attack Model

As described in Section 1.3.1, decentralized applications execute in the face of a strong adversary that may have overtaken large parts of the network. Additionally, adversaries might aim to change the behavior of the protocol by rolling back a data pod's state or delaying messages.

Adversaries might try to extract confidential information from the data pod. For example, they may inspect the pod’s state on disk or parse incoming and outgoing messages. Our attack model further allows for arbitrary modification of the hardware associated with the data pod, aside from the CPU.

For each data pod, we assume there exists at least one client validating its state. Clients may not always be connected but will eventually synchronize their state with data pods holding their data and validate all of their operations. We leave mechanisms allowing for more lightweight clients for future work.

Finally, similar to BitWeave, the protocol relies on the absence of long-lived network partitions to ensure that messages are eventually propagated across the network. In our concrete implementation, we assume messages propagate within one minute.

4.2.2 Global Ledger Abstraction

Data pods are designed to be independent of the underlying DLT. This is desirable because the field is constantly improving existing protocols or inventing entirely new ones. Recall that decentralized ledgers provide an append-only log that accepts transactions with some payload.

For data pods to interact with the global ledger, they need to provide proofs of publication as described by Cheng et al. [14]. These proofs are cryptographic certificates attesting the inclusion of a transaction in the decentralized ledger. In the case of a ledger based on Nakamoto consensus, this means that a transaction is buried deep enough in the chain such that forks at this depth are virtually impossible.

Additionally, the DLT needs to have a built-in cryptocurrency and the notion of epochs to indicate the passage of time. A cryptocurrency is necessary for mechanisms that incentivize data pods to behave correctly. A new epoch can be indicated by the mining of a new block in Nakamoto-based systems or the admission of a new batch of transactions in a committee-based protocol.

For the rest of this chapter, we refer to this DLT abstraction as the main chain to differentiate it from the local state maintained by each data pod.
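The interface a data pod expects from the main chain can be summarized as in the sketch below; the method names are illustrative, and any DLT offering equivalent operations (plus a native cryptocurrency and epochs) qualifies.

from abc import ABC, abstractmethod

class MainChain(ABC):
    @abstractmethod
    def submit(self, payload: bytes, fee: int) -> bytes:
        """Append a transaction to the log and return its identifier."""

    @abstractmethod
    def proof_of_publication(self, tx_id: bytes) -> bytes:
        """Certificate that tx_id is stable, e.g., buried deep enough in
        the chain under Nakamoto consensus."""

    @abstractmethod
    def current_epoch(self) -> int:
        """Monotonic epoch counter that serves as a trusted source of time."""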

4.2.3 Application Programming Interface

Data pods build on top of the CDA model from Chapter 2, where each object is a structured set of data similar to those in document stores, e.g., MongoDB. The set consists of named fields with corresponding values, such as lists, dictionaries, strings, binary data, and numeric values. Objects are uniquely identified by a 128-bit identifier, where the first 64 bits are the identifier of the data pod they were created on. Encoding the original location into an object's ID allows for easy retrieval without the need to maintain a global registry for all objects.
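The identifier layout can be illustrated with the following sketch, which assumes that the "first" 64 bits are the most significant ones; the helper names are hypothetical.

def make_object_id(pod_id: int, local_id: int) -> int:
    # 128-bit identifier: the upper 64 bits name the pod of origin.
    assert 0 <= pod_id < 2**64 and 0 <= local_id < 2**64
    return (pod_id << 64) | local_id

def origin_pod(object_id: int) -> int:
    # Routing a request only needs the originating pod; no global registry.
    return object_id >> 64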

Objects managed by a data pod will only be processed by a different data pod or the main chain when recovering from failures or during federated transactions. This separation of state is the key design decision that makes DataPods scale.

Table 4.1: List of data pod API calls supported by our implementation

db Module
    new(k, v): Create a new object with key k and values v. The object will be owned by the calling user.
    delete(k): Delete object k.
    get(k): Get the most recent value of object k.
    get_field(k, p): Get the most recent value of field p in object k.
    set_contains(k, p, v): Check if the set at k.p contains the value v.
    load_ml_model(k, p, v): Create an instance of the ML object from data stored at k.p.
    insert(k, p, v): Update or create field p in object k if k exists.
    list_append(k, p, v): Append value v to the list at k.p.
    set_insert(k, p, v): Insert value v into the set at k.p.
    remove_field(k, p): Remove the field p from object k if k exists.
    increment(k, p, off): Increment the numeric value at k.p by off.
    decrement(k, p, off): Decrement the numeric value at k.p by off.

wallet Module
    get_balance(): Get the current balance of the calling user's account.
    pay(id, a): Pay amount a of cryptocurrency from the calling user's account to account id.

random Module
    randbytes(l): Generate a random sequence of bytes of length l.
    randint(s, e): Pick a random value from the interval [s; e).

assert Module
    exists(k, p): Check whether an object k and its field p exist.
    owns(k): Check whether the calling user owns the object k.
    is_true(pred): Check that the predicate pred holds.

ML Objects
    train(t, r): Train a model by applying a batch stored in tensor t and the expected result r.
    infer(t): Run inference on the tensor t and return the result.

76 An application’s interface consists of a set of functions that can modify state by creating or modifying objects. Table 4.1 outlines the API accessible to application func- tions, consisting of (1) a database interface to create and modify objects, (2) a module that allows payments between accounts, (3) an assert module that enables access- control, and (4) a module to generate secure randomness.

type Account { balance: u32 };

pub fn transfer(src: ObjectId, dst: ObjectId, amount: u32):
    assert.owns(src)

    let sbalance = db.get_field(src, ["balance"])
    assert.is_true(sbalance >= amount)

    db.decrement(src, ["balance"], amount)
    db.increment(dst, ["balance"], amount)

Listing 4.1: Sketch of a simple token implementation

Listing 4.1 sketches an application implementing a custom token, which can be a form of payment in other applications. Here, the transfer function may either be invoked by a client who wants to transfer their token to another entity or by another application that wants to use the token as a form of payment.

4.2.4 Authenticated Private Storage

Each data pod provides a digest, which is an authenticated data feed of its state and all updates to it. Digests provide verifiable state updates using cryptographic hashes and Merkle trees [69]. As introduced in Chapter 2, Merkle trees are hash trees that can combine hashes of an arbitrarily large set of objects (the leaves) into a single hash value (the tree's root).

Figure 4.3: Digests bundle their content into batches. Each batch represents changes to the data pod's state and contains the change set as well as a snapshot of the resulting state. The header of each digest batch is stored on the global chain, so that the authenticity of the batch can be verified later.

A data pod’s digest consists of a sequence of batches, each representing a snapshot of the data pod’s state at a particular point in time. Additionally, batches record the changes since the last batch was created. Batching updates reduces the amount of communication required between the data pod and the main chain.

Figure 4.3 outlines how batches store this snapshot and the changes in a succinct form. Each batch records changes in the form of an update set and a transaction set. The transaction set certifies that a transaction was applied to the data pod's state, while the update set records the specific changes to particular objects. Updates are either a transaction commit, an abort, or the acquisition of a reservation for a particular transaction. These are fully contained within the batch and included in an associated hash tree. To reduce their size, batches do not contain the full state and associated reservation set but only the associated hash trees. Clients can locally compute a snapshot of their partial state and the reservation set affecting their object(s) and compare it against the hash tree in the digest.

Data pods store the header of each batch on the main chain in the form of a global transaction. More concretely, data pods store the Merkle roots of the transaction and update sets, as well as the Merkle roots of the state and reservation snapshots. This allows clients to generate proofs about the inclusion of a transaction or the state of an object without the involvement of the data pod. Note that a single transaction can result in multiple object updates. For example, in the messenger application a call to send_message is represented by a single transaction that results in the modification of both the source and destination objects.
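The following self-contained sketch shows how such inclusion proofs can be computed and checked against a Merkle root from a digest header. It uses a simple binary hash tree; the prototype's exact tree layout may differ.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    assert leaves
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    # proof: list of (sibling_hash, sibling_is_left) pairs from leaf to root.
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root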

Updates to particular objects in a digest are encrypted using the owner's key. Clients have direct access to their data through the secure channel and rely on trusted execution to ensure correct modification of remote parties' data. Reservations for a particular object are similarly encrypted using the client's public key. Data in a particular pod's digest is thus encrypted under the user's key so that only the data's owner can access it. Additionally, some meta-data for each object is stored unencrypted, e.g., whether the object is currently involved in a pending transaction.

We now introduce the notion of witnesses, which are certificates that allow clients to track partial state of a data pod. These cryptographic certificates form the basis for most features we describe in this chapter.

State Witnesses

Clients track objects they own by parsing the digest of the data pod(s) they are stored on. In particular, they parse the update set for changes to relevant objects. They then use their private key to decrypt information about the changes and update their state locally.

Tracking their objects' state enables them to generate state witnesses. At any point in time, and without the involvement of the data pod, they can encrypt their local copy of the object with their public key. They then create the witness by extracting a Merkle proof from the state tree supplied by the pod's digest. This is possible because the entire state Merkle tree is included within the digest batch.

Transaction Witnesses

After successfully invoking a function, clients receive a transaction witness from the data pod certifying the inclusion of a transaction into a data pod’s state. The witness consists of three parts and is signed by the data pod to certify the witness has not been tampered with. First, the witness contains a Merkle proof that demonstrates the transaction is part of a batch in the pod’s digest.

Second, the witness contains the transaction request itself and a proof that it was processed without modification. This allows one to verify that the request was issued by the particular client and received by the data pod as-is. As a result, even though the client might not have access to all objects modified by the transaction, they can still infer that the transaction was executed correctly by relying on the witness and the integrity of the TEE.

Third, the witness may contain a return value of the executed function. This allows one to extract authenticated public information from private data. For example, an application might train a machine learning model on user data, where the resulting model is public but the user data is not.

4.2.5 Secure Function Evaluation

Many high-value applications require enforcing confidentiality and integrity requirements through pre-defined policies [101, 28, 109]. Conventional databases cannot live up to this challenge as they require all parties to trust the entire application stack and host operating system(s). Earlier work focused on accountable systems [106, 59], which ensure integrity by allowing clients to audit the log for states they have observed previously. But audit mechanisms by themselves cannot protect against third parties accessing the data.

The DataPods architecture supports confidentiality using cryptographic primitives or secure hardware. Homomorphic encryption, for example, enables verifiable computation on encrypted data. In our particular instantiation we implemented this functionality using trusted execution environments (TEEs), because they allow us to achieve data confidentiality without a large drop in performance.

When first connecting to data pod instances, clients establish a secure, authenticated connection with the TEE. They verify the public key of the pod to ensure they are talking to the correct entity and verify the trusted execution environment for correctness. Further, clients perform a Diffie-Hellman key-exchange to create an encrypted channel to the data pod. Data pods perform the same procedure when connecting to other data pods.

4.2.6 Detecting Data Pod Failure

While data and computation are protected by the TEE, the operator of a data pod can still perform certain attacks, which are addressed by the protocol through availability wagers. Similar to BitWeave, clients report failure through a special transaction describing the misbehavior that is sent to the global blockchain network. Since there is a direct connection between the client and the data pod, clients can easily detect unresponsiveness.

When encountering an unresponsive data pod, clients issue an availability wager to the main chain requesting a specific digest batch to be published. A client that issues such an availability wager must attach funds that represent their confidence that the data pod is misbehaving. Multiple wagers from different clients can be attached to the same batch, so long as the wagers do not surpass the data pod's deposit amount (Section 4.2.7). If the data pod does not reveal the batch after a set time, the challenging clients are rewarded the wager amount.

If the data pod reveals the batch in time, the clients' wagers are instead burned and the protocol proceeds without change, which ensures that data pods do not profit from withholding blocks. In this case, clients who are unsatisfied with the responsiveness of a particular data pod can opt to migrate their data to avoid future disruptions.
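The payout rule for a single wager can be summarized as in the sketch below; the function and parameter names are illustrative.

def resolve_wager(wager_amount: int, pod_deposit: int,
                  batch_revealed: bool) -> tuple:
    """Returns (client_payout, remaining_pod_deposit, wager_burned)."""
    if batch_revealed:
        # The pod was responsive: the wager is burned so that pods cannot
        # profit from withholding, and the deposit is untouched.
        return (0, pod_deposit, True)
    # The pod failed to reveal in time: the client is paid from the deposit.
    payout = min(wager_amount, pod_deposit)
    return (payout, pod_deposit - payout, False)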

4.2.7 Economic Incentives

Digital Payments

Data pods can handle an arbitrary number of digital payments without the involvement of the main chain. The handling of payments in data pods follows a similar incentive structure as other off-chain solutions, such as state channels. Users lock money in a data pod, which then can be transferred within the data pod or to other data pods. As long as money is locked at a specific data pod, it can be moved freely between accounts on that pod.

As with payment channels, data pods leverage cryptographic commitments to transfer money between data pods without the involvement of the main chain. If a client initiates a money transfer between accounts located on different pods, the respective data pod balances need to be updated. They do this by each cryptographically certifying the state change. If one of the data pods becomes unavailable in the future, the other can reconcile the state by presenting the certificate to the main chain.

Similarly, balances on a data pod can be reconciled with the main chain at any point in time if requested by a client. Clients issue a reconciliation request to the main chain that contains a Merkle proof against the data pod's state, certifying the resulting balance. Like with other updates to the data pod, the network relies on audit mechanisms to ensure that the resulting balance is correct.

Transaction Fees and Gas

Transaction fees are the main incentive mechanism driving the data pods protocol. Clients pay data pods for including their transactions, and data pods likewise pay the main chain for including theirs. These fees depend on the urgency of a particular transaction and the current congestion of the network.

Transaction and gas fees are structured similarly to those in Ethereum. Here, fees are negotiated dynamically between clients and data pods, and between data pods and the global ledger, respectively. Transaction requests may also contain a certain amount of cryptocurrency to pay for computation. Like in Ethereum, that budget consists of a price per gas and a maximum amount of gas that can be spent. If the transaction runs out of gas during execution, it aborts.
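A minimal sketch of this gas-accounting scheme follows; the costs charged per operation are made up for illustration.

class OutOfGas(Exception):
    pass

class GasMeter:
    def __init__(self, gas_limit: int, gas_price: int):
        self.remaining = gas_limit
        self.gas_price = gas_price  # cryptocurrency units per unit of gas

    def charge(self, cost: int) -> None:
        if cost > self.remaining:
            raise OutOfGas  # the transaction aborts
        self.remaining -= cost

meter = GasMeter(gas_limit=50_000, gas_price=2)
meter.charge(400)  # e.g., a db.get_field call
meter.charge(900)  # e.g., a db.insert call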

Deposits

When first connecting to the main chain, data pods stake money in the form of a deposit, which can later be used to hold misbehaving parties accountable.

The protocol incentivizes detecting and reporting fraud and punishes misbehaving data pods. Availability wagers are included on the main chain like any regular transaction, which means the issuer has to pay transaction fees for them to be included. The global ledger can collect transaction fees even from invalid availability wagers, which ensures that no excessive number of invalid wagers is sent to the main chain.

The wager reward is deducted from the data pod's deposit and awarded to the issuer(s) of a successful availability wager. Once a data pod has been shown to have misbehaved, it cannot issue a new batch to the main chain until it has renewed its deposit.

4.3 Application Case Studies

The DataPods architecture and its associated programming model support a wide range of decentralized applications. In the following, we describe three such applications built on top of the API outlined in Table 4.1.

4.3.1 Federated Social Networking

Social networks are at the center of most people's lives: they provide the means to share information with a circle of friends and to send messages to each other. While most large social networks are currently centralized systems, the sparse nature of friend relationships makes it fairly straightforward to partition them.

type UserProfile = {
    following: Set<ObjectId>,
    messages: Map<ObjectId, List<String>>
}

pub fn create_user(uname: String) -> ObjectId:
    return db.new("UserProfile", {uname, messages: {}, following: {}})

pub fn follow_user(me: ObjectId, other: ObjectId):
    db.set_insert(me, ["following"], other)

pub fn send_message(src: ObjectId, dst: ObjectId, msg: String):
    # Ensure they are mutual followers
    assert.is_true(db.set_contains(src, ["following"], dst))
    assert.is_true(db.set_contains(dst, ["following"], src))

    db.list_append(src, ["messages", dst], msg)
    db.list_append(dst, ["messages", src], msg)

Listing 4.2: Sketch of the messaging functionality of a social network

As discussed in Chapter 1, protocols for federated social networks suffer from the same user lock-in as centralized systems. For example, one might register with a particular XMPP server that becomes inactive in the future, which is highly undesirable from a user perspective. Fortunately, we can implement social networks on the DataPods architecture.

Listing 4.2 outlines how a direct messaging functionality such as that of Twitter can be implemented. Users create a new profile through the create_user call and follow other users with the follow_user call. The send_message function first checks that users follow each other before appending the message to the chat log of both parties. This functionality could easily be extended with group chats or status updates.

4.3.2 Non-Tangible Assets

Decentralized ledgers are commonly used to track both real-world and virtual assets. A popular application in this space is “cryptokitties”, a game about owning and breeding virtual cats with unique features. New cats can be created by “breeding” existing pets and may be of high value depending on their uniqueness and traits. Cryptokitties caused major congestion on the Ethereum network due to its popularity in 2017 [102].

Listing 4.3 outlines one such application, where users can breed virtual birds. Here, each bird is associated with a unique DNA. For the first generation, this DNA is randomly generated, while later generations derive it from their parents' DNA. To make this application complete, the functionalities to create new random birds and to transfer ownership must be added.

4.3.3 Decentralized Machine Learning

Machine learning is at the core of many modern applications. However, currently only a few large-scale organizations have access to sufficient user data to build efficient models. This creates an oligopoly of a few providers, while also giving these entities unchecked access to large amounts of personal data. Even worse, previous work has shown that even without access to the raw training data a trained model can still leak sensitive personal information [108].

We implemented a scheme that provides decentralized development of privacy-preserving machine learning services. Here, independent services employ federated learning [50] to collect data from a multitude of users and output a privacy-preserving model. Data pods then enable users to selectively share and sell their data to particular providers and to define restrictions on how their data and the resulting model are used. For example, they can opt to only give their data to providers that shield the model using differential privacy.

type Bird = { generation: u64, dna: [u8; 32] }

pub fn breed(parent_id1: ObjectId, parent_id2: ObjectId) -> ObjectId:
    if parent_id1 == parent_id2:
        abort()

    parent1 = db.get(parent_id1)
    parent2 = db.get(parent_id2)

    if parent1 is None or parent2 is None:
        abort()

    let dna = []
    for i in range(0, 32):
        # randint picks from [s; e), so randint(0, 2) yields 0 or 1
        if random.randint(0, 2) == 0:
            dna.append(parent1.dna[i])
        else:
            dna.append(parent2.dna[i])

    let generation = 1 + max(parent1.generation, parent2.generation)

    return db.new("Bird", {generation, dna})

Listing 4.3: Sketch of the breed-functionality of Cryptobirds

4.4 Federated Transactions

Applications can execute across data pods by leveraging federated transactions. Federated transactions support strict serializability [8] to ease development of strongly consistent applications and guarantee confidentiality of user data. This section outlines the semantics and implementation of this primitive.

4.4.1 Transaction Chopping

Data pods employ transaction chopping [91] to enable scalable and serializable transactions. At a high level, transaction chopping breaks a transaction's logic into a directed acyclic graph of subfunctions, where each edge represents either a control or data dependency. A control dependency represents control flow decisions, such as the outcome of an if-statement, and a data dependency represents the direct use of a variable that was output by another subfunction. Subfunctions then communicate via message passing, where each subfunction may receive control flow decisions or data from other subfunctions and may output data and control flow decisions itself.

Figure 4.4 outlines how transaction chopping is applied to the cryptobirds application. The function performs three database operations: it loads both parent objects and stores the new child object. Each of these operations must be contained in a distinct subfunction as the objects may reside on different data pods. The function further has two points at which it might abort: if the parent identifiers are equal and if either of the parent birds does not exist. The chopping mechanism aims to move abort decisions to the earliest stage possible. As a result, the function consists of three stages. The first stage ensures that the parent identifiers are distinct. Stage two retrieves the two parent objects from the database. The final stage checks that both parents were returned successfully and creates the child.

Figure 4.4: Demonstration of how the cryptobirds application is broken into four distinct subfunctions.

Data pods rely on the chopping scheme outlined by Faleiro et al. [32]. The protocol first creates a subfunction for every atomic operation performed by the program. Then, dependencies between subfunctions are determined through control and data flow analysis. Finally, we recursively merge subfunctions with their predecessor if they only depend on the predecessor and if they contain at most one database operation when combined.
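The merge pass can be sketched as follows; the graph representation and the per-node database-operation count are illustrative simplifications of the actual control and data flow analysis.

from dataclasses import dataclass, field

@dataclass
class SubFn:
    name: str
    db_ops: int                              # database operations in this node
    deps: set = field(default_factory=set)   # names of predecessor subfunctions

def merge_pass(graph: dict) -> dict:
    """graph: name -> SubFn. Fold nodes into their sole predecessor while the
    combined node performs at most one database operation."""
    changed = True
    while changed:
        changed = False
        for name, fn in list(graph.items()):
            if name not in graph or len(fn.deps) != 1:
                continue
            (pred_name,) = fn.deps
            pred = graph[pred_name]
            if pred.db_ops + fn.db_ops <= 1:
                pred.db_ops += fn.db_ops
                # Successors of the merged node now depend on its predecessor.
                for other in graph.values():
                    if name in other.deps:
                        other.deps.remove(name)
                        other.deps.add(pred_name)
                del graph[name]
                changed = True
    return graph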

Applying transaction chopping to data pods provides two major advantages. First, it allows for better performance since different parts of a transaction can execute concurrently, e.g., when aggregating data from multiple pods. Second, it allows for more privacy of user data because only the results of a particular subfunction are sent back to the calling party.

4.4.2 Transaction Lifecycle

Data pods process federated transactions in two phases: EXECUTE and FINALIZE. The particular data pod that received a client’s function call coordinates the transaction, which is outlined in Listing 4.4, and interacts with remote peers, outlined in Listing 4.5. Data pods rely on pessimistic two-phase locking to minimize the number of aborts while processing federated transactions.

EXECUTE Phase

A transaction’s EXECUTE phase consists of a series of stages. Each stage consists of a set of subfunctions that have no dependencies among each other and can be executed concurrently. The protocol differentiates between local stages, the only involve the initiating data pod, and distributed stages, that involve a set of data pods.

At the beginning of a distributed stage, the coordinating data pod sends an execute-request containing a set of subfunctions to each data pod involved in the transaction and waits for their responses. Data pods execute subfunctions once they receive the request and acquire reservations accordingly. The set of involved data pods might grow as the transaction executes.

Listing 4.4: Pseudocode describing the execution of a function call as a federated transaction

on_receive "call_function" (fn_name):
    tx = self.init_transaction(fn_name)
    tx_id = tx.get_identifier()
    remote_pods = set()

    while tx.has_next_stage():
        pending = []
        stage_id, stage = tx.get_next_stage()

        for peer, fns in stage:
            hdl = peer.send("execute_stage", (tx_id, stage_id, fns))
            remote_pods.insert(peer)
            pending.append((peer, hdl))

        for peer, hdl in pending:
            response = hdl.await

            if not response:
                # The peer is unresponsive; look for its answer in its digest
                digest = self.request_digest(peer).await
                if digest:
                    response = self.parse(tx_id)

            if not response or response.is_err():
                tx.abort()
                for peer in remote_pods:
                    peer.send("abort_transaction", (tx_id))
                return

    tx.commit()
    for peer in remote_pods:
        peer.send("commit_transaction", (tx_id))

Due to loop or conditional statements, the number of stages of a transaction might not be known a priori. Thus, the coordinating data pod sends an execute-request to every involved data pod, even if they do not need to execute a subfunction in a particular remote execution stage, to notify them that the EXECUTE phase contains another stage. Additionally, the protocol allows for at most k stages per transaction to ensure there is an upper bound on transaction execution, where the concrete value of k is set by the particular instantiation of the protocol.

Listing 4.5: Pseudocode describing the participation of a data pod in a remotely-coordinated transaction

on_receive "execute_stage" (tx_id, stage_id, fns):
    if stage_id == 0:
        tx = self.create_tx(tx_id)
    else:
        tx = self.get_tx(tx_id)
        tx.abort_timer()

    res = tx.execute(fns)
    tx.set_timer()

    send "stage_result" (res)

on_timeout (tx):
    src = tx.get_source()
    digest = self.request_digest(src).await
    if digest:
        # Look for any relevant messages
        found = self.parse(digest)
    else:
        found = False

    if not found:
        tx.abort()

on_receive "finalize_transaction" (tx_id, abort):
    tx = self.get_tx(tx_id)
    if abort:
        res = tx.abort()
    else:
        res = tx.commit()

    send "finalize_result" (res)

A particular stage is successful if none of its subfunctions abort. Subfunctions may abort because they failed to acquire a reservation, performed an invalid or unauthorized instruction, or because the application logic requested an abort. If a stage fails to execute, the transaction as a whole aborts.

Lock Contention Lock contention is resolved using a wait-die scheme. If a transaction T tries to acquire a reservation that conflicts with an existing reservation of another transaction T′, it either (1) waits, if the starting epoch of T′ is higher than that of T, or (2) aborts, if the starting epoch of T′ is equal to or smaller than that of T.
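The decision rule can be written compactly, as in the sketch below, assuming starting epochs are totally ordered integers.

from enum import Enum

class Action(Enum):
    WAIT = "wait"
    ABORT = "abort"

def on_conflict(requester_epoch: int, holder_epoch: int) -> Action:
    # Requester T conflicts with holder T'. T waits only if T' started in a
    # strictly later epoch; otherwise T dies, which rules out deadlocks.
    if holder_epoch > requester_epoch:
        return Action.WAIT
    return Action.ABORT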

FINALIZE Phase

Once the EXECUTE phase has finished, the coordinating data pod requests all involved parties to either commit or abort the transaction. At this point, each data pod participating in the transaction has a local reservation set, which will either be marked as to be committed to the data pod's state or to be discarded.

Data pods may not be able to finalize a transaction in time because they experience contention, which is accommodated by finalize extensions. They can indicate that not all transactions have been finalized using a boolean flag in the global transaction when sending a digest batch to the main chain. The FINALIZE period of all transactions currently being finalized is then extended by another epoch. Because this flag is contained in the main chain, it is visible to other data pods involved in the transaction as well.

4.4.3 Fault-Tolerance

We now extend the mechanisms from Section 4.2.6 for federated transactions.

Unresponsive Data Pods

While federated transactions execute without the direct involvement of the global ledger, the ledger serves as a trusted source of time and as a fallback during data pod failure. Each phase is set long enough to account for message delays, after which unresponsiveness of a particular data pod can be recorded on the main chain.

For each distributed stage, the protocol allocates a fixed amount of time and aborts if a remote party is unresponsive within that time frame. Similarly, if a data pod is involved in a transaction coordinated by another party and does not receive instructions to execute the next stage, it also considers the transaction aborted. Data pods verify the absence of a message by inspecting the remote pod's digest and issuing an availability wager if needed.

For the EXECUTE phase, in particular, the length is determined by the number of stages, as each stage involves a network round-trip. The trusted source of time thus allows one to assert that a data pod has become unresponsive or to ensure that enough time for validation has passed to consider a committed transaction stable.

If the EXECUTE phase has been successful, the initiating data pod either initiates a commit, at which point the transaction is guaranteed to succeed, or times out. Once the coordinating data pod issues a commit in its digest, the transaction is considered committed even if other data pods become unavailable during the commit phase. In the case of such a partial commit, clients of unavailable data pods commit the transaction locally and update their state witness without the involvement of the particular data pod. Once these data pods become available again, they are required to issue their corresponding commits.

If the EXECUTE phase was unsuccessful, the transaction is guaranteed to abort. Either the coordinating data pod issues an abort, or it becomes unavailable. In the latter case, the transaction aborts due to a timeout.

Global Challenge Extensions

Like in BitWeave, the global ledger can indicate at the beginning of an epoch that not all pending availability wagers have been processed. In this case, a global challenge extension is issued, which extends the length of the current EXECUTE or FINALIZE phase for all pending transactions to give ample time for processing all remaining availability wagers. This means that an excess of availability wagers will stall the overall throughput of the chain, but the safety of the protocol is always maintained. Note that the number of possible extensions of CHALLENGE and FINALIZE phases is finite, as each data pod digest has a finite number of batches.

Guaranteeing Message Delivery

Messages from one data pod to another are guaranteed to be delivered once recorded in a digest. More concretely, the federated transaction protocol is resilient against adversaries that withhold or delay messages sent between data pods. Similarly, the untrusted parts of a data pod might withhold messages due to benign failures or malicious intent. A data pod's digest records all messages sent to other pods and, like with client witnesses, messages are recorded as encrypted data that can only be decrypted using the remote pod's private key. This allows data pods to retrieve messages from the digest if they get lost in transit or the data pod fails, but prevents unauthorized parties from intercepting communication.

The DataPods protocol falls back to the global network to retrieve missing messages. First, the presence of messages can be verified through the main chain. The header of each digest batch contains the identifiers of all data pods for which it contains messages. Second, data pods, like clients, rely on availability wagers (see Section 4.2.6) to retrieve or invalidate missing digest batches.

4.4.4 Federated Transaction Fees

We extend the transaction fees from Section 4.2.7 to provide a sound incentive mechanism for federated transactions. Instead of defining a single transaction fee, clients set a per-pod fee f_pod and a pod limit L. As a result, the transaction can span up to L data pods, where each gets paid f_pod.

Further, transactions do not only set a total gas limit, but also a gas limit per stage and per pod. Thus, during each stage, each data pod has its own gas budget, and a stage is guaranteed to not exceed the total gas budget. At the end of a stage, the total gas budget can be updated depending on how much gas was used by each data pod involved. Here, a transaction will not continue if not enough gas remains to pay for the next stage.
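The following sketch illustrates this budgeting scheme; all names and numbers are hypothetical.

class FeeBudget:
    def __init__(self, fee_per_pod: int, pod_limit: int,
                 total_gas: int, gas_per_stage_per_pod: int):
        self.fee_per_pod = fee_per_pod
        self.pod_limit = pod_limit
        self.pods_used = 0
        self.total_gas = total_gas
        self.stage_gas = gas_per_stage_per_pod

    def admit_pod(self) -> bool:
        # A transaction may span at most pod_limit data pods.
        if self.pods_used >= self.pod_limit:
            return False
        self.pods_used += 1
        return True

    def start_stage(self, num_pods: int) -> bool:
        # A stage only proceeds if the remaining budget covers every
        # participating pod's per-stage allowance.
        return self.total_gas >= num_pods * self.stage_gas

    def end_stage(self, gas_used: int) -> None:
        self.total_gas -= gas_used  # reconcile with actual consumption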


Figure 4.5: The lifecycle of a federated transaction: A client is connected to and invokes the breed_bird function on pod 1. Pod 1 executes the first stage of the function locally. Pod 1 then, after resolving the locations of parent_id1 and parent_id2, starts a federated execution stage that executes subfunction 2 on pod 1 and subfunction 3 on pod 2. The fourth and final subfunction only executes on pod 1 and does not require a dedicated federated execution stage.

4.4.5 Concluding Example

Figure 4.5 outlines how a simplified version of the Cryptobirds application from Section 4.3.2 might execute in a federated environment. A client is connected to a single data pod and invokes a breed_bird-call on it. Recall that this function reads the dna-field from two "parent"-objects, generates a new "child"-object, and executes in three stages.

In this particular example, the first parent-object is located on the calling data pod and the second parent on a remote data pod. The first stage can always be executed locally by the calling data pod. The next stage, in this case, needs to be executed on both data pods 1 and 2. Data pod 1 executes subfunction 2, which outputs the first parent's DNA, and then issues an execute-request to data pod 2 to retrieve the second parent. This request contains a reference to the subfunction as well as the parent's identifier, which serves as an argument to the function. In general, subfunctions can have an arbitrarily large set of outputs and inputs.

Because new objects are always created at the calling data pod, once the second stage is completed, the third stage executes locally on pod 1. This function takes the parents' DNA output by the first two subfunctions and generates a new set of DNA by randomly choosing parts of the values from each parent. It then generates the new child object, which is owned by the client that originally called the function.

If all stages were successful, the coordinating data pod then issues a commit message locally and asks data pod 2 to commit as well. After issuing the commit, data pod 1 also returns the result to the client, as the transaction is guaranteed to succeed at this point.

4.5 User Data Migration

The DataPods protocol relies on data migration to ensure availability in case a data pod fails. Additionally, data migration makes users independent of a particular data pod's operator. This means users can choose a data pod with what they deem acceptable performance and transaction fee costs. Further, users are resilient against database operators that perform censorship, for example by selectively dropping transaction requests. This section describes user data migration, which is sketched in Figure 4.6.

Data migration relies on state witnesses (introduced in Section 4.2.4) and consists of two steps. First, users connect to the data pod they want to migrate their object to and establish a secure channel. They then transfer an unencrypted version of the object through the secure channel so that it is guaranteed to be available on the new data pod after the migration. This object will then be part of the target data pod's state but cannot be used until the end of the migration. Users can then generate a state witness certifying that the object is stored at the new location.

The user then initiates the data migration by placing a transaction on the main chain containing the object's state witness and the state witness generated at the new location. The state witness certifies the object's state and that it is owned by the initiating user. Once this transaction is included in the global ledger, it overwrites the object's location for all future transactions. All transactions currently affecting the object at the old location must be propagated to the main chain as well. New transactions will only be allowed to affect the object after the migration has completed.

As a result, from when a migration is initiated until its completion, the object will be read-only to address potential transactions that are in flight while the object transfer is occurring. The migration period itself is set to the length of the longest potential duration of a transaction, the maximum length of execution stages, and a commit phase. This ensures that transactions currently in flight will terminate correctly.

Figure 4.6: Outline of the data migration process. Users first establish a secure channel with the new data pod and transfer the object. They then generate a global transaction to finish the migration.

If an object was involved in a partial commit, the witness also contains the reservation(s) of the particular transaction and proof that the transaction committed. Recall that when a data pod becomes unavailable while executing a transaction, that transaction might still commit (Section 4.4.3). In this case, the reservations for the particular transaction are still recorded in the failed data pod's digest. Clients can tell from the transaction identifier associated with the reservation which data pod coordinated the transaction. They can then extract the commit message from that pod's digest and generate the proof needed for the data migration.

For a data migration request to be valid, the state witness it contains must be verifiable against the most recent digest header of the data pod, so that no recent changes to the object are reverted. Further, if the meta-data of an object indicates that it is involved in a transaction, but the migration request does not include a proof of commit or abort for that transaction, the request is also considered invalid. Users initiate data migration using a special transaction on the main chain containing the state witness of the object they aim to migrate. Finally, the DataPods protocol only permits data migration if there is no migration for the particular object in progress already.
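The validity checks can be summarized as in the following sketch; the request fields are hypothetical stand-ins for the actual message format.

def migration_is_valid(request, latest_digest_header, migrations_in_progress) -> bool:
    # 1. The witness must verify against the pod's most recent digest header,
    #    so that no recent change to the object is silently reverted.
    if not request.witness.verify(latest_digest_header):
        return False
    # 2. Objects marked as part of a pending transaction need a proof that
    #    the transaction committed or aborted.
    if request.witness.pending_transaction and request.finalize_proof is None:
        return False
    # 3. At most one migration per object at any time.
    if request.object_id in migrations_in_progress:
        return False
    return True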

4.6 Correctness

We provide a proof sketch demonstrating that the DataPods protocol upholds safety and liveness under the assumptions outlined in Section 4.2.1. This proof sketch follows the same structure as that for BitWeave (Section 3.7). In particular, we assume that each data pod has at least one client following it at all times and that there is a strong bound tp on the network latency. Each EXECUTE stage and the FINALIZE phase must then be set to at least 3 ∗ tp, or 1.5 round-trip times. Additionally, we expect the global ledger to always be available, to only admit correct transactions, and to allow users to migrate data to functioning data pods at any point in time. Finally, we expect the computation steps involved in executing a subfunction to have a finite bound.

We rely on the following lemma for both proofs:

Lemma 1: The number of extensions for a particular transaction, in the form of potential availability wagers, global challenge extensions, and data pod finalize extensions, is guaranteed to be finite.

Proof. As there can only be a finite number of transactions every epoch, the total number of digest batches is also finite. For each digest batch, there can be at most one availability wager. Thus, it follows that the number of availability wagers per epoch has a finite bound.

Recall that, similar to BitWeave, global challenge extensions are issued when the main chain receives too many transactions, and that FINALIZE extensions are issued when the number of commits to be processed exceeds the shard's capabilities. Global challenge extensions and finalize extensions merely extend the current epoch and do not allow any new transactions.

4.6.1 Safety

The DataPods protocol guarantees that (1) particular data pods do not violate consistency and that (2) federated transactions are atomic and consistent. The concrete consistency constraints are defined by the particular CDA.

Proof. Every operation, e.g., a commit of a transaction, is recorded in a data pod's digest batch. A digest batch is only considered valid by the protocol if its header is contained in the global chain. Clients observe the global chain for digest headers of the data pods they are connected to. Additionally, they parse all availability wagers and adjust their state if they affect data pods they are connected to.

If there is a header for a digest and a client has not received its payload after tp, they issue an availability wager. The availability wager will then be propagated to the network in at most tp. A valid availability wager extends the current phase by 2 ∗ tp to allow sufficient time for the data pod to propagate its payload.

Once a federated transaction's EXECUTE period finishes successfully, all involved data pods will have acquired locks locally and are awaiting the next period or a FINALIZE message. If they do not receive either message within tp, they issue an availability wager, extending the current phase by 2 ∗ tp as before. If the particular digest does not become available, the transaction is considered aborted; otherwise, the FINALIZE period re-executes.

This concludes the proof for safety.

4.6.2 Liveness

The DataPods protocol guarantees that a user’s data will never get stuck in an un- available data pod and that transactions will be completed after a bounded amount of time.

Proof. Transactions are restricted to a maximum number of stages, which, combined with Lemma 1, ensures that their execution has a finite bound. Thus, if none of the involved data pods fail, the transaction commits or aborts within a bounded amount of time. Further, as shown in the safety proof, if one of the involved data pods becomes unavailable, an availability wager without a response from the particular data pod will cause the transaction to abort, or to partially commit, within a bounded amount of time.

Because users follow updates to their own data, they always have a copy of the most recent validated state of their data. If a data pod itself becomes unavailable, however, no updates to the pod's data are possible anymore. In this case, users initiate a data migration that is guaranteed to finish within a bounded time frame. The length of a data migration is set to the largest possible length of a transaction and may be extended by availability wagers. As the number of availability wagers is finite, the migration itself will finish in finite time.

4.7 Implementation Details

We implemented our data pods prototype in about 10k lines of Rust code. The implementation consists of two dedicated processes: a "proxy" process that multiplexes client and peer connections, and the core data pod logic. The latter executes inside a trusted hardware enclave by leveraging the Fortanix Rust Enclave Development Platform (https://github.com/fortanix/rust-sgx). This separation is necessary because the current SGX environment does not provide means to efficiently handle many network connections, as the required system calls, such as epoll (https://man7.org/linux/man-pages/man7/epoll.7.html), are not supported yet by the Rust EDP.

In our concrete implementation, we limit the digest batch size to at most 10,000 operations to ensure they can be propagated through the network quickly. To reduce latency, data pods will submit a batch to the global ledger once it has reached 5,000 entries or when 100 ms have passed since the last batch was confirmed. Each batch and transaction request is signed by the data pod using a 2048-bit RSA key. Further, the protocol relies on Transport Layer Security (TLS) to establish a secure connection between two data pods and between a data pod and a client.
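The batching policy can be sketched as follows, with the thresholds taken from the text; the surrounding class is illustrative.

import time

MAX_BATCH = 10_000   # hard cap on operations per digest batch
FLUSH_SIZE = 5_000   # submit early once this many entries accumulated
FLUSH_AFTER = 0.1    # seconds since the last confirmed batch

class Batcher:
    def __init__(self):
        self.entries = []
        self.last_confirmed = time.monotonic()

    def add(self, op) -> None:
        assert len(self.entries) < MAX_BATCH
        self.entries.append(op)

    def should_flush(self) -> bool:
        return (len(self.entries) >= FLUSH_SIZE or
                time.monotonic() - self.last_confirmed >= FLUSH_AFTER)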

Application code is written in a scripting language similar to Python. Clients compile scripts to bytecode that encodes an abstract syntax tree. After the bytecode has been transferred to the data pods, it can be invoked by client requests. Here, the data pod's interpreter traverses the syntax tree to execute the application logic.

The prototype also implements basic machine learning functionality by leveraging the TVM library [13]. This allows users to build models using various tools, such as PyTorch or TensorFlow, and store the model in the pod in a concise intermediate representation. Merely porting the TVM runtime, instead of a full ML library, also ensures that the size of the data pod binary remains relatively small. As a result, clients define the ML models in untrusted code, then transfer the compiled model to the data pods, which execute it using the runtime.

4.8 Experimental Evaluation

The evaluation of the DataPods prototype aims to answer two questions. First, we evaluate application performance across a network of data pods to show that the protocol is indeed scalable. Second, we benchmark the latency of the system executing in Intel SGX to demonstrate that the overhead created by trusted execution is reasonable.

4.8.1 Application Benchmark

In our experiments, we connected the data pods to a simulated decentralized global ledger with performance akin to systems like ByzCoin or Thunderella. Here, we limit the throughput of the global ledger to 1,000 transactions per second and set a minimum transaction latency of 100 milliseconds, which is comparable to what current committee-based systems are capable of. We evaluated the system on a local cluster consisting of four client machines, one machine simulating the global ledger, and up to four machines running data pods. Each machine executing a data pod was equipped with an Intel Xeon E3-1200 CPU, 64 GiB of RAM, and a gigabit network connection.

Figure 4.7: Scalability of the Cryptobirds application (throughput in txs/s against the number of data pods).

Figure 4.7 shows that the Cryptobirds application from Section 4.3 scales linearly with the number of data pods. We observed that throughput is mostly CPU-bound, as verifying and generating cryptographic proofs is computationally expensive. As a result, the system scales almost optimally, i.e., two data pods yield almost twice the throughput of a single data pod.

4.8.2 Microbenchmarks

How does the performance of the global ledger affect overall performance?

We evaluated the application performance with four data pods under varying performance of the global ledger. Here, the experimental setup is identical to that of the previous section, but we vary the parameters of the ledger simulation.

Figure 4.8: Effect of the global ledger's latency on the performance of a network of data pods (throughput in txs/s against ledger latency in ms; log scale).

Listing 4.8 shows that data pods perform well while the global ledger’s latency remains below one second. 3000

Figure 4.9: Effect of the global ledger's throughput on the performance of a network of four data pods (total throughput in txs/s versus ledger throughput in tx/s, log scale).

Figure 4.9 demonstrates that data pods perform well even with very low throughput of the global ledger. A global ledger that can only process a hundred transactions per second will still be able to record a sufficient number of digest batches to allow the data pod network to have high throughput. As a result, our evaluation observed no noticeable performance difference between a configuration with a global ledger that supports 10,000 transactions per second and one that only supports 100.

How does trusted computation affect performance?

Figure 4.10: Comparison between trusted and untrusted execution (throughput in tx/s, log scale, with SGX enabled and disabled) for the Cryptobirds application (read/write) and a recommender system (read-only).

Figure 4.10 demonstrates the overhead of executing data pods inside a TEE for the Cryptobirds application as well as a basic recommender system. For the latter, we implemented a machine learning application that lets users send queries containing input data and receive an inferred result from the model. On every query, the data pod constructs the model from a binary representation and runs inference on it. This mechanism could be improved by caching the built model, but this was out of scope for our prototype.

For the Cryptobirds application, we observe about an order of magnitude overhead due to trusted execution. The reason for this is two-fold. First, recent attacks on Intel CPUs, such as Spectre, require mitigations that may degrade performance. We describe these attacks in more detail in Section 4.9.

Second, SGX can address only a small amount of memory efficiently before performance degrades. In essence, in current implementations of Intel's TEE, the so-called Enclave Page Cache (EPC) [20] only provides about 92 MiB to the application. Once data exceeds the size of the EPC, it needs to be sealed and paged to unencrypted memory, which is computationally expensive. As a result, performance significantly degrades for applications that have a large memory footprint, such as Cryptobirds. In comparison, the read-only recommender application, which only needs to keep the ML model in memory, has a much lower overhead. This limitation may be mitigated in future versions of SGX by providing a larger EPC size.

4.9 Discussion and Open Problems

4.9.1 Sybil Attacks

Because each data pod is required to deposit funds upon creation and to pay transaction fees for every batch it submits to the main chain, the protocol is somewhat resistant to Sybil attacks. Nevertheless, an attacker with sufficient funds could build a Sybil network of data pods to change application behavior to their advantage. Here, the assumption that each data pod, even a Sybil, has at least one honest client validating it might not hold. If the data pods execute in a trusted environment, however, Sybil attacks have very little advantage, as the behavior of a Sybil cannot be changed arbitrarily.

For instantiations of the DataPods protocol that do not rely on trusted hardware to be Sybil resistant, we suggest introducing the notion of "bounty hunters". Here, clients looking for bounties randomly walk the network of data pods to look for unresponsive data pod instances or data pods that respond with invalid states. They are incentivized to do so because finding an invalid data pod will grant them a reward, i.e., the pod's deposit.
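A minimal sketch of such a walk is given below; the pod interface (peers, state_proof) and the verify predicate are hypothetical stand-ins for the actual protocol messages.

    import random

    def hunt(start_pod, verify, steps=100):
        pod = start_pod
        for _ in range(steps):
            proof = pod.state_proof()
            if proof is None or not verify(pod, proof):
                return pod       # unresponsive or invalid: claim the deposit
            pod = random.choice(pod.peers())   # continue the random walk
        return None              # no misbehavior observed on this walk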

Existing mechanisms for detecting Sybils in a graph should be applicable to the data pods architecture as well. We reserve more sophisticated mechanisms for detecting Sybils for future work.

4.9.2 Global Objects

While most application data can be associated with a particular user, some may not. Often, non-user data needs to be accessible globally by all parts of the application. Indexes on application data are one example of global state that needs to be highly available and replicated.

We envision an addition to the DataPods architecture, where the global chain maintains some application data and participates in federated transactions to update these global data items. Here, a particular subfunction might not execute on a single data pod, but on the global network instead.

4.9.3 Software-Based Attacks on TEEs

Side-channel attacks, which observe the application's behavior through non-standard communication channels, such as its CPU or cache usage, are of constant interest in the security community. Thus, several papers have addressed how the confidentiality of trusted hardware enclaves can be broken using such attacks [90]. Most of these attacks benefit from the fact that weak cryptographic code, e.g., where application secrets modify the control flow, is executed inside the enclave. While protecting data pods from side-channel attacks is beyond the scope of this work, all cryptographic code in the enclave is implemented using constant-time libraries. Still, we expect future implementations of data pods will need to be amended as other such side-channel attacks are discovered.

Further, it has been shown that speculative execution and caching on Intel CPUs can be exploited to leak private information of processes and even SGX enclaves [10, 47, 98]. Some of these attacks can be mitigated by upgrading the microcode of Intel CPUs, while others require disabling certain features such as HyperThreading. This means mitigating such attacks comes with a noticeable amount of performance loss. However, a data pod is still orders of magnitude faster than other blockchain systems, which means it performs well compared to other systems even with the mitigations applied. We further assume the usage of open-source TEEs, such as Sanctum [56] or Keystone [57], will help to avoid similar attacks, as a large developer community can vet the hardware implementation and microcode of the processor.

4.9.4 Handling Offline Clients

Our current designs assume that clients continuously follow and verify changes to their data, which might not be realistic. In a mobile environment, a client might be offline for several hours, which is currently not covered by our failure model. Similarly, clients might not have the computational power to verify every operation relevant to them.

Previous work has investigated watchtowers [68], which provide verification of a potentially Byzantine actor as a service. Similar to bounty hunters, watchtowers are financially incentivized to find misbehavior of a data pod. In our setting, they would additionally cache messages and relay them to the client, so that the client has access to all witnesses.

Similarly, user data might be of interest even if the user is not available anymore. For this, mechanisms must be investigated that make user data recoverable without reliance on the user's private key, for example by replicating the data.

4.10 Chapter Summary

This chapter presented DataPods, a novel database architecture enabling scalable, Byzantine fault-tolerant applications. Combined with trusted hardware, data pods enable computing on confidential user data in a trustless setting. This makes them a suitable architecture for building real-world decentralized applications. Our experiments show that this architecture scales well with the number of nodes involved and that performance is mostly independent of that of the global ledger.

CHAPTER 5

Related Work

5.1 Consensus Protocols and State Machine Replication

Each distributed protocol is built against a particular failure model that defines the types of faults to expect. The three most prominent of these are fail-stop, crash-failure, and Byzantine. Fail-stop describes a model where nodes can only fail by terminating, and this termination is visible to all other participants. Crash-failure extends this to a model where failures cannot directly be observed by other participants. Instead, protocols need to rely on timeout mechanisms to infer that a node might have failed. Finally, the Byzantine failure model is the most aggressive model, where nodes might not only fail, but can actively lie, send conflicting messages, or delay communication arbitrarily.

Chain replication [97] and primary-backup [3] are the two most prominent protocols for the fail-stop model. Chain replication, as the name indicates, arranges nodes as a chain. Updates are sent to the head and propagated through the chain to the tail. When reading from the tail, one always observes a state that is replicated across the entire chain. Primary-backup, on the other hand, has one designated primary copy that is directly connected to all backups. Writes are sent to the primary, which then forwards them to the backups. In both setups, failures are detectable by all parties due to the fail-stop assumptions, which allows for easy reconfiguration. In chain replication, the faulty node can simply be removed from the chain. If the primary of the primary-backup scheme fails, one of the backups becomes the new primary by some pre-defined scheme.
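The following sketch illustrates the write and read paths of chain replication under fail-stop assumptions; failure detection and reconfiguration are omitted, and the interface is illustrative.

    class ChainNode:
        def __init__(self, successor=None):
            self.store = {}
            self.successor = successor   # None marks the tail

        def write(self, key, value):
            self.store[key] = value
            if self.successor is not None:   # propagate toward the tail
                self.successor.write(key, value)

        def read(self, key):
            # Only the tail serves reads, so any value returned is
            # guaranteed to be replicated across the entire chain.
            assert self.successor is None, "reads must go to the tail"
            return self.store.get(key)

    # tail = ChainNode(); head = ChainNode(ChainNode(tail))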

The Paxos protocol [54] is one of the most prominent mechanisms for SMR in the crash-failure setting. The protocol first elects a leader. Then, once a leader is elected, it starts proposing state transitions to its followers, which each forward their response to the network. If the majority of nodes accept a proposal, it is considered accepted by the system as a whole and the result is forwarded to the client.

Practical Byzantine Fault-Tolerance (PBFT) [12] was one of the first Byzantine fault-tolerant consensus protocols and is still widely used today. PBFT relies on leader election, similar to Paxos, but introduces an additional communication round to account for potentially conflicting messages sent to different nodes. Several improvements to PBFT have been proposed over the years. For example, Zyzzyva [51] avoids the third round of messages in the absence of failures using speculative execution. Aardvark [15] adds additional robustness by making clients digitally sign their requests and by frequently rotating leadership. HotStuff [105] reduces message complexity during leader election, among other modifications to the protocol.

5.2 Accountability for Distributed and Decentralized Systems

5.2.1 Audit Mechanisms

Fraud proofs were first introduced in work auditing centralized services [106, 59, 67], such as a filesystem. Here, the centralized system provides an auditable log of modifications to its clients. Clients communicate through channels hidden from the centralized service to exchange the log data they receive from the service and detect discrepancies in the log. Unlike blockchains, these mechanisms do not allow recovery from Byzantine failures by themselves, but instead just provide means to detect such failures. The underlying assumption is that rational operators of a service will not perform fraudulent behavior knowing they will be caught.

Recent work, such as BlockchainDB [29], Arbitrum [42], and FalconDB [77], has extended this scheme to include replication and failure recovery coordinated by a global blockchain. In FalconDB, clients do not have to maintain the entire database state but instead interact with a database that is replicated across multiple untrusted servers. They then merely verify its authenticated data structure against the global blockchain. Data can be recovered in case of failures assuming at least one replica of the database is behaving correctly. Arbitrum employs a mechanism similar to FalconDB's. However, here the correct replica(s) detect failures and report them to the main chain. BlockchainDB requires full trust of a client in the particular database instance it is connected to but enables sharding data.

5.2.2 Encrypted Databases

If policy enforcement is not a requirement, i.e., users trust each other, operating on encrypted data might be sufficient to achieve confidentiality. Maheshwari et al. [62] presented one of the first encrypted databases. Their system stores hashes of the encrypted data in a small trusted hardware module to protect from tampering.

CryptDB [81] and Monomi [96] rely on homomorphic encryption of data. CryptDB does not encrypt all data and only supports a subset of the SQL language to make such a scheme efficient. TrustedDB [7] and Cipherbase [5] overcome this limitation by running queries on encrypted data using a trusted hardware module. Obladi [23] provides oblivious serializable transactions. All of these systems, to our knowledge, assume that clients trust each other. In contrast, the policy enforcement and accountability features in DataPods are designed with multiple distrusting clients in mind.

EnclaveDB [82], ShieldStore [45], and PESOS [52] provide secure storage using Intel SGX instead of dedicated secure hardware, yielding better performance. While a promising first step, to our knowledge, none of these systems support federation of database nodes or timeline inspection. PESOS is a low-level object storage system yielding high throughput by relying on trusted storage technologies, a mechanism DataPods could leverage as well.

5.2.3 Trusted Execution and Decentralized Ledgers

Trusted execution has been studied by previous work in the context of decentralized ledgers. Zhang et al. [107] show how to build authenticated data feeds (or "oracles") using secure hardware. Here, a blockchain relies on a trusted execution environment to query regular websites using HTTPS. This allows smart contracts to reason about data that originates outside the blockchain, such as the weather or election results.

Ekiden [14] and the Oasis chain [83] were the first to enable confidential execution of smart contracts. Here, the blockchain merely agrees on an ordering of transactions, and dedicated compute nodes then execute the transactions inside a TEE to ensure confidentiality. Teechain [60] enables scalable payment channels using TEEs. Instead of time-bound challenge periods, as used in the Lightning Network and also in BitWeave and DataPods, Teechain relies on chain replication to prevent rollback attacks.

5.3 Scaling Decentralized Systems

5.3.1 Minor Changes to Nakamoto Consensus

Concurrency Inside Blocks

Bitcoin, Ethereum, and also BitWeave in its current form, enforce a serial order of operations within a single block. This limits execution of a block to a single logical core. TTOR (Topological Transaction Ordering Rule), which was used for some time in Bitcoin Cash, loosens this requirement and only enforces a partial order among transactions, thus enabling concurrent validation of the transactions within a single block.

However, in our experiments, we observed that the bulk of the work performed by blockchain nodes consists of validating and generating digital signatures, which can already be performed concurrently, as transaction requests are usually received and generated out of band. Thus, we do not expect that TTOR would create a significant performance benefit when applied to BitWeave. Bitcoin Cash eventually switched from TTOR to another ordering mechanism, as its benefits to block propagation were limited [86].

Block Size and Frequency

As mentioned in Section 1.3.5, one attempt to scale blockchains is to increase the block size, allowing more transactions to be processed with the same number of blocks. The propagation delay of a block depends on its size and the performance of the underlying peer-to-peer network. In particular, a larger block takes longer to propagate between two peers, as outlined in the equation below.

\[ \mathrm{Latency}_{\mathrm{block}} = \mathrm{Latency}_{\mathrm{network}} + \frac{\mathrm{BlockSize}}{\mathrm{Throughput}_{\mathrm{network}}} \]

Block sizes that are too large might take longer to propagate than it takes to mine the next block [22, 40]. As a result, larger block sizes result in a higher likelihood of blockchain forks, which hurt performance. Once a fork is resolved, only one of the branches is considered part of the chain, and the rest is discarded.
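As a worked example with made-up numbers, a 1 MiB block sent over a 10 Mbit/s path with 100 ms of base latency takes roughly a second per hop:

    block_size = 1 * 1024 * 1024 * 8   # block size in bits (1 MiB)
    throughput = 10 * 1000 * 1000      # network throughput in bits per second
    latency_network = 0.1              # base network latency in seconds

    latency_block = latency_network + block_size / throughput
    print(latency_block)               # ~0.94 s; grows linearly with block size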

Another intuitive attempt at increasing the throughput of a blockchain system is to increase the frequency of blocks. Similar to block sizes, a higher block frequency results in an increased chance of forks, as blocks are created faster than they can be propagated through the network. This, in turn, also leads to more centralization of mining, as large-scale mining pools have a higher chance of receiving and processing blocks in time.

Ethereum relies on a mechanism called Greedy Heaviest Observed Subtree (GHOST) [93] to disincentivize centralization. Here, a miner that is aware of a fork will reference not only a block's direct predecessor but also the heads of competing chains (known as "uncle blocks"). As a result, miners receive a partial reward if they ended up mining on a fork.

Increasing Efficiency of Block Propagation

The nature of peer-to-peer protocols results in significant communication overheads when propagating data. Gossip communication inherently requires additional communication because data does not flow along a straight path but spreads in multiple directions through a peer-to-peer network. As a result, peers might receive the same messages from multiple parties. This problem is exacerbated in Bitcoin, as transactions are propagated through the network twice: as a transaction request and as part of a block.

One line of work is to improve the efficiency of gossip protocols. Compact blocks [17] do not contain a full list of transactions as their payload, but merely shortened transaction identifiers. Upon receiving a compact block, peers only request the transactions they have not seen yet. We use this mechanism in BitWeave to reduce network overhead.

Bloom filters can be used to efficiently keep track of which data a peer has already received [75]. Essentially, a Bloom filter is a lightweight data structure (usually only a few bytes) that provides a heuristic for whether a set contains some data item or not. When forwarding compact blocks, peers rely on Bloom filters to estimate which transactions the remote party already holds and forward only the ones that it probably does not have yet. This, in turn, avoids an additional round trip in which the remote party has to request transactions.
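The following is a minimal Bloom filter sketch; the parameters and double-hashing scheme are illustrative and do not correspond to Bitcoin's wire format.

    import hashlib

    class BloomFilter:
        def __init__(self, m=8192, k=4):
            self.bits = bytearray(m // 8)   # m bits of state
            self.m, self.k = m, k           # k probe positions per item

        def _positions(self, item):
            digest = hashlib.sha256(item).digest()
            a = int.from_bytes(digest[:8], "little")
            b = int.from_bytes(digest[8:16], "little")
            return [(a + i * b) % self.m for i in range(self.k)]

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def maybe_contains(self, item):
            # False positives are possible; false negatives are not.
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(item))

A peer would then forward only the transactions whose identifiers the remote party's filter probably does not contain, e.g., [tx for tx in block if not remote.maybe_contains(tx)].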

Another line of work is to augment peer-to-peer networks with a fast relay network. Relay networks are usually not a good fit for the decentralized ledger setting by themselves, as they have sparse topologies and, thus, contain multiple points of failure. However, they can be used in addition to a fault-tolerant peer-to-peer network to allow for faster propagation of blocks in the common case [18, 46].

5.3.2 Sharding Decentralized Systems

Several sharding solutions have been proposed for permissionless and permissioned blockchain systems that build on previous work on sharding databases and distributed transactions.

Sharding Databases  The concepts of sharding and distributed transactions have been widely studied with regard to conventional database systems. Sharding was first popularized by systems like Chord [94] and Mercury [9]. Later work introduced systems, such as Chubby [11], which provide serializable transactions on top of sharded systems. More recent work aims to improve performance by reducing coordination [21, 72] or relying on loosely synchronized clocks [19].

Distributed Transaction Protocols  The most prominent mechanism for applying transactions in a consistent and atomic fashion to multiple shards is two-phase commit [8]. In the first phase, a transaction locks all relevant data objects, ensuring that no concurrent updates are made. In the second phase, the transaction applies all changes and releases the locks.

Two-phase commit comes in two variants. First, conventional (or pessimistic) two-phase commit acquires locks gradually while executing a transaction. If a lock is already held by another transaction, some mechanism such as wound-wait must be in place to avoid deadlocks. Optimistic concurrency control (OCC) [53], on the other hand, first executes transactions without holding locks, then submits the transactions as a set of operations to the involved servers in the first phase of the protocol. The main advantage of OCC is that it keeps the time a lock is held short, allowing for higher concurrency. Pessimistic concurrency control usually works better in update-heavy workloads and in settings where latencies are high.
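The control flow of a two-phase commit coordinator can be sketched as follows; the shard interface (prepare, commit, abort) is a stand-in, and failure handling is omitted for brevity.

    def two_phase_commit(shards, txn):
        # Phase 1: each shard locks the data the transaction touches and votes.
        prepared = []
        for shard in shards:
            if shard.prepare(txn):
                prepared.append(shard)
            else:
                for s in prepared:   # any "no" vote aborts the transaction
                    s.abort(txn)
                return False
        # Phase 2: all shards voted yes; apply changes and release locks.
        for shard in shards:
            shard.commit(txn)
        return True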

Sharding Protocols for Decentralized Ledgers

Monoxide [100] is an approach that has some similarities with BitWeave. Monoxide breaks up the workload across independent consensus zones, each having its own set of miners. Unlike BitWeave, Monoxide does not support generalized transactions, but only money transfers between exactly two accounts. For a cross-zone transaction, the transaction is first processed in the source account's zone and then forwarded to the target account's zone together with a Merkle proof of the transaction's inclusion. At some point, the transaction will be included in the source and the target zone; however, the protocol does not provide an upper time-bound for this. In BitWeave, the confirmation period provides a time-bound for when a transaction is expected to be confirmed. Furthermore, the transaction processing scheme proposed in Monoxide is susceptible to recursive invalidation of dependent transactions in the case of zone-forks. In BitWeave, keyblocks create consistent cuts which, in combination with the validation delays, prevent such issues. Another challenge with Monoxide's design is that its independent zones naturally partition the mining power of the blockchain system, which dilutes the overall security of the system. The authors address this by assuming the majority of miners will work on all zones at the same time, which requires miners to possess large amounts of processing power for verification to maintain the same security guarantees as Bitcoin. This encourages mining centralization for high throughput, giving up a key property of blockchains.

Elastico [61], OmniLedger [49], and Chainspace [2] are in a similar class of scalability solutions that propose dividing the nodes in a system into small committees, each of which performs a Byzantine consensus protocol for intra-shard consensus. The Elastico protocol, the first of such solutions, proceeds in the following fashion: protection against Sybils is achieved using an identity chain based on Proof-of-Work. It then pseudo-randomly assigns nodes to committees that perform PBFT in rounds until all the nodes in the system agree on a final changeset to be committed. The protocol then re-assigns committees and restarts the process for the next set of transactions. Chainspace assumes a permissioned system and does not discuss shard committee selection. OmniLedger makes further improvements on top of Elastico, such as using RandHound to better seed randomness in shard assignments, which ameliorates some security compromises introduced by Elastico's small committee sizes. However, OmniLedger still adds several layers of complexity to public blockchains. This complexity is especially salient when examining the need for OmniLedger to have day-long epochs because of the amount of overhead required for bootstrapping at the beginning of an epoch, which makes it susceptible to quick-responding attackers.

Zilliqa [95] shards transactions, but not state. This protocol relies on a similar mechanism as OmniLedger for assigning nodes to shards, but uses a different cross-shard commit protocol. Instead of splitting the state of the system across shards, it only splits the transaction workload and replicates state among all nodes. Each shard then processes a subset of all transactions for a specific epoch and merges its resulting state with other shards at certain checkpoints. At a high level, the protocol allows a particular shard to lock parts of the state to prevent concurrent modification of the same data entries. Zilliqa employs a dataflow-based programming model to implement this scheme efficiently.

Ethereum 2.0 [35] introduces a sharding scheme among other major changes to the protocol. This mechanism borrows ideas from both off-chain mechanisms and randomness-based protocols like OmniLedger. Here, nodes participate in the protocol by putting down a deposit. A verifiable random function then assigns each node to a particular shard. In addition to the random assignment, the protocol ensures safety and liveness by punishing nodes that sign an invalid block or respond too slowly. At the time of writing this thesis, the Ethereum developers have not yet decided on a protocol for cross-shard transactions, and it is unclear whether the protocol will support serializable cross-shard transactions.

122 5.3.3 Scaling On-Disk Storage

As described in Section 2.2, every participant of a DLT protocol usually maintains the local state in an authenticated data structure, but existing storage engines provide no notion of authenticating data. As a result, systems like Ethereum store unauthenticated data in a key-value store and then store the Merkle tree structure as separate objects in that key-value store as well. Note that updating Merkle trees, and variants such as Patricia trees, usually requires updating multiple nodes of the tree. So while existing improvements to key-value stores [84, 58] can provide some performance improvement, those gains may be limited due to the uniqueness of a DLT system's workload.
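The sketch below illustrates why a single update touches multiple entries when a binary Merkle tree is stored in a key-value store: every node on the leaf-to-root path must be rehashed and rewritten. The encoding of tree positions as bit strings is an illustrative simplification.

    import hashlib

    # kv maps a node's position (a bit string; "" is the root) to its hash.
    # Sibling nodes are assumed to be present in kv already.
    def update_leaf(kv, path, leaf_value):
        node_hash = hashlib.sha256(leaf_value).digest()
        kv[path] = node_hash
        while path:                       # walk up toward the root
            parent, bit = path[:-1], path[-1]
            sibling = kv[parent + ("1" if bit == "0" else "0")]
            pair = node_hash + sibling if bit == "0" else sibling + node_hash
            node_hash = hashlib.sha256(pair).digest()
            kv[parent] = node_hash        # one store write per tree level
            path = parent
        return node_hash                  # the new Merkle root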

Raju et al. [85] propose a variant of LevelDB that maintains multiple copies of parts of the Merkle tree. The system does not store the Merkle root, to allow concurrently updating the authenticated data structure. Reads then retrieve the stored subtree and recompute the root before returning a result.

5.3.4 Federated Chains

Federated chains attempt to scale blockchains by allowing multiple separate chains to work together through a global relay chain or cross-chain swaps. Polkadot [104] is one such mechanism; it provides separate blockchains with a relay chain that allows for cross-chain transactions. However, here all cross-chain transactions have to be processed by the global relay chain. In essence, the global relay chain keeps a queue of all inbound and outbound messages for each subchain to guarantee delivery. Depending on how many cross-chain transactions there are, this can constitute a bottleneck of the system.

Cross-chain swaps [41] allow exchanging cryptocurrencies between two separate blockchains without the involvement of a third party. Here, funds are locked on both chains for a certain amount of time. If a chain is provided with proof that the funds on the other chain are locked as well, it considers the transaction successful; otherwise it aborts and releases the funds on timeout. This, again, assumes a strong bound on network latency.
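The sketch below captures the hash time-locked contract (HTLC) logic underlying such swaps; the class is illustrative, and real deployments express this logic in each chain's script or contract language.

    import hashlib, time

    class HTLC:
        def __init__(self, hashlock, recipient, sender, timeout_s):
            self.hashlock = hashlock              # sha256 hash of a secret
            self.recipient, self.sender = recipient, sender
            self.deadline = time.time() + timeout_s
            self.settled = False

        def claim(self, caller, preimage):
            # Revealing the preimage on one chain lets the counterparty
            # claim on the other chain using the same secret.
            assert caller == self.recipient and not self.settled
            assert hashlib.sha256(preimage).digest() == self.hashlock
            self.settled = True                   # funds released to recipient

        def refund(self, caller):
            assert caller == self.sender and not self.settled
            assert time.time() > self.deadline    # only after the timeout
            self.settled = True                   # funds returned to sender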

5.3.5 Sidechains and Off-Chain Mechanisms

Yet another category of scalability solutions for permissionless chains is that of side-chains, which are solutions layered on top of existing blockchain systems. In general, these solutions lock funds on some existing system and facilitate fast transactions between parties through an off-chain protocol. Only the amount locked on the base chain is allowed to be exchanged in these systems, and a tally of balances is kept for when it is time to settle. On settling, the amount apportioned to the settler, as denoted by her balance in the sidechain, is unlocked on the mainchain and returned to the settler. Plasma [79] is one notable example of a side-chain that can be anchored onto Ethereum. Payment and state channels [80, 60, 70] build networks of peer-to-peer relationships to process certain operations without the involvement of the main blockchain.

These approaches are similar to the protocols presented in this thesis at first sight, but differ significantly in functionality and safety. First, BitWeave and DataPods are holistic protocols that ensure the main chain is always able to process fraud proofs. Most side-chain protocols, on the other hand, merely assume the main chain will always be able to process fraud proofs, which might not be true during high contention. Second, BitWeave and DataPods natively support smart contracts. While state channels do support smart contracts, they are currently very limited in programmability and require knowing all participants of an off-chain contract at setup.

5.4 Emerging Blockchain Protocols

Proof of Stake (PoS) is a mechanism intended to be an energy-efficient replacement for PoW. At a high level, voting power here is not dependent on a party’s processing capabilities but on the amount of funds they hold in the cryptocurrency, which, in turn, allows avoiding unnecessary computation. The key challenge in PoS is the “nothing at stake” problem: if block generation does not require mining, an adversary can easily generate many, potentially conflicting blocks.

Ouroboros [44] is a provably-correct PoS protocol. Here, the protocol lifetime is divided into constant-size epochs, each consisting of some number of slots. At the beginning of an epoch, a seed is generated from the values of the previous epoch to produce a pseudo-random assignment of participants to slots, where the likelihood of being assigned to a slot is directly proportional to the amount of currency a participant holds. In every slot, the selected participant is allowed to propose a block containing a set of transactions. Although an adversary here can still generate conflicting blocks, assuming the majority of the participants are honest, a sequence of correct blocks will eventually constitute the longest chain. However, in this scheme, it is still possible to anticipate who the next leader will be and launch a Denial-of-Service (DoS) attack on them. Like Bitcoin, this scheme requires blocks to propagate within a bounded amount of time and, additionally, loosely synchronized clocks.
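A sketch of such stake-proportional slot assignment is given below; the seed derivation and stake table are illustrative simplifications of the actual leader election.

    import hashlib
    import random

    def assign_slots(stakes, epoch_seed, num_slots):
        # stakes: participant -> amount of currency held; the probability of
        # leading a slot is proportional to the participant's stake.
        rng = random.Random(hashlib.sha256(epoch_seed).digest())
        participants = list(stakes)
        weights = [stakes[p] for p in participants]
        return [rng.choices(participants, weights=weights)[0]
                for _ in range(num_slots)]

    # assign_slots({"alice": 60, "bob": 40}, b"epoch-7", num_slots=10)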

Algorand [38] addresses some challenges in PoS using a verifiable random function (VRF). Here, each participant locally runs the VRF, which takes some global data and their private key as input. Depending on the input, the function may return a certificate that the particular user is allowed to propose a block. Like in Bitcoin, multiple users may be allowed to propose blocks, and Algorand provides a mechanism to sort certificates of concurrent blocks. Protocol participants then discard all blocks except the one with the highest priority to prevent forks. Algorand prevents DoS attacks by making this function unpredictable (because it takes the user's private key as an input) and by switching participants after every round of voting. The protocol assumes the absence of network partitions to prevent malicious users from successfully proposing conflicting blocks.

Avalanche [87] is a probabilistic leaderless consensus mechanism with low communication overhead. Here, nodes periodically query a constant-size random set of peers about which transactions they accepted. Depending on their peers' responses, they adjust their confidence in the transaction being accepted by the network as a whole. Acceptance of a particular transaction will then eventually propagate through the network. Avalanche works well with the UTXO model, as transactions do not need to be totally ordered and conflicting transactions are rare. The protocol needs to be combined with another mechanism, such as PoS, to ensure Sybil-resistance.

CHAPTER 6

Conclusion

Building scalable and secure decentralized applications requires entirely new protocols. Decentralized architectures are here to stay, as they provide long-desired features, such as high failure resilience and user-centric governance. However, current architectures are not yet sufficient to fulfill this vision. The goal of a fully decentralized infrastructure will not be achieved by a single mechanism but will require a new ecosystem of protocols that work together and complement each other.

This thesis presented two such protocols that help to close the performance and functionality gap between current decentralized applications and conventional distributed approaches. BitWeave is a scalable public blockchain system that can serve as a drop-in replacement for Bitcoin and Ethereum. It vastly outperforms existing solutions through the use of sharding and audit mechanisms.

The DataPods protocol allows service providers to host trustless databases that are backed by a decentralized ledger. Applications can operate across data pods using federated transactions, and users remain in full control over their data by being able to migrate it at any point in time. Additionally, the use of trusted execution environments allows protecting sensitive user data from unwanted access. The data pods protocol is compatible with existing DLT protocols such as BitWeave.

Future work needs to address how to reduce the end-to-end latency of these protocols, how they can be adapted to work with open-source TEEs, and how to provide more efficient mechanisms to guarantee availability.

BIBLIOGRAPHY

[1] Atul Adya. Weak consistency: a generalized theory and optimistic implementations for distributed transactions. PhD thesis, Massachusetts Institute of Technology, 1999.

[2] Mustafa Al-Bassam, Alberto Sonnino, Shehar Bano, Dave Hrycyszyn, and George Danezis. Chainspace: A sharded smart contracts platform. arXiv preprint arXiv:1708.03778, 2017.

[3] Peter Alsberg and J. D. Day. A Principle for Resilient Sharing of Distributed Resources. Proceedings of the 2nd International Conference on Software Engineering, pages 562–570, San Francisco, California, October 1976.

[4] Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin. Fraud Proofs: Maximising Light Client Security and Scaling Blockchains with Dishonest Majorities. CoRR, abs/1809.09044, 2018.

[5] Arvind Arasu, Spyros Blanas, Ken Eguro, Raghav Kaushik, Donald Kossmann, Ravishankar Ramamurthy, and Ramarathnam Venkatesan. Orthogonal Security with Cipherbase. CIDR, 2013.

[6] Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Coordination Avoidance in Database Systems. Proceedings of the VLDB Endowment, 8(3):185-196, 2014.

[7] Sumeet Bajaj and Radu Sion. TrustedDB: A Trusted Hardware based Outsourced Database Engine. Proceedings of the VLDB Endowment, 4(12):1359-1362, 2011.

[8] Philip A. Bernstein and Nathan Goodman. Concurrency control in distributed database systems. ACM Computing Surveys (CSUR), 13(2):185–221, 1981.

[9] Ashwin R. Bharambe, Mukesh Agrawal, and Srinivasan Seshan. Mercury: supporting scalable multi-attribute range queries. SIGCOMM Conference, pages 353-366, Portland, Oregon, August 2004.

[10] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. USENIX Security Symposium, pages 991-1008, Baltimore, Maryland, August 2018.

[11] Michael Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. Symposium on Operating System Design and Implementation, pages 335-350, Seattle, Washington, November 2006.

[12] Miguel Castro and Barbara Liskov. Practical Byzantine Fault Tolerance. Symposium on Operating System Design and Implementation, pages 173-186, New Orleans, Louisiana, February 1999.

[13] Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Q. Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. Symposium on Operating System Design and Implementation, pages 578-594, Carlsbad, California, October 2018.

[14] Raymond Cheng, Fan Zhang, Jernej Kos, Warren He, Nicholas Hynes, Noah M. Johnson, Ari Juels, Andrew Miller, and Dawn Song. Ekiden: A Platform for Confidentiality-Preserving, Trustworthy, and Performant Smart Contracts. European Symposium on Security and Privacy, pages 185-200, Stockholm, Sweden, June 2019.

[15] Allen Clement, Edmund L. Wong, Lorenzo Alvisi, Michael Dahlin, and Mirco Marchetti. Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults. Symposium on Networked System Design and Implementation, pages 153-168, Boston, Massachusetts, April 2009.

[16] The World Wide Web Consortium. ActivityPub Specification. https://github.com/w3c/activitypub (Retrieved May 2020).

[17] Matt Corallo. Compact Block Relay. https://github.com/TheBlueMatt/bips/blob/master/bip-0152.mediawiki (Retrieved June 2020).

[18] Matt Corallo. The Fast Internet Bitcoin Relay Engine (FIBRE). http://www.bitcoinfibre.org/ (Retrieved June 2020).

[19] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson C. Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google's Globally Distributed Database. ACM Transactions on Computer Systems, 31(3):8-1, 2013.

[20] Victor Costan and Srinivas Devadas. Intel SGX Explained. IACR Cryptology ePrint Archive, 2016.

[21] James Cowling and Barbara Liskov. Granola: low-overhead distributed transaction coordination. USENIX Annual Technical Conference, 2012.

[22] Kyle Croman, Christian Decker, Ittay Eyal, Adem Efe Gencer, Ari Juels, Ahmed E. Kosba, Andrew Miller, Prateek Saxena, Elaine Shi, Emin Gün Sirer, Dawn Song, and Roger Wattenhofer. On Scaling Decentralized Blockchains - (A Position Paper). Financial Cryptography and Data Security, pages 106-125, Christ Church, Barbados, February 2016.

[23] Natacha Crooks, Matthew Burke, Ethan Cecchetti, Sitar Harel, Rachit Agarwal, and Lorenzo Alvisi. Obladi: Oblivious Serializable Transactions in the Cloud. Symposium on Operating System Design and Implementation, pages 727-743, Carlsbad, California, October 2018.

[24] Alan J. Demers, Daniel H. Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard E. Sturgis, Daniel C. Swinehart, and Douglas B. Terry. Epidemic Algorithms for Replicated Database Maintenance. ACM Symposium on Principles of Distributed Computing, pages 1-12, Vancouver, Canada, August 1987.

[25] Cynthia Dwork, Nancy A. Lynch, and Larry J. Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288-323, 1988.

[26] Cynthia Dwork and Moni Naor. Pricing via Processing or Combatting Junk Mail. Annual International Cryptology Conference, pages 139-147, Santa Barbara, California, August 1992.

[27] Stefan Dziembowski, Lisa Eckey, Sebastian Faust, and Daniel Malinowski. PERUN: Virtual Payment Channels over Cryptographic Currencies. IACR Cryptology ePrint Archive, 2017.

[28] Ariel Ekblaw, Asaph Azaria, John D. Halamka, and Andrew Lippman. A Case Study for Blockchain in Healthcare: "MedRec" prototype for electronic health records and medical research data. Proceedings of IEEE Open & Big Data Conference, 2016.

[29] Muhammad El-Hindi, Carsten Binnig, Arvind Arasu, Donald Kossmann, and Ravi Ramamurthy. BlockchainDB - A Shared Database on Blockchains. Proceedings of the VLDB Endowment, 12(11):1597-1609, 2019.

[30] Etherscan. Ethereum Average Gas Limit Chart. https://etherscan.io/chart/gaslimit (Retrieved June 2020).

[31] Ittay Eyal, Adem Efe Gencer, Emin Gün Sirer, and Robbert van Renesse. Bitcoin-NG: A Scalable Blockchain Protocol. Symposium on Networked System Design and Implementation, pages 45-59, Santa Clara, California, March 2016.

[32] Jose M. Faleiro, Daniel Abadi, and Joseph M. Hellerstein. High Performance Transactions via Early Write Visibility. Proceedings of the VLDB Endowment, 10(5):613-624, 2017.

[33] Internet Engineering Task Force. Extensible Messaging and Presence Protocol (XMPP): Core. Technical Report RFC 6120, 2011.

[34] The Ethereum Foundation. ERC20 Token Standard. https://theethereum.wiki/w/index.php/ERC20_Token_Standard, 2018.

[35] The Ethereum Foundation. Ethereum 2.0 Specifications. https://github.com/ethereum/eth2.0-specs (Retrieved August 2020).

[36] The Matrix.org Foundation. Matrix Protocol Specification. https://matrix.org/docs/spec/ (Retrieved July 2020).

[37] Adem Efe Gencer, Soumya Basu, Ittay Eyal, Robbert van Renesse, and Emin Gün Sirer. Decentralization in Bitcoin and Ethereum Networks. Financial Cryptography and Data Security, pages 439-457, Porta Blancu, Curaçao, February 2018.

[38] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling Byzantine Agreements for Cryptocurrencies. Symposium on Operating Systems Principles, pages 51-68, Shanghai, China, October 2017.

[39] Network Working Group. Simple Mail Transfer Protocol. Technical Report RFC 5321, 2008.

[40] Johannes Göbel and Anthony E. Krzesinski. Increased block size and Bitcoin blockchain dynamics. 27th International Telecommunication Networks and Applications Conference, ITNAC 2017, Melbourne, Australia, November 22-24, 2017, pages 1–6, 2017.

[41] Maurice Herlihy. Atomic Cross-Chain Swaps. ACM Symposium on Principles of Distributed Computing, pages 245-254, London, United Kingdom, July 2018.

[42] Harry A. Kalodner, Steven Goldfeder, Xiaoqi Chen, S. Matthew Weinberg, and Edward W. Felten. Arbitrum: Scalable, private smart contracts. USENIX Security Symposium, pages 1353-1370, Baltimore, Maryland, August 2018.

[43] David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, and Rina Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. STOC, pages 654–663, 1997.

[44] Aggelos Kiayias, Alexander Russell, Bernardo David, and Roman Oliynykov. Ouroboros: A Provably Secure Proof-of-Stake Blockchain Protocol. Annual International Cryptology Conference, pages 357-388, Santa Barbara, California, August 2017.

[45] Taehoon Kim, Joongun Park, Jaewook Woo, Seungheun Jeon, and Jaehyuk Huh. ShieldStore: Shielded In-memory Key-value Storage with SGX. European Conference on Computer Systems, pages 14-1, Dresden, Germany, March 2019.

[46] Uri Klarman, Soumya Basu, Aleksandar Kuzmanovic, and Emin Gün Sirer. bloXroute: A Scalable Trustless Blockchain Distribution Network. bloXroute white paper, 2018.

[47] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre Attacks: Exploiting Speculative Execution. IEEE Symposium on Security and Privacy, pages 1-19, San Francisco, California, May 2019.

[48] Eleftherios Kokoris-Kogias, Philipp Jovanovic, Nicolas Gailly, Ismail Khoffi, Linus Gasser, and Bryan Ford. Enhancing Bitcoin Security and Performance with Strong Consistency via Collective Signing. USENIX Security Symposium, pages 279-296, Austin, Texas, August 2016.

[49] Eleftherios Kokoris-Kogias, Philipp Jovanovic, Linus Gasser, Nicolas Gailly, Ewa Syta, and Bryan Ford. OmniLedger: A Secure, Scale-Out, Decentralized Ledger via Sharding. IEEE Symposium on Security and Privacy, pages 583-598, San Francisco, California, May 2018.

[50] Jakub Konečný, H. Brendan McMahan, Daniel Ramage, and Peter Richtárik. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint 1610.02527, 2016.

[51] Ramakrishna Kotla, Lorenzo Alvisi, Michael Dahlin, Allen Clement, and Edmund L. Wong. Zyzzyva: speculative byzantine fault tolerance. Symposium on Operating Systems Principles, pages 45-58, Stevenson, Washington, October 2007.

[52] Robert Krahn, Bohdan Trach, Anjo Vahldiek-Oberwagner, Thomas Knauth, Pramod Bhatotia, and Christof Fetzer. Pesos: policy enhanced secure object store. European Conference on Computer Systems, pages 25-1, Porto, Portugal, April 2018.

[53] H. T. Kung and John T. Robinson. On Optimistic Methods for Concurrency Control. International Conference on Very Large Data Bases, page 351, Rio de Janeiro, Brazil, October 1979.

[54] Leslie Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems, 16(2):133-169, 1998.

[55] Leslie Lamport. Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. ACM Transactions on Programming Languages and Systems, 6(2):254-280, 1984.

[56] Ilia A. Lebedev, Kyle Hogan, and Srinivas Devadas. Invited Paper: Secure Boot and Remote Attestation in the Sanctum Processor. 31st IEEE Computer Security Foundations Symposium, pages 46–60, Oxford, United Kingdom, July 2018.

[57] Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanovic, and Dawn Song. Keystone: an open framework for architecting trusted execution environments. European Conference on Computer Systems, pages 38-1, Heraklion, Greece, April 2020.

[58] Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. KVell: the design and implementation of a fast persistent key-value store. Symposium on Operating Systems Principles, pages 447-461, Huntsville, Ontario, Canada, October 2019.

[59] Jinyuan Li, Maxwell N. Krohn, David Mazières, and Dennis E. Shasha. Secure Untrusted Data Repository (SUNDR). Symposium on Operating System Design and Implementation, pages 121-136, San Francisco, California, December 2004.

[60] Joshua Lind, Oded Naor, Ittay Eyal, Florian Kelbert, Emin Gün Sirer, and Peter R. Pietzuch. Teechain: a secure payment network with asynchronous blockchain access. Symposium on Operating Systems Principles, pages 63-79, Huntsville, Ontario, Canada, October 2019.

[61] Loi Luu, Viswesh Narayanan, Chaodong Zheng, Kunal Baweja, Seth Gilbert, and Prateek Saxena. A Secure Sharding Protocol For Open Blockchains. Computer and Communications Security, pages 17-30, Vienna, Austria, October 2016.

[62] Umesh Maheshwari, Radek Vingralek, and William Shapiro. How to Build a Trusted Database System on Untrusted Storage. Symposium on Operating System Design and Implementation, pages 135-150, San Diego, California, October 2000.

[63] Kai Mast, Lequn Chen, and Emin Gün Sirer. Scaling Databases through Trusted Hardware Proxies. Proceedings of the 2nd Workshop on System Software for Trusted Execution, SysTEX 2017, Shanghai, China, October 28, 2017, pages 9:1–9:6, 2017.

[64] Kai Mast, Lequn Chen, and Emin Gün Sirer. A Vision for Autonomous Blockchains backed by Secure Hardware. Proceedings of the 4th Workshop on System Software for Trusted Execution, 2019.

[65] Kai Mast, Kevin A. Negy, and Emin Gün Sirer. Using Decentralization to Give Users Real Control Over Their Data. In Submission, 2020.

[66] Kai Mast, Charles Yu, Aaron Chao, Soumya Basu, and Emin Gün Sirer. Audit-Based Sharding for Public Blockchains. In Submission, 2020.

[67] David Mazières and Dennis E. Shasha. Building secure file systems out of Byzantine storage. ACM Symposium on Principles of Distributed Computing, pages 108-117, Monterey, California, July 2002.

[68] Patrick McCorry, Surya Bakshi, Iddo Bentov, Sarah Meiklejohn, and Andrew Miller. Pisa: Arbitration Outsourcing for State Channels. Advances in Financial Technologies, pages 16-30, Zürich, Switzerland, October 2019.

[69] Ralph C. Merkle. A Digital Signature Based on a Conventional Encryption Function. Annual International Cryptology Conference, pages 369-378, Santa Barbara, California, August 1987.

[70] Andrew Miller, Iddo Bentov, Ranjit Kumaresan, and Patrick McCorry. Sprites: Payment Channels that Go Faster than Lightning. CoRR, abs/1702.05812, 2017.

[71] Donald R. Morrison. PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric. Journal of the ACM, 15(4):514-534, 1968.

[72] Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li. Extracting More Concurrency from Distributed Transactions. Symposium on Operating System Design and Implementation, pages 479-494, Broomfield, Colorado, October 2014.

[73] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. 2008.

[74] Brian M. Oki and Barbara Liskov. Viewstamped Replication: A General Primary Copy. ACM Symposium on Principles of Distributed Computing, pages 8-17, Toronto, Canada, August 1988.

[75] A. Pinar Ozisik, Gavin Andresen, Brian Neil Levine, Darren Tapp, George Bissias, and Sunny Katkuri. Graphene: efficient interactive set reconciliation applied to blockchain propagation. SIGCOMM Conference, pages 303-317, Beijing, China, August 2019.

[76] Rafael Pass and Elaine Shi. Thunderella: Blockchains with optimistic instant confirmation. Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 3–33, 2018.

[77] Yanqing Peng, Min Du, Feifei Li, Raymond Cheng, and Dawn Song. FalconDB: Blockchain-based Collaborative Database. SIGMOD International Conference on Management of Data, pages 637-652, Portland, Oregon, June 2020.

[78] Soujanya Ponnapalli, Aashaka Shah, Amy Tai, Souvik Banerjee, Vijay Chidambaram, Dahlia Malkhi, and Michael Wei. Scalable and Efficient Data Authentication for Decentralized Systems. arXiv preprint arXiv:1909.11590, 2019.

[79] Joseph Poon and Vitalik Buterin. Plasma: Scalable autonomous smart contracts. White paper, 2017.

[80] Joseph Poon and Thaddeus Dryja. The bitcoin lightning network: Scalable off-chain instant payments. https://lightning.network/lightning-network-paper.pdf, 2016.

[81] Raluca A. Popa, Catherine M. S. Redfield, Nickolai Zeldovich, and Hari Balakrishnan. CryptDB: processing queries on an encrypted database. Communications of the ACM, 55(9):103-111, 2012.

[82] Christian Priebe, Kapil Vaswani, and Manuel Costa. EnclaveDB: A Secure Database Using SGX. IEEE Symposium on Security and Privacy, pages 264-278, San Francisco, California, May 2018.

[83] Oasis Protocol Project. The Oasis Blockchain Platform (Whitepaper). https://docsend.com/view/aq86q2pckrut2yvq (Retrieved June 2020), 2020.

[84] Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. Symposium on Operating Systems Principles, pages 497-514, Shanghai, China, October 2017.

[85] Pandian Raju, Soujanya Ponnapalli, Evan Kaminsky, Gilad Oved, Zachary Keener, Vijay Chidambaram, and Ittai Abraham. mLSM: Making Authenticated Storage Faster in Ethereum. Workshop on Hot Topics in Storage and File Systems, Boston, Massachusetts, July 2018.

[86] Jamie Redman. BCH Upgrades: What's New and What's Next. https://news.bitcoin.com/bch-upgrades-whats-new-and-whats-next/ (Retrieved June 2020), 2018.

[87] Team Rocket, Maofan Yin, Kevin Sekniqi, Robbert van Renesse, and Emin Gün Sirer. Scalable and Probabilistic Leaderless BFT Consensus through Metastability. arXiv preprint 1906.08936, 2019.

[88] Phillip Rogaway. Formalizing Human Ignorance: Collision-Resistant Hashing without the Keys. IACR Cryptol. ePrint Arch., 2006:281, 2006.

[89] Fred B. Schneider. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. ACM Computing Surveys, 22(4):299-319, 1990.

[90] Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice, and Stefan Mangard. Malware Guard Extension: abusing Intel SGX to conceal cache attacks. Cybersecurity, 3(1):2, 2020.

[91] Dennis E. Shasha, François Llirbat, Eric Simon, and Patrick Valduriez. Transaction Chopping: Algorithms and Performance Studies. ACM Transactions on Database Systems, 20(3):325-363, 1995.

[92] Yongren Shi, Kai Mast, Ingmar Weber, Agrippa Kellum, and Michael W. Macy. Cultural Fault Lines and Political Polarization. Proceedings of the 2017 ACM on Web Science Conference, WebSci 2017, Troy, NY, USA, June 25 - 28, 2017, pages 213–217, 2017.

[93] Yonatan Sompolinsky and Aviv Zohar. Secure High-Rate Transaction Processing in Bitcoin. Financial Cryptography and Data Security, pages 507-527, San Juan, Puerto Rico, January 2015.

[94] Ion Stoica, Robert Morris, David R. Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. SIGCOMM Conference, pages 149-160, San Diego, California, August 2001.

[95] Zilliqa Team. The Zilliqa Technical Whitepaper. Technical Report, 2017.

[96] Stephen Tu, M. Frans Kaashoek, Samuel Madden, and Nickolai Zeldovich. Processing Analytical Queries over Encrypted Data. Proceedings of the VLDB Endowment, 6(5):289-300, 2013.

[97] Robbert van Renesse and Fred B. Schneider. Chain Replication for Supporting High Throughput and Availability. Symposium on Operating System Design and Implementation, pages 91-104, San Francisco, California, December 2004.

[98] Stephan van Schaik, Andrew Kwong, Daniel Genkin, and Yuval Yarom. SGAxe: How SGX Fails in Practice. https://sgaxeattack.com/, 2020.

[99] Gang Wang, Zhijie Jerry Shi, Mark Nixon, and Song Han. SoK: Sharding on Blockchain. Advances in Financial Technologies, pages 41-61, Zürich, Switzerland, October 2019.

[100] Jiaping Wang and Hao Wang. Monoxide: Scale out Blockchains with Asynchronous Consensus Zones. Symposium on Networked System Design and Implementation, pages 95-112, Boston, Massachusetts, February 2019.

[101] Shawn Wilkinson, Jim Lowry, and Tome Boshevski. Metadisk: a blockchain-based decentralized file storage application. STORJ, Technical Report, 2014.

[102] Joon Ian Wong. The ethereum network is getting jammed up because people are rushing to buy cartoon cats on its blockchain. https://qz.com/1145833 (accessed in May 2020), 2017.

[103] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper, 151:1–32, 2014.

[104] Gavin Wood. Polkadot: Vision for a heterogeneous multi-chain framework. White Paper, 2016.

[105] Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan-Gueta, and Ittai Abraham. HotStuff: BFT Consensus with Linearity and Responsiveness. ACM Symposium on Principles of Distributed Computing, pages 347-356, Toronto, Canada, July 2019.

[106] Aydan R. Yumerefendi and Jeffrey S. Chase. Strong Accountability for Network Storage. Conference on File and Storage Technologies, pages 77-92, San Jose, California, February 2007.

[107] Fan Zhang, Ethan Cecchetti, Kyle Croman, Ari Juels, and Elaine Shi. Town Crier: An Authenticated Data Feed for Smart Contracts. Computer and Communications Security, pages 270-282, Vienna, Austria, October 2016.

[108] Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, and Dawn Song. The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks. arXiv 1911.07135, 2019.

[109] Guy Zyskind, Oz Nathan, and others. Decentralizing privacy: Using blockchain to protect personal data. Security and Privacy Workshops (SPW), 2015 IEEE, pages 180–184, 2015.
