SoK: Understanding BFT Consensus in the Age of Blockchains

Gang Wang Email: [email protected]

Abstract—Blockchain, as an enabler of current Internet infrastructure, has provided many unique features and revolutionized current distributed systems into a new era. Its decentralization, immutability, and transparency have attracted many applications to adopt the design philosophy of blockchain and customize various replicated solutions. Under the hood of blockchain, consensus protocols play the most important role in achieving distributed replication systems. The distributed-system community has extensively studied the technical components of consensus to reach agreement among a group of nodes. Due to trust issues, it is hard to design a resilient system in practical situations because of the existence of various faults. Byzantine fault-tolerant (BFT) state machine replication (SMR) is regarded as an ideal candidate that can tolerate arbitrary faulty behaviors. However, the inherent complexity of BFT consensus protocols and their rapid evolution make it hard to adapt them to practical application domains. There are many excellent Byzantine-based replicated solutions and ideas that have contributed to improving performance, availability, or resource efficiency. This paper conducts a systematic and comprehensive study of BFT consensus protocols with a specific focus on the blockchain era. We explore both general principles and practical schemes to achieve consensus under Byzantine settings. We then survey, compare, and categorize the state-of-the-art solutions to understand BFT consensus in detail. For each representative protocol, we conduct an in-depth discussion of its most important architectural building blocks as well as the key techniques it uses. We aim for this paper to provide system researchers and developers a concrete view of the current design landscape and to help them find solutions to concrete problems. Finally, we present several critical challenges and some potential research directions to advance research on BFT consensus protocols in the age of blockchains.

G. Wang was with the University of Connecticut, Storrs, CT 06269 USA.

I. INTRODUCTION

Blockchain has advanced as a key disruptive innovation with the potential to revolutionize most industries. Its key features of integrity, immutability, transparency, and decentralization have attracted many applications to explore the opportunities of the blockchain. Essentially, blockchain is a replicated, decentralized, trustworthy, and immutable ledger technology, so that data recorded on the blockchain cannot be deleted or modified, and the correctness of data can be verified in a distributed and decentralized manner. It allows a group of participating parties that do not trust each other to provide trustworthy and immutable services without the presence of a trusted intermediary [1]. Blockchain stands in the tradition of distributed protocols for both secure multiparty computation and replicated services for tolerating faults [2].

With blockchain, a group of parties can act as a dependable and trusted third party for maintaining shared states, mediating exchanges, and providing secure computing engines [3]. Consensus is one of the most important problems in any blockchain system, as in any distributed system where multiple nodes must reach an agreement even in the presence of faults. Many existing consensus algorithms are mostly applicable to small-scale systems; for example, the Practical Byzantine Fault Tolerance protocol (PBFT) [4] works well with fewer than 20 participating nodes. However, when extending the same protocol to a large-scale scenario, its performance may become unacceptable or even totally unworkable.

The consensus protocol is the core of a blockchain, providing distributed agreement services, and its efficiency highly affects the performance and scalability of a blockchain system. Without trusted intermediaries, the parties of a blockchain may behave arbitrarily and deviate from the consensus procedures, so we can effectively consider them to be in a Byzantine environment. Blockchain can benefit from many technologies developed for reaching consensus, replicating state, and broadcasting transactions, but in cases where network connectivity is uncertain, nodes may crash or be subverted by an adversary. Though there are many proof-based consensus protocols for blockchain that help solve these issues, e.g., Proof-of-Work (PoW) [5], they are typically not energy efficient and may cause power starvation.

Fortunately, Byzantine fault-tolerant (BFT) state machine replication (SMR) offers some opportunities to design consensus protocols that can tolerate arbitrary faults [6]. Under the hood, BFT SMR replicates the state of the service across the replicas of the replication system. The capacity to tolerate arbitrary faults makes the BFT replicated system a reality when building practical and critical applications. However, designing an actual BFT system is not an easy task, due to its inherent complexity.

In general, a consensus protocol must meet three requirements [7]: (a) Non-triviality: if a correct entity outputs a value v, then some entity proposed v; (b) Safety: if a correct entity outputs a value v, then all correct entities output the same value v; (c) Liveness: if all correct entities initiated the protocol, then, eventually, all correct entities output some value.

Later, Fischer, Lynch, and Paterson (FLP) [8] proved that a deterministic agreement protocol in an asynchronous network cannot guarantee liveness if even one entity may crash, even when links are assumed to be reliable. This is the well-known FLP impossibility for asynchronous systems. In an asynchronous system, one cannot distinguish between a crashed node and a correct one. Theoretically, deciding the full network's state and deducing from it an agreed-upon output is impossible. However, there exist some extensions that circumvent the FLP result to achieve asynchronous consensus, e.g., randomization, timing assumptions, failure detectors, and strong primitives [9]. Over two decades of development, BFT algorithms have evolved into a wide range of protocols and applications. However, this progress was typically targeted at closed groups in specific application scenarios.

We distinguish two types of fault-tolerant consensus: crash fault-tolerant consensus (CFT) and non-crash (Byzantine) fault-tolerant consensus (BFT) [10]. Different failure models have been considered in the literature, and they exhibit distinct behaviors. In general, a crash fault is one where a machine simply stops all computation and communication, while a non-crash fault is one where it acts arbitrarily but cannot break cryptographic primitives, such as cryptographic hashes, MACs, message digests, and digital signatures. For instance, in a crash fault model, nodes may fail at any time. When a node fails, it stops processing, sending, or receiving messages. Typically, failed nodes remain silent forever, although some distributed protocols have considered node recovery. Tolerating crash faults (e.g., corrupted participating nodes) as well as network faults (e.g., network partitions or asynchrony) addresses the inability of otherwise correct machines to communicate with each other in a timely manner; this is how a typical CFT fault affects system functionality. Classic CFT and BFT explicitly model machine faults only, and these fault models can be combined with orthogonal network models, e.g., synchronous or asynchronous networks. Thus, the related work can be roughly classified into four categories: synchronous CFT [11], asynchronous CFT [12], synchronous BFT [13], and asynchronous BFT [14] [15]. The Byzantine setting is particularly relevant to security-critical applications, in contrast to traditional consensus protocols that tolerate only crash failures.

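As a concrete aside, the non-triviality and safety requirements listed earlier can be checked mechanically over the trace of a finished run. The sketch below is our own illustration (the data model and names are hypothetical, not taken from any cited protocol); liveness, being a temporal property, cannot be judged from a single finite trace:

```python
def check_execution(proposed, decided):
    """Check non-triviality and safety for one completed run.

    proposed: {process_id: value it proposed}
    decided:  {correct process_id: value it output}
    """
    # Non-triviality: every decided value was proposed by some entity.
    non_trivial = all(v in proposed.values() for v in decided.values())
    # Safety: all correct processes output the same value.
    safe = len(set(decided.values())) <= 1
    return non_trivial, safe

# Three correct processes all decide a value that p1 and p3 proposed.
proposed = {"p1": "A", "p2": "B", "p3": "A"}
decided = {"p1": "A", "p2": "A", "p3": "A"}
print(check_execution(proposed, decided))  # (True, True)
```

A run in which two correct processes output different values, or output a value nobody proposed, fails the corresponding check.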
BFT consensus protocols, as the core part of blockchain, directly determine whether blockchain technology can be widely applied to practical applications. In the literature, many works discuss various aspects of Byzantine-related protocols, from theory to practical prototype deployment. Although applying BFT protocols to the blockchain is promising, it still faces many design challenges when the specific requirements of blockchain are taken into consideration. For example, blockchain typically requires ordered sequences of transactions in the form of a block that includes the previous block's hash, while generic BFT algorithms do not need to consider these requirements and only require agreement on the currently processed requests. A system-level study of BFT consensus protocols for blockchain is thus highly desirable. The goal of this paper is to provide a comprehensive survey of the existing Byzantine-related protocols and detailed discussions of existing solutions. We aim to provide a concrete view of the state-of-the-art literature in the domain of Byzantine-related consensus and to help researchers and system designers find solutions to their specific problems. For each surveyed paper, we try our best to provide detailed information and point out potential issues when applying it to blockchain scenarios. Generally speaking, there is some excellent literature discussing BFT consensus protocols in general forms or from architectural/theoretical perspectives. Correia et al. [16] discuss the essential components to achieve consensus under (potentially) Byzantine replicas, focusing on analyzing various Byzantine consensus protocols primarily from a theoretical perspective. Berger and Reiser [17] focus on scalability issues of Byzantine consensus, with application scenarios in blockchains and distributed ledgers. Bano et al. [18] discuss consensus protocols in general in the age of blockchains, including classic consensus protocols (e.g., Proof-of-X (PoX)) and BFT consensus protocols. Distler [6] provides a complete survey of BFT SMR from a system perspective, adopting a modular approach to discuss each key component in a BFT consensus procedure.

In general, BFT SMR protocols are often challenged on their scalability in terms of replicas and have not been thoroughly tested for blockchain [19]. From a high-level perspective, most early works focus on the theoretical parts of Byzantine replicated systems, e.g., proofs and prototype design, while most recent works focus on the implementation of these prototypes with new feature guarantees, such as parallel execution and responsiveness. According to the current literature and its main features, we classify existing works into different categories. For each category, we present the state-of-the-art works and provide some discussion. As a systematization of knowledge on BFT consensus protocols for blockchain, we also provide some research challenges and research directions, which may help interested readers explore the corresponding areas further.

The rest of this paper is organized as follows. Section II introduces some preliminary information on the Byzantine generals problem, BFT algorithms, distributed consensus, and distributed ledger technologies. Section III discusses some commonly used system models of network synchrony and adversary ability. Section IV details the existing Byzantine fault-tolerant protocols with a clear classification and description. Section V presents the essential components of BFT protocols. Section VI outlines some well-known blockchain consensus algorithms, including PoX and BFT. Section VII discusses some challenges in applying BFT to blockchain. Section VIII provides some discussion and future directions in this domain, and Section IX concludes this paper.

II. PRELIMINARIES

This section provides some preliminary information on Byzantine fault tolerance, such as the Byzantine generals problem, BFT algorithms, and reliable distributed systems.

A. Byzantine Generals Problem

The Byzantine Generals Problem (BGP) was first introduced by Leslie Lamport, Robert Shostak, and Marshall Pease in 1982 [20]. Practically, it was initially used to handle network communication issues among disconnected units without a coordinator. Originally, it describes a situation where several disconnected Byzantine armies surround an enemy city, and the action of each army is commanded by its own general. The generals can only communicate with other generals by messengers and solely rely on the information from messengers. However, some generals may not act loyally, like traitors (e.g., they try to prevent the loyal generals from reaching an agreement), and the messengers may not work in a reliable manner (e.g., there is no reliable and authenticated communication channel). In particular, anyone can show a general messages and claim that these messages come from other generals. A half-baked attack on an enemy city would be a disaster, so all the generals must agree on and execute the same plan in order to succeed. According to the original work [20], in the Byzantine Generals Problem, a commanding general (e.g., one of the generals, acting in a leader role) sends an order to his n − 1 lieutenant generals with the following requirements: (1) all loyal lieutenants obey the same order (e.g., the decision to 'retreat' or 'attack'); (2) if the commanding general is loyal, then every loyal lieutenant obeys the order sent by the commanding general. The above two requirements together are called the interactive consistency conditions [21]. The problem was shown there to be solvable if and only if fewer than one-third of the generals are faulty, unless unforgeable, signed messages are assumed. In particular, no solution works for three generals with even a single traitor.

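The one-third bound above can be reproduced with a small simulation of the oral-messages algorithm OM(m) from the original paper [20]. The sketch below is our own; in particular, the traitor strategy (an equivocating commander that alternates orders) is just one illustrative adversary, not the worst case in general:

```python
from collections import Counter

def om(commander, lieutenants, value, m, traitors):
    """Lamport's oral-messages algorithm OM(m), simulated.

    Returns {lieutenant: decided value}. A traitorous commander
    equivocates by alternating the order it sends (one simple
    adversary strategy among many possible ones).
    """
    # Step 1: the commander sends a value to every lieutenant.
    sent = {}
    for i, lt in enumerate(lieutenants):
        if commander in traitors:
            sent[lt] = ("attack", "retreat")[i % 2]  # equivocation
        else:
            sent[lt] = value
    if m == 0:
        return sent  # OM(0): use the directly received value
    # Step 2: each lieutenant j relays its value to the others via
    # OM(m-1); received[i][j] = the value i attributes to j.
    received = {lt: {lt: sent[lt]} for lt in lieutenants}
    for j in lieutenants:
        others = [x for x in lieutenants if x != j]
        for i, v in om(j, others, sent[j], m - 1, traitors).items():
            received[i][j] = v
    # Step 3: each lieutenant decides by majority over all values.
    return {lt: Counter(received[lt].values()).most_common(1)[0][0]
            for lt in lieutenants}

# n = 4 with one traitor: the loyal lieutenants reach agreement,
# whether the traitor is a lieutenant or the commander itself.
print(om("C", ["L1", "L2", "L3"], "attack", 1, traitors={"L3"}))
print(om("C", ["L1", "L2", "L3"], "attack", 1, traitors={"C"}))
```

With only three generals and one traitor, a suitably chosen traitor strategy defeats the same majority step, matching the impossibility result for n = 3.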
The Byzantine Generals Problem involves obtaining agreement among a collection of generals, some of which may be faulty. A follow-up work on the Weak Byzantine Generals Problem was proposed as a transaction commit problem for a distributed database by Leslie Lamport in 1983 [22]. In a transaction commit problem, there exists a transaction coordinator, and the coordinator's value is its decision of whether to commit or abort the transaction. The Weak Byzantine Generals (WBG) Problem uses a process to represent each general, following the definition of the weak interactive consistency problem. According to the work [22], in a WBG Problem, each process i chooses a private value wi, and the participating processes must then communicate among themselves to allow each process to compute a public value, such that: (1) if all processes are non-faulty and all the wi have the same value, then every process computes this value as its public value (e.g., the final commitment); (2) any two non-faulty processes compute the same public value. Clearly, any solution to the original Byzantine Generals Problem can be considered a solution to a WBG Problem, so the WBG Problem is solvable if fewer than one-third of the processes may be faulty.

Briefly, the Byzantine Generals Problem defines the way the generals handle commands without requiring a centralized coordinator, while remaining robust even in the presence of malicious generals who may try to betray commands. The WBG Problem, in turn, explains how to reach consensus in a system whose components may give conflicting information because of errors or malicious attacks. Classic consensus algorithms address both types of problems, and their solutions are also called Byzantine fault-tolerant algorithms. Furthermore, these solutions have been adapted to distributed ledger systems (e.g., blockchain) to reach a consensus among participating parties.

B. BFT Algorithms

Malicious attacks and software failures are becoming more prevalent, which may cause faulty nodes to exhibit Byzantine (i.e., arbitrary) behavior, and Byzantine fault-tolerant algorithms are thus increasingly important in distributed systems. In a distributed system, replication is a fundamental requirement for implementing dependable services that are able to ensure integrity and availability despite the occurrence of faults and intrusions [23]. In general, a system that has the ability to tolerate Byzantine faults is called a Byzantine-tolerant system (in short, a Byzantine system), in which all non-Byzantine participants (e.g., the generals in the Byzantine Generals Problem) are required to follow a set of predefined protocols to ensure a consistent state (e.g., a consistent decision) among all non-Byzantine participants. State Machine Replication (SMR) [24] [25] is a popular replication method in Byzantine systems, which enables a set of replicas (or state machines in SMR) to execute the same sequence of operations for a service even if a fraction of them are faulty. SMR requires an implementation of a total order broadcast protocol, which is known to be equivalent to the consensus problem. Byzantine SMR algorithms typically serve as a standard method for handling Byzantine behaviors, in which they guarantee that all non-Byzantine replicas together obtain the same consistent execution result.

Theoretically, many Byzantine fault-tolerant (BFT) total order broadcast protocols provide a solution to the consensus problem and are at the core of any distributed SMR protocol. The basic component for solving SMR in a Byzantine system is the use of a BFT reliable broadcast protocol to disseminate the requests among replicas, which ensures that the requests are eventually executed in consensus instances that define the order of messages to be executed. A Byzantine SMR algorithm typically consists of three sub-protocols: an agreement protocol (aka consensus protocol), a checkpoint protocol, and a view-change protocol [26]. The agreement protocol is typically used to guarantee consistency among replicas, e.g., all the requests are committed and executed in a determined order on all non-faulty replicas; the checkpoint protocol periodically checks and clears log information and synchronizes status, and typically runs in the background; and the view-change protocol is activated (e.g., to replace the current ineffective leader in a leader-based protocol) when faults happen during a consensus procedure.

The classic Byzantine consensus in a Byzantine failure model should follow some properties. Specifically, the Byzantine consensus problem guarantees the conjunction of three properties: (1) Agreement, in which no two non-faulty replicas reach different decisions; (2) Termination, in which all non-faulty replicas eventually reach a decision; and (3) Validity, in which the decision is proposed by some replica. An algorithm fulfilling the above properties can be extended to resolve a Byzantine consensus problem [27]. On the other hand, a BFT SMR algorithm must guarantee the correctness of the following two properties [28] [4]: Safety and Liveness. Safety is the property that if any two non-faulty replicas commit a decision, then both commit the same decision, while liveness ensures that the consensus protocol makes progress, either in the current view or by moving to a new view, which means clients eventually receive replies to their requests.

The major challenge to safety and liveness in BFT SMR algorithms is the capacity for equivocation, i.e., the ability of a Byzantine replica to send inconsistent messages to different non-faulty replicas. Another obstacle to BFT SMR systems is the synchrony issue. Most SMR systems, such as blockchains, operate over a network (i.e., the Internet). However, these systems often make assumptions on network synchrony, in which the delivery of messages must occur within a known period of time (see Section III for details), and these assumptions might not be realistic in a practical SMR system (based on asynchronous networks). According to the Fischer-Lynch-Paterson (FLP) impossibility [29], it is known that, in asynchronous networks, consensus is unsolvable even in the presence of a single crash failure, let alone Byzantine failures. To deal with this FLP impossibility, many proposed protocols have been designed to relax the guarantees of classic Byzantine consensus, e.g., the Practical Byzantine Fault Tolerant protocol (PBFT) [4] [27].

PBFT has long served as a consensus protocol to cope with Byzantine systems; it can tolerate up to a 1/3 fraction of Byzantine faults in a system. We briefly describe its consensus procedures (for details, refer to Section IV). One replica, the primary/leader replica, decides the order of clients' requests and forwards them to the other replicas, the secondary replicas. All replicas together then run a three-phase (pre-prepare/prepare/commit) agreement protocol to agree on the order of requests. Each replica processes every request and sends a response to the corresponding client. The PBFT protocol guarantees that safety is maintained even during periods of timing violations, while progress depends on the leader. On detecting that the leader replica is faulty through the consensus procedure, the replicas trigger a view-change protocol to select a new leader to coordinate the consensus procedure. The leader-based protocol works very well when the number of participating replicas is small, but it is subject to scalability issues. In general, PBFT is regarded as the baseline for almost all BFT protocols published afterward. Even though many PBFT-like solutions have been proposed in the literature, most of them are still subject to scalability issues, which make them unfit for some large-scale mainstream distributed systems, such as public blockchain systems.

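The 1/3 resilience bound and PBFT's quorum sizes follow from simple counting. The snippet below is our own sketch (not the PBFT paper's pseudocode): with n = 3f + 1 replicas and quorums of 2f + 1, any two quorums share at least one non-faulty replica, which is what prevents two conflicting decisions from both gathering a quorum:

```python
def quorum_properties(f):
    """Quorum arithmetic behind PBFT-style protocols for f faults."""
    n = 3 * f + 1        # smallest n tolerating f Byzantine replicas
    quorum = 2 * f + 1   # size of prepare/commit certificates
    # Pigeonhole: two quorums overlap in at least 2*quorum - n replicas.
    min_overlap = 2 * quorum - n
    # Even if all f faulty replicas sit inside the overlap, at least
    # one non-faulty replica is a member of both quorums.
    honest_in_overlap = min_overlap - f
    return n, quorum, min_overlap, honest_in_overlap

for f in (1, 2, 3):
    n, q, ov, honest = quorum_properties(f)
    print(f"f={f}: n={n}, quorum={q}, overlap>={ov}, honest>={honest}")
```

The overlap is always f + 1, so at least one honest replica witnesses both quorums regardless of f; this is the counting argument behind the prepared and committed certificates in the three-phase protocol described above.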
C. Distributed Consensus

The need for reaching agreement among processes which are separated geographically has long been one of the fundamental problems in distributed systems. The distributed consensus problem (and its 'twin', Byzantine Agreement) provides perhaps the most abstract formulation of such issues, whose solutions (even in theory) influence many practical designs and implementations [30]. Consensus means that a set of processes reach agreement on a common value by interacting with each other. Reaching such an agreement is a trivial task if both the participants and the underlying communication network are completely trustworthy and reliable. However, such assumptions are not practical in real systems, which are often subject to various faults and attacks, such as service or system crashes, network partitioning, and dropped, malformed, or duplicated messages. We can even consider more Byzantine types of faults, in which faulty processes may behave arbitrarily or a consortium of processes may collaboratively behave maliciously [29]. Thus, it is desirable to design a reliable agreement protocol even in the presence of these kinds of faults.

Fault-tolerant consensus problems have been extensively studied in the domain of distributed systems. Even with the existence of various faulty processes and unreliable networks, a fault-tolerant consensus protocol still has the ability to guarantee agreement among all non-faulty participants. This is the basic requirement for the normal and correct functioning of distributed systems [31]. In the literature, many fault-tolerant consensus protocols have been proposed, either for Byzantine or for fail-stop faults (i.e., failing to respond to requests). In general, there are three basic aspects to evaluate the quality of an algorithm for distributed consensus: resiliency (e.g., the maximum tolerable number of faulty processes), the number of communication rounds, and the maximum message size [30]. Thus, it is desirable to achieve higher quality in these three aspects for a distributed consensus.

Different types of faults (e.g., Byzantine or fail-stop faults) affect the design of a consensus protocol, and the choice of communication models also affects the quality of a consensus protocol. According to different application scenarios, various refined distributed consensus protocols have been proposed to achieve higher quality by relaxing the conditions of a distributed system. In general, five critical factors affect the design of a distributed consensus protocol as well as its quality, as discussed in [32]: (1) processors synchronous or asynchronous; (2) communication synchronous or asynchronous; (3) message order synchronous or asynchronous; (4) broadcast transmission or point-to-point transmission; and (5) atomic receive/send or separate receive and send. Detailed information on synchrony and asynchrony is given in Section III.

As discussed in BFT algorithms, a correct consensus protocol must satisfy at least three conditions: Agreement (e.g., all non-faulty processes choose the same value), Termination (e.g., all non-faulty processes eventually decide), and Validity (e.g., the final output is a value proposed by some process).

D. Distributed Ledger Technologies

Distributed Ledger Technology (DLT) is a general term to describe technologies for the storage, distribution, and exchange of data between users over private or public distributed computer networks, and its database is spread and stored over different locations. Essentially, a distributed ledger is a type of database, shared, replicated, and maintained by the peers of a P2P network. This shared database is available and accessible for all network participants within the system, whose data (aka transactions in database terms) are replicated across multiple storage nodes with equal rights. The action of recording transactions can be considered a result of data exchange among the participants of the network, and only validated results can be appended to the ledger. Also, the distributed ledger can be organized in a decentralized manner, where no trusted third party is required to manage and control the system, and the participating nodes can automatically reach an agreement via a well-established consensus mechanism. A consensus mechanism is designed to achieve agreement on the respective state of the replications of stored data between distributed database nodes under consideration of network failures [33]. DLT in general offers some unique advantages compared with traditional information systems (e.g., databases) and provides some new features and structures to modern applications. Among existing DLTs, the most well-known data structure is the blockchain. Blockchain-based ledger systems provide some novel features, such as decentralization, trustworthiness, and replicated data synchronized among separated network nodes. In a blockchain, each block is associated with a cryptographic hash of its previous block and a timestamp, which makes the ledger immutable and auditable. Any modification of a transaction inevitably produces an altered hash within its branch, and this alteration is easily detected with little computational effort.

Based on different application scenarios, blockchains can be roughly classified into three categories, namely public (or permissionless), private (or permissioned), and consortium (or federated) blockchains [34] [35] [36] [37].

a) Public Blockchain: A public blockchain is an open and transparent network, which implies that anyone can join and make transactions as well as participate in the consensus process. It is also referred to as a permissionless blockchain, which functions in a completely distributed and decentralized way. The permissionless blockchain makes it possible for anyone to maintain a copy of the blockchain and participate in the validation process of new blocks. Typically, this type of blockchain is adopted by cryptocurrencies such as Bitcoin. A permissionless blockchain is typically designed to accommodate a large number of anonymous nodes, so minimizing potential malicious activities is essential. Due to the anonymous participation process, it requires some kind of "proof" to show the validity of new blocks before publishing them in a public blockchain. For example, a proof could be solving a computationally intensive puzzle or staking one's cryptocurrency.

b) Private Blockchain: A private blockchain, on the other hand, is an invitation-only network managed by a central authority; this central authority does not participate in blockchain construction and mainly provides identification-related services. All participants in this blockchain must be permissioned by a validation mechanism to publish or issue transactions. This implies that any node joining a private blockchain is a known and authorized member of a single organization. Typically, a private blockchain is suitable as a single-enterprise solution and is used as a distributed synchronized database designed to track information transfers between different departments or individuals. In particular, a private blockchain does not need incentive mechanisms (e.g., currencies or tokens) to work, so no transaction processing fee is needed. Note that the blocks in a private blockchain can be published and agreed on by delegated nodes within the network; hence, its tamper resistance might not be as effective as that of a public blockchain.

c) Consortium Blockchain: A consortium blockchain, also known as a federated blockchain, is similar in its settings to a private blockchain, meaning that a consortium blockchain requires permission to access the blockchain network. Consortium blockchains, in most cases, involve many parties, which together maintain consistency and transparency among the involved parties. Thus, a consortium blockchain can be considered a verifiable and reliable communication medium, which is used to trace the shared and synchronized information among its participating members. Similar to the private blockchain, the consortium blockchain also has no transaction processing fees or computational expenses for publishing a new block.

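The hash-linking property described in the DLT discussion above is easy to demonstrate in a few lines. The block layout, JSON encoding, and SHA-256 below are illustrative choices on our part, not a claim about any particular ledger's wire format:

```python
import hashlib
import json
import time

def make_block(prev_hash, transactions):
    """A toy block: a payload plus the hash of the previous block."""
    block = {"prev": prev_hash, "ts": time.time(), "txs": transactions}
    body = json.dumps({k: block[k] for k in ("prev", "ts", "txs")},
                      sort_keys=True).encode()
    block["hash"] = hashlib.sha256(body).hexdigest()
    return block

def verify(chain):
    """Recompute every hash link; O(n) work detects any tampering."""
    for i, b in enumerate(chain):
        body = json.dumps({k: b[k] for k in ("prev", "ts", "txs")},
                          sort_keys=True).encode()
        if b["hash"] != hashlib.sha256(body).hexdigest():
            return False  # block contents no longer match its hash
        if i > 0 and b["prev"] != chain[i - 1]["hash"]:
            return False  # the link to the previous block is broken
    return True

# Build a three-block chain, then tamper with an early transaction.
chain = [make_block("0" * 64, ["genesis"])]
chain.append(make_block(chain[-1]["hash"], ["alice->bob: 5"]))
chain.append(make_block(chain[-1]["hash"], ["bob->carol: 2"]))
assert verify(chain)
chain[1]["txs"] = ["alice->bob: 500"]  # a retroactive modification...
assert not verify(chain)               # ...is detected immediately
```

Because each block commits to its predecessor's hash, altering any historical transaction invalidates every later link, which is the "little computational effort" detection property noted above.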
blockchain can be considered as a verifiable and reliable communication medium, which is used to trace the shared and synchronized information among its participating members. Similar to the private blockchain, the consortium blockchain also has no transaction processing fees or computational expenses for publishing a new block.

Due to the internal drawbacks of BFT protocols (e.g., scalability), it is a difficult task to apply BFT in a public blockchain system. We typically adopt BFT protocols in private or consortium blockchains to achieve a consensus within the participating nodes.

III. SYSTEM MODELS

This section provides the models that a typical BFT consensus may consider during the design. Specifically, we focus on the network models and adversary models. These models align with those defined in other literature, e.g., [27], [38], and [39].

A. Network Models

We first explore some potential network connectivity between participants to fulfill the blockchain setting, and then discuss the models of network synchrony. In this paper, we consider the terms node, replica, party, entity, and participant to have the same meaning as participating node.

1) Communications: Participants in BFT consensus protocols are required to know the status of their neighbors (like the generals in the BGP Problem), whose network layer enables participants to send messages to each other. Specifically, in the blockchain era, it often requires a "full" connectivity. Technically, this "full" connectivity can be achieved in two manners: point-to-point connectivity, and message "diffusion" (e.g., in a peer-to-peer (P2P) communication setting) [39].

a) Point-to-Point Channel: The point-to-point protocol (PPP) is a basic link transport protocol that can transport packets between two endpoints or parties, whose channel can deliver packets in order [40]. The endpoints are pairwise connected with reliable and authentic channels. Thus, the point-to-point channel can provide a reliable message transmission. When an endpoint sends a message, it is required to clearly specify its recipient, and the message is typically guaranteed to be received by its recipient as long as there exists a connection between them. The recipient also has the ability to identify the sender when receiving a message. And, all parties can know the other parties running the protocol through such a fixed connectivity setting. In some earlier consensus scenarios (e.g., [20]), full connectivity is considered as the standard communication setting. In PPP, the communication costs are typically measured by the number of messages exchanged during a protocol run. Due to the existence of full connections (as well as the communication complexity in the number of messages), the efficiency would be compromised. Thus, it is difficult to adopt PPP in large-scale consensus protocols, and current popular blockchain systems prefer peer-to-peer diffusion.

b) Peer-to-Peer Diffusion: P2P networks are distributed systems in nature, without any hierarchical organization or centralized control; peers form self-organizing overlay networks on top of the Internet Protocol (IP), with each peer having symmetric roles [41]. In a P2P setting, message transmission typically proceeds via a "gossiping" process, i.e., messages received by a peer are transmitted to its neighboring peers (as long as it is not the recipient), and this message transmitting operation can be considered as a kind of message "diffusion". In general, a P2P network does not provide a reliable message transmission, e.g., no authenticated transmission and no guarantee on the order of messages. If an honest party diffuses a message, it typically does not need to specify a particular recipient, and it can ensure that all active honest parties get the same message. However, if the sender is compromised, it cannot guarantee that property, which means different parties may receive different messages. Thus, P2P diffusion focuses on the message transmission of honest behaviors. The peers have no information about their neighbors, e.g., they neither know the identities of their neighbors nor their precise number. This feature fits well with Byzantine scenarios and the blockchain setting using BFT as its consensus protocol. In P2P, the communication cost depends highly on the underlying network graph, and there are many novel gossiping protocols that achieve O(n) communication complexity, e.g., the gossiping protocol based on the Information Dispersal Algorithm (IDA) in RapidChain [42]. In general, a gossip-based protocol is achieved by peer-to-peer diffusion, considering that each node has some direct point-to-point connections with other neighboring nodes (e.g., at least a subset of nodes in the network, aka. the degree of the node), and this degree typically is a security parameter to guarantee the connectivity of the underlying network.

Both PPP and P2P models have their advantages and disadvantages. For example, when adopting the PPP protocol in a large-scale distributed system, the communication complexity and the number of links may be big issues, while the P2P model does not have a reliable communication guarantee. Besides the above two popular network communication models, there exist some other mixed models considering various real-world scenarios in message passing. For example, an intermediate model integrates point-to-point channels and diffusion [43]; other intermediate models integrate different criteria (e.g., in terms of partial knowledge of parties [44] and authentication [45]) to improve message passing efficiency.

2) Synchrony: Consensus protocols typically require certain assumptions on the timing of how messages will be propagated across nodes, which is an important aspect during protocol design. In BFT protocols, the protocol execution typically proceeds in the form of rounds, where each party has the opportunity to send messages for some special event (e.g., a vote on messages), and those messages are supposed to be received by their recipients before the next round. Practically, the concept of round provides a certain level of coordination to guide the protocol execution, and network synchrony is used to define the level of timing coordination among all participating parties. Following the literature [46] [39] [47], network synchrony can be categorized into three levels, namely synchronous, partially synchronous, and asynchronous.

a) Synchronous: This synchrony model is the strongest assumption among the three levels, whose message transmission has a strict time constraint among the parties. Synchronous protocols typically proceed in rounds, and in each round, all processes have a strict requirement on time and the network connectivity is good, so that the message transmitted from one node to its recipient follows a clear time limitation. In general, we say a network is synchronous if its message delivery process (including message processing and message transmission) can be finished within a fixed delay (e.g., the delay δ). Typically, this level of synchrony would require a centralized clock synchronization service. We emphasize that the delay δ must be a pre-fixed value between the honest nodes.

b) Asynchronous: This model is the weakest assumption, where operations of processes are hardly coordinated.
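To make the synchronous guarantee concrete, it can be phrased as a predicate over message delivery times with a pre-fixed bound δ; under asynchrony no such bound exists, and delivery is only eventual. A minimal sketch of ours (not code from any cited work):

```python
# Illustrative sketch: the synchronous model as a checkable predicate over
# (send_time, receive_time) pairs with a pre-fixed delay bound delta.

def is_synchronous_run(deliveries, delta):
    """True if every message in the run was delivered within the fixed
    bound delta (message processing plus transmission)."""
    return all(recv - sent <= delta for sent, recv in deliveries)

# Under asynchrony no delta works for all runs: delivery is only eventual,
# so for any candidate bound some run can violate it.
run = [(0, 1), (2, 3), (4, 9)]     # the last message took 5 time units
print(is_synchronous_run(run, 2))  # False: one delivery exceeded delta = 2
print(is_synchronous_run(run, 5))  # True: delta = 5 bounds this whole run
```

The same predicate, required to hold only after some unknown stabilization point, captures the partially synchronous model discussed next.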

An asynchronous network does not provide any guarantee on message delay except that the message will eventually be delivered to its destination. Under an asynchronous setting, the message can be arbitrarily delayed and there is no reliable estimation of this kind of delay. That is, messages between honest nodes can get delivered; however, no upper bound in time can be estimated on their delivery. This kind of asynchrony is typically caused by the lack of a clock synchronization service or the existence of an adversary over the communication channels (e.g., delaying or dropping messages). In this kind of network, the message transmission is solely driven by some events (e.g., message delivery events). In the fully asynchronous setting mentioned in [48], the messages may get arbitrarily delayed or simply dropped and the communication is public and unauthenticated. In this kind of setting, the consensus is trivially impossible.

c) Partially Synchronous: This synchrony model lies in the middle of the synchronous model and the asynchronous model. Following the definition in [46], there exist at least two ways to achieve partial synchrony. One possible situation could be that an upper bound δ on message delivery time exists, but we do not know it a priori; another possible situation is that we know this upper bound δ, but the message processing system may be unreliable, causing the message delivery to be late or to not happen at all. The unreliable message processing system may work like its asynchronous counterpart, that is, with no time guarantee on when the message will be sent out. Thus, communication is considered as being partially synchronous if one of these two situations holds: δ exists but is not known, or δ is known and has to hold from some unknown point on.

According to the FLP impossibility [29], under the asynchronous setting, it has been proven that no deterministic consensus protocol exists even under a single crash failure. However, in practice, the FLP impossibility restriction can get circumvented by using some randomized protocols, e.g., the work [9].

B. Adversary Models

In the context of a consensus protocol, the adversary model (also referred to as the failure model) typically is used to describe the fraction of malicious or faulty nodes that can be tolerated. The literature on consensus protocols typically considers two types of parties - honest parties and corrupted parties. Typically, honest parties are assumed to behave honestly and strictly follow the pre-defined protocol; also, they are assumed to be online for the entire execution of the protocol [49]. We can roughly categorize the corrupted faults into two classes: fail-stop faults and Byzantine faults.

1) Fail-stop: This kind of fault is also referred to as a crash failure in some literature. In this model, nodes may fail at any time, e.g., failing to provide service or response [18] [50]. The failed node typically keeps silent forever (like a "dead" node), and once stopped, such nodes cannot restart by themselves. Some common reasons can cause this fault, such as a power shutdown, a software error, and a DoS attack [47]. Under this type of fault, the systems can still continue working correctly if the number of failed nodes is less than half of the total number of participating nodes [51]. There exist some well-known consensus protocols to tolerate a crash failure [38], e.g., Paxos [52] [53], Viewstamped Replication (VSR) [12], ZooKeeper [54], and Raft [55].

2) Byzantine: The term Byzantine failure (aka. Byzantine) came from the Byzantine Generals Problem in distributed systems.

Fig. 1. Abstract classification of Byzantine faults.

Compared with crash failures, a Byzantine failure is a much more severe one, whose process may behave arbitrarily but appear to be a normal one to other nodes. For example, a Byzantine node can send some specific messages to bias the other nodes; also, it can send contradicting messages to different nodes to subvert the consensus. Typically, Byzantine nodes are under the control of the adversary with some malicious intent. Thus, the most common reason for a Byzantine failure is adversarial influence (actively from either internal or external adversaries), such as a malware injection or a physical device modification. Also, a group of Byzantine processes can collude together to launch a more severe attack, and they may be under the control of one or more adversaries. One of the best-known algorithms to handle Byzantine failures is PBFT, which can achieve consensus in the presence of a certain number of Byzantine nodes. We will discuss the PBFT protocol later in detail.

From different perspectives, a Byzantine failure can be further divided into different categories. For example, according to message authentication mechanisms, a Byzantine failure can either be an authenticated Byzantine or a general Byzantine [46]. An authenticated Byzantine may behave arbitrarily, but its transmitted messages get correctly signed by the message sender and its signature cannot be forged by any other process. A general Byzantine also behaves arbitrarily and without a signature mechanism, but we typically assume that it is easy to obtain the identity of the message sender.

Similarly, according to the detectability during the process execution, a Byzantine failure can be roughly categorized into detectable and non-detectable Byzantine faults [56] [57]. We follow the definitions in the literature [56]. The non-detectable Byzantine faults include two main kinds: (1) unobservable faults (i.e., faults that cannot be observed by processes based on the messages they received), and (2) undiagnosable faults (i.e., faults that cannot be attributed to a particular process). For instance, if a Byzantine process sends to all processes a message claiming that its initial value is something other than its internal input value, then this behavior is unobservable. Alternatively, if a Byzantine process sends a message that purportedly was sent by another process but that does not contain a valid signature, then, in the absence of further information, this fault is undiagnosable. The detectable Byzantine faults typically are caused by the external behavior of a process, whose deviations from the specified protocol can be observed from the messages sent (or not sent) by processes during a particular execution of the protocol. This kind of detectable fault can further be divided into omission faults and commission faults. An omission fault occurs when a process fails to send a message that it should have sent in the execution of the protocol (e.g., withholding messages at a particular protocol step such that no correct process ever receives this message), and this behavior can be sensed and observed at the recipient. A commission fault happens in the case that a process sends some messages that it is not supposed to send during the execution of the protocol (e.g., sending unnecessary/vague messages at a particular protocol step, or sending contradictory messages to different correct processes).
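The tolerance thresholds behind these two fault classes reduce to simple arithmetic: a crash-tolerant system stays correct while failed nodes remain a minority [51], whereas Byzantine tolerance classically requires n ≥ 3f + 1 replicas (as discussed with PBFT later). A small illustrative helper (function names are ours, not from any protocol):

```python
def max_crash_faults(n):
    """Crash (fail-stop) tolerance: a majority of the n nodes must stay
    alive, so at most floor((n - 1) / 2) crashes are tolerable."""
    return (n - 1) // 2

def max_byzantine_faults(n):
    """Byzantine tolerance: the classic bound n >= 3f + 1 gives
    f = floor((n - 1) / 3)."""
    return (n - 1) // 3

print(max_crash_faults(5))       # 2: a 5-node cluster survives 2 crashes
print(max_byzantine_faults(4))   # 1: the smallest BFT group, n = 3f + 1
print(max_byzantine_faults(100)) # 33
```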

Fig. 1 shows an abstract classification of Byzantine faults. The fault level indicates the malice (i.e., the desire or intention to perform malicious behavior), and typically a higher malice means the fault is controlled by a stronger adversary. Typically, the Byzantine faults that we discuss in the later sections lie in the domain of detectable Byzantine faults.

IV. DETAILED BYZANTINE PROTOCOLS

This section lists some critical Byzantine replicated protocols in detail. We conduct a systematic and comprehensive review of the state-of-the-art protocols to tolerate Byzantine behaviors. Specifically, we carefully studied more than one hundred well-known Byzantine-related research works over the last 30 years, from PBFT in 1999 to the most recent works in 2021. We tried to group them into distinct categories. However, many protocols involve multiple categories. Instead of categorizing them into strictly distinct categories, we classify them according to their main technologies/features/contributions to avoid overlapping discussion in different categories. The categories are not orthogonal, and some of the protocols in them can also be put into other categories. Besides, for each category, we typically follow a chronological order (e.g., by when the work was published) until the writing of this work. In this whole section, when we say "prior" or "previous" works, we mean those works that were published before the currently discussed paper; while we use the term "the state-of-the-art" to represent the time of writing this paper.

We note that, to guarantee the originality and integrity of the studied papers in this section, we try to use the same terminologies the authors used and keep the original descriptions of some complex schemes. For example, different papers may utilize different terms, such as "party", "peer", "participant" or "participating node", to represent a replica in a traditional Byzantine replication system.

A. Classic BFT Protocols

Since the formal description of the Byzantine Generals Problem in 1982 [20], many distributed algorithms and protocols have been proposed to tolerate Byzantine failures in various contexts. In this subsection, we first select and discuss some classic BFT protocols, e.g., PBFT [4], Q/U [58], Zyzzyva [59], and BFT-SMART [60]; these classic protocols provide initiatives and baselines for designing other BFT protocols. For instance, PBFT has inspired numerous BFT consensus protocols with enhanced security and performance, e.g., Quorum/Update (Q/U) [58], Hybrid Quorum (HQ) [61], and Zyzzyva [62]. Many works extend these classic protocols to make improvements in performance and robustness.

From a high-level perspective, almost all Byzantine fault-tolerant protocols share a basic objective of assigning each client request a unique order in the global service history, and executing it in that order [63]. Fig. 2 shows an abstract of a BFT replication system. Users send requests to replicas via client interfaces (with a well-defined client library). Replicas together run an agreement protocol to obtain an order on clients' requests, and then each replica executes them in its stateful application [6].

Fig. 2. Abstract of BFT replication system.

Roughly speaking, there exist two sets of Byzantine fault-tolerant protocols: agreement-based protocols and quorum-based protocols. The representative protocols for the agreement-based and quorum-based scenarios are PBFT and Q/U, respectively. Agreement-based protocols first have replicas communicate with each other to agree on a sequence number of a new request and, when agreed, execute that request after having executed all preceding requests in that order. For instance, PBFT engages a three-phase agreement protocol among replicas before a replica executes a request. Quorum-based protocols instead restrict their communication to happen only between clients and replicas, as opposed to among replicas; each replica assigns a sequence number to a request and executes it as long as the submitting client appears to have a current picture of the whole replica population, and otherwise uses some conflict resolution scheme to bring enough replicas up to speed. For instance, Q/U has a one-phase protocol in the fault-free case; however, when faults occur or clients contend to write the same object, the protocol has more phases.

In general, quorum-based protocols require the clients to cache the historical information (e.g., the request sequence) of all (at least "preferred quorum") replicas, putting a large burden on clients, and this typically is not affordable for lightweight clients.

Besides, there exist some hybrid agreement/quorum protocols that share some agreement-based characteristics and some quorum-based features. The speculative protocol Zyzzyva belongs in this category, in which Zyzzyva has a primary to get an agreement, and when the primary is faulty, it resorts to a quorum-based recovery procedure.

1) PBFT: PBFT is the first practical BFT SMR protocol, developed by Castro and Liskov in 1999 [4], which has gained wide recognition for practicality. It can be implemented in Byzantine and partially synchronous settings. The protocol operates in a sequence of views, each coordinated by a leader, taking its inspiration from Paxos [52]. Paxos is an SMR scheme proposed by Lamport in the 1990s that imitates the ancient Paxos part-time parliament. In general, Paxos is designed specially for fault-tolerant consensus while bearing many similarities to Viewstamped Replication (VR) in 1988, which is a server replication scheme for handling server crashes [12] [64]. Different from Paxos, PBFT can operate in Byzantine and partially synchronous settings, with the help of the leader to coordinate other replicas. In general, PBFT consists of three sub-protocols: 1) Normal operation, 2) Checkpoint, and 3) View-change [47]. In the literature, when mentioning PBFT as a consensus protocol, we typically focus on the normal operation protocol; however, the other two sub-protocols are also important.

The normal operation protocol is shown in Algorithm 1 and its graphic communication pattern is shown in Fig. 3. The normal operation is based on views and, within each view, the leader orders the requests and propagates them through a three-step reliable broadcast to the replicas. Replicas are responsible for monitoring the behaviors of a leader for both safety and liveness, and can propose a view change request in case the leader is unavailable or behaves maliciously [18].

Algorithm 1 PBFT (Normal operation)
1: Client sends an operation request to the primary; ▷ Phase 0: Request
2: The primary relays this request to replicas via Pre-prepare messages; ▷ Phase 1: Pre-prepare
3: Replicas record the request and update local states;
4: Replicas send Prepare messages to all servers (replicas and the primary); ▷ Phase 2: Prepare
5: Once receiving ≥ 2f+1 Prepare messages, every server updates local state and is ready to commit;
6: Servers send Commit messages to each other; ▷ Phase 3: Commit
7: Once receiving ≥ 2f+1 Commit messages, every server starts to execute the client requests and then updates local states;
8: Every server replies its result to the client; ▷ Phase 4: Reply

Fig. 3. Message patterns of PBFT.

The definitions of safety and liveness can be found in Section II-B. At the end of normal operation, ideally, all results replied to the client should be the same; otherwise, the client chooses the majority result. Typically, the checkpoint protocol serves as a logging tool that keeps a sliding window, whose lower bound is the stable checkpoint, to track active operation requests. The latest stable checkpoint can be used to safely discard older requests from the operation log, as well as to facilitate the view change protocol when abnormal cases occur. For instance, in case of a leader failure, the view-change protocol will be triggered by the honest replicas (e.g., upon detecting a timeout of the leader's message). They consider the current leader compromised and broadcast a view-change message to each other. After receiving more than 2f view-change messages from distinct replicas, the next in-line replica (e.g., following round-robin order) becomes the newly elected leader and informs the rest to resume normal operation. In a batch process, the new leader may select a set of different requests for processing, and all non-committed requests from previous normal operations are still in the buffer [65].

In more detail on normal operation, PBFT requires 3f+1 replicas to tolerate f faulty replicas. As shown in Fig. 3, a client broadcasts its request to all replicas, and the primary (e.g., S0) among them assigns a sequence number to the request and broadcasts this assignment to replicas in a Pre-prepare message. A backup replica (e.g., S1, S2, and S3) that receives such an assignment acknowledges it and synchronizes with all other replicas on this assignment by broadcasting a Prepare message to its neighboring replicas (including itself). When a replica has received 2f+1 correct Prepare messages from distinct replicas, it promises to commit the request at that sequence number by broadcasting a Commit message. When a replica has received 2f+1 such promises for the same request and sequence number, it finally accepts that assignment, executes the request in its local state after it has executed all other requests with lower sequence numbers, and sends a Reply message to the client with the result. A client accepts the result if f+1 replicas send matching Reply messages, and otherwise retransmits the request.

The message complexity of PBFT normal operation is O(n²) to achieve consensus, where n is the number of participating replicas. This quadratic complexity is due to the mutual messaging in both the Prepare and Commit phases, as shown in Fig. 3. The checkpoint protocol typically does not involve any significant communication complexity. Besides, the view change protocol also requires a quadratic messaging complexity, as it needs to ensure an agreement on the new leader and view, as well as guarantee the safety of messages agreed in previous views. In terms of fault tolerance, the properties (e.g., safety and liveness) of the PBFT protocol can be guaranteed if n ≥ 3f+1, where f is the number of Byzantine replicas. A short answer for this conclusion is that a replica needs to receive 2f+1 Prepare/Commit messages in the Prepare/Commit phases respectively before proceeding to the next action, and this requires at least 2f+1 honest servers to be in the same state after the Commit phase and produce the same result, so that the f Byzantine replicas are not able to sway the majority consensus. For detailed proofs of this conclusion, interested readers can refer to the original PBFT paper [4].

The PBFT protocol has been perceived to be a communication-heavy protocol (e.g., by employing replication to achieve resilience against Byzantine failures). This makes the PBFT protocol not scalable. And, considering the performance, it is hard to apply PBFT to large-scale distributed systems. Also, it was shown that PBFT can be attacked by an adversary using some specific scheduling mechanism, halting the consensus (e.g., forcing a long timeout when the leader is partitioned and unsynchronized) [66]. However, PBFT, as the first practical BFT SMR protocol, has indeed inspired numerous BFT consensus protocols with enhanced security and performance, e.g., Q/U, HQ, Zyzzyva, and HoneyBadger [66]. We will discuss these protocols in the following parts.

2) Q/U: The Query/Update (Q/U) protocol is an optimistic quorum-based protocol enabling fault-scalable Byzantine fault-tolerant services, which was presented by Abd-El-Malek et al. in 2005 [58]. The optimistic quorum-based nature allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A fault-scalable service is one in which performance degrades gradually, if at all, as more server faults are tolerated. Experience shows that, for Byzantine fault tolerance scenarios, the agreement-based approach is not fault-scalable. This is because in an agreement-based protocol (e.g., [4], [67], [68], [69], [70], [71]), every server processes each request and performs server-to-server broadcast; while in a quorum-based protocol (e.g., [72], [73], [74], [75], [76], [77]), only quorums (or subsets) of servers process each request and server-to-server communication is generally avoided. Services built with the Q/U protocol are fault-scalable, and it provides an operations-based interface that allows services to be built in a similar manner to replicated state machines. Q/U objects export interfaces consisting of deterministic methods: queries that do not modify objects, and updates that do. Compared with previous BFT protocols, the Q/U protocol does not have the concept of a commit phase (not even a lazy commit).

In general, the Q/U protocol requires 5f+1 servers to tolerate f Byzantine faulty servers, while most agreement-based approaches only require 3f+1 servers.
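The quorum arithmetic behind both bounds can be checked mechanically: any two quorums of size q out of n replicas intersect in at least 2q - n replicas, and a minimal safety condition is that this intersection contains at least one correct replica despite f Byzantine members. An illustrative check of ours (not code from either paper):

```python
def quorums_safe(n, quorum, f):
    """Minimal quorum-safety condition: any two quorums of size `quorum`
    out of n replicas intersect in at least 2*quorum - n replicas, and
    that intersection must exceed the f Byzantine replicas, i.e.,
    2*quorum - n >= f + 1, so at least one correct replica sees both."""
    return 2 * quorum - n >= f + 1

f = 1
print(quorums_safe(3 * f + 1, 2 * f + 1, f))  # PBFT: n=4, quorum=3 -> True
print(quorums_safe(5 * f + 1, 4 * f + 1, f))  # Q/U:  n=6, quorum=5 -> True
print(quorums_safe(3 * f + 1, 2 * f, f))      # too-small quorum -> False
```

Note that Q/U's larger 4f+1 quorums buy an even larger intersection (3f+1), which is what lets it skip the inter-replica agreement phase entirely.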

Fig. 4. Message patterns of Q/U. (a) Q/U common case operation, (b) Q/U with replica failure.

Even with this imperfection in the fault-tolerance rate, the Q/U protocol still achieves better performance and fault-scalability. This benefit comes from the combination of several techniques, i.e., optimism, quorums, and efficient cryptography. The optimism is enabled by the use of non-destructive updates at the versioning service, which permits operations to efficiently resolve contention and/or failed updates. By combining versioning and a local timestamping scheme at each update operation, it is impossible for a faulty client to submit different updates at the same timestamp, as these updates intrinsically have different timestamps. The Q/U protocol applies a strategy for accessing the quorums using a preferred quorum per object, so as to make server-to-server communication an exceptional case. For example, clients initially send requests to an object's preferred quorum and do not issue queries prior to updates for objects. Besides, it employs efficient cryptographic technologies (e.g., authenticators in Q/U) to optimize the operation cost. The efficiency of the Q/U protocol is a combinational result of the above three techniques.

In more detail on execution, Q/U is a single-phase quorum-based protocol that tolerates up to f faulty replicas in a system of 5f+1. As shown in Fig. 4, clients cache replicas' histories (the request sequence known by a replica), which they include in requests to replicas, and which they update using replies from replicas. These histories allow a replica that receives a client request to optimistically execute the request immediately, as long as its request history is reflected in the client's view. When a client receives enough replies, e.g., 4f+1 replies indicating they optimistically executed the request, the request completes. Normally, a client only contacts its "preferred quorum" of replicas instead of the whole replica set. If some quorum replicas are slow to respond, a client might engage more replicas via a "probe", hoping to complete the quorum. If the number of replies that accept the request is between 2f+1 and 4f, even while others refuse the request, e.g., due to a stale replica history, the client infers there exists a concurrent request from another client. In Q/U's design, it provides a recovery scheme to resolve the conflict, in which clients back off and later resubmit requests, after repairing the replicas to make them consistent. Fig. 4(a) shows the best case for a client's request, and Fig. 4(b) illustrates the probing mechanism.

The Q/U protocol assumes an asynchronous communication timing model, and both clients and servers may be Byzantine. The adopted server failure model is a hybrid failure model that combines Byzantine failures with crash-recovery failures, and the ratio of the number of Byzantine servers to the total number of hybrid-failure servers affects the result. The authors implemented a prototype library to demonstrate the efficiency of the Q/U protocol. Overall, the performance of the Q/U protocol decreases by only 36% as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.

However, the Q/U protocol has two shortcomings that prevent the full benefit of a quorum-based system. 1) It requires a large number of replicas, e.g., 5f+1 to tolerate f failures, which is much higher than the theoretical minimum of 3f+1 for agreement-based protocols. This potentially increases the replica set size. 2) Q/U performs poorly when there are contentions among concurrent write operations. It resorts to exponential back-off to resolve these contentions, which highly reduces the overall throughput, and its performance would be worse when many applications concurrently perform write operations [61].

3) Zyzzyva: Zyzzyva [59] [62] is a speculative Byzantine fault-tolerant protocol that was proposed by Kotla et al. in 2007, which uses speculation technology to reduce the cost of a BFT replication system. The concept of speculation is that replicas speculatively execute requests without running an expensive agreement protocol to definitively establish the order as in PBFT. The main idea behind this is that most replicas in a distributed system are normal (e.g., fault-free) in most cases, and there is no need to execute the expensive commit protocol for every request. It utilizes a fast track and actively involves the client in the consensus process. So, in Zyzzyva, replicas reply to a client's request without first running a three-phase BFT-like consensus protocol. A replica speculatively receives the requested order from the primary and responds to the client immediately. The client then waits for replies from all peers. In the best-case scenario, if the client receives 3f+1 replies, it then commits the request. In general, Zyzzyva potentially improves system performance (e.g., via a high-throughput pipeline of SMR) without too many Byzantine replicas; however, with more Byzantine replicas involved, its consensus efficiency will be compromised. For instance, in the case when the client receives between 2f+1 and 3f replies, a regular consensus algorithm (e.g., non-speculative slow consensus) is used, and this definitely decreases system performance [78]. Different from PBFT, which does not involve the client in the consensus process, the client in Zyzzyva is a key player, who is responsible for checking the integrity of replicas. Fig. 5 shows an abstract communication pattern of the Zyzzyva scheme.

Fig. 5. Protocol communication patterns within a view for (a) gracious execution and (b) faulty replica cases. The numbers refer to the main steps of the protocol numbered in the text [59].
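The client-side decision rule just described can be sketched as follows (an illustrative reading of the text, not Zyzzyva's actual implementation; the function name and return labels are ours):

```python
def zyzzyva_client_action(matching_replies, f):
    """Decide the client's next step from the number of matching
    speculative replies, per the rule described in the text."""
    if matching_replies == 3 * f + 1:
        return "commit"        # fast path: all replicas agree speculatively
    if 2 * f + 1 <= matching_replies <= 3 * f:
        return "commit-phase"  # slow path: fall back to non-speculative commit
    return "retry"             # too few replies: resend / suspect the primary

f = 1
print(zyzzyva_client_action(4, f))  # commit
print(zyzzyva_client_action(3, f))  # commit-phase
print(zyzzyva_client_action(2, f))  # retry
```

The narrow gap between the two paths is exactly where Zyzzyva's fragility, discussed next, comes from: a faulty primary or slow replica can force the protocol off the fast path.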

of Zyzzyva scheme.

Speculation technology does improve BFT's performance, and Zyzzyva is an efficient and scalable consensus protocol thanks to its speculative path. However, this efficiency comes at a price, in the form of fragility, which is mainly due to its two commit paths [79]. One path is the speculative path (aka the fast path), and the other is a two-phase path that resembles PBFT (aka the slow path). The fast path does not have commit messages, and a client commits a decision by seeing 3f + 1 prepare messages. The optimistic mode is coupled with a recovery mode that guarantees progress in the face of failures, and this recovery mode typically intertwines with the PBFT two-phase steps [80]. For example, if speculative execution fails and the client must execute its non-speculative fallback, this requires at least two additional rounds of communication in addition to the time spent waiting for the timeout, which definitely negates its key feature of high performance. When applying this to a fast-response system (e.g., a blockchain system), two decision tracks of the protocol may emerge (via the fast and slow paths, respectively). In the fast track, a possible decision value manifests itself as f + 1 prepare messages, while in the slow track it manifests itself as a commit-certificate (as in PBFT). This scenario may lead Zyzzyva to break safety, as shown in [80]. Besides, the view-change protocol of Zyzzyva fails to provide safety against a faulty leader.

In more detail, Zyzzyva requires 3f + 1 replicas to tolerate f faults. In general, Zyzzyva runs as a hybrid agreement/quorum protocol. Like PBFT, it uses a primary to order requests, and clients send requests only to the primary. Once the primary has ordered these requests, it submits them in an OrderReq message to the replicas, which respond to the client immediately as in Q/U. In fault-free and synchronous execution, in which all 3f + 1 replicas return the same response to the client, Zyzzyva is efficient since requests complete in three message delays, and unlike Q/U, write-contention by multiple clients is mitigated by the primary's ordering of requests. When some replicas are slower or faulty and the client receives between 2f + 1 and 3f matching responses, it must marshal those responses and re-send them to the replicas (including the primary) to convince them that a quorum of at least 2f + 1 has chosen the same ordering for the request. If it receives 2f + 1 matching responses in this second phase, it completes the request in five message delays. However, Zyzzyva's view-change protocol is more heavyweight and complex than PBFT's; according to the results shown in [59], if even one replica is faulty (e.g., mute), Zyzzyva is slower than PBFT.

In general, Zyzzyva uses a fast, speculative scheme to drive its latency near the optimum for an agreement protocol. As in the family of speculative BFT protocols, it also has several variants, e.g., Zyzzyva5 [59] [62], MinZyzzyva [81], AZyzzyva [14] [82], and SACZyzzyva. Here we only provide a brief description of these variants. Zyzzyva5 is introduced in the same paper as Zyzzyva and avoids the non-speculative fallback at the expense of the fault-tolerance rate by relaxing the number of replicas from 3f + 1 to 5f + 1; the protocol allows the client to complete requests after receiving 4f + 1 replies. However, this lowers the fault-tolerance rate to ⌊(n − 1)/5⌋, compared with Zyzzyva's ⌊(n − 1)/3⌋. AZyzzyva mimics the behavior of Zyzzyva in best-case situations by combining the speculative (fast) path of Zyzzyva (called Zlight in the protocol) with a recovery protocol, e.g., PBFT. In case Zlight fails to make progress, AZyzzyva switches to a new view that executes PBFT for a fixed number of log slots. In this way, it can easily be extended to a pipeline of state-machine commands without being vulnerable to the safety violation of Zyzzyva exposed in [80]. MinZyzzyva is also based on speculation; it improves Zyzzyva with the help of a trusted monotonic counter in each replica, whose integrity is guaranteed by the underlying hardware. MinZyzzyva allows a reduction of the number of replicas from Zyzzyva's 3f + 1 to 2f + 1 while preserving the same safety and liveness properties. SACZyzzyva (Single-Active-Counter Zyzzyva) addresses some concerns about resilience to slow replicas. It requires 3f + 1 replicas to tolerate f faulty replicas, with only one replica needing an active monotonic counter at any given time, compared with every replica being equipped with an active monotonic counter in MinZyzzyva.

Table I shows a comparison of the above speculative BFT protocols, where f is the number of faulty replicas the system can tolerate.

TABLE I. COMPARISON OF SPECULATIVE BFT PROTOCOLS TOLERATING f FAULTS, WHERE RESILIENCE REFERS TO THE MAXIMUM NUMBER OF REPLICAS THAT CAN BE NON-RESPONSIVE WITHOUT FALLING BACK TO NON-SPECULATIVE OPERATION [79].

Protocol      Total # of Replicas   Resilience   Monotonic Counters (Total / Active)
Zyzzyva       3f+1                  0            - / -
Zyzzyva5      5f+1                  f            - / -
MinZyzzyva    2f+1                  0            2f+1 / 2f+1
AZyzzyva      3f+1                  -            - / -
SACZyzzyva    3f+1                  f            f+1 / 1

4) BFT-SMART: BFT-SMART is an open-source library that implements a modular SMR protocol to tolerate Byzantine faults in some replicas, whose development started around 2010 [60]. It features increased reliability and modularity, as well as flexible programming interfaces. The original open-source library was developed in Java and implements an SMR protocol similar to other Byzantine fault-tolerant protocols, e.g., PBFT; it has since been extended to versions in other programming languages, which improves the diversity of BFT-SMART for various applications [83]. As an SMR, BFT-SMART provides several benefits from a high-level perspective: (1) it exploits modern hardware well, e.g., multi-core systems; (2) it offers high performance compared with other protocols, e.g., PBFT and UpRight, in terms of consensus time (i.e., the time to process a client's request); (3) it can keep the replicated data accurate even when Byzantine faulty behavior is exhibited; and (4) it supports reconfiguration of the replica set, e.g., addition and removal of nodes [84] [85].

Following some existing literature [83] [86] [87], we describe the working principles of BFT-SMART and its features for practical deployment as an SMR. Fig. 6 shows the abstract communication pattern of BFT-SMART; for detailed information, refer to the work [60].

a) SMR Consensus Protocol: BFT-SMART utilizes the Mod-SMART protocol, as a total-order multicast service, to implement the SMR functionalities. Mod-SMART functions as a modular SMR protocol that works by executing a sequence of consensus instances based on a leader-based Byzantine consensus algorithm (e.g., PBFT). During normal operation, the communication pattern is similar to PBFT: clients send their requests to all replicas and wait for the replies.
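Returning to Zyzzyva for a moment, its client-side decision rule described above (fast commit on 3f + 1 matching speculative replies in three message delays, and a second commit-certificate phase when only between 2f + 1 and 3f replies match, completing in five delays) can be sketched as follows. This is an illustrative sketch with invented names, not the authors' implementation.

```python
from collections import Counter

def zyzzyva_client_decision(responses, f):
    """Illustrative sketch of Zyzzyva's client-side commit rule.

    `responses` maps replica id -> (history, reply) received within the
    timeout; `f` is the number of tolerated faults (n = 3f + 1).
    """
    n = 3 * f + 1
    tally = Counter(responses.values())          # count matching replies
    best, count = tally.most_common(1)[0] if tally else (None, 0)

    if count == n:
        # Fast path: all 3f + 1 replicas agree -> commit in 3 message delays.
        return ("commit-fast", best)
    if 2 * f + 1 <= count <= 3 * f:
        # Slow path: the client marshals the 2f + 1 matching responses into
        # a commit certificate and re-sends it to all replicas; the request
        # then completes in 5 message delays.
        return ("commit-certificate", best)
    # Otherwise the client retransmits and may eventually trigger a view change.
    return ("retry", None)

# Example with f = 1: all 4 replicas return the same (history, reply).
replies = {r: ("h1", "ok") for r in range(4)}
print(zyzzyva_client_decision(replies, 1)[0])  # -> commit-fast
```

The same rule explains Zyzzyva's fragility: two distinct commit tracks (fast and certificate-based) must never be allowed to decide different values.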

Following PBFT, each consensus instance begins with a leader proposing a batch of client operations to be decided. Specifically, each instance consists of three communication steps, whose message pattern is shown in Fig. 6. The first step requires the consensus leader to send a batched PROPOSE message to all other replicas, which is followed by two rounds of all-to-all communication consisting of WRITE and ACCEPT messages, respectively. In general, PROPOSE messages contain a batch of requests to be decided, while WRITE and ACCEPT messages only contain the cryptographic hash of that batch to reduce the communication cost.

Fig. 6. Message patterns of BFT-SMART [60].

The above operations constitute the Mod-SMART normal-phase execution, which takes place in the absence of faults and in the presence of synchrony. When these conditions are not satisfied, e.g., the leader is faulty and/or the network experiences a period of asynchrony, Mod-SMART may trigger a synchronization phase to elect a new leader for consensus instances and synchronize all correct replicas. In this phase, BFT-SMART may make some replicas trigger the state transfer protocol.

b) State Transfer: As an efficient and practical SMR, BFT-SMART allows crashed replicas to recover and resume execution without restarting the whole replicated service. To achieve this, BFT-SMART utilizes an intermediate layer between the Mod-SMART protocol and the replicated service, which is responsible for triggering service checkpoints and managing the request log. It thus requires stable storage to recover the whole system. Typically, there are two ways to achieve state transfer. One follows an approach similar to PBFT's: the request log is stored in memory and periodically truncated after a snapshot of the service state is created, as provided by PBFT's checkpoint sub-protocol. The other is to implement the efficient durability layer described in [84]. When this layer is enabled, BFT-SMART stores the request log in stable storage to preserve the service state even if all replicas fail by crashing. The key ideas behind this layer are: 1) logging batches of operations to a single disk while these operations are being executed by the service; 2) taking snapshots at different points of the execution on different replicas to avoid stopping the system; and 3) performing state transfer in a collaborative way, with each replica sending a different part of the state to the recovering replica. To write requests to disk efficiently, these requests can be appended to the durable log in parallel, and to utilize bandwidth efficiently, the system can try to write multiple batches at once.

In general, the durability layer underlying state transfer enables replicas to execute checkpoints at different moments of their execution and supports collaborative state transfer. Moreover, the state transfer service is an independent module that does not influence the consensus protocol.

c) Configuration: BFT-SMART provides a mechanism for adding and removing replicas from the system on-the-fly, reconfiguring the replica set (e.g., the view information). This process assumes the existence of a distinguished trusted client, known as the View Manager, which is able to issue updates on the replica set using the aforementioned SMR machinery. To reconfigure the current replica set, the View Manager issues a special type of operation that is submitted to the Mod-SMART algorithm just like any other client operation, notifying the system that it wants to add replicas to (or remove replicas from) the system. Since these special operations are totally ordered, all correct replicas will adopt the same view as the system's current view at any given point in the execution of client operations. Besides, these special operation requests are never delivered to applications; they are only used to install the updated view.

After the View Manager receives confirmation from the current replicas that the update has been executed (with verification of the request signature), it notifies the joining replicas that they can start participating in the replication protocol, while removed replicas will no longer receive messages from the current replica set. The newly joined replicas then invoke the state transfer protocol to retrieve the latest application state from other replicas before actively participating in the replication protocol. After these replicas successfully update their state, they can start to process new requests.

The above discussion gives a brief overview of the BFT-SMART protocol. In fact, BFT-SMART was one of the rare projects of this kind developed before permissioned blockchains surged around 2015 [88], and there is widespread agreement today that BFT-SMART provides one of the most advanced and widely tested implementations of a BFT consensus protocol available [51]. For example, several well-known blockchain projects adopt BFT-SMART as their consensus protocol, such as Hyperledger Fabric (V1) [89], Symbiont [90], and R3 Corda [91]. Besides, there exist some variants of BFT-SMART, e.g., WHEAT, optimized for geo-replicated environments [86], as well as visualization services for BFT-SMART, e.g., the work [92].

B. Leaderless BFT Protocols

Most classic BFT protocols in Section IV-A rely on correct leaders or coordinators to advance or terminate the progress of consensus. For example, such protocols run in a series of views, and each view has a delegated node, called the leader, that coordinates all consensus decisions [93]. In general, leader-based BFT protocols rely on a view-synchronization process driven by the leaders. In the "common case" (e.g., no faulty leaders), leader-based consensus protocols work well and achieve the required properties. However, when the leader behaves maliciously or fails, the protocol may have to go through a very complex view synchronization to reach a correct view and continue. These consensus protocols are mainly based on ad-hoc ways of rotating leaders and synchronizing nodes to the same view. However, many works [94] [95] [66] [96] have demonstrated that the view-synchronization process is complicated and bug-prone. For example, traditional view synchronization relies on all-to-all communication to ensure that all correct nodes reach the same decision with absolute certainty, which involves a quadratic communication overhead. This makes such protocols difficult to scale to a large number of participants [97]. Besides, the leader may be very slow but correct, which stalls the whole consensus process, and the centralized leader makes the process vulnerable to numerous attacks.
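The quadratic overhead of all-to-all view synchronization mentioned above can be made concrete with a small counting sketch (ours, for illustration only): one all-to-all round costs n(n − 1) messages, versus n − 1 for a single leader-driven dissemination step.

```python
def all_to_all_messages(n: int) -> int:
    """One all-to-all round: each of n nodes sends to the other n - 1."""
    return n * (n - 1)

def leader_round_messages(n: int) -> int:
    """One leader-driven step: the leader sends to the other n - 1 nodes."""
    return n - 1

for n in (4, 16, 64, 256):
    print(n, leader_round_messages(n), all_to_all_messages(n))
# 256 participants: 255 messages for a leader step vs 65280 all-to-all.
```

The gap between the two columns is what makes all-to-all synchronization hard to scale to large participant sets.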

Thus, leaderless consensus protocols can help achieve more desirable properties in some applications.

Typically, leader-based consensus protocols require some assumptions on network synchrony (e.g., a time interval for each view or operation). With the wide adoption of blockchain in real-world applications (e.g., over Internet connections), more and more practical blockchain applications require asynchrony. The main idea of asynchronous rounds is to allow the replicas to commit a request as soon as they receive a threshold of messages, instead of having to wait for a message from a leader/coordinator that may be slow. Leaderless consensus protocols do not require this kind of coordination: each replica plays a similar role in the execution of the consensus, which makes the decision inherently "democratic". Also, without a leader, the protocol works in a decentralized manner, which can avoid bottlenecks by balancing the load and make the consensus more scalable.

From a high-level perspective, a leaderless consensus protocol requires an efficient gossip algorithm and some primitives (e.g., atomic broadcast protocols) to achieve consensus without relying on classic coordinators. Different leaderless protocols may use different primitives to drive their consensus process among replicas. This section presents some well-known leaderless BFT consensus protocols.

1) DBFT: Democratic Byzantine fault tolerance (DBFT) is a leaderless Byzantine consensus for blockchains, proposed by Crain et al. in 2018 [98]. DBFT allows processes to complete asynchronous rounds as soon as they receive a threshold of messages, instead of having to wait for a message from a coordinator that may be slow. DBFT is appealing for blockchains for two reasons: 1) each node plays a similar role in the execution of the consensus, hence making the decision inherently "democratic"; 2) decentralization avoids bottlenecks by balancing the load, making the solution scalable. Essentially, DBFT is deterministic, assumes partial synchrony, is resilience-optimal and time-optimal, and does not need signatures. The authors also present a simple safe binary Byzantine consensus algorithm, modify it to ensure termination, and finally present an optimized reduction from multivalue consensus to binary consensus whose fast path terminates in four message delays.

In general, leader-based Byzantine consensus protocols are advantageous when the leader is non-faulty and messages are delivered in a timely manner, even in an asynchronous round. However, a faulty coordinator can dramatically impact the algorithm's performance. DBFT addresses this issue by not relying on a classic coordinator or leader. Instead, DBFT uses a weak coordinator that does not impose its value. This provides several benefits: 1) it allows non-faulty processes to decide a value quickly without the help of the coordinator; 2) the coordinator helps the algorithm terminate if non-faulty processes know that the values they propose might all be decided; 3) a weak coordinator allows rounds to be executed optimistically without waiting for a specific message (e.g., from the coordinator, as in traditional scenarios). In more detail, DBFT assumes asynchronous processes and a reliable asynchronous communication network. Asynchronous processes mean that each process proceeds at its own speed, which can vary with time and remains unknown to the other processes, while a reliable asynchronous network means that there is no bound on message transfer delays, but these delays are finite.

Based on the above assumptions, the authors propose a binary Byzantine consensus (BBC) algorithm, which relies on a binary-value all-to-all communication abstraction, denoted BV-broadcast, originally introduced for randomized consensus [99]. The authors also present a safe asynchronous BBC algorithm that consists of three phases. During a round, each non-faulty process proceeds in three phases: Phase 1 broadcasts a binary value to filter out the values of Byzantine processes; Phase 2 exchanges the estimates to converge to an agreement; and Phase 3 decides upon convergence of the estimate to the round number modulo 2 (as a binary scheme). Interested readers can refer to the work [98].

2) EZBFT: EZBFT is a leaderless distributed Byzantine fault-tolerant consensus protocol that minimizes client-side latency in WAN deployments, proposed by Arun et al. in 2019 [100]. The authors claim that, to minimize client-side latency, EZBFT (i) has no designated primary replica and instead enables every replica to order the requests it receives from clients; (ii) uses only three communication steps to order requests in the common case; and (iii) involves clients actively in the consensus process. Also, EZBFT minimizes the potentially negative effect of a Byzantine replica on overall system performance. Many previous protocols do not reduce client-side latencies in a geo-scale setting, where the latency per communication step is as important as the number of communication steps. For example, a distant client will incur high latency for the first communication step of sending the request to the primary. Leaderless protocols can potentially resolve this problem: a client sends its requests to the nearest replica and continues to do so as long as that replica is correct, without requiring a primary to be involved. To enable leaderless operation, EZBFT exploits a particular characteristic of client commands: interference. In the absence of concurrent interfering commands, EZBFT's client receives a reply in an optimal three communication steps. However, when commands interfere, both clients and replicas coherently communicate to establish a consistent total order, consuming an additional zero to two communication steps.

In more detail, EZBFT requires at least 3f + 1 replicas to tolerate f faulty replicas. EZBFT uses two kinds of quorums: a fast quorum with 3f + 1 replicas and a slow quorum with 2f + 1 replicas. Safety is guaranteed as long as at most f replicas fail, while liveness is guaranteed during periods of synchronous communication. In general, EZBFT can deliver decisions in three communication steps from the client's point of view if there is no contention, there are no Byzantine failures, and communication between replicas is synchronous. However, this is an ideal condition. According to the authors, these three communication steps are: (i) a client sends a request to any one of the replicas (preferably the closest); (ii) that replica forwards the request to the other replicas with a proposed order; and (iii) the other replicas speculatively execute the request as per the proposed order and reply back to the client. These three steps constitute EZBFT's core novelty. In general, EZBFT places much of the burden of final decisions on the clients, an assumption that does not suit lightweight clients well.

EZBFT includes many assumptions that help the design of a leaderless consensus protocol, and some of these assumptions are not practical. For example, as the work [101] pointed out, "EZBFT violates the safety property of a consensus protocol. EZBFT also violates liveness and execution consistency; a requirement for total ordering of non-commutative commands." The work [101] provides some enhancements to achieve EZBFT's originally claimed properties. Interested readers can refer to the work [101].
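EZBFT's common-case quorum logic described above (a fast quorum of 3f + 1 and a slow quorum of 2f + 1, out of n = 3f + 1 replicas) can be sketched as follows. This is our illustrative sketch; the function names and the simplified decision outcomes are invented for exposition and do not reproduce the actual protocol.

```python
def ezbft_quorums(f: int) -> dict:
    """Quorum sizes EZBFT uses with n = 3f + 1 replicas (per [100])."""
    return {"n": 3 * f + 1, "fast": 3 * f + 1, "slow": 2 * f + 1}

def common_case_decision(matching_replies: int, f: int) -> str:
    """Sketch of the client-side outcome in EZBFT's common case."""
    q = ezbft_quorums(f)
    if matching_replies >= q["fast"]:
        return "commit"          # three communication steps, no contention
    if matching_replies >= q["slow"]:
        return "slow-commit"     # extra steps to establish a total order
    return "wait-or-recover"

print(common_case_decision(4, 1))  # f = 1, all 4 replicas replied -> commit
```

The fast quorum covers the contention-free, failure-free, synchronous case; the slow quorum is what keeps safety when only 2f + 1 replicas respond.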

3) Aleph: Aleph is an atomic broadcast protocol for asynchronous networks with Byzantine nodes, proposed by Gkagol et al. in 2019 [102]; a deprecated earlier version [103] was presented by Gkagol and Swietek in 2018, targeting a leaderless protocol. From a high-level perspective, Aleph is based on the design of Asynchronous Byzantine Fault-Tolerant (ABFT) systems, which allows formal reasoning about correctness, efficiency, and security in the strictest possible model. It also improves on HoneyBadgerBFT by reducing the asymptotic latency while matching the optimal communication complexity. Aleph does not require a trusted dealer, thanks to a trustless ABFT randomness beacon. In general, ABFT models are resistant to harsh network conditions, e.g., arbitrarily long message delays, node crashes, or even multiple nodes colluding to break the system. However, ABFT models are often considered purely theoretical due to their heavy mathematical formalism. The ABFT protocol proposed in Aleph keeps all the good properties of HoneyBadgerBFT and improves upon it in two important aspects: tightening the complexity bounds on latency from logarithmic to constant, and eliminating the need for a trusted dealer. Aleph is designed for an asynchronous setting; however, it matches the optimistic-case performance of the three-round validation time of synchronous protocols [4]. Also, the trustless ABFT randomness beacon generates common, unpredictable randomness.

In more detail, a robust atomic broadcast protocol should meet several requirements, such as total order, agreement, and censorship resilience. Total order and agreement ensure that all honest nodes always produce the same ordering, while censorship resilience guarantees that no transaction is lost due to censorship, and also that the system makes progress and does not become stuck as long as new transactions are being received. The Aleph protocol solves atomic broadcast over N >= 3f + 1 nodes in an asynchronous network, where f is the number of dishonest nodes, with the following properties: 1) Latency: the expected number of asynchronous rounds until a transaction is output by every honest node is O(N/k), where k is the number of honest nodes; 2) Communication complexity: the total communication complexity of the protocol over R rounds is O(T + R × N² log N) per node, where T is the total number of transactions input to honest nodes during the R rounds. The second contribution of Aleph is a stand-alone component that removes the trusted-dealer assumption, which can be considered a protocol for generating unpredictable common randomness with the ABFT randomness beacon. Such randomness is indispensable in any ABFT protocol for atomic broadcast. Besides, the Aleph protocol is based on a modular structure that entirely separates the network communication layer from the protocol logic. The communication layer maintains a common data structure called a Communication History DAG, and the protocol logic is then stated entirely through the combinatorial properties of this structure. Also, Aleph is resistant to the Fork Bomb, a spam-attack scenario that affects most known DAG-based protocols, through the use of reliable broadcast to disseminate information among nodes.

4) PnyxDB: PnyxDB is a leaderless democratic BFT replicated datastore that exhibits both scalability and low latency, proposed by Bonniot et al. in 2020 [104]. From a high-level perspective, it hinges on conditional endorsements that track conflicts between transactions. PnyxDB also supports application-level voting, e.g., individual nodes can endorse or reject a transaction according to application-defined policies without compromising consistency. Within PnyxDB, clients may perceive conflicting views of the datastore for a short period of time, but these views eventually become consistent. With the help of conditional endorsements within quorums, PnyxDB can flag and handle conflicts by allowing each node to specify a set of transactions that must not be committed for its endorsement to be valid. PnyxDB consists of two key components: leaderless quorums for scalability and conditional endorsements for eventual consistency. Fig. 7 shows an overview of PnyxDB, where the application submits transactions to be executed on a shared state and is polled back for transaction approval before conditional endorsements are created.

Fig. 7. Overview of PnyxDB. The application submits transactions to be executed on shared state and polls the application back for transaction approval before creating conditional endorsements [104].

In more detail, PnyxDB does not rely on any coordinator, rotating or elected, which removes a recurring performance bottleneck. This trades weaker consistency guarantees for high scalability. Transactions only need to be endorsed by a Byzantine quorum of endorsers (e.g., more than (n + f)/2) to be permanently committed to the system's state. The eventual consistency of PnyxDB is based on Conflict-Free Replicated Data Types (CRDTs) [105] [106] and leverages the fact that many operations in distributed datastores either commute or are independent. This ensures that transactions can be executed out of order on different nodes without breaking local consistency [107], while allowing every correct node to eventually converge to the same global datastore state. However, the leaderless quorums may lead to deadlocks in case of conflicts, e.g., when modifying the same key with conflicting operations. This can be overcome by the conditional-endorsement mechanism. When an endorser broadcasts an endorsement, it publishes a list of transactions that must not be committed for the endorsement to be valid; these conflicting transactions are the conditions of the endorsement. All correct nodes use the same heuristics to decide which transaction to promote over the other, ensuring consistent conflict resolution. With the cooperation of leaderless quorums and conditional endorsements, valid transactions can successfully proceed through their life cycle. Fig. 8 shows a transaction state diagram in PnyxDB.

Fig. 8. Transaction state diagram, as viewed by a node. From the Pending state, a transaction evolves either to Dropped or Committed given received messages. Dropped and Committed are eventually consistent across all nodes. However, Pending, Applicable, and Applied are intermediate states local to each node which might not be consistent [104].
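PnyxDB's two key ingredients described above can be sketched numerically: the Byzantine endorsement quorum is anything strictly larger than (n + f)/2, and a conditional endorsement is valid only while none of its listed conflicting transactions has been committed. This is a toy sketch of ours; the data structures are invented for exposition.

```python
def quorum_size(n: int, f: int) -> int:
    """Smallest endorsement quorum strictly larger than (n + f) / 2,
    as used by PnyxDB [104]."""
    return (n + f) // 2 + 1

def endorsement_valid(conditions, committed) -> bool:
    """A conditional endorsement is valid only if none of its listed
    conflicting transactions (`conditions`) has been committed."""
    return not any(tx in committed for tx in conditions)

print(quorum_size(10, 3))                   # 10 endorsers, 3 Byzantine -> 7
print(endorsement_valid({"tx2"}, {"tx5"}))  # tx2 not committed -> True
```

Because all correct nodes resolve conflicts with the same deterministic heuristics, endorsements conditioned on conflicting transactions cannot drive nodes to divergent committed states.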

5) Other Leaderless Consensuses: Leaderless Byzantine Paxos is a brief announcement proposed by Lamport in 2011 [95]. It presents a simple method for implementing a leaderless Byzantine agreement algorithm by replacing the leader in an ordinary Byzantine Paxos algorithm with a virtual leader that is implemented using a synchronous Byzantine agreement algorithm. Each server decides what message the leader should send next and proposes it as the leader's next message. The servers then execute a synchronous Byzantine algorithm to try to agree on the vector of proposed messages, a vector containing one proposal per server. Each server uses a deterministic procedure to choose the message sent by the virtual leader, and it acts as if it had received this message. A malicious virtual leader can prevent progress, but it does not cause inconsistency, because a Byzantine Paxos algorithm can tolerate a malicious leader. In general, leaderless Paxos adds to a Byzantine Paxos algorithm the cost of the leader-agreement algorithm: the time required by a leader-agreement algorithm that tolerates F faulty servers is F + 1 message delays, which replaces the one message delay of a leader simply sending a message.

Leaderless BFT (LBFT), proposed by Niu and Feng in 2020 [93], aims to design a protocol in the partially synchronous network model. From a high-level perspective, all proposers in LBFT can propose blocks once they find block proofs, and some of these blocks will eventually be committed with no explicit leader. The LBFT protocol contains three key components: the block proposing, voting, and committing rules. The proposers in LBFT compete to generate new blocks that extend certified blocks chosen by the proposing rule, in which a proposer is allowed to generate a new block once it finds an associated block proof. The voters broadcast votes for a valid block if it satisfies the specified voting rule. When nodes collect more than 2f + 1 votes for a block, the block is certified, and nodes then check whether newly committed blocks exist by following the committing rule. For these three rules, interested readers can refer to Section III of [93]. Currently, LBFT is still at the proof-of-concept stage, and no implementation has been provided to evaluate its performance.

Leaderless consensus, proposed by Antoniadis et al. in 2021 [108], aims to provide a general framework for consensus algorithms. It first gives a precise definition of leaderless algorithms, and then presents three indulgent leaderless consensus algorithms: for shared memory, for message passing with omission failures, and for message passing with Byzantine failures (with and without authentication). In general, this work is largely conceptual and aims to show how to design a leaderless consensus. The first proposed leaderless consensus algorithm, called Archipelago, works in shared memory and builds upon a new variant of the classic adopt-commit object [109] that returns maximum values to help different processes converge towards the same output. The second algorithm is a generalization of Archipelago to a message-passing system with omission failures, and the third algorithm, called BFT-Archipelago, is a generalization of Archipelago to Byzantine failures. BFT-Archipelago shares the same asymptotic communication complexity as a classic BFT consensus algorithm and can optimistically execute a fast path that terminates in two message exchanges under good conditions. Regarding complexity, BFT-Archipelago can terminate deterministically by exchanging and storing at most O(n⁴) messages and bits (each message is of length O(1) bits), and can terminate within O(n) rounds with O(n⁴) calculations and signature checks.

C. Randomized BFT Protocols

In general, randomization can help bypass the FLP impossibility by guaranteeing probabilistic properties instead of deterministic ones [27]. Randomization by itself cannot directly solve consensus problems; it must cooperate with other techniques. For example, Rabin [110] presents a solution that achieves Byzantine consensus in asynchronous settings by combining randomization with cryptographic primitives (i.e., digital signatures) and a trusted dealer; on average, it terminates in constant expected time. Randomized consensus protocols typically cannot achieve deterministic termination; instead, we consider whether they can achieve consensus "with a high probability". For example, Ben-Or's consensus protocol [111] was the first to ensure consensus and guarantee termination with a high probability in the crash-failure model. The Byzantine model is more complex, and most existing randomized Byzantine consensus protocols target asynchronous settings. Thus, some protocols presented in this section can also be categorized as asynchronous BFT consensus.

Different from deterministic protocols, randomized protocols are able to achieve consensus even in a fully asynchronous setting, ensuring liveness and responsiveness with a high probability without any extra assumptions on network synchrony [112]. This also removes complicated timeout schemes over an unstable global communication network (like the Internet). Thus, randomized algorithms are typically used to bypass the FLP impossibility in asynchronous Byzantine settings. The key idea of a randomized protocol is that it makes progress as if every replica were the leader and retroactively decides on a leader, which leaves the adversary only a small probability of guessing which nodes to corrupt [113]. Still, most existing randomized protocols have some drawbacks, e.g., a quadratic communication complexity even under good network conditions. This section discusses randomized BFT consensus protocols.

1) HoneyBadgerBFT: HoneyBadgerBFT is the first practical asynchronous BFT protocol that guarantees liveness without making any timing assumptions, proposed by Miller et al. in 2016 [66]. By the well-known FLP impossibility, deterministic asynchronous protocols for atomic broadcast and many other tasks are impossible. A deterministic protocol must therefore make stronger timing assumptions, e.g., those of partial synchrony; in this regard, almost all prior modern BFT protocols rely on timing assumptions to guarantee liveness. However, in real settings such as the Internet, such assumptions do not hold. Randomness, e.g., via cryptographic primitives, provides an alternative way to achieve consensus in a purely asynchronous environment. Roughly speaking, HoneyBadgerBFT utilizes a novel atomic broadcast protocol to achieve optimal asymptotic efficiency, with O(N) communication complexity (where N is the total number of replicas). HoneyBadgerBFT uses threshold public-key encryption to prevent targeted censorship attacks, and uses a sub-optimal instantiation of the Asynchronous Common Subset (ACS) sub-component. With the help of ACS, HoneyBadgerBFT combines efficient reliable broadcast using erasure codes [114] with a reduction from ACS to reliable broadcast from multi-party computation [115]. HoneyBadgerBFT's design is optimized for a cryptocurrency-like deployment scenario where network bandwidth is a scarce resource but computation is relatively ample. Besides, HoneyBadgerBFT assumes that, in an asynchronous network, messages are eventually delivered, but no other timing assumption is made.

In more detail, HoneyBadgerBFT consists of two main components: threshold encryption and ACS.
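The role of threshold encryption here can be illustrated with a toy decryption gate: a ciphertext can only be opened once at least t decryption shares are gathered. This is a simplified model of (t, n)-threshold decryption behavior only; real schemes such as Baek and Zheng's involve actual cryptography, which we elide, and all names below are ours.

```python
class ToyThresholdDecryption:
    """Toy model of (t, n) threshold decryption: the plaintext is revealed
    only after at least t distinct decryption shares are collected."""

    def __init__(self, t: int, plaintext: str):
        self.t = t
        self._plaintext = plaintext
        self.shares = set()

    def add_share(self, replica_id: int):
        self.shares.add(replica_id)

    def open(self):
        # Fewer than t shares reveal nothing about the plaintext.
        if len(self.shares) < self.t:
            return None
        return self._plaintext

ct = ToyThresholdDecryption(t=3, plaintext="batch-42")
ct.add_share(0)
ct.add_share(1)
print(ct.open())   # None: only 2 shares, below the threshold
ct.add_share(2)
print(ct.open())   # batch-42
```

This is why an adversary controlling fewer than t replicas cannot learn which transactions a node proposed, which underlies the censorship-resilience property discussed next.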

In every round, each node chooses a set of transactions as its proposal and encrypts it through a threshold encryption scheme, then submits the ciphertext as input to the ACS module, and a common subset of these ciphertexts will be output. At the end of the round, the subset is decrypted, again via the threshold encryption scheme, to get the final consensus result [116].

Fig. 9. HoneyBadger BFT [116].

Fig. 9 shows an abstract architecture of HoneyBadgerBFT. The threshold encryption scheme TPKE used in HoneyBadgerBFT is from Baek and Zheng [117]. A (t, n)-TPKE scheme guarantees that a ciphertext can be decrypted if and only if at least t honest users cooperate. If the number of cooperating users is less than t, no information about the plaintext is leaked. This property is important for applications where a party does not fully trust other individuals, but possibly trusts a group of individuals [118]. By using a TPKE scheme, HoneyBadgerBFT ensures that an adversary controlling fewer than t faulty replicas cannot obtain a plaintext without the help of honest nodes. As the TPKE is secure under an adaptive chosen-ciphertext attack, it helps HoneyBadgerBFT realize the censorship resilience property, which prevents the adversary from purposely delaying honest client requests. ACS serves as the main module to reach consensus among replicas, and it further consists of a Reliable Broadcast (RBC) module and a Binary Agreement (BA) module. In HoneyBadgerBFT, the RBC module from the work [119] with erasure codes [114] is used to transmit each node's proposal to all other nodes, while the BA module from Mostefaoui [120] is used to decide on a bit vector indicating which proposals should be output as the consensus result.

Besides, the authors implemented the HoneyBadgerBFT protocol, and experimental results show that the system can achieve a throughput of tens of thousands of transactions per second; it can also scale to over a hundred nodes over a WAN. The authors also conducted experiments over Tor, without needing to tune any parameters. With both safety and liveness guaranteed, HoneyBadgerBFT has abundant application scenarios, e.g., banks and financial institutions.

There is no timing restriction on message passing. However, HoneyBadgerBFT assumes a traditional static BFT setting, which means that it cannot support a consensus node joining or leaving the consensus group without reconfiguring the whole system. Moreover, malicious or inactive nodes may still exist in the system, and the current protocol lacks a function to detect these malfunctioning nodes and exclude them from the system. The malfunctioning nodes may hamper the long-term benign working of the system.

2) Algorand: Algorand is a cryptocurrency-based blockchain network that confirms transactions with latency on the order of a minute while scaling to many users, proposed by Gilad et al. in 2017 [121]. It keeps strong consistency, in that users will never have divergent views of confirmed transactions. This is enforced by a Byzantine Agreement (BA) protocol that reaches a consensus among users on the next set of transactions. This also allows Algorand to reach a consensus on a new block with low latency, without the possibility of forks. To achieve scalability, Algorand utilizes verifiable random functions (VRFs), which allow users to privately check whether they are selected to participate in the BA to agree on the next set of transactions, with a proof to show they were selected. Users in the BA protocol do not need to keep any private state except for their private keys, which enables Algorand to replace participants immediately after they send a message. To apply Algorand to public blockchains, it needs to overcome several challenges, e.g., avoiding Sybil attacks, achieving scalability, and resisting denial-of-service attacks. To address these challenges, Algorand adopts several techniques, namely, weighted users, consensus by committee, cryptographic sortition, and participant replacement. To prevent Sybil attacks, Algorand assigns a weight to each user. BA⋆ is designed to guarantee consensus as long as a weighted fraction (e.g., a constant greater than 2/3) of the users are honest. To achieve scalability, BA⋆ selects a committee, a small set of representatives randomly chosen from the total set of users, to run each step of the protocol; this selection typically is based on the users' weights. All other users observe the protocol messages, which allows them to learn the agreed-upon block. To prevent an adversary from targeting committee members, e.g., with DoS attacks, BA⋆ selects committee members in a private and non-interactive way. Every user can independently determine whether they are chosen to be on the committee, by computing a VRF on their private key and public information from the blockchain. Due to the non-interactive nature of member selection, an adversary does not know which user to target until that user starts participating in BA⋆. To prevent an adversary from targeting a committee member once that member sends a message, BA⋆ requires committee members to speak just once.

More specifically, Algorand grows the blockchain in asynchronous rounds, similar to Bitcoin. Each client running Algorand can be considered an Algorand user, and the communication between users is via a gossip protocol. For example, users can submit new transactions via the gossip protocol, and the other users collect a block of pending transactions that they hear about. In the gossip protocol, each user selects a small fraction of peers to gossip messages to, and a good gossip protocol avoids forwarding loops. Algorand uses BA⋆ to reach a consensus on one of these pending blocks. In general, BA⋆ executes in steps, communicates over the same gossip protocol, and produces a new agreed-upon block. BA⋆ can produce two kinds of consensus: final consensus and tentative consensus. Once a user reaches the final consensus, any other user that reaches the final or tentative consensus in the same round must agree on the same block value. However, a tentative consensus means that other users may have reached a tentative consensus on a different block (as long as no user reached a final consensus). A user will confirm a transaction from a tentative block only if and when a successor block reaches the final consensus. To participate in consensus, all Algorand users execute cryptographic sortition to determine if they are selected to process a block in a given round. This sortition process ensures that a small fraction of users are selected at random, weighted by their account balance, and provides each selected user with a priority. Meanwhile, the sortition helps make Algorand scalable. Interested readers can refer to the work [121], and some proof works on Algorand can be found in [122] [123].

3) ACE: A leader-based view (LBV) abstraction is presented in the work on ACE, which provides a general framework for the software design of fault-tolerant SMR systems, proposed by Spiegelman and Rinberg in 2019 [124]. Essentially, ACE provides a simple generic framework for asynchronous boosting, which converts consensus algorithms designed according to the leader-based view-by-view paradigm in a partial synchrony model into randomized fully asynchronous SMR solutions.
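The HoneyBadgerBFT round structure described above (threshold-encrypt a batch, agree on a subset of ciphertexts via ACS, then decrypt only what was agreed) can be sketched in a single process. The XOR "encryption" and the simulated ACS below are toy stand-ins for the real primitives, and all names and parameters are illustrative:

```python
# A minimal, single-process sketch of one HoneyBadgerBFT round:
# each node threshold-encrypts its batch, ACS agrees on a bit vector
# of proposals, and only the agreed ciphertexts are decrypted.
# The "encryption" is a toy XOR mask and ACS is simulated by deciding
# the first N - F proposals; both are stand-ins, not the real schemes.

N, F = 4, 1
KEY = 0x5A  # toy shared key, stands in for the TPKE group key

def tpke_encrypt(batch):  # toy stand-in for (t, n)-TPKE encryption
    return bytes(b ^ KEY for b in batch.encode())

def tpke_decrypt(ct):     # toy stand-in for joint threshold decryption
    return bytes(b ^ KEY for b in ct).decode()

def acs(ciphertexts):
    """Simulated ACS: one binary agreement per proposal; here the
    first N - F proposals are 'delivered' and decided 1."""
    return [1 if i < N - F else 0 for i in range(len(ciphertexts))]

proposals = [f"txs-from-node-{i}" for i in range(N)]
cts = [tpke_encrypt(p) for p in proposals]       # round input
bits = acs(cts)                                  # agreement on a subset
block = sorted(tpke_decrypt(c) for c, b in zip(cts, bits) if b)
assert block == ["txs-from-node-0", "txs-from-node-1", "txs-from-node-2"]
```

Because decryption happens only after the bit vector is agreed, an adversary cannot see which transactions are in a proposal while deciding whether to censor it, which is the point of the censorship-resilience design.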
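Algorand's private, non-interactive committee selection described above can likewise be sketched as follows. SHA-256 stands in for the VRF, and `tau` (the expected committee size) and the user names are illustrative assumptions, not values from the Algorand paper:

```python
# A sketch of Algorand-style cryptographic sortition: each user privately
# evaluates a pseudorandom function of its secret key and a public seed,
# and is selected with probability proportional to its weight (stake).
import hashlib

def sortition(secret_key: bytes, seed: bytes, weight: int,
              total_weight: int, tau: int):
    """Return (selected, proof). Selection probability ~ tau * weight / W."""
    proof = hashlib.sha256(secret_key + seed).digest()  # VRF stand-in
    draw = int.from_bytes(proof[:8], "big") / 2**64     # uniform in [0, 1)
    p = min(1.0, tau * weight / total_weight)
    return draw < p, proof

users = {f"user{i}".encode(): 10 for i in range(100)}   # equal stake
W = sum(users.values())
committee = [u for u, w in users.items()
             if sortition(u, b"round-1-seed", w, W, tau=20)[0]]
# Every user decides membership locally; anyone holding the proof can
# verify it against the user's public key (omitted in this toy version),
# so no interaction reveals the committee before members speak.
```

In expectation roughly `tau` users are selected, and an adversary observing the network learns a member's identity only once that member sends its (single) message.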

ACE is model agnostic: it makes no model assumptions, and thus can be applied to any leader-based protocol in the Byzantine or crash failure model. ACE provides a formal characterization of leader-based view-by-view protocols by defining an LBV abstraction, which encapsulates the core properties of a single view and provides an API that allows decoupling the leader-based phase from the view-change phase. The view-change phase can be triggered by timers, e.g., parties start a timer in each view, and if the timer expires before the leader drives progress, parties move to the view-change phase. Also, ACE provides a wave mechanism to control the triggering of the view-change phase. The wave mechanism uses the API provided by the LBV abstraction to generically rearrange views in view-by-view protocols. It composes several LBV abstractions, which allows progress at network speed during a period of asynchrony. If the leader of a view is correct and timers never expire, eventually a decision will be made in this view.

4) Validated Asynchronous Byzantine Agreement: Another theoretical work on the optimization of communication complexity was proposed by Abraham et al. in 2019 [125]. It targets Validated Asynchronous Byzantine Agreement (VABA) in an authenticated setting. VABA is the key building block in constructing atomic broadcast and fault-tolerant SMR in an asynchronous setting. The proposed VABA protocol has optimal resilience of f < n/3 Byzantine failures and asymptotically optimal expected O(1) running time to reach an agreement. Honest parties in the protocol send only an expected O(n^2) messages, where each message contains a value and a constant number of signatures, so the total expected communication is O(n^2) words. The best previous result, by Cachin et al. from 2001 [126], solves VABA with optimal resilience and O(1) expected time but with O(n^3) expected word communication. Thus, the proposed VABA protocol improves the expected word communication from O(n^3) to the asymptotically optimal O(n^2). More specifically, the proposed VABA is secure against an adaptive adversary that controls up to f < n/3 parties. The overall design is modular, consisting of three sub-protocols: a simple two-round broadcast primitive (called Provable-Broadcast), a simple Leader-Election protocol, and another primitive (called Proposal-Promotion). Proposal-Promotion is further built on top of sequential instances of Provable-Broadcast. Interested readers can refer to Section 3 of [125].

5) Gosig: Gosig is a BFT-based blockchain protocol that jointly optimizes the consensus and gossip protocols, proposed by Li et al. in 2020 [127]. Gosig guarantees safety even in asynchronous networks fully controlled by adversaries, by combining secret leader selection with multi-round voting. It adopts transmission pipelining to fully utilize the network bandwidth, and uses aggregated signature gossip to reduce the number of messages. These optimizations help Gosig achieve unprecedented single-chain performance. In general, most existing BFT-based blockchains face several significant challenges: the targeted single-point failure, in which it is easy for adversaries to launch DDoS attacks targeting an honest node and partition it from the others (e.g., under adaptive attacks); the limited bandwidth and long latency, in which the burdens of large blocks and the broadcast scheme may become the source of a bottleneck; and the limitation by the slowest nodes, in which both the communication and computation overheads on a single node (e.g., the leader) limit the protocol's scalability. Even though some threshold signature schemes, as in SBFT or HotStuff, help to reduce the total communication overhead (e.g., reducing the total number of messages), the nodes need to wait until some collectors receive signature shares from all nodes, in which case the performance is limited by single-node capability. Gosig is a fast, scalable, and fully decentralized BFT system that solves the above three challenges by integrating the protocol with the underlying gossip network. Essentially, Gosig operates in rounds, and each round contains two stages: the block proposing stage and the signature collection stage. In the block proposing stage, Gosig first randomly selects several proposers, and then each proposer packs transactions into a new block and broadcasts the block to all other nodes. In the signature collection stage, each node chooses one received block to vote on, signs its decision, and keeps relaying the aggregated signatures signed by itself and received from the others. Gosig uses a gossip network to propagate all messages, and the consensus protocol ensures that messages in the same stage can be effectively aggregated during gossiping.

In more detail, Gosig adopts some key techniques, e.g., electing a random proposer for every new block, pipelining all possible processes, and aggregating all signatures on the fly using gossip-based broadcast. Gosig uses a secure and cheap proposer replacement to change the proposer for every block. By using a verifiable random function (VRF), it makes the selection random and unpredictable. Also, Gosig eliminates the expensive view-change process of standard PBFT by including a small proof in every proposal to prove a new proposer's validity. Since the protocol proceeds in synchronous rounds, at most one block can be committed in a round. Gosig uses an aggregated signature gossip protocol to optimize the signature collection, which combines multiple received signatures into one aggregated signature. This makes the signature message size up to two orders of magnitude smaller and significantly reduces the total data transfer during the signature collection stage. When applying the aggregated signature gossip alone, the vote exchange is latency-bound, and a significant portion of the network bandwidth is under-utilized. So, Gosig pipelines at both the gossip layer and the BFT voting layer to maximize network utilization.

Gosig maintains a blockchain, and each node relays transactions to others and packs them into blocks in a specific order later. It can tolerate up to f faulty replicas by using 3f + 1 replicas; safety is guaranteed in an asynchronous network, with liveness guaranteed in a partially synchronous network. Gosig proceeds in rounds, and each round consists of a proposer selection step and two subsequent stages with a fixed length.

Fig. 10. Overview of a Gosig round without pipelining [127].

Fig. 10 shows an overview of a typical round. At the start of each round, some nodes secretly realize they are potential proposers of this round via a cryptographic sortition algorithm, and an adversary cannot target the proposers better than by randomly guessing. At stage I, each selected potential proposer packs non-conflicting uncommitted transactions into a block proposal, disseminates it with gossip, and then acts just like a normal node afterward. At stage II, the nodes aim to reach an agreement on some block proposal by exchanging votes.
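Gosig's aggregated-signature gossip described above can be sketched with signer sets standing in for real signature aggregation (Gosig uses aggregatable signatures; the node count and gossip pattern below are illustrative):

```python
# A sketch of Gosig's aggregated-signature gossip: instead of relaying
# every individual vote, a node merges all signatures it has seen for a
# block into one aggregate and gossips only that. Signer sets stand in
# for real aggregate signatures, and the 2f+1 threshold follows the
# 3f+1 replica setting quoted above.
F = 1
N = 3 * F + 1

def merge(agg_a: set, agg_b: set) -> set:
    """Aggregating two aggregates is a union of their signer sets."""
    return agg_a | agg_b

# Each node starts with an aggregate containing only its own signature.
aggregates = [{i} for i in range(N)]

# One gossip exchange: node 0 hears from node 1; node 2 hears from 0 and 3.
aggregates[0] = merge(aggregates[0], aggregates[1])
aggregates[2] = merge(merge(aggregates[2], aggregates[0]), aggregates[3])

# Node 2 now holds signatures {0, 1, 2, 3} in a single constant-size
# message, reaching the 2f+1 quorum without 2f+1 separate messages.
assert len(aggregates[2]) >= 2 * F + 1
```

The design choice is that aggregation is associative and idempotent, so signatures can be combined on the fly along arbitrary gossip paths without coordination.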

A node votes for a block by adding its signature of the block to a prepare message (“P message”) and gossiping it. Upon receiving at least 2f + 1 signatures from P messages for a block, a node tentatively commits to this block and starts sending tentatively-commit messages (“TC messages”) for it. Once a node receives 2f + 1 TC messages, it finally commits the block to its local blockchain replica. Interested readers can refer to the work [127] for more details.

6) Dumbo: Dumbo is an asynchronous BFT work that provides two atomic broadcast protocols to improve the running time asymptotically and practically, proposed by Gao et al. in 2020 [128]. The two atomic broadcast protocols, called Dumbo1 and Dumbo2, are both based on the work of HoneyBadgerBFT (the first practical asynchronous atomic broadcast protocol) and make some improvements on it. Roughly speaking, the asynchronous common subset (ACS) protocol of Dumbo1 only runs a small number k (independent of n) of instances of the asynchronous binary agreement (ABA) protocol, while that of Dumbo2 further reduces it to a constant. At the core are two observations: 1) reducing the number of ABA instances significantly improves efficiency; and 2) multi-valued validated Byzantine agreement (MVBA), which had been considered sub-optimal for ACS, can be used. The design of these asynchronous protocols is based on two practical requirements: the actual network environment and responsiveness. In general, asynchronous protocols may be favorable in a practical environment due to their efficiency, particularly a property called responsiveness. A synchronous BFT protocol is typically parameterized by the assumed network latency, which is normally chosen to be large so that the actual network latency is indeed smaller and the synchrony assumption can thus be ensured. This makes the efficiency of synchronous BFT protocols depend on the assumed network latency, while responsiveness should relate only to the actual network latency. Also, asynchronous protocols simplify the engineering effort, e.g., no timeout mechanism is needed. And the ACS protocol is the core component to achieve asynchronous agreement under the Byzantine environment. Thus, it is important to carefully design the ACS protocol to improve both the asymptotic and practical efficiency.

More specifically, two ACS protocols are provided to meet these requirements. Based on the FLP impossibility, an ABA protocol must be a randomized version. However, the randomized ABA instances do not really execute in a fully concurrent fashion, as 1) not all instances start at the same time, e.g., some of the instances may start later because their inputs have not been delivered, and 2) normal nodes also suffer an efficiency degradation when facing large-scale concurrent execution. The Dumbo1-ACS protocol aims to solve these issues. Dumbo1-ACS only needs to run k instead of all n ABA instances, and achieves O(log k) running time, where k is a security parameter independent of n. Simply, the first phase of ACS remains unchanged, e.g., every node broadcasts its input through an RBC instance. In the reliable broadcast (RBC) phase, it selects a small number k of nodes as the “leaders” such that at least one of them is honest with overwhelming probability. Care is needed in the case that two honest nodes receive different values from different selected “leaders”, which requires honest nodes to decide which of the k selected nodes to believe. It relies on an added coin-tossing protocol to select the k nodes, which is just one sub-routine of the ABA protocol. When one honest node determines to output the values corresponding to S_i, it must be the case that the i-th ABA instance outputs a bit 1. And the ABA protocol can ensure that 1) all other honest nodes will also output 1, and 2) at least one honest node knows the input of this ABA instance. Dumbo2 uses a mechanism of multi-valued validated Byzantine agreement (MVBA) to output one of the inputs of the n peers as long as the inputs satisfy some pre-defined predicate. This helps Dumbo2 achieve optimal (constant) running time, e.g., Dumbo2 only needs to run three consecutive instances of ABA. Since ACS outputs a subset of inputs, it requires preparing each peer node with a vector of inputs via RBC-type protocols. Dumbo2 utilizes a provable reliable broadcast (PRBC), which augments RBC and further outputs a succinct proof (verifiable even against a malicious node) that at least one honest peer has received the input; Dumbo2 uses a threshold signature on the RBC index to achieve this. Thus, the ABA implemented in MVBA only needs to be repeated three times to achieve a single instance via the corresponding proofs. Interested readers can refer to the work [128] for the detailed design of both Dumbos.

Another follow-up work, called Bolt-Dumbo Transformer (BDT), was proposed by Lu et al. in 2021 [112]. It presents a generic framework for optimistic asynchronous atomic broadcast. The framework provides an abstraction of the optimistic-case protocol, from which a highly efficient fallback mechanism (compared with MVBA) can be designed and easily instantiated. It reduces the communication complexity by a factor of O(n) and replaces the cumbersome MVBA protocol with only a type of binary agreement. More specifically, BDT is a randomized asynchronous BFT protocol with optimistic deterministic execution. The optimistic case takes place when the network is benign, and then Bolt is run. If no progress can be made using Bolt, in the pessimistic case, an asynchronous BFT protocol, Dumbo, starts to run. And there is a pace synchronization phase, called Transformer, which executes if enough honest peers do not make progress, and which agrees on how to restart. Overall, it provides an abstraction for running specifically designed BFT protocols.

7) Dumbo-MVBA: Dumbo-MVBA is an optimization to reduce the communication complexity of the MVBA protocol to the order of O(n^2) (where n is the number of participating nodes), proposed by Lu et al. in 2020 [129]. The original multi-valued validated asynchronous Byzantine agreement (MVBA) [126] requires around O(ln^2 + λn^2 + n^3) communication (where n is the number of parties, l is the input length, and λ is the security parameter). Later, this communication complexity was reduced by removing the term n^3 when the input is small [125]. However, when the input length l ≥ λn, the communication is dominated by the ln^2 term, and the problem of O(n^3) communication remains open. Dumbo-MVBA intends to bridge this gap with O(ln + λn^2) communicated bits, which is optimal when l ≥ λn. It also maintains other benefits, including optimal resilience to tolerate up to n/3 adaptive Byzantine corruptions, optimal expected constant running time, and an optimal O(n^2) number of messages. At the core of Dumbo-MVBA is an asynchronous provable dispersal broadcast (APDB), in which each input can be split and dispersed to every party and later recovered in an efficient way. Based on the proposed APDB and asynchronous binary agreement, the authors design and present a self-bootstrap framework, Dumbo-MVBA*, to reduce the communication cost of existing MVBA protocols.

In more detail, the original MVBA protocol only outputs a single party's input, so it is unnecessary for every party to send its input to all parties. The design of Dumbo-MVBA achieves a reduction from MVBA to ABA by using a dispersal-then-recast methodology to reduce communication. Instead of allowing each party to directly send its input to everyone, Dumbo-MVBA lets everyone disperse the coded fragments of its input across the network.
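The committee argument behind Dumbo1's choice of k can be checked with back-of-the-envelope arithmetic: sampling k distinct "leaders" from n nodes of which f < n/3 are Byzantine, the probability that no sampled leader is honest shrinks roughly like 3^-k, independently of n. The concrete numbers below are illustrative:

```python
# A sketch of why a small k (independent of n) suffices in Dumbo1:
# the probability that every one of k distinct uniformly sampled
# leaders is Byzantine is C(f, k) / C(n, k), negligible for modest k.
from math import comb

def all_faulty_prob(n, f, k):
    """P[every one of k distinct uniformly sampled leaders is Byzantine]."""
    return comb(f, k) / comb(n, k) if k <= f else 0.0

n, f = 301, 100
assert all_faulty_prob(n, f, 10) < 2 * (1 / 3) ** 10  # tiny at k = 10
assert all_faulty_prob(n, f, 20) < 1e-9               # negligible at k = 20
assert all_faulty_prob(n, f, 101) == 0.0              # k > f: impossible
```

Doubling n and f together leaves these probabilities essentially unchanged, which is why k can be fixed as a security parameter rather than growing with the system size.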
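The dispersal cost arithmetic can be sketched as well. Plain chunking stands in for real erasure coding here (it provides no redundancy), and the sizes are illustrative:

```python
# A sketch of the dispersal arithmetic in Dumbo-MVBA: an l-bit input is
# coded into n fragments of about l/n bits each, so n parties dispersing
# their inputs cost O(l*n) bits in total, rather than the O(l*n^2) of
# everyone multicasting its full input to everyone.
def disperse(data: bytes, n: int):
    """Split `data` into n near-equal fragments (toy stand-in)."""
    size = -(-len(data) // n)          # ceil(len/n): O(l/n) bits each
    return [data[i * size:(i + 1) * size] for i in range(n)]

def recover(fragments):
    return b"".join(fragments)

n, l = 8, 1024
payload = bytes(range(256)) * 4          # an l-byte input
frags = disperse(payload, n)
assert recover(frags) == payload
assert max(len(f) for f in frags) == l // n        # O(l/n) per fragment
total_dispersal = n * sum(len(f) for f in frags)   # n dispersals: O(l*n)
naive_multicast = n * n * l                        # n multicasts: O(l*n^2)
assert total_dispersal * n == naive_multicast
```

Real APDB additionally attaches proofs so that a corrupted sender cannot equivocate, and erasure coding lets the parties recover the value from a subset of fragments; this toy version only illustrates the fragment-size accounting.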

After the dispersal phase has been completed, the parties can randomly choose a dispersed value to collectively recover it. And every party can locally check the validity of the recovered value, so they can consistently decide to output the value or to repeat the random election of another dispersed value to recover. A typical MVBA protocol would require an asynchronous verifiable information dispersal (AVID) [130] [131] that needs O(n^2) messages to disperse a value. In Dumbo-MVBA, it uses APDB to weaken the agreement of AVID when the sender is corrupted. By doing so, it can realize a dispersal protocol with only O(n) message complexity. By optimizing the methods in [125] (e.g., combining APDB), each fragment sent by a party has only O(l/n) bits, so n dispersals of l-bit inputs incur only O(ln) bits, and the nodes can reconstruct a valid value after an expected constant number of ABAs and recoveries.

D. Asynchronous BFT Protocols

Even under the FLP impossibility, there still exist some literature works achieving consensus under an asynchronous setting. These works typically achieve this by relaxing/weakening other conditions on the models (e.g., the fault-tolerance rate) or adding certain external assistance (e.g., cryptographic primitives). This section discusses some BFT protocols with the feature of the asynchronous setting.

1) Proactive Recovery BFT: Proactive recovery BFT was first proposed to recover Byzantine-faulty replicas proactively by Castro and Liskov in 2000 [132]. Different from partially synchronous PBFT, this BFT protocol targets asynchronous SMR systems, offering both integrity and high availability in the presence of Byzantine faults. Proactive recovery BFT provides two main features: 1) it improves security by recovering replicas proactively; 2) it is based on symmetric cryptography (rather than public-key cryptography) for authentication, which allows it to perform well enough to be used in practice. The recovery mechanism allows tolerating any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a window of vulnerability that is small (e.g., a few minutes) under some conditions. The recovery algorithm provides detection of denial-of-service attacks aimed at increasing the window. For example, replicas can time how long recovery takes and alert their administrator if it exceeds some pre-established bound. The recovery algorithm recovers replicas periodically, independent of any failure detection mechanism, thus allowing replicas to continue participating in the request processing protocol while recovering.

Even with secure cryptographic co-processors to sign messages, an attacker can still launch replay attacks. To prevent this, proactive recovery BFT utilizes the notion of authentication freshness, and replicas reject messages that are not fresh. To prove to a third party that some messages they received are authentic, the algorithm enables the use of symmetric cryptography for the authentication of all protocol messages. This eliminates the use of heavy public-key cryptography, which also eliminates the major performance bottleneck. To keep a recovering replica up to date, proactive recovery BFT develops an efficient hierarchical state transfer mechanism based on hash chaining and incremental cryptography, which can tolerate Byzantine faults and state modification while state transfers are in progress. Also, the proposed algorithm ensures both safety and conditional liveness. As stated in [132], non-faulty clients eventually receive replies to their requests provided that: 1) at most f replicas become faulty within the window of vulnerability, and 2) denial-of-service attacks do not last forever. Besides, it provides a generic program library with a simple interface, and the experimental results on a Network File System (NFS) service show that even a small window of vulnerability has little impact on service latency.

One of the follow-up works on proactive recovery, named BFT-PR (BFT with proactive recovery), was proposed by Castro and Liskov in 2002 [15]. BFT-PR mainly extends their previous work presented in 1999 with more detailed descriptions and a detailed implementation on NFS systems with a clear interface design. Before the BFT-PR scheme, another work based on proactive recovery BFT was presented in 2001 [133], whose focus is on the performance evaluation of BFT-PR in asynchronous systems. An evaluation on a replicated NFS file system using BFT shows that BFS (Byzantine fault-tolerant NFS file system) performs 2% faster to 24% slower than a production implementation of the NFS protocol that is not replicated. We recommend that interested readers on proactive recovery refer to the work [15].

2) FaB Paxos: FaB Paxos is a two-step Byzantine fault-tolerant consensus protocol proposed by Martin and Alvisi in 2006 [134]; its preliminary work was published in 2004 [135]. The protocol can reach consensus under an asynchronous setting in the common case, and it can be used to build a replicated state machine that requires only three communication steps per request in the common case. As in classic consensus problems, e.g., [28], FaB Paxos considers three classes of agents: proposers, who propose values; acceptors, who together are responsible for choosing a single proposed value; and learners, who must learn the chosen value. Typically, Paxos is a leader-based protocol, and the leader is responsible for communicating and coordinating with the other acceptors. In FaB Paxos, the leader is chosen from the proposers instead of the acceptors. And FaB Paxos focuses on improving the performance in a common-case scenario, but in a Byzantine model. In general, a common-case scenario must meet three requirements: 1) there is a unique correct leader, 2) all correct acceptors agree on its identity, and 3) the system is in a period of synchrony. Processes other than the leader may still fail during the normal case. For the fault-tolerance rate, FaB Paxos requires 5f + 1 acceptors to tolerate f Byzantine faults in the common case, while the Parameterized FaB Paxos (a generalization of FaB Paxos) only requires 3f + 2t + 1 acceptors to tolerate f Byzantine failures and is a two-step scheme as long as at most t acceptors fail.

FaB Paxos works under an authenticated asynchronous fair links model, which still follows the asynchronous model but with some restrictions. Specifically, if a message is sent infinitely many times, then it arrives at its destination infinitely many times, and the recipient of a message knows who the sender is. Under this condition, the consensus may take more than two communication steps to terminate, e.g., when all messages sent by the leader in the first round are dropped. And the authenticated asynchronous fair links model literally enforces an end-to-end retransmission scenario. For example, the caller sends its request repeatedly, and the callee sends a single response every time it receives a request. FaB Paxos requires 5f + 1 processes to tolerate f Byzantine faults in two communication steps. Specially, it requires 5f + 1 acceptors, 3f + 1 proposers, and 3f + 1 learners; and each process in FaB Paxos can play one or more of these three roles. When FaB Paxos is used in connection with state machine replication, it assumes that an arbitrary number of clients of the state machine can be Byzantine. Interested readers can refer to the work [134].

Before the FaB Paxos protocol, there exist some similar literature works on Paxos, e.g., FastPaxos (in 2003) [136] and Kursawe's work (in 2002) [71], to optimize the performance as a consensus protocol.
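The acceptor counts quoted above can be reproduced with a tiny helper for the Parameterized FaB Paxos bound 3f + 2t + 1 (a sketch; the special case t = f recovers FaB Paxos's 5f + 1):

```python
# Parameterized FaB Paxos needs 3f + 2t + 1 acceptors to tolerate f
# Byzantine failures while remaining two-step as long as at most t
# acceptors fail. The function just evaluates that bound.
def acceptors_needed(f: int, t: int) -> int:
    """Acceptors for f Byzantine faults, two-step despite t failures."""
    assert 0 <= t <= f
    return 3 * f + 2 * t + 1

assert acceptors_needed(f=1, t=1) == 6   # FaB Paxos: 5f + 1
assert acceptors_needed(f=2, t=2) == 11
assert acceptors_needed(f=2, t=0) == 7   # classic 3f + 1 when t = 0
```

The trade-off is visible in the two extremes: paying 2t extra acceptors buys the two-step fast path even when t acceptors fail, while t = 0 collapses to the classic 3f + 1 bound without that guarantee.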

Note that “FastPaxos” is not Lamport's “Fast Paxos” [137] from 2006. In FastPaxos, processes can only fail by crashing, and it only requires 2f + 1 acceptors to tolerate the faulty processes. FastPaxos can also achieve consensus in two communication steps based on an eventual leader oracle. Kursawe's optimistic asynchronous Byzantine agreement assumes a Byzantine failure model and operates with only 3f + 1 processes. However, his protocol comes with much stronger assumptions. For example, to achieve consensus in two communication steps, the proposed optimistic protocol requires that channels are timely and no process is faulty. A single faulty process can cause the fast optimistic protocol to switch to a traditional, pessimistic, slower consensus protocol. Interested readers can read these corresponding papers.

3) Shuttle: Shuttle is a Byzantine Chain Replication (BCR) protocol for asynchronous environments, proposed by van Renesse et al. in 2012 [138]. It provides a framework with the help of an external reconfiguration service. By leveraging an external reconfiguration service, Shuttle is not based on Byzantine consensus (it is instead based on BCR), does not require a majority based on quorums during normal operations, and the set of replicas is easy to reconfigure. In practice, the configurations of replicated services are usually managed by an external scalable configuration service, and this configuration service is used only in the face of failures. In the absence of failures, replication protocols that do not rely on quorums can be performed efficiently and robustly. The BCR protocols are based on Chain Replication (CR) [139], which is typically used in scalable fault-tolerant systems. Chain replication uses an external configuration service, but cannot tolerate even general crash failures, as the protocol depends on accurate failure detection. Compared with protocols based on CR, the protocols based on BCR are easily reconfigurable, do not require accurate failure detection, and are able to tolerate Byzantine failures. The authors also presented a BCR protocol, called Shuttle.

In more detail, in Shuttle, each replica maintains a running state in addition to its local history of order proofs. A centralized configuration service called Olympus implements an oracle and generates a series of configurations, issuing an initial configuration statement (a statement with the order proofs) for each instance. A configuration statement includes the sequence number of the configuration and an ordered list of replica identifiers. And the service Olympus generates a configuration upon receiving reconfiguration request statements. To prevent DoS attacks, Olympus only accepts reconfiguration requests that are sent by replicas in the current configuration or accompanied by proof of misbehavior. Via such a chain service, it can achieve a replication service among replicas. For example, suppose a client wants to execute an operation o to obtain a result; this client is required to first obtain the current configuration from Olympus, and then send o to the head of the chain. The head orders incoming operations by assigning increasing slot numbers to them, and creates shuttles that are sent along the chain. Typically, a shuttle instance consists of two proofs: an order proof for <s, o>, and a result proof for the result of o. The authors describe how to verify these two proofs to provide authentication services. Also, the authors provide two simple implementations (from a theoretical perspective) based on Shuttle: one that can tolerate Byzantine failures in their full generality, and one that tolerates “accidental failures” such as crashes, bit flips, concurrency bugs, etc. This work focuses on the theoretical models without any practical implementation.

4) Multidimensional Approximate Agreement: Multidimensional approximate agreement in Byzantine asynchronous systems is an ε-approximate agreement problem in the multidimensional setting (e.g., dimension value m ≥ 1), proposed by Mendes and Herlihy in 2013 [140]. The authors generalize ε-approximate agreement in the Byzantine asynchronous system to consider values that lie in R^m, for m ≥ 1, and present an optimal protocol with regard to fault tolerance. The proposed work is more theoretical compared to some traditional works on Byzantine fault-tolerant protocols. Following the work on the FLP impossibility, under the combination of asynchrony and failures, it is impossible to distinguish between a faulty process that has halted (e.g., crashed) and a non-faulty process that is simply slow to respond. There exist many attempts to circumvent this impossibility, and one attempt is to use (uni-dimensional) ε-approximate agreement [141]. Compared with strong consensus in the Byzantine environment, uni-dimensional asynchronous ε-approximate agreement is possible; however, it is limited to one dimension. Practically, the multidimensional version can be used in many applications, e.g., robot convergence (applying robots in 2- or 3-dimensional space) and distributed voting (by assigning weights to voting options).

The authors provide a generalization of ε-approximate agreement under asynchrony. In a multidimensional ε-approximate agreement task, for arbitrary ε > 0 and m ≥ 1, each process starts with an input value in R^m, and all non-faulty processes must choose output values that meet the following conditions: 1) they are in R^m; 2) all outputs lie within ε of one another; and 3) all outputs lie in the convex hull of the inputs of the non-faulty processes. Based on the above generalization, the authors provide an optimal protocol to solve these problems, satisfying the properties of agreement and convexity. Agreement means that, for any two non-faulty processes, the Euclidean distance between their outputs must be ≤ ε, an error tolerance fixed a priori; while convexity means that, for any non-faulty process, its output is in the convex hull of the non-faulty inputs. Interested readers can read the original paper [140] for more details.

5) BVP: Byzantine Vertical Paxos (BVP) is a framework designed to provide a method to produce a high-throughput BFT SMR, tailored for a permissioned blockchain environment, proposed by Abraham and Malkhi in 2016 [142]. It briefly discusses the manifestation in several modes: synchronous, asynchronous, and asynchronous with the help of the Trusted Platform Module (TPM). BVP is a protocol to address both elasticity (dynamic reconfiguration) and throughput, and it is based on Vertical Paxos (VP) [143]. VP separates a scheme into two modes: a steady-state protocol and a reconfiguration protocol. The steady-state protocol adopts a primary-backup scenario over multiple nodes via Chain Replication (CR) [139] or Two-Phase Commit (2PC), while the reconfiguration protocol is a Paxos-based reconfiguration engine. By using VP, the steady state can be optimized for high throughput, and Paxos is used only when reconfiguration is necessary. There exist some prior works, e.g., the Byzantine Chain Replication (BCR) protocol [138], used in an asynchronous Byzantine setting. However, BCR requires 2f + 1 replicas for its steady state and 3f + 1 for the reconfiguration management. The proposed BVP can provide safe progress with as few as f + 1 replicas with a full-fledged reconfiguration mechanism. For example, in a weakly synchronous network, BVP requires only f + 1 replicas, and 2f + 1 replicas for reconfiguration. The core of BVP is to “wedge” a replicated state machine and capture its “closing state”. It requires a wedging coordinator to access one non-Byzantine replica participating in a consensus decision.

In more detail, the reconfiguration scheme changes the steady-state mode by employing a wedging scheme. A wedging coordinator

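The replica-count trade-off between BCR and BVP discussed above can be summarized in a small helper. This is purely illustrative: the counts come from the surveyed papers, while the function and dictionary names are ours, not part of either protocol.

```python
# Replica requirements to tolerate f faults, as described in the text:
# BCR needs 2f+1 (steady state) and 3f+1 (reconfiguration); BVP under
# weak synchrony needs f+1 and 2f+1, respectively.
REPLICA_REQUIREMENTS = {
    "BCR": (lambda f: 2 * f + 1, lambda f: 3 * f + 1),
    "BVP (weak synchrony)": (lambda f: f + 1, lambda f: 2 * f + 1),
}

def replicas_needed(protocol: str, f: int) -> tuple:
    """Return (steady-state, reconfiguration) replica counts for f faults."""
    steady, reconf = REPLICA_REQUIREMENTS[protocol]
    return steady(f), reconf(f)
```

For instance, `replicas_needed("BCR", 1)` yields `(3, 4)`, making BVP's savings under weak synchrony visible at a glance.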
In general, the wedging coordinator is implemented by a separate Byzantine consensus engine, the reconfiguration engine, whose reconfiguration consensus decision has two components: the next configuration and the closing state. Once a wedging procedure is complete, a reconfiguration consensus decision is reached. The authors provide two skeleton options for a synchronous-reconfiguration model: a 4-round (4 message delays) solution with f + 1 replicas and a 3-round solution with 2f + 1 replicas. For the asynchronous model with a TPM, BVP assumes weak sequential broadcast (WScast) [144] [145], a broadcast scheme with several features, e.g., FIFO-like same-order delivery and eventual delivery among correct replicas. For this model, BVP can achieve a 3-round solution with f + 1 replicas for the steady state, incurring a linear message complexity, and its reconfiguration can be done in constant time using 2f + 1 replicas.

6) XFT: XFT is an acronym for cross fault tolerance, an approach to building reliable and secure distributed SMR systems, proposed by Liu et al. in 2016 [10]. In asynchronous BFT SMR, system designers typically assign extraordinary power to the adversary, which may control both the Byzantine faulty machines and the entire network in a coordinated way. For example, the classic BFT adversary can partition any number of otherwise correct machines at will. However, based on observations of deployed systems, this kind of Byzantine fault model is much stronger than necessary and appears irrelevant in many application scenarios, especially for geo-replicated systems and WANs. From a high-level perspective, XFT can be used to build efficient resilient distributed systems that tolerate both non-crash (Byzantine) faults and network faults (asynchrony). According to the authors' description of XFT, "XFT allows building resilient systems that (a) do not use extra resources (replicas) compared with asynchronous CFT; (b) preserve all reliability guarantees of asynchronous CFT (in the absence of Byzantine faults); (c) provide correct service even when Byzantine faults do occur as long as a majority of the replicas are correct and can communicate with each other synchronously (e.g., when a minority of the replicas are Byzantine-faulty or partitioned because of a network fault)".

In more detail, the XFT model relaxes the assumption that an adversary can launch coordinated attacks, which is unlikely in geo-scale replicated system deployments, but it is still sufficient to shield applications from crash faults, network faults, and non-crash non-malicious faults. With this relaxed assumption, the XFT model can use the same quorum size and the same number of communication steps as the CFT model by assuming that a majority of processes are correct and synchronous. In the XFT model, the total number of faults is bounded by t_nc + t_c + t_p ≤ ⌊(n−1)/2⌋, where t_nc is the number of non-crash-faulty (Byzantine) processes, t_c is the number of crash-faulty processes, and t_p is the number of partitioned processes. A replica p is partitioned if p is not in the largest subset of replicas within which every pair of replicas can communicate with each other within delay ∆. In any other case, the system is considered to be in anarchy. A protocol is an XFT protocol if it satisfies safety in all executions in which the system is never in anarchy. This constitutes the core of the XFT model. In general, XFT suits use cases in which an adversary cannot easily coordinate enough network partitions and non-crash-faulty machine actions at the same time. There indeed exist several such application scenarios, e.g., tolerating "accidental" non-crash faults, wide-area networks, geo-replicated systems, and blockchains.

XPaxos is a leader-based SMR protocol built on the XFT model, with performance similar to the CFT-based Raft/Multi-Paxos [55] protocols. XPaxos specifically targets good performance in geo-replicated settings, which are characterized by the network being the bottleneck, with high link latency and relatively low, heterogeneous link bandwidth. Following the generic Paxos structure, XPaxos consists of a common-case protocol, a view-change protocol, and a fault detection mechanism. More details on the basic operations of XPaxos can be found in Section 4 of [10]. However, XPaxos provides poor scalability and performance, partially because it inherits the shortcomings of leader-based approaches. For example, leader-based protocols typically have an imbalanced load distribution, where the leader does more work than the other replicas; there is high latency for requests originating from non-leader nodes, due to the requirement of forwarding requests to the leader; and there are cases in which no commands can be delivered while the current leader is slow or Byzantine, pending a view-change message.

A follow-up work based on the XFT model, called Elpis, was proposed by Garg et al. in 2019 [146]. Elpis is a multi-leader XFT consensus protocol. By adapting the Generalized Consensus specification, Elpis exploits the commutativity inherent in the commands ordered by the system to provide fast decisions in three communication steps from any node in the common case. Elpis exploits the workload locality that is very common in geo-scale deployments. The key idea of Elpis is to enable ownership at a finer granularity. For example, instead of having a single leader order all commands (regardless of their commutativity), Elpis assigns ownership to individual nodes such that each node mostly proposes only commands that commute with those of other nodes. Elpis also allows for dynamic ownership changes. For more details on XFT and Elpis, interested readers can refer to the work [146].

7) BEAT: BEAT is a set of BFT protocols for the asynchronous environment, consisting of five asynchronous BFT protocols designed to meet different goals, proposed by Duan et al. in 2018 [147]. Essentially, it provides an orchestration of different asynchronous BFT protocols for different purposes, e.g., different performance metrics and application scenarios. Each protocol in BEAT follows the design principle of modularity, and features of these protocols can be mixed to achieve meaningful trade-offs between functionality and performance for various applications. In general, asynchronous BFT protocols are appealing solutions to adopt in a WAN environment, as these protocols are more resilient to timing and DoS attacks. Asynchronous BFT can typically ensure the liveness of the protocol without depending on any timing assumption, even against a strong adversary. However, existing asynchronous BFT protocols are subject to some challenges, e.g., performance (latency and throughput) issues.

In more detail, BEAT leverages secure and efficient cryptographic support as well as more flexible and efficient erasure-coding support, providing flexible, versatile, and extensible services whose protocols can be designed to meet different needs. BEAT includes five instances (BEAT0-BEAT4). BEAT0, BEAT1, and BEAT2 are general SMR protocols that can support both off-chain and on-chain smart contracts, while BEAT3 and BEAT4 are BFT storage protocols that can support off-chain smart contracts only. Specifically, BEAT0 is a baseline protocol that incorporates a secure and efficient threshold encryption scheme [148], a direct instantiation of threshold coin-flipping [130], and efficient erasure-code support. BEAT1 additionally replaces the erasure-coded broadcast (AVID broadcast) [114] used in HoneyBadgerBFT with a replication-based broadcast [119].
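The XFT fault bound above, t_nc + t_c + t_p ≤ ⌊(n−1)/2⌋, can be checked mechanically. A minimal sketch (the function name is ours, not from the XFT paper):

```python
def xft_in_anarchy(n: int, t_nc: int, t_c: int, t_p: int) -> bool:
    """XFT model: the system is in 'anarchy' when the combined number of
    non-crash (Byzantine) faulty, crash-faulty, and partitioned replicas
    exceeds floor((n - 1) / 2). An XFT protocol guarantees safety only in
    executions that never enter anarchy."""
    return t_nc + t_c + t_p > (n - 1) // 2
```

For n = 5 the threshold is 2, so one Byzantine plus one crashed replica keeps the system out of anarchy, while adding a single partitioned replica tips it over.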
BEAT2 opportunistically moves the encryption part of the threshold encryption to the client, further reducing latency. BEAT3 is a BFT storage system whose bandwidth consumption is information-theoretically optimal, and BEAT4 further reduces the read bandwidth, which is useful when clients commonly read only a fraction of the stored transactions. Interested readers can read the work [147] for more details.

8) Mir-BFT: Mir-BFT is a BFT total order broadcast protocol aimed at maximizing throughput in WANs via multiple parallel leaders, proposed by Stathakopoulou et al. in 2019 [149]. It can be deployed in decentralized networks, such as permissioned and Proof-of-Stake permissionless blockchain systems. It allows multiple leaders to propose request batches independently, i.e., parallel leaders, in a way that precludes request duplication attacks by malicious/Byzantine clients, by rotating the assignment of a partitioned request hash space to leaders. The approach of parallel leaders is a promising way to handle scalability issues, either in a coordinated, deterministic fashion [150] [151] or by using randomized protocols [147] [66] [152]. Mir-BFT removes the single-leader bandwidth bottleneck and exposes a computation bottleneck related to authenticating clients even on a WAN, and it can boost throughput using a client-signature-verification sharding optimization. Essentially, Mir-BFT is a generalization of the celebrated and scrutinized PBFT protocol: it follows PBFT "safety-wise", with the changes needed to accommodate its novel features restricted to PBFT's liveness. By using parallel leaders, it needs to prevent request duplication, especially in the form of two attacks: 1) the request censoring attack by Byzantine leader(s), in which a malicious leader simply drops or delays a client's request/transaction, and 2) the request duplication attack, in which Byzantine clients submit the exact same request multiple times. Some naive approaches to this issue are: 1) requiring clients to send to only one leader at a time, or 2) simply charging transaction fees for each duplicate. However, these approaches do not work well in practical settings. In general, Mir-BFT adopts a BFT total order broadcast (TOB) protocol that combines parallel leaders with robustness to attacks [153].

In more detail, Mir-BFT uses 3f + 1 replicas to tolerate up to f faulty replicas, and it can work robustly under arbitrarily long (yet finite) periods of asynchrony. Essentially, Mir-BFT allows multiple parallel leaders to propose batches of requests concurrently, in a sense multiplexing several PBFT instances into a single total order in a robust way. To achieve that, Mir-BFT partitions the request hash space across replicas, preventing request duplication, while rotating this partitioned assignment across protocol configurations/epochs, addressing the request censoring attack. Also, Mir-BFT uses a client-signature-verification sharding throughput optimization to offload the CPU. The design of Mir-BFT avoids "design-from-scratch", which is known to be error-prone for BFT [14] [80]; it follows the well-scrutinized PBFT protocol "safety-wise" and makes a number of improvements on top of PBFT. For example, Mir-BFT uses signatures for request authentication to avoid the concerns associated with "faulty client" attacks on the MAC authentication in PBFT; such signatures can prevent any number of colluding nodes from impersonating a client. Mir-BFT processes requests in batches, which improves the throughput of PBFT. Besides, Mir-BFT makes improvements on the protocol round structure, the selection of epoch leaders, the handling of request duplication and request censoring attacks, and buckets and request partitioning. Interested readers can refer to the work [149] for more details.

E. X-assisted BFT Protocols

X-assisted BFT consensus is typically used either to increase robustness or to improve scalability/efficiency. Here "X" can be software primitives (such as crypto-primitives) or hardware components (such as trusted hardware). The key idea of X-assisted BFT consensus is to guarantee the authenticity of communicated messages. For example, some protocols (e.g., SBFT [65]) are assisted by threshold signature schemes to ensure that there are enough replicas that can collaboratively work on the requests, and some protocols (e.g., Steroids [154]) may consider using a trusted execution environment (such as Intel SGX) as trusted hardware to provision the authenticity of messages. Also, cryptographic-primitive-based approaches and trusted-hardware-based approaches can work together to improve efficiency. For instance, FastBFT [155] combines TEEs with a lightweight secret-sharing scheme for efficient message aggregation to achieve scalable Byzantine consensus. This section discusses works on X-assisted BFT consensus protocols.

1) A2M: A2M is an acronym for Attested Append-Only Memory, proposed by Chun et al. in 2007 to eliminate equivocation [144]. A2M serves as a trusted system facility that is small, easy to implement, and easy to verify formally. It provides a programming abstraction of a trusted log, which leads to protocol designs immune to equivocation. Equivocation is the ability of a faulty replica to lie in different ways to different clients or servers, and it is a common source of Byzantine headaches. The proposed A2M facility can be an add-on component for existing Byzantine fault-tolerant replicated state machines (e.g., PBFT, Q/U, HQ), producing A2M-enabled protocols. Typically, for replicated state machines, the target safety guarantee is linearizability, in which completed client requests appear to have been processed in a single, totally ordered, serial schedule that is consistent with the order in which clients submitted their requests and received their responses. A2M utilizes a small trusted log abstraction as its primitive to achieve linearizability. A key insight behind A2M is that it provides a mechanism (the trusted log) that prevents participants from equivocating, thereby improving the fault tolerance of Byzantine protocols to f out of 2f + 1. With A2M, once an action is recorded in the log, it cannot be overwritten, as A2M does not provide any modification interface.

The overall design of A2M is based on a classic client-server system, where clients request authenticated operations and the server responds to the requesting clients. Its network model is the partially synchronous model, in which there exists a finite upper bound on message delivery. The fault model of A2M has two cases: the faulty application model, in which the node's owner is well-intentioned but unaware that the node's software has been compromised by a third party, and the faulty operator model, in which the node's Byzantine behavior results from a malicious owner instructing it to do so. For the different fault models, A2M has two different trusted computing bases. For the first model, the trusted computing base is set up by the service owner; for the second model, the owners cannot be trusted, but they trust a third party to set up the trusted computing base. Using an A2M implementation within the trusted computing base, a protocol can assume that a seemingly correct host can give only a single response to every distinct protocol request. So, informally, A2M can be considered as equipping a host with a set of trusted, undeniable, ordered logs. An A2M log offers methods to append values, to look up values within the log or to obtain the end of the log, as well as to truncate the log and to advance the log suffix stored in memory. There are no methods to replace values that have already been assigned, as A2M uses cryptography to enforce its properties and to attest the log's contents to other machines.
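The trusted-log abstraction above can be sketched in a few lines. This is an illustrative model only: the method names loosely follow the append/lookup/end interface described in the text, and the HMAC key stands in for the trusted module's attestation mechanism, not A2M's actual implementation.

```python
import hashlib
import hmac

class AttestedLog:
    """Sketch of an A2M-style attested append-only log: values can be
    appended, looked up, and attested, but never replaced or overwritten."""

    def __init__(self, key: bytes):
        self._key = key              # stands in for the trusted module's secret
        self._entries = []

    def append(self, value: bytes) -> int:
        """Append a value; return its sequence number."""
        self._entries.append(value)
        return len(self._entries) - 1

    def lookup(self, seq: int) -> bytes:
        return self._entries[seq]

    def end(self) -> int:
        """Current end of the log (number of entries)."""
        return len(self._entries)

    def attest(self, seq: int) -> bytes:
        """Bind (seq, value) with a MAC so the host cannot equivocate
        about what occupies a given log position."""
        msg = seq.to_bytes(8, "big") + self._entries[seq]
        return hmac.new(self._key, msg, hashlib.sha256).digest()
```

Note the deliberate absence of any replace or overwrite method, which is exactly what rules out equivocation about already-assigned positions.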
Equipped with A2M in its trusted computing base, a reliable service can mitigate the effects of Byzantine faults in its untrusted components, by being able to rely on some small fallback information about individual operations or histories of operations that cannot be tampered with.

Different implementations of the A2M scenarios lead to different threat models and degrees of trust in the resulting system. Typically, according to the trusted computing base (e.g., H/W, OS, App), there are five levels of A2M implementation scenarios: trusted service, trusted software isolation, trusted VM, trusted VMM, and trusted hardware. For the details of each scenario, interested readers can refer to Section 3 of [144]. The authors also present several state machine replication protocols that integrate A2M to improve their fault tolerance by rendering equivocation extinct or evident. For example, A2M-PBFT-E is a simple extension of PBFT that uses A2M logs to protect the execution portion of PBFT, which ensures that replicas cannot equivocate about their locally computed results for a particular requested client operation when replying to that, or any other, client. Another example is A2M-PBFT-EA, which protects against equivocation during agreement by requiring replicas to append all protocol messages to A2M logs before sending them to their peers. Fig. 11 shows a comparison of different scenarios for equipping PBFT with A2M. Besides these two PBFT-based protocols, A2M can be adapted to other Byzantine fault-tolerant systems, e.g., SUNDR [156] and Q/U [58]. The authors also presented A2M-Storage, an A2M-enabled storage system on a single untrusted server shared by multiple clients. A2M-Storage provides linearizability, in contrast to SUNDR's weaker fork consistency, and is simpler than SUNDR.

Despite these improvements on existing replication systems, distributed protocol designers may be reluctant to start assuming the availability of trusted modules as in A2M. One concern is that the abstraction of a trusted log may require more storage space and complexity than researchers are comfortable assuming, especially for an embedded module inside a potentially hostile component. Another concern is that designers may have difficulty appreciating how broadly applicable a trusted module can be to distributed protocols.

2) TrInc: TrInc is an acronym for Trusted Incrementer, a small trusted component that combats equivocation in large-scale distributed systems, proposed by Levin et al. in 2009 [157]. TrInc is motivated by the common supposition that individual components in the system are completely untrusted, so some trusted technology is required to ensure trustworthiness and eliminate equivocation, e.g., A2M with its trusted logs. However, trusted-log modules require more storage space and are hard to implement and deploy in a large distributed system. The fundamental security goal of TrInc is to remove participants' ability to equivocate. By using a non-decreasing trusted counter and a key, TrInc provides a new primitive: unique, once-in-a-lifetime attestations. With such a primitive, TrInc can support a broader range of protocols, including not only client-server systems but also peer-to-peer systems. In general, TrInc has a smaller size and simpler semantics that make it easier to deploy, as it can be implemented on off-the-shelf trusted hardware. Moreover, TrInc's core functional elements are included in the Trusted Platform Module (TPM) [158] found on many modern devices, which lends credence to the idea that such a component could become widely available. Besides, TrInc makes use of a shared symmetric session key among all participants in protocol instances, which significantly decreases the cryptographic overhead.

One of the common ways to solve equivocation is to use heavy-communication protocols under a threshold number of faulty participants, such as PBFT. TrInc aims to minimize both the communication overhead and the number of non-faulty participants required. With trusted hardware, it is possible to remove the ability of a (malicious) participant to equivocate without requiring communication among the other participants. As a trusted primitive, TrInc must be practical for distributed systems. For instance, a trusted component must be small so that it is feasible to manufacture and deploy, since it is difficult and costly to make large components tamper-resistant. The trinket is such a trusted piece of hardware in TrInc. The trinket's API depends only on its internal state, so, unlike a typical TPM, the trinket does not need to access the state of its host device (e.g., a computer). All it needs is an untrusted channel over which it can receive input and produce output.

More precisely, a trinket instance enables attestations by using a counter that monotonically increases with each new attestation. In such a way, once a certain counter value has been bound to a message, it is impossible to bind a different message to that value. Some applications or protocols may require using multiple counters. Theoretically, anything done with multiple counters can be done with a single counter, but multiple counters allow certain performance optimizations and simplifications. The users of a trinket may participate in multiple protocols, each requiring its own counter or counters, and a trinket has the ability to allocate new counters. However, each counter must be uniquely identified so that a malicious user cannot create a new counter with the same identity as an old counter. For performance optimization, TrInc allows its attestations to be signed with shared symmetric keys, which can vastly improve performance over using asymmetric cryptography or even secure hashes. To ensure participants cannot generate arbitrary attestations, following the mechanism in the TPM, the symmetric key is stored in trusted memory (e.g., the enclave of the TPM), so that users cannot read it directly. Meanwhile, symmetric keys are shared among trinkets only via a secure communication scheme, ensuring that these keys will not be exposed to untrusted parties. Besides, the implementation and evaluation of TrInc show that it can eliminate most of the trusted storage needed to implement A2M, significantly reduce the communication overhead in PeerReview, and solve an open incentives issue in BitTorrent [159]. Interested readers can refer to the work [157].

3) MinBFT: Both MinBFT and MinZyzzyva are trust-assisted BFT protocols, requiring only 2f + 1 replicas to tolerate f faulty replicas, proposed by Veronese et al. in 2011 [81]. Both MinBFT and MinZyzzyva are asynchronous algorithms, based on PBFT and Zyzzyva, respectively. For clarity, we focus on the PBFT version, MinBFT, to discuss its technical merits. MinBFT improves efficiency compared with previous protocols in terms of three metrics: the number of replicas, trusted service simplicity, and the number of communication steps. The efficiency of MinBFT mainly comes from the use of a simple trusted component. More precisely: 1) MinBFT requires only 2f + 1 replicas, instead of the usual 3f + 1, to tolerate f faulty replicas. 2) The trusted services that assist in reducing the number of replicas are quite simple, making a verified implementation straightforward and even feasible using commercial trusted hardware. 3) In graceful cases, MinBFT and MinZyzzyva run in the minimum number of communication steps for non-speculative and speculative algorithms, respectively: four and three steps. Besides the above merits, the trust-assisted algorithms are much simpler, being closer to crash fault-tolerant replication algorithms.
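TrInc's "unique, once-in-a-lifetime attestation" primitive can be sketched as follows. This is an illustrative model, not the trinket's real API: the class and method names are ours, and the HMAC stands in for the shared symmetric session key held in trusted memory.

```python
import hashlib
import hmac

class Trinket:
    """Sketch of TrInc's trusted counter: each attestation binds a message
    to a fresh, strictly increasing counter value, so no two different
    messages can ever be bound to the same value."""

    def __init__(self, key: bytes):
        self._key = key          # symmetric key kept inside trusted memory
        self._counter = 0        # non-decreasing, never reset or reused

    def attest(self, message: bytes):
        """Return (counter, MAC) binding the message to a one-time counter value."""
        self._counter += 1
        payload = self._counter.to_bytes(8, "big") + message
        mac = hmac.new(self._key, payload, hashlib.sha256).digest()
        return self._counter, mac
```

Because the counter only moves forward, a verifier who sees two attestations with the same counter value but different messages has cryptographic proof of equivocation.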
Fig. 11. Three-phase agreement protocol. Thicker lines denote messages that are attested by A2M [144].

The authors provided a detailed discussion comparing these merits with those of previous protocols.

The successful usage of trusted services is based on the USIG (Unique Sequential Identifier Generator), which provides an interface with operations only to increment the counter and to verify whether other counter values (incremented by other replicas) are correctly authenticated. Essentially, the USIG is a local service that exists in every server; its duty is to assign to each message the value of a counter and to sign it. With USIG, the identifiers are unique, monotonic, and sequential for that server. USIG has three properties: 1) uniqueness: USIG will never assign the same identifier to two different messages; 2) monotonicity: USIG will never assign an identifier that is lower than a previous one; and 3) sequentiality: USIG will never assign an identifier that is not the successor of the previous one. These three properties are guaranteed even if the server is compromised, so the service has to be implemented in a tamper-proof module or trusted component. Luckily, the trusted component can be implemented even on commercial off-the-shelf trusted hardware, such as the Trusted Platform Module [160].

In general, the MinBFT algorithm follows a message exchange pattern similar to PBFT's. The fundamental idea of MinBFT is that the primary uses trusted counters to assign sequence numbers to client requests. Besides assigning a number, the tamper-proof component produces a signed certificate that proves unequivocally that the number is assigned to that message (and no other), and that the counter was incremented, so that the same number cannot be used twice. This follows from the properties of USIG, and each server contains a local trusted/tamper-proof component implementing the USIG service. This is used to guarantee that all non-faulty replicas consider all messages with a certain identifier to be the same and, ultimately, agree on the same order for the execution of requests.

MinBFT is a non-speculative 2f + 1 BFT algorithm with only two communication steps, not three as in PBFT or A2M-PBFT-EA. The main role of the primary is to assign a sequence number to each request, and this number is the counter value returned by the USIG service in the unique identifier. These numbers are sequential while the primary does not change, but not across a view change. Typically, messages sent by a server are kept in a message log in case they have to be resent. To discard messages from the log, MinBFT uses a garbage collection mechanism based on checkpoints, similar to PBFT's. MinZyzzyva is based on the speculative algorithm Zyzzyva, and its design principle is almost the same as MinBFT's. Besides, the authors present an implementation with some level of isolation for the trusted component used to improve BFT algorithms, and they implemented several versions of the USIG service with different cryptographic mechanisms, isolated both in a separate virtual machine and in trusted hardware. Interested readers can refer to the work [81] for the details of MinBFT and MinZyzzyva.

4) CheapBFT: CheapBFT is a resource-efficient BFT system based on a trusted subsystem to prevent equivocation, proposed by Kapitza et al. in 2012 [161]. CheapBFT can tolerate the case in which all but one of the replicas active in normal-case operation become faulty. In general, it runs a composite agreement protocol and exploits passive replication to save resources; when there are no faults, CheapBFT requires only f + 1 replicas to actively agree on client requests and execute them during the normal case. When suspected faulty behavior happens, CheapBFT triggers a transition protocol that activates f extra passive replicas and brings all non-faulty replicas into a consistent state again. From a high-level perspective, the agreement protocol of CheapBFT consists of three sub-protocols: the normal-case protocol CheapTiny, the transition protocol CheapSwitch, and the fall-back protocol MinBFT. More specifically, CheapTiny utilizes the concept of passive replication to save resources during normal-case operation, requiring only f + 1 active replicas. In case of suspected or detected faulty behavior of replicas, CheapBFT runs CheapSwitch to bring all non-faulty replicas into a consistent state. When the CheapSwitch transition is finished, the passive replicas are resumed, and all 2f + 1 replicas become active and execute the MinBFT protocol temporarily to tolerate up to f faults, before eventually switching back to CheapTiny. Essentially, CheapBFT relies on an FPGA-based trusted subsystem, called CASH, to prevent equivocation.

CASH is an acronym for Counter Assignment Service in Hardware, which assists CheapBFT with message authentication and verification. To prevent equivocation, each replica in CheapBFT is equipped with a trusted CASH subsystem, which is initialized with a secret key and uniquely identified by a subsystem id corresponding to the replica that hosts the subsystem. To provide a trusted counter service, CASH prevents equivocation by issuing message certificates for protocol messages; a certificate consists of the identity id of the subsystem, the assigned counter value, and a MAC generated with the secret key. CASH utilizes symmetric-key cryptographic operations for message authentication and verification. The basic version of CASH provides functions for creating (via createMC) and verifying (via checkMC) message certificates, targeting single-counter cases. To support distinct counter instances and several concurrent protocols, the full version of CASH supports multiple counters, each specified by a different counter name. To make CASH practical, it should meet two design goals: a minimal trusted computing base and high performance. The code size of CASH must be small to reduce the probability of program errors that could be exploited by attacks, and since each interaction involves authenticated messages, CASH should have a high throughput to meet system requirements. Moreover, the trusted CASH subsystem may fail only by crashing, and its key remains secret even at Byzantine replicas.

In more detail, CheapBFT is safe in an asynchronous environment. To guarantee liveness, it requires the network and processes to be partially synchronous. Fig. 12 shows a CheapBFT architecture with two active replicas and a passive replica (e.g., f = 1) for normal-case operation. For the consensus part, as shown in Fig. 13, each replica follows a total of four phases of communication, which resemble the phases in PBFT. As CheapBFT replicas rely on a trusted subsystem to prevent equivocation, the CheapTiny protocol does not require a prepare phase.
Fig. 12. CheapBFT architecture with two active replicas and a passive replica (f = 1) for normal-case operation [161].

Fig. 14. An overview of the OMADA architecture relying on multiple, possibly overlapping, agreement groups. To invoke an operation at the application, a client (1) sends a request to one of the agreement groups, which then (2) orders the request using PBFT and (3) forwards it to a set of executors. Having processed the request, (4) the executors return their results to the client [162].

As CheapBFT replicas rely on a trusted subsystem to prevent equivocation, the CheapTiny protocol does not require a prepare phase. However, with only f + 1 replicas actively participating in the protocol, CheapTiny is not able to tolerate faults, and in that case the system falls back to the more resilient MinBFT protocol via the transition protocol CheapSwitch. The main task of non-faulty replicas in CheapSwitch is to agree on a CheapTiny abort history, and an abort history should be verifiable as correct even if it has only been provided by a single replica.

Fig. 13. CheapTiny protocol messages exchanged between a client, two active replicas, and a passive replica (f = 1) [161].

5) OMADA: OMADA is a BFT protocol that is able to benefit from additional hardware resources on heterogeneous servers, proposed by Eischer and Distler in 2017 [162]. OMADA first parallelizes agreement into multiple groups, and then executes the requests handled by different groups in a deterministic order. By varying the number of requests to be ordered between groups, as well as the number of groups that a replica participates in between servers, OMADA offers the possibility to individually adjust the resource usage per server. OMADA is based on two practical observations. One observation is that, in traditional BFT protocols, additional replicas typically come at the cost of increased computational and network overhead, which can further degrade performance. Another is that traditional BFT protocols assume all replicas run on homogeneous servers, and it is not always possible to operate a BFT system under such conditions, e.g., replicas may run on heterogeneous physical servers. And it is basically impossible for a user to ensure the homogeneity of servers when deploying a BFT system on them. OMADA tries to exploit computing resources that existing protocols are not able to utilize, for example, additional agreement replicas and spare capacity on fast servers. To achieve this goal, OMADA parallelizes agreement into multiple heterogeneous groups and varies the agreement workload between them, which allows OMADA to individually adjust the ordering responsibilities of replicas to fit the particular performance capabilities of their servers. For instance, replicas on powerful servers can participate in more than one group and be responsible for ordering a large fraction of the requests, whereas a replica on a less powerful server might only be part of a single group handling a small portion of the workload.

In more detail, OMADA exploits additional replicas and spare capacities on heterogeneous servers. Fig. 14 shows an overview of the OMADA architecture relying on multiple, possibly overlapping, agreement groups. Even though replicas are divided into multiple groups, each group still consists of 3f + 1 agreement replicas. By separating the agreement stage from the execution stage, it only requires 2f + 1 execution replicas. That means requests do not necessarily need to be processed by the same replicas by which they have been ordered. In OMADA, the roles of replicas can be roughly classified into three categories with different responsibilities: coordinating an agreement group (leader), assisting in a group (follower), and executing requests (executor). One replica of OMADA can participate in more than one agreement group and further assume multiple roles, which allows OMADA to tailor the responsibilities of each replica to the individual performance capability of its server. Despite having multiple agreement groups that operate independently of each other, OMADA is nevertheless able to establish a total order on all client requests. To achieve this, OMADA splits the sequence number space into partitions of equal size and statically maps one partition to each agreement group. The knowledge about the partitions of agreement groups is static and available throughout the system. Thus, it allows a client to randomly select a group at startup which from then on will be responsible for handling all the requests from that client. For more detailed operations of OMADA, interested readers can refer to the work [162].

6) Hybster: Hybster is a hybrid BFT protocol, utilizing a trusted subsystem for message authentication to prevent equivocation, proposed by Behl et al. in 2017 [154]. It requires only 2f + 1 replicas to tolerate up to f Byzantine faults, with the help of Intel SGX [163]. In general, modern multi-core systems are equipped with new parallelization schemes, which can help traditional BFT protocols reach unprecedented performance levels. Some latest general-purpose processors can provide a trusted execution environment to protect software components even against malicious behaviors of an untrusted operating system. Hybster is a hybrid SMR protocol that is highly parallelizable and formally specified. Almost all prior SMR systems that are based on a hybrid fault model require sequential processing, either of the consensus instances that are performed to agree on the order in which replicas must execute commands, or of all incoming messages. Also, from an abstract perspective, all hybrid systems prevent undetected equivocation by cryptographically binding sensitive outgoing messages to a unique monotonically increasing timestamp by means of the trusted subsystem.
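Returning to OMADA's ordering scheme above: the static partitioning of the sequence-number space can be sketched in a few lines. The code is our own illustration; round-robin interleaving is just one simple way to form equal-size partitions, and OMADA's exact mapping may differ:

```python
NUM_GROUPS = 3  # illustrative; OMADA derives group counts from server capacities

def group_for_sequence(seq_no: int) -> int:
    """Statically map a sequence number to the agreement group that orders it.

    Because the mapping is fixed and known system-wide, every replica can
    derive it without communication, and a total order over all requests
    emerges by simply executing sequence numbers in ascending order,
    regardless of which group ordered each request.
    """
    return seq_no % NUM_GROUPS

# Sequence numbers 0..5 are spread evenly over the three groups.
assert [group_for_sequence(s) for s in range(6)] == [0, 1, 2, 0, 1, 2]
```

A client that picks, say, group 1 at startup submits all of its requests there; the group assigns only sequence numbers from its own partition, so no two groups can ever produce conflicting positions in the global order.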

Fig. 16. Message patterns of FastBFT [155].

Fig. 15. A parallelizable Hybster [154].

However, the features of parallelism are not well explored. Fig. 15 shows an abstract view of a parallelizable Hybster, in which multiple instances can be executed at the same time on some physical replicas. This potentially speeds up the throughput of the system.

In more detail, Hybster is designed around a two-phase ordering to prevent equivocation by means of multiple instances of a TrInc-based trusted subsystem realized using Intel SGX. Though there exist some trusted schemes, such as A2M-PBFT and MinBFT, Hybster is distinct from these schemes in three major ways: relaxed, formal, and parallelizable. For the feature of relaxed, Hybster is based on the observation that it suffices to ensure that only those instances must be revealed that are not guaranteed to be propagated by correct replicas, yet have potentially led to the execution of requests. In other words, Hybster allows faulty replicas to conceal their own messages as long as they are not critical to ensure safety, which reduces the complexity of the view-change protocol significantly and avoids the need for histories. For the feature of formal, there is not only an informative description of the basic protocol of Hybster, but it is also formally specified. For the feature of parallelizable, Hybster has the ability to perform consensus instances in parallel, which benefits from multi-core platforms. Hybster builds on a trusted subsystem abstraction that is akin but not equal to TrInc [157], called TrInX. Hybster is implemented in Java and, with a consensus-oriented parallelization scheme, is optimized to fully exploit multi-core CPUs. It achieves high performance via parallelization, where the performance scales well along with the number of NICs and CPU cores.

7) FastBFT: FastBFT is a fast and scalable BFT protocol with the help of trusted hardware, proposed by Liu et al. in 2018 [155]. Essentially, FastBFT utilizes a message aggregation technique that combines a hardware-based trusted execution environment (TEE) with a lightweight secret sharing scheme. From a high-level perspective, FastBFT also combines several other optimizations, e.g., optimistic execution, tree topology, and failure detection, to achieve low latency and high throughput even for large-scale networks. By using message aggregation, it can reduce the message complexity from O(n^2) to O(n), and the message aggregation in FastBFT does not require any public-key operations (e.g., multisignatures), which can further reduce the computation/communication overhead. By a tree topology design in arranging nodes, FastBFT can balance computation and communication load, so that inter-server communication and message aggregation take place along the edges of the tree. By the optimistic design, FastBFT only requires a subset of nodes to actively run the protocol. Besides, FastBFT utilizes a simple failure detection mechanism to handle non-primary faults efficiently.

In general, there exist two big categories of techniques to improve BFT performance based on the fact that replicas rarely fail: speculative and optimistic. The speculative mechanism typically runs without any explicit agreement protocol (e.g., Zyzzyva), while the optimistic mechanism only requires a subset of replicas to run the agreement protocol, and other replicas passively update their states and become actively involved only in case the agreement protocol fails. The FastBFT protocol adopts an optimistic mechanism. By including a TEE environment, the replicas can remotely verify (e.g., via remote attestation) the behaviors of other replicas. And a TEE can only crash but not be Byzantine. FastBFT guarantees safety in asynchronous networks but requires weak synchrony for liveness, and each replica holds a hardware-based TEE that maintains a monotonic counter and a rollback-resistant memory. TEEs can verify one another using remote attestation and establish secure communication channels among replicas. Fig. 16 shows the message communication pattern of the FastBFT consensus protocol. Essentially, the consensus protocol of FastBFT consists of four phases: pre-processing, request, prepare, and commit. The commit phase has two sub-phases that are used to update the state of replicas (a procedure similar to the execution phase in traditional BFTs). Besides, the overall FastBFT also includes failure detection and view-change processes. Interested readers can refer to the work [155].

8) SACZyzzyva: SACZyzzyva is an acronym for Single-Active Counter Zyzzyva, which provides resilience to slow replicas while requiring only 3f + 1 replicas, with only one replica needing an active monotonic counter at any given time, proposed by Gunn et al. in 2019 [79]. Speculative BFT protocols, e.g., Zyzzyva and Zyzzyva5, have extremely simple and efficient speculative execution paths when there are no faults or delays. However, they require a trade-off. For example, Zyzzyva requires 3f + 1 replicas to tolerate f faults, but even a single slow replica can make Zyzzyva fall back to a more expensive non-speculative operation. Zyzzyva5 does not require a non-speculative fallback, but requires 5f + 1 replicas to tolerate f faults. Realistic communication networks, such as the Internet, are only partially synchronous, so a single slow but not faulty replica can trigger the non-speculative execution for each protocol run of Zyzzyva, thereby undermining the efficiency promise of the speculative approach. SACZyzzyva can overcome these drawbacks: it requires only a single replica, the primary, to have an active monotonic counter. Essentially, it can eliminate the need for a non-speculative fallback and tolerate a subset of replicas being slow while requiring only 3f + 1 replicas. SACZyzzyva applies trusted hardware to some replicas (not all replicas) to assist its process. And this follows a practical setting in the real world, where only some devices will have the necessary hardware support. Other BFT protocols can also adapt the single-active-counter approach of SACZyzzyva to reduce latency while avoiding the need to equip all replicas with hardware-supported monotonic counters.

In more detail, SACZyzzyva assumes a weak-synchrony model, and this model enables analyzing liveness during a period of synchrony that will eventually occur.

Fig. 17. The communication patterns of Zyzzyva and SACZyzzyva with one faulty replica. Without faults or network delays, Zyzzyva and SACZyzzyva have identical communication patterns, but if any replicas are faulty, Zyzzyva requires two extra rounds of communication [79].

Also, SACZyzzyva assumes that some, but not all, replicas are equipped with a trusted component and, in particular, with a trusted monotonic counter. The goal of SACZyzzyva is to build an efficient SMR protocol that allows replicas to complete a request under the following conditions: 1) in a linear number of messages, and 2) without significant performance reductions in the event of up to f faults. The basic principle of SACZyzzyva is to use the trusted monotonic counter in the primary to bind a sequence of consecutive counter values to incoming requests, ordering requests while avoiding the need for communication between replicas, whether directly or via the client. It does this by signing a tuple consisting of the cryptographic hash of the request and a fresh counter value from a single active counter. Thus, SACZyzzyva requires only that f + 1 replicas have a trusted component, enough that there will always be at least one correct replica that can function as primary.

The communication pattern of SACZyzzyva, as shown in Fig. 17, follows the pattern of the original Zyzzyva. The primary gathers the requests from clients and sends them to all replicas in an order-request message, and this order-request message is bound to a monotonic counter value to prevent equivocation by the primary. All replicas execute the requests and reply to the client directly if the trusted monotonic counter value is sequential to those that the primary has previously sent. If the client receives 2f + 1 replies with matching values and histories, the request can be considered complete. Otherwise, the client repeatedly sends the requests directly to the replicas, so that they can detect misbehavior by the primary and so elect a new one. However, this cannot prevent a malicious client. The authors also prove that SACZyzzyva is optimally robust and that trusted components cannot increase fault tolerance unless they are present in greater than two-thirds of the replicas.

9) TBFT: TBFT is a TEE-based BFT protocol with inspiration from CFT protocols, which provides a simple and understandable structure, proposed by Zhang et al. in 2021 [164]. Most existing TEE-based BFT protocols are designed by improving traditional BFT protocols and use complex operations to overcome the security issues introduced by TEEs, which makes these protocols difficult to understand and implement. And practically, many protocols reduce Byzantine failures to crash failures, since the adversary assumption of TEE-based BFT is more similar to CFT than to traditional BFT. It would be better to design a TEE-based BFT protocol on the basis of CFT protocols to inherit the advantages of CFT, e.g., a high resilient fault rate. The authors first summarized the key differences between TEE-based BFT and CFT protocols and proposed four principles to help bridge the gap between them. Based on these principles, TBFT makes some improvements on both performance and security, including pipeline mechanisms, a TEE-assisted secret sharing scheme, and trusted leader election, all of which provide better performance and scalability.

In more detail, most protocols assume that a TEE may crash but will never produce malicious execution results, which makes TEE-based BFT more similar to CFT than to BFT. However, even with the existence of a TEE, a Byzantine host can still terminate the TEE at any moment, schedule the TEE arbitrarily, or drop, replay, delay, reorder, and modify the I/O messages of the TEE. This can simply be stated as: the host in TEE-based BFT may be Byzantine. Thus, it is necessary to bridge the gap between TEE-based BFT and CFT. To bridge the gap, the authors proposed four principles: one proposal, one vote, restricted commit, and restricted log generation. For one proposal, the leader needs to call a function, e.g., create counter for TEEs based on trusted monotonic counters, to assign a (c, v) (c is a monotonic counter, v is the current view) to each proposal, while other replicas keep track of the leader's (c, v). Upon calling this function, the TEE will increase its local monotonic counter, and hence there will never be two different proposals that have the same (c, v). In this way, the leader cannot propose multiple proposals in the same round. For one vote, TEE-based BFT should guarantee that a replica cannot vote for more than one conflicting proposal, and this is achieved by verifying the counter number in the (c, v) of the proposal. If it matches its local (c, v), the TEE will increase its local monotonic counter; otherwise, it rejects to vote for the proposal. For restricted commit, the leader should prove that a proposal has been safely replicated to a majority of replicas by collecting at least f + 1 votes. For restricted log generation, it is used to handle the limited resource issues within a TEE.

The TBFT protocol can tolerate up to f faulty replicas with 2f + 1 replicas, and it guarantees safety in an asynchronous network and liveness in a partially synchronous network. It follows the above four principles to handle Byzantine behaviors on TEEs. In general, TBFT can achieve O(n) message complexity in both the normal case and view-change. Also, TBFT utilizes a pipeline scheme to improve throughput, and replaces multi-signatures with a TEE-assisted secret sharing scheme to reduce the computation overhead. Besides, it uses an efficient TEE-based Distributed Verifiable Random Function (DVRF) for verifiable and random leader election. Interested readers can refer to the work [164] for more details.

F. Unclassified BFT Protocols

This section provides some unclassified BFT protocols, either because they belong to less explored research areas or because they contain some mixed techniques, as well as some Byzantine fault related literature. After PBFT, many research efforts contributing to the improvement of Byzantine fault-related topics have a clear focus. Some protocols target performance and cost issues, e.g., HQ [61], Zyzzyva [59], and AZyzzyva [14]. And some other protocols aim to improve robustness, e.g., Aardvark [153] and RBFT [165]. We put all kinds of these protocols in this section, with certain exceptions, e.g., Zyzzyva in classic BFT protocols.

Even though the protocols discussed in this section are categorized as "unclassified BFT protocols", we do classify them into different sub-classes by adding a label after the protocol name. We provide five labels: feature, architecture, theory, middleware, and blockchain. The label feature means that the discussed protocol typically adds some new features compared with previous protocols, or provides some enhancements on some key features (such as responsiveness, throughput, and scalability).
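TBFT's one-proposal and one-vote principles described above can be sketched as a toy model. All names here are ours, the checks are simplified, and a real implementation runs inside the TEE so the counter cannot be rolled back by the host:

```python
class ToyTEE:
    """Toy model of TBFT-style (c, v) checks (hypothetical, simplified)."""

    def __init__(self, view: int = 0):
        self.counter = 0   # local monotonic counter c
        self.view = view   # current view v

    # Leader side ("one proposal"): each new proposal consumes a fresh (c, v),
    # so two different proposals can never share the same pair.
    def create_counter(self):
        self.counter += 1
        return (self.counter, self.view)

    # Replica side ("one vote"): vote only if the proposal carries the next
    # expected (c, v); advancing the counter prevents a second vote for
    # a conflicting proposal in the same slot.
    def try_vote(self, proposal_cv) -> bool:
        c, v = proposal_cv
        if v == self.view and c == self.counter + 1:
            self.counter += 1
            return True
        return False

leader, replica = ToyTEE(), ToyTEE()
cv1 = leader.create_counter()
assert replica.try_vote(cv1) is True
assert replica.try_vote(cv1) is False  # conflicting proposal with same (c, v) rejected
```

Because both sides go through the TEE, equivocation by the leader and double voting by replicas are ruled out by construction, which is what lets TBFT get by with 2f + 1 replicas.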

Fig. 18. BASE function calls and upcalls [166].

Fig. 19. Inclusion relation between different types of failures [174].

The label architecture means that the discussed protocol typically focuses on the architecture design of existing protocols, and this architecture design typically includes some new communication pattern or a new BFT framework. The label theory means that the discussed protocol focuses on theoretical work, e.g., the fault tolerance rate under specified conditions. The label middleware typically means that the discussed work serves as a middleware for some specific applications and provides replicated services for these applications. The label blockchain means that the discussed work focuses on the construction of a blockchain with a customized BFT as its consensus protocol. We must note that some protocols have multiple labels. To emphasize their main contributions, we provide only one label for each discussed protocol.

1) BASE - (architecture): BASE is an acronym for BFT with Abstract Specification Encapsulation, which was proposed by Rodrigues, Castro, and Liskov in 2001 [166] [167]. BASE is a replication technique working on data abstraction [168], which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. The abstraction in BASE hides implementation details to enable the reuse of off-the-shelf implementations of important services (e.g., file systems, databases). The original BFT library in [4] [15] provides Byzantine fault tolerance with good performance and strong correctness guarantees; however, it cannot tolerate deterministic software errors, which can cause all replicas to fail concurrently. The BASE scheme improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common-mode failures.

Fig. 18 shows the abstract BASE function calls and upcalls. The involved BASE client interacts with a BASE replica via the BASE protocol, which further interacts with the conformance wrapper to abstract original implementations. The communication between a BASE replica and the conformance wrapper is via well-defined function calls (from BASE replicas to the conformance wrapper) and via upcalls (from the conformance wrapper to BASE replicas).

The key idea of BASE is to utilize the concepts of abstract specification and abstraction function from work on data abstraction. In general, BASE offers several advantages. 1) It reuses the existing code base: BASE implements a form of SMR, which allows replication of services that perform arbitrary computations, provided they are deterministic. For instance, many implementations produce timestamps based on local clocks, which can cause the states of replicas to diverge. BASE can resolve this determinism problem by the models of the conformance wrapper and the abstract state conversions, reusing existing implementations without modifications. Also, these implementations can be non-deterministic, which reduces the probability of common-mode failures. 2) BASE enables software rejuvenation through proactive recovery: BASE combines proactive recovery with abstraction to counter the problem (as observed in [169]) that there potentially exists a correlation between the length of time software runs and the probability that it fails. Typically, replicas auto-run proactive recovery periodically to ensure that the service remains available during rejuvenation. For instance, when a replica is recovered, it is rebooted and restarted from a clean state. 3) BASE enables opportunistic N-version programming: N-version programming [170] exploits design diversity to reduce the probability of correlated failures, however, with several issues, e.g., increased development and maintenance costs and unacceptable time delays added to the implementation [171]. By its abstraction design, BASE enables an opportunistic form of N-version programming, which offers the advantage of applying distinct, off-the-shelf implementations of common services.

The authors of BASE also built a prototype on the NFS service, where each replica can run a different off-the-shelf file system implementation, and a prototype on an object-oriented database, where the replicas run the same, non-deterministic implementation. There also exists an extension work with detailed implementation and evaluation published in 2003 [172].

Besides the work on abstracting Byzantine faults, there exist similar literatures that model and encapsulate crash faults into an arbitrary fault. For example, Baldoni, Helary, and Raynal presented a generic methodology to transform a protocol resilient to process crashes into one resilient to arbitrary failures (in the case where processes run the same text and regularly exchange messages) in 2000 [173]. The proposed methodology follows a modular approach to encapsulating the detection of arbitrary failures in specific modules. The work can serve as a starting point for designing tools that allow automatic transformation. Interested readers on this transformation process can refer to the work [173].

2) Byzantine Failure Detector - (architecture): An encapsulation-based failure detection for Byzantine failures was proposed by Doudou, Garbinato, and Guerraoui in 2002 [174]. It extends a modular crash failure detector [175] to an encapsulation-based Byzantine failure detector. Crash failures are the simplest form of failures, and the work [175] has shown the feasibility of detecting crash failures, by using black-box failure detectors, in the context of a distributed system. However, solving agreement in the Byzantine environment is harder than in the crash-failure model, since Byzantine detectors need to consider timing assumptions as well. The work on the Byzantine failure detector discusses the feasibility of the encapsulation-based approach, and the authors claim that, in a Byzantine context, it is simply impossible to achieve the level of encapsulation of the original crash failure detector model. However, it is possible to take an intermediate approach where algorithms that solve the agreement problem can still benefit from grey-box failure detectors by partially encapsulating Byzantine failure detection.
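The black-box crash failure detector that this line of work builds on can be sketched as a simple heartbeat-timeout monitor. The code below is our own illustration, not the construction in [175]: a crashed (or mute) process is eventually suspected purely from its silence, whereas richer Byzantine misbehavior cannot be spotted without inspecting the algorithm's own messages, which is why the Byzantine case cannot stay a black box:

```python
import time

class HeartbeatFailureDetector:
    """Black-box crash/muteness suspicion via heartbeat timeouts (illustrative)."""

    def __init__(self, timeout: float):
        self.timeout = timeout   # how long silence is tolerated, in seconds
        self.last_seen = {}      # process name -> time of last heartbeat

    def heartbeat(self, process: str, now: float = None):
        """Record a heartbeat; `now` can be injected for deterministic tests."""
        self.last_seen[process] = time.monotonic() if now is None else now

    def suspected(self, process: str, now: float = None) -> bool:
        """A process is suspected once it has been silent longer than `timeout`."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(process)
        return last is None or now - last > self.timeout

fd = HeartbeatFailureDetector(timeout=1.0)
fd.heartbeat("p1", now=0.0)
assert not fd.suspected("p1", now=0.5)  # fresh heartbeat: trusted
assert fd.suspected("p1", now=2.0)      # silent too long: suspected (crash or mute)
```

The algorithm using such a detector never needs to know how suspicion is produced; the grey-box muteness detectors discussed next give up exactly this property by parameterizing the detector with the algorithm's messages.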

The proposed intermediate approach focuses on muteness failures. A muteness failure can be viewed as a special case of Byzantine failures, yet it is more general than crash failures, as shown in Fig. 19; it encompasses both physical and algorithmic crashes. Muteness failure detectors encapsulate a sufficient level of synchrony to solve a consensus problem under the Byzantine assumption; however, the specification of muteness failure detectors is not orthogonal to the algorithms using them. Besides, regarding the consensus synchrony model, the proposed failure detector implementation is based on a partially synchronous model.

To design a successful Byzantine failure detector, several aspects need to be considered. 1) Specification orthogonality. Due to the inherent nature of Byzantine failures, it is impossible to design a failure detector that is independent of the agreement algorithms (such as consensus or atomic broadcast). Instead, the proposed work looks for an intermediate approach to detect muteness failures, which can still capture the tricky part of Byzantine failures and restrict the dependency between the algorithms and the failure detector. 2) Implementation generality. A crash failure detector can be considered a black box, of which the implementation can change without any impact on the algorithm that uses it. However, the Byzantine failure detector does not have this feature. Thus, the proposed work switches to a grey-box approach with a parameterized implementation of a muteness failure detector that is capable of solving consensus problems in a Byzantine environment. 3) Failure encapsulation. In a Byzantine context, some failures cannot simply be encapsulated inside a failure detector (e.g., timing-related issues), and handling them should rely on some additional modules. The proposed work uses an instance of each broadcast and certification protocol to handle these failures. 4) Failure transparency. Failure transparency is simply impossible in the Byzantine context, which requires external resources to assist the process of making failures transparent. The proposed work utilizes an atomic broadcast algorithm to achieve this, e.g., some failures are masked by the consensus algorithm. Interested readers on the model definitions and detection methodologies can refer to the work [174].

Fig. 20. High-level architecture of (a) traditional BFT SMR, (b) separate BFT agreement and execution, and (c) optimization enabled by separation of agreement and execution [176].

3) Separating Agreement from Execution - (architecture): An architecture on Separating Agreement from Execution for BFT (SAE-BFT) was proposed by Yin et al. in 2003 [176]. Traditionally, an SMR scheme relies on first agreeing on a linearizable order of all requests, and then executing these requests on all state machine replicas. And it is a common practice to tightly couple these two functions together; however, coupling both agreement and execution is not so effective considering confidentiality. The key principle of SAE-BFT is to separate agreement from execution, and this separation can clearly yield some fundamental advantages. On the one hand, this separation architecture can reduce replication costs, as the architecture can tolerate faults in up to half of the replicas that are responsible for executing requests. However, the system still requires 3f + 1 agreement replicas to order requests using f-resilient Byzantine agreement. That means it only requires a simple majority of correct execution replicas to process the ordered requests. The choice of doing this separation is based on the fact that execution replicas are likely to be much more expensive than agreement replicas, from both hardware and software perspectives. Even though n-version programming helps to some extent, it will increase the cost of each different service. On the other hand, this separation architecture leads to a practical and general privacy firewall architecture to protect confidentiality through Byzantine replication. Though traditional replication architectures (e.g., PBFT) can achieve integrity and availability [25], they cannot ensure confidentiality, in which case a malicious client is allowed to observe the confidential information leaked by faulty servers.

Fig. 20 shows an abstract of the high-level architectures of (a) traditional Byzantine fault-tolerant SMR (combining both agreement and execution together), (b) separate Byzantine fault-tolerant agreement and execution, and (c) an optimized version of the separation of agreement and execution [176]. The prerequisite for the separated architectures (shown in both Fig. 20(b) and Fig. 20(c)) relies on some cryptographically verifiable primitives, and the exchanged certificates can be verified by any server. Thus, it is possible to separate the execution replicas from the agreement replicas.

SAE-BFT can function in a distributed asynchronous system with the guarantee of safety only. Due to the FLP impossibility, it is impossible to guarantee liveness under asynchronous settings unless some assumptions are made on communication synchrony and message loss. However, the system can make progress under some relatively weak bounded fair links. From the definition of fair links, we can consider these links one branch of partially synchronous cases. Interested readers on these assumptions can refer to the work [176]. Besides, the authors also built a prototype system to evaluate the proposed architecture. The system utilizes a threshold signature scheme to guarantee confidentiality. The evaluations on both micro-benchmarks and an NFS server show that the architecture adds modest latency compared with unreplicated systems, and the evaluated performance is competitive with the Byzantine fault-tolerant systems existing as of the year 2003.

4) BART - (architecture): BART is an acronym for Byzantine Altruistic Rational (BAR) Tolerant for cooperative services, proposed by Aiyer et al. in 2005 [177]. Typically, cooperative services span multiple administrative domains (MADs), and the BAR model in cooperative services should accommodate three classes of nodes. Altruistic [178] nodes follow the suggested protocol exactly. Intuitively, altruistic nodes correspond to correct nodes in the fault-tolerance literature. Rational [179] nodes participate in the system to gain some net benefits and can depart from a proposed program in order to increase their net benefits. These nodes are self-interested and seek to maximize their benefits according to a known utility function. And Byzantine nodes can depart arbitrarily from the proposed program, whether it benefits them or not. The BART protocol provably provides its non-Byzantine participants with the desired safety and liveness properties. The proposed BART protocol assumes that at most (n − 2)/3 of the nodes in the system are Byzantine and that every non-Byzantine node is rational, which does not depend on the existence of altruistic nodes in the system.
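The replication-cost saving from separating agreement and execution can be checked with simple quorum arithmetic. The helper names below are ours; the counts themselves are the ones cited above for SAE-BFT:

```python
def coupled_replicas(f: int) -> int:
    """Traditional BFT SMR: every one of the 3f + 1 replicas both orders
    and executes requests."""
    return 3 * f + 1

def separated_replicas(f: int):
    """SAE-style split: 3f + 1 agreement replicas still order requests,
    but only a simple majority of correct execution replicas (2f + 1
    in total) is needed to process the ordered requests."""
    return 3 * f + 1, 2 * f + 1  # (agreement, execution)

# With f = 1: a coupled system needs 4 full replicas, while the split system
# needs only 3 of the typically much more expensive execution replicas.
assert coupled_replicas(1) == 4
assert separated_replicas(1) == (4, 3)
```

Since execution replicas carry the application state and code, shaving f of them per deployment is where the hardware and software savings claimed above come from.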

Fig. 21. Three-level architecture of BART service [177].

To achieve a cooperative service, a general three-level architecture for BART services is built, as shown in Fig. 21. The bottom level, the basic primitives level, implements a small set of key abstractions, e.g., state machine replication and terminating reliable broadcast, that simplify implementing and reasoning about BART distributed services. The middle level, the work assignment level, partitions and assigns work to individual nodes; assignment is done through a Guaranteed Response protocol that generates either a verifiable match between a request and the corresponding response or a verifiable proof that a node failed to respond to a request. The top level, the application level, implements the application-specific aspects of BART services, e.g., verifying that responses to requests conform to application semantics. In general, the lower levels provide reliable communication and authoritative request-response bindings, while the application is responsible for providing a net benefit and defining legal request-response pairs.

The authors also implemented BAR-B, a BART cooperative backup service. The BART protocol assumes that there exists a trusted authority that controls which nodes may enter the system, e.g., using cryptographic public keys as unique identities. BAR-B and the underlying BART replicated state machine have different timing assumptions. For example, BAR-B relies on synchrony to guarantee both its safety and liveness, and data entrusted to BAR-B is guaranteed to be retrievable only until the lease associated with it expires. The underlying BART state machine, in contrast, is safe even in an asynchronous system, though liveness is only guaranteed during periods of synchrony. The BART asynchronous replicated state machine (RSM) is based on PBFT, with modifications motivated by the BAR model. These modifications follow four guiding principles (from a high-level perspective): ensure long-term benefit to participants, limit non-determinism, mitigate the effects of residual non-determinism, and enforce predictable communication patterns. Interested readers can refer to the work [177] for these principles and the detailed BART protocol.

A follow-up work on the BAR model, called BAR Primer, was proposed by Clement et al. in 2008 [180]. The authors formalize a classic synchronous repeated terminating reliable broadcast (R-TRB) problem as a game, which leads towards a provably BAR-tolerant solution; the paper focuses on proving the safety of the proposed BAR-tolerant R-TRB protocol.

Besides the Byzantine fault-tolerant protocols based on the BAR model, there exist other works on gossip protocols based on the BAR model, e.g., BAR Gossip [181] in 2006. Gossip algorithms were first introduced by Oppen et al. to manage replica consistency in the Xerox Clearinghouse Services [182], and are now developed for use cases in P2P networks. BAR Gossip is a P2P streaming media application designed for the BAR model, which describes a method for an altruistic broadcaster to stream a live event to a pool of clients. It provides a scalable mechanism for information dissemination that ensures predictable throughput. Technically, BAR Gossip relies on verifiable pseudo-random partner selection to eliminate non-determinism that could be used to game the system, while maintaining the robustness and rapid convergence of traditional gossip; its robustness is enhanced by the randomness of partner selection. Also, a fair enough exchange primitive entices cooperation among selfish nodes on short timescales, avoiding the need for long-term node reputations as well as avoiding free rides. Interested readers can refer to the work [181] on how BAR Gossip achieves consistency.

5) HQ - (architecture): HQ is a hybrid quorum-based Byzantine fault-tolerant state machine replication protocol proposed by Cowling et al. in 2006 [61]. Typically, there exist two approaches to providing replication service under Byzantine faults: the replica-based approach (a.k.a. the agreement-based approach), e.g., BFT, which uses communication between replicas to agree on a proposed ordering of requests; and the quorum-based approach, such as Q/U, in which clients contact replicas directly to optimistically execute operations. For replica-based approaches, the quadratic cost of inter-replica communication is unnecessary when there is no contention, while quorum-based approaches require a large number of replicas and perform poorly under contention. HQ employs a lightweight quorum-based protocol when there is no contention, but uses BFT to resolve contention when it arises. HQ requires only 3f + 1 replicas to tolerate f faults, which provides optimal resilience to failed nodes. Specifically, when no contention occurs, HQ utilizes a quorum protocol in which reads require one round trip of communication between client and replicas, and writes require two round trips. When contention occurs, however, it uses the BFT state machine replication algorithm to efficiently order the contending operations.

In HQ, both clients and servers can be Byzantine nodes, which may deviate arbitrarily from their specifications, and an asynchronous distributed system with a fully connected network is assumed. The operations of HQ are phase-based, e.g., two phases for a write operation and one phase for a read operation when no failure and no contention occur. Each phase consists of a client issuing an RPC call to all replicas and collecting a quorum of replies. The HQ protocol requires 3f + 1 replicas to tolerate f failures, using quorums of size 2f + 1. The authors also made some improvements to the communication scheme among the replicas of the BFT system, using TCP instead of UDP to avoid costly message loss in case of congestion; the use of TCP also avoids the exponential back-off resolution of contention, which greatly reduces throughput. Another improvement is the use of MACs instead of authenticators in protocol messages; the improved BFT is also allowed to use the scheme of preferred quorums. However, regarding batching to improve throughput, HQ cannot batch concurrent client requests, as it has no primary replica funneling all requests to the other replicas. As shown in the evaluation of [62], the inability to support batching increases the cryptographic overhead per request at the bottleneck node by factors approaching 4 when one fault (f = 1) is tolerated, and by higher factors in systems that tolerate more faults. Interested readers can refer to the detailed protocols and optimizations in [61].
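HQ's choice of quorums of size 2f + 1 out of 3f + 1 replicas rests on quorum intersection: any two quorums share at least f + 1 replicas, hence at least one correct one. A brute-force check of this property (illustrative only; it enumerates all quorums, which is exponential in n):

```python
from itertools import combinations

def min_intersection(n: int, q: int) -> int:
    """Smallest pairwise intersection among all size-q quorums over n replicas."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)

# For f = 1 (n = 4) and f = 2 (n = 7): quorums of size 2f + 1 always
# intersect in at least f + 1 replicas, so at least one correct replica
# participates in both a completed write and any subsequent read.
for f in (1, 2):
    assert min_intersection(3 * f + 1, 2 * f + 1) == f + 1
```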

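Returning to BAR Gossip: its verifiable pseudo-random partner selection can be sketched as a public hash of the node id and the round number, so any peer can recompute, and thus audit, a node's claimed partner. In the actual protocol the seed is additionally signed and unpredictable; this is only an illustrative reduction:

```python
import hashlib

def select_partner(node_id: int, round_no: int, n: int) -> int:
    """Deterministic, publicly recomputable partner choice for one round."""
    digest = hashlib.sha256(f"{node_id}:{round_no}".encode()).digest()
    offset = 1 + int.from_bytes(digest[:8], "big") % (n - 1)  # never picks self
    return (node_id + offset) % n

# A rational node cannot "choose" a convenient partner: peers recompute
# the same value and refuse exchanges that deviate from it.
claimed = select_partner(3, 7, 10)
assert claimed == select_partner(3, 7, 10) and claimed != 3
```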
6) BFT2F - (theory): BFT2F is an extension of the PBFT protocol that explores the design space beyond f failures (of n = 3f + 1 total replicas), proposed by Li and Mazieres in 2007 [183]. The "2F" in BFT2F means it can tolerate up to 2f faulty replicas out of 3f + 1 under some assumptions. Typically, most BFT protocols make a strong assumption that some predetermined fraction of server replicas is honest. For example, the highest fraction of failures that an asynchronous BFT system can survive without jeopardizing safety or liveness is f out of 3f + 1 replicas. The reason, as stated in the FLP impossibility, is that asynchronous communication makes it impossible to differentiate slow replicas from failed ones; thus, to progress safely with f unresponsive replicas, the majority of the remaining 2f + 1 responsive ones must be honest. The goal of BFT2F is to limit the damage when more than f out of 3f + 1 servers in a replicated SMR fail. To achieve this goal, the authors adopted a weaker consistency model, called fork* consistency, based on previous work on fork consistency. When no more than f replicas fail, BFT2F has the same liveness and safety guarantees as PBFT. With fork* consistency, it is possible to bound the system's behavior when between f + 1 and 2f replicas have failed. When 2f + 1 or more replicas fail, it is unfortunately not possible to make any guarantees without simultaneously sacrificing liveness for the cases where f or fewer replicas fail.

Fig. 22. Comparison of the safety and liveness guarantees of PBFT, BFT2F, and BFTx. As we can see, BFT2F provides extra safety guarantees without compromising liveness, which is strictly better than PBFT [183].

Fig. 22 shows a comparison of the safety and liveness guarantees of PBFT, BFT2F, and BFTx (where 2f < x <= 3f). When more than f but no more than 2f replicas have failed, two outcomes are possible: 1) the system may cease to make progress, as BFT2F does not guarantee liveness when more than f replicas are compromised; or 2) the system may continue to operate and offer fork* consistency, which is preferable to arbitrary behavior. The case of up to 2f malicious replicas covers the case where a majority of the replicas fail. BFT2F can also easily be extended to a parameterized version, shown as BFTx in Fig. 22. When the number of faulty replicas is greater than f, the guarantees depend on fork* consistency, which is a weaker consistency model and may fail to achieve correct consistency. Fork* consistency can be achieved in a single-round protocol; the major difference from fork consistency is that each request specifies the precise order in which the same client's previous request was supposed to have been executed. Doing so may let an honest replica execute an operation out of order; however, at least any future request from the same client will make the attack evident. Interested readers can refer to the work [183] for details on fork* consistency and the BFT2F algorithm.

7) HRDB - (middleware): HRDB is an acronym for Heterogeneous Replicated Database, a scheme that tolerates Byzantine faults in transaction processing systems using a concurrency control protocol, proposed by Vandiver et al. in 2007 [184]. As stated in the paper, the main challenge in designing a replication scheme for a transaction processing system is to ensure that different replicas execute transactions in equivalent serial orders while allowing a high degree of concurrency. A transaction processing system may experience different faults, e.g., crash faults or Byzantine faults; in the Byzantine case, examples include concurrency control errors, incorrect query execution, corruption of a database table or index, and so on. HRDB is a replication scheme that handles both Byzantine and crash faults in transaction processing systems without modification to any database replica software. To design such a system, two goals should be met: correctness (e.g., providing a single-copy serializable view of the database) and high performance (e.g., performance no worse than a single database). Achieving both goals simultaneously is challenging because databases achieve high performance through concurrency, while concurrency may cause inconsistency, which in turn breaks correctness. HRDB presents a concurrency control protocol, called commit barrier scheduling (CBS), that allows the system to guarantee correct behavior while achieving high concurrency: CBS constrains the order in which queries are sent to replicas just enough to prevent conflicting schedules while preserving most of the concurrency in a workload.

Fig. 23. HRDB system architecture [184].

Fig. 23 shows a high-level view of the HRDB system architecture. In HRDB, clients do not interact directly with database replicas; instead, they communicate with a shepherd, which acts as a front-end to the replicas and coordinates them. All transactions that run on the replicas are sent via the shepherd, and the shepherd is assumed to be trusted and free of Byzantine faults. HRDB requires only 2f + 1 database replicas to tolerate f simultaneously faulty replicas, since the replicas do not carry out agreement and simply execute the statements sent to them by the shepherd. The shepherd runs a single coordinator and one replica manager for each back-end replica. To achieve "ACID" semantics for transaction processing, the shepherd implements the commit barrier scheduling (CBS) scheme. CBS ensures that all non-faulty replicas execute committed transactions in equivalent serial orders, while at the same time preserving much of the concurrency of the individual replicas. In CBS, one replica is designated the primary and runs the statements of transactions slightly in advance of the other, secondary, replicas; the order observed at the primary determines a serial order. The shepherd observes which queries from different transactions are able to execute concurrently without conflicts at the primary, and allows the secondaries to execute those queries concurrently. Technically, the CBS schedule depends heavily on the assumption that replicas use a strict two-phase locking scheme to ensure correctness.
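HRDB's commit barrier scheduling can be caricatured as batching: statements that executed without conflict at the primary may run concurrently at the secondaries, while conflicting statements are separated by a barrier. A toy sketch of this idea (our simplification; a conflict predicate stands in for what the shepherd actually observes at the primary):

```python
def cbs_schedule(primary_order, conflicts):
    """Group primary-completed statements into batches. Statements within
    a batch did not conflict at the primary and may run concurrently on
    the secondaries; batches must execute in order (the 'barriers')."""
    batches = []
    for stmt in primary_order:
        if batches and not any(conflicts(stmt, s) for s in batches[-1]):
            batches[-1].append(stmt)   # no conflict: join the current batch
        else:
            batches.append([stmt])     # conflict (or first stmt): new barrier
    return batches

# Statements on the same table conflict; independent tables run in parallel.
conflicts = lambda a, b: a[1] == b[1]  # statement = (txn, table)
order = [("T1", "users"), ("T2", "orders"), ("T3", "users")]
assert cbs_schedule(order, conflicts) == [
    [("T1", "users"), ("T2", "orders")], [("T3", "users")]]
```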

In brief, HRDB uses a trusted node to coordinate the replicas, and the coordinator chooses which requests to forward concurrently so as to maximize the parallelism between concurrent requests. According to evaluations on practical databases, HRDB provides good performance; however, it requires trust in the coordinator, which may be problematic if replication is being used to tolerate attacks. Besides, HRDB ensures single-copy serializability, in which the group of replicas must together act as a single copy, provided that no more than half of the replicas are faulty. We refer interested readers to the work [184] for the detailed CBS scheduling.

8) Erasure-coded BFT - (feature): An erasure-coded BFT protocol for block storage was proposed by Hendricks et al. in 2007 [185]. Unlike replicated-state-machine-based BFT, in which each request is sent to every server replica and each non-faulty replica sends a response, an erasure-coded BFT protocol relies on erasure codes. Taking an m-of-n erasure code as an example, each block is encoded into n fragments such that any m correct fragments can be used to decode the block. To avoid expensive overhead, e.g., on communication, the authors proposed a mechanism that optimizes for the common case in which faults and concurrency are rare. This optimization minimizes the number of rounds of communication, the number of computations, and the number of servers that must be available at any time. One novelty of this work is that the block is erasure-coded at the server, which eliminates the verification of concurrency at the clients. The proposed scheme also utilizes a homomorphic fingerprinting technique to ensure that a block is encoded correctly; the fingerprinting technique eliminates the need for versioned storage and a separate garbage collection protocol.

PASIS is another Byzantine fault-tolerant erasure-coded storage protocol, proposed by Goodson et al. in 2004 [186]. It describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage node. By exploiting versioning storage nodes, the protocol shifts most work to clients and allows highly optimistic operations; e.g., reads occur in a single round trip unless clients observe concurrency or write failures. In PASIS, servers do not verify whether a block is correctly encoded during a write operation; instead, clients verify during read operations that a block is correctly encoded. PASIS requires 4f + 1 servers to tolerate f faulty servers.

AVID is an asynchronous verifiable information dispersal protocol proposed by Cachin and Tessaro in 2005 [114]. It is also an erasure-coded storage protocol for Byzantine fault tolerance, and it requires only 3f + 1 replicas to tolerate f faulty replicas. In AVID, a client sends each server an erasure-coded fragment, and each server then sends its fragment to all other servers, such that each server can verify that the block is encoded correctly. Specially, AVID combines an asynchronous reliable broadcast protocol with erasure coding to achieve efficient communication. Essentially, however, this is still all-to-all communication, which consumes slightly more bandwidth in the common case than a typical replication protocol.

Compared with replication-based Byzantine fault-tolerant protocols, erasure-coded Byzantine fault-tolerant schemes are more suitable for storage operations, avoiding conflicts on writes and the corresponding concurrency issues, whereas replication-based BFT systems focus on the consistency of the states among replicas to ensure safety and liveness. Interested readers can refer to the above-mentioned papers for erasure-coded BFT schemes.

9) Prime - (feature): Prime is an acronym for Performance-oriented Replication In Malicious Environments, an SMR protocol resilient to performance degradation under attacks, proposed by Amir et al. in 2008 [187], with an extended version in 2010 [188]. Practically, faulty processes can significantly degrade performance by causing the system to make progress at an extremely slow rate, and systems that are vulnerable to such degradation are of limited practical use in an adversarial environment. Prime is a Byzantine fault-tolerant SMR protocol that mitigates performance attacks and bridges this "practicality gap" for intrusion-tolerant replication systems. In Byzantine environments, faulty processors can exploit the reliance on "stable periods" of synchrony to degrade system performance to a level far below what would be achievable with only correct processors. For example, a small number of faulty processors, especially if they include the leader of a leader-based BFT protocol, can cause the system to make progress at an extremely slow rate. Prime explores this kind of performance degradation attack, calling it a Byzantine performance failure. This failure differs from the traditional classification of Byzantine failures into the value domain (e.g., sending incorrect or conflicting messages) and the time domain (e.g., message timeouts): processors exhibiting performance failures send correct messages slowly, but without triggering protocol timeouts. Thus, Prime proposes a performance-oriented metric to evaluate Byzantine failures and presents an SMR protocol that performs well under this metric.

Prime focuses on classic leader-based Byzantine SMR protocols, which rely on a leader to coordinate the global ordering and are thus vulnerable to performance degradation caused by a slow leader. In general, Prime guarantees the safety and validity of all executions, including those in which the network is asynchronous and may drop or duplicate messages; validity means that only an update proposed by a client may be executed. Since no asynchronous Byzantine agreement protocol can always be both safe and live, Prime guarantees liveness only in executions in which the network eventually meets certain stability conditions (e.g., PRIME-STABILITY, defined in [188]). PRIME-STABILITY places constraints on the minimum latency of message transmission between correct replicas, and under PRIME-STABILITY, PRIME-LIVENESS can be guaranteed; compared with the traditional liveness of existing leader-based protocols, PRIME-LIVENESS only requires a stronger degree of stability. Under PRIME-STABILITY, Prime can also provide a stronger performance guarantee, called BOUNDED-DELAY, which states that there exists a time after which the update latency for any update initiated by a stable server is upper-bounded. However, resource-exhaustion denial-of-service attacks may cause PRIME-STABILITY to be violated for the duration of the attacks. Prime handles the case in which malicious leaders slow down the system without triggering defense mechanisms, focusing on two scenarios: the Pre-prepare delay attack and the timeout manipulation attack. In the Pre-prepare delay attack, a faulty server delays the ordering of requests from some of the clients, causing a considerable increase in the latency of these requests and a great reduction of throughput. In the timeout manipulation attack, faulty servers manage to increase the timeout used in PBFT, seriously degrading the performance of the system. These attacks also apply to some of the algorithms derived from PBFT. Prime tolerates them by adding a pre-ordering stage of three communication steps to BFT; in general, defending against these two performance attacks allows Prime to meet BOUNDED-DELAY.

Prime requires 3f + 1 servers to tolerate f Byzantine faults, and the Prime protocol includes two main components: the Prime ordering protocol and the detection of malicious leaders.
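The m-of-n erasure coding underlying the storage protocols above can be illustrated by its simplest instance, a 2-of-3 code built from XOR parity: any two of the three fragments recover the block. Real systems use general codes (Reed-Solomon style) plus fingerprints to detect incorrectly encoded fragments; both are omitted in this toy sketch:

```python
def encode_2_of_3(block: bytes):
    """Split a block into halves A and B plus the parity fragment A xor B."""
    if len(block) % 2:
        block += b"\x00"                       # toy padding to an even length
    half = len(block) // 2
    a, b = block[:half], block[half:]
    return [a, b, bytes(x ^ y for x, y in zip(a, b))]

def decode_2_of_3(frags):
    """Recover the block from any 2 fragments, given as {index: fragment}."""
    if 0 in frags and 1 in frags:
        a, b = frags[0], frags[1]
    elif 0 in frags:                           # recover B = A xor parity
        a = frags[0]
        b = bytes(x ^ y for x, y in zip(a, frags[2]))
    else:                                      # recover A = B xor parity
        b = frags[1]
        a = bytes(x ^ y for x, y in zip(b, frags[2]))
    return a + b

frags = encode_2_of_3(b"replicated block")     # 16 bytes: no padding needed
assert decode_2_of_3({0: frags[0], 2: frags[2]}) == b"replicated block"
```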

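Prime's central observation is that a performance-faulty leader can stay just under every timeout. Its suspect-leader idea can be caricatured as comparing the leader's observed turnaround against a bound derived from measured round-trip times between correct replicas; the slack factor and numbers below are ours, purely illustrative:

```python
def suspect_leader(turnaround_ms: float, measured_rtt_ms: float,
                   slack: float = 2.0) -> bool:
    """Suspect the leader if its turnaround exceeds what the measured
    network latency says a correct leader could comfortably achieve."""
    return turnaround_ms > slack * measured_rtt_ms

assert not suspect_leader(80, 50)   # within 2x the measured RTT: acceptable
assert suspect_leader(400, 50)      # slow yet under a classic timeout: suspect
```

The key difference from a plain timeout is that the bound tracks actual network conditions, so a leader cannot hide behind a conservative, fixed timeout value.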
The Prime ordering protocol consists of three phases: a preordering phase, a global ordering phase, and reconciliation. To detect malicious leaders, three mechanisms are adopted: enforcing up-to-date Pre-prepare messages, Pre-prepare flooding, and a suspect-leader protocol. We refer interested readers to the detailed protocol and detection mechanisms in [188].

10) Nysiad - (architecture): Nysiad is a system that transforms a scalable distributed system or network protocol that tolerates only crash failures into one that tolerates Byzantine failures, proposed by Ho et al. in 2008 [189]. The key idea of Nysiad is to assign each host a certain number of guard hosts, optionally chosen from the available collection of hosts, under the assumption that no more than a configurable number of a host's guards are faulty. Nysiad then enforces that a host either follows the system's protocol and handles all its inputs fairly, or ceases to produce output messages altogether. Nysiad leverages the fact that most distributed systems already deal with crash failures, and translates arbitrary failures into crash failures, so that the translated system can tolerate arbitrary failures in the context of crash failures. This translation avoids solving consensus during normal operation (e.g., reaching agreement among replicas); Nysiad needs consensus only when a host must communicate with new peers or when one of its replicas is being replaced. Rather than treating replicas as symmetric (as in most SMR systems), Nysiad's replication scheme adopts a primary-backup approach, with the host being replicated acting as the primary and the other replicas as backups; a voting protocol within a host's Nysiad replicated state machine (RSM) ensures that the output of the RSM is valid.

Nysiad distinguishes two kinds of systems, the original system and the translated system, which refer to the systems before and after translation, respectively. The original system tolerates only crash failures, while the translated system tolerates Byzantine failures as well. Nysiad assumes that each host runs a deterministic state machine that transitions in response to receiving messages or expiring timers, and the system works in an asynchronous environment. The Nysiad transformation requires a "guard graph", a communication graph that establishes connections between hosts; we can consider each guard as a backup in the primary-backup view of the RSM. A t-guard graph means that, for each host, at most t of its guards are Byzantine, and Nysiad performs the translation with respect to this guard graph: it translates the original system by replicating the deterministic state machine of each host onto its neighboring guards. In general, a Nysiad instance includes four sub-protocols. The replication protocol ensures that the guards of a host remain synchronized. The attestation protocol guarantees that messages delivered to guards are messages produced by a valid execution of the protocol. The credit protocol forces a host either to process all of its input fairly or to ignore all input. And the epoch protocol allows the guard graph to be bootstrapped and reconfigured in response to host churn.

However, there exists potential equivocation in Nysiad, in which the host may send different messages to different guards or order its messages differently for different guards. To handle this equivocation, the guards gossip among each other to agree on the order and content of the messages sent by the host; the communication complexity of the adopted gossip is quadratic in the number of guards. Besides, Nysiad can handle non-deterministic state machines; however, doing so requires protocol changes that treat non-deterministic events as inputs [157]. For details of the design principles, interested readers can refer to Section 4 in [189].

A similar and earlier work on this type of RSM for Byzantine failures is PeerReview, proposed by Haeberlen et al. in 2007 [190]. PeerReview is a system that employs witnesses to collect a tamper-evident record of all messages in a distributed system for subsequent checking against a reference implementation. Essentially, PeerReview provides accountability [191] in distributed systems, ensuring that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node; the review process also ensures that a correct node can always defend itself against false accusations. PeerReview assumes that each host implements a protocol using a deterministic state machine. It works by maintaining a secure record of the incoming and outgoing messages of each node and periodically running the logs through the state machines, checking the output against the outgoing logs. It can only detect a subclass of Byzantine failures, and only after the fact. However, PeerReview does not provide fault tolerance; instead, it provides eventual fault detection and localization, which, the system's designers argue, leads to fault deterrence. Its tamper-evident record is a distributed hash chain, which is used to detect equivocation about the messages recorded in the log. Besides, the communication required to collectively manage the tamper-evident message log is quadratic in the size of the witness set.

11) BFTSim - (architecture): BFTSim is a simulation tool for Byzantine fault-tolerant protocols proposed by Singh et al. in 2008 [63]. It couples a declarative networking system with a robust network simulator for various protocols: protocols can be rapidly implemented from pseudocode in a high-level declarative language, and the network conditions and the measured costs of communication packages and crypto primitives can be plugged into the simulator. In general, BFTSim can be used as a fast prototyping module to predict the performance of designed protocols before practical implementation. As a simulation framework, BFTSim couples a high-level protocol specification language and an execution system with a computation-aware network simulator built atop ns-2 [192]. This allows users to rapidly implement protocols from pseudocode descriptions, evaluate their performance under a variety of conditions, and isolate the effects of implementation artifacts on the core performance of each protocol. Fig. 24 shows the BFTSim software architecture; interested readers can refer to the work [63] for more details. Judging from the BFTSim open-source tool online (http://bftsim.mpi-sws.org/), the source code appears to have been unmaintained since 2009.

12) Byzantium - (middleware): Byzantium is a Byzantine fault-tolerant database replication middleware that provides snapshot isolation (SI) semantics, proposed by Preguica et al. in [193]. Like commit barrier scheduling (CBS) in HRDB [184], Byzantium is a kind of middleware that enhances concurrency. Byzantium improves on existing BFT replication schemes for databases in that it has no centralized component on whose correctness the integrity of the system depends, and it allows increased concurrency to achieve good performance. Fig. 25 shows the system architecture. Byzantium uses the PBFT state machine replication algorithm as its consensus protocol and follows the system model of PBFT, e.g., using 3f + 1 replicas to tolerate f faulty replicas under an asynchronous communication model. SI is a weaker form of semantics supported by most commercial databases, which allows a transaction to logically execute in a database snapshot.
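PeerReview's tamper-evident message record is, at its core, a hash chain: each log entry commits to the hash of its predecessor, so any later modification or reordering of recorded messages is detectable on replay. A minimal sketch:

```python
import hashlib

def chain_append(log, message: bytes):
    """Append an entry whose hash covers the previous entry's hash."""
    prev = log[-1][1] if log else b"\x00" * 32
    log.append((message, hashlib.sha256(prev + message).digest()))

def chain_verify(log) -> bool:
    """Replay the chain; a modified or reordered entry breaks verification."""
    prev = b"\x00" * 32
    for message, h in log:
        if hashlib.sha256(prev + message).digest() != h:
            return False
        prev = h
    return True

log = []
for m in (b"send:req1", b"recv:ack1", b"send:req2"):
    chain_append(log, m)
assert chain_verify(log)
log[1] = (b"recv:FORGED", log[1][1])   # tamper with one recorded message
assert not chain_verify(log)           # verification now fails
```

This captures only the single-log case; PeerReview additionally distributes authenticators so that a node cannot maintain two divergent chains (equivocation) without witnesses detecting it.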

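The snapshot isolation rule that Byzantium enforces, first-committer-wins on write-write conflicts, can be sketched as follows (our toy model, not Byzantium's actual middleware interface):

```python
def si_can_commit(txn_writes: set, snapshot_ts: int, commit_log) -> bool:
    """A transaction that read from a snapshot taken at `snapshot_ts` may
    commit only if no transaction committing after that snapshot wrote
    any of the same items; read-only transactions therefore never abort."""
    return not any(ts > snapshot_ts and writes & txn_writes
                   for ts, writes in commit_log)

log = [(5, {"x"}), (8, {"y"})]           # (commit timestamp, write set)
assert si_can_commit({"x"}, 6, log)      # the ts-5 writer of x committed first
assert not si_can_commit({"y"}, 6, log)  # the ts-8 commit also wrote y: abort
assert si_can_commit(set(), 0, log)      # read-only: always commits
```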
32 Zeno adapted Zyzzyva to provide availability mainly because Zyzzyva explores the speculation to conclude operations fast and cheaply, having an ability to yield high service throughout during favorable system conditions. As a BFT replication protocol, Zeno requires 3f + 1 replicas to tolerate f faults, with an arbitrary number of Byzantine clients. Based on the baseline Zyzzyva (a hybrid agreement/quorum model), it made some modifications on the levels of consistency. For example, Zeno distinguishes two kinds of quorums: strong quorums consisting of any group of 2f + 1 distinct replicas, and weak quorums of f +1 distinct replicas. Like most tradi- tional BFT protocols, Zeno has three components: sequence number assignment to determine the total order of operations, view change to handle leader replica election, and checkpointing to deal with garbage collection of protocol and application state. We recommend interested readers to read Section 4 of [194] on each operation.

In general, a weak consistency allows replicas to temporarily Fig. 24. The BFTSim software architecture [63]. diverge and users may see inconsistent data, however, with better availability and performance. There are some trade-offs on choosing between consistency and availability. Different applications may have different choices. For example, a cluster of distributed data centers may favor availability, while a decentralized distributed ledger system may instead favor consistency. However, to property of consistency itself, for some permissionless blockchain settings (e.g., PoW as a consensus), it temporarily allows a certain level of diverging (in the form of forks), however, an eventual consistency must be maintained. While for a BFT based blockchain system, a strong consistency (or linearizability) is more critical.

Fig. 25. Byzantium system architecture [193].

A transaction can commit only when there is no write-write conflict with any committed concurrent transaction; otherwise, it must abort. Compared with serializability in replicated machine systems, SI can increase concurrency among transactions. This is because, with SI, only write-write conflicts must be avoided, which benefits read-only transactions, as they need not block or abort. Besides, Byzantium as a middleware can be used with off-the-shelf database systems and builds on top of an existing BFT library.

13) Zeno - (feature): Zeno is a BFT state machine replication protocol that trades consistency for higher availability, proposed by Singh et al. in 2009 [194]. Most proposals for Byzantine fault-tolerant protocols have focused on strong semantics, such as linearizability [195], where the replicated system appears to the clients as a single, correct, and sequential server. With the popularity of data centers and cloud services, ensuring correct and continuous operation of these services is critical, which requires higher availability. Zeno is designed to meet the needs of modern services running in corporate data centers. More precisely, Zeno favors service performance and availability, at the cost of providing weaker consistency guarantees than traditional BFT replication when network partitions and other infrequent events reduce the availability of individual servers. Still, Zeno offers eventual consistency semantics [196], in which different clients can be unaware of the effects of each other's operations, e.g., during a network partition, but operations are never lost and will eventually appear in a linear history of the service once enough connectivity is re-established.

The authors also describe the necessity and sufficiency of eventual consistency from the standpoint of many applications. For example, when a network partition occurs, eventual consistency is necessary to offer high availability to clients on both sides of the partition.
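These eventual-consistency semantics can be pictured with a toy sketch (an illustration only, not Zeno's actual algorithm): two sides of a partition keep committing operations independently, and once connectivity returns, no operation is lost and all of them are merged into a single linear history.

```python
# Toy model of eventual consistency during a network partition.
# Illustrative sketch, NOT Zeno's protocol: each operation carries a
# (timestamp, partition_id) tag, and per-partition logs are merged
# deterministically once connectivity is re-established.

def merge_histories(*partition_histories):
    """Merge per-partition operation logs into one linear history."""
    merged = [op for history in partition_histories for op in history]
    # Deterministic total order: by timestamp, ties broken by partition id.
    merged.sort(key=lambda op: (op["ts"], op["partition"]))
    return [op["name"] for op in merged]

# During the partition, each side keeps serving clients and logs its ops.
side_a = [{"ts": 1, "partition": "A", "name": "put(x,1)"},
          {"ts": 3, "partition": "A", "name": "put(y,2)"}]
side_b = [{"ts": 2, "partition": "B", "name": "put(z,9)"}]

# After the partition heals, every operation appears in one agreed history.
print(merge_histories(side_a, side_b))
# → ['put(x,1)', 'put(z,9)', 'put(y,2)']
```

Clients on side A and side B were unaware of each other's operations while partitioned, yet the merged history is a single linear order, matching the guarantee described above.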

14) Aardvark - (architecture): Aardvark is a BFT system designed to be robust to failures, proposed by Clement et al. in 2009 [153]. It criticizes that the existing (before 2009) BFT state machine replication protocols, although fast, did not tolerate Byzantine faults very well. For example, even a single faulty client or server is capable of rendering PBFT, Q/U, HQ, and Zyzzyva virtually unusable. For those BFT protocols, the complexity often undermines robustness in two ways: 1) the protocols' designs include fragile optimizations that allow a faulty client or server to knock the system off of the optimized execution path onto an expensive alternative path, and 2) the protocols' implementations often fail to properly handle all of the intricate corner cases. Based on the above observations, the authors advocate a new approach to build robust BFT (RBFT) systems and avoid performing fragile optimizations. The goal of RBFT is to shift the focus from constructing systems that maximize best-case performance to constructing systems that offer acceptable and predictable performance under the broadest possible set of circumstances. Aardvark was based on this new design philosophy to provide a robust BFT replication system. Essentially, Aardvark itself was built on the classic PBFT protocol under a practical asynchronous network where synchronous intervals, during which messages are delivered with a bounded delay, occur infinitely often.

The authors define a set of principles for constructing BFT services that remain useful even when Byzantine faults occur. Specifically, the temptation of fragile optimizations should be avoided in a practical design, and a BFT system should be designed around an execution path that meets some properties. More precisely, a BFT system should provide acceptable performance; a BFT system should be easy to implement; and a BFT system should be robust against Byzantine attempts to push the system away from this path. Also, optimizations for the common case should be acceptable only if they do not endanger these properties. Due to the FLP impossibility, the liveness of a protocol cannot be guaranteed in an asynchronous environment. Even if it cannot achieve liveness, a BFT protocol should perform well under a weak assumption, e.g., uncivil executions. An execution is uncivil iff (a) the execution is synchronous with some implementation-dependent short bound on message delay, (b) up to f servers and an arbitrary number of clients are Byzantine, and (c) all remaining clients and servers are correct.

Aardvark is based on the above design philosophy and the weak assumption of uncivil executions. The Aardvark protocol essentially consists of three stages: client request transmission, replica agreement, and primary view change. There are three main design differences between Aardvark and previous BFT systems, namely, signed client requests, resource isolation, and regular view changes. For signed client requests, Aardvark clients use digital signatures to authenticate their requests. Digital signatures provide non-repudiation and ensure that all correct replicas make identical decisions about the validity of each client request. Aardvark uses them only for client requests, which pushes the expensive generation of signatures onto the client and leaves the servers with the less expensive verification operations. Meanwhile, the primary-to-replica, replica-to-replica, and replica-to-client communication relies on a MAC authentication scheme (a fast and less expensive operation). For resource isolation, the Aardvark prototype implementation explicitly isolates network and computational resources. For example, isolating replica-to-replica communication reduces the potential hazardous cases in message transmission. For regular view changes, the Aardvark protocol invokes view change operations on a regular basis to prevent a primary from achieving tenure and exerting absolute control over system throughput. For example, Aardvark monitors the performance of the primary and changes the view in case it seems to be performing slowly. Based on the above optimizations, the work [153] also presented a detailed implementation and evaluation of Aardvark. Interested readers can look into the details.
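The primary-monitoring idea behind regular view changes can be sketched in a few lines. This is an illustrative simplification, not Aardvark's actual heuristics: the 90% threshold and the reset rule below are made-up stand-ins for the paper's throughput expectations.

```python
# Sketch of Aardvark-style primary performance monitoring (hypothetical
# constants; the real protocol's thresholds and timing rules differ).
class PrimaryMonitor:
    def __init__(self, required_fraction=0.9):
        self.required_fraction = required_fraction  # fraction of best seen
        self.best_throughput = 0.0                  # best rate observed
        self.view = 0

    def observe(self, requests_ordered, interval_secs):
        """Record one measurement interval; maybe trigger a view change."""
        throughput = requests_ordered / interval_secs
        self.best_throughput = max(self.best_throughput, throughput)
        # A sluggish primary (e.g., one mounting a delay attack) falls
        # below a fraction of the best rate seen, so the view rotates.
        if throughput < self.required_fraction * self.best_throughput:
            self.view += 1                       # next replica becomes primary
            self.best_throughput = throughput    # reset expectations
        return self.view

monitor = PrimaryMonitor()
monitor.observe(1000, 1.0)        # healthy primary: 1000 req/s
monitor.observe(980, 1.0)         # still fine (>= 90% of best)
view = monitor.observe(300, 1.0)  # degraded primary triggers a view change
print(view)  # → 1
```

The key design point mirrored here is that the view change is triggered by observed performance, not only by suspected crashes, so a slow-but-not-silent Byzantine primary cannot hold the system hostage.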
15) Client Speculation - (middleware): Client speculation is a technique to perform speculative execution at the clients to reduce the impact of network and protocol latency, proposed by Wester et al. in 2009 [197]. The geographical distribution of replicas in an SMR system increases the network latency between replicas, and many applications and protocols for SMR are highly sensitive to latency. For example, a traditional Byzantine fault-tolerant protocol must wait for multiple replicas to reply, so the effective latency of the service is limited by the latency of the slowest replica (even an honest one) being waited for. Agreement-based protocols, e.g., PBFT, typically require multiple message rounds to reach agreement, which further exacerbates the user experience (e.g., high network latency). Client speculative execution allows clients of replicated services to be less sensitive to high latencies caused by network delays and protocol messages. The benefits are clearest when faults are rare or absent, so that the response from even a single replica is an excellent predictor of the final outcome. Based on these observations, clients using client speculation can proceed after receiving the first response, thereby hiding considerable latency in the common case in which the first response is correct; the gain is largest if at least one replica is located nearby. To ensure safety in the case where the first response is incorrect, a client may only continue executing speculatively until enough responses are collected to confirm the prediction. Besides, client speculation hides the latency of the replicated service from the client, and replicated servers can optimize their behavior to maximize their throughput and minimize load, e.g., by handling agreement in large batches.

To combine the client speculation technique with existing replication protocols, some modifications to these protocols are required. The authors provide some design principles for using client speculation with replicated services, e.g., generating early replies, prioritizing throughput over latency, avoiding speculative state on replicas, and using replica-resolved speculation. For generating early replies, to minimize latency, a protocol should be designed to get the first reply to the client as quickly as possible, in which case the fastest reply comes from the closest replica, and that replica responds immediately. Thus, a protocol supporting client speculation should have one or more replicas immediately respond to a client with the replica's best tentative result (or guess) for the final outcome of the operation. Due to the nature of the guess, that reply is not guaranteed to be correct in the presence of a malicious responding replica. For instance, for agreement-based protocols like PBFT, a more practical way is to have only the primary execute the request early and respond to the client; such predictions are correct unless the primary is faulty. However, this presumes that the primary is located near the client so as to minimize latency. For prioritizing throughput over latency, when a trade-off between throughput and latency must be made, the better choice is the one that improves throughput over the one that improves latency (even under client speculation). This is because speculation can do much to hide replica latency but little to improve replica throughput. Even though the client can get an early response, the final outcome of an operation depends on the latency of the overall operations among the replicas. For avoiding speculative state on replicas, speculative execution must avoid output commits that externalize speculative output, to ensure correctness, as such output cannot be undone once externalized. To avoid inconsistency among replicas, the authors recommend selecting the smallest boundary of speculation, which disallows replicas from storing speculative state. For using replica-resolved speculation, even with the smallest boundary of speculation, clients are still allowed to issue new requests that depend on the speculative state (so-called speculative requests). These requests can be handled concurrently, increasing throughput when the replicas are not already fully saturated. However, there is no mechanism for a replica to determine whether or not a client received a correct speculative response. Interested readers can refer to Section 2 of [197] for more detailed design principles. Besides, following these design principles, the authors presented a prototype of a modified protocol, PBFT-CS, based on the PBFT protocol, and applied it to replicated NFS and counter services. Evaluation results show that client speculation trades up to 18% of maximum throughput to decrease the effective latency under light workloads, achieving run-time speedups of 1.08-19X in the case of a single client co-located with the primary.
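The speculate-then-confirm pattern above can be condensed into a minimal sketch (illustrative only, not PBFT-CS: real protocols confirm against an agreement quorum asynchronously rather than over a ready-made reply list): the client acts on the first reply and rolls back if the quorum-confirmed outcome disagrees.

```python
# Minimal sketch of client-side speculation (illustrative; not PBFT-CS).
# The client proceeds on the first reply and rolls back to a checkpoint
# if the reply later confirmed by a quorum contradicts the guess.
from collections import Counter

def speculate(replies, quorum):
    """replies: replica answers in arrival order.
    Returns (guess, final, rolled_back)."""
    guess = replies[0]                      # act immediately on first reply
    final, votes = Counter(replies).most_common(1)[0]
    assert votes >= quorum, "not enough matching replies to confirm"
    return guess, final, guess != final     # rolled_back if guess was wrong

# Common case: the first (closest) replica predicted the agreed outcome.
print(speculate(["ok", "ok", "ok", "ok"], quorum=3))   # → ('ok', 'ok', False)

# Faulty first responder: the speculation must be rolled back.
print(speculate(["bad", "ok", "ok", "ok"], quorum=3))  # → ('bad', 'ok', True)
```

Note how the common case pays no confirmation latency at all, while the faulty case is caught once the quorum's answer arrives, which is exactly the safety argument made above.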

16) UpRight - (feature): UpRight is a state replication system that provides a library for fault-tolerant services, proposed by Clement et al. in 2009 [198]. The UpRight library seeks to make Byzantine fault tolerance a simple and viable alternative to crash fault tolerance for a range of cluster services. In its design choices, UpRight favors simplifying adoption by existing applications, and performance is a secondary concern. To make a BFT system competitive with a CFT system, it not only enhances optimizations for performance, hardware overhead, and availability, but also addresses engineering effort. UpRight draws heavily on prior work to retain performance, hardware overhead, and availability; however, it favors minimizing intrusiveness to existing applications over raw performance. By applying the UpRight library, the authors construct UpRight-Zookeeper and UpRight-HDFS using the open-source code bases of Zookeeper [54] and HDFS (Hadoop Distributed File System) [199]. Both UpRight-based systems provide improvements in fault tolerance. For example, the original HDFS system can be halted by a single fail-stop node, while UpRight-HDFS has no single point of failure and also provides end-to-end Byzantine fault tolerance against faulty clients, DataNodes, and NameNodes. UpRight makes standard assumptions for Byzantine fault-tolerant systems with few modifications. For example, the safety of UpRight holds in any asynchronous distributed system, but liveness is guaranteed only during synchronous intervals in which messages sent between correct nodes are processed within some fixed (but potentially unknown) worst-case delay from when they are sent.

Fig. 26. Failure hierarchy. Byzantine failures include omission failures and commission failures. Crash failures are a subset of omission failures [198].

UpRight distinguishes two kinds of Byzantine failures: omission failures (including crash failures), in which a node fails to send one or more messages specified by the protocol and sends no incorrect messages given the protocol and its inputs, and commission failures, which include all failures that are not omission failures, e.g., a node sending a message that is not specified by the protocol. Fig. 26 shows this failure hierarchy. UpRight provides the following properties: 1) an UpRight system is safe despite r commission failures and any number of omission failures; 2) an UpRight system is safe and eventually live ("up") during sufficiently long synchronous intervals when there are at most u failures, of which at most r are commission failures and the rest are omission failures, where u >= r. For example, when u = r, the system is equivalent to one based on a full Byzantine failure model, and when r = 0, the system is equivalent to one based on a crash failure model. UpRight implements state machine replication, which tries to isolate applications from the details of the underlying replication protocol. This design principle makes it easy to convert a CFT application into a BFT one or to construct a new fault-tolerant application. Essentially, the UpRight library ensures features such as: each application server replica sees the same sequence of requests and maintains a consistent state, and an application client sees responses consistent with this sequence and state. To achieve these features, applications must address some specific challenges in request execution and checkpoint management. For request execution, applications must account for non-determinism, multi-threaded execution, read-only requests, and spontaneous server-initiated replies. For checkpoint management, the checkpoints should be inexpensive to generate, inexpensive to apply, deterministic, and nonintrusive on the codebase.

Fig. 27. UpRight architecture [198].

Fig. 27 shows UpRight's high-level architecture and how it works for cluster applications. As the figure shows, UpRight requires three modules to execute a client request: request quorum (RQ), order, and execution. An UpRight client deposits its request at the request quorum, which stores the request, forwards a digest of the request to the order module, and supplies the full request to the execution module. The order module produces a totally ordered sequence of batches of request digests. The execution module embodies the application's server, which executes requests from the ordered batches and produces replies. Although UpRight's core is a Byzantine agreement protocol, it departs from prior BFT protocols in some aspects. In general, the RQ stage of UpRight helps avoid complex corner cases and avoids sending large requests through the order stage. The agreement protocol combines ideas from three prior replication systems, i.e., Zyzzyva's speculative execution [59], Aardvark's techniques for robustness [153], and Yin et al.'s techniques for separating agreement and execution [176]. Besides, it supports a parameterized and easy configuration to minimize replication costs. For example, the UpRight prototype allows its users to separately configure u (the number of failures it can tolerate while remaining up) and r (the number of failures it can tolerate while remaining right). Interested readers can refer to the work [198] for more detailed information.
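The u/r parameterization can be made concrete with a small calculation. As a hedged sketch (the n >= 2u + r + 1 bound is our recollection of UpRight's sizing for its agreement nodes; consult [198] for the exact per-module counts), note how it degenerates to the classic Byzantine and crash-tolerance bounds mentioned above:

```python
# Replicas needed by a u/r-parameterized system, assuming an
# n >= 2u + r + 1 bound (an assumption here; see [198] for the
# exact per-module sizing used by UpRight).
def min_replicas(u, r):
    assert u >= r, "UpRight requires u >= r"
    return 2 * u + r + 1

# u = r = f: full Byzantine model, the familiar 3f + 1.
print(min_replicas(2, 2))  # → 7, i.e., 3f + 1 with f = 2

# r = 0: crash failure model, the familiar 2f + 1.
print(min_replicas(2, 0))  # → 5, i.e., 2f + 1 with f = 2
```

Separating u from r thus lets an operator pay for Byzantine (commission) tolerance only to the degree actually needed, rather than always provisioning the full 3f + 1.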

17) Spinning - (feature): Spinning is a BFT algorithm to mitigate performance attacks by changing the primary, proposed by Veronese et al. in 2009 [200]. Most leader-based Byzantine fault-tolerant replication systems have a primary replica that is in charge of ordering the clients' requests. However, it was shown that this kind of dependence allows a faulty primary to degrade the performance of the system to a small fraction of what the environment allows (e.g., the performance attacks on PBFT described in Prime [187]). Performance attacks generally come in two types: pre-prepare delays (an attack to delay the ordering of requests) and timeout manipulations (to increase the timeouts used in BFT). For example, in a pre-prepare delay attack, the primary imposes a maximum delay on the execution of requests, but only on the first request of a queue of pending requests, so a faulty primary can process one request at a time, strongly delaying most requests. These two types of attacks also apply to some of the algorithms derived from PBFT. The Spinning algorithm modifies the usual form of operation of PBFT: instead of changing the primary when it is suspected of being faulty, it changes the primary whenever it defines the order of a single batch of requests. Simply put, in each view the primary orders only one batch. As views are always changing, there is no view change operation; instead, there is a merge operation. The merge operation is in charge of putting together the information from different servers to decide whether requests in views that "went wrong" are to be executed or not.

Even with the Spinning algorithm, a faulty replica still has a chance to become the primary and impact the average performance of the service. To solve this issue, Spinning uses a punishment mechanism for misbehavior: when something goes wrong, the primary is penalized, basically by putting it in a blacklist, and a server in the blacklist does not become the primary. A spinning primary has some benefits. One benefit is that it avoids the above-mentioned performance attacks by faulty primaries in a very simple and efficient way, and PBFT-style view changes do not occur in Spinning; the view is changed automatically after the three communication steps are executed by the servers. Another benefit is that Spinning can improve the throughput of PBFT when there are no faulty servers, by balancing the load of ordering requests among all correct servers. Intuitively, ordering requests requires that all servers exchange messages, causing load on all of them; however, most of the load is on the primary, so changing the primary improves the throughput of the algorithm by a factor of 20% (similar to the work by Mao et al. in Mencius [201]).

The authors also designed a prototype based on PBFT under a partial synchrony system model with 3f + 1 replicas to tolerate f faulty replicas. The communication pattern of the Spinning algorithm closely follows PBFT to reach agreement. Since there is no view change sub-protocol as in PBFT, Spinning instead resorts to a merge operation, which has a similar objective of ensuring liveness. In short, the algorithm has to merge the information from the different servers to agree on the requests that were accepted and go to the next view. Also, the Spinning algorithm defines some mechanisms to punish the misbehavior of faulty primaries. Besides, the authors presented several optimizations to the basic Spinning algorithm, including batching of requests, piggybacking pre-prepare messages, using MAC vectors, and parallel executions. Interested readers can refer to Section III-F of [200] for more details.
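The rotate-every-batch rule combined with the blacklist can be sketched as follows (illustrative only; the paper's exact rotation and punishment rules differ, and round-robin selection by view number is an assumption here):

```python
# Sketch of Spinning-style primary rotation (illustrative): the primary
# advances with every ordered batch (one view per batch), and replicas
# on the blacklist are skipped when choosing the next primary.
def next_primary(view, n, blacklist=frozenset()):
    """Return the primary for a given view, skipping blacklisted replicas."""
    candidate = view % n
    while candidate in blacklist:
        view += 1
        candidate = view % n
    return candidate

n = 4  # 3f + 1 replicas with f = 1
# One view per batch: the primary rotates round-robin...
print([next_primary(v, n) for v in range(4)])       # → [0, 1, 2, 3]
# ...and a punished replica (here replica 2) never becomes primary again.
print([next_primary(v, n, {2}) for v in range(4)])  # → [0, 1, 3, 3]
```

Because every replica holds the primary role for only one batch, a faulty primary can delay at most one batch before the role moves on, which is the core of the mitigation described above.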
18) Zzyzx - (feature): Zzyzx is a scalable Byzantine fault-tolerant protocol based on Byzantine Locking, proposed by Hendricks et al. in 2010 [202]. By using a Byzantine Locking scheme, Zzyzx allows a client to extract state from an underlying replicated state machine and access it via a second protocol specialized for use by a single client. The second protocol requires just one round trip and 2f + 1 responsive servers. The extracted state can be transferred to other servers, allowing non-overlapping sets of servers to manage different states. This enables Zzyzx's throughput to be scaled by adding servers when concurrent data sharing is not common. Practically, it performs well and achieves scalability when faults are rare and concurrency is uncommon (aka a benign environment). However, when concurrency is common, Zzyzx performs similarly to its underlying protocol (e.g., PBFT). Zzyzx takes inspiration from the locking mechanisms used by many distributed systems to achieve high performance in a benign environment. For example, the metadata service of most distributed file systems contains a distinct object for each file or directory, and concurrent sharing is rare [203], [204].

Byzantine Locking is layered atop a BFT consensus protocol, and it temporarily gives a client exclusive access to state in a replicated state machine, with the ability to extract the relevant state of the underlying replicated state machine. In general, Byzantine Locking is only a performance tool. To ensure the liveness of the replicated system, a locked state is kept on servers, and a client that tries to access objects locked by another client can request that the locks be revoked, forcing both clients back to the underlying replicated state machine to ensure consistency. Zzyzx uses 3f + 1 servers to tolerate up to f Byzantine servers, and a physical server can take different roles, e.g., log server or state machine replica. Byzantine Locking provides a client an efficient mechanism to modify replicated objects by granting the client temporary exclusive access to the object. A client that holds temporary exclusive access to an object is said to have locked the object. Simply put, the execution of Zzyzx can be divided into three interfaces/sub-protocols: the substrate interface, the log interface, and the unlock sub-protocol. If a client has not locked the objects needed for an operation, the client uses a substrate interface protocol, such as PBFT or Zyzzyva. If a client holds locks for all objects touched by an operation, the client uses the log interface. If a client tries to access an object for which another client holds a lock, then the unlock sub-protocol takes effect. For more details on each operation, interested readers can refer to Section 4 of [202].

19) Generic Consensus Algorithm - (architecture): The generic consensus algorithm is not a specific consensus algorithm; instead, it highlights the basic and common features of known consensus algorithms, proposed by Rutti et al. in 2010 [205]. It provides a generic framework for building consensus algorithms, expressed as a parameterized form of the algorithm. Numerous consensus algorithms have been proposed with different features and different fault models. Considering these numerous algorithms, a classification, e.g., via parameterized forms, is helpful to identify the basic mechanisms on which they rely. The parameters of the generic algorithm encapsulate the core differences between various consensus algorithms, including leader-based and leader-free algorithms, and those addressing benign faults, authenticated Byzantine faults, and Byzantine faults. The generic consensus algorithm consists of successive phases, where each phase is composed of three rounds: a selection round, a validation round, and a decision round. Some existing algorithms may skip the validation round, which introduces a dichotomy among consensus algorithms: those that require the validation round, and those for which the validation round is not necessary.

In general, the proposed generic algorithm is based on four parameters: the FLV function, the Selector function, the threshold parameters, and the flag FLAG. The functions FLV and Selector are characterized by abstract properties; the threshold parameter is defined with respect to n (the number of processes), f (the maximum number of benign faults), and b (the maximum number of Byzantine processes). The authors proved the correctness of the generic consensus algorithm by referring only to the abstract properties of the above-mentioned parameters. Also, the authors argued that the correctness proof of any specific instantiated consensus algorithm consists simply in proving that the instantiation satisfies the abstract properties of the corresponding functions. The generic consensus algorithm provides more detailed and elaborate classifications, e.g., on the honesty of nodes. Honest processes can be differentiated into correct or faulty, where an honest process is faulty if it eventually crashes, and is correct otherwise. Also, it assumes that among n processes, there exist at most b Byzantine processes and at most f faulty (honest) processes. Besides, it distinguishes different Byzantine faults in the literature: authenticated Byzantine faults, where the communicated messages are signed by their sending process (with the assumption that signatures cannot be forged by any other process), and Byzantine faults, where there is no mechanism for signatures (but the receiver of a message knows the identity of the sender). The authors also discussed how to adapt the framework to support randomized consensus algorithms. Readers interested in the framework and instantiations of the generic consensus algorithm can refer to the work [205].
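The three-round phase structure can be sketched as a parameterized loop. This is a schematic sketch of the control flow only: the `selector` and `flv` callables and the vote threshold below merely stand in for the paper's abstract Selector/FLV functions and threshold parameters.

```python
# Schematic skeleton of the generic phase structure
# (selection -> validation -> decision). The flv/selector callables and
# the threshold stand in for the framework's abstract parameters.
def run_phase(proposals, selector, flv, validate_round=True, threshold=3):
    # Selection round: pick a candidate value (e.g., a leader's proposal).
    candidate = selector(proposals)
    # Validation round: cross-check the candidate; some instantiations
    # of the framework skip this round entirely.
    if validate_round:
        candidate = flv(candidate, proposals)
    # Decision round: decide once enough matching votes are gathered;
    # otherwise the algorithm moves on to the next phase.
    votes = sum(1 for p in proposals if p == candidate)
    return candidate if votes >= threshold else None  # None = next phase

# A toy instantiation: leader-based selection, trivial validation.
proposals = ["v1", "v1", "v1", "v2"]
decided = run_phase(proposals,
                    selector=lambda ps: ps[0],  # "leader's" value
                    flv=lambda c, ps: c)        # identity validation
print(decided)  # → 'v1'
```

Plugging in different selector/validation functions and thresholds is exactly how the framework recovers leader-based versus leader-free algorithms and the benign versus Byzantine fault models.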

Besides the above generic consensus algorithm, several other works exist on this topic. For example, Mostéfaoui et al. [206] proposed a consensus framework restricted to benign faults, which allows unification of the leader oracle, random oracle, and failure detector oracle. Guerraoui and Raynal [207] also proposed a generic consensus algorithm, whose generality is encapsulated in a function Lambda, which further encapsulates both the selection and validation rounds; a later version based on Omega encapsulates the communication abstraction. And Song et al. [208] proposed some key building blocks for building replicated systems and constructing consensus algorithms.

20) Breaking the O(n²) Bit Barrier - (theory): This work presents a scalable Byzantine agreement for an adaptive adversary, proposed by King and Saia in 2010 [209]. It focuses on the theoretical perspective to improve scalability under Byzantine replicas and an adaptive adversary. In general, the proposed algorithm is scalable in the sense that each processor sends only Õ(√n) bits, where n is the total number of processors. It also works with high probability against an adaptive adversary, which can take over processors at any time during the protocol, up to the point of taking over arbitrarily close to a 1/3 fraction of the total replicas. The system model is based on synchronous communication but with a rushing adversary. More precisely, the algorithm introduces only Õ(√n) bits of overhead per processor, which yields Õ(√n) bit complexity, and the communication with Õ(√n) bit complexity can be considered a distributed version of a bit-fixing random source. Due to the randomness, it still has a small probability of error. Based on these observations, the authors note that any algorithm that uses cryptography is likewise subject to errors with a certain probability, since the adversary may get lucky and break the cryptography.

The proposed consensus algorithm is assumed to run on a network with private and unauthenticated peer-to-peer communication channels. For example, whenever a processor sends a message directly to another, the identity of the sender is known to the recipient without resorting to cryptographic assumptions. An adaptive adversary not only can take over processors at any point during the protocol, but can also learn the processor's state, so that the bad processors can engage in any kind of deviation from the protocol. Also, these bad processors can send any number of messages, a kind of flooding attack. On the network synchrony assumption, the proposed algorithm works under a synchronous model with a rushing adversary, in which the bad processors may wait to receive all messages sent by the good processors before they send out their own messages. Many theoretical results and proofs are presented in the paper, and interested readers can refer to the paper [209] for more details.
Abstract instances: ZLight and Backup. Note that the “common case" also called “best-case", e.g., when there are no link or server failures, 21)Abstract - (architecture): Abstract is an acronym of Abortable relying on ZLight; while the “other cases", e.g., the periods with Byzantine faulT toleRant stAte maChine replicaTion, an abstraction to asynchrony/failures, rely on Backup. Roughly speaking, ZLight is an reduce the development cost of BFT protocols, proposed by Guerraoui Abstract instance that guarantee progress in the Zyzzyva’s “common et al. in 2010 [14]. In the work of Abstract, it shows another two BFT case", while Backup is an Abstract instance with strong progress: it protocols: AZyzzyva, a protocol that mimics the behavior of Zyzzyva guarantees to commit an exact certain number of requests k (k is itself in the best-case scenario, and Aliph, a BFT protocol that outperforms configurable) before it starts aborting. The Backup instance can be as previous (before the year of 2010) BFT protocols both in terms a black-box to handle a legacy BFT protocol. More precisely, Backup of latency and throughput. Abstract treats each BFT protocol as a works as follows: it ignores all requests delivered by the underlying composition of instances of the Abstract framework and each instance BFT protocol until it receives a request containing a valid init history, can be developed and analyzed independently. When designing and e.g., an unforgeable abort history generated by the preceding Abstract implementing an SMR protocol, two objectives typically are very (ZLight in the case of AZyzzyva). Besides, the code line count and crucial: robustness and performance. Robustness shows the ability to the proof size required to obtain the AZyzzyva are conservatively less ensure availability (liveness) and one-copy semantics (safety) despite than 1/4 those of Zyzzyva. failures and asynchrony. 
While, performance measures the time it takes to respond to a request (latency) and the number of requests Aliph is a protocol to demonstrate the ability of Abstract to
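The abort-and-switch composition described above can be sketched in a few lines of Python. This is our own simplified illustration, not code from [14]: the `Instance` class stands in for an Abstract instance, and a commit "budget" crudely models a progress condition (e.g., Backup's configurable k).

```python
# Illustrative sketch of Abstract-style composition (names are ours, not the
# paper's): each instance either commits a request or aborts with an
# unforgeable abort history that initializes the next instance.

class Instance:
    """One Abstract instance; commits until its progress condition breaks."""
    def __init__(self, name, commits_before_abort):
        self.name = name
        self.budget = commits_before_abort  # e.g., Backup's configurable k

    def invoke(self, request, init_history):
        if self.budget > 0:
            self.budget -= 1
            return ("commit", f"{self.name}:{request}")
        return ("abort", init_history + [self.name])  # abort history grows

def run(requests, instances):
    """Switch instances on abort, passing the abort history along."""
    replies, history, idx = [], [], 0
    for req in requests:
        while True:
            status, payload = instances[idx].invoke(req, history)
            if status == "commit":
                replies.append(payload)
                break
            history = payload          # carry state to the next instance
            idx += 1                   # e.g., ZLight -> Backup in AZyzzyva

    return replies

# ZLight handles the common case; Backup takes over once ZLight aborts.
out = run(["r1", "r2", "r3"], [Instance("ZLight", 1), Instance("Backup", 2)])
```

In this toy run, ZLight commits the first request, aborts on the second, and Backup (initialized with ZLight's abort history) commits the remaining ones, mirroring the AZyzzyva switch from the common case to the backup path.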

Aliph is a protocol that demonstrates the ability of Abstract to develop efficient BFT protocols. Along with the Backup instance used in AZyzzyva, Aliph uses two other new instances: Quorum and Chain. A Quorum instance commits requests in an ideal case, e.g., when there are no server/link failures, no client Byzantine failures, and no contention. In general, Quorum implements a very simple communication pattern and gives Aliph its low-latency flavor when its progress conditions are satisfied. It requires only one round-trip of message exchange between client and replicas to commit a request, in which the client sends the request to all replicas, which speculatively execute the request and send a reply to the client. Chain provides exactly the same progress guarantee as ZLight, e.g., it commits requests as long as there are no server/link failures or Byzantine clients. Chain implements a pipeline pattern (with all replicas knowing the fixed ordering of replicas' IDs, and a novel authentication technique), and allows Aliph to achieve better peak throughput than all protocols existing before 2010. Also, Aliph uses the following static switching ordering to orchestrate its underlying protocols: Quorum-Chain-Backup-Quorum-Chain-Backup, and so on. Besides, each of Quorum and Chain can be developed independently and requires much less code than was needed to develop previous BFT protocols. Readers interested in the details of the constructions of Abstract, AZyzzyva, and Aliph can refer to the work [14].

22) Byzantizing Paxos - (feature): Byzantizing Paxos presents a process to derive a Byzantine Paxos consensus algorithm by so-called Byzantizing, proposed by Lamport in 2011 [210]. Specifically, the process derives a 3f + 1 Byzantine Paxos consensus protocol by Byzantizing a variant of the ordinary Paxos algorithm. This is achieved by having 2f + 1 non-faulty processes emulate the ordinary Paxos algorithm despite the presence of f malicious processes. Besides, the author also presented a formal, machine-checked proof that the Byzantized algorithm implements the ordinary Paxos consensus algorithm under a suitable refinement mapping.

In more detail, the Paxos algorithm typically uses 2f + 1 processes to tolerate the benign failure of any f of them, while a Byzantine algorithm, e.g., PBFT, uses 3f + 1 processes to tolerate f Byzantine (maliciously faulty) processes. Strictly speaking, they are different fault-tolerant algorithms, and the Byzantine algorithm is not a Byzantine version of Paxos even though both algorithms can be used to tolerate failures. There had already been attempts, from the perspective of an abstract, non-distributed algorithm, to explain the relation between a Byzantine Paxos algorithm and an ordinary Paxos algorithm [211]. The work proposed in [210] took a direct approach and derived a Byzantine Paxos algorithm from a distributed non-Byzantine one by a procedure of Byzantizing. The Byzantizing procedure converts an N-process algorithm that tolerates the benign failure of up to f processes into an (N + f)-process algorithm that tolerates f Byzantine processes. In the Byzantine algorithm, the N good processes emulate the execution of the original algorithm despite the presence of f Byzantine ones.

The author presented a detailed process to Byzantize a variant of the classic Paxos consensus algorithm, called PCon, and a detailed process to Byzantize an abstract generalization of the Castro-Liskov Byzantine consensus algorithms (i.e., PBFT), called BPCon. More specifically, BPCon is derived from PCon via a TLA+ theorem [212], which asserts that BPCon implements PCon under a suitable refinement mapping [213]. Besides, the author argued that other Byzantine Paxos consensus algorithms can also be derived by Byzantizing versions of Paxos. Interested readers can refer to the work [210] for more details.

23) Obfuscated BFT (OBFT) - (feature): Obfuscated BFT (OBFT) explores the idea of obfuscation in a BFT context; at the time of our review it was a technical report by Shoker et al. in 2011 [214]. In OBFT, with honest but possibly crash-prone clients, the replicas remain unaware of each other, achieving the obfuscation property by avoiding any direct inter-replica communication. Classic BFT protocols rely on inter-replica communication to ensure one-copy semantics, which makes these protocols fragile. In that scenario, the replicas must share some access information about each other, and a distributed DoS attack may threaten the system's security by exploiting that shared access information. By avoiding any direct inter-replica communication, no replica knows anything about the others. In this case, the client plays a crucial role in OBFT, and OBFT assumes that clients cannot be malicious though they can fail by crashing. Practically, if replicas do not communicate, a malicious client could easily violate consistency. Even given that clients are trusted, OBFT still faces several challenges: a good obfuscating BFT protocol should handle some critical scenarios and meet certain criteria. For example, clients can still crash. If the obfuscated BFT protocol does not tolerate a crashed client, the unique request ordering among replicas can be compromised by the other clients. Upon detecting failures, a recovery is needed. Besides, the obfuscated BFT protocol should handle contending clients that can force the copies of an object on different replicas to skew.

In general, the proposed OBFT requires 3f + 1 replicas to tolerate f Byzantine faults, and the client in OBFT communicates with 2f + 1 Active replicas in its Speculative phase. OBFT assumes that clients may fail by crashing, but they do not behave maliciously, and liveness is guaranteed by the assumption of eventual synchrony. Typically, using only 2f + 1 replicas at a time can sustain faults but cannot ensure progress. Thus, OBFT launches a Speculative phase on 2f + 1 Active replicas. Upon detecting failures, it recovers by replacing the Suspicious replicas with some correct replicas from the f Passive ones, and then resumes the speculative phase on a new Active set in a new view. At any time, the protocol distributes the replicas over three sets: the Active set, the Passive set, and the Suspicious set. The Active set consists of 2f + 1 active replicas, which are used in the speculative phase; the Passive set consists of f idle replicas that are used as recovery backups; and the Suspicious set is a virtual set, consisting of up to f replicas that are either faulty or slow, which comprises the Passive replicas during the speculative phase. Upon the completion of failure detection, i.e., in the recovery phase, the client identifies the f Suspicious replicas and replaces them with another f replicas, possibly from the Passive set or from the entire 3f + 1 replicas.

The OBFT algorithm consists of two main phases: a speculative phase and a recovery phase. The communication pattern of the speculative phase in a failure-free scenario is simple, and concerns only the Active set. The recovery phase takes place using both the Passive and Active sets. During the speculative phase, OBFT makes some optimizations compared to classic speculative protocols such as Zyzzyva. For instance, OBFT pushes multicast and MAC overhead towards its clients and keeps a light primary. Excluding the primary, all replicas validate the order of clients' requests upon their receipt. To maintain failure independence, obfuscation requires replicas to be unknown and unaware of each other.
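OBFT's bookkeeping of the replica sets, and the client-driven swap performed on recovery, can be sketched as follows. This is our own illustrative Python sketch, not code from [214]; the function names are ours.

```python
# Illustrative sketch of OBFT's replica sets: 2f+1 Active replicas serve the
# speculative phase, f Passive replicas stand by, and replicas the client
# suspects are swapped out during recovery (names are ours, not the paper's).

def make_sets(f):
    n = 3 * f + 1
    replicas = list(range(n))
    return replicas[: 2 * f + 1], replicas[2 * f + 1:]   # Active, Passive

def recover(active, passive, suspicious):
    """Client replaces up to f Suspicious replicas with Passive ones."""
    fresh = [r for r in active if r not in suspicious]
    replacements = passive[: len(suspicious)]
    new_active = fresh + replacements
    new_passive = [r for r in passive if r not in replacements] + list(suspicious)
    return new_active, new_passive

f = 1
active, passive = make_sets(f)          # active = [0, 1, 2], passive = [3]
active, passive = recover(active, passive, suspicious={2})
# A new view then resumes the speculative phase on the new 2f+1 Active set.
```

After recovery the Active set again holds exactly 2f + 1 replicas, with the suspected replica demoted, matching the set rotation described above.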

Replicas usually need to communicate for two purposes: to ensure atomic request execution and to validate the correctness of a response. In OBFT, however, the final decision is made by the client. The recovery phase is composed of three major steps: aborting, collecting abort histories, and cleaning the Active set of any suspicious replicas. Although much detailed information still needed to be filled in as of the time we reviewed this technical report, interested readers can refer to it for a more detailed description of the protocol. Besides, the authors state that OBFT is designed to be deployed on WANs, preferably on clouds of distinct providers, on different platforms, and in geographically distinct locations around the world. Also, OBFT achieves a certain level of scalability compared with prior protocols. For example, OBFT reduces the load on replicas by pushing multicast and expensive cryptographic operations off to the clients, and it imposes an almost identical load on all replicas (including the primary).

24) ZZ - (feature): ZZ is an approach to reduce the replication cost of BFT services from 2f + 1 to practically f + 1, proposed by Wood et al. in 2011 [215]. From a high-level perspective, the key insight in ZZ is to use f + 1 execution replicas in the normal case, e.g., when there are no faults, and to activate additional sleeping replicas only upon failures. By multiplexing fewer replicas onto a given set of shared servers, ZZ can provide more server capacity to each replica. In a datacenter setting, e.g., multiple applications hosted on a physical server, ZZ reduces the aggregate number of execution replicas running in the data center, improving throughput and response time. Besides, ZZ relies on virtualization for fast replica activation upon failures, and enables newly activated replicas to immediately begin processing requests by fetching state on demand. More clearly, ZZ is not a new "BFT protocol", a term typically used to refer to the agreement protocol; instead, ZZ is an execution approach that can be interfaced with existing BFT SMR agreement protocols. Accordingly, the prototype of ZZ does not seek to optimize agreement throughput, but to demonstrate the feasibility of ZZ's execution approach with a reasonable agreement protocol.

More precisely, ZZ was motivated by the observations that request execution dominates the total cost of processing requests in BFT services, and that the hardware capacity needed for request execution far exceeds that needed for running the agreement protocol. The authors argue that the total cost of a BFT service can be practically reduced only when the total overhead of request execution, rather than the cost of reaching consensus, is somehow reduced. Another motivation comes from PBFT, which claimed "it is possible to reduce the number of copies of the state to f + 1 but the details remain to be worked out". ZZ works from these motivations and distinguishes two request execution cases. In the normal case (or graceful case), ZZ enables general BFT services to be constructed with a replication cost close to f + 1, halving the 2f + 1 or higher cost incurred by prior approaches, which in turn achieves higher throughput and lower response times for request execution. However, in the worst case (e.g., all applications experience simultaneous faults), ZZ requires an additional f replicas per application, matching the overhead of the 2f + 1 approach. In such cases where failures occur, ZZ incurs a higher latency to execute some requests until its failure recovery protocol completes. The overall construction of ZZ still requires 3f + 1 agreement replicas. Also, the ability to quickly activate additional replicas upon fault detection is critical for ZZ, since it determines the response times of request executions.

Fig. 28. The normal agreement and execution protocol in ZZ proceeds through steps 1-5. Step 5 is needed only after a fault. Checkpoints (step 6) are created on a periodic basis [215].

The ZZ design distinguishes two types of replicas: agreement replicas that assign an order to a client request, and execution replicas that maintain application state and execute client requests. Replicas fail independently, and ZZ assumes an upper bound g on the number of faulty agreement replicas and a bound f on the number of faulty execution replicas in a given window of vulnerability (initially set to infinity); an adversary may coordinate the actions of faulty nodes in an arbitrary manner. The safety of ZZ is guaranteed in an asynchronous network, while the liveness of ZZ is guaranteed only during periods of synchrony. From a high-level perspective on ZZ's design, two insights help ZZ reduce the replication cost of BFT from 2f + 1 to nearly f + 1. One is that if a system is designed to be correct in an asynchronous environment, it must be correct even if some replicas are out of date; for example, those replicas can be considered to be in a sleep mode and not responding, under the assumption of asynchrony. The other is that, during fault-free periods, a system designed to be correct despite f Byzantine faults must be unaffected if up to f replicas are turned off. Based on these insights, ZZ leverages the second insight during fault-free periods to turn off f replicas, requiring only f + 1 replicas to actively execute requests. However, when a failure occurs, ZZ leverages the first insight, on the asynchrony assumption, and behaves exactly as if the f standby replicas were slow but correct replicas. The switch between the two execution modes depends heavily on a quick replica wake-up mechanism. Practically, virtualization can provide this capability by maintaining additional replicas in a "dormant" state.

ZZ assumes that replicas run inside virtual machines, and that it is possible to run multiple replicas on a single physical server. By using virtualization, ZZ achieves fast replica activation, and optimizes the recovery protocol to allow newly activated replicas to immediately begin processing requests through an amortized state transfer strategy. Typically, ZZ consists of a six-step procedure, and Fig. 28 shows the normal agreement and execution protocol in ZZ. Besides, the authors implemented a prototype of ZZ by enhancing the BASE library and combining it with the Xen virtual machine and the ZFS file system to evaluate its performance. For example, ZZ's use of only f + 1 execution replicas in the fault-free case yields response time and throughput improvements of up to 66%. Interested readers can refer to Sections 4 and 5 of [215] for more details on these steps and implementations, respectively.

25) On-Demand Replica Consistency - (middleware): On-demand replica consistency is an extension to existing BFT architectures, aiming to increase performance for the default number of replicas, proposed by Distler and Kapitza in 2011 [216]. The proposed approach executes a request only within a selected subset of replicas, using a selector component co-located with each replica, to optimize the resource utilization of their execution states. The improved throughput in the common case even exceeds that of the corresponding unreplicated service. To avoid divergent replica states, a selector on-demand updates outdated objects on its local replica prior to processing a request.
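ZZ's fault-triggered wake-up of dormant replicas can be sketched as follows. This is our own illustrative Python sketch (the class and names are hypothetical, not from [215]); it only models the execution tier's switch from f + 1 awake replicas to the full set once a reply mismatch reveals a fault.

```python
# Minimal sketch of ZZ's execution tier (illustrative, not ZZ's code):
# f+1 replicas execute in the fault-free case; a fault wakes the f dormant
# replicas, momentarily matching the classic 2f+1 replication cost.

class ExecutionTier:
    def __init__(self, f):
        self.f = f
        self.active = [f"exec{i}" for i in range(f + 1)]    # f+1 awake
        self.dormant = [f"standby{i}" for i in range(f)]    # f sleeping VMs

    def on_request(self, replies):
        # f+1 matching replies prove correctness; a mismatch implies a fault.
        if len(set(replies)) == 1:
            return replies[0]
        # Fault detected: wake dormant replicas (fast via virtualization);
        # they fetch state on demand and act as slow-but-correct replicas.
        self.active += self.dormant
        self.dormant = []
        return None  # the request is re-executed on the enlarged set (omitted)

tier = ExecutionTier(f=1)
ok = tier.on_request(["v", "v"])      # fault-free: f+1 = 2 replicas suffice
tier.on_request(["v", "w"])           # mismatch: wake the standby replica
```

The sketch captures only the replica-count accounting; the actual state transfer, checkpointing, and agreement-tier interaction are where ZZ's real complexity lies.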

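The ODRC selector's per-request behavior, executing a request only on the f + 1 replicas an object is assigned to and bringing outdated local objects up to date on demand, can be sketched as follows. This is our own illustrative Python sketch; the round-robin assignment scheme and all names here are hypothetical, not taken from [216].

```python
# Illustrative sketch of ODRC's selector (assignment scheme is hypothetical):
# each object is statically mapped to f+1 of the n replicas, and a selector
# updates an outdated local object before executing a request on it.

def assign(objects, n, f):
    """Statically map each object to f+1 of the n replicas (round-robin)."""
    return {obj: [(i + k) % n for k in range(f + 1)]
            for i, obj in enumerate(objects)}

def execute(request_obj, assignment, replica_id, versions, latest):
    """Selector logic on one replica for a totally ordered request."""
    if replica_id not in assignment[request_obj]:
        return None                                    # not selected: skip
    if versions[request_obj] < latest[request_obj]:
        versions[request_obj] = latest[request_obj]    # on-demand update
    return f"reply({request_obj}@v{versions[request_obj]})"

n, f = 4, 1
amap = assign(["a", "b"], n, f)          # e.g., a -> [0, 1], b -> [1, 2]
versions = {"a": 0, "b": 0}              # this replica's local object state
latest = {"a": 3, "b": 0}                # state implied by the request log
r = execute("a", amap, replica_id=0, versions=versions, latest=latest)
skip = execute("a", amap, replica_id=3, versions=versions, latest=latest)
```

Only the selected replica executes and refreshes the accessed object ("time" and "space" dimensions of on-demand consistency); the non-selected replica leaves its copy outdated, as the scheme permits.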
Typically, an agreement-based BFT replication system consists of two stages: agreement and execution [176]. The agreement stage is responsible for imposing a total order on client requests, while the execution stage processes the requests on all replicas, preserving the order determined by the agreement stage to ensure consistency. Due to this consistency requirement across all replicas, the maximum throughput achievable for a fault-tolerant service is bounded by the throughput of a single replica. The approach is motivated by the fact that, to tolerate f faults, f + 1 identical replies provided by different replicas prove a reply correct. Thus, it can execute each request on only a subset of f + 1 replicas during common-case operation, while, in case of failures, additional replicas are required to process the request. The subset of replicas that executes a request is selected for each request individually. In particular, the approach divides the service state into objects and assigns each object to f + 1 replicas.

In general, the states of replicas are not completely identical at all times but may differ across replicas, depending on the requests a replica has executed. As clients may send requests accessing objects assigned to different replicas, a selector module co-located with every service replica provides on-demand replica consistency (ODRC). The selector ensures that all objects possibly read or modified by a request are consistent with the current service state by the time the request is executed. Other objects that do not belong to this domain are unaffected and may remain outdated. The approach guarantees safety under an asynchronous model, and liveness under bounded fair links. Also, the approach requires deterministic state machines. By separating the agreement and execution stages, the two stages can be located on different replicas. Besides replicas, the approach assumes that there exist voters, whose responsibility is to identify correct replicas; the clients perform the functionality of voters. To apply ODRC, a non-faulty replica should have the following properties: total request ordering, request execution, and a reply cache. For total request ordering, it is required that even under an arbitrary sequence of client requests (which, e.g., may differ between replicas as inputs), the agreement stage outputs a stable totally ordered sequence of requests that is identical across all replicas. For request execution, on the totally ordered sequence of requests, each replica in the execution stage outputs a set of replies that is identical to the output of all other non-faulty replicas. For the reply cache, each replica caches replies in order to provide them to voters on demand. To apply ODRC, a non-faulty voter should have the following properties: reply verification and incomplete-voting notification. For reply verification, once the voter receives f + 1 identical replies from different replicas, the result of a client request should be accepted. For incomplete-voting notification, all non-faulty replicas eventually learn about an incomplete voting.

In general, a selector module is deployed between the agreement stage and the execution stage. Each replica has its own selector. Selectors of different replicas do not interact with each other, but rely on the same deterministic state machine and operate on the same input. Based on the totally ordered sequence of requests from the agreement stage, all non-faulty selectors behave in a consistent manner due to the deterministic state machine. The selectors ensure ODRC consistency, and this consistency is "on demand" in two dimensions. One dimension is that consistency is only ensured when a request to be executed actually demands it, called the time dimension; the other is that consistency is only ensured for the objects actually accessed by the request, called the space dimension. The proposed approach uses ODRC as a general term for applying selective request execution in conjunction with on-demand replica consistency. Besides, the ODRC scheme also requires the BFT system to provide a mechanism to checkpoint the service state and a mechanism to restore the service state based on a checkpoint. The evaluation shows that, with the proposed approach, the overall performance of a BFT NFS can be almost doubled, and can even outperform the unreplicated NFS system.

Fig. 29. Mod-SMaRt replica architecture. The reliable and authenticated channels layer guarantees the delivery of point-to-point messages, while the VP-Consensus module is used to establish agreement on the messages to be delivered in a consensus instance [23].

26) MOD-SMART - (architecture): MOD-SMART stands for Modular State Machine Replication, a transformation from Byzantine consensus to BFT state machine replication, proposed by Sousa and Bessani in 2012 [23]. In general, a solution to a consensus problem is at the core of any distributed SMR protocol; however, such protocols are monolithic, in that they do not clearly separate the consensus primitive from the remaining protocol. Actually, the consensus protocol can be considered a black-box primitive, which can further be treated as a module in modular transformations. A latency-optimal protocol for BFT SMR, e.g., PBFT, typically requires at least three communication steps for its consensus plus two extra steps to receive the request from the client and send a reply back.

MOD-SMART implements SMR using a special Byzantine consensus primitive called Validated and Provable Consensus (VP-Consensus), which can easily be obtained by modifying existing leader-driven consensus algorithms. MOD-SMART is a modular BFT SMR protocol built over a well-defined consensus module that requires only the optimal number of communication steps, i.e., the number of communication steps of a consensus plus two. In MOD-SMART, VP-Consensus is a "grey-box" abstraction that allows the modular implementation of SMR without using reliable broadcast protocols. By doing so, it avoids the extra communication steps required to safely guarantee that all requests arrive at all correct replicas (monolithic protocols avoid those extra steps by merging a reliable broadcast with a consensus protocol). Also, MOD-SMART avoids mixing protocols by using the rich interface exported by VP-Consensus, which allows it to handle request timeouts and trigger internal consensus timeouts.

More specifically, MOD-SMART is a theoretical work; however, it can be implemented in practical projects, e.g., BFT-SMaRt [217] (an open-source Java-based BFT SMR library). MOD-SMART, as a modular BFT SMR protocol, consists of three sub-phases: client operation, normal phase, and synchronization phase. Fig. 29 shows the general architecture of a replica. In general, MOD-SMART builds on top of a reliable and authenticated point-to-point communication substrate and a VP-Consensus implementation; these modules may also use the same communication support to exchange messages among processes. MOD-SMART uses VP-Consensus to execute a sequence of consensus instances, where each instance decides a batch of requests for execution, and the same proposed batch is decided on each correct replica.
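The main loop just described, one VP-Consensus instance per batch of requests, can be sketched as follows. This is our own illustrative Python sketch, not BFT-SMaRt code; `vp_consensus` is a stand-in for a real VP-Consensus implementation.

```python
# Illustrative sketch of MOD-SMART's replication loop (not BFT-SMaRt code):
# pending requests are grouped into batches, and one VP-Consensus instance
# decides each batch, so correct replicas deliver the same batches in order.

def vp_consensus(instance_id, proposed_batch):
    """Stand-in for VP-Consensus: returns the decided (validated) batch."""
    return {"instance": instance_id, "decided": tuple(proposed_batch)}

def replicate(requests, batch_size):
    log, instance_id = [], 0
    while requests:
        batch, requests = requests[:batch_size], requests[batch_size:]
        log.append(vp_consensus(instance_id, batch))
        instance_id += 1
    return log

log = replicate(["r1", "r2", "r3"], batch_size=2)
# Client-visible latency: the consensus steps plus two (request + reply),
# e.g., a PBFT-like three-step consensus gives five steps in total.
steps = 3 + 2
```

Each entry of `log` corresponds to one decided consensus instance; a real implementation would additionally execute the decided batches against the service state machine and reply to clients.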

During the normal phase, MOD-SMART assumes each correct replica can concurrently execute only the current consensus instance i and the previous consensus instance i − 1. All correct replicas remain available to participate in consensus instance i − 1 even if they are already executing i. When the conditions of the normal phase are not satisfied, e.g., no agreed batch is generated, the synchronization phase may be triggered. By modularizing BFT SMR, the protocol becomes easy to implement. Besides, the authors discuss how to make simple modifications to leader-driven consensus algorithms to make them compatible with the transformation.

27) RBFT - (architecture): RBFT is a Redundant BFT protocol that executes multiple instances of a BFT protocol in parallel, proposed by Aublin et al. in 2013 [165]. These multiple instances run the same BFT protocol, each with a primary replica executing on a different node. This means that each instance has a primary replica, and the various primary replicas all execute on different machines. In general, RBFT requires all instances to order the requests of clients, but only one instance, namely the master instance, effectively executes them. All other instances, namely the backup instances, order requests in order to compare the throughput they achieve with that achieved by the master instance. In case a master instance is slower than its backup instances, the primary of that master instance is considered to be malicious, and replicas elect a new primary at each protocol instance. Also, RBFT implements a fairness mechanism between clients by monitoring the latency of requests, which assures that client requests are processed fairly.

In general, leader-based BFT protocols are not robust because they rely on a dedicated replica, the primary, to order requests. Even though some mechanisms in the literature have been exploited to detect and recover from a malicious primary, there is still a chance that a primary is smart enough to launch attacks. Based on an analysis of prior "claimed" robust BFT protocols, e.g., Prime, Spinning, and Aardvark, these protocols are not effectively robust enough. Most of their robustness rests on some strong assumptions, and a primary node can be smartly malicious and cause huge performance degradation without being detected. For instance, Prime is robust under a certain level of network synchrony, and if the variance of the network is too high, a malicious primary has a chance to heavily affect the performance of the system. Similarly, Aardvark is based on a static load, and Spinning is based on a correct primary. For example, a malicious primary in Spinning can delay the sending of the ordering messages up to the maximal allowed time, which makes it hard to identify as a faulty replica. From a high-level perspective, similarly to these robust protocols, RBFT utilizes replicas to monitor the throughput of the primary and to trigger the recovery mechanism when the primary's throughput is low.

Fig. 30. RBFT overview (f = 1) [165].

In more detail, RBFT requires 3f + 1 replicas (i.e., 3f + 1 physical machines) to tolerate f failures. Each replica runs f + 1 protocol instances of a BFT protocol in parallel, as shown in Fig. 30. Theoretically, f + 1 protocol instances are sufficient to detect a faulty primary and ensure the robustness of the protocol. This means that each of the N replicas in the system locally runs one replica of each protocol instance, where the different instances order requests following a PBFT-like protocol. As shown in Fig. 30, the primary replicas of the various instances are distributed over the replicas, and the distribution follows a rule: at any time, there is at most one primary replica per node. All instances participate in ordering client requests, but only the requests ordered by the master instance are executed by the replicas; the main responsibility of the backup instances is to monitor the master instance. Each replica runs a monitoring module that computes the throughput of the f + 1 protocol instances. If 2f + 1 replicas observe that the ratio between the performance of the master instance and the backup instances is lower than a given threshold, then the primary of the master instance is considered to be malicious, and a new one must be elected. To increase overall throughput, RBFT can leverage multicore architectures to run the multiple instances of the same protocol in parallel.

Fig. 31. RBFT protocol steps (f = 1) [165].

Fig. 31 shows the abstract message pattern of the RBFT protocol in the case of f = 1, which is a six-step protocol. Simply put, in step one, the client sends a request to all nodes; in step two, the correct nodes propagate the request to all nodes; in steps three, four, and five, the replicas of each protocol instance execute a three-phase commit protocol to order the request; and in the last step, nodes execute the request and send a reply message to the client. For more detailed information on each step, interested readers can refer to Section IV of [165].

28) ClusterBFT - (middleware): ClusterBFT exploits the opportunities to integrate BFT into cloud applications, proposed by Stephen and Eugster in 2013 [218]. By leveraging a BFT replication system, it provides a conceptual design and implementation for existing cloud-based data analysis to provide integrity, availability, and confidentiality of cloud services. In typical cloud data-flow processing applications, the generated data sets (e.g., from analysis and correlation with existing data sets) must be trustworthy. In prior cloud services, a lack of trust on various facets was common, and BFT protocols can help to establish this trust without compromising integrity and availability. More specifically, ClusterBFT, for cloud-based assured data processing and analysis, creates sub-graphs from acyclic data-flow graphs that are then replicated. It performs BFT consensus at the system level, rather than at the client request level, with less overhead, which enables the system to dynamically adapt to changes in required responsiveness and perceived threat level as well as to dynamic deployment. ClusterBFT combines variable-grain clustering with approximated and offline comparisons, separation of duty, and smart deployment to keep the overhead of BFT replication low while providing good fault isolation properties.
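Returning briefly to RBFT, its throughput-monitoring rule, a primary change once 2f + 1 nodes observe a degraded master-to-backup throughput ratio, can be sketched as follows. This is our own illustrative Python sketch; the threshold value is hypothetical, not a figure from [165].

```python
# Illustrative sketch of RBFT's monitoring rule (names and threshold are
# ours): each node compares the master instance's throughput with the best
# backup instance; 2f+1 suspecting nodes trigger a primary change.

def node_suspects(master_tput, backup_tputs, threshold):
    """One node's local verdict on the master instance's primary."""
    return master_tput / max(backup_tputs) < threshold

def primary_change_needed(observations, f):
    """observations: one boolean vote per node in the system."""
    return sum(observations) >= 2 * f + 1

f = 1
# All 3 sampled nodes see the master at half the best backup's throughput.
votes = [node_suspects(50.0, [100.0, 90.0], threshold=0.8) for _ in range(3)]
change = primary_change_needed(votes, f)   # enough votes: elect a new primary
```

The sketch only captures the detection condition; the subsequent primary election proceeds at each protocol instance, as described in the RBFT subsection above.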

ClusterBFT considers a system deployed on a cloud service that leases out virtual machines (VMs) to users, with each VM acting as a node. It distinguishes two adversary models: a strong adversary, which can manipulate all internal aspects of a node and collude with other malicious nodes, and a weak adversary, which shares the same properties as a strong adversary but may only cause omission or commission faults.

Fig. 32. Architecture of ClusterBFT [218].

In general, adopting a BFT replication scheme in a cloud provides several intuitive benefits, such as attribution, portability and interoperability, determinism, and heterogeneity. However, this adoption does face some challenges, e.g., scalability, granularity, and rigidity. To address these challenges, ClusterBFT follows several design principles, e.g., variable granularity, variable replication, approximate and offline redundancy, separation of duty, and fault isolation. Readers interested in these detailed challenges and design principles can refer to Section 3 of [218]. Following these design principles, ClusterBFT has several components, as shown in Fig. 32, e.g., a request handler, an execution handler, and fault identification and isolation. The request handler component is the control tier, where it accepts scripts submitted by the client and submits the scripts for execution. This handler further consists of three logical sub-components: the client handler, the graph analyzer, and the job initiator and verifier. The execution handler consists of two sub-components: the execution tracker and the resource manager. The main functionalities of fault identification and isolation are to identify faults with the help of a fault analyzer (e.g., collecting f + 1 matching digests from the output verifier) and to isolate the identified faults.

29) BFT Selection - (middleware): BFT Selection is a mechanism to help users choose the most convenient protocol according to the preferences of the BFT user, proposed by Shoker and Bahsoun in 2013 [219]. Many BFT protocols have been introduced for replication systems, each with its advantages and disadvantages, and it is typically a difficult task to select the right or optimal BFT algorithm for an application. However, the service of an application can be quantified mathematically. Based on this observation, the selection algorithm applies some mathematical formulas to make the selection process easy and automatic. The selection algorithm selects a 'preferred' BFT protocol, the one that matches the user's preferences the most, among a set of candidates. The selection algorithm has an evaluation process in charge of matching the user's preferences according to both reliability and performance, and the selected protocol typically has the highest mathematical evaluation score. Two types of indicators are adopted: Key Characteristic Indicators (KCI) and Key Performance Indicators (KPI). Specifically, the KCIs are the properties (with boolean values) of a protocol that strictly decide whether an evaluated protocol can be selected or not, and the KPIs are the properties that evaluate the performance of the protocol, like throughput and latency.

The selection mode can fall into three different categories: static, dynamic, and heuristic. In the static mode, the user chooses a protocol only once, and the choice can only be changed when the service is rebooted; a possible application for this mode is in clouds. The dynamic mode makes the system react dynamically to changes of the system state, which allows the user to run multiple protocols, where a running protocol can be stopped and another protocol launched after performing the selection process. Typically, this mode fits applications in which the performance of the protocols differs a lot as the underlying system state changes. The heuristic mode is very similar to the dynamic mode, with the additional ability to let the user modify the weights (e.g., preferences) as the system state changes, using some predefined heuristics. Much of this work focuses on the mathematical perspective.

30) BFTRaft - (feature): BFTRaft is a Byzantine variant of the Raft consensus protocol, inspired by the original Raft [55] and the PBFT algorithm, proposed by Copeland and Zhong in 2014 [220]. From a high-level perspective, BFTRaft maintains Raft's properties of safety, fault tolerance, and liveness in the presence of Byzantine faults, with the goals of simplicity and understandability. Like Paxos, Raft targets a crash fault model. When applying the Byzantine fault model directly to Raft, the safety and availability of Raft would be compromised in several respects, e.g., leader election and log replication. More specifically, any node in the original Raft can trigger an election and terminate the current term; thus a Byzantine node can effortlessly starve the whole system through perpetual elections and subvert Raft's availability. In Raft, the leader typically serves as a single point of contact between the client and the rest of the system, which leaves a great chance for a Byzantine leader to perform malicious behaviors, e.g., modifying a client's request, which may cause safety issues.

BFTRaft adopts similar techniques and decomposition to preserve the simplicity and understandability of Raft, with several modifications and additions to provide Byzantine fault tolerance. More precisely, BFTRaft adds the following aspects: message signatures, to authenticate messages and verify their integrity; client intervention, to allow clients to interrupt the current leadership if no progress is being made; incremental hashing, to compute a blockchain-like cryptographic hash when appending a new entry to the log; election verification, to let a quorum of nodes verify the current leader; commit verification, where nodes broadcast messages to each other (like PBFT) rather than just to the leader; and lazy voters, which delay the vote for a candidate unless the voter believes the current leader is faulty. Similar to Raft's decomposition, the BFTRaft algorithm also decomposes the consensus problem into two relatively independent sub-problems: log replication and leader election. By enabling the above features, BFTRaft offers the same safety guarantees as Raft.

BFTRaft uses 3f + 1 replicas to tolerate up to f Byzantine failures. In BFTRaft, each node is in one of three roles: leader, follower, or candidate. During leader election, the winner of the election serves as the leader for the rest of the term (a time unit of a consensus round). In general, BFTRaft can maintain a high level of coherency between nodes' views of the current term. The authors also implemented a proof-of-concept of BFTRaft in Haskell, and its code is available on GitHub (https://github.com/chrisnc/tangaroa). Interested readers can refer to their work for more details [220].

31) BChain - (architecture): BChain is a chain-based BFT protocol, where replicas are organized in a chain, proposed by Duan et al. in 2014 [221]. In general, chain-based protocols aim to achieve

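The KCI/KPI evaluation used by BFT Selection can be sketched as a boolean gate followed by a weighted score: protocols missing a required characteristic are excluded, and the remaining candidates are ranked by a weighted sum of performance indicators. The candidate profiles, field names, and weights below are illustrative assumptions, not the actual formulas of [219].

```python
# Sketch of the BFT Selection idea: KCIs (boolean) strictly filter the
# candidates; KPIs (numeric) are combined into a preference score.
def select_protocol(candidates, required_kcis, kpi_weights):
    eligible = [p for p in candidates
                if all(p["kci"].get(k, False) for k in required_kcis)]
    if not eligible:
        return None
    # The protocol with the highest weighted KPI score is 'preferred'.
    return max(eligible,
               key=lambda p: sum(w * p["kpi"].get(k, 0.0)
                                 for k, w in kpi_weights.items()))

candidates = [
    {"name": "A", "kci": {"tolerates_byzantine": True},
     "kpi": {"throughput": 0.9, "latency": 0.4}},
    {"name": "B", "kci": {"tolerates_byzantine": True},
     "kpi": {"throughput": 0.6, "latency": 0.9}},
    {"name": "C", "kci": {"tolerates_byzantine": False},
     "kpi": {"throughput": 1.0, "latency": 1.0}},
]
# A latency-sensitive user preference: C is gated out by its KCI.
best = select_protocol(candidates, ["tolerates_byzantine"],
                       {"throughput": 0.3, "latency": 0.7})
print(best["name"])
```

With the latency-heavy weights above, protocol B wins even though C has the best raw KPIs, because C fails the characteristic gate.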
Fig. 33. BChain-3 replicas are organized in a chain [221].

Fig. 35. The IDS/ByzID architecture. Gray components are trusted [222].

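BChain's re-chaining mechanism, detailed in this section (the accused replica leaves the critical path, and the accuser is repositioned so it cannot keep accusing), can be sketched roughly as follows. The exact placement policy below is a simplifying assumption rather than the precise rule of [221].

```python
# Simplified sketch of BChain-style re-chaining for n = 3f + 1 replicas:
# the accused replica is moved out of the (2f + 1)-long critical path to
# the very end of the chain, and the accuser is repositioned as the last
# replica of the critical path (the proxy-tail position).
def rechain(chain, accuser, accused, f):
    chain = [r for r in chain if r not in (accuser, accused)]
    # Index 2f is the last critical-path slot once two replicas are removed.
    chain.insert(2 * f, accuser)
    chain.append(accused)  # accused goes to the end, off the critical path
    return chain

f = 1
chain = ["p1", "p2", "p3", "p4"]   # 3f + 1 replicas, head = p1
new_chain = rechain(chain, accuser="p3", accused="p2", f=f)
print(new_chain)                    # accused p2 is now last
```

Note how a single accusation only permutes the chain; no replica is removed, matching the text's point that re-chaining is inexpensive and proceeds like a single client request.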
high throughput at the expense of higher latency, while BChain can achieve a comparable performance to other protocols in fault-free cases; when failures occur, it can also quickly recover to its steady-state performance. The key idea behind BChain is a Byzantine failure detection mechanism, called re-chaining, where faulty replicas are placed at the end of the chain until they can be replaced. From a high-level perspective, replicas in BChain are organized in a chain structure. During normal-case executions (best-case scenarios), clients send their requests to the head of the chain, which orders the requests. Then, the ordered requests are forwarded along the chain and executed by the replicas. Once a request reaches a replica called the proxy tail, a reply is sent back to the client. However, when a failure occurs, BChain employs re-chaining to reorder the chain, and the head performs this task. The main goal of re-chaining is to make sure that a fault cannot affect the critical path. Each replica is monitored by its successor along the chain; upon detecting a suspicion, the head issues a new chain ordering in which the accused replicas are moved out of the critical path, and the accuser is moved to a position in which it cannot continue to accuse others. The re-chaining approach is inexpensive; a single re-chaining request can proceed as a single client request.

Replicas in BChain are organized in a metaphorical chain, as shown in Fig. 33. Each replica is uniquely identified by its identity, e.g., {p1, p2, ..., pn}. Initially, the replicas' IDs are numbered in ascending order, which is subject to change during the re-chaining process. The first replica in the chain structure is called the head, denoted ph; the last replica is called the tail; and the (2f + 1)th replica is called the proxy tail, denoted pp. There are two distinct subsets along the chain. The first subset A contains the first 2f + 1 replicas, initially p1 to p2f+1, and the second subset B contains the last f replicas in the chain, initially p2f+2 to p3f+1. In general, the chain order is maintained by every replica, can be changed by the head, and is communicated to the replicas through message transmission.

In general, BChain works under an asynchronous environment. Like many other protocols, its safety holds in an asynchronous model, and its liveness is ensured by assuming the existence of partial synchrony. BChain comes in two variants, BChain-3 and BChain-5. Under the same setting, e.g., tolerating f failures, BChain-3 requires 3f + 1 replicas and a reconfiguration mechanism coupled with the detection and re-chaining algorithms, while BChain-5 requires 5f + 1 replicas but can operate without the reconfiguration mechanism. More specifically, BChain-3 has five sub-protocols: chaining, re-chaining, view change, checkpoint, and reconfiguration, while BChain-5 is similar to BChain-3 except without a reconfiguration protocol. Both BChain variants have an identical message flow. Briefly, the chaining protocol orders clients' requests, while the re-chaining process reorganizes the chain in response to failure suspicions. Faulty replicas are moved to the end of the chain. The view change protocol selects a new head when the current head is faulty or the system is slow. The checkpoint protocol is similar to that of PBFT and is mainly used to reduce costs. The reconfiguration protocol is responsible for reconfiguring faulty replicas.

Fig. 34. BChain-3 common case communication pattern. No conventional broadcast is used at any point [221].

Fig. 34 shows the BChain-3 common-case message communication pattern. BChain contains two types of messages along the chain: <CHAIN> messages transmitted from the head to the proxy tail, and <ACK> messages transmitted in reverse from the proxy tail to the head. A request is executed after a replica accepts the <CHAIN> message, and a request commits at a replica if it accepts the <ACK> message. Interested readers can refer to the work on both BChain-3 and BChain-5 [221]. Besides, there also exist similar works on chain-based protocols, e.g., Aliph-Chain [82].

32) ByzID - (feature): ByzID is a primary-based Byzantine replication protocol that incorporates an Intrusion Detection System (IDS), proposed by Duan et al. in 2014 [222]. By integrating the IDS as a Byzantine failure detector, ByzID is a BFT protocol with a cost comparable to crash-resilient protocols like Paxos. In practice, replicated state machines (RSM) and IDSs are distinct mechanisms to provide availability and integrity for critical network services. An IDS is a tool for near real-time monitoring of host and network devices to detect events that could indicate an ongoing attack. It has three major types: anomaly-based detection [223], misuse-based detection [224], and specification-based detection [225]. However, none of the three types is perfect, and an IDS itself not only suffers from deficiencies that limit its utility (e.g., false positives induced by the human administrator and false negatives when an ongoing attack is not detected), but is also not resilient to crashes. ByzID provides a unified approach to improve RSM resilience by leveraging intrusion detection, rather than using each technique independently. It utilizes a lightweight specification-based IDS as a failure detection component to build a Byzantine-resilient RSM. In a specification-based IDS, any sequence of operations outside of the specification is considered a violation. By integrating an IDS, ByzID provides two main advantages: 1) its efficiency is comparable to its crash-failure counterpart, and 2) its robustness protects against a wide range of failures (e.g., flooding, timing, and fairness attacks). However, ByzID cannot protect against all possible attacks, only those that the IDS can help with, as the IDS only captures the network packets of the protocol and analyzes them according to the specification. Fig. 35 shows the IDS/ByzID architecture.

More specifically, ByzID is a primary-based RSM protocol, in which a primary receives client requests and coordinates the other replicas (the backups). In the event of a replica failure, a new replica runs a configuration protocol to replace the failed one. Typically, the primary reconfiguration runs in-band, and the other replicas must wait until the primary reconfiguration completes. However, the reconfiguration for other replicas can run out-of-band, where replicas continue to run the protocol without stopping for reconfiguration. ByzID relies on monitoring instead of ordering to detect failures,

Fig. 36. aPBFT control loop [227].

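The feedback idea behind adaptive batching (Fig. 36) can be sketched as a single control step that resizes the batch against the observed request arrival rate. The control law, thresholds, and constants below are assumptions for illustration, not the controller of [227]; only the batch size (BS) is adapted here, whereas aPBFT also tunes the batching time (BT).

```python
# Illustrative feedback step for adaptive batching: grow the batch size
# when batches fill much faster than the batch timeout (high load), and
# shrink it when clients are rather inactive, bounding the parameter.
def adapt_batching(bs, bt, arrival_rate, lo=1, hi=128):
    expected_fill_time = bs / arrival_rate  # seconds to fill one batch
    if expected_fill_time < bt / 2:         # load high: batch more
        bs = min(hi, bs * 2)
    elif expected_fill_time > bt * 2:       # load low: batch less
        bs = max(lo, bs // 2)
    return bs, bt

bs, bt = 16, 0.010                 # 16 requests per batch, 10 ms timeout
bs, bt = adapt_batching(bs, bt, arrival_rate=10000)  # 10k req/s observed
print(bs)                           # batch size doubled under high load
```

This mirrors the text's point that a batch timeout tuned for an active workload performs badly for inactive clients: the same controller halves the batch size when the arrival rate drops.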
Fig. 37. Pipeline: an abstraction for request processing in aPBFT [227].

e.g., using a trusted specification-based IDS to detect and suppress primary equivocation, enforce fairness, etc. It uses the IDS to monitor the behavior of replicas, and the IDS can be implemented in the form of a small state machine (e.g., via the Bro IDS framework [226]). Each replica is associated with a separate IDS component. Even when an IDS experiences a crash, its hosting replica can still continue to process requests. Besides, ByzID can be deployed in a LAN with a simple rooted-tree structure, where the primary is the root and the backups are its direct children (leaves). In general, the ByzID protocol uses 2f + 1 replicas to tolerate f failures, and it only requires three communication rounds to reply to a client's request. The safety of ByzID holds in any asynchronous environment, and its liveness can be ensured under the assumption of partial synchrony.

ByzID highly depends on the Byzantine failure detector component. To make the primary order messages correctly, it requires some IDS specifications for Byzantine failure detectors. Specifically, a Byzantine failure detector has four specifications to follow: consistency (preventing the primary from sending "inconsistent" order messages to other replicas), no gap (preventing the primary from introducing gaps in a message ordering), fairness (ensuring the RSM executes the client requests in FIFO order), and timely action (used for detecting a crash-stop or slow primary). Interested readers can refer to the work [222] for more details.

33) Adaptive PBFT - (feature): Adaptive PBFT (aPBFT) is an extension to the original PBFT protocol with a feature of dynamic configuration of the request batching parameters, proposed by de Sá et al. in 2013 [227]. The request batching mechanism is one of the techniques used to optimize PBFT-based protocols, and most of these protocols rely on a static configuration for a target distributed system. However, when the target distributed system is dynamic, e.g., the underlying characteristics change at run-time (i.e., workload, channel QoS, network topology), the configuration of the request batching mechanism must follow the dynamics of the system, or it may not yield the desired performance improvement. For instance, if the prescribed batch timeout is too large for rather inactive clients, the request batching optimization may perform worse than the non-optimized version. Adaptive PBFT is an approach for automatic tuning of the PBFT parameters, based on a feedback control loop that constantly senses both the replication protocol performance and the underlying distributed system behavior. The proposed adaptive mechanism focuses on the parameters related to the batching mechanism, specifically for the PBFT family of protocols, and it has the ability to self-tune the batching mechanism at run-time.

In more detail, adaptive PBFT promotes a dynamic adaptation of the batching mechanism, namely, of the batch size (BS) and the batching time (BT). To dynamically adjust the scheme according to the application workload patterns, adaptive PBFT continually estimates the client workloads and the performance of the basic PBFT protocol, which is achieved by implementing a feedback control loop. This feedback control needs to collect some PBFT information, e.g., on the transmission, reception, and processing of messages, to adjust the parameters of the batching mechanism. Fig. 36 shows an abstract model of the adaptive PBFT control loop, whose operations can be pipelined, as shown in Fig. 37. Simply put, a pipelined adaptive control loop consists of four stages: checking (verify the integrity, authenticity, and validity of the client requests), buffering and batching (keep valid requests in the buffer for batching), ordering (run an agreement on the request order), and processing (process requests and send the related responses to the requesting clients). Interested readers can refer to the work [227] for more details.

34) hBFT - (feature): hBFT is a leader-based speculative Byzantine fault-tolerant protocol with optimal resilience, proposed by Duan in 2014 [228]. hBFT uses speculation to reduce the cost of Byzantine agreement while maintaining an optimal resilience, utilizing 3f + 1 replicas to tolerate f failures. By utilizing speculation, correct replicas may be temporarily inconsistent, so hBFT employs a three-phase PBFT-like checkpoint sub-protocol for both garbage collection and contention resolution. Two events can trigger the checkpoint sub-protocol: either the replicas trigger it when they have executed a certain number of operations, or clients trigger it when they detect a divergence of replies. From a high-level perspective, hBFT offers better performance by moving some critical jobs to the clients while minimizing side effects that could actually reduce performance. Moving these jobs to the clients incurs no additional cost, brings the benefits of simplifying the design and reducing message complexity, and frees the replicas from running expensive protocols to establish the order for every request. Also, hBFT can tolerate an unlimited number of faulty clients. Besides, hBFT has the same operations for both fault-free and normal cases.

Fig. 38. Layered structure of hBFT [228].

Fig. 38 shows the layered structure of hBFT. In more detail, it includes four main components: agreement, checkpoint, view change, and client suspicion. hBFT employs the same agreement protocol for both fault-free and normal cases, and utilizes a three-phase checkpoint sub-protocol for contention resolution and garbage collection. The checkpoint sub-protocol can be triggered by replicas when they execute a certain number of requests, or by clients if they detect a divergence of replies. The view change sub-protocol guarantees liveness and coordinates the change of the primary, and it can occur during normal operations or in the checkpoint sub-protocol.

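The two checkpoint triggers described for hBFT can be stated abstractly as two predicates, one replica-side and one client-side. The interval constant and the reply representation below are illustrative, not values from [228].

```python
# Sketch of hBFT's two checkpoint triggers.
CHECKPOINT_INTERVAL = 100  # illustrative constant

def replica_should_checkpoint(executed_ops):
    # Replicas trigger after executing a fixed number of operations.
    return executed_ops % CHECKPOINT_INTERVAL == 0

def client_should_checkpoint(replies):
    # Clients trigger when the speculative replies they collect diverge.
    return len(set(replies)) > 1

print(replica_should_checkpoint(200))          # True
print(client_should_checkpoint(["v1", "v2"]))  # True: divergent replies
print(client_should_checkpoint(["v1", "v1"]))  # False: replies agree
```

Under speculation, divergent replies are exactly the symptom of temporarily inconsistent correct replicas, which is why the client-side trigger feeds into the same checkpoint sub-protocol used for garbage collection.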
And the client suspicion sub-protocol prevents faulty clients from attacking the system.

35) Consensus-oriented Parallelization - (architecture): Consensus-oriented parallelization (COP) is a scheme to execute consensus efficiently, which disentangles consecutive consensus instances and executes them in parallel in independent pipelines, proposed by Behl in 2015 [229]. In general, this work requires some background on system parallelization to understand the key ideas behind the proposed COP scheme. The authors extensively discuss the disadvantages of the traditional approach of executing instances in a pipelined fashion and the need for parallelization at the consensus level. COP's basic approach is to lower the dependencies between consecutive consensus protocol instances, to process them in independent pipelines, and to parallelize these pipelines in their entirety rather than the single stages they are composed of. Briefly, one can think of COP as a superscalar design, where the throughput of consensus instances scales well with the number of operations a system has to process concurrently, as long as there are enough computing and network resources available. In more detail, the authors identify important tasks a replica needs to perform over and over again to provide its services, e.g., message authentication, agreement, execution, and checkpointing. They also discuss the disadvantages of task-oriented parallelization, e.g., limited throughput, limited parallelism, asymmetric load, and high synchronization overhead. Instead, they propose to structure replicas around the instances of the consensus protocol employed within the system to agree on operations.

Fig. 39. Replica with self-contained pillars of a Consensus-oriented Parallelization (COP) [229].

Fig. 39 shows a conceptual framework of consensus-oriented parallelization. Each consensus instance is assigned to and executed by a container, called a pillar, and each pillar runs in a dedicated thread, processing private copies of all functional modules required to perform the entire consensus protocol and client handling. We can literally consider each pillar to run in a separate environment. Pillars do not share any modifiable state as far as possible, and they rely on private connections to other replicas. Pillars work asynchronously and, if communication is needed with other components of a replica, they exclusively use in-memory message passing. For example, if a pillar running at the leading replica receives a request from a client or a request to create a batch, it initiates a new consensus instance by proposing it to the other replicas via its connections to them. As each pillar runs in parallel and cannot provide a total order on its own, the execution stage has to enforce the order by means of the instances' sequence numbers before it invokes the service implementation. Via COP, several advantages accrue to current consensus designs, e.g., scaling throughput and parallelism, symmetric load, reduced synchronization overhead, and conciliated decisions. Interested readers can refer to the original work [229].

36) Elastico - (blockchain): Elastico is a distributed agreement protocol for permissionless blockchains, proposed by Luu et al. in 2016 [230]. By using sharding technology, Elastico can scale transaction rates almost linearly with the available computation for mining: the more computation power in the network, the higher the number of transaction blocks selected per unit time. The number of committees grows proportionally to the total computation power in the network. All committees, each of which has a small constant number c of members, run a classical BFT consensus protocol internally to agree on a block. Elastico can tolerate Byzantine adversaries with up to one-fourth of the total computational power. Essentially, Elastico uniformly partitions or parallelizes the mining network into smaller committees, each of which processes a disjoint set of transactions, or "shards". The technique of sharding is common in non-Byzantine environments, e.g., databases, and Elastico provides a secure sharding protocol in the presence of Byzantine adversaries. A permissionless sharding protocol typically consists of five critical components in each consensus round [1]: identity establishment and committee formation, overlay setup for committees, intra-committee consensus, cross-shard transaction processing, and epoch reconfiguration. Identity establishment and committee formation establish an identity for each node before joining the protocol, and each identified node is then assigned to one committee; this process needs to prevent Sybil identities [231], since Elastico targets permissionless blockchains. The overlay setup for committees lets a node discover its neighboring nodes within the committee, and this process can be done with a gossip protocol [232]; Elastico assumes the overlay of a committee is a fully connected subgraph containing all the committee members. The intra-committee consensus runs a standard consensus protocol to agree on a single set of transactions. The cross-shard transaction processing handles the transactions that are involved in more than one shard, and this process typically requires a kind of "relay" transaction to synchronize among the related shards. The epoch reconfiguration reconfigures the shards to guarantee the security of the overall system, and this process typically requires some kind of randomness to prevent sophisticated attacks. These five components are the most critical ones for permissionless blockchain sharding, and they can also be adapted to a permissioned blockchain setting.

In more detail, Elastico performs under a static, round-adaptive adversary, and processors controlled by the Byzantine adversary can be arbitrarily malicious. The round-adaptive adversary can select which processors to corrupt at the start of each run, which means that once the protocol begins its run, the choices of compromised processors are fixed. Technically, Elastico is a probability-based consensus, and it can guarantee that the consensus protocol outputs the correct results with high probability. In each consensus epoch, each participant solves a PoW puzzle based on the epoch randomness obtained from the last state of the blockchain. Typically, within a shard, the shard members run a Byzantine fault-tolerant protocol, e.g., PBFT in Elastico, to reach agreement within the shard.

Even with the sharding scheme, by which Elastico improves throughput and latency for blockchain scenarios, it still has some aspects to improve [42]. 1) Elastico requires all participants

to re-establish their identities, e.g., by solving PoWs, and to re-build all committees in every epoch, which involves a large communication overhead. Further, this incurs a significant latency that scales linearly with the network size, as the protocol requires more time to solve enough PoWs to fill up all the committees. 2) In its core intra-committee consensus, Elastico requires a small committee size (e.g., about 100 participants) to limit the overhead of running PBFT in each committee. However, a small committee may increase the failure probability of the overall protocol. 3) During the last stage, the randomness used for establishing identities and committees can be biased by an adversary; the generated randomness is not "good" randomness (e.g., unbiased and unpredictable) [233], which allows malicious nodes to pre-compute PoW puzzles. 4) Elastico requires a trusted setup for generating the initial common randomness that is revealed to all parties at the same time. 5) Elastico only tolerates up to a 1/4 fraction of faulty participants, and even then with a high failure probability, so its liveness needs to improve.

37) VFT - (architecture): VFT is an acronym for the Visigoth Fault-Tolerant protocol, which essentially provides a reliable stateful service, proposed by Porto et al. in 2016 [234]. VFT introduces the Visigoth model, which makes it possible to calibrate the timing assumptions of a system using a threshold of slow processes or messages, and also to distinguish between non-malicious arbitrary faults and correlated attack scenarios. The main observation behind the Visigoth model is that the Byzantine model is too pessimistic, especially for relatively secure environments such as data centers: the Byzantine adversary's strength forces protocols to incur an unnecessary cost of 3f + 1 replicas. BFT is designed to cope with coordinated malice, which is unlikely to happen within the security perimeter of a data center. This observation, that data centers are more predictable and controllable than an open Internet environment, can be exploited to make stateful services more resource-efficient. More often, data centers experience data corruption faults (e.g., bit flips), which also represent a type of arbitrary behavior. To tolerate different types of arbitrary faults, VFT observes that it is very unlikely for data corruption to affect the same data across multiple replicas, so it is important to distinguish the correlation of these types of faults, e.g., the number of each type.

In more detail, VFT proposes a customizable model that defines a limit of u faults, which bounds the number of allowed omission (i.e., crash) and commission (i.e., arbitrary) faults; a limit of o correlated commission faults; a limit of r arbitrary faults; and, for every process pi, s correct processes pj that are slow with respect to it [235]. VFT can reduce the replication factor needed to solve a fundamental consensus problem from n ≥ 2u + r + 1 (or n ≥ 3f + 1 in the traditional formulation) to n ≥ u + s + o + 1 (or n ≥ f + s + o + 1) compared with an asynchronous BFT system. This benefit comes at the cost of making additional assumptions, whose overall correctness is at stake when these assumptions are not met. Through parameterization, the system allows an administrator to configure the system to tolerate any combination of faults.

Fig. 40. VFT-SMaRt communication pattern [234].

The authors also propose a VFT state machine replication protocol, adapted from BFT-SMaRt, called VFT-SMaRt. The message communication pattern of VFT-SMaRt, as shown in Fig. 40, is identical to that of BFT-SMaRt, with two core message patterns. In one pattern, the first two message steps are executed during epoch changes to collect information about previous epochs and disseminate this information to the replicas; in the other, the final sequence of two all-to-all communication steps after the leader relays the client request drives all "common case" execution. The authors also designed a few primitives, e.g., the Quorum Gathering Primitive (QGP), which forms the basis for the all-to-all communication pattern of the normal-case operation. With this primitive, the gatherer first tries to collect a larger quorum of N − s replies, which can be seen as collecting replies from the fast but potentially faulty replicas. If a timeout occurs, the quorum is reduced to N − u, which can be seen as collecting replies from correct but possibly slow replicas. In general, this reduced quorum size is enough to ensure an intersection: since x replicas did not reply by the first timeout and only s may be slow, x − s replicas must have crashed and will not participate in future quorums. The authors implemented the VFT protocol in a state machine replication library and ran several benchmarks; the evaluation shows that VFT can tolerate uncorrelated arbitrary faults with resource efficiency and performance resembling that of CFT systems rather than BFT ones.

Fig. 41. In the fault-free case, each replica leads one of the BFT agreement protocol instances [236].

38) SAREK - (architecture): SAREK is a parallel ordering framework that instantiates multiple single-leader-based BFT protocols independently, proposed by Li et al. in 2016 [236]. The framework partitions the service state to exploit parallelism during both agreement and execution, utilizing request dependencies abstracted from application-specific knowledge for service-state partitioning. One of the goals of SAREK is to distribute the extra workload over all replicas and enable parallelism at both the agreement and the execution stages. This is based on the observation that most existing single-leader BFT systems are typically far too pessimistic. For example, in a key-value store and many web applications, only a small fraction of requests interfere with each other. Thus, instead of having one leader at a time for the entire system, SAREK uses one leader per partition and only establishes an order on requests accessing the same partition. Meanwhile, it supports operations that span multiple partitions and provides a deterministic mechanism to atomically process them. SAREK provides parallelism at both the agreement stage, where requests are ordered, and the execution stage, where they are processed. By enabling concurrent request executions, SAREK is required to exploit parallelism during the request ordering.

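SAREK's per-partition ordering with a deterministic PREDICT() function can be sketched as follows. The byte-sum partitioning, the request shape, and the partition count are illustrative assumptions; SAREK derives the partitioning from application-specific knowledge.

```python
# Sketch of SAREK-style parallel ordering: a deterministic PREDICT()
# maps each request to the partition holding the state it accesses,
# and each partition orders only its own requests with an independent
# sequence counter, so unrelated requests proceed in parallel.
from collections import defaultdict

NUM_PARTITIONS = 4
seq = defaultdict(int)  # one independent sequence counter per partition

def predict(request):
    # Deterministic stand-in for SAREK's PREDICT(): identify the
    # partition of the state object the request reads or writes.
    return sum(request["key"].encode()) % NUM_PARTITIONS

def order(request):
    # Requests are only ordered relative to requests on the same partition.
    p = predict(request)
    seq[p] += 1
    return p, seq[p]

a = order({"key": "alice", "op": "get"})
b = order({"key": "alice", "op": "put"})
c = order({"key": "bob", "op": "get"})
print(a, b, c)
```

Both "alice" requests land on the same partition and receive consecutive sequence numbers, while the "bob" request is ordered independently; a cross-border request would have to appear in the schedules of all partitions PREDICT() returns for it.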
Fig. 43. After the proposal step, validators only make progress after hearing from two-thirds or more (+2/3) of other validators. The dotted arrow extends the consensus into atomic broadcast by moving to the next height [237].

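The +2/3 rule shown in Fig. 43 can be sketched as a simple quorum check over pre-commits. The vote bookkeeping below is illustrative, not Tendermint's actual data structures.

```python
# Sketch of Tendermint's commit rule: a block commits at a height only
# with pre-commits from more than two-thirds of the validators for the
# same block in the same round; otherwise the validators move on to the
# next round.
from collections import Counter

def commit_decision(precommits, num_validators):
    # precommits: list of (validator, block_id) for one (height, round);
    # block_id None models a "Precommit nil" vote.
    votes = Counter(block for _, block in precommits)
    for block, count in votes.items():
        if block is not None and 3 * count > 2 * num_validators:
            return block  # +2/3 pre-committed the same block
    return None  # no quorum: proceed to the next round

n = 4  # n = 3f + 1 tolerates f = 1 Byzantine validator
votes = [("v1", "B7"), ("v2", "B7"), ("v3", "B7"), ("v4", None)]
print(commit_decision(votes, n))  # 3 of 4 > 2/3, so "B7" commits
```

Note that exactly two-thirds is not enough: the strict inequality is what lets any two committing quorums intersect in at least one correct validator when fewer than one-third are Byzantine.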
a cross-border request, additional actions must be assured, e.g., that the request is ordered by all predicted BFT instances and that it is executed exactly once. The authors provide a detailed description of the cross-border request agreement and the cross-border request execution; interested readers can refer to Section IV of [236] for more details.

Fig. 42. System architecture of SAREK [236].

More specifically, SAREK partitions the service state and only linearly orders the requests accessing the same partition by creating a partition-specific schedule, so that independent requests can be agreed upon and handled concurrently. Each partition selects its own leader, which helps the overall system balance the induced load across all replicas. Once agreement in a partition is complete, a dedicated execution instance processes the requests of that partition according to a local schedule. However, there are cases in which requests access multiple partitions, called "cross-border requests", for which some special requirements must be met. For example, a cross-border request must not be executed more than once, and the processing of that request must be consistent with the individual schedules determined for all affected partitions. To handle a cross-border request, SAREK uses a mechanism based on a combination of prioritizing partitions and safely reordering requests. For example, only the execution instance of the partition with the highest priority actually processes a cross-border request, while the instances of the other involved partitions are put on hold in the meantime. Besides, to schedule requests among partitions, SAREK relies on application-specific knowledge to define the service-state partitions and to predict on which partition a request will operate. In general, this application-specific knowledge is essential to determine the partition to which a state object belongs, and each replica holds a deterministic PREDICT() function that identifies the state objects to be read or written during the processing of a particular request.

SAREK uses 3f + 1 replicas to tolerate up to f Byzantine failures. From a high-level perspective, SAREK can be built on an existing BFT implementation without requiring modifications to its most complex part, i.e., the agreement protocol. Instead, the agreement stage is treated as a black box that is instantiated multiple times, once for each partition. Fig. 41 shows a fault-free case of SAREK, in which a replica hosts multiple BFT instances and each instance manages one partition of the service state. Fig. 42 shows the system architecture of the SAREK framework. The agreement stage is the same as in a single-leader BFT protocol, in order to make SAREK compatible with the most common BFT protocols that feature a separation of the agreement and execution stages.

39) Tendermint - (blockchain): Tendermint is a secure state-machine replication algorithm in the blockchain paradigm that provides a form of BFT-ABC (Atomic Broadcast) to offer accountability. It was proposed by Buchman et al. in 2016 [237], with its first development by Kwon in 2014 [238]. It is the first showcase of how BFT consensus can be achieved in blockchain systems [38]. Tendermint consists of two major components: a consensus engine known as Tendermint Core and its underlying application interface, called the Application Blockchain Interface (ABCI). Tendermint Core is responsible for running the consensus algorithm, while the ABCI can be used to deploy blockchain applications written in any programming language. Tendermint begins with a set of validators, identified by their public keys, that propose new blocks and vote on them. Each block is assigned an incrementing index, or height. At each height, validators take turns proposing new blocks in rounds, so that for any given round, at most one valid proposer exists. However, due to network asynchrony, it may take multiple rounds to commit a block at a given height. Tendermint uses 3f + 1 replicas to tolerate up to f Byzantine failures. In general, validators engage in a two-phase voting process on a proposed block before that block is committed; by following a simple locking mechanism, Tendermint prevents any malicious coalition of less than one-third of the validators from compromising safety. In each round, the consensus algorithm contains three main components: proposals, votes, and locks.

In more detail, Tendermint assumes partial synchrony. In proposals, a new block must be proposed by a correct proposer in each round and then gossiped to the other validators; if a proposal is not received in sufficient time, it is skipped. In votes, a two-phase voting scheme is adopted: pre-vote and pre-commit. A set of pre-commits from more than two-thirds of the validators for the same block in the same round constitutes a valid commit of that block. In locks, Tendermint ensures that no two validators commit different blocks at the same height; this is achieved by a locking scheme that determines how a validator may pre-vote or pre-commit depending on its previous operations at the same height. Fig. 43 shows the state transitions of each validator. At the beginning of each round, a new proposer proposes a block, which must pass the two-stage voting mechanism before it is committed to the blockchain. A normal case thus goes through the following pattern: Propose Block → "Wait for pre-vote from +2/3" → "Wait for pre-commit from +2/3" → Commit.
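To make the +2/3 voting thresholds concrete, the following toy sketch tallies one round of pre-votes and pre-commits. The function names and vote encoding are invented for illustration; real Tendermint additionally tracks heights, timeouts, and the locking rules discussed below.

```python
# Toy tally of one Tendermint-style round: a block commits only after more
# than two-thirds of the n validators pre-vote AND pre-commit it.
# Illustrative only: real Tendermint also tracks heights, timeouts, and locks.

def supermajority(votes, block, n):
    """True iff strictly more than 2/3 of the n validators voted for block."""
    return sum(1 for v in votes.values() if v == block) * 3 > 2 * n

def run_round(n, prevotes, precommits, block):
    if not supermajority(prevotes, block, n):
        return "prevote-nil"      # too few pre-votes: validators submit Prevote nil
    if not supermajority(precommits, block, n):
        return "precommit-nil"    # too few pre-commits: move on to the next round
    return "commit"

votes = {0: "B1", 1: "B1", 2: "B1", 3: None}   # 3 of 4 validators vote for B1
print(run_round(4, votes, votes, "B1"))         # -> commit
```

With n = 4 (i.e., f = 1), three matching votes satisfy the strict two-thirds bound, while two do not, matching the one-third fault tolerance stated above.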

If some bad events happen (e.g., a timeout for a proposed block or an invalid block), the validator submits a special vote called Prevote nil. If the validator does not receive enough pre-votes for the proposed block, it submits another special vote, called Precommit nil. If +2/3 pre-commits are not received within the pre-commit time period, the next round is initiated, a new proposer is selected, and the state machine repeats the above process.

To ensure safety across rounds, the locking component of Tendermint follows some rules on the blockchain height. One rule is that a validator must pre-vote the same block in the next round for the same blockchain height; the other rule is that unlocking is possible only when the validator receives a newer proposed block in a later round for the same blockchain height. Under the partial-synchrony assumption and the locking rules, Tendermint guarantees safety when less than one-third of the validators exhibit Byzantine behaviors, and guarantees that no fork occurs. Besides, Tendermint favors safety over availability (or liveness). For more details, interested readers can refer to the works [237], [238].

40) RSCoin - (blockchain): RSCoin is a cryptocurrency framework in which central banks maintain complete control over the monetary supply, proposed by Danezis and Meiklejohn in 2016 [239]. Although there exists a centralized entity in control, RSCoin indeed relies on a distributed set of authorities, or mintettes (following the terminology of Laurie [240]), to reach consensus and prevent double-spending in cryptocurrencies. The mintettes process the lower-level blocks, which form a potentially cross-referenced chain. The communication between committee members takes place indirectly through clients, and RSCoin also relies on the clients to ensure the completion of transactions. RSCoin follows these processes: a client first gets a signed "clearance" from the majority of the mintettes that manage the transaction inputs; next, the client sends the transaction and the signed clearance to the mintettes corresponding to the transaction outputs. The mintettes check the validity of the transaction and verify the signed evidence from the input mintettes that the transaction is not double-spending any inputs. If all checks pass, the mintettes append the transaction to be included in the next block. The system operates in epochs: at the end of each epoch, mintettes send all cleared transactions to a central bank, which collates the transactions into blocks that are then appended to the blockchain.

In more detail, despite the centralization of monetary control, the RSCoin framework still provides strong transparency, auditability, and scalability guarantees via its underlying verification and consensus processes. The key to maintaining consistency is the distributed set of mintettes responsible for the maintenance of the transaction ledger. The mintettes collect transactions from users and collate them into blocks, much as is done in traditional cryptocurrencies. In traditional scenarios, however, the set of miners is neither known nor trusted, so users have no choice but to broadcast a transaction to the entire network and rely on proof-of-work to defend against Sybil attacks. In contrast, the mintettes in RSCoin are authorized by the central bank, and are thus both known and trusted to some extent (providing accountability); therefore, the underlying consensus can avoid a heavyweight consensus protocol and adopt a simplified version, e.g., two-phase commit (2PC). Fig. 44 shows the process by which RSCoin's protocol validates transactions. By following the strict transaction verification process, and with the help of a central entity, RSCoin can prevent double-spending. To increase performance (i.e., throughput), RSCoin adopts a sharding technology, randomly dividing transactions into different shards, so that each mintette is responsible for a subset of all transactions. This potentially increases scalability. Interested readers can refer to Section V of [239].

Fig. 44. The process of validating transactions; each mintette mi is an owner of address i. In (1), a user learns the owners of each of the addresses in its transaction. In (2), the user collects approvals from a majority of the owners of the input addresses. In (3), the user sends both the transaction and these approvals to the owners of the transaction identifier. In (4), some subset of these mintettes add the transactions to their blocks [239].

However, client/user-driven atomic commit protocols are vulnerable to DoS attacks if the client stops participating and the inputs are locked forever. These systems assume that clients are incentivized to proceed to the unlock phase. Such incentives may exist in a cryptocurrency application, where an unresponsive client loses its own coins if the inputs are permanently locked, but they do not hold for a general-purpose platform where inputs may have shared ownership. Besides, RSCoin relies on a two-phase commit protocol executed within each shard which, unfortunately, is not Byzantine fault-tolerant and can result in double-spending attacks by a colluding adversary.
41) ByzCoin - (blockchain): ByzCoin is a Byzantine consensus protocol that leverages scalable collective signing to achieve strong consistency, proposed by Kokoris-Kogias et al. in 2016 [241]. ByzCoin provides strong consistency on Bitcoin while preserving the Bitcoin features of open membership, scalability, and transaction rate. It employs proof-of-membership by dynamically forming hash-power-proportionate consensus groups that represent recently successful block miners, using a sliding share window over the proof-of-work. For consensus among the members, ByzCoin uses PBFT with collective signing (CoSi [242]) to reduce both the costs of PBFT rounds and the costs for "light" clients to verify transaction commitment. CoSi is a collective signing protocol that efficiently aggregates hundreds or thousands of signatures. Even though CoSi is not a consensus protocol, in ByzCoin it makes PBFT's prepare and commit phases scalable, reducing the communication complexity from O(n^2) to O(n). ByzCoin replaces the direct MAC-authenticated communication among nodes with digital signatures, which allows for indirect communication. This allows ByzCoin to scale with sparser tree-based communication patterns to optimize transaction commitment and verification under normal operation while guaranteeing safety and liveness under Byzantine faults. Also, the CoSi protocol can create an efficient compact multi-signature that can be verified with an aggregate public key as efficiently as an individual key [243].

In more detail, Bitcoin's consensus algorithm provides only a probabilistic consistency guarantee, while, practically, strong consistency could offer more benefits. For example, all miners instantly agree on the validity of blocks, without wasting resources to resolve potential forks; clients do not need to wait for an extended period to confirm that a transaction is committed; and it can provide forward security (i.e., once a block has been appended to the blockchain, it stays on the chain forever). ByzCoin uses PBFT as its consensus protocol. To provide strong consistency, ByzCoin needs to address several key challenges: open membership, scalability to hundreds of replicas, proof-of-work block conflicts, and transaction commitment rate. To address both the open membership and proof-of-work block conflicts, ByzCoin forms consensus groups dynamically.
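To make RSCoin's two-step clearance flow concrete, here is a minimal sketch of the user-driven validation described above. The helper names (`collect_clearance`, `commit_tx`) and data shapes are hypothetical stand-ins for the signed messages of the real protocol.

```python
# Sketch of RSCoin's 2PC-style validation flow (hypothetical helper names):
# step 1 collects a majority clearance from the mintettes owning each input;
# step 2 hands the transaction plus approvals to the output-side mintettes.

def collect_clearance(input_owners, approves):
    """approves: mintette -> bool. A majority of each input's owners must approve."""
    cleared = {}
    for addr, owners in input_owners.items():
        yes = [m for m in owners if approves.get(m, False)]
        if len(yes) * 2 <= len(owners):        # need a strict majority per input
            return None                        # input not cleared -> abort
        cleared[addr] = yes
    return cleared

def commit_tx(tx, cleared, output_owners):
    # Output mintettes verify the signed evidence before queueing the tx.
    if cleared is None:
        return "rejected"
    return {m: ("queued", tx) for m in output_owners}

owners = {"in1": ["m1", "m2", "m3"]}
ok = collect_clearance(owners, {"m1": True, "m2": True, "m3": False})
print(commit_tx("tx42", ok, ["m7"]))   # m7 queues tx42
```

Because a majority of each input's owners must sign off before any output mintette queues the transaction, the same input cannot be cleared for two conflicting transactions, which is how the flow prevents double-spending.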

ByzCoin forms consensus groups dynamically from windows of recently mined blocks, giving recent miners shares of voting power proportional to their recent commitment of hash power. To reduce transaction processing latency, ByzCoin adopts the idea from Bitcoin-NG [244] of decoupling transaction verification from block mining. To achieve scalability, ByzCoin leverages CoSi to reduce the broadcast communication complexity of PBFT. Also, ByzCoin can achieve up to a near-optimal tolerance of f faulty group members among 3f + 2 members in total. CoSi is one of the key contributions of ByzCoin.

Fig. 45. CoSi protocol architecture [241].

CoSi is a protocol that performs efficient signing by multiple entities. Fig. 45 shows the CoSi protocol architecture. CoSi utilizes a Schnorr signature [245] scheme to output a compact collective signature that is easily verifiable by clients. The CoSi protocol in ByzCoin follows a tree-based communication structure to scale both computation and communication. For each message to be collectively signed, the leader initiates a CoSi protocol round that requires two round-trips over the communication tree. This four-phase protocol consists of 1) Announcement, 2) Commitment, 3) Challenge, and 4) Response. In Announcement, the root of the communication tree initiates the protocol by multicasting the message to be signed. In Commitment, each node commits to a random secret and generates a public commitment, which it sends up the tree; once they have received all commitments, non-leaf nodes aggregate the commitments before sending them up the tree. In Challenge, the leader multicasts a collective challenge down the tree. In Response, the nodes compute their responses using their committed secrets, aggregate the responses received from their children, and send them further up the tree. In general, aggregation at every level allows nodes to check for dishonest descendants, and the final collective signature can be verified like a standard Schnorr signature [243]. For more details on the optimization, interested readers can refer to the work [241].

42) ReBFT - (feature): Resource-efficient Byzantine Fault Tolerance (ReBFT) is a scheme that minimizes the resource usage of a BFT system during normal-case operation by keeping f replicas in a passive mode, proposed by Distler et al. in 2016 [246]. In ReBFT, passive replicas neither participate in the agreement protocol nor execute client requests; instead, they are brought up to speed by verified state updates provided by active replicas. However, when suspected or detected faults occur, passive replicas are activated in a consistent manner. In general, ReBFT reduces the resource footprint in the absence of faults without losing the ability to ensure liveness in the presence of faults, by relying on two different modes of operation: a normal-case mode and a fault-handling mode. During normal-case operation, only a subset of the replicas is active, and the system can make progress as long as all replicas behave according to the specification. However, when faults are detected, the system switches to the fault-handling mode by activating the remaining replicas to tolerate the fault. Also, applying ReBFT does not require a BFT system to be completely redesigned from scratch.

Fig. 46. Message patterns of the consensus process in RePBFT [246].

In more detail, ReBFT ensures that non-faulty replicas always maintain a consistent view of active and passive replicas. The active replicas keep passive replicas informed by providing them with state updates. An update contains all state modifications that an active replica has performed as a result of the corresponding request. This means that, although the f passive replicas are on standby, they must keep the same configuration as the 2f + 1 active replicas at all times. By doing so, the system can bring passive replicas up to speed, and upon a mode switch, the passive replicas can quickly be activated so that the more resilient default BFT protocol handles the faults. More specifically, ReBFT uses 3f + 1 replicas for tolerating f faults. The authors show two use cases, applying the ReBFT scheme to PBFT and to MinBFT to achieve resource efficiency. Fig. 46 shows the message pattern of applying ReBFT to PBFT. One limitation is that PBFT, MinBFT, and ReBFT do not support faulty members rejoining the group, so when the number of faulty replicas exceeds f, the whole system stops functioning. Interested readers can refer to the work [246] for more details on both ReBFT-based PBFT and ReBFT-based MinBFT.

43) Solida - (blockchain): Solida is a decentralized blockchain protocol based on reconfigurable Byzantine consensus augmented by proof-of-work, proposed by Abraham et al. in 2017 [247]. It improves on Bitcoin in confirmation time, and provides safety and liveness assuming the adversary controls less than one-third of the total mining power. For the improved confirmation time, it follows the concept of responsiveness in [248] for permissionless protocols. Specifically, a protocol is responsive if it commits transactions (possibly probabilistically) at the speed of the actual network delay, without being bottlenecked by hard-coded system parameters. Although Solida does not rely on Nakamoto consensus for any part of the protocol, PoW still plays an important role in establishing an imperfect Sybil-proof leader election oracle. However, the committee election and transaction processing are both done by a Byzantine consensus protocol in Solida. It uses a BFT protocol to resolve leader contention with certainty, rather than probabilistically as in the PoW of Nakamoto consensus. Besides committee election and transaction processing, Solida provides a detailed principle for handling the reconfiguration process, since the protocol between two reconfiguration events is just a conventional Byzantine consensus. From a high-level perspective, Solida runs a Byzantine consensus protocol among a set of participants, called a committee, that dynamically changes over time. At any time, only one committee member serves as the leader.

In more detail, Solida considers a permissionless setting, in which participants use their public keys as pseudonyms and can join or leave the system at any time. Message transmission between honest participants should complete within ∆ time, and the ratio of Byzantine participants among all participants should be less than 1/3.
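The bottom-up aggregation in CoSi's Commitment and Response phases can be modeled by the toy reduction below, where plain integers stand in for Schnorr commitments and responses. The tree layout and function name are illustrative only, not CoSi's API; the point is that each interior node folds its subtree into a single value before passing it up.

```python
# Toy model of CoSi's bottom-up aggregation: every node contributes one value
# (standing in for a Schnorr commitment or response); interior nodes aggregate
# their children's values before passing a single value up the tree. Integers
# replace group elements, so this shows the communication structure only.

def aggregate(tree, root, contributions):
    """tree: node -> list of children; returns the sum over root's subtree."""
    total = contributions[root]
    for child in tree.get(root, []):
        total += aggregate(tree, child, contributions)  # children aggregate first
    return total

# A 7-node binary tree rooted at the leader: leader -> a, b; a -> a1, a2; ...
tree = {"leader": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
contrib = {n: 1 for n in ["leader", "a", "b", "a1", "a2", "b1", "b2"]}
print(aggregate(tree, "leader", contrib))   # -> 7: one value for 7 signers
```

Because each level forwards only one aggregated value, the leader's communication stays proportional to its fan-out rather than to the total number of signers, which is the structural reason CoSi reduces PBFT's broadcast cost.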

Also, Solida assumes a delayed adaptive adversary, for whom it takes time to corrupt an honest participant. Besides, Solida has a rolling committee running BFT to commit transactions and reconfiguration events into a ledger. For the reconfiguration process, Solida uses PoW to prevent Sybil attacks as well as to defend against selfish mining. Essentially, Solida makes a big change compared with PBFT at the algorithmic level, from a round-robin leader schedule to a potential interleaving between internal and external leaders. And Solida still maintains a total order among leaders to ensure its safety and liveness.

44) Guru - (middleware): Guru is a reputation module that can be laid on top of various consensus protocols (e.g., PBFT or HoneyBadgerBFT), proposed by Biryukov et al. in 2017 [249]. In general, it ranks nodes based on the outcomes of consensus rounds decided by a small committee, and adaptively selects the committee based on the current reputation. Also, it takes an external reputation ranking as an input. Guru can tolerate a threshold of malicious nodes that can be up to slightly above 1/2, since reputation is a calculated criterion and it takes time and effort to earn reputation. Guru as a reputation module can be plugged into any round-based Byzantine agreement protocol. However, Guru does not care whether each round reaches a consensus or not, as long as the outcomes are visible to all nodes. More specifically, Guru instructs the protocol to select the committee according to the reputation and maintains the reputation ranks, so that nodes with a high reputation have a lower posterior probability of being malicious. The prior probability of being malicious is given to the module and is called the external reputation. This external reputation is typically assigned once; e.g., nodes with trusted computing platforms may be assigned a higher external reputation than those without. The committee decision is made by a committee, which runs a round of a BFT protocol on the current set of transactions and decides either to apply each of them or not. Typically, the committee is selected based on the current reputation that a node has inherited or earned during the previous rounds. Also, assigning nodes to a committee requires some randomness, and the common randomness regularly comes from an external trusted source. The reputation module continuously observes whether the committee has reached a consensus, in order to provide either rewards to honest nodes or penalties to malicious nodes. Besides, the authors also implemented a high-level simulator for Guru.

45) RapidChain - (blockchain): RapidChain is a sharding-based public blockchain protocol that is resilient to Byzantine faults from up to a 1/3 fraction of its participants, proposed by Zamani et al. in 2018 [42]. It can achieve complete sharding of the communication, computation, and storage overhead of processing transactions without assuming any trusted setup. From a high-level perspective, RapidChain employs an optimal intra-committee consensus algorithm that can achieve high throughput via blockchain pipelining, a gossiping protocol for large blocks, and a provably secure reconfiguration mechanism to ensure robustness. RapidChain operates as a public blockchain under a Byzantine adversary. Traditional Byzantine consensus protocols can only work in a closed environment, where the set of participants is fixed and their identities are known to everyone via a trusted third party. If applied in an open setting, these Byzantine protocols may easily be compromised by Sybil attacks; e.g., the adversary can repeatedly rejoin malicious parties with fresh identities to gain significant influence on the protocol outcome. Moreover, most of these schemes assume a static adversary who can select the set of parties to corrupt only at the start of the protocol. In general, the PoW scheme allows the consensus protocol to impede Sybil attacks by limiting the rate of malicious participants, and it provides a lottery mechanism to initiate the consensus process; this helps RapidChain manage the membership of participants. At a high level, RapidChain partitions the set of nodes into multiple smaller groups of nodes, called committees, that operate in parallel on disjoint blocks of transactions and maintain disjoint ledgers. This partitioning process is also referred to as sharding. By enabling parallelization of the consensus, sharding-based consensus can scale the throughput of the system proportionally to the number of committees.

In more detail, RapidChain assumes that up to f replicas out of 3f + 1 replicas are controlled by a slowly-adaptive Byzantine adversary. The committee-election protocol samples a committee from the set of all nodes in a way that the fraction of corrupt nodes in the sampled committee is bounded by 1/2 with high probability. This is achieved by the Cuckoo rule [250] [251] for reconfiguration. In addition, there exists a reference committee responsible for driving periodic reconfiguration events between epochs. The intra-committee consensus requires that the members in a shard choose a local leader using the current epoch randomness; the leader then sends the block to all members using a fast gossiping protocol based on an information dispersal algorithm (IDA) for large blocks. The members of a shard participate in a synchronous Byzantine consensus protocol [252], which allows RapidChain to obtain intra-committee consensus with the optimal resiliency of 1/2. This helps to achieve a total resiliency of 1/3 with small committees. Another important aspect in guaranteeing that RapidChain performs correctly is inter-committee communication.

For inter-committee communication, a user in RapidChain does not attach any proof to a transaction. Instead, the user communicates with any committee, which routes the transaction to its output committee via the inter-committee routing protocol. RapidChain considers a simple UTXO transaction tx = ⟨(I1, I2), O⟩ that spends coins I1 and I2 in shards S1 and S2, respectively, to create a new coin O belonging to shard S3. The RapidChain engine executes tx by splitting it into three sub-transactions: txa = ⟨I1, I1′⟩, txb = ⟨I2, I2′⟩, and txc = ⟨(I1′, I2′), O⟩, where I1′ and I2′ belong to S3. txa and txb essentially transfer I1 and I2 to the output shard, where they are spent by txc to create the final output O. All three sub-transactions are single-shard. In case of failures, when, for example, txb fails while txa succeeds, RapidChain sidesteps atomicity by informing the owner of I1 to use I1′ for future transactions, which has the same effect as rolling back the failed tx. The cross-shard transaction in RapidChain largely relies on the inter-committee routing scheme, which enables the users and committee leaders to quickly locate the committees to which they should send their transactions. To achieve this, RapidChain builds a routing overlay network at the committee level, based on the routing algorithm of Kademlia [253]. Specifically, each RapidChain committee maintains a routing table of log(n) records that point to log(n) different committees at distance 2^i for 0 ≤ i ≤ log(n) − 1.

For cross-shard (or inter-committee) transactions in RapidChain, one drawback is that, for each transaction, it creates three different transactions to exchange information among shards. This inherently increases the number of transactions to process, and the communication caused by sending the extra transactions back to the input committees also increases. RapidChain uses the committee's leader to produce these transactions without considering the status of the leader (e.g., a malicious leader), and the input committees include the newly created transactions via their leaders. This behavior to some extent modifies the originality of transactions. Besides, the cross-shard transaction largely depends on the inter-committee routing algorithm, which is a potential bottleneck.
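The three-way splitting of a cross-shard transaction described above can be sketched as follows; the dictionary layout and function name are invented for illustration and carry none of RapidChain's signatures or proofs.

```python
# Sketch of RapidChain's cross-shard splitting: tx = <(I1, I2), O> becomes
# tx_a = <I1, I1'>, tx_b = <I2, I2'>, and tx_c = <(I1', I2'), O>, where the
# primed coins belong to the output shard. Names and shapes are illustrative.

def split_cross_shard(inputs, output, out_shard):
    primed = [i + "'" for i in inputs]                 # new coins in out_shard
    subs = [{"in": [i], "out": p, "out_shard": out_shard}
            for i, p in zip(inputs, primed)]           # tx_a, tx_b move inputs over
    subs.append({"in": primed, "out": output,
                 "out_shard": out_shard})               # tx_c spends the primed coins
    return subs

txa, txb, txc = split_cross_shard(["I1", "I2"], "O", "S3")
print(txc)   # tx_c consumes I1' and I2' inside S3 to create O
```

Each sub-transaction touches outputs in a single shard, which is exactly what lets every committee run its intra-committee consensus without coordinating an atomic commit across shards.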

46) Troxy - (middleware): Troxy is a trusted-proxy system that relocates the BFT-specific client-side functionality to the server side to make BFT transparent to legacy clients, proposed by Li et al. in 2018 [254]. To achieve this transparency, Troxy relies on a trusted subsystem built upon hardware protection enabled by Intel SGX. Also, Troxy reduces the replication cost of BFT for read-heavy workloads by offering an actively maintained cache that supports trustworthy read operations while preserving the consistency guarantees offered by the underlying BFT protocol. That means, with the help of trusted hardware, Troxy can guarantee trustworthiness even in the presence of Byzantine failures in surrounding replicas, and offers a trusted proxy to clients. Messages exchanged between Troxies and replicas are authenticated using common message certificates. Even with immunity to arbitrary behaviors of replicas, a Troxy instance still has a chance to crash or to be disconnected from its clients, which makes it unavailable. Typically, this scenario can be handled by DNS round-robin or load-balancing appliances that enable a fail-over to another Troxy instance. The authors also proposed a solution that implements Troxy on top of Hybster, a hybrid BFT system that already features a trusted subsystem to reduce the number of replicas to 2f + 1. Nevertheless, Troxy can be built as an independent extension that can be applied to other hybrid systems featuring a trusted subsystem and traditional BFT agreement protocols.

Fig. 47. Architecture of a Troxy-backed BFT system [254].

In more detail, Troxy acts as a representative of the client at the server side and allows legacy client implementations to benefit from Byzantine fault tolerance without requiring modifications. Due to the transparency to clients, Troxy does not incur additional network or processor usage at the client side. Fig. 47 shows the architecture of a Troxy-backed BFT system. A client is only required to establish a connection to a single Troxy instance, which then handles the communication with the replicas in the system for all of its clients. If a Troxy instance fails, the affected clients can re-establish their connections to the service as they would do in a traditional system, e.g., by switching to different Troxies. Different from the replica components in the underlying replicated system, Troxies are trusted and assumed to fail only by crashing. To achieve this, each Troxy runs inside a trusted subsystem, e.g., Intel SGX, which guarantees the integrity of the executed program code. To protect the communication of a client with the server, a Troxy supports the establishment of secure channels using Transport Layer Security (TLS).

Fig. 48. Overview of Troxy components and their interactions [254].

Fig. 48 shows an overview of the interactions among different Troxy components and with other system components outside the Troxy. When a client issues a request to the server via a secure channel, the Troxy first decrypts the message (1). It distinguishes two kinds of requests: read and write. For a read request, the Troxy executes the fast path (2) and, in case of success, immediately returns the cached reply. For a write request, or in case of a read-cache miss, the Troxy forwards the client request to its local replication logic to invoke a BFT agreement protocol (3). Once it has received the request, the BFT protocol distributes the request to the other replicas in the system and ensures that all correct replicas execute all client requests in the same order. Once processing is finished in the BFT system, the replies from the replicas are passed to the Troxy's voting component, which determines the correct result by comparing the replies from different replicas (4). Once an agreement is reached, e.g., upon receiving f + 1 matching replies from distinct replicas, the reply is returned to the client (5), as this guarantees that at least one of the replies stems from a non-faulty replica and is therefore correct. Interested readers can refer to the work [254] for more details.

47) Thunderella - (theory): Thunderella is a paradigm for achieving state machine replication by combining a fast, asynchronous path with a (slow) synchronous "fall-back" path to provide optimistic responsiveness, proposed by Pass and Shi in 2018 [255]. Thunderella is as robust as the best synchronous protocol, yet "optimistically", if a majority of its players are honest, the protocol "instantly" confirms transactions. This work is largely theoretical, and the authors provide instantiations in both permissionless and permissioned settings. The key observation under Thunderella is to combine a centralized approach with a decentralized approach, achieving the scalability and speed of centralized approaches together with the decentralized nature of the blockchain. In practice, the Thunderella protocol can combine any "standard" blockchain, referred to as the slow chain, with an optimistic "fast path". The fast-path protocol is executed by a committee of stakeholders and coordinated by a central authority called the Accelerator (functioning as a "leader" in traditional leader-based BFT protocols), whose main job is to linearize transactions and data [256].

In more detail, Thunderella follows the notion of responsiveness proposed in prior work [257] to introduce the notion of optimistic responsiveness in synchronous settings. A consensus protocol is said to be responsive iff any transaction input to an honest node is confirmed in time that depends only on the actual network delay, but not on any a-priori known upper bound on the network delay. In Thunderella, the authors consider two kinds of delays: δ and ∆. The δ denotes the actual network delay, and the ∆ denotes an a-priori known upper bound on the network's delay, where ∆ is possibly provided as an input to the protocol. The Thunderella protocol allows synchronous protocols to commit responsively when some optimistic conditions are met. Thunderella is safe against up to one-half Byzantine faults. Moreover, if an accelerator and more than 3/4 of the replicas are honest, and if they are on the "fast path", then the replicas can commit responsively in O(δ) time.
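Returning to Troxy's request handling (steps (1)–(5) of Fig. 48), the read fast path and the f + 1 reply voting can be sketched as below; the function names and data shapes are invented for illustration, not Troxy's actual interfaces.

```python
# Sketch of a Troxy-style request path: reads try a cache fast path; writes
# (or cache misses) go through BFT, and the voter accepts a reply once f + 1
# distinct replicas have returned the same value. Names are illustrative.

def vote(replies, f):
    """replies: replica -> value. Return a value seen f+1 times, else None."""
    counts = {}
    for value in replies.values():
        counts[value] = counts.get(value, 0) + 1
        if counts[value] >= f + 1:   # at least one reply is from a correct replica
            return value
    return None

def handle(request, cache, replies, f):
    if request["op"] == "read" and request["key"] in cache:
        return cache[request["key"]]          # fast path: cached, trusted reply
    return vote(replies, f)                   # slow path: BFT agreement + voting

print(handle({"op": "read", "key": "x"}, {"x": 7}, {}, 1))                  # -> 7
print(handle({"op": "write", "key": "x"},
             {}, {"r1": "ok", "r2": "ok", "r3": "bad"}, 1))                 # -> ok
```

The f + 1 threshold mirrors the argument in the text: with at most f faulty replicas, any set of f + 1 matching replies must contain one from a non-faulty replica.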

Otherwise, the protocol falls back to a "slow path", which has a commit latency that depends on the delay ∆.

Practically, the optimistic responsiveness of Thunderella requires replicas to know which of the two paths they are in, and to explicitly switch between them [258]. If, at some point, the optimistic conditions cease to be met, the replicas switch to the slow path; when the optimistic conditions start to hold again, the players switch back to the fast path. Besides, Thunderella uses Nakamoto's protocol or the Dolev-Strong protocol [13] as its slow path. As a result, the slow path, as well as the switch between the two paths, is extremely slow, requiring O(k∆) and O(n∆) latency, respectively (where k is a security parameter).

A follow-up on optimistic responsiveness is presented in Hybrid-BFT by Momose et al. in 2020 [259]. Following the work of Thunderella, the authors state that a protocol makes a decision with latency on the order of the actual network delay δ if the number of actual faults is significantly smaller than f, the worst case allowed. However, in the presence of f faults, the protocol incurs O(k∆) (k is a security parameter) or O(f∆) latency, which is far from the optimal case. To provide strong availability, the latency should also be as short as possible in the presence of f faults. Without optimistic responsiveness, some protocols incur latency that is close to optimal, ∆ + O(δ) [260], or nearly optimal, 2∆ + O(δ) [261]. One question raised in Hybrid-BFT is: "Can a Byzantine consensus protocol simultaneously achieve optimistic responsiveness as well as optimal ∆ + O(δ) latency in the presence of f faults?" Hybrid-BFT can achieve both properties simultaneously. To show its findings on optimistic responsiveness and optimal latency, Hybrid-BFT presents a simple leader-based BFT protocol as a practical application. Even while being able to rotate leaders after every decision, the proposed protocol can simultaneously achieve an average latency of 1) 3δ under the optimistic condition and 2) 1.5∆ + O(δ) (or 3∆ + O(δ)) in the presence of faults, which is more than a factor of two better than existing rotating-leader BFT protocols. Interested readers can refer to the work [259] for detailed proofs.

48) Chainspace - (blockchain): Chainspace is a recently proposed, sharded platform with privacy built in by design, proposed by Al-Bassam et al. in 2018 [262]. To enable scalability, Chainspace nodes are organized into shards that manage the state of objects, keep track of their validity, and record transactions committed or aborted. These nodes ensure that only valid transactions, consisting of encrypted or committed data along with the zero-knowledge proofs that assert their correctness, end up on their shards of the blockchain. The nodes are typically required to communicate with the other shards to decide whether to accept or reject a transaction via an inter-shard consensus. Instead of a client-driven approach, Chainspace runs an atomic commit protocol collaboratively between all the concerned committees. This is achieved by making all committees act as resource managers for the transactions they manage. To do so, Chainspace proposes a protocol called Sharded Byzantine Atomic Commit, or S-BAC, which combines the existing Byzantine agreement and atomic commit protocols in a novel way. Byzantine agreement securely maintains consensus within a shard of 3f + 1 nodes in total, containing up to f malicious nodes. Atomic commit runs across all shards that contain objects on which the transaction relies; the transaction is rejected unless all of the shards accept to commit it.

In more detail, the intra-shard consensus algorithm assumes an asynchronous setting and relies on the S-BAC protocol to handle transactions. Essentially, the S-BAC protocol consists of two primitive protocols: Byzantine agreement and atomic commit. The Byzantine agreement ensures that all honest members of a shard of size 3f + 1 agree on a specific common sequence of actions. Chainspace utilizes the MOD-SMART [23] implementation of PBFT to provide an optimal number of communication steps. Specifically, this is achieved by replacing broadcast with a special leader-driven Validated and Provable Consensus (VP-Consensus). Atomic commit runs across all shards managing objects relied upon by a transaction to ensure consistency among shards; for example, if a single shard rejects a transaction, then all agree that the transaction is rejected. Chainspace utilizes a two-phase commit protocol [263], composed with an agreement protocol, to achieve consistency. S-BAC composes the above primitives to ensure that shards process all transactions safely and consistently.

Fig. 49. S-BAC for a transaction T with two inputs (o1, o2) and one output object (o3). The user sends the transaction to all nodes in the shards managing o1 and o2. The BFT-Initiator acts as a leader in sequencing T, and emits 'prepared(accept, T)' or 'prepared(abort, T)' to all nodes within the shard. Next, the BFT-Initiator of each shard assesses whether overall 'All proposed(accept, T)' or 'Some proposed(abort, T)' holds across shards, sequences the accept(T, *), and sends the decision to the user. All cross-shard arrows represent a multicast from all nodes in one shard to all nodes in another [262].

Fig. 49 shows an example of the S-BAC protocol committing a single transaction with two inputs and one output. It consists of four phases: initial broadcast (prepare), process prepare, process prepared (accept or abort), and process accept. In each shard, there exists a designated node, called the BFT-Initiator, to assist the whole process. Specifically, the BFT-Initiator drives the composed S-BAC protocol by sending "prepare" and then "accept" messages to reach a BFT consensus within and across shards. Also, it is responsible for broadcasting consensus decisions to relevant parties. Besides, Chainspace supports a two-phase process to recover from a malicious BFT-Initiator that suppresses transactions. Interested readers can refer to Chainspace [262] for more details.

49) OmniLedger - (blockchain): OmniLedger is a scale-out distributed ledger that preserves long-term security under permissionless operations, proposed by Kokoris-Kogias et al. in 2018 [264]. It ensures security and correctness by using a bias-resistant public-randomness protocol for choosing large, statistically representative shards to process transactions. Essentially, OmniLedger uses a Byzantine shard atomic commit (Atomix) protocol to atomically process transactions across multiple committees, such that each transaction is either committed or aborted. Since both deploying an atomic commit protocol and running a BFT consensus are unnecessarily complex, Atomix uses a lock-then-unlock process to achieve consistency. OmniLedger intentionally keeps the shards' logic simple and makes any direct shard-to-shard communication unnecessary by tasking the client with the responsibility of driving the unlock process, while permitting any other party (e.g., validators or even other clients) to fill

in for the client if a specific transaction stalls after being submitted for processing. Atomix uses a three-step (initialize/lock/unlock) protocol to deal with cross-shard UTXO transactions. More specifically, the client first gossips the cross-shard transaction to all of its input shards. Then, OmniLedger takes a two-phase approach to achieve the atomic commit, in which each input shard first locks the corresponding input UTXO(s) and issues a proof-of-acceptance if the UTXO is valid. The client collects the responses from all input committees and issues an "unlock to commit" to the output shard.
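The lock-then-unlock flow described above can be sketched as follows. This is a minimal, single-process illustration only: the `Shard` class, the in-memory UTXO sets, and the `atomix` driver are hypothetical simplifications, eliding the intra-shard BFT consensus, gossip, and proof verification that the real protocol performs.

```python
from dataclasses import dataclass, field

@dataclass
class Shard:
    """Hypothetical shard state: a set of unspent UTXOs plus locks."""
    utxos: set = field(default_factory=set)
    locked: set = field(default_factory=set)

    def lock(self, utxo):
        # Lock phase: issue a proof-of-acceptance iff the UTXO is
        # unspent and not already locked by another transaction.
        if utxo in self.utxos and utxo not in self.locked:
            self.locked.add(utxo)
            return ("proof-of-acceptance", utxo)
        return ("proof-of-rejection", utxo)

    def unlock(self, utxo, commit):
        # Unlock phase: on commit the input UTXO is spent;
        # on abort the lock is released and the UTXO stays spendable.
        self.locked.discard(utxo)
        if commit:
            self.utxos.discard(utxo)

def atomix(tx, input_shards, output_shard):
    """Client-driven initialize/lock/unlock for one cross-shard tx."""
    # Initialize: the client gossips the tx to all input shards,
    # each of which tries to lock its input UTXO.
    proofs = [shard.lock(utxo) for shard, utxo in input_shards]
    committed = all(kind == "proof-of-acceptance" for kind, _ in proofs)
    # Unlock: commit only if every input shard accepted;
    # otherwise abort and release the locks.
    for shard, utxo in input_shards:
        shard.unlock(utxo, commit=committed)
    if committed:
        output_shard.utxos.add(tx["output"])
    return committed

s1, s2, out = Shard({"o1"}), Shard({"o2"}), Shard()
print(atomix({"output": "o3"}, [(s1, "o1"), (s2, "o2")], out))  # prints True
```

Note that a second transaction spending the already-consumed `o1` would receive a proof-of-rejection from `s1` and abort, which is exactly the all-or-nothing property Atomix is after.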

In more detail, OmniLedger assumes a slow-adaptive adversary that can corrupt up to a 1/4 fraction of the nodes at the beginning of each epoch. To allow new participants to join the protocol, OmniLedger runs a global reconfiguration protocol at every epoch. Under the assumption of synchronous channels, the protocol generates identities and assigns participants to committees using a slow identity blockchain protocol. In each epoch, OmniLedger requires fresh randomness, obtained from a bias-resistant random generation protocol, for an unpredictable leader election process. The random generation protocol used in OmniLedger is based on a verifiable random function (VRF), similar to the lottery algorithm in Algorand. The intra-committee consensus protocol assumes the existence of partially synchronous channels to achieve a ByzCoin-like fast consensus, where the committee is further divided into smaller groups using epoch randomness. However, the design of ByzCoin has several security/performance issues, e.g., falling back to all-to-all communication in the Byzantine setting. For the detailed processes and discussions, readers can refer to OmniLedger [264].

As RapidChain [42] pointed out, there are several challenges that OmniLedger leaves unsolved. OmniLedger can only tolerate 1/4 of total corruptions, and, from the simulation results, the protocol only achieves low latency when at most 1/8 of total replicas are corrupted. OmniLedger requires O(n) per-node communication, as each committee has to gossip multiple messages to all n nodes for each block of transactions; it also requires a trusted setup to generate an initial unpredictable configuration to seed the VRF in the first epoch. Besides, OmniLedger requires the client to participate actively in cross-shard transactions, which may be a strong assumption for lightweight clients.

50) Dynamic Byzantine Protocol Adaption - (feature): Dynamic adaptation for Byzantine consensus protocols explores mechanisms that allow the algorithms to be adapted or changed at runtime to optimize the system for the current conditions, proposed by Carvalho et al. in 2018 [265]. The authors studied how different dynamic adaptation techniques proposed for crash failure models can be applied in the presence of Byzantine faults, and presented a comparative study of the performance of these switching algorithms in practice. Most existing fault-tolerant solutions are customized, and there is no one-size-fits-all solution for a range of extended operating conditions. It is critical to develop mechanisms that envelop the underlying consensus processes and provide a uniform interface to end users; dynamic adoption of consensus algorithms helps resolve this kind of situation. Typically, adaptive protocols can be classified into two categories: those using a single reconfigurable consensus protocol and those using multiple consensus protocols. Using a reconfigurable consensus algorithm means developing an adaptation-aware monolithic protocol, which incorporates all behaviors that can be activated at runtime. For example, the work [227] presented a reconfigurable variant of PBFT that allows for online changes in the batching size and timeouts to adapt to changes in the workload characterization. This approach typically requires coordinating the reconfiguration of different replicas to ensure they always operate in compatible configurations. Using multiple consensus algorithms as black boxes, by contrast, is a more modular solution, consisting of implementing different behaviors using independent protocols. Typically, this approach requires some mechanism to switch between independent protocols, e.g., a switcher enabling switching among two or more protocols. One advantage of this approach is that it allows any protocol to be integrated without major effort, as it requires no adaptive features from the consensus protocols. The switching process can be facilitated if the protocols export an interface that allows a switcher to deactivate a protocol, placing it in a quiescent state [266].

Fig. 50. System architecture of dynamic BFT adaption [265].

Based on the above state-of-the-art literature, the authors proposed a BFT-SMART-based instantiation for dynamic adaptation. It assumes a partially synchronous network and a component, called the switcher, that is responsible for forwarding requests to one or more of the supported protocols in multiple consensus scenarios. An external component, called the adaptation manager, decides which protocol or configuration is to be used at any moment. The adaptation manager can send a command to all switchers to start the adaptation, and these adaptation commands are ordered and identified with a monotonically increasing id. Typically, the implementation of the adaptation manager is orthogonal to the deployed consensus protocols; the manager itself is also replicated, and each switcher only initiates switching when it receives an identical command from a quorum of adaptation-manager replicas. Fig. 50 shows an abstraction of the system architecture for the BFT-SMART instantiation. It makes a set of extensions to BFT-SMART, e.g., it integrates a switcher to mediate between clients and the supported consensus algorithms, integrates Fast-SMART [134], and develops stoppable versions of MoD-SMART [23] and Fast-SMART. Interested readers can refer to Section 4 in [265].

51) Policy-based BFT Adaptation - (feature): Policy-based adaptation for BFT protocols explores a different alternative to support system changes by using a policy language, proposed by Bravo et al. in 2018 [267]. This alternative consists of specifying not only the system configuration, but also the fault-handling behavior and how the system adapts to changes in the workload, in a policy language that is processed externally to the managed systems. Typically, different configurations perform differently in the face of variable workloads. For example, even the basic functionality of the system may vary depending on the type and scope of faults that one is considering. A policy-based adaptation offers a strong separation of these concerns between the basic mechanisms offered by a system and the policies defined by the system administrator to

meet target performance and dependability requirements. The authors demonstrated the approach by applying it to configure and dynamically adapt Fireplug [268], a flexible architecture to build robust geo-replicated graph databases. From a high-level perspective, Fireplug is a Byzantine fault-tolerant distributed transactional graph database management system. It is designed to run in one or more datacenters, in which each datacenter runs multiple data nodes or servers, with each node running an instance of the graph database. Typically, Fireplug supports two types of transactions: read-only transactions and update transactions.

Fig. 51. Overview architecture of Policy-based adaption [267].

More specifically, the original Fireplug prototype did not support any policy-based configuration or adaptation, with all configuration performed at build time using compilation options. To support policy-based schemes, Fireplug is required to implement some new components, e.g., an adaptation manager, actuators, and a replica factory. Fig. 51 shows the overall architecture of the policy-based Fireplug. The adaptation manager is in charge of executing a configuration and an adaptation policy that is provided by the operator in a high-level form. Multiple policies can be installed in the adaptation manager to handle different aspects of the system's operation. It is required that all policies be consistent, e.g., no two installed policies conflict, and that the manager be extensible enough to select among alternative policies. Typically, the adaptation manager is fed by a number of sensors that provide information about the current state of the system. Each data node and each client proxy is augmented with an actuator, a component that receives commands from the adaptation manager and mutates the behavior of the Fireplug components. A special replica factory is also deployed in each datacenter; the factory is in charge of launching a new replica in its local datacenter upon request from the adaptation manager. The sensors provide inputs to the adaptation manager, e.g., information on the operational conditions of Fireplug, while actuators provide the adaptation manager with multiple actions that can be called. These actions include operations to activate/deactivate a replica, switch the read-only/update transactional protocol, and join/leave a local or global group. Interested readers can refer to Sections IV and V of [267] for detailed adaptation and policies.

52) Subquadratic Byzantine Agreement - (theory): Subquadratic Byzantine agreement (BA) protocols are a group of consensus protocols that reach agreement with less-than-quadratic communication complexity (i.e., complexity that grows much slower than O(n^2)), proposed by Abraham et al. in 2019 [269]. The presented work focuses on theoretical results, specifically two impossibilities. Although many Byzantine agreement protocols have improved communication complexity, most of them rely on strong assumptions, e.g., a static adversary, and only a few existing works have shown how to achieve subquadratic BA under an adaptive adversary. Even under the assumption of an adaptive adversary, they commonly limit the adversary's adaptivity. For example, if an honest node sends a message and then gets corrupted in some round, the adversary cannot erase the message that was already sent; this assumption is commonly phrased as the adversary being unable to perform "after-the-fact removal". That is, the adversary cannot erase honest messages already sent, even if the honest nodes that sent them are later compromised. An adversary that can perform such removal is commonly referred to as a strongly adaptive adversary. Many natural Ω(n^2)-communication BA protocols [270] [13] [271] can be proven secure even against an adversary capable of after-the-fact removal, but it was not known whether this is achievable with subquadratic communication. The proposed work first proves that disallowing after-the-fact removal is necessary for achieving subquadratic-communication BA, and then presents a subquadratic binary BA construction, under the no-after-the-fact-removal assumption, that achieves near-optimal resilience and expected constant rounds under standard cryptographic assumptions and a PKI.

In more detail, the authors show main theorems on three aspects: the necessity of disallowing "after-the-fact" removal, near-optimal subquadratic BA under minimal assumptions, and the necessity of a setup assumption. The proofs of these theorems are somewhat lengthy; we only excerpt the results. Theorem 1 concerns the impossibility of BA with subquadratic communication w.r.t. a strongly adaptive adversary. It states: any (possibly randomized) BA protocol must in expectation incur at least Ω(f^2) communication in the presence of a strongly adaptive adversary capable of performing after-the-fact removal, where f denotes the number of corrupted nodes. Theorem 2 concerns near-optimal subquadratic BA. It states: assuming standard cryptographic assumptions and a public-key infrastructure (PKI), for any constant 0 < ε < 1/2, there exists a synchronous BA protocol with expected O(k) multicast complexity, expected O(1) round complexity, and exp(−ω(k)) error probability that tolerates f < (1 − ε)n/2 adaptively corrupted players out of n players in total. Theorem 3 concerns the impossibility of sublinear multicast BA without setup assumptions. It states: in a plain authenticated channel model without setup assumptions, no protocol can solve BA with C multicast complexity with probability p > 5/6 under C adaptive corruptions. Interested readers can refer to the work [269] for more details.

53) Probabilistic Byzantine Reliable Broadcast - (theory): Probabilistic Byzantine broadcast is a style of analysis in the Byzantine setting; a scalable probabilistic Byzantine reliable broadcast protocol was proposed by Guerraoui et al. in 2019 [272]. Existing broadcast protocols for Byzantine agreement scale poorly, as they typically build on quorum systems with strong intersection guarantees, which results in linear per-process communication and computation complexity. The authors in [272] generalized the Byzantine reliable broadcast abstraction to a probabilistic setting, allowing each of its properties to be violated with a fixed, arbitrarily small probability. They then leverage these relaxed guarantees in a protocol that replaces the quorums with stochastic samples. In general, samples can be significantly smaller in size, which leads to a more scalable design. The authors designed a Byzantine reliable broadcast protocol with logarithmic per-process communication and computation complexity. They also introduced a general technique called adversary decorators, which allows them to make claims about the optimal strategy of the Byzantine adversary without imposing any additional assumptions. Besides, the authors introduced Threshold Contagion, a model of message propagation through a system with Byzantine

processes. In more detail, the authors presented a probabilistic gossip-based Byzantine reliable broadcast algorithm with O(log N) per-process communication and computation complexity at the expense of O(log N / log log N) latency. The probabilistic algorithm, Contagion, allows each property of Byzantine reliable broadcast to be violated with an arbitrarily small probability ε, where ε scales sub-quadratically with N and decays exponentially in the size of the samples. The Contagion algorithm incrementally relies on two sub-protocols: Murmur and Sieve. Essentially, Murmur is a probabilistic broadcast algorithm that uses simple message dissemination to establish validity and totality; each correct process relays the sender's message to a randomly picked gossip sample of other processes. Sieve is a probabilistic consistent broadcast algorithm (built upon Murmur) that guarantees consistency, e.g., no two correct processes deliver different messages; each correct process uses a randomly selected echo sample. Contagion itself is a probabilistic Byzantine reliable broadcast algorithm that guarantees validity, consistency, and totality, whose sender uses Sieve to disseminate a consistent message to a subset of the correct processes. Interested readers can refer to the details in [272].

A similar analysis of the round complexity of randomized Byzantine agreement (BA) was proposed by Cohen et al. in 2019 [273]. In that work, the authors proved lower bounds on the round complexity of randomized Byzantine agreement protocols, bounding the halting probability of such protocols after one and two rounds. The authors proved three important bounds, excerpted from the work [273]: 1) randomized BA protocols resilient against n/3 [resp., n/4] corruptions terminate (under attack) at the end of the first round with probability at most o(1) [resp., 1/2 + o(1)]; 2) randomized BA protocols resilient against n/4 corruptions terminate at the end of the second round with probability at most 1 − Θ(1); and 3) for a large class of protocols (including all BA protocols used in practice) and under a plausible combinatorial conjecture, BA protocols resilient against n/3 [resp., n/4] corruptions terminate at the end of the second round with probability at most o(1) [resp., 1/2 + o(1)]. The above three bounds hold even when the parties use a trusted setup phase, e.g., a public-key infrastructure (PKI).

54) SBFT - (feature): SBFT is a Byzantine replicated system that addresses the challenges of scalability, decentralization, and global geo-replication, proposed by Gueta et al. in 2019 [65]. Essentially, SBFT uses a combination of four components: collectors and threshold signatures to reduce communication to linear, an optimistic fast path, reduced client communication, and redundant servers for the fast path. SBFT also implements a correct dual-mode view change protocol that allows it to efficiently run either an optimistic fast path or a fallback slow path without incurring a view change to switch between them. In general, scaling BFT replication to tolerate tens of malicious nodes requires re-thinking BFT algorithms and re-engineering them for high scale. SBFT is a BFT system optimized to work over a group of hundreds of replicas in a world-scale deployment. The authors of SBFT provide detailed evaluations in a world-scale geo-replicated deployment with 209 replicas withstanding f = 64 Byzantine failures, and show how the different algorithmic components of SBFT increase its performance and scalability. They also show that SBFT simultaneously provides almost 2X better throughput and about 1.5X better latency relative to a highly optimized system that implements the PBFT protocol.

Fig. 52. Message patterns of SBFT [65].

In more detail, SBFT assumes a standard asynchronous BFT system model and distinguishes two modes: a synchronous mode, when the adversary can control up to f Byzantine nodes but there is a known bounded delay between non-faulty nodes, and a common mode, when the adversary controls only c ≤ f nodes that can only crash or act slowly, again with a known bounded delay between non-faulty nodes. With n = 3f + 2c + 1 replicas and in the common mode, SBFT obtains safety, liveness, and linearity. Fig. 52 shows the schematic message patterns for n = 4, f = 1, c = 0. To achieve this, SBFT adds several components/optimizations. Optimization 1): from PBFT to linear PBFT. SBFT reduces all-to-all communication patterns to a linear communication pattern by using a collector, whose function is to collect messages from each replica and broadcast the collected messages to everyone. Also, by using threshold signatures, the outgoing collector message size can be reduced from linear to constant, and SBFT rotates the collector duty among replicas in a round-robin manner. Optimization 2): adding a fast path. SBFT, as in Zyzzyva, allows for a faster agreement path in optimistic executions, where all replicas are non-faulty and the network is synchronous. SBFT implements a practical dual-mode view change protocol, which can seamlessly switch between paths and adjust to network/adversary conditions (without an actual view change). Optimization 3): reducing client communication from f + 1 messages to 1. Threshold signature schemes can be used to reduce the number of messages a client needs to receive and verify. In SBFT, each client needs only one message, containing a single public-key signature, for request acknowledgment. SBFT reduces the per-client linear cost to just one message by adding a phase that uses an execution collector to aggregate the execution threshold signatures and send each client a single message carrying one signature. Optimization 4): adding redundant servers to improve resilience and performance. SBFT allows the fast path to tolerate up to a small number c of crashed or straggler nodes out of n = 3f + 2c + 1 replicas, making the fast path more prevalent. SBFT falls back to the slower path only if there are more than c faulty replicas.

Besides, SBFT uses short Boneh-Lynn-Shacham (BLS) signatures [274], whose security is comparable to 2048-bit RSA signatures but which are just 33 bytes long, improving both communication and computation efficiency.

55) Flexible BFT - (feature): Flexible BFT is an approach to BFT consensus that provides stronger resilience and diversity, proposed by Malkhi et al. in 2019 [275]. The stronger resilience involves a new fault model with alive-but-corrupt (a-b-c) faults, in which replicas may arbitrarily deviate from the protocol in an attempt to break its safety; however, if they cannot break safety, they will not try to prevent liveness. By incorporating alive-but-corrupt faults into a Byzantine setting, Flexible BFT is resilient to higher corruption levels than a pure Byzantine fault model. The diversity of Flexible BFT provides a solution whose protocol transcript is used to draw different

55 commit decisions under diverse beliefs. With that separation, the unique challenges. For example, the network model of asynchrony same Flexible BFT solution supports synchronous and asynchronous in BFT protocols makes verifiable secret sharing (VSS) unsuitable. beliefs, and varying resilience threshold combinations of Byzantine Meanwhile, fully asynchronous VSS (AVSS) is unnecessary as well and alive-but-corrupt faults. To achieve these goals, from a technical since the BFT algorithm provides a broadcast channel. The authors level, Flexible BFT achieves the above results using two mechanisms: present to use VSS with share recovery scheme to incorporate secret a synchronous BFT protocol and Flexible Byzantine Quorums. For the shared state into a BFT engine. The authors also present a VSS synchronous BFT protocol, only the commit step requires knowing with a share recovery solution, KZG-VSSR, in which a failure-free the network delay bound and thus replicas execute the protocol with- sharing incurs only a constant number of cryptographic operations per out any synchrony assumption; for the Flexible Byzantine Quorums, it replica. Also, the authors present a way to efficiently integrate any dissects the roles of different quorums in existing consensus protocols. instantiation of VSSR into a BFT replication protocol while incurring only constant overhead. In more details, most existing protocols are challenged if these underlying settings differ from the ones they are designed for. For In more details, VSRR is a framework that, given a VSS scheme example, optimal-resilience partially synchronous solutions break with certain properties, adds share recovery with only a constant factor (i.e., lose safety and liveness) if the fraction of Byzantine faults overhead from the original VSS scheme. The authors present two exceeds 1/3. And many robust protocols are based on crashed faults. 
ways to instantiate VSSR: 1) KZG-VSSR, which uses Kate et al.’s The Flexible BFT makes a more rational fault model, called alive-but- secret sharing scheme [277] to instantiate a VSS that has constant corrupt faults. The rationale is based on the observation that violating time share verification, and 2) Ped-VSSR, which uses Pedersen’s safety may provide some attacker gains (e.g., a double-spending secret sharing scheme [278]. The secret sharing scheme in Pedersen attack), but preventing liveness usually does not. Also, the Flexible only provides linear time share verification and recovery, uses some BFT provides a certain separation between the fault model and the cheaper cryptographic operations, and is faster for smaller clusters. protocol. Its design approach builds a protocol whose transcript can The proposed framework is based on the recovery polynomial, which be interpreted by learners with diverse beliefs, who draw different is a single polynomial that encodes recovery information for f consensus commit decisions based on their beliefs. The Flexible shares. Thus, by only sharing a small, constant number of additional BFT guarantees safety and liveness for all learners that have correct polynomials, the client can enable all 3f + 1 shares to be recovered. beliefs. Each learner can specify (i) the fault threshold it needs to A generic VSSR BFT scheme works in constant per sharing in a tolerate, and (ii) the message delay bound, if any, it believes in. This failure-free case. However, in the worst case, e.g., when there are separation design philosophy provides several advantages. One thing f participant failures or the leader is Byzantine, the overhead is still is that different learners may naturally hold different assumptions quadratic, which incurs a linear overhead when incorporated into BFT about the system. For example, some learners may be more cautious replicas. 
...and require a higher resilience than others; some learners may believe in synchrony while others do not. Also, a learner may update its assumptions based on certain events it observes. For example, if a learner receives votes for conflicting values, it may indicate an attempt at attacking safety, and the learner can start requiring more votes than usual. The learners typically have different assumptions and hence different commit rules, and Flexible BFT provides mechanisms to handle these conflicts among different learners, e.g., via a "recovery" mechanism.

Besides, Flexible BFT utilizes a synchronous BFT protocol and a quorum-based mechanism. The adopted synchronous BFT protocol executes on replicas at network speed (e.g., even in an asynchronous setting). This allows learners in the same protocol to assume different message delay bounds and commit at their own pace, which separates the timing assumptions of replicas from the timing assumptions of learners. This separation is only available under some conditions: the action of committing is carried out only by learners, not by replicas. The other technique involves a breakdown of the different roles that quorums play in different steps of partially synchronous BFT protocols. Overall, through this separation, Flexible BFT tolerates a fraction of combined Byzantine and alive-but-corrupt (a-b-c) faults beyond existing resilience bounds. Interested readers can refer to the work [275] for more details.

56) VSSR BFT - (feature): VSSR is an acronym of Verifiable Secret Sharing with Share Recovery, which can provide efficient services for BFT protocols, proposed by Basu et al. in 2019 [276]. Typically, BFT SMR protocols fail to provide privacy guarantees, and a natural way to add them is to secret-share the state instead of fully replicating it. However, incorporating a secret-shared state into traditional BFT SMR protocols introduces extra overhead, whereas the optimized VSSR schemes, e.g., KZG-VSSR and Ped-VSSR, incur only constant overhead. Interested readers can refer to the work [276] for more details.
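The secret-sharing idea behind VSSR can be illustrated with a minimal Shamir-style sketch. This is a toy over a prime field, not the actual KZG-VSSR or Ped-VSSR constructions (which add verifiability and efficient recovery): any t shares reconstruct the secret, and a lost share can itself be recovered by interpolating at its index.

```python
import random

P = 2**61 - 1  # prime field modulus (assumed for this toy example)

def share(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def interpolate(points, x0):
    """Evaluate the unique degree-(t-1) polynomial through `points` at x0."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * ((x0 - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

shares = share(1234, t=3, n=5)
assert interpolate(shares[:3], 0) == 1234         # reconstruct the secret
lost_x, lost_y = shares[4]
assert interpolate(shares[:3], lost_x) == lost_y  # recover a lost share
```

In the real schemes, recovery is performed without exposing the secret to any single party; the sketch only shows the underlying polynomial arithmetic.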

57) Stellar - (blockchain): Stellar is a global payment network that can directly transfer digital money anywhere in the world in seconds via a quorum-based Byzantine agreement protocol, proposed by Lokhava et al. in 2019 [279]. To achieve this, Stellar provides a secure transaction mechanism across untrusted intermediaries, using the Stellar Consensus Protocol (SCP), a Byzantine agreement protocol. With SCP, each institution specifies the other institutions with which it wants to remain in agreement; through the global interconnectedness of the financial system, the whole network then agrees on atomic transactions spanning arbitrary institutions, with no solvency or exchange-rate risk from intermediary asset issuers or market makers. Essentially, Stellar is a blockchain-based payment network specifically designed to facilitate innovation and competition in international payments. It aims to achieve three goals: 1) open membership, 2) issuer-enforced finality, and 3) cross-issuer atomicity. For open membership, anyone can issue currency-backed digital tokens that can be exchanged among users; for issuer-enforced finality, a token's issuer can prevent transactions in the token from being reversed or undone; and for cross-issuer atomicity, users can atomically exchange and trade tokens from multiple issuers. There are different ways to achieve a single goal or two goals in combination, but not all three. For example, financial companies such as PayPal and Venmo can easily achieve the first two goals; a closed system can achieve goals 2 and 3; and a mined blockchain can achieve goals 1 and 3. The Stellar network achieves all three goals at once.

In more detail, to achieve these goals, Stellar relies on two aspects: supporting efficient markets between tokens from different issuers, and the Byzantine agreement protocol SCP. Anyone can issue a token, and the blockchain provides a built-in orderbook for trade between any pair of tokens. Users can also issue path payments that atomically trade across several currency pairs while guaranteeing an end-to-end limit price. Via SCP, token issuers designate specific validator servers to enforce transaction finality; an issuer knows exactly which transactions have occurred and avoids the risk of losses from blockchain history reorganization. The key observation underlying SCP is that most asset issuers benefit from liquid markets and want to facilitate atomic transactions with other assets. In general, validator administrators configure their servers to agree with other validators on the exact history of all transactions on all assets. By using a trust chain (e.g., one validator trusts another validator), agreement relationships can be transitively connected across the whole network, and SCP guarantees global agreement, making it a global Byzantine agreement protocol with open membership. The construction of such trust relationships can be considered a federated voting process; Fig. 53 shows its key stages. By voting on transactions, federated voting establishes agreement among untrusted validators. Thus, essentially, SCP is a quorum-based Byzantine agreement protocol in which quorums emerge from the combined local configuration decisions of individual nodes. Nodes only recognize quorums to which they belong themselves, and only after learning the local configurations of all other quorum members. In this way, SCP inherently tolerates heterogeneous views of which nodes exist, and nodes can join and leave unilaterally without the need for a view-change protocol to coordinate memberships. Interested readers can refer to the work [279] for more details.

Fig. 53. Stages of federated voting [279].

Fig. 54. Architecture of DISPEL. Transactions are collected in the pool and batched (1). Replicas exchange batches (2A, 2B), store and associate them with their hash digest in the manager (3). Replicas execute the consensus protocol over hashes (2C, 3) and transmit the decisions to the manager (4). The manager transmits the batches associated with the decided hashes to the application (5) [280].

Fig. 55. Principle of a consensus pipeline. A pipeline epoch consists of four phases: network reception (Rx), network transmission (Tx), CPU-intensive hash (H), and latency-bound consensus (C). As soon as an epoch finishes its first phase (Rx), the replica spawns a new epoch. When the replica executes four epochs concurrently, it leverages all its resources [280].

58) DISPEL - (feature): DISPEL is an acronym of Distributed Pipeline Byzantine SMR, which allows any node to distributedly start new consensus instances whenever it detects sufficient local resources, proposed by Voron and Gramoli in 2019 [280]. This work is mostly based on the observation of important causes of performance limitation, like head-of-line (HOL) blocking [281], which delays subsequent packet delivery due to a TCP packet loss; DISPEL designs a distributed technique to mitigate this situation. Essentially, DISPEL balances its quadratic number of messages (e.g., as in PBFT) onto as many routers between distributed replicas as possible to offer high throughput with reasonably low latency. One key innovation of DISPEL is its distributed pipeline, a technique adapted from the centralized pipelining of micro-architectures to the context of SMR to leverage unused resources. Different from centralized pipelining, distributed pipelining maximizes the resource usage of distributed replicas by allowing them to decide locally when to spawn new consensus epochs. Each replica in DISPEL that detects idle network resources at its network interface (NIC) and sufficient available memory spawns a new consensus instance in a dedicated pipeline epoch. Also, the distributed detection can leverage links of heterogeneous capacity, like inter-datacenter vs. intra-datacenter communication links. The two main contributions are: 1) removing the limitation of HOL blocking, which DISPEL identified as the bottleneck; and 2) enabling distributed pipelining, which improves the robustness of the distributed system.

In more detail, DISPEL works under a partially synchronous model tolerating f < n/3 Byzantine replicas, and each replica in DISPEL spawns a new consensus instance based on its available local resources. Fig. 54 shows a high-level architecture and the main steps of one pipeline epoch that runs one consensus instance. It mainly consists of creating a batch of commands, exchanging batches with other replicas, and selecting an ordered subset of the batches created by replicas. More specifically, a DISPEL replica continuously listens for client transactions and stores them in its transaction pool. When a replica decides to spawn a new pipeline epoch, it concatenates all transactions in the pool to create a batch, and then broadcasts the batch. In parallel, a dedicated hashing thread computes a checksum of the batch. All DISPEL replicas decide independently to spawn new pipeline epochs. The consensus component exchanges hashes to complete the reliable broadcast and execute the subsequent steps. When the replicas decide on a set of hashes, the consensus component transmits its decision to the manager, which gives the decided batches to the pipeline epoch orderer. Based on the above parallel execution, the distributed pipeline contains four distinct phases, as shown in Fig. 55: a network reception phase (Rx), a CPU-intensive hash phase (H), a network transmission phase (Tx), and a wait-mostly consensus phase (C). DISPEL executes these phases at the same time from different pipeline epochs, leveraging most resources. Thus, a DISPEL replica spawns a new pipeline epoch before its previous epoch terminates: DISPEL starts a new epoch once its resources permit, and in Fig. 55 each row stands for a pipeline epoch with its four phases Rx, H, Tx, and C. Interested readers can refer to the work [280] for more details.
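The staggered epochs can be sketched as a toy schedule (one abstract tick per phase, a simplification of the real system, where phase durations vary and epochs start on local resource detection): epoch e enters phase k at tick e + k, so after a short warm-up all four resources are busy simultaneously.

```python
# Toy schedule: each epoch runs phases Rx -> H -> Tx -> C, one tick each.
# A new epoch starts as soon as the previous one leaves its Rx phase,
# so up to four epochs run concurrently, each using a different resource.
PHASES = ["Rx", "H", "Tx", "C"]

def schedule(n_epochs):
    timeline = {}  # tick -> {phase/resource: epoch using it}
    for epoch in range(n_epochs):
        for offset, phase in enumerate(PHASES):
            tick = epoch + offset  # epoch e starts at tick e
            timeline.setdefault(tick, {})[phase] = epoch
    return timeline

t = schedule(4)
# At tick 3 all four resources are busy, one per in-flight epoch:
assert t[3] == {"C": 0, "Tx": 1, "H": 2, "Rx": 3}
```

At tick 3, the four concurrent epochs occupy the four resources at once, which is the utilization argument behind DISPEL's distributed pipeline.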

Fig. 56. Normal-case algorithm of PoE: Client c sends its request containing transaction T to the primary P, which proposes this request to all replicas. Although replica B is Byzantine, it fails to affect PoE [282].

59) PoE BFT - (architecture): Proof-of-Execution (PoE) is a BFT consensus protocol that uses speculative execution to reach agreement among replicas, proposed by Gupta et al. in 2019 [282]. At the core of PoE are out-of-order processing and speculative execution, which allow PoE to execute transactions before consensus is reached. In general, PoE can achieve resilient agreement in just three linear phases. Compared with PBFT, to make the protocol scalable and resilient, PoE adds four key design elements: non-divergent speculative execution; safe rollbacks and robustness under failures; agnostic signatures and linear communication; and avoidance of response aggregation. For non-divergent speculative execution, instead of an all-to-all communication followed by execution as in PBFT, PoE replicas execute requests once they get prepared (the second phase of PBFT); that is, they do not broadcast Commit messages. The speculative execution is non-divergent because each replica has a partial guarantee: it has prepared prior to execution. In PBFT, by contrast, a client may or may not receive a sufficient number of matching responses. For safe rollbacks and robustness under failures, PoE ensures that if a client receives a full proof-of-execution, consisting of responses from a majority of the non-faulty replicas, then such a request persists in time; otherwise, PoE permits replicas to roll back their state if necessary, which provides a cornerstone for the correctness of PoE. For agnostic signatures and linear communication, PoE applies a twin-path execution, which allows replicas to employ either message authentication codes (MACs) or threshold signatures (TSs) for signing, based on the size of the network. For example, when few replicas participate in consensus (up to 16), a single phase of all-to-all communication is inexpensive, and using MACs in such setups makes computation cheap; for larger setups, PoE employs TSs to achieve linear communication complexity. To avoid response aggregation, unlike aggregation schemes (e.g., in SBFT), PoE avoids additional communication between replicas by allowing each replica to respond directly to the client. Fig. 56 shows the normal-case algorithm of PoE.

Fig. 57. Proposals (blocks) pending in the block-tree before and after a commit [283].

60) DiemBFT - (feature): DiemBFT (also known as LibraBFT) is a production version of HotStuff intended to achieve the strong scalability and security properties required by Internet settings, proposed by the LibraBFT team in 2019 [283]. LibraBFT maintains safety against network asynchrony and even if, at any particular configuration epoch, a threshold of participants are Byzantine. It incorporates a round synchronization mechanism that allows commits despite faulty leaders. It also encapsulates the correct behavior of participants in a "tcb"-able module, allowing it to run within a secure hardware enclave that reduces the attack surface on participants. In addition, LibraBFT can reconfigure itself by embedding configuration-change commands in the sequence; a new configuration epoch may change everything from the validator set to the protocol itself.

In more detail, LibraBFT operates in a pipeline of rounds, in which a leader proposes a new block in each round. Validators send their votes to the leader of the next round, and when a quorum of votes is reached, the leader of the next round forms a quorum certificate (QC) and embeds it in the next proposal. This process continues until three uninterrupted leaders/rounds complete and the tail of a chain has three blocks with consecutive round numbers. Then the head of the newly formed "3-chain" of three consecutive rounds becomes committed, and the entire branch ending with the newly committed block extends the sequence of commits. Fig. 57 shows proposals/blocks pending in the block-tree before and after a commit; it illustrates how the blocks are chained. Essentially, there are two main components in the LibraBFT consensus protocol: a steady-state protocol and a PaceMaker (view-change) protocol. The steady-state protocol aims to make progress when the round leader is honest, while the PaceMaker advances the round number either due to lack of progress or because the current round has completed. In general, the LibraBFT consensus protocol has linear communication complexity per round under synchrony and honest leaders, due to the leader-to-all communication pattern and the use of a threshold signature scheme, and quadratic communication complexity for synchronization caused by asynchrony or bad leaders, due to the all-to-all multicast of timeout messages. However, during periods of asynchrony, the protocol has no liveness guarantees: leaders may always suffer from network delays, replicas keep multicasting timeout messages and advancing round numbers, and no block can be certified or committed. Interested readers can refer to the work [283] for more details.
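The "3-chain" commit rule used by LibraBFT can be sketched on a simplified linear chain of certified blocks (the real protocol operates on a block tree and also commits the whole preceding branch; the tuple encoding and `committed_blocks` helper here are illustrative assumptions):

```python
# Toy 3-chain commit rule: a block commits once it heads a chain of
# three certified blocks with consecutive round numbers.
def committed_blocks(chain):
    """chain: list of (round, block_id) pairs for certified blocks."""
    out = []
    for i in range(len(chain) - 2):
        r0, r1, r2 = chain[i][0], chain[i + 1][0], chain[i + 2][0]
        if r1 == r0 + 1 and r2 == r1 + 1:
            out.append(chain[i][1])
    return out

# Rounds 4..6 are consecutive, so block "b" (head of that 3-chain) commits;
# the round gap after 2 prevents "a" from committing via rounds 2, 4, 5.
assert committed_blocks([(1, "g"), (2, "a"), (4, "b"), (5, "c"), (6, "d")]) == ["b"]
```

The round gap is exactly why a pending block does not commit until three consecutive-round certifications land on top of it.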

61) Byzantine View Synchronization - (feature): Cogsworth is a Byzantine view synchronization algorithm with optimistically linear communication complexity and constant latency, proposed by Naor et al. in 2019 [96]. View-based consensus protocols typically have some advantages, e.g., each view has a designated leader that can drive a decision efficiently. A view change occurs as an interplay between a synchronizer, which implements a view synchronization algorithm, and the outer consensus solution. A view synchronization algorithm is required to eventually bring all honest nodes to execute the same view for a sufficiently long time, so that the outer consensus protocol can drive progress. However, existing approaches for Byzantine view synchronization incur quadratic communication (in n, the number of replicas), and a cascade of O(n) view changes may thus result in O(n³) communication complexity. Cogsworth is a leader-based view synchronization algorithm that reduces this communication complexity. Essentially, Cogsworth utilizes views that have an honest leader to relay messages instead of broadcasting them. When a node wishes to advance a view, it sends the message to the leader of the view instead of to all nodes. If the leader is honest, it gathers the messages from nodes and multicasts them, using a threshold signature, to the other nodes, incurring only linear communication cost. Cogsworth implements additional mechanisms to advance views despite faulty leaders. More precisely, if the leader of a view v is Byzantine, it may not help as a relay. In this case, the nodes time out and then try to enlist the leaders of subsequent views, one by one, up to view v + f + 1, to help with relaying. At least one of these f + 1 leaders is honest, and thus one of them will successfully relay the aggregate.

In general, the latency and communication complexity of Cogsworth depend on the number of actual failures and their types. For example, in the best case, the latency is constant and the communication is linear. When there exist t benign failures, the communication is linear and, in the worst case, O(t · n), while the latency is still expected to be constant (O(t · δ) in the worst case). Byzantine failures do not change the latency, but they can drive the communication to an expected O(n²) complexity, and in the worst case up to O(t · n²).

Another work on view-change protocols, targeting an asynchronous setting, was proposed by Gelashvili in 2021 [113]. The proposed work is a BFT SMR protocol that stays "prepared" for when the network goes bad. Essentially, when the network behaves well and consensus makes progress, it runs DiemBFT [283] (an implementation of HotStuff [284]), retaining its linear communication complexity. However, when there is a problem and the majority of nodes cannot reach the leader, it proactively runs an asynchronous view-change protocol that makes progress even under the strongest network adversary. This asynchronous view-change protocol, also called asynchronous fallback [285], has quadratic communication complexity and always makes progress even under asynchrony. The asynchronous fallback protocol can replace the PaceMaker protocol in DiemBFT to obtain a BFT protocol that has linear communication cost on synchronous paths, quadratic cost on asynchronous paths, and is always live.

62) HotStuff - (feature): HotStuff is a leader-based Byzantine fault-tolerant replication protocol for a partially synchronous model that achieves responsiveness, proposed by Yin et al. in 2019 [284]. It ensures that once network communication becomes synchronous, HotStuff enables a correct leader to drive the protocol to consensus at the pace of the actual network delay (the property of responsiveness), with a communication complexity that is linear in the number of replicas. The simplicity of HotStuff's design enables it to be further pipelined and simplified into a practical and concise protocol for building large-scale replication services. In general, a stable leader can drive a consensus decision in just two rounds of message exchange: the first phase guarantees proposal uniqueness through the formation of a quorum certificate (QC) consisting of (n − f) votes, and the second phase guarantees that the next leader can convince replicas to vote for a safe proposal. However, for a new leader-based protocol with view-change, this two-phase paradigm is far from simple: it is bug-prone [80] and incurs a significant communication penalty for even moderate system sizes, since it requires the new leader to relay information from (n − f) replicas, each reporting its own highest known QC. The total number of authenticators (e.g., digital signatures or message authentication codes) transmitted is rather high, e.g., O(n⁴) in PBFT and O(n³) in threshold-signature variants, where n is the number of participating replicas. HotStuff resolves this with a three-phase core, allowing a new leader to simply pick the highest QC it knows of. It introduces a second phase that allows replicas to "change their mind" after voting in that phase, without requiring a leader proof at all. This alleviates the above complexity and, at the same time, considerably simplifies the leader replacement protocol. Besides, by having almost containerized all the phases, it is very easy to pipeline HotStuff and to rotate leaders frequently.

In more detail, responsiveness requires that a non-faulty leader in a leader-based BFT protocol, once designated, can drive the protocol to consensus in a time depending only on the actual network delay, independent of any known upper-bound assumption on message transmission delays [257] [255]. In HotStuff, optimistic responsiveness requires responsiveness only in beneficial (e.g., common-case) circumstances, e.g., after Global Stabilization Time (GST) is reached. The crux of the difficulty is that there may exist an honest replica that holds the highest QC while the leader does not know about it; one can build scenarios where this prevents progress of the whole system. HotStuff achieves two main properties: linear view change and optimistic responsiveness. For the linear view change, after GST, any correct designated leader sends only O(n) authenticators to drive a consensus decision, including the case where a leader is replaced. Thus, even in the worst case of cascading leader failures, the communication cost to reach consensus after GST is O(n²). For optimistic responsiveness, after GST, any correct designated leader needs to wait only for the first (n − f) responses to guarantee that it can create a proposal that will make progress, again including the case where a leader is replaced. Notably, the cost for a new leader to drive the protocol to consensus is no greater than that for the current leader. Moreover, HotStuff supports frequent succession of leaders, which has been argued to be useful in a blockchain context to ensure chain quality [286]. In essence, HotStuff adds another phase to each view, a small price in latency in return for a considerably simpler leader replacement protocol; this exchange incurs only the actual network delay, which is typically far smaller than ∆ in practice. Also, the throughput is not affected, thanks to efficient pipelining schemes. Besides, in HotStuff, safety is specified via voting and commit rules over graphs of nodes, while liveness is encapsulated within a Pacemaker, clearly separated from the mechanisms needed for safety.

Regarding response latency, in HotStuff a leader of each view proposes a block, and a block is decided after three blocks of consecutive views are certified. Each view consists of two rounds of communication; thus, the latency lies in [6δ, 8δ], with an average of 7δ. Interested readers can refer to the work [284].
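HotStuff's linear view change, where a new leader simply picks the highest QC among (n − f) reports, can be sketched as follows (the report format and the `new_leader_high_qc` helper are illustrative assumptions):

```python
# Toy linear view change: each replica reports its highest known QC view;
# the new leader just takes the maximum, so only O(n) messages are needed.
def new_leader_high_qc(reports, n, f):
    """reports: {replica_id: highest_qc_view}; needs n - f reports."""
    if len(reports) < n - f:
        raise ValueError("not enough reports to proceed safely")
    return max(reports.values())

reports = {0: 7, 1: 9, 2: 7}  # replica 3 is silent (n = 4, f = 1)
assert new_leader_high_qc(reports, n=4, f=1) == 9
```

Because each replica sends one report and the leader needs no proof beyond the maximum, the view change costs O(n) authenticators, as described above.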

63) Sync HotStuff - (feature): Sync HotStuff is a simple and intuitive leader-based synchronous BFT solution, proposed by Abraham et al. in 2020 [261]. It can achieve consensus within a latency of 2∆ in steady state, where ∆ is a synchronous message delay upper bound. Sync HotStuff ensures safety in a weaker synchronous model in which the synchrony assumption does not have to hold for all replicas all the time. Even so, it has optimistic responsiveness: it advances at network speed when fewer than one-quarter of the replicas are not responding. By combining solutions from practical partially synchronous BFT protocols, Sync HotStuff has a two-phase leader-based structure, and it has been fully prototyped under the standard synchrony assumption. In general, synchronous BFT protocols have the advantage of tolerating up to one-half Byzantine faults, while asynchronous or partially synchronous protocols tolerate only one-third. However, most synchronous protocols suffer from lock-step execution, where replicas must start and end each round at the same time, as well as from impracticality and unsafety (e.g., being subject to network attacks on the timing assumption). In Sync HotStuff, each replica can commit after waiting for the maximum round-trip delay, unless it hears by that time an equivocating proposal signed by the leader. Sync HotStuff achieves several desirable features: 1) it tolerates up to one-half Byzantine replicas; 2) it does not require lock-step execution in steady state; 3) it can handle a weaker and more realistic synchrony model; and 4) it is prototyped and shown to offer practical performance. There are indeed use cases for it, e.g., single-datacenter replicated services and consortium blockchain applications.

In more detail, Sync HotStuff provides several advantages compared with other synchronous BFT protocols. The first is its near-optimal latency. Sync HotStuff has a simple steady-state protocol in which replicas merely wait for a single maximum round-trip delay before committing, and it does not have to be executed in a lock-step fashion: as long as replicas receive enough messages from the previous step, they can move to the next step without waiting for the conservative synchrony delay bound. This gives a latency of 2∆ + O(δ) in steady state, where ∆ denotes the known bound assumed on the maximal network transmission delay and δ denotes the actual network delay, which is much smaller than ∆. The intuition behind 2∆ is that replicas can be out of sync by ∆, so one ∆ is needed for lagging replicas to catch up, and another ∆ is needed for messages to be delivered. Even though O(δ) latency is impossible to guarantee under more than one-third faults, it can be achieved optimistically. The second is practical throughput. This is achieved by moving the synchronous waiting steps off the critical path: the only step in steady state that requires waiting for a conservative O(∆) time is the check for leader equivocation before committing, and it can be performed concurrently with the main logic, so replicas start working on the next block without waiting for the previous blocks to be committed. Even though there are other synchronous waiting steps in the view-change protocol, they occur infrequently.
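The 2∆ commit rule with the concurrent equivocation check can be sketched as follows (the timestamps, message shapes, and `decide` helper are illustrative assumptions, not the paper's exact pseudocode):

```python
# Toy Sync HotStuff-style commit rule: a replica commits a proposal
# 2 * DELTA after voting, unless it sees an equivocating proposal from the
# same leader for the same view within that window.
DELTA = 1.0  # assumed synchronous message delay bound (seconds)

def decide(vote_time, events, now):
    """events: list of (time, view, block) proposals seen from the leader;
    the first event is the proposal this replica voted for."""
    view, block = events[0][1], events[0][2]
    for t, v, b in events:
        if v == view and b != block and t <= vote_time + 2 * DELTA:
            return "abort"  # leader equivocated inside the commit window
    return "commit" if now >= vote_time + 2 * DELTA else "pending"

assert decide(0.0, [(0.0, 5, "B1")], now=2.5) == "commit"
assert decide(0.0, [(0.0, 5, "B1"), (1.5, 5, "B2")], now=2.5) == "abort"
```

The replica keeps processing later blocks while this 2∆ timer runs, which is the "off the critical path" throughput argument above.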
The third is safety despite some sluggish honest replicas. Sluggish means that a set of honest replicas is allowed to violate the message delay bound ∆ at any point in time: a message sent by or to a sluggish replica may take an arbitrarily long time, until the replica is prompt (i.e., timely) again. The authors prove that Sync HotStuff ensures safety as long as the combined number of sluggish plus Byzantine faults is less than one-half, which means that at any time a majority of replicas must be honest and prompt. The evaluation of the proposed prototype shows that Sync HotStuff can achieve a throughput of over 280 Kops/sec under typical network performance.

One consideration regarding Sync HotStuff is that, in each view, at least one block must be decided with synchronous latency in order to switch to the responsive path, so it is not responsive in essence. Moreover, Sync HotStuff cannot decide a block synchronously in a view once it switches to the responsive mode. Thus, Sync HotStuff cannot decide blocks in a view if the number of actual faults is f but the faulty parties first behave honestly until the protocol switches to the responsive mode and crash after that.

64) Fast-HotStuff - (feature): Fast-HotStuff is a two-round BFT protocol, based on HotStuff, that achieves responsiveness and an efficient view-change process comparable to a linear view-change in terms of performance, proposed by Jalalzai et al. in 2020 [287]. Compared to the three-round HotStuff, Fast-HotStuff, by re-establishing a two-round consensus, claims lower latency and robustness against performance attacks to which HotStuff is susceptible. In general, classic BFT protocols do not allow fork formation, while forking in HotStuff may occur and results in lower throughput and higher latency. For example, a malicious primary can use an old QC (Quorum Certificate: containing the parent block hash and (n − f) votes from replicas for the parent block) instead of the latest one to construct a new proposal. This forking attack on pipelined HotStuff is possible, and it is valid as long as the QC is no more than two views older than the latest one, according to HotStuff's voting rule; the previous, not-yet-committed block will thus be forked away. To circumvent these issues, Fast-HotStuff reshapes HotStuff to preserve and achieve several properties: low latency, responsiveness, scalability, and robustness against fork attacks. The first three properties are almost the same as those of HotStuff. By reducing a three-round communication latency to two rounds, a block in Fast-HotStuff can be committed with around 30% less delay. For robustness against fork attacks, a primary in Fast-HotStuff has to provide a proof that it has included the latest QC (a pointer to its parent block) in the proposed block, unlike in HotStuff. Given the proof of the latest/highest QC, a Byzantine primary cannot perform a forking attack, because it cannot use an older QC without violating the required proof. This proof also helps a replica achieve consensus within two rounds: it guarantees that if a replica committed a block at the end of the second round, eventually all other replicas will commit the same block at the same height. Thus, a replica does not need to wait for a maximum network delay to make sure that other replicas have also committed the same block.

More specifically, Fast-HotStuff operates in a series of views with monotonically increasing view numbers, and each view number is mapped to a unique dedicated primary known to all replicas. The optimistic commit/execution that achieves low latency is valid when one of two conditions is met: 1) the block proposed by the primary is built using the highQC (the highest known QC) held by the majority of honest replicas, or 2) it is built using a QC higher than the highQC held by a majority of replicas. Thus, the primary has to incorporate the proof of the highQC in every block it proposes, which can be verified by every replica. Interested readers can read the work [287] for more details.
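Fast-HotStuff's defense against forking, requiring the primary to prove that its proposal carries the highest QC reported by a quorum, can be sketched as follows (the report format and the `valid_proposal` helper are illustrative assumptions):

```python
# Toy Fast-HotStuff-style check: a proposal is accepted only if it carries
# a proof (here, the reported QC views) showing its QC is the highest.
def valid_proposal(proposal_qc_view, reported_qc_views, n, f):
    """reported_qc_views: highest QC views from n - f distinct replicas."""
    if len(reported_qc_views) < n - f:
        return False  # not enough reports to form a proof
    return proposal_qc_view >= max(reported_qc_views)

reports = [4, 6, 6]  # n = 4, f = 1
assert valid_proposal(6, reports, n=4, f=1)      # latest QC: accepted
assert not valid_proposal(4, reports, n=4, f=1)  # stale QC: forking attempt
```

A primary that tries to extend an old QC fails the check at every honest replica, which is the mechanism behind the fork-robustness property above.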

65) Pompe - (middleware): Pompe is a BFT SMR protocol that is guaranteed to order commands in a way that respects a natural extension of linearizability, proposed by Zhang et al. in 2020 [288]. The authors introduce Byzantine ordered consensus, a primitive that augments the correctness specification of BFT SMR to include specific guarantees on total orders, together with an architecture for BFT SMR that, by factoring ordering out of consensus, can enforce these guarantees and prevent Byzantine nodes from controlling ordering decisions (a Byzantine oligarchy). According to the authors' observation, the traditional specification of correctness for SMR is intrinsically incapable of characterizing what makes a total order "right" or "wrong". The authors intended to provide a primitive that allows the nodes implementing a replicated state machine to associate an ordering indicator with the commands they ultimately agree upon. Through these indicators, nodes can express how they would like these commands to be ordered with respect to one another. The correctness conditions for Byzantine ordered consensus specify, given a set of input pairs, the set of allowable total orders for these commands. Besides, the authors also introduce a general architecture for BFT protocols that factors ordering out of consensus, which cleanly separates the process of establishing the relative order of commands from the consensus steps necessary to add those ordered commands to the ledger.

Fig. 58. Steps in a round of the GeoBFT protocol [289].

Fig. 59. Representation of the normal-case algorithm of GeoBFT running on two clusters. Client ci, i ∈ {1, 2}, requests transaction Ti from its local cluster Ci. The primary Pci ∈ Ci replicates this transaction to all local replicas using PBFT. At the end of local replication, the primary can produce a cluster certificate for Ti. These are shared with other clusters via inter-cluster communication, after which all replicas in all clusters can execute Ti and Ci can inform ci [289].

66) GeoBFT - (middleware): GeoBFT is a geo-scale Byzantine fault-tolerant consensus protocol that achieves scalability by using topology-aware grouping of replicas into local clusters, proposed by Gupta et al. in 2020 [289]. In general, it is desirable that the underlying consensus protocol distinguishes between local and global communication. However, existing Byzantine consensus protocols are not designed to handle geo-scale deployments in which many replicas are spread across a geographically large area. GeoBFT solves this issue by parallelizing consensus at the local level and by minimizing the communication between clusters. A geo-scale-aware consensus protocol needs to be aware of the network topology, which can be achieved by clustering replicas in a region together and favoring communication within such clusters over global inter-cluster communication. Also, a geo-scale consensus needs to be decentralized, meaning that no single replica or cluster should be responsible for coordinating all consensus decisions, because a centralized design limits throughput to the outgoing bandwidth, and the latency, of this single replica or cluster.

In more detail, GeoBFT groups the replicas of a region into clusters, and each cluster makes consensus decisions independently. These consensus decisions are then shared with the other clusters via an optimistic low-cost communication protocol. By doing so, GeoBFT ensures that all replicas in all clusters learn the same sequence of consensus decisions. For example, for two clusters C1 and C2, each with n replicas, the optimistic communication protocol requires only n/3 messages to be sent from C1 to C2 when C1 needs to share its local consensus decision with C2, which helps throughput and scalability in a geo-scale deployment. In general, GeoBFT operates in rounds, and in each round every cluster is able to propose a single client request for execution. From a high-level perspective, each round consists of three steps, as shown in Fig. 58: local replication, global sharing, and ordering and execution. Fig. 59 shows a representation of the normal-case algorithm of GeoBFT running on two clusters, following these three steps. In the local replication step, each cluster chooses a single transaction of a local client and replicates it locally in a BFT manner, e.g., using PBFT. At the end of local replication, the adopted BFT protocol guarantees that each non-faulty replica can prove successful local replication via a commit certificate. In the global sharing step, each cluster shares the locally replicated transaction, along with its commit certificate, with all other clusters. This step utilizes an optimistic global sharing protocol to minimize inter-cluster communication: a global phase exchanges the locally replicated transactions, followed by a local phase in which clusters distribute any received transactions among all their local replicas. In case of failures, the global sharing protocol falls back to a remote view-change protocol. In the ordering and execution step, after global sharing, each replica in each cluster can deterministically order all these transactions and proceed with execution. Once they have executed, the replicas in each cluster inform only their local clients of the outcome of the execution of their transactions.

The authors provide proofs of the safety and liveness of GeoBFT. For safety, it achieves a unique sequence of consensus decisions among all replicas and ensures that clients can reliably detect when their transactions are executed. For liveness, whenever the network provides reliable communication, GeoBFT continues successful operation. The authors also deployed GeoBFT on their ResilientDB fabric [290] to evaluate its performance.

Besides GeoBFT, several other works address geo-replicated state machines for wide-area networks, e.g., Steward proposed by Amir et al. in 2008 [291], WHEAT proposed by Sousa et al. in 2015 [292], and AWARE proposed by Berger et al. in 2020 [293]. Steward is a hierarchical BFT protocol for geographically dispersed multi-site systems, which employs a hybrid algorithm that runs BFT agreement within each site and a lightweight, crash fault-tolerant protocol across sites. WHEAT is a variant of the BFT-SMaRt protocol optimized for geo-replicated environments, which employs a voting assignment scheme that, by using a few additional spare replicas, enables the system to make progress without needing to access a majority of replicas; its main innovation is the ability to decrease client latency by, counter-intuitively, adding more replicas to the system. AWARE is an adaptive wide-area replication system for fast and resilient Byzantine consensus, which employs automated and dynamic voting-weight tuning and a leader positioning scheme to support the emergence of fast quorums in the system; it utilizes a reliable self-monitoring process and provides a prediction model seeking to minimize the system's consensus latency. Interested readers on geo-replicated state machines can refer to these corresponding papers.

leader bottleneck and achieves a significant performance increase over sequential-leader systems. In general, the communication complexity of FNF-BFT is linear in the number of replicas during synchrony, and it provides guarantees in stable network conditions under truly Byzantine attacks. FNF-BFT uses a better Byzantine-resilient performance metric, which encapsulates the ratio between the best-case and worst-case throughput of a BFT protocol, to evaluate a BFT protocol's performance, and FNF-BFT can achieve Byzantine-resilient performance with a ratio of 16/27 while maintaining both safety and liveness. More specifically, FNF-BFT, as a multiple-leader BFT protocol, achieves the following three properties under stable network conditions: optimistic performance, Byzantine-resilient performance, and efficiency. Optimistic performance means that, after GST, the best-case throughput is Ω(n) times higher than the throughput of sequential-leader protocols; Byzantine-resilient performance means that, after GST, the worst-case throughput of the system is at least a constant fraction of its best-case throughput; and efficiency means that, after GST, the amortized authenticator complexity of reaching consensus is Θ(n). To achieve these three properties, FNF-BFT utilizes two key mechanisms. The first mechanism is to enable all replicas to continuously act as leaders in parallel to share the load of clients' requests; the other is that FNF-BFT does not replace leaders upon failure but configures each leader's load based on the leader's past performance. By combining these two mechanisms, FNF-BFT can guarantee a fair distribution of requests according to each replica's capacity, which in turn results in fast processing of requests. Besides, the authors also implemented an FNF-BFT prototype and compared its throughput with HotStuff's; the comparison shows that, as the number of replicas increases, FNF-BFT achieves a significant improvement in scalability and exhibits faster average performance.

68) Twins - (architecture): Twins is an approach for testing BFT systems, which requires only a thin network wrapper that delivers messages to/from both twins, proposed by Bano et al. in 2020 [295]. The key idea of Twins is that it emulates Byzantine behavior by running two (or generally up to k) instances of a node with the same identity. Each of the two instances, or twins, runs unmodified, correct code. To the tested system, the twins appear indistinguishable from a single node behaving in a "questionable" manner. The current version of Twins can generate several (not all) Byzantine behaviors, including equivocation, double voting, and losing internal state, while forgoing some other behaviors that are trivially rejected by honest nodes, such as producing semantically invalid messages. By this design, Twins can systematically generate Byzantine attack scenarios at scale, execute them in a controlled manner, and check for desired protocol properties. In Twins, instead of coding incorrect behaviors, faulty nodes run in two parallel universes, and both instances have the same credentials/signing keys and run autonomously. For instance, both instances can send messages in the same protocol round; however, these messages carry conflicting information. To the rest of the system, this twins behavior appears indistinguishable from equivocating behavior by a single node. Also, the authors performed tests on existing protocols, which were broken within fewer than a dozen protocol steps, showing that it is realistic for the Twins approach to expose problems.

More specifically, Twins is a "white glove" approach. It is neither a "black-box" approach, since it does modify the internal behavior of the tested system, nor a "white-box" approach, as it does not open internal code modules. In general, Twins minutely interacts with existing code to control message delivery and to schedule various coarse steps such as protocol rounds. It can be deployed in a real system as it uses the existing correct node code. Also, Twins can be implemented by thinly wrapping twin nodes with a network scheduler acting as an adversary, easily keeping up with an evolving software project. Implementing Twins typically consists of two principal parts: a test executor and a test generator. Fig. 60 shows a high-level design.

Fig. 60. Twins high-level design [295].

The test executor deploys a network configuration where some nodes have twins, and it hides the twins behind a thin multiplexing wrapper, while the test generator enumerates scenarios by varying the number of nodes and the message delivery schedule, then feeds the scenarios to the test executor. For more details, interested readers can refer to the work [295].

An almost identical work, called Gemini, is presented at [296] by the same authors as Twins. It follows almost all descriptions and scenarios of Twins but uses different terminology for some functional parts. For example, Twins has two principal parts named the test executor and the test generator, while in Gemini these two parts are named the scenario executor and the scenario generator.

69) GRANDPA - (feature): GRANDPA is a protocol to solve the Byzantine finality gadget problem, proposed by Stewart and Kokoris-Kogia in 2020 [297]. In general, classic BFT protocols forfeit liveness under an asynchronous assumption to preserve safety, while most deployed blockchain protocols forfeit safety in order to remain live. To achieve the best of both, an abstraction is required in the form of a finality gadget, which allows transactions to always optimistically commit but informs the clients that these transactions might be unsafe. For example, a blockchain can execute transactions optimistically and only commit them after they have been sufficiently and provably audited. The authors propose a formal model of the finality gadget abstraction and prove that it is impossible to solve deterministically in a fully asynchronous environment. The authors also present a partially synchronous protocol that is currently used to secure some major blockchains, so that the protocol designer can decouple safety and liveness to speed up recovery from failures. For a blockchain protocol, separating the liveness of the consensus protocol from the finality of blocks has several advantages. 1) Consensus is not tied to the liveness of the chain, which can have optimistic execution, so that the chain can grow before it is certain that blocks are valid. Later on, the chain can finalize blocks when it is sure they are correct, e.g., when all verification information is available. 2) The protocol can make some (even unsafe) progress when the network is unstable, which enables fast recovery when the network heals. Similarly, the chain can also make some progress even when finalization is slow. 3) A finality gadget can be deployed gradually

and light clients can choose to consult it, or follow the longest-chain rule and ignore it, enabling heterogeneity among light clients. More specifically, the authors show that it is impossible to satisfy some features of a finality gadget with a deterministic asynchronous protocol, and they introduce the GRANDPA finality gadget that works in a partially synchronous network model. GRANDPA works in rounds; each round has a set of 3f + 1 eligible voters to tolerate up to f faulty voters. Each round consists of a double-echo protocol after which every party waits to detect whether it can finalize a block in this round. In this case, the block does not need to be an immediate ancestor of the last finalized block; it might be far ahead of the last finalized block. If the round is unsuccessful, the parties simply move on to the next round with a new primary. When a good primary is selected, the oracle is consistent (e.g., returning the same value to all honest parties), and the network is synchronous (after GST), a new block will be finalized, and it will transitively finalize all its ancestors. For more details on the implementation and proofs, interested readers can refer to the work [297].

70) SFT - (feature): Strengthened Fault Tolerance (SFT) is a BFT SMR under partial synchrony that provides gradually increased resilience guarantees during an optimistic period (e.g., with a synchronous network and a small number of Byzantine faults), proposed by Xiang et al. in 2021 [298]. The committed blocks can tolerate more than one-third (up to two-thirds) corruptions even after the optimistic period. This means the safety assurance of a decision can improve over time, as in Nakamoto consensus, and the blockchain can obtain resilience against a higher number of Byzantine failures as it gathers additional confirmations (e.g., votes for the block). Compared with the similar solution of Flexible BFT (FBFT) [275] (which requires a quadratic message complexity), SFT maintains a linear message complexity over BFT SMR protocols and requires only a marginal bookkeeping overhead. The term strengthened in SFT reflects the fact that stronger resilience guarantees can be obtained when the conditions are optimistic, e.g., when the network is synchronous and the number of Byzantine faults is small during the optimistic period. Blocks committed with a higher resilience are much safer even if the number of Byzantine faults later exceeds the one-third threshold. As with the k-deep rule in Nakamoto consensus, with SFT, clients can choose to wait longer for valuable blocks to obtain higher resilience, trading off latency for safety. Also, SFT induces only a marginal bookkeeping overhead and retains a linear message complexity, while the adaptability of FBFT incurs a quadratic message complexity overhead. Besides, the authors implement SFT over DiemBFT (also called LibraBFT) [283], called SFT-DiemBFT, whose implementation adds only a moderate latency increase to obtain higher assurance guarantees (up to two-thirds) under various conditions. Interested readers can refer to the work [298] for more details.

V. ESSENTIAL COMPONENTS OF BFT

BFT replication protocols make it possible to design systems that are resilient against arbitrary faults, which can potentially be applied to many crucial use cases, such as blockchains, datacenters, and various decentralized applications. In general, the guarantees on fault tolerance come at some cost in complexity (in both communication and computation), which inherently makes them hard to incorporate into practical BFT replication systems. We discuss the state-of-the-art BFT protocols in Section IV, and each protocol has its advantages and disadvantages. Most of these solutions are designed to improve performance, robustness, availability, or resource efficiency under various assumptions. It is thus necessary to demystify the basic building blocks of BFT protocols from a high-level perspective. There are several works discussing this kind of demystification. For example, Correia et al. [16] analyze different BFT consensus algorithms from the theoretical level; Berger and Reiser [17] present work on improving scalability when applying Byzantine consensus to blockchains and distributed ledgers; Platania et al. [299] compare BFT protocols based on the role of clients; Distler [6] presents generic BFT state-machine replication from a systems perspective. Based on these works, we present some essential components for the construction of BFT replication protocols, which can help system designers design their own application-specific BFT consensus protocols. Roughly speaking, these components include clients, an agreement protocol, an execution stage, a checkpointing scheme, and a replica recovery scheme. Note that it is not necessary for a BFT consensus protocol to include all components, and some protocols may combine two or more components. Also, they may relate to other techniques, such as a checkpointing scheme that relates to garbage collection in SBFT, or a replica recovery scheme that relates to the view change in PBFT.

Most existing BFT protocols consist of three clear tasks, especially in the case of blockchain applications, namely client handling, agreement, and execution. According to the classification of Platania et al. [299], these three procedures are the "server"-side tasks, in which clients only have the right to access the replicated system (e.g., submitting requests) without actively participating in consensus procedures. Roughly speaking, in the client handling process, replicas need to receive and disseminate requests from clients and, once a request is processed and a decision is made, return the responses to the clients. This task typically involves a significant amount of communication and computational overhead (e.g., processing authenticated messages) between clients and replicas, and the overhead of this task generally depends on the number of active clients and the workload of requests. Thus, preventing Sybil attacks and DoS attacks is important in this task. In the agreement process, replicas are required to reach a consensus on the decisions on requests, and typically this task involves only communication among replicas. The communication complexity of this process depends on the number of replicas in the system, and, due to the existence of Byzantine replicas, it may require some period of synchronization to successfully transmit messages among non-faulty replicas. In the execution process, the replicas, once agreed on the requests, are required to process these requests based on their current state. This process involves updating the local state information of replicas. The complexity and overhead of this process typically depend on the specific application, e.g., appending blocks to the blockchain. The organization of and dependencies among these three tasks [6] result in different system architectures, e.g., a one-tier architecture (each replica performing client handling, agreement, and execution all together), a two-tier architecture (each replica performing client handling and agreement together, and performing execution separately), and a three-tier architecture (each replica performing the three tasks separately).

Most existing classic BFT protocols include the above three tasks in order to reach an agreement. Besides that, there are some new architectures that follow different procedures, such as the execute-verify principle proposed by Kapritsos et al. [300]. The execute-verify principle makes it possible for the agreement process and execution process to be handled in a different order (e.g., out-of-order execution). Compared with the "agreement - execution" order in traditional BFT protocols, the execute-verify principle allows the replicated system to perform

the application outputs and replicated state updates separately, which helps them to establish an order on the requests later by utilizing consensus protocols. By doing so, responsiveness to clients' requests can be improved. Also, this principle typically targets permissioned blockchains, which require access control on replicas to participate in the consensus process. In the following sections, we discuss some generic components according to the classification of Distler [6].

A. Client

The client component provides an interface to access the replicated systems. Though we can literally consider clients as users, they are not the actual users who issue requests to the replicated systems. Besides simply relaying requests and replies, the clients typically take several roles to connect the user processes and the replicated systems: communication with replicas, result verification, handling of replica inconsistencies, and provision of optimization hints.

One of the primary tasks of clients is to communicate with replicas, and this communication is typically based on the "client-server" model, e.g., following request and reply messages over the network. Also, the client may take some responsibility for handling exceptions. For example, if the network is not reliable, this may involve the re-transmission of messages or the re-establishment of broken connections. By taking these responsibilities, clients can ensure that the users' requests will eventually be executed even if they connect to faulty replicas (e.g., as agents of users). There are two major approaches for clients relaying requests. One approach is request broadcast, in which clients directly submit the corresponding request to all replicas of the replicated system. This approach can ensure that non-faulty replicas learn the requests and can thus detect situations in which the leader tries to suppress a client's requests. The other approach is the use of a contact replica, e.g., the leader in leader-based BFT protocols, in which a client only submits the requests to a single contact replica. In general, this approach is optimistic and can work well in fault-free cases. However, it typically requires an additional timeout setup to handle the case of contact replica failure; when the timeout is reached, the client can choose to multicast the requests. For the task of communication with replicas, there exist some representative works describing this functionality, e.g., Troxy [254] and SBFT [65].

Due to the existence of Byzantine replicas, clients must have some ability to handle abnormal replies, so that the client can verify replies and handle inconsistencies among replicas. In a Byzantine setting, a client is typically required to wait for replies from different replicas and then compare those results before handing them to users. The number of matching replies depends on the consensus protocol running on the replicated system. Typically, it will require at least f + 1 matching replies from different replicas, where f is the maximal number of faulty replicas a consensus protocol can tolerate. In the case of equivocation by replicas, e.g., if a faulty leader makes conflicting proposals to correct follower replicas that then diverge, clients must have the ability to detect this. For example, Zyzzyva includes the order information in the executed requests and then replies to the clients with that information, which allows the clients to detect inconsistencies in the execution results. Also, upon detection of replica inconsistencies, clients must provide some means of resolution. For example, the Abstract protocol [82] relies on an active switching protocol, which allows the system to make a transition from one state (e.g., view) to another by relying on the execution history information. Clients are responsible for collecting and distributing this history information to help the replicas make decisions, and, in general, a single active correct client is sufficient to complete this switching process. Besides, some protocols rely on clients to provide optimizations. For example, Archer [301] exploits clients to assist dynamically configured geo-replicated systems in improving responsiveness and reducing the end-to-end response time; in Archer, clients exploit the location of the leader to offer the best performance for the current workload conditions.

B. Agreement

The primary goal of the agreement stage is to establish a total order on client requests globally and stably, which can further be used by the execution stage to execute these ordered requests among the replicas of the replicated systems. Simply put, this stage is used to reach agreement on the finally executed requests. Assuming the requests submitted by clients are correct, this stage is about establishing a "timeline" (e.g., a timestamp) on different requests from the same client or different clients. According to the definitions of an unstable timeline (originating from a single source, maybe a Byzantine replica) and a stable timeline (originating from a group of nodes, with guaranteed correctness) in the work [6], the agreement typically adopts three building blocks: a nondeterministic merge, an agreed merge, and a deterministic merge.

Simply, a nondeterministic merge takes a set of unstable timelines and merges them into a combined unstable timeline; an agreed merge transforms a set of unstable timelines into a stable timeline; and a deterministic merge combines a set of stable timelines as input and outputs a single stable timeline. In general, the nondeterministic merge happens locally, so that there is no global agreement on this merge and each replica may still have different timelines on requests. The agreed merge is used to guarantee that all correct replicas have a consistent view of the interleaving of the unstable local inputs or requests from clients, and this building block typically requires multiple phases of network communication. The deterministic merge is guaranteed by the existence of a deterministic algorithm that is known to all correct replicas. Thus, the agreement stage can be considered as a chain of merge operations, and different protocols may take different merge steps to reach an agreement. For example, PBFT [4] takes only one step via an agreed merge over all requests by the primary; Prime [188] takes a two-step process: it first performs nondeterministic merges globally and then performs an agreed merge globally; and COP [229] also takes a two-step process: it first performs an agreed merge locally, then performs the deterministic merge globally. Here the terms "locally" and "globally" mean that the replicas consider a local set of requests and all requests globally, respectively.

When clients submit requests to the replicated systems, it does not mean that the consensus must perform the agreement stage on the client requests themselves. In general, the inputs to an agreement stage can take three forms: full requests, request hashes, and progress vectors. Full requests as inputs are the straightforward way, in which each replica operates on the client requests directly. By doing so, multiple requests may be batched into one consensus instance in the replicated systems [302]. When replicas directly exchange full requests during agreement, this may significantly increase message sizes, which causes some performance overhead. Most BFT systems perform agreement on request hashes, as proposed by Castro et al. in 2002 [15], which is more efficient. However, this approach requires additional processing to make a decision on client requests. For example, replicas typically do not approve a leader's request proposal solely based on

request hashes, and they must wait until all requests have been gathered from different sources before actually processing these requests. A faulty client may send different versions of a request to the leader and the other replicas separately, which causes an inconsistency in client requests. When applying request hashes only, the correct replicas may not have the ability to validate the proposal from a correct leader. Although replicas can retroactively fetch the full version of the proposed requests from the leader, this involves a more complex fault-handling mechanism and takes valuable time from making agreement progress. This becomes worse for batch processing. For instance, a missed or diverged request from a faulty client may prevent an entire batch of requests from being confirmed, which definitely affects the responsiveness to correct requests from correct clients. In the approach with progress vectors as input, the consensus protocol does not operate directly on actual requests or their hashes; instead, the replicas insert the requests into replica-local timelines, and the consensus protocol operates directly on these timelines. Since these replica timelines are reliably distributed, they can be considered as a progress vector which contains basic information on the sequence of requests. Replicas can easily compute the progress vectors in a deterministic way, and the agreement stage can later deterministically translate them back to the client requests. In this approach, there is no relation between the size of consensus messages and the number of requests from a client in one round of consensus. Prime [188] utilizes this approach. Besides, Distler [6] describes the use of application outputs as the input to the agreement stage. This mainly takes advantage of the execute-verify principle [300], in which the consensus protocol is performed on the executed application outputs.

C. Execution

The execution stage is used to process client requests and produce the corresponding replies to clients, which is application-specific. By separating the execution stage from the (application-independent) agreement stage, the overall BFT protocol can not only be more modularized in design (e.g., taking advantage of pipelining), but the execution stage can also be prevented from becoming a performance bottleneck. In general, doing so provides two advantages: separating execution replicas and relaxing the execution order.

Separating execution replicas can provide several benefits. For example, as stated by Yin et al. [176], it separates the responsibilities for different operations and performs these operations in dedicated processes, which can help to speed up the whole BFT consensus. Their work performs the agreement stage and execution stage separately, which literally offers a privacy firewall and an isolation entity to prevent the actual execution (e.g., modifying internal states) from faulty replicas. Some other works adopt this methodology, e.g., UpRight [198], VM-FIT [303], Spare [304], and ZZ [215]. Also, separating execution replicas can significantly simplify the parallelization of the agreement process, as in the performance optimization works COP [229] and Omada [305]. Another key benefit is that it can reduce the number of application instances required for fault tolerance. As Yin et al. [176] pointed out, only 2f + 1 replicas are required for the execution of requests to tolerate up to f faulty replicas, instead of the at least 3f + 1 replicas needed for the agreement on requests. If the replicated system is application-targeted, in which the application is the most complex part, the lower number of execution replicas can significantly reduce the system overheads.

However, separating execution replicas is not a one-shot solution, and it leads to some special requirements and obstacles, e.g., state-transfer requirements, result availability limitations, and performance implications. Being separated from the agreement stage, some replicas may advance at different speeds, and this phenomenon can cause up to f execution replicas to be far behind in the process of execution. Also, the execution stage typically requires at least f + 1 replicas in order to reach a consistent checkpoint. Thus, the state-transfer protocol between the agreement and execution stages must provide information which enables replicas to verify the correctness of a checkpoint. This requires some special mechanisms to guarantee the stability of checkpoints. For example, a proof of a checkpoint should not only contain the information of the checkpoint itself, but also enable correct replicas to resort to the checkpoint if the proof is valid, so that all other replicas can reach a consistent state. In general, this can be done by assembling a checkpoint certificate and constructing a hash of the checkpointed state with the help of a digital signature [176]. Also, result availability must be guaranteed, so that all correct clients have the ability to access these checkpoints and restore their internal state to a consistent state. To guarantee result availability, one possible and simple solution is to extend checkpoints to attach the latest reply to each client, as illustrated in the solutions [216] [305]. In these solutions, the checkpoint plays a key role in proceeding with consensus, in that it must contain all information needed to help a correct replica go back to a correct state. This definitely increases the complexity of the checkpoint. Besides, separating execution replicas eliminates the information shared with the agreement stage, and the replicas in the execution stage require extra effort to prepare additional information. This introduces some overhead and thus lowers the performance.

Another advantage of separating the execution stage is to enable a relaxed execution order. In a classic BFT replicated system (e.g., performing the agreement and execution stages on the same replica), replicas need to sequentially process requests, which may significantly limit performance. To address this issue, many efforts have been made to relax the order in which the replicas execute requests. Roughly speaking, these efforts can be classified into two categories: non-speculative execution and speculative execution. In general, non-speculative execution can guarantee a consistent state when applied by some correct replica, while speculative execution may result in inconsistencies which further require a correct replica to roll back its state.

In general, non-speculative executions focus on the order of state-object accesses. One way to achieve this is that all accesses to a state object are protected by some locking scheme. For example, OptSCORE [306] relies on a customized scheduler to carry out a deterministic ordered execution of requests, by sequentializing the execution process only in some critical sections of the operations and enabling parallel execution of the rest of the operations. However, this scheme needs some modifications of the application code to fit the non-speculative execution, e.g., introducing lock/unlock primitives around the accesses to state objects. Another example of non-speculative execution is CBASE [307], in which a parallelizer component is introduced between the agreement stage and the execution stage on each replica. It only works in the case that both the agreement stage and the execution stage are performed on the same replica. The parallelizer is used to analyze the sequence of requests agreed upon in the agreement stage and to decide which requests can be executed safely in parallel without conflicts among these parallel executions. In general, the parallelizer must be equipped with some application-specific knowledge, e.g., structure,

characteristics, and possible dependencies of requests [6]. On the other hand, speculative execution can simplify the reasoning procedure on the correctness of these executions and targets improved execution efficiency and performance. In general, this approach may introduce temporary inconsistencies in the replicas' internal execution states, even among correct replicas. However, these inconsistent states will finally be resolved before replying to the client, and the system never releases these internal states to clients or other external processes. Also, this kind of execution relies on some rollback scheme to ensure a replica has the ability to reset its state to a consistent state. Thus, the request execution should not rely on any irreversible actions, e.g., remote function calls to a process outside of the system, or sending any control signals or internal states to critical infrastructure (as this may involve some wrong actions before obtaining a correct final result). As long as these internal states are erasable and kept within the replicated system, an application can adopt this speculative execution to speed up processing and improve responsiveness. The well-known example of this approach is Zyzzyva [59].

D. Checkpointing

With the increase of the requests submitted by clients, the size of the confirmed sequence of requests on each replica will, at some point, exceed the amount of memory available to proceed with further requests. Thus, replicas must resort to some form of garbage collection scheme to remove the state information on already-processed entries [6]. This assists replicas in making progress on request processing. However, the lack of additional mechanisms around this cleaning poses a problem for asynchronous BFT systems. For instance, some correct replicas may fall far behind other replicas, even with a fully functioning garbage collection scheme; also, some correct replicas may not be restored to a correct state, as the synchronization messages may fail to identify the right point to restore. This can stall progress indefinitely. To resolve this issue, correct replicas can periodically generate checkpoints of all their internal states and only garbage-collect data that are already in a stable state (e.g., with a threshold number of replica agreements or endorsements). This also enables some slow replicas (e.g., those fallen behind the others) to catch up and update their states according to some well-established checkpoints. A checkpointing scheme not only helps replicas recover from faults, but also allows new replicas to join a running system with little effort. In general, a checkpointing scheme involves what kinds of contents should be checkpointed, how to represent the checkpoints, as well as the corresponding creation process.

In general, a checkpoint must contain all information that reflects the replica state at a specific time, e.g., the proof on the execution of requests, the application state, and the latest reply to each client; the latter enables a correct replica to reply to a client request in a case where its previous tries failed (e.g., due to unreliable links). All of the above three states are useful for constructing a checkpoint of a replica's state. However, depending on the BFT replicated system, the checkpoint of a replica does not necessarily contain all three parts. Instead, a practical checkpoint protocol often aims to create a "good enough" state without endangering the guarantees provided by the overall system.

The representation of checkpoints typically takes two different forms: state checkpoints [15] [172] [308] [305] and hybrid checkpoints [198]. State checkpoints highly rely on the sequence number at which the checkpoints are created to reflect the corresponding version. Hybrid checkpoints allow a replica to generate temporary checkpoints frequently with reference to infrequent replica snapshots (e.g., a stable state), in which a replica only needs to keep the latest infrequent snapshot and apply the requests that have been executed between these two checkpoints. In general, the creation of checkpoints involves substantial overhead. Based on these checkpoint representations, there are several checkpoint creation approaches to trade off complexity and efficiency, namely, stop and copy, copy on write, checkpoint rotation, and deterministic fuzzy checkpoints [6]. The first two approaches are straightforward, as their names suggest. The stop-and-copy approach suspends the execution of requests, creates the checkpoint, and then resumes request execution [198], while the copy-on-write approach does not require a stop, to minimize the downtime overhead, and only processes the parts actually changed since the last checkpoint [15] [172]. The checkpoint rotation approach simply rotates the responsibility for generating the checkpoint among replicas to mitigate the overall server disruption. It does not require a replica to produce a snapshot for every checkpoint interval and for every sequence number; instead, each replica only performs the snapshot for some sequence numbers. However, this approach may introduce some vulnerabilities, e.g., a manipulated state transfer by malicious replicas [308]. The fuzzy checkpoint approach is more flexible: it allows correct replicas to independently choose the time point at which to start capturing the data for the next checkpoint, with a guarantee of generating identical checkpoints [308]. A replica in this approach can start to process the state objects even before obtaining an agreed sequence number on requests. It offers slow replicas the chance to start early in an effort to complete the checkpoint so that they can catch up to a similar point as the fast replicas.

E. Recovery

Byzantine replicas can behave arbitrarily.
For BFT systems, specific request). In general, the information contained in a checkpoint without proper measurements, the effects of faults can be accumulated should be indistinguishable in order for correct replicas to recover to over time (especially under a strong adversary), which may eventually a state. The checkpointed information typicaly consists of three parts: lead to a violation of the predefined tolerance threshold and break application state, protocol state, and replies. The application state in a the system. And, a replica recovery mechanism is required to recover checkpoint provides information on the objects representing the state replicas from faults. In general, a good recovery scheme allows a of the replica’s application instance, which helps replicas to execute replicated system to successfully recover solely based on its own the application-specific requests during the execution stage. This kind available information, so that it can guarantee that the number of of state information is typically application-specific with its internal faulty replicas does not exceed the pre-designed threshold. From structure being independent of its consensus protocol. The protocol the literature, there are two major approaches to perform recovery, state in a checkpoint consists of the application-independent state namely, reactive recovery and proactive recovery. Reactive recovery and the state required by the BFT replication protocol, in order to happens when some faults are detected, while proactive recovery offer certain consistency guarantees. This kind of state information happens at some pre-established points in time (e.g., periodical is protocol-specific. The replies in a checkpoint should contain the time). Also, a practical replicated system can combine both ap- information replied on processed client requests, which allows a proaches [309].
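To make the checkpoint-and-garbage-collection cycle of Section D concrete, the following minimal sketch (all class, field, and function names are invented for illustration) shows a replica that takes a state checkpoint at a fixed sequence-number interval and prunes its log once a quorum of matching checkpoint digests makes the checkpoint stable:

```python
import hashlib
import json

CHECKPOINT_INTERVAL = 100  # take a checkpoint every 100 sequence numbers

class Replica:
    def __init__(self, f=1):
        # tolerate f Byzantine faults; 2f + 1 matching digests make a checkpoint stable
        self.app_state = {}      # application state
        self.log = {}            # protocol state: sequence number -> request
        self.last_replies = {}   # last reply sent to each client
        self.quorum = 2 * f + 1
        self.stable_seq = 0

    def execute(self, seq, client, request):
        key, value = request
        self.app_state[key] = value
        self.log[seq] = request
        self.last_replies[client] = ("OK", seq)
        if seq % CHECKPOINT_INTERVAL == 0:
            return self.make_checkpoint(seq)  # digest would be broadcast to peers
        return None

    def make_checkpoint(self, seq):
        # a state checkpoint: digest over application state and replies, tagged by seq
        blob = json.dumps([self.app_state, self.last_replies], sort_keys=True)
        return (seq, hashlib.sha256(blob.encode()).hexdigest())

    def on_checkpoint_quorum(self, seq, digests):
        # once 2f + 1 matching digests arrive, the checkpoint is stable and
        # earlier log entries can be garbage-collected
        if len(digests) >= self.quorum and len(set(digests)) == 1:
            self.stable_seq = seq
            self.log = {s: r for s, r in self.log.items() if s > seq}
```

Running one replica through 100 requests produces a checkpoint at sequence number 100; feeding back a quorum of matching digests prunes the whole log, which is exactly the garbage-collection step described above.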

In general, reactive recovery requires a BFT system to have mechanisms for detecting faulty replica behaviors, e.g., by analyzing the outputs of a replica and comparing them with the outputs of other replicas. This may involve the participation of clients to show some evidence to replicas that faults have happened. However, involving clients in a fault detection process may introduce some level of inaccuracy, as clients typically do not know what happened inside a replicated system; e.g., clients cannot distinguish a faulty replica from a correct but slow replica when not enough replies are received. In general, the reactive recovery approach can be applied in two scenarios, depending on the observed system status [190] [309]: the faulty behavior has been provably detected (e.g., an incorrect result was provided), or it has only been suspected (e.g., a timeout happened). A proactive recovery scheme can effectively mask faults without affecting the interaction with clients, as this approach is performed on a regular basis instead of relying on evidence from a fault detection scheme [15]. Since proactive recovery is performed on a regular basis, it may incur some unnecessary overhead by recovering correct replicas. A recovery approach typically is application-specific, and there is no universal recovery scheme [310]. Both approaches have their advantages and shortcomings; interested readers can refer to the corresponding papers.

Practically, a recovery scheme should have the ability to bring a replica from a faulty state to a fault-free state. A typical recovery process consists of three steps: system reboot, replica rejoining the group, and state update [15]. After a faulty replica is detected, restarting the replica's machine can erase existing faulty processes and some potential effects of corrupted instructions. In general, after rebooting, a replica's logic can be considered correct, while its state may still be corrupted, missing, or out of date, which requires some reconfiguration. When rejoining a replica group, a replica is required to re-establish the connections with both clients and other replicas in the system. The state update step is used to update its state with other replicas once it is again able to communicate. The goal of this step is to enable a replica to once again actively participate in the replication protocol as a correct replica. Besides these steps, depending on practical application scenarios, other aspects may need consideration, e.g., replica group reconfiguration [266].

This section discussed some basic components to build a BFT protocol in a modular form. System designers can combine different approaches in each module to construct a customized BFT protocol. There are many similar literature works on modular design, and detailed information can be found in the corresponding works, e.g., [16] [17] [299] [6].

VI. BLOCKCHAIN CONSENSUS ALGORITHMS

Blockchain, as a distributed and decentralized database, is a chain of blocks in which each block contains a list of well-ordered transactions. The process to reach an agreement among the participants of a blockchain network is complex and thus important. Consensus plays a key role in most distributed systems, including blockchain. Most existing protocols are designed to solve consensus problems under various assumptions. With recent advances in blockchains, various consensus protocols have been proposed and studied to reach an agreement among participating nodes so that they can be applied to different application scenarios. Byzantine fault tolerance is one such protocol family which can be used to resolve consensus issues in the blockchain. BFT consensus protocols can be considered as the classic consensus to establish an agreement even under Byzantine replicas. Instead of discussing these classic Byzantine-based consensus protocols (as shown in Section IV), this section discusses some important consensus algorithms which specifically target blockchain networks. We can collectively call them "PoX" (Proof-of-X), where "X" is some measurable criterion that can be used as a proof in blockchain networks.

A. PoX

Many existing PoX protocols have been proposed for various applications. We briefly discuss their general design principles.

1) Proof-of-Work (PoW): The most well-known consensus protocol for blockchain is the proof-of-work (PoW) protocol, which is widely adopted in Bitcoin [5] and many other cryptocurrencies. Due to the popularity of Bitcoin, the PoW protocol is also commonly called Nakamoto consensus; however, PoW itself was proposed much earlier than Bitcoin. The key idea of PoW was first proposed by Dwork and Naor in 1992 as a mechanism to handle spam mails [311], in which email senders must first work on a mathematical puzzle to show that they performed some computational work before sending an email. Later, another system for fighting spam, Hashcash, was independently proposed by Back in 1997 [312]. In the Hashcash scheme, the email sender is required to solve a computational puzzle by applying an SHA hash to the email content (including its recipient's address and the email itself) under some special requirements, e.g., the hash must contain at least a certain number of leading zeros. Due to the pre-image resistance of hashing algorithms, the task can only be done by repeated tries to find some qualified random nonce that meets the leading-zero requirement. The key idea of Nakamoto consensus literally comes from Hashcash, replacing the SHA-1 hash with two successive SHA-2 hashes and demanding that a qualified hash be below some target integer value t. This makes the difficulty of the puzzle adjustable, e.g., decreasing the target value t increases the amount of "work" needed to compute a qualified hash. Thus, by changing these pre-defined conditions, the network can be very scalable and flexible to any condition. In Bitcoin's PoW, the computing nodes are typically called miners, and the process of computing a valid hash is referred to as mining. In Bitcoin, miners calculate hashes of candidate blocks of transactions as a kind of proof; when one node finds the target value, it broadcasts the block to the whole network, and all other nodes must confirm the correctness of the hash value. Once the block is validated and confirmed, the mining node is rewarded with new coins, and all nodes append this new block to their own chain [18].

Although the PoW algorithm provides high security and decentralization, the work of mining and validating blocks wastes a lot of energy, and its speed and success rate highly depend on the computational abilities of the hardware performing the hash operations. Solving the puzzle typically takes some time, which is not suitable for large and fast-growing networks that require high throughput. Besides, it is also subject to various attacks, e.g., chain forks, double-spending attacks, 51% attacks, and mining centralization.
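The Hashcash-style puzzle underlying PoW can be illustrated with a short sketch. Following the description above, it applies two successive SHA-256 hashes and accepts a nonce whose digest, read as an integer, falls below a target; the header bytes and target here are arbitrary toy values, not Bitcoin's actual block format:

```python
import hashlib

def mine(header: bytes, target: int) -> int:
    """Search nonces until the double-SHA-256 digest, interpreted as an
    integer, falls below the target (a smaller target means more work)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(
            hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        ).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(header: bytes, nonce: int, target: int) -> bool:
    # verification is a single recomputation -- cheap, unlike mining
    digest = hashlib.sha256(
        hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    ).digest()
    return int.from_bytes(digest, "big") < target

# An easy toy target: the digest must start with 8 zero bits.
target = 1 << 248
nonce = mine(b"block-header", target)
assert verify(b"block-header", nonce, target)
```

Lowering `target` makes `mine` take exponentially longer while `verify` stays constant-time, which is exactly the asymmetry the adjustable-difficulty discussion above relies on.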

2) Proof-of-Stake (PoS): Compared to PoW, PoS protocols replace wasteful computations with useful "work" derived from alternative, commonly accessible resources. The key idea of a PoS algorithm is that the creator of a block is chosen by various combined features, such as the amount of stake and the age of that stake. In general, PoS algorithms can provide some level of scalability. This method does not require high computational resources for validating any proof, but it strongly depends on the nodes that have the most stake, and the blockchain will somewhat tend toward centralization. More specifically, a PoS algorithm randomly elects "leaders" from the qualified stakeholders (such as the ones with minimal stakes in the qualified group); the elected leaders then have a chance to append their generated blocks to the blockchain. In general, PoS has a candidate pool which contains all qualified stakeholders (e.g., those whose amount of stake is larger than a threshold value) [1]. The leader election process is a critical part of PoS, and it can be either public or private. In a public leader election, the result of the election process is visible to all the participating nodes [313] [314], whereas in a private leader election, the final result can be verified by all other participants simply by resorting to publicly available information [315]. In general, the private leader election process can resist DoS attacks, since it requires candidates to privately check whether they were elected to propose the next block before actually sending out their blocks, which makes it too late to launch a DoS attack. There are several recent PoS-based systems which have been proved secure, e.g., Ouroboros [314], Ouroboros Praos [315], and Snow-White [313]. Ouroboros belongs to the class of public leader elections, in which a random cluster of qualified stakeholders together engage in a multiparty coin-tossing protocol to generate a random seed. This random seed is then used with a pseudo-random function to elect the leader among the qualified stakeholders. To guarantee fairness, Ouroboros distributes rewards (e.g., from transaction fees) among all participants no matter whether they were elected as a leader or not. The participants in Ouroboros Praos independently determine whether they have been elected, so this system belongs to the class of private leader elections. This is done with the help of a verifiable random function (VRF) that generates a random value. If the generated random value is below some predefined value, the corresponding participant is selected as a leader who has a chance to broadcast its block (with proof of qualification) to the network. Snow-White uses a Bitcoin-like mechanism to elect a leader, in which the participants are required to compute a pre-image hash below some pre-defined target. Each participant is only allowed to compute one hash per time step, with the help of a random oracle. Also, it considers the amount of stake of the participants. One of the challenges in PoS is that it is hard to trace and keep up to date the actual stake changes of each stakeholder.

Although PoS provides some benefits compared with PoW, it is subject to some new types of attacks, e.g., the nothing-at-stake attack, grinding attack, and long-range attack [316]. The nothing-at-stake attack means a node has nothing at stake while misbehaving, so the node is not afraid of losing anything. As stated in [18], it does not require much effort to extend a chain; when forking happens, some rational miners may engage in mining on each qualified forking chain so that they increase the chance of getting their blocks into the final chain. One straightforward way to mitigate this kind of attack is to introduce penalty schemes that punish attempts to fork. The grinding attack means a miner re-creates a block multiple times until it is likely that the miner can create a second block shortly afterward. In general, this kind of attack can be limited by utilizing an unbiasable and unpredictable random number generator, so that a miner cannot influence the selection of the next leader. The long-range attack means that an attacker may actively bribe miners to hand over their old private keys, and typically there is little or no cost for bribed miners to sell their old keys to an attacker. Even though these keys are old, they indeed had some value in the past, and the attacker may use them to re-mine previous blocks, which has some potential to re-write the whole block history of the blockchain. In general, this kind of attack can be limited by using some central checkpointing scheme; for instance, some entities can actively declare old blocks to be in the final state if sufficient time has passed.

We should mention that PoS is not just one protocol but a collection of protocols. There exist many PoS alternatives, which require miners to hold or prove the ownership of assets.

Delegated-Proof-of-Stake (DPoS): DPoS is based on PoS, in which the nodes select representatives through voting to validate blocks [317]. The number of representatives is limited, which makes it possible to organize the network more effectively, and each representative can determine the adequate time to publish each block. In general, it provides some important features, such as scalability, energy efficiency, and low-cost transactions. However, this method introduces a level of centralization; thus DPoS is more suitable for private blockchains.

Proof-of-Burn (PoB): In PoB, miners have to burn some of their already owned cryptocurrency to get rewards, where burning means that a user is required to send some cryptocurrency to an "eater address", e.g., a verifiably unspendable address [318]. The cryptocurrency sent to that address is unrecoverable and no one can spend it again, so it is called burnt and is out of circulation. Slimcoin [318] implemented this approach in 2014, burning Bitcoin as a mining method. The more coins a user burns, the more chances it has to find the next block.

Proof-of-Deposit (PoD): In PoD, miners are required to "lock" a certain amount of assets, and these locked assets cannot be spent during the mining process. In general, the locked assets in PoD are a kind of proof that coins have been deposited into an account, which typically is used in finance sectors. For example, Tendermint [238] utilizes a similar mechanism, in which a miner's voting power is proportional to the amount of assets being locked.

Proof-of-Importance (PoI): In PoI, each node is assigned an importance score which influences the node's chance of getting a small financial reward in exchange for adding users' transactions to the network. The NEM project [319] utilizes this scheme to solve the incentive issues in PoS.

3) Proof-of-Elapsed-Time (PoET): PoET is based on trusted hardware (e.g., the enclave in Intel SGX) to provide proof of these "efforts" [320]. PoET consensus was developed in 2016 by Intel. It provides an advanced tool for solving the computational problem of "random leader election". In general, it requires the participants to wait some time in their enclaves, and the one with the shortest wait time wins and is elected as the leader. The proof typically is provided by the enclave in the form of a tamper-resistant proof, called an attestation. This attestation, along with a new block, can be used to attest two things: (1) the node indeed drew the shortest wait time, and (2) it indeed waited the designated waiting time before broadcasting its block. Thus, the PoET algorithm is a lottery-like system. Although, like PoW, in PoET the participants must be equipped with some type of hardware processor, it does not require processors to perform cryptographic hash operations, and it uses less power. Also, PoET allows the miner's processor to snooze or sleep and switch to another task for a particular period, which increases its efficiency.
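The lottery flavor of PoET can be sketched in a few lines. Here the enclave's trusted random wait time is merely modeled by hashing a node identifier with a round seed; all names are illustrative, and a real deployment relies on SGX attestation rather than this stand-in:

```python
import hashlib

def wait_time(node: str, seed: bytes) -> int:
    # stand-in for the random wait time drawn inside the trusted enclave
    return int.from_bytes(hashlib.sha256(seed + node.encode()).digest()[:8], "big")

def poet_winner(nodes, seed: bytes) -> str:
    # the participant with the shortest wait time wins the round; in real
    # PoET the winner also ships an enclave attestation as its proof
    return min(nodes, key=lambda n: wait_time(n, seed))

nodes = ["n1", "n2", "n3", "n4"]
winner = poet_winner(nodes, b"round-42")
# any node can recompute and check that the claimed wait time is minimal
assert wait_time(winner, b"round-42") == min(wait_time(n, b"round-42") for n in nodes)
```

Because every draw is deterministic given the seed, the sketch also shows why the attestation matters: without trusted hardware, nothing stops a node from simply claiming a shorter wait.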

4) Proof-of-Capacity (PoC): In PoC, miners use the free space on their hard disks to mine coins. The success of voting on new blocks is mainly based on their capacity, i.e., the actual amount of disk space that they assign to the mining process. There are two main application scenarios currently based on the PoC scheme: PermaCoin [321] and SpaceMint [322]. In PermaCoin, participants must store pieces of a large file, and the more a participant stores, the more chance it has to add a new block. In general, there exists one authoritative entity to sign and distribute this large file. Typically, the file can be recovered by the participating nodes in case the owner of the file shuts down or fails. SpaceMint utilizes a consensus scheme based on a non-interactive variant of PoC (namely, proof-of-space) to show the available space. One of the key problems with PoC protocols is that they are subject to centralization. This is because the participants are required to outsource some file storage provided by an external authority.

B. Discussion on PoX

In a PoX-based blockchain, forks may happen when two individual participants successfully issue distinct blocks (on different transaction lists) based on the same previous block. Extra schemes or rules are needed to resolve this equivocation and reach an agreement among all participants. For example, in Bitcoin, forks (e.g., from a double-spending attack) are settled by the "longest chain" rule, in which a longer chain represents the greater PoW effort that has been invested into that chain. Based on the longest chain rule, as time passes, a final chain will be selected from the forks, and thus the case of double-spending can be settled. Also, most PoX schemes require that more than 51% of 'X' be held by honest participants; otherwise, the security of the whole network would be compromised. Another common problem in PoX is that it is easy to form a mining pool to aggregate resources and share rewards. However, this subverts the feature of decentralization and introduces some level of censorship if the pool manager behaves maliciously. This also requires some extra techniques to mitigate it. For example, SmartPool [323] tries to achieve a decentralized mining pool management scheme with the help of Ethereum smart contracts, in which the smart contract actively takes the role of the pool manager in a traditional scheme.

Technically speaking, PoX is not a consensus protocol per se; its mechanism is often used for determining the membership of participants in a Sybil-attack-resistant fashion. For historical reasons, e.g., Bitcoin used PoW as a "consensus" protocol to build a blockchain, we loosely categorize such schemes as consensus protocols. For example, in a hybrid consensus (e.g., ByzCoin [241] and Hybrid Consensus [248]), the actual consensus protocol (the algorithm for agreement on shared history) is separable from and orthogonal to the membership Sybil-resistance scheme (e.g., PoW). Another example is the sharding scheme for blockchain [1]. Typically, a sharding scheme uses a Byzantine replicated protocol as its consensus, with the help of PoX. Roughly speaking, both protocols have different tasks in an overall sharding scheme: PoX is typically used for committee formation, establishing the committee members and their corresponding identities, while BFT is used for the intra-committee consensus, which is run within a committee to form the blocks.

C. PoW vs. BFT Blockchain

Both PoX and BFT can be used as components of a consensus to achieve replication consistency in distributed systems. A generic scheme to handle consistency provides two main principles [324]: a) Non-equivocation (e.g., preventing the equivocating behavior of leaders so that each time only one proposal comes out) and b) Proposal-safety (e.g., replicas should achieve a consistent state among all honest replicas). PoX and BFT achieve these two replication principles in different ways, which results in BFT having instant finality and PoX not having it. In general, a BFT-based blockchain offers good performance for a small number of replicas, while when that number extends to a large scale, e.g., hundreds of replicas, BFT protocols are often subject to scalability issues. In contrast, PoX-based blockchains offer good node scalability, but with poor performance. In this part, we use PoW as a representative of PoX and compare it with BFT consensus. Following the basic comparison of PoW and BFT consensus in [19], we extend it with some new features. Table II provides a high-level comparison of both PoW and BFT when applied as blockchain protocols. These properties are not completely exhaustive, but they are enough to represent both the BFT-based and PoW-based blockchain families. In general, given the inherent features of both BFT and PoW, it is not clear which technique is optimal for blockchain in the many use cases where the number of participating nodes ranges from a few tens to a few thousands [19].

Both PoX and BFT have their advantages and disadvantages, and we can combine PoX with BFT in a hybrid solution. One target of such hybrid solutions is to combine their advantages and overcome their deficiencies. For example, PoX-based blockchain replication often suffers from a long-term deficiency in which there exists a risk that a block will not be committed. This can raise many concerns and risks, e.g., an adversary may later obtain enough resources (e.g., computational power) and redo all previous transactions. BFT protocols can help to resolve these issues. Actually, one of the most important advantages that BFT protocols bring to the blockchain is "instant finality", a property that a PoX protocol does not achieve. That means, once a block is successfully committed, that block is finalized and cannot be revoked. However, one big issue is that a BFT replicated system works well only for a small number of replicas; when scaling to a large scale (e.g., thousands of replicas), it may come with some critical challenges [324].

Abraham and Malkhi [324] present several approaches to combine PoX and BFT techniques to overcome each other's challenges. The first approach works like Ethereum Casper [325] by setting up a group of trusted validators. The validators then validate blocks later (e.g., after a certain amount of time or a certain block depth) by performing a BFT protocol, and only validated blocks can be viewed as formally committed. The second approach is to select a rotating committee by using PoX, and the selected committee makes the final decision to append the block to the chain. Several works in the literature utilize this approach, such as Bitcoin-NG [244], ByzCoin [241], and Hybrid Consensus [248]. A third approach, as shown in Solidus [326], constructs a blockchain directly by using the customized consensus and without involving any longest chain rule. That means, the new block is generated solely via the underlying consensus to append blocks to the chain. Also, there exist some randomized sampling schemes to select a sub-committee (e.g., the work on scalable leader election [327]). However, such schemes require hiding the committee selection process against a targeted attack (e.g., on the committee members). Algorand [121] utilizes both PoS and BFT algorithms and provides a VRF to secure the committee selection process, which can mitigate the targeted vulnerability of committee members.
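The longest-chain fork-choice rule from the discussion above can be captured in a toy model where chains are lists of block identifiers and length stands in for accumulated work:

```python
def best_chain(chains):
    # The longest chain embodies the most accumulated PoW effort, so
    # honest nodes adopt it; ties are broken in favor of the first seen.
    return max(chains, key=len)

main = ["G", "A1", "A2", "A3"]   # chain extending the genesis block "G"
fork = ["G", "B1", "B2"]         # competing fork from the same genesis
assert best_chain([fork, main]) is main
```

In a real client, `len` would be replaced by the chain's total difficulty, but the selection logic, and the reason a deep double-spend fork is eventually abandoned, is the same.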

TABLE II. HIGH-LEVEL COMPARISON BETWEEN POW AND BFT PROTOCOLS AS BLOCKCHAIN CONSENSUS [19].

| Property | PoW Consensus | BFT Consensus |
|---|---|---|
| Node identity management | Open, fully decentralized | Permissioned, knowing neighbors' IDs |
| Consensus finality | No (longest chain rule) | Yes |
| Scalability (# of nodes) | Excellent (thousands of nodes) | Limited (varies from tens to hundreds) |
| Scalability (# of clients) | Excellent (thousands of clients) | Excellent (thousands of clients) |
| Performance (throughput) | Limited (chain forks) | Excellent (tens of thousands of tx/sec) |
| Performance (latency) | High latency (multi-block confirmations) | Excellent (matches network latency) |
| Tolerated power of an adversary | ≤ 25% computing power | ≤ 33% voting power |
| Network synchrony assumptions | Physical clock timestamps (for block validity) | None for consensus safety (synchrony needed for liveness) |
| Correctness proofs | No | Yes |
| Parallel execution | No | Yes |
| Colluding possibility | Easily form centralized pools | Hard |
| Common attacks | Chain forks (double-spending, Sybil attacks) | Byzantine nodes (arbitrary behavior) |
| Privacy-preserving | Anonymous identity | Hard to achieve |
| Common application scenarios | Cryptocurrencies | Industrial use cases |
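The "≤ 33% voting power" row in Table II comes from the classic BFT resilience bound n ≥ 3f + 1; a tiny helper (the function name is ours) makes the arithmetic explicit:

```python
def bft_bounds(n: int):
    """For n replicas, the classic bound tolerates f Byzantine faults
    with n >= 3f + 1; quorums of size 2f + 1 guarantee that any two
    quorums intersect in at least one correct replica."""
    f = (n - 1) // 3
    return f, 2 * f + 1

# 4 replicas tolerate 1 fault with quorums of 3;
# 100 replicas tolerate 33 faults with quorums of 67.
assert bft_bounds(4) == (1, 3)
assert bft_bounds(100) == (33, 67)
```

This is why the table lists BFT's adversary tolerance as roughly one third of the voting power, versus the one-quarter computing-power bound commonly cited for PoW (e.g., under selfish mining).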

VII. CHALLENGES ON APPLYING BFT TO BLOCKCHAIN

Blockchain as a distributed and decentralized ledger technology provides several key properties, such as decentralization, immutability, transparency, and trustworthiness. However, this technology is still in its infancy stage, and it faces multiple challenges and problems to overcome.

A. Blockchain Challenges

Typically, the robustness and strength of a blockchain highly depend on the consensus algorithm that is used to reach an agreement among participants. This section lists some critical blockchain challenges in general, which in turn may affect the design of its consensus protocol.

1) Vulnerability: In general, cybersecurity attacks can theoretically threaten almost all types of consensus algorithms. There are also other special types of attacks and vulnerabilities for blockchain protocols, e.g., the double-spending attack, the 51% attack, and the Sybil attack. These kinds of attacks can specifically be designed for both PoX and BFT protocols.

a) Double-spending attack: A double-spending attack occurs when a user tries to spend the same assets or resources twice or more, although they should be spent only once in practice. This can happen when an attacker creates a transaction and includes it in a block; after some time, the attacker constructs a conflicting transaction (using the same assets as the first transaction) and pushes this conflicting transaction into a new forked block. This attempt tries to revert the transaction previously made. That means the attacker already obtains the result of the first transaction before the majority declares that the second transaction is invalid. The attacker then tries to extend the forked branch of the blockchain until the forked chain accepts the conflicting transaction as a correct one [328] [329]. Various research efforts have attempted to mitigate the double-spending attack; however, this attack cannot be completely prevented and it can occur at any time [330].

b) 51% attack: In a blockchain network, a minimum of 51% of the nodes is required to approve and validate any transaction. This type of attack was first exploited in Bitcoin's PoW blockchain network. Theoretically, the 51% attack is not avoidable [331], which means it may not be possible to prevent it completely. When an attacker has the ability to control and manage more than 50% of the resources in the blockchain system, it can perform some malicious activities, such as double-spending or preventing other nodes' progress. In general, the attacker does not need to always hold more than 50% of the resources of the network; it can temporarily use these resources (e.g., via renting or bribing other nodes) to launch an attack. Practically, when we say 51% attack, it may specifically target a PoX consensus protocol. BFT consensus protocols are also subject to this kind of attack, e.g., in the case that more than 1/2 of the replicas are Byzantine.

c) Sybil attack: In a Sybil attack, the attacker tries to create a number of fraudulent or fake identities, which are used to affect the correct functioning of a network. These identities appear to be unique nodes that are in fact controlled by the attacker, and are used to gain some voting power (if behaving benignly) or broadcast some fake messages (if behaving maliciously). For example, in a generic BFT protocol, we typically assume all replicas are the same and only put some constraints on the ratio of faulty replicas in the total number of replicas. Thus, it is easy to launch a Sybil attack. A successful Sybil attack can grant the attacker disproportionate control over the network, or surround an honest node and try to influence the information reaching it, which further influences the blockchain. Sybil attacks are typically hard to identify and prevent. For blockchain, there are some counter-approaches to prevent them, e.g., 1) increasing the cost of creating a node, 2) requiring some type of trust, and 3) giving unequal power to the identities. Each approach has its own advantages and disadvantages, and there is no "one-fit-all" solution.
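The interplay between confirmation depth and attacker resources in the double-spending and 51% attacks discussed above can be made concrete with the classic gambler's-ruin analysis from the Bitcoin whitepaper. The sketch below is illustrative only (the function name and parameters are our own); it estimates the probability that an attacker controlling a fraction q of the hash power ever overtakes an honest chain that is z blocks ahead:

```python
import math

def double_spend_success(q: float, z: int) -> float:
    """Probability that an attacker with hash-power fraction q ever
    overtakes an honest chain that is z blocks ahead (Nakamoto's
    gambler's-ruin analysis from the Bitcoin whitepaper)."""
    p = 1.0 - q                      # honest fraction
    if q >= p:
        return 1.0                   # a majority attacker eventually wins
    lam = z * (q / p)                # expected attacker progress
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob
```

For q = 0.1, six confirmations already push the success probability below 0.1%, while any q >= 0.5 succeeds eventually, which is the essence of the 51% attack.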

For example, for increasing the cost of creating a node, it is a challenge to find the ideal cost for identity creation without restricting normal nodes from joining the network. The approach of requiring some type of trust may involve a centralized scheme, which violates the decentralized design principle of blockchain. And the approach of giving unequal power to the identities may make the system a meritocracy instead of a pure democracy and may not be interesting for new users [332].

2) Scalability: The key features of the blockchain (e.g., decentralization and immutability) require that every full node store a full copy of the blockchain; however, this comes at a cost of scalability. The scalability issue in blockchain limits its wide usage in large-scale networks. Typically, scalability can be evaluated by the throughput (e.g., measured by transactions per second) against the number of nodes and the number of concurrent workloads [333] [334]. In the current design, many blockchain systems still suffer from poor throughput. Scaling blockchain has become an active research area, for example, via increased block size [335] and sharding techniques [1]. Blockchain scalability is still an open research area, and many different initiatives and efforts in recent research aim at improving blockchain scalability, from side chains to sharding techniques.

In general, these efforts to address the scalability problems can be roughly divided into two categories: storage optimization of blockchain and redesigning the blockchain structure [336]. In storage optimization, old transactions are deleted from the network, as it is not necessary for network nodes to store all transactions to validate a transaction, and the recent ones have more value [337]. The work on Chainsplitter [338] belongs to this category. A lightweight client can also benefit from this scheme, without involving large communication to transmit these old transactions. For redesigning the blockchain, the blockchain can adopt different structures, such as side chains and sharding, to parallelize the processing of transactions. In general, parallelism can handle the scalability issues in blockchain, but its security may be weakened. Thus, it is often necessary to evaluate the trade-off between scalability and security.

3) Privacy Leakage: In general, it is hard to guarantee transactional privacy in a public blockchain, as all transactions must be publicly accessible and visible. Similar to security, blockchain technologies have some mechanisms for preserving a certain degree of privacy for transactions recorded in the blockchain (e.g., via anonymous identity technology). However, the privacy-preserving technologies adopted in current blockchain systems are not robust enough. For example, attackers can track the user's IP address [339], and a privacy breach can occur by drawing inferences based on a graph analysis of the network nodes with which a user transacts [340] [341]. A better solution for preserving privacy in blockchain would be a form of decentralized record-keeping that is completely obfuscated and anonymous by design. Several techniques can be used to mitigate privacy issues in blockchain, such as ring signatures [342] and address mixing [343] [344]; however, when applying these techniques directly to different domains, they are also subject to other critical issues (e.g., resource-constraint issues in performing complex computations on IIoT devices).

B. Integrating BFT into Blockchain

State-machine replication with BFT brings many advantages for both distributed and decentralized systems. BFT protocols can naturally be integrated into blockchain as its consensus protocol, taking advantage of instant finality. Compared with PoX-based consensus, once a block under BFT consensus is committed, it can never be revoked, which is different from probabilistic finality (e.g., where resistance to reversal relies on how deep a block is buried in the chain). And modern BFT SMR variants have already achieved very low latency and high throughput, especially in the common and failure-free cases. Although BFT protocols have become more mature and efficient, there still exist some challenges when applying them to blockchain scenarios [18] [345] [324]. For example, the permissionless model of PoX allows them to naturally fit a much larger set of replicas, while scaling BFT-based solutions (e.g., to hundreds or even thousands of replicas) raises some new challenges. Instead of providing a detailed and quantified analysis of BFT protocols, this section provides some discussion on integrating BFT into the blockchain, from a high-level perspective.

1) Permission Management: BFT protocols are by nature subject to scalability issues, and many recent scalable BFT-based blockchain protocols have been proposed under different assumptions, such as RSCoin, OmniLedger, RapidChain, and Chainspace. Essentially, these protocols employ traditional Byzantine consensus protocols for scalability with the help of sharding technologies [1]. Although these protocols achieve high throughput and low latency, they are inherently in 'closed' settings, in which replicas typically have stable and authenticated channels for communication, and only authorized members can participate in the construction of the blockchain. The consensus protocols in their prototypes cannot accommodate open participation of replicas and are more vulnerable to Sybil attacks. Also, some scalable protocols make synchrony assumptions on message transmission to achieve safety and liveness. Some newer BFT protocols, e.g., HoneyBadgerBFT, try to include a randomized consensus algorithm to achieve these features in an asynchronous scenario. They still do not resolve the need for a 'closed' group with strict permission management, and this prevents them from being used in a permissionless blockchain (like the case of PoW consensus). Also, the communication and computational overheads in these randomized BFT protocols are much higher than those of deterministic ones. Byzantine consensus under open group participation is still an open research problem.

2) Identity Management: For some BFT protocols (e.g., those without digital signatures), it is possible that malicious members create spoofed messages to fake messages generated from correct members and try to bias the consensus process. In general, BFT protocols apply permission and identity management to prevent this from happening. For example, a typical BFT replication system assumes point-to-point and authenticated channels between all replicas, with some mechanisms to track committee members and their keys (i.e., public keys). However, tracking membership and key distribution in dynamic permissionless committees is a challenging task, which makes it difficult to adopt in an open blockchain. Although there exist some naive solutions, e.g., all replicas periodically broadcast their identities to the entire network, this typically results in O(n²) communication complexity. Some approaches [230] provide a special committee to offer directory services to new committee members; however, this static committee will undermine decentralization, and a decentralized discovery committee will suffer heavy communication complexity. Thus, adopting a BFT consensus protocol into the blockchain requires resolving the issues in identity management.
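The shard-security concern raised under permission management above can be quantified with a simple hypergeometric tail: if a committee of size m is sampled uniformly from n nodes, of which b are Byzantine, how likely is the shard to exceed the classic < m/3 fault bound? The helper below is our own illustrative sketch, not taken from any of the cited protocols:

```python
from math import comb

def shard_failure_prob(n: int, b: int, m: int) -> float:
    """Probability that a uniformly sampled shard of size m (out of n
    nodes, b of them Byzantine) contains more than floor(m/3) Byzantine
    nodes, i.e., exceeds the classic BFT fault bound for that shard."""
    tolerated = m // 3                      # shard tolerates < m/3 faults
    total = comb(n, m)
    bad = sum(comb(b, k) * comb(n - b, m - k)
              for k in range(tolerated + 1, min(b, m) + 1))
    return bad / total
```

With n = 1000 nodes and a quarter of them Byzantine, growing the shard size drives this failure probability down sharply, which is why epoch randomness together with sufficiently large shards can guarantee a "good majority" per shard with high probability.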

3) Primitive Management: Most existing mature blockchain projects are still based on traditional BFT protocols, e.g., the BFTSmart library, with message complexity O(n²) (where n is the number of participating nodes). Practically, using some hardware or cryptographic primitives can effectively reduce this message complexity. For example, ByzCoin leverages a scalable collective signing scheme and communication trees to reduce its common-case consensus to O(n). Some consensus protocols leverage trusted platform modules, e.g., the enclave of Intel SGX, to create secure hardware execution environments. This essentially provides a secure execution environment for ordering transactions, which can further reduce the number of communicated messages without compromising safety and liveness. Although these primitives can effectively improve communication efficiency, they put some extra burdens and costs on replicas. For example, some replicas are lightweight, and not all replicas can afford these novel and expensive primitives.

VIII. DISCUSSION AND FUTURE DIRECTIONS

This section presents some discussion on applying BFT protocols to blockchains, and provides some future directions.

A. Choices on Paxos vs. BFT

Paxos is a well-known consensus protocol, which can achieve agreement under crash failures [52]. Originally, it was proposed to circumvent the FLP impossibility. In general, Paxos can forgo progress under temporary asynchrony, while when the system goes back to synchrony, it continues to work and keeps the system consistent. By forgoing progress, Paxos keeps the system consistent during asynchronous periods. Also, it can tolerate c crash faults out of 2c + 1 replicas. There are three types of replicas in a classic Paxos protocol, namely, proposers, acceptors, and learners. They run in a sequence of ballots, which is different from the rounds in BFT protocols. Compared with BFT, Paxos is much simpler: the replicas do not have to go through rounds, and piggybacking more than one round and role at once efficiently eliminates most of the communication overhead. The key idea behind Paxos is that the replicas do not perform operations arbitrarily, which means they are either honest or crashed. Also, there exist several derivatives of Paxos, including Multi-Paxos [346] and Fast Paxos [136]. Even though these derivatives can achieve the same fault tolerance rate (i.e., c out of 2c + 1), they are in general more complex. For example, Multi-Paxos protocols require a repeated application of Paxos, and this increases the complexity of the design. Besides, there are many Paxos-based projects for blockchain applications, e.g., Raft and Tendermint. Raft consensus is based on Paxos; its basic idea is that nodes collectively select a leader and the rest of the nodes become followers. The leader is used to maintain the consistent state, e.g., via state transition logs, across its followers.

Crash faults typically are easy to detect and tolerate, while Byzantine faults may behave arbitrarily in ways indistinguishable from correct replicas. For blockchain applications, Byzantine replicas have more power to bias the consensus protocol, and maintaining consensus is vital to a blockchain network [347]. From this perspective, it is better to apply BFT consensus to the blockchain. However, for some specific replicated systems, such as datacenters, it may be good to apply Paxos-based consensus protocols, due to their simplicity.

In general, classic Paxos (or, more generally, CFT) and BFT explicitly model machine faults only, and they can be combined with an orthogonal network fault model (i.e., the synchronous model and the asynchronous model). Thus, the scope can be roughly classified into four categories [10]: synchronous CFT [11] [25], asynchronous CFT [25] [52] [12], synchronous BFT [20] [13] [30], and asynchronous BFT [15] [14]. Depending on the blockchain application, system designers can choose the right consensus protocol. Besides, there exist some hybrid fault models, e.g., XFT [10], Byzantine Paxos [210], and heterogeneous Paxos [348], to handle both CFT and BFT.

B. Hybrid Fault Models

The Byzantine fault model makes consensus protocols inherently difficult to develop. Typically, a BFT system may assume a powerful adversary or harsh network conditions, or even combine both, which comes with a cost in complexity and overhead to design a well-replicated system [6]. However, some designers have observed that it is not worth designing such Byzantine replicated systems for some secure and reliable systems, e.g., the use cases of datacenters [310] [234]. And some recent works have moved to hybrid fault models [349] with weaker guarantees, e.g., Byzantine replicas only accounting for a very small portion of all faulty replicas, to achieve a practical implementation. There are several literature works on these hybrid fault models, e.g., UpRight [198], VFT [234], and XFT [10]. UpRight separates crash faults from incorrect behaviors, and each faulty type has an upper bound. Its replicated system has fewer replicas, as not all faulty replicas would perform incorrect behavior. VFT further weakens this hybrid model, aiming to minimize the number of replicas in the system, with the assumption that there exists a well-organized communication graph among replicas. However, if this assumption does not hold, the system may develop unsafe or unavailable conditions. Similarly, XFT also relies on a hybrid fault model, whose faults include crash faults, incorrect behaviors, and partitioned replicas. In general, these protocols can work well under some relatively secure and predictable environments.

Under hybrid fault models, trust plays an important role in getting replicated systems to work well. In general, a trusted system equips a small trusted computing base [350], which makes it possible to identify incorrectness. A malicious replica may have the ability to operate on untrusted components, but does not have the ability to control trusted components. With the advances of modern processors, it is better to implement the trusted parts in dedicated hardware modules, such as the trusted platform module (TPM) [351] [81], Intel's SGX [352], and ARM's TrustZone [353], to provide trusted execution environments. Also, there are some solutions to establish trusted parts in software, e.g., via a proxy [354], a multicast ordering service [355] [16], or a virtualization layer [304] [356] [303]. In general, trusted components can ensure replicas are recovered even if they are compromised [15]; they can also be used to prevent a faulty leader from successfully performing equivocation. Thus, trusted components, either in the form of hardware or software, offer some level of trustworthiness under the hybrid fault models, which can help replicated systems reach a consensus with fewer required replicas. Also, this hybrid fault model is more practical in some application scenarios, such as datacenters and permissioned blockchain systems.

C. Liveness in Consensus

A BFT consensus protocol typically makes progress in a sequence of views, each with a designated leader to drive the whole consensus process. Liveness is one of the two key properties (w.r.t. safety and liveness) that consensus wants to achieve, which is used to ensure that a transaction sent to all honest validators will eventually be executed.
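A common way for such view-based protocols to cope with uncertain network delays, used by PBFT-style view changes, is to double the view timeout after every failed view: eventually the timeout exceeds the real (but unknown) message delay, and a correct leader gets enough time to decide. A minimal sketch, with names and values that are purely illustrative:

```python
def views_until_progress(base_timeout: float, actual_delay: float) -> int:
    """Number of failed views before the doubling view timeout first
    exceeds the real message delay, letting a correct leader decide."""
    views, timeout = 0, base_timeout
    while timeout <= actual_delay:
        timeout *= 2   # exponential back-off on each view change
        views += 1
    return views
```

With a 1-second base timeout and a true delay of 10 seconds, four view changes suffice; the cost of not knowing the delay bound grows only logarithmically in the ratio between the two.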

Historical reasons have made researchers pay more attention to safety and less attention to liveness; e.g., PBFT provides an explicit safety proof while its liveness is treated via a non-trivial scheme. However, achieving the liveness of BFT consensus is no less challenging than its safety [357]. The FLP impossibility result makes it impossible to ensure both safety and liveness when there exist faulty replicas under asynchronous network settings. Most existing BFT consensus schemes target guaranteeing safety in all scenarios (e.g., both synchronous and asynchronous settings), while liveness can only be ensured during some synchronous periods. One key observation to achieve liveness is to make all correct replicas stay in a view with a correct leader, so that the leader can have enough time to make a decision. In this process, the view leader plays a key role in making the decision. However, if a decision is not eventually reached, e.g., due to a timeout or a malicious leader, the correct followers together switch to the next view leader and continue the consensus process.

Theoretically, these consensus protocols can achieve liveness with the assumption of some unknown Global Stabilization Time (GST). For example, after some GST period, the network may go into a period of synchrony, e.g., with a bounded but unknown constant message delay. However, most existing work claiming a liveness guarantee fails to provide a concrete value (e.g., latency) for this bound to make a decision. Thus, in practice, as the communication is asynchronous, the participants cannot return at the exact same time in all processes. This makes the system subject to attacks. For example, if some correct processes are still at the beginning of their round while an adversary observes the results of other participants for the same round, then the adversary can prevent progress among the correct processes by controlling messages between correct processes and by sending specific values to them. Even if a correct process invokes some correct results before the Byzantine process, the Byzantine process can still prevent a correct process from progressing [358].

In the literature, there exist some works that suggest relaxing the liveness issues under various network conditions. For example, Abraham et al. [270] use the concept of clock synchronization [359] [360] to achieve a "view synchronization", in which each correct replica can access hardware clocks with reliable and bounded drift on time. The HotStuff protocol [284] utilizes a component called the PaceMaker to achieve view synchronization and advance progress, but it fails to provide a detailed specification of how to achieve this functionality. Bravo et al. [357] present a similar view synchronization scheme to provide a wrapper for the functionality of BFT consensus procedures, and it provides a formal specification only under partial synchrony. Although they have made much progress on relaxing the liveness of BFT protocols, from a high-level perspective, there is still no practical live Byzantine consensus working under fully asynchronous environments like the Internet. Thus, there is a long way to go to get a safe and live BFT consensus protocol.

D. Privacy in Consensus

When mentioning the term "privacy", we typically refer to its meaning at the application level instead of at an abstract system level. For instance, we can enhance privacy in blockchains, while it is hard to enhance privacy in consensus protocols. As we discussed in Section V, BFT consensus protocols have stages of agreement and execution. In the agreement stage, they can utilize some level of obfuscation without revealing the information of requests, e.g., using agreement on sequence numbers. However, during the execution stage, accessing the requested information for verification purposes cannot be avoided. This presents huge challenges to preserving the privacy of different aspects, e.g., clients, clients' requests, and their meta-data. Practically, it is hard to achieve privacy preservation in BFT consensus protocols. However, privacy can be achieved at the application level, e.g., via the property of "anonymity". And we should note that some low-level pseudonymity for privacy preservation can be easily circumvented by tracking attacks [361].

In general, a permissioned system provides some level of privacy protection by restricting its participants. For instance, BigchainDB [362] restricts the set of core participants in the consensus to a small vested set, which are assumed to be trusted for both the integrity and liveness of the system, and also can be trusted for keeping secrets (privacy) [18]. However, this scheme itself cannot prevent information leakage. For example, since the information is replicated, any participant may violate the property (i.e., semi-honest or malicious participants), and there is no practical scheme to detect this kind of violation, as it just leaks secrets without performing malicious behaviors. Also, relying on a permissioned ledger for privacy forces the whole system to rely on closed groups, which makes the design choice hard to change. There are indeed some protocol-level techniques to provide privacy preservation, e.g., non-interactive zero-knowledge proofs (zk-SNARKs) [363] or smart contracts [364]. However, protocol-level techniques for privacy preservation are only used in the specific context of applications.

Ideally, privacy-preserving systems wish to disclose little information to others for verification purposes. For example, for blockchain, each replica may want to agree only on the order of transactions for execution, while the creation and execution of these transactions take place off-chain, with each party having full access to its own information [18]. In general, to achieve privacy preservation in a complex blockchain system with desired properties, there are some remarks to follow, which also suit blockchain security [365]. 1) No single technique fits all for the privacy of blockchain, and the appropriate privacy techniques should be chosen based on the privacy requirements and the context of applications; in general, a combination of multiple technologies exhibits more efficiency than solely relying on a single technology. 2) No technique is without defects or perfect in all aspects, and new techniques may cause or involve some new forms of attacks. 3) Privacy is not cost-free, and there typically exists some level of trade-off between privacy and system efficiency. The system designers must achieve a balance between privacy and other system properties.

E. Gossip Protocols

Gossip protocols are typically used to distribute and share information between different parties within a network, and their main purpose is information dissemination and aggregation among many nodes. Compared with broadcast or multicast protocols, one major advantage of gossip protocols is their logarithmic mixing time, in which they can propagate a piece of well-organized information to all peer nodes in only O(log n) time, where n is the number of peers [366]. Also, gossip protocols are very simple and straightforward, where all peers have the same piece of code to run. All these features make them ideally suitable for BFT protocols to lower the communication complexity. A simple gossip protocol typically consists of three processes in series, namely, the peer selection process, the data exchange process, and the data processing process [367]. In general, each node in a gossip protocol simply forwards all its received messages to a set of randomly selected peers.
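Both the O(log n) mixing time and the redundancy of naive gossip can be observed in a toy push-gossip simulation. This is our own illustrative sketch (seeded for reproducibility), not a model of any specific protocol:

```python
import random

def gossip_rounds(n: int, fanout: int, seed: int = 42) -> int:
    """Simulate push gossip: per round, every informed node pushes the
    message to `fanout` uniformly random peers; return the number of
    rounds until all n nodes are informed."""
    rng = random.Random(seed)
    informed = {0}                       # node 0 originates the message
    rounds = 0
    while len(informed) < n:
        rounds += 1
        for _ in range(len(informed) * fanout):
            informed.add(rng.randrange(n))   # duplicate pushes model redundancy
    return rounds
```

With n = 1000 and a fanout of 3, dissemination completes in roughly a dozen rounds, but the total number of pushes far exceeds n, illustrating why naive gossip wastes bandwidth on duplicate deliveries.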

Under this simple design, gossip can be adapted to large-scale networks with guarantees of fast and reliable information dissemination [368]. In general, gossip communication protocols can be used to manage the inconsistencies that arise in distributed systems; they are simple to implement and highly resistant to failure. A replicated system can converge to a consistent state using a gossip protocol, despite temporary partitions and process failures [369].

Most existing BFT consensus protocols in blockchains assume a peer-to-peer communication channel, and their analysis of communication complexity is based on this assumption. However, in a practical distributed network, pure peer-to-peer communication is rare. Besides, when applying BFT to a blockchain system, a major problem is that membership is closed. Gossip protocols can be easily adapted to open and dynamic networks, and membership information can also be gossiped along with other messages [370]. An ideal gossip protocol would send few messages per second, yet the messages sent over the network can be quite large, depending on the number of nodes. However, due to its intrinsic randomness in peer selection and the variety in message transmission latency, it is hard to estimate an upper bound on the duration of the protocol. A practical design often requires membership to be closed for the duration of the protocol, which limits its application scenarios [371]. In the literature, there exist several recent blockchain works utilizing gossip protocols for message dissemination (with underlying BFT protocols), e.g., Algorand, OmniLedger, and RapidChain. However, these gossip-based protocols are highly redundant and random, as each node may receive the same message from distinct peers multiple times. This definitely involves more transmitted messages and degrades the network performance. Thus, efficient gossip protocols are required in dissemination networks.

Many existing blockchain protocols rely on some customized gossip protocols to share and exchange information (e.g., transactions, blocks, and memberships) among participating parties [372]. In general, gossip protocols can ensure that each party receives all messages with a high probability (instead of in a deterministic way). There exist several critical questions on utilizing gossip protocols to reduce communication complexity. Gossip protocols are inherently susceptible to data corruption, which means the gossip process is subject to Byzantine attacks. For example, it is necessary to address the issue of when and how participating nodes choose to end up agreeing on a block. Also, involving incentive mechanisms for message passing in the construction of blockchain makes it impossible to prove which participants (and with what level of effort) have actively contributed to the gossip protocol. Some trusted hardware platforms (such as Intel SGX) can help, but this will in turn increase the deployment cost. Currently, a gossip-based blockchain is just a preliminary protocol, and many design spaces for blockchains need to be explored. Also, when adopting gossip protocols in permissionless scenarios, extra efforts (e.g., security analysis and refinement) will be required to ensure that gossip-based blockchain solutions are safe and trusted for special-purpose applications (e.g., healthcare systems with privacy preservation and industrial infrastructure with cyber-security) [373].

F. Scalability

Most BFT protocols are limited in their scalability, either in terms of network size (e.g., the number of nodes) or the overall throughput. The design space for improving them is vast. We will use Practical BFT (PBFT) [4] as an example to explain BFT scalability. The original PBFT protocol requires at least n = 3f + 1 nodes to tolerate up to f Byzantine faults. It has been shown not to scale beyond a dozen nodes due to its quadratic communication complexity [333]. Typically, scaling protocols for BFT focus on either reducing the number of nodes required to tolerate f Byzantine faults [144], [154], or reducing the protocol's communication complexity to allow larger network sizes [241].

a) Reducing the number of nodes: To tolerate f Byzantine nodes that can equivocate in a quorum system like PBFT, quorums must intersect in at least f + 1 nodes [72]. Consequently, if a BFT protocol requires n = 3f + 1, its quorum size is at least 2f + 1. A smaller n means a lower communication cost incurred in tolerating the same number of faults; it also means that for the same number of nodes n, the network can tolerate more faulty nodes. One way to reduce the number of nodes is to randomly select a small set of consensus nodes, as a committee, to run the consensus process. A smaller consensus committee can lead to better throughput, as it attains higher throughput due to lower communication overhead. Sharding technology confines the consensus process to within one shard. However, in this scenario, the security of each shard, e.g., the ratio of the number of faulty nodes to the size of a shard, will be a top concern. It can be mitigated by utilizing some mechanisms, e.g., epoch randomness, to guarantee a "good majority" for each shard with a high probability [230].

Another way to reduce the number of nodes is to utilize techniques to bring n down from 3f + 1 to 2f + 1. Those techniques are mainly based on leveraging external components (e.g., trusted hardware) or weakening the system models. For example, BFT-TO [374], a hardware-assisted Byzantine replicated protocol, demonstrates that it is possible to use only 2f + 1 replicas to tolerate f Byzantine replicas with the help of trusted distributed components. Similarly, there exist a few other algorithms that achieve consensus with fewer replicas, such as A2M-BFT-EA [144], MinBFT [81], MinZyzzyva [81], EBAWA [351], CheapBFT [161], and FastBFT [155]. Besides, there also exists some other work that achieves the same purpose by weakening the system models. For example, the work in [375] improves the BFT threshold to 2f + 1 by utilizing a relaxed synchrony assumption.

b) Reducing communication complexity: The PBFT protocol has been perceived to be a communication-heavy protocol. There is a long-standing myth that BFT is not scalable in the number of participants n, since most existing solutions incur a message transmission cost of O(n²), even under favorable network conditions. As a result, existing BFT chains involve very few nodes (e.g., 21 in [376]). Even with a reduced network size, PBFT still has a communication complexity of O(n²). ByzCoin [241] proposed an optimization wherein the leader uses a collective signing protocol (CoSi) [242] to aggregate other nodes' messages into a single authenticated message. By doing so, each node only needs to forward its messages to the leader and verify the aggregate message from the latter. In this way, by avoiding broadcasting, the communication complexity is reduced to O(n). Besides, there is some work [377] on utilizing trusted execution environments (TEEs) (e.g., Intel SGX [378]) to scale distributed consensus. TEEs provide protected memory and isolated execution so that the regular operating systems or applications can neither control nor observe the data being stored or processed inside them [379]. Generally, trusted hardware can only crash but not be Byzantine. However, introducing trusted hardware into consensus nodes is expensive, and specific knowledge is needed to implement the protocol.
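The message-count gap that motivates ByzCoin-style aggregation can be sketched with some simple accounting (illustrative only, counting a single round of vote dissemination and ignoring signature sizes):

```python
def all_to_all_messages(n: int) -> int:
    """All-to-all vote broadcast, as in PBFT's quadratic phases: every
    node sends its vote to every other node."""
    return n * (n - 1)

def aggregated_messages(n: int) -> int:
    """Leader-based aggregation (e.g., collective signing): each node
    sends its vote to the leader, which returns one aggregate message."""
    return 2 * (n - 1)
```

At n = 100, this is 9,900 versus 198 messages per round; the asymptotic gap (O(n²) versus O(n)) is what confines PBFT-style protocols to small committees.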

74 in this category can be mitigated by using cryptographic primitives, nodes or faulty leader [284]); however, when there exist faulty such as threshold signatures [274] [380]. nodes or when applying to large-scale geo-scale distributed systems, their responsiveness greatly degrades. Practically, geo-scale consensus G. Performance Optimization typically distinguishes local and global communication and, using topological information, all replicas in a single region can be grouped With the recent growth of blockchain technology and its ap- into a single cluster [385]. By doing so, some level of optimistic plications, a variety of complex BFT consensus algorithms have responsiveness can be achieved. Thus, there is still a long way to go been developed with diverse properties. Surprisingly, even after a to improve consensus performance. decade of development, the main use cases for blockchain still lie in financial sectors, especially for crypto-currencies. Currently, there Besides, there exist other promising techniques to optimize per- has been a pretty slow adoption for both blockchains and BFT in formance, e.g., via parallelization. The main idea of parallelization is practical applications [381]. One obvious reason is its slow throughput to allow non-conflicting transactions to execute in parallel, and resort compared with prior works [382] [383] [384] achieving a throughput to a fork resolution scheme to get a final consistent block. Many of the order 100K transactions per second. The low throughput of BFT researches have been extensively exploring the idea of parallel the blockchain (or BFT) limits its development and wide adoption. replication, leveraging parallelization of execution of independent Thus, it is critical to find out some important criteria that affect the requests [300] [386]. performance of these algorithms. Instead of discussing the properties from a high-level perspective, e.g., scalability, safety, and liveness, H. 
In general, the throughput of a consensus algorithm is measured by the maximum rate at which agreement on values is reached. That is, the maximum throughput is the maximum rate at which the blockchain can confirm requests, which can be roughly measured in transactions per second (TPS). However, if batch processing is used, throughput also depends on the maximum batch size. TPS is defined as the number of transactions executed per second, i.e., the number of transactions that pass through an information system in one second. TPS can further be used to assess the performance of systems that handle routine transactions and record-keeping jobs, as well as to determine the speed of the platform in executing transactions. Typically, the higher the TPS, the faster transactions are executed, validated, and confirmed on the platform. Besides TPS, throughput also depends on other parameters, such as agreement latency, verification latency, and batch size. According to the modulated consensus protocol, consisting of agreement and execution phases (see Section V), agreement latency and execution latency capture the time spent in the agreement and execution stages, respectively.
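The interplay between batching and latency can be sketched with a simplified back-of-the-envelope model (the numbers below are hypothetical, and the model deliberately ignores verification latency, pipelining, and network effects):

```python
# Simplified throughput model: with batching, one consensus instance
# commits one batch, so TPS is roughly batch_size divided by the sum of
# agreement latency and execution latency.

def tps(batch_size, agreement_latency_s, execution_latency_s):
    return batch_size / (agreement_latency_s + execution_latency_s)

print(tps(1, 0.10, 0.05))    # no batching:  ~6.7 tx/s
print(tps(500, 0.10, 0.05))  # 500-tx batch: ~3333 tx/s
```

The model makes the point in the text concrete: with fixed per-instance latencies, throughput grows almost linearly with batch size, which is why batching is ubiquitous in deployed BFT systems, at the cost of higher per-transaction confirmation delay.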
We consider a consensus protocol responsive if it commits and responds to clients' requests at the speed of the actual network latency, without being thwarted by pre-defined system parameters [247]. From a user perspective, responsiveness can be understood as the time taken to confirm a transaction, without regard to the underlying processes. For example, an original Bitcoin transaction may take 10 minutes to be confirmed, and more than one hour to reach final confirmation. Also, ignoring responsiveness can cause serious performance degradation. As shown by the findings of Dolev-Strong [13], almost all deterministic protocols involve some delay in confirmation time, even under a synchronous network. In general, asynchronous consensus protocols can obtain responsiveness without any form of timing assumption [112]. This is because asynchronous protocols are responsive by design, whereas synchronous protocols are not [258]. Some modern asynchronous consensus protocols can achieve responsiveness within seconds under normal cases (e.g., without faulty nodes or a faulty leader [284]); however, when faulty nodes exist, or when the protocols are applied to large-scale geo-distributed systems, their responsiveness greatly degrades. Practically, geo-scale consensus typically distinguishes local from global communication and, using topological information, groups all replicas in a single region into a single cluster [385]. By doing so, some level of optimistic responsiveness can be achieved. Thus, there is still a long way to go to improve consensus performance.

Besides, there exist other promising techniques to optimize performance, e.g., parallelization. The main idea of parallelization is to allow non-conflicting transactions to execute in parallel and to resort to a fork-resolution scheme to obtain a final consistent block. Many works have extensively explored this idea of parallel replication, leveraging the parallel execution of independent requests [300] [386].
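The parallelization idea can be sketched as a conflict-aware scheduler. This is a minimal illustration with hypothetical read/write sets; a real system would additionally need deterministic conflict detection across replicas and a fork-resolution scheme, as noted above.

```python
# Sketch: partition transactions into parallel stages so that no two
# transactions in the same stage touch overlapping keys. Transactions in
# one stage could then be executed concurrently; stages run in order.

def conflicts(a, b):
    # two transactions conflict if either writes a key the other reads or writes
    return bool(a["writes"] & (b["reads"] | b["writes"]) or
                b["writes"] & (a["reads"] | a["writes"]))

def schedule(txs):
    stages = []
    for tx in txs:
        for stage in stages:  # greedy: place tx in the first conflict-free stage
            if not any(conflicts(tx, other) for other in stage):
                stage.append(tx)
                break
        else:
            stages.append([tx])
    return stages

txs = [
    {"id": 1, "reads": {"a"}, "writes": {"b"}},
    {"id": 2, "reads": {"c"}, "writes": {"d"}},  # independent of tx 1
    {"id": 3, "reads": {"b"}, "writes": {"e"}},  # reads what tx 1 writes
]
stages = schedule(txs)
print([[t["id"] for t in s] for s in stages])  # [[1, 2], [3]]
```

Here transactions 1 and 2 touch disjoint keys and can run in parallel, while transaction 3 depends on transaction 1's write and is deferred to a later stage.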
H. Practical Implementation

Due to the low performance of BFT consensus in terms of throughput and responsiveness, it currently does not capture much public attention, and BFT-based blockchain applications show slow adoption. Even after several decades of development, few BFT protocols have been applied in practical projects, and the most widely deployed BFT consensus protocols in blockchain are still PBFT and BFT-SMaRt. For example, Hyperledger Fabric utilizes a PBFT variant (Apache Kafka) [86] as its consensus, and Hyperledger Indy utilizes Redundant BFT (RBFT) [165] as its consensus. Industry-grade permissioned blockchain systems only select a group of users (some of whom may not be trusted) to participate, e.g., Hyperledger Fabric [387]. However, the throughput of current permissioned blockchain applications is still on the order of 10K transactions per second [387] [388] [389], and the permissionless case is far below the permissioned one. Several works [333] [59] [389] [284] blame the low throughput and scalability of blockchain on its underlying BFT consensus protocols.

As pointed out by Gupta [381], several factors affect large-scale BFT adoption, namely, single-threaded monolithic design, successive phases of consensus, the coupling of ordering and execution, strict ordering, off-memory chain management, and expensive cryptographic practices. All these factors can highly affect the performance of a BFT-based blockchain system, further limiting its wide adoption.

Currently, besides blockchain applications in finance (i.e., crypto-currencies), applications in other areas are still lacking, especially industrial use cases, e.g., the Internet of Things (IoT) and industrial IoT, and public and social services. IoT typically connects physical things to the Internet and provides services to end-users; its killer applications include RFID technology, smart homes, e-health, etc. Industrial IoT typically targets industrial sectors, including smart manufacturing, supply chains, smart grids, the food industry, etc. Public and social services are more targeted at life-related matters, including land registration, energy saving, education, etc. All these application domains can benefit from blockchain, and it is promising to apply blockchain in them. In general, when applying blockchain to these applications, BFT protocols provide more benefits than PoX protocols, such as instant finality. Thus, the first priority is to design and implement customized and efficient BFT protocols. There is still a long way to go to achieve that.

I. Other Topics

Blockchain has shown great potential in both industry and academia, and BFT-based consensus protocols play a very important

role in successfully building a blockchain ecosystem. To successfully adopt BFT replicated systems in the blockchain, other technologies, e.g., smart contracts, big data analytics, and AI, also need to be incorporated to facilitate more promising features. Most of these technologies can be considered as wrappers that pre-process raw data before submitting requests to BFT replication systems. In general, these kinds of technologies do not affect the progress of the BFT consensus process; however, they can indeed provide some optimization. For example, big data analytics and AI can help with load balancing of requests when multiple consensus groups exist at the same time.

In general, a smart contract is a digitized transaction protocol that can execute the terms of a contract automatically once the specified clauses are met [390]. When applied in the blockchain era, a smart contract is a code fragment that can be executed by the participating parties automatically, which can significantly reduce the involvement and intervention of human beings. With more and more smart contracts emerging, they help extend blockchain from current financial cases into many new areas, e.g., commercial IoT and industrial IoT [36]. Meanwhile, a smart contract can be modeled as a state machine, whose consistent execution across multiple nodes in a distributed environment can be achieved using SMR [19]. However, one critical aspect that must be handled carefully is safety and security, as smart contracts (as pieces of code) are subject to various attacks, e.g., the DAO attack in 2016 [391].
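The state-machine view above can be sketched as follows. This is a toy contract, not any specific platform's API: as long as consensus delivers the same transaction log in the same order to every replica, deterministic execution guarantees identical states.

```python
# Toy smart contract modeled as a deterministic state machine. Replicas
# that apply the same agreed-upon, ordered transaction log reach
# identical states, which is exactly the SMR guarantee.

class TokenContract:
    def __init__(self):
        self.balance = 0

    def apply(self, tx):
        op, amount = tx
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw" and self.balance >= amount:
            self.balance -= amount  # insufficient funds: tx is a no-op

log = [("deposit", 10), ("withdraw", 3), ("withdraw", 20)]  # agreed order
replicas = [TokenContract() for _ in range(4)]
for tx in log:                # every replica executes the same log
    for replica in replicas:
        replica.apply(tx)

print([r.balance for r in replicas])  # [7, 7, 7, 7]
```

Note that determinism is essential: any non-deterministic operation inside `apply` (randomness, wall-clock time, external calls such as oracles) would let replica states diverge even under correct consensus.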
Blockchain can also benefit from big data, e.g., data management and data analytics. For use cases in data management, blockchain can be used as a medium to record important data with guarantees on the immutability of these data, whereas less important data can be stored off-chain with an index to the main blockchain. Also, blockchain can be used for data analysis by exploring the transactions on the blockchain. For instance, by analyzing blockchain transactions, some common patterns might be extracted to predict users' behavior. Using these patterns, a blockchain system may eventually work more efficiently. Further, game theory is a powerful tool for strategic decision-making, which can also be useful for blockchain. It can be used to optimize the utility of each participating party while considering the interactions among parties [392]. However, game-theoretic models often assume that deviating coalitions are rational and optimize their utility, which differs from a malicious model in which parties can deviate arbitrarily [324].

AI technologies could also help blockchain solve many challenges. When applying smart contracts in the blockchain, there must always be an oracle responsible for determining whether the specified conditions are met. In general, this oracle is controlled by a third party [336]. For instance, some chaincodes provided in Hyperledger Fabric can only be operated by a trusted entity. We can apply AI technology to create an intelligent oracle, which is decentralized so that no single party has the power to control it. Such an oracle can learn from historical data and train itself, which makes the smart contract smarter. AI techniques, such as machine learning algorithms, are powerful for analyzing and optimizing blockchain operations. Combining these two technologies can be a game-changer for the next-generation decentralized Internet.

Besides, there are other interesting research topics, e.g., testing technology to evaluate the efficiency of both BFT protocols and blockchains, and schemes to prevent malicious replica collusion or centralization. There is still a long way to go for both BFT consensus protocols and blockchains.

IX. CONCLUSION

In recent years, research on BFT consensus has surged dramatically, partially due to the emergence of blockchain. This paper presents a Systematization of Knowledge of the existing efforts on BFT consensus protocols. We carefully studied the selected BFT protocols and tried our best to provide a comprehensive review with detailed analysis. This paper can serve as a starting point for exploring consensus in the areas of both BFT and blockchain. Based on what we observed and learned, we discussed the opportunities and challenges of applying BFT protocols in current blockchain designs. Finally, we provide several potential research directions that can help advance reliable and robust BFT consensus for the blockchain ecosystem.

REFERENCES

[1] G. Wang, Z. J. Shi, M. Nixon, and S. Han, "Sok: Sharding on blockchain," in Proceedings of the 1st ACM Conference on Advances in Financial Technologies, 2019, pp. 41–61.
[2] D. Malkhi and M. Reiter, "Byzantine quorum systems," in STOC, vol. 97. Citeseer, 1997, pp. 569–578.
[3] C. Cachin and M. Vukolić, "Blockchain consensus protocols in the wild," arXiv preprint arXiv:1707.01873, 2017.
[4] M. Castro, B. Liskov et al., "Practical byzantine fault tolerance," in OSDI, vol. 99, no. 1999, 1999, pp. 173–186.
[5] S. Nakamoto, "Bitcoin: A peer-to-peer electronic cash system," Manubot, Tech. Rep., 2008.
[6] T. Distler, "Byzantine fault-tolerant state-machine replication from a systems perspective," ACM Computing Surveys (CSUR), vol. 54, no. 1, pp. 1–38, 2021.
[7] O. Maric, C. Sprenger, and D. Basin, "Consensus refined," in 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 2015, pp. 391–402.
[8] M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of distributed consensus with one faulty process," Yale Univ., New Haven, CT, Dept. of Computer Science, Tech. Rep., 1982.
[9] J. Aspnes, "Randomized protocols for asynchronous consensus," Distributed Computing, vol. 16, no. 2-3, pp. 165–175, 2003.
[10] S. Liu, P. Viotti, C. Cachin, V. Quéma, and M. Vukolić, "XFT: Practical fault tolerance beyond crashes," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 485–500.
[11] F. Cristian, H. Aghili, R. Strong, and D. Dolev, "Atomic broadcast: From simple message diffusion to byzantine agreement," Information and Computation, vol. 118, no. 1, pp. 158–179, 1995.
[12] B. M. Oki and B. H. Liskov, "Viewstamped replication: A new primary copy method to support highly-available distributed systems," in Proceedings of the seventh annual ACM Symposium on Principles of Distributed Computing, 1988, pp. 8–17.
[13] D. Dolev and H. R. Strong, "Authenticated algorithms for byzantine agreement," SIAM Journal on Computing, vol. 12, no. 4, pp. 656–666, 1983.
[14] R. Guerraoui, N. Knežević, V. Quéma, and M. Vukolić, "The next 700 bft protocols," in Proceedings of the 5th European Conference on Computer Systems, 2010, pp. 363–376.
[15] M. Castro and B. Liskov, "Practical byzantine fault tolerance and proactive recovery," ACM Transactions on Computer Systems (TOCS), vol. 20, no. 4, pp. 398–461, 2002.
[16] M. Correia, G. S. Veronese, N. F. Neves, and P. Verissimo, "Byzantine consensus in asynchronous message-passing systems: a survey," International Journal of Critical Computer-Based Systems, vol. 2, no. 2, pp. 141–161, 2011.
[17] C. Berger and H. P. Reiser, "Scaling byzantine consensus: A broad analysis," in Proceedings of the 2nd Workshop on Scalable and Resilient Infrastructures for Distributed Ledgers, 2018, pp. 13–18.

[18] S. Bano, A. Sonnino, M. Al-Bassam, S. Azouvi, P. McCorry, S. Meiklejohn, and G. Danezis, "Sok: Consensus in the age of blockchains," in Proceedings of the 1st ACM Conference on Advances in Financial Technologies, 2019, pp. 183–198.
[19] M. Vukolić, "The quest for scalable blockchain fabric: Proof-of-work vs. bft replication," in International Workshop on Open Problems in Network Security. Springer, 2015, pp. 112–125.
[20] L. Lamport, R. Shostak, and M. Pease, "The byzantine generals problem," ACM Transactions on Programming Languages and Systems, vol. 4, no. 3, pp. 382–401, 1982.
[21] M. Pease, R. Shostak, and L. Lamport, "Reaching agreement in the presence of faults," Journal of the ACM (JACM), vol. 27, no. 2, pp. 228–234, 1980.
[22] L. Lamport, "The weak byzantine generals problem," Journal of the ACM (JACM), vol. 30, no. 3, pp. 668–676, 1983.
[23] J. Sousa and A. Bessani, "From byzantine consensus to bft state machine replication: A latency-optimal transformation," in 2012 Ninth European Dependable Computing Conference. IEEE, 2012, pp. 37–48.
[24] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Communications of the ACM, 1978.
[25] F. B. Schneider, "Implementing fault-tolerant services using the state machine approach: A tutorial," ACM Computing Surveys (CSUR), vol. 22, no. 4, pp. 299–319, 1990.
[26] Q. Zhang, Z. Qi, X. Liu, T. Sun, and K. Lei, "Research and application of bft algorithms based on the hybrid fault model," in 2018 1st IEEE International Conference on Hot Information-Centric Networking (HotICN). IEEE, 2018, pp. 114–120.
[27] V. Gramoli, "From blockchain consensus back to byzantine consensus," Future Generation Computer Systems, vol. 107, pp. 760–769, 2020.
[28] L. Lamport, "The part-time parliament," in Concurrency: the Works of Leslie Lamport, 2019, pp. 277–317.
[29] M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of distributed consensus with one faulty process," Journal of the ACM (JACM), vol. 32, no. 2, pp. 374–382, 1985.
[30] P. Berman, J. A. Garay, K. J. Perry et al., "Towards optimal distributed consensus," in FOCS, vol. 89. Citeseer, 1989, pp. 410–415.
[31] Y. Xiao, N. Zhang, J. Li, W. Lou, and Y. T. Hou, "Distributed consensus protocols and algorithms," Blockchain for Distributed Systems Security, vol. 25, 2019.
[32] D. Dolev, C. Dwork, and L. Stockmeyer, "On the minimal synchronism needed for distributed consensus," Journal of the ACM (JACM), vol. 34, no. 1, pp. 77–97, 1987.
[33] A. Sunyaev, "Distributed ledger technology," in Internet Computing. Springer, 2020, pp. 265–299.
[34] M. S. Ali, M. Vecchio, M. Pincheira, K. Dolui, F. Antonelli, and M. H. Rehmani, "Applications of blockchains in the internet of things: A comprehensive survey," IEEE Communications Surveys & Tutorials, vol. 21, no. 2, pp. 1676–1717, 2018.
[35] R. Lai and D. L. K. Chuen, "Blockchain–from public to private," in Handbook of Blockchain, Digital Finance, and Inclusion, Volume 2. Elsevier, 2018, pp. 145–177.
[36] G. Wang, "Sok: Applying blockchain technology in industrial internet of things."
[37] ——, "Sok: Exploring blockchains interoperability," IACR Cryptol. ePrint Arch., vol. 2021, p. 776, 2021.
[38] M. S. Ferdous, M. J. M. Chowdhury, M. A. Hoque, and A. Colman, "Blockchain consensuses algorithms: A survey," arXiv preprint arXiv:2001.07091, 2020.
[39] J. Garay and A. Kiayias, "Sok: A consensus taxonomy in the blockchain era," in Cryptographers' Track at the RSA Conference. Springer, 2020, pp. 284–318.
[40] W. Simpson, RFC 1661: The point-to-point protocol (PPP). RFC Editor, 1994.
[41] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, "A survey and comparison of peer-to-peer overlay network schemes," IEEE Communications Surveys & Tutorials, vol. 7, no. 2, pp. 72–93, 2005.
[42] M. Zamani, M. Movahedi, and M. Raykova, "Rapidchain: Scaling blockchain via full sharding," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 931–948.
[43] M. Okun, "Agreement among unacquainted byzantine generals," in International Symposium on Distributed Computing. Springer, 2005, pp. 499–500.
[44] E. A. Alchieri, A. N. Bessani, J. da Silva Fraga, and F. Greve, "Byzantine consensus with unknown participants," in International Conference On Principles Of Distributed Systems. Springer, 2008, pp. 22–40.
[45] A. Beimel and M. Franklin, "Reliable communication over partially authenticated networks," Theoretical Computer Science, vol. 220, no. 1, pp. 185–210, 1999.
[46] C. Dwork, N. Lynch, and L. Stockmeyer, "Consensus in the presence of partial synchrony," Journal of the ACM (JACM), vol. 35, no. 2, pp. 288–323, 1988.
[47] Y. Xiao, N. Zhang, W. Lou, and Y. T. Hou, "A survey of distributed consensus protocols for blockchain networks," IEEE Communications Surveys & Tutorials, vol. 22, no. 2, pp. 1432–1465, 2020.
[48] R. Canetti, "Universally composable security: A new paradigm for cryptographic protocols," in Proceedings 42nd IEEE Symposium on Foundations of Computer Science. IEEE, 2001, pp. 136–145.
[49] R. Pass and E. Shi, "The sleepy model of consensus," in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2017, pp. 380–409.
[50] C. Dwork and Y. Moses, "Knowledge and common knowledge in a byzantine environment: crash failures," Information and Computation, vol. 88, no. 2, pp. 156–186, 1990.
[51] C. Cachin and M. Vukolić, "Blockchain consensus protocols in the wild," arXiv preprint arXiv:1707.01873, 2017.
[52] L. Lamport, "The part-time parliament," ACM Transactions on Computer Systems, vol. 16, no. 2, pp. 133–169, 1998.
[53] L. Lamport et al., "Paxos made simple," ACM Sigact News, vol. 32, no. 4, pp. 18–25, 2001.
[54] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, "Zookeeper: Wait-free coordination for internet-scale systems," in USENIX Annual Technical Conference, vol. 8, no. 9, 2010.
[55] D. Ongaro and J. Ousterhout, "In search of an understandable consensus algorithm," in 2014 USENIX Annual Technical Conference (USENIX ATC 14), 2014, pp. 305–319.
[56] K. P. Kihlstrom, L. E. Moser, and P. M. Melliar-Smith, "Byzantine fault detectors for solving consensus," The Computer Journal, vol. 46, no. 1, pp. 16–35, 2003.
[57] A. Haeberlen, P. Kouznetsov, and P. Druschel, "The case for byzantine fault detection," in HotDep, 2006.
[58] M. Abd-El-Malek, G. R. Ganger, G. R. Goodson, M. K. Reiter, and J. J. Wylie, "Fault-scalable byzantine fault-tolerant services," ACM SIGOPS Operating Systems Review, vol. 39, no. 5, pp. 59–74, 2005.
[59] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong, "Zyzzyva: Speculative byzantine fault tolerance," in Proceedings of the twenty-first ACM SIGOPS Symposium on Operating Systems Principles, 2007, pp. 45–58.
[60] A. Bessani, J. Sousa, and E. E. Alchieri, "State machine replication for the masses with bft-smart," in 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. IEEE, 2014, pp. 355–362.
[61] J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira, "Hq replication: A hybrid quorum protocol for byzantine fault tolerance," in Proceedings of the 7th Symposium on Operating Systems Design and Implementation, 2006, pp. 177–190.
[62] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong, "Zyzzyva: Speculative byzantine fault tolerance," ACM Transactions on Computer Systems (TOCS), vol. 27, no. 4, pp. 1–39, 2010.
[63] A. Singh, T. Das, P. Maniatis, P. Druschel, and T. Roscoe, "Bft protocols under fire," in NSDI, vol. 8, 2008, pp. 189–204.
[64] B. Liskov and J. Cowling, "Viewstamped replication revisited," 2012.

[65] G. G. Gueta, I. Abraham, S. Grossman, D. Malkhi, B. Pinkas, M. Reiter, D.-A. Seredinschi, O. Tamir, and A. Tomescu, "Sbft: a scalable and decentralized trust infrastructure," in 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2019, pp. 568–580.
[66] A. Miller, Y. Xia, K. Croman, E. Shi, and D. Song, "The honey badger of bft protocols," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 31–42.
[67] G. Bracha and S. Toueg, "Asynchronous consensus and broadcast protocols," Journal of the ACM (JACM), vol. 32, no. 4, pp. 824–840, 1985.
[68] M. K. Reiter, "The rampart toolkit for building high-integrity services," in Theory and Practice in Distributed Systems. Springer, 1995, pp. 99–110.
[69] K. P. Kihlstrom, L. E. Moser, and P. M. Melliar-Smith, "The securering group communication system," ACM Transactions on Information and System Security (TISSEC), vol. 4, no. 4, pp. 371–406, 2001.
[70] C. Cachin and J. A. Poritz, "Secure intrusion-tolerant replication on the internet," in Proceedings International Conference on Dependable Systems and Networks. IEEE, 2002, pp. 167–176.
[71] K. Kursawe, "Optimistic byzantine agreement," in 21st IEEE Symposium on Reliable Distributed Systems, 2002. Proceedings. IEEE, 2002, pp. 262–267.
[72] D. Malkhi and M. Reiter, "Byzantine quorum systems," Distributed Computing, vol. 11, no. 4, pp. 203–213, 1998.
[73] D. Malkhi and M. K. Reiter, "An architecture for survivable coordination in large distributed systems," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 2, pp. 187–202, 2000.
[74] G. Chockler, D. Malkhi, and M. K. Reiter, "Backoff protocols for distributed mutual exclusion and ordering," in Proceedings 21st International Conference on Distributed Computing Systems. IEEE, 2001, pp. 11–20.
[75] J.-P. Martin, L. Alvisi, and M. Dahlin, "Minimal byzantine storage," in International Symposium on Distributed Computing. Springer, 2002, pp. 311–325.
[76] L. Zhou, F. B. Schneider, and R. Van Renesse, "Coca: A secure distributed online certification authority," ACM Transactions on Computer Systems (TOCS), vol. 20, no. 4, pp. 329–368, 2002.
[77] C. P. Fry and M. K. Reiter, "Nested objects in a byzantine quorum-replicated system," in Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004. IEEE, 2004, pp. 79–89.
[78] F. Muratov, A. Lebedev, N. Iushkevich, B. Nasrulin, and M. Takemiya, "Yac: Bft consensus algorithm for blockchain," arXiv preprint arXiv:1809.00554, 2018.
[79] L. J. Gunn, J. Liu, B. Vavala, and N. Asokan, "Making speculative bft resilient with trusted monotonic counters," in 2019 38th Symposium on Reliable Distributed Systems (SRDS). IEEE, 2019, pp. 133–13309.
[80] I. Abraham, G. Gueta, D. Malkhi, L. Alvisi, R. Kotla, and J.-P. Martin, "Revisiting fast practical byzantine fault tolerance," arXiv preprint arXiv:1712.01367, 2017.
[81] G. S. Veronese, M. Correia, A. N. Bessani, L. C. Lung, and P. Verissimo, "Efficient byzantine fault-tolerance," IEEE Transactions on Computers, vol. 62, no. 1, pp. 16–30, 2011.
[82] P.-L. Aublin, R. Guerraoui, N. Knežević, V. Quéma, and M. Vukolić, "The next 700 bft protocols," ACM Transactions on Computer Systems (TOCS), vol. 32, no. 4, pp. 1–45, 2015.
[83] C. Y. da Silva Costa and E. A. P. Alchieri, "Diversity on state machine replication," in 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA). IEEE, 2018, pp. 429–436.
[84] A. Bessani, M. Santos, J. Felix, N. Neves, and M. Correia, "On the efficiency of durable state machine replication," in 2013 USENIX Annual Technical Conference (USENIX ATC 13), 2013, pp. 169–180.
[85] G. Mack Diouf, H. Elbiaze, and W. Jaafar, "On byzantine fault tolerance in multi-master kubernetes clusters," arXiv e-prints, pp. arXiv–1904, 2019.
[86] J. Sousa, A. Bessani, and M. Vukolić, "A byzantine fault-tolerant ordering service for the hyperledger fabric blockchain platform," in 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2018, pp. 51–58.
[87] A. Bessani, E. Alchieri, J. Sousa, A. Oliveira, and F. Pedone, "From byzantine replication to blockchain: Consensus is only the beginning," in 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2020, pp. 424–436.
[88] T. Swanson, "Consensus-as-a-service: a brief report on the emergence of permissioned, distributed ledger systems," Report, available online, 2015.
[89] E. Androulaki, C. Cachin, K. Christidis, C. Murthy, B. Nguyen, and M. Vukolić, "Next consensus architecture proposal. hyperledger wiki, fabric design documents," 2016.
[90] Symbiont, "Symbiont assembly platform," URL https://www.symbiont.io/technology/assembly, 2016.
[91] D. Mohanty, "Corda architecture," in R3 Corda for Architects and Developers. Springer, 2019, pp. 49–60.
[92] N. Rakotondravony and H. P. Reiser, "Visualizing bft smr distributed systems-example of bft-smart," in 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 2018, pp. 152–157.
[93] J. Niu and C. Feng, "Leaderless byzantine fault tolerant consensus," arXiv preprint arXiv:2012.01636, 2020.
[94] M. Baudet, A. Ching, A. Chursin, G. Danezis, F. Garillot, Z. Li, D. Malkhi, O. Naor, D. Perelman, and A. Sonnino, "State machine replication in the libra blockchain," The Libra Assn., Tech. Rep., 2019.
[95] L. Lamport, "Brief announcement: Leaderless byzantine paxos," in International Symposium on Distributed Computing. Springer, 2011, pp. 141–142.
[96] O. Naor, M. Baudet, D. Malkhi, and A. Spiegelman, "Cogsworth: Byzantine view synchronization," arXiv preprint arXiv:1909.05204, 2019.
[97] T. Rocket, M. Yin, K. Sekniqi, R. van Renesse, and E. G. Sirer, "Scalable and probabilistic leaderless bft consensus through metastability," arXiv preprint arXiv:1906.08936, 2019.
[98] T. Crain, V. Gramoli, M. Larrea, and M. Raynal, "Dbft: Efficient leaderless byzantine consensus and its application to blockchains," in 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA). IEEE, 2018, pp. 1–8.
[99] A. Mostéfaoui, H. Moumen, and M. Raynal, "Signature-free asynchronous binary byzantine consensus with t < n/3, O(n²) messages, and O(1) expected time," Journal of the ACM (JACM), vol. 62, no. 4, pp. 1–21, 2015.
[100] B. Arun, S. Peluso, and B. Ravindran, "ezbft: Decentralizing byzantine fault-tolerant state machine replication," in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019, pp. 565–577.
[101] N. Shrestha and M. Kumar, "Revisiting ezbft: A decentralized byzantine fault tolerant protocol with speculation," arXiv preprint arXiv:1909.03990, 2019.
[102] A. Gągol, D. Leśniak, D. Straszak, and M. Świętek, "Aleph: Efficient atomic broadcast in asynchronous networks with byzantine nodes," in Proceedings of the 1st ACM Conference on Advances in Financial Technologies, 2019, pp. 214–228.
[103] A. Gągol and M. Świętek, "Aleph: A leaderless, asynchronous, byzantine fault tolerant consensus protocol," arXiv preprint arXiv:1810.05256, 2018.
[104] L. Bonniot, C. Neumann, and F. Taïani, "Pnyxdb: a lightweight leaderless democratic byzantine fault tolerant replicated datastore," in 2020 International Symposium on Reliable Distributed Systems (SRDS). IEEE, 2020, pp. 155–164.
[105] P. Raykov, N. Schiper, and F. Pedone, "Byzantine fault-tolerance with commutative commands," in International Conference On Principles Of Distributed Systems. Springer, 2011, pp. 329–342.
[106] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, "Conflict-free replicated data types," in Symposium on Self-Stabilizing Systems. Springer, 2011, pp. 386–400.

[107] C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguiça, and R. Rodrigues, "Making geo-replicated systems fast as possible, consistent when necessary," in 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 2012, pp. 265–278.
[108] K. Antoniadis, A. Desjardins, V. Gramoli, R. Guerraoui, and M. I. Zablotchi, "Leaderless consensus," EPFL, Tech. Rep., 2021.
[109] E. Gafni, "Round-by-round fault detectors, unifying synchrony and asynchrony (extended abstract)," in Proc. 17th Annual ACM Symposium on Principles of Distributed Computing (PODC), Puerto Vallarta, Mexico, June 1998, pp. 143–152.
[110] M. O. Rabin, "Randomized byzantine generals," in 24th Annual Symposium on Foundations of Computer Science (sfcs 1983). IEEE, 1983, pp. 403–409.
[111] M. Ben-Or, "Another advantage of free choice (extended abstract): completely asynchronous agreement protocols," in Proceedings of the second annual ACM symposium on Principles of distributed computing, 1983, pp. 27–30.
[112] Y. Lu, Z. Lu, and Q. Tang, "Bolt-dumbo transformer: Asynchronous consensus as fast as pipelined bft," arXiv preprint arXiv:2103.09425, 2021.
[113] R. Gelashvili, L. Kokoris-Kogias, A. Spiegelman, and Z. Xiang, "Be prepared when network goes bad: An asynchronous view-change protocol," arXiv preprint arXiv:2103.03181, 2021.
[114] C. Cachin and S. Tessaro, "Asynchronous verifiable information dispersal," in 24th IEEE Symposium on Reliable Distributed Systems (SRDS'05). IEEE, 2005, pp. 191–201.
[115] M. Ben-Or, B. Kelmer, and T. Rabin, "Asynchronous secure computations with optimal resilience," in Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing, 1994, pp. 183–192.
[116] F. Shen, Y. Long, Z. Liu, Z. Liu, H. Liu, D. Gu, and N. Liu, "A practical dynamic enhanced bft protocol," in International Conference on Network and System Security. Springer, 2019, pp. 288–304.
[117] J. Baek and Y. Zheng, "Simple and efficient threshold cryptosystem from the gap diffie-hellman group," in GLOBECOM'03. IEEE Global Telecommunications Conference (IEEE Cat. No. 03CH37489), vol. 3. IEEE, 2003, pp. 1491–1495.
[118] C. Delerablée and D. Pointcheval, "Dynamic threshold public-key encryption," in Annual International Cryptology Conference. Springer, 2008, pp. 317–334.
[119] G. Bracha, "Asynchronous byzantine agreement protocols," Information and Computation, vol. 75, no. 2, pp. 130–143, 1987.
[120] A. Mostefaoui, H. Moumen, and M. Raynal, "Signature-free asynchronous byzantine consensus with t < n/3 and O(n²) messages," in Proceedings of the 2014 ACM symposium on Principles of distributed computing, 2014, pp. 2–9.
[121] Y. Gilad, R. Hemo, S. Micali, G. Vlachos, and N. Zeldovich, "Algorand: Scaling byzantine agreements for cryptocurrencies," in Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pp. 51–68.
[122] J. Chen and S. Micali, "Algorand," arXiv preprint arXiv:1607.01341, 2016.
[123] ——, "Algorand: A secure and efficient distributed ledger," Theoretical Computer Science, vol. 777, pp. 155–183, 2019.
[124] A. Spiegelman and A. Rinberg, "Ace: Abstract consensus encapsulation for liveness boosting of state machine replication," arXiv preprint arXiv:1911.10486, 2019.
[125] I. Abraham, D. Malkhi, and A. Spiegelman, "Asymptotically optimal validated asynchronous byzantine agreement," in Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, 2019, pp. 337–346.
[126] C. Cachin, K. Kursawe, F. Petzold, and V. Shoup, "Secure and efficient asynchronous broadcast protocols," in Annual International Cryptology Conference. Springer, 2001, pp. 524–541.
[127] P. Li, G. Wang, X. Chen, F. Long, and W. Xu, "Gosig: a scalable and high-performance byzantine consensus for consortium blockchains," in Proceedings of the 11th ACM Symposium on Cloud Computing, 2020, pp. 223–237.
[128] B. Guo, Z. Lu, Q. Tang, J. Xu, and Z. Zhang, "Dumbo: Faster asynchronous bft protocols," in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 803–818.
[129] Y. Lu, Z. Lu, Q. Tang, and G. Wang, "Dumbo-mvba: Optimal multi-valued validated asynchronous byzantine agreement, revisited," in Proceedings of the 39th Symposium on Principles of Distributed Computing, 2020, pp. 129–138.
[130] C. Cachin, K. Kursawe, and V. Shoup, "Random oracles in constantinople: Practical asynchronous byzantine agreement using cryptography," Journal of Cryptology, vol. 18, no. 3, pp. 219–246, 2005.
[131] J. Hendricks, G. R. Ganger, and M. K. Reiter, "Verifying distributed erasure-coded data," in Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, 2007, pp. 139–146.
[132] M. Castro and B. Liskov, "Proactive recovery in a byzantine-fault-tolerant system," in Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4, 2000.
[133] ——, "Byzantine fault tolerance can be fast," in 2001 International Conference on Dependable Systems and Networks. IEEE, 2001, pp. 513–518.
[134] J.-P. Martin and L. Alvisi, "Fast byzantine consensus," IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 3, pp. 202–215, 2006.
[135] ——, "Fast byzantine paxos," in Proceedings of the International Conference on Dependable Systems and Networks, 2004, pp. 402–411.
[136] R. Boichat, P. Dutta, S. Frølund, and R. Guerraoui, "Deconstructing paxos," ACM Sigact News, vol. 34, no. 1, pp. 47–67, 2003.
[137] L. Lamport, "Fast paxos," Distributed Computing, vol. 19, no. 2, pp. 79–103, 2006.
[138] R. Van Renesse, C. Ho, and N. Schiper, "Byzantine chain replication," in International Conference On Principles Of Distributed Systems. Springer, 2012, pp. 345–359.
[139] R. Van Renesse and F. B. Schneider, "Chain replication for supporting high throughput and availability," in OSDI, vol. 4, no. 91–104, 2004.
[140] H. Mendes and M. Herlihy, "Multidimensional approximate agreement in byzantine asynchronous systems," in Proceedings of the forty-fifth annual ACM symposium on Theory of computing, 2013, pp. 391–400.
[141] D. Dolev, N. A. Lynch, S. S. Pinter, E. W. Stark, and W. E. Weihl, "Reaching approximate agreement in the presence of faults," Journal of the ACM (JACM), vol. 33, no. 3, pp. 499–516, 1986.
[142] I. Abraham and D. Malkhi, "Bvp: Byzantine vertical paxos," Proceedings of the Distributed Cryptocurrencies and Consensus Ledger (DCCL), Chicago, IL, USA, vol. 25, 2016.
[143] L. Lamport, D. Malkhi, and L. Zhou, "Vertical paxos and primary-backup replication," in Proceedings of the 28th ACM symposium on Principles of distributed computing, 2009, pp. 312–313.
[144] B.-G. Chun, P. Maniatis, S. Shenker, and J. Kubiatowicz, "Attested append-only memory: Making adversaries stick to their word," ACM SIGOPS Operating Systems Review, vol. 41, no. 6, pp. 189–204, 2007.
[145] I. Abraham, M. K. Aguilera, and D. Malkhi, "Fast asynchronous consensus with optimal resilience," in International Symposium on Distributed Computing. Springer, 2010, pp. 4–19.
[146] M. Garg, S. Peluso, B. Arun, and B. Ravindran, "Generalized consensus for practical fault tolerance," in Proceedings of the 20th International Middleware Conference, 2019, pp. 55–67.
[147] S. Duan, M. K. Reiter, and H. Zhang, "Beat: Asynchronous bft made practical," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 2028–2041.
[148] V. Shoup and R. Gennaro, "Securing threshold cryptosystems against chosen ciphertext attack," in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1998, pp. 1–16.
[149] C. Stathakopoulou, T. David, and M. Vukolić, "Mir-bft: High-throughput bft for blockchains," arXiv preprint arXiv:1906.05552, 2019.
[150] T. Crain, C. Natoli, and V. Gramoli, "Evaluating the red belly
blockchain,” arXiv preprint arXiv:1812.11747, 2018.

[151] Z. Milosevic, M. Biely, and A. Schiper, “Bounded delay in byzantine-tolerant state machine replication,” in 2013 IEEE 32nd International Symposium on Reliable Distributed Systems. IEEE, 2013, pp. 61–70.
[152] L. Baird, “The swirlds hashgraph consensus algorithm: Fair, fast, byzantine fault tolerance,” Swirlds Tech Reports SWIRLDS-TR-2016-01, Tech. Rep., 2016.
[153] A. Clement, E. L. Wong, L. Alvisi, M. Dahlin, and M. Marchetti, “Making byzantine fault tolerant systems tolerate byzantine faults,” in NSDI, vol. 9, 2009, pp. 153–168.
[154] J. Behl, T. Distler, and R. Kapitza, “Hybrids on steroids: Sgx-based high performance bft,” in Proceedings of the Twelfth European Conference on Computer Systems, 2017, pp. 222–237.
[155] J. Liu, W. Li, G. O. Karame, and N. Asokan, “Scalable byzantine consensus via hardware-assisted secret sharing,” IEEE Transactions on Computers, vol. 68, no. 1, pp. 139–151, 2018.
[156] J. Li, M. N. Krohn, D. Mazieres, and D. E. Shasha, “Secure untrusted data repository (sundr),” in OSDI, vol. 4, 2004, pp. 9–9.
[157] D. Levin, J. R. Douceur, J. R. Lorch, and T. Moscibroda, “Trinc: Small trusted hardware for large distributed systems,” in NSDI, vol. 9, 2009, pp. 1–14.
[158] S. L. Kinney, Trusted platform module basics: using TPM in embedded systems. Elsevier, 2006.
[159] B. Cohen, “Incentives build robustness in bittorrent,” in Workshop on Economics of Peer-to-Peer systems, vol. 6. Berkeley, CA, USA, 2003, pp. 68–72.
[160] M. Ryan, “Introduction to the tpm 1.2,” DRAFT of March, vol. 24, 2009.
[161] R. Kapitza, J. Behl, C. Cachin, T. Distler, S. Kuhnle, S. V. Mohammadi, W. Schröder-Preikschat, and K. Stengel, “Cheapbft: Resource-efficient byzantine fault tolerance,” in Proceedings of the 7th ACM european conference on Computer Systems, 2012, pp. 295–308.
[162] M. Eischer and T. Distler, “Scalable byzantine fault tolerance on heterogeneous servers,” in 2017 13th European Dependable Computing Conference (EDCC). IEEE, 2017, pp. 34–41.
[163] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar, “Innovative instructions and software model for isolated execution,” HASP@ISCA, vol. 10, no. 1, 2013.
[164] J. Zhang, J. Gao, K. Wang, Z. Wu, Y. Lan, Z. Guan, and Z. Chen, “Tbft: Understandable and efficient byzantine fault tolerance using trusted execution environment,” arXiv preprint arXiv:2102.01970, 2021.
[165] P.-L. Aublin, S. B. Mokhtar, and V. Quéma, “Rbft: Redundant byzantine fault tolerance,” in 2013 IEEE 33rd International Conference on Distributed Computing Systems. IEEE, 2013, pp. 297–306.
[166] R. Rodrigues, M. Castro, and B. Liskov, “Base: Using abstraction to improve fault tolerance,” ACM SIGOPS Operating Systems Review, vol. 35, no. 5, pp. 15–28, 2001.
[167] M. Castro, R. Rodrigues, and B. Liskov, “Using abstraction to improve fault tolerance,” in Proceedings Eighth Workshop on Hot Topics in Operating Systems. IEEE, 2001, pp. 27–32.
[168] B. Liskov and J. Guttag, Program development in JAVA: abstraction, specification, and object-oriented design. Pearson Education, 2000.
[169] Y. Huang, C. Kintala, N. Kolettis, and N. D. Fulton, “Software rejuvenation: Analysis, module and applications,” in Twenty-fifth international symposium on fault-tolerant computing. Digest of papers. IEEE, 1995, pp. 381–390.
[170] L. Chen and A. Avizienis, “N-version programming: A fault-tolerance approach to reliability of software operation,” in Proc. 8th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-8), vol. 1, 1978, pp. 3–9.
[171] J. Gray and D. P. Siewiorek, “High-availability computer systems,” Computer, vol. 24, no. 9, pp. 39–48, 1991.
[172] M. Castro, R. Rodrigues, and B. Liskov, “Base: Using abstraction to improve fault tolerance,” ACM Transactions on Computer Systems (TOCS), vol. 21, no. 3, pp. 236–269, 2003.
[173] R. Baldoni, J.-M. Helary, and M. Raynal, “From crash fault-tolerance to arbitrary-fault tolerance: Towards a modular approach,” in Proceeding International Conference on Dependable Systems and Networks. DSN 2000. IEEE, 2000, pp. 273–282.
[174] A. Doudou, B. Garbinato, and R. Guerraoui, “Encapsulating failure detection: From crash to byzantine failures,” in International Conference on Reliable Software Technologies. Springer, 2002, pp. 24–50.
[175] T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,” Journal of the ACM (JACM), vol. 43, no. 2, pp. 225–267, 1996.
[176] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin, “Separating agreement from execution for byzantine fault tolerant services,” in Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pp. 253–267.
[177] A. S. Aiyer, L. Alvisi, A. Clement, M. Dahlin, J.-P. Martin, and C. Porth, “Bar fault tolerance for cooperative services,” in Proceedings of the twentieth ACM symposium on Operating systems principles, 2005, pp. 45–58.
[178] N. Ntarmos and P. Triantafillou, “Aesop: Altruism-endowed self-organizing peers,” in International Workshop on Databases, Information Systems, and Peer-to-Peer Computing. Springer, 2004, pp. 151–165.
[179] J. Shneidman and D. C. Parkes, “Specification faithfulness in networks with rational nodes,” in Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, 2004, pp. 88–97.
[180] A. Clement, H. Li, J. Napper, J.-P. Martin, L. Alvisi, and M. Dahlin, “Bar primer,” in 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN). IEEE, 2008, pp. 287–296.
[181] H. C. Li, A. Clement, E. L. Wong, J. Napper, I. Roy, L. Alvisi, and M. Dahlin, “Bar gossip,” in Proceedings of the 7th symposium on Operating systems design and implementation, 2006, pp. 191–204.
[182] D. C. Oppen and Y. K. Dalal, “The clearinghouse: A decentralized agent for locating named objects in a distributed environment,” ACM Transactions on Information Systems (TOIS), vol. 1, no. 3, pp. 230–253, 1983.
[183] J. Li and D. Mazieres, “Beyond one-third faulty replicas in byzantine fault tolerant systems,” in NSDI, 2007.
[184] B. Vandiver, H. Balakrishnan, B. Liskov, and S. Madden, “Tolerating byzantine faults in transaction processing systems using commit barrier scheduling,” in Proceedings of twenty-first ACM SIGOPS symposium on Operating Systems Principles, 2007, pp. 59–72.
[185] J. Hendricks, G. R. Ganger, and M. K. Reiter, “Low-overhead byzantine fault-tolerant storage,” ACM SIGOPS Operating Systems Review, vol. 41, no. 6, pp. 73–86, 2007.
[186] G. R. Goodson, J. J. Wylie, G. R. Ganger, and M. K. Reiter, “Efficient byzantine-tolerant erasure-coded storage,” in International Conference on Dependable Systems and Networks, 2004. IEEE, 2004, pp. 135–144.
[187] Y. Amir, B. Coan, J. Kirsch, and J. Lane, “Byzantine replication under attack,” in 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN). IEEE, 2008, pp. 197–206.
[188] ——, “Prime: Byzantine replication under attack,” IEEE Transactions on Dependable and Secure Computing, vol. 8, no. 4, pp. 564–577, 2010.
[189] C. Ho, R. Van Renesse, M. Bickford, and D. Dolev, “Nysiad: Practical protocol transformation to tolerate byzantine failures,” in NSDI, vol. 8, 2008, pp. 175–188.
[190] A. Haeberlen, P. Kouznetsov, and P. Druschel, “Peerreview: Practical accountability for distributed systems,” ACM SIGOPS operating systems review, vol. 41, no. 6, pp. 175–188, 2007.
[191] A. R. Yumerefendi and J. S. Chase, “The role of accountability in dependable distributed systems,” in Proceedings of HotDep, vol. 5. Citeseer, 2005, pp. 3–3.
[192] I. T. Downard, “Simulating sensor networks in ns-2,” Naval Research Lab, Washington DC, Tech. Rep., 2004.
[193] N. M. Preguiça, R. Rodrigues, C. Honorato, J. Lourenço et al., “Byzantium: Byzantine-fault-tolerant database replication providing snapshot isolation,” in HotDep, 2008.

[194] A. Singh, P. Fonseca, P. Kuznetsov, R. Rodrigues, P. Maniatis et al., “Zeno: Eventually consistent byzantine-fault tolerance,” in NSDI, vol. 9, 2009, pp. 169–184.
[195] M. P. Herlihy and J. M. Wing, “Linearizability: A correctness condition for concurrent objects,” ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 12, no. 3, pp. 463–492, 1990.
[196] A. Fekete, D. Gupta, V. Luchangco, N. Lynch, and A. Shvartsman, “Eventually-serializable data services,” Theoretical Computer Science, vol. 220, no. 1, pp. 113–156, 1999.
[197] B. Wester, J. A. Cowling, E. B. Nightingale, P. M. Chen, J. Flinn, and B. Liskov, “Tolerating latency in replicated state machines through client speculation,” in NSDI, 2009, pp. 245–260.
[198] A. Clement, M. Kapritsos, S. Lee, Y. Wang, L. Alvisi, M. Dahlin, and T. Riche, “Upright cluster services,” in Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, 2009, pp. 277–290.
[199] D. Borthakur, “The hadoop distributed file system: Architecture and design,” Hadoop Project Website, vol. 11, no. 2007, p. 21, 2007.
[200] G. S. Veronese, M. Correia, A. N. Bessani, and L. C. Lung, “Spin one’s wheels? byzantine fault tolerance with a spinning primary,” in 2009 28th IEEE International Symposium on Reliable Distributed Systems. IEEE, 2009, pp. 135–144.
[201] C.-S. Barcelona, “Mencius: building efficient replicated state machines for wans,” in 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 08), 2008.
[202] J. Hendricks, S. Sinnamohideen, G. R. Ganger, and M. K. Reiter, “Zzyzx: Scalable fault tolerance through byzantine locking,” in 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). IEEE, 2010, pp. 363–372.
[203] M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shirriff, and J. K. Ousterhout, “Measurements of a distributed file system,” in Proceedings of the thirteenth ACM symposium on Operating systems principles, 1991, pp. 198–212.
[204] A. W. Leung, S. Pasupathy, G. R. Goodson, and E. L. Miller, “Measurement and analysis of large-scale network file system workloads,” in USENIX annual technical conference, vol. 1, no. 2, 2008, pp. 5–2.
[205] O. Rütti, Z. Milosevic, and A. Schiper, “Generic construction of consensus algorithms for benign and byzantine faults,” in 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). IEEE, 2010, pp. 343–352.
[206] A. Mostéfaoui, S. Rajsbaum, and M. Raynal, A versatile and modular consensus protocol. IRISA, 2001.
[207] R. Guerraoui and M. Raynal, “The information structure of indulgent consensus,” IEEE Transactions on Computers, vol. 53, no. 4, pp. 453–466, 2004.
[208] Y. J. Song, R. Van Renesse, F. B. Schneider, and D. Dolev, “The building blocks of consensus,” in International Conference on Distributed Computing and Networking. Springer, 2008, pp. 54–72.
[209] V. King and J. Saia, “Breaking the O(n²) bit barrier: scalable byzantine agreement with an adaptive adversary,” Journal of the ACM (JACM), vol. 58, no. 4, pp. 1–24, 2011.
[210] L. Lamport, “Byzantizing paxos by refinement,” in International Symposium on Distributed Computing. Springer, 2011, pp. 211–224.
[211] B. Lampson, “The abcd’s of paxos,” in PODC, vol. 1. Citeseer, 2001, p. 13.
[212] L. Lamport, Specifying systems. Addison-Wesley Boston, 2002, vol. 388.
[213] M. Abadi and L. Lamport, “The existence of refinement mappings,” Theoretical Computer Science, vol. 82, no. 2, pp. 253–284, 1991.
[214] A. Shoker, M. Yabandeh, R. Guerraoui, and J.-P. Bahsoun, “Obfuscated bft,” 2012.
[215] T. Wood, R. Singh, A. Venkataramani, P. Shenoy, and E. Cecchet, “Zz and the art of practical bft execution,” in Proceedings of the sixth conference on Computer systems, 2011, pp. 123–138.
[216] T. Distler and R. Kapitza, “Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency,” in Proceedings of the sixth conference on Computer systems, 2011, pp. 91–106.
[217] A. N. Bessani and M. Santos, “Bft-smart: high-performance byzantine-fault-tolerant state machine replication,” 2011.
[218] J. J. Stephen and P. Eugster, “Assured cloud-based data analysis with clusterbft,” in ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing. Springer, 2013, pp. 82–102.
[219] A. Shoker and J.-P. Bahsoun, “Bft selection,” in International Conference on Networked Systems. Springer, 2013, pp. 258–262.
[220] C. Copeland and H. Zhong, “Tangaroa: a byzantine fault tolerant raft,” 2016.
[221] S. Duan, H. Meling, S. Peisert, and H. Zhang, “Bchain: Byzantine replication with high throughput and embedded reconfiguration,” in International Conference on Principles of Distributed Systems. Springer, 2014, pp. 91–106.
[222] S. Duan, K. Levitt, H. Meling, S. Peisert, and H. Zhang, “Byzid: Byzantine fault tolerance from intrusion detection,” in 2014 IEEE 33rd International Symposium on Reliable Distributed Systems. IEEE, 2014, pp. 253–264.
[223] D. E. Denning, “An intrusion-detection model,” IEEE Transactions on software engineering, no. 2, pp. 222–232, 1987.
[224] T. F. Lunt and R. Jagannathan, “A prototype real-time intrusion-detection expert system,” in IEEE Symposium on Security and Privacy, vol. 59. Oakland, CA, USA, 1988.
[225] C. Ko, M. Ruschitzka, and K. Levitt, “Execution monitoring of security-critical programs in distributed systems: A specification-based approach,” in Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No. 97CB36097). IEEE, 1997, pp. 175–187.
[226] V. Paxson, “Bro: A system for detecting network intruders in real-time,” Computer networks, vol. 31, no. 23-24, pp. 2435–2463, 1999.
[227] A. S. de Sá, A. E. Silva Freitas, and R. J. de Araújo Macêdo, “Adaptive request batching for byzantine replication,” ACM SIGOPS Operating Systems Review, vol. 47, no. 1, pp. 35–42, 2013.
[228] S. Duan, S. Peisert, and K. N. Levitt, “hbft: speculative byzantine fault tolerance with minimum cost,” IEEE Transactions on Dependable and Secure Computing, vol. 12, no. 1, pp. 58–70, 2014.
[229] J. Behl, T. Distler, and R. Kapitza, “Consensus-oriented parallelization: How to earn your first million,” in Proceedings of the 16th Annual Middleware Conference, 2015, pp. 173–184.
[230] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena, “A secure sharding protocol for open blockchains,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 17–30.
[231] J. R. Douceur, “The sybil attack,” in International workshop on peer-to-peer systems. Springer, 2002, pp. 251–260.
[232] A. J. Ganesh, A.-M. Kermarrec, and L. Massoulié, “Peer-to-peer membership management for gossip-based protocols,” IEEE transactions on computers, vol. 52, no. 2, pp. 139–149, 2003.
[233] G. Wang and M. Nixon, “Randchain: Practical scalable decentralized randomness attested by blockchain,” in 2020 IEEE International Conference on Blockchain (Blockchain). IEEE, 2020, pp. 442–449.
[234] D. Porto, J. Leitão, C. Li, A. Clement, A. Kate, F. Junqueira, and R. Rodrigues, “Visigoth fault tolerance,” in Proceedings of the Tenth European Conference on Computer Systems, 2015, pp. 1–14.
[235] M. E. B. Pires, “Generalized paxos made byzantine, visigoth and less complex,” 2017.
[236] B. Li, W. Xu, M. Z. Abid, T. Distler, and R. Kapitza, “Sarek: Optimistic parallel ordering in byzantine fault tolerance,” in 2016 12th European Dependable Computing Conference (EDCC). IEEE, 2016, pp. 77–88.
[237] E. Buchman, “Tendermint: Byzantine fault tolerance in the age of blockchains,” Ph.D. dissertation, University of Guelph, 2016.
[238] J. Kwon, “Tendermint: Consensus without mining,” Draft v. 0.6, fall, vol. 1, no. 11, 2014.
[239] G. Danezis and S. Meiklejohn, “Centrally banked cryptocurrencies,” arXiv preprint arXiv:1505.06895, 2015.
[240] B. Laurie, “An efficient distributed currency,” Practice, vol. 100, 2011.

[241] E. K. Kogias, P. Jovanovic, N. Gailly, I. Khoffi, L. Gasser, and B. Ford, “Enhancing bitcoin security and performance with strong consistency via collective signing,” in 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 279–296.
[242] E. Syta, I. Tamas, D. Visher, D. I. Wolinsky, P. Jovanovic, L. Gasser, N. Gailly, I. Khoffi, and B. Ford, “Keeping authorities ‘honest or bust’ with decentralized witness cosigning,” in 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, pp. 526–545.
[243] S. Karkhanis, “Evaluation of using pairing-based cryptography in byzcoin,” 2019.
[244] I. Eyal, A. E. Gencer, E. G. Sirer, and R. Van Renesse, “Bitcoin-ng: A scalable blockchain protocol,” in 13th USENIX symposium on networked systems design and implementation (NSDI 16), 2016, pp. 45–59.
[245] C.-P. Schnorr, “Efficient signature generation by smart cards,” Journal of cryptology, vol. 4, no. 3, pp. 161–174, 1991.
[246] T. Distler, C. Cachin, and R. Kapitza, “Resource-efficient byzantine fault tolerance,” IEEE transactions on computers, vol. 65, no. 9, pp. 2807–2819, 2015.
[247] I. Abraham, D. Malkhi, K. Nayak, L. Ren, and A. Spiegelman, “Solida: A blockchain protocol based on reconfigurable byzantine consensus,” arXiv preprint arXiv:1612.02916, 2016.
[248] R. Pass and E. Shi, “Hybrid consensus: Efficient consensus in the permissionless model,” in 31st International Symposium on Distributed Computing (DISC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
[249] A. Biryukov, D. Feher, and D. Khovratovich, “Guru: Universal reputation module for distributed consensus protocols,” University of Luxembourg, Tech. Rep., 2017.
[250] B. Awerbuch and C. Scheideler, “Towards a scalable and robust dht,” Theory of Computing Systems, vol. 45, no. 2, pp. 234–260, 2009.
[251] S. Sen and M. J. Freedman, “Commensal cuckoo: Secure group partitioning for large-scale services,” ACM SIGOPS Operating Systems Review, vol. 46, no. 1, pp. 33–39, 2012.
[252] L. Ren, K. Nayak, I. Abraham, and S. Devadas, “Practical synchronous byzantine consensus,” arXiv preprint arXiv:1704.02397, 2017.
[253] P. Maymounkov and D. Mazieres, “Kademlia: A peer-to-peer information system based on the xor metric,” in International Workshop on Peer-to-Peer Systems. Springer, 2002, pp. 53–65.
[254] B. Li, N. Weichbrodt, J. Behl, P.-L. Aublin, T. Distler, and R. Kapitza, “Troxy: Transparent access to byzantine fault-tolerant systems,” in 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2018, pp. 59–70.
[255] R. Pass and E. Shi, “Thunderella: Blockchains with optimistic instant confirmation,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2018, pp. 3–33.
[256] ——, “Thunder research,” Tech. Rep., 2018.
[257] H. Attiya, C. Dwork, N. Lynch, and L. Stockmeyer, “Bounds on the time to reach agreement in the presence of timing uncertainty,” Journal of the ACM (JACM), vol. 41, no. 1, pp. 122–152, 1994.
[258] N. Shrestha, I. Abraham, L. Ren, and K. Nayak, “On the optimality of optimistic responsiveness,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 839–857.
[259] A. Momose, J. P. Cruz, and Y. Kaji, “Hybrid-bft: Optimistically responsive synchronous consensus with optimal latency or resilience,” IACR Cryptol. ePrint Arch., vol. 2020, p. 406, 2020.
[260] I. Abraham, K. Nayak, L. Ren, and Z. Xiang, “Optimal good-case latency for byzantine broadcast and state machine replication,” arXiv preprint arXiv:2003.13155, 2020.
[261] I. Abraham, D. Malkhi, K. Nayak, L. Ren, and M. Yin, “Sync hotstuff: Simple and practical synchronous state machine replication,” in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 106–118.
[262] M. Al-Bassam, A. Sonnino, S. Bano, D. Hrycyszyn, and G. Danezis, “Chainspace: A sharded smart contracts platform,” arXiv preprint arXiv:1708.03778, 2017.
[263] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency control and recovery in database systems. Addison-Wesley Reading, 1987, vol. 370.
[264] E. Kokoris-Kogias, P. Jovanovic, L. Gasser, N. Gailly, E. Syta, and B. Ford, “Omniledger: A secure, scale-out, decentralized ledger via sharding,” in 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 2018, pp. 583–598.
[265] C. Carvalho, D. Porto, L. Rodrigues, M. Bravo, and A. Bessani, “Dynamic adaptation of byzantine consensus protocols,” in Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 2018, pp. 411–418.
[266] L. Lamport, D. Malkhi, and L. Zhou, “Reconfiguring a state machine,” ACM SIGACT News, vol. 41, no. 1, pp. 63–73, 2010.
[267] M. Bravo, L. Rodrigues, R. Neiheiser, and L. Rech, “Policy-based adaptation of a byzantine fault tolerant distributed graph database,” in 2018 IEEE 37th Symposium on Reliable Distributed Systems (SRDS). IEEE, 2018, pp. 61–71.
[268] R. Neiheiser, D. Presser, L. Rech, M. Bravo, L. Rodrigues, and M. Correia, “Fireplug: Flexible and robust n-version geo-replication of graph databases,” in 2018 International Conference on Information Networking (ICOIN). IEEE, 2018, pp. 110–115.
[269] I. Abraham, T. H. Chan, D. Dolev, K. Nayak, R. Pass, L. Ren, and E. Shi, “Communication complexity of byzantine agreement, revisited,” in Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, 2019, pp. 317–326.
[270] I. Abraham, S. Devadas, D. Dolev, K. Nayak, and L. Ren, “Synchronous byzantine agreement with expected O(1) rounds, expected O(n²) communication, and optimal resilience,” in International Conference on Financial Cryptography and Data Security. Springer, 2019, pp. 320–334.
[271] J. Katz and C.-Y. Koo, “On expected constant-round protocols for byzantine agreement,” in Annual International Cryptology Conference. Springer, 2006, pp. 445–462.
[272] R. Guerraoui, P. Kuznetsov, M. Monti, M. Pavlovic, D.-A. Seredinschi, and Y. Vonlanthen, “Scalable byzantine reliable broadcast (extended version),” arXiv preprint arXiv:1908.01738, 2019.
[273] R. Cohen, I. Haitner, N. Makriyannis, M. Orland, and A. Samorodnitsky, “On the round complexity of randomized byzantine agreement,” arXiv preprint arXiv:1907.11329, 2019.
[274] D. Boneh, B. Lynn, and H. Shacham, “Short signatures from the weil pairing,” in International conference on the theory and application of cryptology and information security. Springer, 2001, pp. 514–532.
[275] D. Malkhi, K. Nayak, and L. Ren, “Flexible byzantine fault tolerance,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 1041–1053.
[276] S. Basu, A. Tomescu, I. Abraham, D. Malkhi, M. K. Reiter, and E. G. Sirer, “Efficient verifiable secret sharing with share recovery in bft protocols,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019, pp. 2387–2402.
[277] A. Kate, G. M. Zaverucha, and I. Goldberg, “Constant-size commitments to polynomials and their applications,” in International conference on the theory and application of cryptology and information security. Springer, 2010, pp. 177–194.
[278] T. P. Pedersen, “Non-interactive and information-theoretic secure verifiable secret sharing,” in Annual international cryptology conference. Springer, 1991, pp. 129–140.
[279] M. Lokhava, G. Losa, D. Mazières, G. Hoare, N. Barry, E. Gafni, J. Jove, R. Malinowsky, and J. McCaleb, “Fast and secure global payments with stellar,” in Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019, pp. 80–96.
[280] G. Voron and V. Gramoli, “Dispel: byzantine smr with distributed pipelining,” arXiv preprint arXiv:1912.10367, 2019.
[281] M. Scharf and S. Kiesel, “Nxg03-5: Head-of-line blocking in tcp and sctp: Analysis and measurements,” in IEEE Globecom 2006. IEEE, 2006, pp. 1–5.
[282] S. Gupta, J. Hellings, S. Rahnama, and M. Sadoghi, “Proof-of-execution: Reaching consensus through fault-tolerant speculation,” arXiv preprint arXiv:1911.00838, 2019.
[283] S. Bano, M. Baudet, A. Ching, A. Chursin, G. Danezis, F. Garillot, Z. Li, D. Malkhi, O. Naor, D. Perelman et al., “State machine replication in the libra blockchain,” available at: https://developers.libra.org/docs/state-machine-replication-paper (consulted on December 19, 2020), 2020.
[284] M. Yin, D. Malkhi, M. K. Reiter, G. G. Gueta, and I. Abraham, “Hotstuff: Bft consensus with linearity and responsiveness,” in Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, 2019, pp. 347–356.
[285] A. Spiegelman, “In search for a linear byzantine agreement,” arXiv preprint arXiv:2002.06993, 2020.
[286] J. Mickens, “The saddest moment,” Login Usenix Mag, vol. 39, no. 3, pp. 52–54, 2014.
[287] M. M. Jalalzai, J. Niu, and C. Feng, “Fast-hotstuff: A fast and resilient hotstuff protocol,” arXiv preprint arXiv:2010.11454, 2020.
[288] Y. Zhang, S. Setty, Q. Chen, L. Zhou, and L. Alvisi, “Byzantine ordered consensus without byzantine oligarchy,” in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 2020, pp. 633–649.
[289] S. Gupta, S. Rahnama, J. Hellings, and M. Sadoghi, “Resilientdb: global scale resilient blockchain fabric,” Proceedings of the VLDB Endowment, vol. 13, no. 6, pp. 868–883, 2020.
[290] G. Association et al., “Blockchain for development: Emerging opportunities for mobile, identity and aid,” 2017.
[291] Y. Amir, C. Danilov, D. Dolev, J. Kirsch, J. Lane, C. Nita-Rotaru, J. Olsen, and D. Zage, “Steward: Scaling byzantine fault-tolerant replication to wide area networks,” IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 80–93, 2008.
[292] J. Sousa and A. Bessani, “Separating the wheat from the chaff: An empirical design for geo-replicated state machines,” in 2015 IEEE 34th Symposium on Reliable Distributed Systems (SRDS). IEEE, 2015, pp. 146–155.
[293] C. Berger, H. P. Reiser, J. Sousa, and A. N. Bessani, “Aware: Adaptive wide-area replication for fast and resilient byzantine consensus,” IEEE Transactions on Dependable and Secure Computing, 2020.
[294] Z. Avarikioti, L. Heimbach, R. Schmid, and R. Wattenhofer, “Fnf-bft: Exploring performance limits of bft protocols,” arXiv preprint arXiv:2009.02235, 2020.
[295] S. Bano, A. Sonnino, A. Chursin, D. Perelman, and D. Malkhi, “Twins: White-glove approach for bft testing,” arXiv preprint arXiv:2004.10617, 2020.
[296] S. Bano, A. Sonnino, A. Chursin, D. Perelman, Z. Li, A. Ching, and D. Malkhi, “Gemini: Bft systems made robust.”
[297] A. Stewart and E. Kokoris-Kogia, “Grandpa: a byzantine finality gadget,” arXiv preprint arXiv:2007.01560, 2020.
[298] Z. Xiang, D. Malkhi, K. Nayak, and L. Ren, “Strengthened fault tolerance in byzantine fault tolerant replication,” arXiv preprint arXiv:2101.03715, 2021.
[299] M. Platania, D. Obenshain, T. Tantillo, Y. Amir, and N. Suri, “On choosing server- or client-side solutions for bft,” ACM Computing Surveys (CSUR), vol. 48, no. 4, pp. 1–30, 2016.
[300] M. Kapritsos, Y. Wang, V. Quema, A. Clement, L. Alvisi, and M. Dahlin, “All about eve: Execute-verify replication for multi-core servers,” in 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 2012, pp. 237–250.
[301] M. Eischer and T. Distler, “Latency-aware leader selection for geo-replicated byzantine fault-tolerant systems,” in 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 2018, pp. 140–145.
[302] R. Friedman and R. Van Renesse, “Packing messages as a tool for boosting the performance of total ordering protocols,” in Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No. 97TB100183). IEEE, 1997, pp. 233–242.
[303] H. P. Reiser and R. Kapitza, “Hypervisor-based efficient proactive recovery,” in 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007). IEEE, 2007, pp. 83–92.
[304] T. Distler, I. Popov, W. Schröder-Preikschat, H. P. Reiser, and R. Kapitza, “Spare: Replicas on hold,” in NDSS, 2011.
[305] M. Eischer and T. Distler, “Scalable byzantine fault-tolerant state-machine replication on heterogeneous servers,” Computing, vol. 101, no. 2, pp. 97–118, 2019.
[306] G. Habiger, F. J. Hauck, J. Köstler, and H. P. Reiser, “Resource-efficient state-machine replication with multithreading and vertical scaling,” in 2018 14th European Dependable Computing Conference (EDCC). IEEE, 2018, pp. 87–94.
[307] R. Kotla and M. Dahlin, “High throughput byzantine fault tolerance,” in International Conference on Dependable Systems and Networks, 2004. IEEE, 2004, pp. 575–584.
[308] M. Eischer, M. Büttner, and T. Distler, “Deterministic fuzzy checkpoints,” in 2019 38th Symposium on Reliable Distributed Systems (SRDS). IEEE, 2019, pp. 153–162.
[309] P. Sousa, A. N. Bessani, M. Correia, N. F. Neves, and P. Verissimo, “Highly available intrusion-tolerant services with proactive-reactive recovery,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 4, pp. 452–465, 2009.
[310] P. Kuznetsov and R. Rodrigues, “Bftw3: Why? when? where? workshop on the theory and practice of byzantine fault tolerance,” ACM SIGACT News, vol. 40, no. 4, pp. 82–86, 2010.
[311] C. Dwork and M. Naor, “Pricing via processing or combatting junk mail,” in Annual international cryptology conference. Springer, 1992, pp. 139–147.
[312] A. Back, “A partial hash collision based postage scheme,” 1997 (retrieved December 29, 2018).
[313] I. Bentov, R. Pass, and E. Shi, “Snow white: Provably secure proofs of stake,” IACR Cryptol. ePrint Arch., vol. 2016, p. 919, 2016.
[314] A. Kiayias, A. Russell, B. David, and R. Oliynykov, “Ouroboros: A provably secure proof-of-stake blockchain protocol,” in Annual International Cryptology Conference. Springer, 2017, pp. 357–388.
[315] B. David, P. Gaži, A. Kiayias, and A. Russell, “Ouroboros praos: An adaptively-secure, semi-synchronous proof-of-stake blockchain,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2018, pp. 66–98.
[316] A. Chepurnoy, “Interactive proof-of-stake,” arXiv preprint arXiv:1601.00275, 2016.
[317] D. Larimer, “Delegated proof-of-stake (dpos),” Bitshare whitepaper, 2014.
[318] K. Karantias, A. Kiayias, and D. Zindros, “Proof-of-burn,” in International Conference on Financial Cryptography and Data Security. Springer, 2020, pp. 523–540.
[319] T. NEM, “Nem technical reference,” URL https://nem.io/wpcontent/themes/nem/files/NEM_techRef.pdf, 2018.
[320] V. Dhillon, D. Metcalf, and M. Hooper, “The hyperledger project,” in Blockchain enabled applications. Springer, 2017, pp. 139–149.
[321] A. Miller, A. Juels, E. Shi, B. Parno, and J. Katz, “Permacoin: Repurposing bitcoin work for data preservation,” in 2014 IEEE Symposium on Security and Privacy. IEEE, 2014, pp. 475–490.
[322] S. Park, A. Kwon, G. Fuchsbauer, P. Gaži, J. Alwen, and K. Pietrzak, “Spacemint: A cryptocurrency based on proofs of space,” in International Conference on Financial Cryptography and Data Security. Springer, 2018, pp. 480–499.
[323] L. Luu, Y. Velner, J. Teutsch, and P. Saxena, “Smartpool: Practical decentralized pooled mining,” in 26th USENIX Security Symposium (USENIX Security 17), 2017, pp. 1409–1426.
[324] I. Abraham, D. Malkhi et al., “The blockchain consensus layer and bft,” Bulletin of EATCS, vol. 3, no. 123, 2017.
[325] V. Buterin, “Casper version 1 implementation guide,” Ethereum Github repository, 2017.
[326] I. Abraham, D. Malkhi, K. Nayak, L. Ren, and A. Spiegelman, “Solidus: An incentive-compatible cryptocurrency based on permissionless byzantine consensus,” CoRR, abs/1612.02916, 2016.

83 [327] V. King, J. Saia, V. Sanwalani, and E. Vee, “Scalable leader election,” [348] I. Sheff, X. Wang, R. van Renesse, and A. C. Myers, “Heterogeneous in Proceedings of the seventeenth annual ACM-SIAM symposium on paxos,” in 24th International Conference on Principles of Distributed Discrete algorithm, 2006, pp. 990–999. Systems (OPODIS 2020). Schloss Dagstuhl-Leibniz-Zentrum für [328] D. Dasgupta, J. M. Shrein, and K. D. Gupta, “A survey of blockchain Informatik, 2021. from security perspective,” Journal of Banking and Financial Tech- [349] P. Thambidurai and Y.-K. Park, “Interactive consistency with multi- nology, vol. 3, no. 1, pp. 1–17, 2019. ple failure modes,” in Proceedings Seventh Symposium on Reliable [329] S. M. H. Bamakan, A. Motavali, and A. B. Bondarti, “A survey Distributed Systems. IEEE Computer Society, 1988, pp. 93–94. of blockchain consensus algorithms performance evaluation criteria,” [350] J. M. Rushby, “Design and verification of secure systems,” ACM Expert Systems with Applications, p. 113385, 2020. SIGOPS Operating Systems Review, vol. 15, no. 5, pp. 12–21, 1981. [330] H. Hasanova, U.-j. Baek, M.-g. Shin, K. Cho, and M.-S. Kim, “A [351] G. S. Veronese, M. Correia, A. N. Bessani, and L. C. Lung, “Ebawa: survey on blockchain cybersecurity vulnerabilities and possible coun- Efficient byzantine agreement for wide-area networks,” in 2010 IEEE termeasures,” International Journal of Network Management, vol. 29, 12th International Symposium on High Assurance Systems Engineer- no. 2, p. e2060, 2019. ing. IEEE, 2010, pp. 10–19. [331] G. Bissias, B. N. Levine, A. P. Ozisik, and G. Andresen, “An analysis [352] I. Anati, S. Gueron, S. Johnson, and V. Scarlata, “Innovative tech- of attacks on blockchain consensus,” arXiv preprint arXiv:1610.07985, nology for cpu based attestation and sealing,” in Proceedings of the 2016. 2nd international workshop on hardware and architectural support for security and privacy, vol. 13. ACM New York, NY, USA, 2013, p. 7. 
[332] S. Sayeed and H. Marco-Gisbert, “Assessing blockchain consensus and security mechanisms against the 51% attack,” Applied Sciences, vol. 9, [353] A. ARM, “Security technology-building a secure system using trust- no. 9, p. 1788, 2019. zone technology,” ARM Technical White Paper, 2009. [333] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan, [354] S. Rüsch, K. Bleeke, and R. Kapitza, “Bloxy: Providing transparent “Blockbench: A framework for analyzing private blockchains,” in Pro- and generic bft-based ordering services for blockchains,” in 2019 38th ceedings of the 2017 ACM International Conference on Management Symposium on Reliable Distributed Systems (SRDS). IEEE, 2019, pp. of Data. ACM, 2017, pp. 1085–1100. 305–30 509. [334] T. T. A. Dinh, R. Liu, M. Zhang, G. Chen, B. C. Ooi, and J. Wang, [355] M. Correia, N. F. Neves, L. C. Lung, and P. Veríssimo, “Worm-it– “Untangling blockchain: A data processing view of blockchain sys- a wormhole-based intrusion-tolerant group communication system,” tems,” IEEE Transactions on Knowledge and Data Engineering, Journal of Systems and Software, vol. 80, no. 2, pp. 178–197, 2007. vol. 30, no. 7, pp. 1366–1385, 2018. [356] M. Garcia, A. Bessani, and N. Neves, “Lazarus: Automatic man- [335] A. Gervais, G. O. Karame, K. Wüst, V. Glykantzis, H. Ritzdorf, agement of diversity in bft systems,” in Proceedings of the 20th and S. Capkun, “On the security and performance of International Middleware Conference, 2019, pp. 241–254. blockchains,” in Proceedings of the 2016 ACM SIGSAC conference on [357] M. Bravo, G. Chockler, and A. Gotsman, “Making byzantine consen- computer and communications security. ACM, 2016, pp. 3–16. sus live,” in 34th International Symposium on Distributed Computing [336] Z. Zheng, S. Xie, H.-N. Dai, X. Chen, and H. Wang, “Blockchain (DISC 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, challenges and opportunities: A survey,” International Journal of Web 2020. and Grid Services, vol. 
14, no. 4, pp. 352–375, 2018. [358] P. Tholoniat and V. Gramoli, “Formal verification of blockchain [337] J. Bruce, “The mini-blockchain scheme,” White paper, 2014. byzantine fault tolerance,” arXiv preprint arXiv:1909.07453, 2019. [338] G. Wang, Z. Shi, M. Nixon, and S. Han, “Chainsplitter: Towards [359] D. Dolev, J. Y. Halpern, B. Simons, and R. Strong, “Dynamic fault- blockchain-based industrial iot architecture for supporting hierarchi- tolerant clock synchronization,” Journal of the ACM (JACM), vol. 42, cal storage,” in 2019 IEEE International Conference on Blockchain no. 1, pp. 143–185, 1995. (Blockchain). IEEE, 2019, pp. 166–175. [360] B. Simons, “An overview of clock synchronization,” Fault-Tolerant [339] A. Biryukov, D. Khovratovich, and I. Pustogarov, “Deanonymisation Distributed Computing, pp. 84–96, 1990. of clients in bitcoin p2p network,” in Proceedings of the 2014 ACM [361] S. Meiklejohn, M. Pomarole, G. Jordan, K. Levchenko, D. McCoy, SIGSAC Conference on Computer and Communications Security. G. M. Voelker, and S. Savage, “A fistful of bitcoins: characterizing ACM, 2014, pp. 15–29. payments among men with no names,” in Proceedings of the 2013 [340] S. Feld, M. Schönfeld, and M. Werner, “Analyzing the deployment conference on Internet measurement conference, 2013, pp. 127–140. of bitcoin’s p2p network under an as-level perspective,” Procedia [362] T. McConaghy, R. Marques, A. Müller, D. De Jonghe, T. McConaghy, Computer Science, vol. 32, pp. 1121–1126, 2014. G. McMullen, R. Henderson, S. Bellemare, and A. Granzotto, [341] P. Koshy, D. Koshy, and P. McDaniel, “An analysis of anonymity “Bigchaindb: a scalable blockchain database,” white paper, in bitcoin using p2p network traffic,” in International Conference on BigChainDB, 2016. Financial Cryptography and Data Security. Springer, 2014, pp. 469– [363] J. Groth, “Short pairing-based non-interactive zero-knowledge argu- 485. ments,” in International Conference on the Theory and Application of [342] A. 
Kumar, C. Fischer, S. Tople, and P. Saxena, “A traceability analysis Cryptology and Information Security. Springer, 2010, pp. 321–340. of monero’s blockchain,” in European Symposium on Research in [364] S. Wang, L. Ouyang, Y. Yuan, X. Ni, X. Han, and F.-Y. Wang, Computer Security. Springer, 2017, pp. 153–173. “Blockchain-enabled smart contracts: architecture, applications, and [343] G. Maxwell, “Coinjoin: Bitcoin privacy for the real world,” in Post on future trends,” IEEE Transactions on Systems, Man, and Cybernetics: Bitcoin forum, 2013. Systems, vol. 49, no. 11, pp. 2266–2277, 2019. [344] T. Ruffing, P. Moreno-Sanchez, and A. Kate, “Coinshuffle: Practical [365] R. Zhang, R. Xue, and L. Liu, “Security and privacy on blockchain,” decentralized coin mixing for bitcoin,” in European Symposium on ACM Computing Surveys (CSUR), vol. 52, no. 3, pp. 1–34, 2019. Research in Computer Security. Springer, 2014, pp. 345–364. [366] K. Birman, “The promise, and limitations, of gossip protocols,” ACM [345] S. Gupta, J. Hellings, S. Rahnama, and M. Sadoghi, “An in-depth SIGOPS Operating Systems Review, vol. 41, no. 5, pp. 8–13, 2007. look of bft consensus in blockchain: Challenges and opportunities,” [367] A.-M. Kermarrec and M. Van Steen, “Gossiping in distributed sys- in Proceedings of the 20th International Middleware Conference tems,” ACM SIGOPS operating systems review, vol. 41, no. 5, pp. Tutorials, 2019, pp. 6–10. 2–7, 2007. [346] H. Du and D. J. S. Hilaire, “Multi-paxos: An implementation and eval- [368] S. B. Mokhtar, A. Pace, and V. Quéma, “Firespam: Spam resilient uation,” Department of Computer Science and Engineering, University gossiping in the bar model,” in 2010 29th IEEE Symposium on Reliable of Washington, Tech. Rep. UW-CSE-09-09-02, 2009. Distributed Systems. IEEE, 2010, pp. 225–234. [347] J. Nijsse and A. Litchfield, “A taxonomy of blockchain consensus [369] L. Alvisi, J. Doumen, R. Guerraoui, B. Koldehofe, H. Li, R. Van Re- methods,” Cryptography, vol. 4, no. 
4, p. 32, 2020. nesse, and G. Tredan, “How robust are gossip-based communication

84 protocols?” ACM SIGOPS Operating Systems Review, vol. 41, no. 5, pp. 14–18, 2007. [370] R. Van Renesse, Y. Minsky, and M. Hayden, “A gossip-style failure detection service,” in Middleware’98. Springer, 1998, pp. 55–70. [371] L. Gelbmann and G. Bosson, “Bls cosigning via a gossip protocol,” 2019. [372] J. F. Mikalsen, “Firechain: An efficient blockchain protocol using secure gossip,” Master’s thesis, UiT Norges arktiske universitet, 2018. [373] R. van Renesse, “A blockchain based on gossip?-a position paper,” Cornell University, 2016. [374] M. Correia, N. F. Neves, and P. Verissimo, “Bft-to: Intrusion tolerance with less replicas,” The Computer Journal, vol. 56, no. 6, pp. 693–715, 2012. [375] I. Abraham, S. Devadas, D. Dolev, K. Nayak, and L. Ren, “Efficient synchronous byzantine consensus,” arXiv preprint arXiv:1704.02397, 2017. [376] I. Grigg, “Eos-an introduction,” White paper. https://whitepaperdatabase. com/eos-whitepaper, 2017. [377] H. Dang, A. Dinh, E.-C. Chang, and B. C. Ooi, “Chain of trust: Can trusted hardware help scaling blockchains?” arXiv preprint arXiv:1804.00399, 2018. [378] V. Costan and S. Devadas, “Intel sgx explained.” IACR Cryptology ePrint Archive, vol. 2016, no. 086, pp. 1–118, 2016. [379] J.-E. Ekberg, K. Kostiainen, and N. Asokan, “The untapped potential of trusted execution environments on mobile devices,” IEEE Security & Privacy, vol. 12, no. 4, pp. 29–37, 2014. [380] C. Stathakopoulous and C. Cachin, “Threshold signatures for blockchain systems,” Swiss Federal Institute of Technology, 2017. [381] S. Gupta, S. Rahnama, and M. Sadoghi, “Permissioned blockchain through the looking glass: Architectural and implementation lessons learned,” arXiv preprint arXiv:1911.09208, 2019. [382] S. Gupta and M. Sadoghi, “Easycommit: A non-blocking two-phase commit protocol.” in EDBT, 2018, pp. 157–168. [383] T. M. Qadah and M. 
Sadoghi, “Quecc: A queue-oriented, control-free concurrency architecture,” in Proceedings of the 19th International Middleware Conference, 2018, pp. 13–25. [384] T. Qadah, S. Gupta, and M. Sadoghi, “Q-store: Distributed, multi- partition transactions via queue-oriented execution and communica- tion.” in EDBT, 2020, pp. 73–84. [385] S. Gupta, “Resilient and scalable architecture for permissioned blockchain fabrics,” Proceedings of the VLDB Endowment, 2020. [386] P. J. Marandi, C. E. Bezerra, and F. Pedone, “Rethinking state- machine replication for parallelism,” in 2014 IEEE 34th International Conference on Distributed Computing Systems. IEEE, 2014, pp. 368– 377. [387] E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich et al., “Hyperledger fabric: a distributed operating system for permissioned blockchains,” in Proceedings of the thirteenth EuroSys conference, 2018, pp. 1–15. [388] M. J. Amiri, D. Agrawal, and A. E. Abbadi, “Caper: a cross-application permissioned blockchain,” Proceedings of the VLDB Endowment, vol. 12, no. 11, pp. 1385–1398, 2019. [389] H. Dang, T. T. A. Dinh, D. Loghin, E.-C. Chang, Q. Lin, and B. C. Ooi, “Towards scaling blockchain systems via sharding,” in Proceedings of the 2019 international conference on management of data, 2019, pp. 123–140. [390] N. Szabo, “The idea of smart contracts,” ’s papers and concise tutorials, vol. 6, no. 1, 1997. [391] C. Jentzsch, “The history of the dao and lessons learned,” Slock. it Blog, vol. 24, 2016. [392] H. Wan, K. Li, and Y. Huang, “Blockchain: A review from the perspective of operations researchers,” in 2020 Winter Simulation Conference (WSC). IEEE, 2020, pp. 75–89.

85