
Practical Byzantine Fault Tolerance

Miguel Castro

January 31, 2001

© Massachusetts Institute of Technology 2001

This research was supported in part by DARPA under contract DABT63-95-C-005, monitored by Army Fort Huachuca, and under contract F30602-98-1-0237, monitored by the Air Force Research Laboratory. The author was supported by a fellowship from the Portuguese Ministry for Science and Technology, and by a fellowship from the Calouste Gulbenkian Foundation.

Massachusetts Institute of Technology
Laboratory for Computer Science
Cambridge, Massachusetts, USA

Abstract

Our growing reliance on online services accessible on the Internet demands highly-available systems that provide correct service without interruptions. Byzantine faults such as software bugs, operator mistakes, and malicious attacks are the major cause of service interruptions. This thesis describes a new replication algorithm, BFT, that can be used to build highly-available systems that tolerate Byzantine faults. It shows, for the first time, how to build Byzantine-fault-tolerant systems that can be used in practice to implement real services because they do not rely on unrealistic assumptions and they perform well. BFT works in asynchronous environments like the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. The window may increase under a denial-of-service attack but the algorithm can detect and respond to such attacks and it can also detect when the state of a replica is corrupted by an attacker. BFT has been implemented as a generic program library with a simple interface.
The BFT library provides a complete solution to the problem of building real services that tolerate Byzantine faults. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations. The most important optimization is the use of symmetric cryptography to authenticate messages. Public-key cryptography, which was the major bottleneck in previous systems, is used only to exchange the symmetric keys. The performance results show that BFS performs 2% faster to 24% slower than production implementations of the NFS protocol that are not replicated. Therefore, we believe that the BFT library can be used to build practical systems that tolerate Byzantine faults.

Keywords: algorithms, analytic modelling, asynchronous systems, Byzantine faults, correctness proofs, fault tolerance, high availability, integrity, performance, proactive security, replication, and security.

This report is a minor revision of the dissertation of the same title submitted to the Department of Electrical Engineering and Computer Science on November 30, 2000, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in that department. The thesis was supervised by Professor Barbara Liskov.

Acknowledgments

First, I must thank my thesis supervisor, Barbara Liskov, for her constant support and wise advice. I feel very fortunate for having had the chance to work closely with her.

The other members of my thesis committee, Frans Kaashoek, Butler Lampson, and Nancy Lynch, suggested many important improvements to this thesis and interesting directions for future work. I greatly appreciate their suggestions.

It has been a pleasure to be a graduate student in the Programming Methodology Group.
I want to thank all the group members: Atul Adya, Sarah Ahmed, Sameer Ajmani, Ron Bodkin, Philip Bogle, Chandrasekhar Boyapati, Dorothy Curtis, Sanjay Ghemawat, Robert Gruber, Kyle Jamieson, Paul Jonhson, Umesh Maheshwari, Andrew Myers, Tony Ng, Rodrigo Rodrigues, Liuba Shrira, Ziqiang Tang, Zheng Yang, Yan Zhang, and Quinton Zondervan. Andrew and Atul deserve special thanks for the many stimulating discussions we had. I also want to thank Rodrigo for reading my formal proof, and for his help in handling the details of the thesis submission process.

I am grateful to my parents for their support over the years. My mother was always willing to drop everything and cross the ocean to help us, and my father is largely responsible for my interest in computers and programming.

Above all, I want to thank my wife, Inês, and my children, Madalena and Gonçalo. They made my life at MIT great. I felt so miserable without them during my last two months at MIT that I had to finish my thesis and leave.

Contents

1 Introduction
  1.1 Contributions
  1.2 Thesis Outline

2 BFT-PK: An Algorithm With Signatures
  2.1 System Model
  2.2 Service Properties
  2.3 The Algorithm
    2.3.1 Quorums and Certificates
    2.3.2 The Client
    2.3.3 Normal-Case Operation
    2.3.4 Garbage Collection
    2.3.5 View Changes
  2.4 Formal Model
    2.4.1 I/O Automata
    2.4.2 System Model
    2.4.3 Modified Linearizability
    2.4.4 Algorithm Specification

3 BFT: An Algorithm Without Signatures
  3.1 Why it is Hard to Replace Signatures by MACs
  3.2 The New Algorithm
    3.2.1 Authenticators
    3.2.2 Normal-Case Operation
    3.2.3 Garbage Collection
    3.2.4 View Changes
    3.2.5 View Changes With Bounded Space

4 BFT-PR: BFT With Proactive Recovery
  4.1 Overview
  4.2 Additional Assumptions
  4.3 Modified Algorithm
    4.3.1 Key Exchanges
    4.3.2 Recovery
    4.3.3 Improved Service Properties

5 Implementation Techniques
  5.1 Optimizations
    5.1.1 Digest Replies
    5.1.2 Tentative Execution
    5.1.3 Read-only Operations
    5.1.4 Request Batching
    5.1.5 Separate Request Transmission
  5.2 Message Retransmission
  5.3 Checkpoint Management
    5.3.1 Data Structures
    5.3.2 State Transfer
    5.3.3 State Checking
  5.4 Non-Determinism
  5.5 Defenses Against Denial-Of-Service Attacks

6 The BFT Library
  6.1 Implementation
  6.2 Interface
  6.3 BFS: A Byzantine-Fault-tolerant File System

7 Performance Model
  7.1 Component Models
    7.1.1 Digest Computation
    7.1.2 MAC Computation
    7.1.3 Communication
  7.2 Protocol Constants
  7.3 Latency
    7.3.1 Read-Only Operations
    7.3.2 Read-Write Operations
  7.4 Throughput
    7.4.1 Read-Only Requests