Contemporary Approaches to Fault Tolerance

News | Science | DOI:10.1145/1538788.1538794
Alex Wright
Communications of the ACM | July 2009 | Vol. 52 | No. 7

Thanks to computer scientists like Barbara Liskov, researchers are making major progress with cost-efficient fault tolerance for Web-based systems.

As more and more data moves into the cloud, many developers find themselves grappling with the prospect of system failure at ever-widening scales.

When distributed systems first started appearing in the late 1970s and early 1980s, they typically involved a small, fixed number of servers running in a carefully managed environment. By contrast, today’s Web-based distributed systems often involve thousands or hundreds of thousands of servers coming online and offline at unpredictable intervals, hosting multiple stored objects, services, and applications that often cross organizational boundaries over the Internet.

“In a cloud we have relatively few sites that are loaded with a huge number of processors,” says Danny Dolev, a computer science professor at The Hebrew University of Jerusalem. “Fault tolerance needs to provide survivability and security within a cloud and across clouds.”

In this deeply intertwined environment, software designers have to plan for a bewildering array of potential failure points. Building large-scale fault-tolerant systems inevitably involves trade-offs in terms of cost, performance, and development time.

As Web systems grow, those trade-offs loom larger and larger. “Fault-tolerant systems have always been difficult to build,” says University of North Carolina at Chapel Hill computer science professor Mike Reiter. “Getting a fault-tolerant system to perform as well as a non-fault-tolerant one is a challenge.”

Fortunately, the research community has been making major strides in this area of late, thanks in part to the contributions of ACM A.M. Turing Award winner Barbara Liskov of Massachusetts Institute of Technology, whose breakthrough work in applying Byzantine fault tolerance (BFT) methods to the Internet has helped point the way to cost-efficient fault tolerance for Web-based systems.

[Figure: The Byzantine Generals Problem. Figure 1, Lieutenant 2 as the Traitor: the Commander sends “Attack” to both lieutenants, and Lieutenant 2 tells Lieutenant 1 “He said retreat.” Figure 2, Commander as the Traitor: the Commander sends “Attack” to Lieutenant 1 and “Retreat” to Lieutenant 2, and Lieutenant 2 tells Lieutenant 1 “He said retreat.” Caption: In the Byzantine Generals Problem, as defined by Leslie Lamport, Robert Shostak, and Marshall Pease in their 1982 paper, a general must communicate his order to attack or retreat to his lieutenants, but any number of participants, including the general, could be a traitor. Source: “The Byzantine Generals Problem,” ACM TOPLAS Vol. 4, Issue 3 (July 1982), DOI: 10.1145/357172.357176.]

While researchers have developed a number of different approaches to fault tolerance over the years, ultimately they all share a common strategy: redundancy. While hardware systems can employ redundancy at multiple levels, such as the central processing unit, memory, and firmware, fault-tolerant software design largely comes down to creating mechanisms for consistent data replication.

One of the most common approaches to software replication involves a method known as state machine replication. With state machine replication, any service provided by a computer can be described as a state machine, which accepts commands from other client machines that alter the state machine. By deploying a set of replica state machines with identical initial states, subsequent client commands can be processed by the replicas in a pre-determined fashion, so that all state machines eventually reach the same state. Thus, the failure of any one state machine can be masked by the surviving machines.

The origins of this approach to fault tolerance stretch back to the 1970s, when researchers at SRI International began exploring the question of how to fly mission-critical aircraft using an assembly of computers. That work laid the foundation for contemporary approaches to fault tolerance by establishing the fundamental difference between timely systems, in which network transmission times are bounded and clocks are synchronized, and asynchronous systems, in which communication latencies have an infinite-tail distribution (most messages arrive within a certain time limit but, with ever-decreasing probability, messages may be delayed in transit beyond any bound).

The SRI work also helped draw important distinctions between the various types of faults experienced in a system, such as message omissions, machine crashes, or arbitrary faults due to software malfunction or other undetected data alterations. Finally, the SRI work helped to characterize resilience bounds, or how many machines are needed to tolerate certain failures.

The idea of state machine replication was given its first abstract formalization by Leslie Lamport and later surveyed by Fred Schneider. Lamport’s work eventually led to the Paxos protocol, a descendant of which is now in use at Google and elsewhere. Lamport used the term “Byzantine” to describe the array of possible faults that could bedevil a system. The term derives from the Byzantine Generals Problem, a logic puzzle in which a group of generals must agree on a battle plan, even though one or more of the generals may be a traitor. The challenge is to develop an effective messaging system that will outsmart the traitors and ensure execution of the battle plan. The solution, in a nutshell, involves redundancy.

While Lamport’s work has proved foundational in the subsequent development of Byzantine fault tolerance, the basic ideas behind state machine replication were also implemented in other early systems. In the early 1980s, Ken Birman pursued a related line of work known as Virtual Synchrony with the ISIS system. This approach establishes rules for replication that behave indistinguishably from a non-replicated system running on a single, nonfaulty node. The ISIS approach eventually found its way into several other systems, including the CORBA fault-tolerance architecture.

At about the same time, Liskov developed viewstamped replication, a protocol designed to address benign failures, such as when a message gets lost but there’s no malicious intent.

These pioneering efforts all laid the foundation for an approach to state machine replication that continues to underlie most contemporary work on fault tolerance. However, most of these projects involved relatively small, fixed clusters of machines. “In this environment you only had to worry that the machine you stored your data on might have crashed,” Liskov recalls, “but it wasn’t going to tell you lies.”

With the rise of the Internet in the mid-1990s, the problem of “lies” (or malicious hacks) rose to the fore. Whereas once state machines could trust each other’s messages, they now had to support an additional layer of confirmation to allow for the possibility that one or more of the state machines might have been hacked.

Two groups of developers began exploring ways of applying state-machine replication techniques to cope with a growing range of Byzantine failures. Dahlia Malkhi and Mike Reiter introduced a data-centric approach known as the Byzantine quorum systems principle. In contrast to active-replication approaches like the Paxos protocol, Byzantine quorum systems focus on identifying a set of servers, rather than focusing on the messages, and choosing a set of servers so that they intersect in specific ways to ensure redundancy.

In the mid-1990s, Liskov started her breakthrough work on practical Byzantine fault tolerance (PBFT), an extension of her earlier work on viewstamped replication that adapted the Paxos replication protocol to cope with arbitrary failures. Liskov’s approach demonstrated that Byzantine approaches could scale cost-effectively, sparking renewed interest in the systems research community.

While the foundational principles of consistency and replication remain essential, the rapid growth of Web systems is introducing important new challenges. Many researchers are finding that PBFT provides a useful framework for developing fault-tolerant Web systems. “I’m really excited about the recent work Barbara and her colleagues have done on making Byzantine Agreement into a practical tool, one that we can use even in large-scale settings,” says Birman, a professor of computer science at Cornell University.

Inspired by Lamport and Liskov’s foundational work, Hebrew University’s Dolev has been working on an approach involving polynomial solutions to the general Byzantine agreement problem. While his early work in this area 25 years ago seemed largely theoretical, he is now finding practical applications for these approaches on the Web. “My theoretical work was ignited by Leslie [Lamport],” he says. “Barbara’s work brought me to look again at the practicality of the solutions.”

At Microsoft, researcher Rama Kotla has proposed a new BFT replication protocol known as Zyzzyva, which strives to improve performance by using a technique called speculation to achieve low performance overheads. Kotla is also exploring a complementary technique called high throughput BFT that exploits parallelism to improve the performance of a replicated [...] cost of doing business on the Web.

“The Web is going live,” says Ken Birman. “This is going to change the picture for replication, creating a demand from average users.”

Cloud Computing: Cloning Smartphones

A pair of scientists at Intel Research Berkeley have developed CloneCloud, which creates an identical clone of an individual’s smartphone that resides in a cloud-computing environment. Created by Intel researchers Byung-Gon Chun and Petros Maniatis, CloneCloud uses a smartphone’s Internet connection to communicate [...]
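
To make the state machine replication idea described in the article more concrete, here is a minimal Python sketch. It is a toy illustration, not code from any of the systems named above. The names `KeyValueReplica`, `LyingReplica`, and `replicated_call` are hypothetical. The sketch also assumes that every replica already receives the same commands in the same order, which is precisely the hard part that agreement protocols such as Paxos and PBFT exist to solve.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KeyValueReplica:
    """A deterministic state machine: same commands, same order, same state."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        op, key, *rest = command
        if op == "set":
            self.state[key] = rest[0]
            return "ok"
        if op == "get":
            return self.state.get(key)
        raise ValueError(f"unknown operation: {op}")


class LyingReplica(KeyValueReplica):
    """Hypothetical faulty replica that answers every read with a wrong value."""

    def apply(self, command):
        result = super().apply(command)
        return "bogus" if command[0] == "get" else result


def replicated_call(replicas, command):
    """Send one command to every replica and return the strict-majority reply.

    Assumption: the caller delivers commands to all replicas in one agreed
    order; real systems need an agreement protocol to establish that order.
    """
    replies = [replica.apply(command) for replica in replicas]
    value, votes = Counter(replies).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replica replies")
    return value


# Three replicas start from identical (empty) state; one of them lies on reads.
replicas = [KeyValueReplica(), KeyValueReplica(), LyingReplica()]
for cmd in [("set", "x", 1), ("set", "x", 2), ("get", "x")]:
    result = replicated_call(replicas, cmd)
print(result)
```

Because the two correct replicas apply the same commands in the same order, they agree on the reply, and the vote masks the single faulty replica. Tolerating arbitrary faults without a trusted party fixing the command order takes additional replicas and message rounds, which this sketch deliberately leaves out.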
