
news

Science | doi:10.1145/1538788.1538794

Alex Wright

Contemporary Approaches to Fault Tolerance

Thanks to computer scientists like Barbara Liskov, researchers are making major progress with cost-efficient fault tolerance for Web-based systems.

As more and more data moves into the cloud, many developers find themselves grappling with the prospect of system failure at ever-widening scales.

When distributed systems first started appearing in the late 1970s and early 1980s, they typically involved a small, fixed number of servers running in a carefully managed environment. By contrast, today’s Web-based distributed systems often involve thousands or hundreds of thousands of servers coming on and offline at unpredictable intervals, hosting multiple stored objects, services, and applications that often cross organizational boundaries over the Internet.

“In a cloud we have relatively few sites that are loaded with a huge number of processors,” says Danny Dolev, a professor at The Hebrew University of Jerusalem. “Fault tolerance needs to provide survivability and security within a cloud and across clouds.”

In this deeply intertwined environment, software designers have to plan for a bewildering array of potential failure points. Building large-scale fault-tolerant systems inevitably involves trade-offs in terms of cost, performance, and development time.

As Web systems grow, those trade-offs loom larger and larger. “Fault-tolerant systems have always been difficult to build,” says University of North Carolina at Chapel Hill computer science professor Mike Reiter. “Getting a fault-tolerant system to perform as well as a non-fault-tolerant one is a challenge.”

Fortunately, the research community has been making major strides in this area of late, thanks in part to the contributions of ACM A.M. Turing Award winner Barbara Liskov of Massachusetts Institute of Technology, whose breakthrough work in applying Byzantine fault tolerance (BFT) methods to the Internet has helped point the way to cost-efficient fault tolerance for Web-based systems.

Figure: The Byzantine Generals Problem.
Figure 1, Lieutenant 2 as the Traitor: the Commander sends “Attack” to Lieutenant 1 and “Attack” to Lieutenant 2; Lieutenant 2 tells Lieutenant 1, “He said retreat.”
Figure 2, Commander as the Traitor: the Commander sends “Attack” to Lieutenant 1 and “Retreat” to Lieutenant 2; Lieutenant 2 tells Lieutenant 1, “He said retreat.”
In the Byzantine Generals Problem, as defined by Leslie Lamport, Robert Shostak, and Marshall Pease in their 1982 paper, a general must communicate his order to attack or retreat to his lieutenants, but any number of participants, including the general, could be a traitor.
Source: “The Byzantine Generals Problem,” ACM TOPLAS Vol. 4, Issue 3 (July 1982), DOI: 10.1145/357172.357176
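The dilemma shown in the two figures can be made concrete with a small sketch. The Python below is an illustration only, not code from the 1982 paper: the `Scenario` type, its field names, and the helper functions are hypothetical names chosen here. It records the messages exchanged in each figure and checks that Lieutenant 1 receives exactly the same pair of messages in both cases, so the messages alone cannot tell him which participant is the traitor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Messages exchanged among one commander and two lieutenants (hypothetical model)."""
    name: str
    commander_to_l1: str   # order the commander sends to Lieutenant 1
    commander_to_l2: str   # order the commander sends to Lieutenant 2
    l2_relays_to_l1: str   # what Lieutenant 2 claims the commander said


def lieutenant1_view(scenario: Scenario) -> tuple:
    """Everything Lieutenant 1 can observe: his own order plus Lieutenant 2's report."""
    return (scenario.commander_to_l1, scenario.l2_relays_to_l1)


def indistinguishable(a: Scenario, b: Scenario) -> bool:
    """True when Lieutenant 1 sees identical messages in both scenarios."""
    return lieutenant1_view(a) == lieutenant1_view(b)


# Figure 1: the commander is loyal; Lieutenant 2 lies about the order he received.
FIGURE_1 = Scenario("Lieutenant 2 as the Traitor", "attack", "attack", "retreat")

# Figure 2: the commander sends conflicting orders; Lieutenant 2 reports honestly.
FIGURE_2 = Scenario("Commander as the Traitor", "attack", "retreat", "retreat")

if __name__ == "__main__":
    print(lieutenant1_view(FIGURE_1))
    print(lieutenant1_view(FIGURE_2))
    print(indistinguishable(FIGURE_1, FIGURE_2))
```

Because the two views are identical, any rule Lieutenant 1 follows must produce the same action in both figures, which is the reason extra redundancy is needed before agreement becomes possible.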

july 2009 | vol. 52 | no. 7 | communications of the acm 13 news

While researchers have developed a number of different approaches to fault tolerance over the years, ultimately they all share a common strategy: redundancy. While hardware systems can employ redundancy at multiple levels, such as the central processing unit, memory, and firmware, fault-tolerant software design largely comes down to creating mechanisms for consistent data replication.

One of the most common approaches to software replication involves a method known as state machine replication. With state machine replication, any service provided by a computer can be described as a state machine, which accepts commands from other client machines that alter the state machine. By deploying a set of replica state machines with identical initial states, subsequent client commands can be processed by the replicas in a pre-determined fashion, so that all state machines eventually reach the same state. Thus, the failure of any one state machine can be masked by the surviving machines.

The origins of this approach to fault tolerance stretch back to the 1970s when researchers at SRI International began exploring the question of how to fly mission-critical aircraft using an assembly of computers. That work laid the foundation for contemporary approaches to fault tolerance by establishing the fundamental difference between timely systems, in which network transmission times are bounded and clocks are synchronized, and asynchronous systems, in which communication latencies have an infinite-tail distribution (most messages arrive within a certain time limit but, with ever lower probability, messages may be delayed in transit beyond any bound).

The SRI work also helped draw important distinctions between the various types of faults experienced in a system, such as message omissions, machine crashes, or arbitrary faults due to software malfunction or other undetected data alterations. Finally, the SRI work helped to characterize resilience bounds, or how many machines are needed to tolerate certain failures.

The idea of state machine replication was given its first abstract formalization by Leslie Lamport and later surveyed by Fred Schneider. Lamport’s work eventually led to the Paxos protocol, a descendant of which is now in use at Google and elsewhere. Lamport used the term “Byzantine” to describe the array of possible faults that could bedevil a system. The term derives from the Byzantine Generals Problem, a logic puzzle in which a group of generals must agree on a battle plan, even though one or more of the generals may be a traitor. The challenge is to develop an effective messaging system that will outsmart the traitors and ensure execution of the battle plan. The solution, in a nutshell, involves redundancy.

While Lamport’s work has proved foundational in the subsequent development of Byzantine fault tolerance, the basic ideas behind state machine replication were also implemented in other early systems. In the early 1980s, Ken Birman pursued a related line of work known as Virtual Synchrony with the ISIS system. This approach establishes rules for replication that behave indistinguishably from a non-replicated system running on a single, nonfaulty node. The ISIS approach eventually found its way into several other systems, including the CORBA fault-tolerance architecture.

At about the same time, Liskov developed viewstamped replication, a protocol designed to address benign failures, such as when a message gets lost but there’s no malicious intent.

These pioneering efforts all laid the foundation for an approach to state machine replication that continues to underlie most contemporary work on fault tolerance. However, most of these projects involved relatively small, fixed clusters of machines. “In this environment you only had to worry that the machine you stored your data on might have crashed,” Liskov recalls, “but it wasn’t going to tell you lies.”

With the rise of the Internet in the mid-1990s, the problem of “lies” (or malicious hacks) rose to the fore. Whereas once state machines could trust each other’s messages, they now had to support an additional layer of confirmation to allow for the possibility that one or more of the state machines might have been hacked.

Two groups of developers began exploring ways of applying state-machine replication techniques to cope with a growing range of Byzantine failures. Dahlia Malkhi and Mike Reiter introduced a data-centric approach known as the Byzantine quorum systems principle. In contrast to active-replication approaches like the Paxos protocol, Byzantine quorum systems focus on the servers rather than on the messages, choosing sets of servers that intersect in specific ways to ensure redundancy.

In the mid-1990s, Liskov started her breakthrough work on practical Byzantine fault tolerance (PBFT), an extension of her earlier work on viewstamped replication that adapted the Paxos replication protocol to cope with arbitrary failures. Liskov’s approach demonstrated that Byzantine approaches could scale cost-effectively, sparking renewed interest in the systems research community.

While the foundational principles of consistency and replication remain essential, the rapid growth of Web systems is introducing important new challenges. Many researchers are finding that PBFT provides a useful framework for developing fault-tolerant Web systems. “I’m really excited about the recent work Barbara and her colleagues have done on making Byzantine Agreement into a practical tool, one that we can use even in large-scale settings,” says Birman, a professor of computer science at Cornell University.

Inspired by Lamport and Liskov’s foundational work, Hebrew University’s Dolev has been working on an approach involving polynomial solutions to the general Byzantine agreement problem. While his early work in this area 25 years ago seemed largely theoretical, he is now finding practical applications for these approaches on the Web. “My theoretical work was ignited by Leslie [Lamport],” he says. “Barbara’s work brought me to look again at the practicality of the solutions.”

At Microsoft, researcher Rama Kotla has proposed a new BFT replication protocol known as Zyzzyva, which strives to improve performance by using a technique called speculation to achieve low performance overheads. Kotla is also exploring a complementary technique called high throughput BFT that exploits parallelism to improve the performance of a replicated application.

Also at Microsoft, director Chandu Thekkath has been pioneering an alternative approach to fault tolerance for Microsoft’s Live Services, creating a single “configuration master” to coordinate recovery from machine failures across multiple data services. The concept of a configuration master also underlies the design of several other leading services in the live services market, such as Google’s Chubby lock server.

Lorenzo Alvisi, a professor of computer science at the University of Texas at Austin, and colleagues are probing the possibilities of applying game theory techniques to fault tolerance problems, while Ittai Abraham, a professor of computer science at The Hebrew University of Jerusalem, and colleagues are incorporating security methods into distributed protocols to punish rogue participants and to deter deviation by any colluding group among them.

While these efforts are opening new research frontiers, they remain squarely rooted in the pioneering work on Byzantine fault tolerance that started more than three decades ago. Indeed, many developers are just beginning to encounter this foundational research for the first time. “Engineers are starting to discover and use these algorithms instead of writing code by the seat of their pants,” says Lamport.

Many developers still wrestle with the cost and performance trade-offs of fault tolerance, however, and a number of large sites still seem willing to accept a certain degree of system failure as a cost of doing business on the Web.

“The reliability of a system increases with increasing number of tolerated failures,” says Kotla, “but it also increases the cost of the system.” He suggests that developers look for ways to balance costs against the need to achieve reliability in terms of mean time to failure, mean time to detect failures, and mean time to recover faulty replicas. “We need more research work in understanding and modeling faults in various settings to help system designers choose the right parameters,” Kotla says.

Further complicating matters is the rise of mobile devices that are only sporadically connected to the Internet. As people entrust more and more of their personal data to these devices (financial transactions, messaging, and other sensitive information), the challenge of keeping all that data in sync across multiple platforms will continue to escalate. And the problem of distributed fault tolerance will only grow more, well, Byzantine.

“The Web is going live,” says Birman, who believes that the coming convergence of sensors, simulators, and mobile devices will drive the need for increasingly reliable data replication. “This is going to change the picture for replication, creating a demand from average users.” When that happens, we may just see fault tolerance coming out of the clouds and back down to earth.

Alex Wright is a writer and information architect who lives and works in New York City.

© 2009 ACM 0001-0782/09/0700 $10.00

Cloud Computing
Cloning Smartphones

A pair of scientists at Intel Research Berkeley have developed CloneCloud, which creates an identical clone of an individual’s smartphone that resides in a cloud-computing environment.

Created by Intel researchers Byung-Gon Chun and Petros Maniatis, CloneCloud uses a smartphone’s Internet connection to communicate with the phone’s online copy, which contains its data and applications, up to several gigabits in size, in the cloud.

CloneCloud would make smartphones significantly faster and more powerful, enabling them to perform processor-heavy tasks in the cloud. For example, Chun and Maniatis’s CloneCloud prototype, running on Google’s Android mobile operating system, conducted a test application involving the facial recognition of photos. Running the application on the Android smartphone took 100 seconds; the phone’s clone, operating on a desktop computer in the cloud, completed the task in one second.

According to the researchers, CloneCloud would also provide improved smartphone security, with virus scans of a device’s entire file system being conducted in the cloud. Moreover, CloneCloud would improve a smartphone’s battery life by having cloud-based computers handle the most processor-intensive tasks.

The CloneCloud research could help with intelligently allocating tasks to the most energy-efficient or fastest processor in a cloud-computing environment. “There will be a family of heterogeneous devices, and you would like to move the computing job to the one that makes most sense; from that standpoint, it is a great idea,” said Allan Knies, associate director of Intel Research Berkeley, in an interview with Technology Review.

The CloneCloud approach could also help create a computing environment that would make it easier to share data between mobile devices and home-based computers.
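The state machine replication scheme described in the article, in which replicas start from identical states and process the same client commands in the same order, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not any of the protocols named in the article: the `CounterMachine` class, the `replicate` and `read` helpers, and the command format are hypothetical, and the sketch handles only a crashed (benign) replica, not a Byzantine one that reports false values.

```python
from collections import Counter


class CounterMachine:
    """A deterministic state machine whose state is a single integer (hypothetical example)."""

    def __init__(self, initial=0):
        self.state = initial
        self.crashed = False

    def apply(self, command):
        """Apply one ('add', n) or ('set', n) command; a crashed replica ignores it."""
        if self.crashed:
            return
        op, value = command
        if op == "add":
            self.state += value
        elif op == "set":
            self.state = value
        else:
            raise ValueError(f"unknown command: {op!r}")


def replicate(replicas, commands):
    """Deliver every command to every replica in the same, fixed order."""
    for command in commands:
        for replica in replicas:
            replica.apply(command)


def read(replicas):
    """Return the state reported by the largest group of live replicas."""
    live_states = [r.state for r in replicas if not r.crashed]
    if not live_states:
        raise RuntimeError("no live replicas")
    value, _votes = Counter(live_states).most_common(1)[0]
    return value


if __name__ == "__main__":
    replicas = [CounterMachine() for _ in range(3)]   # identical initial states
    replicate(replicas, [("add", 5), ("set", 2)])
    replicas[1].crashed = True                        # one replica fails
    replicate(replicas, [("add", 10)])
    print(read(replicas))                             # survivors mask the failure
```

Because every live replica applies the same commands deterministically, the survivors agree with one another, and a client reading from them does not notice the crash. Tolerating replicas that lie requires the additional layer of confirmation that protocols such as PBFT provide.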
