focus: fault tolerance

Mastering Agreement Problems in Distributed Systems

Michel Raynal, IRISA
Mukesh Singhal, Ohio State University

Overcoming agreement problems in distributed systems is a primary challenge to systems designers. The authors focus on practical solutions for a well-known agreement problem—the nonblocking atomic commitment.

Distributed systems are difficult to design and implement because of the unpredictability of message transfer delays and process speeds (asynchrony) and of failures. Consequently, systems designers have to cope with these difficulties intrinsically associated with distributed systems. We present the nonblocking atomic commitment (NBAC) agreement problem as a case study in the context of both synchronous and asynchronous distributed systems where processes can fail by crashing. This problem lends itself to further exploration of fundamental mechanisms and concepts such as time-out, multicast, consensus, and failure detector. We will survey recent theoretical results of studies related to agreement problems and show how system engineers can use this information to gain deeper insight into the challenges they encounter.

Commitment protocols
In a distributed system, a transaction usually involves several sites as participants. At a transaction's end, its participants must enter a commitment protocol to commit it (when everything goes well) or abort it (when something goes wrong). Usually, this protocol obeys a two-phase pattern (the two-phase commit, or 2PC). In the first phase, each participant votes Yes or No. If for any reason (deadlock, a storage problem, a concurrency control conflict, and so on) a participant cannot locally commit the transaction, it votes No. A Yes vote means the participant commits to make local updates permanent if required. The second phase pronounces the order to commit the transaction if all participants voted Yes or to abort it if some participants voted No. Of course, we must enrich this protocol sketch to take into account failures to correctly implement the failure atomicity property. The underlying idea is that a failed participant is considered to have voted No.

Atomic commitment protocols
Researchers have proposed several atomic commitment protocols and implemented them in various systems.1,2 Unfortunately, some of them (such as the 2PC with a main coordinator) exhibit the blocking property in some failure scenarios.3,4 "Blocking" means

40 IEEE SOFTWARE July/August 2001 0740-7459/01/$10.00 © 2001 IEEE

The Two-Phase Commit Protocol Can Block

A coordinator-based two-phase commit protocol obeys the following message exchanges. First, the coordinator requests a vote from all transaction participants and waits for their answers. If all participants vote Yes, the coordinator returns the decision Commit; if even a single participant votes No or failed, the coordinator returns the decision Abort.1

Consider the following failure scenario (see Figure A). Participant p1 is the coordinator and p2, p3, p4, and p5 are the other participants. The coordinator sends a request to all participants to get their votes. Suppose p4 and p5 answer Yes. According to the set of answers p1 receives, it determines the decision value D (Commit or Abort) and sends it to p2 and p3, but crashes before sending it to p4 and p5. Moreover, p2 and p3 crash just after receiving D. So, participants p4 and p5 are blocked: they cannot know the decision value because they voted Yes and because p1, p2, and p3 have crashed (p4 and p5 cannot communicate with one of them to get D). Participants p4 and p5 cannot force a decision value because the forced value could be different from D. So, p4 and p5 are blocked until one of the crashed participants recovers. That is why the basic two-phase commit protocol is blocking: situations exist where noncrashed participants cannot progress because of participant crash occurrences.

Figure A. A failure scenario for the two-phase commit protocol: p1 computes D = Commit/Abort, sends it to p2 and p3, and crashes; p2 and p3 crash just after receiving D; p4 and p5, having voted Yes, remain blocked.

Reference
1. Ö. Babaoğlu and S. Toueg, "Non-Blocking Atomic Commitment," Distributed Systems, S. Mullender, ed., ACM Press, New York, 1993, pp. 147–166.
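The coordinator's first-phase decision rule sketched in this sidebar is easy to state in code. The following is an illustrative sketch (the function and vote representation are our own assumptions, not code from the article): the transaction commits only if every participant answers Yes, and a participant that crashed or never answered is treated as an implicit No.

```python
def decide(votes):
    """Coordinator decision rule for two-phase commit.

    votes: dict mapping participant name -> 'Yes', 'No', or None
           (None models a participant that failed to answer).
    """
    # A missing or failed answer counts as an implicit No vote.
    if all(v == "Yes" for v in votes.values()):
        return "Commit"
    return "Abort"

print(decide({"p1": "Yes", "p2": "Yes", "p3": "Yes"}))  # Commit
print(decide({"p1": "Yes", "p2": None}))                # Abort (p2 presumed failed)
```

Note that this rule alone says nothing about delivering the decision; the blocking scenario above arises precisely when the decision value cannot reach all participants.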

that nonfailed participants must wait for the recovery of failed participants to terminate their commit procedure. For instance, a commitment protocol is blocking if it admits executions in which nonfailed participants cannot decide. When such a situation occurs, nonfailed participants cannot release resources they acquired for exclusive use on the transaction's behalf (see the sidebar "The Two-Phase Commit Protocol Can Block"). This not only prevents the concerned transaction from terminating but also prevents other transactions from accessing locked data. So, it is highly desirable to devise NBAC protocols that ensure transactions will terminate (by committing or aborting) despite any failure scenario.

Several NBAC protocols (called 3PC protocols) have been designed and implemented. Basically, they add handling of failure scenarios to a 2PC-like protocol and use complex subprotocols that make them difficult to understand, program, prove, and test. Moreover, these protocols assume the underlying distributed system is synchronous (that is, the process-scheduling delays and message transfer delays are upper bounded, and the protocols know and use these bounds).

Liveness guarantees
Nonblocking protocols necessitate that the underlying system offers a liveness guarantee—a guarantee that failures will be eventually detected. Synchronous systems provide such a guarantee, thanks to the upper bounds on scheduling and message transfer delays (protocols use time-outs to safely detect failures in a bounded time). Asynchronous systems do not have such bounded delays, but we can compensate for this by equipping them with unreliable failure detectors, as we'll describe later in this article. (For more on the difference between synchronous and asynchronous distributed systems, see the related sidebar.) So, time-outs in synchronous systems and unreliable failure detectors in asynchronous systems constitute building blocks offering the liveness guarantee. With these blocks we can design solutions to the NBAC agreement problem.

Distributed systems and failures
A distributed system is composed of a finite set of sites interconnected through a communication network. Each site has a local memory and stable storage and executes one or more processes. To simplify, we assume that each site has only one process. Processes communicate and synchronize by exchanging messages through the underlying network's channels.

A synchronous distributed system is characterized by upper bounds on message transfer delays, process-scheduling delays, and message-processing time. δ denotes an upper time bound for "message transfer delay + receiving-process scheduling delay + message-processing time." Such a value is a constant, known by all the system's sites.

Most real distributed systems are asynchronous in the sense that δ cannot be established.5 So, asynchronous distributed systems are characterized by the absence of such an a priori known bound; message transfer and process-scheduling delays cannot be predicted and are considered arbitrary. Asynchronous distributed systems are sometimes also called time-free systems.

Crash failures
The underlying communication network is assumed to be reliable; that is, it doesn't lose, generate, or garble messages. The famous "Generals Paradox" shows that distributed systems with unreliable communication do not admit solutions to the NBAC problem.2

A site or transaction participant can fail by crashing. Its state is correct until it crashes. A participant that does not crash during the transaction execution is also said to be correct; otherwise, it is faulty. Because we are interested only in commitment and not in recovery, we assume that a faulty participant does not recover. Moreover, whatever the number of faulty participants and the network configuration, we assume that each pair of correct participants can always communicate. In synchronous systems, crashes can be easily and safely detected by using time-out mechanisms. This is not the case in asynchronous distributed systems.

The consensus problem
Consensus6 is a fundamental problem of distributed systems (see the sidebar "Why Consensus Is Important in Distributed Systems"). Consider a set of processes that can fail by crashing. Each process has an input value that it proposes to the other processes. The consensus problem consists of designing a protocol where all correct processes unanimously and irrevocably decide on a common output value that is one of the input values.

The consensus problem comprises four properties:

- Termination. Every correct process eventually decides some value.
- Integrity. A process decides at most once.
- Agreement. No two correct processes decide differently.
- Validity. If a process decides a value, some process proposed that value.

The uniform consensus problem is defined by the previous four properties plus the uniform agreement property: no two processes decide differently.

When the only possible failures are process crashes, the consensus and uniform consensus problems have relatively simple solutions in synchronous distributed systems. Unfortunately, this is not the case in asynchronous distributed systems, where the most famous result is a negative one.

A Fundamental Difference between Synchronous and Asynchronous Distributed Systems
We consider a synchronous distributed system in which a channel connects each pair of processes and the only possible failure is the crash of processes. To simplify, assume that only communication takes time and that a process that receives an inquiry message responds to it immediately (in zero time). Moreover, let ∆ be the upper bound of the round-trip communication delay. In this context, a process pi can easily determine the crash of another process pj. Process pi sets a timer's time-out period to ∆ and sends an inquiry message to pj. If it receives an answer before the timer expires, it can safely conclude that pj had not crashed before receiving the inquiry message (see Figure B1). If pi has not received an answer from pj when the timer expires, it can safely conclude that pj has crashed (see Figure B2).

In an asynchronous system, processes can use timers but cannot rely on them, irrespective of the time-out period. This is because message transfer delays in asynchronous distributed systems have no upper bound. If pi uses a timer (with some time-out period ∆) to detect the crash of pj, the two previous scenarios (Figures B1 and B2) can occur. However, a third scenario can also occur (see Figure B3): the timer expires while the answer is on its way to pi. This is because ∆ is not a correct upper bound for a round-trip delay.

Only the first and second scenarios can occur in a synchronous distributed system. All three scenarios can occur in an asynchronous distributed system. Moreover, in such a system, pi cannot distinguish the second and third scenarios when its timer expires. When considering systems with process crash failures, this is the fundamental difference between a synchronous and an asynchronous distributed system.

Figure B. Three scenarios of attempted communication in a distributed system: (1) communication is timely; (2) process pj has crashed; (3) link (pi, pj) is slow.
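The timer-based detection just discussed can be sketched in a few lines. This is an illustrative sketch under our own assumptions (the bound value, the queue standing in for a channel, and the function names are hypothetical): with a genuine upper bound ∆ on the round-trip delay, a missing answer safely implies a crash; without such a bound, the same test can wrongly suspect a merely slow process.

```python
import queue
import threading

DELTA = 0.1  # assumed upper bound on round-trip delay, in seconds

def probe(answer_channel, timeout=DELTA):
    """Wait up to `timeout` for pj's answer to an inquiry message."""
    try:
        answer_channel.get(timeout=timeout)
        return "alive"        # scenario B1: the answer arrived in time
    except queue.Empty:
        return "suspected"    # scenario B2 -- or, if delays are unbounded,
                              # scenario B3 (a false suspicion of a slow pj)

# pj answers after 0.05 s, within the bound: pi concludes "alive".
channel = queue.Queue()
threading.Timer(0.05, channel.put, args=["I am here"]).start()
print(probe(channel))         # alive

# pj never answers: pi concludes "suspected". In an asynchronous system
# this conclusion may be wrong -- the answer could still be in transit.
print(probe(queue.Queue()))   # suspected
```

The point of the sketch is that the code cannot tell scenarios B2 and B3 apart: both end in the `queue.Empty` branch.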


Why Consensus Is Important in Distributed Systems

In asynchronous distributed systems prone to process crash failures, Tushar Chandra and Sam Toueg have shown that the consensus problem and the atomic broadcast problem are equivalent.1 (The atomic broadcast problem specifies that all processes be delivered the same set of messages in the same order. This set includes only messages broadcast by processes but must include all messages broadcast by correct processes.) Therefore, you can use any solution for one of these problems to solve the other. Suppose we have a solution to the atomic broadcast problem. To solve the consensus problem, each process "atomically broadcasts" its value and the first delivered value is considered the decision value. (See Chandra and Toueg's article1 for information on a transformation from consensus to atomic broadcast.) So, all theoretical results associated with the consensus problem are also applicable to the atomic broadcast problem. Among others, it is impossible to design an atomic broadcast protocol in an asynchronous distributed system prone to process crash failures without considering additional assumptions limiting the system's asynchronous behavior.

Also, in asynchronous distributed systems with limited behavior that makes the consensus problem solvable, consensus acts as a building block on top of which you can design solutions to other agreement or coordination problems (such as nonblocking atomic commitment, leader election, and group membership).

Reference
1. T.D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems," J. ACM, vol. 43, no. 2, Mar. 1996, pp. 245–267.
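The reduction from consensus to atomic broadcast described above is tiny: every process atomically broadcasts its input, and all processes decide the first value delivered. The sketch below is ours, not the article's; the atomic broadcast layer is faked by a single shared, ordered delivery list, which is exactly the property a real abcast layer would guarantee at every correct process.

```python
def consensus_via_abcast(proposals):
    """Solve consensus given an atomic broadcast primitive.

    proposals: list of (process_id, value) pairs.
    `delivery_order` stands in for the abcast layer: in a real system every
    correct process is delivered the same messages in the same total order.
    """
    delivery_order = []
    for pid, value in proposals:
        delivery_order.append((pid, value))   # abcast(value) by process pid

    # Every process decides the first delivered value: agreement follows
    # from the identical delivery order, validity from the fact that the
    # decided value was proposed by some process.
    _first_pid, decision = delivery_order[0]
    return decision

# Three processes propose values; all decide p1's value, which the
# (shared) delivery order lists first.
print(consensus_via_abcast([("p1", "Commit"), ("p2", "Abort"), ("p3", "Commit")]))  # Commit
```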

The FLP result states that it is impossible to design a deterministic protocol solving the consensus problem in an asynchronous system even with only a single process crash failure6 (see the sidebar "The Impossibility of Consensus in Asynchronous Distributed Systems"). Intuitively, this is due to the impossibility of safely distinguishing a very slow process from a crashed process in an asynchronous context. This impossibility result has motivated researchers to discover a set of minimal properties that, when satisfied by an asynchronous distributed system, make the consensus problem solvable.

Failure detection in asynchronous systems
Because message transfer and process-scheduling delays are arbitrary and cannot

The Impossibility of Consensus in Asynchronous Distributed Systems

Let's design a consensus protocol based on the rotating coordinator paradigm. Processes proceed by asynchronous rounds; in each round, a predetermined process acts as the coordinator. Let pc be the coordinator in round r, where c = (r mod n) + 1. Here's the protocol's principle:

1. During round r (initially, r = 1), pc tries to impose its value as the decision value.
2. The protocol terminates as soon as a decision has been obtained.

Let's examine two scenarios related to round r:

1. Process pc has crashed. A noncrashed process pi cannot distinguish this crash from the situation in which pc or its communication channels are very slow. So, if pi waits for a decision value from pc, it will wait forever. This violates the consensus problem's termination property.
2. Process pc has not crashed, but its communication channel to pi is fast while its communication channels to pj and pc′ are very slow. Process pi receives the value vc of pc and decides accordingly. Processes pj and pc′, after waiting for pc for a long period, suspect that pc has crashed and start round r + 1. Let c′ = ((r + 1) mod n) + 1. During round r + 1, pj and pc′ decide on vc′. If vc ≠ vc′, the agreement property is violated.

This discussion shows that the protocol just sketched does not work. What is more surprising is that it is impossible to design a deterministic consensus protocol in an asynchronous distributed system even with a single process crash failure. This impossibility result, established by Michael Fischer, Nancy Lynch, and Michael Paterson,1 has motivated researchers to find a set of properties that, when satisfied by an asynchronous distributed system, make the consensus problem solvable. Minimal synchronism,2 partial synchrony,3 and unreliable failure detectors4 constitute answers to such a challenge. Researchers have also investigated randomized protocols to get nondeterministic solutions.5

References
1. M. Fischer, N. Lynch, and M. Paterson, "Impossibility of Distributed Consensus with One Faulty Process," J. ACM, vol. 32, no. 2, Apr. 1985, pp. 374–382.
2. D. Dolev, C. Dwork, and L. Stockmeyer, "On the Minimal Synchronism Needed for Distributed Consensus," J. ACM, vol. 34, no. 1, Jan. 1987, pp. 77–97.
3. C. Dwork, N. Lynch, and L. Stockmeyer, "Consensus in the Presence of Partial Synchrony," J. ACM, vol. 35, no. 2, Apr. 1988, pp. 288–323.
4. T.D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems," J. ACM, vol. 43, no. 2, Mar. 1996, pp. 245–267.
5. M. Rabin, "Randomized Byzantine Generals," Proc. 24th Symp. Foundations of Computer Science, IEEE Press, Piscataway, N.J., 1983, pp. 403–409.
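The coordinator rotation used in the sketched protocol is plain modular arithmetic over round numbers. A minimal sketch (the function name is ours):

```python
def coordinator(r, n):
    """1-based index of the process coordinating round r among n processes,
    following the sidebar's rule c = (r mod n) + 1."""
    return (r % n) + 1

# With n = 3 processes, rounds 1..4 are coordinated by p2, p3, p1, p2:
print([coordinator(r, 3) for r in (1, 2, 3, 4)])  # [2, 3, 1, 2]
```

Because the schedule is fixed in advance, every process can compute the next coordinator locally after suspecting the current one, with no extra agreement needed on who takes over.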


be bounded in asynchronous systems, it's impossible to differentiate a transaction participant that is very slow (owing to a very long scheduling delay) or whose messages are very slow (owing to long transfer delays) from a participant that has crashed. This simple observation shows that detecting failures with completeness and accuracy in asynchronous distributed systems is impossible. Completeness means a faulty participant will eventually be detected as faulty, while accuracy means a correct participant will not be considered faulty.

We can, however, equip every site with a failure detector that gives the site hints on other sites it suspects to be faulty. Such failure detectors function by suspecting participants that do not answer in a timely fashion. These detectors are inherently unreliable because they can erroneously suspect a correct participant or not suspect a faulty one.

In this context, Tushar Chandra and Sam Toueg have refined the completeness and accuracy properties of failure detection.7 They have shown that you can solve the consensus problem in executions where unreliable failure detectors satisfy some of these properties:

- Strong completeness. Eventually every faulty participant is permanently suspected by every correct participant.
- Weak completeness. Eventually every faulty participant is permanently suspected by some correct participant.
- Weak accuracy. There is a correct participant that is never suspected.
- Eventual weak accuracy. Eventually there is a correct participant that is never suspected.

Based on these properties, Chandra and Toueg have defined several classes of failure detectors. For example, the class of "eventual weak failure detectors" includes all failure detectors that satisfy weak completeness and eventual weak accuracy. We can use time-outs to meet completeness of failure detection because a faulty participant does not answer. However, as we indicated previously, accuracy can only be approximate because, even if most messages are received within some predictable time, an answer not yet received does not mean that the sender has crashed. So, implementations of failure detectors can only have "approximate" accuracy in purely asynchronous distributed systems. Despite such approximate implementations, unreliable failure detectors satisfying only weak completeness and eventual weak accuracy allow us to solve the consensus problem.7

Nonblocking atomic commitment
As we mentioned earlier, the NBAC problem consists of ensuring that all correct participants of a transaction take the same decision—namely, to commit or abort the transaction. If the decision is Commit, all participants make their updates permanent; if the decision is Abort, no change is made to the data (the transaction has no effect). The NBAC protocol's Commit/Abort outcome depends on the votes of participants and on failures. More precisely, the solution to the NBAC problem has these properties:

- Termination. Every correct participant eventually decides.
- Integrity. A participant decides at most once.
- Uniform agreement. No two participants decide differently.
- Validity. A decision value is Commit or Abort.
- Justification. If a participant decides Commit, all participants have voted Yes.

The obligation property
The previous properties could be satisfied by a trivial protocol that would always output Abort. So, we add the obligation property, which aims to eliminate "unexpected" solutions in which the decision would be independent of votes and of failure scenarios. The obligation property stipulates that if all participants voted Yes and everything went well, the decision must be Commit. "Everything went well" is of course related to failures. Because failures can be safely detected in synchronous systems, the obligation property for these systems is as follows:

S-Obligation: If all participants vote Yes and there is no failure, the outcome decision is Commit.

Because failures can only be suspected, possibly erroneously, in asynchronous systems, we must weaken this property for the


Figure 1. A generic nonblocking atomic commitment (NBAC) protocol:

procedure NBAC (vote, participants)
begin
(1.1)  multicast (vote, participants);
(2.1)  wait ( (delivery of a vote No from a participant)
(2.2)         or (∃ q ∈ participants: exception(q) has been notified to p)
(2.3)         or (from each q ∈ participants: delivery of a vote Yes)
(2.4)       );
(3.1)  case
(3.2)    a vote No has been delivered   → outcome := propose (Abort)
(3.3)    an exception has been notified → outcome := propose (Abort)
(3.4)    all votes are Yes              → outcome := propose (Commit)
(3.5)  end case
end
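The generic procedure of Figure 1 translates almost line for line into executable form. In this hedged sketch (all names and data structures are our own stand-ins, not the article's), the wait statement is assumed to have already terminated, and propose is passed in so that the synchronous instantiation (identity) or the asynchronous one (uniform consensus) can be plugged in.

```python
def nbac(votes_delivered, exceptions_notified, propose):
    """Skeleton of the generic NBAC procedure of Figure 1.

    votes_delivered: dict participant -> 'Yes'/'No', the votes delivered when
                     the wait statement (lines 2.1-2.4) terminated.
    exceptions_notified: set of participants q for which exception(q) was raised.
    propose: instantiation-specific function -- the identity in the synchronous
             case, a uniform consensus subprotocol in the asynchronous case.
    """
    if "No" in votes_delivered.values():   # line 3.2: a No vote was delivered
        return propose("Abort")
    if exceptions_notified:                # line 3.3: an exception was notified
        return propose("Abort")
    return propose("Commit")               # line 3.4: a Yes from every participant

identity = lambda x: x  # synchronous instantiation of propose

print(nbac({"p1": "Yes", "p2": "Yes"}, set(), identity))  # Commit
print(nbac({"p1": "Yes"}, {"p2"}, identity))              # Abort (p2 suspected)
```

Only the instantiation of the three generic statements changes between system models; the control structure above stays the same.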

problem to be solvable.8 So, for these systems we use this obligation property:

AS-Obligation: If all participants vote Yes and there is no failure suspicion, the outcome decision is Commit.

A generic NBAC protocol
Next, we describe a generic protocol9 for the NBAC problem. In this protocol, the control is distributed: each participant sends its vote to all participants and no main coordinator exists.4 Compared to a central coordinator-based protocol, it requires more messages but fewer communication phases, thus reducing latency. The protocol is described by the procedure NBAC (vote, participants), which each correct participant executes (see Figure 1). The procedure uses three generic statements (multicast, exception, and propose) whose instantiations are specific to synchronous or to asynchronous distributed systems. Each participant has a variable outcome that will locally indicate the final decision (Commit or Abort) at the end of the protocol execution.

A participant p first sends its vote to all participants (including itself) by using the multicast statement (Figure 1, line 1.1). Then p waits until it has been either

- delivered a No vote from a participant (line 2.1),
- notified of an exception concerning a participant q (line 2.2), or
- delivered a Yes vote from each participant (line 2.3).

At line 2.2, the notification exception(q) concerns q's failure and will be instantiated appropriately in each type of system. Finally, according to the votes and the exception notifications that p has received, it executes the statement outcome := propose(x) (lines 3.1 to 3.5) with Commit as the value of x if it has received a Yes from all participants, and with Abort in all other cases. We'll discuss propose in the next section.

Instantiations
Now we'll look at instantiations of the generic protocol for synchronous and asynchronous distributed systems (see Table 1).9

Table 1. Instantiations of the Generic Protocol for Distributed Systems

Generic statement    Synchronous instantiation    Asynchronous instantiation
multicast(v,P)       Rel_Multicast(v,P)           Multisend(v,P)
exception            timer expiration             failure suspicion
propose(x)           x                            Unif_Cons(x)

Synchronous systems
For these systems, the instantiations are Rel_Multicast(v,P), timer expiration, and x.

Multicast(v,P). The Rel_Multicast(v,P) primitive allows a process to reliably send a message m to all processes of P. It has these properties:3,10

- Termination. If a correct process multicasts a message m to P, some correct process of P delivers m (or all processes of P are faulty).
- Validity. If a process p delivers a message m, then m has been multicast to a set P and p belongs to P (there is no spurious message).


- Integrity. A process p delivers a message m at most once (no duplication).
- Uniform agreement. If any (correct or not) process belonging to P delivers a message m, all correct processes of P deliver m.

The previous definition is independent of the system's synchrony. In synchronous distributed systems, the definition of Rel_Multicast(v,P) includes the additional property3 of timeliness: there is a time constant ∆ such that, if the multicast of m is initiated at real-time T, no process delivers m after T + ∆.

Let f be the maximum number of processes that might crash and δ be the a priori known upper bound defined earlier in the section "Distributed systems and failures." We can show that ∆ = (f + 1)δ. Özalp Babaoğlu and Sam Toueg describe several message- and time-efficient implementations of Rel_Multicast(m,P).3 So, in our instantiation, Rel_Multicast ensures that if a correct participant is delivered a vote sent at time T, all correct participants will deliver this vote by T + ∆.

Exception. Because the crash of a participant q in a synchronous system can be safely detected, the exception associated with q is raised if q crashed before sending its vote. We implement this by using a single timer for all possible exceptions (there is one possible exception per participant): p sets a timer to δ + ∆ when it sends its vote (with the Rel_Multicast primitive). If the timer expires before a vote from each participant has been delivered, p can safely conclude that the participants from which votes have not been delivered have crashed.

Propose(x). In this case, propose is simply the identity function—that is, propose(x) = x. So, outcome := propose(x) is trivially instantiated by outcome := x.

Asynchronous systems
For these systems, the instantiations are Multisend(v,P), failure suspicion, and Unif_Cons(x). These instantiations "reduce" the NBAC problem to the uniform consensus problem. This means that in asynchronous distributed systems, if we have a solution for the uniform consensus problem, we can use it to solve the NBAC problem8 (see Figure 2).

Figure 2. A protocol stack (top to bottom): Applications; Nonblocking atomic commitment; Consensus; Point-to-point communication and Failure detection.

Multicast(v,P). The primitive Multisend(m,P) is a syntactical notation that abstracts "for each p ∈ P do send(m) to p end do," where send is the usual point-to-point communication primitive. Multisend(m,P) is the simplest multicast primitive. It is not fault tolerant: if the sender crashes after having sent message m to a subset P′ of P, message m is delivered to the processes of P′ but not to all the processes of P. In our instantiation, if a participant crashes during the execution of Multisend, it will be suspected by any failure detector that satisfies the completeness property.

Exception. When the failure detector associated with participant p suspects (possibly erroneously) that a participant q has crashed, it raises the exception associated with q by setting a local Boolean flag suspected(q) to the value true. If failure detectors satisfy the completeness property, all participants that crashed before sending their vote will be suspected. (The protocol's termination is based on this observation.)

Propose(x). The function propose is instantiated by any subprotocol solving the uniform consensus problem. Let Unif_Cons be such a protocol; it is executed by all correct participants.

The study of the NBAC problem teaches us two lessons. In the presence of process crash failures, the first lesson comes from the generic statement propose. In a synchronous system, a correct participant can locally take a globally consistent decision when a timer



expires. In an asynchronous system, the participant must cooperate with others to take this decision.

The second lesson comes from failure detectors. They let us precisely characterize the set of executions in which we can solve the NBAC problem in purely asynchronous systems. Weak completeness and eventual weak accuracy delineate the precise frontier beyond which the consensus problem cannot be solved.11

To master the difficulty introduced by the detection of failures in distributed systems, it's necessary to understand the few important notions we've presented. These notions should be helpful to researchers and engineers to state precise assumptions under which a given problem can be solved. In fact, the behavior of reliable distributed applications running on asynchronous distributed systems should be predictable in spite of failures.

About the Authors

Michel Raynal is a professor of computer science at the University of Rennes, France. He is involved in several projects focusing on the design of large-scale, fault-tolerant distributed operating systems. His research interests are distributed algorithms, operating systems, parallelism, and fault tolerance. He received his Doctorat d'Etat en Informatique from the University of Rennes. Contact him at IRISA Campus de Beaulieu, 35042 Rennes Cedex, France; [email protected].

Mukesh Singhal is a full professor of computer and information science at Ohio State University, Columbus. He is also the program director of the Operating Systems and Compilers program at the National Science Foundation. His research interests include operating systems, database systems, distributed systems, performance modeling, mobile computing, and computer security. He has coauthored Advanced Concepts in Operating Systems (McGraw-Hill, 1994) and Readings in Distributed Computing Systems (IEEE CS Press, 1993). He received his Bachelor of Engineering in electronics and communication engineering with high distinction from the University of Roorkee, Roorkee, India, and his PhD in computer science from the University of Maryland, College Park. He is a fellow of the IEEE. Contact him at the Dept. of Computer and Information Science, Ohio State Univ., Columbus, OH 43210; [email protected]; www.cis.ohio-state.edu/~singhal.

References
1. P.A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, Reading, Mass., 1987.
2. J.N. Gray, "Notes on Database Operating Systems," Operating Systems: An Advanced Course, Lecture Notes in Computer Science, no. 60, Springer-Verlag, Heidelberg, Germany, 1978, pp. 393–481.
3. Ö. Babaoğlu and S. Toueg, "Non-Blocking Atomic Commitment," Distributed Systems, S. Mullender, ed., ACM Press, New York, 1993, pp. 147–166.
4. D. Skeen, "Non-Blocking Commit Protocols," Proc. ACM SIGMOD Int'l Conf. Management of Data, ACM Press, New York, 1981, pp. 133–142.
5. F. Cristian, "Understanding Fault-Tolerant Distributed Systems," Comm. ACM, vol. 34, no. 2, Feb. 1991, pp. 56–78.
6. M. Fischer, N. Lynch, and M. Paterson, "Impossibility of Distributed Consensus with One Faulty Process," J. ACM, vol. 32, no. 2, Apr. 1985, pp. 374–382.
7. T.D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems," J. ACM, vol. 43, no. 2, Mar. 1996, pp. 245–267.
8. R. Guerraoui, "Revisiting the Relationship between Non-Blocking Atomic Commitment and Consensus," Proc. 9th WDAG, Lecture Notes in Computer Science, no. 972, Springer-Verlag, Heidelberg, Germany, 1995, pp. 87–100.
9. M. Raynal, "Non-Blocking Atomic Commitment in Distributed Systems: A Tutorial Based on a Generic Protocol," Int'l J. Computer Systems Science and Eng., vol. 15, no. 2, Mar. 2000, pp. 77–86.
10. V. Hadzilacos and S. Toueg, "Reliable Broadcast and Related Problems," Distributed Systems, 2nd ed., S. Mullender, ed., ACM Press, New York, 1993, pp. 97–145.
11. T.D. Chandra, V. Hadzilacos, and S. Toueg, "The Weakest Failure Detector for Solving Consensus," J. ACM, vol. 43, no. 4, July 1996, pp. 685–722.

