Deadlock Detection in Distributed Systems

SURVEY & TUTORIAL SERIES

Mukesh Singhal
Ohio State University

November 1989. 0018-9162/89/1100-0037$01.00 © 1989 IEEE

[Deadlock] is a constant problem, often offsetting the advantages of resource sharing. Deadlock handling is difficult in distributed systems because no site has accurate knowledge of the system state.

A distributed system is a network of sites that exchange information with each other by message passing. A site consists of computing and storage facilities and an interface to local users and to a communication network. A primary motivation for using distributed systems is the possibility of resource sharing: a process can request and release resources (local or remote) in an order not known a priori; a process can request some resources while holding others. In such an environment, if the sequence of resource allocation to processes is not controlled, a deadlock may occur.

A deadlock occurs when processes holding some resources request access to resources held by other processes in the same set. The simplest illustration of a deadlock consists of two processes, each holding a different resource in exclusive mode and each requesting an access to resources held by other processes. Unless the deadlock is resolved, all the processes involved are blocked indefinitely. Therefore, a deadlock requires the attention of a process outside those involved in the deadlock for its detection and resolution.

A deadlock is resolved by aborting one or more processes involved in the deadlock and granting the released resources to other processes involved in the deadlock. A process is aborted by withdrawing all its resource requests, restoring its state to an appropriate previous state, relinquishing all the resources it acquired after that state, and restoring all the relinquished resources to their original states. In the simplest case, a process is aborted by starting it afresh and relinquishing all the resources it held.

Resource vs. communication deadlock. Two types of deadlock have been discussed in the literature: resource deadlock and communication deadlock. In resource deadlocks, processes make access to resources (for example, data objects in database systems, buffers in store-and-forward communication networks). A process acquires a resource before accessing it and relinquishes it after using it. A process that requires resources for execution cannot proceed until it has acquired all those resources. A set of processes is resource-deadlocked if each process in the set requests a resource held by another process in the set.

In communication deadlocks, messages are the resources for which processes wait.[1] Reception of a message takes a process out of wait; that is, unblocks it. A set of processes is communication-deadlocked if each process in the set is waiting for a message from another process in the set and no process in the set ever sends a message. In this article we limit our discussion to resource deadlocks in distributed systems.

To present the state of the art of deadlock detection in distributed systems, this article describes a series of deadlock detection techniques based on centralized, hierarchical, and distributed control organizations. The article complements one by Knapp, which discusses deadlock detection in distributed database systems.[2] Knapp emphasizes the underlying theoretical principles of deadlock detection and gives an example of each principle. In contrast, this article examines deadlock detection in distributed systems more from the point of view of its practical implications. It presents an up-to-date and comprehensive survey of deadlock detection algorithms, discusses their merits and drawbacks, and compares their performance (delays as well as message complexity). Moreover, this article examines related issues, such as correctness of the algorithms, performance of the algorithms, and deadlock resolution, which require further research.

Graph-theoretic model of deadlocks. The state of a system is in general dynamic; that is, system processes continuously acquire and release resources. Characterization of deadlocks requires a representation of the state of process-resource interactions. The state of process-resource interactions is modeled by a bipartite directed graph called a resource allocation graph. Nodes of this graph are processes and resources of a system, and edges of the graph depict assignments or pending requests. A pending request is represented by a request edge directed from the node of a requesting process to the node of the requested resource. A resource assignment is represented by an assignment edge directed from the node of an assigned resource to the node of the process assigned. For example, Figure 1 shows the resource allocation graph for two processes P1 and P2 and two resources R1 and R2, where edges R1 → P1 and R2 → P2 are assignment edges and edges P2 → R1 and P1 → R2 are request edges.

Figure 1. Resource allocation graph.

A system is deadlocked if its resource allocation graph contains a directed cycle in which each request edge is followed by an assignment edge. Since the resource allocation graph of Figure 1 contains a directed cycle, processes P1 and P2 are deadlocked. A deadlock can be detected by constructing the resource allocation graph and searching it for cycles.

In a distributed database system (DDBS), the user accesses the data objects of the database by executing transactions. A transaction can be viewed as a process that performs a sequence of reads and writes on data objects. The data objects of a database can be viewed as resources that are acquired (by locking) and released (by unlocking) by transactions. In DDBS literature the resource allocation graph is referred to as a transaction-wait-for (TWF) graph.[3] In a TWF graph, nodes are transactions and there is a directed edge from node T1 to node T2 if T1 is blocked and must wait for T2 to release some data object. A system is deadlocked if and only if there is a directed cycle in its TWF graph. Since both graphs denote the state of process-resource interaction, we will collectively refer to them as state graphs.

Deadlock-handling strategies

The three strategies for handling deadlocks are deadlock prevention, deadlock avoidance, and deadlock detection. In deadlock prevention, resources are granted to requesting processes in such a way that a request for a resource never leads to a deadlock. The simplest way to prevent a deadlock is to acquire all the needed resources before a process starts executing. In another method of deadlock prevention, a blocked process releases the resources requested by an active process.

In deadlock avoidance strategy, a resource is granted to a process only if the resulting state is safe. (A state is safe if there is at least one execution sequence that allows all processes to run to completion.) Finally, in deadlock detection strategy, resources are granted to a process without any check. Periodically (or whenever a request for a resource has to wait) the status of resource allocation and pending requests is examined to determine if a set of processes is deadlocked. This examination is performed by a deadlock detection algorithm. If a deadlock is discovered, the system recovers from it by aborting one or more deadlocked processes.

The suitability of a deadlock-handling strategy greatly depends on the application. Both deadlock prevention and deadlock avoidance are conservative, overly cautious strategies. They are preferred if deadlocks are frequent or if the occurrence of a deadlock is highly undesirable. In contrast, deadlock detection is a lazy, optimistic strategy, which grants a resource to a request if the resource is available, hop-

Deadlock handling is complicated in distributed systems because no site has accurate knowledge of the current state of the system and because every intersite communication involves a finite and unpredictable delay. Next, we examine the complexity and practicality of the three deadlock-handling approaches in distributed systems.

Deadlock prevention. Deadlock prevention is commonly achieved either by having a process acquire all the needed resources simultaneously before it begins executing or by preempting a process that holds the needed resource. In the former method, a process requests (or releases) a remote resource by sending a request message (or release message) to the site where the resource is located. This method has the following drawbacks:

(1) It is inefficient because it decreases the system concurrency.
(2) A set of processes may get deadlocked in the resource-acquiring phase. For example, suppose process P1 at site S1 and process P2 at site S2 simultaneously request two resources R3 and R4 located at sites S3 and S4, respectively. It may happen that S3 grants R3 to P1 and S4 grants R4 to P2, resulting in a deadlock. This problem can be handled by forcing processes to acquire needed resources one by one, but that approach is highly inefficient and impractical.
(3) In many systems future resource requirements are unpredictable (not known a priori).

In the latter method, an active process forces a blocked process, which holds the needed resource, to abort. This method is inefficient because several processes may be aborted without any deadlock.

Deadlock avoidance. For deadlock avoidance in distributed systems, a resource is granted to a process if the resulting global system state is safe (the global state includes all the processes and resources of the distributed system). The following problems make deadlock avoidance impractical in distributed systems:

(1) Because every site has to keep track of the global state of the system, huge storage capacity and extensive communication ability are necessary.
(2) The process of checking for a safe global state must be mutually exclusive. Otherwise, if several sites concurrently perform checks for a safe global state (each
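
The graph-theoretic model described in this article lends itself to a short illustration. The Python sketch below is not an algorithm from the survey; it is a minimal example that encodes the resource allocation graph of Figure 1 as an adjacency list, searches it for a directed cycle by depth-first search, and derives the wait-for form used in the DDBS literature. The names find_cycle, wait_for_graph, and figure_1 are hypothetical, and the sketch assumes single-unit resources held in exclusive mode and a single site that already knows the complete graph.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # node -> list of successor nodes


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one directed cycle as a node list (first node repeated), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            colour.setdefault(target, WHITE)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for succ in graph.get(node, []):
            if colour[succ] == GREY:  # back edge: succ is on the current path
                return path[path.index(succ):] + [succ]
            if colour[succ] == WHITE:
                found = visit(succ)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(colour):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def wait_for_graph(requests: Dict[str, List[str]], holder: Dict[str, str]) -> Graph:
    """Collapse requests and assignments into process -> process wait-for edges."""
    return {
        proc: [holder[res] for res in wanted if res in holder and holder[res] != proc]
        for proc, wanted in requests.items()
    }


# Figure 1: assignment edges R1 -> P1, R2 -> P2; request edges P2 -> R1, P1 -> R2.
figure_1: Graph = {"R1": ["P1"], "R2": ["P2"], "P2": ["R1"], "P1": ["R2"]}

cycle = find_cycle(figure_1)
deadlocked = sorted({n for n in (cycle or []) if n.startswith("P")})
print(cycle, deadlocked)

# The same state in wait-for form (assumed encoding: who requests what, who holds what).
twf = wait_for_graph({"P1": ["R2"], "P2": ["R1"]}, {"R1": "P1", "R2": "P2"})
print(find_cycle(twf))
```

For the Figure 1 graph the search finds a cycle through P1 and P2, matching the article's conclusion that both processes are deadlocked. A detector in a distributed system cannot assume that any one site holds such a complete and current graph, which is the difficulty the article goes on to examine.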
