An Efficient Distributed Mutual Exclusion Algorithm Based on Relative Consensus Voting
Jiannong Cao, Hong Kong Polytechnic University, Kowloon, Hong Kong ([email protected])
Jingyang Zhou, Nanjing University, Nanjing, China ([email protected])
Daoxu Chen, Nanjing University, Nanjing, China ([email protected])
Jie Wu, Florida Atlantic University, Boca Raton, USA ([email protected])
Abstract

Many algorithms for achieving mutual exclusion in distributed computing systems have been proposed. The three most often used performance measures are the number of messages exchanged between the nodes per Critical Section (CS) execution, the response time, and the synchronization delay. In this paper, we present a new fully distributed mutual exclusion algorithm. A node requesting the CS sends out a request message which roams in the network. The message is forwarded among the nodes until the requesting node obtains enough permissions to decide its order to enter the CS. The decision is made by using Relative Consensus Voting (RCV), a variation of the well-known Majority Consensus Voting (MCV) scheme. Unlike existing algorithms, which determine the node to enter the CS one by one, in our algorithm several nodes can be decided and ordered for executing the CS. The synchronization delay is minimal. Although the message complexity can be up to O(N) in the worst case in a system with N nodes, our simulation results show that, on average, the algorithm needs fewer messages and has a shorter response time than most existing algorithms that do not require a logical topology imposed on the nodes. This is especially true when the system is under heavy demand. Another feature of the proposed algorithm is that it does not require the FIFO property of the underlying message passing mechanism.

1. Introduction

Solving the mutual exclusion problem in a distributed system imposes more challenges than in a centralized system. The mutual exclusion problem states that, to enter a Critical Section (CS), a process must first obtain the lock for it and ensure that no other process enters the same CS at the same time. When competing processes are distributed on the nodes of a network, how to achieve mutual exclusion efficiently still remains a difficult problem in distributed systems. Over the last two decades, many algorithms for mutual exclusion in distributed computing systems have been proposed. Three performance measures are often used to evaluate them: message complexity, response time and synchronization delay [16]. The message complexity is measured in terms of the number of messages exchanged between the nodes per CS execution. The response time is the time interval a request waits for its CS execution to be over after its request messages have been sent out. The synchronization delay is the time interval between two successive executions of the CS. The response time and synchronization delay both reveal how soon a requesting node can enter the CS and are measured in terms of the average message propagation delay Tn.

Distributed mutual exclusion algorithms can be divided into two categories: structured and non-structured. Structured algorithms impose some logical topology, such as a tree, ring or star, on the nodes in the system. These algorithms usually have good message complexity when the load is "heavy", i.e., there is always a pending request for mutual exclusion in the system. For example, Raymond's tree-based algorithm [12] requires only 4 messages exchanged per CS execution at heavy loads. However, these algorithms increase the average response time to as high as O(log N). Meanwhile, the organization and maintenance of the specified topology also lead to a large overhead. Furthermore, most structured algorithms work well only under their specified topologies, and may be inefficient in other environments [20]. In this paper, we are concerned with non-structured algorithms, which are generic in the sense that they are suitable for arbitrary network topologies.

For non-structured algorithms, the message complexity can be as low as O(√N) or O(log N). The response time can be 2Tn at light loads and N*(Tn+Tc) at heavy loads, where Tc is the average CS execution time. But either the reduction of the message complexity is achieved at the cost of a long synchronization delay, or the decrease in response time is gained at the cost of high message complexity. In other words, these algorithms either cause high message complexity or result in a long response time. More importantly, most of the algorithms require the FIFO (First In First Out) property as a prerequisite for the underlying message passing communications. If this property cannot be satisfied, extra
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04)
0-7695-2132-0/04/$17.00 (C) 2004 IEEE

messages or mechanisms need to be employed to avoid possible deadlock [5].

In this paper, we present a novel non-structured algorithm that solves distributed mutual exclusion efficiently and resiliently. A node requesting the CS sends out a request message which roams in the network. The message is forwarded among the nodes until the requesting node obtains enough permissions to decide its order to enter the CS. The decision is made by using Relative Consensus Voting (RCV), a variation of the well-known Majority Consensus Voting (MCV) scheme [18]. In RCV, the request of a node can be granted if the node either can eventually obtain the largest number of permissions against the other currently competing requests, or has the smallest id among the requesting nodes potentially with the same number of permissions. Since nodes are not always required to collect permissions from a majority of all the nodes in the system, the number of messages exchanged can be reduced.

The proposed algorithm requires no pre-configuration of the system; it only needs to know the total number of nodes involved. It possesses several other advantages. First, it does not require the FIFO property of the underlying message passing mechanism. Even when messages are delivered out of order, there is no impact on the algorithm's correctness and performance. Second, unlike existing algorithms, which determine the node to enter the CS one by one, in our algorithm several nodes can be decided and ordered for executing the CS, so that the delay before entering the CS can be reduced. The algorithm generates a sequence of requesting nodes that describes their order of executing the CS. Each node executes the CS directly if it stands on the top of the sequence, or waits for a message from its immediate preceding node in the sequence informing it to enter the CS; the synchronization delay is therefore minimal, i.e., T (T is the average delay of passing a message between two nodes). Another advantage introduced by the RCV scheme is resiliency, inherited from MCV. Since the correct operation of the algorithm does not depend on any specific node, crashes of nodes will not affect the algorithm's execution. Although the message complexity can be up to O(N) in the worst case, our simulation results show that, on average, the algorithm needs fewer messages and has a shorter response time than most existing algorithms that do not require a logical topology imposed on the nodes. This is especially true when the system is under heavy demand. We argue that the performance of distributed mutual exclusion algorithms under light load is not as critical as under heavy load, because system resources are abundant under light load, so even algorithms with higher overhead can work well.

The remainder of this paper is organized as follows. Section 2 overviews related work. Section 3 describes our system model. In Section 4, we present the design of the proposed algorithm. Sections 5 and 6 contain the correctness proof and the performance evaluation of the proposed algorithm, respectively. Finally, we conclude the paper in Section 7.

2. Background and related works

Some non-structured algorithms employ a logical token to achieve mutual exclusion [3, 14, 17]. In token-based algorithms, a unique token is shared among the nodes and only the node which possesses the token is able to enter the CS. The most representative token-based algorithm is broadcast [17]: a requesting node sends token requests to all other nodes, and the token holder passes the token to the requesting node after it finishes executing the CS or no longer needs the token. An optimization of the broadcast is that a node only sends its token requests to nodes that either have the token or are going to get it in the near future [14], so that the number of messages exchanged per CS execution can be reduced from N to N/2 on average at light loads, while the response time remains 2Tn at light loads and N*(Tn+Tc) at heavy loads. [3] proposes an interesting algorithm where the token contains an ordered list of all requesting nodes whose order to enter the CS has been determined. The number of messages exchanged per CS execution is 3-2/N at heavy load. But when calculating the response time, an extra "request collect time" must be considered. Another drawback of the algorithm is that it is not fully distributed, because at any time there is an "arbiter" acting as the coordinator in the system. In addition, it is difficult for token-based algorithms to detect loss of the token and regenerate a new unique one. Although some efforts have been made to tackle this problem [2, 6, 10], the solutions always induce extra high overheads.

For algorithms that do not use a token, usually several rounds of message exchanges among the nodes are required to obtain the permission for a node to enter the CS. Lamport's logical timestamp [7] is often adopted in this type of non-token-based algorithm. Ricart and Agrawala proposed an algorithm [13] as an optimization of Lamport's algorithm. In their algorithm, a node grants permissions to requesting nodes immediately if it is not requesting the CS or its own request has lower priority. Otherwise, it defers granting permission until its execution in the CS is over. Only after receiving grants from all other nodes can the requesting node enter the CS. The Ricart-Agrawala mutual exclusion algorithm has low delays because of parallelism in the transfer of messages. The response time is 2Tn and N*(Tn+Tc) under light load and heavy load, respectively. But the number of messages exchanged per CS execution is a constant 2*(N-1), which is quite large. Under light load, the average number of messages can be reduced to N-1 by using a dynamic algorithm [15]. A more recent work described in [8] reduces the message traffic of Ricart-Agrawala type algorithms to somewhere between N-1 and 2(N-1) by
making use of the concurrency of requests and some other methods. Nevertheless, the message complexity remains O(N).

Another type of non-structured algorithm that does not need a token is the quorum-based algorithms. A quorum is a set of nodes associated with each node in the system, and every two quorums have a nonempty intersection. The commonality of quorum-based algorithms lies in that a requesting node can enter the CS with permissions from only the nodes in its quorum. Obviously, the number of messages that need to be exchanged is decided by the size of the quorum. A well-known example is Maekawa's algorithm [9], where nodes issue a permission to only one request at a time and a requesting node only needs to receive permissions from all members of its quorum before it is able to enter the CS. In [9], the quorum size is √N, while in the Rangarajan-Setia-Tripathi algorithm [11] the size is reduced to ((G+1)/2)√(N/G), where G is the subgroup size. [1] organizes all N nodes into a binary tree, and a quorum is formed by including all nodes along any path that starts from the root of the spanning tree and terminates at a leaf. So the quorum size is log(N) in the best case and (N+1)/2 in the worst case. However, the algorithm degenerates to a centralized one, because the root node is included in all quorums whenever it is available. As to the response time, it is comparatively high under heavy load in Maekawa's algorithm because the synchronization delay is 2Tn. Some improvements have been proposed [4, 19].

3. System model

A distributed system consists of N nodes that are numbered from N_0 to N_(N-1). The term node used here refers to a process as well as the computer on which the process is executing. There is no shared memory or global clock, and the nodes communicate with each other only through message passing. In this paper, we do not consider fault tolerance issues. We assume that the nodes do not crash and that the underlying communication medium is reliable, so that messages will not be lost or duplicated. It is assumed that each node can issue a request for entering the CS only when there is no outstanding request issued from the same node.

Figure 1 shows the structure of a node. On each node, an MPM (Message Processing Module) is deployed. It processes messages cached in the Incoming Message Queue of that node and sends messages to other nodes when necessary. Also, every node maintains a table recording the system information (SI).

[Figure 1. Node structure]

Figure 2 illustrates the data structure used for SI. It contains three fields:

Next indicates which node, if any, will enter the CS immediately after this node.

NONL (Node Ordered Node List) is a sequence of ordered tuples. A tuple, in the form <NodeID, TS>, records the requesting node's ID and the timestamp at which the corresponding request message was first initialized.

NSIT (Node System Information Table) consists of N rows, one for each node in the system (including the node itself). Each row records the information known about a node, including its ID, its timestamp TS, and a tuple list MNL of that node. MNL is a list of tuples of the form <NodeID, TS>.

[Figure 2. Data structure of SI]  [Figure 3. RM initialized by Ni]

In the remaining part of the paper, we denote the node with ID "i" as Ni, and the SI maintained by Ni as SIi. Only three types of messages are employed in our proposed algorithm:

Request Message (RM)

Enter Message (EM)

Inform Message (IM)

Since an RM can be forwarded by different nodes, we call the node that initially sends the message the home node of the message. Each message is associated with a flag which indicates its type. The data structures contained in messages are similar to those used for SI. As an example, Figure 3 shows the data structure used in an RM. MONL (Message Ordered Node List) is a sequence of ordered tuples. MSIT (Message System Information Table) records the newest system information, updated during the roaming of the message. In addition, a field Host indicates the home node of the message, and UL records the IDs of unvisited nodes. The EM and IM messages do not have the UL and Host fields. IM messages have a field Next recording the ID of the node that will enter the CS after the message's destination node.

4. The algorithm

When a node wants to enter the CS, it initializes an RM. When the order of a node Ni is determined, its immediate preceding node "k" in the list receives an IM informing it to update its Next field to "i". When on top of the Ordered Node List, Ni will receive an EM either from Nk, which has just exited the CS, or from Nj, where its order was determined. On getting enough permissions to enter the CS, Ni first invokes the Exchange procedure to update its SI with the incoming EM and then enters the CS (Lines 14-16). Whenever it finishes executing the CS, Ni must send an EM to the node indicated by Next, if any, and delete its own tuple from the top of SIi.NONL (Lines 17-24).

At times Ni will receive an IM indicating that Nj is the node to enter the CS next after it. If Ni has not entered the CS or is currently executing the CS (this can be determined by whether its tuple is still in SIi.NONL), the only thing left to do is to reset the value of SIi.Next to "j". Otherwise, which means that Ni has finished executing the CS, Ni should generate an EM with a copy of its SI and send it to Nj immediately (Lines 25-32).

Upon receiving an RM originally initialized at Nj, the MPM in Ni must increase its timestamp and register the corresponding tuple in its NSIT.
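To make the data structures concrete, here is a minimal sketch in Python of the SI fields and the three message types. The class and field names are ours, chosen only to mirror the paper's terms; the paper gives figures, not code, so this is purely illustrative.

```python
# Illustrative sketch (not the paper's code) of the per-node SI structure
# and the three message types. Field names mirror the paper's terms
# (Next, NONL, NSIT, MNL, MONL, MSIT, Host, UL); the classes are ours.

from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Tup:                 # a <NodeID, TS> tuple
    node_id: int
    ts: int

@dataclass
class NsitRow:             # one row of the Node System Information Table
    ts: int = 0                                    # last known timestamp
    mnl: list[Tup] = field(default_factory=list)   # requests known to that node

@dataclass
class SI:                  # system information kept at each node
    next: Optional[int] = None                     # who enters the CS after us
    nonl: list[Tup] = field(default_factory=list)  # Node Ordered Node List
    nsit: dict[int, NsitRow] = field(default_factory=dict)

@dataclass
class RM:                  # Request Message: roams until the order is decided
    host: int                                      # home node of the request
    monl: list[Tup] = field(default_factory=list)
    msit: dict[int, NsitRow] = field(default_factory=dict)
    ul: list[int] = field(default_factory=list)    # ids not yet visited

@dataclass
class EM:                  # Enter Message: grants entry to the CS
    monl: list[Tup] = field(default_factory=list)
    msit: dict[int, NsitRow] = field(default_factory=dict)

@dataclass
class IM(EM):              # Inform Message: who enters after the receiver
    next: int = -1

si = SI(nsit={k: NsitRow() for k in range(3)})     # a 3-node system's view
si.nsit[1].mnl.append(Tup(node_id=2, ts=5))        # node 1 saw node 2's request
rm = RM(host=0, ul=[1, 2])
print(len(si.nsit), si.nsit[1].mnl[0].node_id, rm.ul)
```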
20.  If ( SIi.Next <> NULL ) then   // send an EM informing the next node to enter the CS, if any
21.    Initialize an EM with the newest MONL and MSIT copied from SIi;
22.    Send the EM to SIi.Next;
23.    SIi.Next = NULL;
24.  End if
25.  Upon receiving an IM (indicating that node Nj is next):
26.    If ( the tuple which immediately precedes tuple <i, t> …

If more than one tuple is placed on top of the same number of MNLs, the tuple with the smallest NodeID wins and is assigned the highest rank (Line 12). Afterwards, the first tuple in {TPi} is tested to determine whether it can be ordered (Line 13). All ordered tuples are appended to the NONL and removed from all MNLs of the NSIT (Lines 14, 15). The boolean variables BeOrdered and Highest_Priority are set to true if node Ni is ordered (Lines 16-19) and on top of the NONL (Line 25), respectively.
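As a sanity check on this ranking rule, a small sketch of our own reading of it (not the paper's pseudocode): a tuple's rank is the number of MNLs whose top it occupies, with ties broken in favour of the smaller NodeID.

```python
# Illustrative sketch of the ranking rule used by the Order procedure:
# count how many MNLs each tuple tops; more tops ranks higher, and ties
# go to the smaller NodeID. Function name and types are ours.

from collections import Counter

def rank_tuples(mnls: list[list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """mnls holds one (NodeID, TS) list per NSIT row, top of the list first."""
    tops = Counter(mnl[0] for mnl in mnls if mnl)   # tuple -> number of tops
    # sort: most tops first; among equals, smallest NodeID first
    return sorted(tops, key=lambda t: (-tops[t], t[0]))

mnls = [[(2, 1), (5, 3)], [(2, 1)], [(5, 3)]]
print(rank_tuples(mnls))   # node 2 tops two MNLs, node 5 tops one
```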
0-7695-2132-0/04/$17.00 (C) 2004 IEEE Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on March 4, 2009 at 01:31 from IEEE Xplore. Restrictions apply. First, the information in SIi.NONL and MONL are 17. if there exists a tuple
whose tuple is preceding it’s tuple in the Ordered Node Lemma 1. ᇕi ᇕj, |Si.NSIT[j].MNL| < N (|MNL| is the List have finished executing the CS and can be safely length of MNL) deleted from its own NONL. So, in this procedure, tuples Proof: In the assumed distributed system consisting of N that precede in Ordered Node List also can be nodes, each node contains only one process that makes a deleted (line 3, 4). Otherwise, it will be added to request to mutual exclusively access the CS and each SIi.NONL in the rest of the procedure. Next, SIi.NONL and process initiates at most one outstanding request at any time. MONL are combined if needed and relevant tuples are So, a node will never issue a new request message until it deleted from SIi.NSIT (line 5-12). finishes executing the CS for the previous request. In other Last, SIi.NSIT and MSIT are synchronized (line 13-22). words, there won’t be two different tuples for a node itself If the Nj’s timestamp in MSIT of the message and the in the node’s own NSIT. current node is the same, then no information need to be A node’ knowledge about other nodes is reflected in its exchanged because the message and the current node NSIT. The NSIT stores tuples in the field of MNL, maintain the same state about Nj. Otherwise, information representing who and when sent request message to that update is needed. First, outdated tuples, if any, are deleted node. It is obviously that the tuples come from the from SIi.NSIT and MSIT (line 15-18). Then SIi.NSIT[j] is information carried by an incoming message. The message set as MSIT if its TS is smaller (line 19, 20). copies information from the node where it is generated. The Exchange Procedure When a message (RM, EM or IM) arrives in a node, it will 1. 
if ∃ ∈ and ∉ and a MONL a SI i .NONL firstly processed by the Exchange procedure, which ensures ∉ that a MNL does not contain any pair of tuples that has the a SIi .NSIT[Host].MNL and same NodeID: if there exist two tuples with the same MSIT [a.NodeID ].TS < SI .NSIT [a.NodeID ].TS then i NodeID but different TS in MSIT and NSIT respectively, the 2. delete tuples which precede a and a from MONL; one with smaller timestamp must be outdated and will be 3. if ∃ ∈ and b∉ MONL and b SI i .NONL deleted in the procedure. As only one tuple survives for any b ∉ MSIT [Host].MNL and other nodes in each entry of the NSIT, there will be at most N tuples in a MNL. MSIT[a.NodeID].TS > SI .NSIT[a.NodeID].TS then i Lemma 2. When every MNL in NSIT is nonempty, at 4. delete tuples which precede a and a from least one node can be ordered after the execution of the SIi.NONL; Order procedure. 5. if Length (MONL) >Length (SIi.NONL) then Proof: During the execution of Order, a sequence of 6. for every tuple c in MONL and not in SIi.NONL tuples regarding their order is generated. The order is 7. delete c from any entry of SIi.NSIT; determined by the number of MNLs of which a tuple stands 8. Set SIi.NONL = MONL; on the top and the value of its NodeID. If every MNL in the 9. else NSIT is nonempty, we then have − M = . At least, N = Sh 0 10. for every tuple c in in SIi.NONL but not in MONL ¦h 1 11. delete c from any entry of MSIT the first tuple in the sequence can be ordered after the 12. end if execution of the Order procedure because either the first 13. for k = 1 to N do tuple in the sequence either holds more first positions or it 14. if SIi.NSIT[k].TS <> MSIT[k].TS then has smaller NodeID value. 15. if there exists a tuple
16. delete
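The de-duplication invariant behind Lemma 1 (at most one tuple per NodeID in an MNL, with the smaller timestamp discarded as outdated) can be sketched as follows; the helper function is ours, not the paper's procedure.

```python
# Sketch (ours) of the invariant the Exchange procedure maintains: when the
# node's MNL and a message's MNL both carry a tuple for the same NodeID,
# only the one with the larger timestamp survives, so each MNL ends up
# with at most one tuple per node.

def merge_mnl(local, incoming):
    """local/incoming: lists of (NodeID, TS); returns the merged MNL."""
    newest = {}
    for node_id, ts in local + incoming:
        if node_id not in newest or ts > newest[node_id]:
            newest[node_id] = ts                    # keep the fresher tuple
    return sorted(newest.items())

print(merge_mnl([(1, 3), (2, 7)], [(1, 9), (4, 2)]))
```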
0-7695-2132-0/04/$17.00 (C) 2004 IEEE Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on March 4, 2009 at 01:31 from IEEE Xplore. Restrictions apply. t t Proof: Since the RM i (RM i means the request Lemma 4, either AM precedes all tuples in NONL or BM’ message was initialized at node Ni at timestamp t) will not precedes all tuples in MONL. But in the first case, since B1 be sent to a node that it has already been forwarded to, must know AM is ordered before it got ordered, AM is sure to after N-1 times of forwarding and information exchange, exist in NONL. In the second case, A1 must know BM’ is tuple will appear in each MNL of NSIT of the last ordered before it got ordered, then BM’ should exist in visiting node (we assume the node is Nk). It’s obviously MONL. This is obviously contrary to NONL ŀ MONL =∅. that every MNL isn’t empty. By Lemma 2, at least one When NONL ŀ MONL ∅, we consider the minimum k node can be ordered, so its corresponding tuple
k’2 in NONL. By Lemma 5, two different tuples could not A∈MONL, A∉NONLk while another outdated ordered tuple ∅ achieve the same order. Since NONL ŀ MONL= , by B∈ NONLk, B∉MONL and B precedes A. By Lemma 4, A
will be appended to NONLk after B and before any other tuples in NONLk'. This contradicts A ∉ NONLk. Thus, tuples in any two different NONLs are ranked in the same order.

Lemma 8. When a node Ni is executing the CS, the tuple <i, t> must stand on the top of NONLi.

Proof: From the algorithm presented above, there are only two cases in which Ni can enter the CS. One is when Ni is ordered at some node Nk after execution of the Order procedure, with its corresponding tuple <i, t> right on the top of NONLk. Since <i, t> gets Highest_Priority, Nk sends an EM to Ni informing it to enter the CS immediately. When Ni receives the EM, it invokes the Exchange procedure. Since the tuple <i, t> is on the top of the MONL of the incoming EM, after executing the Exchange procedure <i, t> will stand on the top of NONLi by Lemma 6. In the other case, the order of Ni is determined without Highest_Priority. Only when Ni receives an EM from its directly preceding node Nj can it enter the CS. Since Nj has finished executing the CS, Nj deletes its corresponding tuple from NONLj so that <i, t> stands on the top. When Ni receives the EM, it deletes all tuples which precede <i, t> from NONLi. Thus the tuple <i, t> will also be on the top of NONLi.

Theorem 1. Mutual exclusion is achieved.

Proof: Mutual exclusion is achieved when no pair of nodes is ever simultaneously executing the critical section. Assume the contrary, that two nodes Ni and Nj are in the CS at the same time; then their tuples …

Theorem 3. Starvation is impossible.

Proof: Starvation occurs when one node must wait indefinitely to enter its critical section even though other nodes are entering and exiting their own critical sections. Assume the contrary, that starvation is possible. In our algorithm, the time needed to execute the algorithm and the CS and the time for message transfer are all finite. Since a requesting node's order to enter the CS is determined after its corresponding RM message has been forwarded at most N-1 times (Lemma 3), the only reason that could cause a node to starve is waiting infinitely for its preceding nodes. But by Lemma 4 and Lemma 7, the sequence of ordered nodes which precede it is determined once one node gets ordered. So, the node in starvation will receive an EM from its directly preceding node and enter the CS in finite time. Thus a contradiction occurs, and the theorem must be true.

6. Performance Evaluation

As mentioned before, there are three measures to evaluate the performance of a mutual exclusion algorithm: message complexity, response time and synchronization delay. The performance of an algorithm depends upon the loading conditions of the system and is usually studied under two special loading conditions: light load and heavy load. Under the light load condition, there is seldom more than one simultaneous request for mutual exclusion in the system, while under the heavy load condition there is always a pending request at a node. We first present an analysis of the algorithm.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04)
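Plugging the simulation constants used later in Section 6.2 (N = 30 nodes, Tn = 5, Tc = 10 time units) into the bounds of Sections 6.1.1 and 6.1.3 gives concrete numbers. This arithmetic is ours, as a sanity check only; the number of competitors m is an assumption for illustration.

```python
# Checking the analytical bounds with the paper's simulation constants:
# N = 30 nodes, Tn = 5 (message delay), Tc = 10 (CS execution time).

N, Tn, Tc = 30, 5, 10
m = 10                                   # an assumed number of competitors

forwardings = N // m + 2                 # minimum RM forwardings, heavy load
light_low = (N // 2 + 2) * Tn            # response time, light load (lower)
light_high = (N - 1) * Tn                # response time, light load (upper)
heavy = N * (Tn + Tc)                    # response time, heavy load

print(forwardings, light_low, light_high, heavy)
```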
6.1.1. Message complexity. Under the heavy load condition, when a total of m nodes are competing for the same CS, the node which is granted the privilege must have its ID standing on the top of at least [N/m]+1 MNLs. The minimum number of times the RM message is forwarded is [N/m]+2. So the message complexity is calculated by counting how many nodes in the system are competing for the CS simultaneously, and the number of competing nodes m is determined by the distribution model that describes how frequently a node requests the CS.

6.1.2. Synchronization delay. It is obvious that the synchronization delay of our algorithm is Tn, where Tn is the average message propagation delay, because only one enter message needs to be passed between two successive executions of the CS.

6.1.3. Response time. Under the low load condition, before a node can enter the CS, its corresponding request message needs to be forwarded N/2 to N-1 times to determine its order, plus one enter message for entering the CS. So the response time will be ([N/2]+2)*Tn to (N-1)*Tn. Under the high load condition, if each node waits for an enter message to enter the CS, the response time is N*(Tn+Tc), where Tc is the average CS execution time.

6.2. Simulation Results

Although our algorithm has high message complexity and a long response time in the worst case, it has good performance in general. In fact, the heavier the system load is, the better our algorithm performs. To demonstrate that, a simulation was conducted to evaluate our algorithm against several other algorithms, including Maekawa's, which is low in message complexity, and the Broadcast and Ricart algorithms, which are low in response time.

We adopt a simulation model similar to the one used in [14]: requests for CS execution arrive at a site according to a Poisson distribution with parameter λ; the message propagation delay between any pair of nodes, Tn, and the CS execution time, Tc, are constant at 5 and 10 time units, respectively (this condition is not necessary, but we adopt it for simplicity). The whole system is free of node crashes and communication failures. Two measures are collected, viz. the number of messages exchanged (NME) and the response time (RT) per CS execution. We use the first method mentioned in [9] to generate quorums for Maekawa's algorithm.

We first consider the situation where all nodes request the CS simultaneously as soon as the system is initialized. Every node requests only once. When the system is initialized, each node knows nothing about the others in our algorithm, so in this situation we can see how quickly and sufficiently the system information is exchanged. Figures 4 and 5 plot the average NME and RT against the number of nodes in the system. They show that when the number of nodes increases, the messages exchanged and the response time both increase. Moreover, our algorithm has the fewest messages exchanged of the four algorithms, while its average response time is similar to the other three's.

[Figure 4. Message number vs. node number]

[Figure 5. Response time vs. node number]

Afterwards, a system of 30 nodes is simulated with different request arrival rates λ. We run the simulation repeatedly for a long time (100000 time units) and record the NME and RT for all successful requests. Figure 6 and Figure 7 illustrate the performance comparison of the four simulated algorithms. As the load increases, the number of messages needed per mutual exclusion decreases in our algorithm. It is clear that the heavier the system load is, the more our algorithm outperforms Maekawa's in average NME. Although the average response time of our algorithm is a little higher than that of the Broadcast and Ricart algorithms, it is much lower than Maekawa's. Since the Broadcast and Ricart algorithms need many more messages to achieve mutual exclusion, our algorithm performs better than the other three in general.

[Figure 6. Message number vs. λ]

[Figure 7. Response time vs. λ]

7. Conclusion

In this paper, we described an efficient fully distributed algorithm for mutual exclusion. Our algorithm imposes no specified structure on the system topology and does not force messages to be delivered in FIFO order. These two merits make our algorithm more attractive in applications. In addition, we adopt the RCV scheme to schedule the nodes that intend to execute the critical section. Since the RCV scheme derives from the well-known MCV scheme, the algorithm gains high resiliency. We have presented the
proof of correctness of the algorithm, with respect to guaranteed mutual exclusion, deadlock freedom and starvation freedom. Both analysis and simulation are used to evaluate the algorithm's performance. Simulation results compare our algorithm with some existing algorithms and show that the proposed algorithm outperforms them, especially under the high load condition.

In our future work, we will conduct simulation studies to compare with more existing algorithms. We will also investigate how to improve the algorithm by designing different methods for forwarding the request messages.

Acknowledgement

This work is partially supported by Hong Kong Polytechnic University under HK PolyU research grants G-YD63 and G-YY41, and the National 973 Program of China under grant 2002CB312002.

References

[1] D. Agrawal and A. E. Abbadi, "An Efficient and Fault-Tolerant Solution for Distributed Mutual Exclusion", ACM Transactions on Computer Systems, Vol. 9(1), Feb. 1991, pp. 1-20.
[2] D. Agrawal and A. E. Abbadi, "A Token-Based Fault-Tolerant Distributed Mutual Exclusion Algorithm", Journal of Parallel and Distributed Computing, Vol. 24(2), 1995, pp. 164-176.
[3] S. Banerjee and P. K. Chrysanthis, "A New Token Passing Distributed Mutual Exclusion Algorithm", in Proceedings of the 16th International Conference on Distributed Computing Systems, Hong Kong, May 27-30, 1996, pp. 717-724.
[4] G. Cao and M. Singhal, "A Delay-Optimal Quorum-Based Mutual Exclusion Algorithm for Distributed Systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 12(12), Dec. 2001, pp. 1256-1267.
[5] Y.-I. Chang, "Notes on Maekawa's O(√N) Distributed Mutual Exclusion Algorithm", in Proceedings of the 5th IEEE Symposium on Parallel and Distributed Processing, Dallas, USA, Dec. 1-4, 1993, pp. 352-355.
[6] M. Choy, "Robust Distributed Mutual Exclusion", in Proceedings of the 16th International Conference on Distributed Computing Systems, Hong Kong, May 27-30, 1996, pp. 760-767.
[7] L. Lamport, "Time, Clocks and the Ordering of Events in a Distributed System", Communications of the ACM, Vol. 21(7), Jul. 1978, pp. 558-565.
[8] S. Lodha and A. Kshemkalyani, "A Fair Distributed Mutual Exclusion Algorithm", IEEE Transactions on Parallel and Distributed Systems, Vol. 11(6), Jun. 2000, pp. 537-549.
[9] M. Maekawa, "A √N Algorithm for Mutual Exclusion in Decentralized Systems", ACM Transactions on Computer Systems, Vol. 3(2), May 1985, pp. 145-159.
[10] S. Nishio, K. F. Li, and E. G. Manning, "A Resilient Mutual Exclusion Algorithm for Computer Networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 1(3), Jul. 1990, pp. 344-355.
[11] S. Rangarajan, S. Setia, and S. K. Tripathi, "A Fault-Tolerant Algorithm for Replicated Data Management", IEEE Transactions on Parallel and Distributed Systems, Vol. 6(12), Dec. 1995, pp. 1271-1282.
[12] K. Raymond, "A Tree-based Algorithm for Distributed Mutual Exclusion", ACM Transactions on Computer Systems, Vol. 7(1), Feb. 1989, pp. 61-77.
[13] G. Ricart and A. K. Agrawala, "An Optimal Algorithm for Mutual Exclusion in Computer Networks", Communications of the ACM, Vol. 24(1), Jan. 1981, pp. 9-17.
[14] M. Singhal, "A Heuristically-Aided Algorithm for Mutual Exclusion in Distributed Systems", IEEE Transactions on Computers, Vol. 38(5), May 1989, pp. 651-662.
[15] M. Singhal, "A Dynamic Information-Structure Mutual Exclusion Algorithm for Distributed Systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 3(1), Jan. 1992, pp. 121-125.
[16] M. Singhal, "A Taxonomy of Distributed Mutual Exclusion", Journal of Parallel and Distributed Computing, Vol. 18, 1993, pp. 94-101.
[17] I. Suzuki and T. Kasami, "A Distributed Mutual Exclusion Algorithm", ACM Transactions on Computer Systems, Vol. 3(4), Nov. 1985, pp. 344-349.
[18] R. H. Thomas, "A Majority Consensus Approach to Concurrency Control for Multiple Copy Databases", ACM Transactions on Database Systems, Vol. 4(2), Jun. 1979, pp. 180-209.
[19] T. Tsuchiya, M. Yamaguchi, and T. Kikuno, "Minimizing the Maximum Delay for Reaching Consensus in Quorum-Based Mutual Exclusion Schemes", IEEE Transactions on Parallel and Distributed Systems, Vol. 10(4), Apr. 1999, pp. 337-345.
[20] J. E. Walter, J. L. Welch, and N. H. Vaidya, "A Mutual Exclusion Algorithm for Ad Hoc Mobile Networks", Wireless Networks, Vol. 7(6), Nov. 2001, pp. 585-600.
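The experimental setup described in Section 6 (constant propagation delay Tn = 5 and CS execution time Tc = 10, all nodes requesting the CS simultaneously at initialization, with NME and RT collected per CS execution) can be illustrated with a small discrete-event simulation. The sketch below does not reproduce the RCV algorithm of this paper; instead, it simulates the Ricart-Agrawala baseline [13] under the same assumptions, which has the known cost of 2(N-1) messages per CS execution. All function and variable names are ours, chosen for illustration only.

```python
import heapq

T_MSG = 5   # constant message propagation delay between any pair of nodes (Tn)
T_CS = 10   # constant CS execution time (Tc)

def simulate_ricart_agrawala(n):
    """All n nodes request the CS at t = 0 and request only once.

    All request timestamps are 0, so priority ties are broken by node id.
    Returns (messages per CS execution, average response time), where the
    response time of a node is measured from the moment its request
    messages are sent (t = 0) until its CS execution is over.
    """
    events, seq = [], 0

    def push(time, kind, node, src):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, node, src))
        seq += 1

    replies = [0] * n                   # REPLY messages received so far
    deferred = [[] for _ in range(n)]   # replies postponed until CS exit
    in_cs = [False] * n
    done = [False] * n
    exit_time = [0.0] * n
    messages = 0

    # each node broadcasts its REQUEST to all other nodes at t = 0
    for i in range(n):
        for j in range(n):
            if j != i:
                push(T_MSG, "REQ", j, i)
                messages += 1

    while events:
        t, _, kind, node, src = heapq.heappop(events)
        if kind == "REQ":
            # defer the reply while in the CS or while holding priority
            if not done[node] and (in_cs[node] or node < src):
                deferred[node].append(src)
            else:
                push(t + T_MSG, "REP", src, node)
                messages += 1
        elif kind == "REP":
            replies[node] += 1
            if replies[node] == n - 1:  # all permissions collected
                in_cs[node] = True
                push(t + T_CS, "EXIT", node, node)  # internal, not a message
        else:  # EXIT: leave the CS and release all deferred replies
            in_cs[node], done[node], exit_time[node] = False, True, t
            for k in deferred[node]:
                push(t + T_MSG, "REP", k, node)
                messages += 1
            deferred[node].clear()

    return messages / n, sum(exit_time) / n
```

With these parameters the nodes enter the CS in id order, 15 time units apart, so for n = 5 the sketch yields 8 messages per CS execution and an average response time of 50 time units; varying n reproduces the growth trend reported for the baselines in Figures 4 and 5.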