A Highly Safe Self-Stabilizing Mutual Exclusion Algorithm

I-Ling Yen
Department of
A Wells Hall
Michigan State University
East Lansing, MI
Email: yen@cps.msu.edu

Farokh B. Bastani
Department of Computer Science
University of Houston
Houston, TX
Email: FBastani@uh.edu

Abstract

Conventional self-stabilizing systems cannot be used for safety-critical systems because of the period of vulnerability that lasts from the occurrence of a transient failure until the system stabilizes. In this paper, we consider a highly safe self-stabilizing system in which the vulnerability problem is tackled. The design principles we use to achieve this goal are the sobriety test and processor specialization. The sobriety test prevents the system from performing incorrect actions when the system state may be faulty. Specialization prevents individual processors from making faulty moves. We have developed a self-stabilizing mutual exclusion algorithm that guarantees mutual exclusion with a very high probability even in the presence of failures.

Keywords: self-stabilization systems, mutual exclusion algorithm, fault tolerance, distributed computing

Introduction

The concept of self-stabilizing systems was first proposed by Dijkstra. He illustrated the concept with a cyclic relaxation algorithm and three mutual exclusion protocols. Over the past two decades, Dijkstra's self-stabilizing mutual exclusion algorithm has been enhanced in several ways: different topologies, reduction in state, elimination of the centralized daemon, elimination of asymmetry, randomized versions, and various applications. The major attraction of self-stabilization is the perceived elegance of eliminating clumsy mechanisms for detecting illegal states and initiating recovery actions. It also does not require the system to be initialized, a task which is difficult to coordinate in distributed systems. Further, the decentralized control, where communication is required only between neighboring processors, greatly reduces the potential overhead for global control.

This material is based in part upon work supported by NSF under Grant No. CCR

A major problem that prevents the self-stabilizing approach from being used for critical applications is its vulnerability. Essentially, the system may go through a vulnerable period after failures that bring the system into an illegitimate state. During this period, the system behavior may not fully satisfy the specified requirements. To deal with this vulnerability, we classify failures into two types: random failures and malicious failures. In the random failure model, a transient failure can bring the system into any illegitimate state with equal probability. In the malicious failure model, some failed processors may maliciously try to violate the system legitimacy without being detected by local checks and subsequently cause critical damage. Generally, algorithms with much higher complexity are required to tackle malicious failures.

In this paper, we develop a self-stabilizing mutual exclusion algorithm that copes with the vulnerability problem, assuming random failures. The system we consider consists of a group of processors (or intelligent devices) interconnected via a ring network, where each processor has only a one-way communication channel with the processor immediately to its right. Let $P_i$, $0 \le i < N$, denote the $N$ processors in the system. We assume that only simplex communication is allowed, from processor $P_{(i+1) \bmod N}$ to $P_i$. The goal is to provide mutually exclusive access to some shared resource for the $N$ processors. The required properties of the algorithm include decentralized control, where only the states of the nearest neighbors ($P_{(i+1) \bmod N}$ for processor $P_i$) can be examined for the global control, and fully asynchronous execution of all processors without requiring any centralized daemon process. The algorithm we develop guarantees that, with a high probability, the system is safe from failures.

Self-Stabilizing Mutual Exclusion Algorithm

Self-stabilizing systems naturally achieve decentralized control. However, because of this decentralized control, it is very difficult for a self-stabilizing system to deal with the vulnerability problem. When violation of the global requirement does not directly imply violation of a local requirement, the system can be in a state in which no processor can determine by itself whether the system state is legitimate, and hence the system may misbehave. However, algorithms can be designed such that the system enters the small set of states in which it can misbehave only with a very small probability. Also, various processors can be assigned different key tasks and monitor each other's behavior. These new design principles for self-stabilizing algorithms are discussed in the following.

Sobriety test: The system must be designed so that the set of legitimate states is a very small fraction of the set of all possible states. This allows nonfaulty units to detect illegitimate states and go to a home state that is known to be safe. It also allows rapid restabilization of the system to a stable operating mode.

Specialization: To prevent a group of faulty units from collectively subverting the safety of the system, the privileged processors in the system are specialized into two or more categories such that no processor by itself has sufficient information or capability to damage the system. To operate properly, two or more processors must cooperate with each other, either by sharing information or by pooling their capabilities, to effect changes in the system state. Thus, this approach guarantees that the failure of a processor will not result in any unsafe operation.

In our algorithm, the sobriety test is implemented by two methods: expanding the set of possible states and using public key cryptography. In the mutual exclusion algorithm, for a given state of a given processor, there is only one value that can be the new state value if the system is to remain legitimate. On the other hand, if the state of any processor can be any integer, then the ratio of the number of legitimate states to the number of illegitimate states is very small. Thus, with random failures, the probability that the state values of the current and next states represent a legitimate transition is very small. Also, because of the use of public key cryptography, the probability that a nonrandomly generated value can be the next state value is also very small. The specialization approach is realized by using different private keys for the top and bottom processors. A new state value is not generated solely by the top processor, or by any other single processor, but by the bottom and top processors together. Each one is specialized by its own private key, which is known only to itself. Thus, the failure of some processors will not make the system vulnerable.
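The local check described above can be illustrated with a small sketch. The following Python fragment is only a toy: it assumes textbook RSA with the tiny hypothetical parameters n = 33, private exponent 7, and public exponent 3, none of which come from the algorithm in this paper, and it offers no security. The name `passes_sobriety_test` is likewise hypothetical. The fragment shows that, for a given state, exactly one neighbor value passes the check, so a uniformly random value would pass with probability 1/33 in this toy setting.

```python
# Toy illustration only: tiny textbook RSA parameters, not secure.
N_MOD = 33       # modulus n = 3 * 11 (hypothetical toy value)
PRIVATE_K = 7    # private "encryption" exponent K, known to one processor
PUBLIC_D = 3     # public "decryption" exponent D, since 3 * 7 = 21 = 1 (mod 10)


def encrypt(private_key: int, x: int) -> int:
    """Apply the private key; only the key holder can compute this."""
    return pow(x, private_key, N_MOD)


def decrypt(public_key: int, y: int) -> int:
    """Apply the public key; every processor can compute this."""
    return pow(y, public_key, N_MOD)


def passes_sobriety_test(state: int, right: int) -> bool:
    """Local check: is the neighbor value the encryption of our own state?"""
    return state == decrypt(PUBLIC_D, right)


state = 4
legit_next = encrypt(PRIVATE_K, state)
# Every possible neighbor value that would be accepted for this state.
accepted = [r for r in range(N_MOD) if passes_sobriety_test(state, r)]
print(legit_next, accepted)
```

With a realistic key size the accepted set is still a single value, but the space of candidate values is astronomically larger, which is what makes a random failure unlikely to pass the test.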

Let $P_{N-1}$ denote the top processor (as defined by Dijkstra), $P_0$ denote the bottom processor, and $P_i$, $0 < i < N-1$, denote the other processors. The top processor has knowledge of a private encryption key $K_T$, while the bottom processor has knowledge of a private encryption key $K_B$. The public keys for decryption, $D_T$ and $D_B$, are known to all the processors. All the processors have a state variable, which is an integer. In the code for processor $P_i$, this variable is referred to as $S$, while $R$ refers to the state of the processor to its right, i.e., $P_{(i+1) \bmod N}$. Also, we use $CR$ to indicate that the processor is in the critical region. The self-stabilizing algorithm without a vulnerable period is given in the following.

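Top processor:

do
    $S = \mathit{decrypt}(D_B, R) \;\rightarrow\; CR;\ S := \mathit{encrypt}(K_T, R)$
end do

Other processors:

do
    $S = \mathit{decrypt}(D_B, \mathit{decrypt}(D_T, R)) \;\rightarrow\; CR;\ S := R$
    $S \ne \mathit{decrypt}(D_B, \mathit{decrypt}(D_T, R)) \wedge S \ne R \;\rightarrow\; S := R$
end do

Bottom processor:

do
    $S = \mathit{decrypt}(D_T, R) \;\rightarrow\; CR;\ S := \mathit{encrypt}(K_B, R)$
    $S \ne \mathit{decrypt}(D_T, R) \wedge \mathit{decrypt}(D_B, S) \ne R \;\rightarrow\; S := \mathit{encrypt}(K_B, R)$
end do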
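The three guarded-command programs can be exercised with a small simulation. The sketch below is a minimal Python rendering under loudly stated assumptions: the ring is indexed so that processor N-1 is the top and processor 0 is the bottom, each processor reads the state of processor (i+1) mod N, and the guards use equality and inequality tests as read from the programs above. The functions `encrypt` and `decrypt` are replaced by modular addition and subtraction, which are invertible stand-ins and not public key cryptography. The constants `M`, `K_T`, `K_B` and all helper names are hypothetical.

```python
M = 2**61 - 1      # size of the toy state space (hypothetical)
K_T, K_B = 3, 5    # stand-ins for the private keys of top and bottom (hypothetical)


def encrypt(key, x):
    """Stand-in for encrypt(K, x): an invertible keyed map, NOT real cryptography."""
    return (x + key) % M


def decrypt(key, y):
    """Stand-in for decrypt(D, y): undoes encrypt with the matching key."""
    return (y - key) % M


def move(S, i):
    """Return (enters_cr, new_state) for processor i, or None if no guard holds."""
    n = len(S)
    s, r = S[i], S[(i + 1) % n]          # own state S and right neighbor's state R
    if i == n - 1:                        # top processor
        if s == decrypt(K_B, r):
            return True, encrypt(K_T, r)
    elif i == 0:                          # bottom processor
        if s == decrypt(K_T, r):
            return True, encrypt(K_B, r)
        if decrypt(K_B, s) != r:
            return False, encrypt(K_B, r)
    else:                                 # other processors
        if s == decrypt(K_B, decrypt(K_T, r)):
            return True, r
        if s != r:
            return False, r
    return None


def run(S, steps):
    """Fire one enabled processor per step; return (CR entrants in order, final state)."""
    S = list(S)
    entrants = []
    for _ in range(steps):
        enabled = [(i, move(S, i)) for i in range(len(S))]
        enabled = [(i, m) for i, m in enabled if m is not None]
        if not enabled:
            break
        i, (enters_cr, new_state) = enabled[0]   # arbitrary scheduler choice
        if enters_cr:
            entrants.append(i)
        S[i] = new_state
    return entrants, S


def top_privileged_state(n, v):
    """A legitimate state in which the top processor holds the privilege."""
    return [encrypt(K_B, v)] + [v] * (n - 1)


entrants, final = run(top_privileged_state(4, 100), 8)
print(entrants)   # prints [3, 2, 1, 0, 3, 2, 1, 0]
print(final)      # prints [121, 116, 116, 116]
```

Starting from a legitimate state, exactly one guard is enabled at every step in this run, and the privilege travels from the top processor down to the bottom processor and back to the top, as the closure argument later in the paper requires.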

Functions $\mathit{decrypt}$ and $\mathit{encrypt}$ perform the public key decryption and encryption operations. Instead of having an increase-by-one function for the top processor to generate new state values, as in Dijkstra's algorithm, we use the $\mathit{encrypt}$ function. Thus, the legitimacy of the new state propagated from the right processor can be easily verified by decryption using the public keys. Consequently, the faulty state of a processor can be easily detected when referenced by its left processor and hence will not allow incorrect access to a critical section. To prevent a faulty top processor from arbitrarily generating a stream of tokens, we let the bottom processor also encrypt the state using its private encryption key $K_B$. In the case of failure, a new value generated by the top processor will not be effective for entering the critical section unless it has traversed the entire ring and has been encrypted by the bottom processor. Here we assume that $K_T$ and $K_B$ have been chosen such that $\mathit{encrypt}(K_B, \mathit{encrypt}(K_T, x))$ does not generate the same value within $N$ iterations. More formally, let $f(x)$ denote $\mathit{encrypt}(K_B, \mathit{encrypt}(K_T, x))$. The property $\forall k,\ 0 < k \le N : f^k(x) \ne x$ should hold.
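The assumption on $f$ can be checked mechanically for a candidate pair of keys. The helper below is a sketch: `returns_within` is a hypothetical name, and the two example maps are simple modular additions standing in for the composition of the two encryptions, not a real cryptosystem.

```python
def returns_within(f, x, n):
    """Return True if f^k(x) == x for some k with 1 <= k <= n."""
    y = x
    for _ in range(n):
        y = f(y)
        if y == x:
            return True
    return False


def short_cycle_map(x):
    """Toy map with period 2: adding 8 modulo 16 returns to x after two steps."""
    return (x + 8) % 16


def long_cycle_map(x):
    """Toy map with a very long period: adding 8 modulo the prime 2**61 - 1."""
    return (x + 8) % (2**61 - 1)


# A ring of N = 4 processors would violate the assumption with the first map ...
print(returns_within(short_cycle_map, 5, 4))      # prints True
# ... but not with the second, even for N = 1000.
print(returns_within(long_cycle_map, 5, 1000))    # prints False
```

A key pair for which the check returns True for some reachable state value would let an old state value reappear within one trip around the ring, which is exactly what the assumption rules out.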

Assume that decoding of the public key cryptosystem is not possible other than by random trials. It can be formally proved that, with a high probability, the algorithm guarantees that no two units will enter the critical section at the same time in spite of nonmalicious processor failures or arbitrary state transitions. Let $p$ denote the probability that a value encoded using a private key can be decoded by a random selection of a number. Consider a bit

value We have p Let S denote the system state where

$S = \{S_0, S_1, \ldots, S_{N-2}, S_{N-1}\}$

and $S_i$, $0 \le i < N$, is the local state of processor $P_i$. We divide the legitimate states of the system into top privileged states ($TPS$), where the top processor holds the privilege; bottom privileged states ($BPS$), where the bottom processor holds the privilege; and other legitimate states ($OLS$), where a processor other than the top and bottom processors holds the privilege. Each of these sets can be expressed as follows.

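$TPS = \{\, S \mid S_{N-1} = \mathit{decrypt}(D_B, S_0) \wedge \forall i,\ 0 < i < N-1 : S_i = S_{N-1} \,\}$

$BPS = \{\, S \mid S_0 = \mathit{decrypt}(D_T, S_{N-1}) \wedge \forall i,\ 0 < i < N-1 : S_i = S_{N-1} \,\}$

$OLS = \{\, S \mid S_{N-1} = \mathit{encrypt}(K_T, S_0) \wedge \exists j,\ 0 < j < N-1 : (\forall i,\ j < i < N-1 : S_i = S_{N-1}) \wedge (\forall i,\ 0 < i \le j : S_i = \mathit{decrypt}(D_B, S_0)) \,\}$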
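The three sets can also be read as executable predicates. The sketch below is an illustration under assumptions, not the authors' code: it takes processor N-1 as the top and processor 0 as the bottom, reads the set definitions with equality tests, and again uses modular addition and subtraction as stand-ins for `encrypt` and `decrypt`. The constants and the function name `classify` are hypothetical.

```python
M = 2**61 - 1      # size of the toy state space (hypothetical)
K_T, K_B = 3, 5    # stand-ins for the private keys of top and bottom (hypothetical)


def encrypt(key, x):
    """Invertible stand-in for encrypt(K, x); not real cryptography."""
    return (x + key) % M


def decrypt(key, y):
    """Inverse of the stand-in encrypt for the matching key."""
    return (y - key) % M


def classify(S):
    """Return 'TPS', 'BPS', 'OLS', or 'illegitimate' for a ring state S (len >= 3)."""
    n = len(S)
    top, bottom, middle = S[n - 1], S[0], S[1:n - 1]
    if all(s == top for s in middle):
        if top == decrypt(K_B, bottom):
            return "TPS"
        if bottom == decrypt(K_T, top):
            return "BPS"
    if top == encrypt(K_T, bottom):
        old = decrypt(K_B, bottom)       # value still held at and below processor j
        for j in range(1, n - 1):
            below = all(S[i] == old for i in range(1, j + 1))
            above = all(S[i] == top for i in range(j + 1, n - 1))
            if below and above:
                return "OLS"
    return "illegitimate"


print(classify([105, 100, 100, 100]))   # prints TPS
print(classify([105, 100, 100, 108]))   # prints OLS
print(classify([105, 108, 108, 108]))   # prints BPS
print(classify([7, 7, 7, 7]))           # prints illegitimate
```

The first three example states are consecutive states of one trip of the privilege around a four-processor ring under these toy keys.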

These sets of states are referred to in the following correctness proofs. We will first show that, from any state, the system will converge to a legitimate state in $BPS$ if there are no further failures. We then show that, from a legitimate state, the system will always move to a legitimate state after a legitimate transition. Finally, we will show that the system guarantees mutual exclusion irrespective of random failures.

Theorem 1. A state in $BPS$ will be reached from any arbitrary initial state.

Proof. The proof follows the convergence stair method, with the following intermediate steps:

$R_0 = \{\mathit{true}\}$

$R_1 = \{S_{N-1} = \mathit{encrypt}(K_T, S_0)\}$

$R_2 = \{S_{N-1} = \mathit{encrypt}(K_T, S_0) \wedge \forall i,\ 0 < i < N-1 : S_i = S_{N-1}\}$

If the condition given in $R_1$ is not satisfied, then the top processor will make a move and $R_1$ will be satisfied. Otherwise, $R_1$ is already true. Thus, from any arbitrary state in $R_0$, the system will move to $R_1$.

Assume that the system state $S$ is in $R_1$. Let the states $S_{N-1}, \ldots, S_1$ be divided into $k$ segments, where segment $i$ starts at processor $P_{b_i}$ and contains $l_i$ consecutive processors whose states all equal $S_{b_i}$, and $S_{b_i} \ne S_{b_{i+1}}$. In other words, a segment contains consecutive processors with the same state value, and the length of segment $i$ is $l_i$. We have $l_i \ge 1$ for all $i$ and $k \le N-1$; if $k = 1$, then $R_2$ is true. Let $v_i$ denote the value of $S_{b_i}$, with $v_1 = S_{N-1}$. Since $R_1$ is satisfied, the top processor will not make a move unless the specific condition $\exists i,\ 1 < i \le k : v_i = S_{N-1}$ is satisfied. Assume that the top processor does not make a move before $BPS$ is satisfied. Thus, no new value will be generated. $S_{b_2}$ is the state of the first processor in the second segment, with $v_2$ as its state value. $P_{b_2}$ will make a move in a limited time since $v_2 \ne v_1$, and after the move we have $S_{b_2} = v_1$. Thus, $l_1$ will increase in a limited time. This implies that $l_1$ will eventually be $N-1$, and hence $R_2$ is satisfied.

It is possible that the top processor will make a move before $R_2$ is satisfied. Since the current value of $S_{N-1}$ should not have propagated to processors in segment $i$, for $1 < i \le k$, the probability of having $v_i = S_{N-1}$ is $p$. Hence, the probability that the top processor will make a move before $BPS$ is reached is $1 - (1-p)^k \le 1 - (1-p)^N$, and the probability that the top processor will make $l$ moves before $BPS$ is reached is less than $(1 - (1-p)^N)^l$. When $l$ becomes large, $(1 - (1-p)^N)^l \rightarrow 0$. Thus, the top processor will eventually not make any moves until $\forall i,\ 0 < i < N-1 : S_i = S_{i+1}$ holds. In other words, the system will converge from $R_1$ to $R_2$ eventually. $\Box$
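A short worked bound may make the last step of the proof more concrete. It assumes that each opportunity lets the top processor make a premature move with probability at most $1 - (1-p)^N$, which is how the bound in the proof is read here; the rest follows from Bernoulli's inequality.

```latex
\begin{align*}
(1-p)^N &\ge 1 - Np
  && \text{(Bernoulli's inequality, } 0 \le p \le 1,\ N \ge 1\text{)} \\
1 - (1-p)^N &\le Np \\
\Pr[\text{top processor makes } l \text{ moves before } BPS]
  &< \bigl(1 - (1-p)^N\bigr)^l \le (Np)^l
\end{align*}
```

Whenever $Np < 1$, the right-hand side decreases geometrically in $l$, so a long run of premature moves by the top processor becomes vanishingly unlikely.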

Theorem 1 shows the convergence property of the system: the system is guaranteed to stabilize to a legitimate state irrespective of the current state. The probability that the system will converge to a legitimate state in one iteration (that is, $S_{N-1}$ is propagated to all processors) is at least $(1-p)^N$. Note that converging from $R_0$ to $R_1$ requires only a single step.

In Theorem 2, the closure property of the algorithm is proved.

Theorem 2. If the system is in a legitimate state, then each move of the proposed algorithm guarantees that the system remains in a legitimate state, provided there is no failure.

Proof. If the system is in a legitimate state, then the system state $S$ is in $TPS \cup BPS \cup OLS$. If $S$ is in $TPS$, then only the top processor will make a move, assuming that there is no failure. The top processor will enter the critical section and set $S_{N-1} := \mathit{encrypt}(K_T, S_0)$. Thus, the new system state is in $OLS$ with $j = N-2$. If $S$ is in $BPS$, then the bottom processor will make the move by executing $S_0 := \mathit{encrypt}(K_B, S_1)$. Thus, the new system state will be in $TPS$. Otherwise, the system will be in $OLS$. Let processor $P_j$ be the one that holds the privilege, where $0 < j < N-1$. After setting $S_j := S_{j+1}$, the new system state will still remain in $OLS$, but with $j$ decreased by one. If processor $P_1$ makes the move, then the system state will satisfy $BPS$. In conclusion, if the system state is legitimate, then after a legitimate transition the system will continue to remain in a legitimate state. $\Box$

The fairness property of the algorithm is obvious, and we omit the formal proof. Essentially, when the system is in a legitimate state, the privilege moves circularly from the top processor down to the bottom processor. Thus, the system satisfies all the properties required for conventional self-stabilizing systems. Finally, we need to show that the system is free of a vulnerable period with a high probability. In Theorems 3 and 4, we show that our algorithm guarantees the mutual exclusion property with a high probability.

Theorem 3. If the system is in a legitimate state, then the mutual exclusion property is guaranteed.

Proof. As with the conventional self-stabilizing mutual exclusion algorithms, this algorithm allows only one privileged processor to be in its $CR$ when the system is in a legitimate state. Thus, the mutual exclusion property is guaranteed. $\Box$

Theorem 4. If the system is in an illegitimate state, then the probability that the mutual exclusion property is satisfied is $(1-p)^{N(N-1)}$.

Proof. Each processor has to pass its sobriety test to enter the critical section. However, passing all local checks does not imply global correctness. The system can be in an illegitimate state and have more than one processor entering the critical section, since more than one local condition for entering the critical section is satisfied. However, the probability that this occurs is very small. Essentially, at least one processor has to be in a faulty state and, either currently or after several moves, some local sobriety test becomes true illegally.

Let $PL_i$ denote the local condition of processor $P_i$ that allows $P_i$ to enter the critical section. If processor $P_{(k+1) \bmod N}$, $0 \le k < N$, is in a faulty state and causes $PL_k$ to become true, then $P_k$ will enter the critical section illegally. It is also possible that both $S_k$ and $S_{k+1}$ are faulty and $PL_k$ is true. Further, it is possible that for all $i$, $k < i \le j$, $S_i$ is faulty, and after $S_j$ has been propagated to $S_{k+1}$, $PL_k$ becomes true (this can include the case when $S_k$ is also faulty). In all these cases, given any index $k$, $S_k$ can be of any value (faulty or not), and as long as there exists an index $j$ such that $S_j$ is faulty and $S_j$ can propagate to $S_{(k+1) \bmod N}$ and cause $PL_k$ to be true, the mutual exclusion condition may be violated. Consequently, as long as there exists no pair such that the above condition is true, mutual exclusion can be guaranteed. Given an arbitrary pair of processors $P_j$ and $P_k$, the probability that $S_j$ can cause $PL_k$ to be true illegally is $p$, which is the probability that, for a given value, a randomly selected value happens to be the decrypted value of that given value. Thus, the probability that no pair of processors has matching state values is $(1-p)^{N(N-1)}$. Hence, the probability that the mutual exclusion property is violated is $1 - (1-p)^{N(N-1)}$. $\Box$
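The size of this probability is easy to misjudge, so a small numerical sketch may help. It rests on two assumptions that are not taken from the proof itself: that the bound has the form $1 - (1-p)^{N(N-1)}$, with one independent chance $p$ per ordered pair of processors, and that a uniformly random $b$-bit value matches a given value with probability $p = 2^{-b}$. The function name and the sample parameters (64 bits, 10 processors) are hypothetical.

```python
import math


def violation_probability(bits: int, n: int) -> float:
    """Evaluate 1 - (1 - p)**(n * (n - 1)) with p = 2**-bits, without rounding to zero."""
    p = 2.0 ** -bits
    pairs = n * (n - 1)
    # log1p and expm1 keep precision when p is far below machine epsilon.
    return -math.expm1(pairs * math.log1p(-p))


# Hypothetical example: 64-bit state values on a ring of 10 processors.
print(violation_probability(64, 10))
# For tiny p the result is close to n * (n - 1) * p.
print(10 * 9 * 2.0 ** -64)
```

Evaluating the formula directly as `1 - (1 - p) ** pairs` would return 0.0 in floating point for such small `p`, which is why the sketch goes through `log1p` and `expm1`.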

Consider N The probability of violation of mutual exclusion using this algorithm is

approximately. To increase the reliability, we can increase the number of bits in the state variable. Thus, the system can essentially guarantee mutual exclusion with an arbitrarily high probability. As the number of bits in the state variable increases, the system also has an increased probability of converging within one iteration after a failure.

Conclusion

The motivation for this paper was the question of whether it is possible to use self-stabilization for safety-critical systems. Toward this goal, we have developed a highly safe self-stabilizing mutual exclusion algorithm that guarantees, with a high probability, that the mutual exclusion property is satisfied even in unstable states. We used two rules that appear to have general applicability, namely the sobriety test and processor specialization. Even in the presence of faulty states,

our algorithm guarantees mutual exclusion with a probability of when a bit state

value is used. Our approach can be extended to provide a general method of tolerating failures in self-stabilizing algorithms for a class of fixed-point computations.

Currently, we are also investigating a mutual exclusion algorithm that is general-failure-proof self-stabilizing. Algorithms with this property will incur a higher cost but can tolerate Byzantine-generals-type failures.

References

A. Arora, M. Gouda, and T. Herman. Composite routing protocols. Proc. 2nd IEEE Symp. on Parallel and Distributed Systems, Dallas.

E. W. Dijkstra. EWD: The solution to a cyclic relaxation problem. Reprinted in Selected Writings on Computing: A Personal Perspective, Springer-Verlag.

E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Comm. ACM, Nov.

S. Ghosh. Stabilizing Petri nets. 3rd IEEE Symp. on Parallel and Distributed Systems, Dec.

M. G. Gouda and N. J. Multari. Stabilizing communication protocols. IEEE Trans. on Computers, April.

R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Comm. of the ACM, Feb.

M. Schneider. Self-stabilization. ACM Comp. Surveys, Mar.

Y. Zhao and F. B. Bastani. A self-adjusting algorithm for Byzantine agreement. Distributed Computing.