<<

Pattern-Based Modeling and Analysis of Failsafe Fault-Tolerance in UML∗

Ali Ebnenasir† Betty H.C. Cheng Department of Science Computer Science and Engineering Department Michigan Technological University Michigan State University Houghton MI 49931 U.S.A. East Lansing MI 48824 U.S.A. [email protected] [email protected]

Abstract would potentially crosscut all components of the concep- In order to facilitate incremental modeling and analysis tual model of an existing system. As such, developers need of fault-tolerant embedded systems, we introduce an object rigorous techniques and reusable artifacts for adding fault- analysis pattern, called the detector pattern, that provides tolerance concerns to conceptual models. This paper in- a reusable strategy for capturing the requirements of fail- troduces a pattern-based approach for adding failsafe fault- safe fault-tolerance in an existing conceptual model, where tolerance to an existing conceptual model (in UML [7]), a failsafe system satisfies its safety requirements even when where a failsafe fault-tolerant system is expected to meet faults occur. We also present a method that (i) uses the de- its safety requirements even when faults occur. tector pattern to help create a behavioral model of a fail- Numerous approaches exist to support fault-tolerance safe fault-tolerant system in UML, (ii) generates and model [1,5,14,30,32,33] and to analyze system safety in the pres- checks formal models of UML state diagrams of the fault- ence of failures [6, 23, 27], most of which (i) rely on a spe- tolerant system, and (iii) visualizes the model checking re- cific fault-tolerance /implementation mechanism; (ii) sults in terms of the UML diagrams to facilitate model re- do not emphasize on reuse in analysis phases, and (iii) lack finement. We demonstrate our analysis method in the con- sufficient support for formal analysis of the potential inter- text of an industrial automotive application. ferences between fault-tolerance and functional concerns. For example, Saridakis [32] presents a set of design patterns Keywords: Requirements Analysis, Fault-Tolerance, based on existing recovery mechanisms [14,30]. Bondavalli Formal Methods, Detector, UML et al. [6] translate UML structural and behavioral diagrams to Stochastic Petri Nets in order to provide quantitative pre- 1 Introduction dictions for system dependability/reliability. Leveson and The complexity of developing fault-tolerant systems is, Stolzy [27] present a formal framework based on Timed in part, due to the crosscutting and evolving nature of fault- Petri Nets to model and analyze fault-tolerance in real-time tolerance requirements. Since it is difficult to anticipate all systems. The UML profile for fault-tolerance [1] and sev- types of faults2 at early stages of development, it is better to eral aspect-oriented approaches [18, 34] use of have techniques that enable existing analysis artifacts to be services [33] to mask faults, which is sometimes impracti- revised once a new type of fault is detected. Such a revision cal and costly [2]. Approaches based on safety cases [23] present a framework for goal-based failure analysis and sys- ∗This work was partially sponsored by NSF grants EIA-0000433, EIA- tematic specification of lessons and recommendations for 0130724, CDA-9700732, CCR-9901017, CNS-0551622, CCF-0541131, post-failure corrections of safety-critical systems. Further- NSF CAREER CCR-0092724, ONR grant N00014-011-0744, DARPA Grant OSURS01-C-1901, Siemens Corporate Research, a grant from the more, some existing analysis methods for fault-tolerance Michigan State University’s Quality Fund, and a grant from Michigan [22, 31] assume that a specific fault-tolerance design mech- Technological University. † anism (e.g., , redundancy) will be used The work presented here was performed largely while this author was and specify analysis requirements within those design con- a postdoctoral researcher at Michigan State University with support from NSF and ONR. straints, which may overly constrain and preclude useful 1Email: [email protected], [email protected] fault-tolerance solutions. (For example, it is difficult to Tel: +1-906-487-4372, Fax: +1-906-487-2283 specify and model self-stabilization [9] solely based on ex- ception handling.) While all aforementioned approaches 2A fault-type is a representation of the hypothesized cause of an error. An error is characterized by a (set of) state(s) from where failures may present useful and important techniques for modeling and occur. A failure is a behavioral deviation from system specifications [4]. analyzing fault-tolerance concerns, three important features distinguish our approach from existing work: (1) provid- framework [28] to generate formal specifications of faults, ing reusable modeling artifacts for incremental modeling of fault-tolerance and functional concerns in the Promela mod- fault-tolerance; (2) ensuring the fault-tolerance of the mod- eling language [21]. Subsequently, we use the Spin model eling elements added for fault-tolerance purposes, and (3) checker [21] to detect inconsistencies between the detec- facilitating automated reasoning (coupled with visualiza- tor pattern instances and the functional model of the FIS. tion) in analyzing the mutual impact of fault-tolerance and The automated analysis with the Spin model checker cou- functional concerns. pled with a new visualization tool, called Theseus [20], that We introduce an object analysis pattern, called the de- animates counterexample traces enables a roundtrip engi- tector pattern, that provides a reusable strategy for elicit- neering process for modeling and analyzing failsafe FTSs. ing and specifying the requirements of error detection in We demonstrate our approach by modeling and analyz- UML object models for embedded systems. Our focus ing an adaptive cruise control (ACC) system in UML. We on capturing error detection requirements is an extension have also validated our approach for several other examples of Arora and Kulkarni’s [3] results where they show that from industry [12], including a diesel filter system for re- detection is necessary and sufficient for developing a rich ducing soot from diesel truck exhaust. The remainder of class of failsafe fault-tolerant systems. As such, the de- this paper is organized as follows. Section 2 introduces an tector pattern is intended to provide a generic artifact in approach to modeling faults in UML state diagrams. Sec- model-driven development of failsafe systems. Object anal- tion 3 discusses the relation between failsafe fault-tolerance ysis patterns apply a similar approach to that used by de- and error detection. Section 4 presents the detector pattern. sign patterns [19], but instead of focusing on design they Section 5 focuses on formal analysis of UML models of address the construction of the conceptual model of a sys- FTSs using Spin [21]. Finally, Section 6 gives concluding tem [10]. Patterns for the analysis stage of devel- remarks and discusses future work. opment are not new (see [17,24]). For example, Fowler [17] 2 Modeling presents a method for characterizing recurring ideas in busi- In this section, we present the basic concepts of model- ness modeling as reusable analysis patterns. Konrad et al. ing FISs, faults, and failsafe fault-tolerance in UML. We re- [24] present domain-specific object analysis patterns for an- iterate the definitions of faults and fault-tolerance from [2]. alyzing the conceptual models of embedded systems. The The motivation behind using UML is two-fold. First, UML proposed detector pattern in this paper provides a reusable is the de facto standard for object-oriented modeling. Sec- building block for the construction of the conceptual models ond, UML state diagrams enable us to capture any form of failsafe systems. of fault-tolerance that can be expressed in a state machine- Our pattern-based method comprises fault modeling, based formalism. fault-tolerance modeling, and automated analysis of the UML Models. We use conventional UML nota- UML models of fault-tolerant embedded systems. Specif- tions [7] to represent the UML-based conceptual models of ically, to construct the UML model of a Fault-Tolerant FISs (respectively, FTSs). Since our focus is on modeling System (FTS), we start with the UML model of its fault- and analyzing fault-tolerant embedded systems, we follow intolerant version, where a Fault-Intolerant System (FIS) Douglass [10] in using UML class diagrams to model struc- meets its functional requirements in the absence of faults tural constraints of (software and hardware components of) (i.e., when no faults occur) and provides no guarantees in embedded systems during the object analysis phase. We use the presence of faults (i.e., when faults occur). Then we state diagrams to capture high-level behavioral information model faults in the UML model of the FIS to produce a of UML object models. The combination of class and state model with faults. We use the notion of state perturbation to diagrams yields an object analysis model. model different types of faults in UML state diagrams [2,9]. Depending on the semantics of object interactions, the Next, we specify error states that are reached due to the oc- complexity of automatic analysis of a FTS varies from poly- currence of faults. Subsequently, we add instances of the nomial (in a shared memory model [25]) to undecidable (in detector pattern to the model with faults to capture the re- an asynchronous message-passing model [16]). To facilitate quirements of error detection and to generate a candidate an automated analysis method with a manageable complex- UML model of a failsafe FTS. To create a valid UML model ity, in this paper, we consider a high atomicity model where of the failsafe FTS, we have to ensure that the candidate transitions of UML state diagrams are executed atomically, UML model is interference-free. That is, in the absence and any instance of message passing between two objects of faults, the candidate model meets all functional require- takes place in an atomic step. Another motivation be- ments of the FIS, and in the presence of faults, the candidate hind this assumption is that modeling fault-tolerance in a model at least meets the safety requirements of the FIS and high atomicity model provides an impossibility test in early the instances of the detector pattern. To ensure interference- stages of development (which could potentially reduce de- freedom, we extend McUmber and Cheng’s formalization velopment costs). That is, if a conceptual model of an FTS cannot be derived from the conceptual model of its fault- meets the functional requirements of M is a formal repre- intolerant version in the high atomicity model, then it would sentation of a functional scenario. be impossible to derive a model of the FTS in a lower atom- Running Example: Adaptive Cruise Control (ACC). icity level.3 The ACC system comprises a standard cruise control sys- Underlying Computational Model. In an UML ob- tem and a radar system to control the distance between the ject model M = O1, ···,On, where each Oi (1 ≤ i ≤ n) car and the front vehicle, called the target vehicle.The is an object, we denote the state transition diagram of each ACC system has different modes of operation (see Figure object Oi by SDi = ,whereSi is the set of 1), namely closing, coasting and matching. When the radar states in the state diagram SDi and δi denotes the set of detects a target vehicle, the ACC system enters the closing transitions of SDi. A state of an object Oi is a valuation of mode. In the closing mode, the goal is to control the way its state variables (i.e., attributes). A transition of an object that the car approaches the target vehicle, and to keep the Oi is of the form (a, evt[grd]/act, b),bywhichOi transi- car in a fixed trail distance from the target vehicle with a tions from state a to state b if a triggering event evt occurs zero relative speed. The trail distance is the distance that and a condition grd holds. During such a transition, Oi ex- the target vehicle travels in a fixed amount of time (e.g., 2 ecutes an action act.Aglobal state predicate is a Boolean seconds). The distance to the target vehicle must not be less expression defined over a set of states of multiple objects. A than a safety zone, which is 90% of the trail distance.The local state predicate is a Boolean expression specified over ACC system calculates a coasting distance that is the dis- the set of states of only one object Oi (i.e., Si). A computa- tance at which the car should start decelerating in order to tion of an object Oj (1 ≤ j ≤ n) is a sequence s0,s1, ··· achieve the trail distance; i.e., the car enters the coasting such that ∀si : i ≥ 0:(si ∈ Sj) ∧ (si,si+1) ∈ δj.Acom- zone. When the car reaches the trail distance, the relative putation of an UML object model M = O1, ···,On is a speed of the car should be zero; i.e., the ACC system is in sequence of states s0,s1, ···,where∀si : i ≥ 0:(∃Oj : the matching mode. The safety requirements of the ACC 1 ≤ j ≤ n :(si ∈ Sj) ∧ ((si,si+1) ∈ δj)). system state that the car is never in the safety zone and it Modeling Functional Requirements. We consider will never accelerate in the coasting or matching modes. the set of functional requirements as a conjunction of a set of safety requirements and a set of liveness requirements. Intuitively, safety requirements stipulate that nothing bad ever happens, and the liveness requirements specify that something good will eventually occur in a finite amount of time. For example, in a cruise control system, the ac- tual speed of the car must not exceed 1% of the desired speed set by the driver (i.e., safety), and when the driver applies the brakes, the cruise control system will eventu- ally be deactivated (i.e., liveness). We represent safety re- Figure 1. The adaptive cruise control system. quirements by a set of transitions, say B, that must not oc- cur in the computations of any object. We do not explic- 2.1 Modeling Faults in State Diagrams itly specify liveness requirements, instead, since we want to derive a model of an FTS from a valid model of its fault- We systematically model a fault-type as a set of transi- intolerant version, we stipulate that during such transforma- tions in UML state diagrams (see Figure 2). Representing tions no deadlock states (states with no outgoing transitions) faults as a set of transitions has already appeared in previ- should be introduced in the absence of faults. The dead- ous work [2, 9], and it is known that state perturbation is lock freedom requirement captures the fact that, in the ab- sufficiently expressive to represent different types of faults, sence of faults, fault-tolerant embedded systems have non- such as crash, input-corruption and Byzantine, from differ- terminating computations and always react to their environ- ent behavioral categories; i.e., transient, intermittent, per- manent [2, 9]. To model a fault-type f in a UML model ment. We say a computation s0,s1, ··· meets safety re- M = O1, ···,On, we model the effect of f on the state quirements iff (if and only if) ∀i : i ≥ 0:(si,si+1) ∈B/ . diagram SDi of each object Oi by introducing a new set of A computation σ = s0,s1, ··· meets functional require- 1 ≤ ≤ ments if and only if σ meets safety requirements and does transitions in SDi denoted fi,for i n (see dashed not deadlock. A computation of an UML model M that arrows in Figure 2). We denote the set of transitions of SDi in the presence of faults fi by δi ∪ fi. An object Oi does 3 A theoretical investigation of this claim can be found in [26]. Also, in not have control over the execution of faults fi, whereas the cases such atomicity assumptions do not hold (e.g., distributed systems), execution of regular transitions is controlled by the thread one can use existing fault-tolerance-preserving refinements (e.g., [8]) to generate a refined model from a high atomicity model developed using our of execution in Oi (see solid arrows in Figure 2). Model- proposed approach. ing a fault-type f in all state diagrams of the UML model M creates a model with faults f.Inacomputation with Radar object samples the speed and the distance of the car faults s0,s1, ···, there exists a fault transition (si,si+1), to the target vehicle. Figure 3 illustrates an excerpted state for some i ≥ 0. A computation with faults is a formal rep- diagram of the Car object in the ACC system. The Car ob- resentation of a scenario with faults. ject continuously compares the real speed of the car, saved Fault-Span in the variable RealV, with respect to the setpoint,whichis ErrorState_1 ErrorState_2 the desired speed determined by the driver. If the setpoint is less than the real speed of the car, then the Car object tran- Normal_States sitions to the Decelerating state. If the setpoint is greater

State1 State2 than the real speed of the car, then the Car object transi- tions to the Accelerating state. The ACC system is subject ErrorState_5 to a transient fault-type fACC that causes the Car object to transition to the Accelerating state non-deterministically. State4 State3 Hence, we model the effect of fACC on the Car object as a set of transitions fcar that perturb the state of Car to the Accelerating state. In this case, the fault-type fcar does not ErrorState_3 ErrorState_4 introduce any new states. Excerpted state machine of the car object Legend: [RealV = setpoint]/ Object transitions [RealV > setpoint]/ Transitions of fault-type Object transitions fi in the fault-span [RealV = setpoint]/ CalculatingRealV [RealV < setpoint]/ Figure 2. Modeling faults in UML state diagrams. fcar [RealV = setpoint]/ When modeling a fault type fi in a state diagram SDi of fcar the UML model of the FIS, modelers should identify the ef- Accelerating fcar Decelerating fect of fi on the behavior of Oi. In the absence of faults, an object Oi of the FIS is in a set of normal states from where Legend: RealV: Real speed it meets its functional requirements. When fi occurs, Oi Event[guard]/action Event[guard]/action of car State Setpoint: desired may reach error states that are outside of the set of normal Fault transition System transition speed set by driver states. The set of states reachable from the normal states by a combination of fault and regular transitions is the fault- Figure 3. Modeling faults in the state transition diagram span of Oi for fault fi , denoted fi-span of Oi [25]. For ex- of the car. ample, in Figure 2, all error states are only reachable when faults occur. Thus, modeling faults in a state diagram may 2.2 Failsafe Fault-Tolerance introduce new states and transitions in that state diagram. A computation of an object that starts in its fault-span out- Intuitively, failsafe fault-tolerance requires that nothing side the set of normal states may lead to a failure scenario bad ever happens even in the presence of faults [2]. For a in which it may (i) violate safety requirements, (ii) fall into fault-type f and a UML model M of an FIS that is subject non-progress cycles, or (iii) reach a deadlock state. For ex- to f, we want to derive a UML model M from M such that ample, in Figure 2, if the object is in State1 then the faults M satisfies the following conditions: (1) in the absence fi may non-deterministically transition to ErrorState 1 from of f, the set of computations of M is non-empty and is a where the object may either be trapped in a non-progress cy- subset of the set of computations of M, and (2) all compu- cle (comprising ErrorState 1 and ErrorState 2)orbedead- tations of M , including the set of computations with faults locked in ErrorState 5. Since manual identification of fi- f, meet safety requirements. span is a tedious task, we use our previously develop Fault- Tolerance Synthesizer (FTSyn) [13] to automatically gener- 3 Necessity and Sufficiency of Detection ate the fault-span. In this section, we intuitively explain the relation be- ACC Example: The object model of the ACC system in- tween error detection in the presence of faults and fail- cludes three main objects, namely Control, Car and Radar safe fault-tolerance. Specifically, to preserve safety require- (see Figure 5). (We use Sans Serif font to denote sys- ments even if faults occur, a failsafe fault-tolerant system tem variables, objects and states.) The Control object cap- should ensure that none of its computations would reach a tures the controlling activities that set the mode of the ACC state from where faults may directly violate safety. More- system. The Car object models the engine management over, after faults perturb the state of a failsafe system out- functionalities such as acceleration and deceleration. The side its set of normal states, the system must not execute transitions that violate safety. To enable such functionali- conjunctive predicates. Moreover, since embedded systems ties, a failsafe system should be able to detect its current frequently detect the conditions of their underlying physical state and take necessary actions that will not lead to the system, the corresponding conditions remain stable relative violation of safety requirements. Hence, we define two to the processing speed of the embedded computing system categories of states that should be detected by a failsafe until the embedded system detects them and takes necessary system, namely fault-unsafe and at-risk states. The set of actions. Hence, in this section, we focus on stable global fault-unsafe states, represented by a state predicate Sunsafe, predicates that once hold remain true until detected. captures the set of states in the fault-span from where a ACC Example: In order to preserve safety requirements sequence of fault transitions alone may violate safety re- in the presence of fault fACC, the ACC system should detect quirements. A failsafe FTS must never reach a state in if it is in the coasting or matching mode before accelerating Sunsafe. In the set of at-risk states, denoted as a state pred- the car. The corresponding detection predicate in the ACC icate Sat−risk, a FIS itself may execute actions that violate system is the predicate XACC ≡ Xcar ∧ Xcontrol,where safety requirements. (FTSyn [13] automatically identifies Xcar ≡ (Car is in the accelerating state) and Xcontrol ≡ these state predicates.) For example, in the ACC system, a ((Control is in the coasting mode) ∨ (Control is in the global state where the control is in the coasting mode and matching mode)). the car is in the Accelerating state is an at-risk state since Detector Elements (Participants). A detector element di, the car may accelerate and violate the requirement of no ac- 1 ≤ i ≤ n, captures the detection of a local state predicate celeration while coasting. (A soundness proof of the above Xi in a functional object Oi. Each detector element di,for argument about necessity and sufficiency of detection for 1 ≤ i ≤ n, is indeed a participant of the detector pattern 4 failsafe fault-tolerance can be found in [3, 25].) and has its own detection predicate Xi. 4 Detector Pattern Distinguished Element. An element dindex (1 ≤ index ≤ n) that finalizes the detection of X based on the detection In this section, we introduce the detector pattern that we of X1, ···,Xn is called the distinguished element. use to augment the FIS conceptual model to derive a con- Structure. The choice of the structure of the detector pat- ceptual model of a failsafe FTS while preserving the safety tern depends on the inter-object associations in a UML ob- and liveness requirements of the FIS in the absence of faults. ject model. For example, if one needs to detect a global In order to facilitate its use, we define a template for the de- predicate over a set of functional objects that are associated tector pattern based on the fields used in the design patterns with each other in a tree-like structure, then it is appropriate presented by Gamma et al. [19], with modifications to re- to use an instance of the detector pattern with a hierarchical flect analysis-level information. For example, we do not use (parallel) structure. Due to space constraints, we omit the the Implementation and Sample Code fields. The Structure presentation of sequential and compositional detector pat- field captures structural constraints of the detector pattern terns (see [12] for details). In a parallel detector (see Figure represented by UML class diagrams. The detector pattern 4), the detection of X can be done in parallel, where all also includes several new fields that are added for the pur- elements di, 1 ≤ i ≤ n, detect their detection predicates pose of specifying and analyzing fault-tolerance concerns. concurrently. The shadowed objects represent the elements For example, the detector pattern includes the Detection Re- of the detector pattern encapsulated in a dashed box that de- quirements field that specifies a set of requirements that notes an instance of the detector pattern. The distinguished must be met by the resulting UML model to guarantee that element of the detector pattern is depicted by the dark shad- the detection occurs correctly. Next, we describe the fields ing. Each detector participant di is associated with an object of the detector pattern. Example application of each field to Oi (1 ≤ i ≤ n). In Figure 4, the distinguished element is the ACC system is denoted in italics. associated with all participants di. Detection Predicate. In a UML model M = ACC Example: Figure 5 depicts an excerpted class dia- O1, ···,On, a detection predicate is a global/local state gram of the ACC system in which an instance of the parallel predicate. In a distributed system, it is difficult for an ob- detector pattern has been instantiated. Such an instanti- ject to detect a global detection predicate X in an atomic ation is performed manually as a modeling activity. The step [29]. Thus, we decompose X into a set of local pred- instance of the detector pattern applied to the ACC system icates X1, ···,Xn, and specify the detection of X based comprises two elements dcontrol and dcar modeled as two on the detection of X1, ···,Xn, where each Xi is a local new objects in the class diagram of the ACC system. The state predicate specified for object Oi. Since the state pred- icate Sunsafe captures the local unsafe states of each ob- 4In the design and implementation phases, the detector elements may ject, Sunsafe is often specified as a conjunction of a set of be realized as independent software/hardware components that execute concurrently with other components of an embedded system. We con- local state predicates. For the same reason, Sat−risk most jecture that any additional execution overhead on system performance in- often has a conjunctive form. Hence, we limit the scope curred by adding detector elements would not be worse than the use of of the application of the detector pattern to the detection of conventional redundancy mechanisms. element dcontrol is responsible for detecting Xcontrol and 3,where3Y means that the state predicate Y eventually 5 the element dcar should detect Xcar. holds, we specify progress as 2(X ⇒ 3Z). Witness Predicate. Since we decompose the global de- Behavior. The state diagram of each detector element di in tection predicate X into a set of local detection predicates Figure 4 is composed with the state diagram of each object X1, ···,Xn, we should specify what implies the truth value Oi in a concurrent . The state machine of a detec- of X. Towards this end, we introduce the notion of a witness tor element di monitors Xi by reading the state of Oi.The predicate Z that is a local condition belonging to the distin- distinguished element can witness if all the detector partic- guished element, which implies that the detection is com- ipants di have already witnessed their detection predicates. plete by just communicating with detector elements. We distinguishedElement queries queries also consider a witness predicate Zi for each element di. We say di witnesses Xi iff Zi is true. The distinguished index i element d sets the value of Z to true if all d have wit- d_control d_car nessed their detection predicates. Distinguished element detects X if all detectors witness detectCoasting Detect Acceleration An instance of the Radar parallel detector distinguished element Control Car queries queries 1 1 1 1 1 1 queries d_1 d_i d_n Figure 5. Composition of a parallel detector pattern with ...... the ACC system.

1 1 1 ACC Example: Before accelerating, the Car object com- detectX_1 detectX_i detectX_n municates with the distinguished element to check whether 1 1 1 the witness predicate ZACC holds (see Figure 6). Even O_1 . . . O_i . . . O_n though in the case of the ACC system, such detection can also be done by calling a method of the Control object to

Partial System Model check its mode, the use of the detector pattern separates the concerns of fault-tolerance from functional concerns and Figure 4. The structure of the parallel detector. ACC Example: enables developers to easily trace and reason about fault- The distinguished element should set ZACC (i.e., the wit- tolerance. Moreover, in cases where more than two ob- ness predicate) to true if the elements dcontrol and dcar wit- jects are involved in the detection of a global predicate, it is ness their detection predicates Xcontrol and Xcar, thus in- difficult to use functional method calls for providing fault- dicating that the global detection predicate XACC has be- tolerance. come true. Such a detection functionality enables the FTS Excerpted state machine to accelerate only if XACC does not hold. of the car object Detection Requirements. In order to ensure that the de- [RealV = setpoint]/ Check with the [RealV > setpoint]/ tection occurs correctly, the detector pattern should meet the distinguished element following requirements (adapted from [3]): (1) Safeness. It before acceleration [RealV = setpoint]/ CalculatingRealV is never the case that the witness predicate Z is true when the detection predicate X is false; i.e., the detector pattern [RealV < setpoint]/ f f car [RealV = never lies. (2) Progress. It is always the case that if X is car setpoint]/ true then Z will eventually hold. (3) Stability. It is always if (~Z_ACC()) then Accelerate f the case that once Z becomes true, it will remain true as car Decelerating long as predicate X is true (i.e., Z remains stable). Each Legend: ~ denotes negation participant di should also meet the above requirements for Zi and Xi. The safeness and stability are safety requirements that Figure 6. The excerpted state diagram of the Car after can be specified in Linear Temporal Logic (LTL) [15] using applying the parallel detector pattern. (i) the universal operator 2,where2Y means that the state predicate Y always holds, and (ii) the next state operator Failsafe fault-tolerance of the Detector pattern. Since ,where Y means that in the next state Y holds. We the instances of the detector pattern are also subject to 2( ⇒ ) respectively specify safeness and stability as Z X 5Note that the detection requirements can also be represented in terms and 2(Z ⇒ ( (Z ∨¬X))). Using the eventuality operator of Dwyer’s specification patterns [11]. faults, we ensure that the detector pattern is itself failsafe analyzed. fault-tolerant. To this end, we verify that the composition We use Spin to simulate and verify the Promela specifi- of the UML model M of an FIS, and an instance of the de- cations generated by Hydra. For example, while verifying tector pattern meet the following requirements: (1) In the the UML model of a FTS against detection requirements, absence of faults, the functional requirements of M are met we may find counterexamples that represent the inconsis- (i.e., safety is not violated and no deadlock is reached), and tencies of the detector pattern and the functional objects. (2) In the presence of faults, the safeness and the stabil- To analyze such inconsistencies, we use Theseus [20] to vi- ity of the detector pattern and the safety requirements of sualize each step of the Spin simulation in UML state dia- M are satisfied. Note that when faults occur, the only re- grams. Such a visualization of counterexamples facilitates quirement for a failsafe FTS is to preserve its safety require- the analysis and refinement of the UML models. ments. Thus, the detector pattern need not meet its progress ACC Example: In the UML model of the ACC system, requirement when faults occur. For example, in the ACC we verified the safeness of the detector pattern to ensure example, if XACC holds but ZACC never becomes true, that the detector pattern is itself failsafe fdetector-tolerant, then safety requirements are still met since the Car object where fdetector represents the effect of fACC on dcar,which never accelerates if ZACC is false. However, the liveness of is the resetting of Zcar. We specified the safeness require- the ACC system may be violated. ments as the LTL property 2(Zcar ⇒ Xcar); i.e., it is al- Remark. While the ACC example is small, the number of ways the case that if dcar has witnessed then the Car ob- the elements of an instance of the detector pattern cannot go ject is in the Accelerating state. A sample counterexam- beyond the number of system components. Moreover, even ple that we found was for the case where dcar witnessed though composing an instance of the detector pattern with that its detection predicate Xcar holds, but the state of the the functional model of an FIS may add a layer of complex- Car object had been changed to another state without reset- ity, the modularity provided by the detector pattern facili- ting Zcar (i.e., Xcar was no longer true). In this case, the tates the management of such complexity. (Other pattern- safeness of dcar was violated. The Theseus visualization driven methods may also suffer from this additional layer tool highlighted all transitions that leave the Accelerating of complexity introduced by pattern instantiation.) state of the Car object as the transitions that would vio- 5 Model Analysis late the safeness of the detector pattern. To remedy this inconsistency, we manually revised the object model so that In order to verify whether composing an instance of the any transition originating from the Accelerating state would detector pattern with the conceptual model of a FIS gener- be accompanied with the simultaneous reset of Zcar (i.e., ates a valid model of a failsafe system, we extend the Hydra Zcar := false). Notice that, in this case, resolving the formalization framework [28] to generate a Promela specifi- inconsistencies of the detector pattern and the functional cation of a candidate UML model of an FTS. Subsequently, model required a change in the functional model. We note we use the Spin model checker [21] to detect potential in- that this change does not affect the computations of the consistencies between the instances of the detector pattern functional UML model in the absence of faults. and the functional model of the FIS. The general UML- to-Promela formalization approach of Hydra is to map ob- 6 Conclusions and Future Work jects to processes in Spin that exchange messages via chan- nels. Nested and concurrent states are also formalized as In this paper, we introduced an object analysis pattern, processes. Additional details on the modeling and analysis called the detector pattern, for modeling and analyzing fail- process, and the underlying formalization framework can safe fault-tolerance, where instances of the detector pat- be found in [28]. We use the current Hydra formalization tern are added to the UML model of a system to create the directly to generate Promela specifications of the functional UML model of its failsafe fault-tolerant version. The detec- objects of the FIS and each di element as a separate process. tor pattern also provides a set of constraints for verifying We extend Hydra to include a number of new formaliza- the consistency of functional and fault-tolerance require- tion rules that treat the transitions of a fault-type f differ- ments and the fault-tolerance of the detector pattern itself. ently than other transitions. As we distinguish fault tran- We extended McUmber and Cheng’s formalization frame- sitions from regular transitions by defining a <> work [28] to generate formal specifications of the UML stereotype [7], the extended Hydra integrates the transitions model of fault-tolerant systems in Promela [21]. Subse- of f modeled in different state diagrams in a separate pro- quently, we used the Spin model checker [21] to detect the cess Fault f in Promela that is concurrently run with all other inconsistencies between fault-tolerance and functional re- processes. Such a formalization is advantageous in that quirements. To facilitate the automated analysis of failsafe the resulting Promela model separates faults from the func- fault-tolerance, we employed the Theseus visualization tool tional part of the Promela specifications so that the effect [20] that animates counterexample traces and generates cor- of faults on system behaviors can easily be simulated and responding diagrams at the UML level. As an extension of this work, we are currently investigating the incorporation [18] R. France and G. Georg. An aspect-based approach to mod- of timing issues in the detector pattern. eling fault-tolerance concerns. Technical Report 02-102, Computer Science Department, Colorado State University, References 2002. [19] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design [1] UML profile for modeling quality of service and Patterns: Elements of Reusable Object-Oriented Software. characteristics and mechanisms. Addison-Wesley Publishing Company, 1995. www.omg.org/docs/ptc/04-06-01.pdf, 2002. [20] H. Goldsby, S. Konrad, B. H. Cheng, and S. Kamdoum. [2] A. Arora and M. G. Gouda. Closure and convergence: A Enabling a roundtrip engineering process for the model- foundation of fault-tolerant computing. IEEE Transactions ing and analysis of embedded systems. Proceedings of on Software Engineering, 19(11):1015–1027, 1993. the ACM/IEEE International Conference on Model Driven [3] A. Arora and S. S. Kulkarni. Detectors and Correctors: A Engineering Languages and Systems (MODELS), Genova, theory of fault-tolerance components. IEEE International Italy, October 2006. Conference on Systems (ICDCS), [21] G. Holzmann. The model checker SPIN. IEEE Transactions pages 436–443, May 1998. on Software Engineering, 1997. [4] A. Avizienis, J. Laprie, B. Randell, and C. Landwehr. Basic [22] D. Ilic and E. Troubitsyna. Modeling fault tolerance of tran- concepts and taxonomy of dependable and secure comput- sient faults. Proceedings of Rigorous Engineering of Fault- ing. IEEE Transactions on Dependable and Secure Com- Tolerant Systems, pages 84 – 92, 2005. [23] T. P. Kelly and J. A. McDermid. A systematic approach puting, 1(1):11–33, January – March 2004. to safety case maintenance. In SAFECOMP, pages 13–26, [5] D. Beder, B. R. A. Romanovsky, C. Snow, and R. Stroud. 1999. An application of fault-tolerance patterns and coordinated [24] S. Konrad, B. H. C. Cheng, and L. A. Campbell. Object atomic actions to a problem in railway scheduling. ACM analysis patterns for embedded systems. IEEE Transactions SIGOPS Operating System Review, 34(4), 2000. on Software Engineering, 30(12):970 – 992, 2004. [6] A. Bondavalli, I. Majzik, and I. Mura. Automatic depend- [25] S. S. Kulkarni and A. Arora. Automating the addition of ability analysis for supporting design decisions in UML. fault-tolerance. In Proceedings of the 6th International IEEE International Symposium on High-Assurance Systems Symposium on Formal Techniques in Real-Time and Fault- Engineering, pages 64–71, 1999. Tolerant Systems, pages 82–93, 2000. [7] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Mod- [26] S. S. Kulkarni and A. Ebnenasir. Enhancing the fault- eling Language User Guide. Addison-Wesley, 1999. tolerance of nonmasking programs. Proceedings of the 23rd [8] M. Demirbas and A. Arora. Convergence refinement. In- IEEE International Conference on Distributed Computing ternational Conference on Distributed Computing Systems, Systems (ICDCS), pages 441–449, 2003. pages 589 – 597, 2002. [27] N. G. Leveson and J. L. Stolzy. Safety analysis using [9] E. W. Dijkstra. Self-stabilizing systems in spite of dis- petri nets. IEEE Transactions on Software Engineering, tributed control. Communications of the ACM, 17(11), 1974. 13(3):386–397, 1987. [10] B. P. Douglass. Doing Hard Time: Developing Real- [28] W. McUmber and B. H. C. Cheng. A general framework for Time Systems with UML, Objects, Frameworks and Patterns. formalizing UML with formal languages. In the proceedings Addison-Wesley, 1999. of 23rd International Conference of Software Engineering, [11] M. Dwyer, G. Avrunin, and J. Corbett. Patterns in property pages 433 – 442, 2001. specifications for finite-state verification. In Proceedings of [29] N. Mittal and V. K. Garg. On detecting global predicates in the 21st International Conference on Software Engineering distributed computations. In Proceedings of the 21st IEEE (ICSE99), Los Angeles, CA, USA, pages 411–420, 1999. International Conference on Distributed Computing Sys- [12] A. Ebnenasir and B. H. C. Cheng. A framework for model- tems (ICDCS), Phoenix, Arizona, USA, pages 3–10, April ing and analyzing fault-tolerance. Technical Report MSU- 2001. CSE-06-5, Computer Science and Engineering, Michigan [30] B. Randall. System structure for software fault-tolerance. State University, East Lansing, Michigan, January 2006. IEEE Transactions on Software Engineering, pages 220– [13] A. Ebnenasir and S. S. Kulkarni. FTSyn: A 232, 1975. framework for automatic synthesis of fault-tolerance. [31] C. M. F. Rubira, R. de Lemos, G. R. M. Ferreira, and F. C. http://www.cs.mtu.edu/ aebnenas/research/tools/ftsyn.htm. Filho. Exception handling in the development of dependable [14] E. N. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. component-based systems. Software Practice and Experi- A survey of rollback-recovery protocols in message-passing ence, 35:195–236, 2005. systems. ACM Computing Surveys, 34(3):375–408, 2002. [32] T. Saridakis. A system of patterns for fault-tolerance. The [15] E. Emerson. Handbook of Theoretical Computer Science: 7th European Conference on Pattern Languages of Pro- Chapter 16, Temporal and Modal Logic. Elsevier Science grams (EuroPLoP), pages 535–582, 2002. Publishers B. V., 1990. [33] F. B. Schneider. Implementing fault-tolerant services using [16] M. J. Fischer, N. A. Lynch, and M. S. Peterson. Impos- the state machine approach: A tutorial. ACM Comput. Surv., sibility of distributed consensus with one faulty processor. 22(4):299–319, Dec. 1990. Journal of the ACM, 32(2):373–382, 1985. [34] M. Tkatchenko and G. Kiczales. Uniform support for mod- [17] M. Fowler. Analysis Patterns: Reusable Object Models. eling crosscutting structure. Appeared in AOM Workshop Addison-Wesley, 1997. held in conjunction with AOSD, 2005.