From: AAAI Technical Report SS-94-04. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

System Reliability and Risk Assessment: A Quantitative Extension of IDEF Methodologies

AndrewKusiak, Intelligent Systems Laboratory, Departmentof Industrial Engineering Nick Larson, Intelligent SystemsLaboratory, Deparlmentof Industrial Engineering The University of Iowa, Iowa City, Iowa 52242-1527

1. Introduction for information analysis (IDEF1), dynamic analysis Evaluating system reliability requires modeling the (IDEF2), and process modeling (IDEF3). interaction of resources, information, and material within This paper presents procedures for integrating system the system. Such a model must consider quantitative data reliability and risk assessment techniques with IDEF0and describing the reliability of each element of the system, as IDEF~modeling. The paper is motivated by the need to well as logical data describing the relationship between increase the value of IDEF0 and IDEF3 models by individual components. For example, a manufacturing incorporating quantitative data, thus, extending model use system may assemble products X, Y, and Z, on machines to applications such as risk assessment. By extending the M1, M2, and M3, respectively, and package the products power of existing IDEF models, process modeling will on a fourth machine, M4. Therefore, three different become more attractive to managementand evolve as a relationships exist between the componentsof the system, powerful tool for reengineering design and manufacturing one for each product. If the reliability of each machine systems. differs, the system reliability will differ dependingon the product. Similarly, the reliability of the system as a whole 2. Definitions of System Reliability and Risk will be affected by the production levels of products X, Y, Assessment and Z. Reliability maybe defined as the ability of an item Given the example above, with only three products and (product, system, etc.) to operate under designated operating four machines, it becomes apparent that determining conditions for a designated period of time or number of systemreliability requires a significant amountof data and cycles (Modarres, 1993). An item’s reliability is often a structured modeling methodology. Furthermore, to measuredby the probability it will perform without failure obtain an accurate assessment of system reliability, it is given a set of conditions. The following expression is a necessary to include additional data describing information probabilistic representationfor reliability. and material componentsof the system; such as inspection procedures, assembly specifications, parts, and R(t) = P(T > tlc~, c2, ...) (1) subassemblies. Systemreliability describes the likelihood of success or failure in the operation of a system. Risk assessmentis In (1), t is the period of time for the item’s operation, a technique for identifying scenarios that lead to problems T is the time to failure of the item, R(t) is the reliability of in a system, determining the likelihood of each scenario, the item, and Cl,C2 .... are the conditions under whichthe and the evaluating the consequenceof each scenario, i.e., item is operating. Thevariable t is often referred to as the the problem. Although quantitative approaches to risk mission time. In practice, T is a random variable assessmentemploy the principles of system reliability, the representing the time-to-failure of the item, and Cl,C2 .... identification, quantification, and evaluation of risk is a are implicitly considered. Furthermore, fit) mayrepresent more comprehensive modeling activity. Therefore, risk the probability density function of the randomvariable T. assessmentprojects are well suited for the tools developed The probability that the item fails prior to time t is defined for process modelingand analysis. in (2). In 1978, the United States Air Force selected SADT (Structured Analysis and DesignTechnique) as the language to support the Integrated Computer Aided Manufacturing P(T < t) = I~ f( O)dO = F(t), for t>0 (2) (ICAM) program. SADTactivity modeling was adopted by the ICAMprogram and revised by Sofrech, Inc. to Since F(t) denotes the probability the item will fail develop the ICAMDefinition Methodology (IDEF0). prior to time t, it is formally the unreliability of the item. Ross (1985) states that "thousands of people from hundreds Thus, the reliability of the item is determinedby (3). of organizations working on more than one hundred major projects" proceeded to use the methodology for system definition and design, as well as . R(t) = 1- F(t)= It’f(O)dO (3) IDEF0introduced manufacturing to techniques that were developed for computer system and A system is a collection of entities (i.e., information applications. Additional IDEFtechniques were developed and material) and resources (i.e., machines and workers)

88 whichinteract to performa set of activities in a given techniques for determining system reliability and process. Successfulcompletion of the process is dependent integrating risk assessment and IDEFmodels. Section 5 uponproper completionof the individual activities in the discusses issues related to risk assessment, such as process. Therefore, it is necessary to model the developing quantitative risk modelsbased on IDEF0and relationship betweenvarious items (entities and resources), IDEF3. as well as the reliability of individualitems to assess the reliability of the system. Complexmanufacturing systems 3. Fundamentals of IDEF0 and IDEF3 producemany different products through manydifferent IDEF0was developedfor modelinga wide variety of sequences of manufacturing activities. The system systems which use hardware, software, and people to reliability is a function of the activities performedand, perform activities (U. S. Air Force, 1981). An IDEF0 thus, is product dependent.Therefore, if the production modelconsists of three components,diagrams, text, and a volumeof a product exhibiting low systemreliability is glossary, all cross-referencedto each other. Thebox and increased,the reliability of the entire manufacturingsystem arrow diagramsare the major componentsof the model. In will decrease. At this point, the necessity for evaluating a diagram, a box represents a function and an arrow system reliability in a manufacturingsetting becomes represents an interface. A box is assigned an active verb obvious. phrase to represent the function. Aninterface maybe an Manyof the commontechniques for modelingsystem input, an output, a control, or a mechanism,and is reliability are difficult to applyto complexmanufacturing assigneda descriptivenoun phrase. Inputs (I) enter the box systems with multiple product types. Therefore, the fromthe left, are transformedby the function, and exit the principles of such tools are moreuseful whenapplied to boxto the right as an output(O). A control (C) enters modelingschemes developed for manufacturingsystems, top of the box and influences or determinesthe function such as IDEF0and IDEF3.Also, the task of evaluating performed. A mechanism(M) is a tool or resource which system reliability in a manufacturingsetting is more performsthe function. Theinterfaces are generallyreferred attractive if performedas a componentof a risk assessment to as the ICOMs(see Figure 1). study. Risk is a measureof the probability and severity of adverse effects (Lowrance,1976). Several types of risk Control (C) associated with project planningand software development have been cited in the literature, see Angand Gay(1993) and Chittister and Haimes (1993). The following manufacturingrisks are generalized from those cited in variousengineering disciplines. Input(I) Output (0) 1. Requirementsrisk. Theconcept of whatthe FUNCTION productis intendedto accomplishis not accurate. 2. Technicalrisk. Theproduct does not adhereto the requirementsset forth by its design. 3. Schedulerisk. Theproduct will not be completed by the deadlineset forth by productionplanning. 4. Costrisk. Theproduction cost will overrunits budget. Mechansim(M) 5. Networkrisk. The mechanismfor linking various productionactivities will not performas intended. Figure 1. IDEF0function box and interface arrows Risk assessmentis a process that attempts to answer three questions: (1) Whatcan go wrong?(2) Whatis Eachdiagram has betweenthree and six function boxes likelihood that it will go wrong? (3) What are the placed on a diagonal. Theboxes each have a specific node consequences?(Kaplan and Garrick, 1981). Basedon these numberand are connectedby all relevant interfaces. Each questions,(4) is a quantitativedefinition of risk, whereSi box on the diagrammay be decomposedinto a lower level is a scenario of eventsthat leads to a problem,Pi is the of detail. This feature restricts the amountof information that maybe containedin the modelon a single level. The likelihood of scenario i, and Ci is the consequenceof resulting diagramsform a hierarchy of informationwhich scenarioi. is summarizedin a nodetree. IDEF0provides a structured representation of the R = {Si, Pi, C~} i =1, 2 ..... n (4) functions,information, and objects whichare interrelated in a manufacturingsystem. IDEF3was created specifically to This section has provided definitions of system model the sequence of activities performed in a reliability and risk assessment. Section 4 discusses manufacturingsystem. AnIDEF3 model enables an expert

89 Reject proposal Evaluate Negotiate

Award contract Accept

Figure 2. IDEF3process flow diagram(Mayer et al. 1992) to communicatethe process flow of a system through are integrated with IDEF0and IDEF3. For a detailed defining a sequenceof activities and the relationships discussionof each technique,see Modarres(1993). betweenthose activities. There are two basic components of the IDEF3process description language, the process 4.1 Reliability Block Diagrams flow description and the object state transition network Reliability block diagrams model the effect of description. The two componentsare cross-referenced to componentfailure on systemperformance by capturing the build IDEF3diagrams (Mayer et al., 1992). physical arrangement of the system. Typical system The IDEF3process flow description is madeup of configurations include series systems, parallel systems, units of behavior (UOBs),links, and junction boxes. standby redundant systems, shared load systems, complex UOBrepresents a function or activity occurring in the parallel-series systems, and complexnonparallel-series process. For example,assemble parts, performinspection, systems (Modarres, 1993). Additional system or evaluate proposal are all activities which maybe configurationsmay be identified in various applications, represented as UOBsin a process model. Relationships however, most manufacturingsystems may be accurately between UOBsare modeledwith three types of links, described using those listed above. Figure 3 shows precedencelinks, relational links, and object flowlinks. reliability blockdiagrams for series and parallel systems Precedence links express simple temporal precedence and Figure 4 illustrates complexsystems. Thereliability betweenUOBs. Relational links highlight the existence of of complex systems may be calculated using various a relationship between two or more UOBs,however, no analytical methods, however, such methods become temporalconsWaint is implied. Objectflow links providea computationally intensive as the numberof components mechanismfor capturing object related constraints between increases (Shooman,1990). UOBsand carry the same temporal semantics as a Thesystem configurations illustrated in Figures 3 and precedencelink. Thelogic of branchingwithin a processis 4 mayalso be used to describe IDEF0and IDEF3models. modeledusing junctions. Several classifications are used Ang and Gay(1993) discuss extensions to IDEF0models to def’mejtmction boxes. Junctions are classified according which enable project risk assessment. Several project to logical semanticsas and(&), or (O), and exclusive or situations are described whichmay be generalized to the (X). Multipleprocess paths are classified as fan-in or fan- systemconfigurations described above. Theextensions are out corresponding to converging and diverging paths, efficient for includingquantitative data in an IDEF0model, respectively. The relative timing of process paths that such as a probability of occurrence. However,due to the converge or diverge at a junction are classified as decompositionprinciple of IDEF0,it is difficult to identify synchronous or asynchronous. An example of an IDEF3 complex system configurations in the model. The process flow diagramis shownin Figure 2 (Mayeret al., concepts of reliability block diagrams are more easily 1992). adapted to IDEF3models. Consider Figure 2; the five activities in the modelare arrangedin a complexparallel- 4. Integrating System Reliability Techniques series configuration. Unlike IDEF0models, the numberof and IDEF Models activities (i.e., functions)in IDEF3models is not restricted Asstated in section 2, systemreliability tools maybe to six per level. AlthoughIDEF3 allows for elaboration quite useful whenapplied to modelingschemes developed on a particular UOB(i.e., activity), the entire processflow for manufacturingsystems, such as IDEF0and IDEF3.In maybe constructedon a single level. This representation this section, several systemreliability modelingtechniques is moresuitable for IDEFapplications of risk assessment

90 which are based on the principles of reliability block diagrams.

l ll +

(a)

(a)

Co) Figure3. Series (a) and parallel (b) systemconfigurations

In IDEF3,series systems(or the series componentsof a complexsystem) are identified by UOBsconnected by precedenceand/or object flow links. A series of UOBsmay not contain a junction of any type, however,a junction box maybegin or terminate a series. Furthermore,a UOB is independentif it is not connectedto another UOBby a Co) relational link. Thereliability of a series of independent UOBsis definedby (5), whereRs(t) is the reliability of the Figure 4. Complexparallel-series (a) and system, the system contains N UOBs,and Ri(t) is the nonparallel-series(b) systems reliability of the ith UOB. Thesystem reliability for parallel UOBsfollowing an N or (O) junctionbox is moredifficult to determine.At least R,(t) = Rl(t) × R:(t)X...×RN(t) = H Ri(t) (5) one, and as manyas all, of the UOBsfollowing an O i=l junction box maybe executed. Unlike the & and X cases, the numberof different UOBsperformed in parallel is Parallel systemsare defined in IDEF3using junction unknown.However, if the system contains N UOBs,RiO) boxes. Each UOBimmediately following an and (&) is the reliability of the ith UOB,and Pi is the probability junction box will be performedin parallel. Therefore,the of occurrenceof the ith UOB(Pi < 1), a lower boundon reliability of the parallel systemfollowing an & junction the systemreliability is given by (7). This describes the box is determinedby (5) as well. However,only one UOB worst case, where all UOBsare executed and subject to immediatelyfollowing an exclusive or (X) junction box failure. If the reliability is to be evaluatedfor a knownset performed.Thus, the systemreliability for parallel UOBs of MUOBs in the parallel system (M < N), (5) may following an X junction box is defined by (6), wherethe used. Figure 5 illustrates parallel systemUOBs for &, X, systemcontains N UOBs,Ri(t) is the reliability of the ith and 0 junction boxes in IDEb3. UOB,and Pi is the probability of occurrenceof the ith UOB(PI+P2+...+PN = 1). R,(t) = (PI x Rl(t)) X (P: R2(t))x...x(Pn x Rn N Rift) = (PI x Rl(t)) (P: x Rz(t))+...+(Pn x R~ = H ( P, x R,(t)) (7) N i=1 = E(P~ X Ri(t)) (6) i=l Twoor moreUOBs connected by relational links may be considered a single UOBwhen calculating system reliability in IDEF3models. In Figure 2, UOBs3 and 4

91 are connectedby a relational link, implyingan interaction models. A path set is a set of units (i.e., activities, with UOB4 if UOB3 is executed. In this example, functions, UOBs)that form a connectionbetween input and successfully executingUOB 3 will also require successful output whentraversed in the direction of the arrows completion of UOB4 (i.e., R3(t) = R3(t) x R4(t) (Modarres, 1993). For each path through an IDEF3model, However,UOB 4 does not require execution of UOB3 and, a minimalpath set will exist containing only the UOBson therefore, is not dependentupon its success. Thedirection the path. Onceagain, consider Figure 2. Thereare three of the relational link determinesthe reliability of the minimalpath sets in the IDEF3model; P1 = (1, 2), P2ffi connected UOBs.In general, if UOBi is connected to (1, 3, 5), and P3 = (1, 4, 5). Eachpath set represents UOB(i + 1) with a relational link in the direction of UOB event that wouldsuccessfully accomplishthe objective of (i + 1), the success or failure of UOBi will determine the system if each of the UOBson the path execute success or failure of UOB(i + 1). In IDEF3applications successfully. Therefore, the union of all m path sets of system reliability modeling,relational links maybe defines the set of all successfulcompletions of the system. avoidedby combiningrelated activities into a single (JOB. Theprobability of this unionrepresents the reliability of the system,as shownin (8).

R~(t) Prob(Pl u P2U...UPm)

Unfortunately,(8) requires that the path sets (Pi) disjoint and, in practice, this is seldomtrue. Anupper bound on the system reliability maybe determined by assuming that the path sets are disjoint, as in (9). However,for reliability values greater than 0.9 for the missiontime, as in mostpractical applications, (9) does (a) not yield a useful bound(Modarres, 1993).

Rs(t) < Prob(P0 + Prob(Pz)+...+Prob(Pm)

A cut set is a set of units (i.e., activities, functions, UOBs)that interrupt all possible connectionsbetween the input and output points in the diagram(Modarres, 1993). In IDEF3process flow models,the minimalcut set is the smallest set of UOBswhich prevent flow from input to output. Failure of all UOBsin the minimumcut set results in systemfailure. Theminimum cut sets in Figure 0,) 2 are C1= (1), C2= (2, 3, 4), and C3 ffi (5). If the model has n minimalcut sets and Ci represents the event that all UOBsin the cut set fall prior to the missiontime t, the system reliability is obtained from (10). Since the probabilitythat all the UOBsfall in at least one of the cut sets is the probabilitythat the systemfails, this value is subtractedfrom 1 to obtainthe reliability of the system.

R,(t) = 1 - Prob(C, u C2w...uC~) (10)

Asin (8), the unionin (10) is not usually disjoint, (c) thus, (11) gives the lowerbound for systemreliability. Sincethe probabilityof failure is usedto evaluatethe cut Figure 5. Parallel systemsmodeled using IDEF3notation sets and, in practice, these values are muchlower than for and(a), exclusiveor Co),and or (c) junctionboxes reliabilities, the lower bound in (11) is a better representationof systemreliability.

4.2 Path Set and Cut Set Methods R,(t) > 1 - [Prob(C0+ Prob(C2)+... Path set and cut set methods were developed to +Prob(Cn)] (11) determinethe reliability of complexsystems described by reliability block diagrams. However,the principles of Theprinciples of path sets maybe adaptedto evaluate these methodsare very useful in the analysis of IDEF3 decision making processes modeled with IDEF3. Each

92 minimalpath set identified in an IDEF3model corresponds It must be realized that the values for Pi and Ci are to a set of decisions in the operation of the system. based on approximations and assumptions and, thus, Junction boxes identify points where decisions are made possess a high level of uncertainty. Formal methods for within a system. The decision set corresponding to path treating uncertainty analysis are discussed in the literature = set P3 (1, 4, 5) is 3 ={do not re ject pr oposal, accept (Morgan and Henrion, 1990). proposal}. Twodecisions are made in the process; one Uponcalculating the expected risk in the system, two corresponding to each diverging (or fan ou0 junction box. courses of action may be taken; (1) explore alternative The first decision is to "not reject the proposal" and the managementdecisions to avoid risk (i.e., decrease the second is to "accept the proposal." Therefore, the likelihood of high risk scenarios), and (2) reengineer reliability of the decision set maybe easily determinedby processes to mitigate the consequences. calculating the probability of successfully completing all IDEF0 and IDEF3 were developed to provide a UOBsin the corresponding path set. mechanism for evaluating the performance of complex Cut sets maybe used to expose critical activities in manufacturing systems. System reliability and risk the system. UOBsin the intersection of all or manyof the assessment are applications which provide a useful cut sets maybe considered critical to the operation of the opportunity for extending the power of these system. Furthermore, a cut set with a high probability of methodologies. failure (i.e., there are few UOBsin the cut set and each has a high probability of failure) maybe considered a critical Acknowledgment group of UOBs. This researchhas been partially supported by grant No. Applying system reliability techniques to IDEF3 DAAE07-93-C-R080 from the U.S. Army Tank models requires incorporating quantitative data regarding Automotive Command. probability of occurrence of UOBsand probability of failure for each UOB.Reliability values for each of the References ICOMs in IDEF0 models may be used to obtain Ang, C. L. and R. g. L. Gay (1993). "IDEF0 modeling probability of failure data for the correspondingUOB of an for project risk assessment," Computersin Industry, IDEF3model. Diverging (or fan-ou0 junction boxes in 22, pp. 31-45. IDEF3may also contain probability of occurrence values Chittister, C. and Y. Y. Haimes(1993). "Risk associated for each UOBthat immediately follows the junction box. with software development: A holistic frameworkfor assessment and management,"IEEE Transactions on 5. Risk Assessment in IDEF Models Systems, Man, and Cybernetics, 23(3), pp. 710-723. Evaluating risk in IDEFmodels requires identifying Kaplan, S. and B. J. Garrick (1981). "Onthe quantitative scenarios, determining the likelihood of these scenarios, definition of risk," Risk Analysis, 1(1), 1981. and estimating the consequences.This set of objectives is Lowrance, W. W. (1976). Of Acceptable Risk, William often called the "risk triplet." The first two componentsof Kaufmann, Los Altos, CA. the risk triplet are related to the techniquesfor determining Mayer, R. J., T. P. Cullinane, P. S. deWitte, W. B. system reliability discussed in Section 4. Path sets maybe Knappenberger,B. Perakath and M. S. Wells (1992). used to identify scenarios in the system. Each scenario is Information Integration for ConcurrentEngineering qualitatively evaluated to identify possible problemsthat (lICE) IDEF3Process Description Capture Method mayresult. The likelihood of each scenario is determined Report, ArmstrongLaboratory, Wright-Patterson by the probability of occurrence of the UOBsin the path AFB, Ohio 45433, AL-TR-1992-0057. set. Modarres, M. (1993). What Every Engineer Should Know Determiningthe consequences of the scenario requires About Reliability and Risk Analysis, Marcel Dekker, estimating the impact on a set of performance measures. Inc., NewYork, NY. Typical performance measures in manufacturing systems Morgan, M. G. and M. Hem’ion (1990). Uncertainty: A are in-process inventory levels, lead times, set-up times, Guide to Dealing with Uncertainty in Quantitative scrap, rework, and resource utilization. Whenpossible, Risk and Policy Analysis, CambridgePress, consequences related to different performance measures Cambridge, U.K. should be converted to a single unit of measurement,such Ross, D. T. (1985). "Applications and extensions as dollar loss. SADT," Computer, April, pp. 25-34. The total expected risk in the system is calculated by Shooman,M. L. (1990). Probabilistic Reliability: An (12), wherePi is the probability of scenario i, and Ci is the Engineering Approach, 2nd Ed., Kreiger, Melbourne, consequence,or cost, of the scenario. FL. U. S. Air Force (1981). Integrated ComputerAided n Manufacturing (ICAM)Architecture Part I1, Volume R=~(P, xC,) (12) W-Functional Modeling Manual(IDEFO), Air Force iffil Materials Laboratory, Wright-Patterson AFB, Ohio 45433, AFWAL-tr-81-4023.

93