NASANASA ActivitiesActivities inin RiskRisk AssessmentAssessment

NASANASA ProjectProject ManagementManagement ConferenceConference

MarchMarch 30-31,30-31, 20042004

Michael G. Stamatelatos, Ph.D.,Director Safety and Assurance Requirements Division Office of Safety and Mission Assurance NASA Headquarters

1 NASA is a Pioneer and a Leader in Space; Therefore Its Business Is Inherently Risky

International Space Station ‹ Safe assembly and operation

Space Transportation ‹ ‹ Orbital Space Plane

2 Our Goal z Improve risk awareness in the Agency ¾ Conduct PRA training for line and project managers and for personnel z Develop a corps of in-house PRA experts z Transition PRA from a curiosity object to baseline method for integrated system safety, reliability and risk assessment z Adopt organization-wide risk informed culture ¾ PRA to become a way of life for safety and technical performance improvement and for cost reduction ¾ Implement risk-informed management process

3 Probabilistic Risk Assessment (PRA) Answers Three Basic Questions

Risk is a set of triplets that answer the questions:

1) What can go wrong? (accident scenarios)

2) How likely is it? (probabilities)

3) What are the consequences? (adverse effects)

Kaplan & Garrick, Risk Analysis, 1981

Event Event Sequence Sequence Frequency Modeling Evaluation

2. How frequently does it happen? (Scenario frequency quantification)

Event Initiating Sequence Risk Event PRA Insights Logic Integration Selection Development

1. What can go wrong? Risk statement Decision Support (Definition of scenarios) 3. What are the consequences? (Scenario consequence quantification)

Consequence Modeling PRA is generally used for low-probability and high- consequence events 4 Relationship Between and Probabilistic Risk Assessment (PRA)

Continuous Risk Management Method Technique Application

QualitativeQualitative FMEA. RiskRisk FMEA. MLD, AssessmenAssessmentt MLD, ESD,ESD, Technical ETA, Technical ETA, RiskRisk FTA,FTA, RBDRBD and/or ProbabilisticProbabilistic and/or Risk Risk Program Assessment ActuActuarial/arial/ Program Assessment Risk StatisticalStatistical Risk AnalyAnalysesses

Legend: Decision Decision FMEA - Failure Modes & Effects Analysis Analysis Analysis MLD - Master Logic Diagram ManagementManagement ESD - Event Sequence Diagram SySystemstem ETA - Event Tree Analysis FTA - Fault Tree Analysis RBD - Reliability Block Diagram

5 NASA Risk Management and Assessment Requirements

• NPG 7120.5A, NASA Program and Project Management Processes and Requirements ¾ The program or project manager shall apply risk management principles as a decision-making tool which enables programmatic and technical success. ¾ Program and project decisions shall be made on the basis of an orderly risk management effort. ¾ Risk management includes identification, assessment, mitigation, and disposition of risk throughout the PAPAC (Provide Products And Capabilities) process. • NPG 8000.4, Risk Management Procedures and Guidelines ¾ Provides additional information for applying risk management as required by NPG 7120.5A. • NPG 8705.x (draft) PRA Application Procedures and Guidelines ¾ Provides guidelines on how to apply PRA to NASA’s diversified programs and projects

6 How Does PRA Help Safety?

Provides a basis for risk reduction through: 1. Accident/Mishap Prevention (best) 2. Accident/Mishap Consequence Mitigation

Adverse Event in a Sequence

Prevention Mitigation

7 The Concept of an Accident Scenario

PIVOTAL EVENTS

Pivotal Pivotal End State IE Event 1 Event n

Detrimental The perturbation ‰ Mitigative consequence of ‰ Aggrevative interest (Quantity of intereset to decision-maker)

Risk Scenario is a string of events that (if they occur) will lead to an undesired end state.

8 Exact vs. Uncertain Probabilities

Uncertainty distribution of probability values

Exactly known probability ability ability b value b Pro Pro

A narrower range means a more precise knowledge of the distribution, or less uncertainty 9 Quantification of Uncertainty

Probability density function, e.g., probability of LOCV Uncertainty Distribution: P(x) is the probability (or xth percentile ρ(x) confidence) that the result value is x P(x) median is the 50th percentile is area under 5% x 50% 95% curve between Median 0 and x Uncertainty Range: Uncertainty range 5th percentile 95th percentile (spread) from the 5th to the 95th percentile Uncertainty (confidence) range

10 Event- and Fault-Tree Scenario Modeling

No damage to No damage to Hydrazine leaks Leak detected Leak isolated flight critical scientific End state Leak not detected avionics equipment IE LD LI A S

OK

Loss of science

Loss of Spacecraft Presure Presure Controller fails transducer 1 transducer 2 OK fails fails

Loss of science CN P1 P2

common cause Loss of failure of P. Spacecraft transducers OK PP distribution Loss of science Loss of Spacecraft Fault tree Event tree

11 PRA Methodology Synopsis

Inputs to Decision Making Process Master Logic Diagram (Hierarchical Logic) Event Sequence Diagram (Logic)

IE End State: ES1 A B End State: OK

EnEndd S Stattate:e: ES2 ES2 End State: ES2

C D E End State: ES1

End State: ES2 LING DE C MO Event Tree (Inductive Logic) Fault Tree (Logic) One to Many Mapping of an ET-defined Scenario I

End Not A IE A B C D E NEW STRUCTURE State LOG

Logic Gate 1: OK Basic Event ‰ Internal initiating events One of these events ‰ External initiating events 2: ES1 ‰ Hardware components AND 3: ES2 ‰ Human error 4: ES2 ‰ Software error one or more ‰ Common cause of these 5: ES2 elementary ‰ Environmental conditions events 6: ES2 ‰ Other Link to another fault tree

Probabilistic Treatment of Basic Events Model Integration and Quantification of Risk Scenarios Risk Results and Insights 30 50 60 25 50 40 End State: ES2 Integration and quantification of 20 40 30 100 logic structures (ETs and FTs)

15 ‰ Displaying the results in tabular and graphical forms IC 30 and propagation of epistemic 10 20 80 20 uncertainties to obtain ‰ Ranking of risk scenarios 10 5 10 End State: ES1 60 ST 0.02 0.04 0.06 0.08 ‰ Ranking of individual events (e.g., hardware failure, I 0.01 0.02 0.03 0.04 0.02 0.04 0.06 0.08 ‰ minimal cutsets (risk 40 human errors, etc.) scenarios in terms of Examples (from left to right): 20 basic events) ‰ Insights into how various systems interact Probability that the hardware x fails when needed ‰ likelihood of risk ‰ Tabulation of all the assumptions ABIL Probability that the crew fail to perform a task 0.01 0. 02 0.03 0.04 0.0 5 scenarios

‰ Identification of key parameters that greatly B Probability that there would be a windy condition at the time of landing ‰ uncertainty in the likelihood estimates influence the results The uncertainty in occurrence of an event is ‰ Presenting results of sensitivity studies

characterized by a probability distribution PRO

12 What Decision Types Can PRA Support? z Safety improvement in design, operation, maintenance and upgrade (throughout life cycle); z Mission success enhancement; z Performance improvement; and z Cost reduction for design, operation and maintenance

For all these areas of application, PRA can help: z Identify leading risk contributors and their relative values z Indicate priorities for resource allocation z Optimize results for given resource availability

13 Areas of PRA Application at NASA

z In Design and Conceptual Design (e.g., Crew Exploration Vehicle, Mars missions, Project Prometheus) z For Upgrades (Space Shuttle) z For Development/construction/assembly (e.g., International Space Station) z When there are requirements for Safety Compliance (e.g., nuclear missions like Mars ’03; Project Prometheus, Mars Sample Return)

14 NASA Procedural Requirement NPR 8705 (Draft)

CONSEQUENCE NASA PROGRAM/PROJECT CRITERIA / SPECIFICS PRA SCOPE CATEGORY (Classes and/or Examples) Planetary Protection Program Mars Sample Return Missions F Requirement White House Approval Nuclear Payloads Public Safety F (PD/NSC-25) (e.g., Cassini, Ulysses, Mars 2003) Human Safety and Space Missions with Flight Launch Vehicles F Health Termination Systems International Space Station F Human Space Flight Space Shuttle F Orbital Space Plane/Space Launch Initiative F High Strategic Importance Mars Program F Launch Window High Schedule Criticality F (e.g., planetary missions) Mission Success Earth Science Missions L/S (for non-human (e.g., EOS, QUICKSCAT) rated missions) Space Science Missions All Other Missions L/S (e.g., SIM, HESSI) Technology Demonstration/Validation (e.g., L/S EO-1, Deep Space 1)

F = Full scope; L/S = Limited or Simplified PRA Scope Legend: F = Full scope; L = Limited scope; S = Simplified PRA

15 NASA Special PRA Methodology Needs z Broad range of programs: Conceptual non-human rated science projects; Multi-stage design and construction of the International Space Station; Upgrades of the Space Shuttle z Risk initiators that vary drastically with type of program z Unique design and operating environments (e.g., microgravity effects on equipment and humans) z Multi-phase approach in some scenario developments z Unique external events (e.g., micro-meteoroids and orbital debris) z Unique types of adverse consequences (e.g., fatigue and illness in space) and associated databases z Different quantitative methods for human reliability (e.g., astronauts vs. other operating personnel) z Quantitative methods for software reliability

16 Space Shuttle Probabilistic Risk Assessment

17 STS Nominal Mission Profile

ASCENT completes ORBIT completes

APU Shutdown TAL TIG-5

t ~ 150kft)

SSME start ENTRY completes APU start

18 Current Shuttle PRA Results for LOCV (provisional)

1/10025 median = 1 in 123 y t i s n e on i D mean = 1 in 76 t y t i l i 5th percentile = 1 in 617 Func ab b o

r 95th percentile = 1 in 23 P

1/250 0.1/00500 1/0.10001 0.1/0250 1/0.3303 1/0.2504 1/0.2005 truncated Number of Missions

19 Summary of Shuttle PRA Historical Results

1993 PRA 1988 PRA update the First PRA 2003 PRA 1998 PRA Galileo study conducted on Integrated Unpublished 1995 PRA results to reflect the Space PRA with all analysis using First major the current Shuttle for 0.1 elements. QRAS. No study of (April 1993) test 1996 PRA ascent only, 18 Orbiter Integration of 1997 PRA Shuttle PRA. and operational og scale) Bayesian up l systems. elements. First use of for Galileo date of the Analysis by experience Incl. MMOD Limited to 3 QRAS tool. Mission 1995 model by SAIC with base of the Orbiter Update of adding 17 input from Shuttle. systems and 1995 PRA with successful prime the propulsion new look at 1/78 flights contractors

values ( elements APU 0.01 1/90 1/2451/123 1/145

Median 1/147 1/161 1/2451/254 Probability of Loss of Crew and Vehicle Probability of Loss 0.001 SPRAT PRA Shuttle PRA Shuttle PRA Shuttle PRA Shuttle PRA Phase 1 - Galileo 1988 2003 1998 1997 1996 1995 1993 (Ascent (Ascent only) only) Ascent, Ascent and entry only entry, and orbital debris 20 Annual Voluntary Risks in Some Sports - Comparable in Magnitude to Shuttle Risk

z Professional stunting 1/100 z Dedicated mountain climbing 1/167 z Air show/air racing and acrobatics 1/200 z Amateur flying in home-built aircraft 1/333 z Experienced whitewater boating 1/370 z Sport parachuting 1/500

Source: R. Wilson and E. Crouch, Risk-Benefit Analysis, Harvard University Press, 2001

21 International Space Station (ISS) PRA

• 1999 -- The NASA Advisory Council recommended, the NASA Administrator concurred, and the ISS Program began a PRA. − The modeling will be QRAS- compatible. − First portion of PRA (through Flight 7A) - delivered in Dec. 2000; Second portion (through Flight 12A) delivered in July 2001.

22 23 Important ISS PRA Findings

‹MMOD: lead contributor to loss of station (LOS) risk

‹Illness in space: lead contributor to loss of crew (LOC) risk

24 Approach to PRA for NASA Top-Level Designs (e.g., Crew Exploration Vehicle)

Strawman Mission Level 1 Architecture Design Activities Requirements Cost and Design Risk an Safety Schedule Concepts Goals and Risk Requirements Assessments

Definition of Safety, Operational Reliability, Mission Phases PRA Schedule, Requirements and Cost Trade-offs First-Order Risk Goal Performance Allocation for Requirements Each Phase Design Engineering

System Reliability Allocation Past Related PRA Efforts; Mature e.g., Shuttle Design Top Level PRA

25 Advanced PRA Methods or Tools

z QRAS (Quantitative Risk Assessment System) – a state of the art integrated PRA computer program z Galileo/ASSAP – Dynamic fault tree program z Software reliability methodology for use in PRA z External event methodology for micro- meteoroid and orbital debris (MMOD) risk into the overall risk assessment

26 QRAS 1.7 Is Being Commercialized

Space Shuttle time in seconds R(i,x, t) = R0(i,x, t) t1 t2 t3 t4 SSME SRB ORBITER ET -6 0 128 510 where: SSME R is reliability of manifold weld HPOTP ______LPFTP ______LPFTP HPFTP HEX MCC i is physical state HEX ______x is vector of physical variables MCC ______t is time SRB TVC ______Manif Engineering Modes NOZZLE ______Bolt Weld Seal to determine initiating event probability Fail Fail Fail Mission Timeline Element/Subsystem Hierarchy Event Sequence Diagram Initiating Event Event Tree (quantification) Manif Is crack Is repair (furure unhancement: dynamic) Weld Yes Successful Manifold Success Prob. Failure detectable? 100% effect.? op. Weld S = 0.9 S = ... S = ... No Failure LOV Prob. F = ... Success Prob. Is crack small F = ... S = ... enoung to survive Yes Successful LOV Prob. 1 mission? op. F = ... Mission Fail. Prob. F = 0.1 No S = 0.96 LOV Prob. F = 0.04 Loss of flow HPFTP cavitates No Successful to LPFTP LOX rich op. op.

Fault Tree Risk Aggregation Probability Distribution (use to quantify pivotal events) for initiating event What if? Tool Box Probability of Results 1. Remove old subsystem and Bayesian Manifold Weld replace with design. Updating Failure 2. Change failure probabilities for initiating events. Mathematica 3. Eliminate failure modes. 4. Vary parameters of engineering Others (future) models. Pr 5% tile 95% tile median

27 In Summary, We Plan to z Continue to improve risk awareness ¾ Conduct PRA training for line and project managers and for personnel z Continue to develop a corps of in-house PRA experts z Transition PRA to baseline method for safety assessment z Integrate risk assessment with system safety and reliability assessment z Adopt organization-wide risk informed culture ¾ PRA to become a way of life for safety and technical performance improvement and for cost reduction ¾ Implement risk-informed management process

28