System Safety Engineering an Overview for Engineers And
Total Page:16
File Type:pdf, Size:1020Kb
A-P-T Research, Inc. SSyysstteemmSSaaffeettyyEEnnggiinneeeerriinngg AAnnOOvveerrvviieewwffoorr EEnnggiinneeeerrssaannddMMaannaaggeerrss P. L. Clemens October 2005 Topics… • What is System Safety Engineering? • When should System Safety be used? • How is System Safety done? • Who should perform System Safety analyses? • What does System Safety Cost? • Why do System Safety? M-05-02600-2 What’s a SYSTEM? (and a few other basics)… • SYSTEM: an entity, at any level of complexity, intended to carry out a function, e.g.: • A doorstop • An operating procedure • An aircraft carrier • An implantable insulin pump • Systems pose HAZARDS. Hazards threaten harm to ASSETS. • ASSETS are RESOURCES having value to be protected, e.g.: • Personnel • The product • The environment • Equipment • Productivity • Reputation • RISK, is an attribute of a hazard-asset combination — a measure of the degree of harm that is posed. M-05-02600-3 What’s System Safety? It has two chief aspects… • A DOCTRINE of Management Practice: • Hazards (threats to Assets) abound and must be identified. • Risk is an attribute of a hazard that expresses the degree 1 of the threat posed to an asset — risks must be assessed. • A non-zero Risk Tolerance Limit must be set — a management function. • Risks of Hazards exceeding the Tolerance Limit must be suppressed (or accepted by management). • A Battery of ANALYTICAL METHODS to support practice of the DOCTRINE — The analytical methods are divisible into: 2 • TYPES, addressing What / When / Where the analysis is done • TECHNIQUES, addressing How the analysis is done M-05-02600-4 The Types & Techniques of Analysis… TECHNIQUES (How)… TYPES (What / When / Where)… • Preliminary Hazard Analysis • Preliminary Hazard Analysis (PHA*)1/2/3 (PHA*) • Failure Modes and Effects • System Hazard Analysis Analysis (FMEA)1/3 • Subsystem Hazard Analysis • Fault Tree Analysis2/4 • Operating and Support Hazard • Event Tree Analysis3/4 Analysis • Cause-Consequence • Occupational Health Hazard Analysis3/4 Analysis • Hazard & Operability Study • Software Hazard Analysis (HAZOP)1/3 • many others… • Job Hazard Analysis But, 1/3 The TYPES and TECHNIQUES are to… (JHA/JSA) WHAT • Digraph Analysis1/3 • IDENTIFY HAZARDS, and to… IS • ASSESS THEIR RISKS. RISK? • many others… 1 Hazard Inventory 2 Top Down 3 Bottom Up 4 Logic Tree M-05-02600-5 What is RISK? RISK: An expression of the combined SEVERITY and PROBABILITY of HARM to an ASSET. SYSTEM ASSETS* PROGRAMMATIC may be: ASSETS* may be: • Personnel • Cost • Equipment • Schedule • Productivity • Mission • Product • Performance • Environment • Constructability • …others • …others RISK is an attribute of a THREATS to ASSETS HAZARD-ASSET are called combination! HAZARDS. M-05-02600-6 Hazards are TTHHRREEAATTSS to AASSSSEETTSS HAZARDS MUST BE IDENTIFIED! …or System Safety and Risk Management cannot be practiced! HAZARDS… are best described as terse Loss Scenarios, each expressing SOURCE MECHANISM OUTCOME Thusly: “Faulty control logic producing yaw overdrive and model damage.” NOT: “Pranged wind tunnel model.” OR “Occupancy of an unventilated confined space leading to death from asphyxia.” NOT: “Running out of air.” M-05-02600-7 The Risk Plane… SEVERITY and PROBABILITY, Cataclysmic the two variables that constitute risk, R = K3 > K2 g n si define a a e k RISK PLANE. r is c R In Likely R = K2> K1 R = P x S = K1 SEVERITY RISK Iso-risk is contours CONSTANT along any ISO-RISK CONTOUR. PROBABILITY is a function of EXPOSURE 0 INTERVAL. NEVER PROBABILITY M-05-02600-8 Using ISO-Risk Contours… ACCEPTANCE: Risk Tolerance Boundaries follow iso-risk contours. B NOT ACCEPTABLE SEVERITY A PROVISIONALLY ACCEPTABLE Further Risk Reduction Note that risk ACCEPTABLE Desirable. at A equals risk at B. (de minimis) 0 PROBABILITY 0 M-05-02600-9 The Risk Plane Becomes a Matrix… Segmenting the Risk Plane into tractable cells produces a Matrix to enable using subjective judgment. SEVERITY F E D C B A PROBABILITY I Matrix cell zoning approximates II the continuous, iso-risk contours in the Risk Plane. Zones in the Matrix SEVERITY III define Risk Tolerance Boundaries. IV Jeopardy PROBABILITY M-05-02600-10 A Typical Risk Assessment Matrix*… A guide for applying subjective judgment… Severity of Consequences Probability of Mishap** Category / Personnel Equipment Down Descriptive Injury / Loss F E D C B A Word Illness $ Time Impossible Improbable Remote Occasional Probable Frequent I >4 Catastrophi Death >1M c Mo 1 Severe II 250K 2Wks Injury or to to Critical Severe 2 Illness 1M 4Mo Minor III 1k 1 Day Injury or to to Marginal Minor 3 Illness 250K 2Wks IV No Injury <1K <1 Negligible or Illness Day *Adapted from MIL-STD-882D **Life Cycle: Personnel: 30 yrs / Others: Project Life Risk Code/ Imperative to Operation requires written, Operation 1 suppress risk 2 time-limited waiver, endorsed 3 permissible Action to lower levels by management M-05-02600-11 The “Flow” of System Safety Practice… Program Initiation • Documenting the • Documenting the Eight key Performance Steps SSyysstteemmSSaaffeettyy Approach are distributed through Approach Five Major Functional •• TTaasskkss Elements of the •• SScchheedduullee •• TTeeaamm System Safety Program •• TToooollss Hazard Identification ••RReeccooggnniizziinngg&& Risk Assessment DDooccuummeennttiinngg UUnnddeerrssttaannddiinngg ••AAsssseessssiinnggMMiisshhaappRRiisskk HHaazzaarrddss HHaazzaarrddss Understanding MMaattuurriinngg Continuous Risk Drivers DDeessiiggnn Hazard Iterative ll Tracking LLiiffee CCyyccllee Risk Reduction MMoonniittoorriinngg Continuous Changes Risk Reduction Risk Acceptance ••IIddeennttiiffyyiinnggMMiittiiggaattiioonnMMeeaassuurreess ••RReessiidduuaallRRiisskk UUnnddeerrssttaannddiinngg Review & RRiisskkOOppttiioonnss ••RReedduucciinnggRRiisskkttooAAcccceeppttaabbllee Review & LLeevveell AAcccceeppttaannccee ••VVeerriiffyyiinnggRRiisskkRReedduuccttiioonn M-05-02600-12 Major System Safety Cross-Link Disciplines… • Programmatic Risk Management • The “…ilities” ISN’T RELIABILITY PROGRAMMATIC • Reliability RISK MANAGEMENT ENGINEERING treats its own • Availability ENOUGH? special classes • Maintainability USUALLY NOT! of hazards, posing risk to, • Survivability • Reliability e.g.: • Configuration Management explores the • Cost Probability of • Schedule • Procedures Preparation Success, alone. • Performance • …others… • Constructability • System Safety • …others explores the Probability of Failure AND its Severity Penalty. M-05-02600-13 Topics… • What is System Safety Engineering? • When should System Safety be used? • How is System Safety done? • Who should perform System Safety analyses? • What does System Safety Cost? • Why do System Safety? M-05-02600-14 System* Safety Application throughout Life Cycle… DON’T OVERLOOK: • Operational / Mission Phases • Maintenance • Decommissioning A typical approach: • Etc… / / N N E IO N W IN N T IO O T IO N T D A T D G U T L G P E C A E IN O A IA G I E IL N I L K T R R T IN S C A R L A S E I N E T IG B A TE P IN D N E S A ST SH O N O D E F N LA C D I P LIFE CYCLE ry to n e ) v IL In C rd / a L z H is is a (P s s H ly s ly a a a n , n A d … A e d n d n re r a e then… T s) a ) t w lt si z A a o u y r a H ic D a l o p H P d - F a / U ( n p ( n d - s i To A n m si a to ly ) t a A REVISIT the ANALYSIS when there is… o n E B A M • Modification of: (F – System Design / Architecture – Applied Stresses (service or environmental) *NOTE – Maintenance Protocol • A “Near Miss” “System” includes hardware, software, procedures, • A Loss Event (to support autopsy) training / certification, maintenance protocol, etc. — the GLOBAL System. M-05-02600-15 Comparing Two Work Models for Design-Build Efforts*… Review Reliability / Maintainability / Safety / Constructability / other “ilities” Serial “Traditional” Design Approach 30% 60% 90% Modify / Retrofit / Recover Time Effort / Calendar Saved Concurrent “Enlightened” Engineering Approach 30% 60% 90% Ongoing Analysis / Review / Crossfeeding Results • Continuous, iterative feedback of analysis results into design CONCURRENT • Earlier accommodation to findings DESIGN • Manhours and calendar time conserved RESULTS: • Fewer “surprises” / performance-threatening retrofits • Fewer awkward compromises — more coherent design *Journal of the American Society of Safety Engineers; November, 1999 M-05-02600-16 What systems benefit best by System Safety application? • Use System Safety if the system… • is complex — i.e., interrelationships among elements is not readily apparent, and/or • uses untried or unfamiliar technology, and/or • contains one or more intense energy sources — i.e., energy level and/or quantity is high, and/or • has reputation-threatening potential, and/or • falls under the purview of a mandating regulation (e.g., 29 CFR 1910.119) M-05-02600-17 Why / When use more specialized analytical techniques? Top-down Analysis (e.g., Fault Tree Analysis) and / or Bottom-up Analysis (e.g., Failure Modes and Effects Analysis) • when SYSTEM COMPLEXITY exceeds PHA capability, and/or… • to evaluate risk more precisely in support of RISK ACCEPTANCE DECISIONS, and/or… • to support DESIGN DECISIONS on matters of component selection/system architecture, etc. M-05-02600-18 Design Decisions — an example… Risk too high? Then F Go Redundant! Subsystem A B C 1 Level F Redundancy But WHERE to redundify? F1 F2 • System? • Subsystem? A B C A´ B´ C´ • Assembly? • Subassembly? • Component? Component 2 Level • Piece Part? F Redundancy Variations in system architecture, using the same components, can produce profound differences in A A´ B B´