<<

Engineering a Safer and More Secure World: Part 2 Nancy Leveson

MIT © Copyright Nancy Leveson, April 2019 Syllabus

DAY 2 • Introduction to STPA (System Theoretic Process Analysis) • Evaluations (Does this new approach work?) • The way forward

2 STPA (System Theoretic Process Analysis) • A new hazard analysis technique based on STAMP causality model • Can analyze very complex systems. – ”Unknown unknowns” usually only found during ops can be identified early in development process • Can be started early in concept analysis • Includes software and operators in the analysis • Generates requirements for safety, security, or any emergent system property • Easily integrated into system engineering process and model based system engineering. – Models are functional models rather than simply physical or logical models. STPA

4 STPA Process Defining the Purpose of the Analysis Establish Analysis Goals (Stakeholders)

• Identify losses to be considered L1. Death or serious injury to passengers or people in the area of the aircraft L2. “Unacceptable” damage to the aircraft or objects outside the aircraft L3: Financial losses resulting from delayed operations L4: Reduced profit due to damage to aircraft or airline reputation • Identify System-Level Hazards H1: Insufficient thrust to maintain controlled flight H2: Loss of integrity H3: Controlled flight into terrain H4: An aircraft on the ground comes too close to moving or stationary objects or inadvertently leaves the taxiway H5: etc. Deceleration Hazards (H4)

H4-1: Inadequate aircraft deceleration upon , rejected , or taxiing H4-2: Deceleration after the V1 point during takeoff H4-3: Aircraft motion when the aircraft is parked H4-4: Unintentional aircraft directional control (differential braking) H4-5: Aircraft maneuvers out of safe regions (taxiways, runways, terminal gates, ramps, etc.) H4-6: Main gear wheel rotation is not stopped when (continues after) the is retracted High-Level (System) Requirements/Constraints

SC1: Forward motion must be retarded within TBD seconds of a braking command upon landing, rejected takeoff, or taxiing (H4-1). SC2: The aircraft must not decelerate after V1 (H4-2). SC3: Uncommanded movement must not occur when the aircraft is parked (H4-3). SC4: Differential braking must not lead to loss of or unintended aircraft directional control (H4-4) SC5: Aircraft must not unintentionally maneuver out of safe regions (taxiways, runways, terminal gates and ramps, etc.) (H4-5) SC6: Main gear rotation must stop when the gear is retracted (H4-6)

STPA analysis will refine these into detailed requirements/constraints • On system • On components Modeling the Control Structure

Examples of Requirements/Constraints Generated on the Interaction Between Deceleration Components

• SC-BS-1: Spoilers must deploy when the wheel brakes are activated manually or automatically above TBD speed. • SC-BS-2: Wheel brakes must activate upon retraction of landing gear. • SC-BS-3: Activation of ground spoilers must activate armed automatic braking (autobrake) system. • SC-BS-4: Automatic braking system must not activate wheel brakes with forward thrust applied. • SC-BS-5: Automatic system must retract the spoilers when forward thrust is applied. Wheel Braking System Control Structure Identifying Unsafe Control Actions (UCAs) Unsafe Control Actions

Four types of unsafe control actions Controller 1) Control commands required for safety Control Process are not given Algorithm Model 2) Unsafe commands are given 3) Potentially safe commands but given too Control Feedback early, too late, or in wrong order Actions 4) Control action stops too soon or applied too long (continuous control)

Controlled Process Analysis: 1. Identify potential unsafe control actions 2. Identify why they might be given 3. If safe ones provided, then why not followed?

16 Unsafe Control Actions for Crew (Context Table)

Control Action Not providing Providing Too soon, too Stopped too By Flight Crew: causes hazard causes hazard late, out of soon, applied sequence too long CREW.1 CREW.1a1 CREW.1b1 CREW.1c1 CREW.1d1 Manual braking Crew does not Manual braking Manual braking Manual braking via brake pedals provide manual provided with applied before command is braking during insufficient touchdown stopped before landing, RTO, or pedal pressure, causes wheel safe taxi speed taxiing when resulting lockup, loss of (TBD) is Autobrake is inadequate control, tire reached, not providing deceleration burst [H4-1, H4- resulting in braking (or during landing 5] overspeed or insufficient [H4-1, H4-5] overshoot [H4- braking), 1, H4-5] leading to overshoot [H4- 1, H4-5] Unsafe Control Actions for Crew (Context Table)

Control Action Not providing Providing Too soon, too Stopped too By Flight Crew: causes hazard causes hazard late, out of soon, applied sequence too long CREW.1 CREW.1a1 CREW.1b1 CREW.1c1 CREW.1d1 Manual braking Crew does not Manual braking Manual braking Manual braking via brake pedals provide manual provided with applied before command is braking during insufficient touchdown stopped before landing, RTO, pedal pressure, causes wheel safe taxi speed or taxiing when resulting lockup, loss of (TBD) is Autobrake is inadequate control, tire reached, not providing deceleration burst [H4-1, resulting in braking (or during landing H4-5] overspeed or insufficient [H4-1, H4-5] overshoot [H4- braking), 1, H4-5] leading to overshoot [H4- 1, H4-5]

What is another UCA for “providing”? Unsafe Control Actions for Crew (Context Table)

Control Action Not providing Providing Too soon, too Stopped too By Flight Crew: causes hazard causes hazard late, out of soon, applied sequence too long CREW.1 CREW.1a1 CREW.1b1 CREW.1c1 CREW.1d1 Manual braking Crew does not Manual braking Manual braking Manual braking via brake pedals provide manual provided with applied before command is braking during insufficient touchdown stopped before landing, RTO, or pedal pressure, causes wheel safe taxi speed taxiing when resulting lockup, loss of (TBD) is Autobrake is inadequate control, tire reached, not providing deceleration burst [H4-1, H4- resulting in braking (or during landing 5] overspeed or insufficient [H4-1, H4-5] overshoot [H4- braking), 1, H4-5] leading to Manual braking overshoot [H4- after V1 point 1, H4-5] resulting in … Unsafe Control Actions by Autobraking

Control Action Not providing Providing Too soon, too Stopped too by BSCU causes hazard causes hazard late, out of soon, applied sequence too long

BSCU.1 BSCU.1a1 BSCU.1b1 BSCU.1c1 BSCU.1d1 Brake command Brake command Braking Braking Brake command not provided commanded commanded stops during during RTO (to excessively before landing roll V1), resulting in during landing touchdown, before taxi inability to stop roll, resulting in resulting in tire speed attained, within available rapid burst, loss of causing reduced length deceleration, control, injury, deceleration [H4-1, H4-5] loss of control, other damage [H4-1, H4-5] occupant injury [H4-1, H4-5] [H4-1, H4-5] Unsafe Control Actions by Autobraking

Control Action Not providing Providing Too soon, too Stopped too by BSCU causes hazard causes hazard late, out of soon, applied sequence too long

BSCU.1 BSCU.1a1 BSCU.1b1 BSCU.1c1 BSCU.1d1 Brake command Brake Braking Braking Brake command not commanded commanded command stops provided excessively before during landing during RTO (to during landing touchdown, roll before taxi V1), resulting in roll, resulting in resulting in tire speed attained, inability to stop rapid burst, loss of causing within available deceleration, control, injury, reduced runway length loss of control, other damage deceleration [H4-1, H4-5] occupant injury [H4-1, H4-5] [H4-1, H4-5] [H4-1, H4-5]

What is another UCA for “too late”? Unsafe Control Actions by Autobraking Control Action Not providing Providing Too soon, too Stopped too by BSCU causes hazard causes hazard late, out of soon, applied sequence too long BSCU.1 BSCU.1a1 BSCU.1b1 BSCU.1c1 BSCU.1d1 Brake command Brake command Braking Braking Brake command not provided commanded commanded stops during during RTO (to excessively before landing roll V1), resulting in during landing touchdown, before taxi inability to stop roll, resulting in resulting in tire speed attained, within available rapid burst, loss of causing reduced runway length deceleration, control, injury, deceleration [H4-1, H4-5] loss of control, other damage [H4-1, H4-5] occupant injury [H4-1, H4-5] [H4-1, H4-5] BSCU.1.c.2: Braking commanded too late after touchdown resulting in … STPA-Generated Safety Requirements/Constraints

Unsafe Control Description Rationale Action FC-R1 Crew must not provide manual Could cause wheel lockup, braking before touchdown [CREW.1c1] loss of control, or tire burst FC-R2 Crew must not stop manual braking Could result in overspeed or more than TBD seconds before safe runway overshoot taxi speed reached [CREW.1d1] FC-R3 The crew must not power off the Autobraking will be BSCU during autobraking [CREW.4b1] disarmed BSCU-R1 A brake command must always be Could result in not stopping provided during RTO [BSCU.1a1] within the available runway length BSCU-R2 Braking must never be commanded Could result in tire burst, before touchdown [BSCU.1c1] loss of control, injury, or other damage BSCU-R3 Wheels must be locked after takeoff Could result in reduced and before landing gear retraction handling margins from [BSCU.1a4] wheel rotation in flight Identifying Loss Scenarios UNSAFE CONTROL ACTION – CREW.1a: Crew does not provide manual braking when there is no Autobraking and braking is necessary to prevent H4-1 and H4-5. Scenario 1: Crew incorrectly believes that the Autobrake is armed and expect the Autobrake to engage (process model flaw) Reasons that their process model could be flawed include: • The crew previously armed Autobrake and does not know it later became unavailable AND/OR • The feedback received is adequate when the BSCU Hydraulic Controller detects a fault. The crew would be notified of a generic BSCU fault but they are not notified that Autobrake is still armed (even though Autobraking is no longer available)

25 AND/OR • The crew is notified that the Autobrake controller is still armed and ready, because the Autobrake controller does not detect when the BSCU has detected a fault. When the BSCU detects a fault, it closes the green shut-off valve (making Autobrake commands ineffective), but the Autobrake system itself does not notify the crew. • The crew cannot process feedback due to multiple messages, conflicting messages, alarm fatigue, etc.

Possible new requirements for S1: The BSCU hydraulic controller must provide feedback to the Autobrake when it is faulted and the Autobrake must disengage (and provide feedback to crew). Other requirements may be generated from a human factors analysis of the ability of the crew to process the feedback under various worst-case conditions.

26 Generate Potential Causal Scenarios

Crew: Crew provides manual braking after V1, leading to … Scenario 1: Crew thinks …

Possible Requirement for S1: … BSCU.1a2: Brake command not provided during landing roll, resulting in insufficient deceleration and potential overshoot Scenario 1: Autobrake believes the desired deceleration rate has already been achieved or exceeded (incorrect process model). The reasons Autobrake may have this process model flaw include: – If wheel speed feedback influences the deceleration rate determined by the Autobrake controller, inadequate wheel speed feedback may cause this scenario. Rapid pulses in the feedback (e.g. wet runway, brakes pulsed by anti-skid) could make the actual aircraft speed difficult to detect and an incorrect aircraft speed might be assumed. – Inadequate external speed/deceleration feedback could explain the incorrect Autobrake process model (e.g. inertial reference drift, calibration issues, sensor failure, etc.) – [Security related scenarios, e.g., intruder changes process model] Possible Requirement for S1: Provide additional feedback to Autobrake to detect aircraft deceleration rate in the event of wheel slipping (e.g. fusion of multiple sensors) Generate Potential Causal Scenarios

Crew 1a1: BSCU provides autobraking too late after touchdown resulting in overshoot, … [H4-1, H4-5] Scenario 1: Autobrake process model …

Possible Requirement for S1: … Finally …

• Need to consider how hazard could occur even if controller operates correctly, but control action not executed. • This is essentially the case of failed components, which is found by standard hazard analysis methods • Example: Anti-skid valve fails

Why might a braking command be given but not executed?

30 Defining the Purpose of the Analysis Chemical Reactor Example

Requirements: • Produce product (add chemicals and catalyst to reactor) • Monitor plant status

Mishaps?

Hazards?

Constraints? Define Mishaps, Hazards, Safety Constraints • System Requirements – Produce product (add chemicals and catalyst to reactor) – Monitor plant status • Mishaps – M-1: Explosion – M-2: Chemical ingestion by human – M-3: Environmental pollution • Hazards – H-1: Overheating/overpressurization of reactor (M-1) – H-2: Release of chemicals within or outside plant (M-2, M-3) • Safety Constraints/Requirements – SC1: Pressure/temperature in reactor must stay in safe range (H-1) – SC2: Chemicals must not be released outside plant boundaries (H-2) – SC3: If chemicals are released, damage must be mitigated (H-2) Safety-Guided Design

Safety Constraints/Requirements SC1: Pressure/temperature in reactor must stay in safe range (H-1) SC2: Chemicals must not be released outside plant boundaries (H-2) SC3: If chemicals are released, damage must be mitigated (H-2)

Original hazards: H-1: Overheating/overpressurization of reactor (M-1) H-2: Release of chemicals within or outside plant (M-2, M-3)

Refined hazards? Original hazards: H-1: Overheating/overpressurization of reactor (M-1) H-2: Release of chemicals within or outside plant (M-2, M-3)

Refined hazards: • H-1b: Catalyst in reactor (reactor operating) when reflux condenser not operating (H-1a); • H-2b: Relief valve does not reduce pressure (H-1b) Modeling the Control Structure

Can add the process models

Control Structure: Identifying Unsafe Control Actions (UCAs) Four Ways Unsafe Control Can Occur

• Providing the control action leads to a hazard • Not providing the control action leads to a hazard • Timing or sequencing of control actions leads to a hazard • Duration (too long, too short) leads to a hazard

Identify conditions (context) under which control action can lead to a hazard

Control Not providing Providing Too early/too Stopped too Action causes hazard causes late, wrong soon/ applied hazard order too long

UCA Context Table for: Catalyst in reactor when reflux condenser not operating (H-1a)

Stopped Too Incorrect Soon / Not providing Providing Timing/ Applied too causes hazard causes hazard Order long Open water Water valve not Open Water more than X Stop before opened when Valve seconds after fully opened catalyst open open catalyst Close Water Valve

Open Catalyst Valve

Close Catalyst Valve

UCA Context Table for: Catalyst in reactor when reflux condenser not operating (H-1a)

Stopped Too Incorrect Soon / Not providing Providing Timing/ Applied too causes hazard causes hazard Order long Open water Water valve not Open Water more than X Stop before opened when Valve seconds after fully opened catalyst open open catalyst Close Water Valve

Open Catalyst Valve

Close Catalyst Valve Hazard: Catalyst in reactor without reflux condenser operating (water flowing through it)

Control Not providing Providing Too early/too late, Stopped too Action causes hazard causes wrong order soon/ applied hazard too long Open Not opened Open water more Stop before water when catalyst than X seconds after fully opened open (H-1) open catalyst Close Close while Close water before water catalyst catalyst closes open Open Open when Open catalyst more catalyst water valve than X seconds not open before open water Close Do not close Close catalyst more Stop before catalyst when water than X seconds after fully closed closed close water Hazard: Catalyst in reactor without reflux condenser operatingIs this (water safety flowing or security? through it)

Control Not providing Providing Too early/too late, Stopped too Action causes hazard causes wrong order soon/ applied hazard too long Open Not opened Open water more Stop before water when catalyst than X seconds after fully opened open (H-1) open catalyst Close Close while Close water before water catalyst catalyst closes open Open Open when Open catalyst more catalyst water valve than X seconds not open before open water Close Do not close Close catalyst more Stop before catalyst when water than X seconds after fully closed closed close water

STPA generates the following high-level requirements on the batch reactor:

• Water valve must always be fully open before catalyst valve is opened. – Water valve must never be opened (complete opening) more than X seconds after catalyst valve opens • Catalyst valve must always be fully closed before water valve is closed. – Catalyst valve must never be closed more than X seconds after water valve has fully closed.

Next step is to identify scenarios leading to the unsafe control actions (violation of safety requirements) and eliminate or mitigate them

Identifying Loss Scenarios Generating Causal Scenarios

1. Identify causes of the hazardous control actions (why hazardous control actions given)

2. Identify causes for a required control action (e.g., open water valve) being given by the controller but not executed

3. What design features (controls) might you use to protect the system from the scenarios you found? Identifying Causal Scenarios

Control input or external information wrong or Controller missing Missing or wrong Controller communication with Inadequate Control Process Model another controller Algorithm (inconsistent, (Flaws in creation, Inadequate or Inappropriate, incomplete, or missing feedback ineffective, or process changes, incorrect) missing control incorrect modification Feedback Delays action or adaptation)

Actuator Sensor Inadequate Inadequate operation operation Incorrect or no information provided Delayed operation Measurement inaccuracies Controller Controlled Process Feedback delays

Conflicting control actions Component failures Changes over time Process output Process input missing or wrong contributes to Unidentified or out- system hazard of-range disturbance 60 Causes of Unsafe Control Actions

• Identify causes of the hazardous control actions: Open catalyst valve when water valve not open

– HINT: Consider how controller’s process model could identify that water valve is open when it is not.

• What design features (controls) might you use to protect the system from the scenarios you found? Some Reasons for Incorrect Process Model

• Previously issued an Open Water Valve command but valve did not open (jammed, failed, etc.) • Assumed that command had been executed. Why? i. No feedback about effect of previous command (Control: put feedback in design)

ii. Feedback not received. [could go on to determine why here if want] (Control: Assume not executed)

iii. Feedback delayed (could go on to determine why if want) (Control: wait predetermined time and then assume not opened)

iv. Incorrect feedback received. Why? (maybe assumed that if reached valve, it would open [design error] (Control: add flow meter to detect water flow through pipe)

etc. Generates Potential New Requirements:

• Include feedback for Open Valve and Close Valve commands.

• Software shall check for feedback after issuing an Open/Close command. If not received in a specified time period, then assume valve not opened or closed and …

• There must be feedback to controller to determine that water is actually flowing through pipe before issuing an Open Catalyst command (perhaps use a flow monitor).

• …

Causes of Command Provided but not Executed

• Often caused by component failures • This is where STPA finds the same causes as the traditional hazard analysis methods • What are some causes of an open water valve being given but not executed or executed too late? • What are some design solutions to prevent these causes?

Could do same for human operator

Control Structure:

© Copyright John Thomas 2013 An Actual Mishap (you found the problem) STPA Extension to (Better) Handle Human Factors Megan France Dr. John Thomas

• Adds more sophisticated human error analysis to STPA

• Suggests engineering solutions; does not just identify problems

• Can be used earlier in design process than detailed simulations and prototypes

• Provides a “common language” for engineers to collaborate across domains Slide from presentation by Mark Vernacchia, GM Some Final Thoughts on STPA STPA can be used throughout product development and operations STPA-Sec Tailored Approach Capt. Martin “Trae” Span, AFIT, Aerial Refueling Concept (KC-X)

96 Safety Management System (SMS): Identifying Leading Indicators Operational Inputs

Active STPA Hazard/Risk (Hazard Management Analysis)

Prevention and Mitigation

Pre

97 A Standard Version of the Risk Matrix

• Assumes Risk = f (severity, likelihood) A Systems Approach to Safety and Security

• Emphasizes building in safety rather than measuring it or adding it on to a nearly completed design • Looks at system as a whole, not just components (a top-down holistic approach) • Takes a larger view of causes than just failures – Accidents today are not just caused by component failures – Includes software and requirements flaws, human behavior, design flaws, etc. • Goal is to use modeling and analysis to design and operate the system to be safe/secure, not to predict the likelihood of a loss. System Engineering Benefits

• Finds faulty underlying assumptions in concept development before flow downstream as anomalies (where more costly to change) – 70-80% of safety-critical decisions made during concept development

• Finds incomplete information, basis for further discussion with customer

• Both intended and unintended functionality are handled

• Includes software and operators in the analysis – Provides deeper insight into system vulnerabilities, particularly for cyber and human operator behavior. System Engineering Benefits (2)

• Can analyze very complex systems. – ”Unknown unknowns” usually only found during ops can be identified early in development process

• Can be started early in concept analysis – Assists in identifying safety/security requirements before architecture or design exists – Then used to design safety and security into system, eliminating costly rework when design flaws found later. – As design is refined and more detailed design decisions are made, STPA analysis is refined to help make those decisions

• Complete traceability from requirements to system artifacts – Enhances maintainability and evolution System Engineering Benefits (3)

• Models developed for the analysis provide documentation of system functionality (vs. physical or logical design) – Often missing or difficult to find in documentation for large, complex systems • Safety engineers can act as facilitators while design engineers participate in the analysis – Integrate safety engineering into design engineering Is it Practical? • STPA has been or is being used in a large variety of industries – Automobiles (>80% use) – Aircraft and Spacecraft (extensive use and growing) – Air Traffic Control – UAVs (RPAs) – Defense systems – Medical Devices and Hospital Safety – Chemical plants – Oil and Gas – Nuclear and Electric Power – Robotic Manufacturing / Workplace Safety – Finance • International standards in development or STPA already

included (already satisfies MIL-STD-882) 103 Evaluations and Estimates of ROI

• Hundreds of evaluations and comparison with traditional approaches used now – Controlled scientific and empirical (in industry) – All show STPA is better (identifies more critical requirements or design flaws) – All (that measured) show STPA requires orders of magnitude fewer resources than traditional techniques • ROI estimates only beginning but one large defense industry contractor claims they are seeing 15-20% return on investment when using STPA Ballistic Missile Defense System (MDA)

• Hazard was inadvertent launch • Analyzed right before deployment and field testing (so done late) – 2 people, 5 months (unfamiliar with system) – Found so many paths to inadvertent launch that deployment delayed six months • One of first uses of STPA on a real defense system (2005)

Sea-based sensors on the Aegis platform, upgraded early warning radars (UEWR), the Cobra Dane Upgrade (CDU), Ground-based Midcourse Defense (GMD) Fire Control and Communications (GFC/C), a Command and Control Battle Management and Communications (C2BMC) Element, and Ground-based interceptors (GBI). Future block upgrades were originally planned to introduce additional Elements into the BMDS, including Airborne Laser (ABL) and Terminal High Altitude Area Defense (THAAD). Control Structure for FMIS

Command Early Warning Radar Authority System

Exercise Results Radar Tasking Readiness Status Request Readiness Mode Change Status Status Request Wargame Results Status Launch Report Doctrine Track Data Engagement Criteria Status Report Training Heartbeat TTP Workarounds Engage Target Operational Mode Change Readiness State Change Weapons Free / Weapons Hold

Operators Fire Control Operational Mode Readiness State System Status Track Data Weapon and System Status Fire DIsable Abort Launcher Fire Enable Command Responses Arm Operational Mode Change System Status BIT Command Readiness State Change Launch Report Task Load Launch Position BIT Results Interceptor Tasking Launch Stow Position Launcher Position Task Cancellation Operating Mode Perform BIT Power Safe Software Updates

Interceptor Launch Station Simulator

Abort Arm Acknowledgements BIT Command Acknowledgements BIT Results Task Load BIT Results Health & Status Launch Health & Status Operating Mode Power Flight Safe Computer Software Updates

Breakwires Safe & Arm Status Arm Voltages BIT Info Safe Safe & Arm Status Ignite

8/2/2006 Interceptor 106 H/W Example Hazard Scenarios Found

• Missing software requirements, for example: – Operator could input a legal (but unanticipated) instruction at same time that radars detect a potential (but not dangerous) threat – Could lead to software issuing an instruction to enable firing an interceptor at a non-threat • Timing conditions that could lead to incorrectly launching an interceptor

• Situations in which simulator data could be taken as real data Navy Escort Vessels (Lt. Blake Abrecht)

• Dynamic positioning system • Ran into each other twice during test • Performed a CAST analysis (on two incidents) and STPA on system as a whole • STPA found scenarios not found by MIL-STD-882 analysis (fault trees and FMEA) • Navy admiral rejected our findings saying “We’ve used PRA for 40 years and it works just fine” • Put into operation and within 2 months ran into a submarine • Scenario was one we had found EPRI Evaluation

• Provided same design of a nuclear power plant safety system to everyone • Independent and expert teams did: FTA, ETA, FMEA, HAZOP, etc. and we did STPA (two students, two weeks) • After submitting final analyses, teams were told that there had been a very serious event in plant with that design • Only STPA found the scenario that occurred Future Vertical Lift for Army (Lt. David Horning)

• Both MIT and Boeing independently identified safety/security requirements early in concept development

• Used to evaluate the CONOPS

• Could it be used to create the CONOPS? UH-60MU (Blackhawk)

• Analyzed Warning, Caution, and Advisory (WCA) system

• STPA results were compared with an independently conducted hazard analysis of the UH-60MU using traditional safety processes described in SAE ARP 4761 and MIL-STD-882E. – STPA found the same hazard causes as the traditional techniques and – Also identified things not found using traditional methods, including design flaws, human behavior, and component integration and interactions STPA in Small Unmanned UAS testing at EDW (Lt. Sarah Folse and Maj. Sarah Summers)

• System safety requirements and recommendations to accommodate SUAS testing at EDW • Range Extender System for Electric Vehicles (Valeo) – FTA/CPA took 3 times effort of STPA, found less

• Medical Device (Class A recall)

FMECA STPA 70+ causes of accidents 175+ causes accidents (9 related to adverse event)

Team of experts Single semi-expert Time dedication: months/years) Time: weeks/month

Identified only single fault causes Identified complex causes of accidents Some Uses Beyond Traditional System Safety

• Airline operations (leading indicators of increasing risk) • Design and analysis of safety management systems • Cybersecurity • Quality • Producibility • Design for safe manufacturing/assembly • Production engineering • System engineering process optimization • Organizational culture • Workplace safety • Banking and finance • Criminal law To Make Progress We Need to:

• Develop and use different approaches that match the world of engineering today • Consider the entire sociotechnical system • Focus on building safety/security in rather than assuring/measuring it after the design is completed “The best way to predict the future is to create it.” Abraham Lincoln • Develop and use new approaches to certification, regulation, risk management, and risk assessment Nancy Leveson, Engineering a Safer World: Systems Thinking Applied to Safety

NANCY G. LEVESON JOHN P. THOMAS MARCH 2018

MIT Press, January 2012

Papers, examples, and more information at http://psas.scripts.mit.edu/home/ BACKUP Operator “Error” Not Just a Random Failure

EPRI Nuclear Power Plant Controlled Experiment • Compared FTA, FMEA, ETA, HAZOP and STPA • Two graduate students spent 2 weeks on this • Only STPA found accident that had occurred in plant but analysts did not know about

• Embraer Aircraft Smoke Control System requirements captured by STPA U.S. Air Force Flight Test • Automotive Electric Power Steering System • HTV Unmanned Japanese Spacecraft – STPA found all causes found by FTA plus a lot more