Review of Root Causes of Accidents Due to Design
EUROPEAN ORGANISATION FOR THE SAFETY OF AIR NAVIGATION
EUROCONTROL
EUROCONTROL EXPERIMENTAL CENTRE
REVIEW OF ROOT CAUSES OF ACCIDENTS DUE TO DESIGN
EEC Note No. 14/04
Project Safbuild
Issued: October 2004
The information contained in this document is the property of the EUROCONTROL Agency and no part should be reproduced in any form without the Agency’s permission. The views expressed herein do not necessarily reflect the official views or policy of the Agency.
REPORT DOCUMENTATION PAGE
Reference: Security Classification: EEC Note No. 14/04 Unclassified Originator: Originator (Corporate Author) Name/Location: EEC – SAS EUROCONTROL Experimental Centre (Safety Analysis and Scientific) Centre de Bois des Bordes B.P.15 F - 91222 Brétigny-sur-Orge CEDEX FRANCE Telephone : +33 (0)1 69 88 75 00 Sponsor: Sponsor (Contract Authority) Name/Location: EATM EUROCONTROL Agency Rue de la Fusée, 96 B -1130 BRUXELLES Telephone : +32 2 729 9011 TITLE:
REVIEW OF ROOT CAUSES OF ACCIDENTS DUE TO DESIGN
Authors Date Pages Figures Tables Appendix References Alfred Roelen, (NLR) Steve Kinnersly, (AEA) 10/04 viii+41 1 1 3 43 Fabrice Drogoul
EEC Contact Project Task No. Sponsor Period
120000-SRD-3-E1- Fabrice Drogoul Safbuild 2003 0000 Distribution Statement: (a) Controlled by: Head of SRT (b) Special Limitations: None
Descriptors (keywords):
Design, root causes of accidents, ATM, nuclear, rail, aviation
Abstract:
This document gives the results of a study of the proportion of accidents that have their root causes in design. In particular, it investigates the validity of the claim that 60% of the root causes of accidents arise in the design stages.
FOREWORD
This document gives the results of a review study from different industry databases of the proportion of accidents that have their root causes in design. The review has been carried out as part of the Eurocontrol SafBuild project. SafBuild aims to establish, within the context of EEC strong designs that are resistant to failure the first place, or else tolerant to it. SAFBUILD aims to support designers in their safety role, to develop a process and supporting tools for integrating safety into the design throughout the design life cycle, leading to a safer and more resilient ATM system. A necessary part of SafBuild is to determine the extent to which accidents are caused by design in order to provide a foundation for building safety into design. The study reviewed accident and incident data and reports from the aviation, railway and nuclear industries. Where the data or reports indicated that design was a root cause, this was taken as adequate evidence. In cases where the data or reports did not give root causes, or where the root causes were investigated or categorised in a way that precluded the possibility of identifying design as a root cause, a judgement was made according to the available evidence. Overall, the results show 50% of all accidents or incidents having a root cause in design. The proportions for the aviation and nuclear industries (51% and 46% respectively) are remarkably similar. In most cases, causes other than design also contributed to the accident or incident. Typical types of design-related oversight that are root causes of accidents are : • Use outside of the design envelope, • Changes of operational context, • Failure of defence in depth, • Misconceptions between designers and operators, • Unexpected failure mechanisms. • Incorrect functioning leading to mistrust of safety system by operator
Barry Kirwan Head of SRT
v
Page intentionally left blank
vi
CONTENTS
FOREWORD ...... v
1. Introduction ...... 1
2. Purpose of the Study ...... 1
3. DESIGN stage ...... 1
4. OTHER Definitions ...... 2
5. Scope and Content of the Review...... 3 5.1 Overall research approach...... 3 5.2 Existing reviews ...... 3 5.3 Aviation ...... 5 5.4 Nuclear ...... 6 5.5 Railways ...... 6
6. Findings of the Review ...... 7 6.1 Aviation ...... 7 6.2 Nuclear ...... 7 6.3 Railways ...... 8 6.4 Discussion ...... 8
7. Conclusions...... 11
8. Abbreviations and acronyms ...... 12
9. References...... 13
Appendix A: List of aviation accidents...... 17
Appendix B: List of Nuclear incidents ...... 31
Appendix C : Railway Accidents and Incidents ...... 39
vii
Page intentionally left blank
viii
1. INTRODUCTION
The Eurocontrol Experimental Centre (EEC) works to improve safety in Air Traffic Management (ATM) by better integration of safety in EEC projects, by provision of better safety methods for internal and external use, and by the development and implementation of safety learning processes in the ATM industry. An important element of these safety learning processes is an awareness of the potential impact of EEC projects on aviation safety. The aviation industry has been concentrating a lot of effort on the technical improvement of systems, but despite this, technical failures and failures that are contributed by the design still occur. While the Eurocontrol Experimental Centre is involved in many ATM system design projects, typically a lot of these projects do not lead directly to operational systems. Many are research projects that lead to prototypes for future development. Because of the specific role of EEC in the ATM system design process, it may not be clear how these research activities have an impact on aviation safety. It is essential that EEC be aware of these potential impacts to be able to build safety into the design process. Design is complex, so is safety - what looks easy on paper may not be so in practice. One of the tasks of SAFBUILD is to determine what role design plays in accidents, and the results of this task are described in the underlying report.
2. PURPOSE OF THE STUDY
The purpose of this study is to determine how the root causes of accidents can arise from the design stages of a project and to determine the implications for EEC projects that include design. There is a lack of clarity on the precise meaning or scope of the term ‘design’. After a clarification on this topic the study reviews the root causes of accidents from databases and expert experience from a range of relevant industries.
3. DESIGN STAGE
‘Design’ is the process by which detailed specifications sufficient for unambiguous production of an entity (e.g. software item; hardware; visual display; subsystem; system) are developed to satisfy the concepts, requirements, assumptions and constraints for that entity. It could be difficult to know who the designers really are. The design work is a diffuse distributed process. A lot of people involved in the design might not consider themselves as designer even if they are part of the activity. To clarify the position it has been decided to refer to design activity rather than designers. Another point that needs to be clarified is the coverage given to the term design. For engineers design is often considered as one stage after requirement capture. In our safety point of view we consider design to range from concept development and requirement capture to the point of fabrication/construction.
1
For the sake of the study Design does not include the generation of top-level concepts, requirements, assumptions and constraints for an entity. However, the design process may: • Generate lower level concepts, requirements, assumptions and constraints; • Cause top-level concepts, requirements, assumptions and constraints to be questioned and modified. Design does not include the production of the entity (e.g. software coding; production of hardware; etc.).
4. OTHER DEFINITIONS
For the purpose of this study, a definition of safety will be used that has been adopted by international organisations such as the International Standards Organisation (ISO) and the International Electrotechnical Commission (IEC). It is included in ISO/IEC Guide 51 (second edition) and is the definition of safety used by IEC 61508, the cross-industry international standard for the functional safety of electrical/electronic/programmable electronic systems.
The specific definition used in ISO/IEC Guide 51 and IEC 61508 is given below. ‘Safety’ is freedom from unacceptable risk.
‘Risk’ is a combination of the probability of occurrence of harm and the severity of the harm.
‘Harm’ is physical injury or damage to the health of people either directly, or indirectly as a result of damage to property or the environment.
The definition of accident that is used in this research project is the following: ‘Accident’ is defined as any incident that caused harm or is likely to have caused harm in closely related circumstances.
According to this definition, events such as a train passing a signal at danger are classified as accidents, regardless of whether this event led to damage and/or injury. The root cause of such an event is not influenced by the result of the event. The use of this definition allows a comparison with e.g. the nuclear industry, where damage and injury is extremely rare. ‘Root cause in design’ Accidents typically involve a combination of errors and faults. It is in many cases not possible to assign a single, unambiguous root cause. A root cause is a condition which is necessary for an accident. Whereas each root cause is necessary, only together they are sufficient for the accident to occur.
The approach used in this study is inclusive: if at least one credible root cause of an accident lies in the design stages then the accident is considered to have had a root cause in the design stages.
2
5. SCOPE AND CONTENT OF THE REVIEW
5.1 Overall research approach The issue of building safety into design has been important for regulators and industry for many years. Studies into the causes of accidents have been carried out both by regulators and the more highly regulated industries such as aviation, rail, and the nuclear industry. The research approach for determining the ratio for root causes of accident that are in the design stages was to start with these studies and then to investigate specific domains in more detail if (a) data were available and (b) they were relevant to the EEC context.
Detailed investigations were carried out for aviation, nuclear and railways by analysing accident and incident reports.
5.2 Existing reviews
Eurocontrol in their introduction to the EATMP Safety Management Guidelines has provided data on design relation with accident/incident causes. These cite: • 44% during specification • 15% during design and development • 6% installation and commissioning • 15% in maintenance • 20% changes after commissioning (See figure 1)
From this it can be seen that 59% are claimed to arise during the specification and design and development phases. This slightly richer picture is in keeping with the finding in other areas1, e.g. root causes of failure in project management identifies that the largest single area of problems is in the requirements identification and documentation phases.
This view of 44% in requirements and specification and 15% in design and development significantly changes the perception of the actions which may be required to ensure improvement placing some emphasis on the specification processes.
1 The first interim report from the AEA Technology study also supports a similar view.
3
figure 1: EATMP accident/incident causes by design stage
5.2.1 UK Health and Safety Executive HSE’s publication ‘Out of Control’ publicised the details of 34 incidents from a wide range of industries that had been reported to the Heath and Safety Executive. The main purpose of the study was to raise awareness of the technical causes of control system failures. A simple classification scheme, based on a safety life cycle, was devised for analysing the causes of the incidents. The scheme includes the following phases: 1. Safety requirements specification 2. Design and implementation 3. Installation and commissioning 4. Operation and maintenance 5. Changes after commissioning
In the HSE publication, a judgement is made as to which project phase was predominant in the cause of the incident. The analysis concludes that 44 % of incidents were caused by inadequate specification, and a further 15% were caused by inadequate design and implementation. This means that according to the definition of design provided in chapter 3, a total of 59% of the cases has inadequate design as a causal factor.
4
5.2.2 Boeing Commercial Aircraft Boeing has examined 232 commercial jet aircraft accidents over a 10-year period (1982-1991) to develop recommendations for accident prevention strategies. The accident data base excluded events involving sabotage, suicide, terrorism, experimental testing, or direct military action. Accidents to aircraft built in the former USSR were not included. Non-flying accidents (e.g. pushback accidents) were also not included. The sequence of events was established for each accident, and events were grouped using 37 categories for accident prevention strategies. An accident prevention strategy was defined as an action which, had it been utilised, would have broken the chain of events and prevented the accident. One of the accident prevention strategies is ‘design improvement’. If a currently practicable design improvement could be envisioned that might reasonably or significantly reduce the likelihood of a future similar accident, then ‘design improvement’ was assigned as an accident prevention strategy.
The Boeing study concluded that design improvement was an accident prevention strategy in 20% of all cases.
5.3 Aviation
5.3.1 Accident inclusion criteria. Several criteria were used to establish the final accident sample analysed in this investigation. The basic inclusion criteria were as follows:
(a) The accident resulted in 1 or more fatalities. Fatal accidents usually result in a thorough and detailed investigation and ditto investigation report.
(b) The accident involved aircraft with the following characteristics: • Fixed wing aircraft – helicopters are not considered • Aircraft heavier than 5,700 kg.
(c) The accident flight had the following characteristics: • Engaged in public transport • World-wide – no restriction on geographical location • Both scheduled and unscheduled flights • Cargo-, passenger-, positioning- and functional test flights • Both international and domestic flights.
(d) The accidents occurred during the time frame 1994 through 2002. This time frame is considered large enough to provide an acceptable number of accidents for this study, while at the same time the data is applicable to present day aviation.
(e) Comprehensive accident investigation information is available. The amount of detail that is needed for this analysis requires comprehensive accident investigation information. In practice, the required level of detail is only
5
found in (preliminary) accident investigation reports from the official accident investigation boards.
Application of the above criteria resulted in a sample of 35 accidents presented in Appendix A.
Note: While this study was limited to analysing fatal accidents, this does not mean that design does not play a role in non-fatal accidents or incidents. As a matter of fact, it can be expected that design problems play an equally important role in accidents and incidents.
5.3.2 Accident analysis protocol The general procedure for analysing each accident included a review of the accident investigation report. The information was then interpreted on the presence of design issues as contributing or causal to the accident, using the definition of ‘design’ provided above. In determining whether an issue was causal or contributing to an accident, the findings of the accident investigation report were strictly adhered to, no ‘own’ interpretation of the cause of the accident was made.
5.4 Nuclear A statement on incidents at nuclear installations in Britain which meet Ministerial reporting criteria, is reported to the Secretary of State for Trade and Industry and the Secretary of State for Scotland and is published every quarter by the Nuclear Safety Directorate of the Health and Safety Executive (HSE). The statement includes summary incident reports. All published incident reports covering the period September 2001 – June 2003 were included in the analysis. This resulted in a sample of 13 incidents presented in Appendix B.
5.5 Railways Extensive and detailed data for accidents and incidents are available for railways. Accidents are the subject of detailed investigations into (among other things) root causes, including aspects of equipment and track design that may have contributed to the accident. There are, however, few accidents so meaningful statistics on the contribution due to design cannot be obtained.
There are many more incidents than accidents. Among incidents, the most important for passenger safety are signals passed at danger (SPADs). In the UK alone there are over 100 SPADs per year that either cause harm or damage or where the train goes outside the area protected by the signalling system and there is therefore the possibility of harm or damage if circumstances had been different (e.g. another train approaching). In principle, the number of SPADs would permit meaningful statistics to be derived. However, SPADs are usually investigated and analysed mainly in terms of immediate operational causes such as ‘failure to locate signal’, which can be addressed by operational measures such as procedures or training. The deeper reasons for the operational cause, for instance design features that make it difficult for a driver to respond quickly to a signal at danger, are not investigated - unless, of
6
course, the SPAD leads to an accident. Thus, it is not possible to assess quantitatively the contribution of design to SPADs.
Insights into how design contributes to railway accidents and incidents can, however, be obtained from investigations into accidents and from the introduction in the UK of a new safety system (Train Protection and Warning System – TPWS) to reduce the risk from SPADs. Appendix C gives relevant details of three recent fatal railway accidents and experience of the introduction of TPWS into the UK. Note that Appendix C only covers accidents in which design contributed either to the cause or severity.
6. FINDINGS OF THE REVIEW
6.1 Aviation Of the 35 accident investigation reports investigated in this analysis, 18 reports (51%) mention design of the aircraft, design of ATM systems or design of the airport as causal or contributing to the accident.
• In 15 cases (43% of the sample), aircraft design was a causal or contributing factor. • In 2 cases (6% of the sample), Airport design was a causal or contributing factor. • In 1 case (3% of the sample), ATM systems design was a causal or contributing factor.
In most cases, aircraft accidents can not be attributed to a single cause. The observation that design issues were causal or contributing to the accident does not mean that other factors (such as human factors, operational procedures, maintenance issues, etc) did not play a role in the chain of events that resulted in the accident.
6.2 Nuclear Of the 13 incidents investigated in this analysis, 6 reports (46%) are design related. In 2 cases (15%) the Regulator required design changes to be made to the nuclear installation, In 4 cases (30%) alternative measures to prevent similar incidents have been proposed.
Similar to aviation, in most cases nuclear incidents cannot be attributed to a single cause.
7
6.3 Railways The available data does not permit statistically valid conclusions to be drawn regarding the proportion of accidents or incidents due to design. However, examination of the cases given in Appendix C shows that design was a significant factor in some recent major accidents, both as a cause of the accident and its severity (e.g. passenger survivability). Data from the introduction of a new safety system (TPWS) in the UK shows that design is a contributory factor to this safety system being ineffective on 15% of the occasions it was invoked in a hazardous situation. This is because previous incorrect functioning has lead to driver mistrust.
As with aviation and nuclear, accidents and incidents cannot usually be attributed to a single cause.
6.4 Discussion The results of the analysis of accidents and incidents in the air transport and nuclear industry carried out in this investigation show that design contributes to approximately 50% of accidents and incidents. This figure correlates well with HSE’s report ‘out of control’ which concludes that 59% of incidents from a wide range of industries has inadequate design as a causal factor. The fact that the figures correlate so well despite the variety in the types of activities is remarkable. The slightly higher percentage from the HSE study may be explained from the fact that this study focussed on failures of technical systems rather than accidents in general.
The results of the Boeing study seem to correlate less well with the findings (20% vs 50%). This may be due to the fact that the Boeing interpretation of a design related cause focuses on the design of the aircraft. Potential improvements in the design of ATC systems, design of airports and design of weather information systems are not included in the accident prevention strategy ‘design improvement. The Boeing study does however list the following separate accident prevention strategies: • ATC system performance 13 % • Weather information availability and accuracy 10% • Eliminate runway hazards 7% • Other airport 6%
This suggests that if the Boeing interpretation of a design related factor would have been expanded to include design of all elements of the aviation system, and not only design of the aircraft, a figure close to 50% would have been obtained.
For EEC it is important to have an understanding of what design practices and safety considerations are most beneficial to assuring a safe design. This requires an identification of the specific design stages where oversight happened and the types of oversights that caused accidents. Typical types of oversight are Design did not use accepted Standard Operator misunderstands or misinterprets Used outside design envelope Failure of ‘defence in depth’ Unrevealed failures and errors
8