Common Cause Failures, Particular Risks and Zonal Safety Analysis
Total Page:16
File Type:pdf, Size:1020Kb
SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 COMMON CAUSE FAILURES, PARTICULAR RISKS AND ZONAL SAFETY ANALYSIS R.G.W. Cherry & Associates Limited 2012. All rights reserved - 1 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 1 Common Cause Failures Common Cause Failures are often the limiting factor on the integrity of complex systems, and yet they are often overlooked in the safety assessment process. In this module consideration is given to the various forms of Common Cause Failures that have the potential for compromising the reliability of aircraft systems and the possible methods for identifying them during the design process. 1.1 THEORY V PRACTICE It is normally expected that if the probability of failure of one channel in a given period is X and there are N channels, any of which may achieve the intended function, then the probability of all channels failing is: XN …………………. Equation 1 The impact of Common Cause Failures on an aircraft electrical power generation system was assessed from a study carried out by Hawker Siddeley Aviation in the 1970s. The study was carried out on an in-service aircraft that had three otherwise independent electrical power generation channels. For this aircraft, the average failure rate for each of the channels was found to be approximately: 9.5 x 10 -4 per flight Now if the aircraft had only two electrical power generation channels then the probability of both failing due to independent causes might be expected to be :- (9.5 x 10 -4)2 per flight = 9 x 10 -7 per flight (approx.) And for the three-channel system: (9.5 x 10 -4)3 per flight = 8.6 x 10 -10 per flight (approx.) However, when the in-service record for the subject aircraft was investigated it was found that multi-channel failures occurred at a much greater frequency than predicted by this simple theoretical approach. A comparison between what might be expected if failures were totally independent and what was actually achieved on the in-service aircraft is illustrated in Figure 1. (The curve for the failure probability of the in-service aircraft has been adjusted to account for there being three ways in which two channels may fail in a three channel system.) RGW Cherry & Associates Limited - 2 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 Figure 1 - Probability of Failure against number of channels for an aircraft electrical power generation system It can be seen from Figure 1 that for the three-channel system the frequency of total system failure is several hundred times greater than the simple theory suggests. A further study carried out by Hawker Siddeley Aviation relating to hydraulic systems illustrated that the difference between the simple theory and what is actually achieved was also exhibited by aircraft hydraulic systems as illustrated in Figure 2. Figure 2 - Probability of Failure against number of channels for aircraft hydraulic systems This marked difference between the simple theory prediction, and what is actually achieved in terms of the probability of multiple system failures, is likely to be exhibited in other, probably all, multi-channel aircraft systems. RGW Cherry & Associates Limited - 3 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 The simple theory expressed in Equation 1 is predicated on there being no significant dormancies in the system. However, in most complex systems there could be a number of faults that are not normally detected until system checks are carried out during maintenance. The system designer needs to concentrate on arranging the system architecture to reduce the number of dormant faults; those remaining require a maintenance check set at an interval such that their probability of occurrence, in combination with other failures, is to an acceptable level. However, the primary reason for the differences between the simple theoretical curve and the practical curve shown in Figure 1 is that channel failures are not totally independent and that Common Cause Failures have a significant influence on the probability of total system failure actually achieved. Following their identification, consideration should be given to design changes, manufacturing techniques, maintenance actions and system operating procedures to eliminate or mitigate Common Cause Failures. Whilst not an exhaustive list, the foregoing sections address some of the most frequently occurring Common Cause Failures that may present a “threat” to most aircraft types. Those external to the system may be caused by a variety of occurrences, many of which are known as Particular Risks. RGW Cherry & Associates Limited - 4 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 1.2 COMMON MODE FAILURES The simple theory of multi-channel failure is predicated on failure independence amongst the channels. However, in order to optimise the design most multi-channel systems will have similar components. These similar components will have similar failure modes known as Common Mode Failures. Manufacturing or design faults resulting in high early life failure rates can have a pronounced effect on the probability of total system failure. A similar situation arises from “wear-out” failures. Consider a component in a two-channel system exhibiting the failure rate characteristics shown in Figure 3. Figure 3 - Component exhibiting “wear-out” characteristics When the aircraft first enters service, the component has a failure rate that is approximately 2 x 10 -4 per hour. However, if the component survives in both of the channels for 3,000 hours their failure rates increase to approximately 1 x 10 -3 per hour. For an aircraft that has an average flight time of one hour the probability of both channels failing due to this component, when the aircraft first enters service, is approximately (2 x 10 -4)2 - equal to 4 x 10 -8 per hour. However after the aircraft has been in service for 3,000 hours the probability of both channels failing due to this component is approximately = (1 x 10 -3)2 equal to 1 x 10 -6 per hour. Hence, the probability of both channels failing due to this component has increased by a factor of 25 . A similar situation exists if the component exhibits an infant mortality failure rate characteristic, perhaps resulting from manufacturing or maintenance induced defects. Another frequent form of Common Mode Failure is mal-assembly or mal-rigging of similar equipment in multi-channel systems. Despite the precautions taken to RGW Cherry & Associates Limited - 5 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 prevent failures occurring due to errors of this kind, they still present a significant risk to the integrity of vital aircraft systems. On an aircraft test flight inadvertent operation of the stall recovery system occurred just after take-off due to the incorrect rigging of the microswitches on the leading edge slat. The two microswitches were in each of two channels in the stall recovery system. The mal-rigging of both microswitches resulted in the stall recovery system being in the slats retracted mode, which at take-off speed resulted in stick pusher operation. Of course, dissimilar redundancy can sometimes alleviate the “threat” from Common Mode Failures. The BAe 146 Electrical Power Generation System represents a good illustration of the way in which dissimilar redundancy may be used to good effect. It has two engine-driven generators, one Auxiliary Power Unit (APU) driven generator, an hydraulically driven generator and a battery. This degree of dissimilar redundancy provides a good measure of protection against Common Mode Failures. However, this approach is not always practical since systems are designed to optimise the exacting requirements demanded of them. Therefore, any variation in this standard must result in penalties in terms of performance, weight, cost, operational reliability or any other of the design parameters. In most cases, the designer is forced to use similar equipment on all system channels, and must rely on close attention to the design, maintenance checks and operating procedures in order to achieve the levels of integrity required. RGW Cherry & Associates Limited - 6 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 1.3 CASCADE FAILURES Most multi-channel systems are designed so that under normal operating conditions they share the total system demands. It therefore follows that failure of a single channel will usually result in the remaining channels taking an increased load. This increase in load almost invariably produces a consequential increase in failure probability. Although cascade failures of this type are readily understood for structural components, they are not often expected in electrical systems. Multi-channel electrical systems are usually designed so that any one channel is capable of meeting the requirements of essential services. However, following a channel failure the increase in load on the other channels is likely to result in an increase in their probability of failure. Attempts have been made to quantify the relationship between load and failure rate for electrical components - the results of this work are contained in MIL-HDBK-217 "Reliability Prediction of Electronic Equipment.” Figure 4 shows the relationship between failure rate and load for a 125 VA transformer using the data from MIL- HDBK-217. It may be seen that for this particular component there is a marked difference between the predicted failure rate at rated power, 125 VA, and half-rated power. Most components will demonstrate an adverse relationship between failure rate and load, although not necessarily as severe as the example shown in Figure 4. Figure 4 Relationship between Failure rate and Electrical Load for a 125 VA Transformer RGW Cherry & Associates Limited - 7 - SYSTEM SAFETY ASSESSMENT COURSE 9th May 2012 Assuming that the failure rate of two similar components in a dual channel system is unaffected by any changes in load the probability of total system failure is given by the following expression: 2 λλλ1 λλλ2 t per flight Where t is the flight time in hours and the failure rates are on a per hour basis.