ON RELIABILITY ESTIMATION OF LARGE ELECTRONIC SYSTEMS

A Thesis Presented To

The Faculty of the

Fritz J. and Dolores H. Russ College of Engineering and Technology

Ohio University

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

by

Shailesh Sardesai

August, 1997

ACKNOWLEDGEMENTS

I would like to thank my advisor Dr. Herman W. Hill Jr. for his guidance and unwavering support in the course of my research and educational endeavors at Ohio University. I would also like to express my appreciation to Dr. Richard Gerth, Dr. Douglas Lawrence, and Dr. Mehmet Celenk for doing me the honor of participating on my thesis committee and for their invaluable suggestions in enhancing the quality of this thesis.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

Chapter 1: Introduction

1.1 Introduction to Reliability Engineering
1.2 The Role of Modern Reliability Techniques
1.3 Types of Reliability Predictions
1.4 Failure Mode and Effect Analysis (FMEA) and the Role of Mil-Hdbk-217E
1.4.1 The Role of Redundancy in FMEA
1.5 Analysis, Design and Development of an Automated Reliability Estimation System
1.6 Organization and Contribution of this Thesis

Chapter 2: Background and the Role of LCC

2.1 Definition of Reliability as Accepted in this Research Work
2.2 Combining Mathematical Models and Engineering Factors
2.3 Maximum Reliability Versus Optimum Reliability
2.3.1 The Early Stages of Life Cycle Cost Analysis
2.4 Reliability of Complex Electronic Systems
2.4.1 Parts Versus Systems
2.4.2 Relevance of the Bathtub Curve to Part Failure Rates
2.5 Basis for the Assumption of a Constant Failure Rate
2.6 Introduction to the Mil-Hdbk-217E Methods of Failure Rate Analysis
2.6.1 Why this Research Work does not Employ the Parts Count Model
2.6.2 Introduction to the Stress Analysis Model of Failure Rate Estimation
2.6.3 Why Stress Analysis is Complex
2.6.4 Types of Potential Stresses

Chapter 3: Stress Analysis and Mil-Hdbk-217E Methodology

3.1 Breakdown of Systems Into Component Parts
3.2 Literature Research of Mil-Hdbk-217E
3.3 Stress Analysis
3.4 Requirements of the Mil-Hdbk-217E Model
3.4.1 Development of a Generalized Failure Rate Model
3.4.2 Generalized Mil-Hdbk-217E Failure Rate Model
3.5 Importance of Thermal Analysis
3.6 Example Illustrating the Procedure Employed to Calculate the Estimated Failure Rate of a Component using Mil-Hdbk-217E
3.7 K-Factors and Modal Analysis: Some Modern Concepts
3.8 Modal Analysis and Modification of Failure Rates
3.9 Degree of Confidence in Stress Analysis
3.10 The Effect of Variable Operating Conditions
3.11 Critical Review of Mil-Hdbk-217E and Modern Controversies

Chapter 4: System Reliability and E-R Modeling

4.1 Estimating Sub-System Reliability
4.2 An Example of Sub-System Reliability Estimation
4.3 Reliability Estimation of a Simple Series System
4.3.1 Network Model of a Simple Series Configuration
4.4 Introducing the Concept of Redundancy
4.4.1 Types of Redundancy Techniques
4.4.2 Simple Parallel Redundancy Configuration
4.5 Bi-modal Parallel/Series and Series/Parallel Configurations
4.6 Partial Redundancy
4.7 Some Complex Redundancy Configurations
4.7.1 K Out of N Networks
4.7.2 Majority Voting Redundancy
4.7.3 Operating Redundancy
4.8 Optimum Redundancy Levels
4.9 Introduction to Techniques used in Modeling the Real World
4.9.1 Transformation of the Domain into a Useful Computer Representation
4.9.2 Controlled Iterative System Development Life Cycle
4.10 Data Driven Systems
4.11
4.12 Modeling the Functional Aspects of the Intended System
4.13 Conceptual Modeling: An Object Oriented Approach
4.13.1 Object Oriented Modeling

Chapter 5: Design and Development of a Reliability Evaluation System

5.1 Engineering Paradigms Employed in this Thesis
5.2 Application of Computer Aided Software Engineering (CASE)
5.2.1 Advantages of Computer Assisted Software Engineering (CASE)
5.3 Implementation Life Cycle
5.3.1 Achieving a Requirements Definition
5.3.2 Evolution of the System Design
5.3.3 Design of the Input and Output User Interfaces
5.4 Examples of Electronic Forms Developed in the System
5.4.1 Input Management Form
5.4.2 How to Specify Simple Series Configurations
5.4.3 How to Specify Simple Parallel Configurations
5.4.4 How to Specify Constant Multiple Parallel Configurations
5.4.5 How to Specify Varying Multiple Parallel Configurations
5.5 Display and Interpretation of Output
5.6 Manipulating Design Parameters

Chapter 6: Conclusion and Future Work

6.1 Concluding Remarks
6.2 Potential Future Work

BIBLIOGRAPHY

ABSTRACT

LIST OF TABLES

3.1 Examples of Part Categories
3.2 Examples of Quality Levels
3.3 Examples of πE Factors
3.4 Examples of Application Adjustment Factors
3.5 Base Failure Rates for Resistors at Various Combinations of Temperature and Stress Factor Values
3.6 Examples of πE Values for Different Environments
3.7 Examples of πR Values for Different Ohmic Values
3.8 Examples of πQ Values for Different Manufacturing Levels
4.1 Example of Sub-System Failure Rates Calculated in this Research Work

LIST OF FIGURES

2.1 Cost Versus Reliability Trade-Off
2.2 System Breakdown Using FMEA
2.3 Process Model of Work Done in this Research Study
2.4 Bathtub Curve Depicting the Failure Rate of Electronic Components
2.5 Graph Depicting the Exponential Failure Distribution
2.6 Graph Depicting a Constant Failure Rate for the Exponential Distribution
2.7 Reliability and Unreliability Curves for a Constant Failure Rate
2.8 Depiction of a Constant Failure Rate from Field Data
3.1 Effect of Electrical Stress and Temperature on the Base Failure Rate
4.1 Network Model of a Series Configuration
4.2 Effect of Series Components on System Reliability
4.3 Types of Redundancy Techniques
4.4 Network Model of a Simple Parallel Configuration
4.5 Network Model of Series-Parallel and Parallel-Series Configurations
4.6 Network Model of a Partially Redundant System
4.7 MTBF Versus Failure Rate for K out of N Networks
4.8 Model of a Majority Voting Redundant Configuration
4.9 Model of an Operating Redundancy Configuration
4.10 Incremental Gain in Reliability with Parallel Redundancy
4.11 Controlled Iterative System Development Life Cycle
4.12 Transformation of the Real World Into a Physical Model
4.13 Data and Model Interaction to Achieve System Behavior
5.1 Illustration of CASE Techniques
5.2 Entity Relationship Model of the Reliability Estimation System
5.3 Data Flow of the Reliability Estimation System
5.4 Example of an Input Management Form Developed in the System
5.5 Example of an Output Screen Showing the Failure Rate of Each Constituent Module
5.6 Example of an Output Screen Showing the Reliability Summary of the Final Electronic System Design

Chapter 1

Introduction

1.1 Introduction to Reliability Engineering

"Reliability engineering is the technology of prediction, control, measurement, reporting and analysis of failure phenomena and failure rates" [29]. Historically, reliability engineering has played a critical role in a vast spectrum of industrial and technological applications by addressing the following issues relating to a given component or system [29, 5]:

i. The probability of the component or system working successfully over a specified interval of time.

ii. Balancing achievable performance with other desirable system qualities in a given context in order to achieve optimum performance levels.

iii. Examination of inherent part strength against the expected stress that it may be subjected to.

iv. The organizational planning and implementation required to achieve a specified performance level and recommending the operating conditions under which the part or system may be deployed.

v. Studying the procedures needed to achieve volume production of reliable parts and systems.

1.2 The Role of Modern Reliability Techniques

MOD - U.K. (Ministry of Defence - United Kingdom) studies have concluded that reliability is as important a system parameter as lethality, resolution and other more prominent performance measures [41]. Modern reliability engineering, besides providing a set of numerical indices for performance evaluation, is now being applied to help understand many more sophisticated issues relating to equipment manufacture and deployment. It addresses issues such as how to design for performance, how and why a system may fail, the consequences of failure, how best to manage failure, and specification of ideal operating conditions for best system utilization [29, 31]. Such modern techniques have also made it possible to create a set of global standards and specification indices that are consistent in meaning and measure across geographical and technological boundaries. MOD studies also indicate that 60% to 80% of life cycle costs occur in the operation and support of equipment, emphasizing the importance of this ability to provide strategic information to engineers and managers that links the quality of the product with economic interests and capital investment [31]. While reliability evaluation techniques have traditionally been associated with the aerospace and military industries, modern concepts of reliability engineering have proven to be invaluable in almost every part of the scientific and industrial world [29].

1.3 Types of Reliability Predictions

Practical considerations in the area of equipment manufacture and deployment have necessitated the introduction of concepts such as unpredictable behavior even in fairly well-understood processes. Failure of mature electronic equipment is one such process that is stochastic in nature, i.e., it varies randomly with time [27, 44]. One of the goals of this thesis is to study a procedure that combines contextual engineering knowledge with traditional reliability techniques to assist engineers in estimating the long term reliability of large electronic systems. Historically, researchers have distinguished between three kinds of reliability predictions: qualitative, quantitative and statistical [36].

Qualitative prediction has been the major source of reliability achievement in the past, drawing primarily on the subjective and qualitative experience of design and operating engineers. This method of assessing the reliability of a system, although a significant contribution, suffered from a myriad of deficiencies. The results varied with personal experience, educational level and expertise, and suffered from a wide variance in meaning and interpretation. As the discipline progressed and the complexity of technologies grew, the need for a quantitative measure of reliability under universal standards began to be felt. In the modern world, the importance and limitations of both qualitative and quantitative techniques have given rise to a strategic breed of scientists who are investigating the advantages of both approaches. Such investigations have led to a highly specialized branch of engineering that utilizes mathematical tools of analysis while at the same time emphasizing the consideration of the unique engineering factors that arise and interact in the operational life of the equipment in order to make any predictions about its reliability [5, 1].

1.4 Failure Mode and Effect Analysis (FMEA) and the Role of Mil-Hdbk-217E

The system breakdown aspect of Failure Mode and Effect Analysis (FMEA) has been applied in this research work in analyzing the overall reliability of large complex electronic systems. This method seeks to evaluate large complex systems by analyzing the failure of the basic units that make up the system and determining their impact upon total system reliability. Large systems are broken down into hierarchical levels until either a level suitable for analysis or the basic functional unit level has been reached. Advanced engineering techniques based on Mil-Hdbk-217E models are then applied to calculate their individual failure rates. The part failure rates are then applied in a bottom-up analysis of the resulting FMEA model to measure the reliability of the next higher level until the top-most level is reached, at which point total system reliability is calculated [20, 41]. This contrasts with the traditional Fault Tree Analysis (FTA) approach, which postulates an effect and works downwards towards the cause. The FTA method, although easier, is less reliable since the designer has to realize all failure possibilities and further has to speculate on the consequences of each one of those possibilities.

1.4.1 The Role of Redundancy in FMEA

FMEA is most accurate when each part is essential to achieving system functionality, i.e., when the system does not contain appreciable redundancy. Taking this into consideration, it is assumed that in the basic analysis and design of the sub-systems considered in this thesis, each vital part contributes directly towards the functionality of the system in the event of its successful operation or terminates system functionality in the event of its failure [29, 21]. FMEA is then applied to identify high-failure-rate components and their effect upon the failure rates of the parent sub-systems.

The concept of redundancy is introduced in the assembly of sub-systems to facilitate the creation of larger systems. A set of redundant configurations available to the designer in manipulating total system reliability is also studied.

1.5 Analysis, Design and Development of an Automated Reliability Estimation System

Once the failure rates of all the basic components concerned with our study are computed and the relevant FMEA model has been determined, all data, entities and the inter-relationships between them are transformed into an object-relational computer model. Reliability evaluation expertise is then incorporated into this model to provide the designer with interactive software that allows specification of several possible design configurations and provides real time feedback of the estimated reliability of each design. The developed software provides features such as the ability to change a design parameter or sub-system component and immediately see the effect of that change on the reliability and potential cost of the system.

1.6 Organization and Contribution of this Thesis

Chapter 1 provides an introduction to reliability engineering. Chapter 2 gives an overview of modern electronics reliability theory and the role of Life Cycle Cost (LCC) analysis. It also lays the foundational background for the detailed techniques that have been implemented in this thesis. Chapter 3 describes in detail how system breakdown methods and Mil-Hdbk-217E models have been applied to calculate the estimated reliability of several hundred electronic parts and large complex electronic systems. Chapter 4 discusses some redundancy configurations in detail and how they may be exploited to manipulate system reliability. It also introduces the techniques used in this thesis to transform a real world domain into a design specification, using Computer Aided Software Engineering (CASE) methods. Chapter 5 describes the procedures that were followed in this thesis to analyze, design and develop an automated system that assists electronics engineers in computing real time reliability estimates of large complex electronic systems. Finally, Chapter 6 presents the concluding comments of our study and discusses further work that can be done in this area.

This thesis conducts an original investigation into the intricacies of reliability estimation of large electronic systems. Advanced engineering and Mil-Hdbk-217E techniques were studied and applied in analyzing a set of electronic parts and systems to estimate their failure rates. The contributions of this thesis also include the assimilation of diverse methodologies to design and develop, from inception, original, customized reliability estimation software that allows designers and engineers to compute and compare, in real time, the estimated reliability of competing designs.

Chapter 2

Background and the Role of LCC

2.1 Definition of Reliability as Accepted in this Research Work

This research work has accepted the following definition of reliability as a foundation of our research into the reliability of electronic systems. "Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions specified" [5]. This definition emphasizes the necessity of considering four fields of study relating to the operational lifetime of the equipment in order to perform reliability predictions. They are probability, adequate performance, time, and operating conditions.

Probability techniques provide the numerical input and tools for the assessment of reliability, and probability is accepted as an important index of system adequacy [29, 5, 41].

Probability plays a crucial role because all reliability techniques and tools are essentially concerned with the problem of predicting the future operational behavior of a component or system. The operational behavior being studied is stochastic in nature, i.e., evolving through time in an unpredictable manner, and can occur in milliseconds or over several decades depending on the nature of the system [44]. Assessment of such a problem involving stochastic processes clearly requires significant reliance on mathematical probability techniques [1].

Time, operating conditions, and adequate performance are all engineering parameters that are defined by the discipline being studied and introduce the dependence on contextual analysis to obtain reliable results. For example, the time being considered may be continuous or sporadic; the operating conditions for one mission may be constant or acutely variable, such as in the launching and orbital flight of rockets. The adequate performance being considered also has an engineering basis depending upon the meaning of failure. Failure may be a complete failure of the system, or degradation of a desired system function [29, 5].

Scientists have time and again emphasized that while probability mathematics is an important tool in analysis, reliability engineering delves deeper into the importance of the unique engineering factors that arise and interact in the operational life of the equipment. These factors could include the nature and characteristics of the part and its packaging, interconnection of the parts, design of the system, methods of application, methods of failure, and the environmental and other circumstantial stresses that it may be subjected to. Studies indicate that reliability evaluation of modern electronic systems requires contextual engineering knowledge and should not be based too deeply on statistical distributions and probability theory [29, 1, 36]. Without this engineering factor, quantitative analysis is simply an exercise in mathematical manipulation that can lead to erroneous and misleading results. What probability methods do provide, however, is a basis for a quantitative prediction of system performance and a way of consistently evaluating the relative reliability levels of alternate designs [5, 1].

2.2 Combining Mathematical Models and Engineering Factors

As technology becomes more and more complex, with advancements being made on a global level, it becomes impossible to perform a useful reliability analysis without employing a standardized approach. Reliability prediction using probability theory, mathematical models and engineering techniques allows the formulation of a quantitative measure of the expected system performance while taking into account important qualitative factors. This allows a standardized methodology of consistently comparing the reliability of competing parts and designs [29, 12].

2.3 Maximum Reliability Versus Optimum Reliability

Any reliability study essentially remains an academic exercise unless due consideration is given to economic factors and organizational feasibility.

Figure 2.1 (Adapted from [29]): Cost Versus Reliability Trade-Off (axes: Cost versus Reliability; curves for Initial Cost and Support Cost, with a Possible Optimum Reliability indicated)

From a practical viewpoint, it is the ability to achieve incremental improvement in system reliability that is more important than maximum reliability. As shown in figure 2.1 [29], a greater investment in reliability improvement in the earlier stages is generally expected to reduce maintenance costs and investments in spares inventories. The trade-off between desired reliability levels and practical limiting factors, such as the unreliability of parts, precludes 100% system reliability. The objective should then move from achieving maximum reliability levels to those that are both feasible and mutually acceptable to the producer and consumer alike. The concept of assigning a quantitative basis for failure control is particularly important, so as to plan for, specify and achieve the optimum level of reliability in practical applications. Studies have indicated that the total cost of support of low quality equipment over the life of the device is oftentimes more than the initial acquisition cost [29, 15, 19].

In line with this phenomenon, governments and large industrial entities have introduced and enforced guidelines that require assurance of a known performance level over the specified lifetime of the equipment at a given cost [1]. This awareness of reliability requirements over the entire life of the equipment has led to the development of a science called Life Cycle Cost (LCC) modeling. LCC modeling deals with the modeling and estimation of system cost and effectiveness over its entire useful lifetime.

The aim of this study is to provide a set of tools to assist in making rational decisions to manufacture a system that will meet all operational needs and which can be procured and operated at minimal expense. According to Ministry of Defence - United Kingdom (MOD - UK) research [5, 31], reliability assumes an integral position in LCC studies, as it can directly affect the 60% to 80% of life cycle costs that occur in the operation and support of equipment.

2.3.1 The Early Stages of Life Cycle Cost Analysis

Statistical distributions and engineering models such as Mil-Hdbk-217E can be employed to obtain fairly accurate estimates of system reliability, provided the constituent parts and the operating conditions are known [5, 1]. However, during the typical initial stages of LCC, such as during a feasibility study, data is most often unavailable for such intricate analysis. In such cases, LCC studies utilize field data obtained from similar currently operating equipment so that reasonable predictions can be made, in the absence of field data for the actual equipment, as early as possible in the life cycle. During this phase, predictions can be made by engineering judgment applied to data based on past experiences with similar equipment. According to studies conducted by Dover and Oswald [29, 1, 14], there are six kinds of relevant LCC models: i) accounting models, ii) cost estimation relationship models, iii) heuristic models, iv) analytical models, v) reliability models, and vi) economic analysis models. This research work concentrates mainly on reliability models relevant to electronic systems.

2.4 Reliability of Complex Electronic Systems

The concepts of reliability engineering and probability theory attain significantly complex levels when applied to electronic systems. This is because modern electronic systems are inherently complex by the very nature of their technology. Moreover, electronic parts are particularly sensitive to a multitude of external and internal stresses that drastically alter their behavior. Each part in a system may have one or more potential modes of failure, and each mode may have one or more failure mechanisms. This complexity of component reliability carries over to the reliability of the system that houses the components [29]. It behooves engineers, therefore, to be extremely careful in conducting a quantitative and qualitative analysis of the reliability of such complex systems. Even with rough estimation techniques such as population analysis, the resources dedicated to evaluating the reliability aspects of a system increase as the complexity of the system increases.

2.4.1 Parts Versus Systems

A typical feature of electronic systems is that failure of a system is essentially due to the failure of one or more constituent parts. While this concept may seem fairly obvious, it gets carried over to an extraordinarily detailed level in electronic systems. Every relevant failure of a complex electronic system is assumed to be due to the failure of a constituent part [29]. This study has utilized this widely accepted assumption in the application of the Failure Mode and Effect Analysis (FMEA) method of breaking down large sub-systems for the purpose of evaluating the reliability aspects of a basic design.

Figure 2.2: System Breakdown Using FMEA (an assembly of controllers decomposed into Controller 1 through Controller N)

As shown in figure 2.2, large complex equipment or inter-connections of equipment are analyzed by breaking down the equipment into hierarchical levels of assembly and dividing each level into simple functional units. The breakdown is carried out until a level suitable for analysis is reached or the basic parts are encountered. Each basic unit is then subjected to an analysis of the various factors that could potentially affect its reliability. Based on the modeling shown in figure 2.3 [1], engineering models are then chosen to fit the resulting data and logical inter-relationships. These models, along with the concepts of probability and statistics, are applied to compute the estimated reliability of the system in terms of the basic units. Once the high failure rate components have been identified, redundancy can then be incorporated at the sub-system levels to manipulate system reliability to desired levels. It is apparent, then, that the analysis of the failure rate of the basic component is crucial to the calculation of system reliability [29, 20, 1].

Figure 2.3 [1]: Process Model of Work Done in this Research Study (basic data generation, refined data set, prognosis of expected behavior, and decision making, driven by the application of engineering models, probability and deductive reasoning, and contextual knowledge)
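The bottom-up roll-up described above can be sketched in a few lines of Python. Everything in this sketch is illustrative: the part names, the failure rate values (in failures per 10^6 hours), and the two-level hierarchy are hypothetical, and the sketch assumes constant failure rates and a purely series (non-redundant) structure.

```python
import math

# Hypothetical FMEA breakdown: each sub-assembly maps basic parts to assumed
# constant failure rates in failures per 10^6 hours (illustrative values only).
system = {
    "power_supply": {"rectifier": 0.45, "filter_cap": 0.30, "regulator": 0.25},
    "controller":   {"cpu": 1.20, "ram": 0.80, "clock": 0.10},
}

def assembly_failure_rate(parts):
    """Sum part failure rates to the sub-assembly level (series assumption)."""
    return sum(parts.values())

def system_failure_rate(system):
    """Roll sub-assembly failure rates up to the total system failure rate."""
    return sum(assembly_failure_rate(parts) for parts in system.values())

lam = system_failure_rate(system)       # total rate, failures per 10^6 hours
mtbf_hours = 1e6 / lam                  # theta = 1 / lambda
r_1000h = math.exp(-lam * 1000 / 1e6)   # system reliability over 1000 hours

print(round(lam, 2), round(r_1000h, 4))  # 3.1 0.9969
```

Note that introducing redundancy at a sub-system level (Chapter 4) breaks the simple summation for that branch; the roll-up above applies only to the non-redundant case.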

2.4.2 Relevance of the Bathtub Curve to Part Failure Rates

"The time between successive failures is a continuous random quantity" [1]. From a probabilistic viewpoint, if the distribution function modeling the variable of interest is known, this random variable can be analyzed using a failure model that is based on given conditions. Probability theory provides the relationship between these failure models and life test results of failure rates. The distribution shown in figure 2.4 [12] is widely applied in the scientific community to model the failure distribution of most electronic parts. This distribution, commonly known as the bathtub curve, can be stated as the sum of three basic distributions, as stated in equation 2.1 [29, 5]:

f(t) = w1 f1(t) + w2 f2(t) + w3 f3(t)        equation 2.1

where w1, w2, and w3 are adjustable weight factors that facilitate the combination of the three distributions such that f(t) satisfies the definition of a probability distribution function. In our relevant analysis, w1 + w2 + w3 = 1.

Figure 2.4: Bathtub Curve Depicting the Failure Rate of Electronic Components (failure rate λ versus time t, showing the infant mortality period; the useful period, considered in this study, in which only chance failures occur and λ is constant; and the wear out period)

The hazard rate in the infant mortality period is initially high and is expected to decrease rapidly with time. This phenomenon occurs mainly because of flaws incorporated during the manufacturing process that escape quality control checks and cause the part to fail.

High failure rate units may also be identified in this region. It has been observed that the failure distribution in this period, represented by f1(t) in equation 2.1, may be approximated by the gamma distribution described by equation 2.2 [29, 5].

f1(t) = t^(a-1) e^(-t/b) / (b^a Γ(a))    for t > 0        equation 2.2

where Γ(a) is the gamma function, equal to (a - 1)! when (a - 1) is a positive integer, b = MTBF of the whole equipment, and a = number of failures.

It may be noted that the characteristic for a < 1 mimics the burn-in period of the bathtub curve, and for a = 1, i.e., corresponding to the time to first failure, equation 2.2 reduces to the exponential distribution described by equation 2.3.
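This reduction can be checked numerically. In the sketch below the MTBF value b is illustrative, not drawn from the thesis data; the comparison simply confirms that the gamma density of equation 2.2 with a = 1 coincides with the exponential density of equation 2.3 at λ = 1/b.

```python
import math

def gamma_pdf(t, a, b):
    """Gamma density of equation 2.2: shape a (number of failures), scale b (MTBF)."""
    return (t ** (a - 1)) * math.exp(-t / b) / (b ** a * math.gamma(a))

def exponential_pdf(t, lam):
    """Exponential density of equation 2.3 with constant failure rate lam."""
    return lam * math.exp(-lam * t)

b = 500.0  # illustrative MTBF in hours

# For a = 1 (time to first failure) the gamma density reduces to the
# exponential density with lambda = 1/b, as noted in the text.
for t in (10.0, 100.0, 1000.0):
    assert abs(gamma_pdf(t, 1, b) - exponential_pdf(t, 1.0 / b)) < 1e-15

# For a < 1 the density is decreasing in t, mimicking the burn-in region.
assert gamma_pdf(10.0, 0.5, b) > gamma_pdf(100.0, 0.5, b)
```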

The work done in this study assumes that all parts have passed this burn-in period and have matured into the constant failure rate period. This constant failure rate period is the useful operating life of the part or equipment, and reflects the period in which the measurement of equipment reliability is most relevant in the electronics industry. Failures of parts and systems are essentially random in this interval, ensuring that the failure rate is essentially constant as shown in figure 2.6.

Figure 2.5: Graph Depicting the Exponential Failure Distribution (f(t) versus t)

It has been observed that the failure distribution in this period, represented by f2(t) in equation 2.1, may be described by the exponential distribution stated in equation 2.3 [12].

f2(t) = λ e^(-λt)        equation 2.3

where λ is the constant failure rate, and θ = 1/λ is the Mean Time Between Failures (MTBF), defined as the arithmetic average of the failure-free intervals for non-repairable systems [1, 12].

Figure 2.6: Graph Depicting a Constant Failure Rate for the Exponential Distribution (hazard rate z(t) versus t)

The assumption of an exponential distribution of failures in the useful operating life of electronic equipment is a very important one for the work done in this thesis. As described later, this assumption allows the failure rate of large non-redundant systems to be expressed as the sum of the constituent part failure rates by using the product rule of reliability.
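A small numerical check of this point, with made-up part failure rates and assuming the exponential model throughout: the product of the exponential part reliabilities equals a single exponential whose rate is the sum of the part rates, so a non-redundant system behaves like one part with the summed failure rate.

```python
import math

# Illustrative constant part failure rates (failures per hour). Under the
# exponential assumption, series (non-redundant) system reliability is the
# product of part reliabilities, so the system rate is the sum of part rates.
part_rates = [2e-6, 5e-6, 1e-6, 8e-6]
t = 10_000.0  # operating hours

r_product = math.prod(math.exp(-lam * t) for lam in part_rates)
r_sum = math.exp(-sum(part_rates) * t)

assert abs(r_product - r_sum) < 1e-12
print(round(r_sum, 4))  # 0.8521
```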

In order to analyze the parallel configurations used in this research work, another measure of system performance needs to be introduced. Just as reliability can be defined as the probability of successful operation for time t, so can another measure of system performance, called unreliability, be defined: it is the probability of failure for the time interval t. Clearly, the probability of either failure or success is equal to 1, i.e.,

R(t) + Q(t) = 1 equation 2.4

for operating time t, where R(t) is the reliability and Q(t) is the unreliability [12].

For a constant failure rate,

R(t) = e^(-t/θ)        equation 2.5

Q(t) = 1 - e^(-t/θ)        equation 2.6

Figure 2.7 [12] shows the reliability and unreliability curves for a constant failure rate.

Figure 2.7: Reliability and Unreliability Curves for a Constant Failure Rate (R(t) = e^(-t/θ) and Q(t) = 1 - e^(-t/θ), with R + Q = 1; at t = θ, R ≈ 0.37 and Q ≈ 0.63)
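Under the same constant failure rate assumption, equations 2.4 through 2.6 can be verified numerically. The MTBF value θ below is illustrative only; the check confirms that R and Q always sum to one and that at t = θ the curves take the values marked in figure 2.7.

```python
import math

theta = 2000.0  # illustrative MTBF in hours

def reliability(t, theta):
    """Equation 2.5: R(t) = exp(-t/theta) for a constant failure rate."""
    return math.exp(-t / theta)

def unreliability(t, theta):
    """Equation 2.6: Q(t) = 1 - exp(-t/theta)."""
    return 1.0 - reliability(t, theta)

# Equation 2.4: R(t) + Q(t) = 1 at every operating time t.
for t in (0.0, theta / 2, theta, 3 * theta):
    assert abs(reliability(t, theta) + unreliability(t, theta) - 1.0) < 1e-12

# At t = theta: R = 1/e ~ 0.37 and Q ~ 0.63, the values marked in figure 2.7.
print(round(reliability(theta, theta), 2))    # 0.37
print(round(unreliability(theta, theta), 2))  # 0.63
```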

The Poisson distribution is also an extremely useful distribution in the reliability prediction of electronic equipment. It characterizes rare events and represents the probability of an event occurring a given number of times in a given interval of time. It can be seen from equation 2.7 that the exponential distribution is only a special case of the Poisson distribution, specifically when considering the probability of the first failure. Assuming that the total number of operating parts remains the same, the Poisson distribution can be described as

f(x) = e^(-λt) (λt)^x / x!        equation 2.7
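A brief sketch of equation 2.7 with illustrative values of λ and t: evaluating the Poisson probability at x = 0 (no failure in the interval) recovers the exponential survival probability e^(-λt), which is the sense in which the exponential is a special case of the Poisson.

```python
import math

def poisson_pmf(x, lam, t):
    """Equation 2.7: probability of exactly x failures in time t at rate lam."""
    return math.exp(-lam * t) * (lam * t) ** x / math.factorial(x)

lam, t = 1e-4, 5000.0  # illustrative failure rate (per hour) and interval

# x = 0 (no failures in [0, t]) gives the exponential survival probability.
assert abs(poisson_pmf(0, lam, t) - math.exp(-lam * t)) < 1e-15

# Probabilities over all failure counts sum to one.
assert abs(sum(poisson_pmf(x, lam, t) for x in range(50)) - 1.0) < 1e-9
```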

It has been observed that the failure distribution in the wear out period, represented by f3(t) in equation 2.1, may be described by the Gaussian distribution stated in equation 2.8 [29]. In this period, a large number of independent causes related to wear or aging mechanisms act together to cause the failure rate to increase rapidly.

This study does not analyze electronic components in the wear-out period since it is widely accepted that critical electronic equipment is overhauled well before it enters this aging period [41, 12].

β(t) = [1 / (σ(2π)^(1/2))] exp[ -(t - μ)² / (2σ²) ]     equation 2.8

where μ is the location parameter and σ is the scale parameter.

The bathtub failure rate curve has been widely acknowledged as an excellent indicator of the reliability behavior of electronic parts and systems [29]. Depending upon the life cycle stage of observation and the data gathered, this behavior can be represented by various other probability distributions. Many of these distributions are theoretically relevant in reliability mathematical modeling. In practice the exponential, Poisson, Rayleigh, Normal, Gamma and Weibull distributions are the probability distributions commonly employed to model the reliability of electronic components [29, 41].

2.5 Basis for the Assumption of a Constant Failure Rate

Renewal theory studies by J. L. Doob et al. [32] have concluded that the total system failure rate remains essentially constant if the failure rates of the constituent parts remain constant. In practical applications, although a constant failure rate cannot be expected for every type of part that makes up a complex system, diverse part types appearing in large quantities and operating under varied conditions tend to display an average constant failure rate for large electronic systems.

Failure phenomena may be measured either by measuring the time between failures in a given observation period or by counting the number of failures occurring in successive sub-periods of equal length. Studies conducted by B. Epstein et al. discuss a number of analytical procedures for testing the assumption of a constant failure rate, among them the Kolmogorov-Smirnov (K-S) test and the Chi-square test applied to field and factory test results. Several field studies carried out by the Aeronautical Radio Inc. (ARINC) research corporation have also confirmed that a constant failure rate is displayed by a large majority of complex electronic equipment [29, 20].

Figure 2.8 [29], based on MOD data, shows the average number of failures per unit interval of the test. Interval AB represents the early infant-mortality period during which the failure rate is initially high and then declines rapidly as debugging and system harmonization occur. CD represents the maturity period during which failure occurrences become essentially random, causing the failure rate to be constant.

[Figure: failure rate versus test period, declining through interval AB and leveling off to an essentially constant value over interval CD.]

Figure 2.8 (Adapted from [29]): Depiction of a Constant Failure Rate from Field Data

2.6 Introduction to the Mil-Hdbk-217E Methods of Failure Rate Analysis

The primary models for failure rate analysis employed in this thesis are those stated in Mil-Hdbk-217E. These rely upon two methods of reliability estimation applicable during different stages of the life cycle: the parts count method and the parts stress analysis method.

2.6.1 Why this Research Work does not Employ the Parts Count Model

The parts count method is applicable in the early design phase and is not a very exact method. The information required to apply this method includes a) generic part types and quantities, b) part quality levels, and c) equipment environment. The objective is to obtain a reliability estimate that can be expressed as a failure rate. From this value the reliability and the Mean Time Between Failures (MTBF) can be developed. Assuming that an electronic assembly or system will fail when any of its parts fails, the failure rate of the item may be expressed as the sum of the failure rates of its parts [29, 20].

λ = Σ (i = 1 to n) λ_i                            equation 2.9

where λ_i is the failure rate of the ith part, and n is the number of parts.

It has been observed that families of electronic parts share some commonality with respect to their reliability behavior. This makes it possible to derive a reliability estimate from a rough estimate of the total system complexity. Individual failure models and circumstantial conditions of each part are not needed for this estimation. The main objective of population analysis is to estimate gross part failure rates for populations of electronic equipment. The reliability of the total system may be estimated from these failure rates based on system complexity in terms of total parts count [29, 20]. Clearly this method relies heavily on the analyst's ability to estimate parts populations and assign part failure rates, indicating the possibility of inconsistencies between estimates made by different engineers on the same system, depending upon personal experience with a given product line. Equation 2.10 gives the generalized model for the parts count method of analysis.

λ = Σ (i = 1 to n) N_i (λ_g π_Q)_i                equation 2.10

where λ is the total equipment failure rate (failures/10^6 hr), λ_g is the generic failure rate of the ith generic part (failures/10^6 hr), π_Q is the quality factor for the ith generic part, N_i is the quantity of the ith generic part, and n is the number of different generic part categories.
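The parts count computation of equation 2.10 amounts to a quantity-weighted, quality-adjusted sum, which can be sketched in Python (illustrative only; the parts list, rates and factors below are hypothetical, not taken from Mil-Hdbk-217E):

```python
def parts_count_failure_rate(parts):
    """Parts count estimate (equation 2.10): total failure rate is the
    sum over generic part categories of quantity * generic rate *
    quality factor, in failures per 10^6 hours."""
    return sum(n * lam_g * pi_q for (n, lam_g, pi_q) in parts)

# Hypothetical parts list: (quantity, generic failure rate, quality factor)
parts = [
    (10, 0.0050, 1.0),   # e.g. fixed resistors
    (4,  0.0200, 2.0),   # e.g. capacitors
    (2,  0.1000, 1.5),   # e.g. microcircuits
]
lam_total = parts_count_failure_rate(parts)
mtbf = 1e6 / lam_total  # MTBF in hours, since rates are per 10^6 hours
print(round(lam_total, 3))  # 0.51
```

The MTBF then follows directly as the reciprocal of the total failure rate, as stated in the text above.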

2.6.2 Introduction to the Stress Analysis Model of Failure Rate Estimation

The work done in this study relies heavily on this complex method of failure rate estimation as modeled in Mil-Hdbk-217E. The influence of stress on the reliability of electronic parts is extremely complex. This is because the component must be able to withstand varying degrees of electrical, mechanical and environmental stresses, along with the interactions between them, and must at the same time work with other components to achieve the desired functionality [29, 20]. Stress analysis is a careful examination of the inherent properties of each electronic part, its typical operating environment, and the specific application it is expected to perform, in order to estimate its reliability. Stress analysis, however, is an extremely exacting process and must consider the factors that could adversely influence each contributing part in its individual packaging and typical environment.

2.6.3 Why Stress Analysis is Complex

This method may be employed only when a detailed parts list can be constructed and individual part stresses are available. The work done in this study utilizes the stress analysis procedures prescribed by Mil-Hdbk-217E, which is widely considered to be the most reliable source for this method in research and industry [20]. The Mil-Hdbk-217E methodology is the result of intensive data-collection efforts, experimentation and analysis done by the Department of Defense on military electronic equipment. The models developed therein relate engineering factors such as potential stress with reliability measures such as failure rate [29, 41]. The complexity of this analysis arises because each individual part may have one or more major failure modes, and each mode may have several failure mechanisms. The models identify and manipulate the factors inducing the major modes of failure, and the mechanisms inducing those failures, for several different part types. These processes are extremely rigorous and are accomplished through field failure analysis.

The general form of the part failure model is [5, 14]:

λ_p = λ_b (π_E × π_A × π_Q × ... × π_n)           equation 2.11

where λ_b is the base failure rate, π_E is the environmental adjustment factor, π_A is the application adjustment factor, π_Q is the quality adjustment factor and π_n represents additional adjustment factors.
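The multiplicative structure of the general part failure model can be sketched in Python (illustrative only; the numeric factors in the example are hypothetical placeholders, not handbook values):

```python
from math import prod

def part_failure_rate(lam_b, *pi_factors):
    """General part failure rate model (equation 2.11): the base
    failure rate lam_b multiplied by all applicable pi adjustment
    factors (environment, application, quality, ...)."""
    return lam_b * prod(pi_factors)

# Hypothetical factors: environment 2.4, application 1.0, quality 15.0
print(round(part_failure_rate(0.01, 2.4, 1.0, 15.0), 2))  # 0.36
```

Because the model is a pure product, any factor equal to 1.0 leaves the estimate unchanged, which is why factors that do not apply to a given part type are simply omitted.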

The generation of failure rate data by stress analysis is very time-consuming and laborious because of the detail involved in computing the failure rate for each part from an elaborate model. The advantage, however, is the widely accepted opinion that parts stress analysis produces far more accurate reliability estimates [5, 4].

2.6.4 Types of Potential Stresses

Reliability stress analysis and the resulting prediction involves a multiple-step consideration of influencing factors that typically include [29, 5]:

i. Typical operating and extreme temperatures.

ii. Electrical and mechanical stresses.

iii. Packaging of parts.

iv. Electro-mechanical interaction.

v. Electro-thermal interaction.

vi. Miscellaneous environmental factors including radiation.

Chapter 3

Stress Analysis and Mil-Hdbk-217E Methodology

3.1 Break Down of Systems into Component Parts

Several large complex electronic controller designs were considered in this study. Each electronic controller consisted of several assemblies of individual Printed Circuit Boards (PCBs). Each PCB in turn consisted of several individual electronic components. It is assumed that within a given PCB there is no redundancy and that all parts in a PCB operate in series for the purpose of failure rate calculation. Under this assumption, the chain rule of reliability may be applied to calculate the overall system reliability. Since all parts are assumed to display exponential failure characteristics, the failure rate of the system is calculated as the sum of the failure rates of its constituent parts [29, 41]. Every PCB used in creating the large systems considered in this study was analyzed to identify each constituent part and the context in which it is applied, as shown in figure 2.2. This process yielded a qualifying parts list consisting of several hundred components. Mil-Hdbk-217E models and methods were then employed to calculate the failure rates of every component in the qualifying parts list. These methods included identification of the factors that influence the specific part type, its inherent properties and the context of its application, followed by a series of table look-ups to calculate its failure rate.

Using this parts list, the failure rate of every PCB was calculated. In the reliability estimation software developed in this study, the concept of redundancy is introduced, allowing the designer to specify redundant configurations of any PCB and measure the effect of that choice on the overall reliability of the system.

3.2 Literature Research of Mil-Hdbk-217E

Major research efforts in reliability prediction of electronic systems were first conducted by the Advisory Group on the Reliability of Electronic Equipment (AGREE) in the 1950s. This study was initiated by the U.S. Department of Defense to investigate causes of unreliability in military equipment and to determine standardized ways of measuring and improving reliability. The failure rate estimation methods that evolved from that study, together with other contributions from the Rome Air Development Center (RADC), eventually came to be known as the Mil-Hdbk-217 reliability prediction methodology for electronic equipment [11, 32].

Specialized measurement techniques, and testing under operating conditions simulated to match the stresses experienced by equipment similar to that to be deployed, were used to produce field failure rate data. The resulting data is used to estimate the future reliability of the part based on statistical methodologies and mathematical models. These models quantify the effect of factors that affect reliability such as temperature, manufacturing quality, operating environment, and electrical stress [32, 37].

This research work has relied upon the stress analysis techniques and models described in Mil-Hdbk-217E to calculate all part failure rates. The following sections describe stress analysis and some examples of the techniques and models that were used in the reliability estimations.

3.3 Stress Analysis

All electronic parts experience internal and external stresses during their operating lifetimes that affect their failure rates. Internal stresses could arise from factors such as faulty packaging materials and sub-standard manufacturing techniques. External stresses could include parameters such as voltage, current and temperature that could eventually be specified as ratings in the specific application in which the parts are deployed. External stresses could also include environmental and related stresses that may eventually be specified as optimum operating conditions. It is important to note that stress analysis does not predict lot-dependent or design-dependent failures, nor does it analyze stresses on any particular part. It is intended to draw a statistical summary of the average reliability of the population of parts built to the same construction specification, rather than of any particular item in that population [29, 30].

Stress analysis enables the designer to formulate a set of standard prescribed procedures to be applied to each part type in estimating its reliability. These procedures identify the parameters that would influence the failure rate of the part, and then quantify each contributing parameter to formulate failure-rate models such as those described in Mil-Hdbk-217E [29, 41, 30].

3.4 Requirements of the Mil-Hdbk-217E Model

The models used in this thesis are based on stress analysis techniques and assume that certain requirements have been satisfied. These include [26, 25, 30]:

i. Only mature parts that exhibit a constant failure rate are considered. The part should not be in the infant-mortality period nor in the wear-out period of its life cycle, where the failure rate is not constant.

ii. Each part has been manufactured and maintained according to established quality-control practices. This helps to eliminate misleading rogue failures.

iii. Each part being examined is intended for, and applied in, applications whose limiting conditions are specified.

iv. Only independent primary failures are considered; repairs and secondary failures are not considered in this study.

3.4.1 Development of a Generalized Failure Rate Model

The development of a failure-rate model starts with the identification of the critical parameters that represent factors influencing the reliability of the component. Practical limitations on the availability of data, as well as the need for flexibility, lead to some of these parameters being discarded in the final model. It has also been observed that inter-dependencies occur among many parameters, enabling their combination into more general parameters that represent their net effect. Some of these parameters are applied in calculating the base failure rate while others are included as factors that modify the base failure rate in a given context. The consideration of all these factors results in a simple model that is reasonably accurate for the population, and yet is dynamic and flexible enough to allow modification in order to accommodate new techniques [41, 30]. The model also needs to provide sufficient protection against design and usage variances that may affect reliability [30].

A valid prediction model depends critically on the ability to obtain valid data sets with regard to the factors that affect reliability. In cases where practical considerations limit the collection of optimal data, contextual analysis and data from related studies are used to estimate failure rates for that condition. Differences in manufacturing, operating environments and other factors cause the failure rates of individual parts to deviate from the population average. In situations where the usage rate is high, such as the one considered in this study, the model is expected to be fairly accurate because averaging effects tend to cancel the deviations out. Mil-Hdbk-217 models are theoretically expected to be impervious to the effect of the quantity of any given part. The generalized model is then customized in each situation to reflect the inherent properties of the individual part, the context of the application and the relevant operating conditions. Failure rates provided by Mil-Hdbk-217E models are point estimates with no confidence intervals, and the statistical uncertainty in the resulting estimation must be acknowledged. It must be emphasized that the main objective of Mil-Hdbk-217E is to provide a common basis for comparing the estimated reliability of competing designs [41, 25, 30].

3.4.2 Generalized Mil-Hdbk-217E Failure Rate Model

Mil-Hdbk-217E provides failure-rate models for a large number of electronic parts. The parts are classified into major categories for analysis and, wherever required, are further grouped into sub-categories, examples of which are shown in table 3.1 [26].

The generalized failure rate model for electronic parts can be stated as [30]:

λ_p = λ_b π_a π_b ... π_n                         equation 3.1

where λ_p is the estimated part failure rate, λ_b is the base failure rate, and π_a, π_b, ..., π_n are adjustment factors for various stress conditions as described in the next section.

Microelectronics | Capacitors | Printed Wire Boards | Switches | Resistors

Table 3.1: Examples of Part Categories

• λ_b: The Base Failure Rate

The base failure rate in the generalized equation relates the influence of electrical and temperature stresses to the reliability of the part, as shown in figure 3.1 [18]. It is estimated from test data for each generic part type and is influenced by the electrical stress level at given operating temperatures. The base failure rate can be calculated from a specific model equation or can be looked up manually for typical values of stress and temperature. Base failure rates must be obtained from values actually present in the tables to be valid. Extrapolation of data into the empty spaces of the tables yields completely invalid results and indicates the usage of parts in over-stressed regions. Note that all part models are based upon the constant-failure-rate assumption described in previous sections. Every part failure is assumed to be an independent failure, not associated with its connection into circuit assemblies [41, 26, 30].

[Figure: base failure rate (λ_b) versus temperature (T), with a family of curves for increasing electrical stress levels.]

Figure 3.1 (Adapted from [18]): Effect of Electrical Stress and Temperature on the Base Failure Rate

• π_Q: The Parameter Representing Part Quality

The contributing effects of quality in manufacturing control, fabrication and testing on reliability are represented in the generalized equation by the parameter π_Q. Parts of the same type may be classified as being manufactured under various quality levels depending upon the standards that were used during manufacture and testing. See table 3.2 [26] for examples. A set of requirements in manufacturing and testing control must be satisfied before a part can be designated as procured under a given level [41, 26].

Quality Level | Manufacturing Description
S             | In accordance with MIL-M-38510 Class S
B             | In accordance with MIL-M-38510 Class B
Lower         | Standard Commercial Grade

Table 3.2: Examples of Quality Levels

Part quality is primarily determined by grades of design, production and testing practices. A point worth mentioning is the accepted opinion that high-quality equipment should consist of high-quality parts, and the same high quality standards must be met at the system level for the failure rates predicted by the model to be valid [26].

• π_E: The Parameter Representing Environmental Conditions

The impact of all environmental conditions other than ionizing radiation and temperature is represented in the generalized equation by the parameter π_E. This parameter consolidates sub-parameters representing the various environmental conditions that affect the reliability of the part. In cases where parts may be exposed to more than one consolidated environment during their operating life, the reliability analysis of that equipment should be segmented to include each distinct operating environment. An example is equipment in spacecraft, which is subjected to two different environments, missile launch and space flight, during missions [41, 26]. Table 3.3 [26] shows some typical environmental conditions that are accounted for in the Mil-Hdbk-217E models.

Environment   | π_E | Description of Operating Conditions
Ground Benign | GB  | Close to no stress with optimum operation.
Ground Fixed  | GF  | Installation in permanent racks with adequate cooling.
Ground Mobile | GM  | More severe conditions with possible vibration and shock.

Table 3.3: Examples of π_E Factors

• π_A ... π_N: The Parameters Representing Application Adjustment Factors

Secondary stresses and specific application factors that are unique to the nature of the equipment are represented in the generalized equation by the factors π_A ... π_N, where A ... N depend upon the particular part type being considered. For example, π_R represents the parameter adjusting for the effect of ohmic values in resistors. Table 3.4 [26] shows some adjustment factors for resistors and capacitors considered by Mil-Hdbk-217E [41, 26].

Part Type | π Factor | Description
Resistor  | π_R      | Adjusts for the effect of ohmic values in resistors.
Resistor  | π_V      | Adjusts for the effect of voltage in variable resistors.
Capacitor | π_CV     | Adjusts for the effect of capacitance values in capacitors.

Table 3.4: Examples of Application Adjustment Factors

π_E and π_Q factors are used in all part failure rate models, whereas other π factors such as π_R and π_CV are applicable only in certain models, depending upon the individual component being considered and its application context.

3.5 Importance of Thermal Analysis

A very important requirement for the validity of the model is that the temperatures at which the parts are expected to operate must be specified before any of the models are used, since the reliability of electronic components is extremely sensitive to thermal aspects. Thermal analysis is an important subject that needs to be included in trade-off studies of performance, reliability, physical properties and costs.

3.6 Example Illustrating the Procedure Employed to Calculate the Estimated Failure Rate of a Component using Mil-Hdbk-217E

This section illustrates, with the help of an example, the typical procedure followed in this study to calculate the estimated failure rate of an individual component in accordance with the Mil-Hdbk-217E methodology. In this example, the failure rate of a ten-thousand ohm, type RN fixed-composition resistor rated at 0.4 watts is estimated using the appropriate models from Mil-Hdbk-217E. Assuming that the resistor is dissipating 0.2 watts in the application, the failure rates at 90°F, 120°F and 140°F are calculated. This is done for every part used in a system to provide the designer with a relative perspective of temperature effects. The model that describes the failure rate of this resistor is given by [26]:

λ_p = λ_b (π_E × π_R × π_Q) failures / 10^6 hours      equation 3.2

where λ_p is the part failure rate, λ_b is the base failure rate, π_E is the environmental factor, π_R is the adjustment factor for resistors, and π_Q is the quality factor. As previously stated, the base failure rate is a function of the electrical stress and temperature, and is represented in tables such as table 3.5 [20]. Mathematical equations may also be used to compute the base failure rate instead of the table look-up methods used in this study.

• Step 1: Calculation of the Base Failure Rate.

Using table 3.5 [26] for typical values of stress and temperature, the stress ratio is first computed as [26, 30]:

S = Applied power / Rated power                   equation 3.3
  = 0.2 / 0.4
  = 0.5

Since table 3.5 contains the typical values of interest, the corresponding base failure rates for ambient temperatures of 90°F, 120°F and 140°F are identified:

At 90°F, λ_b = 0.0024; at 120°F, λ_b = 0.0033; and at 140°F, λ_b = 0.0040.

[Table 3.5 lists base failure rates (failures/10^6 hrs) for ambient temperatures from 0°C to 170°C (rows) against stress ratios from 0.1 to 1.0 (columns). Empty spaces in the table indicate over-stressed regions.]

Table 3.5 (Adapted from [18]): Base Failure Rates for Resistors at Various Combinations of Temperature and Stress Factor Values.

If the table does not show a corresponding base failure rate value for the specified values of stress and temperature, then the model assumes that the resistor would be operating outside of its rated conditions and recommends redesign for the model to be valid.

• Step 2: Estimation of the Environmental Mode Factor (π_E)

Assume the application being considered requires an environmental mode factor of Ground Fixed (GF). From table 3.6 [41] the value of π_E would be equal to 2.4.

Environment | π_E
GB          | 1.0
GF          | 2.4
GM          | 7.8

Table 3.6: Examples of π_E Values for Different Environments

• Step 3: Estimation of the Application Adjustment Factor for Resistors (π_R)

Ohmic Range   | π_R
0 to 100K     | 1.0
> 100K to 1M  | 1.1
> 1M to 10M   | 1.6
> 10M         | 2.5

Table 3.7: Examples of π_R Values for Different Ohmic Values

The resistor under examination has a resistance of 10,000 ohms. This corresponds to a resistance factor (π_R) of 1.0 from table 3.7 [26].

• Step 4: Estimation of the Quality Factor (π_Q)

The various quality factors included in this model are shown in table 3.8 [26]. In this study, the quality level of the part being examined in this example is accepted to be Lower, which corresponds to a π_Q of 15.

Manufacturing Level | π_Q
S                   | .03
B                   | .1
Lower               | 15

Table 3.8: Examples of π_Q Values for Different Manufacturing Levels

• Step 5: Calculation of the Part Failure Rate from Equation 3.2

Inserting the values identified above in equation 3.2:

At 90°F:  λ_p = 0.0024 × 1.0 × 15 × 2.4 = 0.0864 failures / 10^6 hrs

At 120°F: λ_p = 0.0033 × 1.0 × 15 × 2.4 = 0.1188 failures / 10^6 hrs

At 140°F: λ_p = 0.0040 × 1.0 × 15 × 2.4 = 0.1440 failures / 10^6 hrs
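The five steps above can be collected into a short Python sketch (an illustrative aid only, not the reliability estimation software developed in this thesis; the table values are exactly those quoted in the worked example):

```python
# Base failure rates read from table 3.5 at stress ratio S = 0.5,
# as quoted in Step 1 of the worked example above.
BASE_RATES = {90: 0.0024, 120: 0.0033, 140: 0.0040}  # °F -> failures/10^6 hrs

PI_E = 2.4   # Ground Fixed environment (table 3.6)
PI_R = 1.0   # 10,000 ohm resistance (table 3.7)
PI_Q = 15.0  # "Lower" quality level (table 3.8)

stress_ratio = 0.2 / 0.4  # applied power / rated power = 0.5 (equation 3.3)

for temp_f, lam_b in BASE_RATES.items():
    lam_p = lam_b * PI_E * PI_R * PI_Q  # equation 3.2
    print(f"{temp_f}°F: {lam_p:.4f} failures / 10^6 hrs")
# 90°F: 0.0864, 120°F: 0.1188, 140°F: 0.1440
```

The same script structure, with a different set of tables and π factors, applies to each of the other part types evaluated in this study.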

The above example illustrates the procedure followed for most of the several hundred parts evaluated in this research work. While this procedure is fairly typical for most part types, each part type involves a series of table look-ups that are unique to it.

3.7 K-Factors and Modal Analysis: Some Modern Concepts

Rome Air Development Center (RADC) studies conducted on airborne electronic systems have concluded that the Mean Time Between Failures (MTBF) decreases by about 9% for every five-degree-centigrade increase in ambient temperature; this relation is valid over a limited range. In recent years, such specialized knowledge of a particular application environment has led to an experimental concept called K-factors, which attempts to represent the effect of factors over and above the main electrical and temperature stresses. These factors are applied to the final failure rate obtained by stress analysis, using knowledge gained through experience in the particular environment. For example, if the base failure rate equals 10% per 10^3 hours and the equipment were used in missiles, the adjusted failure rate would be 10% per 10^3 hours multiplied by the K-factor for this part when deployed in missiles [29]. While this concept is not as widely accepted as the Mil-Hdbk-217 models, researchers are increasingly recommending the use of experience-based modification factors during the application of the strict Mil-Hdbk-217 models [23].
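The K-factor adjustment is a single multiplication on the stress analysis result, sketched below in Python (illustrative only; the K-factor value is hypothetical, since K-factors come from organizational field experience rather than from Mil-Hdbk-217E):

```python
def apply_k_factor(stress_analysis_rate, k_factor):
    """Apply an experience-based K-factor to a failure rate obtained
    from stress analysis. This is an experimental adjustment drawn
    from field experience, not part of Mil-Hdbk-217 proper."""
    return stress_analysis_rate * k_factor

# Hypothetical: a predicted rate of 0.10 per 10^3 hrs, adjusted by a
# hypothetical K-factor of 3.5 for missile deployment
print(round(apply_k_factor(0.10, 3.5), 2))  # 0.35
```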

3.8 Modal Analysis and Modification of Failure Rates

The systems evaluated in this research work account for redundant configurations of Printed Circuit Boards (PCBs) assembled at the system level. However, within a given PCB it is assumed that there is no redundancy among components. In such non-redundant part designs and their stress analyses, the mode of part failure is considered irrelevant. In some circuits that include redundancy, as shown in figure 3.3 [29], the mode of failure has an impact on the failure rate of the system. The study of such impacts is called Modal Analysis. The total part failure rate is multiplied by the estimated probability of the failure mode to obtain the final failure rate [29].
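The modal adjustment described above can be sketched in Python (illustrative only; the part rate and mode probability below are hypothetical values):

```python
def modal_failure_rate(total_rate, mode_probability):
    """Modal analysis: weight the total part failure rate by the
    estimated probability of the failure mode that matters in the
    redundant configuration."""
    return total_rate * mode_probability

# Hypothetical: a part with 0.2 failures/10^6 hrs, where the failure
# mode relevant to the redundant circuit accounts for 70% of failures
print(round(modal_failure_rate(0.2, 0.7), 2))  # 0.14
```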

3.9 Degree of Confidence in Stress Analysis

Estimates of failure rates computed with Mil-Hdbk-217E models are point estimates with no confidence intervals. However, several studies conducted in recent years have attempted to compare the failure rates obtained from reliability predictions using stress analysis with values actually observed in field conditions on the same equipment [20, 29]. Results from such studies have shown that most measurements estimated by stress analysis fall within the 90% confidence interval of the field estimate.

3.10 The Effect of Variable Operating Conditions

The exponential distribution adequately describes the reliability behavior of electronic parts in their useful lifetimes, under the assumption of a constant failure rate. As described earlier, stress analysis and the Mil-Hdbk-217E methodology yield MTBF values under a given set of operating conditions. Studies have been conducted to observe the reliability behavior of equipment deployed in operating conditions that vary from those originally intended. Studies conducted by T. C. Reeves [29] demonstrate a relative comparison between the temperature change profile during a specified observation period and the corresponding failure rate profile. It was seen that failure rates vary significantly with temperature, increasing as the temperature rises and decreasing as the temperature falls.

3.11 Critical Review of Mil-Hdbk-217E and Modern Controversies

Mil-Hdbk-217E has been widely acknowledged in the scientific arena to be the most reliable resource for the estimation of failure rates of electronic parts. Recently, however, a number of critical opinions have been expressed [23, 31, 32, 3, 16]. One frequent complaint is that the Mil-Hdbk-217 model requires strict compliance with the parameters stated therein for any predicted value to be valid. Puran Luthra emphasizes that reliability knowledge and individual contextual experience in a given organization must be used hand in hand with Mil-Hdbk-217 models, and that the designer should appropriately apply adjustment factors to the said models [23, 34, 32]. It has also been stated that the cooling factor, card configuration, method of soldering and frequency of operation are particular areas in which individual organizational experience must be combined with the models stated in Mil-Hdbk-217 for reliable results [23].

It is widely accepted that failure rates predicted by Mil-Hdbk-217 are many times higher than those observed in the field. Proponents of Mil-Hdbk-217 [25] argue that conservative estimates are always preferable, while dissenters maintain that a conservative approach does not give a true picture of the situation.

An acknowledged weakness in the Mil-Hdbk-217 models is the constant-failure-rate assumption. This assumption precludes reliability growth assessment during burn-in periods and failure rate estimation during wear-out periods. Pecht and Kang [34] conclude in their study that the Mil-Hdbk-217 models incorrectly assume that each component is affected only by the specified environment and that there is no interaction between the components themselves.

Leonard [37] contends that the Mil-Hdbk-217 model does not accurately depict the relationship between temperature and failure rate, causing errors in the cost-benefit analysis of cooling system requirements. The author strongly condemns the accepted opinion that temperature exerts a strong influence on failure rate and asserts that there is no technical basis for that opinion. He cites examples where field studies were conducted on production boxes in which the temperatures of internal components were measured. It was found that the hottest boxes had better MTBFs, thereby indicating a contradiction with the MTBF versus temperature correlation predicted by the Arrhenius relationship used in Mil-Hdbk-217. He concludes that temperature is just another design variable in the moderate regions below 150 degrees F, and should not be counted as a major influence [37]. A widespread opinion is that the temperature-to-failure relationship stated in Mil-Hdbk-217, which is based upon the Arrhenius formula for reaction kinetics in physics and chemistry, is inapplicable for modern electronic parts, which typically do not suffer from appreciable physical or chemical degradation. In his research paper, O'Connor challenges the assumption that designs with more parts are less reliable, and argues that an increased number of parts may actually increase system reliability through over-stress protection and compensation [32, 30, 29]. Mil-Hdbk-217 assumes that a plastic encapsulated commercial grade part has a failure rate 20 times higher than a military grade part. Critics argue that current evidence suggests this substantial difference may no longer hold because of changes in the quality of commercial parts, which today are as good as military parts [32, 37].

It should be noted that the preceding discussion of controversies in Mil-Hdbk-217E methodologies represents opinions expressed by a minority in the scientific arena. These opinions, while largely unproven, serve to provide a diverse perspective on this complex field of study. As mentioned in several sections of this thesis, it should be clear to the reader that Mil-Hdbk-217E, with the data and methodology contained therein, is still accepted to be by far the best source of electronics reliability evaluation information available to the scientific community. The participants of this thesis work concur with this view and hence have employed Mil-Hdbk-217E in the reliability estimation procedures used here.

Chapter 4

System Reliability and E-R Modeling

4.1 Estimating Sub-System Reliability

As described in chapter 3, Mil-Hdbk-217E methodologies were applied to calculate the failure rates of each of the several hundred components considered in this study in order to create a qualifying parts list. Assuming that, in a given Printed Circuit Board (PCB), failure of a component causes PCB failure and that the constituent components are applied within their constant-failure-rate phase, the PCB can be assumed to also exhibit a constant failure rate [29]. This constant failure rate is described by equation 4.1.

λ_PCB = Σ (i = 1 to n) λ_i          equation 4.1

where λ_PCB is the constant failure rate of the PCB, and λ_i is the constant failure rate of the ith series component.

The corresponding reliability may be described by equation 4.2.

R(t) = exp( - Σ (i = 1 to n) λ_i t )          equation 4.2
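Equations 4.1 and 4.2 can be sketched in a few lines of Python; the component failure rates below are illustrative values only, not data from this study:

```python
import math

# Illustrative constant failure rates for three series components (failures/hour)
lambdas = [2.0e-6, 5.0e-7, 1.2e-6]

# Equation 4.1: the PCB failure rate is the sum of its series component rates
lambda_pcb = sum(lambdas)

# Equation 4.2: PCB reliability over a mission time t (hours)
def reliability(t):
    return math.exp(-lambda_pcb * t)

print(lambda_pcb)         # total failure rate, failures per hour
print(reliability(1000))  # probability of surviving a 1000 hour mission
```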

4.2 An Example of Sub-System Reliability Estimation

Every PCB considered in this study is made up of some combination of parts from the qualifying parts list. Since the failure rates of all qualifying parts have already been calculated at this point, equation 4.1 may be applied to calculate the sub-system failure rate. Table 4.1 illustrates this procedure for a simple digital output card made up of twelve types of components, each of which occurs in varying quantities represented by 'N'. The column "λ" contains the failure rate of each single component and the column "N*λ" contains the net failure rate of all appearances of that component. As before, the failure rates are calculated at three different temperatures to provide the designer with a relative perspective of temperature effects. All failure rates are per million hours. All MTBFs are in hours. This procedure was carried out for every PCB module being considered in this study to calculate its failure rate. The next chapter illustrates how this procedure is implemented dynamically by an automated system that can easily accommodate "what if" type questions regarding design parameters along with continuous changes in the constituent parts or their failure rates.

In table 4.1, λ_90 is the failure rate of a part at 90° F and N*λ_90 is the net failure rate of all appearances of that part in the PCB. λ_125 is the failure rate of a part at 125° F and N*λ_125 is the net failure rate of all appearances of that part in the PCB at 125° F. λ_145 is the failure rate of a part at 145° F and N*λ_145 is the net failure rate of all appearances of that part in the PCB at 145° F.

Part Description     λ_90     N*λ_90   λ_125    N*λ_125  λ_145    N*λ_145

Relay                0.149    0.297    0.175    0.351    0.207    0.414
10A Fuse             0.100    0.200    0.100    0.200    0.100    0.200
I/O Connector        0.006    0.011    0.006    0.011    0.006    0.011
Diode 1N4742A        0.011    0.022    0.063    0.126    0.093    0.185
Zener Diode          0.049    0.049    0.132    0.132    0.179    0.179
Diode                0.007    0.013    0.040    0.079    0.059    0.117
LED                  0.078    0.156    0.161    0.322    0.208    0.417
0.01 µF/500V Cap     0.013    0.053    0.014    0.056    0.014    0.057
100 Ω Resistor       0.003    0.013    0.006    0.026    0.009    0.037
2.2 KΩ Resistor      0.003    0.006    0.006    0.013    0.009    0.018
10 KΩ Resistor       0.003    0.003    0.006    0.006    0.009    0.009
PC Board             0.009    0.009    0.009    0.009    0.009    0.009

Total F.R.                    0.83              1.33              1.65
Total MTBF                    1204000           751518            604518

Table 4.1: Example of Sub-System Failure Rates Calculated in this Research Work
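As a check on the procedure, the 90° F column of table 4.1 can be totaled in a few lines of Python (net rates transcribed from the table; the small difference from the tabulated MTBF comes from rounding of the total failure rate in the source):

```python
# Net failure rates N*lambda_90 (failures per million hours) from Table 4.1
net_rates_90 = [0.297, 0.200, 0.011, 0.022, 0.049, 0.013,
                0.156, 0.053, 0.013, 0.006, 0.003, 0.009]

total_fr = sum(net_rates_90)   # total PCB failure rate per million hours
mtbf = 1e6 / total_fr          # MTBF in hours

print(round(total_fr, 2))      # 0.83, as tabulated
print(round(mtbf))             # roughly 1.2 million hours
```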

4.3 Reliability Estimation of a Simple Series System

Once the failure rates of all PCBs considered in this study have been calculated, the high-failure-rate PCBs are identified, and the resulting set of PCBs is then made available to the designer to facilitate building large complex systems. In the simplest series configuration, the failure rate of the final system is calculated as the sum of the failure rates of the PCBs used in the system. For the purpose of failure-rate calculation, this approach assumes that the system would fail if any constituent PCB fails [29].

4.3.1 Network Model of a Simple Series Configuration

In its simplest form, the desired system may be built and modeled as a series of PCBs required to achieve system functionality, as described by the reliability network model shown in figure 4.1 [36]. For such a series configuration, system reliability is given by the product of the constituent unit reliabilities and is represented by equation 4.3, called the chain rule [41, 12].

IN -> Part 1 -> Part 2 -> ... -> Part N -> OUT

Figure 4.1: Network Model of a Series Configuration

R_s(t) = Π (i = 1 to N) R_i(t)          equation 4.3

where R_i(t) = e^(-λ_i t) is the reliability of the ith unit, assuming a constant failure rate.

Equation 4.3 may be written as:

e^(-λ_s t) = e^(-λ_1 t) · e^(-λ_2 t) · ... · e^(-λ_N t)

therefore, e^(-λ_s t) = e^(-t(λ_1 + λ_2 + ... + λ_N))

and, λ_s = λ_1 + λ_2 + ... + λ_N          equation 4.4

where λ_s is the failure rate of the system, and λ_i is the failure rate of the ith unit.

From the above equations it is obvious that the reliability of series systems decreases as the number of components increases. This effect is depicted in figure 4.2, and is most obvious when individual components have low reliability [12].

(plot of R(t) versus t for an increasing number of series components)

Figure 4.2: Effect of Series Components on System Reliability

Once the high-failure-rate items have been identified and are deemed necessary for achieving system functionality, the designer should investigate the possibility of adding redundancy to the high-failure-rate items in order to minimize the contribution of their presence to system unreliability. It is important to note that an attempt should be made to improve system reliability by manipulating other circumstantial conditions, such as the operating environment, before considering redundancy as a viable option. This is because redundancy typically increases system complexity, volume and cost [29, 41, 12].

4.4 Introducing the Concept of Redundancy

In cases where the failure rate of a PCB is undesirably high, and other means of improving reliability have been exhausted, the designer may alleviate the part's adverse effect on system reliability by incorporating more than one unit in a redundant configuration in order to improve its net failure rate. This section examines how redundancy can be incorporated into the system to improve its reliability. Some common redundant configurations available to the designer at the sub-system level for achieving a specified reliability level of the overall system are also examined. Some of these techniques are incorporated into the automated system built in this study.

4.4.1 Types of Redundancy Techniques

As shown in figure 4.3, there are several types of redundancy techniques available to the designer to improve system reliability. Only a few of these techniques, namely the categories under parallel redundancy, were used here. However, a few additional useful mechanisms that could be exploited in more demanding situations will be described. The designer should seek to achieve a balance between desired reliability and other qualities of the system, such as cost and complexity [41, 12].

(taxonomy of redundancy techniques, including simple, duplex, bi-modal, majority vote, gate connector, operating and adaptive redundancy)

Figure 4.3 (Adapted from [1]): Types of Redundancy Techniques

4.4.2 Simple Parallel Redundancy Configuration

In this configuration, high-failure-rate items are identified, and for each identified unit, additional units of the same type are added in parallel such that only one needs to function for the configuration to be operational. Figure 4.4 shows the reliability network model of a simple parallel configuration. These configurations essentially provide protection against failures due to opens, and their reliability can be described by equation 4.5.

R_p = 1 - Π (i = 1 to n) Q_i          equation 4.5

where R_p is the reliability of the parallel configuration, and Q_i is the unreliability of the ith unit.

The corresponding unreliability of the configuration is given by equation 4.6.

Q_p = Π (i = 1 to n) Q_i          equation 4.6

This equation shows that the unreliability of a parallel configuration decreases with an increase in the number of parallel components, causing the reliability to increase [12].
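A brief sketch of equations 4.5 and 4.6 in Python, using assumed unit unreliabilities rather than values from this study:

```python
import math

# Assumed mission unreliabilities Q_i for three identical parallel units
unreliabilities = [0.1, 0.1, 0.1]

# Equation 4.6: configuration unreliability is the product of the Q_i
q_parallel = math.prod(unreliabilities)

# Equation 4.5: configuration reliability
r_parallel = 1.0 - q_parallel

print(q_parallel)  # about 0.001: three 90%-reliable units give ~99.9% reliability
```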

(Input -> n parallel units of Module 'x', #1 through #n -> Output)

Figure 4.4: Network Model of a Simple Parallel Configuration

In this simple parallel redundancy, the system is considered successful if at least one of the parallel units is working at any given time. The reliability of the redundant configuration improves significantly with the addition of redundant units. As is the case in this study, the assumption of a constant failure rate leads to the following description of reliability [12].

R(t) = 1 - Π (i = 1 to n) (1 - exp(-λ_i t))          equation 4.7

and the corresponding MTBFs for 2 and 3 unit parallel systems employed in designs considered in this study are given by equations 4.8 and 4.10 respectively.

For a 2 unit parallel configuration,

MTBF = 1/λ_1 + 1/λ_2 - 1/(λ_1 + λ_2)          equation 4.8

and when λ_1 = λ_2 = λ,

MTBF = 1/λ + 1/λ - 1/(2λ) = 3/(2λ)          equation 4.9

For a 3 unit parallel configuration,

MTBF = 1/λ_1 + 1/λ_2 + 1/λ_3 - 1/(λ_1 + λ_2) - 1/(λ_1 + λ_3) - 1/(λ_2 + λ_3) + 1/(λ_1 + λ_2 + λ_3)          equation 4.10

As applied in this study, when λ_1 = λ_2 = λ_3 = λ, equation 4.10 reduces to,

MTBF = 1/λ + 1/λ + 1/λ - 1/(2λ) - 1/(2λ) - 1/(2λ) + 1/(3λ)

or

MTBF = 11/(6λ)          equation 4.11

or in general, for n identical parallel units,

MTBF = (1/λ) Σ (i = 1 to n) 1/i          equation 4.12
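Equation 4.12 is easy to verify numerically; this sketch (with an assumed common failure rate) reproduces equations 4.9 and 4.11 as special cases:

```python
# Equation 4.12: MTBF of n identical parallel units with failure rate lam
def parallel_mtbf(lam, n):
    return sum(1.0 / i for i in range(1, n + 1)) / lam

lam = 1.0e-4  # assumed common failure rate, failures per hour

two = parallel_mtbf(lam, 2)    # equation 4.9 predicts 3/(2*lam)
three = parallel_mtbf(lam, 3)  # equation 4.11 predicts 11/(6*lam)

print(two, 3 / (2 * lam))
print(three, 11 / (6 * lam))
```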

As described in later sections, the automated system developed in this thesis allows the designer to specify double or triple simple parallel redundancy for high-failure-rate units and automatically consolidates the corresponding effect on system reliability [41, 12].

4.5 Bimodal Parallel/Series and Series/Parallel Configurations

These redundant configurations, described by the network model shown in figure 4.5, are a combination of series and parallel configurations of the same unit that can be specified by the designer to achieve the desired level of reliability and simultaneously achieve design objectives to guard against shorts and opens [41, 12]. In this study the designer is allowed to specify multiple parallel configurations placed in series, with the number of parallel units being restricted to a maximum of three.

(two two-unit series strings of PCBs placed in parallel, and two two-unit parallel pairs of PCBs placed in series, each between IN and OUT)

Figure 4.5: Network Model of Series-Parallel and Parallel-Series Configurations

The net MTBFs for series/parallel and parallel/series configurations are given by equations 4.13 and 4.14 respectively.

For series/parallel configurations,

MTBF = 3/(4λ)          equation 4.13

For parallel/series configurations,

MTBF = 11/(12λ)          equation 4.14
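The closed forms in equations 4.13 and 4.14 can be checked by numerically integrating the corresponding reliability functions (MTBF is the integral of R(t) under a constant failure rate); the code below assumes an arbitrary common unit failure rate:

```python
import math

lam = 1.0e-3  # assumed common unit failure rate (failures per hour)

def mtbf(R, t_max=30000.0, steps=300000):
    # Trapezoidal approximation of the integral of R(t) from 0 to t_max
    dt = t_max / steps
    total = 0.5 * (R(0.0) + R(t_max))
    for k in range(1, steps):
        total += R(k * dt)
    return total * dt

# Series/parallel: two two-unit series strings placed in parallel
def r_sp(t):
    return 1 - (1 - math.exp(-2 * lam * t)) ** 2

# Parallel/series: two two-unit parallel pairs placed in series
def r_ps(t):
    return (1 - (1 - math.exp(-lam * t)) ** 2) ** 2

print(round(mtbf(r_sp)))  # close to 3/(4*lam)  = 750 hours (equation 4.13)
print(round(mtbf(r_ps)))  # close to 11/(12*lam), about 917 hours (equation 4.14)
```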

4.6 Partial Redundancy

Partial redundancy, as depicted in figure 4.6, refers to situations wherein parallel redundancy is added only to the high-failure-rate units in a system and the rest of the series chain is left untouched. This concept allows the designer to achieve a balance between desired reliability and other factors such as cost, weight and volume [29, 41, 12].

(Module 'A' -> Module 'B' -> two parallel units of Module 'C' -> Module 'D')

Figure 4.6: Network Model of a Partially Redundant System

4.7 Some Complex Redundancy Configurations

This section shall touch upon a few complex configurations that are available to the system designer but have not been implemented in the automated system developed in this study. These complex configurations are justified if the reliability achievements are deemed to be worth the increase in system complexity and maintenance costs, along with the increased weight and volume.

4.7.1 K out of N Networks

In many cases, such as in fault tolerant computers, more than one unit needs to function in a simple parallel configuration for the system to operate successfully. This is called a K out of N network. In this configuration, K units out of N need to function for the configuration to be successful. Under the assumption of a constant failure rate, the MTBF of such a network is represented by equation 4.15.

MTBF = (1/λ) Σ (i = 0 to n-k) 1/(n - i)          equation 4.15
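A short sketch of equation 4.15 (assuming identical units), showing how it collapses to the series and simple parallel results at the two extremes:

```python
# Equation 4.15: MTBF of a k-out-of-n network of identical units
def k_out_of_n_mtbf(lam, n, k):
    return sum(1.0 / (n - i) for i in range(0, n - k + 1)) / lam

lam = 1.0e-4  # assumed unit failure rate, failures per hour

series = k_out_of_n_mtbf(lam, 4, 4)    # k = n: series behaviour, 1/(n*lam)
parallel = k_out_of_n_mtbf(lam, 4, 1)  # k = 1: simple parallel behaviour

print(series, parallel)
```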

(plot of MTBF versus failure rate for series, k out of n, and parallel configurations)

Figure 4.7: MTBF Versus Failure Rate for K out of N Networks

It can be seen from equation 4.15 that if all units need to function, that is, if k = n, the network model can be treated as a series model for the purpose of failure rate calculation. If only one unit needs to function, that is, if k = 1, the network model may be treated as a simple parallel model for the purpose of failure rate calculation. Figure 4.7 [12] depicts comparative MTBF graphs for series, k out of n and parallel configurations with the same number of units [29, 41, 12].

4.7.2 Majority Voting Redundancy

The concept of K out of N redundancy has been extended in what is known as majority voting redundancy, as shown in figure 4.8 [41]. Every parallel element feeds its signal into a voting analyzer, which compares the signal from each parallel device with those that it receives from every other parallel element and makes operating decisions only if the number of successful elements is greater than the number that have failed.

(n parallel PCBs feeding a voting analyzer between IN and OUT)

Figure 4.8: Model of a Majority Voting Redundant Configuration

4.7.3 Operating Redundancy

Another configuration used by designers is operating redundancy, as shown in figure 4.9 [1]. In this configuration, all redundant units operate continuously, and when the currently used unit fails, a sensor switches the connection over to the next working unit and stays there until that unit fails. This process continues until all units in the configuration have reached a failed state.

(parallel PCBs with a sensor-driven switch selecting the active unit between IN and OUT)

Figure 4.9: Model of an Operating Redundancy Configuration

4.8 Optimum Redundancy Levels

As described in previous sections, incorporating redundancy in electronic systems increases their reliability. However, increasing the number of redundant components also causes greater system complexity, increases the weight and size of the system, and adds to the cost [12]. This implies that redundancy should be added to the system only after a careful review of the context in which the system is applied, the criticality of the mission, identification of the high-failure-rate items, and a comparative analysis of other desired qualities of the system [29, 41, 12]. Figure 4.10 [29] shows the incremental gain in reliability as the number of parallel components is increased. It can be seen that the greatest improvement in reliability comes with the addition of the first component, and thereafter the incremental gain is reduced as more units are added [41, 12].

(plot of incremental reliability gain versus number of parallel elements, 1 through 5)

Figure 4.10: Incremental Gain in Reliability with Parallel Redundancy

Figure 4.10, adapted from [29], also shows the incremental improvement in reliability over that of a single component as parallel units are added. It is seen that the greatest improvement in reliability occurs with the addition of the first component and then begins to level off as the number of components increases [29, 12]. The improvement in reliability becomes insignificant with the addition of more than four units in parallel. Redundancy studies conducted on air traffic control systems [29] have shown that, from a cost-benefit point of view, a four-unit parallel configuration provided the best returns over a five-year period. It should be emphasized that while these examples provide guidelines, optimum levels of reliability must be analyzed only in the context of each application [29, 1, 12]. In the study conducted for this thesis, a maximum of three units in parallel was considered sufficient to achieve reliable operation of the systems that were examined.
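The diminishing return illustrated in figure 4.10 can be reproduced for hypothetical identical units of reliability r, where an n-unit parallel set has reliability 1 - (1 - r)^n:

```python
r = 0.9  # assumed single-unit reliability (illustrative, not from this study)

previous = 0.0
for n in range(1, 6):
    R_n = 1 - (1 - r) ** n   # reliability of n units in parallel
    gain = R_n - previous    # incremental gain from the nth unit
    print(n, round(R_n, 6), round(gain, 6))
    previous = R_n
```

Each added unit shrinks the incremental gain by a factor of (1 - r), so for 90%-reliable units the second unit contributes only a tenth of what the first did.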

4.9 Introduction to Techniques used in Modeling the Real World

Referring to the work done in this thesis, application of the aforementioned techniques has yielded data and knowledge that include the following:

i. Sets of failure rate data at three different temperatures for all possible parts that could be used in building the system.

ii. Identification of the possible objects in the problem domain.

iii. The domain knowledge of electronics reliability engineering needed to optimally configure various assemblies and systems from a reliability viewpoint and calculate their failure rates.

The following sections introduce the steps in the life cycle that transform the real-world data, objects and knowledge into computer representations in order to facilitate the design and development of an automated system that would assist engineers and designers in configuring large electronic systems and estimating their failure rates.

4.9.1 Transformation of the Domain into a Useful Computer Representation

The unified field theory of information [24] proposes that the various information structures, although syntactically different from each other, are basically transformations of each other. Real-world information may be represented by a variety of structures that include natural language sentences and their parses, relational mathematics and semantic entities. The basic theory is that, while human beings speak languages and are comfortable with pictures, graphs, and simple numbers, computers parse languages and store and manipulate complex data through relational algebra and complex predicate logic. Sentences and queries can be parsed and converted to semantic structures and predicate formulae that can be solved using relational mathematics. This basic concept has evolved significantly as knowledge in the field of computer modeling has grown [39, 24].

4.9.2 Controlled Iterative System Development Life Cycle

As shown in figure 4.11, the system development life cycle involves several distinct stages that a project goes through in its evolution towards a reliable production system. This section concentrates on the techniques involved in implementing the first three stages of this process [3]. After the overall strategy of system goal, design and development is determined, a controlled iterative development cycle is begun. Each iterative cycle begins with the identification of the major targets of that cycle. Effort is then directed towards achieving those intermediate goals by taking the system through typical design, development and internal testing cycles, with user feedback being used to correct any major changes in direction.

(iterative cycle of iteration development and iteration documentation, framed by the overall strategy and leading to production deployment)

Figure 4.11: Controlled Iterative System Development Life Cycle

Each such iteration takes the overall system further towards the final goal, at which point overall system testing is performed and the system is prepared for migration into a functional production system.

4.10 Data Driven Systems

The data-intensive, transaction-oriented application being developed in this thesis has at its core scientific data that has been derived through a detailed and intensive process. The required system is then built around this data. In recognition of the importance of such data, the Committee on Data for Science and Technology (CODATA) has accepted three classes of scientific data [38]:

i. Class A data, obtained by performing repeatable measurements on well-defined systems.

ii. Class B data, which includes observational values that could be time or space dependent and may not necessarily be verified by direct measurement.

iii. Class C data, which includes data obtained through statistical techniques.

Data of the type derived in this thesis involves the use of statistical techniques and essentially falls in class C. All data being considered are atomic [40] data types that are not divisible further without causing a loss of meaning. The first step in building a system that uses this data is to build a logical model that represents the real world. This consists of identifying all objects and entities of relevance in the project, carefully analyzing their attributes, and forming the relationships that assist in creating further associations and knowledge [40]. This study implements the data modeling techniques described in the next section to achieve such a representation.

4.11 Data Modeling

Several techniques of data modeling have evolved with the rapid development of systems in the field of scientific and industrial computing. These may be classified as hierarchical, network, binary, entity relationship, semantic and infological [24]. In this research study, advanced versions of the entity relationship model first introduced by Chen [29], along with semantic modeling, were applied to ultimately create a relational data model, following the relational model of E. F. Codd of the IBM San Jose research laboratory [24], for final implementation.

(transformation from the real world through conceptual modeling to a physical model)

Figure 4.12: Transformation of the Real World Into a Physical Model

Most researchers of data modeling agree that the relational model is the best model for the following reasons [24]:

i. It is based on solid mathematical theories such as relational set theory, relational algebra, and relational calculus.

ii. There are proven algorithms and theorems in relational mathematics that ensure that data stored in a relational database can be extracted efficiently.

iii. The reliance on a mathematical background supports the expectation that standard manipulation techniques will yield consistent and accurate results.

4.12 Modeling the Functional Aspects of the Intended System

As previously described, while the analysis and design of the system is centered around the structure and organization of the data, simultaneous effort must be spent in incorporating the knowledge-based features of the intended system [39].

(user requirements, data flow and functional design of transactions, and the physical model interact to provide system functionality and behavior)

Figure 4.13: Data and Interaction to Achieve System Behavior

Figure 4.13 shows the steps involved in the design process that achieves this aim. In terms of information that is ultimately visible to the user, it is the external model, or the user view, that is the starting point for the process of transformation. This view should represent the data in a structure that is most conducive to effective usage of information by the system. This may be in the form of active screens, reports, graphs, etc. The model must incorporate all functions that the user needs to be productive. Once this has been established, the next step is to identify the conceptual model that would allow optimum internal processing within the constraints of the real world system to provide the user with the external view [38, 39].

The conceptual model is then mapped to a logical view that is supported by the particular RDBMS software that is finally chosen. Ultimately the logical view is mapped to the physical view as shown in figure 4.13. This whole process of modeling data to represent the system is called data and function modeling [39].

4.13 Conceptual Modeling: An Object-Oriented Approach

A crucial problem faced by software designers during the course of system development is to represent the complex knowledge, data and objects relevant to the problem in order to derive a design specification. The resulting specification represents the core of any further work done on the system, and its reliability determines the reliability of the final system.

This representation and specification development process is implemented in modern design by conceptual modeling. Conceptual modeling is closely involved with the process of understanding an application domain and then leading that knowledge through a series of transformations in which knowledge of interest about the real world is converted to a formal design specification. The designer has to understand and extract knowledge of the modeled domain, referred to as the Universe of Discourse (UOD), defined as that part of the real world that is relevant to a specific problem. According to the International Standards Organization (ISO), the conceptual schema is defined by the classifications, rules and laws of the UOD.

Knowledge of the UOD may happen to be distributed amongst several users, designers and established theory. This research study approaches this knowledge acquisition problem from an object-oriented viewpoint and employs entity relationship modeling to represent the system. According to this model, the universe and the problem at hand are made up of objects or entities and the relationships that they share with one another. Entities are normally tangible objects that have properties or attributes, and relations are any commonality or property they may share. Theoretically, relations may also be considered as objects, but in this technique, for the purpose of representation, there is a clear separation between the objects and the relations by which they are connected [38, 24, 3].
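A minimal, purely illustrative sketch of this entity/relationship separation in Python; the entity names and attributes below are hypothetical and not taken from the thesis model:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    # An entity is a tangible object carrying named attributes
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Relationship:
    # A relationship connects two entities and is kept separate from them
    name: str
    source: Entity
    target: Entity

# Hypothetical instances, echoing the domain of this study
pcb = Entity("PrintedCircuitBoard", {"failure_rate_per_1e6_hours": 0.83})
relay = Entity("Component", {"description": "Relay", "failure_rate": 0.149})
contains = Relationship("contains", pcb, relay)

print(contains.source.name, contains.name, contains.target.name)
```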

4.13.1 Object Oriented Modeling

The science of conceptual modeling has been crystallized from areas such as artificial intelligence, database management and procedural programming. Database design essentially involves representing static properties of the system being considered, whereas programming involves the dynamic behavioral aspects of a system at runtime. The object-oriented model stresses the importance of integrating the structural and behavioral models to achieve a true representation of the system. Object-oriented modeling includes functional approaches that recommend system specification based on functions and data flows. Many scientists have considered mixed approaches of functional and object-oriented techniques to achieve optimum representation [37]. It also includes the data approach, which involves the description of objects in an entity relationship model or in semantic models [37], and the object centered approach, which describes the system as a collection of objects that are in a state of interaction with each other [24, 37]. As stated before, the system built in this study utilizes an original combination of entity relationship modeling and data flow diagramming to describe the intended system.

Chapter 5

Design and Development of a Reliability Evaluation System

5.1 Software Engineering Paradigms Employed in this Study

The work done in this study has taken into consideration various software engineering paradigms that have contributed towards making modern life cycle development such a highly structured and efficient process. These paradigms include [37]:

i. Life cycle modeling [Royce 1970]: This process was employed to formulate the overall strategy of the development cycle and identify the intermediate tasks that would be performed. The standardization of this crucial process assists in the industry-wide evolution of complex yet reliable products.

ii. Prototyping [Boehm et al. 1984]: The process of prototyping was employed in iterative cycles to produce rough but working models of subsystems, interfaces and the planned system to give the end user a more realistic idea of the final product. This also gives system designers the opportunity to obtain feedback and implement appropriate changes before going too far down the road in the wrong direction. This is extremely important because, more often than not, there exists a communication gap between the perceptions of technical developers and end users.

iii. Fourth generation techniques [Cobb 1985]: These modern tools and techniques were applied in areas that demanded a quick and useful way of presenting interactive screens that could also communicate with the relational database that was developed.

iv. Formal approaches [Jones 1981]: These were implemented in the areas of requirements analysis, specification creation, and quality assurance.

5.2 Application of Computer Aided Software Engineering (CASE)

Several development methodologies have evolved in recent years that cater to particular classes of projects. Traditionally, these methods have evolved along two main philosophical viewpoints, process oriented and data oriented. De Marco and Lundeberg describe process oriented approaches as emphasizing the inter-relationships between the activities of the application domain. Nijssen and Halpin describe data oriented methods as inclined more toward the facts and data associated with a modeled application and the relationships between these facts. Each of these methods recommends guidelines and specification conventions, along with rules for efficient continued development, that ensure the consistency and quality of the evolving specification [37]. The work done in this study is classified as a data-oriented project, with Structured Analysis and Structured Design (SASD) being the methodology of choice. Along these lines, CASE (Computer Assisted Software Engineering) has been employed in system analysis and decomposition as described in the next section.

5.2.1 Advantages of Computer Assisted Software Engineering (CASE)

Information engineering suggests that for a project to be successful, it is imperative to understand and chart the overall strategy of how critical information is to be used and shared among the various functions of the project. Program structures can then be built as needed on top of the data model that succinctly describes the intended system [3]. This implies that all functional procedures follow from the data and hence rely heavily upon the quality of the data model. Computer Assisted Software Engineering (CASE) methods and tools, first introduced by the IBM AD/Cycle program illustrated in figure 5.1 [40], are today playing a major role in modern design and analysis, helping to achieve high quality systems at best cost [3, 18].

(application-specific design and development platforms for modeling, programming, code generation, prototypes, database generation, traceability and regression testing, all built on a common repository of information)

Figure 5.1: Illustration of CASE Techniques

Classical CASE methods recommend the use of a global repository of data and knowledge that can be accessed by all tools at various stages of the development process. RDBMS products such as Oracle employ tools such as CASE Designer to accumulate the relationships between the logical entities, relationships and functions, and to store the connections with the physical objects in the database. These same physical objects can then be updated and maintained by the CASE Dictionary tools to refine and update the database designs. One may go further and generate prototype code from the very same repository using CASE Generator. All of these tools are tied together through a sophisticated methodology called CASE Method [3, 46]. In this thesis, CASE has been employed from the methodology perspective and serves mainly to represent the system through entity relationship modeling and to identify the data flow from a functional point of view.

5.3 Implementation Life Cycle

Once the project has been analyzed in a general overview, the process of crystallization into a final product has to pass through the structured steps described in the following sections.

5.3.1 Achieving a Requirements Definition

Requirements engineering refers to the procedures that deal with the subject matter of the application domain and serves to achieve an understanding of the desired features of the planned product [37]. Knowledge of these features may be distributed among subject matter experts, end users and documentation. A key phase of this process consists of carefully eliciting, crystallizing and representing the essential requirements of the planned system in the form of a formal requirements definition [37, 8]. The work done in this thesis implemented several of these steps and included interviews with end users, formal presentations to designers, and visits to the client site to observe firsthand how the system would be deployed. Reliability engineering documents were also analyzed to better understand the needs of the client organization and to suggest key features that would facilitate all desired functionality. In the classical sense, requirement specification is described as the representation of a model that describes the intended system and adequately represents the statement of the problem at hand. Researchers have long recommended involvement of the user community right from the beginning of the formation of the conceptual model [37]. This thesis supports that argument and has made an attempt to emphasize user participation through continuous discussions, interviews and presentations.

5.3.2 Evolution of the System Design

After the requirements definition is procured, the process of conceptual modeling begins to transform it into a design and implementation specification. The requirement specifications were subjected to systematic analyses and were crystallized into an entity-relationship model of the relevant environment, as shown in figure 5.2, and a potential data flow diagram that would achieve system functionality, as illustrated in figure 5.3.

[Only fragments of the original diagram are legible; its entities include VENDOR, CLIENT ORGANIZATION, SYSTEM DEVELOPER, RUNTIME SYSTEM USER, MASTER PARTS LIST, RELIABILITY EVALUATION, and PRINTED CIRCUIT BOARD.]

Figure 5.2: Entity Relationship Model of the Reliability Estimation System

These logical models are very closely tied with the requirements definition and conceptual design phases and are independent of any hardware or software that would eventually be employed [37]. The above entity-relationship diagram was formulated following strict CASE method conventions and has only a single standard interpretation.

It is important to note that the diagram is not merely an artistic representation, but is an active working document that has intelligent relationships with the data flow diagram and the database objects. The diagramming tool also enforces logic during the formulation of the entities and their inter-relationships.

As a simple illustrative example, if the CASE user specifies the following two relationships:

i. A > B and,

ii. B > C, then

iii. A > C would be a logical relationship validated by the CASE diagramming tool.
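The kind of check the diagramming tool performs can be sketched as a transitive-closure computation over the declared relationships. The following Python fragment is purely illustrative — it is not the actual Oracle CASE Designer logic, and the function and variable names are hypothetical:

```python
# Hypothetical sketch of the transitive-relationship validation a CASE
# diagramming tool might enforce; not the actual CASE Designer code.

def transitive_closure(relationships):
    """Given direct relationships as (source, target) pairs,
    derive every relationship implied by transitivity."""
    closure = set(relationships)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# The example from the text: A > B and B > C imply A > C.
direct = {("A", "B"), ("B", "C")}
implied = transitive_closure(direct)
assert ("A", "C") in implied
```

A tool built this way can both validate a relationship the user draws (is it in the closure?) and derive the constraints needed to keep the closure consistent.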

The diagram would also automatically specify the constraints that are needed to achieve that logical condition. This diagram further makes it extremely simple to design and build an efficient database structure that has logical integrity and which guarantees, with mathematical certainty, efficient storage and access paths to the data that is needed to support the intended system. The data flow diagram shown in figure 5.3 is also an intelligent diagram that has hooks into the functional hierarchies that are ultimately incorporated into the system. Like the entity-relationship diagram, the data flow diagram is constructed according to strict rules of representation and interpretation. It concisely describes, through universal symbols, how data flows through the system, the functions that act upon and transform that data into useful information, and the data stores that are ultimately needed to achieve system functionality. Following the logical design, several potential physical designs were examined, along with the advantages and disadvantages associated with each.

[Only fragments of the original diagram are legible; it shows function boxes labeled FEATURE_A through FEATURE_J acting on the system's data flows.]

Figure 5.3: Data Flow Diagram of the Reliability Estimation System

Physical design is dependent on the particular equipment and resources available in the project. At this point, appropriate hardware and software are chosen, and an actual relational database design is implemented from the logical designs. This relational database structure is refined and modified in the later stages to blend in with the design of the input and output user interfaces.

5.3.3 Design of the Input and Output User Interfaces

The user interfaces for input and output management were implemented through forms and reports with the aid of a sophisticated fourth-generation tool that has the capability of interacting with the internal database and performing procedural and non-procedural functions along with managing input and output. Input management is achieved via active electronic forms that are designed to have the look and feel of the regular paper forms that users are familiar with, while at the same time providing all the conveniences of modern computing at a fraction of the comparable storage and manipulation costs. Several formatted reports were also designed to provide users with optional hard copies of any requested output. In the classical design, these forms and reports would represent the user's view of the database and would present information in a way that is logically amenable to the user's task at hand. In this regard, the form and report design phase, along with form prototyping, also assists in eliciting finer details of the requirements from the end users and subject matter experts. Forms have also been employed in several approaches by software engineers during the view definition and integration phases [37, 9]. At this stage, presentations were given to the client to solicit feedback about any differences in perception before final database structure definitions were made. The database design is then reviewed and refined to facilitate efficient input and output management. It is recommended that any further changes to the database design at this point be avoided unless they impede functionality implementation [9].

5.4 Examples of Electronic Forms Developed in this System

The reliability evaluation system built in this research study consists of several forms that facilitate the storage, initialization, manipulation, and display of data. Each of these forms performs a set of well-defined, transaction-oriented tasks that contribute towards the achievement of all specified system features. The following sections illustrate, with examples, a few of the forms used in the main system that designers employ in performing reliability evaluations of large complex electronic systems.

5.4.1 Input Management Form

Figure 5.4 shows the screen that the user sees upon first entering the main application. It is on this screen that the electronics engineer specifies the various electronic modules that he/she would need in the system design. The designer also specifies the corresponding configuration and other reliability parameters associated with each included module, which assists in a customized estimation of system reliability. It is worth mentioning that a key feature of the system is that the intermediate reliabilities of all included modules are calculated dynamically by the system. This allows the designer the flexibility of updating individual module designs and modifying constituent components without having to re-calculate the failure rates of each module. The driving component of the system is the master parts list, which is the only repository where failure rates are stored. All intermediate and final failure rates are calculated at runtime. The single-location storage of core part failure rates allows easy maintenance and update of this information, rather than having to make such changes all over the system. All the forms are transaction-oriented and contain reliability estimation knowledge, enforcing validations that ensure the integrity of the system as far as possible. Every requested transaction causes the corresponding procedural code to execute, which in turn manipulates the database and the screen to achieve desired functionality. This is consistent with the transaction-based object-oriented model described earlier.

05/12/92          Client Organization          Model Reliability Evaluation System          Page 01

Enter Identifier for New System: SE-644-0100-0001 [ Large Scale Micro Controller ]

Module              Description                                            Use   Qty   Redundancy
EM255-001A-110      Field card assembly digital input w/trip                1     3       1
EM255-001A-520      RTD input field card assembly                           1    12       3
EM255-001A-550      Field card assembly vibrator input                      1     3       3
EM255-002A-100      Interface 32 digital in card assembly digital input     1     2       2
EM255-002A-150      16 interface card assembly 6 digital out                1     6       2
EM255-002C-450      Frequency input file card assembly                      1     3       3
EM255-005A-150      Opcon keyboard module                                   1     2       1
EM255-005A-220      Module LED display                                      1     4       2
EM255-008A-100      TCS RTD point card assembly                             1     1       1
EM255-008A-810#1    RTD scanner amplified card                              1     6       3
EM255-010A-150      Module filter                                           1     3       1
EM255-900A-921      Assembly PCB                                            1     1       1
EM255-002A-900#2    Controller communication processor card                 1     2       2
EM255-002A-900#4    Processor card assembly                                 1     1       1

Hint: Enter Appropriate Design Parameters and Press

Figure 5.4: Example of an Input Management Form Developed in the System

Each row in figure 5.4 describes the level of participation of the specified module in the design of the system from a reliability point of view. Keeping in mind that most typical systems are partially redundant, the overall system reliability is a function of the contribution of each module's configuration in the system. In the above form, all modules to be included in the design are entered in the column marked Module. All referential information about each module is entered automatically by the system upon validation of the entered module number. The Use factor defaults to 1 and refers to the balancing factor that weights each contributing failure rate according to the extent to which the part is active in achieving system functionality. Quantity refers to the total number of each corresponding module that will be included in the final design. To specify redundant configurations, the Qty (Quantity) and Redundancy columns are used in conjunction with one another, as described below.
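As a rough illustration of the Use factor described above, the weighting can be sketched as a simple multiplier on a module's failure-rate contribution. The function name and the multiplicative weighting model are assumptions made for illustration; they are not taken from the thesis software:

```python
# Hypothetical sketch of Use-factor weighting; the multiplicative model
# is an assumption consistent with the description in the text, and all
# names are illustrative rather than the thesis software's own.

def weighted_contribution(module_rate, use_factor=1.0):
    """Scale a module's failure-rate contribution by the extent
    to which the part is active in the system (Use factor)."""
    return use_factor * module_rate

# A module active only half the time contributes half its failure rate.
assert weighted_contribution(4.0, use_factor=0.5) == 2.0
# The default Use factor of 1 leaves the contribution unchanged.
assert weighted_contribution(4.0) == 4.0
```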

5.4.2 How to Specify Simple Series Configurations

To specify a simple series configuration of any module, the module number is entered, and the number of times that module appears in the final system is entered in the Qty column. The Redundancy column is left unchanged from the default value of 1, indicating that there is no redundancy incorporated for that part.

5.4.3 How to Specify Simple Parallel Configurations

Simple parallel redundant configurations of a particular module are specified by entering the number of units in the parallel configuration in the Qty column; the level of redundancy, i.e. 2 or 3, is entered in the Redundancy column. Note that in this configuration the quantity of units must equal the level of redundancy.

5.4.4 How to Specify Constant Multiple Parallel Configurations

To specify multiple constant parallel configurations, the user enters the total number of units to be used in the configuration in the Qty column, and the Redundancy column is filled in with the type of redundancy, i.e. double or triple (2 or 3, respectively). From this information, the system constructs the network model of simple constant parallel configurations in series, in accordance with the following example: to specify a configuration of four series instances, each having 3 units in parallel, the user would enter a value of 12 in the Qty column and a value of 3 in the Redundancy column. The system would interpret this as 12/3 = 4 series instances, each having triple redundancy, and calculate the failure rate accordingly.
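The 12/3 = 4 interpretation above can be sketched as a small validation routine. Only the grouping rule itself comes from the example in the text; the function name and error handling are hypothetical:

```python
# Sketch of how the Qty and Redundancy columns might be interpreted.
# The rule Qty / Redundancy = number of series groups follows the
# example in the text; everything else is illustrative.

def series_groups(qty, redundancy):
    """Interpret a form row as N series instances of a parallel group."""
    if redundancy < 1:
        raise ValueError("Redundancy level must be at least 1")
    if qty % redundancy != 0:
        raise ValueError("Qty must be a whole multiple of the redundancy level")
    return qty // redundancy

# The text's example: Qty = 12 with triple redundancy (3) gives
# 4 series instances, each a group of 3 units in parallel.
assert series_groups(12, 3) == 4
# A simple series row: Qty = 6 with the default Redundancy of 1.
assert series_groups(6, 1) == 6
```

The divisibility check mirrors the form's constraint that Qty and Redundancy must satisfy their relative conditions before a row is accepted.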

5.4.5 How to Specify Varying Multiple Parallel Configurations

In some cases the user may wish to specify varying multiple parallel configurations of a certain module in the system design. For example, the module may have one triple redundant configuration and one double redundant configuration. The user would specify this by entering two distinct rows in the input screen for that module: one for the triple redundancy and one for the double. A module may appear on as many rows as needed, as long as the relative conditions between Qty and Redundancy are met.

5.5 Display and Interpretation of Output

The results of all requested transactions are presented to the user through active electronic forms and formatted reports. Figure 5.5 and figure 5.6 show the output forms for the example given in section 5.4.1.

05/12/92          Client Organization          Model Reliability Evaluation System          Page 01

Modules contained in SE-644-0100-0001 [ Large Scale Micro Controller ]

Module              λ_90      λ_125     λ_145     Total λ_90   Total λ_125   Total λ_145
EM255-001A-110       0.6990    1.5247    2.0394     2.0970       4.5741        6.1182
EM255-001A-520       2.6411    3.6265    4.5805     5.7624       7.9123        9.9938
EM255-001A-550       4.9895    7.6875    9.7782     2.7215       4.1931        5.3335
EM255-002A-100       7.6095   20.7767   33.8717     5.0730      13.8511       22.5811
EM255-002A-150       4.9036   13.2566   21.6793     9.8072      26.5132       43.3586
EM255-002C-450      10.4675   24.9717   34.7991     5.7095      13.6209       18.9813
EM255-005A-150       3.3586    5.3308    6.4937     6.7172      10.6616       12.9874
EM255-005A-220       2.2208    5.0009    6.6143     2.9610       6.6678        8.8190
EM255-008A-100       1.3688    2.3818    3.0910     1.3688       2.3818        3.0910
EM255-008A-810#1     1.1420    1.9251    2.6121     1.2458       2.1001        2.8495
EM255-010A-150       4.4504    5.4630    6.3059    13.3512      16.3890       18.9177
EM255-900A-921       9.2115   14.8186   19.3018     9.2115      14.8186       19.3018
EM255-002A-900#2     6.5288   12.2456   16.4540     4.3525       8.1637       10.9693
EM255-002A-900#4     5.3238   10.5902   15.2561     5.3238      10.5902       15.2561

Hint: Press to Proceed.

Figure 5.5: Example of an Output Screen Showing the Failure Rate of Each Constituent Module

Figure 5.5 shows the failure rates of each module included in the final design at three different temperatures, to provide the designer with a relative perspective of temperature effects. The form also shows the total effect of the presence of the total quantity of each module in the specified redundant configuration. Note that these values are calculated dynamically for each program execution request. Any changes to constituent parts or their failure rates need to be recorded only in the original qualifying parts list. Those changes will be reflected automatically in the final output during subsequent runs of the program. This makes it extremely simple for designers to keep track of changes and see the corresponding effects on reliability. Figure 5.6 presents a summary of the reliability of the final design. It shows the estimated failure rate of the final design, which includes the net effect of all modules included in the system in their respective quantities and redundant configurations. The corresponding estimated Mean Time Between Failures (MTBF) of the final design is also computed. As before, all values are computed at three different temperatures.
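Under the constant-failure-rate assumption used throughout, the reported MTBF values are consistent with MTBF = 1/λ: for example, the 90 °F system failure rate of 75.7024 per million hours yields 1/75.7024 ≈ 0.0132, i.e. MTBF expressed in millions of hours. A minimal sketch, with an illustrative function name:

```python
# Hedged sketch of the MTBF computation implied by figure 5.6.
# With a constant failure rate lambda, MTBF = 1 / lambda; since the
# failure rates here are per million hours, the quotient comes out
# in millions of hours (multiply by 1e6 for hours).

def mtbf_million_hours(failure_rate_per_million_hours):
    return 1.0 / failure_rate_per_million_hours

# Total system failure rate at 90 Deg F from figure 5.6:
mtbf = mtbf_million_hours(75.7024)
assert round(mtbf, 4) == 0.0132      # matches the reported value
assert round(mtbf * 1e6) == 13210    # about 13,210 hours
```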

05/12/92          Client Organization          Model Reliability Evaluation System          Page 03

Reliability Evaluation of: SE-644-0100-0001 [ Large Scale Micro Controller ]

Total System Failure Rate at 90 Deg F:    75.7024    Total System MTBF at 90 Deg F:   0.0132
Total System Failure Rate at 125 Deg F:  142.4375    Total System MTBF at 125 Deg F:  0.0070
Total System Failure Rate at 145 Deg F:  198.5583    Total System MTBF at 145 Deg F:  0.0050

Module Having Highest Failure Rate at 90 Deg F:   EM255-002C-450   F.R at 90:   10.4675
Module Having Highest Failure Rate at 125 Deg F:  EM255-002C-450   F.R at 125:  24.9717
Module Having Highest Failure Rate at 145 Deg F:  EM255-002C-450   F.R at 145:  34.7991

Module Having Highest Total Failure Rate at 90 Deg F:   EM255-010A-150   F.R at 90:   13.3512
Module Having Highest Total Failure Rate at 125 Deg F:  EM255-002A-150   F.R at 125:  26.5132
Module Having Highest Total Failure Rate at 145 Deg F:  EM255-002A-150   F.R at 145:  43.3586

Note: All Failure Rates are per Million Hours and All MTBFs are in Hours.

Hint: Press to Proceed

Figure 5.6: Example of an Output Screen Showing the Reliability Summary of the Final Electronic System Design

The system also identifies the module that has the highest failure rate at each of the three different temperatures. This value is the inherent failure rate of that module and does not include the effect of quantity or redundancy. The module that makes the highest contribution to the failure rate of the final design at the three different temperatures is also identified. This value represents the net effect of the total quantity and redundancy of that module incorporated in the final design.
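The distinction between the two "highest" entries can be sketched as a simple maximum over the module table. The dictionary layout and names below are illustrative only; the numeric values are the 90 Deg F entries from figure 5.5, where "rate" is a module's inherent failure rate and "total" its net contribution including quantity and redundancy:

```python
# Illustrative sketch of the two "highest" selections in figure 5.6;
# the data structure is hypothetical, the values come from figure 5.5.

modules = {
    "EM255-002C-450": {"rate": 10.4675, "total": 5.7095},
    "EM255-010A-150": {"rate": 4.4504,  "total": 13.3512},
}

highest_inherent = max(modules, key=lambda m: modules[m]["rate"])
highest_total = max(modules, key=lambda m: modules[m]["total"])

assert highest_inherent == "EM255-002C-450"  # ignores quantity/redundancy
assert highest_total == "EM255-010A-150"     # net effect in the design
```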

5.6 Manipulating Design Parameters

In a typical design environment, the engineer would specify the modules to be incorporated in the final design in the minimum quantities needed, without adding any redundancy. The engineer would then request a reliability evaluation. Based on the computed failure rates, the engineer would then start adding redundancy to the high-failure-rate items, and to those that are especially critical in achieving system functionality. This process is continued until the engineer achieves a desired level of reliability for the final design within the specified budget.

Chapter 6

Conclusion and Future Work

6.1 Concluding Remarks

The work done in this thesis is spread over a wide area of study. In view of the potential width of scope, the task was approached in phases, each of which accomplished particular subsections that contributed towards an overall goal.

In the first phase, studies were conducted in formulating an approach to electronic system reliability analysis that elicited the advantages of both qualitative and quantitative approaches. Modern life-cycle-cost analysis techniques that estimate the extent to which reliability remains a useful investment were studied. Consideration was given to the impact of contextual factors such as mission criticality, physical dimensions, and budgetary constraints.

In the second phase of the thesis, a restricted form of Failure Mode and Effect Analysis (FMEA) was implemented to analyze and break down an identified set of large complex electronic system designs into simple components convenient for evaluation. Each of these simple components was then subjected to an elaborate procedure consisting of rigorous engineering and Mil-Hdbk-217E techniques to compute its estimated failure rate. These values were computed at three different temperatures to provide a perspective of temperature effects on electronics reliability. This process yielded a qualifying parts list consisting of several hundred individual components that could be used in building larger sub-systems. The high-failure-rate components were also identified at this stage, and redundancy models were examined for potential modules containing these components. Advantages of this methodology were researched, while simultaneously examining modern controversies that have evolved in its criticism.

In the last phase of the thesis, the significant advantages gained by automating a process that seeks to strike a balance between desired reliability and other influential factors, such as time, cost of analysis and implementation, and effort, were demonstrated. This was done by designing and developing from inception a modern relational database software system that provides engineers with simple but useful options facilitating quick and efficient reliability evaluations of sub-systems and large systems. The software allows the designer to experiment with several competing designs and, in real time, obtain relative reliability levels of the designs at three different temperatures. Key features of this event-driven system include means to manipulate factors such as sub-system redundancy, use factors, and quantity, and to ask "what if" type questions while observing, in real time, the consequence of such manipulation on system reliability. The software was developed using classical state-of-the-art representation, analysis, and development techniques such as Computer Assisted Software Engineering (CASE) and relational database software.

6.2 Potential Future Work

While this thesis accomplished all work within the intended boundaries, it is acknowledged that there remain topics in this area towards which further effort may be directed. It is recommended that:

• Deeper studies be conducted in understanding and resolving modern controversies regarding Mil-Hdbk-217E methodologies.

• The software developed in this thesis be interfaced with an intelligent program that could draw on a global repository of Mil-Hdbk-217E data and methods. This approach would allow the software to be more generic and cater to a wider variety of designs. A simplistic version of such a database exists at Rome Air Development Center (RADC), and programs based on simpler parts count methods, such as IRAS, have been developed [4].

• More sophisticated knowledge and analysis algorithms be incorporated into the software so as to allow the designer to experiment with more complex redundant configurations.

• A state-of-the-art Graphical User Interface (GUI) be attached to the software to facilitate specification and interpretation of designs in graphical form.

• Such a program be linked with circuit analysis programs such as PSPICE™ so that reliability evaluation may be more closely tied to the functional design, thereby making it possible to manufacture reliable systems routinely and efficiently.

BIBLIOGRAPHY

[1] J.E. Arsenault and J.A. Roberts, Reliability and Maintainability of Electronic Systems. Press Inc., 1980.

[2] Avram Bar-Cohen, Control Data, Minneapolis, Reliability Physics vs Reliability Prediction. IEEE Transactions on Reliability, Vol. 37, No. 5, December 1988.

[3] Richard Barker, CASE Method Entity Relationship Modeling. ORACLE Corporation UK Limited, 1990.

[4] Richard E. Barlow, Mathematical Theory of Reliability: A Historical Perspective. IEEE Transactions on Reliability, Vol. R-33, No. 1, April 1984.

[5] Roy Billinton, Ronald N. Allan, Reliability Evaluation of Engineering Systems. Pitman Advanced Publishing Program, 1983.

[6] A.C. Brombacher, Ir. W.F.J. Peters, Twente University of Technology, Enschede, IRAS: An Interactive Reliability Analysis System for Electronic Circuits. IEEE Transactions on Reliability, Vol. R-34, No. 5, December 1985.

[7] J.A. Bubenko, Jr. and B. Wangler, Research Directions in Conceptual Specification Development. Conceptual Modeling, Databases and CASE: An Integrated View of Information Systems Development, John Wiley and Sons Inc., 1992.

[8] CASE Designer - Oracle © 1989.

[9] J. Choobineh, M.V. Mannino and V.P. Tseng, The Role of Form Analysis in Computer Aided Software Engineering. Conceptual Modeling, Databases and CASE: An Integrated View of Information Systems Development, John Wiley and Sons Inc., 1992.

[10] Components vis-a-vis Systems. IEEE Transactions on Reliability, Vol. 39, No. 3, August 1990.

[11] Anthony Coppola, Reliability Engineering of Electronic Equipment: A Historical Perspective. IEEE Transactions on Reliability, Vol. R-33, No. 1, April 1984.

[12] Geoffrey W.A. Dummer and Norman B. Griffin, Electronics Reliability: Calculation and Design. Pergamon Press Ltd, 1966.

[13] M. Dyer, R.D. Mills, Developing Electronic Systems with Certifiable Reliability. Electronic Systems Effectiveness and Life Cycle Costing, NATO ASI Series Vol. F3.

[14] Anthony J. Feduccia, USAF Rome Air Development Center, Reliability Prediction: Use It Wisely. IEEE Transactions on Reliability, Vol. R-37, No. 5, December 1988.

[15] Yash P. Gupta, Life Cycle Cost Models and Associated Uncertainties. Electronic Systems Effectiveness and Life Cycle Costing, NATO ASI Series Vol. F3.

[16] C.E. Jowett, Reliability of Electronics and Environments. Halsted Press, 1973.

[17] C.E. Jowett, Reliability of Electronic Components. London Iliffe Books Ltd, 1966.

[18] Ruth Kerry, Integrating Knowledge Based and Database Management Systems. Ellis Horwood Limited, 1990.

[19] Klaas B. Klaassen, Reliability of Analogue Electronic Systems. Studies in Electrical and Electronic Engineering 13, Elsevier Science Publishing Company Inc., 1984.

[20] D.J. Lawson, Failure Mode, Effect and Criticality Analysis. Electronic Systems Effectiveness and Life Cycle Costing, NATO ASI Series Vol. F3.

[21] Charles T. Leonard, Seattle, On US Mil-Hdbk-217 and Reliability Prediction. IEEE Transactions on Reliability, Vol. 37, No. 5, December 1988.

[22] P. Loucopoulos, Conceptual Modeling. Conceptual Modeling, Databases and CASE: An Integrated View of Information Systems Development, John Wiley and Sons Inc., 1992.

[23] Puran Luthra, Mil-Hdbk-217: What Is Wrong with It. IEEE Transactions on Reliability, Vol. 39, No. 5, December 1990.

[24] Rod Manis, Evan Schaffer, Robert Jorgensen, UNIX Relational Database Management. Prentice Hall, 1985.

[25] Peter F. Manno, RADC Failure Rate Prediction Methodology, Today and Tomorrow. Electronic Systems Effectiveness and Life Cycle Costing, NATO ASI Series Vol. F3.

[26] Mil-Hdbk-217E. Rome Air Development Center (RADC).

[27] Mil-Hdbk-217 - Yet Again. IEEE Transactions on Reliability, Vol. 37, No. 4, October 1988.

[28] Seymour F. Morris, Rome Air Development Center, Griffiss AFB, New York, Use and Application of Mil-Hdbk-217. Solid State Technology, August 1990.

[29] Richard H. Myers, Kam L. Wong, Harold M. Gordy, Reliability Engineering for Electronic Systems. John Wiley and Sons Inc., 1964.

[30] J. Mylopoulos, Conceptual Modeling and Telos. Conceptual Modeling, Databases and CASE: An Integrated View of Information Systems Development, John Wiley and Sons Inc., 1992.

[31] Bernard de Neumann, Life Cycle Cost Models. Electronic Systems Effectiveness and Life Cycle Costing, NATO ASI Series Vol. F3.

[32] Patrick D.T. O'Connor, British Aerospace Dynamics Ltd, Reliability Prediction: Help or Hoax. Solid State Technology, August 1990.

[33] Of Math and Meaning. IEEE Transactions on Reliability, Vol. 40, No. 1, April 1991.

[34] Michael Pecht, Wen-Chang Kang, A Critique of Mil-Hdbk-217E Reliability Prediction Methods. IEEE Transactions on Reliability, Vol. 37, No. 5, December 1988.

[35] T.G. Pham, Comment on: Reliability Prediction, Fact or Fancy. IEEE Transactions on Reliability, Vol. 39, No. 5, December 1990.

[36] Reliability Prediction Fact or Fancy. IEEE Transactions on Reliability, Vol. 39, No. 2, June 1990.

[37] C. Rolland, C. Cauvet, Trends and Perspectives in Conceptual Modeling. Conceptual Modeling, Databases and CASE: An Integrated View of Information Systems Development, John Wiley and Sons Inc., 1992.

[38] John R. Rumble, Jr., Data, Computers, and Database Management Systems. Database Management in Science and Technology, CODATA, 1984.

[39] John R. Rumble, Jr., Viktor E. Hampel, H. Bestougeff, Designing the Database Management System. Database Management in Science and Technology, CODATA, 1984.

[40] John R. Rumble, Jr., Viktor E. Hampel, Database Management in Science and Technology. CODATA, 1984.

[41] J.K. Skwirzynski, Electronic Systems Effectiveness and Life Cycle Costing. NATO ASI Series Vol. F3.

[42] Statistics and Ignorance. IEEE Transactions on Reliability, Vol. 39, No. 3, August 1990.

[43] Joseph J. Steinkirchner, Steven J. Flint, Reliability Models, Practical Constraints. Electronic Systems Effectiveness and Life Cycle Costing, NATO ASI Series Vol. F3.

[44] The Engineering Curriculum. IEEE Transactions on Reliability, Vol. 40, No. 1, April 1991.

[45] Ted W. Yellman, Comment on: Reliability Prediction. IEEE Transactions on Reliability, Vol. 39, No. 5, December 1990.

[46] R. Zicari, C. Bauzer Medeiros, New Generation Database Systems. Conceptual Modeling, Databases and CASE: An Integrated View of Information Systems Development, John Wiley and Sons Inc., 1992.