SENG 637 Dependability, Reliability & Testing of Software Systems
SRE Deployment (Chapter 10)
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG637/
Contents
Quality in requirements phase
Quality in design & implementation, testing & release phases
Software Quality Assurance (SQA) and Software Reliability Engineering (SRE)
Quality, test and data plans
Roles and responsibilities
Sample quality and test plan
Defect reporting procedure
Best practices of SRE
Quality in post-release and maintenance phase
Quality vs. Project Costs
Cost distribution for a typical software project
[Figure: pie chart with segments Product Design, Programming, Integration and Test]
Total Cost Distribution
[Figure: pie chart with segments Product Design, Programming, Integration and Test, Maintenance]
Questions:
How to build quality into a system?
How to assess quality of a system?
Developing a better-quality system will contribute to lowering maintenance costs.
Quality in Software Development Process
Q. How to include quality concerns in the process?
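[Figure: development phases (Requirement & Architecture; Design & Implementation; Test & Release; Maintenance) mapped to quality activities]
Requirement & Architecture: architectural analysis of quality attributes. Method: ATAM, CBAM, etc.
Design & Implementation, Test & Release: Software Reliability Engineering (SRE) and Software Quality Assurance (SQA)
Maintenance: software quality assessment. Method: RAM, etc.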
Chapter 10 Section 1
Software Quality: Requirements and Architecture Phase
Quality Challenges
Modern software systems are required to meet several quality attributes such as: modifiability, performance, security, interoperability, portability, reliability, etc.
Questions for any particular system:
What precisely do these quality attributes mean?
Can a system be analyzed to determine desired qualities?
How soon can such an analysis occur?
How do you know if the design is suitable without having to build the system first?
SW Architecture Evaluation / Assessment!
Evaluating SW Architecture
Determining whether an architecture satisfies its requirements often involves:
Being very explicit about what the requirements (functional & non-functional) are and how they are reflected in the architecture
Understanding where one has to make trade-offs between different design alternatives
Applying analysis wherever possible to determine the consequences of an architectural choice
Mediating between desires of different stakeholders
To achieve these goals an architectural evaluation process is needed
SW Architecture Evaluation
Informal / ad-hoc architectural evaluation
Pros?
Quick and Cheap
Cons?
… and Dirty? Incomplete? Unreliable?
… Unrepeatable? Poorly documented?
SW Architecture Evaluation
Are there better methods than ad-hoc evaluation?
The answer is “YES”:
SAAM (Software Architecture Analysis Method)
Scenario-based evaluation
ATAM (Architecture Tradeoff Analysis Method)
Scenario-based evaluation with focus on trade-offs
SACAM (Software Architecture Comparison Method)
Business goal-driven comparison of architecture alternatives
CBAM (Cost-Benefit Analysis Method)
Focus on economic aspects
etc.
References
Software Architecture Technology Initiative of the SEI: http://www.sei.cmu.edu/architecture/
ATAM: Method for Architecture Evaluation (2000), Rick Kazman, Mark Klein, Paul Clements, Technical Report, CMU/SEI-2000-TR-004.
CBAM: Making Architecture Design Decisions: An Economic Approach (2002), Rick Kazman, Jai Asundi, Mark Klein, Technical Report, CMU/SEI-2002-TR-035.
Software Architecture in Practice, 2nd ed., Len Bass, Paul Clements, Rick Kazman, Addison-Wesley, 2003.
Evaluating Software Architectures: Methods and Case Studies, Paul Clements, Rick Kazman, Mark Klein, Addison-Wesley, 2001.
Chapter 10 Section 2
Software Quality: Design & Implementation, Testing & Release Phases
What is Reliable Software?
Reliable software products are those that run correctly and consistently, have fewer remaining defects, handle abnormal situations properly, and need less installation effort.
The remaining defects should not affect the normal behaviour and the use of the software; they should not do anything destructive to the system and its hardware or software environment, and should rarely be evident to the users.
Developing reliable software requires:
Establishing Software Quality System (SQS) and Software Quality Assurance (SQA) programs
Establishing Software Reliability Engineering (SRE) process
Software Quality System (SQS)
Goals:
Building quality into the software from the beginning
Keeping and tracking quality in the software throughout the software life cycle
[Figure: Software Quality System; label: Technology]
Source: John W. Horch: Practical Guide to Software Quality Management
Software Quality Assurance (SQA)
Software Quality Assurance (SQA) is a planned and systematic approach to ensure that both software process and software product conform to the established standards, processes, and procedures.
The goals of SQA are to improve software quality by monitoring both software and the development process to ensure full compliance with the established standards and procedures.
Steps to establish an SQA program
Get the top management’s agreement on its goal and support.
Identify SQA issues, write SQA plan, establish standards and SQA functions, implement the SQA plan and evaluate SQA program.
SRE: Process & Plans
[Figure: SRE process over time across the phases Requirement & Architecture; Design & Implementation; Test]
SRE process steps: Define Necessary Reliability; Develop Operational Profile; Prepare for Test; Execute Test; Apply Failure Data
Plans: Quality Plan, Test Plan, Data Plan
There may be many Test and Data (measurement) plans for various parts of the same project
Defect Handling: Without & With SQS
Defect reporting, tracking, and closure procedure
[Figure: defect handling flow with a defect reports DB]
SCN: software change notice
STR: software trouble report
John W. Horch: Practical Guide to Software Quality Management
SRE: Who is Involved?
Senior management
Test coordinator (manager)
Data coordinator (manager)
Customer or user
SRE: Management Concerns
Perception and specification of a customer’s real needs.
Translation of specification into a conforming design.
Maintaining conformity throughout the development processes.
Product and sub-product demonstrations which provide convincing indications of the product and project having met their requirements.
Ensuring that the tests and demonstrations are designed and controlled, so as to be both achievable and manageable.
Roles & Responsibilities /1
Test Coordinator (Manager): Test coordinator is expected to ensure that every specific statement of intent in the product requirement, specification and design, is matched by a well designed (cost-effective, convincing, self-reporting, etc.) test, measurement or demonstration.
Data Coordinator (Manager): Data coordinator ensures that the physical and administrative structures for data collection exist and are documented in the quality plan, receives and validates the data during development, and through analysis and communication ensures that the meaning of the information is known to all, in time, for effective application.
Roles & Responsibilities /2
Customer or User:
Actively encouraging the making and following of detailed quality plans for the products and projects.
Requiring access to previous quality plans and their recorded outcomes before accepting the figures and methods quoted in the new plan.
Enquiring into the sources and validity of synthetics and formulae used in estimating and planning.
Appointing appropriate personnel to provide authoritative responses to queries from the developer and a managed interface to the developer.
Receiving and reviewing reports of significant audits, reviews, tests and demonstrations.
Making any queries and objections in detail and in writing, at the earliest possible time.
Quality Plans /1
The most promising mechanisms for gaining and improving predictability and controllability of software qualities are the quality plan and its subsidiary documents, including test plans and data (measurement) plans.
The creation of the quality plan can be instrumental in raising project effectiveness and in preventing expensive and time-consuming misunderstandings during the project, and at release/acceptance time.
[Figure: Quality Plan with subsidiary Test Plan and Data Plan]
Quality Plan /2
Quality plan and quality record provide guidelines for carrying out and controlling the following:
Requirement and specification management.
Development processes.
Documentation management.
Design evaluation.
Product testing.
Data collection and interpretation.
Acceptance and release processes.
[Annotation on slide: SRE related activities]
Quality Plan /3
Quality planning should be made at the very earliest point in a project, preferably before a final decision is made on feasibility, and before a software development contract is signed.
Quality plan should be devised and agreed between all the concerned parties: senior management, software development management (both administrative and technical), software development team, customers, and any involved general support functions such as resource management and company-wide quality management.
Data (Measurement) Plan
The data (measurement) plan prescribes:
What should be measured and recorded during a project;
How it should be checked and collated;
How it should be interpreted and applied.
Data may be collected in several ways, within the specific project and beyond it.
Ideally, there should be a higher level of data collection and application into which project data is fed.
Test Plan /1
The purpose of test plan is to ensure that all testing activities (including those used for controlling the process of development, and in indicating the progress of the project) are expected, are manageable and are managed.
Test plans are created as a subsection or as an associated document of the quality plan.
Test plans become progressively more detailed and expanded during a project.
Each test plan defines its own objectives and scope, and the means and methods by which the objectives are expected to be met.
Test Plan /2
For the software product, the test plan is usually restricted by the scope of the test: certification, feature and load test.
The plan predicts the resources and means required to reach the required levels of assurance about the end products, and the scheduling of all testing, measuring and demonstration activities.
Tests, measurements and demonstrations are used to establish that the software product satisfies the requirements document, and that each process during a development is carried out correctly and results in acceptable outcomes.
Chapter 10 Section 2.1
Elements of Quality & Test Plan
Sample SQS Plan /1
1 Purpose
2 Reference Documents
3 Management
3.1 Organization
3.2 Tasks
3.3 Responsibilities
Based on IEEE Standard 730.1-1989
Sample SQS Plan (cont’d) /2
4 Documentation
4.1 Purpose
4.2 Minimum Documentation
4.2.1 Software Requirements Specification
4.2.2 Software Design Description
4.2.3 Software Verification and Validation Plan
4.2.4 Software Verification and Validation Report
4.2.5 User Documentation
4.2.6 Configuration Management Plan
4.3 Other Documentation
Sample SQS Plan (cont’d) /3
5 Standards, Practices, Conventions, and Metrics
5.1 Purpose
5.2 Documentation, Logic, Coding, and Commentary Standards and Conventions
5.3 Testing Standards, Conventions, and Practices
5.4 Metrics
Sample SQS Plan (cont’d) /4
6 Review and Audits
6.1 Purpose
6.2 Minimum Requirements
6.2.1 Software Requirements Review
6.2.2 Preliminary Design Review
6.2.3 Critical Design Review
6.2.4 Software Verification and Validation Review
6.2.5 Functional Audit
6.2.6 Physical Audit
6.2.7 In-process Reviews
6.2.8 Managerial Reviews
6.2.9 Configuration Management Plan Review
6.2.10 Postmortem Review
6.3 Other Reviews and Audits
Sample SQS Plan (cont’d) /5
7 Test
8 Problem Reporting and Corrective Action
8.1 Practices and Procedures
8.2 Organizational Responsibilities
9 Tools, Techniques, and Methodologies
10 Code Control
11 Media Control
12 Supplier Control
13 Records Collection, Maintenance, and Retention
14 Training
15 Risk Management
Sample Test Plan /1
1 Test Plan identifier
2 Introduction
2.1 Objectives
2.2 Background
2.3 Scope
2.4 References
Based on IEEE Standard 829-1983
Sample Test Plan (cont’d) /2
3 Test Items
3.1 Program Modules
3.2 Job Control Procedures
3.3 User Procedures
3.4 Operator Procedures
4 Features To Be Tested
5 Features Not To Be Tested
Sample Test Plan (cont’d) /3
6 Approach
6.1 Conversion Testing
6.2 Job Stream Testing
6.3 Interface Testing
6.4 Security Testing
6.5 Recovery Testing
6.6 Performance Testing
6.7 Regression
6.8 Comprehensiveness
6.9 Constraints
Sample Test Plan (cont’d) /4
7 Item Pass/Fail Criteria
8 Suspension Criteria and Resumption Requirements
8.1 Suspension Criteria
8.2 Resumption Requirements
9 Test Deliverables
10 Testing Tasks
Sample Test Plan (cont’d) /5
11 Environmental Needs
11.1 Hardware
11.2 Software
11.3 Security
11.4 Tools
11.5 Publications
12 Responsibilities
12.1 Test Group
12.2 User Department
12.3 Development Project Group
Sample Test Plan (cont’d) /6
13 Staffing and Training Needs
13.1 Staffing
13.2 Training
14 Schedule
15 Risks and Contingencies
16 Approvals
Chapter 10 Section 2.2
Best Practice SRE
Practice of SRE /1
The practice of SRE provides the software engineer or manager the means to predict, estimate, and measure the rate of failure occurrences in software.
Using SRE in the context of Software Engineering, one can:
Analyze, manage, and improve the reliability of software products.
Balance customer needs for competitive price, timely delivery, and a reliable product.
Determine when the software is good enough to release to customers, minimizing the risks of releasing software with serious problems.
Avoid excessive time to market due to overtesting.
(Hopefully!)
Incremental Implementation
Most projects implement the SRE activities incrementally.
A typical implementation sequence
Implementing SRE /1
Feasibility and requirements phase:
Define and classify failures, i.e., failure severity classes
Identify customer reliability needs
Determine operational profile
Conduct trade-off studies (among reliability, time, cost, people, technology)
Set reliability objectives
Implementing SRE /2
Design and implementation phase:
Allocate reliability among components, acquired software, hardware and other systems
Engineer to meet reliability objectives
Focus resources based on operational profile
Measure reliability of acquired software, hardware and other systems, i.e., certification test
Manage fault introduction and propagation
Implementing SRE /3
System test and field trial phase:
Determine operational profile used for testing, i.e. test profile
Conduct reliability growth testing
Track testing progress
Project additional testing needed
Certify reliability objectives and release criteria are met
Implementing SRE /4
Post delivery and maintenance:
Project post-release staff needs
Monitor field reliability vs. objectives
Track customer satisfaction with reliability
Time new feature introduction by monitoring reliability
Guide product and process improvement with reliability measures
Feasibility Phase
Activity 1: Define and classify failures
Define failure from customer’s perspective
Group identified failures into a group of severity classes from customer’s perspective
Usually 3-4 classes are sufficient
Activity 2: Identify customer reliability needs
What is the level of reliability that the customer needs?
Who are the rival companies and what are rival products and what is their reliability?
Activity 3: Determine operational profile
Based on the tasks performed and the environmental factors
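As a minimal sketch of Activity 3, an operational profile can be represented as a set of operations with occurrence probabilities derived from usage counts. The operation names and counts below are hypothetical, and allocating test cases in proportion to the probabilities is only one simple way to use the profile.

```python
def operational_profile(occurrence_counts):
    """Convert raw occurrence counts per operation into probabilities."""
    total = sum(occurrence_counts.values())
    if total <= 0:
        raise ValueError("at least one occurrence is required")
    return {op: count / total for op, count in occurrence_counts.items()}


def allocate_test_cases(profile, total_cases):
    """Allocate test cases in proportion to occurrence probability.

    Every operation gets at least one test case, so the allocated total
    can differ slightly from total_cases after rounding.
    """
    return {op: max(1, round(p * total_cases)) for op, p in profile.items()}


# Hypothetical usage counts for three operations (not from the slides).
counts = {"process_alarm": 700, "generate_report": 250, "reconfigure": 50}
profile = operational_profile(counts)
allocation = allocate_test_cases(profile, total_cases=200)
```

Guaranteeing at least one test case per operation is a design choice made here so that rare but possibly critical operations are not left untested.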
Requirements Phase
Activity 4: Conduct trade-off studies
Reliability and functionality
Reliability, cost, delivery date, technology, team
Activity 5: Set reliability objectives based on
Explicit requirement statements from a request for proposal or standard document
Customer satisfaction with a previous release or similar product
Capabilities of competition
Trade-offs with performance, delivery date and cost
Warranty, technology capabilities
Design Phase
Activity 6: Allocate reliability among acquired software, components, hardware and other systems
Determine which systems and components are involved and how they affect the overall system reliability
Activity 7: Engineer to meet reliability objectives
Plan using fault tolerance, fault removal and fault avoidance
Activity 8: Focus resources based on operational profile
Operational profile guides the designer to focus on features that are supposed to be more critical
Develop more critical functions first in more detail
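One way to sketch the allocation in Activity 6 is to treat the system as a series arrangement in which failure intensities of independent parts add up. Under that assumption (which the slides do not state), the failure intensity objective left for the developed software is the system objective minus the expected failure intensities of acquired software, hardware and other systems. The figures below are hypothetical.

```python
def developed_software_fio(system_fio, acquired_failure_intensities):
    """Failure intensity budget left for developed software.

    Assumes a series system with independent parts, so that
    failure intensities add up to the system failure intensity.
    """
    remaining = system_fio - sum(acquired_failure_intensities)
    if remaining <= 0:
        raise ValueError("acquired components already use up the system objective")
    return remaining


# Hypothetical figures, in failures per 1000 operating hours.
system_fio = 100.0
acquired = {"operating_system": 30.0, "hardware": 20.0}
budget = developed_software_fio(system_fio, acquired.values())
```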
Implementation Phase
Activity 9: Measure reliability of acquired software, hardware and other systems
Certification test using reliability demonstration chart
Activity 10: Manage fault introduction and propagation
Practicing a development methodology; constructing modular system; employing reuse; conducting inspection and review; controlling change
System Test Phase
Activity 11: Determine operational profile used for testing
Decide upon critical operations
Decide upon need of multiplicity of operational profile
Activity 12: Conduct reliability growth testing
Activity 13: Track testing progress and certify that reliability objectives are met
Conduct feature test, regression test and performance and load test
Conduct reliability growth test
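Tracking progress during reliability growth testing needs some model of how failure intensity falls as faults are removed. The slides do not name one here; the sketch below assumes Musa's basic execution time model, in which failure intensity decreases linearly with the expected number of failures experienced. The parameter values are hypothetical.

```python
import math


def additional_failures(nu0, lambda0, lambda_present, lambda_objective):
    """Expected further failures before the objective is reached.

    nu0: total expected failures; lambda0: initial failure intensity.
    """
    return (nu0 / lambda0) * (lambda_present - lambda_objective)


def additional_execution_time(nu0, lambda0, lambda_present, lambda_objective):
    """Expected further execution time before the objective is reached."""
    return (nu0 / lambda0) * math.log(lambda_present / lambda_objective)


# Hypothetical: 100 total expected failures, 10 failures/CPU hr initially,
# currently at 2 failures/CPU hr, objective 0.5 failures/CPU hr.
more_failures = additional_failures(100, 10, 2.0, 0.5)
more_time = additional_execution_time(100, 10, 2.0, 0.5)
```

With these inputs the model projects 15 more failures and roughly 13.9 more CPU hours of testing.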
Field Trial Phase
Activity 14: Project additional testing needed
Check accuracy of test: time and coverage
Plan for changes in test strategies and methods
Activity 15: Certify that reliability objectives and release criteria are met
Check accuracy of data collection
Check whether test operational profile reflects field operational profile
Check customer’s definition of failure matches with what was defined for testing the product
Post Delivery Phase /1
Activity 16: Project post-release staff needs
Customer’s staff for system recovery; supplier’s staff to handle customer-reported failures and to remove faults
Activity 17: Monitor field reliability vs. objectives
Collect post release failure data systematically
Activity 18: Track customer satisfaction with reliability
Survey product features with a sample customer set
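Monitoring field reliability against objectives (Activity 17) can be sketched as a comparison between the observed failure intensity and the failure intensity objective (FIO). The field data and the FIO value below are hypothetical.

```python
def field_failure_intensity(failure_count, operating_hours):
    """Observed failures per operating hour in the field."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failure_count / operating_hours


def meets_objective(failure_count, operating_hours, fio):
    """True when the observed failure intensity does not exceed the FIO."""
    return field_failure_intensity(failure_count, operating_hours) <= fio


# Hypothetical field data: 4 failures in 2000 operating hours.
observed = field_failure_intensity(failure_count=4, operating_hours=2000)
within_objective = meets_objective(4, 2000, fio=0.005)
```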
Post Delivery Phase /2
Activity 19: Time new feature introduction by monitoring reliability
New features bring new defects. Add new features desired by the customers if they can be managed without sacrificing reliability of the whole system
Activity 20: Guide product and process improvement with reliability measures
Root-cause analysis for the faults
Why the fault was not detected earlier in the development phase and what should be done to reduce the probability of introducing similar faults
Chapter 10 Section 2.3
Practice Variations
Existing vs. New Projects
There is no essential difference between new and existing projects in applying SRE for the first time. However, determining failure intensity objective and operational profile for existing projects is easier.
Most of the SRE activities will require only small updates after they have been completed once, e.g., operational profile should only be updated for the new operations added (remember the interaction factor).
After SRE has been applied to one release, less effort is needed for succeeding releases, e.g., new test cases should be added to the existing ones.
Short-Cycle Projects
Small projects or releases or those with short development cycles may require a modified set of SRE activities to keep costs low or activity durations short.
Reduction in cost and time can be obtained by limiting the number of elements in the operational profile and by accepting less precision.
Examples: Setting one operational mode and performing certification test rather than reliability growth test.
Cost Concerns
There may be a training cost when starting to apply SRE.
The principal cost in applying SRE is determining the operational profile.
Another cost is associated with processing and analyzing failure data during reliability growth test.
As most projects have multiple releases, the SRE cost drops sharply after initial release.
Practice Variation
Defining an operational profile based on “customer modeling”.
Automatic test cases generation based on frequency of use reflected in operational profile.
Employing “cleanroom” development techniques together with feature and certification testing.
Automatic tracking of reliability growth.
SRE for Agile software development.
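Automatic test case generation based on frequency of use can be sketched as weighted random sampling from the operational profile. The profile and the seed below are hypothetical, and a real generator would also have to choose input data for each selected operation.

```python
import random


def generate_test_sequence(profile, length, seed=None):
    """Draw operations at random, weighted by occurrence probability."""
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=length)


# Hypothetical operational profile (probabilities sum to 1).
profile = {"process_alarm": 0.70, "generate_report": 0.25, "reconfigure": 0.05}
sequence = generate_test_sequence(profile, length=1000, seed=637)
```

Fixing the seed makes a generated sequence repeatable, which helps when a failing run has to be reproduced.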
Conclusions …
Practical implementation of an effective SRE program is a non-trivial task.
Mechanisms for collection and analysis of data on software product and process quality must be in place.
Fault identification and elimination techniques must be in place.
Other organizational abilities such as the use of reviews and inspections, reliability based testing, and software process improvement are also necessary for effective SRE.
Quality oriented mindset and training are necessary!
Chapter 10 Section 3
Software Quality: Post Release & Maintenance Phase
Quality Assessment
Post-release quality assessment: evaluation, validation
[Figure: Quality Assessment]
Ref: Design for Electrical & Comp. Engineers, J.E. Salt et al., Wiley
Quality Assessment: Difficulties
[Figure: Leonardo da Vinci, Mona Lisa (1479); Pablo Picasso, Dora Maar (1937)]
Same requirements can lead to different systems
Need to account for “creativity” in the “design” of the product and the “requirements” as well as the “product” itself
Quality assessment method: RAM
How Do We Assess Quality?
Usual (ad-hoc) approach
Systematic approach: RAM
Inside RAM
What is RAM?
RAM: RELIABILITY – AVAILABILITY – MAINTAINABILITY
A collection of numerical analysis techniques that quantifies the reliability, availability and maintainability of a complex system
RAM analysis helps us answer questions related to dependability (i.e. reliability, safety, availability and maintainability) of the system
RAM: Advantages & Uses
Can be used to understand
Operation of the system - System reliability versus through-put rate requirements
Safety of the system - Identifiable failure modes which present an unacceptable consequence to facility workers or the public
Improvements that can have substantial impacts on system performance - Recommendations for improving the safety and reliability of equipment/processes.
RAM: Data Requirements
Failure data
Maintenance data
Reliability and availability data from recognized industry standards (MTTF, MTBF & MTTR)
Data collection requires:
• Engineering experience and judgment
• Interviews with engineering and maintenance personnel at the system site
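The three figures named above are related by standard definitions for a repairable system: MTBF = MTTF + MTTR, and steady-state availability = MTTF / (MTTF + MTTR). A minimal sketch follows; the numbers are hypothetical and not Bonnybrook data.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures for a repairable item."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Steady-state availability: long-run fraction of time the item is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical component: fails on average after 990 h, repaired in 10 h.
component_mtbf = mtbf(990.0, 10.0)
component_availability = availability(990.0, 10.0)
```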
Case Study
RAM Analysis: Distributed Control System of the Bonnybrook Waste Water Treatment Plant (City of Calgary)
Background
The City of Calgary invested $100 million in the 1994 expansion of the Bonnybrook Wastewater Treatment Plant (WTP) to serve Calgary's growing population, which was 767,000 in 1996.
This expansion increased the plant capacity by 25% to 500,000 cubic metres per day, while incorporating state-of-the-art treatment technologies.
This study was performed in order to provide the City with an assessment of quality of the Distributed Control Systems (DCS) of the Bonnybrook WTP to be used as a guide for the next WTP plant at Pine Creek.
Background (cont’d)
The City’s WTP DCS is real-time, mission critical, dependable, safe and secure.
However, the current quality measures for the City of Calgary’s WTP DCS are unknown.
To successfully improve the safety and reliability for the next generation of WTP, which is built in Pine Creek, a study of current level of reliability and safety of the existing Bonnybrook WTP plant was prudent.
Assumptions & Hints
a) Deal with both hardware (mechanical and electrical) and software failures
b) Only deal with “failures”, not mandatory preventative maintenance or minor repairs where no components are replaced
c) Components whose function is to wear and/or fail after a certain period of time (e.g., batteries, etc.), and regularly replaced items are not included in the analysis
d) Probes, gauges, or transmitters whose purpose is to provide information to the user are not included
Assumptions (cont’d)
e) Failures due to an improper installation of hardware/software are not included
f) Missing parts are not considered failures (e.g., rivets, screws, bolts)
g) Anything below the subsystem level is considered to be in series
h) All subsystems are independent (i.e., loss of one subsystem does not result in loss of another subsystem’s functionality)
i) Failures are not distinguished based on their severity
RAM for Bonnybrook WWTP
Reason for conducting RAM analysis for Bonnybrook WWTP
Current scenario at Bonnybrook WWTP
Methods / Techniques used
Result of the analysis (How Reliable is Bonnybrook?)
Key value of RAM analysis to the City of Calgary
How to use the results (for current and future systems)
Why RAM?
Reason for conducting RAM analysis for Bonnybrook WWTP
What we know?
Current system runs smoothly
Minor failures can be repaired easily (e.g. card frame change)
What we DO NOT know?
Actual reliability of the system
Cost of each maintenance
Connection layout of components
Impact of “minor” failures on the overall system
Frequency of failure / maintenance
- Accurate failure data
- Accurate maintenance data
Change in cost / reliability with the change in configuration
Is the current system configuration good for next projects?
Is the system serial or parallel?
Are the components inside each DCU serial or parallel?
Can the change in layout change the performance?
Why RAM?
Reason for conducting RAM analysis for Bonnybrook WWTP
Better understand the system (system configuration)
Better understand the impact of failure / faults of components on the system
Establish groundwork for Reliability-Availability-Maintainability measurement
Study the method of data collection, fault / maintenance record keeping
Design and develop tool to perform what-if scenario
RAM: Current Scenario
Current scenario at Bonnybrook WWTP
Reliability of components and the system as a whole is not measured
Established method to measure the system reliability needs to be put in place
RAM: Methods Used
Techniques are selected based on availability of data and tools
Techniques used in this analysis are:
Reliability Block Diagram (RBD)
Reliability Demonstration Chart (RDC)
Fault Tree Analysis (FTA)
RAM: RBD
A reliability block diagram is a graphical representation of how the components of a system are connected from a reliability point of view
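Once the blocks are drawn, system reliability follows from two rules, assuming blocks fail independently: blocks in series multiply their reliabilities, and redundant blocks in parallel fail only if all of them fail. A minimal sketch with hypothetical reliabilities:

```python
from math import prod


def series(reliabilities):
    """All blocks must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one block must work: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical layout: two redundant controllers feeding one network link.
controllers = parallel([0.90, 0.90])
system = series([controllers, 0.95])
```

In this example redundancy lifts the controller stage from 0.90 to 0.99, while the single network link remains the weakest series element.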
RAM: RDC
RDC analysis is an efficient way of checking whether the failure intensity objective (FIO) is met or not.
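The chart is a sequential test: each failure is plotted at its failure number n and normalized time T_N (failure time multiplied by the FIO), and the point falls in an accept, continue or reject region. The sketch below assumes the usual boundary formulas, T_N = (A - n ln γ)/(1 - γ) for accept and T_N = (B - n ln γ)/(1 - γ) for reject, with A = ln(β/(1-α)) and B = ln((1-β)/α). The default risks α = β = 0.1 and discrimination ratio γ = 2 are assumptions, not values taken from these slides.

```python
import math


def rdc_decision(failure_number, normalized_time,
                 alpha=0.1, beta=0.1, gamma=2.0):
    """Classify one plotted point as 'accept', 'reject' or 'continue'.

    alpha: supplier risk, beta: consumer risk,
    gamma: discrimination ratio (must differ from 1).
    """
    a = math.log(beta / (1.0 - alpha))
    b = math.log((1.0 - beta) / alpha)
    log_gamma = math.log(gamma)
    accept_at = (a - failure_number * log_gamma) / (1.0 - gamma)
    reject_at = (b - failure_number * log_gamma) / (1.0 - gamma)
    if normalized_time >= accept_at:
        return "accept"
    if normalized_time <= reject_at:
        return "reject"
    return "continue"


def normalized_time(failure_time, fio):
    """Failure time expressed in units of 1/FIO."""
    return failure_time * fio
```

For example, with the default parameters a first failure at normalized time 3.0 falls in the accept region, while a fifth failure at normalized time 1.0 falls in the reject region.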
RAM: FTA
Fault tree analysis is a graphical representation of the major (critical) failures associated with a product, the causes for the faults, and potential countermeasures.
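When basic events are assumed independent, a fault tree can also be evaluated numerically: an AND gate multiplies the event probabilities, and an OR gate is one minus the probability that none of its inputs occur. The tree and the probabilities below are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if every input event occurs."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: loss of control happens on power loss, OR when both
# the primary AND the backup controller fail.
both_controllers_fail = and_gate([0.05, 0.05])
loss_of_control = or_gate([0.01, both_controllers_fail])
```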
Analysis: System Configuration
Analysis: Data Control System
RBD of The DCS layout
Analysis: Inside a DCU
Contains serial and parallel subsystems
Configuration affects total system reliability
Total 10 Units
Analysis: Results
What we know?
The exact layout of the DCS (inside-out)
Actual reliability of the current system
Cost and impact of “minor” failures on the overall system
Change in cost / reliability with the change in configuration
Is the system serial or parallel?
Are the components inside each DCU serial or parallel?
Can the change in layout change performance?
Is the current system/configuration fit to be used in the next projects?
What we would like to see?
- More relevant failure data
- More maintenance data
Failure mode and their effects (FMEA)
RAM: Key Values
From engineering point of view
Current:
Can understand what the system looks like inside-out
Can use current system as benchmark for future system’s performance
Can change components and see their effects on reliability
In future:
Can be used to pinpoint single points of failures
Can be used to effectively plan redundancy and refrain from “over engineering” and overspending (spending can be made at the right place to complement reliability and availability)
RAM: Key Values
From management point of view
Can help perform what-if scenario evaluation
Can help planning and design of future projects and plants
Can help perform cost-value analysis on maintenance vs. replacement
Can help make better decisions on system/subsystem/component purchase based on reliability data and impact on performance
Can help compare systems/subsystems/components from several vendors.
Can be used to plan procedures that need to be in place for data collection, maintenance as well as analysis purposes.
What Was Accomplished
Defined benchmark value for reliability metrics for WTPs DCS components (vendors’ data)
Defined the architecture of the WTPs DCS
Identified source of failure data
Stated ground rules and assumptions
Identified the confidence level of estimation and predictions
Collected failure and maintenance data for the current WTPs DCS
What Was Accomplished
Analyzed data to identify proper distribution that models failure data
Performed goodness-of-fit and bias tests (using reliability demonstration charts, fault tree analysis, etc.) to validate distribution fit
Estimated current system reliability
Based on these:
A reliability calculation chart to perform what-if analysis for various units of the system was developed
A list of recommendations for reliability improvements of the WTPs DCS was produced
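The slides do not say which distribution was selected for the failure data. As an illustration only, the sketch below assumes exponentially distributed times between failures, for which the maximum-likelihood failure rate is the number of failures divided by the total observed time and the reliability over a mission time t is exp(-rate × t). The data values are hypothetical.

```python
import math


def fit_exponential_rate(times_between_failures):
    """Maximum-likelihood failure rate under an exponential assumption."""
    if not times_between_failures or any(t <= 0 for t in times_between_failures):
        raise ValueError("need positive times between failures")
    return len(times_between_failures) / sum(times_between_failures)


def reliability(rate, mission_time):
    """Probability of surviving mission_time without failure."""
    return math.exp(-rate * mission_time)


# Hypothetical times between failures, in hours.
rate = fit_exponential_rate([100.0, 200.0, 300.0])
r_200h = reliability(rate, 200.0)
```

Changing the assumed rate or mission time in such a function is the simplest form of what-if analysis.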
Conclusions
System integration & manufacturing are not the final steps of a development process (usually)
Quality assessment of hardware/software system can be performed systematically using RAM
Mechanisms for failure data collection and interpretation are necessary
Engineering judgment (in selecting tools, techniques, interpreting data, etc.) is essential to the analysis