
SENG 637, Reliability & Testing of Systems: SRE Deployment (Chapter 10)

Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG637/

Contents

 Quality in requirements phase

 Quality in design & implementation, testing & release phases

 Software Quality Assurance (SQA) and Software Reliability Engineering (SRE)

 Quality, test and data plans

 Roles and responsibilities

 Sample quality and test plan

 Defect reporting procedure

 Best practices of SRE

 Quality in post-release and maintenance phase

Quality vs. Project Costs

[Figure: cost distribution for a typical software project (product design, programming, integration and test).]

Total Cost Distribution

[Figure: total cost distribution (product design, programming, integration and test, maintenance).]

Questions:
 How to build quality into a system?
 How to assess the quality of a system?

Developing a better quality system will contribute to lowering maintenance costs.

Quality in Process

Q. How to include quality concerns in the process?

[Figure: quality concerns mapped onto the process. Phases: Requirement & Architecture, Design & Implementation, Test & Release, and Maintenance. Activities: architectural analysis of quality attributes (methods: ATAM, CBAM, etc.), Software Reliability Engineering (SRE), Software Quality Assurance (SQA), and software quality assessment (method: RAM, etc.).]

Chapter 10 Section 1
Software Quality: Requirements and Architecture Phase

Quality Challenges

 Modern software systems are required to meet several quality attributes such as: modifiability, performance, interoperability, portability, reliability, etc.
 Questions for any particular system:

 What precisely do these quality attributes mean?

 Can a system be analyzed to determine desired qualities?

 How soon can such an analysis occur?

 How do you know if the design is suitable without having to build the system first?

 SW Architecture Evaluation / Assessment!

Evaluating SW Architecture

 Determining whether an architecture satisfies its requirements often involves:

 Being very explicit about what the requirements (functional & non-functional) are and how they are reflected in the architecture

 Understanding where one has to make trade-offs between different design alternatives

 Applying analysis wherever possible to determine the consequences of an architectural choice

 Mediating between desires of different stakeholders
To achieve these goals, an architectural evaluation process is needed.

SW Architecture Evaluation

 Informal / ad-hoc architectural evaluation

 Pros?

 Quick and Cheap

 Cons?

 … and Dirty? Incomplete? Unreliable?

 … Unrepeatable? Poorly documented?

SW Architecture Evaluation

 Are there better methods than ad-hoc evaluation?

 The answer is “YES”:

 SAAM (Software Architecture Analysis Method)

 Scenario-based evaluation

 ATAM (Architecture Tradeoff Analysis Method)

 Scenario-based evaluation with focus on trade-offs

 SACAM (Software Architecture Comparison Method)

 Business goal-driven comparison of architecture alternatives

 CBAM (Cost-Benefit Analysis Method)

 Focus on economic aspects

 etc.

References

 Software Architecture Technology Initiative of the SEI: http://www.sei.cmu.edu/architecture/
 ATAM: Method for Architecture Evaluation, Rick Kazman, Mark Klein, Paul Clements, Technical Report CMU/SEI-2000-TR-004, 2000.
 CBAM: Making Architecture Design Decisions: An Economic Approach, Rick Kazman, Jai Asundi, Mark Klein, Technical Report CMU/SEI-2002-TR-035, 2002.

nd  Software Architecture in Practice, 2 ed., Len Bass, Paul Clements,,, Rick Kazman, Addison-Wesley, 2003.

 Evaluating Software Architectures: Methods and Case Studies, Paul Clements, Rick Kazman, Mark Klein, Addison-Wesley, 2001.

Chapter 10 Section 2
Software Quality: Design & Implementation, Testing & Release Phases

What is Reliable Software?

 Reliable software products are those that run correctly and consistently, have fewer remaining defects, handle abnormal situations properly, and need less installation effort.
 The remaining defects should not affect the normal behaviour and use of the software; they will not do anything destructive to the system and its hardware or software environment, and will rarely be evident to the users.
 Developing reliable software requires:

 Establishing Software Quality System (SQS) and Software Quality Assurance (SQA) programs

 Establishing Software Reliability Engineering (SRE) process

Software Quality System (SQS)

 Goals:
 Building quality into the software from the beginning
 Keeping and tracking quality in the software throughout the software life cycle

Ref: John W. Horch: Practical Guide to Software Quality Management

Software Quality Assurance (SQA)

 Software Quality Assurance (SQA) is a planned and systematic approach to ensure that both the software process and the software product conform to the established standards, processes, and procedures.
 The goal of SQA is to improve software quality by monitoring both the software and the development process to ensure full compliance with the established standards and procedures.
 Steps to establish an SQA program:

 Get the top management’s agreement on its goal and support.

 Identify SQA issues, write SQA plan, establish standards and SQA functions, implement the SQA plan and evaluate SQA program.

SRE: Process & Plans

[Figure: the SRE process across the Requirement & Architecture, Design & Implementation, and Test phases: Define Necessary Reliability, Develop Operational Profile, Prepare for Test, Execute Test, and Apply Failure Data, repeated over time. Supporting documents: Quality Plan, Test Plan, Data Plan.]

There may be many Test and Data (measurement) plans for various parts of the same project.

Defect Handling: Without & With SQS

 Defect reporting, tracking, and closure procedure

[Figure: defect reports feed a defect database. SCN: software change notice; STR: software trouble report.]

Ref: John W. Horch: Practical Guide to Software Quality Management
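A minimal Python sketch of the kind of record such an STR/SCN procedure tracks; the field names and workflow states below are illustrative assumptions, not taken from Horch.

```python
# Minimal, illustrative model of a defect-tracking record (hypothetical fields).
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class SoftwareTroubleReport:           # STR: a reported failure/defect
    str_id: str
    reported_on: date
    severity: int                      # e.g. 1 (critical) .. 4 (minor)
    description: str
    status: str = "open"               # open -> analyzed -> fixed -> closed
    scn_id: Optional[str] = None       # SCN attached once a change is approved

def close_str(report: SoftwareTroubleReport, scn_id: str) -> None:
    """Attach the software change notice (SCN) and close the report."""
    report.scn_id = scn_id
    report.status = "closed"

if __name__ == "__main__":
    r = SoftwareTroubleReport("STR-042", date(2024, 1, 15), 2, "link drops under load")
    close_str(r, "SCN-017")
    print(r)
```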

SRE: Who is Involved?

 Senior management

 Test coordinator (manager)

 Data coordinator (manager)

 Customer or user

SRE: Management Concerns

 Perception and specification of a customer’s real needs.

 Translation of specification into a conforming design.

 Maintaining conformity throughout the development processes.

 Product and sub-product demonstrations which provide convincing indications of the product and project having met their requirements.

 Ensuring that the tests and demonstrations are designed and controlled, so as to be both achievable and manageable.

Roles & Responsibilities /1

 Test Coordinator (Manager): The test coordinator is expected to ensure that every specific statement of intent in the product requirement, specification and design is matched by a well-designed (cost-effective, convincing, self-reporting, etc.) test, measurement or demonstration.

 Data Coordinator (Manager): The data coordinator ensures that the physical and administrative structures for data collection exist and are documented in the quality plan, receives and validates the data during development, and through analysis and communication ensures that the meaning of the information is known to all, in time, for effective application.

Roles & Responsibilities /2

 Customer or User:

 Actively encouraging the making and following of detailed quality plans for the products and projects.

 Requiring access to previous quality plans and their recorded outcomes before accepting the figures and methods quoted in the new plan.

 Enquiring into the sources and validity of synthetics and formulae used in estimating and planning.

 Appointing appropriate personnel to provide authoritative responses to queries from the developer and a managed interface to the developer.

 Receiving and reviewing reports of significant audits, reviews, tests and demonstrations.

 Making any queries and objections in detail and in writing, at the earliest possible time.

Quality Plans /1

 The most promising mechanisms for gaining and improving predictability and controllability of software qualities are the quality plan and its subsidiary documents, including test plans and data (measurement) plans.
 The creation of the quality plan can be instrumental in raising project effectiveness and in preventing expensive and time-consuming misunderstandings during the project, and at release/acceptance time.

Quality Plan /2

 Quality plan and quality record provide guidelines for carrying out and controlling the following:

 Requirement and specification management.

 Development processes.

 Documentation management.

 Design evaluation.

 Product testing.
 Data collection and interpretation.
 Acceptance and release processes.
(Product testing, data collection and interpretation, and acceptance and release are SRE-related activities.)

Quality Plan /3

 Quality planning should be made at the very earliest point in a project, preferably before a final decision is made on feasibility, and before a software development contract is signed.

 Quality plan should be devised and agreed between all the concerned parties: senior management, software development management (both administrative and technical), software development team, customers, and any involved general support functions such as resource management and company-wide quality management.

Data (Measurement) Plan

 The data (measurement) plan prescribes:

 What should be measured and recorded during a project;

 How it should be checked and collated;

 How it should be interpreted and applied.

 Data may be collected in several ways, within the specific project and beyond it.

 Ideally, there should be a higher level of data collection and application into which project data is fed.

Test Plan /1

 The purpose of a test plan is to ensure that all testing activities (including those used for controlling the process of development and for indicating the progress of the project) are expected, are manageable and are managed.
 Test plans are created as a subsection or as an associated document of the quality plan.
 Test plans become progressively more detailed and expanded during a project.
 Each test plan defines its own objectives and scope, and the means and methods by which the objectives are expected to be met.

Test Plan /2

 For the software product, the test plan is usually restricted by the scope of the test: certification, feature and load test.
 The plan predicts the resources and means required to reach the required levels of assurance about the end products, and the scheduling of all testing, measuring and demonstration activities.
 Tests, measurements and demonstrations are used to establish that the software product satisfies the requirements document, and that each process during development is carried out correctly and results in acceptable outcomes.

Chapter 10 Section 2.1
Elements of Quality & Test Plan

Sample SQS Plan /1

 1 Purpose

 2 Reference Documents

 3 Management

 3.1 Organization

 3.2 Tasks

 3.3 Responsibilities

Based on IEEE Standard 730.1-1989

Sample SQS Plan (cont'd) /2

 4 Documentation

 4.1 Purpose

 4.2 Minimum Documentation

 4.2.1 Software Requirements Specification

 4.2.2 Software Design Description

 4.2.3 Software Verification and Validation Plan

 4.2.4 Software Verification and Validation Report

 4.2.5 User Documentation

 4.2.6 Configuration Management Plan

 4.3 Other Documentation

Based on IEEE Standard 730.1-1989

Sample SQS Plan (cont'd) /3

 5 Standards, Practices, Conventions, and Metrics

 5.1 Purpose

 5.2 Documentation, Logic, Coding, and Commentary Standards and Conventions

 5.3 Testing Standards, Conventions, and Practices

 5.4 Metrics

Based on IEEE Standard 730.1-1989

Sample SQS Plan (cont'd) /4

 6 Review and Audits

 6.1 Purpose

 6.2 Minimum Requirements

 6.2.1 Software Requirements Review

 6.2.2 Preliminary Design Review

 6.2.3 Critical Design Review

 6.2.4 Software Verification and Validation Review

 6.2.5 Functional Audit

 6.2.6 Physical Audit

 6.2.7 In-process Reviews

 6.2.8 Managerial Reviews

 6.2.9 Configuration Management Plan Review

 6.2.10 Postmortem Review

 6.3 Other Reviews and Audits

Based on IEEE Standard 730.1-1989

Sample SQS Plan (cont'd) /5

 7 Test
 8 Problem Reporting and Corrective Action
 8.1 Practices and Procedures
 8.2 Organizational Responsibilities
 9 Tools, Techniques, and Methodologies
 10 Code Control
 11 Media Control
 12 Supplier Control
 13 Records Collection, Maintenance, and Retention
 14 Training
 15 Risk Management

Based on IEEE Standard 730.1-1989

Sample Test Plan /1

 1 Test Plan identifier

 2 Introduction

 2.1 Objectives

 2.2 Background

 2.3 Scope

 2.4 References

Based on IEEE Standard 829-1983

Sample Test Plan (cont'd) /2

 3 Test Items

 3.1 Program Modules

 3.2 Job Control Procedures

 3.3 User Procedures

 3.4 Operator Procedures

 4 Features To Be Tested

 5 Features Not To Be Tested

Based on IEEE Standard 829-1983

Sample Test Plan (cont'd) /3

 6 Approach

 6.1 Conversion Testing

 6.2 Job Stream Testing

 6.3 Interface Testing

 6.4 Security Testing

 6.5 Recovery Testing

 6.6 Performance Testing

 6.7 Regression

 6.8 Comprehensiveness

 6.9 Constraints

Based on IEEE Standard 829-1983

Sample Test Plan (cont'd) /4

 7 Item Pass/Fail Criteria

 8 Suspension Criteria and Resumption Requirements

 8.1 Suspension Criteria

 8.2 Resumption Requirements

 9 Test Deliverables

 10 Testing Tasks

Based on IEEE Standard 829-1983

Sample Test Plan (cont'd) /5

 11 Environmental Needs

 11.1 Hardware

 11.2 Software

 11.3 Security

 11.4 Tools

 11.5 Publications
 12 Responsibilities

 12.1 Test Group

 12.2 User Department

 12.3 Development Project Group

Based on IEEE Standard 829-1983

Sample Test Plan (cont'd) /6

 13 Staffing and Training Needs

 13.1 Staffing

 13.2 Training

 14 Schedule

 15 Risks and Contingencies

 16 Approvals

Based on IEEE Standard 829-1983

Chapter 10 Section 2.2
Best Practice SRE

Practice of SRE /1

 The practice of SRE provides the software engineer or manager the means to predict, estimate, and measure the rate of failure occurrences in software.

 Using SRE in the context of software development, one can:

 Analyze, manage, and improve the reliability of software products.

 Balance customer needs for competitive price, timely delivery, and a reliable product.
 Determine when the software is good enough to release to customers, minimizing the risks of releasing software with serious problems.
 Avoid excessive time to market due to overtesting. (Hopefully!)

Incremental Implementation

 Most projects implement the SRE activities incrementally.

 A typical implementation sequence

Implementing SRE /1

 Feasibility and requirements phase:

 Define and classify failures, i.e., failure severity classes

 Identify customer reliability needs

 Determine operational profile

 Conduct trade-off studies (among reliability, time, cost, people, technology)

 Set reliability objectives
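As a hedged illustration of the last three activities above, the sketch below records severity classes together with a per-class failure intensity objective (FIO); the class descriptions and numbers are invented for the example, not prescribed by SRE.

```python
# Illustrative failure severity classes with per-class failure intensity
# objectives (FIO) in failures per 1000 operating hours. Values are examples only.
SEVERITY_CLASSES = {
    1: {"impact": "service outage / safety impact",       "fio_per_khr": 0.1},
    2: {"impact": "major degradation, no workaround",     "fio_per_khr": 1.0},
    3: {"impact": "minor degradation, workaround exists", "fio_per_khr": 10.0},
    4: {"impact": "cosmetic / documentation",             "fio_per_khr": 100.0},
}

def meets_objective(severity: int, observed_failures: int, hours: float) -> bool:
    """Compare the observed failure intensity of a class against its objective."""
    observed_fi = observed_failures / (hours / 1000.0)
    return observed_fi <= SEVERITY_CLASSES[severity]["fio_per_khr"]

print(meets_objective(2, observed_failures=3, hours=5000))  # 0.6 failures/khr <= 1.0 -> True
```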

Implementing SRE /2

 Design and implementation phase:

 Allocate reliability among components, acquired software, hardware and other systems

 Engineer to meet reliability objectives

 Focus resources based on operational profile

 Measure reliability of acquired software, hardware and other systems, i.e., certification test

 Manage fault introduction and propagation

Implementing SRE /3

 System test and field trial phase:

 Determine operational profile used for testing, i.e. test profile

 Conduct reliability growth testing

 Track testing progress

 Project additional testing needed

 Certify reliability objectives and release criteria are met

Implementing SRE /4

 Post delivery and maintenance:

 Project post-release staff needs

 Monitor field reliability vs. objectives

 Track customer satisfaction with reliability

 Time new feature introduction by monitoring reliability

 Guide product and process improvement with reliability measures

Feasibility Phase

 Activity 1: Define and classify failures

 Define failure from the customer's perspective

 Group identified failures into severity classes from the customer's perspective

 Usually 3-4 classes are sufficient

 Activity 2: Identify customer reliability needs

 What is the level of reliability that the customer needs?

 Who are the rival companies and what are rival products and what is their reliability?

 Activity 3: Determine operational profile

 Based on the tasks performed and the environmental factors

Requirements Phase

 Activity 4: Conduct trade-off studies
 Reliability and functionality
 Reliability, cost, delivery date, technology, team
 Activity 5: Set reliability objectives based on:
 Explicit requirement statements from a request for proposal or standard document
 Customer satisfaction with a previous release or similar product
 Capabilities of competition
 Trade-offs with performance, delivery date and cost
 Warranty, technology capabilities

Design Phase

 Activity 6: Allocate reliability among acquired software, components, hardware and other systems

 Determine which systems and components are involved and how they affect the overall system reliability

 Activity 7: Engineer to meet reliability objectives

 Plan using fault tolerance, fault removal and fault avoidance

 Activity 8: Focus resources based on operational profile

 Operational profile guides the designer to focus on features that are supposed to be more critical

 Develop more critical functions first and in more detail
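For Activity 6, one simple allocation scheme (an assumption here, not mandated by these slides) treats independent components as a series system, so their failure intensities add up to the system objective; the sketch below apportions an overall FIO by relative weight, with hypothetical component names and numbers.

```python
# Apportion a system failure intensity objective (FIO) across components assumed
# to be independent and in series (their failure intensities then sum to the
# system FIO). Component names and weights are illustrative.

def allocate_fio(system_fio: float, weights: dict) -> dict:
    total = sum(weights.values())
    return {name: system_fio * w / total for name, w in weights.items()}

components = {"acquired_sw": 2.0, "in_house_sw": 5.0, "hardware": 1.0, "other_systems": 2.0}
for name, fio in allocate_fio(system_fio=0.5, weights=components).items():
    print(f"{name}: {fio:.3f} failures per 1000 hr")   # the four values sum to 0.5
```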

Implementation Phase

 Activity 9: Measure reliability of acquired software, hardware and other systems

 Certification test using reliability demonstration chart

 Activity 10: Manage fault introduction and propagation

 Practicing a development methodology; constructing a modular system; employing reuse; conducting inspections and reviews; controlling change

System Test Phase

 Activity 11: Determine operational profile used for testing

 Decide upon critical operations

 Decide upon the need for multiple operational profiles

 Activity 12: Conduct reliability growth testing

 Activity 13: Track testing progress and certify that reliability objectives are met

 Conduct feature test, regression test, and performance and load test

 Conduct reliability growth test
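Activities 12 and 13 track failure data against the objective. The sketch below assumes Musa's basic execution-time model (one common choice, not named on these slides) and projects how much additional testing is needed to move from the present failure intensity to the objective; all numbers are illustrative.

```python
import math

# Basic execution-time model (assumed): failure intensity decays exponentially
# with execution time, lambda(tau) = lambda0 * exp(-(lambda0 / nu0) * tau).

def additional_test_time(lambda0, nu0, lambda_present, lambda_target):
    """Extra execution time needed to reach the failure intensity objective."""
    return (nu0 / lambda0) * math.log(lambda_present / lambda_target)

def additional_failures(lambda0, nu0, lambda_present, lambda_target):
    """Expected further failures before the objective is reached."""
    return (nu0 / lambda0) * (lambda_present - lambda_target)

# Illustrative numbers: initial intensity 10 failures/khr, 120 total expected
# failures, present intensity 2 failures/khr, objective 0.5 failures/khr.
print(additional_test_time(10, 120, 2, 0.5))   # ~16.6 (thousand hours of execution)
print(additional_failures(10, 120, 2, 0.5))    # 18 more failures expected
```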

Field Trial Phase

 Activity 14: Project additional testing needed

 Check accuracy of test: time and coverage

 Plan for changes in test strategies and methods

 Activity 15: Certify that reliability objectives and release criteria are met

 Check accuracy of data collection

 Check whether test operational profile reflects field operational profile

 Check that the customer's definition of failure matches what was defined for testing the product

Post Delivery Phase /1

 Activity 16: Project post-release staff needs

 Customer's staff for system recovery; supplier's staff to handle customer-reported failures and to remove faults
 Activity 17: Monitor field reliability vs. objectives (see the sketch after this list)

 Collect post-release failure data systematically
 Activity 18: Track customer satisfaction with reliability

 Survey product features with a sample customer set
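A minimal sketch of the Activity 17 monitoring, assuming field failures are reported per period together with fleet operating hours; the objective and data are illustrative placeholders.

```python
# Compare field failure intensity against the release objective (illustrative data).
FIO = 0.5  # objective: failures per 1000 operating hours

monthly_reports = [
    {"month": "2024-01", "failures": 4, "fleet_hours": 9000},
    {"month": "2024-02", "failures": 2, "fleet_hours": 9500},
]

for report in monthly_reports:
    intensity = report["failures"] / (report["fleet_hours"] / 1000.0)
    status = "meets objective" if intensity <= FIO else "above objective"
    print(f"{report['month']}: {intensity:.2f} failures/khr ({status})")
```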

Post Delivery Phase /2

 Activity 19: Time new feature introduction by monitoring reliability

 New features bring new defects. Add new features desired by the customers if they can be managed without sacrificing reliability of the whole system
 Activity 20: Guide product and process improvement with reliability measures

 Root-cause analysis for the faults

 Why the fault was not detected earlier in the development phase and what should be done to reduce the risk of introducing similar faults

Chapter 10 Section 2.3
Practice Variations

Existing vs. New Projects

 There is no essential difference between new and existing projects in applying SRE for the first time. However, determining failure intensity objective and operational profile for existing projects is easier.

 Most of the SRE activities will require only small updates after they have been completed once, e.g., operational profile should only be updated for the new operations added (remember interaction factor).

 After SRE has been applied to one release, less effort is needed for succeeding releases, e.g., new test cases should be added to the existing ones.

Short-Cycle Projects

 Small projects or releases, or those with short development cycles, may require a modified set of SRE activities to keep costs low or activity durations short.
 Reduction in cost and time can be obtained by limiting the number of elements in the operational profile and by accepting less precision.
 Example: setting one operational mode and performing certification test rather than reliability growth test.

Cost Concerns

 There may be a training cost when starting to apply SRE.

 The principal cost in applying SRE is determining the operational profile.

 Another cost is associated with processing and analyzing failure data during reliability growth test.

 As most projects have multiple releases, the SRE cost drops sharply after initial release.

Practice Variation

 Defining an operational profile based on “customer modeling”.

 Automatic test case generation based on frequency of use reflected in the operational profile (see the sketch after this list).

 Employing “cleanroom” development techniques together with feature and certification testing.

 Automatic tracking of reliability growth.

 SRE for Agile software development.
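The profile-driven test generation mentioned above can be as simple as weighted sampling. The sketch below assumes an operational profile expressed as occurrence probabilities; operation names and values are invented for the example.

```python
import random

# Hypothetical operational profile: operation -> probability of occurrence.
operational_profile = {
    "process_sensor_reading": 0.60,
    "generate_report":        0.25,
    "operator_override":      0.10,
    "system_recovery":        0.05,
}

def select_test_operations(profile, n, seed=1):
    """Draw n test operations with frequencies matching the operational profile."""
    rng = random.Random(seed)
    operations, weights = zip(*profile.items())
    return rng.choices(operations, weights=weights, k=n)

print(select_test_operations(operational_profile, n=10))
```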

Conclusions …

 Practical implementation of an effective SRE program is a non-trivial task.
 Mechanisms for collection and analysis of data on software product and process quality must be in place.
 Fault identification and elimination techniques must be in place.
 Other organizational abilities such as the use of reviews and inspections, reliability-based testing, and software process improvement are also necessary for effective SRE.
 A quality-oriented mindset and training are necessary!

Chapter 10 Section 3
Software Quality: Post-Release & Maintenance Phase

Quality Assessment

 Post-release quality assessment: evaluation, validation

Ref: Design for Electrical & Comp. Engineers, J.E. Salt et al., Wiley

Quality Assessment: Difficulties

[Images: Leonardo da Vinci, Mona Lisa (1479); Pablo Picasso, Dora Maar (1937)]

 Same requirements can lead to different systems

 Need to account for “creativity” in the “design” of the product and the “requirements” as well as the “product” itself

 Quality assessment method: RAM

How Do We Assess Quality?

Usual (ad-hoc) approach

Systematic approach: RAM

Inside RAM

 What is RAM?

RAM: RELIABILITY – AVAILABILITY – MAINTAINABILITY

 A collection of techniques that quantifies the reliability, availability and maintainability of a complex system.
 RAM analysis helps us answer questions related to dependability (i.e., reliability, safety, availability and maintainability) of the system.

RAM: Advantages & Uses

 Can be used to understand:

 Operation of the system - system reliability versus throughput rate requirements

 Safety of the system - identifiable failure modes which present an unacceptable consequence to facility workers or the public

 Improvements that can have substantial impacts on system performance - recommendations for improving the safety and reliability of equipment/processes.

RAM: Data Requirements

 Failure data

 Maintenance data

 Reliability and availability data from recognized industry standards (MTTF, MTBF & MTTR)

Data collection requires:
• Engineering experience and judgment
• Interviews with engineering and maintenance personnel at the system site
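The MTTF and MTTR figures above feed directly into steady-state availability, A = MTTF / (MTTF + MTTR); a minimal sketch with illustrative (not Bonnybrook) numbers:

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time to failure and mean time to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Illustrative vendor-style figures only.
print(steady_state_availability(mttf_hours=8760.0, mttr_hours=4.0))  # ~0.99954
```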

Case Study: RAM Analysis of the Distributed Control System of the Bonnybrook Wastewater Treatment Plant (City of Calgary)

Background

 The City of Calgary invested $100 million in the 1994 expansion of the Bonnybrook Wastewater Treatment Plant (WTP) to serve Calgary's growing population, which was 767,000 in 1996.

 This expansion increased the plant capacity by 25% to 500,000 cubic metres per day, while incorporating state-of-the-art treatment technologies.

 This study was performed in order to provide the City with an assessment of the quality of the Distributed Control System (DCS) of the Bonnybrook WTP, to be used as a guide for the next WTP plant at Pine Creek.

Background (cont'd)

 The City’s WTP DCS is real-time, mission critical, dependable, safe and secure.

 However, the current quality measures for the City of Calgary's WTP DCS are unknown.

 To successfully improve the safety and reliability of the next generation of WTP, which is being built at Pine Creek, a study of the current level of reliability and safety of the existing Bonnybrook WTP was prudent.

Assumptions & Hints

a) Deal with both hardware (mechanical and electrical) and software failures
b) Only deal with "failures", not mandatory preventative maintenance or minor repairs where no components are replaced
c) Components whose function is to wear and/or fail after a certain period of time (e.g., batteries), and regularly replaced items, are not included in the analysis
d) Probes, gauges, or transmitters whose purpose is to provide information to the user are not included

Assumptions (cont'd)

e) Failures due to an improper installation of hardware/software are not included
f) Missing parts are not considered failures (e.g., rivets, screws, bolts)
g) Anything below the subsystem level is considered to be in series
h) All subsystems are independent (i.e., loss of one subsystem does not result in loss of another subsystem's functionality)
i) Failures are not distinguished based on their severity

RAM for Bonnybrook WWTP

Reason for conducting RAM analysis for Bonnybrook WWTP

Current scenario at Bonnybrook WWTP

Methods / Techniques used

Result of the analysis (How Reliable is Bonnybrook?)

Key value of RAM analysis to the City of Calgary

How to use the results (for current and future systems)

Why RAM?

 Reason for conducting RAM analysis for Bonnybrook WWTP:

What we know:
 Current system runs smoothly
 Minor failures can be repaired easily (e.g., card frame change)
 Connection layout of components
 Frequency of failure / maintenance

What we DO NOT know:
 Actual reliability of the system
 Cost of each maintenance
 Impact of "minor" failures on the overall system
 Accurate failure data; accurate maintenance data
 Change in cost / reliability with the change in configuration
 Is the current system configuration good for the next projects?
 Is the system serial or parallel? Are the components inside each DCU serial or parallel?
 Can the change in layout change the performance?

Why RAM?

 Reason for conducting RAM analysis for Bonnybrook WWTP

 Better understand the system (system configuration)

 Better understand the impact of failure / faults of components on the system

 Establish groundwork for Reliability-Availability-Maintainability measurement

 Study the method of data collection, fault / maintenance record keeping

 Design and develop a tool to perform what-if scenarios

RAM: Current Scenario

 Current scenario at Bonnybrook WWTP

 Reliability of components and the system as a whole is not measured

 An established method to measure the system reliability needs to be put in place

RAM: Methods Used

 Techniques are selected based on availability of data and tools

 Techniques used in this analysis are:

 Reliability Block Diagram (RBD)

 Reliability Demonstration Chart (RDC)

 Fault Tree Analysis (FTA)

RAM: RBD

 A reliability block diagram is a graphical representation of how the components of a system are connected from a reliability point of view
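A minimal sketch of how an RBD is evaluated, assuming independent blocks: series blocks multiply reliabilities, parallel (redundant) blocks multiply unreliabilities. The component layout and values are hypothetical, not Bonnybrook data.

```python
# Evaluate simple reliability block diagrams with independent blocks.

def series(*reliabilities: float) -> float:
    """All blocks must work: reliabilities multiply."""
    r = 1.0
    for x in reliabilities:
        r *= x
    return r

def parallel(*reliabilities: float) -> float:
    """Any one block suffices: unreliabilities multiply."""
    q = 1.0
    for x in reliabilities:
        q *= (1.0 - x)
    return 1.0 - q

# Hypothetical DCU: two redundant controllers in parallel, in series with an I/O bus.
print(f"DCU reliability: {series(parallel(0.95, 0.95), 0.99):.4f}")    # ~0.9875
# What-if: upgrade the I/O bus and recompute.
print(f"With better bus: {series(parallel(0.95, 0.95), 0.995):.4f}")   # ~0.9925
```

The same two functions support the kind of what-if analysis mentioned later in the case study: change one block's reliability and recompute the system figure.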

RAM: RDC

 RDC analysis is an efficient way of checking whether the failure intensity objective (FIO) is met or not.
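A hedged sketch of the accept/continue/reject decision behind such a chart, using Wald's sequential test boundaries with a discrimination ratio gamma and producer/consumer risks alpha and beta (a common basis for RDCs; the parameter values below are illustrative).

```python
import math

def rdc_decision(n_failures: int, normalized_time: float,
                 gamma: float = 2.0, alpha: float = 0.1, beta: float = 0.1) -> str:
    """Decision point on a reliability demonstration chart.

    normalized_time = elapsed execution time multiplied by the FIO.
    Boundaries follow Wald's sequential probability ratio test.
    """
    a = math.log((1 - beta) / alpha)   # constant for the reject boundary
    b = math.log(beta / (1 - alpha))   # constant for the accept boundary
    accept_at = (n_failures * math.log(gamma) - b) / (gamma - 1)
    reject_at = (n_failures * math.log(gamma) - a) / (gamma - 1)
    if normalized_time >= accept_at:
        return "accept: FIO demonstrated"
    if normalized_time <= reject_at:
        return "reject: FIO not met"
    return "continue testing"

print(rdc_decision(n_failures=2, normalized_time=3.6))   # accept
print(rdc_decision(n_failures=5, normalized_time=1.0))   # reject (too many failures too early)
```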

RAM: FTA

 Fault tree analysis is a graphical representation of the major (critical) failures associated with a product, the causes for the faults, and potential countermeasures.
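A minimal sketch of quantitative fault-tree evaluation, assuming independent basic events: an AND gate multiplies probabilities, an OR gate multiplies the complements. Event names and probabilities are hypothetical.

```python
# Quantitative evaluation of a tiny fault tree with independent basic events.

def and_gate(*probs: float) -> float:
    """All inputs must occur."""
    p = 1.0
    for x in probs:
        p *= x
    return p

def or_gate(*probs: float) -> float:
    """At least one input occurs."""
    q = 1.0
    for x in probs:
        q *= (1.0 - x)
    return 1.0 - q

# Hypothetical top event "loss of DCU control": both power feeds fail (AND),
# or the controller software crashes (combined with an OR).
p_power_loss = and_gate(0.01, 0.02)        # primary and backup supply both down
p_top = or_gate(p_power_loss, 0.005)       # power loss or software crash
print(f"P(top event) = {p_top:.5f}")       # ~0.00520
```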

Analysis: System Configuration

Analysis: Data Control System

RBD of the DCS layout

Analysis: Inside a DCU

Contains serial and parallel subsystems. Configuration affects total system reliability.

Total 10 Units

Analysis: Results

Analysis: Results

What we know:
 The exact layout of the DCS (inside-out)
 Actual reliability of the current system
 Cost and impact of "minor" failures on the overall system
 Change in cost / reliability with the change in configuration
 Is the system serial or parallel? Are the components inside each DCU serial or parallel?
 Can the change in layout change performance?

What we would like to see:
 More relevant failure data; more maintenance data
 Failure modes and their effects (FMEA)
 Is the current system/configuration fit to be used in the next projects?

RAM: Key Values
From an engineering point of view:

 Current:

 Can understand what the system looks like inside-out

 Can use current system as benchmark for future system's performance

 Can change components and see their effects on reliability

 In future:

 Can be used to pinpoint single points of failures

 Can be used to effectively plan redundancy and refrain from "over-engineering" and overspending (spending can be made at the right place to complement reliability and availability)

RAM: Key Values
From a management point of view:

 Can help perform what-if scenario evaluation

 Can help planning and design of future projects and plants

 Can help perform cost-value analysis on maintenance vs. replacement

 Can help make better decisions on system/subsystem/component purchase based on reliability data and impact on performance

 Can help compare systems/subsystems/components from several vendors.

 Can be used to plan procedures that need to be in place for data collection and maintenance as well as analysis purposes.

What Was Accomplished

 Defined benchmark value for reliability metrics for WTPs DCS components (vendors' data)

 Defined the architecture of the WTPs DCS

 Identified source of failure data

 Stated ground rules and assumptions

 Identified the confidence level of estimation and predictions

 Collected failure and maintenance data for the current WTPs DCS

What Was Accomplished

 Analyzed data to identify a proper distribution that models failure data

 Performed goodness-of-fit and bias tests (using reliability demonstration charts, fault tree analysis, etc.) to validate distribution fit

 Estimated current system reliability

 Based on these:

 A reliability calculation chart to perform what-if analysis for various units of the system was developed

 A list of recommendations for reliability improvements of the WTPs DCS was produced
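A hedged sketch of the distribution-fitting and goodness-of-fit step listed above: fit an exponential model to inter-failure times and check it with a Kolmogorov-Smirnov test (scipy); the data below are invented placeholders, not the plant's records.

```python
# Fit an exponential distribution to inter-failure times (hours) and check the
# fit with a Kolmogorov-Smirnov test. The data are invented placeholders.
from scipy import stats

interfailure_hours = [120.0, 340.0, 95.0, 410.0, 230.0, 180.0, 510.0, 60.0]

loc, scale = stats.expon.fit(interfailure_hours, floc=0)   # scale estimates the MTBF
statistic, p_value = stats.kstest(interfailure_hours, "expon", args=(loc, scale))

print(f"Estimated MTBF: {scale:.1f} hours")
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
# A small p-value would argue against the exponential model for these data.
```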

Conclusions

 System integration & manufacturing are (usually) not the final steps of a development process

 Quality assessment of hardware/software system can be performed systematically using RAM

 Mechanisms for failure data collection and interpretation are necessary

 Engineering judgment (in selecting tools and techniques, interpreting data, etc.) is essential to the analysis
