
SENG 521: Software Reliability & [...]
Chapter 5: Overview of Software Reliability Engineering
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG521/

Reliability: Theory

• Reliability theory developed apart from the mainstream of probability and statistics, and was used primarily as a tool to help nineteenth century maritime and life insurance companies compute profitable rates to charge their customers. Even today, the terms “failure rate” and “hazard rate” are often used interchangeably.
• Probability of survival of merchandise after one MTTF is R = e^(-1) ≈ 0.37.

From the Engineering Statistics Handbook


Reliability: Natural

• Natural system life cycle.
• Aging effect: Life span of a natural system is limited by the maximum reproduction rate of the cells.

Figure from Pressman’s book

Reliability: Hardware

• Hardware life cycle.
• Useful life span of a hardware system is limited by the age (wear out) of the system.

Figure from Pressman’s book


Reliability: Software

• Software life cycle.
• Software systems are changed (updated) many times during their life cycle.
• Each update adds to the structural deterioration of the software system.

Figure from Pressman’s book

Software vs. Hardware

• Software reliability doesn’t decrease with time, i.e., software doesn’t wear out.
• Hardware faults are mostly physical faults.
• Software faults are mostly design faults, which are harder to measure, model, detect and correct.

Software vs. Hardware

• Hardware failure can be “fixed” by replacing a faulty component with an identical one, therefore there is no reliability growth.
• Software problems can be “fixed” by changing the code in order to have the failure not happen again, therefore reliability growth is present.
• Software does not go through the production phase the same way as hardware does.
• Conclusion: hardware reliability models may not be used identically for software.

Reliability: Science

• Exploring ways of implementing “reliability” in software products.
• Reliability Science’s goals:
  - Developing “models” (regression and aggregation models) and “techniques” to build reliable software.
  - Testing such models and techniques for adequacy, soundness and completeness.


What is Engineering?

• Engineering = Analysis + Design + Construction + Verification + Management
• What is the problem to be solved?
• What characteristics of the entity are used to solve the problem?
• How will the entity be realized?
• How is it constructed?
• What approach is used to uncover errors in design and construction?
• How will the entity be supported in the long term?

Reliability: Engineering /1

• Engineering of “reliability” in software products.
• Reliability Engineering’s goal: developing software to reach the market
  - With “minimum” development time
  - With “minimum” development cost
  - With “maximum” reliability
  - With “minimum” expertise needed
  - With “minimum” available technology


Reliability: Engineering /2

• Software quality: getting the right balance among development cost, development time, people, technology and reliability.
• Pick quantitative representations for the 5 factors (cost, time, people, technology and reliability) and measure them!

[Figure: Minimum & Maximum (Cost, Time, People, and Technology; Reliability), SRE, Optimum]

What is SRE? /1

• Software Reliability Engineering (SRE) is a multi-faceted discipline covering the software product lifecycle.
• It involves both technical and management activities in three basic areas:
  - [...]
  - Measurement and analysis of reliability
  - Feedback of reliability information into the software lifecycle activities

What is SRE? /2

• SRE is a practice for quantitatively planning and guiding software development and test, with emphasis on reliability and availability.
• SRE simultaneously does three things:
  - It ensures that product reliability and availability meet user needs.
  - It delivers the product to market faster.
  - It increases productivity, lowering product life-cycle cost.
• In applying SRE, one can vary the relative emphasis placed on these three factors.

Software Reliability Engineering (SRE) Process

Reference

• Dr. Musa’s Software Reliability Engineering, 2nd Ed.
  - Chapter 1

SRE: Process /1

• There are 5 steps in the SRE process (for each system to test):
  - Define necessary reliability
  - Develop operational profiles
  - Prepare for test
  - Execute test
  - Apply failure data to guide decisions


SRE: Process /2

• Modified version of the SRE Process

Ref: Musa’s book 2nd Ed

SRE: Process /2 (continued)

• The Develop Operational Profiles and Prepare for Test activities all start during the requirements (and perhaps architectural analysis) phase of the software development process.
• They all extend to varying degrees into the Design and Implementation phase, as they can be affected by it.
• The Execute Test and Guide Test activities coincide with the Test phase.

SRE: Necessary Reliability

• Define what “failure” means for the software product.
• Choose a common measure for all failure intensities, either failures per some natural unit or failures per hour.
• Set the total system failure intensity objective (FIO) for the software/hardware system.
• Compute a developed software FIO by subtracting the total of the FIOs of all hardware and acquired software components from the system FIO.
• Use the developed software FIO to track the reliability growth during system test (later on).

Failure Intensity Objective (FIO)

• Failure intensity (λ) is defined as failures per natural unit (or time), e.g.
  - 3 alarms per 100 hours of operation.
  - 5 failures per 1000 transactions, etc.
• Failure intensity of a cascade (serial) system is the sum of failure intensities for all of the components of the system.
• For exponential model:

$$z_{\text{system}}(t) = \lambda_1 + \lambda_2 + \cdots + \lambda_n = \sum_{i=1}^{n} \lambda_i$$
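The FIO budgeting step is simple arithmetic, and a small sketch may make it concrete. The function name, component names and numbers below are hypothetical; the subtraction assumes a serial system in which failure intensities add, with every value expressed in the same unit (failures per 1000 hours here).

```python
def developed_software_fio(system_fio, acquired_fios):
    """Return the failure intensity budget left for the developed software.

    system_fio: total system failure intensity objective.
    acquired_fios: mapping of hardware / acquired software component name
    to its expected failure intensity, in the same unit as system_fio.
    """
    remaining = system_fio - sum(acquired_fios.values())
    if remaining <= 0:
        raise ValueError("acquired components already use up the system FIO")
    return remaining


# Hypothetical budget, in failures per 1000 hours.
system_fio = 100.0
acquired = {"hardware": 1.0, "operating_system": 4.0, "database": 5.0}

software_fio = developed_software_fio(system_fio, acquired)
print(software_fio)  # 90.0
```

The check for a non-positive remainder is a design choice of this sketch: it flags a system objective that cannot be met no matter how reliable the developed software is.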


How to Set FIO?

• Setting FIO in terms of system reliability (R) or availability (A):

$$\lambda = \frac{-\ln R}{t} \quad\text{or}\quad \lambda \approx \frac{1 - R}{t} \ \text{ for } R \ge 0.95$$

$$\lambda = \frac{1 - A}{t_m A}$$

  λ is failure intensity
  R is reliability
  A is availability
  t is natural unit (time, etc.)
  t_m is downtime per failure

Reliability vs. Failure Intensity

Reliability for 1 hour mission time    Failure intensity
0.36800                                1 failure / hour
0.90000                                105 failures / 1000 hours
0.95900                                1 failure / day
0.99000                                10 failures / 1000 hours
0.99400                                1 failure / week
0.99860                                1 failure / month
0.99900                                1 failure / 1000 hours
0.99989                                1 failure / year
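The conversions between reliability, availability and failure intensity can be sketched in a few lines. This assumes the exponential model, R = exp(-λt), and the availability relation λ = (1 - A) / (t_m A); the function names are illustrative and not taken from the course material.

```python
import math


def reliability(failure_intensity, mission_time=1.0):
    """Reliability over mission_time for a constant failure intensity."""
    return math.exp(-failure_intensity * mission_time)


def fio_from_reliability(r, mission_time=1.0):
    """Failure intensity objective implied by a reliability target."""
    return -math.log(r) / mission_time


def fio_from_availability(a, downtime_per_failure):
    """Failure intensity objective implied by an availability target."""
    return (1.0 - a) / (a * downtime_per_failure)


# Recompute a few rows of the table for a 1 hour mission time.
rates_per_hour = {
    "1 failure / hour": 1.0,
    "1 failure / day": 1.0 / 24,
    "1 failure / week": 1.0 / (24 * 7),
    "1 failure / 1000 hours": 1.0 / 1000,
}
for label, lam in rates_per_hour.items():
    print(f"{label:24s} R = {reliability(lam):.5f}")
```

The printed values agree with the tabulated ones to about three decimal places; the table appears to round some entries.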


SRE: Operation

• An operation is a major system logical task, which returns control to the system when complete.
• An operation is an input event that affects the course of behavior of the software.
• Example: operations for a Web proxy server
  - Connect internal users to external Web
  - Email internal users to external users
  - Email external users to internal users
  - DNS request by internal users
  - Etc.

SRE: Operational Mode

• An operational mode is a distinct pattern of system use and/or set of environmental conditions that may need separate testing due to the likelihood of stimulating different failures.
• Example:
  - Time (time of year, day of week, time of day)
  - Different user types (customer or user)
  - User experience (novice or expert)
• The same operation may appear in different operational modes with different probabilities.

SRE: Operational Profile

• An operational profile is a complete set of operations with their probabilities of occurrence (during the operational use of the software).
• An operational profile is a description of the distribution of input events that is expected to occur in actual software operation.
• The operational profile of the software reflects how it will be used in practice.

[Figure: probability of occurrence by operation and operational mode]

SRE: System Operational Profile

• A system operational profile must be developed for all of its important operational modes.
• There are four principal steps in developing an operational profile:
  - Identify the operation initiators (i.e., user types, external systems, and the system itself)
  - List the operations invoked by each initiator
  - Determine the occurrence rates
  - Determine the occurrence probabilities by dividing the occurrence rates by the total occurrence rate
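The last step of building an operational profile, dividing each occurrence rate by the total occurrence rate, can be sketched as follows. The operation names echo the Web proxy server example, but the rates are made up for illustration.

```python
def occurrence_probabilities(occurrence_rates):
    """Normalize occurrence rates into occurrence probabilities."""
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    return {op: rate / total for op, rate in occurrence_rates.items()}


# Hypothetical occurrence rates (operations per hour) for a Web proxy server.
rates = {
    "connect_internal_to_web": 600,
    "email_internal_to_external": 200,
    "email_external_to_internal": 150,
    "dns_request_internal": 50,
}

profile = occurrence_probabilities(rates)
for op, p in profile.items():
    print(f"{op:28s} {p:.3f}")
```

With these assumed rates the total is 1000 operations per hour, so the probabilities come out as 0.600, 0.200, 0.150 and 0.050.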


SRE: Prepare for Test

• The Prepare for Test activity uses the operational profiles to prepare test cases and test procedures.
• Test cases are allocated in accordance with the operational profile.
• Test cases are assigned to the operations by selecting from all the possible intra-operation choices with equal probability.
• The test procedure is the controller that invokes test cases during execution.

SRE: Execute Test

• Allocate test time among the associated systems and types of test (feature, load, regression, etc.).
• Invoke the test cases at random times, choosing operations randomly in accordance with the operational profile.
• Identify failures, along with when they occur.
• This information will be used in Apply Failure Data and Guide Test.
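Allocating test cases and choosing operations in accordance with the operational profile might look like the sketch below. The profile values and function names are hypothetical, and the "at least one test case per operation" rule is an assumption of this sketch rather than something stated in the slides.

```python
import random


def allocate_test_cases(profile, total_cases):
    """Allocate test cases to operations in proportion to the profile.

    Assumption: every operation gets at least one test case, so the
    allocated total can slightly exceed total_cases.
    """
    return {op: max(1, round(p * total_cases)) for op, p in profile.items()}


def pick_operations(profile, count, seed=None):
    """Randomly choose operations to invoke, weighted by the profile."""
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=count)


# Hypothetical operational profile (probabilities sum to 1).
profile = {
    "connect_internal_to_web": 0.60,
    "email_internal_to_external": 0.20,
    "email_external_to_internal": 0.15,
    "dns_request_internal": 0.05,
}

allocation = allocate_test_cases(profile, total_cases=200)
run_order = pick_operations(profile, count=10, seed=521)
print(allocation)
print(run_order)
```

Passing a seed makes a test run repeatable, which helps when a failure has to be reproduced later.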


Types of Test

• Certification Test: Accept or reject (binary decision) an acquired component for a given target failure intensity.
• Feature (Unit) Test: A single execution of an operation with interactions between operations minimized.
• Load Test: Testing with field use data and accounting for interactions.
• Regression Test: Feature tests after every build involving significant change, i.e., check whether a bug fix worked.

SRE: Apply Failure Data

• Plot each new failure as it occurs on a reliability demonstration chart.
• Accept or reject software (operations) using the reliability demonstration chart.
• Track reliability growth as faults are removed.

Release Criteria

Consider releasing the product when:
1. All acquired components pass certification test.
2. Test terminated satisfactorily for all the product variations and components, with the failure intensity reaching the target λF.

For better confidence, we usually require the λ/λF ratio to be below 0.5 (confidence factor).

Collect Field Data

• SRE for the software product lifecycle.
• Collect field data to use in succeeding releases, either using automatic reporting routines or manual collection, using a random sample of field sites.
• Collect data on failure intensity and on customer satisfaction and use this information in setting the failure intensity objective for the next release.
• Measure operational profiles in the field and use this information to correct the operational profiles we estimated.
• Collect information to refine the process of choosing reliability strategies in future.
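The two release criteria in this section can be sketched as a simple check. The function name, argument shapes and sample values are hypothetical; the 0.5 threshold is the confidence factor quoted for the λ/λF ratio.

```python
def ready_for_release(certification_results, intensity_ratios, max_ratio=0.5):
    """Return True when both release criteria hold.

    certification_results: mapping of acquired component -> bool (passed?).
    intensity_ratios: mapping of product variation or component -> λ/λF,
    the estimated failure intensity divided by its target.
    """
    all_certified = all(certification_results.values())
    all_below = all(ratio < max_ratio for ratio in intensity_ratios.values())
    return all_certified and all_below


# Hypothetical status at the end of system test.
certified = {"database": True, "network_stack": True}
ratios = {"base_product": 0.40, "premium_variation": 0.45}

print(ready_for_release(certified, ratios))  # True
print(ready_for_release(certified, {**ratios, "premium_variation": 0.7}))  # False
```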


However …

• Practical implementation of an effective SRE program is a non-trivial task.
• Mechanisms for collection and analysis of data on software product and process quality must be in place.
• Fault identification and elimination techniques must be in place.
• Other organizational abilities such as the use of reviews and inspections, reliability based testing and software process improvement are also necessary for effective SRE.
