Reliability: Software vs. Hardware

SENG 521 Software Reliability & Software Quality
Chapter 5: Overview of Software Reliability Engineering
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/People/far/Lectures/SENG521/

Reliability Theory
Reliability theory developed apart from the mainstream of probability and statistics, and was used primarily as a tool to help nineteenth-century maritime and life insurance companies compute profitable rates to charge their customers. Even today, the terms "failure rate" and "hazard rate" are often used interchangeably.
For a constant failure rate (exponential model), the probability of survival of merchandise after one MTTF is $R = e^{-1} \approx 0.37$. (From the Engineering Statistics Handbook.)

Reliability: Natural System
Natural system life cycle. Aging effect: the life span of a natural system is limited by the maximum reproduction rate of its cells. (Figure from Pressman's book.)

Reliability: Hardware
Hardware life cycle. The useful life span of a hardware system is limited by the age (wear-out) of the system. (Figure from Pressman's book.)

Reliability: Software
Software life cycle. Software systems are changed (updated) many times during their life cycle, and each update adds to the structural deterioration of the software system. (Figure from Pressman's book.)

Software vs. Hardware
- Software reliability does not decrease with time, i.e., software does not wear out.
- Hardware faults are mostly physical faults, e.g., fatigue.
- Software faults are mostly design faults, which are harder to measure, model, detect, and correct.
- Hardware failure can be "fixed" by replacing a faulty component with an identical one; therefore, there is no reliability growth.

Reliability: Science
Exploring ways of implementing "reliability" in software products.
Software vs. Hardware (continued)
- Software problems can be "fixed" by changing the code so that the failure does not happen again; therefore, reliability growth is present.
- Software does not go through a production phase the same way hardware does.
- Conclusion: hardware reliability models may not be used identically for software.

Reliability: Science (continued)
Reliability Science's goals:
- Developing "models" (regression and aggregation models) and "techniques" to build reliable software.
- Testing such models and techniques for adequacy, soundness, and completeness.

What is Engineering?
Engineering = Analysis + Design + Construction + Verification + Management
- Analysis: What is the problem to be solved? What characteristics of the entity are used to solve the problem?
- Design: How will the entity be realized?
- Construction: How is it constructed?
- Verification: What approach is used to uncover errors in design and construction?
- Management: How will the entity be supported in the long term?

Reliability: Engineering /1
Engineering of "reliability" in software products.
Reliability Engineering's goal: developing software to reach the market
- with "minimum" development time
- with "minimum" development cost
- with "maximum" reliability
- with "minimum" expertise needed
- with "minimum" available technology

Reliability: Engineering /2
Software quality means getting the right balance among development cost, development time, people, technology, and reliability.

What is SRE? /1
Software Reliability Engineering (SRE) is a multi-faceted discipline covering the software product lifecycle. It involves both technical and management activities.
Reliability: Engineering /2 (continued)
Balancing the minimum and maximum of cost, time, people, technology, and reliability yields the optimum. Pick quantitative representations for the 5 factors (cost, time, people, technology, and reliability) and measure them!

What is SRE? /1 (continued)
SRE activities fall into three basic areas:
- Software development and maintenance
- Measurement and analysis of reliability data
- Feedback of reliability information into the software lifecycle activities

What is SRE? /2
SRE is a practice for quantitatively planning and guiding software development and test, with emphasis on reliability and availability. SRE simultaneously does three things:
- It ensures that product reliability and availability meet user needs.
- It delivers the product to market faster.
- It increases productivity, lowering product life-cycle cost.
In applying SRE, one can vary the relative emphasis placed on these three factors.

Software Reliability Engineering (SRE) Process

Reference
Dr. Musa's Software Reliability Engineering, 2nd Ed., Chapter 1

SRE: Process /1
There are 5 steps in the SRE process (for each system to test):
1. Define necessary reliability
2. Develop operational profiles
3. Prepare for test
4. Execute test
5. Apply failure data to guide decisions

SRE: Process /2
(Figure: modified version of the SRE process. Ref: Musa's book, 2nd Ed.)
The Develop Operational Profiles and Prepare for Test activities both start during the Requirements (and perhaps architectural analysis) phase of the software development process. They both extend to varying degrees into the Design and Implementation phase, as they can be affected by it. The Execute Test and Guide Test activities coincide with the Test phase.

SRE: Necessary Reliability
- Define what "failure" means for the software product.

Failure Intensity Objective (FIO)
Failure intensity ($\lambda$) is defined as failures per natural unit (or time), e.g., 3 alarms per 100 hours of operation, 5 failures per 1000 transactions, etc.
The failure intensity of a cascade (serial) system is the sum of the failure intensities of all of the components of the system. For the exponential model:
$$\lambda_{\text{system}} = \lambda_1 + \lambda_2 + \cdots + \lambda_n = \sum_{i=1}^{n} \lambda_i$$

SRE: Necessary Reliability (continued)
- Choose a common measure for all failure intensities, either failures per some natural unit or failures per hour.
- Set the total system failure intensity objective (FIO) for the software/hardware system.
- Compute a developed-software FIO by subtracting the total of the FIOs of all hardware and acquired software components from the system FIO.
- Use the developed-software FIOs to track the reliability growth during system test (later on).

How to Set FIO?
Setting the FIO in terms of system reliability ($R$) or availability ($A$):
$$\lambda = \frac{-\ln R}{t} \approx \frac{1 - R}{t} \quad \text{for } R \ge 0.95$$
$$\lambda = \frac{1 - A}{t_m A}$$
where $\lambda$ is the failure intensity, $R$ is the reliability, $A$ is the availability, $t$ is the natural unit (time, etc.), and $t_m$ is the downtime per failure.

Reliability vs. Failure Intensity
Reliability (1-hour mission time)   Failure intensity
0.36800                             1 failure / hour
0.90000                             105 failures / 1000 hours
0.95900                             1 failure / day
0.99000                             10 failures / 1000 hours
0.99400                             1 failure / week
0.99860                             1 failure / month
0.99900                             1 failure / 1000 hours
0.99989                             1 failure / year

SRE: Operation
An operation is a major system logical task, which returns control to the system when complete. An operation is an input event that affects the course of behavior of the software.

SRE: Operational Mode
An operational mode is a distinct pattern of system use and/or set of environmental conditions that may need separate testing due to the likelihood of stimulating different failures.
Example: operations for a Web proxy server
- Connect internal users to external Web
- Email internal users to external users
- Email external users to internal users
- DNS request by internal users
- Etc.

Example: operational modes
- Time (time of year, day of week, time of day)
- Different user types (customer or user)
- Users' experience (novice or expert)
The same operation may appear in different operational modes with different probabilities.

SRE: Operational Profile
An operational profile is a complete set of operations with their probabilities of occurrence (during the operational use of the software). An operational profile is a description of the distribution of input events that is expected to occur in actual software operation. The operational profile of the software reflects how it will be used in practice.
(Figure: probability of occurrence by operational mode and operation.)

SRE: System Operational Profile
A system operational profile must be developed for all of its important operational modes. There are four principal steps in developing an operational profile:
1. Identify the operation initiators (i.e., user types, external systems, and the system itself).
2. List the operations invoked by each initiator.
3. Determine the occurrence rates.
4. Determine the occurrence probabilities by dividing the occurrence rates by the total occurrence rate.

SRE: Prepare for Test
- The Prepare for Test activity uses the operational profiles to prepare test cases and test procedures.
- Test cases are allocated in accordance with the operational profile.
- Test cases are assigned to the operations by selecting from all the possible intra-operation choices with equal probability.
- The test procedure is the controller that invokes test cases during execution.

SRE: Execute Test
- Allocate test time among the associated systems and types of test (feature, load, regression, etc.).
- Invoke the test cases at random times, choosing operations randomly in accordance with the operational profile.
- Identify failures, along with when they occur. This information will be used in Apply Failure Data and Guide Test.

Types of Test
- Certification Test: Accept or reject (binary decision) an acquired component for a given target failure intensity.
- Feature (Unit) Test: A single execution of an operation with interaction between operations minimized.

SRE: Apply Failure Data
- Plot each new failure as it occurs on a reliability demonstration chart.
- Accept or reject software (operations) using
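The failure-intensity arithmetic on the FIO slides (serial-system sum, developed-software FIO, and the reliability/availability conversions) can be checked with a few lines of Python. This is a minimal sketch that assumes constant failure intensities (the exponential model); the function names and the failure budget in the `__main__` block are hypothetical illustrations, not values from the course.

```python
import math


def system_failure_intensity(component_intensities):
    """Cascade (serial) system: component failure intensities add
    (assumes the exponential model)."""
    return sum(component_intensities)


def developed_software_fio(system_fio, other_component_fios):
    """Developed-software FIO = system FIO minus the FIOs of all
    hardware and acquired software components."""
    remaining = system_fio - sum(other_component_fios)
    if remaining <= 0:
        raise ValueError("hardware/acquired components use up the whole system FIO")
    return remaining


def fio_from_reliability(r, t):
    """lambda = -ln(R) / t, with t the mission length in natural units."""
    return -math.log(r) / t


def fio_from_availability(a, t_m):
    """lambda = (1 - A) / (t_m * A), with t_m the downtime per failure."""
    return (1.0 - a) / (t_m * a)


def reliability(failure_intensity, t):
    """R(t) = exp(-lambda * t) for a constant failure intensity."""
    return math.exp(-failure_intensity * t)


if __name__ == "__main__":
    # Hypothetical budget in failures per hour: system FIO = 10 / 1000 h,
    # hardware = 4 / 1000 h, acquired software = 2 / 1000 h.
    fio = developed_software_fio(0.010, [0.004, 0.002])
    print(f"developed-software FIO: {fio * 1000:.1f} failures / 1000 hours")

    # Recompute some rows of the "Reliability vs. Failure Intensity" table (1-hour mission).
    rows = [("1 / hour", 1.0), ("1 / day", 1 / 24), ("1 / week", 1 / 168), ("1 / year", 1 / 8760)]
    for label, lam in rows:
        print(f"{label:>9}: R = {reliability(lam, 1.0):.5f}")

    # Survival after one MTTF (t = 1 / lambda) is e^-1 regardless of lambda.
    print(f"R(MTTF) = {reliability(0.05, 1 / 0.05):.3f}")
```

Because the exponential model makes intensities additive, the developed-software FIO is simply what is left of the system budget; the guard raises an error when the hardware and acquired components already consume it.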
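The four steps for building an operational profile, together with the Execute Test rule of invoking test cases at random times and choosing operations by their profile probability, are sketched below. Assumptions: the occurrence rates and test-case names are hypothetical, and "random times" is modeled as Poisson arrivals (exponential gaps drawn with `random.Random.expovariate`), which the slides do not specify.

```python
import random
from collections import Counter


def operational_profile(occurrence_rates):
    """Step 4: occurrence probability = occurrence rate / total occurrence rate."""
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    return {op: rate / total for op, rate in occurrence_rates.items()}


def pick_operation(profile, rng):
    """Choose one operation at random, weighted by its occurrence probability."""
    ops = list(profile)
    return rng.choices(ops, weights=[profile[op] for op in ops], k=1)[0]


def random_test_schedule(profile, test_cases, duration, invocations_per_hour, rng):
    """Invoke test cases at random times over [0, duration) hours.

    Assumption: invocation times form a Poisson process (exponential gaps).
    Operations follow the operational profile; within an operation, each
    test case is chosen with equal probability.
    """
    schedule = []
    t = rng.expovariate(invocations_per_hour)
    while t < duration:
        op = pick_operation(profile, rng)
        case = rng.choice(test_cases[op])  # uniform among intra-operation choices
        schedule.append((t, op, case))
        t += rng.expovariate(invocations_per_hour)
    return schedule


if __name__ == "__main__":
    # Hypothetical occurrence rates (operations per hour) for the Web proxy example.
    rates = {
        "Connect internal users to external Web": 600,
        "Email internal users to external users": 200,
        "Email external users to internal users": 150,
        "DNS request by internal users": 50,
    }
    profile = operational_profile(rates)
    # Hypothetical test cases: three per operation.
    cases = {op: [f"{op} / case {i}" for i in range(3)] for op in rates}

    rng = random.Random(521)  # fixed seed so the schedule is reproducible
    schedule = random_test_schedule(profile, cases, duration=8.0, invocations_per_hour=50, rng=rng)
    counts = Counter(op for _, op, _ in schedule)
    for op, p in profile.items():
        print(f"{p:.2f}  {counts[op]:4d}  {op}")
```

Drawing the operation first and the test case second mirrors the two-level selection described under Prepare for Test; seeding `random.Random` makes a test run repeatable.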
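The slides do not show how the reliability demonstration chart is constructed. One common construction, assumed here, is Wald's sequential probability ratio test for an exponential failure model, as in Musa's formulation: each failure is plotted at normalized time (elapsed natural units times the FIO), and the accept and reject boundaries depend on a discrimination ratio gamma, a supplier risk alpha, and a consumer risk beta. The defaults gamma = 2 and alpha = beta = 0.1, the function names, and the failure times below are illustrative assumptions.

```python
import math


def demonstration_boundaries(n_failures, gamma=2.0, alpha=0.1, beta=0.1):
    """Reject/accept boundaries (normalized units) after n failures.

    Assumes Wald's sequential probability ratio test for an exponential
    model, comparing the FIO (lambda_0) with gamma * lambda_0.
    alpha = supplier risk, beta = consumer risk, gamma = discrimination ratio.
    """
    if gamma <= 1 or not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("need gamma > 1 and risks strictly between 0 and 1")
    a = math.log(beta / (1 - alpha))  # log-likelihood-ratio threshold for acceptance
    b = math.log((1 - beta) / alpha)  # log-likelihood-ratio threshold for rejection
    reject_below = (n_failures * math.log(gamma) - b) / (gamma - 1)
    accept_above = (n_failures * math.log(gamma) - a) / (gamma - 1)
    return reject_below, accept_above


def chart_decision(n_failures, elapsed, fio, **risks):
    """Place a point on the chart; normalized units = elapsed natural units * FIO."""
    t_norm = elapsed * fio
    reject_below, accept_above = demonstration_boundaries(n_failures, **risks)
    if t_norm >= accept_above:
        return "accept"
    if t_norm <= reject_below:
        return "reject"
    return "continue"


if __name__ == "__main__":
    # Hypothetical failure times (hours) for a component with FIO = 0.01 failures/hour.
    fio = 0.01
    for n, t in enumerate([120.0, 260.0, 510.0], start=1):
        print(n, t, chart_decision(n, t, fio))
```

A point far to the right (long time per failure) lands in the accept region, a point far to the left lands in the reject region, and anything in between means testing continues.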
