Software Reliability and Dependability: a Roadmap Bev Littlewood & Lorenzo Strigini
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
An Architect's Guide to Site Reliability Engineering Nathaniel T
An Architect's Guide to Site Reliability Engineering Nathaniel T. Schutta @ntschutta ntschutta.io https://content.pivotal.io/ ebooks/thinking-architecturally Sofware development practices evolve. Feature not a bug. It is the agile thing to do! We’ve gone from devs and ops separated by a large wall… To DevOps all the things. We’ve gone from monoliths to service oriented to microserivces. And it isn’t all puppies and rainbows. Shoot. A new role is emerging - the site reliability engineer. Why? What does that mean to our teams? What principles and practices should we adopt? How do we work together? What is SRE? Important to understand the history. Not a new concept but a catchy name! Arguably goes back to the Apollo program. Margaret Hamilton. Crashed a simulator by inadvertently running a prelaunch program. That wipes out the navigation data. Recalculating… Hamilton wanted to add error-checking code to the Apollo system that would prevent this from messing up the systems. But that seemed excessive to her higher- ups. “Everyone said, ‘That would never happen,’” Hamilton remembers. But it did. Right around Christmas 1968. — ROBERT MCMILLAN https://www.wired.com/2015/10/margaret-hamilton-nasa-apollo/ Luckily she did manage to update the documentation. Allowed them to recover the data. Doubt that would have turned into a Hollywood blockbuster… Hope is not a strategy. But it is what rebellions are built on. Failures, uh find a way. Traditionally, systems were run by sys admins. AKA Prod Ops. Or something similar. And that worked OK. For a while. But look around your world today. -
Balancing Dependability Quality Attributes Relationships for Increased Embedded Systems Dependability
Master Thesis Software Engineering Thesis no: MSE-2009:17 September 2009 Balancing Dependability Quality Attributes Relationships for Increased Embedded Systems Dependability Saleh Al-Daajeh Supervisor: Professor Mikael Svahnberg School of Engineering Blekinge Institute of Technology Box 520 SE – 372 25 Ronneby Sweden This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 2 x 20 weeks of full time studies. Contact Information: Author: Saleh Al-Daajeh E-mail: [email protected] University advisor: Prof. Miakel Svahnberg School of Engineering Internet : www.bth.se/tek Blekinge Institute of Technology Phone : +46 457 385 000 Box 520 Fax : +46 457 271 25 SE – 372 25 Ronneby Sweden II Abstract Embedded systems are used in many critical applications of our daily life. The increased complexity of embedded systems and the tightened safety regulations posed on them and the scope of the environment in which they operate are driving the need for more dependable embedded systems. Therefore, achieving a high level of dependability to embedded systems is an ultimate goal. In order to achieve this goal we are in need of understanding the interrelationships between the different dependability quality attributes and other embedded systems’ quality attributes. This research study provides indicators of the relationship between the dependability quality attributes and other quality attributes for embedded systems by identify- ing the impact of architectural tactics as the candidate solutions to construct dependable embedded systems. III Acknowledgment I would like to express my gratitude to all those who gave me the possibility to complete this thesis. -
Chapter 6 Structural Reliability
MIL-HDBK-17-3E, Working Draft CHAPTER 6 STRUCTURAL RELIABILITY Page 6.1 INTRODUCTION ....................................................................................................................... 2 6.2 FACTORS AFFECTING STRUCTURAL RELIABILITY............................................................. 2 6.2.1 Static strength.................................................................................................................... 2 6.2.2 Environmental effects ........................................................................................................ 3 6.2.3 Fatigue............................................................................................................................... 3 6.2.4 Damage tolerance ............................................................................................................. 4 6.3 RELIABILITY ENGINEERING ................................................................................................... 4 6.4 RELIABILITY DESIGN CONSIDERATIONS ............................................................................. 5 6.5 RELIABILITY ASSESSMENT AND DESIGN............................................................................. 6 6.5.1 Background........................................................................................................................ 6 6.5.2 Deterministic vs. Probabilistic Design Approach ............................................................... 7 6.5.3 Probabilistic Design Methodology..................................................................................... -
A Reasoning Framework for Dependability in Software Architectures Tacksoo Im Clemson University, [email protected]
Clemson University TigerPrints All Dissertations Dissertations 12-2010 A Reasoning Framework for Dependability in Software Architectures Tacksoo Im Clemson University, [email protected] Follow this and additional works at: https://tigerprints.clemson.edu/all_dissertations Part of the Computer Sciences Commons Recommended Citation Im, Tacksoo, "A Reasoning Framework for Dependability in Software Architectures" (2010). All Dissertations. 618. https://tigerprints.clemson.edu/all_dissertations/618 This Dissertation is brought to you for free and open access by the Dissertations at TigerPrints. It has been accepted for inclusion in All Dissertations by an authorized administrator of TigerPrints. For more information, please contact [email protected]. A Reasoning Framework for Dependability in Software Architectures A Dissertation Presented to the Graduate School of Clemson University In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Computer Science by Tacksoo Im August 2010 Accepted by: Dr. John D. McGregor, Committee Chair Dr. Harold C. Grossman Dr. Jason O. Hallstrom Dr. Pradip K. Srimani Abstract The degree to which a software system possesses specified levels of software quality at- tributes, such as performance and modifiability, often have more influence on the success and failure of those systems than the functional requirements. One method of improving the level of a software quality that a product possesses is to reason about the structure of the software architecture in terms of how well the structure supports the quality. This is accomplished by reasoning through software quality attribute scenarios while designing the software architecture of the system. As society relies more heavily on software systems, the dependability of those systems be- comes critical. -
Budgen, Software Design Methods
David Budgen The Loyal Opposition Software Design Methods: Life Belt or Leg Iron? o software design methods have a correctly means “study of method.”) To address, but future? In introducing the January- not necessarily answer, this question, I’ll first consider D February 1998 issue of IEEE Software,Al what designing involves in a wider context, then com- Davis spoke of the hazards implicit in pare this with what we do, and finally consider what “method abuse,”manifested by a desire this might imply for the future. to “play safe.”(If things go well, you can take the credit, but if they go wrong, the organization’s choice of method can take the blame.) As Davis argues, such a THE DESIGN PROCESS policy will almost certainly lead to our becoming builders of what he terms “cookie-cutter, low-risk, low- Developing solutions to problems is a distinguish- payoff, mediocre systems.” ing human activity that occurs in many spheres of life. The issue I’ll explore in this column is slightly dif- So, although the properties of software-based systems ferent, although it’s also concerned with the problems offer some specific problems to the designer (such as that the use of design methods can present. It can be software’s invisibility and its mix of static and dynamic expressed as a question: Will the adoption of a design properties), as individual design characteristics, these method help the software development process (the properties are by no means unique. Indeed, while “life belt” role), or is there significant risk that its use largely ignored by software engineers, the study of the will lead to suboptimum solutions (the “leg iron”role)? nature of design activities has long been established Robert L. -
Fundamental Concepts of Dependability
Fundamental Concepts of Dependability Algirdas Avizˇ ienis Jean-Claude Laprie Brian Randell UCLA Computer Science Dept. LAAS-CNRS Dept. of Computing Science Univ. of California, Los Angeles Toulouse Univ. of Newcastle upon Tyne USA France U.K. UCLA CSD Report no. 010028 LAAS Report no. 01-145 Newcastle University Report no. CS-TR-739 LIMITED DISTRIBUTION NOTICE This report has been submitted for publication. It has been issued as a research report for early peer distribution. Abstract Dependability is the system property that integrates such attributes as reliability, availability, safety, security, survivability, maintainability. The aim of the presentation is to summarize the fundamental concepts of dependability. After a historical perspective, definitions of dependability are given. A structured view of dependability follows, according to a) the threats, i.e., faults, errors and failures, b) the attributes, and c) the means for dependability, that are fault prevention, fault tolerance, fault removal and fault forecasting. he protection and survival of complex information systems that are embedded in the infrastructure supporting advanced society has become a national and world-wide concern of the 1 Thighest priority . Increasingly, individuals and organizations are developing or procuring sophisticated computing systems on whose services they need to place great reliance — whether to service a set of cash dispensers, control a satellite constellation, an airplane, a nuclear plant, or a radiation therapy device, or to maintain the confidentiality of a sensitive data base. In differing circumstances, the focus will be on differing properties of such services — e.g., on the average real-time response achieved, the likelihood of producing the required results, the ability to avoid failures that could be catastrophic to the system's environment, or the degree to which deliberate intrusions can be prevented. -
Training-Sre.Pdf
C om p lim e nt s of Training Site Reliability Engineers What Your Organization Needs to Create a Learning Program Jennifer Petoff, JC van Winkel & Preston Yoshioka with Jessie Yang, Jesus Climent Collado & Myk Taylor REPORT Want to know more about SRE? To learn more, visit google.com/sre Training Site Reliability Engineers What Your Organization Needs to Create a Learning Program Jennifer Petoff, JC van Winkel, and Preston Yoshioka, with Jessie Yang, Jesus Climent Collado, and Myk Taylor Beijing Boston Farnham Sebastopol Tokyo Training Site Reliability Engineers by Jennifer Petoff, JC van Winkel, and Preston Yoshioka, with Jessie Yang, Jesus Climent Collado, and Myk Taylor Copyright © 2020 O’Reilly Media. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more infor‐ mation, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Acquistions Editor: John Devins Proofreader: Charles Roumeliotis Development Editor: Virginia Wilson Interior Designer: David Futato Production Editor: Beth Kelly Cover Designer: Karen Montgomery Copyeditor: Octal Publishing, Inc. Illustrator: Rebecca Demarest November 2019: First Edition Revision History for the First Edition 2019-11-15: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781492076001 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Training Site Reli‐ ability Engineers, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. -
Dependability Assessment of Software- Based Systems: State of the Art
Dependability Assessment of Software- based Systems: State of the Art Bev Littlewood Centre for Software Reliability, City University, London [email protected] You can pick up a copy of my presentation here, if you have a lap-top ICSE2005, St Louis, May 2005 - slide 1 Do you remember 10-9 and all that? • Twenty years ago: much controversy about need for 10-9 probability of failure per hour for flight control software – could you achieve it? could you measure it? – have things changed since then? ICSE2005, St Louis, May 2005 - slide 2 Issues I want to address in this talk • Why is dependability assessment still an important problem? (why haven’t we cracked it by now?) • What is the present position? (what can we do now?) • Where do we go from here? ICSE2005, St Louis, May 2005 - slide 3 Why do we need to assess reliability? Because all software needs to be sufficiently reliable • This is obvious for some applications - e.g. safety-critical ones where failures can result in loss of life • But it’s also true for more ‘ordinary’ applications – e.g. commercial applications such as banking - the new Basel II accords impose risk assessment obligations on banks, and these include IT risks – e.g. what is the cost of failures, world-wide, in MS products such as Office? • Gloomy personal view: where it’s obvious we should do it (e.g. safety) it’s (sometimes) too difficult; where we can do it, we don’t… ICSE2005, St Louis, May 2005 - slide 4 What reliability levels are required? • Most quantitative requirements are from safety-critical systems. -
Software Design Document 1
SOFTWARE DESIGN DOCUMENT 1. Introduction The following subsections of the Software Design Document (SDD) should provide an overview of the entire SDD. 1.1 Purpose This subsection should explain the purpose of the SDD and specify the intended audience for it. The SDD described the software structure, software components, interfaces and data necessary for the implementation phase. Each requirement in the SRS should be traceable to one or more design entities in the SDD. 1.2 Scope This subsection should relate the design document to the SRS and to the software to be developed. 1.3 Definitions, Acronyms and Abbreviations This subsection should provide the definitions of all terms, acronyms and abbreviations that are used in the SDD. 2. References This subsection should provide a complete list of all documents referenced in the SDD. It should identify these references by title, report number, date and publishing organization. It should also specify the sources from which these references are available. 3. Attributes of Design Entities There are some attributes common to all entities, regardless of the approach utilized, whether procedural or object-oriented. These are used in subsections 4 and later. 3.1 Identification The name of the entity should be specified. Two entities should not have the same name. 3.2 Type The type attribute should describe the nature of the entity. It may simply name the kind of entity, such as subprogram, module, procedure, process, data item, object etc. Alternatively, design entities can be grouped, in order to assist in locating an entity dealing with a particular type of information. -
Reliability: Software Software Vs
Reliability Theory SENG 521 Re lia bility th eory d evel oped apart f rom th e mainstream of probability and statistics, and Software Reliability & was usedid primar ily as a tool to h hlelp Software Quality nineteenth century maritime and life iifiblinsurance companies compute profitable rates Chapter 5: Overview of Software to charge their customers. Even today, the Reliability Engineering terms “failure rate” and “hazard rate” are often used interchangeably. Department of Electrical & Computer Engineering, University of Calgary Probability of survival of merchandize after B.H. Far ([email protected]) 1 http://www. enel.ucalgary . ca/People/far/Lectures/SENG521/ ooene MTTF is R e 0.37 From Engineering Statistics Handbook [email protected] 1 [email protected] 2 Reliability: Natural System Reliability: Hardware Natural system Hardware life life cycle. cycle. Aging effect: Useful life span Life span of a of a hardware natural system is system is limited limited by the by the age (wear maximum out) of the system. reproduction rate of the cells. Figure from Pressman’s book Figure from Pressman’s book [email protected] 3 [email protected] 4 Reliability: Software Software vs. Hardware So ftware life cyc le. Software reliability doesn’t decrease with Software systems time, i.e., software doesn’t wear out. are changed (updated) many Hardware faults are mostly physical faults, times during their e. g., fatigue. life cycle. Each update adds to Software faults are mostly design faults the structural which are harder to measure, model, detect deterioration of the and correct. software system. Figure from Pressman’s book [email protected] 5 [email protected] 6 Software vs. -
Software Design Document (SDD) Template
Software Design Document (SDD) Template Software design is a process by which the software requirements are translated into a representation of software components, interfaces, and data necessary for the implementation phase. The SDD shows how the software system will be structured to satisfy the requirements. It is the primary reference for code development and, therefore, it must contain all the information required by a programmer to write code. The SDD is performed in two stages. The first is a preliminary design in which the overall system architecture and data architecture is defined. In the second stage, i.e. the detailed design stage, more detailed data structures are defined and algorithms are developed for the defined architecture. This template is an annotated outline for a software design document adapted from the IEEE Recommended Practice for Software Design Descriptions. The IEEE Recommended Practice for Software Design Descriptions have been reduced in order to simplify this assignment while still retaining the main components and providing a general idea of a project definition report. For your own information, please refer to IEEE Std 10161998 1 for the full IEEE Recommended Practice for Software Design Descriptions. 1 http://www.cs.concordia.ca/~ormandj/comp354/2003/Project/ieeeSDD.pdf Downloaded from http://www.tidyforms.com (Team Name) (Project Title) Software Design Document Name (s): Lab Section: Workstation: Date: (mm/dd/yyyy) Downloaded from http://www.tidyforms.com Software Design Document TABLE OF CONTENTS 1. INTRODUCTION 2 1.1 Purpose 2 1.2 Scope 2 1.3 Overview 2 1.4 Reference Material 2 1.5 Definitions and Acronyms 2 2. -
Design by Contract: the Lessons of Ariane
. Editor: Bertrand Meyer, EiffelSoft, 270 Storke Rd., Ste. 7, Goleta, CA 93117; voice (805) 685-6869; [email protected] several hours (at least in earlier versions of Ariane), it was better to let the computa- tion proceed than to stop it and then have Design by to restart it if liftoff was delayed. So the SRI computation continues for 50 seconds after the start of flight mode—well into the flight period. After takeoff, of course, this com- Contract: putation is useless. In the Ariane 5 flight, Object Technology however, it caused an exception, which was not caught and—boom. The exception was due to a floating- point error during a conversion from a 64- The Lessons bit floating-point value, representing the flight’s “horizontal bias,” to a 16-bit signed integer: In other words, the value that was converted was greater than what of Ariane can be represented as a 16-bit signed inte- ger. There was no explicit exception han- dler to catch the exception, so it followed the usual fate of uncaught exceptions and crashed the entire software, hence the onboard computers, hence the mission. This is the kind of trivial error that we Jean-Marc Jézéquel, IRISA/CNRS are all familiar with (raise your hand if you Bertrand Meyer, EiffelSoft have never done anything of this sort), although fortunately the consequences are usually less expensive. How in the world everal contributions to this made up of respected experts from major department have emphasized the European countries, which produced a How in the world could importance of design by contract report in hardly more than a month.