Software Fault Tolerance: a Tutorial

NASA / TM-2000-210616

Software Fault Tolerance: A Tutorial
Wilfredo Torres-Pomales
Langley Research Center, Hampton, Virginia

October 2000

The NASA STI Program Office ... in Profile

Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role.

The NASA STI Program Office is operated by Langley Research Center, the lead center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types:

• TECHNICAL PUBLICATION. Reports of completed research or a major significant phase of research that present the results of NASA programs and include extensive data or theoretical analysis. Includes compilations of significant scientific and technical data and information deemed to be of continuing reference value. NASA counterpart of peer-reviewed formal professional papers, but having less stringent limitations on manuscript length and extent of graphic presentations.

• TECHNICAL MEMORANDUM. Scientific and technical findings that are preliminary or of specialized interest, e.g., quick release reports, working papers, and bibliographies that contain minimal annotation. Does not contain extensive analysis.

• CONTRACTOR REPORT. Scientific and technical findings by NASA-sponsored contractors and grantees.

• CONFERENCE PUBLICATION. Collected papers from scientific and technical conferences, symposia, seminars, or other meetings sponsored or co-sponsored by NASA.

• SPECIAL PUBLICATION. Scientific, technical, or historical information from NASA programs, projects, and missions, often concerned with subjects having substantial public interest.

• TECHNICAL TRANSLATION. English-language translations of foreign scientific and technical material pertinent to NASA's mission.

Specialized services that complement the STI Program Office's diverse offerings include creating custom thesauri, building customized databases, organizing and publishing research results ... even providing videos.

For more information about the NASA STI Program Office, see the following:

• Access the NASA STI Program Home Page at http://www.sti.nasa.gov
• E-mail your question via the Internet to [email protected]
• Fax your question to the NASA STI Help Desk at (301) 621-0134
• Phone the NASA STI Help Desk at (301) 621-0390
• Write to: NASA STI Help Desk, NASA Center for AeroSpace Information, 7121 Standard Drive, Hanover, MD 21076-1320

National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23681-2199

Available from:
NASA Center for AeroSpace Information (CASI), 7121 Standard Drive, Hanover, MD 21076-1320, (301) 621-0390
National Technical Information Service (NTIS), 5285 Port Royal Road, Springfield, VA 22161-2171, (703) 605-6000

Abstract

Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems.
The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. This paper presents a review of software fault tolerance. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system-level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single-version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multi-version techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, at least one of the others should provide an acceptable output. Recovery blocks, N-version programming, N self-checking programming, consensus recovery blocks, and t/(n-1) techniques are reviewed. Current research in software engineering focuses on establishing patterns in the software structure and trying to understand the practice of software engineering. It is expected that software fault tolerance research will benefit from this research by enabling greater predictability of the dependability of software.

Acknowledgements

I would like to acknowledge my gratitude to Dr. F. Gail Gray of the Virginia Polytechnic Institute and State University (VA Tech) for providing advice during the preparation and reviewing of this report. I also wish to acknowledge my appreciation to Mr. Ricky Butler of the NASA Langley Research Center for his help in reviewing the final version of this report.

Table of Contents

1. INTRODUCTION .......... 1
2. SOFTWARE DEVELOPMENT .......... 2
3. SOFTWARE DESIGN FAULTS .......... 6
4. SOFTWARE FAULT TOLERANCE .......... 8
4.1. Single-Version Software Fault Tolerance Techniques .......... 9
4.1.1. Software Structure and Actions .......... 9
4.1.2. Error Detection .......... 10
4.1.3. Exception Handling .......... 12
4.1.4. Checkpoint and Restart .......... 12
4.1.5. Process Pairs .......... 13
4.1.6. Data Diversity .......... 14
4.1.7. Considerations on the Use of Checkpointing .......... 16
4.2. Multi-Version Software Fault Tolerance Techniques .......... 17
4.2.1. Recovery Blocks .......... 17
4.2.2. N-Version Programming .......... 18
4.2.3. N Self-Checking Programming .......... 19
4.2.4. Consensus Recovery Blocks .......... 20
4.2.6. t/(n-1)-Variant Programming .......... 21
4.2.7. Additional Considerations .......... 21
4.2.7.1. Multi-Version Software Development .......... 21
4.2.7.2. Output Selection Algorithms .......... 23
4.3. Fault Tolerance in Operating Systems .......... 25
4.4. Software Fault Injection for Fault Tolerance Assessment .......... 27
5. HARDWARE AND SOFTWARE FAULT TOLERANCE .......... 28
5.1. Computer Fault Tolerance .......... 28
5.2. Examples of Fault Tolerant Architectures .......... 34
5.2.1. B777 Primary Flight Control Computer .......... 35
5.2.2. AIRBUS A320/A330/A340 Flight Control Computer .......... 37
6. SUMMARY AND CONCLUDING REMARKS .......... 38
7. BIBLIOGRAPHY .......... 39

List of Figures

Figure 1: Linear Sequential Process Model .......... 3
Figure 2: Prototyping Process Model .......... 3
Figure 3: Logical Representation of Checkpoint and Restart .......... 13
Figure 4: Logical Representation of Process Pairs .......... 14
Figure 5: Checkpoint and Restart using Data Diversity (with Input Re-Expression Model) .......... 15
Figure 6: Data Diversity using Input Data
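Of the multi-version techniques surveyed in the report, the recovery block (section 4.2.1) is the easiest to sketch: run a primary variant, check its result with an acceptance test, and fall back to an alternate from a saved checkpoint if the test fails or the variant raises an exception. The Python sketch below is only an illustration of that pattern under assumed names; the variants and the acceptance test are hypothetical, not taken from the report.

```python
# Minimal sketch of a recovery block: run the primary variant, check its
# result with an acceptance test, and fall back to alternates on failure.
# The variants and the acceptance test below are hypothetical examples.

def sqrt_primary(x: float) -> float:
    return x ** 0.5                      # fast primary variant

def sqrt_alternate(x: float) -> float:
    # Simple bisection as an independently written alternate variant.
    lo, hi = 0.0, max(1.0, x)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def acceptance_test(x: float, y: float) -> bool:
    return abs(y * y - x) < 1e-6         # result must square back to the input

def recovery_block(x: float) -> float:
    checkpoint = x                       # state restored before an alternate runs
    for variant in (sqrt_primary, sqrt_alternate):
        try:
            result = variant(checkpoint)
            if acceptance_test(checkpoint, result):
                return result
        except Exception:
            pass                         # treat an exception as a failed variant
    raise RuntimeError("all variants failed the acceptance test")

print(recovery_block(2.0))
```

N-version programming differs in that all variants execute and an output selection algorithm (section 4.2.7.2) adjudicates among their results, rather than an acceptance test judging one result at a time.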
Recommended publications
  • The Google File System (GFS)
    The Google File System (GFS), as described by Ghemawat, Gobioff, and Leung in 2003, provided the architecture for scalable, fault-tolerant data management within the context of Google. These architectural choices, however, resulted in sub-optimal performance in regard to data trustworthiness (security) and simplicity. Additionally, the application-specific nature of GFS limits the general scalability of the system outside of the specific design considerations within the context of Google. SYSTEM DESIGN: The authors enumerate their specific considerations as: (1) commodity components with high expectation of failure, (2) a system optimized to handle relatively large files, particularly multi-GB files, (3) most writes to the system are concurrent append operations, rather than internally modifying the extant files, and (4) a high rate of sustained bandwidth (Section 1, 2.1). Utilizing these considerations, this paper analyzes the success of GFS's design in achieving a fault-tolerant, scalable system while also considering the faults of the system with regard to data trustworthiness and simplicity. Data trustworthiness: In designing the GFS system, the authors made the conscious decision not to prioritize data trustworthiness by allowing applications to access 'stale', or not up-to-date, data. In particular, although the system does inform applications of the chunk-server version number, the designers of GFS encouraged user applications to cache chunkserver information, allowing stale data accesses (Section 2.7.2, 4.5). Although possibly troubling
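The stale-read behavior described in the excerpt comes from clients caching chunk locations together with a chunk version number. The sketch below is only an illustration of that caching idea, not the actual GFS client API; every class and name in it is hypothetical.

```python
# Illustrative sketch (not the real GFS API) of how a client-side cache of
# chunk locations plus a version number can let an application notice
# possibly stale data, as the excerpt describes. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class CachedChunk:
    chunkserver: str
    version: int

class ChunkCache:
    def __init__(self) -> None:
        self._cache: dict[str, CachedChunk] = {}

    def remember(self, chunk_id: str, chunkserver: str, version: int) -> None:
        self._cache[chunk_id] = CachedChunk(chunkserver, version)

    def lookup(self, chunk_id: str, current_version: int) -> CachedChunk | None:
        entry = self._cache.get(chunk_id)
        if entry is None:
            return None
        if entry.version < current_version:
            # Cached location points at a replica that may serve stale data.
            del self._cache[chunk_id]
            return None
        return entry

cache = ChunkCache()
cache.remember("chunk-42", "cs-7.example", version=3)
print(cache.lookup("chunk-42", current_version=3))   # fresh enough
print(cache.lookup("chunk-42", current_version=4))   # evicted as stale
```

As the excerpt notes, whether to revalidate cached locations or accept possibly stale data is left to the application.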
  • Introspection-Based Verification and Validation
    Introspection-Based Verification and Validation. Hans P. Zima, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109. E-mail: [email protected] Abstract: This paper describes an introspection-based approach to fault tolerance that provides support for run-time monitoring and analysis of program execution. Whereas introspection is mainly motivated by the need to deal with transient faults in embedded systems operating in hazardous environments, we show in this paper that it can also be seen as an extension of traditional V&V methods for dealing with program design faults. Introspection—a technology supporting runtime monitoring and analysis—is motivated primarily by the need to deal with faults caused by hardware malfunction or environmental influences such as radiation and thermal effects [4]. This paper argues that introspection can also be used to handle certain classes of faults caused by program design errors, complementing the classical approaches for dealing with design errors summarized under the term Verification and Validation (V&V). Consider a program, P, over a given input domain, and assume that the intended behavior of P is defined by a formal specification. Verification of P implies a static proof—performed before execution of the program—that for every legal input, the result of applying P to that input conforms to the specification. Thus, verification is a methodology that seeks to avoid faults. Model checking [5, 3] refers to a special verification technology that uses exploration of the full state space based on a simplified program model. Verification techniques have been highly successful when judiciously applied under the right conditions in well-defined, limited contexts.
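Where verification proves conformance to the specification before execution, introspection checks behavior during execution. The fragment below is a minimal, hypothetical sketch of that run-time monitoring idea; it is not taken from the paper, and the monitored function and its specification are invented for illustration.

```python
# Minimal sketch of run-time monitoring in the spirit of the excerpt: an
# "introspection" wrapper checks a simple postcondition on every call and
# reports violations while the program runs. All names are hypothetical.

import functools

def monitored(postcondition):
    """Wrap a function so each result is checked against a postcondition."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not postcondition(result, *args, **kwargs):
                # A real system would trigger recovery or logging here.
                raise AssertionError(
                    f"runtime monitor: {func.__name__} violated its spec")
            return result
        return wrapper
    return decorator

@monitored(lambda result, items: sorted(items) == result)
def buggy_sort(items):
    return list(items)          # design fault: forgets to sort

try:
    buggy_sort([3, 1, 2])
except AssertionError as err:
    print(err)                  # the monitor catches the design fault at run time
```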
  • Fault-Tolerant Components on AWS
    Fault-Tolerant Components on AWS. November 2019. This paper has been archived; for the latest technical information, see the AWS Whitepapers & Guides page: https://aws.amazon.com/whitepapers

    Notices: Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents current AWS product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers. © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved.

    Contents: Introduction .......... 1; Failures Shouldn't Be THAT Interesting .......... 1; Amazon Elastic Compute Cloud .......... 1; Elastic Block Store .......... 3; Auto Scaling ..........
  • MAIB Annual Report 2020
    2020 Marine Accident Recommendations and Statistics: Annual Report. Marine Accident Investigation Branch, First Floor, Spring Place, 105 Commercial Road, Southampton, SO15 1GH, United Kingdom. Telephone: 023 8039 5500. Email: [email protected] This document is posted on our website: www.gov.uk/maib. 9 June 2021.

    MARINE ACCIDENT INVESTIGATION BRANCH. The Marine Accident Investigation Branch (MAIB) examines and investigates all types of marine accidents to or on board UK vessels worldwide, and other vessels in UK territorial waters. Located in offices in Southampton, the MAIB is a separate, independent branch within the Department for Transport (DfT). The head of the MAIB, the Chief Inspector of Marine Accidents, reports directly to the Secretary of State for Transport.

    © Crown copyright 2021. This publication, excluding any logos, may be reproduced free of charge in any format or medium for research, private study or for internal circulation within an organisation. This is subject to it being reproduced accurately and not used in a misleading context. The material must be acknowledged as Crown copyright and the title of the publication specified. Details of third party copyright material may not be listed in this publication. Details should be sought in the corresponding accident investigation report, or [email protected] should be contacted for permission to use.

    CONTENTS: CHIEF INSPECTOR'S STATEMENT 1; PART 1 - 2020: CASUALTY REPORTS TO MAIB 5; Statistical overview 5; Summary of investigations started
  • Network Reliability and Fault Tolerance
    Network Reliability and Fault Tolerance. Muriel Médard, [email protected], Laboratory for Information and Decision Systems, Room 35-212, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139. Steven S. Lumetta, [email protected], Coordinated Science Laboratory, University of Illinois Urbana-Champaign, 1308 W. Main Street, Urbana, IL 61801.

    1 Introduction. The majority of communications applications, from cellular telephone conversations to credit card transactions, assume the availability of a reliable network. At this level, data are expected to traverse the network and to arrive intact at their destination. The physical systems that compose a network, on the other hand, are subjected to a wide range of problems, ranging from signal distortion to component failures. Similarly, the software that supports the high-level semantic interface often contains unknown bugs and other latent reliability problems. Redundancy underlies all approaches to fault tolerance. Definitive definitions for all concepts and terms related to reliability, and, more broadly, dependability, can be found in [AAC+92]. Designing any system to tolerate faults first requires the selection of a fault model, a set of possible failure scenarios along with an understanding of the frequency, duration, and impact of each scenario. A simple fault model merely lists the set of faults to be considered; inclusion in the set is decided based on a combination of expected frequency, impact on the system, and feasibility or cost of providing protection. Most reliable network designs address the failure of any single component, and some designs tolerate multiple failures. In contrast, few attempt to handle the adversarial conditions that might occur in a terrorist attack, and cataclysmic events are almost never addressed at any scale larger than a city.
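A fault model of the kind described above can be captured as a small data structure listing each failure scenario with its expected frequency, duration, and impact, and whether the design protects against it. The sketch below is illustrative only; all scenarios and numbers are assumptions, not values from the chapter.

```python
# Sketch of a simple fault model as described in the excerpt: a list of
# failure scenarios with expected frequency, duration, and impact, used to
# record which faults the design tolerates. All values are illustrative.

from dataclasses import dataclass

@dataclass
class FaultScenario:
    name: str
    failures_per_year: float    # expected frequency
    mean_duration_s: float      # expected duration once it occurs
    impact: str                 # qualitative impact on the system
    protected: bool             # does the design tolerate this fault?

fault_model = [
    FaultScenario("single link cut", 2.0, 3600.0, "loss of one path", True),
    FaultScenario("single router failure", 0.5, 7200.0, "loss of one node", True),
    FaultScenario("coordinated multi-site attack", 0.01, 86400.0,
                  "regional outage", False),   # judged too costly to protect
]

for f in fault_model:
    status = "tolerated" if f.protected else "accepted risk"
    print(f"{f.name}: {f.failures_per_year}/yr, {status}")
```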
  • Critical Systems
    Critical Systems. ©Ian Sommerville 2004, Software Engineering, 7th edition, Chapter 3.

    Objectives ● To explain what is meant by a critical system, where system failure can have severe human or economic consequences. ● To explain four dimensions of dependability: availability, reliability, safety and security. ● To explain that, to achieve dependability, you need to avoid mistakes, detect and remove errors and limit damage caused by failure.

    Topics covered ● A simple safety-critical system ● System dependability ● Availability and reliability ● Safety ● Security

    Critical Systems ● Safety-critical systems • Failure results in loss of life, injury or damage to the environment; • Chemical plant protection system. ● Mission-critical systems • Failure results in failure of some goal-directed activity; • Spacecraft navigation system. ● Business-critical systems • Failure results in high economic losses; • Customer accounting system in a bank.

    System dependability ● For critical systems, it is usually the case that the most important system property is the dependability of the system. ● The dependability of a system reflects the user's degree of trust in that system. It reflects the extent of the user's confidence that it will operate as users expect and that it will not 'fail' in normal use. ● Usefulness and trustworthiness are not the same thing. A system does not have to be trusted to be useful.

    Importance of dependability ● Systems that are not dependable and are unreliable, unsafe or insecure may be rejected by their users.
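Availability, one of the four dependability dimensions listed above, is commonly estimated in steady state as MTBF / (MTBF + MTTR). The short sketch below works through that formula with assumed figures; it is an illustration, not material from the slides.

```python
# Small worked sketch (not from the slides): steady-state availability is
# commonly estimated as MTBF / (MTBF + MTTR). The figures are illustrative.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(mtbf_hours=5000.0, mttr_hours=2.0)
print(f"availability = {a:.5f}")                       # ~0.99960
downtime_per_year_h = (1.0 - a) * 365 * 24
print(f"expected downtime = {downtime_per_year_h:.1f} hours/year")  # ~3.5
```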
  • Ask Magazine | 1
    National Aeronautics and Space Administration. Academy Sharing Knowledge: The NASA Source for Project Management and Engineering Excellence | APPEL. Summer 2013. Inside: The Knowledge Management Journey; KSC Swamp Works; Predictable Project Surprises. Photo credit: U.S. Air Force, Senior Airman Joshua Strang.

    ON THE COVER: Above Bear Lake, Alaska, the Northern Lights, or aurora borealis, are created by solar radiation entering the atmosphere at the magnetic poles. The appearance of these lights is just one way solar radiation affects us; it can also interfere with NASA missions in low-Earth orbit. To achieve long-duration human spaceflight missions in deeper space, several NASA centers are working to find better safety measures and solutions to protect humans from space radiation.

    ASK MAGAZINE | 1. Contents. DEPARTMENTS: 3 In This Issue, BY DON COHEN; 4 From the NASA CKO: Our Knowledge Legacy, BY ED HOFFMAN; 42 The Knowledge Notebook: Big Data—The Latest Organizational Idea-Movement, BY LAURENCE PRUSAK; 44 ASK Interactive. INSIGHTS: 9 Predictable Surprises: Bridging Risk-Perception Gaps, BY PEDRO C. RIBEIRO (concerns about project-threatening risks are often ignored or unspoken); 21 The Knowledge Management Journey, BY EDWARD W. ROGERS (Goddard's CKO reflects on what ten years on the job have taught him); 25 Creating NASA's Knowledge Map, BY MATTHEW KOHUT AND HALEY STEPHENSON (a new online tool helps show who knows what at NASA). STORIES: 5 Back to the Future: KSC Swamp Works, BY KERRY ELLIS (Kennedy's new Swamp Works: experimentation and knowledge sharing); 13 University Capstone Projects: Small Investments, Big Rewards, BY LAURIE STAUBER (Glenn's partnership with universities generates first-rate space-medicine research); 17 One Person's Mentoring Experience, BY NATALIE HENRICH.
  • Fault Tolerance
    Distributed Systems: Fault Tolerance. Paul Krzyzanowski. Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

    Faults • Deviation from expected behavior • Due to a variety of factors: – Hardware failure – Software bugs – Operator errors – Network errors/outages

    A fault in a system is some deviation from the expected behavior of the system -- a malfunction. Faults may be due to a variety of factors, including hardware, software, operator (user), and network errors.

    Faults • Three categories – transient faults – intermittent faults – permanent faults • Any fault may be – fail-silent (fail-stop) – Byzantine • synchronous system vs. asynchronous system – E.g., IP packet versus serial port transmission

    Faults can be classified into one of three categories. Transient faults: these occur once and then disappear. For example, a network message transmission times out but works fine when attempted a second time. Intermittent faults: these are the most annoying of component faults. This fault is characterized by a fault occurring, then vanishing again, then occurring, … An example of this kind of fault is a loose connection. Permanent faults: this fault is persistent: it continues to exist until the faulty component is repaired or replaced. Examples of this fault are disk head crashes, software bugs, and burnt-out hardware. Any of these faults may be either a fail-silent failure (also known as fail-stop) or a Byzantine failure. A fail-silent fault is one where the faulty unit stops functioning and produces no ill output (it produces no output or produces output to indicate failure). A Byzantine fault is one where the faulty unit continues to run but produces incorrect results.
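The transient-fault example above (a message transmission that times out once and works on the next attempt) is typically masked by retrying the operation. The sketch below illustrates that pattern; the send_message function and its failure behavior are hypothetical stand-ins, not code from the presentation.

```python
# Minimal sketch of masking a transient fault by retrying, matching the
# excerpt's example of a message send that times out once and then works.
# The send_message function and its failure behavior are hypothetical.

import time

class TransientError(Exception):
    pass

_calls = {"n": 0}

def send_message(payload: str) -> str:
    _calls["n"] += 1
    if _calls["n"] == 1:                 # simulate a single transient timeout
        raise TransientError("timeout")
    return f"ack:{payload}"

def send_with_retry(payload: str, attempts: int = 4, backoff_s: float = 0.1) -> str:
    for attempt in range(1, attempts + 1):
        try:
            return send_message(payload)
        except TransientError:
            if attempt == attempts:
                raise                    # give up: treat as a permanent fault
            time.sleep(backoff_s * attempt)  # back off before retrying
    raise RuntimeError("unreachable")

print(send_with_retry("hello"))          # first attempt fails, retry succeeds
```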
  • Group Interaction in High Risk Environments, 1st Edition
    Group Interaction in High Risk Environments, 1st Edition. Rainer Dietrich | 9781351932097. Computational methods such as resampling and bootstrapping are also used when data are insufficient for direct statistical methods. There is little point in making highly precise computer calculations on numbers that cannot be estimated accurately. Categories: Underwater diving safety. Thus, the surface-supplied diver is much less likely to have an "out-of-air" emergency than a scuba diver, as there are normally two alternative air sources available. Methods for dealing with risks include provision for adequate contingencies; safety factors for budget and schedule contingencies are discussed in Chapter 6. This genetic substudy included Two Sister Study participants and their parents, and some Sister Study participants who developed incident breast cancer (invasive or DCIS) before age 50 during the follow-up period. In terms of generalizability, our sample is predominantly non-Hispanic white women, and the results may not apply to other racial or ethnic groups. This third parameter may give some management support by establishing early warning indicators for specific serious risks, which might not otherwise have been established. ROVs may be used together with divers, or without a diver in the water, in which case the risk to the diver associated with the dive is eliminated altogether. Suitable equipment can be selected, personnel can be trained in its use, and support provided to manage the foreseeable contingencies. However, taken together, there is the possibility that many of the estimates of these factors would prove to be too optimistic, leading.
  • Ten Strategies of a World-Class Cybersecurity Operations Center
    Ten Strategies of a World-Class Cybersecurity Operations Center conveys MITRE's accumulated expertise on enterprise-grade computer network defense. It covers ten key qualities of leading Cybersecurity Operations Centers (CSOCs), ranging from their structure and organization, to processes that best enable effective and efficient operations, to approaches that extract maximum value from CSOC technology investments. This book offers perspective and context for key decision points in structuring a CSOC and shows how to: • Find the right size and structure for the CSOC team • Achieve effective placement within a larger organization that enables CSOC operations • Attract, retain, and grow the right staff and skills • Prepare the CSOC team, technologies, and processes for agile, threat-based response • Architect for large-scale data collection and analysis with a limited budget • Prioritize sensor placement and data feed choices across enterprise systems, enclaves, networks, and perimeters. If you manage, work in, or are standing up a CSOC, this book is for you. It is also available on MITRE's website, www.mitre.org.

    The MITRE Corporation is a not-for-profit organization that operates federally funded research and development centers (FFRDCs). FFRDCs are unique organizations that assist the U.S. government with scientific research and analysis, development and acquisition, and systems engineering and integration. We're proud to have served the public interest for more than 50 years.
  • Detecting and Tolerating Byzantine Faults in Database Systems Benjamin Mead Vandiver
    Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2008-040, June 30, 2008. Detecting and Tolerating Byzantine Faults in Database Systems. Benjamin Mead Vandiver. Massachusetts Institute of Technology, Cambridge, MA 02139 USA — www.csail.mit.edu

    Detecting and Tolerating Byzantine Faults in Database Systems, by Benjamin Mead Vandiver. Submitted to the Department of Electrical Engineering and Computer Science on May 23, 2008, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at the Massachusetts Institute of Technology, June 2008. © Massachusetts Institute of Technology 2008. All rights reserved. Certified by: Barbara Liskov, Ford Professor of Engineering, Thesis Supervisor. Accepted by: Terry P. Orlando, Chairman, Department Committee on Graduate Students.

    Abstract: This thesis describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares answers from queries and updates on multiple replicas, which are off-the-shelf database systems, to provide a single database that is Byzantine fault tolerant. The scheme works when the replicas are homogeneous, but it also allows heterogeneous replication in which replicas come from different vendors. Heterogeneous replicas reduce the impact of bugs and security compromises because they are implemented independently and are thus less likely to suffer correlated failures.
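The abstract's core idea, comparing answers from several independently implemented replicas so that one Byzantine replica cannot corrupt the result, can be illustrated with a simple majority vote over query answers. The sketch below is only that illustration; it is not the thesis's actual protocol, and the replica functions are hypothetical stand-ins for off-the-shelf databases.

```python
# Illustrative sketch (not the thesis's protocol) of comparing answers from
# multiple replicas: issue the same query to each, accept an answer only when
# a majority agree, masking one Byzantine replica among three. The replica
# functions below are hypothetical stand-ins for real database connections.

from collections import Counter

def query_replicas(replicas, sql: str):
    answers = [replica(sql) for replica in replicas]
    winner, votes = Counter(answers).most_common(1)[0]
    if votes >= len(replicas) // 2 + 1:
        return winner
    raise RuntimeError("no majority answer; cannot mask the faulty replica")

good = lambda sql: ("alice", 42)          # two correct, independently built replicas
also_good = lambda sql: ("alice", 42)
byzantine = lambda sql: ("mallory", -1)   # faulty replica returns a wrong answer

print(query_replicas([good, also_good, byzantine], "SELECT name, score FROM t"))
```

With 2f+1 replicas, a majority answer masks up to f Byzantine replicas, which is why three replicas suffice to tolerate one faulty replica in this sketch.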
  • Fault Tolerance Techniques for Scalable Computing∗
    Fault Tolerance Techniques for Scalable Computing∗. Pavan Balaji, Darius Buntinas, and Dries Kimpe. Mathematics and Computer Science Division, Argonne National Laboratory. fbalaji, buntinas, [email protected]

    Abstract: The largest systems in the world today already scale to hundreds of thousands of cores. With plans under way for exascale systems to emerge within the next decade, we will soon have systems comprising more than a million processing elements. As researchers work toward architecting these enormous systems, it is becoming increasingly clear that, at such scales, resilience to hardware faults is going to be a prominent issue that needs to be addressed. This chapter discusses techniques being used for fault tolerance on such systems, including checkpoint-restart techniques (system-level and application-level; complete, partial, and hybrid checkpoints), application-based fault-tolerance techniques, and hardware features for resilience.

    1 Introduction and Trends in Large-Scale Computing Systems. The largest systems in the world already use close to a million cores. With upcoming systems expected to use tens to hundreds of millions of cores, and exascale systems going up to a billion cores, the number of hardware components these systems would comprise would be staggering. Unfortunately, the reliability of each hardware component is not improving at the same rate as the number of components in the system is growing. Consequently, faults are increasingly becoming common. For the largest supercomputers that will be available over the next decade, faults will become a norm rather than an exception.

    ∗This work was supported by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.
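Application-level checkpointing, one of the checkpoint-restart variants mentioned in the abstract, amounts to the application periodically saving enough of its own state to resume after a failure. The sketch below shows the shape of that idea; the file name, state layout, and step loop are assumptions for illustration, not code from the chapter.

```python
# Minimal sketch of application-level checkpoint/restart: periodically save
# solver state to disk and, on restart, resume from the latest checkpoint.
# The file name, state layout, and compute loop are illustrative assumptions.

import json
import os

CHECKPOINT_FILE = "solver_checkpoint.json"   # hypothetical path

def save_checkpoint(step: int, state: list[float]) -> None:
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CHECKPOINT_FILE)          # atomic rename avoids torn files

def load_checkpoint():
    if not os.path.exists(CHECKPOINT_FILE):
        return 0, [0.0] * 8                   # fresh start
    with open(CHECKPOINT_FILE) as f:
        data = json.load(f)
    return data["step"], data["state"]

step, state = load_checkpoint()               # resume after a crash, if any
while step < 100:
    state = [x + 1.0 for x in state]          # stand-in for one compute step
    step += 1
    if step % 10 == 0:
        save_checkpoint(step, state)          # checkpoint every 10 steps

print("finished at step", step)
```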