Nuno Pedro de Jesus Silva

An Empirical Approach to Improve the Quality and Dependability of Critical Systems

PhD Thesis in the Doctoral Program in Information Science and Technology, supervised by Professor Marco Vieira and presented to the Department of Informatics Engineering of the Faculty of Sciences and Technology of the University of Coimbra

August 2017

An Empirical Approach to Improve the Quality and Dependability of Critical Systems Engineering

Nuno Pedro de Jesus Silva

Thesis submitted to the University of Coimbra in partial fulfillment of the requirements for the degree of Doctor of Philosophy

August 2017

Department of Informatics Engineering
Faculty of Sciences and Technology
University of Coimbra

This research has been developed as part of the requirements of the Doctoral Program in Information Science and Technology of the Faculty of Sciences and Technology of the University of Coimbra. This work is within the Dependable Systems specialization domain and was carried out in the Software and Systems Engineering Group of the Center for Informatics and Systems of the University of Coimbra (CISUC). This work was supported financially by the Top-Knowledge program of CRITICAL Software, S.A. and carried out partially in the frame of the European Marie Curie Project FP7-2012-324334-CECRIS (Certification of CRItical Systems).

This work has been supervised by Professor Marco Vieira, Full Professor (Professor Catedrático), Department of Informatics Engineering, Faculty of Sciences and Technology, University of Coimbra.



is not an option!” James H. Miller

Miller, James H. “2009 CEOs Who ‘Get It’.” Interview. Safety and Health Magazine, February 1, 2009. Accessed October 13, 2016. http://www.safetyandhealthmagazine.com/articles/2009-ceos-who-get-it-16.

and

“Failure is not an option” Bill Broyles, attributed to the Apollo 13 crew of NASA



To my family, to my friends



Abstract

Critical systems, such as space, railway and avionics systems, are developed under strict requirements envisaging high integrity in accordance with specific standards. For such software systems, an independent assessment is generally put into effect (as a safety assessment or in the form of Independent Software Verification and Validation - ISVV) after the regular development lifecycle and V&V activities, aiming at identifying and correcting residual faults and raising confidence in the software. These systems are very sensitive to failures (which might cause severe impacts), and even if they are today reaching very low failure rates, there is always a need to guarantee higher quality and dependability levels. However, it has been observed that a significant number of defects still remains at the latest lifecycle phases, which questions the effectiveness of the previous engineering processes and V&V techniques.

This thesis proposes an empirical approach to identify the nature of defects (quality, dependability and safety gaps) and, based on that knowledge, to provide support to improve critical systems engineering. The work is based on knowledge about safety critical systems and how they are specified, developed and validated (standards, processes and techniques, resources, lifecycles, technologies, etc.). Improvements are obtained from an orthogonal classification and further analysis of issues collected from real systems at all lifecycle phases. Such historical data (issues) have been studied, classified and clustered according to different properties, taking into account the issue introduction phase, the involved techniques, the applicable standards and, particularly, the root causes. The identified improvements shall be reflected in the development and V&V techniques and in the training or preparation of resources, and shall drive the modification or adoption of standards.

The first and most encompassing contribution of this work is the definition of a defects assessment process that can be used and applied in industry in a simple way and independently of the industrial domain. The process makes use of a dataset collected from existing issues reflecting process deficiencies, and supports the analysis of these data towards identifying the root causes of those problems and defining appropriate measures to avoid them in future systems.

As part of the defects assessment process activities, we propose an adaptation of the Orthogonal Defect Classification (ODC) for critical issues. In practice, ODC was used as an initial classification and was then tuned according to the gaps and difficulties found during the initial stages of our defects classification activities. The refinement was applied to the defect types, triggers and impacts, and improved taxonomies for these three parameters are proposed.

A subsequent contribution of our work is the application and integration of a root cause analysis process to show the connection of the defects (or issue groups) with the engineering properties and environment. The engineering properties (e.g. human and technical resources properties, events, processes, methods, tools and standards) are, in fact, the principal input for the classes of root causes. A fishbone root cause analysis was proposed, integrated in the process and applied to the available dataset.

A practical contribution of the work comprises the identification of a specific set of root causes and applicable measures to improve the quality of the engineered systems (by removing those causes). These root causes and proposed measures allow the provision of quick and specific feedback to the industrial engineering teams as soon as the defects are analyzed. The list/database has been compiled from the dataset and includes the feedback and contributions of the experts who responded to a process/framework validation survey. The root causes and the associated measures represent a valuable body of knowledge to support future defects assessments.

The last key contribution of our work is the promotion of a cultural change to make appropriate use of real defects data (the main input of the process), which shall be properly documented and easily collected, cleaned and updated. The regular use of defects data, through the application of the proposed defects assessment process, will contribute to measuring the quality evolution and the progress of implementation of the corrective actions or improvement measures that are the essential output of the process.

Keywords: Orthogonal defect classification, critical systems, defect, classification, root cause analysis, dependability, failure, safety.


Resumo

Os sistemas críticos, tais como os sistemas espaciais, ferroviários ou os sistemas de aviónica, são desenvolvidos sob requisitos estritos que visam atingir alta integridade ao abrigo de normas específicas. Para tais sistemas de software, é geralmente aplicada uma avaliação independente (como uma avaliação de safety ou na forma de uma Verificação e Validação de Software Independente - ISVV) após o ciclo de desenvolvimento e as respetivas atividades de V&V, visando identificar e corrigir falhas residuais e aumentar a confiança no software. Estes sistemas são muito sensíveis a falhas (pois estas podem causar impactos severos), e apesar de atualmente se conseguir atingir taxas de falhas muito baixas, há sempre a necessidade de garantir a maior qualidade dos sistemas e os maiores níveis de confiabilidade. No entanto, observa-se que ainda existe um número significativo de defeitos que permanecem nas últimas fases do ciclo de desenvolvimento, o que nos leva a questionar a eficácia dos processos de engenharia usados e as técnicas de V&V aplicadas. Esta tese propõe uma abordagem empírica para identificar a natureza dos defeitos (de qualidade, confiabilidade, lacunas de safety) e com base nesse conhecimento proporcionar uma melhoria da engenharia de sistemas críticos. O trabalho é baseado em conhecimento sobre os sistemas críticos e na forma como estes são especificados / desenvolvidos / validados (normas, processos e técnicas, recursos, ciclo de vida, tecnologias, etc.). As recomendações de melhorias para os sistemas críticos são obtidas a partir de uma classificação ortogonal e posterior análise de dados de defeitos obtidos de sistemas reais cobrindo todas as fases do ciclo de vida. Estes dados históricos (defeitos) foram estudados, classificados e agrupados de acordo com diferentes propriedades, considerando a fase de introdução do defeito, as técnicas envolvidas, as normas aplicáveis e, em particular, as possíveis causas fundamentais (ou raiz). As melhorias identificadas deverão refletir-se nas técnicas de desenvolvimento / V&V, na formação ou preparação de recursos humanos e orientar alterações ou adoção de normas. A primeira e mais abrangente das contribuições deste trabalho é a definição de um processo de avaliação de defeitos que pode ser usado e aplicado na indústria de forma simples e independente do domínio industrial. O processo proposto baseia-se na disponibilidade de um conjunto de dados de problemas que refletem deficiências de processo de desenvolvimento e suporta a análise desses dados para identificar as suas causas raiz e definir medidas apropriadas para evitá-los em sistemas futuros. Como parte das atividades do processo de avaliação de defeitos, é proposta uma adaptação da Classificação Ortogonal de Defeitos (ODC) para sistemas críticos. Na prática, a ODC foi usada como uma classificação inicial e depois ajustada de acordo com as lacunas e dificuldades encontradas durante os estágios iniciais das atividades de classificação de defeitos. O refinamento foi aplicado aos tipos de defeito, aos

eventos que levaram a esses defeitos e aos seus impactos. Neste trabalho, são propostas versões melhoradas das taxonomias para esses três parâmetros. Uma contribuição subsequente é a aplicação e integração de um processo de análise de causas raiz para relacionar os defeitos (ou grupos de problemas) com as propriedades e o ambiente de engenharia. As propriedades de engenharia (por exemplo, recursos humanos e técnicos, eventos, processos, métodos, ferramentas e normas) são, de facto, as principais fontes para a identificação das classes de causas raiz. A análise de causas de raiz proposta é baseada em diagramas fishbone, tendo sido integrada no processo e aplicada ao conjunto de dados disponíveis. Uma contribuição prática do nosso trabalho é a identificação de um conjunto específico de causas raiz e de medidas aplicáveis para melhorar a qualidade dos sistemas de engenharia (eliminação dessas causas). As causas e as medidas propostas permitem um retorno rápido e específico logo que os defeitos são analisados. A lista / base de dados foi compilada a partir do conjunto de dados de defeitos e inclui os comentários e contribuições de especialistas que responderam a um formulário de validação do processo. As causas raiz e as medidas associadas representam um conjunto valioso de conhecimento que pode suportar futuras análises de defeitos. A última contribuição chave do nosso trabalho é a promoção de uma mudança cultural para fazer uso apropriado de dados de defeitos reais (principal fonte do processo), os quais devem ser devidamente documentados e facilmente recolhidos, tratados e atualizados. O uso regular de dados sobre defeitos através da aplicação do processo de análise de defeitos proposto contribuirá para medir a evolução da qualidade e o progresso da implementação das ações corretivas ou medidas de melhoria que são o principal resultado do processo.

Palavras-chave: Classificação ortogonal de defeitos, sistemas críticos, defeito, classificação, análise de causas, confiabilidade, falha, safety.


Acknowledgements

I will never be able to thank enough everybody who somehow supported me during these years. These acknowledgements will necessarily be short and thus not complete. I would like to thank everybody who crossed my path during these last years and gave me motivation to work during the different phases of the studies and the research. I am truly grateful and will never be able to pay back all the support received. I would like to start by thanking Professor Marco Vieira for his guidance through the entire path that led me to this point. We had a very hard time meeting since we both work full time and travel a lot, so time was always short and precious, but Marco never ceased believing, supporting, motivating and providing very detailed reviews to improve and advance the research work. I would also like to give two special thanks, to Professor Henrique Madeira and to Professor João Carlos Cunha: the first for inspiring and motivating me to enter this adventure, many years ago when we visited the NASA IV&V center in West Virginia, and the second for his support and insightful contributions and reviews towards the improvement of my work. I would also like to give a special thanks to all the PhD professors and class colleagues, and to the researchers at DEI/CISUC, who have not seen much of me but were always there, ready for anything. An enormous gratitude goes to my employer, CRITICAL Software, for supporting my studies and giving me some time to achieve the objectives, particularly during the first year and over the periods just before the article submissions. Thank you, CRITICAL. I would also like to thank my co-workers for their interest in and support of the research topics and for their patience over the last years. I also have a few ex-co-workers to thank, especially Rui (UK) and Diogo (Germany), for their prompt and insightful help. Another special thanks goes to all the CECRIS project partners. This European Commission FP7 research project allowed me to extend and develop my research and to meet very knowledgeable researchers. I cannot name all here, but special thanks need to go to Domenico, Stefano, Christian, Marcello, Roberto P., Roberto N., Domenico (DiLeo), Antonio (La Legenda), Fabio, Dario, Anna, Alma, Francesca, and more at CINI Napoli; then Dr. Pataricza, László, Ágnes, Ákos, Imre, Zoltán, Ábel and Gábor from BME in Budapest; also to Dr. Bondavalli, Andrea (Ceccarelli), Nicola, Francesco and others from CINI Firenze and Resiltech in Pontedera. I also need to thank Ram Chillarege for his support during 2016, with all his knowledge about ODC and all his contagious energy.


The most important thanks, however, needs to go to my wife, Daniela. For her unconditional love and support, for her indefatigable patience and understanding, for all the long moments when we could not be together because I was working or traveling, for being herself and always supporting and motivating me, for always believing in me, especially in the difficult times of the past years. I also need to thank my parents, Fernando and Fernanda, who were far away most of the time, one ocean distant, but who recently became much closer. For the time that I could not dedicate to them over these years and for giving me the grounds to achieve this milestone in my life, I owe them a lot. Likewise, I want to thank my sisters Cintia and Suzi, one in Portugal, the other in Canada, my brother-in-law Alcino, and my niece and nephew, Laura and Afonso, particularly for their energy, for making me be a “crazy child” on weekends, for the smiles and the plays. Another thanks needs to go to my in-laws, Jaime and Maria Isabel, and all the family back in Angola (and not only) for their understanding of my absences in these busy years. To the rest of my family, especially aunts, uncles and cousins, I would like to thank all of them for the great moments spent together over the past years, and I hope to have now the chance to be more present. I also thank all the anonymous reviewers who helped me to improve this work with their comments, and the conference contacts who provided feedback and engaged in very fruitful conversations. A particular “thank you” to my other friends, spread around this world. I need to give them a heads up: I will have more time for you from now on, so, all those in Brazil, Canada or Europe, beware, we will surely get together more often, particularly Pedro and Sandra (in Montreal), all my cousins and family in São Paulo (you are great), and Lubomir and his 3 “girls” (currently in Switzerland). I would like to give a special thanks to Agnė, my co-worker, my gym partner, who became a good friend over time, listening to my stories (too many), having a renewable patience and providing me courage and positive energy, support and many inspiring smiles to motivate me in the final phase of the process. Ačiū Agnė! To all the people I met during these last 4 years in Coimbra, in Portugal, and around the world, hoping to see most of them again soon: I thank them all because without them I would not be me, and I would not have reached this milestone. I could not forget to mention one of my passions, Associação Académica de Coimbra, for all the emotions provided at the football matches; we went from the Europa League down to relegation to the Second League, but the passion has only grown, and that proved that we do not explain love, we just love. Académica will continue being special and different. Força Briosa! To the angels and those who left us in the meantime (2013-2017), you will always be in my heart; thank you for being part of my life and part of my success.


Agradecimentos

Nunca seria possível agradecer o suficiente e a todos os que de alguma forma me apoiaram durante estes anos. Estes agradecimentos serão necessariamente curtos e, portanto, não completos. Gostaria de agradecer a todos os que atravessaram o meu caminho durante estes últimos anos e que me deram motivação adicional para continuar e trabalhar durante as diferentes fases dos estudos e da investigação. Estou realmente grato e nunca conseguirei retribuir todo o apoio recebido. Gostaria de começar por agradecer ao Professor Marco Vieira pela sua orientação durante todo o período que me trouxe a este ponto. Foi complicado por vezes reunir, estado ambos sempre ocupados profissionalmente, ou em viagem, sendo que o tempo era sempre pouco e precioso, mas o Marco nunca deixou de acreditar, apoiar, motivar e fornecer revisões muito detalhadas para melhorar e avançar o trabalho de pesquisa. Gostaria de agradecer, em particular, ao professor Henrique Madeira e ao professor João Carlos Cunha, o primeiro por me ter inspirado e motivado a abraçar esta aventura, há muitos anos, quando fizemos uma visita ao centro IVV da NASA, o segundo pelo seu apoio e contribuições técnicas e revisões para a melhoria do meu trabalho. Gostaria também de agradecer especialmente a todos os professores e colegas do programa de doutoramento. Agradeço igualmente aos pesquisadores do DEI / CISUC, que não viram muito de mim, mas que estavam sempre prontos para qualquer eventualidade. Tenho uma enorme dívida de gratidão para o meu empregador, a CRITICAL Software, por apoiar os meus estudos e dar-me algum tempo para atingir os objectivos, particularmente durante o primeiro ano e durante os períodos imediatamente antes das submissões de artigos. Obrigado CRITICAL. Gostaria também de agradecer aos meus colegas de trabalho pelo interesse e apoio relativo aos temas de investigação e pela sua paciência nos últimos anos. Existem igualmente alguns ex-colegas de trabalho aos quais devo agradecer, especialmente o Rui (UK) e o Diogo (Alemanha) pela sua ajuda pronta e eficaz. Outro agradecimento especial vai para todos os parceiros do projeto CECRIS. Este projecto de investigação da Comissão Europeia permitiu alargar e desenvolver a minha investigação. Não posso citar todos os nomes aqui, mas envio agradecimentos especiais para Domenico, Stefano, Christian, Marcello, Roberto P., Roberto N., Domenico (DiLeo), Antonio (La Legenda), Fabio, Dario, Anna, Alma, Francesca, e outros mais do CINI em Nápoles; o Dr. Pataricza, László, Ágnes, Ákos, Imre, Zoltán, Ábel e Gábor da BME em Budapeste; também para o Dr. Bondavalli, Andrea (Ceccarelli), Nicola, Francesco e outros do CINI em Florença e da Resiltech em Pontedera. Também devo agradecer ao Ram Chillarege pelo seu apoio durante o ano de 2016 com todo seu conhecimento sobre ODC e com toda sua energia contagiosa.


Os agradecimentos mais importantes vão, no entanto, para a minha esposa, Daniela. Pelo seu amor e apoio incondicionais, pelas suas infatigáveis paciência e compreensão, por todos os longos momentos em que não pudemos estar juntos porque eu estava a trabalhar ou a viajar, por ser ela e sempre me apoiar e motivar, por sempre acreditar em mim, especialmente nos tempos difíceis dos últimos anos. Preciso de agradecer aos meus pais, Fernando e Fernanda, que estavam longe a maior parte do tempo, a um oceano de distância, mas que recentemente se aproximaram, retornando a Portugal. Pelo tempo que não lhes pude dedicar ao longo destes anos e por me terem dado as bases para atingir este marco na minha vida, eu devo-lhes imenso. Da mesma forma, quero agradecer às minhas irmãs Cintia e Suzi, uma em Portugal, a outra no Canadá, ao meu cunhado Alcino, aos meus sobrinhos Laura e Afonso, particularmente pela sua energia, por me fazerem ser uma "criança louca" nos fins de semana, pelos sorrisos e as brincadeiras. Outro agradecimento vai para os meus sogros, Jaime e Maria Isabel, e toda a família em Angola (e não só) pela compreensão das minhas ausências nestes anos. Ao resto da minha família, especialmente tias, tios e primos, gostaria de agradecer a todos eles pelos grandes momentos que partilhámos nos últimos anos, e espero ter agora a oportunidade de estar mais presente. Agradeço também a todos os revisores anónimos por terem contribuído para melhorar o meu trabalho com os seus comentários, e também aos contatos dos seminários internacionais que forneceram feedback e participaram em conversas muito frutíferas. Um especial "obrigado" aos meus outros amigos, espalhados por este mundo fora. Aviso: vou ter mais tempo para vocês a partir de agora, portanto, todos os que estão no Brasil, Canadá ou na Europa, vamos certamente encontrar-nos mais vezes, particularmente o Pedro e a Sandra (Montreal), todos os meus primos e restante família em São Paulo (vocês são especiais), e o Lubomir e as suas 3 "meninas" (na Suíça). Gostaria de agradecer especialmente à Agnė, minha colega de trabalho, minha parceira de ginásio, que se tornou uma boa amiga ao longo do tempo por ouvir as minhas histórias (demasiadas), ter uma paciência renovável e dar-me energia positiva e muitos sorrisos inspiradores para me motivar na fase final deste processo. Ačiū Agnė! A todas as pessoas que conheci nestes últimos 4 anos em Coimbra, em Portugal e em todo o mundo. Gostaria de agradecer a todos porque sem eles eu não seria eu. Não posso deixar de mencionar uma das minhas paixões, a Associação Académica de Coimbra, por todas as emoções proporcionadas nos jogos de futebol, fomos da Liga Europa até à descida de divisão para a Segunda Liga, mas a paixão só cresceu e isso provou que não se explica o amor, apenas se ama. A Académica continuará a ser especial e diferente. Força Briosa! Para os anjos e aqueles que nos deixaram entretanto (2013-2017), vocês estarão sempre no meu coração, obrigado por fazerem parte da minha vida e parte do meu sucesso.


List of Publications

This thesis relies on the published scientific research presented in the following peer reviewed papers:

[1] Nuno Silva and Marco Vieira, “Towards Making Safety-Critical Systems Safer: Learning from Mistakes”, ISSRE 2014, 3-6 November 2014, Naples, Italy.
[2] Nuno Silva and Marco Vieira, “Experience Report: Orthogonal Classification of Safety Critical Issues”, ISSRE 2014, 3-6 November 2014, Naples, Italy.
[3] Nuno Silva, Marco Vieira, Dario Ricci and Domenico Cotroneo, “Consolidated View on Space Software Engineering Problems – An Empirical Study”, DASIA 2015, 19-21 May 2015, Barcelona, Spain.
[4] Nuno Silva, Marco Vieira, Dario Ricci and Domenico Cotroneo, “Assessment of Defect Type Influence in Complex and Integrated Space Systems: Analysis Based on ODC and ISVV Issues”, in Dependable Systems and Networks Workshops (DSN-W), 2015 IEEE International Conference on, pp. 63-68, IEEE, 2015.
[5] Nuno Silva and Marco Vieira, “Software for Embedded Systems: A Quality Assessment Based on Improved ODC Taxonomy”, SAC ACM 2016, April 04-08, 2016, Pisa, Italy. DOI: http://dx.doi.org/10.1145/2851613.2851908.
[6] Nuno Silva, João Carlos Cunha and Marco Vieira, “A Field Study on Root Cause Analysis of Defects in Space Software”, Reliability Engineering & System Safety, available online 24 August 2016, ISSN 0951-8320. http://dx.doi.org/10.1016/j.ress.2016.08.016.
[7] Nuno Silva and Marco Vieira, “Adapting the Orthogonal Defect Classification Taxonomy to the Space Domain”, in Computer Safety, Reliability, and Security: 35th International Conference, SAFECOMP 2016, Trondheim, Norway, September 21-23, 2016, Proceedings, edited by Amund Skavhaug, Jérémie Guiochet and Friedemann Bitsch, 296–308, Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-45477-1_23.
[8] Nuno Silva, Marco Vieira, João Cunha and Ram Chillarege, “Evaluating a Corpus of Root Causes and Measures to Guide RCA Processes in Critical Software”, 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE 2017), January 12-14, 2017, Singapore.


Other published works with authorship or contributions from the author during the period:

[9] Andrea Ceccarelli and Nuno Silva, “Qualitative Comparison of Aerospace Standards: An Objective Approach”, WoSoCer 2013, ISSRE 2013, 4-7 November 2013, Pasadena, CA, USA.
[10] Nuno Silva, Alexandre Esper, Ricardo Barbosa, Johan Zendin and Claudio Monteleone, “Reference Architecture for High Dependability On-Board Computers”, WoSoCer 2013, ISSRE 2013, 4-7 November 2013, Pasadena, CA, USA (presented by the author).
[11] Nuno Silva and Marco Vieira, “Certification of Embedded Systems: Quantitative Analysis and Irrefutable Evidences”, ISSRE 2013 - Fast Abstracts, 4-7 November 2013, Pasadena, CA, USA (presented by the author).
[12] Linling Sun, Nuno Silva and Tim Kelly, “Rethinking of Strategy for Safety Argument Development”, in Computer Safety, Reliability, and Security: SAFECOMP 2014 Workshops: ASCoMS, DECSoS, DEVVARTS, ISSE, ReSA4CI, SASSUR, Florence, Italy, September 8-9, 2014, Proceedings, edited by Andrea Bondavalli, Andrea Ceccarelli and Frank Ortmeier, 384–395, Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10557-4_42.
[13] Andrea Ceccarelli and Nuno Silva, “Analysis of Companies Gaps in the Application of Standards for Safety-Critical Software”, in Computer Safety, Reliability, and Security: SAFECOMP 2015 Workshops, ASSURE, DECSoS, ISSE, ReSA4CI, and SASSUR, Delft, The Netherlands, September 22, 2015, Proceedings, edited by Floor Koornneef and Coen van Gulijk, 303–313, Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24249-1_26.


Table of Contents

Chapter 1 Introduction ...... 1
1.1 Engineering Safety Critical Systems ...... 3
1.2 Contributions of the Work ...... 5
1.3 Structure of the Thesis ...... 7
Chapter 2 Background and related work ...... 9
2.1 Background Concepts and Motivation ...... 10
2.1.1 General Concepts ...... 10
2.1.2 Systems and Software Growth and Complexity ...... 11
2.2 Independent Software Verification and Validation (ISVV) ...... 14
2.2.1 ISVV Introduction ...... 14
2.2.2 ISVV Technologies, Techniques and Methods ...... 17
2.3 Defects Classification Schemes ...... 22
2.3.1 Defects Classifications Studies Background ...... 22
2.3.2 Orthogonal Defects Classification (ODC) ...... 26
2.4 Root Cause Analysis ...... 27
2.4.1 Fishbone (cause and effects, Ishikawa) diagrams ...... 30
2.4.2 ...... 32
2.4.3 Failure Mode and Effects Analysis ...... 33
2.4.4 SIPOC ...... 33
2.5 Failure Analysis, Engineering Improvements and Empirical Studies ...... 34
2.6 Final Remarks ...... 37
Chapter 3 Process for Defects Assessment ...... 39
3.1 Overview of the Process ...... 40
3.2 Data Collection and Preparation ...... 43
3.3 Defects Classification ...... 44
3.4 Defects Root Cause Analysis ...... 44
3.5 Improvements and Validation ...... 45
3.6 Final Remarks ...... 46
Chapter 4 Data Collection and Preparation ...... 47
4.1 Overview of the Process ...... 48
4.2 Data Collection ...... 49
4.3 Data Preparation ...... 52
4.4 Defects in the Dataset ...... 54
4.5 Final Remarks ...... 56


Chapter 5 Defects Classification ...... 59
5.1 Overview of the Process ...... 60
5.2 ODC Classification Results ...... 63
5.3 Proposed Adaptations (ODC Enhancements) ...... 64
5.3.1 ODC Attributes – Activity ...... 64
5.3.2 ODC Attributes – Type ...... 65
5.3.3 ODC Attributes – Trigger ...... 66
5.3.4 ODC Attributes – Impact ...... 68
5.4 Enhanced ODC Classification Results ...... 70
5.4.1 Defect Type Results ...... 71
5.4.2 Defect Trigger Results ...... 73
5.4.3 Defect Impact Results ...... 74
5.4.4 Combined Results ...... 76
5.5 Validation of the Enhanced ODC ...... 83
5.6 Final Remarks ...... 85
Chapter 6 Defects Root Cause Analysis ...... 87
6.1 Overview of the Process ...... 88
6.2 Root Cause Analysis Results ...... 89
6.2.1 Enhanced ODC Defect Type RCA ...... 89
6.2.2 Enhanced ODC Defect Trigger RCA ...... 91
6.2.3 Late Detection RCA ...... 95
6.2.4 Prioritization of the Root Cause Analysis ...... 96
6.2.5 Improvements Suggestions ...... 97
6.3 Effort Spent on the Root Cause Analysis Activities ...... 100
6.4 Final remarks ...... 101
Chapter 7 Process Validation and Application in Multiple Domains ...... 103
7.1 Overview of the Process ...... 104
7.1.1 Definition and Validation of the Questionnaire ...... 104
7.1.2 Distribution of the Questionnaire ...... 106
7.1.3 Characterization of the Respondents ...... 106
7.2 Validation Results ...... 107
7.2.1 Relevance of RCA ...... 107
7.2.2 Feedback on the Defects Assessment Process ...... 108
7.2.3 Evaluation of the Quality of the Root Causes ...... 110
7.2.4 Evaluation of the Quality of the Measures ...... 116
7.3 Application to Multiple Domains ...... 120
7.4 Process Improvements as a Result of the Process Validation Activities ...... 122
7.4.1 Data Collection and Preparation Improvements ...... 122
7.4.2 Defects Classification Improvements ...... 123
7.4.3 Root Cause Analysis Improvements ...... 124
7.4.4 General Process Improvements ...... 125
7.5 Final remarks ...... 125


Chapter 8 Conclusions and Future Work ...... 127
8.1 Discussion ...... 128
8.2 Threats to Validity ...... 130
8.3 Future work ...... 131
References ...... 133
Annex A. Defects Assessment Questionnaire ...... 143
A. General Questions ...... 148
B. Defect Analysis Process ...... 151
C. Defect Development Causes ...... 154
D. Defect Detection Causes ...... 155
E. Defect Avoidance Measures ...... 156
F. V&V Measures ...... 157
Annex B. Defects Analysis Textual Responses ...... 163
Annex C. Summary of the Results of the Survey ...... 193
Annex D. Example of Data Collection Template ...... 199
Annex E. Definitions and Abbreviations ...... 201
A. Definitions ...... 201
B. Abbreviations ...... 203


List of Figures

Figure 1: V-model example ...... 10
Figure 2: International standards for safety critical systems ...... 11
Figure 3: Software increases and software related failures in space systems ...... 12
Figure 4: US Aircraft Software Dependence ...... 12
Figure 5: Risk Categorization of systems according to interactions and coupling ...... 13
Figure 6: Growth of Airborne Software ...... 13
Figure 7: ISVV phases ...... 15
Figure 8: Fishbone diagram analysis example ...... 31
Figure 9: Overview of the proposed process ...... 40
Figure 10: General Process Definition ...... 41
Figure 11: Data collection and preparation procedure ...... 48
Figure 12: Defect type versus defect impact ...... 79
Figure 13: Defect triggers versus defect impacts ...... 80
Figure 14: Defect triggers versus defect types ...... 82
Figure 15: Root Cause Analysis Overview ...... 88
Figure 16: Process Recommendation Distribution ...... 108
Figure 17: Development Defects Root Causes ...... 112
Figure 18: Failure of Detecting Defects Root Causes ...... 114
Figure 19: Development Measures ...... 117
Figure 20: V&V Measures ...... 119
Figure 21: Defect Assessment Process ...... 152


List of Tables

Table 1: ISVV Severity Levels ...... 16
Table 2: ISVV Issues results ...... 17
Table 3: Techniques referred in standards ...... 19
Table 4: Main testing techniques referred in aerospace standards ...... 21
Table 5: ODC attributes description ...... 27
Table 6: Fishbone Common/Proposed Categories ...... 32
Table 7: Example of simple FMEA headers ...... 33
Table 8: Template of SIPOC Diagram ...... 34
Table 9: Generic characterization of the subsystems contributing to the dataset ...... 50
Table 10: Dataset of ISVV defects ...... 55
Table 11: ISVV original defect types classification ...... 56
Table 12: Mapping between activities and triggers ...... 62
Table 13: Original ODC classification results (731 defects) ...... 64
Table 14: Standard ODC Type to Adapted Taxonomy ...... 66
Table 15: Standard ODC Trigger to Adapted Taxonomy ...... 68
Table 16: Standard ODC Impact to Adapted Taxonomy ...... 69
Table 17: Enhanced ODC classification results (1070 defects) ...... 70
Table 18: Specific Impact distribution for every defect type ...... 71
Table 19: Specific Impact distribution for every defect trigger ...... 73
Table 20: Phase of introduction versus phase of detection ...... 77
Table 21: Defects detected late by Implementation Verification ...... 78
Table 22: Defect types with high impact (Capability, Reliability and Maintenance) ...... 79
Table 23: Defect triggers with high impact ...... 81
Table 24: Defect triggers detecting specific defect types ...... 83
Table 25: Effort Spent for the different ODC related activities ...... 85
Table 26: Root Causes vs Defect Types with real examples ...... 90
Table 27: Summary of root causes for main defect types ...... 97
Table 28: Summary of root causes for main defect triggers ...... 99
Table 29: Effort Spent for the root cause analysis activities ...... 101
Table 30: General Questions Summary ...... 107
Table 31: Experts Background knowledge Questions ...... 108
Table 32: Amount of the Proposed Root Causes and Measures ...... 111
Table 33: Difference between Aerospace experts answers and others for Q14 ...... 113
Table 34: Difference between Aerospace experts answers and others for Q16 ...... 115
Table 35: Amount of the Proposed Measures ...... 116
Table 36: Difference between Aerospace experts answers and others for Q18 ...... 118
Table 37: Difference between Aerospace experts answers and others for Q20 ...... 119



Chapter 1 Introduction

"If builders built buildings the way computer programmers write programs, the first woodpecker that came along would have destroyed all civilization" -- Gerald Weinberg

Software is becoming more and more ubiquitous, and the importance and complexity of software systems in the safety critical domains are constantly increasing. A safety critical system is a system whose failure might result in the loss of human life, damage to the environment, or a severe incident or accident. Safety critical systems, which strongly rely on software, are nowadays an essential component of all aerospace, automotive, railway, nuclear, defense and medical systems. However, in the past 30 years, there has been a significant number of software problems that caused accidents and failures with severe impact within safety-critical systems, e.g., the Therac-25 [14], the Ariane 5 explosion [15], the Boeing 777-200 accident (registration 9M-MRG) [16], the Boeing 787 Dreamliner integer overflow bug that could shut down the electrical power [17], the Toyota Prius brake problems [18], or Toyota's electronic throttle control system (ETCS), whose bugs could cause sudden unintended acceleration [19], [20].

Safety and mission critical systems nowadays rely on more and more complex software and are difficult to control while guaranteeing the highest levels of quality and dependability. These systems must deal with the effects of faults and failures and, even with the maturity and advances of software engineering, it is not possible to create “perfect” systems or software [21]. However, safety and mission critical industries have kept an impressive safety record (considering the growth in complexity and size), mostly due to the large effort spent on developing and validating their systems, the application of mature international standards and strict guidelines [22], and the heavy use of standard-based Verification and Validation (V&V) methodologies [23]. In fact, these systems are developed according to very strict rules and guidelines, mostly due to the need to be qualified and certified: for human safety it is not uncommon to require the system to be designed to lose less than one life per billion (10⁹) hours of operation, and consequently it needs to follow specific and strict development standards [22] that recommend or impose techniques and processes, dedicated personnel training and extensive domain expertise. When critical systems fail and accidents cannot be effectively avoided, lives are in danger, extremely expensive systems are lost or damaged, and there are significant economic impacts and negative company exposure.

In fact, data and lessons learned collected over several years of industrial experience have shown that safety critical systems engineering is not perfect, and relevant issues are still transferred from phase to phase [22], [24], [25]. Even in very strict engineering processes (such as in the railway and aerospace domains) these issues have critical impacts, might mask other issues, and become costly to correct and maintain, while providing lower trust in the system. Ebert states that “Applications that enter testing with an excessive volume of defects cannot exit the testing phase because they don’t work” [22]. Safety critical systems require stable requirements and have several inflexible requirements (constraints) to be fulfilled [26]. These systems are also known for the demanding integration and validation efforts (on average, 40% of the software engineering effort as per [27]), as they are not only generally embedded but also require evidence to guarantee high dependability levels.

The increased importance and complexity of safety-critical systems is imposing new objectives in terms of safety and dependability (quality), and at the same time revealing that the current engineering techniques and applicable standards are probably not enough to reach the safety levels required by society. In fact, these systems are still causing (too many) severe accidents, and failures keep being introduced and propagated in all the lifecycle phases (for a concrete survey of real examples see [28]).

Informally, we can define safety as “nothing bad will happen”. Leveson, in her book Safeware [14], defines safety as the “freedom from accidents or losses”. Storey, in Safety-Critical Computer Systems [29], defines a safety-critical system as a system that “will not endanger human life or the environment”. Despite all the possible similar and generic definitions of safety, in the frame of our work we consider safety as the “freedom from the occurrence or risk of danger, injury or loss”. No system can be completely “safe”, thus engineering needs to focus on making it safe enough, knowing that there are constraints like budget, time, and resources. Two strategies have been followed in industry to achieve these goals: i) focusing on eliminating the end effects of accidents rather than the risks [14]; and ii) focusing on removing hazards before the actual accidents occur [29].

This work addresses the problem of systematically studying the existing problems in safety critical projects and analyzing them, identifying the potential root causes and proposing solutions for avoiding their recurrence and, consequently, their negative impact on the system.


1.1 Engineering Safety Critical Systems

Processes for developing critical systems are usually based on the waterfall/V-model. The V-model defines a development process that is an extension of the waterfall model, but instead of having the phases moving down linearly, the V-model process steps start from the conceptual phase, move to the requirements, architectural and implementation (coding) phases, and then bend upwards to the testing phases and to operations and maintenance. In the V-model, there is not only traceability between subsequent phases, but also horizontal traceability along the V, between the testing phases and the development phases. This traditional model, which is normally connected to the applicable international standards, provides the basic phases of the engineering process, guides the structure of development and V&V, and is sequential (waterfall). The sequential nature of the engineering of these systems is considered essential for managing communications, scale and complexity, for integrating multidisciplinary teams and managing the integration through well-defined phases, and for promoting traceability between the phase artefacts to facilitate certification (as required by the safety critical standards). Safety critical systems are very sensitive to failures, and the ultimate goal is to avoid them at all costs, as “failure is not an option”.

Since empirical research helps in integrating research and practice, and empirical data and knowledge are essential to understand and respond adequately to the dynamics of engineering, to build upon what is already known, and to act in order to adjust and correct the situations that caused the issues, the importance of empirical data is undeniable. Empirical data include not only the knowledge of the development frameworks and processes, but also the actual data from failures, defects and identified issues, as well as the analysis of what led to these defects in the first place (be it a human, a process or a technology related root cause). However, studying defects of safety critical systems seems to be different from non-critical systems, not only because the nature of the defects is different, but also because the frequency of these defects is rather different, for example:

- Industry average: “about 15-50 errors per 1000 lines of delivered code” [14];
- Microsoft applications: “about 10-20 defects per 1000 lines of code during in-house testing, and 0.5 defect per KLOC in released product” [14];
- Space: as low as 3 defects per 1000 lines of code during in-house testing and 0.1 defect per 1000 lines of code in the released product [30];
- Best code: 0.5 to 1 defect per KLOC [31];
- NASA: down to 0.004 defects per KLOC, but at a cost of 1000$/LOC compared to 25$/LOC for commercial code [31].
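To put these densities in perspective, the short sketch below is purely illustrative: the 100 KLOC system size is a hypothetical choice of ours, and the density values are simply the figures quoted in the list above.

    # Illustrative sketch only: expected residual defects for a hypothetical
    # 100 KLOC system, using the per-KLOC densities quoted above.
    DENSITIES_PER_KLOC = {
        "industry average (lower bound)": 15.0,
        "Microsoft (released product)": 0.5,
        "space (released product)": 0.1,
        "best code (lower bound)": 0.5,
        "NASA (best reported)": 0.004,
    }

    def expected_defects(kloc, density_per_kloc):
        """Expected number of residual defects for a system of the given size."""
        return kloc * density_per_kloc

    size_kloc = 100  # hypothetical system size
    for label, density in DENSITIES_PER_KLOC.items():
        print(f"{label:32s}: ~{expected_defects(size_kloc, density):g} defects")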

The general idea is that a fault density of 1 fault per thousand lines of code (KLOC) is a world-class value for safety critical systems [32], and some studies do provide an insight into the real values. For example, the Space Shuttle software [33] reaches fault densities lower than 0.1 per KLOC, but this is considered an exceptional result, and its development costs are higher than those of any other documented software development. In the aeronautics domain, the C130J software, developed according to DO-178B [34], was subject to a retrospective static analysis funded by the UK MoD (Ministry of Defence) that concluded the existence of about 1.4 safety critical faults per KLOC (the overall flaw density was about 23 per KLOC [35]). In commercial software, however, the fault density is commonly higher, usually up to 10 faults per KLOC, while pre-release fault densities can go up to 30 per KLOC [36], [37]. Other studies determine that the typical values for commercial software are up to 4 faults per KLOC, and confirm that the best fault densities achievable for safety critical software are between 0.1 and 1.0 per KLOC (in line with the values presented before) [38]. Lastly, in one of our studies [25], with data collected from 10 years of Independent Software Verification and Validation (ISVV), the fault density prior to ISVV for space systems is at least 0.97 defects per thousand lines of code. Although the fault density might not be stable when safety critical systems become large and complex, if we consider a stable rate of 1 per KLOC for a system with 100 KLOC, we are already talking about 100 safety critical faults.

In what concerns software failure rates, their estimation and collection is harder than for software fault density. Ellims [39] studied the failure rates in the automotive industry, where most accidents are caused by driver action and, of the remaining ones, the majority has mechanical causes. Based on 0.1% of the recalls being due to software, Ellims estimated that (severe) software issues cause a maximum of 5 deaths and 300 injuries annually in the UK. Taking into account the 5 million vehicles on the road and 300 hours of driving time per year, the failure rate for software becomes 0.2 × 10⁻⁶ failures/hour (causing injury or death). Shooman [40] did a software fault analysis for the aviation industry where he reached a value of 10⁻⁷ failures/hour, and McDermid [32] also calculated a value of 10⁻⁷ fatal accidents/hour attributable to software causes in the aviation industry.
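The order of magnitude of Ellims' automotive estimate can be checked with a back-of-the-envelope calculation; the reconstruction below is ours and simply combines the figures quoted above (about 5 deaths plus 300 injuries per year, 5 million vehicles, 300 driving hours per vehicle per year):

\[
\lambda_{\mathrm{software}} \approx \frac{(5 + 300)\ \text{casualties/year}}{5\times 10^{6}\ \text{vehicles} \times 300\ \text{h/year}}
= \frac{305}{1.5\times 10^{9}\ \text{vehicle-hours}}
\approx 2\times 10^{-7}\ \text{failures/hour},
\]

which is consistent with the 0.2 × 10⁻⁶ failures/hour figure quoted above.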
The safety critical industry relies on strict rules and the application of international standards. A comparison of some of the most important standards, with particular focus on Verification and Validation, has been performed in [23]: we concluded that the basic contents and guidelines of these standards are common, and some industries provide only particular additions in what concerns the techniques and the way of presenting the evidence (with safety cases, or specific evidence formats, for example). Several of these standards have, in fact, common roots, and reuse the lessons learned from the application of other standards in different domains. The safety critical standards are generally mature, well established and, most of the time, updated regularly (although DO-178B [34] is from the 1990s, DO-178C [41] is already available and in use). However, they might be too generic and become outdated considering the new technologies, systems complexity and new software responsibilities (e.g. FPGA, ASIC, more intelligence, new safety restrictions, reconfiguration, security, emergent behaviors, etc.). For the safety critical industries, the application of standards is not optional, and all development must strictly follow them. Some industries have a “certification” body that needs to ensure that the full system is developed according to the requirements in the standards. Other industries are less strict and do not require a formal certification by an authorized body, but still use the standards as mandatory guidelines for the development and acceptance of the systems. Examples of the first case are the aviation and the railway industries, which need to have all systems certified by the certification authorities before any market usage. In the second case, we can include the space industry, where certification is not required but following the applicable standards is highly recommended and the systems are qualified before the acceptance phase by the international space agencies. Examples of safety critical standards for different domains are (some of these have been used during this research, as we will see later): Space Domain - European Cooperation for Space Standardization (ECSS) series (e.g. [42] and [43]) and NASA standards (e.g. [44]); Airborne Domain - DO-178B/C (software related) [34], [41], DO-254 (hardware related) [45], ARP-4761 [46], ARP-4754 [47]; Automotive Domain - ISO-26262 [48]; Railway Domain - CENELEC EN-50126 [49], EN-50128 [50], EN-50129 [51]; Automation Domain - IEC-61508 [52], IEC-61511 [53], [54], [55], IEC-62061 [56]; Medical Domain - IEC-62304 [57]; and Nuclear Energy Domain - IEC-60880 [58]. Some standards last much longer than technology (e.g. DO-178B, ARP), while others are not yet mature and widely accepted (ISO-26262). Practical experience and feedback from the ECSS working groups show how these standards evolve, and this evolution does not happen in a systematic and structured way (due to pressure from tool suppliers, influential companies pushing their preferences, technology preferences or experience-driven preferences, etc.). A recent case is DO-178C, which evolved from the B version due to technology evolutions and trends (object-oriented programming, use of formal methods, improved testing requirements) and to industrial/commercial pressure.

In summary, engineering critical systems with very low defect rates is a challenging objective. The existing technological domains follow different standards and processes, and are integrated in different development cultures, which leads to diverse defect rates and different types of problems. Engineers can learn from the most successful domains, but it will certainly come with a cost. Alternatively, they can learn from the mistakes and problems in their own domain, and improve gradually on the basis of real defects and real deficiencies. Within each domain, these issues can be related to the maturity or suitability of the standards, the culture of development, V&V or safety, the applied techniques, processes and tools, the engineers' experience and training levels, and the managerial constraints (time, cost, customer). The study of these concrete problems (and the related solutions) for a specific domain is what is intended by this work.

1.2 Contributions of the Work

The goal of this research is to propose an approach to identify quality gaps and, consequently, to improve systems engineering based on data about the quality of engineering execution. This approach is hereby called an assessment process or assessment framework. In practice, the three main pillars of the work are the use of empirical data about defects and issues from critical systems, the identification of root causes and the relation of these issues and defects to the engineering processes, and the measurable improvement in reducing the frequency and severity of issues and defects in critical systems.

The usage of empirical data about defects and issues from critical systems enables the provision of factual background intelligence about the dependability of systems, provides concrete evidence of issues, defects and discrepancies and their nature, and allows confirming or denying the idea that critical systems based on strict requirements and international standards generally achieve very high quality. Root cause identification and relating the issues and defects to the engineering process components is key to pinpoint what effectively contributed to the existence of review item discrepancies, to determine the full (and sometimes complex) chain of events, resources and processes that lead to the problems, and to provide specific solutions to avoid the issues and defects in the future. The measurable improvement in reducing the frequency and severity of issues and defects for critical systems is a central goal to demonstrate quantitatively that the systems have improved, to clearly measure the impact of changing resources or processes (either positive or negative/no impact), and to provide concrete feedback to the framework and thus help in improving the whole systems engineering and validation processes.

In detail, the main contributions of this research can be summarized as follows:

- The definition of a defects assessment process/framework that can be used and applied in industry in a simple way and independently of the industrial domain. The framework makes use of a dataset collected from existing issues and process deficiencies, and supports the analysis of these data towards identifying the root causes of those problems and defining appropriate measures to avoid them in future developments.
- The adaptation of the Orthogonal Defect Classification (ODC) [92] for critical issues (integrated in the process/framework). In practice, ODC (from IBM) was used as an initial classification and then refined according to the gaps and difficulties found during the initial stages of our defects classification. The refinement was applied to the defect types, defect triggers and defect impacts, and improved taxonomies for these three parameters are proposed.
- The integration of a root cause analysis process to relate the issues (or issue groups) with the engineering properties. The engineering properties (e.g. human and technical resources properties, events, processes, methods, tools and standards) are, in fact, the principal input for the classes of root causes. A fishbone root cause analysis is proposed, integrated in the process/framework and applied to the available dataset.
- The identification of a set of root causes and applicable measures to improve the quality of the engineered systems. These allow the provision of quick and specific feedback to the industrial engineering teams as soon as the root causes are identified. The list/database has been compiled from the dataset and includes the feedback and contributions of the experts who responded to the process/framework validation survey.
- The promotion of a cultural change to appropriately use real defects data, which are the main input of the process/framework, and which shall be easily collected, cleaned and updated. The regular use of defects data will contribute to measuring the quality evolution and the progress of implementation of the corrective actions or improvement measures that are the output of the process/framework.

These contributions are described in this thesis with concrete results and a specific case study application. In practice, the central contribution is the defects assessment process/framework (reusable across different domains or industries) that supports analysts and engineers in the classification of issues with a defined (adapted) orthogonal issue classification, and allows the identification of the relevant root causes, which are mapped to the systems engineering elements in order to provide measures and improvement recommendations (these cover suggestions to improve the standards, the development and V&V techniques, the training of human resources, or modifications to the organization or to the project lifecycles and management). The assessment process/framework is flexible enough to be updated and to receive feedback from the implementation of the measures or from expert judgement. The classification schemes and the root cause analysis techniques applied are not necessarily attached to the process, and other techniques can be applied if deemed necessary or if they prove more efficient. As a practical result of the work, a dataset including European space engineering issues has been collected and used through the full cycle of the process/framework. The feedback obtained can be provided to the community in order to improve the development and V&V processes (some of these results have already been provided to European space industries). The resulting root causes identified and the measures proposed can be used by the space industry, but also by other safety critical industries, to adapt and improve existing standards (ESA ISVV Guide, ECSS, DO-178, ISO 26262, CENELEC, etc.), in particular by identifying missing V&V activities and promoting a regular/enforced application of the proposed measures.
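To make the nature of the framework's working data concrete, the sketch below shows one possible representation of a classified defect record as it moves through the process (classification followed by root cause analysis). The field names and example values are illustrative assumptions of ours, not the exact taxonomy, which is defined in Chapter 5.

    # Illustrative sketch of a classified defect record handled by the proposed
    # defects assessment process. Field names and values are hypothetical; the
    # actual enhanced ODC taxonomy is introduced in Chapter 5.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DefectRecord:
        identifier: str        # unique issue id from the issue-tracking system
        activity: str          # ODC activity being performed when the defect was found
        trigger: str           # ODC trigger: what exposed the defect
        defect_type: str       # ODC defect type: the nature of the required fix
        impact: str            # ODC impact: effect on the system or user
        phase_introduced: str  # lifecycle phase where the defect was introduced
        phase_detected: str    # lifecycle phase where the defect was detected
        severity: str          # severity assigned during the independent assessment
        root_causes: List[str] = field(default_factory=list)  # filled by the RCA step

    # Example record (values are made up for illustration only):
    issue = DefectRecord(
        identifier="ISVV-042",
        activity="Design Verification",
        trigger="Design Conformance",
        defect_type="Interface",
        impact="Reliability",
        phase_introduced="Architectural Design",
        phase_detected="Implementation Verification",
        severity="Major",
    )
    issue.root_causes.append("Incomplete interface specification")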

1.3 Structure of the Thesis

This chapter introduced the context, the problem and the main contributions of the thesis.

Chapter 2 presents background concepts relevant for the current work, as well as existing related work. It covers the dataset sources (namely ISVV activities), defects classification schemes, types of procedures for root cause analysis, and background studies on defect/failure analysis and engineering improvements.

Chapter 3 depicts and details the approach for defects assessment. The chapter overviews the proposed approach, then details the process/framework steps one by one, and finally presents a generalization of the utilization and outcomes of the process/framework in order to make it usable in any industrial domain.

Chapter 4 explains the defects data collection and data preparation procedures. In practice, the chapter presents an overview of the data collection and preparation process, then details the data collection and the data preparation, and discusses some data quality, confidentiality and availability issues. Furthermore, the dataset used in this work is described.

Chapter 5 describes the defects data classification procedures that lead from the original ODC taxonomy to the Enhanced ODC proposed for critical systems in the frame of this work. The Enhanced ODC taxonomy is explained and justified in this chapter, which contains an overview of the adaptation procedure, a detailed description of the modifications proposed to the original ODC taxonomy, and the validation strategy applied to the new taxonomy.

Chapter 6 presents the detailed results of the application of the Enhanced ODC proposed in Chapter 5 and of the Process for Defects Assessment defined in Chapter 3. The results support the characterization of the problem types, triggers and impacts associated with the defects under analysis. Based on these results, this chapter also discusses the root cause analysis derived from the study of the defects, while presenting hints on how the root cause analysis and the obtained results can be extended/applied to different critical systems domains.

Chapter 7 presents the procedure used to validate the results of the defects assessment process and how the resolution of the root causes can be verified in the long term and applied to any critical systems domain. In practice, this chapter presents the results of a survey submitted to a significant number of experts, in which the procedure to analyze defects, derive root causes and identify solutions was tested and commented on.

Finally, Chapter 8 concludes the thesis, summarizing the lessons learned, evidencing the potential of the proposed solutions, and presenting the weaknesses that we believe should be tackled as future work.

This document contains 5 annexes. Annex A contains the details of the survey provided to the experts for validation of the defects assessment process and of the obtained results. Annex B provides the textual responses given by the experts, including comments on the process and additional root causes and measures proposed. Annex C includes the results of the answers to the quantifiable questions of the survey. Annex D presents the description of the data collection elements as used for our dataset collection and preparation. Annex E presents the relevant definitions and abbreviations used throughout this document.


Chapter 2 Background and related work

In this chapter, we present relevant background concepts and related work. The first topic covered is the methodology used to detect and collect defects. In particular, in this work we used defects identified by a team independent from the development, design and validation teams, applying Independent Software Verification and Validation (ISVV) techniques. Any other defect detection and collection technique or method can be applied, as long as the defects contain enough detail to be analyzed. The second topic described in this chapter covers defect classification methods. This is the second step of our process and is important to allow grouping of defect types and triggers in order to better define generic corrections and improvements. During our work we have improved the selected classification method, as described later in this document. The third topic covers root cause analysis methodologies. This is the part of the process where solutions are identified and improvements are proposed. We provide an overview of the existing background and related work, and describe the root cause analysis applied within our process later in this document. The next topic in the chapter summarizes the failure analysis studies that provided a background for this work. These studies, both academic and industrial, are an essential part of the work, as they provided inspiration for the structure of our proposed overall approach. The fifth topic concerns software engineering improvement studies. The objective is to analyze what has been proposed so far, and how, in order to shape our process in the best way to achieve real improvements in software engineering. The last topic described in this chapter is related to empirical analysis for critical systems. Empirical studies are particularly important, as they are the most realistic sources of corrections and improvements, although they are quite difficult to make publicly available due to confidentiality issues.


This chapter starts with a short description of background concepts and motivation; we then describe the ISVV activities, the defect classification schemes studied, the root cause analysis techniques considered, and the failure analysis and engineering improvement studies analyzed, closing with some final remarks on these topics.

2.1 Background Concepts and Motivation

This section presents two general concepts applicable to our study and some motivation for the work performed in the rest of this document.

2.1.1 General Concepts

Most traditional systems are still developed according to the generic V-model (Figure 1), which companies adapt on a case-by-case basis to their needs and according to their experience. Most of the concepts used in this work consider a similar development lifecycle, but they can be applied to any other lifecycle, as discussed in our conclusions. This V-model encompasses the different phases that range from System Concept down to System Operations, and our defects analysis process can be applied to defects arising from any of these phases.

Figure 1: V-model example

Figure 2 depicts a set of important industry standards and shows some relations between them; for example, several standards have been derived from the more "generic" IEC 61508 standard. As shown, there are several (overlapping and complementary) standards for each domain. On one side, IEC 61508 is a quite generic and high-level standard that was used to derive very specific and focused standards. From this list, the most recent is ISO 26262, which is not yet framed by a certification process, as such a process does not exist for the automotive domain. The airborne standards are complementary (e.g. DO-178 for software, DO-254 for hardware, ARP-4761 for safety), and the same is true for the ECSS and NASA standards for space, as they are composed of a set of different standards and handbooks covering different areas of the engineering processes.

Figure 2: International standards for safety critical systems

2.1.2 Systems and Software Growth and Complexity

The best safety-critical software fault densities (between 0.1 and 1.0 faults per KLOC) are still not sufficient, since these systems cannot fail, nor contain faults that can cause incidents, accidents or loss of human life (often when we least expect it). Typical values for commercial software are several times higher (up to 30 faults per KLOC) [38]. With the growing size, complexity and percentage of software in safety-critical systems, the opportunities for software-related problems also increase, thus the development and V&V techniques must also be improved.


Figure 3: Software increases and software related failures in space systems

Figure 3, presented at the ESA Software Initiative [59] with data originally collected by Cheng [60], shows that in the last three years depicted over half of the failures involved software. FSW SLOC indicates the Flight Software source lines of code. These graphs show exponential growth patterns both in size and in the occurrence of software-related issues. The situation becomes even more complicated (and extremely costly) when we consider that the recent Mars Rover contained about 3.8 million lines of code [61]. For aircraft software the trend is similar. Figure 4, from [62], shows that by the year 2000 about 80% of the functions of a US aircraft were already performed by software, whereas 40 years earlier this dependence was only 10%.
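To put these fault densities and code sizes in perspective, the minimal sketch below (illustrative figures only) combines the 0.1-1.0 faults/KLOC range quoted above with a code size of the same order as the Mars Rover flight software; it is a rough back-of-the-envelope estimate, not a result from the cited studies.

```python
# Illustrative only: residual faults implied by a best-in-class fault-density range.
# The density range comes from the text above; the code size is used purely as an example.

def residual_faults(sloc: int, faults_per_kloc: float) -> float:
    """Expected residual faults for a given code size and fault density."""
    return (sloc / 1000.0) * faults_per_kloc

SLOC = 3_800_000          # ~3.8 MSLOC, roughly the Mars Rover figure mentioned above
best, worst = 0.1, 1.0    # faults per KLOC (best-in-class safety-critical range)

print(f"{residual_faults(SLOC, best):.0f} to {residual_faults(SLOC, worst):.0f} "
      "residual faults for a 3.8 MSLOC system")
# -> roughly 380 to 3800 residual faults, even at best-in-class densities
```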

Figure 4: US Aircraft Software Dependence


NASA has also performed a detailed study on Flight Software Complexity [63]. This study confirms the growth tendency, but it also relates size with complexity (for example, software interactions and urgency of development), as shown in Figure 5 from [63]. The more complex (and critical) systems are the safety-critical ones, such as nuclear and chemical, aerospace and military systems.

Figure 5: Risk Categorization of systems according to interactions and coupling

Safety-critical code tends to be small (to remain controllable, maintainable and simple), but that is not always possible. In fact, airborne software has grown from about 10 thousand lines of code in 1980 to over 10 million lines of code nowadays [64], as shown in Figure 6.

Figure 6: Growth of Airborne Software


2.2 Independent Software Verification and Validation (ISVV)

ISVV stands for Independent Software Verification and Validation, and it is composed of a set of activities performed by an independent team that assesses the software and system artefacts in order to improve the quality of critical systems and detect defects. This section presents an introduction to ISVV and some details about ISVV techniques and methods.

2.2.1 ISVV Introduction

Any technique that allows the detection of defects can be applied in software engineering in order to improve quality. One may think of the traditional verification and validation (V&V) activities that are essential to any software engineering lifecycle. These are, commonly, tasks that identify issues and allow their "immediate" correction, most of the time improving the product under development rather than the software engineering process or the organization. In this work we have used defects detected during independent assessment activities, performed after the regular V&V activities have been carried out and the issues found have been corrected. ISVV is particularly targeted at critical software systems and is intended as an additional tool to increase the quality of software products, thereby reducing risks and costs throughout the operational life of software-based systems. ISVV supports engineers by providing assurance that the software performs according to the defined requirements, to the specified level of confidence and safety, and within its designed and intended parameters. ISVV activities are performed by independent engineering teams, not involved in the software development process, to assess the engineering processes and the resulting software products. The independence of the ISVV team shall be financial, managerial and technical. ISVV intends to go far beyond "traditional" V&V techniques, applied by the project engineering teams. While the latter aim mainly to ensure that the software performs well against the nominal requirements, ISVV is especially focused on non-functional requirements such as safety, performance, robustness and reliability, and on conditions that can lead to software or system failures. ISVV results and findings are fed back to the development teams for correction and improvement, and these modifications are later confirmed by the ISVV teams (acting similarly to independent safety assessors). ISVV is a set of structured engineering activities supported by tools that allow independent analysts to evaluate the quality of the software engineering artefacts produced at each phase of the engineering lifecycle. ISVV is commonly performed on mature artefacts, which follow strict engineering standards (due to the nature of the domains where ISVV is applied) and which have been previously verified and validated as part of the regular engineering processes. It provides an additional layer of confidence and is not expected to find a large number of severe defects. ISVV produces evidence that supports measuring the quality of the engineering processes, of the software, and of the human and organizational resources involved in the software engineering processes. ISVV is referenced in several international standards: a) the ISVV Guide from the European Space Agency (ESA) [65]; b) the ISO Software Lifecycle Processes (ISO/IEC 12207) [66]; c) IEEE Software V&V (IEEE 1012) [67]; d) the NASA IV&V Quality Manual [68]; and it is also mentioned in DO-178B [34].

Figure 7: ISVV phases

ISVV includes six basic phases that can be executed sequentially or selected/adapted as the result of a tailoring process based on a criticality analysis [65] (a minimal sketch of such a tailoring is given after this list). These phases are (see Figure 7):

- ISVV Planning: planning of the activities to be performed (based on the size and complexity estimations), definition of the ISVV level that will impact the set of techniques to be applied, System Criticality Analysis (through a set of Reliability, Availability, Maintainability and Safety (RAMS) activities), and selection of the appropriate methods and tools to be applied.
- Specification/Requirements Verification: verification activities for completeness, correctness, consistency, testability, etc.
- Architectural/Design Verification: verification of design adequacy and conformance to software requirements and interfaces, internal and external consistency checks, and verification of feasibility and maintenance.
- Source Code Verification: verification of the code for completeness, correctness, consistency and traceability through code inspections, code metrics analysis, coding standards compliance verification and static code analysis.
- Test Specification/Results Verification: verification of the test artefacts, which might include test specifications, procedures, results and reports, as well as traceability verifications and completion of test areas.


- Independent Validation: validation activities based on the identification of unstable components/functionalities and missing testing areas, promoting validation focused on error handling.
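As an illustration of how such a tailoring could be captured, the sketch below selects a subset of the six phases based on a criticality category. The criticality levels and the selection rules are hypothetical assumptions for illustration only and are not prescribed by the ESA ISVV Guide.

```python
# Hypothetical sketch: selecting ISVV phases from a criticality category.
# The phase names follow the list above; the criticality levels and the
# selection rules are illustrative assumptions, not taken from [65].
from enum import Enum

class IsvvPhase(Enum):
    PLANNING = "ISVV Planning"
    REQUIREMENTS = "Specification/Requirements Verification"
    DESIGN = "Architectural/Design Verification"
    CODE = "Source Code Verification"
    TEST = "Test Specification/Results Verification"
    VALIDATION = "Independent Validation"

def tailor_isvv(criticality: str) -> list:
    """Return the ISVV phases to execute for a given (assumed) criticality category."""
    full = list(IsvvPhase)
    if criticality == "high":       # most critical software: all six phases
        return full
    if criticality == "medium":     # keep the verifications, skip independent validation
        return full[:-1]
    return [IsvvPhase.PLANNING, IsvvPhase.REQUIREMENTS]  # low criticality

print([phase.value for phase in tailor_isvv("medium")])
```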

According to the ESA ISVV Guide [65], ISVV engineers classify defects considering three severity levels as described in Table 1.

Table 1: ISVV Severity Levels

Severity | Description | Comment
Comment  | The discrepancy found does not present any threat to the system. | The issue was raised as a recommendation that aims at improving the quality of the affected item.
Minor    | The discrepancy found is a minor issue. | Although it does not present a major threat to the system, its correction should be done.
Major    | The discrepancy found refers to the lack of pertinent information or presents a threat to the system. | The correction and/or clarification of the discrepancy are pertinent.

Each ISVV defect is also classified according to an ISVV defect type:

- External consistency: differences between implementation of artefacts between phases or with other applicable or reference artefacts (e.g. inconsistent documentation);
- Internal consistency: inconsistency against another part of the same artefact (e.g. different code for similar purposes, differences within the same document or architectural components);
- Correctness: item incorrectly implemented or with technical issues (e.g. erroneous implementation, wrong documentation description, bad architectural definition);
- Technical feasibility: item not technically feasible with the actual constraints (e.g. unattainable or impossible requirement, architecture not viable);
- Readability and Maintainability: item hard to understand and/or maintain (e.g. lack of comments or no description, requirements too complex or too generic);
- Completeness: item not completely defined or insufficient details provided (e.g. missing details, missing architectural components, insufficient requirements, not all requirements coded);
- Superfluous: item that is a repetition or brings no added value to the artefact (e.g. repeated requirements, copy-pasted code doing the same actions);
- Improvement: suggestion to improve any property of the artefact, usually not related to a single one of the other classification types (e.g. efficiency, simplicity, readability);


- Accuracy: the item does not describe something with precision or does not follow the applicable standard (e.g. measurement precision, calculation precision, exact implementation).
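A minimal sketch of how an ISVV finding could be recorded with the severity levels of Table 1 and the defect types listed above follows; the record structure and field names are illustrative assumptions, not a format defined by the ESA ISVV Guide.

```python
# Illustrative record for an ISVV finding. The severity and defect-type values
# mirror Table 1 and the list above; the field names are assumptions.
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    COMMENT = "Comment"   # lowest level: raised as a quality recommendation
    MINOR = "Minor"
    MAJOR = "Major"

class DefectType(Enum):
    EXTERNAL_CONSISTENCY = "External consistency"
    INTERNAL_CONSISTENCY = "Internal consistency"
    CORRECTNESS = "Correctness"
    TECHNICAL_FEASIBILITY = "Technical feasibility"
    READABILITY_MAINTAINABILITY = "Readability and Maintainability"
    COMPLETENESS = "Completeness"
    SUPERFLUOUS = "Superfluous"
    IMPROVEMENT = "Improvement"
    ACCURACY = "Accuracy"

@dataclass
class IsvvFinding:
    identifier: str
    artefact: str          # e.g. "SW Requirements Specification"
    description: str
    severity: Severity
    defect_type: DefectType

finding = IsvvFinding("RID-042", "SW Requirements Specification",
                      "Requirement lacks a timing constraint",
                      Severity.MAJOR, DefectType.COMPLETENESS)
```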

Some previous studies, based on ISVV results and data collection, have analyzed metrics, efficiency and efficacy of the ISVV results and of the techniques used within ISVV to identify defects in critical projects [69], [70], [71], [72] and [73]. The essential conclusions from these studies are that ISVV still finds a significant number of issues even after the regular V&V activities have been carried out, as seen in Table 2 from [71] (0.69 issues per requirement, 0.97 issues per 1000 lines of code, 0.32 issues per test case); that the identified issues are accepted by the customer with an acceptance rate of over 80%; and that all ISVV phases contribute effectively to finding defects, from specification analysis down to the very important independent test analysis. However, none of these studies used their observations and results to classify or group the defects and improve the development processes, techniques, tools, or standards at large; they were used to correct each individual issue one by one.

Table 2: ISVV Issues results

Metric               | Requirements | Design | Code | Test | TOTAL
RIDs per Requirement | 0.25         | 0.2    | 0.14 | 0.1  | 0.69
RIDs per 1000 SLOCs  | N/A          | N/A    | 0.24 | 0.73 | 0.97
RIDs per Test        | N/A          | N/A    | N/A  | 0.32 | 0.32
RIDs per hour        | 0.15         | 0.22   | 0.15 | 0.21 | -
% Major Issues       | 21%          | 15%    | 21%  | 21%  | -
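As a rough worked example of what these rates imply, the sketch below applies the overall RID rates of Table 2 to an invented project size; it is a coarse illustration (the three rates are simply added), not the estimation method used in the cited studies.

```python
# Rough worked example: expected ISVV findings for a hypothetical project,
# using the overall RID rates reported in Table 2. The project figures are
# invented, and summing the three contributions is a deliberately coarse estimate.
RIDS_PER_REQUIREMENT = 0.69
RIDS_PER_KSLOC = 0.97
RIDS_PER_TEST = 0.32

requirements, ksloc, tests = 1200, 85, 400   # hypothetical project size

expected = (requirements * RIDS_PER_REQUIREMENT
            + ksloc * RIDS_PER_KSLOC
            + tests * RIDS_PER_TEST)
print(f"~{expected:.0f} ISVV findings expected")   # ~1038 for these figures
```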

These previous studies have shown that existing standards and good engineering practices are not enough to guarantee the required levels of safety and dependability of critical systems. Independence of V&V avoids author bias and is often more effective at finding defects and failures. Independence can be managerial, financial and technical; it brings separation of concerns, complementarity and second/alternative opinions, and it also has the merit of pushing development and in-house V&V teams to focus on the quality of their work. The role of independence at early development phases is highlighted in [74] and clearly stated in the requirements of several standards, such as CENELEC [75] (depending on the SIL level) and DO-178 [34] (where, for example, for the most critical level, A, 33 out of the 71 objectives/requirements of the standard must be satisfied with full independence).

2.2.2 ISVV Technologies, Techniques and Methods

This section presents a summary list of commonly used techniques for the ISVV of critical systems, focused on aerospace (aeronautics and space) standards. These techniques are referred to in the safety-critical standards presented in Table 3, which are one of the targets for the improvements suggested by the assessment process, because this list and the current practice are not complete, some techniques are not properly applied or not applied at all in some domains, and some standards only suggest the use of certain techniques in a very generic way. Table 3 lists the main applicable V&V techniques that have been identified in the standards, organized in 12 groups. ECSS-E-ST-10-06C [76] is reported even though it does not describe V&V processes, because it does mention some V&V techniques. ECSS-E-ST-10-03C [77] and ECSS-Q-ST-20-10C [78] are not reported in Table 3 as they give little or no relevance to specific techniques. We note that all standards have elements from group 1 and group 11, that is, analysis, reviews, traceability and testing are common keywords of all the V&V processes and related aerospace standards surveyed. From this classification, DO-254 [45] is the standard that mentions the highest number of techniques, followed by ARP and ECSS-Q-ST-30C [79]. A marginal note is that FAA HDBK-006A [80] mentions Reliability Modelling, but specifies that "Reliability modelling requirements [..] should be limited to simple combinatorial availability models [...]; Complex models intended to predict the reliability of undeveloped software [...] generate a false sense of complacency." This sentence merges reliability and availability, but it also addresses the difficulties in predicting software reliability, currently another open research problem. Finally, we note that little emphasis is devoted to the schedulability analysis technique, even for safety-critical systems.


Table 3: Techniques referred in standards

Standards surveyed (13): ARP4754A + ARP4761; DO-178B + FAA 8110.49; DO-254; DO-178C + DO-178C supplements; ED-153; FAA HDBK-006A; ECSS-E-ST-10C; ECSS-E-ST-40C; ECSS-E-ST-10-02C; ECSS-E-ST-10-06C; ECSS-Q-ST-30C; ECSS-Q-ST-40C; ECSS-Q-ST-80C.

Technique group (number of surveyed standards referring to it):
1. Reviews, inspections (Fagan, walk-through, ...), analysis (traceability, static code analysis, HW/SW interaction, ...) - 13
2. FMEA, FMECA, FMES - 8
3. Hazard Assessment - 5
4. Modelling (SW reliability models, Finite State Machines, Petri Nets, Markov models, ...) - 10
5. Fault Trees, Dependence Diagrams - 8
6. Prediction methods - 3
7. Common Cause Analysis (CCA), Common Mode Analysis - 6
8. Functional Failure Path Analysis (FFPA) - 1
9. Formal methods, model checking - 5
10. Schedulability analysis - 1
11. Testing - 13
12. Similarity, service experience, failure statistics - 9

Table 4 presents the main testing techniques identified in the aerospace standards. Standards such as ARP-4754A [47], where attention to identifying specific testing techniques is minimal, are not reported. DO-254 [45] is not reported either, as its testing techniques are hardware specific (built-in, system bench, aircraft testing). We note the following: FAA HDBK-006A [80] mentions reliability growth testing for software systems (not reported in Table 4), which by its description seems simply to mean that a fault removal process should be applied to the developed software. ECSS-Q-ST-30C [79] discusses reliability, availability and maintainability testing, but gives no indication of the specific techniques to perform these tests (probably robustness and fault injection, but this is not clarified). As could be expected, the testing techniques most mentioned are interface and functional testing and integration testing, followed by input-based testing, unit testing and stress testing. Timing testing is mentioned explicitly only once, in ECSS-E-ST-40C [42] (which also mentions schedulability analysis). Apart from the information in the table, we also observe that ECSS-E-ST-10-03C [77] explicitly mentions test accuracy, tolerance, margins in test results and inaccuracies, and that ECSS-E-ST-40C [42] discusses intrusiveness of the environment, specifying "testing that the software product can perform successfully in a representative operational and non-intrusive environment".


Table 4: Main testing techniques referred in aerospace standards

Standards surveyed (10): Avionics - DO-178B + FAA 8110.49, DO-178C + DO-178C supplements, ED-153, FAA HDBK-006A; Space - ECSS-E-ST-40C, ECSS-E-ST-10-03C, ECSS-E-ST-10-06C, ECSS-Q-ST-30C, ECSS-Q-ST-40C, ECSS-Q-ST-80C.

Testing technique (number of surveyed standards referring to it):
1. Requirements-based - 3
2. Integration (hw/sw, sw/sw) - 5
3. Input-based (extensive inputs, normal-range, boundary, ...) - 4
4. Interface and functional - 5
5. Unit - 3
6. White box - 0
7. Black-box - 1
8. Grey-box - 1
9. Fault tolerance diagnostic - 1
10. Fault injection, failure injection - 2
11. Robustness - 2
12. Stress - 3
13. Performance - 2
14. Structure - 2
15. Low-level - 2
16. Implementation - 1
17. Isolation - 1
18. Closed loop - 0
19. Periodic testing during storage - 1
20. Destructive tests (e.g., burst test) - 1
21. Reliability, availability, maintainability - 1
22. Usability - 1
23. Mechanical, thermal, electrical - 1
24. Timing/schedulability - 1

Another relevant source of V&V techniques for aerospace applications is the ESA ISVV Guide [65]. During the production of this guide we also surveyed other domains and collected a list of relevant, non-exhaustive techniques that could be useful for independent verification and validation. The surveyed domains included space, aeronautics and nuclear. This guide details all independent verification and validation activities per lifecycle phase and according to component criticality, together with a suggested list of applicable methods and suggested checklists. The main methods/techniques described in the guide are:

- Formal Methods
- Inspection
- Modelling
- Data Flow Analysis
- Control Flow Analysis
- Real-Time Properties Verification
- Reverse Engineering
- Simulation (Design execution)
- Software Failure Modes, Effects and Criticality Analysis (SFMECA)
- Static Code Analysis
- Traceability Analysis

The impact of methods and techniques on the quality and dependability of software-based systems is undeniable, and we could go on listing more V&V and development techniques that somehow influence the engineering process.

2.3 Defects Classification Schemes

One of the first areas considered important for this research was the classification of issues (or defects). This is an essential part of the defined process, since it directs all the subsequent process phases and the results depend on a proper classification. The main topics researched include: defect classification, orthogonal defect classification, empirical data analysis, and classification taxonomies. This section presents these studies and then focuses on orthogonal defect classification.

2.3.1 Defects Classifications Studies Background

From the different possible defect classification taxonomies (e.g. as described in [81], [82] and [83]), the one adopted and consistently used by industry is ODC [84]. The list of taxonomies is extensive, sometimes tied to a specific engineering phase, and commonly questioned by practitioners, who tend to propose adjustments. These taxonomies are often complex (Kaner, Falk and Nguyen's taxonomy [85] contains about 400 types of defects). Some examples of the defect classification taxonomies considered are:

1) Beizer's Taxonomy [86],
2) Kaner, Falk and Nguyen's Taxonomy [85],
3) Robert Binder's Taxonomy [87],
4) Whittaker's "How to Break Software" Taxonomy [81], [82],
5) Vijayaraghavan's eCommerce Taxonomy [81], [82],
6) Hewlett Packard Taxonomy [88],
7) IEEE Standard Classification for Software Anomalies [89], [90],
8) Orthogonal Defect Classification [84], [91] and [92].

Some researchers have compared different taxonomies. In [83] the authors presented a framework for comparing six of the previous defect taxonomies, together with the results of the evaluation, and concluded that all of them presented deficiencies. The Freimut report [93] presents the aspects of a defect that have been measured in the literature and possible structures of a defect classification scheme, with examples of frequently used defect classification schemes. It also presents general methods to analyze defect classifications as reported in the literature, as well as concrete analyses for a variety of purposes. The Vallespir [83] analysis has shown that only two of the six analyzed taxonomies (from the list above: 1, 2, 3, 6, 7 and 8) guarantee orthogonality:

- The IEEE Standard Classification for Software Anomalies, and
- The Orthogonal Defect Classification (ODC).

The IEEE Standard Classification for Software Anomalies [90] provides a uniform approach to the classification of software anomalies, regardless of when they originate or when they are encountered within the project, product, or system life cycle. Data thus classified may be used for a variety of purposes, including defect causal analysis, project management, and software process improvement. However, this IEEE scheme might lead to taxonomies too extensive to be easily applicable in an industrial and recurrent context, and it has not been extensively used in industrial defect assessments. ODC [84], [91] and [92] is one of the most widely adopted defect classification approaches, originally proposed by IBM. In ODC a defect is classified across several dimensions: (1) type, (2) source, (3) impact, (4) trigger, (5) phase found, and (6) severity. There are only eight options for the defect type, making it easy to apply while still covering the defect type space. Defect triggers represent a limited list of detection techniques for finding the defects, connecting defect types and triggers. ODC is quite generic and applicable to different domains, but mostly oriented to defects originated in design, implementation and testing. From the multitude of studies and reports where defect classification has been applied and studied, we have selected some of the most important and relevant ones, which are presented and summarized hereafter.


Orthogonal defect classification using defect data to improve software development [94]: This paper describes ODC and illustrates how it can be used to measure development progress with respect to product quality and to identify process problems. The paper presents the results of a feasibility study conducted using ODC by the Motorola Corporate Software Center, Software Solutions Lab and the Cellular Infrastructure Group, GSM Products Division's Base Station Systems software development group.

The repeatability of code defect classifications [95]: This paper evaluates an adaptation of ODC with the Kappa statistic, using defect data from code inspections conducted during a development project. The results indicate classification repeatability in general, and improvements are suggested for some classes of defect categorization. The author notes that defect classifications are subjective, which is why it is necessary to ensure that they are repeatable (i.e. not dependent on the individual).

Using defect patterns to uncover opportunities for improvement [96]: This paper presents the application of the Bellcore tool Efficient Defect Analyser (EDA), which supports ODC, to three case studies, identifies the main pitfalls of the classification (incomplete data, wrong classification, etc.) and provides some future directions.

Improving software testing via ODC: Three case studies [97]: This paper presents the results of applying ODC to three case studies in order to improve software testing. For the first case study, with a high-maturity development process, the study provided specific testing strategies to reduce field defects. For the second, a middleware project, it identified areas of system test that needed to be improved. For the third, a small project with a small team and an inadequate testing strategy, the study made the team acknowledge the project risks and schedule delays and proposed the missing testing scenarios. The authors claim that ODC helps identify actions to increase the efficiency and effectiveness of development and test.

Classification and evaluation of defects in a project retrospective [98]: This study consists of three investigations on an optical network project: a root-cause defect analysis (RCA) study, a process metric study, and a code complexity investigation. The authors correlated the classification and the root-cause analysis with the adherence to the applicable development process.

Empirical analysis of safety-critical anomalies during operations [99]: This paper presents results from applying ODC to analyze about 200 operational anomalies from seven different spacecraft systems. The authors found interesting (unexpected) classification patterns, which led to the identification of proposed improvements to the software, the development process and the operational procedures.

Defect categorization: making use of a decade of widely varying historical data [100]: This paper describes the results of aggregating historical datasets containing inspection defect data (with different categorization schemes). By using historical data and ODC-based classification, the authors intended to create models to guide future development projects. A very interesting set of recommendations for the classification of defects is provided in the paper.


Defect Classifications and Defect Types Revisited [101]: This short position paper summarizes the work on defect classification as applied in academia and industry, with similar classification schemes but none widely accepted. The paper identifies a set of challenges and proposes generic directions towards a more widely accepted and common classification.

A systematic literature review to identify and classify software requirement errors [102]: This paper presents a systematic literature review to develop a taxonomy of errors (i.e., the sources of faults) that may occur during the requirements phase of the software lifecycle. Improvement of requirements engineering and of the overall software quality are the goals of this new taxonomy. The study identified 149 papers from different domains related to requirements faults. The authors provided a categorization of the sources of faults into a formal taxonomy that provides a starting point for future research into error-based approaches to improving software quality.

Using orthogonal defect classification in a Norwegian software company [103]: This report presents a defect analysis and orthogonal classification made at a Norwegian company (not disclosed) by applying statistical methods. The authors found issues with the completeness of the defect report data, and analyzed the injection phase and the time to fix of the defects. They suggested some improvements, but mostly to the defect reporting process.

Software Defect Analysis - An Empirical Study of Causes and Costs in the Information Technology Industry [104]: This master thesis, collecting defect reports from three different types of projects, presents the results of a quantitative and qualitative analysis (root-cause analysis). The authors concluded that there are differences among project types with regard to the root causes of defects, as well as differences in the effort required to correct defects. It was not possible in this study to measure how these differences influenced the root causes.

Quality Evaluation and Improvement Framework for Database Schemas - Using Defect Taxonomies [105]: This paper proposes a taxonomy of defective patterns for database schemas. The authors identify four main classes of defects, namely complex constructs, redundant constructs, foreign constructs and irregular constructs. They develop some representative examples and discuss ways of improvement against three quality criteria: simplicity, expressiveness and evolvability. The proposed taxonomy and framework are applicable to quality assessment and improvement.

AutoODC: Automated generation of Orthogonal Defect Classifications [106]: This paper presents an approach and tool for automating ODC classification by casting it as a supervised text classification problem. The authors seek to obtain a better ODC classification system by integrating experts' ODC experience and domain knowledge into the learning process, proposing a novel Relevance Annotation Framework. The case study was from the social network domain and allowed a reduction of the manual classification effort with an accuracy of about 80%.

Classification of defect types in requirements specifications: Literature review, proposal and assessment [107]: The authors performed a literature analysis of requirements defect classification taxonomies; they mention ODC but conclude that it is more suited to classifying code defects, and they propose a modified classification for requirements defect types. The new classification was found to be essential to the analysis of the root causes of the defects and to their resolution. They also concluded that this new classification might still not be consensual, i.e. it might have different interpretations.

Classification of Software Defects Detected by Black-Box Testing: An Empirical Study [108]: This is a study and an ODC adaptation for black-box testing activities only. Li et al. made a detailed analysis for black-box types of tests, but the defect detection methods and techniques can be much broader, as in the datasets we used.

All the papers and reports introduced above are somehow related to the research work we developed. They use defect classification schemes and taxonomies, and some of them clearly point out that these schemes are not perfect, while suggesting improvements. Some papers go a bit beyond and identify root causes and propose improvements either to the classifications, to the defect reporting processes or to the development/validation processes. None of these studies clearly addresses safety-critical defects and maps them to the characteristics of these systems, namely the influence of the standards and certification processes. We feel, however, that all these works are a valuable input for our work.

2.3.2 Orthogonal Defects Classification (ODC)

The Orthogonal Defect Classification (ODC), originally proposed by IBM (Chillarege et al. [92]), is one of the most widely used defect classification approaches. It is intended to be generic and applicable to different technology domains, but it is mostly oriented to classifying defects found in design, code and testing. ODC defines eight attributes for defect classification, divided into two main groups: a) opener and b) closer. Three attributes (Activity, Trigger and Impact) classify the defect when it is discovered and are therefore part of the opener group. The other five attributes (Target, Type, Qualifier, Age and Source) are used when the defect is resolved, being thus part of the closer group. The full taxonomies for each attribute can be obtained from the ODC v5.2 specification [92]. A description of the ODC attributes is summarized in Table 5.


Table 5: ODC attributes description

ODC Attribute | Description
Activity  | The actual activity that was being performed at the time the defect was discovered. The main activities applicable to this work are: requirements verification, design verification, code verification, test verification and test execution.
Trigger   | The environment or condition that had to exist for the defect to surface.
Impact    | The effect that the team classifying the defect thinks it would have on the system if not corrected.
Target    | The high-level identity of the entity that was fixed.
Type      | The defect type is defined according to the fix that is necessary to remove it from the system. For that reason, it is best classified by the team/person who applied the fix to the defect (e.g. problems with documentation, interface or algorithm).
Qualifier | Captures the element of a non-existent, wrong or irrelevant implementation.
Age       | Categorizes the age of the defect, whether it is new or surfaced from a previous defect.
Source    | Describes the source of the defect in terms of its developmental history.

As in several of the previous works on defect classification taxonomies, we had to tailor the taxonomy for the attributes Trigger, Impact and Target, as described later in this document and presented in [1], [2] and [7], so that ODC better complies with the needs of space critical software systems. In those works, we analyzed the original ODC attributes and, with simplification in mind as well as usefulness for root cause analysis, we picked those three as the essential attributes for our process.
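A minimal sketch of an ODC-style defect record, reflecting the opener/closer split described above, is given below; the attribute values shown are illustrative and do not reproduce the full ODC v5.2 taxonomies.

```python
# Minimal sketch of an ODC-style defect record. Only the opener/closer split
# and the attributes discussed above are represented; the example values are
# illustrative and do not reproduce the full ODC v5.2 taxonomies.
from dataclasses import dataclass

@dataclass
class OdcOpener:
    activity: str   # e.g. "Requirements verification"
    trigger: str    # condition that made the defect surface
    impact: str     # effect on the system if not corrected

@dataclass
class OdcCloser:
    target: str      # entity that was fixed (e.g. "Requirements", "Code")
    defect_type: str
    qualifier: str   # missing / incorrect / extraneous
    age: str         # new or carried over from a previous defect
    source: str      # developmental history of the defective item

@dataclass
class OdcDefect:
    identifier: str
    opener: OdcOpener
    closer: OdcCloser

defect = OdcDefect(
    "D-017",
    OdcOpener("Design verification", "Design conformance", "Reliability"),
    OdcCloser("Design", "Interface", "Incorrect", "New", "In-house"),
)
```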

2.4 Root Cause Analysis

Once the defects are properly classified, we can perform a root cause analysis (RCA) in order to identify the sources and events that led to the occurrence or non-detection of the defects. RCA supports identifying why an issue occurred, rather than only identifying or reporting the issue itself; it also allows the identification of the underlying cause(s) of the issues and helps prevent additional rework by proactively addressing future recurrences of the issues. Some examples of root cause analysis techniques are [109]:

Five Whys: The "5 Whys" can show how causes connect; it simplifies the cause-and-effect relationship into a linear progression and typically focuses on the action causes.

Failure mode and effects analysis (FMEA): FMEA is a systematic procedure and tool that helps identify every possible failure mode of a process or product and determine its effect, locally or globally. The FMEA also ranks and prioritizes the possible causes of failures of a process or product, can determine the frequency and impact of the failure, and can suggest and implement preventive actions and compensating provisions.

SIPOC (Suppliers, Inputs, Processes, Outputs, Customers) diagram: SIPOC is a high-level tool that simplifies the variables of any given process into five segments: S for suppliers, I for inputs, P for process, O for outputs and C for customers. During brainstorming sessions, team members determine the variables that are relevant to the process under analysis.

Flowcharting of the process flow, system flow, and data flow: Performed by flowcharting the process, system and data flow, and assembling a group of experts to analyze the situation, while drafting a new version of the flowchart with the information related to the events, facts and justifications.

Fishbone diagrams: The fishbone method is a simple tool to identify the sources of causes: Man, Machine, Method, Material and Environment (there are variations of the categories used on the fishbone; another is People, Procedure, Hardware and Nature). The fishbone diagram is not intended to show how all these causes interact with each other, unless the analyst has experience in interaction analysis.

Critical to quality metrics: Critical to quality metrics are relevant measures of attributes of a part, product or process that are critical to quality, or that have a direct and significant impact on the actual or perceived quality.

Pareto charts: Studying and understanding data in the format of bar graphs that categorize the frequency of a certain type of event.

Statistical correlation: Identify relationships (correlations) between variables where they exist, and discount them where they do not, in order to take appropriate decisions.

Design of Experiments (DoE): DoE helps improve the capability of a process by identifying the process and product variables that affect the mean and the variance of the quality characteristics of a product.

The fact is that we do need to control the causes of problems [110] as a main objective, and problems cannot be solved without solving their causes [111]. To do this, project analyses, called retrospectives, are used; they are step-by-step processes [112]:

- Problem identification;
- Identification of the problem causes (using RCA) - the why [111];
- Identification of cause-and-effect relationships;
- Root causes detection [111];
- Improvement suggestions for the identified root causes.

RCA is performed more or less frequently in every domain. For software engineering, besides some of the articles already mentioned in Section 2.3 that are related to defect classification schemes, we can highlight the survey by Lehtinen et al. [113], in which they propose a new tool for RCA and survey another 35 existing tools to perform on-line RCA. They separate their analysis into software project retrospectives, RCA and distributed retrospectives, and the actual comparison of the 35 + 1 tools, but their goal was to tackle the on-line and distributed properties of software project teams. For the purpose of this work we have surveyed some relevant research works. Besides the papers already mentioned, the following papers and books provide relevant input for the part of our process where root cause analysis is applied.

Review of Root Causes of Accidents due to Design [28]: A Safbuild project report produced by Eurocontrol providing the results of a review, across different industry databases, of the proportion of accidents that have their root causes in design. It includes accidents from the aviation, railway and nuclear industries and concludes that about half of them have their root cause in the design phase, listing the most frequent root causes identified.

Root Cause Defect Classification (RCDC) for Documentation Defects [114]: Rao conducted an industry study on root cause defect classification for documentation defects, analyzing a few dozen defects on a monthly basis.

Defect Analysis and Prevention for Software Process Quality Improvement [115]: Kumaresh et al. conducted a study with data from a few hundred collected defects, where these defects were classified and the corresponding root causes were fed back to the projects as preventive ideas.

A case study in root cause defect analysis [116]: This paper presents a retrospective root cause defect analysis based on defects identified during different phases of a transmission network product. The authors present an RCA approach and classification, the results obtained, and the lessons learned. This work is related to [98].

Quantitative Analysis of Faults and Failures in a Complex Software System [117]: This paper presents the results of a quantitative study of the faults and failures of two releases of a commercial system. The authors studied correlations and fault prediction metrics and confirmed or refuted some evidence and connections, for example between component complexity and size and fault density.

Using defect analysis feedback for improving quality and productivity in iterative software development [118]: This paper deals with defect analysis as a feedback mechanism to improve quality and productivity in a software project developed iteratively. The authors discuss how defects found in one iteration can provide feedback for defect prevention in later iterations.

Root Cause Analysis: Simplified Tools and Techniques [119]: This book, first edited in 1999, describes the basic techniques and tools for root cause analysis.

Root Cause Analysis: Improving Performance for Bottom-Line Results [120]: This book describes RCA as a structured investigation of a problem to detect which underlying causes need to be solved.

Root Cause Analysis of Product Service Failure Using Computer Experimentation Technique [121]: In this paper, the authors propose a methodology for performing RCA and corrective actions in design by linking warranty failures with product design parameters. An analytical approach based on a computer experimentation technique performs RCA of product failures (linking warranty failure modes with design parameters) and identifies the analytical relationship between them. The authors identify the root cause(s) to address in-tolerance product design faults. The case study used was an automotive ignition switch.

Applying Root Cause Analysis to Software Defects [122]: In this short article, Kaushal highlights the importance of RCA, the need to obtain commitment from the institution's champions and managers, and that RCA is an investment that must focus on finding solutions to improve the overall processes.

The analyzed articles and books present RCA and retrospective analysis as the way to identify the root causes of faults or problems and address them, instead of treating the symptoms or effects. In safety-critical systems we need to take both into account: we can never ignore the symptoms and effects, and we care about reducing the possible causes of faults to a minimum (e.g. RAMS analyses for safety-critical systems). RCA is a process and a set of tools that grew out of accident/incident investigations and became a standard feature of modern engineering, yet it is still not so frequently applied in industry. If something is failing, instead of just fixing it at the point of discovery, RCA allows investigation and supports fixing the underlying causes at the point of origin.

2.4.1 Fishbone (Cause and Effect, Ishikawa) Diagrams

Some of the previous studies seem to conclude that there is no perfect or preferred root cause analysis technique; instead, a set of tools used together will produce better results. We hereby describe the commonly used fishbone (also called Ishikawa) diagram analysis, which can be complemented by any of the other analyses to support correlation analysis between the different attributes. The fishbone diagram, also called Ishikawa diagram (named after its originator Kaoru Ishikawa) or cause-and-effect diagram, helps, through brainstorming, to identify lists of causes of a problem and to group the causes into relevant categories. A fishbone diagram is a visual tool to look at causes and effects. It is a structured approach for brainstorming the causes of a problem (more structured than, e.g., the Five Whys tool). The problem or effect is displayed at the head or mouth of the fish, and possible contributing causes are then listed on the smaller "bones" under various pre-defined cause categories (see Table 6 for a few examples). A fishbone diagram is helpful in identifying possible causes for a problem that might not otherwise be considered, by directing the analysts to look at all the pre-defined categories and think of alternative causes. The analysts must be experienced in the problem domain and must also be aware of the process and other systems involved in the event under investigation. The tasks involved in constructing a fishbone diagram (see Figure 8 for a simple example) can be summarized in three main groups:

1. Define the problem
   - Problem statement: Clearly define the problem or the event/effect that is being analyzed. Position this problem (or effect) at the head or mouth of the "fish". Make sure the problem is not a solution and is clear to all the involved experts.
   - Categories definition: Defining the list of categories for the causes of the problems is a key step. Common categories often include equipment or supply factors, environmental factors, rules/policy/procedure factors, and people/staff factors. Common examples of main categories are: Man, Machine, Method, Material and Environment, or People, Procedure, Hardware and Nature.

2. Brainstorm
   - Brainstorm: Brainstorm the possible causes of the problem (leading to the effect). Ask "Why does this happen?" Note down the causes proposed by the participants in the brainstorming as separate branches of the corresponding main category (causes can relate to more than one main category and must then be written under several categories).

3. Identify causes
   - Keep asking why: Similarly to the Five Whys, keep asking "Why does this happen?" for all the identified causes. Note down the proposed sub-causes related to each cause.

Figure 8: Fishbone diagram analysis example


In particular, the categories definition is an important step to support the brainstorming and to direct the identification of causes. Table 6 presents a few commonly used sets of categories as well as the set of categories applied in this work, which results from a merge of the other categories (column 4 of Table 6).

Table 6: Fishbone Common/Proposed Categories

Services Industry (4 Ps) | Industry (6 Ms)              | Business Industry (6 Ms) | Proposed set of Categories for Software Systems
Policies                 | Machines                     | Method                   | Method
Procedures               | Methods                      | Man                      | Man/People
People                   | Materials                    | Management               | Management
Plant/Technology         | Measurements                 | Measurement              | Measurement
                         | Mother Nature (Environment)  | Material                 | Material
                         | Manpower (People)            | Machine                  | Machine
                         |                              |                          | Policies/Procedures
                         |                              |                          | Technology
                         |                              |                          | Environment
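A fishbone can be captured as a simple mapping from the proposed categories of Table 6 to lists of candidate causes; the sketch below uses an invented problem statement and invented causes purely for illustration.

```python
# Sketch: a fishbone diagram as a mapping from cause categories to causes.
# The categories follow the proposed set in Table 6; the problem statement
# and the causes are invented for illustration.
problem = "Major defects found by ISVV in the design phase"

fishbone = {
    "Method":              ["design review checklist incomplete"],
    "Man/People":          ["reviewers not trained on the interface standard"],
    "Management":          ["review effort cut to recover schedule"],
    "Measurement":         ["no metric tracking open interface issues"],
    "Material":            ["ICD delivered late and out of date"],
    "Machine":             ["modelling tool cannot check interface consistency"],
    "Policies/Procedures": ["standard does not require interface re-verification"],
    "Technology":          [],
    "Environment":         [],
}

for category, causes in fishbone.items():
    for cause in causes:
        print(f"{problem} <- {category}: {cause}")
```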

2.4.2 Five Whys

The Five Whys is a technique used in the Analyze phase of the DMAIC (Define, Measure, Analyze, Improve, Control) methodology. It is applied by repeatedly asking and refining the question "Why?" (five is simply a good rule of thumb, but it also represents five levels of questioning, which is more than a regular review process will bring to a critical system); in this way the different levels of symptoms are identified, eventually leading to the root cause of a problem. The Five Whys can be used individually or as part of the fishbone diagram, thereby supporting the exploration of all the causes that result in a single defect or failure. The Five Whys technique can be applied to drill down to the root causes once all inputs are established on the fishbone. An example of five drilled-down questions, attributed to the creator of the Five Whys technique, Taiichi Ohno, is:

1) "Why did the robot stop?" - The circuit has overloaded, causing a fuse to blow.
2) "Why is the circuit overloaded?" - There was insufficient lubrication on the bearings, so they locked up.
3) "Why was there insufficient lubrication on the bearings?" - The oil pump on the robot is not circulating sufficient oil.
4) "Why is the pump not circulating sufficient oil?" - The pump intake is clogged with metal shavings.


5) “Why is the intake clogged with metal shavings?” - Because there is no filter on the pump.

2.4.3 Failure Mode and Effects Analysis

The Failure Mode and Effects Analysis (FMEA) process is a proactive process used to systematically analyze specific or vulnerable areas of a system or process. It is commonly used for system assessment and development, unlike the other root cause analysis techniques, which are applied once the problem has occurred. FMEA supports the prevention of defects and problems by identifying, early in the lifecycle phases, the causes that might be hazardous and might lead to these problems. FMEA also provides detection, mitigation and elimination measures that can be implemented before any of the analyzed failures actually occur. An FMEA is usually managed in a tabular format, where each row represents one specific failure mode (a failure situation) and the columns contain, for example, the contents specified in Table 7.

Table 7: Example of simple FMEA headers

Column Heading     | Description
Item No.           | A unique identifier for each row in the analysis
Name               | The name of the component
Function           | Brief explanation of the component functionality
Failure Mode       | Description of how the component could fail
Local Effects      | Description of how the component will react if the failure occurs
System Effects     | Description of how the system will react if the failure occurs
Fault Detection    | How the failure will be recognized as having occurred
Failure Management | Methods (design or process) to manage the failure mode
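A single FMEA row with the column headings of Table 7 could be represented as in the sketch below; the component and failure mode are invented for illustration.

```python
# Sketch: one FMEA row using the column headings of Table 7.
# The component and the failure mode are invented for illustration.
from dataclasses import dataclass

@dataclass
class FmeaRow:
    item_no: str
    name: str
    function: str
    failure_mode: str
    local_effects: str
    system_effects: str
    fault_detection: str
    failure_management: str

row = FmeaRow(
    item_no="FM-001",
    name="Telemetry buffer",
    function="Queue telemetry packets for downlink",
    failure_mode="Buffer overflow under peak load",
    local_effects="Oldest packets silently dropped",
    system_effects="Loss of housekeeping telemetry on ground",
    fault_detection="Buffer occupancy monitored against a threshold",
    failure_management="Back-pressure on producers and overflow event report",
)
```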

2.4.4 SIPOC

SIPOC (Suppliers, Inputs, Process, Outputs, Customers) is a visual way of documenting a process from beginning to end. SIPOC diagrams are also referred to as high-level process maps because they do not contain much detail. SIPOC diagrams are useful for focusing a discussion and helping team members agree upon a common language and understanding of a process in support of continuous improvement. In Six Sigma, for example, SIPOC can be used during the "Define" phase of the DMAIC improvement steps. To create the SIPOC table (see the example in Table 8), the following steps are required:

1) Name the process. It is suggested to use a verb and a noun (e.g. Read Memory Location);
2) Define the process Outputs. The outputs are the results created by the process (e.g. 100 bytes, a report);
3) Define the process Customers. The customers are the consumers of the outputs; all outputs must have a customer;
4) Define the process Inputs. The inputs are the actions or triggers to the process (e.g. a timer, a customer request);
5) Define the process Suppliers. The suppliers provide the process inputs; suppliers can also be customers;
6) Define the sub-processes composing the process. The sub-processes use the inputs to create the outputs.

Table 8: Template of SIPOC Diagram

Process Name
Supplier                    | Input                  | Process     | Output                | Customer
Entity providing each input | Trigger to the process | Sub-process | Result of the process | Receiver of the output
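Following the six steps above, a SIPOC entry for the "Read Memory Location" example could be sketched as below; apart from the process name taken from the example in the text, all values are invented for illustration.

```python
# Sketch: a SIPOC entry built following the six steps above. The process name
# comes from the example in the text; all other values are invented.
from dataclasses import dataclass, field

@dataclass
class Sipoc:
    process_name: str
    suppliers: list = field(default_factory=list)      # entities providing each input
    inputs: list = field(default_factory=list)         # triggers to the process
    sub_processes: list = field(default_factory=list)  # steps using inputs to create outputs
    outputs: list = field(default_factory=list)        # results created by the process
    customers: list = field(default_factory=list)      # consumers of the outputs

sipoc = Sipoc(
    process_name="Read Memory Location",
    suppliers=["On-board software task"],
    inputs=["Read request with a memory address"],
    sub_processes=["Validate address", "Fetch value", "Return value"],
    outputs=["Requested memory value"],
    customers=["Requesting task"],
)
```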

2.5 Failure Analysis, Engineering Improvements and Empirical Studies

A literature review was performed throughout the research activities in order to establish the status of failure analysis research. This review is presented in detail below. We can conclude that some of these papers, books and reports have commonalities with our research work, although none of them fully covers the cycle from empirical data (historical data containing critical issues) to improvements at the engineering, organization and standards level. Some activities have been performed with a much smaller scope, and quite rarely for safety-critical systems covering all the system lifecycle phases. Some researchers have focused on specific bounded problems, and some have covered defect data from a particular lifecycle phase (e.g. requirements, source code, testing). Nevertheless, some research work must be mentioned in this section, especially due to its applicability to our research and to the commonalities that have been explored.

Analyzing Software Requirements Errors in Safety-Critical, Embedded Systems [123]: This paper from Dr. Robyn Lutz analyses the root causes of safety-related software errors in safety-critical, embedded systems. She concluded that software errors identified as potentially hazardous to the system tend to be produced by error mechanisms different from those of non-safety-related software errors. They arise from discrepancies between the documented requirements specifications and the requirements needed for the correct functioning of the system, and from misunderstandings of the software's interface with the rest of the system. The paper identifies some methods by which requirements errors can be prevented. The objective is also to reduce safety-related software errors and to improve the safety of complex, embedded systems.

In-process improvement through defect data interpretation [124]: This paper presents an approach for the interpretation of defect data by the project teams to help correct the software engineering process during development, in order to improve quality and productivity. The authors use examples of corrections to evaluate and evolve the approach, and to inform and train those who will use the approach in software development.

Software System Defect Content Prediction from Development Process and Product Characteristics [125]: The PhD dissertation of Dr. Nikora had the objective of ensuring reliability for ever growing (in size and complexity) software space systems by developing new techniques to measure and predict a system's reliability, and thus influence the development process and change the system's structure. He developed a model for predicting the rate at which defects are inserted into a system, using measured changes in a system's structure and development process as predictors, and showed how to estimate the number of residual defects in any module at any time and determine whether additional resources should be allocated to finding and repairing defects in a module.

Defect Analysis and Prevention for Software Process Quality Improvement [115]: This paper presents the results of a quantitative analysis of defects that occurred in the software development process of five similar projects, classifying the various defects using the first level of ODC, then finding the defects' root causes and using the lessons learned from the projects as preventive suggestions. The paper also shows the improvement, in terms of reduction of defects, once the preventive suggestions are implemented in other projects.

Defect data analysis as input for software process improvement [126]: This paper describes the results of an analysis of 11879 software defects (from 3 companies) that were classified and analyzed in order to determine the defect distributions and the most common defect types. The authors noted that functional defects are, by far, the most common (65.5%), and concluded that unclear requirements or documentation contribute only a residual percentage of defects (under 0.5%). The results of this study can be used to support engineering process improvement.

Using defect analysis feedback for improving quality and productivity in iterative software development [118]: This paper deals with defect analysis as a feedback mechanism to improve quality and productivity in a software project developed iteratively. The authors discuss how defects found in one iteration can provide feedback for defect prevention in later iterations.

Using Defect Analysis as an Approach to Software Process Improvement [127]: This presentation (and several other similar papers and presentations from the same author) demonstrates a way to classify bugs (problem reports) according to Beizer's taxonomy [86]. This taxonomy contains nine main categories, which are further detailed in up to four levels. The author claims that categorizing each bug takes about five minutes. The improvements inspired by this defect analysis lead to better products (fewer defects after release, higher customer satisfaction).

Digital Engineering Institute: Lessons Learned - klabs.org [128]: This webpage presents a scientific study of the problems of digital engineering for space flight systems, with a view to their practical solution. It contains links to the NASA Lessons Learned Information System. The website contains lessons learned from the design, analysis, verification, and test of digital systems. It presents several reports with flight problems and the relevant solutions, and it might be a good input for the improvement suggestions.

The Top Ten Things that have been Proven to Affect Software Reliability [129]: The work performed by Ann Marie Neufelder includes the analysis of field failures and their correlation with engineering development characteristics (a total of 679 characteristics). The data come from 75 complete datasets and 54 incomplete datasets. Another study (the Rome Laboratory model) contained only 50 datasets and 220 parameters. She uses correlation and sensitivity analysis to rank the development characteristics according to their contribution to field defects and uses the most influential characteristics to run a predictive model.

A Literature Survey of the Quality Economics of Defect-Detection Techniques [130] and A Model and Sensitivity Analysis of the Quality Economics of Defect-Detection Techniques [131]: These works study and propose an analytical model for quality economics with a focus on defect-detection techniques. They apply mainly to system tests, defects over document types, and the removal of costs in the field, but the authors also admit that these have not been extensively analyzed empirically.

Requirements discovery during the testing of safety-critical software, Software Engineering [132], Analyzing Software Requirements Errors in Safety-Critical, Embedded Systems [133], and Operational Anomalies as a Cause of Safety-Critical Requirements Evolution [134]: These works present the results of the analysis of failures related to the requirements and to the requirements phase as the introduction phase.

Software Engineering: Are we getting better at it? [135]: In this survey and study, M. Jones provides an interesting analysis of space failures in the frame of the European Space Agency missions, but simply concluded that the main cause of all the accidents was lack of testing. Although better testing could have detected some of the problems, their origin (or cause) could be traced to other engineering deficiencies.

An Analysis of Causation in Aerospace Accidents [136]: This work presents a new model to evaluate the causal factors in a mission interruption of the SOHO (SOlar Heliospheric Observatory) spacecraft. The authors conclude that the causes of that specific accident are quite similar to the causes found in other software-related aerospace accidents.


2.6 Final Remarks

This chapter presented the topics related to our research as relevant background and related work. Four main areas of research have been surveyed and used in our work. Firstly, we covered Independent Software Verification and Validation, as one of the methods that provide techniques to detect defects; any defect detection method, technique or tool can be applied as long as the defects are properly defined and documented. Secondly, we studied Defect Classification Schemes, which support the classification of the defects and enable the analysis of their causes; for the purpose of this work, Orthogonal Defect Classification was considered the most applicable and mature scheme. Thirdly, we covered Root Cause Analysis, where the defect data (types, triggers) are traced back to the causes, enabling the definition and application of solutions, both in the form of better or earlier detection and of elimination of the problems themselves. Lastly, we analyzed Failure Analysis techniques, Engineering Improvements and Empirical Studies, as a set of activities that take empirical quality data from defects or failures (from the engineering of critical systems), study them by identifying under what conditions they occurred, what caused them, or what could have detected them at an earlier engineering phase, and define solutions and improvements (with quality measured by the reduction of defects or accidents over time). The extensive amount of literature on these subjects demonstrates the importance of the topics, and the fact that industry is still striving to keep up with the technology advancements required by society, while improving the quality and dependability of the developed systems, shows that defect avoidance is a key aspect of critical systems. In fact, the defect or failure rates achieved today are still too high and still cause unacceptable failures. Thus, there is a need to merge the advantages of these topics (defect or failure detection methods, classification, root cause analysis) and to build upon previous research work and lessons learned, to endow industry with appropriate processes to learn from past mistakes and continuously improve the engineering processes, and consequently the engineering products, in what concerns dependability, safety, security and so on.


Chapter 3 Process for Defects Assessment

There are several stages in the definition of an appropriate and widely acceptable process for defects assessment. Not only must the process be able to help in solving the existing problems, but it must also be based on proven methodologies and technologies and shall be able to integrate with the existing engineering processes. To define our defects assessment process, we started by drafting and applying a workflow (based on empirical data) that, iteratively, led to the detailed definition of the process. This workflow is depicted in Figure 9, which shows not only the relation between the different research topics covered (defects classification, root cause analysis, issues correlation, process feedback, engineering lifecycle feedback), but also the importance of integrating the defects data with the applied engineering processes, as described in the following paragraphs. From left to right of Figure 9, we find the existing engineering practices (in red, depicted as Processes for each safety critical domain, since every domain applies specific processes, standards and tools), the objective, which is to improve the quality and dependability of these processes, and the existing knowledge (the actual data from defects and engineering process performance). For a specific Process (way of doing engineering for a particular domain) we need to collect, sanitize and study data (starting mostly with defects classification). The classification, and the aggregation or clustering of the classification classes, is performed to support the root cause analysis of groups of issues (instead of an analysis per issue), leading to the identification of issue patterns, which makes the resolution of the problems more efficient because solving one type of problem means solving several problems at once. Then, the identified root causes are mapped (traced to the processes where they originated) to the engineering processes and to all that influences the engineering (General Engineering Process Framework in Figure 9), depicted by the "Model" arrow in the figure, and an analysis can be done on how to avoid or eliminate the root causes in future applications. The process summarized in Figure 9 has been adapted and adjusted according to the available data and has evolved into the more detailed process described in Section 3.1.


The blue arrows indicate that the process can be repeated to identify and solve new problems once the previous set of problems has been corrected. Section 3.2 details the data collection and preparation. The defects classification activities are described in Section 3.3. The relevant root cause analysis tasks are detailed in Section 3.4. Section 3.5 presents the tasks related to improvements and validation of both the process and the defects data. Finally, Section 3.6 concludes the chapter with some key remarks.

Figure 9: Overview of the proposed process

3.1 Overview of the Process

The defects assessment process needs to cover the data collection and preparation, the data analysis, and the identification of the relevant feedback. Our proposal can be divided into four main phases (refer to Sections 3.2 to 3.5 and Figure 10 for details):

1) Data Collection and Preparation: defects data collection and preparation, and aggregation of other data if necessary, such as complexity metrics, lifecycle data, etc. In practice, the issues (defects data) and the phase of issue introduction, the phase of correction, the type of project, etc., represent the main process inputs.

2) Defects Classification: classification of individual defects according to an orthogonal defects classification schema/taxonomy to identify the defect types, triggers and impacts. Note that the classification taxonomy can be adapted for specific domains and technology purposes.


3) Defects Root Cause Analysis: based on three perspectives (defect type, trigger and impact), identification of the root causes of the defect groups (e.g. per type, per trigger). Several root cause analysis techniques can be used (alone or in combination), and the analysis can be applied only to the data on defect types, to the defect triggers, to the introduction versus detection phase information, or to any combination of these.

4) Improvements and Validation: act upon the identified root causes, at a process, organizational or resources (human and techniques/tools) level, and measure the effects of the implemented actions. It is recommended to propose improvements to the systems under analysis (both environment/organization and processes) and also to the defects classification process, in order to make it evolve and become more applicable to the domain under analysis. A minimal sketch illustrating these four phases as an analysis pipeline follows.
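Purely as an illustration (not part of the proposed tooling), the four phases can be pictured as a minimal analysis pipeline; the function and field names below are hypothetical placeholders for the activities described above.

    from typing import Iterable

    def collect_and_prepare(raw_sources: Iterable[dict]) -> list:
        # Phase 1: gather defect records and complementary project data,
        # harmonize fields and discard records without a usable description.
        return [r for r in raw_sources if str(r.get("description", "")).strip()]

    def classify_defects(defects: list) -> list:
        # Phase 2: attach an (adapted) ODC type, trigger and impact to each defect.
        for d in defects:
            d.setdefault("odc_type", None)
            d.setdefault("odc_trigger", None)
            d.setdefault("odc_impact", None)
        return defects

    def root_cause_analysis(defects: list) -> list:
        # Phase 3: derive root causes for groups of defects (per type/trigger).
        return sorted({d.get("suspected_cause", "unknown") for d in defects})

    def improve_and_validate(root_causes: list) -> list:
        # Phase 4: turn root causes into improvement actions whose effect is
        # later measured on newly collected defects (the feedback loop
        # suggested by the blue arrows in Figures 9 and 10).
        return ["define measure for: " + rc for rc in root_causes]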

Based on the empirical analysis conducted, and on the lessons learned and feedback collected during the course of our research, we have refined the approach for root cause analysis of critical software, enabling the continuous improvement of engineering implementation and V&V at all levels (processes, techniques, tools, personnel, application of standards, organization, and so on). Although our dataset (details on the dataset are presented in Section 4.4) and our experience are mainly from space software, our approach can be used to support the evaluation and root cause analysis of any critical system, independently of the domain, due to the generic nature of the activities included in the process. Figure 10 depicts the general approach, which includes the data collection and preparation, the defects classification, the root cause analysis, and a continuous improvement procedure.

Figure 10: General Process Definition

Out of the 4 phases in our process, we highlight the particular importance of the work performed during phase 2 (Defects Classification) and phase 3 (Defects Root Cause Analysis). The following paragraphs discuss these two phases from a broad perspective, while Sections 3.2 to 3.5 provide further details about the four phases of the process.

The defects classification is based on an orthogonal classification taxonomy and can start once the defects data are collected and prepared. In practice, the ODC classification is performed on the organized dataset, taking the defects one by one. Enhancements and adaptations to the ODC taxonomy can be useful depending on the nature of the defects and the domain; however, these enhancements should be quite precise (we propose some, and other users might propose additional enhancements). The results of the classification give the engineer a first view of the types of defects, of the main triggers that led to the identification of these defects, and of the distribution of the possible impacts.

Root Cause Analysis (RCA) refers to determining how each defect was introduced and identifying the defect sources. Identifying the defect source helps in preventing the recurrence of the root cause and in finding process improvements. In practice, root cause analysis can be summarized as the process of finding the activity, process or action that caused the defects and supporting their elimination or reduction by providing remedial measures. Two main principles drive the defect root cause analysis:

 Analysis performed by experts: internal resources with the expertise to understand what went wrong must support the analysis of the processes prevalent in the organization, and independent experts shall also be involved. This way, all possibilities are reviewed and analyzed, and the best possible actions are defined.

 Reduction of the defects to improve the system quality: the RCA must drive changes in processes, tools or human resources that improve defect prevention at the earliest possible stage (defect origin) and ensure early detection in case of recurrence.

The analysis of the enhanced ODC application (defects classification) to the dataset of defects is complemented by the identification of the root causes for the majority of the defects, based on the classification results and on the knowledge of the technological domain and environment, processes, methods and tools. The reasoning is that it may be quite expensive to identify the root cause of every single defect; thus, we focus on the most frequent and most severe defect types, although other strategies can also be applied. Starting by analyzing the most frequent and most severe defect types to determine their causes (origins) means that the set of causes obtained, once solved, would influence the severity and the recurrence of the remaining defects. Note that the root causes are not identified at the moment of resolution of the issues, but at the moment of the analysis using the ODC or an enhanced ODC. In practice, they are the result of the analysis done on the defects by experts (in the present work, performed by the authors and complemented and reviewed by the industrial partners). Note also that several defects do not have a clear and unique root cause but a set of related root causes.


The root cause analysis shall cover the root causes that originated the identified defect types (these are related to development issues) and also the root causes of the defects grouped by defect trigger, which represent the V&V weaknesses. In a subsequent step, these sets of root causes will lead to a dedicated list of measures to tackle both sets (development and V&V).

3.2 Data Collection and Preparation

The approach is based on defects data analysis and software engineering knowledge. Thus, it is important to fulfil some prerequisites prior to the efficient and correct application of the process, namely (refer to Figure 10 for each activity and to Chapter 4 for further details about the data collection and preparation tasks):

1. Data Collection: to successfully perform the analysis of the defects, the data collected (A. Defects Data and B. Other Project Data, in Figure 10) should contain the relevant and necessary information. This includes basic requirements, such as: a) detailed information about each defect and its fix; b) knowledge about the defect environment conditions, such as tools, personnel, constraints; c) the engineer's assessment of the defect causes; and d) the phase when the defect was introduced and the phase when the defect was detected. Complementary prerequisites are also essential for a successful application of the process. The first one includes training on the involved techniques depicted in Figure 10, such as defects classification (e.g. ODC) and root cause analysis. The second includes rules and guidelines (such as standards and templates) for defects description and defect data collection. For the proper application of the process it is essential that the collected defects contain a minimum of information, including: reference artefact, defect title and detailed description, phase where the defect was identified, phase where the defect was introduced, activity that detected the defect, defect author, and defect severity (a simple completeness check over these fields is sketched after this list).

2. Data Preparation: once we have the necessary data, it is important to organize them and perform some anonymization when required (when the process is applied internally in an organization this step is obviously not necessary). Data organization is essential for the next steps, since it is important to have the data in a searchable and manageable form. It is also important to confirm the completeness of the data, from a defects description perspective, but also regarding all the complementary engineering information (lifecycle phase, techniques applied that led to the defect detection, impact analysis, etc.). If missing information that would affect the classification or the RCA is detected, this phase shall promote its completion, using the defect author or the referenced artefacts (documents, tools, code, tests, etc.) as sources.
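As a sketch of the completeness check mentioned in item 1, assuming defects are handled as simple dictionaries, a hypothetical helper such as the following could flag records that lack the minimum required information:

    REQUIRED_FIELDS = [
        "reference_artefact", "title", "description",
        "phase_detected", "phase_introduced",
        "detection_activity", "author", "severity",
    ]

    def missing_fields(defect: dict) -> list:
        # Return the names of required fields that are absent or empty,
        # so the record can be sent back for completion (step 2).
        return [f for f in REQUIRED_FIELDS if not str(defect.get(f, "")).strip()]

    # Example: this (made-up) record would be returned to its author for completion.
    incomplete = missing_fields({"title": "TM timeout not handled", "severity": "Major"})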


3.3 Defects Classification

To efficiently and concretely tackle the important problems of critical software engineering, the first set of defects analysis activities shall focus on an orthogonal classification of the sets of defects (see Chapter 5 for further details about the defects classification task):

3. ODC Classification: perform the ODC classification on the organized dataset (the completed set of defects). Enhancements and adaptations to the ODC taxonomy can be useful depending on the nature of the defects and the domain; however, these enhancements should be quite precise. Other classification taxonomies can be used if they are appropriate, well known by the user, and provide relevant information (grouping, prioritization, clustering) to support the root cause analysis.

4. ODC Analysis: the goal is to analyze the classification results and provide a summary of the main findings, in particular regarding the distributions of defect types and triggers. This information gives the first hints about the quality of the dataset (defect frequencies, impacts, distributions), which can provide quick feedback to the implementation team (defect types results) and to the V&V teams (defect triggers results).
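The ODC Analysis step essentially reduces to counting classified defects per attribute value. A minimal sketch, assuming each classified defect carries hypothetical odc_type, odc_trigger and odc_impact fields:

    from collections import Counter

    def odc_distributions(classified: list) -> dict:
        # Frequency of each ODC type, trigger and impact over the dataset.
        return {
            attr: Counter(d.get(attr, "UNCLASSIFIED") for d in classified)
            for attr in ("odc_type", "odc_trigger", "odc_impact")
        }

    sample = [
        {"odc_type": "Documentation", "odc_trigger": "Consistency/Completeness", "odc_impact": "Maintenance"},
        {"odc_type": "Function", "odc_trigger": "Test coverage", "odc_impact": "Reliability"},
        {"odc_type": "Documentation", "odc_trigger": "Standards conformance", "odc_impact": "Maintenance"},
    ]
    dists = odc_distributions(sample)
    # dists["odc_type"].most_common() -> [('Documentation', 2), ('Function', 1)]:
    # dominant defect types feed back to the implementation team, dominant
    # triggers feed back to the V&V teams.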

3.4 Defects Root Cause Analysis

The proposed root cause analysis is composed of several steps that include the analysis of the defect types, of the triggers allowing defect detection, and of the defects that could have been detected earlier, followed by the prioritization and consolidation of the root causes, leading to concrete proposed improvements (see Chapter 6 for our results):

5. Defect Type RCA: the classification of the defect types defines which are the most common/frequent types of defects. If these defects are also mapped to high severity impacts, their prevention can efficiently reduce the impacts. With these data from the defects classification we can identify what caused the defects with the most common types, try to aggregate them, and identify common root causes and common solutions.

6. Defect Trigger RCA: similarly, when the defects classification provides the most common defect triggers, it is possible to quickly conclude that certain activities have not been efficient in detecting the defects earlier (in case they could have been detected earlier); the results also provide the list of triggers that actually detect them and that can be applied from now on by the V&V teams to detect further defects as early as possible. These results support the identification of the causes and of the V&V techniques or triggers that allow defect detection at the current phase.

7. Late Detection RCA: when the relevant information is available and it is possible to determine at what point in the lifecycle the defect was introduced (either generated or not detected), it is interesting to determine why certain defects have not been spotted and solved before, and why they have slipped through phases. The root cause analysis of these slipped defects helps in identifying the causes of the failures in the V&V and ISVV techniques that allowed the defects to propagate until a later stage in the lifecycle.

8. RCA Consolidation: defects prioritization can be done to simplify the RCA and to make it more efficient. It can be done to tackle the defects with high impact on the system, or simply to analyze the defects with the most common types or triggers (due to the large amount of defects and respective causes); a simple ranking sketch is shown after this list. Note that it may happen that defects of the same type do not have the same causes, but at least the RCA will provide a list of causes for the majority of the defects and thus provide a quick reduction of defects when those causes are fixed. After the prioritization of the lists of root causes from steps 5, 6 and 7, a root causes consolidation is required. As the list can become very extensive, causes may be merged (if appropriate) and ordered according to the prioritization performed, or to another specific root causes prioritization.

9. Improvements Suggestions: for the consolidated root causes, define changes, solutions or modifications to the processes, techniques, tools, training, resources, environment or application of standards. The definition of the improvements should follow straightforwardly from the list of root causes.
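A minimal sketch of the prioritization mentioned in step 8, reusing the hypothetical field names from the earlier sketches; the severity weights are an illustrative assumption, not values mandated by the process:

    from collections import defaultdict

    # Illustrative weights only; any organization-specific scale can be used.
    SEVERITY_WEIGHT = {"Major": 3.0, "Minor": 1.0, "Comment": 0.5}

    def rank_defect_types(classified: list) -> list:
        # Rank ODC defect types by combined frequency and severity weight so
        # that the RCA effort starts with the groups promising the largest
        # reduction of defects and impacts.
        score = defaultdict(float)
        for d in classified:
            score[d.get("odc_type", "UNCLASSIFIED")] += SEVERITY_WEIGHT.get(d.get("severity"), 1.0)
        return sorted(score.items(), key=lambda kv: kv[1], reverse=True)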

3.5 Improvements and Validation

Some of the suggested improvements might be difficult to implement, and their efficacy may vary from team to team or from organization to organization. They shall contribute to improving the software quality and to reducing or preventing defects, but different defects can then surface, and therefore a consistent process improvement should be in place and shall provide constant feedback:

10. Improvements Implementation: the engineering (development and V&V) teams must be informed about the required changes or adjustments, and the organization, management and quality planning shall decide on the improvements to implement for future projects. This can be provided in the form of process improvements, dedicated workshops or training sessions, lessons learned sessions, improved guidelines or standards, etc.

11. Process Validation and Improvements: at every step, it is possible to derive improvements to the process. Such improvements can be set to adjust to the organization culture, to the project environment, to the customer requirements, etc. However, it is essential to measure the effectiveness of the implemented suggestions once new defects (or no defects) have been collected. Note that improvement can and shall also address the current process, the defects classification scheme and taxonomies, the root cause analysis techniques, and so on. The proposed process is able to adapt and to help improve itself and its constituent techniques. For the validation of the process see Chapter 7.


3.6 Final Remarks

The process for the assessment of defects has been applied and adapted according to our empirical study (see Chapter 5 and Chapter 6 for the results). It was also exposed to a large set of worldwide experts (see Chapter 7) who provided their feedback, even if most of them were from distinct domains and had different backgrounds. Generally, the process was accepted, with a "recommendation" rate of 60% and a "possible recommendation" rate of 33%, while only 7% would not recommend such a process.

The process described in this chapter has some strengths and weaknesses that are summarized here. The strengths are: i) the process itself represents a structured approach; ii) it explicitly includes root cause analysis; iii) it is based on an orthogonal classification scheme; iv) it allows the provision of improvements and feedback; and v) it relies on high-quality data. On the weaknesses, we should mention: i) concerns about how to guarantee the quality of the defect data; ii) the large number of steps in the process; and iii) the difficulty in implementing/enforcing such a process and its results. These remarks are obviously very relevant. First, the process will only work if the defect data are appropriate, of good quality and complete. For this, we shall relate weakness i) with weakness iii), as the cultural enforcement must be broader than just the application of the process, and must also cover the defect data collection, the quality checks of the data, the changes necessary to a certain way of working, and so on. The large number of steps is required to have the process detailed in simple blocks. Furthermore, the process contains permanent self-feedback and also feedback to the development and V&V processes, as a result of the root cause analysis suggestions.

The list of essential suggestions made by the experts, which would make the process generic enough to cover different domains and different types of technologies, includes (see Chapter 7): data collection improvements (process, database, quality guarantee); classification/validation activities and data quality checks between phases of the process; consideration of project details/specifics and team dynamics (skills, experience, motivation); and an assessment also covering management related issues. In practice, we can observe that we have suggestions on the environment and prerequisites, which make absolute sense (data quality, project details), and also on the validation of the internal process activities, namely the quality of the classification, which cannot be easily automated with today's technologies.

Each group of process tasks (Data Collection and Preparation, Defects Classification, Defects Root Cause Analysis, and Improvements and Validation) is described in the subsequent chapters, providing more details and the outcomes of the application of the process to our case study. In practice, Chapter 4 details the data collection and preparation processes and activities (steps 1 and 2 of the process), Chapter 5 describes the defects classification process and taxonomy adaptations (steps 3 and 4), Chapter 6 covers the root cause analysis activities and results (steps 5 to 9), and Chapter 7 presents the strategy and results of the process validation and implementation (steps 10 and 11).


Chapter 4 Data Collection and Preparation

Organizations have a challenge related to defect data management. First of all, these data are usually confidential and sensitive to the organization or to the organization's suppliers, and thus not usually available. Secondly, industry tends not to document defects properly and completely (especially industries with a lower CMMI maturity level), as they are more concerned about deploying the systems in a short timeframe and fixing the issues as soon as possible than about measuring and process improvement. Third, several cultural barriers (such as the ones identified by Jäntti et al. [137]) do not ease the implementation and usage of a defects management system, including data collection, acceptance and communication of the issues, organization-wide processes, etc. Therefore, defects data management (collection, preparation, recording, analysis) is not an easy task. Moreover, the results of the whole defects management process depend on the quality and availability of data, making the data, and the way they are collected, the most important step for any defects analysis activity.

Data collection and preparation is a set of complex processes that require organizational sponsorship, and organization-wide processes require data collection rules or training, as well as standard defects collection contents. Furthermore, the defects data must be stored, either in a tool/database or in a set of documents, this being also an organizational strategy that needs to cope with data access and confidentiality issues. Defects data must be made available and must be complete, or provide the means to be completed with additional information (resources, author, documents, references, etc.).

For this work we had access to real defects data that were produced to inform the stakeholders about the issues with an acceptable level of detail. However, for some of the defects further information was needed, for example by consulting the original artefacts or documents to better understand the problem and its origin. The data were collected and prepared according to the format described in this chapter. Important actions had to be taken to preserve the confidentiality of the data, in particular the projects they came from and the involved stakeholders, as these would publicly expose weaknesses of the engineering processes of those organizations.

The chapter is organized as follows. Section 4.1 presents a global view of the data collection and preparation related activities. Section 4.2 provides the details of the data collection tasks. Next, we present the applied data preparation and clean-up activities (Section 4.3), then provide information about the defects included in our dataset (Section 4.4), and conclude with some final remarks about the data-related tasks (Section 4.5).

4.1 Overview of the Process

The data collection and preparation depend on the completeness and quality of the input data. For this purpose, and in order to use the data in our defects assessment process, we have taken a straightforward approach to collect and prepare those data. The procedure is depicted in Figure 11 and described next.

Figure 11: Data collection and preparation procedure

Defects can be introduced and detected at any of the lifecycle phases, so the first important activity is to collect information (data collection) about the defects (from specifications, architecture, design, implementation, testing, operations, and other phases or V&V activities) in a structured way and with enough information to enable an orthogonal classification and later support the root cause analysis. This is exactly why there is also a data preparation phase, where the collected defects data are structured and completed or complemented with additional information sources (data completion). The data collection and preparation phases can be based on a guideline or template to collect the minimum defect data required for the subsequent defects assessment activities (see the used template in Annex D and the data collection and preparation details in Sections 4.2 and 4.3).

Other actions might be applied to the data to make them workable and available to the following phases of a defects analysis process. For example, during the data preparation activities (Section 4.3) we can apply some data clean-up, remove unnecessary defects or details, and in particular apply data anonymization for data confidentiality assurance, if the data are to be made available to third parties. In the case where the data are analyzed internally and there is no need to involve any third-party expert or to reveal the results to any external entity, this step is simplified and not required. Finally, the processed defects data need to be properly stored and managed (configuration management). For this, a tool or a dedicated database can be developed, for example by using Microsoft Access or Excel databases and managing their configurations with Git¹, CVS² or SVN³ repositories.
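As an illustration of the storage and configuration management suggestion above, the prepared records can be exported to a plain CSV file (instead of, or alongside, an Access/Excel database) and that file kept under a Git, CVS or SVN repository. A minimal sketch, with hypothetical field names:

    import csv

    FIELDS = ["number", "subsystem", "phase_detected", "phase_applicable",
              "severity", "title", "description"]

    def export_defects(defects: list, path: str = "defects_dataset.csv") -> None:
        # Write the prepared defect records to a CSV file; the resulting file
        # can then be committed to a version control repository so that every
        # change to the dataset remains traceable.
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(defects)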

4.2 Data Collection

Defects data can come from different sources, including V&V reports, Excel databases, emails, and web-based bug tracking tools. All these sources have different formats and different levels of detail for the defect data. Furthermore, a selection should be performed on the data to be collected, in order to include defects from different types of systems and subsystems, but also to include defects from all the lifecycle phases (specifications, architecture, design, implementation, testing, and operations). Our analysis is based on a set of real defects from ISVV activities in space projects. The projects include subsystems that compose satellite systems for three different domains: a) scientific exploration; b) earth observation; and c) telecommunications. These cover different types of software, such as start-up or boot software, on-board application software, command and control units, payload software, and attitude and orbit control units. The engineering processes used in the selected missions (and that drove the engineering lifecycles) were based on the ECSS standards, namely the space engineering standard E-ST-40 [42] and the quality standard Q-ST-80 [43], which impose a comparable lifecycle and similarly strict requirements mandated by the European Space Agency (ESA).

1 http://www.github.com/
2 http://savannah.nongnu.org/projects/cvs/
3 https://subversion.apache.org/


Table 9: Generic characterization of the subsystems contributing to the dataset

Subsystem  Domain                  Software Types
SS01       Earth observation       On-Board Start Up / Boot Software
SS02       Scientific exploration  On-Board Application Software
SS03       Telecommunications      System Software
SS04       Earth observation       Payload boot Software
SS05       Earth observation       On-Board Application Software
SS06       Earth observation       Payload boot Software, Payload Software
SS07       Scientific exploration  Payload boot Software, Payload Software
SS08       Scientific exploration  Payload boot Software, Payload Software
SS09       Scientific exploration  Payload boot Software, Payload Software
SS10       Scientific exploration  Payload boot Software, Payload Software
SS11       Scientific exploration  Payload boot Software, Payload Software
SS12       Scientific exploration  Payload boot Software, Payload Software
SS13       Scientific exploration  Payload boot Software, Payload Software
SS14       Scientific exploration  Payload boot Software
SS15       Scientific exploration  Attitude and orbit control unit software
SS16       Scientific exploration  Command and control units Software

The subsystems of the same type represent different equipment, such as star trackers, laser interferometers, propulsion subsystems, etc. The subsystems (see Table 9) were developed according to functional and non-functional requirements mandated by ECSS and by mission specifics (defined by ESA). They are characterized by the following needs/objectives (shown here to demonstrate their importance/criticality), which are common to space critical systems and which were collected from the ECSS standards [42], [43] and from the corresponding engineering interpretations of the specification documents from several missions:

 No crash or hang shall happen at any time;
 No dynamic memory allocation is allowed;
 Communications-Telemetry (TM)/Telecommands (TC) must always be possible between ground control and the satellite;
 The system must implement a Safe Mode (with basic communications, patch and dump functionalities);


 Most systems shall have very simple and stable start-up software (also called boot software);
 There must be a watchdog (hardware and/or software) or an alive signal;
 Systems should be built with redundancy (at least hardware redundancy);
 Most systems must include FDIR (Fault Detection, Isolation and Recovery) functionalities to account for the environment and external faults;
 The systems must have high autonomy and some self-correction procedures;
 Systems are categorized with a criticality level related to the impact or consequences of system failures (in this case, the ECSS defined levels are: Catastrophic, Critical, Major, and Minor or Negligible).

The projects could also be characterized by:

 Requirements written in (structured) natural language, highly based on documentation and non-formal processes and languages;
 Documentation in UML/SysML and PDF files, with limited possibilities of automated verification and formal analysis;
 Programming languages such as C, Ada and Assembly, which are quite mature, low level languages;
 Unit tests performed using commercial tools (e.g. Cantata++, VectorCast, LDRA), commonly developed and adapted for the specific projects' embedded systems and environments;
 Integration and system testing performed in a specific validation environment (Software Validation Facility - SVF) developed for this purpose on a case-by-case basis, with hardware emulation and hardware-in-the-loop, simulated instruments, etc.;
 A strong quality assurance process, based on the ECSS standards, monitored by the European Space Agency, and complemented by Independent Software Verification and Validation (ISVV) activities for the critical areas of the project;
 A well-defined and mature Software Development Process (SDP).

The defects collected for our study (also called issues or Review Item Discrepancies – RID) were identified by independent teams in the project development artefacts, after the development teams had performed their own required verification and validation activities, as defined in the ECSS SDP. The selection of systems and defects was based on several criteria, namely (further details about the dataset are presented in Section 4.4):

 The defects are all confirmed (i.e. no false positives);


 The systems were developed for different space missions (in this case, 4 different missions, including scientific exploration, earth observation and telecommunications);
 The systems were developed by diverse prime contractors (3 main European prime contractors) and multiple SW development entities (about a dozen entities);
 Different types and sizes of systems or subsystems are included (in our case 16, including start-up and boot systems, central control units, attitude and orbit control systems, and payload control software);
 The defects originated from different independent assessment activities and were detected after each of the SDP phases (a total of 1070 defects): 162 defects were detected after the conclusion of the specification phase, 112 after architecture, 378 after implementation, 398 after testing, and 20 after deployment;
 The defects cover different severity categories, as defined in the ECSS Q-ST-30 series. In our dataset, 14% of the defects were classified as Major, 63% as Minor, and 23% as Comments (improvement suggestions).

4.3 Data Preparation

The collected defects have been integrated into an Excel database where specific information was added. Some required information could not be imported (empty fields) from the defects data sources (defect reports) and had to be complemented from external sources, in most cases by consulting the artefact containing the defect (source code, original design, etc.). These data either did not exist or were not documented in the collected defect reports. They include, for example, the year when the issue was raised, the maturity of the engineering team that generated the issue, or the exact activity (V&V task) that allowed the detection of the issue. Furthermore, the detection and introduction phases (Phase Detected, Phase Applicable) were also not explicitly stated in the defect reports, but could easily be complemented because the ISVV activities performed implicitly contain this information. The Excel spreadsheet has the following structure (a compact representation of a record following this structure is sketched after the list):

 Number: a unique identifier for the defect, usually the original defect identifier;
 Project: the name of the project from where the defect originated;
 Subsystem: the subsystem or component to which the defect applies – a subsystem code was used in order to anonymize data for external experts;
 Domain: the technology domain where the defect applies (e.g. space, aeronautics, automotive, defense, railway, etc.);
 System Type: a description of the system or component type, which can be different for every technology domain (e.g. for space systems it can be: on-board start-up software, payload software, ground control software, on-board application software, on-board systems, on-board component, command and control system, etc.);
 Issue Title: a summarized title for the defect;
 Description: the detailed description of the defect. This description shall be as complete as possible to simplify the classification and the root cause analysis. We have taken the original defect description that would allow full understanding and resolution of the defect;
 Classification: the original severity classification of the defect, according to the organization's severity classification scheme (e.g. for ISVV there is a classification scheme that was used and is shown in Section 4.4: Minor, Major and Comment);
 Problem Type: classification of the defect type according to the original classification scheme (e.g. for ISVV there is a classification scheme that was used and is shown in Section 4.4);
 Phase Detected: the lifecycle phase where the defect has effectively been detected and recorded. This column can be adapted to the applicable lifecycle phases;
 Phase Applicable: the lifecycle phase where the defect has effectively been introduced. This column can be adapted to the applicable lifecycle phases;
 Defect Type: the ODC (and later enhanced ODC) defect type classification for each defect. For details on this classification see Chapter 5;
 Defect Trigger: the ODC (and later enhanced ODC) defect trigger classification for each defect. For details on this classification see Chapter 5;
 Defect Impact: the ODC (and later enhanced ODC) defect impact classification for each defect. For details on this classification see Chapter 5;
 Comment: field used to store information about classification doubts and suggestions of modifications to the original ODC classification taxonomy. This information was later extracted and used to propose an enhanced ODC taxonomy;
 Notes: notes related to the understanding of the defect or additional defect information;
 Activity: the review or V&V activity that led to the discovery or detection of the defect. This is usually the activity that was being performed when the defect was uncovered;
 Keywords: field reserved for keywords related to the defect, in order to enable future automation of the defects analysis.
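The spreadsheet structure above maps naturally onto one record per defect; a sketch of such a record, assuming a Python representation (the snake_case field names are illustrative shorthand for the columns listed above):

    from dataclasses import dataclass, field

    @dataclass
    class DefectRecord:
        # One row of the defects database described above.
        number: str
        project: str
        subsystem: str
        domain: str              # e.g. space, aeronautics, railway
        system_type: str         # e.g. on-board start-up software
        issue_title: str
        description: str
        classification: str      # original severity: Minor, Major, Comment
        problem_type: str        # original ISVV classification type
        phase_detected: str
        phase_applicable: str    # phase where the defect was introduced
        defect_type: str = ""    # ODC / enhanced ODC classification
        defect_trigger: str = ""
        defect_impact: str = ""
        comment: str = ""        # classification doubts, taxonomy suggestions
        notes: str = ""
        activity: str = ""       # V&V activity that detected the defect
        keywords: list = field(default_factory=list)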

Filling this structure was not always straightforward due to the different sources of the defects data (mostly ISVV reports, defect reports and other Excel spreadsheets); thus, the data preparation included some data harmonization, in particular for the fields System Type, Classification, Problem Type, Phase Detected, Phase Applicable, and Activity. Additional information was also collected but later discarded, such as the age of the defect (the year it was raised), the organization responsible for the defect, and complexity metrics related to the defect's subsystem or component. This additional and complementary information did not add enough relevant information for the root cause analysis and made the process too complex to be easily applied.

With the defects dataset built and filled, we had to consider how much of the data could go public and which data should stay internal to the organization (data confidentiality assurance in Figure 11 – the defects were obtained under a non-disclosure agreement). It is natural that defects originating from a specific team, in particular those that are dealt with internally and during the engineering lifecycle phases, do not go public. No organization likes to have its shortcomings revealed publicly; organizations deal with them internally, solve them, and present a system or product with an acceptable level of quality and dependability. Even when these organizations are assessed by independent assessors, this information is not revealed and is used internally to correct the issues and improve the engineering methods. For this purpose, we had to make some modifications to the dataset in order to eliminate the possibility of identifying the involved parties (this might be required when data needs to go public, to avoid exposing sensitive quality data from the companies). The main fields that have been hidden or modified are: Number, Project, Issue Title and Description. All these fields contained sensitive data that could lead to the identification of the organizations involved in the development and V&V, and thus they have either been hidden (the first two have been replaced by the Subsystem identifier) or reviewed and anonymized (the latter two) in case some sensitive information was included in the title or the description (e.g. the name of the component, the company, etc.).
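A minimal sketch of this anonymization step, assuming a record dictionary and a hypothetical list of sensitive terms (company and component names) to scrub from the free-text fields:

    import re

    def anonymize(defect: dict, sensitive_terms: list) -> dict:
        # Hide identifying fields and scrub sensitive names from title/description.
        safe = dict(defect)
        # Number and Project are replaced by the neutral subsystem code.
        safe["number"] = safe["project"] = safe.get("subsystem", "SSxx")
        for fld in ("issue_title", "description"):
            text = safe.get(fld, "")
            for term in sensitive_terms:
                text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
            safe[fld] = text
        return safe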

4.4 Defects in the Dataset

In order to characterize the case study, Table 10 summarizes the 1070 defects included in the dataset, divided by severity (having a major or minor impact on the system, or just being comments to improve the engineering) and considering the ISVV activities in which they were found. The defects originated from the analysis of more than 10,000 software requirements, more than 1 million lines of code (mostly C, Ada95 and some Assembly), and over 3,000 tests⁴ (some unit tests, some integration tests, some system tests). In practice, the objective of ISVV was to find issues in the project artefacts, and to report and classify them in a clear and consistent way for the customer to act upon immediately, avoiding these issues slipping into subsequent phases.

⁴ The 3,000 tests correspond to only part of the requirements and code referred to above, as not all ISVV activities cover the full set of artefacts; e.g. for some projects only source code analysis was performed, and no tests related to that specific code were assessed.


For the particular set of selected defects, Table 10 shows the results of a typical ISVV analysis, i.e. issues identified during all the lifecycle phases (Requirements, Design, Implementation, Testing, and Operations). The large majority of the issues are Minor or Comments, which is consistent with the strict and mature development and validation processes applied to space domain software, and a significant amount of defects is identified at the implementation and testing phases. This comes from the fact that source code artefacts and testing specifications, procedures and results are the ultimate focus of ISVV activities and represent the large majority of items under assessment, thus also generating a large amount of defects.

Table 10: Dataset of ISVV defects

Severity   Req. Verif.   Design Verif.   Code Verif.   Test Verif.   Operation Monit.   Total
Major      27            14              43            62            2                  148
Minor      98            84              185           294           18                 679
Comment    37            14              150           42            0                  243
Total      162           112             378           398           20                 1070

(Columns correspond to the ISVV activity during which the defects were found.)
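As a simple consistency check of Table 10 (the figures below merely transcribe the table), the per-activity and per-severity totals can be recomputed:

    table10 = {
        "Major":   [27, 14, 43, 62, 2],
        "Minor":   [98, 84, 185, 294, 18],
        "Comment": [37, 14, 150, 42, 0],
    }
    # Totals per ISVV activity (Req., Design, Code, Test, Operation).
    activity_totals = [sum(col) for col in zip(*table10.values())]    # [162, 112, 378, 398, 20]
    # Totals per severity.
    severity_totals = {sev: sum(row) for sev, row in table10.items()}  # 148, 679, 243
    grand_total = sum(activity_totals)                                 # 1070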

The ISVV originally classified the defects with the following classification types (from [65]):

 External consistency: differences in the implementation of artefacts between phases or with other applicable or reference artefacts (e.g. inconsistent documentation);
 Internal consistency: inconsistency with another part of the same artefact (e.g. different code for a similar purpose, differences within the same document or architectural components);
 Correctness: item incorrectly implemented or with technical issues (e.g. erroneous implementation, wrong documentation description, bad architectural definition);
 Technical feasibility: item not technically feasible under the actual constraints (e.g. unattainable or impossible requirement, architecture not viable);
 Readability and Maintainability: item hard to understand and/or maintain (e.g. lack of comments or no description, requirements too complex or too generic);
 Completeness: item not completely defined or with insufficient details provided (e.g. missing details, missing architectural components, insufficient requirements, not all requirements coded);
 Superfluous: item that is a repetition or brings no added value to the artefact (e.g. repeated requirements, copy-pasted code doing the same actions);
 Improvement: suggestion to improve any property of the artefact, usually not related to a single one of the other classification types (e.g. efficiency, simplicity, readability);
 Accuracy: the item does not describe something with precision or does not follow the applicable standard (e.g. measurement precision, calculation precision, exact implementation).

The resulting classification of the defects by the ISVV teams is shown in Table 11. The main types of defects are external consistency, completeness, and correctness. These three types account for 75% of the total of ISVV defects. Note that the data shown in Table 11 have been used in an industrial context to provide simple metrics and to help promote the immediate correction of the issues (including defects). This classification was never intended to determine the defect types (although it represents generic defect type categories), nor the triggers, and only the severity of the impact has been considered in those cases. Therefore, a classification that allows defect types, triggers and impacts to be classified orthogonally, and that supports the analysis of the introduction versus the detection phases, is needed to support the RCA activity.

Table 11: ISVV original defect types classification

Defect Type                      Number of Defects   Percent
External Consistency             313                 29%
Completeness                     275                 26%
Correctness                      213                 20%
Internal Consistency             132                 12%
Technical Feasibility            3                   0%
Readability & Maintainability    84                  8%
Superfluous                      14                  1%
Improvement                      34                  3%
Accuracy                         2                   0%
Total                            1070                100%

4.5 Final Remarks

The quality of the inputs is key to any process. In order to ensure relevant results from the application of a defects assessment process, we have collected defects data, prepared and harmonized the defects, and complemented them whenever required (harmonized the text and acronyms, completed missing data from the original artefacts, removed company names, etc.). Anonymization was also required due to the confidentiality of the defects data. The defects data have been collected in a database, which was also used later on for the individual classification of each defect based on the selected orthogonal defect classification taxonomy.


We believe that this simple data collection and preparation approach, complemented with short training on writing defects and filling in the standard defects information, can be an important step towards the successful analysis of organizational weaknesses and the reduction of the number of critical defects in the short/medium term. Our case study dataset includes 1070 defects from space projects. Data were collected and harmonized from different sources (ISVV reports, Excel spreadsheets with issues, and operational defects), in order to obtain a coherent set of defects covering different types of systems and all the lifecycle phases. This dataset is the source for all the activities and tasks of the defects assessment process (Chapter 3), namely the definition/adaptation/validation of the defects classification taxonomy, the root cause analysis based on the results of the defects classification (particularly the defect type, the defect trigger, and the phase where the defect was detected versus the phase where it had been introduced), and the identification of corrections, suggestions and improvements to both the process and the systems the defects come from. These aspects will be addressed in the following chapters.



Chapter 5 Defects Classification

The defects dataset was first classified with ODC v5.2 [92]. The outcome of this initial classification of our dataset was that we could not properly classify all the defects with the existing ODC defect types, defect triggers and defect impacts (see Section 5.1 for the justification of the selection of these attributes). The classification difficulties were annotated whenever a specific issue was being classified and there was no classification fit or agreement. In fact, we experienced that for 31.7% of the defects the original ODC taxonomy had some limitations. The issues affected (i.e. the RIDs that could not properly be classified according to the standard ODC taxonomies) were set aside and duly noted in order to contribute to the ODC adaptation. The standard ODC was thus considered not totally fit for this classification because it was not developed for the specific case of ISVV, nor for critical systems, nor to cover the whole engineering lifecycle: from the 1070 defects classified, 136 defect types, 76 defect triggers and 201 defect impacts could not be properly mapped to the standard ODC taxonomy. A better fitting ODC taxonomy was possible with some extended types and triggers that cover in a more efficient and concrete way the specific critical ISVV issues (e.g. traceability, verifiability, robustness/dependability and safety related properties). For example, defects related to tests and requirements specification are not clearly mapped to an existing ODC defect type, and they are quite common sources of the defects reported from the Independent Test Verification activity. In what concerns the impacts, we can point out the absence of specific Testability and Verifiability classifications (important requirements for critical systems and related standards), and we have also identified a few defects that would fit into more than one ODC impact classification (this indicated some orthogonality issues or the need to break the defect into more than one defect from the ISVV team's point of view).

To efficiently and concretely tackle the important problems of critical software engineering, defects classification includes two main tasks (see Chapter 3): the first consists of applying ODC (or an enhanced version of ODC) to the dataset, and the second focuses on the analysis of the classification results to provide a summary of the main findings. This is precisely the goal of this chapter.

The outline of this chapter is the following. Section 5.1 presents an overview of the process used to adapt the standard ODC classification and to perform the classification itself. Section 5.2 presents the original ODC classification and the identification of the needed adaptations. The next section describes the proposed adaptations to the original ODC taxonomy. Section 5.4 details the results obtained from applying the enhanced ODC classification to our defects dataset. Section 5.5 provides the results of the validation strategy for the ODC classification and the acceptability of the results. Finally, Section 5.6 summarizes the ODC activities and provides the final remarks concerning the defects classification related activities.

5.1 Overview of the Process

The adaptation of ODC may lead to changes in the taxonomy of the different attributes, to make them more applicable, more complete and better adjusted to critical systems defects from all lifecycle phases. Thus, some additions, reductions or merges might be needed over the original ODC taxonomy. Although the ODC general approach remains unchanged after adaptation, the attributes themselves were evaluated and adapted when necessary. In practice, the most relevant inputs for the taxonomy adaptation were the difficulties felt while applying the standard ODC to the defects in our dataset.

We have not considered all eight attributes provided by the ODC specification, leaving aside the following ones: target, qualifier, age and source. We did not find these necessary to achieve orthogonality, and leaving them aside promotes a simple and regularly usable classification (with only the essential attributes) and avoids very specific code-oriented attributes (not necessary to enable root cause analysis). The selected attributes are the most important ones and the candidates for adaptation for critical systems: activity and trigger represent the defect detection method and activity, and can thus influence the root cause from a V&V perspective; type represents the development defect category and drives the root cause for all development defects; and impact can be used to prioritize and cluster groups of defects based on the defects' effect on the system and also on the type of effect – safety, robustness, maintainability, etc. A short justification for the attributes not used is provided next:

 Target – we claim that this attribute is not needed for orthogonality, as Activity/Trigger/Type is sufficient to derive the target of a defect. Target simply represents the high-level artefact that was fixed, for example code, design or requirements. Nonetheless, it is additional information that may provide useful support in some specific situations;

 Qualifier – we opted not to use this attribute as our defect sources already specify this information in the description of each defect. Qualifier is simply the type of code fix: addition of missing code, fixing of existing incorrect code, or removal of extraneous code;


• Age – this attribute seems to be useful when describing defects that occur during development, but not when classifying defects of an already developed product, as in our ISVV case, hence we opted not to use it. Age can be useful to compare the evolution of defect sets over different releases and to study the changes of defect types due to the evolution of technologies, design or programming languages, etc. This is generally covered by the defects description;
• Source – this attribute captures the origin of the code that had the defect (developed in house, reused from a library, etc.) and was not classified because it captures very specific information that is not useful for root cause analysis of groups of defects (unless these groups include the software type, but this information is already included in the defect descriptions anyway).

The above considerations do not imply that these attributes are not useful, as they surely provide additional information that can be used, although some of that information is already included in the defects description. However, it is also essential that the classification process is simple and contains the most relevant attributes, so that it can be efficiently applied in industry and adopted by the engineering teams. Thus, for the approach to be used by industry in a regular way, it should be kept as simple as possible and use the essential attributes only. Our strategy to perform the ODC enhancement started by applying the original classification to the dataset, then noting the classification difficulties, later aggregating and clustering the non-classified defects and harmonizing them, and finally defining new attribute values or merging existing ones. A validation of the proposed enhanced taxonomy (for type, trigger and impact) was conducted to confirm its fit. The set of defects that were not originally classified, or that were classified with attributes that changed, was classified/reclassified by applying the new taxonomy. The first step to be performed when an organization decides to implement ODC is, as specified in ODC v5.2, to map activities to triggers. Although triggers are given by the ODC specification, the ODC activities are meant to be customizable, defined by each organization according to its approach to defect detection and removal (e.g., workflow, processes and the applicable lifecycles). Table 12 presents a set of activities performed at industrial level related to ISVV of critical systems and the mapping to the set of ODC triggers. The Activity attribute was extracted directly from the ISVV activities and helps in identifying the introduction phase of the defects. Then, for each activity a set of triggers is mapped; these triggers are related to the nature of the ISVV tasks performed for every activity (described in detail in [65]) and have been harmonized in Table 12 to include the ODC set of triggers. As shown, some activities are mapped mostly to documentation/inspection related triggers, as is the case of Requirements and Design verification, while others are related to testing or dynamic execution triggers, as are the cases of Test Verification and Test Execution.


Table 12: Mapping between activities and triggers

Activity | Triggers
Requirements verification | Standards conformance; Traceability/Compatibility; Consistency/Completeness
Design verification | Design conformance; Standards conformance; Traceability/Compatibility; Logic/Flow; Concurrency; Consistency/Completeness
Code verification | Standards conformance; Traceability/Compatibility; Logic/Flow; Concurrency; Consistency/Completeness
Test verification | Consistency/Completeness; Logic/Flow; White box path coverage; Test coverage; Test variation; Test sequencing; Test interaction; Workload/Stress
Test execution | White box path coverage; Test coverage; Test variation; Test sequencing; Test interaction; Workload/Stress; Blocked test
Operation monitoring | Design conformance; Workload/Stress; Start-up/Restart; HW/SW Configuration
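Although the thesis does not prescribe any specific tooling for this step, the activity-to-trigger mapping of Table 12 lends itself to a simple lookup structure that can be used to sanity-check classifications. The following minimal Python sketch is purely illustrative; the names and the helper function are hypothetical and not part of the actual classification process:

    # Hypothetical encoding of Table 12: ISVV activities mapped to admissible triggers.
    ACTIVITY_TRIGGERS = {
        "Requirements verification": {
            "Standards conformance", "Traceability/Compatibility",
            "Consistency/Completeness",
        },
        "Design verification": {
            "Design conformance", "Standards conformance", "Traceability/Compatibility",
            "Logic/Flow", "Concurrency", "Consistency/Completeness",
        },
        "Code verification": {
            "Standards conformance", "Traceability/Compatibility", "Logic/Flow",
            "Concurrency", "Consistency/Completeness",
        },
        "Test verification": {
            "Consistency/Completeness", "Logic/Flow", "White box path coverage",
            "Test coverage", "Test variation", "Test sequencing",
            "Test interaction", "Workload/Stress",
        },
        "Test execution": {
            "White box path coverage", "Test coverage", "Test variation",
            "Test sequencing", "Test interaction", "Workload/Stress", "Blocked test",
        },
        "Operation monitoring": {
            "Design conformance", "Workload/Stress", "Start-up/Restart",
            "HW/SW Configuration",
        },
    }

    def trigger_is_valid(activity: str, trigger: str) -> bool:
        """Return True if the trigger is admissible for the given ISVV activity."""
        return trigger in ACTIVITY_TRIGGERS.get(activity, set())

    # A defect reported during Requirements verification should not carry a
    # testing-related trigger such as 'Test coverage'.
    assert trigger_is_valid("Requirements verification", "Traceability/Compatibility")
    assert not trigger_is_valid("Requirements verification", "Test coverage")

Such a check is only a convenience for classifiers; the mapping itself is the one defined in Table 12.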

In practice, using ODC consists of applying it to the issues to support the analysis and feedback of defect data targeting quality issues in software design, code and documentation. Taking into account the information on the issues, ODC identifies a defect type and the relevant trigger (assessment techniques, testing, analysis methods, etc.) for each defect. The results can be used for statistical purposes (e.g. measuring improvements), as well as for in-process monitoring and reliability assessment (required for critical systems). They are also frequently used to promote specific process and resources improvements by tackling the identified issues directly. The analysis of the classification results supports the ODC adaptation: once the classification work has been performed, the obtained classifications are analyzed and

the results are used to propose adaptations of the classification taxonomy (type, trigger and impact), if necessary, to align it with the domain. Classification recommendations from experts and data clustering analysis can be used to identify classification patterns. The results of re-classifying the issues with the newly proposed taxonomy allow the validation of the recommendations. The adaptations proposed for the classification taxonomy, once applied and adjusted to the nature of critical systems defects, shall lead to more straightforward classification of the defects and thus simplify the classification tasks. Further adaptations can be fed back into the proposed classification taxonomy and lead to a re-classification of the affected issues. The ultimate goal of the process is to analyze the classification results and provide a summary of the main findings, in particular concerning the distributions of defect types and triggers. This information gives the first hints about the quality of the dataset (defect frequencies, impacts, distributions), which can provide some quick feedback to the implementation (defect types results) and V&V teams (defect triggers results).

5.2 ODC Classification Results

The ODC classification of the issues identified by the ISVV teams (meaning that the assessment was made on quite mature software artefacts by independent experts, so these are not regular software engineering lifecycle issues) and during operation has shown that, of the 1070 ISVV issues, only 731 could be correctly classified considering the ODC type, trigger and impact attributes (the remaining 31.7% justify an improved, more applicable ODC taxonomy so that they can be orthogonally classified). Table 13 presents the results of the classification of the 731 defects using the standard ODC. We can observe that the main classified types of defects are Documentation (36.11%), Function/Class/Object (21.34%) and Algorithm/Method (11.35%). These three defect types (which cover more than two thirds of the defects) arise from the fact that the systems under analysis are heavily based on documentation, and thus the larger set of defects is naturally of the documentation type. Then, several defects are related to the functions and the proper implementation of algorithms, being more “functional” defects. The main classified triggers for the defects under analysis are Document Consistency/Completeness (Internal Document) (22.30%), Test Coverage (19.84%) and Backward Compatibility (19.29%). These three triggers uncovered almost two thirds of the defects. It is worth mentioning that, while the first trigger is evident due to the nature of the artefacts under assessment, the second one is related to the fact that most of the testing activities performed for space systems tend to prove coverage of the requirements, and the third trigger is related to traceability analysis results (backward traceability checks). The impacts identified are Capability (30.23%), Reliability (24.76%), Maintainability (18.74%) and Documentation (18.60%). These impacts cover more than 90% of all the

impacts in the dataset. Capability relates to limited functioning of the system, Reliability to dependability properties that are not met, and Maintainability to updates, patches and maintenance activities; these three properties are of significant severity because they directly affect the system dependability. Documentation is a minor severity impact related to documentation discrepancies.

Table 13: Original ODC classification results (731 defects)

Defect Type | Qty | %
Documentation | 264 | 36.11%
Function/Class/Object | 156 | 21.34%
Algorithm/Method | 83 | 11.35%
Checking | 48 | 6.57%
Interface | 48 | 6.57%
Understandability | 35 | 4.79%
Environment | 35 | 4.79%
Assignment/Initialization | 28 | 3.83%
Timing/Serialization | 26 | 3.56%
Build/Package | 8 | 1.09%
Total | 731 | 100%

Defect Trigger | Qty | %
Document Consistency/Completeness (Internal Document) | 163 | 22.30%
Test Coverage | 145 | 19.84%
Backward Compatibility | 141 | 19.29%
Operational Semantics (Understanding flow) | 95 | 13.00%
Design Conformance | 86 | 11.76%
Lateral Compatibility | 46 | 6.29%
Combinatorial Path Coverage (Complex Path) | 20 | 2.74%
Rare Situation | 15 | 2.05%
Language Dependencies | 7 | 0.96%
Test Sequencing | 7 | 0.96%
Recovery/Exception | 3 | 0.41%
Test Interaction | 1 | 0.14%
Test Variation | 1 | 0.14%
White Box Path Coverage | 1 | 0.14%
Total | 731 | 100%

Defect Impact | Qty | %
Capability | 221 | 30.23%
Reliability | 181 | 24.76%
Maintainability | 137 | 18.74%
Documentation | 136 | 18.60%
Performance | 28 | 3.83%
Usability | 19 | 2.60%
Migration | 6 | 0.82%
Standards | 2 | 0.27%
Installability | 1 | 0.14%
Total | 731 | 100%

5.3 Proposed Adaptations (ODC Enhancements)

This section presents the proposed adaptations to the original ODC (v5.2) to make the defect type, defect trigger and defect impact attributes better suited for classifying defects in critical systems. The mapping tables in this section make the changes from the standard ODC taxonomy visible: values that keep their name are unchanged, values mapped to a different name are merged or renamed, and values with no counterpart in one of the columns are either removed from or new in the adapted taxonomy.

5.3.1 ODC Attributes – Activity

The ODC specification defines activities as defect removal activities. In our understanding, there is a benefit in expanding this definition to also encompass defects that appear in field operation; hence, we added the ‘operation monitoring’ activity.


The list of activities comprises the ones performed during the ISVV phases described in Section 2.2, plus operation monitoring: a) Requirements verification; b) Design verification; c) Code verification; d) Test verification; e) Test execution; and f) Operation monitoring. See Table 12 for the full list of considered Activities. The main divergence from the standard ODC is the inclusion of test verification and operation monitoring as activities that can be a source of reported defects. Test verification was included as it is an activity extensively performed in ISVV and other defect detection activities, and was not considered in the standard ODC. Operation monitoring simply represents the issues that have been identified after the system is in operation, that is, the issues or defects detected during system execution. In order to properly accommodate these activities, the taxonomy of the other attributes was also adapted (such as the Testability/Verifiability impact and the Documentation type). For more details please refer to the modification descriptions in Sections 5.3.2, 5.3.3 and 5.3.4.

5.3.2 ODC Attributes – Type

The type attribute represents where the defect was fixed. Since it was not possible to determine the type of all defects with the original ODC taxonomy, some gaps were identified; most of the changes aim to simplify the classification and avoid confusion, and a new type was added in order to be able to classify some of the defects. We adapted the ODC v5.2 classification and extended it with a new value when appropriate for our needs, as follows:
• Algorithm/Method, Checking, Function/Class/Object, Timing/Serialization, Documentation – same meaning as in the original ODC v5.2.
• Assignment/Initialization – similar to the original ODC v5.2, but extendable to cases where, for instance, variable names are changed to comply with coding standards or coding rules (frequently required for critical systems).
• Build/Package/Environment – new classification to be applied to defects related to the build process, packaging of data/functionality, and environment setup or configuration. Libraries that are never used or large modules of dead code should also be classified here. Note that cases of code paths that are never reached and with a small scope, such as inside a function, should be classified with the ‘Function/Class/Object’ type, by default.
• Interface – this classification is the result of merging ‘Interface’ and ‘Relationship’ into one single type, as both relate to interfacing (internal or external) and the encountered cases were all related to interface problems, even when they were a relationship issue.


• Understandability – removed and merged with ‘Documentation’, as this type raised confusion during the classification activities and all the encountered examples could be covered by documentation fixes.

Table 14 depicts the mapping of the standard ODC type taxonomy to the proposed adaptation of the ODC type attribute values.

Table 14: Standard ODC Type to Adapted Taxonomy

Standard ODC Defect Type | Adapted ODC Defect Type
Algorithm/Method | Algorithm/Method
Assignment/Initialization | Assignment/Initialization
– | Build/Package/Environment (new)
Checking | Checking
Documentation | Documentation
Understandability | Documentation
Function/Class/Object | Function/Class/Object
Interface | Interface
Relationship | Interface
Timing/Serialization | Timing/Serialization

5.3.3 ODC Attributes – Trigger

Triggers classify what actions or checks can reveal the defect. Some changes to the triggers were made with respect to the standard ODC specification in order to simplify and streamline the classification as much as possible (for each trigger, a small description provides the rationale in order to better clarify when to use it):
• Design conformance – trigger indicating that the defect was detected while comparing the design, code or tests with their specifications and assessing the conversion of the specification into design or implementation (similar to ODC v5.2).
• Standards conformance – this trigger replaces the original ‘Language Dependency’ trigger, renaming it and broadening its scope to better suit issues in critical systems (often based on standards). It is applicable to defects that arise when checking items for standards compliance (which typically does not exist for the systems for which ODC was originally defined). This includes requirements not written according to specific rules, and implementation concerns such as deviation from best practices. These issues may arise from manual or tool assisted inspection.
• Logic/Flow – this trigger identifies incorrect flow of logic or data in the design, implementation or procedure details (similar to ODC v5.2).


• Traceability/Compatibility – this trigger replaces both ‘Backward Compatibility’ and ‘Lateral Compatibility’ of the ODC v5.2 specification. It is applicable in cases where traceability is unclear or missing, or where system blocks have compatibility issues. This merge and adaptation was deemed necessary to cover specific requirements related to critical systems and requirements imposed by standards that extensively use traceability.
• Consistency/Completeness – this trigger replaces the ‘Internal Document’ trigger, providing a more appropriate terminology, the one critical systems engineers are used to. Defects related to incorrect information, inconsistency or incompleteness should be mapped here. This trigger is mostly related to inspections and assessment activities.
• Rare situation – this trigger is the result of merging both ‘Side Effects’ and ‘Rare Situation’ from the ODC v5.2 specification. We did not find it relevant to separate the two in our case study due to their low frequency and similarity.
• White box path coverage – merges ‘Simple path coverage’ and ‘Complex path coverage’; it is applicable in unit testing when the tester is trying to exercise specific code paths, which is very common in critical systems and generally governed by the testing strategies. Only path coverage is now considered, without distinguishing between simple and complex paths, because no advantage was seen in the distinction and it is not easy to classify the complexity of path coverage (simplification of the classification).
• Concurrency, Test coverage, Test variation, Test sequencing, Test interaction, Workload/Stress, Start-up/Restart, Recovery/Exception, Blocked test – same meaning as in ODC v5.2; no modifications required.
• HW/SW configuration – merges ‘Hardware configuration’ and ‘Software configuration’ to cover configuration issues at large, as no major difference was found that requires keeping them separate. Also, hardware and software in embedded systems are strongly coupled, thus making such a distinction while classifying issues may raise many doubts without relevant added value.

Table 15 shows the mapping of the standard ODC taxonomy for triggers with our proposed adaptation.


Table 15: Standard ODC Trigger to Adapted Taxonomy

Standard ODC Defect Trigger | Adapted ODC Defect Trigger
Design conformance | Design conformance
Logic/Flow | Logic/Flow
Backward compatibility | Traceability/Compatibility
Lateral compatibility | Traceability/Compatibility
Concurrency | Concurrency
Internal document | Consistency/Completeness
Language dependency | Standards conformance
Side effects | Rare situation
Rare situation | Rare situation
Simple path | White box path coverage
Complex path | White box path coverage
Test coverage | Test coverage
Test variation | Test variation
Test sequencing | Test sequencing
Test interaction | Test interaction
Workload/Stress | Workload/Stress
Recovery/Exception | Recovery/Exception
Start-up/Restart | Start-up/Restart
Hardware configuration | HW/SW configuration
Software configuration | HW/SW configuration
Blocked test | Blocked test

5.3.4 ODC Attributes – Impact

This attribute depicts the impact that the defect would have had upon the end user if it had not been detected during ISVV (or during the defect detection phase) or, in the case of defects detected during operation, the impact of the resulting failure. The adaptations proposed to the impact attribute are the following:
• Capability, Documentation, Installability, Integrity/Security, Migration, Performance, Reliability, Requirements, Standards, Usability – same meaning as in ODC v5.2; no modification required.
• Maintenance – merged ‘Serviceability’ into this impact value, as for critical systems the definitions of the two are similar and both refer to diagnosing issues and applying corrective/preventive actions.
• Safety – added for the special cases where defects in critical systems can directly impact the safety of humans or of the environment (these are specific requirements for many critical systems). There are sets of requirements that are exclusively related to safety, and any misinterpretation or failure of these


requirements will endanger the system safety; thus, these situations need to be carefully analyzed for these systems.
• Security – new taxonomy element to cover the growing security and cybersecurity concerns of modern systems. Security analyses are not yet very common for safety critical systems, but there is a growing concern as systems become connected and interface with more and more systems. In our study we had mostly on-board systems, so no security flaws have been detected.
• Testability/Verifiability – added to fulfil the need to classify defects with an impact on the testability/verifiability of the systems. This is important for conformance with the applicable standards in critical systems, since testability and verifiability are commonly strict requirements that need to be part of the system.
• Accessibility – removed; this impact was related to ensuring that successful access to information and use of information technology is provided to people who have disabilities. For our case study, and for most critical systems, this impact is not applicable and can be covered by either the “Safety” or the “Capability” impacts, depending on the situation and the applicable requirements.

Table 16 presents the mapping of the standard ODC impact taxonomy to the proposed adaptation.

Table 16: Standard ODC Impact to Adapted Taxonomy

Standard ODC Defect Impact | Adapted ODC Defect Impact
Accessibility | – (removed)
Capability | Capability
Documentation | Documentation
Installability | Installability
Integrity/Security | Integrity/Security
Maintenance | Maintenance
Serviceability | Maintenance
Migration | Migration
Performance | Performance
Reliability | Reliability
Requirements | Requirements
– | Safety (new)
Standards | Standards
– | Testability/Verifiability (new)
Usability | Usability
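Because most values keep their name, the adaptations of Tables 14 to 16 can be summarised as simple value substitutions, which is also how the re-classification of already classified defects can be mechanised. The sketch below is illustrative only, assuming that listing the changed values is sufficient and that every other value maps to itself; the names are hypothetical and not taken from the thesis tooling:

    # Standard-to-adapted substitutions (Tables 14-16); unlisted values are unchanged.
    TYPE_MAP = {
        "Understandability": "Documentation",
        "Relationship": "Interface",
        # 'Build/Package/Environment' is a new adapted type with no standard counterpart.
    }
    TRIGGER_MAP = {
        "Backward compatibility": "Traceability/Compatibility",
        "Lateral compatibility": "Traceability/Compatibility",
        "Internal document": "Consistency/Completeness",
        "Language dependency": "Standards conformance",
        "Side effects": "Rare situation",
        "Simple path": "White box path coverage",
        "Complex path": "White box path coverage",
        "Hardware configuration": "HW/SW configuration",
        "Software configuration": "HW/SW configuration",
    }
    IMPACT_MAP = {
        "Serviceability": "Maintenance",
        # 'Accessibility' is removed; 'Safety' and 'Testability/Verifiability' are new
        # values that are assigned directly, not derived from a standard value.
    }

    def to_adapted(mapping: dict, value: str) -> str:
        """Map a standard ODC value to the adapted taxonomy (identity by default)."""
        return mapping.get(value, value)

    # Example: re-mapping a defect classified with the original ODC taxonomy.
    legacy = {"type": "Understandability", "trigger": "Backward compatibility",
              "impact": "Serviceability"}
    adapted = {attr: to_adapted(m, legacy[attr])
               for attr, m in (("type", TYPE_MAP), ("trigger", TRIGGER_MAP),
                               ("impact", IMPACT_MAP))}
    # adapted == {'type': 'Documentation', 'trigger': 'Traceability/Compatibility',
    #             'impact': 'Maintenance'}

A mechanical substitution like this only covers renamed or merged values; defects that could not be classified originally were re-classified against the new taxonomy, as described in Section 5.5.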


5.4 Enhanced ODC Classification Results

The results of the application of the enhanced ODC to the space defects dataset are summarized in Table 17. The top 5 defect types, triggers and impacts cover about 90% of the issues analyzed. This observation suggests that actions can be taken to quickly improve the quality of systems by tackling a limited number of properties.

Table 17: Enhanced ODC classification results (1070 defects)

Defect Type | Qty | %
Documentation | 515 | 48.13%
Function/Class/Object | 203 | 18.97%
Algorithm/Method | 96 | 8.97%
Checking | 69 | 6.45%
Interface | 56 | 5.23%
Build/Package/Environment | 52 | 4.86%
Assignment/Initialization | 46 | 4.30%
Timing/Serialization | 33 | 3.08%
Total | 1070 | 100%

Defect Trigger | Qty | %
Traceability/Compatibility | 309 | 28.88%
Test Coverage | 227 | 21.21%
Consistency/Completeness | 206 | 19.25%
Logic/Flow | 119 | 11.12%
Design Conformance | 119 | 11.12%
Rare Situation | 26 | 2.43%
Test Sequencing | 16 | 1.50%
Standards Conformance | 14 | 1.31%
HW/SW Configuration | 13 | 1.21%
Recovery/Exception | 10 | 0.93%
Test Interaction | 4 | 0.37%
Test Variation | 3 | 0.28%
Start-up/Restart | 2 | 0.19%
Concurrency | 1 | 0.09%
White Box Path Coverage | 1 | 0.09%
Total | 1070 | 100%

Defect Impact | Qty | %
Capability | 308 | 28.79%
Maintenance | 264 | 24.67%
Reliability | 252 | 23.55%
Documentation | 157 | 14.67%
Performance | 39 | 3.64%
Usability | 28 | 2.62%
Requirements | 9 | 0.84%
Migration | 8 | 0.75%
Standards | 4 | 0.37%
Installability | 1 | 0.09%
Total | 1070 | 100%

The ‘Documentation’ defect type now represents almost half of the defects, and ‘Function/Class/Object’ represents almost 20% of the defects. This can be justified by the fact that critical systems highly depend on documentation and documented evidence to prove the accomplishment of requirements and standards and to ensure the qualification/certification of the systems by external entities. Furthermore, some of the defects not classified in the first round (with the original ODC taxonomy) have now been classified with the ‘Documentation’ type, and those that would previously have been of the ‘Understandability’ defect type are also classified as ‘Documentation’ due to the merge operated in the enhanced ODC taxonomy. ‘Function/Class/Object’ identifies functionality deficiencies, especially at the implementation level. For the defect triggers, the results differ somewhat from the previous classification. Since ‘Traceability’ is now clearly identified as a trigger, ‘Traceability/Compatibility’ became the most frequent trigger (28.88%), as it is also one of the most used defect finding activities for critical systems. ‘Test Coverage’ (21.21%) is similar to the previous classification, and ‘Consistency/Completeness’ (19.25%) is still a quite frequent trigger because many of the artefacts under analysis are documents, or are documented, and require consistency and completeness checks.


For the defect impacts, ‘Capability’ (28.79%), ‘Maintenance’ (24.67%), ‘Reliability’ (23.55%) and ‘Documentation’ (14.67%) still represent over 90% of the impacts altogether. Next, we present a detailed analysis of the classification results, providing a summary of the main findings. As mentioned before, this gives the first hints about the dataset, which can provide some quick feedback to the implementation and V&V teams.
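The distributions discussed in the following subsections are plain frequency counts over the classified attributes. A minimal sketch of how such a distribution could be computed is given below; the helper and the toy records are hypothetical and only assume that each classified defect is represented with 'type', 'trigger' and 'impact' fields:

    from collections import Counter

    def attribute_distribution(defects, attribute):
        """Return (value, count, percentage) tuples sorted by decreasing frequency."""
        counts = Counter(d[attribute] for d in defects)
        total = sum(counts.values())
        return [(value, qty, 100.0 * qty / total) for value, qty in counts.most_common()]

    # Toy example (not the thesis dataset):
    defects = [
        {"type": "Documentation", "trigger": "Traceability/Compatibility", "impact": "Capability"},
        {"type": "Documentation", "trigger": "Consistency/Completeness", "impact": "Maintenance"},
        {"type": "Function/Class/Object", "trigger": "Test coverage", "impact": "Reliability"},
    ]
    print(attribute_distribution(defects, "type"))
    # [('Documentation', 2, 66.66...), ('Function/Class/Object', 1, 33.33...)]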

5.4.1 Defect Type Results

The defect type is classified according to the fix that will remove the defect. If the defect has already been fixed (at the moment of the analysis), then it is quite straightforward to determine its type. As observed in Table 17, there are 8 different types of defects, from the most frequent (Documentation), with 48.13% of the cases, to the least frequent (Timing/Serialization), with only 3.08%. However, for the root cause analysis we focus on every defect type, even the least frequent, since any failure can compromise a mission, with severe consequences. Table 18 presents the relation of every defect type to the classified impacts.

Table 18: Specific Impact distribution for every defect type

IMPACT | Documentation | Function/Class/Object | Algorithm/Method | Checking | Interface | Build/Package/Environment | Assignment/Initialization | Timing/Serialization
Capability | 20.97% | 50.74% | 46.88% | 8.70% | 26.79% | 15.38% | 19.57% | 42.42%
Reliability | 13.59% | 24.63% | 33.33% | 69.57% | 23.21% | 36.54% | 28.26% | 21.21%
Maintenance | 31.26% | 17.24% | 11.46% | 8.70% | 28.57% | 36.54% | 26.09% | 12.12%
Documentation | 29.32% | 0.49% | 1.04% | 0.00% | 1.79% | 3.85% | 0.00% | 3.03%
Performance | 1.55% | 2.96% | 6.25% | 10.14% | 0.00% | 1.92% | 15.22% | 12.12%
Usability | 0.58% | 2.96% | 1.04% | 1.45% | 19.64% | 1.92% | 6.52% | 6.06%
Requirements | 1.75% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00%
Migration | 0.58% | 0.00% | 0.00% | 1.45% | 0.00% | 1.92% | 4.35% | 3.03%
Standards | 0.39% | 0.99% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00%
Installability | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 1.92% | 0.00% | 0.00%

The following paragraphs present an analysis for each type of defect (later in the thesis we will identify their primary root causes):
• Documentation (48.13%): Documentation is an essential asset for these systems and is mandatory according to the standards (ECSS). It represents essential artefacts for the system implementation that are passed from phase to phase, starting from system specification and finishing with acceptance, operation and maintenance. It is important to highlight the documentation defects that have an impact on Maintenance – 31.26% (quite important


for space systems due to the frequent changes and corrections that are required when the spacecraft is already in orbit, such as patches and dumps), Capability – 20.97%, and Reliability – 13.59% (which represent essential properties of space systems: the correct implementation of functional and non-functional properties).
• Function/Class/Object (18.97%): This type mainly represents the changes that need to be applied to the system functionality to correct non-compliances with requirements. In short, it is the defect type that represents implementation problems. The analysis shows that Function defects are related to the Capability of the system – 50.74% (this is absolutely natural, since the defect type indicates a functional error), Reliability – 24.63% (many functionality problems have consequences on the reliability of critical systems), and Maintenance – 17.24% (autonomous and dynamic systems frequently require remote corrections and updates).
• Algorithm/Method (8.97%): These defects are usually the result of efficiency or correctness problems that affect the task and can be fixed by (re)implementing an algorithm or a method. They are also related to system Capability – 46.88%, Reliability – 33.33%, and Maintenance – 11.46%.
• Checking (6.45%): These defects are the result of the omission or incorrect validation of parameters or data, usually in conditional statements. Checking defects are related to Reliability – 69.57%, Performance – 10.14%, Capability – 8.70%, and Maintenance – 8.70%. Since Reliability is mostly achieved through Fault Detection Isolation and Recovery (FDIR), error checking and redundancy, checking defects have a profound impact on Reliability.
• Interface (5.23%): These defects are the result of communication problems between subsystems, modules, components, the operating system, or device drivers, requiring changes, for example, to macros, call statements, control blocks, parameter lists, or shared memory. Interface defects relate to Maintenance – 28.57%, Capability – 26.79%, Reliability – 23.21%, and Usability – 19.64%. The latter is mostly due to the fact that interfaces are related to the operation, control and usability of the system.
• Build/Package/Environment (4.86%): These defects are the result of problems in the build process and in change management and version control. They relate to Maintenance – 36.54%, Reliability – 36.54%, and Capability – 15.38%. The processes and tools used for these systems have a significant impact on such activities.
• Assignment/Initialization (4.30%): Assignment/Initialization defects are the result of values not assigned or incorrectly assigned, as well as wrong initializations. These defects relate to Reliability – 28.26%, Maintenance – 26.09%, Capability – 19.57%, and Performance – 15.22%. The main impact is on the system reliability, as some of these defects might stay undetected until the occurrence of some very specific functional or non-functional situations.


• Timing/Serialization (3.08%): These defects are the result of timing errors between systems, modules or components, or of problems accessing shared resources. They relate mainly to Capability – 42.42% and Reliability – 21.21%, as well as Performance – 12.12% and Maintenance – 12.12%.

5.4.2 Defect Trigger Results

Some of the root causes for the defects detected during ISVV are related to problems in the efficiency of the verification and validation activities applied during development. When we look at the triggers of the enhanced ODC, we might question why those defects were not caught earlier by the development or V&V teams. Some questions arise in this case: is independence so important that it makes it easier to find defects? Why were the same techniques used by the ISVV team not applied at an earlier stage of the project? Or were they applied ineffectively? This section provides an analysis of the triggers that detected the 1070 defects at the ISVV stage of the projects. Since the list of triggers is extensive, we will focus on the most relevant ones in terms of number of detected defects. This analysis contributes to pinpointing the weaknesses of the regular development V&V activities and provides suggestions to change/improve V&V activities so that they become more efficient and detect more problems before they are passed on to the ISVV teams (this will be done specifically in Chapter 6). Table 19 summarizes the impacts of the defects identified by specific triggers.

Table 19: Specific Impact distribution for every defect trigger

IMPACT | Traceability/Compatibility | Test Coverage | Consistency/Completeness | Logic/Flow | Design Conformance
Capability | 72.28% | 25.49% | 95.00% | 18.95% | 35.64%
Reliability | 14.98% | 37.75% | 2.00% | 21.05% | 36.63%
Maintenance | 4.12% | 24.51% | 1.50% | 14.74% | 9.90%
Documentation | 0.75% | 2.45% | 0.00% | 42.11% | 15.84%
Performance | 7.87% | 9.80% | 1.50% | 3.16% | 1.98%

The following paragraphs present an analysis for each defect trigger:
• Traceability/Compatibility (28.88%): The Traceability/Compatibility trigger allows the detection of almost one third of all the defects. Such activities look for inconsistencies of information across phases, incomplete or outdated traceability matrices, or untraced artefacts. Regular V&V activities should be enough to detect these defects during development. Apparently, a clearly stated and formally implemented traceability analysis is required and, if supported by the appropriate toolset, it would not require significant additional


effort from the development and V&V teams. 72.28% of the defects identified by this trigger have a Capability impact, and 14.98% have an impact on Reliability.
• Test Coverage (21.21%): Test Coverage is the trigger that evaluates the completeness of the tests performed to validate the different units or functionalities by considering different values or possibilities. As expected, a significant number of defects are uncovered by Test Coverage activities. A significant number of Reliability (37.75%) impact defects are uncovered, but also Maintenance (25.51%) and Capability (25.49%) ones.
• Consistency/Completeness (19.25%): Another trigger with a significant impact, related to inspections and reviews, is Consistency/Completeness. All project artefacts must be consistent with each other, as well as complete. Due to the number of artefacts required by the lifecycle process and by the applicable standards (ECSS), consistency and completeness analyses of these artefacts very frequently reveal discrepancies. This trigger finds mostly defects with an impact on Capability (95.00%).
• Logic/Flow (11.12%): Logic and Flow analyses are triggers that allow the detection of control and data flow defects by assessing the logical paths of the software, for program flow and control but also for data and variable flow. This trigger finds mostly defects with an impact on Documentation (42.11%).
• Design Conformance (11.12%): The Design Conformance trigger is a specific set of activities applied to the architecture that allows the review of design artefacts against the specifications and the environment constraints. This trigger finds mostly Reliability (36.63%) and Capability (35.64%) related defects.

5.4.3 Defect Impact Results

The results of the defect impacts are a reflection of the severity of the defects, even if the majority will never have a real impact because they will be solved and prevented beforehand. The distributions of the impacts include Capability, Maintenance and Reliability as the most common ones, then Documentation with a still significant frequency, and finally Performance, Usability, Requirements, Migration, Installability and Standards, with a much lower frequency. The ODC Impact analysis can be used to prioritize the defect types/triggers, identifying the development and V&V activities that might lead to defects with a high impact on the system. As "high impact" we consider equally the impacts on Capability, Reliability and Maintenance, as they are the most severe: they represent three essential requirements of critical space systems, namely functional quality, non-functional reliability assurance, and maintainability. The following paragraphs present an analysis for each defect impact:


• Capability (28.79%): It is normal that Capability (i.e. functionality) is the most affected property, but in space critical systems maintenance has a significant importance, as do the reliability requirements. Capability represents the defects that will affect the functionality of the system without having an impact on the non-functional properties. These can be seen as the normal “bugs” that will lead the system to work with limitations or not properly, due to wrong design or wrong implementation affecting the normal functioning.
• Maintenance (24.67%): A large amount of the maintenance defects (from an impact perspective) originate in the implementation phase (mostly source code and source code documentation); thus, the larger the amount of source code defects, the larger this impact will be, as it affects the future maintenance of the source code itself.
• Reliability (23.55%): The defects that have an impact on Reliability are extremely important, as they represent problems that affect one of the essential properties of safety critical systems (not necessarily a functional property). The nature of the systems and defects selected for this work led to this high value, as some of the failures could have more than a Capability (or functional) impact and could affect dependability properties such as Reliability and Safety. Safety does not appear among the enhanced ODC results because the original dataset of defects did not lead to any safety impact, but this was considered during the analysis of the enhanced ODC classification results as a possibility for future datasets. The same situation may apply to security-related defects.
• Documentation (14.67%): As in all systems that heavily depend on documentation, there are some minor defects that will impact only the documentation. This is the case for critical systems, where documentation is required at every lifecycle phase in the form of artefacts to specify, design or document evidence and results of the process.
• Performance (3.64%): Impact of defects that affect the system performance, namely defects that increase the CPU load or the memory utilization, or that increase the application timings due to extensive or unintended data or control flows originating from the defect.
• Usability (2.62%): These critical systems have a rather low number of defects that impact Usability, but they still exist. Other systems will certainly have larger percentages for this impact, in particular systems with user interfaces or command and control consoles. Our dataset consisted mostly of embedded, automated and autonomous systems, thus this value is quite low.
• Requirements (0.84%): Some of the defects affected the requirements and their specification. This can happen up to the validation and testing phases, but these are quite rare cases for critical systems due to the care that is taken to have mature and stable sets of requirements and milestones to review those specifications extensively.


• Migration (0.75%): This is a rare impact, identified when a system or software is reused in a different environment, migrated to different hardware or configured in a different manner.
• Standards (0.37%): Some impacts are related to the application or interpretation of standards during the system development lifecycle. There are several levels of standards that can be involved, namely the domain specific standards (e.g., the ECSS), the generic level standards (the Quality Assurance standards or ISO 9001) and the specific engineering standards (requirements standards, coding standards). Defects with an impact on standards can provide feedback to improve the standards or make them more precise.
• Installability (0.09%): Impact originating from defects that affect the installation, the configuration or the modification for safe use of the systems.

5.4.4 Combined Results

The previous sections have presented the main results from the defect type, defect trigger and defect impact points of view. These results provide a good overview of the system quality and already indicate some of the weaknesses that need to be tackled to make the systems better and to avoid or prevent either most of the defects or the defects with more severe impacts. Next, we will provide a more integrated view of the defects that have slipped through review and defect detection phases of the lifecycle, and show how the results presented before integrate as a whole.

5.4.4.1 Defect Introduction vs Detection Phase

To support the RCA, we have analyzed the introduction versus detection phases of the defects. If a defect is not detected during, or right after, the phase in which it was inserted, that means that the V&V or defect detection activities of at least one lifecycle phase have failed. Table 20 presents the ISVV activities that detected the defects introduced in previous stages and that slipped through at least one defect detection activity. The top heading ("Phase of Introduction") represents the development phase in which the defects have been introduced, while the first column contains the activities (and phases) in which the defects have actually been detected. Each row contains the number of defects that have been detected in the phase specified in the first column and that originated in the phase specified in the first row of the table (Requirements, Design, etc.).


Table 20: Phase of introduction versus phase of detection

Phase of Detection \ Phase of Introduction | Requirements | Design | Implementation | UT/IT Tests | System Tests | Operation | Total late detected | Total
Requirements Verification | 162 | | | | | | - | 162
Design Verification | 6 | 106 | | | | | 6 | 112
Implementation Verification | 10 | 77 | 290 | | | | 87 | 377
UT/IT Verification | 18 | 0 | 9 | 351 | | | 27 | 378
System Tests Verification | 2 | 0 | 0 | 9 | 10 | | 11 | 21
Operation Monitoring | 1 | 6 | 0 | 0 | 9 | 4 | 16 | 20
Total late detected | 37 | 83 | 9 | 9 | 9 | - | 147 | -
Total | 199 | 189 | 299 | 360 | 19 | 4 | - | 1070

The large majority of defects (86%) were detected right after being introduced (the diagonal of Table 20). However, a significant number of defects (147) escaped both V&V and ISVV, being caught by later ISVV activities only (the cells below the diagonal). In Table 20 we divided the testing activities into two phases, as they provide an additional view showing that even within the testing activities there are defects that could have been caught earlier. We can observe that a relevant number of the defects that escaped previous ISVV or V&V activities were detected during Implementation Verification (60%): 10 defects were introduced during Requirements and 77 during Design. A closer look, shown in Table 21, reveals that almost 80% are Documentation defects and 10% are Function defects, which is in line with the overall results presented earlier in this chapter. Thus, tackling documentation issues might greatly reduce defect propagation. The most important observation from the results in Table 21 is related to the defect triggers. Traceability/Compatibility accounts for 60% of the detected defects, while Design Conformance and Consistency/Completeness account for 20% each. As source code is more detailed and more concrete than architectural design components and requirements descriptions, it is normal that traces can detect more inconsistencies, especially missing or incoherent information. The defects detected during the testing phases that originated in previous phases include 20 defects injected during requirements specification and 9 defects introduced during implementation, of which 14 are Function/Class/Object defects and 10 are Documentation defects.
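The 'late detected' figures in Table 20 follow directly from comparing, for each defect, the phase of introduction with the phase of detection. The sketch below illustrates this computation under the assumption (ours, for illustration only) that both fields are recorded on the same ordered phase scale:

    from collections import Counter

    # Ordered lifecycle scale, matching the introduction phases of Table 20.
    PHASES = ["Requirements", "Design", "Implementation", "UT/IT Tests",
              "System Tests", "Operation"]

    def introduction_vs_detection(defects):
        """Cross-tabulate (introduced, detected) pairs and count late detections."""
        crosstab = Counter((d["introduced"], d["detected"]) for d in defects)
        late = sum(qty for (intro, det), qty in crosstab.items()
                   if PHASES.index(det) > PHASES.index(intro))
        return crosstab, late

A defect counts as late detected whenever its detection phase comes strictly after its introduction phase, which corresponds to the cells below the diagonal of Table 20.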


Table 21: Defects detected late by Implementation Verification

Defect Type | Design | Requirements
Documentation | 65 | 4
Function/Class/Object | 7 | 2
Algorithm/Method | 2 | 3
Checking | 1 | 0
Interface | 1 | 0
Timing/Serialization | 1 | 1

Defect Trigger | Design | Requirements
Traceability/Compatibility | 46 | 5
Design Conformance | 16 | 2
Consistency/Completeness | 15 | 3

5.4.4.2 Enhanced ODC Defect Type vs Impact Analysis

The defect types that have higher impacts on the system (affecting Capability, Reliability and Maintenance) are depicted in Figure 12. Defects with an impact on Capability (blue line) are mainly related to the Function/Class/Object, Documentation and Algorithm/Method types, confirming that the functionality specification/implementation, the documented artefacts and the design decisions concerning the algorithms and methods to apply are the main contributors to defects that influence the system capability and normal functionality. Defects with an impact on Reliability (orange line) originate mostly from the Documentation, Checking, Function/Class/Object and Algorithm/Method defect types. In this case, there is a new defect type that contributes significantly to reliability issues: Checking. Reliability (including redundancy, fault detection/monitoring, isolation and recovery) is often implemented with checks and verifications for monitoring and detection of errors, hence the importance of avoiding this type of defect to guarantee higher reliability. Defects with an impact on Maintenance (gray line) originate essentially from the Documentation defect type. This is an expected result, due to the fact that maintenance depends on source code documentation and comments and on documented artefacts that include installation and download instructions, user and developer manuals, and maintenance procedures.


Figure 12: Defect type versus defect impact

Table 22 summarizes the defects that have a high impact on the system (regarding Capability, Reliability and Maintenance). The defect types are ordered according to the total number of defects with such impacts. Table 22 shows that the two most frequent defect types (Documentation and Function/Class/Object) account for almost 50% of all defects (64% of the defects with high impact). Actions to avoid these defects, such as technical writing training, improvement of documentation reviews or automation of the verification of documentation issues, could significantly reduce the number of defects and improve Capability, Reliability and Maintenance.

Table 22: Defect types with high impact (Capability, Reliability and Maintenance)

Type | Capability | Reliability | Maintenance | Total defects | % overall defects | % defects with high impact
1. Documentation | 108 | 70 | 161 | 339 | 31.7% | 41.2%
2. Function/Class/Object | 103 | 50 | 35 | 188 | 17.6% | 22.9%
3. Algorithm/Method | 45 | 32 | 11 | 88 | 8.2% | 10.7%
4. Checking | 6 | 48 | 6 | 60 | 5.6% | 7.3%
5. Build/Package/Environment | 8 | 19 | 19 | 46 | 4.3% | 5.6%
6. Interface | 15 | 13 | 16 | 44 | 4.1% | 5.3%
7. Assignment/Initialization | 9 | 13 | 12 | 34 | 3.2% | 4.2%
8. Timing/Serialization | 14 | 7 | 4 | 25 | 2.3% | 3.0%
Total | 308 | 252 | 264 | 824 | 77.0% | 100%
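Table 22 is essentially a cross-tabulation of defect type against impact, restricted to the high-impact categories; with the grouping key changed from defect type to defect trigger, the same construction yields Table 23 in the next subsection. A minimal illustrative sketch follows (hypothetical helper, same assumed record layout as before):

    from collections import defaultdict

    # High-impact categories used throughout Section 5.4.4.
    HIGH_IMPACT = {"Capability", "Reliability", "Maintenance"}

    def high_impact_crosstab(defects, key="type"):
        """Count high-impact defects per (key value, impact) pair.

        key='type' mirrors the structure of Table 22; key='trigger' mirrors Table 23.
        """
        table = defaultdict(lambda: defaultdict(int))
        for d in defects:
            if d["impact"] in HIGH_IMPACT:
                table[d[key]][d["impact"]] += 1
        return {value: dict(impacts) for value, impacts in table.items()}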


5.4.4.3 Enhanced ODC Defect Trigger vs Impact Analysis

The defect triggers that allow the detection of the defects with a high impact are represented in Figure 13. The graph reinforces the importance of the three main (most frequent) triggers, which overall allowed the detection of 77.0% of the issues: a) Consistency/Completeness, b) Test Coverage, and c) Traceability/Compatibility. For this particular case, Reliability can be ensured with better Traceability/Compatibility analysis, Test Coverage and Logic/Flow analysis. Capability shall be assessed more efficiently with Test Coverage, Traceability/Compatibility assessment and Design Conformance analysis. The Maintenance defect impact can be mitigated with Traceability/Compatibility and Consistency/Completeness analyses.

Figure 13: Defect triggers versus defect impacts

Table 23 prioritizes the triggers that detected the defects with high impact. We can observe that the two triggers that detect the most impacting defects are Consistency/Completeness analysis and Test Coverage, allowing the detection of 57.8% of the high impact defects (44.5% of all defects). Although the list of triggers that enable the detection of high impact defects is extensive, the 5 most meaningful ones allowed the detection of 91.6% of the defects with high impact (about 70% of all defects in our dataset). This is the reason why it is efficient to focus on the top 5 triggers, as shown in Table 23: Consistency/Completeness, Test Coverage, Traceability/Compatibility, Design Conformance and Logic/Flow.


Table 23: Defect triggers with high impact

Trigger | Capability | Reliability | Maintenance | Total defects | % overall defects | % defects with high impact
Consistency/Completeness | 67 | 68 | 139 | 274 | 25.6% | 33.3%
Test coverage | 130 | 59 | 13 | 202 | 18.9% | 24.5%
Traceability/Compatibility | 37 | 20 | 42 | 99 | 9.3% | 12.1%
Design conformance | 15 | 55 | 24 | 94 | 8.8% | 11.4%
Logic/Flow | 37 | 29 | 18 | 84 | 7.9% | 10.3%
Rare situation | 8 | 10 | 6 | 24 | 2.2% | 2.9%
Test sequencing | 4 | 5 | 6 | 15 | 1.4% | 1.8%
HW/SW Configuration | 3 | 0 | 6 | 9 | 0.8% | 1.0%
Standards conformance | 0 | 2 | 6 | 8 | 0.7% | 0.9%
Recovery/Exception | 4 | 1 | 3 | 8 | 0.7% | 0.9%
Test interaction | 2 | 0 | 1 | 3 | 0.3% | 0.4%
Test variation | 0 | 2 | 0 | 2 | 0.2% | 0.3%
Concurrency | 0 | 1 | 0 | 1 | 0.1% | 0.1%
Path coverage | 1 | 0 | 0 | 1 | 0.1% | 0.1%
Total | 308 | 252 | 264 | 824 | 77.0% | 100%

5.4.4.4 Enhanced ODC Defect Trigger vs Type Analysis

It is also interesting to understand which defect triggers lead to the detection of which defect types. This is shown in Figure 14 for the three most frequent defect types. The graph shows that the defect triggers Traceability/Compatibility and Consistency/Completeness allow the detection of a very large number of Documentation defects (dashed blue line). As these triggers apply mostly to documentation and documented artefacts, this is an expected result. Another observation we can make is that the Test Coverage trigger mostly allows the detection of Function/Class/Object defects. In fact, the objective of Test Coverage is essentially to cover/test the specifications, which represent the functionality of the system. Thus, if the specifications are not totally covered, functionality defects (Function/Class/Object) are likely to be missed. The same trigger (Test Coverage) also allows the detection of Algorithm/Method defects on a larger scale than the other triggers.


Figure 14: Defect triggers versus defect types

From the overall distribution of which triggers detect the 5 most frequent defect types (Table 24), we can observe that some triggers are more efficient in detecting certain defect types, such as Design Conformance and Logic/Flow, which are good at detecting Checking defects, and the Test Coverage and Traceability/Compatibility triggers, which detect most of the Interface defects.


Table 24: Defect triggers detecting specific defect types

Trigger | Counts per defect type (Documentation, Function/Class/Object, Algorithm/Method, Checking, Interface) | Total defects | % overall defects
Consistency/Completeness | 190, 4, 3, 3 | 200 | 18.69%
Test coverage | 52, 77, 50, 5, 20 | 204 | 19.07%
Traceability/Compatibility | 193, 40, 11, 2, 21 | 267 | 24.95%
Design conformance | 36, 37, 10, 16, 2 | 101 | 9.44%
Logic/Flow | 18, 20, 14, 40, 3 | 95 | 8.88%
Rare situation | 6, 12, 1, 3 | 22 | 2.06%
Test sequencing | 2, 4, 4, 5 | 15 | 1.40%
HW/SW Configuration | 2, 2, 2 | 6 | 0.56%
Standards conformance | 7, 1, 2 | 10 | 0.93%
Recovery/Exception | 3, 6, 1 | 10 | 0.93%
Test interaction | 3, 1 | 4 | 0.37%
Test variation | 3 | 3 | 0.28%
Concurrency | 1 | 1 | 0.09%
Path coverage | 1 | 1 | 0.09%
Total | 515, 203, 96, 69, 56 | 939 | 87.76%

5.5 Validation of the Enhanced ODC

As mentioned, to validate the enhanced ODC taxonomy we re-applied it to our dataset (after having classified the dataset with the original ODC taxonomy). In addition to the important facts that no issue was left unclassified with the new taxonomy and that most of the issues could be classified in an easier way (avoiding confusion or doubts by merging similar taxonomy elements), the results highlighted the following: a) The results with the enhanced ODC taxonomy revealed a higher percentage of ‘Documentation’ when compared to the original ODC classification. This can be justified by the fact that critical systems highly depend on documentation and documented evidence to prove the accomplishment of the requirements and the standards. Furthermore, the classification performed with the enhanced ODC taxonomy allowed the consideration of 339 new defects, most of them previously classified with the “Documentation” type, but which could not be added to the results since either the trigger or the impact was not correctly classified. b) ‘Traceability/Compatibility’ is the most frequent trigger, and even ‘Test Coverage’ became a more efficient trigger than ‘Consistency/Completeness’. This suggests that the most efficient defect triggers are the simplest and most logical ones, namely the ones related to traceability and testing activities.


c) The ‘Maintenance’ defect impact became more frequent than ‘Reliability’, and the frequency of the ‘Documentation’ impact has been reduced. In fact, maintainability is an important property for the systems in our dataset, and the results show that issues impact system maintainability more than system reliability. The results also show that the 5 most frequent types, triggers and impacts cover about 90% of the issues analyzed. This does not mean that the others are not important (in fact, ODC does not take the issue severity into account in its classification scheme) or that they do not need careful analysis, but, with up to 5 taxonomy values, we are covering the large majority of the issues, which suggests that actions can be taken to quickly improve the quality of the systems, which is of extreme importance for the industry. These results, however, have not been compiled easily. The effort spent on the 1070 classifications was around 800 man-hours (for the classification task only). The main difficulties noted during the whole process are related to:
• The number of possible ODC classifications – even though only 3 ODC attributes have been selected for the classification effort, the number of possibilities is still significant; we found the standard taxonomy of the selected attributes a bit generic when applied to a safety critical domain;
• The lack of fit of the ODC taxonomy for critical systems issues – this led to additional effort trying to classify the attributes with the original taxonomy, the need to scan and check all attribute possibilities, and consequent rework;
• The precision, completeness and detail level of the defect descriptions – some RIDs are very telegraphic, others are extremely technical, and some include very limited information or many references that need to be checked to perform the correct classifications;
• The lack of uniformity in the description of the defects – they had been compiled between 2005 and 2014, by different teams of engineers, and they relate to different types of systems or subsystems;
• The amount of supporting documentation required – the classification often required referring to the original documentation (specifications, architecture, source code, testing artefacts, etc.), and these artefacts are quite extensive, exist in different versions, had to be recovered from the project archives, and so on.

Table 25 shows that the second round was about twice as fast as the first round with the original ODC (3.9 defects/hour versus 2.1 defects/hour). This is naturally due to the fact that the defects were already known, the ODC taxonomy was clearer and the practice of the analysts had increased. Another exercise, conducted later, to classify 120 issues from railway control and management systems supports the validation of the enhanced taxonomy through two facts: a) the 120 defects have all been classified with no

hassle (no doubts about which taxonomy element to use for each defect); and b) the effort spent on that classification (defect type, trigger and impact) was 34 hours (3.5 defects/hour).

Table 25: Effort Spent for the different ODC related activities

Activity | Effort (hrs) | Description
Data preparation and training | 110 | Data collection/clean-up, training. 2 expert engineers in ISVV and critical systems and one junior researcher.
ODC phase I | 520 | Classification of defects and review of the classification. Full classification of 721 defects; the remaining 349 defects have been partially/doubtfully classified. 2 expert engineers in ISVV and critical systems and one junior researcher.
Analysis / Proposal of ODC adaptations | 80 | Analysis of the 349 defects partially/doubtfully classified in the previous phase and identification of modifications to the original ODC taxonomy. 2 expert engineers.
ODC phase II | 90 | Reclassification and review of the defects (about 33%). The remaining 349 defects have been classified and reviewed. 2 expert engineers.
Total | 800 |
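As a quick arithmetic check of the throughput figures quoted above, and assuming that the 520 hours of ODC phase I covered the attempted classification of all 1070 defects while the 90 hours of phase II covered the 349 re-classified ones, the rates can be reproduced as follows (illustrative only):

    # Classification throughput, rounded to one decimal place as quoted in the text.
    phase_one_rate = 1070 / 520   # original ODC round: ~2.1 defects/hour
    phase_two_rate = 349 / 90     # enhanced ODC re-classification: ~3.9 defects/hour
    railway_rate = 120 / 34       # later railway exercise: ~3.5 defects/hour
    print(round(phase_one_rate, 1), round(phase_two_rate, 1), round(railway_rate, 1))
    # 2.1 3.9 3.5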

5.6 Final Remarks

This chapter presented the defects classification results prior to the root cause analysis. Both the original ODC classification results and the enhanced ODC classification results were presented, but only the results of the enhanced ODC taxonomy were described in detail. The classification of the issues identified by the ISVV teams using the standard ODC allowed the classification of 731 issues (out of 1070 defects). The remaining 31.7% could not be classified and required an improved, more applicable taxonomy, which was proposed and applied. The proposed adjustments to the ODC taxonomy had several objectives, namely: to promote a good fit for the classification and study of critical systems issues, to maintain the orthogonality of the classifications, to propose only the minimum amount of changes to the ODC taxonomy, to simplify the classification work, and to allow easy root cause analysis in the future. The process to determine the adaptations was based on the missing classifications for the ODC defect Type, Trigger and Impact attributes, as explained before, but also on the difficulty in classifying some of the issues according to these attributes, thus including: a) new types, triggers and impacts; b) merged types, triggers and impacts; and c) adjustment of some previous classifications due to a better interpretation of the attributes. While the activity attribute was updated generically to be adjusted to the commonly used lifecycle phases, the other three attributes (type, trigger and impact) underwent some enhancements to improve the classification and reduce the number of doubts while classifying the defects.


The results presented can help the space industrial community focus on the weakest points of the engineering process in order to improve them. Also, by using the enhanced ODC, ISVV teams can work much more efficiently with the triggers that catch more problems, and even develop appropriate and more precise V&V tools or defect detection processes. As the systems involved cover most of the development activities performed for such systems, and involve different companies (differing in geography, size and management), we consider these results to be quite general for this domain. A similar study for other domains (e.g. aeronautics, railway, automotive) is foreseen as future work, but it will not be as easy, as the existing data might not be as structured as for space systems, and data confidentiality will be a challenging issue. The enhanced ODC classification is based on the opinion and knowledge of experts and does not follow precise algorithms or criteria. However, it is important to note that the original classification (the one that could not classify all the issues) was performed by two engineers, whose work was also checked by a third space domain expert. This domain expert also performed the reclassification himself (verified and discussed with another space domain expert engineer in case of doubts). Finally, the results provide interesting data to support the root cause analysis, and also immediate feedback to the engineering of critical systems, namely: what the most frequent defect types are and how they are reflected in terms of impact (severity); what the most efficient and frequent defect triggers are (and, again, how they detect issues with specific impacts); what types of triggers allow the detection of specific defect types; and what defects have slipped between lifecycle phases without being detected (in order to identify whether it is possible to detect them earlier in the future).


Chapter 6 Defects Root Cause Analysis

"There are a thousand hacking at the branches of evil for one who is striking at the root, and it may be that he who bestows the largest amount of time and money on the needy is doing the most by his mode of life to produce that misery which he strives in vain to relieve." – Henry David Thoreau, 1854. This chapter presents the root cause analysis process and the results (root causes and suggested measures) of the enhanced ODC application to the dataset of 1070 defects from space projects and identifies the root causes for the majority of the defects based on the ODC results and the knowledge of the space domain environment, processes, methods and tools. As it is not possible to identify the root cause for every single defect, we focused on the more frequent and more severe defect types. The presentation of results and root causes is divided in five main lines of analysis: a) the enhanced ODC results (the inputs to the RCA); b) the analysis of the defect types (eventually detects implementation problems); c) the analysis of the defect triggers (identifies inefficient V&V activities); d) the analysis of the defects identified at a later SDP phase (inefficient ISVV or V&V activities); and e) the prioritization according to the defect impacts (Capability, Reliability and Maintainability). Note that the root causes have not been identified at the moment of resolution of the issues but at the moment of the analysis of the enhanced ODC results. In practice, they are the result of an expert analysis done on the defects, performed by the authors and complemented and reviewed by the industrial partners. Note also that several defects do not have a clear and unique root cause but a set of related root causes. The root causes analysis consisted in a structured process based on a fishbone analysis for the most frequent defect types and defect triggers, and a specific root cause analysis for the defects that have slipped from detection in the phase where they have been introduced. We have first analyzed the resulting defect types and identified root causes related to development defects, then we have identified the main defect issues according to defect triggers which gave us the V&V weaknesses (defects). In a subsequent step, we have proposed a dedicated list of measures to tackle both sets of root causes.


6.1 Overview of the Process

A fundamental law in science is the Law of Cause and Effect: every effect has a cause. From that law follows the Law of Root Causes: all problems arise from their root causes. This is the fundamental principle of Root Cause Analysis, which intends to tackle the root of a problem by finding and resolving its root causes. Root cause analysis is a method aimed at identifying the root causes of problems or events. It is common belief that problems are best solved by correcting or eliminating their root causes, as opposed to merely addressing the immediately obvious symptoms, which is common industrial practice. Root cause analysis also helps identify why an issue or problem occurred and promotes the creation of a knowledge base that can be used to prevent or reduce the impact of root causes in the future. When a root cause is permanently or completely eliminated or controlled, immediate or remedial rework is avoided and future occurrences of the issues caused by that root cause are proactively addressed (or avoided). In the frame of this work, the root cause analysis has been designed to fit the process described in Chapter 3 and to rely on the defect classification from three different perspectives: a) identification of root causes taking into account the more frequent or more severe defect types; b) identification of root causes based on the more frequent or more efficient defect triggers; and c) identification of root causes applicable to the late detection of defects in the development (or V&V) lifecycle. With the combination of these three perspectives (see Figure 15) we can ensure a broad yet comprehensive analysis that is likely to identify the most important root causes and help improve the quality of future developments.

Figure 15: Root Cause Analysis Overview

Figure 15 provides a short overview of the root cause analysis as integrated in the defects analysis process of Chapter 3. With the support of data from the defects classification, three types of root causes are identified, not necessarily for every single defect but more generally for groups of similar defects: by considering defect types, by identifying the triggers and V&V techniques that detected the defects, or by drilling into the root causes of late defect detection. Root causes can be identified in several ways and with several techniques, as described later in this chapter. Once the list of root causes is defined, it can be prioritized based on different criteria (e.g. the number of defects caused, the severity of the impact of the caused defects, or the removal of specific types of undesired defects), consolidated by aggregating root causes common to the three root cause identification perspectives (defect type, defect trigger and late detection) or otherwise similar, and finally complemented with a step of matching the root causes to concrete suggestions for avoiding or eliminating them. The improvement suggestions should be communicated to the engineering stakeholders, implemented and monitored in order to confirm that the root causes are eliminated/reduced for future development and V&V cycles. Several techniques can be applied for the root cause analysis, as long as they are mastered and applied by experienced analysts. Examples of techniques include: Five Whys; Failure Mode and Effects Analysis; fishbone (cause-and-effect, or Ishikawa) diagrams; SIPOC (suppliers, inputs, processes, outputs, customers) diagrams; flowcharting of the process flow, system flow and data flow; critical-to-quality metrics; Pareto charts; and statistical correlation. The most commonly used of these techniques are detailed in Section 2.4. Fishbone diagrams are the technique used in the remainder of this chapter.
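As a minimal, purely illustrative sketch of how classified defects can be organized to feed the three perspectives just described, the following Python fragment groups defects by type, by trigger, and by whether they slipped past their introduction phase. The Defect record and its field names are assumptions made for illustration; they do not reflect the actual tooling used in this work.

    from collections import defaultdict
    from dataclasses import dataclass

    # Hypothetical record structure for a classified defect (illustrative field names).
    @dataclass
    class Defect:
        defect_id: str
        defect_type: str     # enhanced ODC type, e.g. "Documentation"
        trigger: str         # enhanced ODC trigger, e.g. "Traceability/Compatibility"
        impact: str          # enhanced ODC impact, e.g. "Maintenance"
        introduced_in: str   # lifecycle phase where the defect was introduced
        detected_in: str     # lifecycle phase where the defect was detected

    def group_for_rca(defects):
        """Group defects along the three RCA perspectives: type, trigger, late detection."""
        by_type = defaultdict(list)
        by_trigger = defaultdict(list)
        late_detected = []
        for d in defects:
            by_type[d.defect_type].append(d)
            by_trigger[d.trigger].append(d)
            if d.introduced_in != d.detected_in:  # slipped at least one phase
                late_detected.append(d)
        return by_type, by_trigger, late_detected

Each resulting group would then be taken into a fishbone or five-whys session, as described in the following sections.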

6.2 Root Cause Analysis Results

This section presents the results obtained from the root cause analysis activities performed on the classified defects data. Root causes have been identified according to the grouped defect types and defect triggers. The same root causes have also been mapped to the defects that were not detected within the phase in which they were introduced.

6.2.1 Enhanced ODC Defect Type RCA

The defect type is classified according to the fix that removes it. If the defect had already been fixed at the moment of our analysis, then it is quite straightforward to determine its type, as information about the fix is available. The defects classification led to the identification of 8 different types of defects (refer to Table 17), from the most frequent (Documentation), covering 48.1% of the cases, to the least frequent (Timing/Serialization), covering only 3% of the defects. However, for the root cause analysis, we focus on every defect type, even the least frequent, since the number of defect types is acceptable and it is feasible to perform the analysis for 8 different types. The following subsections thus present an analysis for each type of defect and identify their primary root causes, based on the knowledge of several experts with many years of experience in development, V&V and ISVV activities in the space domain.


It is important to note that the root cause analysis was not performed on the individual defects (due to the amount of work that would require); instead, samples of defects were used to confirm the applicability of the defined root causes. Table 26 presents the summarized and harmonized root causes that apply to each defect type. These root causes represent the most common causes of the defects of each type. Some examples from our dataset are also presented (the defect titles are simple enough to be understood without a detailed description). For example, looking at the “Checking” defect type, the identified root causes include ambiguous/missing/incorrect architecture and design artefacts and incomplete specifications in what concerns FDIR and erroneous situations (commonly non-functional specifications). In this case, several situations in which erroneous behavior can occur were not taken into account, such as the examples presented in Table 26: missing validation of input parameters and index value not checked.

Table 26: Root Causes vs Defect Types with real examples

Defect Type: Documentation
Root Causes: lack of basic documentation skills (e.g. technical writing); oversimplified documentation planning procedures; lack of time to produce, review and accept documentation artefacts (pressure on schedules); lack of importance given to some documentation artefacts (prioritization); lack of completeness and consistency of documentation in the previous phases (ambiguous information, missing information, incomplete documents); limited domain knowledge (understanding of the system); simplification of the product assurance processes related to documentation artefacts; and limitations of the tools or toolsets that deal with documentation, especially across development lifecycle phases, or lack of preparation to use such tools.
Examples of Defects: Insufficient information to validate implementation; Code documentation not consistent with implementation and design; Requirement covered by test procedure not indicated in test specification.

Defect Type: Function/Class/Object
Root Causes: ambiguous/missing/incorrect artefacts (documentation, requirements, design, tests); inefficient/insufficient reviews; limited engineers’ domain knowledge (lack of appropriate skills); lack of system knowledge (to understand the overall functionalities); lack of knowledge of tools, programming languages, design languages; and insufficient unitary tests.
Examples of Defects: Mismatch between function documentation and design/implementation; Requirement not completely validated.

Defect Type: Algorithm/Method
Root Causes: ambiguous/missing/incorrect artefacts (documentation, requirements, design, tests); inefficient/insufficient reviews; limited engineers’ domain knowledge (lack of appropriate skills); lack of system knowledge (to understand the overall functionalities); lack of knowledge of tools, programming languages, design languages; and insufficient unitary tests.
Examples of Defects: Inconsistency between requirement and function implementation; Requirement may not be validated in test step.

Defect Type: Checking
Root Causes: ambiguous/missing/incorrect architecture and design artefacts; incomplete specifications in what concerns FDIR and erroneous situations (commonly non-functional specifications); inefficient/insufficient reviews; insufficient/wrong tests (unit, integration, system, fault injection); lack of system knowledge; and lack of reliability and safety culture.
Examples of Defects: Validate the input parameters before writing or reading from array; Index value not checked (defensive programming).


Defect Type: Interface
Root Causes: ambiguous/missing/incorrect architecture and design artefacts; ambiguous/missing/incorrect Interface Control Documents (ICD) or protocols definition; incomplete specifications in what concerns interfaces, environment and communications; limited definition of the operation, usability and maintainability requirements (user, operation, installation manuals); inefficient/insufficient reviews; insufficient/wrong tests (unit, integration, system); and lack of system knowledge (interfaces).
Examples of Defects: Incoherence between requirements, design and code; Unused input parameter.

Defect Type: Build/Package/Environment
Root Causes: version and configuration management procedures inappropriately implemented; complexity of the build, versioning or change control procedures; complexity of the tools used for build, change or version control; lack of knowledge on how to properly use the tools; and build warnings/errors not resolved.
Examples of Defects: Unused macros; MISRA C violations: functions not defined.

Defect Type: Assignment/Initialization
Root Causes: lack of specification of initial and default values; incorrect implementation by forgetting basic initializations; and lack of checking of values and results of operations.
Examples of Defects: Initialized variables not found in state, parameter or housekeeping data; Output pointer not validated.

Defect Type: Timing/Serialization
Root Causes: lack of appropriate architecture detailing the timing properties; and lack of knowledge of subsystems, modules, resources, including hardware behavior.
Examples of Defects: Housekeeping execution time is 400ms instead of 500ms; Transmission rate inconsistency between design and code.

6.2.2 Enhanced ODC Defect Trigger RCA

Some of the root causes for the defects detected during ISVV reveal problems in the efficiency of the verification and validation activities applied during development. When we look at the triggers of the enhanced ODC (refer to Table 17), we might question why those defects were not caught earlier by the development or V&V teams, or simply by the application of those same triggers: is independence so important that it makes it easier to find these defects? Why were the same techniques used by the ISVV team not applied at an earlier stage of the project? Or were they applied ineffectively? This section provides a qualitative analysis of the triggers that detected the 1070 defects at the ISVV stage of the projects (see Table 17). Since the list of triggers is extensive, we focus on the most relevant in terms of number of detected defects. This analysis contributes to pinpointing the weaknesses of the regular development V&V activities and provides suggestions to change/improve V&V activities so that they become more efficient and detect more problems before these are passed on to the ISVV teams. From the identification of the triggers that should have been applied during the development/V&V activities, we derive root causes in a simple manner. This listing is presented in a detailed way, since the triggers can be affected more directly and concretely by the removal of the root causes. This means that we can apply the recommendations very specifically and see an immediate effect, which is usually the detection of defects at the moment the trigger is applied. We also present the title of a few defect examples from our dataset in order to demonstrate the type of defects uncovered by the specific trigger-related root causes.

6.2.2.1 Traceability/Compatibility

Expert judgement and the application of the root cause analysis tools (fishbone diagram and five whys) led to the identification of generic root causes, highlighting the importance of traceability/compatibility. The identified generic root causes that led to the Traceability/Compatibility trigger not being effective before the ISVV detection are:

- Lack of traceability verification culture: most of the development/V&V engineering activities related to traceability do not properly use traceability as the powerful tool it can be. Traceability can be used to support design, implementation and testing activities, as well as the reviews of the artefacts between lifecycle phases;
- Lack or inefficient usage of tools that support traceability across lifecycle phases: traceability is usually performed either in a very simple tabular format or integrated in other tools without automated/regular checks and validations, in particular when assessing the artefacts of a specific lifecycle phase.

Some examples of defects that could have been uncovered with appropriate application of the Traceability/Compatibility trigger include:

- Inconsistency between Function Comments, Design and Code;
- Inconsistencies between test procedure and test log;
- Traceability mismatch between Procedure and test specification.

6.2.2.2 Test Coverage

The identified generic root causes that led to the Test Coverage trigger not being effective before the ISVV detection are:

- Lack of appropriate test planning and test strategy: test strategy and test planning are commonly misconceived as the activity of defining test procedures and test cases, implementing, debugging and executing them, and then reporting the results. The test strategy is essential, as it shall define the testing methodologies, the test planning, testing types, testing approaches, testing tools, the test environment, the test data strategy, staffing needs and training, and so on;
- Lack of appropriate testing tools and testing environment support: the testing tools and the testing environment are quite often deficient, sometimes archaic; they should provide confidence in the testing, as well as be able to automate and support the test implementation, execution and reporting activities;
- Poor test specification and execution: the specification of test procedures can also be limited by the strategy and the available tools and environments. However, it is still quite difficult for testing engineers to master the art of defining appropriate and complete test specifications, for example using full traceability to ensure coverage of the requirements or design artefacts. Special care in the execution and in the collection of logs is also required;
- Insufficient testing: as the test strategy is usually quite weak, the testing team ends up not doing enough testing. The amount of testing must be defined upfront and later on a case-by-case basis, based on the particular needs of each requirement in terms of logic flow or data testing, to achieve functional and non-functional coverage of “all” situations.

Some examples of defects that could have been uncovered with appropriate application of the Test Coverage trigger include:

- Requirement not completely validated;
- Incomplete list of requirements covered by the test;
- Test Steps with no clear validation goal (missing requirement association).

6.2.2.3 Consistency/Completeness

The identified generic root causes that led to the Consistency/Completeness trigger not being effective before the ISVV detection are:

- Documentation-related root causes: lack of basic documentation skills (e.g. technical writing); oversimplified documentation planning procedures; lack of time to produce, review and accept documentation artefacts (pressure on schedules); lack of importance given to some documentation artefacts (prioritization); lack of completeness and consistency of documentation in previous phases (ambiguous information, missing information, incomplete documents); limited domain knowledge (understanding of the system); simplification of the product assurance processes related to documentation artefacts; and limitations of the tools or toolsets that deal with documentation, especially across development lifecycle phases, or lack of preparation to use such tools;
- Review process related root causes: namely review simplifications (using lack of time as an excuse) and inappropriate or absent usage of traceability assessments;
- Deficient usage of tools and applicable processes: oversimplification of documentation processes and of verification and validation processes, and the difficulty of accepting comments on one’s own work. The usage of tools also depends on the experience and training acquired in the tool usage and tool features, so, in several situations, a more appropriate usage and application of available tools would produce a great improvement;
- Unclear or missing specifications: specifications that do not describe the requirements clearly (specifications that are not specific, measurable, attainable/achievable/actionable/appropriate, realistic, time-bound/timely/traceable) or missing specifications about some important steps that are left to the designer or implementation teams;
- Lack of domain knowledge: ignorance of some processes and of functional and non-functional details about the system or the domain is one of the most common causes of consistency and completeness problems during all phases of the lifecycle.

Some examples of defects that could have been uncovered with appropriate application of the Consistency/Completeness trigger include:

- No results in the UT report;
- There is not enough information to validate implementation;
- Inconsistency with (/within) design.

6.2.2.4 Logic/Flow

The identified generic root causes that led to the Logic/Flow trigger not being effective before the ISVV detection are:

- Incomplete specifications: specifications that do not provide the means to clearly design the logic or flow of operations/actions, or missing specifications about some important steps that are left to the designer or implementation teams;
- Ambiguous or unclear architecture definition: an incorrect or incomplete architecture definition has an effect on the implementation, in particular on the interfaces between modules, where discrepancies and problems are commonly identified;
- Lack of usage of tools that support data and control flow analysis: tools to analyze the systems from a logic or data flow perspective would avoid several problems related to interfaces or performance; tools similar to static code analysis tools would be very beneficial for the logic/data flow analysis of modern systems.


Some examples of defects that could have been uncovered with appropriate application of the Logic/Flow trigger include:

- Releasing semaphores that haven't been locked;
- Optimize the validation of the mode transition table;
- Possible incorrect tracking of failed thrusters.

6.2.2.5 Design Conformance

The identified generic root causes that led to the Design Conformance trigger not being effective before the ISVV detection are:

- Inappropriate architecture support tools or tool usage: tools that automate checks, in particular for the architecture/design, could support the checking of conventions, completeness and interfaces, and a better assessment of the systems;
- Deficient specification or design artefacts that lead to wrong implementations: these are the common “bugs” from a design perspective; if the design has flaws, they might be replicated in the implementation. This problem is also related to the expertise and technical knowledge of the designer.

Some examples of defects that could have been uncovered with appropriate application of the Design Conformance trigger include:

- Code does not follow the design;
- Not validated state variable and documentation inconsistent with code;
- Cyclomatic complexity higher than expected.

6.2.3 Late Detection RCA

There are defects that, for some reason, are not detected within the same phase in which they are introduced, and this might lead to a severe leakage of defects across phases. When a defect is detected later (it might never be detected), the effort to fix it is much larger, as several artifacts and several lifecycle phases need to be revisited. It is therefore of utmost importance to determine why (the causes) certain defects were not spotted and solved before, and why they slipped through phases. The root cause analysis of these specific slipped defects helps in identifying specifically the causes of the failures in the V&V and ISVV techniques that allowed the defects to propagate to a later stage in the lifecycle without being spotted.


We have analyzed the introduction versus detection phases of the defects. The results are presented in Section 5.4.4.1, where Table 20 presents the number of defects classified per phase of introduction versus phase of detection, and Table 21 highlights the defect types and defect triggers for defects introduced in the requirements and design phases but only detected at or after implementation. Since we noticed that a large share of these defects is related to documentation (type), tackling documentation issues might greatly reduce defect propagation (see Sections 6.2.1 and 6.2.2 for the applicable list of root causes). Another important observation is that the large majority of slipped defects are introduced in the design phase and not properly detected by design V&V activities, directing the analysis towards design conformance root causes (Section 6.2.2.5) as the ones to tackle first. Furthermore, the Traceability/Compatibility defect trigger accounts for 60% of the detected defects, while Design Conformance and Consistency/Completeness account for 20% each. As source code is more detailed and more concrete than architectural design components and requirements descriptions, it is natural that traces can detect more inconsistencies, especially missing information. Code analysis (implementation verification) is also largely supported by tools for static, dynamic and metrics analysis, and this is certainly the main reason why this phase catches a large number of defects introduced previously and not detected in the appropriate phase. The defects detected during the testing phases that originated from previous phases include 20 defects injected during requirements specification and 9 defects introduced during implementation, of which 14 are Function/Class/Object defects and 10 are Documentation defects. The requirements-related defects are mostly due to requirements quality root causes, while the defects introduced at implementation generally represent “bugs” or implementation mistakes, so the root cause is related to the experience of the programmers and the efficiency of the code reviews. This section does not contain a list of root causes, but they can be found in Sections 6.2.1 and 6.2.2, which contain a very comprehensive list of root causes applicable to this situation as well. For example, for the defect type related root causes, a mapping to the first row of Table 26 can be performed (Documentation). For the defect trigger related root causes, the Traceability/Compatibility root causes are listed in Section 6.2.2.1.
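The introduction-versus-detection analysis summarized above amounts to a simple cross-tabulation of the classified defects. The sketch below, which reuses the hypothetical Defect record from the earlier fragment (again an illustration, not the actual tooling of this work), counts defects per pair of phases and lists those that slipped past their introduction phase; this is essentially how a table such as Table 20 can be built.

    from collections import Counter

    # Simplified lifecycle ordering used only for illustration.
    PHASES = ["Requirements", "Design", "Implementation", "Testing"]

    def phase_crosstab(defects):
        """Count defects per (introduction phase, detection phase) pair."""
        return Counter((d.introduced_in, d.detected_in) for d in defects)

    def slipped_defects(defects, phase_order=PHASES):
        """Defects detected in a later phase than the one in which they were introduced."""
        rank = {phase: i for i, phase in enumerate(phase_order)}
        return [d for d in defects
                if rank[d.detected_in] > rank[d.introduced_in]]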

6.2.4 Prioritization of the Root Cause Analysis

The prioritization is an optional step that is very practical for industrial usage if, for example, the objective is to save money and get the best return on investment by tackling the most important and/or the most frequent defects. It is arguable how to identify the importance of defects based on a classification that reduces and aggregates them; however, for this purpose the selected enhanced ODC classification scheme provides an interesting tool in the form of the impact classification.


When we look at the possible impacts, we can identify a subset that should get priority over the others due to their possible effect on the critical systems. This way, we have labelled the impacts Capability, Reliability, Maintenance, Safety and Integrity/Security as the most important, and Documentation, Installability, Migration, Performance, Requirements, Standards, Testability/Verifiability and Usability as less important. As it is quite easy to map the relation between defect types and their foreseen defect impacts, it is also possible to prioritize the identified root causes in a similar way. The defects with impact on Capability, Reliability and Maintenance (in our dataset there were no defects with impact on Safety or Integrity/Security), identified in Table 17, represent 77% of the total dataset. From these, if we consider the top 6 defect types and the top 5 defect triggers, we are already covering more than 90% of the high-impact defects. Thus, we can safely select the Capability, Reliability and Maintenance impacts together with the 6 most frequent types and the 5 most frequent defect triggers for a simplified root cause consolidation process (a small computational sketch of this selection follows below). The results of the defect types and triggers prioritization are presented in detail in Sections 5.4.4.2 and 5.4.4.3.
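The sketch below illustrates this prioritization step in code, again using the hypothetical Defect records introduced earlier (an illustration built on assumed data structures, not the actual procedure followed in this work): filter the defects whose impact is in the high-priority set, and measure how much of that subset the most frequent types and triggers cover.

    from collections import Counter

    HIGH_PRIORITY_IMPACTS = {"Capability", "Reliability", "Maintenance",
                             "Safety", "Integrity/Security"}

    def prioritize(defects, top_types=6, top_triggers=5):
        """Select high-impact defects and the most frequent types/triggers among them."""
        high_impact = [d for d in defects if d.impact in HIGH_PRIORITY_IMPACTS]
        types = [t for t, _ in
                 Counter(d.defect_type for d in high_impact).most_common(top_types)]
        triggers = [t for t, _ in
                    Counter(d.trigger for d in high_impact).most_common(top_triggers)]
        covered = [d for d in high_impact
                   if d.defect_type in types or d.trigger in triggers]
        coverage = len(covered) / len(high_impact) if high_impact else 0.0
        return high_impact, types, triggers, coverage

The root cause consolidation would then be restricted to the returned types and triggers.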

6.2.5 Improvements Suggestions

The defects with impact on Capability, Reliability and Maintenance, as referred in Section 6.2.4 and detailed in Sections 5.4.4.2 and 5.4.4.3, represent 77% of the total dataset. From these, we considered the top 6 defect types and the top 5 defect triggers (see Table 22 and Table 23), each of them accounting for more than 90% of the defects with high impact. Then, by crossing these defects with the root causes identified in Sections 6.2.1 to 6.2.3, we were able to filter the main root causes for the most important defect types (Table 27) and the most important defect triggers (Table 28). This analysis results in a list of the most important causes of the defects identified during ISVV, and of the most important causes of failure in the verification and validation activities during the development lifecycle. For defects with high impact, the listed causes show that software engineering processes, methods and tools require some adjustments in order to become more efficient at producing more dependable and safe systems. The identified root causes are all related to existing development and V&V activities that require more careful application, especially in what concerns schedule and planning pressures, rigor and caution in the application of engineering processes, and the importance given to V&V activities. The quality/product assurance strategies and the guidance from applicable processes and required standards are essential to ensure that these root causes are minimized.

Table 27: Summary of root causes for main defect types

Root Cause | Defect Types
Inefficient/insufficient reviews | Documentation; Function/Class/Object; Algorithm/Method; Checking; Interface
Ambiguous/missing/incorrect artefacts (documentation, requirements, design, tests) | Function/Class/Object; Algorithm/Method; Checking; Interface
Insufficient/wrong tests (unit, integration, system, fault injection) | Function/Class/Object; Algorithm/Method; Checking; Interface
Limitations of the tools or toolsets that deal with documentation | Documentation
Lack of completeness and consistency of system level (or previous phases) documentation | Documentation; Function/Class/Object; Algorithm/Method
Oversimplified documentation planning procedures | Documentation
Lack of time to produce, review and accept documentation artefacts | Documentation
Lack of importance given to some documentation artefacts | Documentation
Simplification of the product assurance processes related to documentation artefacts | Documentation
Limited engineers’ domain knowledge (lack of appropriate skills) | Function/Class/Object; Algorithm/Method
Incomplete specifications in what concerns FDIR and erroneous situations | Checking
Lack of reliability and safety culture | Checking
Incomplete specifications in what concerns interfaces, environment and communications | Interface
Limited definition of the operation, usability, maintainability requirements | Interface
Lack of tools knowledge, programming languages, design languages | Function/Class/Object; Algorithm/Method
Version and configuration management procedures inappropriately implemented | Build/Package/Environment

The root causes presented in Table 27 and Table 28 have been ordered according to expert knowledge and experience, and intend to provide a preliminary ordering in what concerns their contribution to high-impact defects. The identified root causes for defect triggers indicate that improvements to the current processes, both development (to avoid the introduction of defects) and V&V (to detect the defects within the phase in which they are introduced), might be possible. At a higher level, the leading safety standards might require additional guidance to support development and V&V, in order to reinforce that product/quality assurance (PA/QA) and safety and dependability assessments should be properly carried out, reducing the number of defects caught by ISVV. The proposed improvements are guidelines derived directly from the root causes summarized in Table 27 and Table 28 and from the domain and expert knowledge of the authors and of the industrial contributors to this work. Their intent is to fulfil the needs of the development and V&V processes, so as to avoid the most important and most frequent defects, such as those in our dataset.

Table 28: Summary of root causes for main defect triggers

Root Cause | Defect Trigger
Lack of traceability verification culture | Traceability/Compatibility
Lack or inefficient usage of tools that support traceability across lifecycle phases | Traceability/Compatibility
Lack of appropriate test planning and test strategy definition | Test Coverage
Lack or inefficient testing tool and testing environment support | Test Coverage
Incomplete tests specification and execution | Test Coverage
Review process related root causes | Document Consistency/Completeness (Internal Document)
Documentation related root causes | Document Consistency/Completeness (Internal Document)
Deficient usage of tools and applicable processes | Document Consistency/Completeness (Internal Document)
Unclear or missing/incomplete specifications | Document Consistency/Completeness (Internal Document); Logic/Flow
Ambiguous or unclear architecture definition | Logic/Flow
Lack of usage of tools that support data and control flow analysis | Logic/Flow
Inappropriate architecture support tools or tool usage | Design Conformance
Deficient specification or design artefacts | Design Conformance

From the development perspective, and based on Table 27, the following measures (proposed improvements) should be considered to reduce/eliminate the root causes:

- Define/redefine appropriate review methods, processes and tools and enforce their application at every stage of the SDP;
- Implement automated documentation generation processes and tools to avoid inconsistencies between artefacts/lifecycle phases;
- Use tools that integrate and manage all the phases of the lifecycle, such as concept specifications, requirements, architecture, source code, tests, etc.;
- Introduce/use tools with automatic validations (documentation completeness, design consistency, code analysis, control and data flow analysis);
- Provide training to the engineering teams, to improve the domain knowledge, the system or interfacing systems knowledge, standards knowledge, and techniques and tools practice;

- Promote workshops or meetings to present the specifications/requirements, and to discuss and clarify them before advancing to the following phase;
- Introduce additional guidelines or even specific requirements (e.g. by defining and specifying the reasoning behind the standards requirements and how to achieve them in full conformance) in the applicable standards (PA/QA, version and configuration control and development).

From the V&V perspective, and based on the results in Table 28, the following measures (proposed improvements) should be considered to increase the defect detection efficiency:

- Define appropriate test plans and strategies, especially for unit and integration tests. The soundness of the test plans and strategies will be reflected in the success of the validation;
- Ensure appropriate (or automated) traceability analysis at every stage of the development lifecycle;
- Improve the testing completeness, coverage and reviews;
- Implement non-functional tests (fault detection, fault injection, redundancy, etc.);
- Apply or develop tools to verify and validate the implementation and design compliance.

6.3 Effort Spent on the Root Cause Analysis Activities

The root cause analysis was a very efficient activity as can be seen in Table 29. In practice, the dataset root cause analysis was performed in about 2 weeks of effort, and for any future analysis, the analyst can already reuse the identified root causes and measures defined, if they still apply. A large amount of the effort was spent on the brainstorming and identification of root causes for every specific defect type and for every specific defect trigger. The activity ended with the identification of 26 root causes (13 from the development perspective and 13 from the V&V perspective) and led to the suggestion of 12 measures (7 to improve the development part and 5 to improve the V&V practices).


Table 29: Effort Spent for the root cause analysis activities

Activity | Description | Effort (hrs)
Process Data Preparation | Collection of the defect types and defect triggers resulting from the enhanced ODC classification. Training on the selected root cause analysis methods (fishbone and Five Whys). | 8
Root Cause Analysis for Defect Types | Brainstormings and definition of the possible root causes for the 8 defect types existent in the defects dataset. Summary of the root causes by aggregating similar ones and simplifying the resulting list of causes. Spot check on some defects to confirm the applicability of the defined root causes. | 24
Root Cause Analysis for Defect Triggers | Brainstormings and definition of the possible root causes for the 15 defect triggers existent in the defects dataset. Summary of the root causes by aggregating similar ones and simplifying the resulting list of causes. Spot check on some defects to confirm the applicability of the defined root causes. | 22
Root Cause Analysis for the Late Detected Defects | Brainstormings and definition of the possible root causes for the late detected defects. Merging of these root causes with the previous analysis (for defect types and defect triggers). | 6
Prioritization of the Root Causes | Analysis of the specific types and triggers and of the impacts they generally cause. Selection of the top 5 most “important” impacts (Capability, Safety, Reliability, Maintainability and Integrity/Security). Filtering of the root causes based on this criterion. | 8
Consolidation of the Root Causes | Simplification and aggregation of root causes into broader sets according to affinities and similarities. The result: 13 root cause groups for the defect types and 13 root causes for the defect triggers. | 6
Identification of Measures | Definition of sets of implementation suggestions for the consolidated list of root causes. Analysis of the 26 resulting root causes. | 8
Total | | 82

6.4 Final remarks

This chapter presented the root cause analysis results, one of the main outcomes of the defects assessment process defined in Chapter 3. The root cause analysis consisted of a structured process based on a fishbone analysis for the most frequent/important defect types and triggers. We first analyzed the resulting defect types and identified root causes related to development issues, and then identified the main defect issues according to defect triggers, which gave us the V&V weaknesses. We have also specifically applied the root cause analysis to the defects that slipped through at least one lifecycle phase. In a subsequent step, we proposed a dedicated list of measures to tackle both sets of root causes (development and V&V). The applied root cause analysis could produce manageable results due to the orthogonal classification provided by the defined enhanced ODC taxonomy. This classification simplified (aggregated) the defect types and triggers and allowed the analysis to be focused on typed subsets (we can consider that we identified common root causes, instead of individual ones). From the resulting root causes, the main issues affecting critical systems in the space domain are related to engineers’ mistakes, requirements and constraints, test strategies and completeness, and the lack of appropriate tools. To solve the problems caused by the main root causes, improvements in the review processes, traceability activities, and test plans, strategies and completeness are the most relevant. The root cause analysis technique used (fishbone plus five whys) can be replaced by any other root cause analysis technique, as long as it is mastered, is efficient and provides complete results. The set of root causes identified, either in the individual analysis (per defect type or per defect trigger) or after consolidation, can perfectly be reused for future analysis, or simply be used as a starting point to feed the root cause analysis process. The measures defined can also be reused, as they are naturally connected to the identified root causes. The reuse of root causes and measures will make the process much more efficient and complete the next time it is applied, and it will also allow organizations to measure progress by studying the trends in the changes of defect types and defect triggers (reflected in changes in the root causes). Finally, even if the root causes defined for a specific dataset are a representation of the problems that led to that dataset, and the proposed measures represent the delta or the improvements that are needed for the engineering teams that produced the systems, these root causes and measures can also be used as a reference for root cause analysis on other systems and in other domains. For more details on this topic, see the next chapter, which discusses the validation of the defects assessment process and its application to other domains.


Chapter 7 Process Validation and Application in Multiple Domains

“Program testing can be a very effective way to show the presence of bugs, but is hopelessly inadequate for showing their absence.” - Edsger Dijkstra

The validation of a process shall be performed based on the applicability and acceptance of the process and of the process results (outputs). The first part requires the implementation in an industrial environment, some training, and the application of the defects assessment process to sets of defects. The second part requires the existence of defects and much longer cycles, as the suggested measures for improvement need to be implemented (this can be done at different levels and requires different periods of time: new or modified engineering processes, techniques or tools, human resources training, guidelines, templates and standards, and so on) and their results need to be assessed, either by a second round of application of the defects assessment process or by the analysis of the reduction of the number of defects or of their severity. This is very costly and time-consuming (or time dependent); for this reason, we followed a different method to validate the process and the process results. The method defined to support the validation of the results of our research was to create a dedicated questionnaire and have it answered by a selected and significant number of experts. The questionnaire covered the definition and acceptance of the process, its applicability and simplicity, and its results, and intended to collect the opinion of experts on the validity of the root causes and measures. The questionnaire also allowed the experts to share their experience by providing additional root causes and measures that could be useful for future root cause analysis. The survey also had the intention of providing industry and academia with an understanding of the applicability and need of an RCA-based process, of comparing and completing the root causes defined with additional/complementary root causes experienced by the experts, and of presenting the obtained measures and collecting additional ones from the experts’ experience. The responses have allowed us to improve and adapt the defects assessment process, and to rank, in terms of perceived importance, the root causes and the proposed measures to improve software engineering processes. A side effect of the results obtained, also visible in the textual responses collected in Annex B, is the enhancement of the root causes and measures with additional common root causes suggested by the experts. In practice, our root causes and measures, together with the root causes and measures proposed by the experts, provide a very complete set of information that can be used in future root cause analysis, and that supports the claim that the process provides results that are independent from the business domain. The outline of this chapter is as follows. Section 7.1 provides an overview of the validation process, Section 7.2 presents the detailed results obtained from the survey provided to the experts and the discussion of these results, Section 7.3 discusses some concerns about the applicability of both the process and the reutilization of the root causes and measures to multiple business domains, Section 7.4 covers the process improvements suggested by the experts in the answered surveys, and Section 7.5 presents the final remarks about the process validation and the application to different domains.

7.1 Overview of the Process

Validating and collecting the approval of experts concerning very specific (and sometimes sensitive) topics is not an easy task. We had four main objectives with our validation questionnaire:

- to confirm that the proposed process is acceptable and usable by industry;
- to confirm the validity of the identified root causes and measures;
- to obtain feedback and promote improvements to the process and to the lists of root causes and measures, even if those come from different technical domains (this might be useful in future defect analysis for other datasets); and
- to measure the applicability of the process and results to different business domains (how generic these can be).

The first activity was to formulate the questions and have the questionnaire validated (see Section 7.1.1). Next, we selected the target groups of respondents of the survey (see Section 7.1.2). A large group of experts from academia and industry worldwide has been contacted, in particular experts working with dependability and critical systems. Finally, the responses to the questionnaire have been collected and the data analyzed (see results in Section 7.2).

7.1.1 Definition and Validation of the Questionnaire

Once the generic objectives (above) and research questions had been defined, we formulated the specific survey questions. The survey intended essentially to gather the view of industry and academia in what concerns defect analysis of critical systems, in particular the occurrence of failures in systems and software development and V&V processes. The research questions included the confirmation of the applicability and need of an RCA-based process (Research Question RQ1), the types of causes that were experienced by experts (RQ2), and what the challenges are to advance the methods and practice of RCA (RQ3). The applicable systems are usually developed with utmost care, under very strict and mature processes and methods, and usually require independent assessment in order to be fit for purpose (qualified, certified, homologated). These systems also follow well-defined quality assurance rules and originate a great deal of evidence (mostly documented artifacts) that can be consulted by teams and independent assessors. The survey revolves around the results presented in the previous chapters, but also intends to collect additional expertise to improve the classification and root-cause identification tasks. The questionnaire (presented thoroughly in Annex A) is composed of the following parts:

- A short introduction to the topics, a relevant list of acronyms, a note about the anonymity of the answers, instructions on how and when to deliver the questionnaire, and a short presentation of the defects analysis process;
- Three core parts:
  o A set of general questions (to quantify the technical domain and experience, and the perceived importance of the role of standards, budget, schedule, external assessment and defects analysis);
  o A set of textual questions collecting feedback about the process (positive and negative) and a request for additional recommendations (the results are in Annex B), plus questions about the level of acceptability/recommendation of the proposed defects assessment process, the strengths and weaknesses identified, and additional suggestions to simplify or improve the process;
  o Four sets of questions related with: a) defect development root causes and suggestions; b) root causes and suggestions regarding defect detection failure; c) defect avoidance measures and suggestions; and d) V&V measures and suggestions;
- Additional comments to the questionnaire.

Once the draft structure and questions had been formulated, the questionnaire was reviewed internally and then provided to a group of 4 experts (two from academia and two from industry) to have the questionnaire tested, reviewed and commented for validation. The answers from this validation round were not used in the results, since the questionnaire underwent some adjustments based on the feedback from the experts. The questions were then adjusted and simplified with the feedback from the four experts, as the questionnaire should be complete, straightforward and answerable in less than 1.5 hours, without compromising confidentiality (ensuring anonymity of the answers). Further validations have been done on each question in order to simplify the answers, support the respondents and avoid confusion or misunderstandings while responding. These validations included: proofreading of all questions, merging of similar questions, simplification by dividing one complex question into two or more, addition of helping text and guidelines to guide the answers, and appropriate reordering of the questions.

7.1.2 Distribution of the Questionnaire

The target audience of the survey was selected based on the author’s experience and list of contacts. The industrial background of the author provided a large and dedicated list of relevant contacts in the Aeronautics, Space, Defense and Transportation domains. An important tool used to support the selection of industrial contacts was LinkedIn [138]. A large list of academic experts also received the questionnaire. This list was constructed based on contacts acquired over several international research projects such as Critical Step [139], CECRIS [140], VALCOTS-RT [141], and AMBER [142], just to name a few, and from previous participation in international conferences such as DASIA, ISSRE, DSN, SAFECOMP, etc. These experts all belong to at least one of the following business domains: aeronautics, space, automotive, railway, defense, nuclear; and had experience in the following technical areas: critical systems, dependability, reliability, fault injection, V&V, RAMS. In practice, the questionnaire was delivered directly by email to 171 experts, both in MS Word and PDF formats, and the respondents were given one month to answer. Some of the experts replied stating that they would not be available to provide feedback or that they did not feel confident to respond to the survey, while others showed their interest in the survey and in getting the overall results at the end. Finally, 36 surveys were received and compiled, and the results have been summarized and presented (see Section 7.2). We estimate that these experts spent around 60 hours overall to provide their feedback.

7.1.3 Characterization of the Respondents

The questionnaire was answered by 36 experts. Not all experts answered the full set of questions, or had an opinion on every question where a textual response was required (Annex B). From the responses received, 17 were completely filled in, 14 had more than 92% of the questions answered (corresponding to 20 out of 22 questions), and 5 were between 56% and 76% complete. Some experts had no opinion concerning either one of the questionnaire areas or some of the proposed root causes or measures (we cannot conclude that no answer means total agreement with the proposed root causes/measures, but we can suspect that they had no clear or evident additions to our list). Overall, all questionnaires have been used to extract useful data.


In the analysis of the questionnaires we have simplified the domains to Academia, Transportation, Aerospace and Others. The distribution of respondents that claimed to work in these domains is: Academia – 19; Transportation – 18; Aerospace – 20; Other – 15. Only 7 researchers specified that they have experience in Academia; however, for this metric we have also considered researchers who claimed at least 5 years of experience in the academic field. Note that several experts have expertise in several domains, which is why the total adds up to more than 36. These results provide an interesting base for the industrial analysis of the importance and extensibility of our work. The average experience per expert was about 6.7 years for the 7 academic respondents and 16.7 years for the 29 industrial respondents (see Annex C for the experience distribution).

7.2 Validation Results

The data collected from the 36 questionnaires provided the validation arguments that we needed to improve and confirm the value of our process and results. The results of the survey are documented in Annex B and Annex C.

7.2.1 Relevance of RCA

The importance and relevance of applying a root cause analysis based process was at the center of the objectives of the survey. It is very clear from the answers provided to the general questions that root cause analysis is considered extremely relevant. Most of the experts consider that the usage of standards is extremely important, that budget and schedule restrictions are very important, and that external assessment (such as ISVV or certifications) as well as root cause analysis are also extremely important for critical systems. These results clearly confirm that our motivation for acting upon the quality of such systems is shared by the community. The resulting answers are shown in Table 30.

Table 30: General Questions Summary

Questions | ER | VR | SR | NR | NO
Q3: Standards Importance | 53% | 33% | 11% | 0% | 3%
Q4: Budget Importance | 30% | 53% | 14% | 0% | 3%
Q5: Schedule Importance | 42% | 42% | 11% | 0% | 6%
Q6: External Assessment Importance | 56% | 30% | 11% | 0% | 3%
Q7: RCA Importance | 61% | 36% | 3% | 0% | 0%

ER (Extremely Relevant), VR (Very Relevant), SR (Somehow Relevant), NR (Not Relevant), NO (No Opinion).

From questions Q8 and Q9 (Table 31) we observe that 28% of the surveyed experts have already used ODC in the past, and 50% have used some kind of defect analysis technique. This is quite surprising for such a selected population, since only a little more than half have ever used a structured way to study (recurrent) defects. Defect classification is also not common and, when done, it generally uses a very simple taxonomy internal to each organization.

Table 31: Experts Background knowledge Questions

Questions | YES | NO
Q8: Used ODC before? | 10 (28%) | 26 (72%)
Q9: Used Defect Analysis before? | 18 (50%) | 18 (50%)

7.2.2 Feedback on the Defects Assessment Process

This subsection presents the criticisms, suggestions and recommendations from the experts in what concerns questions Q10, Q11 and Q12. Again, the full set of answers is included in Annex B. When we asked whether the experts would recommend such a process, 60% would recommend it and 33% would possibly recommend it, while only 7% would not recommend such a process (see Figure 16, which does not include the 6 respondents who did not answer this question).

Figure 16: Process Recommendation Distribution

The comments on the process strengths, weaknesses and suggestions are presented in the following sub-sections.


From the 32 responses (89%) obtained on the process strengths (Q10) we can highlight the following: i) being a structured approach; ii) inclusion of root cause analysis; iii) inclusion of a classification scheme; iv) provision of improvements and feedback; and v) relying on high quality data. These strengths are effectively the ones we wanted to achieve and demonstrate with the defects assessment process.

From the 32 responses collected on the process weaknesses (Q11) we can summarize the most relevant/frequent as: i) concerns about how to guarantee the quality of defect data (also related to strength v); ii) large number of steps in the process; iii) missing feedback to the process (identified also as strength iv); and iv) difficulty in implementing/enforcing such a process. These comments are all very relevant. First, the process will only work if the defect data is appropriate. For this reason, comment i) should be related to comment iv), as a cultural enforcement must go beyond the application of the process itself and also cover the defect data collection, the quality checks of the data, and so on. The large number of steps is required to have the process well detailed with simple blocks – the process can however be simplified, as some steps can be considered optional (e.g. 6 – Late Detection RCA, 7 – RCA prioritization). Furthermore, the process contains feedback to both the process itself and to the development and V&V processes (strength iv).

Only 7 (19%) experts did not provide any additional process suggestions (Q12). Overall, out of the three requested suggestions, we obtained a response rate of 58%, which is a significant amount of comments and suggestions and demonstrates the interest of the experts in these topics. The main and more commonly agreed process suggestions include (see Annex B for the complete set):

- Data collection improvements (process, database, quality guarantee) – some experts expressed concerns about the quality and availability of good defects data. This topic has also been discussed earlier in Chapter 4;
- Classification/validation activities and data quality checks between phases of the process – this is a very relevant topic. We had to implement a validation of the classification of defects due to many classification doubts in early phases. We have also used the questionnaire to generically validate the root causes and the measures proposed, but this is a general validation and not a validation between phases. We believe, however, that with the appropriate training and with expert support any organization can comfortably implement such a process and guarantee the quality of the results after each phase;
- Consideration of project details/specifics and team dynamics (skills, experience, motivation) – these parameters need to be considered mostly during the root cause analysis and measures definition. If the root causes point to any of these topics, then actions need to be defined to deal with these less technical properties;
- Assessment covering also management related issues – management cannot be dissociated from the success or failure of the teams, and it can be included as well in the root causes, if considered necessary. Questions about budget and schedule pressures have been included in the survey to measure the level of influence of management in the success of a critical system project.

We can observe that there are suggestions on the environment and prerequisites, which make absolute sense (data quality, project details), and also on the validation of the internal process activities, namely regarding the quality of the classification, which cannot be easily automated with today's technologies.

7.2.3 Evaluation of the Quality of the Root Causes

The following analysis presents the results and discussion regarding the experts' opinions on the main results of the application of the defects assessment process to the 1070 ISVV space defects, namely the development defects root causes (based on the defect types analysis) and the root causes for the failure to detect defects (based on the defect triggers analysis). This analysis is based on the four questions Q14, Q15, Q16 and Q17. While Q14 and Q16 request the classification of the root causes identified and proposed by us to the experts, Q15 and Q17 collect newly suggested root causes according to the knowledge and experience of the experts.

The objective of this section is to show that the experts have a common view on the root causes identified for critical systems. Although the presented root causes are naturally associated with the defects dataset and the aerospace domain, we intended to determine if these results were still widely acceptable and potentially recurrent also in other domains. As we will see, this assumption can be corroborated by the presented results. We have analyzed the frequency of words/groups of words and clustered the suggestions of the experts, and those are presented hereby for all the questions where a written opinion was requested. For the details on the proposed root causes see Annex B.

For questions Q15 and Q17, the number of provided suggestions (root causes) can be observed in Table 32 (we have simplified the questions text to make it fit in the table; further textual description can be found in the questionnaire in Annex A). The experts provided more suggestions of root causes for development problems (62) than for V&V problems (36), and a relatively similar number of Extremely Relevant/Frequent and Very Relevant/Frequent suggestions overall (38 and 43). The first observation that can be made is that development is naturally more exposed to defects and experts have seen different types during their careers. The V&V activities that are associated with the "lack of defects detection" are a limited set in terms of causes and have been quite well captured by our analysis. Moreover, the experts tend to spend time providing suggestions that they really consider important and relevant, according to their domains and expertise. As we will see next, although some of their suggestions are already covered by the results of our analysis, they have expressed them in a slightly different wording.
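As an illustration of how the free-text suggestions can be processed, the following minimal Python sketch counts word and two-word (bigram) frequencies over a set of answers; the example answers and the stop-word list are hypothetical and only illustrate the frequency-counting step, leaving the actual grouping of suggestions to the analyst.

    from collections import Counter
    import re

    def phrase_frequencies(answers, stopwords=frozenset({"of", "the", "and", "to", "in"})):
        """Count single words and bigrams over free-text expert suggestions."""
        counts = Counter()
        for text in answers:
            words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
            counts.update(words)                                        # single words
            counts.update(" ".join(p) for p in zip(words, words[1:]))   # two-word groups
        return counts

    # Hypothetical answers, for illustration only.
    suggestions = [
        "Lack of domain knowledge and schedule pressure",
        "Schedule pressure impacting development artefacts",
    ]
    print(phrase_frequencies(suggestions).most_common(5))

The most frequent words and word pairs then serve as candidate cluster labels for manual review.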


Table 32: Number of the Proposed Root Causes

Questions                                                   ER    VR    SR    NR    NO     Total
Q15: Development Defects Root Causes Suggestions            23    26    11    2     46     62
Q17: Failure of Detecting Defects Root Causes Suggestions   15    17    4     0     72     36
TOTAL                                                       38    43    15    2     118

ER (Extremely Relevant/Frequent), VR (Very Relevant/Frequent), SR (Somehow Relevant/Frequent), NR (Not Relevant/Frequent), NO (No Opinion).

7.2.3.1 Root Causes on Development Defects

We requested the experts' perception of the frequency of the identified root causes from a development perspective. These root causes have been identified in the context of our dataset and for the space domain, but we expected them to be quite well acknowledged by the experts. Q14 presented the main root causes that we found during the development lifecycle. Figure 17 shows the order of "preference" or "importance" given by the experts to those root causes and the distribution of the relevance levels. The colors represent the percentage (left axis of the figure) of responses of each importance/relevance level, while the green line provides an average of the overall responses for each root cause based on the weights given to the responses (ER = 3, VR = 2, SR = 1, NR = 0, NO = excluded). The list of root causes is ordered by this average, showing the root causes considered more important/relevant to the left and the less important/relevant to the right.
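To make this ordering concrete, the sketch below shows how such a weighted average can be computed from the response counts of each root cause, using the weights stated above (ER = 3, VR = 2, SR = 1, NR = 0) and excluding "No Opinion" answers; the response counts used here are invented for illustration only.

    WEIGHTS = {"ER": 3, "VR": 2, "SR": 1, "NR": 0}  # "NO" (No Opinion) is excluded

    def weighted_average(responses):
        """responses: dict mapping ER/VR/SR/NR/NO to the number of answers received."""
        rated = {lvl: n for lvl, n in responses.items() if lvl in WEIGHTS}
        total = sum(rated.values())
        return sum(WEIGHTS[lvl] * n for lvl, n in rated.items()) / total if total else 0.0

    # Hypothetical counts for two root causes, only to illustrate the ordering.
    root_causes = {
        "Inefficient Reviews":    {"ER": 20, "VR": 10, "SR": 4,  "NR": 0, "NO": 2},
        "Docu Tools Limitations": {"ER": 5,  "VR": 12, "SR": 14, "NR": 3, "NO": 2},
    }
    for name, resp in sorted(root_causes.items(),
                             key=lambda kv: weighted_average(kv[1]), reverse=True):
        print(f"{name}: {weighted_average(resp):.2f}")

A root cause with an average above 1.5 is thus rated, on average, between "Somehow Relevant" and "Very Relevant".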


Figure 17: Development Defects Root Causes

Aerospace experts consider documentation tools limitations (decrease of 21%, see the note below) and lack of tools and programming languages knowledge (decrease of 21%) as less relevant than other experts, and incomplete interface specifications (increase of 15%) as more relevant than experts from other domains. Apart from these causes, both groups classify the root causes in a very similar way in terms of importance. Table 33 shows the relative difference between the classifications made by Aerospace experts and experts from other domains (based on the values presented in Figure 17).

Note: this means that, for question Q14d: Docu Tools Limitations, Aerospace experts attributed a lower importance to this root cause than the other experts, according to the sum of the weights of NO, NR, SR, VR and ER. The difference of the aggregated sum in this case is 21%.


Table 33: Difference between Aerospace experts' answers and others for Q14

Question                                       Difference
Q14a: Inefficient Reviews                      9.0%
Q14b: Artefacts Problems                       4.5%
Q14c: Incorrect Tests                          11.0%
Q14d: Docu Tools Limitations                   -21.0%
Q14e: Sys Docu Inconsistencies                 3.6%
Q14f: Docu Processes Overlooked                10.5%
Q14g: Limited Domain Knowledge                 6.7%
Q14h: FDIR Specs Missing                       0.3%
Q14i: Lack Reliab. Safety Culture              0.8%
Q14j: Incomplete Interface Specs               15.0%
Q14k: Insufficient Maint, Oper, Usab Specs     6.6%
Q14l: Lack Tools and Languages Knowl           -20.8%
Q14m: CM and Versioning Problems               -9.3%

Concerning the causes for defects introduced during "development", ordered according to the frequency of occurrence/importance, we can observe that 9 of the 13 root causes achieve an average greater than 1.5, meaning that overall they were evaluated at least as relevant/frequent by the set of experts. We also observe that experts do not seem to consider the lack of domain knowledge, tools limitations and configuration management problems as the most relevant or frequent root causes. On the contrary, they consider that wrong implementations, incomplete reviews, incomplete tests and specifications for the error situations are the more frequent/important root causes.

As a complement, we requested the experts to provide some additional root causes from their experience (which would not necessarily be applicable to our dataset) in order to compile a more complete and balanced list of root causes that can be used in the future independently from the business domain. The aggregated summary of the new root causes (Q15) has been collected from the suggestions of 27 experts (75% of the total sample). The main topics for the new root causes suggested are:

- Pressures impacting development artefacts (management, schedule/delivery, budget);
- Development/developers overconfidence or carelessness, as well as skills, motivation and interest in the project;
- Systems' increasing complexity (technical and team sizes);
- Communication issues (customer, developers, V&V, management);
- Constraints on quality, availability of artefacts, and resources skills.


Several experts suggested root causes similar to the ones proposed in our survey, making the clustering easy. In fact, we observed that some of the suggested root causes are specifically related to some of our proposed root causes (described in the questionnaire), e.g. domain knowledge, poor documentation, Fault Detection, Isolation and Recovery (FDIR). Others, such as those stated in the bullets above, are sometimes quite broad but constitute an interesting starting point for consideration in future RCA, and are thus to be added to our list of 13 root causes to serve as a reference for future analyses.

7.2.3.2 Root Causes on Failure of Detecting Defects

While analyzing the defect triggers we have identified root causes for the failure to detect defects, i.e., for how the V&V activities have failed. The perception of the frequency/importance of these identified root causes was asked in Q16 and is depicted in Figure 18. The colors represent the percentage (left axis of the figure) of responses of each importance/relevance level, while the green line provides an average of the overall responses for each root cause based on the weights given to the responses (ER = 3, VR = 2, SR = 1, NR = 0, NO = excluded). The list of root causes is ordered by this average, showing the root causes considered more important/relevant to the left and the less important/relevant to the right.

Figure 18: Failure of Detecting Defects Root Causes

Aerospace experts consider defective review processes (increase of 22%), non-application of traceability support tools (increase of 20%) and incorrect usage of tools (increase of 16%) as more relevant than experts from other domains. Apart from these causes, both groups classify the root causes in a very similar way in terms of importance. Table 34 shows the relative difference between the classifications made by Aerospace experts and experts from other domains (based on the values presented in Figure 18).

Table 34: Difference between Aerospace experts' answers and others for Q16

Question                                  Difference
Q16a: No Traceability Culture             6.7%
Q16b: No Traceability Tools               20.2%
Q16c: No Test Planning                    7.9%
Q16d: No Testing Tool/Env.                2.1%
Q16e: Incomplete Test Spec                3.1%
Q16f: Defective Review Process            22.4%
Q16g: Defective Docu                      11.1%
Q16h: Bad Tools Usage                     16.4%
Q16i: Incomplete/Bad Specs                1.2%
Q16j: Defective Architecture              -3.1%
Q16k: No Tools for Data Flow Analysis     6.6%
Q16l: Inappropriate arch supp tools       -4.2%
Q16m: Deficient Design Specs              10.9%

Experts do not seem to consider that tools (again) are an important source of failure to detect problems, since the 5 root causes with the lowest averages pointed to tools-related problems or lack of tools. On the contrary, they consider that specification-related problems, test specifications and the implementation of planning and review processes lead to the lack of defects detection much more frequently than the other root causes. From these values, we can observe that 8 of the 13 root causes achieve an average greater than 1.5, meaning that overall they were evaluated at least as relevant/frequent by the set of experts.

The aggregated summary of the root causes proposed by experts for the failure of detecting defects (Q17) has been collected from the suggestions of 18 experts (50% of the total sample). The main topics for the new root causes suggested are:

- Knowledge limitations (domain, project, environment, faults, constraints, technologies, safety culture);
- V&V processes and culture (also considering dedicated standards);
- Staff inexperience (lack of testing expertise, lack of verification training);
- Management related issues (planning, pressures, risks, milestones, schedules);
- Size and complexity of systems (untestable requirements, long testing periods or impossibility of full coverage, lack of early involvement of safety/V&V teams).


Some of the root causes suggested are related to specific root causes already in the questionnaire, e.g. test planning/strategy or testing tooling/environment issues (also problems with tools). Others, such as those stated above, can also be allocated to the development phases but constitute another interesting addition for consideration in future RCA, and are thus to be added to our list of 13 root causes to serve as a reference for future analyses.

7.2.4 Evaluation of the Quality of the Measures

The following analysis presents the results and discussion regarding the experts' opinions on the main results of the application of the defects assessment process to the 1070 ISVV space defects, namely the measures to improve development and V&V. This analysis is based on the four questions Q18, Q19, Q20 and Q21. While Q18 and Q20 request the classification of the measures identified and proposed by us to the experts, Q19 and Q21 collect additional measures proposed by the experts.

The objective of this section is to show that the experts have a common view on the required measures for critical systems. Although the presented measures are naturally associated with the defects dataset and the aerospace domain, we intended to determine if these results were still widely acceptable and potentially recurrent also in other domains. As we will see, this assumption can be corroborated by the presented results. We have analyzed the frequency of words/groups of words and clustered the suggestions of the experts, and those are presented hereby for all the questions where a written opinion was requested. For the details on the proposed measures see Annex B.

For questions Q19 and Q21, the number of provided suggestions (measures) can be observed in Table 35 (we have simplified the questions text to make it fit in the table; further textual description can be found in the questionnaire in Annex A). The experts provided an equivalent number of additional measures for development and V&V (40 versus 42). The experts tend to spend time providing suggestions that they really consider important and relevant, according to their domains and expertise. As we will see next, although some of their suggestions are already covered by the results of our analysis, they have expressed them in a slightly different wording.

Table 35: Number of the Proposed Measures

Questions                               ER    VR    SR    NR    NO     Total
Q19: Development Measures Suggestions   21    10    7     2     68     40
Q21: V&V Measures Suggestions           21    13    5     3     66     42
TOTAL                                   42    23    12    5     134

ER (Extremely Relevant/Frequent), VR (Very Relevant/Frequent), SR (Somehow Relevant/Frequent), NR (Not Relevant/Frequent), NO (No Opinion).


7.2.4.1 Development Measures

Following the root cause analysis, we have identified concrete measures to tackle the most frequent root causes and thus avoid the same type of defects in the future. The perception of the relevance/importance of these proposed measures was asked in Q18 for the development related root causes. Figure 19 presents the ordering of importance/relevance of these measures according to the classification performed by the experts.

Figure 19: Development Measures

We observed that aerospace experts consider Engineering Training (Q18e) a measure that could help in reducing the introduction of defects more than the overall set of experts does (Table 36); the difference is an increase of about 25% in the importance of this measure. The proposals of meetings to clarify the specifications, automated documentation generation and additional standards guidelines are also more emphasized by the aerospace experts, with an increase of about 16-17% for these measures. Most of the remaining measures are rated similarly in the aerospace and the overall responses of the experts.


Table 36: Difference between Aerospace experts answers and others for Q18

Question                            Difference
Q18a: Better Processes/Review       2.5%
Q18b: Auto Docu Generation          17.0%
Q18c: Tools for full lifecycle      0.9%
Q18d: Auto Validation Tools         -5.6%
Q18e: Engineering Training          24.8%
Q18f: Specifications Meetings       16.5%
Q18g: Standards Guidelines          16.3%

Experts do not seem to consider that tools (yet again) are the most important asset for reducing defects during the lifecycle. On the contrary, they consider that better processes, more frequent clarification meetings with the customer and dedicated engineering training are the main solutions. From these values, we can observe that 5 out of the 7 measures achieve an average greater than 1.5, meaning that, overall, they were evaluated at least as relevant/important by the experts.

Additional suggestions regarding development measures have been provided by 20 of the respondents (56% of the total sample) and allowed the identification of a large set of new measures:

- Require early results and early quality guarantees of requirements (or formal specifications);
- Promote defects knowledge and analysis, fault awareness (FDIR) training, compensations, etc.;
- Simplify systems, technologies, team management, lifecycles;
- Improve communication, meetings efficiency, milestones with more quality;
- Traceability simplification and implementation;
- Promote teams' training and involvement at different phases;
- Use standards but plan for necessary improvements / additional tasks.

Once again there is a relation between some of the suggestions and measures already defined, namely the enforcement of strict review/development/documentation processes and the topics regarding standards. Moreover, some of the suggestions would also fit the V&V measures since there are commonalities, such as the traceability related measures.


7.2.4.2 V&V Measures

Following the RCA we have performed based on the defect triggers, we also mapped concrete measures to tackle the lack of defect detection. The perception of the relevance of these proposed measures was asked in Q20 and is depicted in Figure 20.

Figure 20: V&V Measures

Aerospace experts consider that Q20c Test Coverage (increase of 27%), Q20d Non-functional Tests (increase of 23%) and Q20b Traceability (increase of 15%) are more important tools/measures than non-aerospace experts do (depicted in Table 37).

Table 37: Difference between Aerospace experts' answers and others for Q20

Question                              Difference
Q20a: Test Plans                      1.3%
Q20b: Traceability                    15.2%
Q20c: Test Coverage                   26.9%
Q20d: Non-functional Tests            22.8%
Q20e: Tools for design compliance     3.9%

Experts still do not seem to consider that tools are the most important asset for V&V to help detect defects. More importantly, experts suggest that better test plans, improved test coverage criteria and better traceability analysis are the most important V&V actions to implement. From these values, we can observe that 4 out of the 5 measures achieve an average much greater than 1.5, meaning that overall they were evaluated at least as relevant/important by the set of experts.

Additional suggestions on V&V measures were obtained from 21 of the respondents (58% of the total sample):

- More customer involvement in testing;
- Experts' involvement for requirements testing;
- Tools to support V&V processes, static analysis, automation of testing tasks;
- Better communication between V&V and development;
- Monitor and measure quality;
- Enforce early defect detection (model-driven techniques, formal methods, test driven development).

As before, there were commonalities between the newly proposed measures and some of the measures already included in the questionnaire. For example, experts also proposed better test plans and strategies, more robustness testing (non-functional) and the exploration of informal V&V methods. Moreover, some of the proposed V&V measures were already covered as part of the development measures in the survey questions (Q18), as is the case of better review processes, better code inspections, and team training, focus, specialization and defects awareness.

7.3 Application to Multiple Domains

Several domains have been depicted in Figure 2 (Chapter 2). All the depicted domains have in common that they require a controlled and recognized set of development and V&V processes based on international standards and guidelines. In fact, the differences between software engineering in different domains are not so significant. Only some techniques and taxonomies actually differ from domain to domain, as confirmed by a study performed over different safety-critical standards [23]. Another study, within CRITICAL Software, S.A., which operates in different domains, has demonstrated that the differences in techniques, tools and skills that teams working in one domain need to bridge in order to move to another domain are very manageable and require an acceptable amount of effort and time [13]. This shows that a common set of skills, techniques, tools and practices is largely reusable, and so should be the defects assessment process.

Our dataset contains defects from space projects and some defects from aeronautics (airborne) projects. During the dataset clean-up, we removed the aeronautics defects as they did not contain structured or sufficient information to be used. Since the defects assessment process defined in Chapter 3 has been applied to the space defects (especially with the enhanced ODC taxonomy) and we want to validate that it can be applied to different domains, an on-going activity is being performed with a set of about 150 railway defects. The preliminary results demonstrate that the enhanced ODC taxonomy is properly adapted to the railway domain, and that the process can be applied to another domain with no modifications required. Nonetheless, the process allows for the modification or replacement of the classification taxonomy, as long as it remains orthogonal. Thus, a different classification taxonomy can be used, or the currently proposed enhanced ODC can be adapted if there is the need (for example due to the impact of the usage of new technologies, emergent behaviors or cybersecurity issues). Any adaptation should, at this point, be very precise and localized. Another area of the process that can use different techniques is the root cause analysis. We have applied the fishbone diagrams and the 5 whys techniques, but any technique based on expertise and knowledge of the systems and environment can be used to determine the root causes.

In order to further collect feedback on the applicability of the process to an enlarged set of technical domains, we have compiled a questionnaire (see Annex A) that was delivered to a large set of experts (industry and academia) working in a diverse set of domains. Both the process and the analysis results have been widely accepted by the experts; this is positive feedback showing that the root causes identified for the space defects can also be applicable to other domains. This has three main reasons: first, the engineering processes and tools are not that different; second, the guidance or standards have quite similar objectives; and third, the root causes and the associated improvement measures identified by our activity have been defined in a very generic and thus broadly-accepted way. Since the engineering processes and tools used are very common, the result depends on the maturity of the organizations and the experience of the engineers, thus some very pervasive problems occur. The standards and guidance requirements having common objectives (e.g. safety, security, availability, reliability, dependability, quality, certification or qualification) also lead to the usage of common processes, techniques, tools and reuse, and this will again lead to problems that can be found across domains. Finally, our lists of development root causes, V&V root causes, and the proposed improvements are not specific to any particular defect (it would be too extensive to do so for the 1070 defects) and are purposely generic, being in this way applicable to any engineering context (engineering domain). This genericity was one of our essential requirements for this work. As the implementation is mainly performed by engineers, they tend to make common mistakes and require the same type of improvements.

The results of the survey, presented before, have highlighted that the opinion of experts is not very divergent in what concerns the applicability of the identified root causes and the proposed measures. Our dataset contains a diverse set of defects from all lifecycle phases, from the aerospace domain. Nonetheless, we observed that the perception of the importance of the root causes for development and V&V is very similar between aerospace experienced respondents and the global population (as shown in Table 33, Table 34, Table 36 and Table 37). As for the differences, none of them is higher than 30%, so we observe only a small divergence of opinions. There are only two proposed root causes where non-aerospace experts have given a markedly higher importance than the aerospace experts, and these are: Q14d: Documentation Tools Limitations and Q14l: Lack of Tools and Languages Knowledge. However, these root causes are also classified as the least relevant ones for both aerospace and non-aerospace experts. Similarly, regarding the measures proposed, the only ones that had a significant difference between the aerospace and non-aerospace groups are: Q18e: Engineering Training and Q20c: Test Coverage. These two measures are, however, considered among the most relevant, although aerospace experts consider them much more relevant than non-aerospace experts do.

The experts have also expressed their opinion on the process that led to those results. The acceptance rate was positive and there were no differences observed between experts of the different domains, nor between academia and industry. The tasks that compose the process are also generic enough to be applied to any engineering project, independently from the domain. It is currently being applied to the transportation domain without any issue: the same process, with the same ODC taxonomy, is being applied to a set of 150 railway train control and management system defects. We conclude that the process with the Enhanced ODC taxonomy and the root cause analysis tasks is potentially independent from the domain, so it can be applied generically. Moreover, the set of root causes and measures also seems to be well accepted by experts from the different domains. Further suggestions have been proposed, but they do not seem to be connected to any business domain, as they are also very applicable to all domains.

7.4 Process Improvements as a Result of the Process Validation Activities

Based on the feedback collected from the empirical and practical application of the process to a large set of space software defects, and then on additional feedback from industry and academia, we have derived the process depicted in Figure 10. Although our dataset and our experience are mainly from space software, this generalization can support the evaluation and root cause analysis of any critical system, independently from the domain (as confirmed by the set of industry and academic surveys performed).

7.4.1 Data Collection and Preparation Improvements

Data collection seems to be a straightforward activity; however, it is not always easy to get access to the right or complete data necessary to classify the defects and to determine the root causes. Some suggestions that have been included in the defects analysis procedure are:

- Engineering training on how to properly raise defects – the engineers should be prepared to promptly and correctly report defects, and to document them completely in a self-contained manner, with the appropriate references and with a common and simple language;
- Definition of a template with the basic information required to be collected for every defect (this can also be implemented on a defect tracking tool) – definition of the mandatory information fields required for each defect that is raised, providing hints or multiple choices whenever possible for filling in those fields;
- Procedure for early evaluation of the quality and completeness of the defects data – define some metrics or develop a tool to evaluate the completeness and quality of the contents of the defects, the natural language used, etc.;
- Pre-selection of the defects to decrease analysis effort – with the definition of criteria (for example based on the priority and/or severity of the defect), prioritize and select a subset of the defects so that the classification and the root cause analysis are performed on a smaller and more important set of defects;
- Usage/update of a domain specific database of defects – generalize the defect types or examples, complete the analysis, record it in a database and reuse it for future occurrences of similar defects or defects related to the same topics.

The contents of the template with the basic information required to be collected shall include, at least: reference artefact, defect title and defect detailed description, phase where the defect was identified, phase where the defect was introduced, activity that detected the defect, defect author, defect severity, and defect relevant keywords.
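As a minimal sketch of such a template, the following Python dataclass captures the fields listed above together with a simple completeness check; the field names and the check are our own illustration under these assumptions, not a prescribed format.

    from dataclasses import dataclass, asdict

    @dataclass
    class DefectRecord:
        """Minimum information to be collected for every reported defect (illustrative)."""
        reference_artefact: str     # artefact (document, code unit, test) the defect refers to
        title: str
        detailed_description: str
        phase_detected: str         # lifecycle phase where the defect was identified
        phase_introduced: str       # lifecycle phase where the defect was introduced
        detection_activity: str     # activity (review, test, analysis) that detected the defect
        author: str
        severity: str               # e.g. minor / major / critical
        keywords: str               # free-text keywords to support later classification

    def is_complete(record: DefectRecord) -> bool:
        """Simple completeness check: every mandatory field must be non-empty."""
        return all(str(value).strip() for value in asdict(record).values())

Such a structure can be enforced by a defect tracking tool, and the completeness check can be part of the early evaluation of the quality of the defects data suggested above.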

7.4.2 Defects Classification Improvements

In this work, defects classification was based on the selected orthogonal defect classification (ODC) and the proposed enhancement. The classification process depends on the selected scheme and taxonomy (see Chapter 5 for the proposed classification and relevant details). Some suggestions have been made by industry and academic experts regarding our proposed classification process:

- Define and provide training on the classification of defects and on the applied taxonomy – to make the classification more efficient, easy and homogeneous, since understanding the classification taxonomy is essential for a sound classification;
- Explore the usage/configuration of tools to support the defects classification (e.g. JIRA, Bugzilla, …) – tools must support the classification and analysis of the defects, as well as the access to and storage of the defects data;
- Adapt ODC for object/service/aspect oriented developments and for agile – to support new development paradigms and be able to cover future needs of the project teams;
- Promote continuous improvement of the ODC taxonomy (from feedback) – as stated in the last step of the process (11. Process Validation and Improvement), regular feedback to the whole process is important to accommodate the technological changes in system development;
- Improve confirmation of the proper defect classification (confirmation by another expert – validation of the classification) – defects classification validation and confirmation is necessary, and this is commonly done by additional experts; different solutions for confirming the appropriateness of the classification shall be explored;
- Use and update the defects database (classified defects) – the usage of a defects database, with the relevant classifications and constant updates/improvements to that database, is required;
- Explore the possibility of getting defects automatically classified with the support of the defect author (e.g. by using keywords) – either by improving the templates and the guidelines for writing defects or by requesting the defect author to fill in some additional fields, collect extra information to support the defects classification activities (a minimal sketch of the keyword idea is given after this list).
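The sketch below illustrates the keyword-based idea: the keyword-to-class table is purely hypothetical (the real mapping would have to be built and validated by the classification experts), and it only proposes candidate classes rather than replacing the expert decision.

    # Hypothetical keyword hints; the actual table must be defined and validated by experts.
    KEYWORD_HINTS = {
        "traceability": "Documentation",
        "interface":    "Interface",
        "algorithm":    "Algorithm",
        "wrong value":  "Assignment",
    }

    def suggest_defect_types(description: str) -> list[str]:
        """Return candidate defect types based on keywords found in the defect text."""
        text = description.lower()
        return sorted({cls for kw, cls in KEYWORD_HINTS.items() if kw in text})

    # The classifier still confirms (or overrides) the suggestion.
    print(suggest_defect_types("Missing traceability between the interface requirement and the code"))

Suggestions of this kind can speed up the classification, but the final class assignment remains with the trained classifier.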

7.4.3 Root Cause Analysis Improvements

The root cause analysis requires expertise and effort to be properly performed. The proposed process contains different steps and different root cause analysis activities. Some of the suggestions made by the industrial and academic experts consulted are:

- Provide root cause analysis training – training and practice on root cause analysis is important to properly perform the root cause analysis activities and avoid doing only a high level and flawed analysis;
- Use a root causes database as a baseline for identifying the defects causes – the defects database with the associated classifications is a key element for future automation and guidance on the classification of similar defects or groups of defects (a minimal sketch of such a lookup is given after this list);
- Promote some type of automation for the root cause analysis activities – with additional information provided by the defects authors or with the support of the classified defects database, some automation or classification suggestions can be provided;
- Define and derive root causes per domain or system type (together with the database) – the database should have categories or groups of defects (similar defects, subsystem types, defects from specific lifecycle phases, etc.) that can support the classification of similar defects in the future;
- Separate root causes depending on target groups (development, V&V, management) – root causes separated by type can simplify their distribution, implementation and acceptance by the target groups;
- Promote a quick comparison with previous defects' root causes to see if the systems are improving – verify that the frequency of previously identified root causes is reduced and conclude on the suggestions and actions implemented.
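The following minimal sketch illustrates how such a root-causes database could be queried from a defect's classification to propose candidate root causes to the analyst; the example entries are illustrative (the class names are taken from our enhanced taxonomy and root cause list) and would in practice be populated from previously analyzed defects.

    # Hypothetical database: (defect type, trigger) -> root causes recorded in past analyses.
    ROOT_CAUSE_DB = {
        ("Documentation", "Traceability/Compatibility"): ["Documentation processes overlooked"],
        ("Interface", "Consistency and Completeness"):   ["Incomplete interface specifications"],
    }

    def candidate_root_causes(defect_type: str, trigger: str) -> list[str]:
        """Look up previously recorded root causes for a (type, trigger) pair."""
        return ROOT_CAUSE_DB.get((defect_type, trigger), [])

    print(candidate_root_causes("Documentation", "Traceability/Compatibility"))

The analyst would confirm or refine the suggested causes, and new confirmed causes would be added back to the database.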


7.4.4 General Process Improvements

The root cause analysis leads to the identification of changes to prevent defects from existing or from slipping between lifecycle phases. The implementation of these changes (improvements) might lead to the expected results, but might also reveal different types of problems that will need to be tackled in a subsequent cycle. For engineering cycles that are long (projects that have, for example, a 5-year duration), it will take some time to measure the effect of the implementation of the improvements, and some improvements might not be easy to implement because they depend on training, cultural changes, technology evolutions, tools that need to be purchased, guidelines or standards that need to be updated and followed, etc. Some of the suggestions made by the industrial and academic experts consulted are:

- Discuss and agree the changes with management before presenting them to the affected groups, to ensure their commitment and sponsorship – a formal discussion and acceptance of the suggestions based on the identified root causes is the best way to have the solutions implemented;
- Monitor the implementation of the improvements, maybe through a simplified and lighter process that quickly confirms the reduction of the defects – monitor and track the metrics of the defects rate and the root causes occurrences to conclude on the effectiveness of the implemented measures;
- Define and adopt some metrics to measure the quality of the results – define specific metrics such as the defects recurrence and the root causes frequency, by severity, etc., in order to measure the quality and effectiveness of the implemented modifications (a minimal sketch of such metrics is given after this list).
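A minimal sketch of two such metrics (root cause frequency and the change in the number of defects attributed to a root cause between two analysis cycles) is given below; the data and the root cause names are illustrative only.

    from collections import Counter

    def root_cause_frequencies(defects):
        """defects: iterable of dicts with at least a 'root_cause' key."""
        return Counter(d["root_cause"] for d in defects)

    def recurrence_change(before, after, root_cause):
        """Relative change in the number of defects attributed to a root cause between two cycles."""
        old = root_cause_frequencies(before)[root_cause]
        new = root_cause_frequencies(after)[root_cause]
        return (new - old) / old if old else float("inf")

    # Illustrative data for two analysis cycles.
    cycle1 = [{"root_cause": "Inefficient reviews"}] * 12 + [{"root_cause": "Incomplete tests"}] * 8
    cycle2 = [{"root_cause": "Inefficient reviews"}] * 7 + [{"root_cause": "Incomplete tests"}] * 9
    print(f"Inefficient reviews: {recurrence_change(cycle1, cycle2, 'Inefficient reviews'):+.0%}")

A sustained decrease in the frequency of a targeted root cause between cycles is an indication that the associated measures are being effective.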

7.5 Final remarks

We have created a survey to measure the acceptability of the defects analysis process and to confirm that the obtained results are in line with the community's experience and background. This survey was answered by 36 experts and provided us not only with a ranking of the root causes and possible measures, but also with some feedback on the process and some additional root causes and measures. The main outcomes from the survey results were discussed in this chapter.

The community (both academia and industry) considers that defect analysis, external assessments and the use of standards are of utmost importance. In parallel, budget and schedule constraints play an important role in the quality of the project outcomes. Almost half of the respondents had never performed defect analysis, so this is an area where improvements really need to be implemented.

The proposed assessment process got acceptance from a majority of the respondents. They claimed that the process was structured and could provide valuable feedback; however, they were concerned about how to obtain good quality defects data and how to implement a process with a large number of steps. Some of the proposed suggestions were exactly concerning the data quality and how to validate the obtained results. We claim that usage over time and monitoring of the defects' frequency and impact will be a good validation method.

The root cause results (development and V&V defects) have allowed the ranking of our identified root causes, and this has shown that the experts consider wrong software implementations, incomplete reviews, incomplete tests or test plans and lack of consideration of error situations to be the most common problems, while lack of domain knowledge and tools do not seem to be the causes of most defects. This is an interesting result that shows that the critical software domain is still heavily dependent on manual actions, on human skills, and on traditional development (from specifications to tests). For the ranking of the measures to apply, the most voted were review processes, improved communications, test plans and test coverage, better traceability and engineering training. Again, tools and automation are generally the V&V and development assets that seem less important to the experts. This shows once more that the manual and human-based techniques (which are simple) are still the ones the community considers more applicable. A cultural change may be needed in these areas.

Additional suggestions for root causes and measures have been proposed by the experts. Although these were very much in line with those that we proposed based on our dataset, they also provide an interesting addition for future RCA. The proposed root causes and measures arise from the specific experience of the experts and provide a good overview of the problems (and possible solutions) for critical systems at large. The conclusion we can draw is that our process produced a very well accepted set of root causes and measures, and this positively supports the evaluation of the process and the methods that compose it.


Chapter 8 Conclusions and Future Work

The topics of quality, dependability, reliability, integrity and safety of critical systems, such as space, railways and avionics systems, are of utmost importance. These systems operate vital functions and cannot fail. It is, however, not possible to have a system that does not fail or does not contain defects, and this is observed, for example, when independent assessments of these systems are performed. The significant and frequent number of defects (sometimes major defects) is the main motivation to develop a process to support the engineering teams in reducing the number of defects by learning from previous mistakes.

In this work, a complete process was defined, applied, validated and refined, covering four main phases: defect data collection and preparation, defects classification, root cause analysis and measures identification, and validation of the measures implementation. The defects assessment process was applied to a set of 1070 space systems defects and validated with the support of a survey provided to different academic and industrial experts. The process can be used and applied in industry in a simple way and independently from the industrial domain. In practice, the process takes as input defects identified on the engineered systems, and supports the analysis of these defects towards identifying their root causes and defining appropriate measures to avoid them in future systems.

Within the defects assessment process activities, the adaptation of the Orthogonal Defect Classification (ODC) for critical issues is a key step. The original ODC was used for an initial classification and was then tuned according to the gaps and difficulties found during the defects classification activities. The improvements were necessary and covered the defect types, defect triggers and defect impacts. Improved taxonomies for these three parameters have then been devised, applied to the full set of defects and validated with the support of experts, demonstrating their appropriateness.

The most important part of the process is, however, the application and integration of a set of root cause analysis steps. These steps produce results that show the origins of the defects. The identified causes are related to different parameters, such as human and technical resources, events that occurred, processes followed, methods applied, tools used and standards followed. The fishbone root cause analysis proposed and applied to the defects dataset has proven to be quite effective, since the obtained results were well accepted by the experts during the validation phases. As a result of the root cause analysis, a specific set of root causes and applicable measures to improve the quality of the engineered systems (by removing those causes) has been identified. These root causes and proposed measures allow the provision of quick and specific feedback to the industrial engineering teams as soon as the defects are analyzed. A list/database has been compiled from the dataset and includes the feedback and contributions from the experts that responded to the process validation survey. The root causes and the associated measures represent a valuable body of knowledge to support future defects assessments, as confirmed by the answers and classifications of the experts.

The measures proposed to improve systems have shown the importance of using empirical data (of defects, in this case) to contribute to technical and process improvements in order to obtain better overall quality and improved dependability levels for critical systems. In fact, the outcomes of the field study show that, although critical systems are already guided by appropriate development and V&V techniques and processes, most of the defects are caused by an inefficient usage or implementation of these techniques and processes. Appropriate guidance, additional requirements and constraints, better test strategies, and tools that are able to help in the application of the techniques and processes are essential to obtain better results (fewer defects). We can conclude that the main issues affecting critical systems in the space domain are related to engineers' mistakes, requirements and constraints, test strategies and completeness, and lack of appropriate tools. The results also suggest that our process (supported by the classification scheme and the root cause analysis) allows the identification of improvements for specific areas and groups of defects, and that review processes, traceability activities and test plans, strategies and completeness are the most relevant V&V tasks to be improved/enforced for better detection of the defects in space systems. As a result of this study, and due to the industrial cooperation behind it, several parties have already shown interest in the results of the current analysis to promote internal awareness and process improvements.

8.1 Discussion

Some topics are worth mentioning as part of the conclusions and of the obtained results.

Firstly, the proposed ODC adaptation. The classification scheme is a core element of the process and we have selected ODC due to its adoption in industry, its orthogonality and its comprehensiveness. However, we soon identified difficulties in classifying issues without ambiguity: we had a first classification with 27% of ambiguous classifications, but once this classification was reviewed and completed by an expert engineer the ambiguity increased to 31.7%. We solved this issue by proposing an adaptation of the taxonomy, which was based on the study of the ambiguous classifications and on the judgement of experts.

Secondly, the reduction of the ODC attributes. The selection of a subset of ODC attributes comes from the fact that not all attributes bring useful information to support RCA. A key objective was to make the classification easier (for Type, Trigger and Impact) and to cope with the fact that the classifiers had some trouble categorizing certain defects. We believe that the proposed taxonomy allows a more efficient and faster classification (even if losing some specificity), as we are joining some classes that had similarities (probably reducing ambiguity and the probability of human error, but that is not easy to validate). The reclassification (with the enhanced ODC taxonomy versus the original ODC) led to:

- More 'Documentation' defect types being observed (12% increase). We classified some of the defects in this category after a deeper analysis of the defects where we had doubts about the defect type.
- 'Traceability/Compatibility' became the most frequent trigger, and even 'Test Coverage' became a trigger that led to the identification of more defects than 'Consistency and Completeness'.
- The 'Maintenance' defect impact became more frequent than 'Reliability', and the frequency of the 'Documentation' impact was reduced.

Thirdly, the classification certainty/precision. We are aware that such a process cannot lead to a fully precise classification; human error/bias shall be taken into account. Although the classifications have been performed by experienced engineers, and confirmed/reviewed by a second engineer, some of the classifications are always arguable – this is due to the understanding of the problem, to the expertise/background of the engineer, and to the classification taxonomy itself (a problem we tried to tackle). Moreover, the proposed/used adapted taxonomy is meant to be a step towards a more applicable classification for this type of defects/systems and demonstrates that such a classification taxonomy can (and will be) adapted in the future. If we take, for example, the results of the enhanced ODC classification of our dataset, even with a 20% error interval we can conclude that the two main defect types, the three main defect triggers and the four main defect impacts would remain the same, thus giving us a group of consistent taxonomy priorities to tackle in the root cause analysis.

Fourthly, the efficiency and employability of the process. The 850 hours required to apply the original ODC include the whole process, that is, the data collection and clean-up, additional data collection (phase, activity), and defects classification, where several doubts arose (in fact, almost 1/3 of the classifications were challenged between the classifier and the reviewer). The classifier had to take note of his doubts and analyze the defects that could not be classified together with the reviewer, aggregating new class types. Later, the unclassified defects were reclassified, and the ODC classification was enhanced with the new class types and rechecked by the reviewer.

This last part (reclassification and review) took under 100 hours, as most of the effort was in the first pass of the activity.

Fifthly, the empirical software engineering nature. The process is based on data, but several of the steps actually depend on training and expert judgment, such as the classification, the root cause identification, and the suggestion of improvements. We thus acknowledge that replication of the results will hardly be possible, as they depend on the experts involved and on their expertise. For this reason, a questionnaire to validate the process and the results has been defined and answered by a group of academic and industry experts.

Sixthly, the defined process. The proposed and applied process is heavily dependent on the availability of good quality data. Data clean-up activities have been performed not in the sense of removing any defect, but to remove the names of the projects/missions, companies, systems/sub-systems and customers, in order to prevent that information from becoming publicly known, for confidentiality reasons. No manipulation or modification of the defects' text has been performed. The data clean-up was also enriched by the gathering of additional data, complementing the defects data with the phase detected, phase introduced, detection activity, etc., to better support the ODC classification. In the future, if guidelines and rules for defect writing are defined (e.g. with some lessons learned from this work), the data clean-up step might be simplified or automated.

Seventhly, the obtained results. A list of root causes was identified by the experts, and not necessarily concrete or recurrent problems. From our knowledge of the systems and of the defects resolutions, we believe that some of the root causes are more frequent than others. We can comment on the quality of the systems under analysis and the application of standards, and this (non-measured) question was what led to this work in the first place: as the number of issues was deemed too high for these systems, we decided to concretely identify why, in order to support the engineering teams in correcting these deficiencies. With this work, we have concrete improvements to avoid the same issues and reduce their frequency.

Finally, the process validation. The process, the defects classification, the root causes and the identified measures have been exposed to a group of academic and industrial experts who expressed their opinions by answering a survey, prioritizing the root causes and the measures, proposing process improvements, and suggesting additional root causes and additional measures according to their expertise. The results of the survey allowed the validation of the process and of the results themselves.

8.2 Threats to Validity

The main threats to the validity of our work (construct, internal and external validity), due to some limitations and confidentiality issues, are:

- The fact that the issues data cannot be shared nor publicized, as no company wants their issues exposed, makes this work harder and demands a great effort of anonymization. Also, the acceptance of the results may be challenged. (external validity)
- The space systems involved cover most of the development activities performed for those systems, and involve different companies (at geographic, size and management levels), thus we consider the results to be quite general for this domain. A similar study for other domains (e.g. aeronautics, railway, automotive) is foreseen as future work, but it will not be so easy since the existing data is not as structured as for space systems. Again, data confidentiality will be a challenging issue. (internal and external validity)
- The classification was done based on expert knowledge. However, it is important to note that the original classification (the one that could not classify all the issues) was performed by two engineers, whose work was also checked by a third space domain expert. This domain expert also performed the reclassification himself (verified and discussed with another space domain expert). (internal validity)
- Implementation of the suggestions will take a long time, as it needs to go through process improvements, and this is foreseen as current/future work. However, what is important here is the justification of these suggestions and also their acceptance and acknowledgement by the involved industries. Once this is done, they can pay more attention to these root causes, and some months after that we may try to measure again the defects occurrences, types and triggers. Note that the provided lists of improvements are already the ones selected for tackling the most frequent and most critical defects. (construct validity)
- Finally, the adaptations were performed based on the 31.7% of the issues that could not be classified with ODC. This required several rounds of discussion, and the majority of the changes are merges where terms were not well distinguishable for these systems. Also, details about the systems' requirements (namely non-functional, safety and dependability, etc.) raised doubts about the original ODC classification. (internal validity)

8.3 Future work

Specific topics for future work are detailed in Section 7.4, aggregated by group of process activities, and reflected as improvements to the process and the development of additional tools and guidelines to support the process. Some of this future work is highlighted hereafter:

- The validation of the process by applying it to datasets from other safety-critical domains. This will be achieved by doing a similar field study for railway (already on-going, based on 150 railway train management and control system defects), and for other domains such as automotive or avionics.
- It might be worthwhile to map the ODC defect triggers to the activities of the SDP in the lifecycle following the V-model (they should be fully mapped and feedback can be provided both ways, to the ODC list of triggers or to the techniques applied in the lifecycle phases), and also to map the ODC defect impacts to the ISO25010 [143] metrics (they should also be fully mapped and feedback can be provided both ways, to the ODC list of impacts or to the ISO standard); an illustrative sketch of the latter mapping is given after this list.
- By tracing the applicable standards, processes and techniques to the root causes and the proposed measures, we can work on proposing improved or new processes, techniques and tools to reduce the number of issues and the probability of their occurrence.
- It is worth trying to automate some of the steps of the process, in order to promote the automatic classification of defects or automated suggestions for root causes, which can be stored in a database and associated with the classification taxonomies.
- The development of templates, guidelines and training to support the different activities of the process, in order to harmonize the classifications and the root cause analysis.
- To create and maintain a database or a body of knowledge of the root causes and associated measures, in order to reuse them or help in future defects analyses.
- The achieved results shall be consolidated to provide feedback to the ESA ISVV guide [65], regarding the results and triggers efficiency (according to the V&V techniques efficiency), and to the ECSS standards – to enforce additional PA/QA activities and evidence analysis that the requirements in the standards are being appropriately implemented. Note that this guide will be updated to become an ECSS handbook.
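As a starting point for the impact mapping exercise mentioned above, the sketch below pairs a few ODC impact values with ISO 25010 quality characteristics; the pairings are our own illustrative assumptions and would need to be completed and reviewed against both taxonomies.

    # Illustrative, incomplete mapping from ODC defect impacts to ISO 25010 characteristics.
    ODC_IMPACT_TO_ISO25010 = {
        "Reliability":   "Reliability",
        "Performance":   "Performance efficiency",
        "Usability":     "Usability",
        "Maintenance":   "Maintainability",
        "Documentation": "Maintainability",   # assumption: documentation issues hinder maintainability
    }

    def iso_characteristic(odc_impact: str) -> str:
        """Return the ISO 25010 characteristic tentatively associated with an ODC impact."""
        return ODC_IMPACT_TO_ISO25010.get(odc_impact, "unmapped")

    print(iso_characteristic("Maintenance"))

A complete mapping in both directions would allow the defect impacts measured with the process to be reported directly against the quality characteristics of the ISO standard.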


References

[14] N. G. Leveson, Safeware: System Safety and Computers. Addison-Wesley, 1995. [15] J. L. Lions, “ARIANE 5: Flight 501 failure,” Ariane 5 Inquiry Board Report, Paris, Tech. Rep., 1996. [16] Australian Government, Australian Transport Safety Bureau, “In-flight upset event, 240 km north-west of Perth, WA, Boeing Company 777-200, 9M-MRG”, http://www.atsb.gov.au/publications/investigation_reports/2005/AAIR/aair200503722.aspx, August 1, 2005. Visited: 13 October 2016. [17] Department of Transportation, Federal Aviation Administration, Airworthiness Directive 2015-10066, May 1, 2015, http://federalregister.gov/a/2015-10066. Visited: 13 October 2016. [18] Toyota: Software to blame for Prius brake problems, CNN News, http://edition.cnn.com/2010/WORLD/asiapcf/02/04/japan.prius.complaints/index.html, February 5, 2010. Visited: 13 October 2016. [19] US Department of Transportation, National Highway Traffic Safety Administration, “Technical Assessment of Toyota Electronic Throttle Control (ETC) Systems”, February 2011, http://www.nhtsa.gov/staticfiles/nvs/pdf/NHTSA-UA_report.pdf. Visited: 13 October 2016. [20] NASA Engineering and Safety Center, “Technical Support to the National Highway Traffic Safety Administration (NHTSA) on the Reported Toyota Motor Corporation (TMC) Unintended Acceleration (UA) Investigation”, January 18, 2011, http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA-UA_report.pdf. Visited: 13 October 2016. [21] Weinberg, Gerald M. Perfect software--and other illusions about testing. New York, NY: Dorset House Pub, 2008. [22] Ebert, C.; Jones, C., "Embedded Software: Facts, Figures, and Future," Computer, vol. 42, no. 4, pp. 42-52, April 2009.


[23] Esposito, C., D. Cotroneo, and N. Silva. 2011. “Investigation on Safety-Related Standards for Critical Systems.” In Software Certification (WoSoCER), 2011 First International Workshop on, 49–54. doi:10.1109/WoSoCER.2011.9. [24] W.H. Maisel et al., “Recalls and Safety Alerts Involving Pacemakers and Implantable Cardioverter-Defibrillator Generators,” JAMA, vol. 286, 15 Aug. 2001, pp. 793-799; http://jama.ama-assn.org/cgi/content/abstract/286/7/793. [25] Silva, N.; Lopes, R., "10 Years of ISVV: What's Next?," Software Reliability Engineering Workshops (ISSREW), 2012 IEEE 23rd International Symposium on, pp. 361-366, 27-30 Nov. 2012. [26] B. Boehm and R. Turner, Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley, 2004. [27] Bennatan, E.M, “On Time Within Budget - Project Management Practices and Techniques”, 3rd Edition, ISBN: 978-0-471-37644-6, 368 pages, Addison-Wesley, June 2000. [28] Eurocontrol, Eurocontrol Experimental Centre, “Review of Root Causes of Accidents due to Design”, EEC Note No. 14/04, Project Stafbuild, October 2004, http://www.eurocontrol.int/sites/default/files/library/027_Root_Causes_of_Accidents_Due_to_Design.pdf. Visited: 14 October 2016. [29] N. Storey, Safety-Critical Computer Systems. Addison-Wesley, 1996. [30] Richard H. Cobb, Harlan D. Mills: Engineering Software under Statistical Quality Control, IEEE Software 7(6): 44-54 (1990). [31] Keating, Michael, The Simple Art of SoC Design, Springer, 2011, 234p., ISBN: 978-1-4419-8586-6. [32] McDermid, J.A. and Kelly, T.P. (2006) 'Software in safety critical systems: achievement and prediction', Nuclear Future, Thomas Telford Journals. [33] Barnard, J., The Value of a Mature Software Process, United Space Alliance, presentation to UK Mission on Space Software, 10th May 1999. [34] RTCA (1992) DO 178B - Software Considerations in Airborne Systems and Equipment Certification, Radio Technical Commission for Aeronautics. [35] German, A., Mooney, G., Air Vehicle Static Code Analysis - Lessons Learnt, in Aspects of Safety Management, F Redmill, T Anderson (Eds), Springer Verlag, 2001. [36] Fenton, N.E., Ohlsson, N., Quantitative Analysis of Faults and Failures in a Complex Software System, IEEE Transactions on Software Engineering, 26(8), 797-814, 2000. [37] Reifer, D.J., Industry Software Cost, Quality and Productivity Benchmarks, 2004, https://www.csiac.org/wp-content/uploads/2016/02/2004_07_01_SoftwareCosts.pdf. Visited: 14 October 2016.


[38] Hatton, L., Estimating Source Lines of Code from Object Code: Windows and Embedded Control Systems, 2005, http://www.leshatton.org/Documents/LOC2005.pdf. Visited: 14 October 2016. [39] Ellims, M., On Wheels, Nuts and Software, in proceedings of the 9th Australian Workshop on Safety Critical Systems, Australian Computer Society, August 2004. [40] Shooman, M.L., Avionics Software Problem Occurrence Rates, in proceedings of the 7th International Symposium on Software Reliability Engineering, White Plains, NY, 1996. pp 53-64. [41] RTCA DO-178C, Software Considerations in Airborne Systems and Equipment Certification (Washington, DC: RTCA, Inc., December 2011). [42] ECSS-E-ST-40C, Space engineering - Software, 06/03/2009 [43] ECSS-Q-ST-80, Space Product Assurance - Software Product Assurance, 06/03/2009 [44] NASA-STD-8719-13B: Software Safety Standard – NASA Technical Standard, NASA Office of Safety and Mission Assurance, 08/07/2004. [45] RTCA DO-254 (EUROCAE ED-80), Design Assurance Guidance for Airborne Electronic Hardware, RTCA Inc., Washington, DC, April 2000. [46] ARP4761; Guidelines and methods for conducting the safety assessment process on civil airborne systems and equipment. SAE International SC-18. 1996. [47] ARP4754A; Guidelines for Development of Civil Aircraft and Systems. SAE International SC-18. 2010. [48] ISO/DIS 26262, Road vehicles -- Functional safety. International Standard. 2011. [49] EN 50126:2012; Railway Applications - The specification and demonstration of Reliability, Availability, Maintainability and Safety (RAMS); CENELEC; 2012 [50] EN 50128:2011; Railway Applications - Software for railway control and protection systems; CENELEC; 2011 [51] EN 50129:2003; Railway applications - Safety related electronic systems for signalling; CENELEC; 2003 [52] IEC 61508; Functional safety of electrical/electronic/programmable electronic safety-related systems; IEC; second edition, 2010. [53] IEC 61511-1; Functional safety - Safety instrumented systems for the process industry sector - Part 1: Framework, definitions, system, hardware and software requirements. First Edition. IEC. Publication date 2003-01-30. [54] IEC 61511-2; Functional safety - Safety instrumented systems for the process industry sector - Part 2: Guidelines for the application of IEC 61511-1. First Edition. IEC. Publication date 2003-07-04.


[55] IEC 61511-3; Functional safety - Safety instrumented systems for the process industry sector - Part 3: Guidance for the determination of the required safety integrity levels. First Edition. IEC. Publication date 2003-03-18. [56] IEC 62061; Safety of machinery - Functional safety of safety-related electrical, electronic and programmable electronic control systems. First edition. IEC. Publication date 2005-01-20. [57] IEC-62304; International Electrotechnical Commission (2006). "Medical device software – Software life cycle processes". INTERNATIONAL IEC STANDARD 62304, First edition, 2006-05. IEC. [58] IEC 60880; Nuclear power plants - Instrumentation and control systems important to safety - Software aspects for computer-based systems performing category A functions. INTERNATIONAL IEC STANDARD 60880, Second edition, 2006-05-09. IEC. [59] Kjeld Hjortnaes, ESA Software Initiative, May 7, 2003, http://klabs.org/DEI/References/Software/software.htm. Visited: 14 October 2016. [60] Paul Cheng: Strategic testing Lessons from Satellite Failures, Feb 3, 2003. [61] Gerard Holzmann, “Making robust software: the "Mars Code" talk”, USENIX HotDep '12, October 7th, 2012. [62] DoD Defense Science Board Task Force on Defense Software, November 2000. [63] NASA, NASA Office of Chief Engineer, “NASA Study on Flight Software Complexity”, March 5th, 2009, https://www.nasa.gov/pdf/418878main_FSWC_Final_Report.pdf. Visited: 14 October 2016. [64] John A McDermid, “Developing Safety Critical Software: Fact and Fiction”, http://www.cs.york.ac.uk/hise/seminars/McDermid13Nov.ppt. Visited: 14 October 2016. [65] European Space Agency , ESA ISVV Guide, issue 2.0, 29/12/2008. [66] ISO/IEC 12207:2008 Systems and software engineering – Software life cycle processes. [67] IEEE Computer Society, IEEE 1012-2004 - IEEE Standard for Software Verification and Validation. [68] NASA, NASA IV&V Quality Manual, version Q, August 28th, 2014, https://www.nasa.gov/sites/default/files/ivv_qm_-_ver_q.doc. Visited: 22 October 2016. [69] Silva, N.; Lopes, R., Overview of 10 Years of ISVV Findings in Safety-Critical Systems, Software Reliability Engineering Work-shops (ISSREW), 2012 IEEE 23rd International Symposium on , vol., no., pp.83.83, 27-30 Nov. 2012.


[70] Silva, N.; Lopes, R., Independent Assessment of Safety-Critical Systems: We Bring Data!, Software Reliability Engineering Workshops (ISSREW), 2012 IEEE 23rd International Symposium on , vol., no., pp.84.84, 27-30 Nov. 2012. [71] Silva, N.; Lopes, R., 10 Years of ISVV: What's Next?, Software Reliability Engineering Workshops (ISSREW), 2012 IEEE 23rd International Symposium on , vol., no., pp.361.366, 27-30 Nov. 2012. [72] Nuno Silva, Rui Lopes, “Independent Test Verification: What Metrics Have a Word to Say”, 1st International Workshop on Software Certification (WoSoCER), ISSRE, 30 November 2011, Hiroshima, Japan. [73] N. Silva, R. Lopes, A. Esper, R. Barbosa, “Results from an independent view on the validation of safety critical space system”, DASIA 2013, 14-16 May, 2013, Oporto, Portugal. [74] Steve EasterBrook, “The Role of Independent V&V in Upstream Software Development Processes”, in proceedings, 2nd World Conference on Integrated Design and Process Technology (IDPT) Austin, Texas, December 1-4, 1996. [75] CENELEC EN 50128: Railway applications - Communication, signalling and processing systems - Software for railway control and protection systems. [76] ECSS-E-10-06C - Space engineering - Technical requirements specification, March 2009. [77] ECSS-E-ST-10-03C - Space engineering - Testing, June 2012. [78] ECSS-Q-ST-20-10C - Space product assurance - Off-the-shelf items utilization in space systems, October 2010. [79] ECSS-Q-ST-30C - Space product assurance-Dependability, March 2009. [80] FAA HDBK 006A - Reliability, Maintainability, and Availability (RMA) Handbook, Jan. 2008. [81] Copeland L., Software Defect Taxonomies, http://flylib.com/books/en/2.156.1.108/1/. Visited: 22 October 2016. [82] Lee Copeland, A Practitioner's Guide to Software Test Design, Artech House, 2004, ISBN: 9781580537322. [83] Vallespir, Diego, Fernanda Grazioli, and Juliana Herbert. “A Framework to Evaluate Defect Taxonomies.” In XV Congreso Argentino de Ciencias de La Computación, 2009. http://sedici.unlp.edu.ar/handle/10915/20983.Visited: 22 October 2016. [84] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, and M.-Y. Wong, 1992. Orthogonal Defect Classification-A Concept for InProcess Measurements, IEEE Transactions on Software Engineering, vol. 18, pp. 943-956. [85] Kaner, C., Falk, J., Nguyen, H.Q.: Testing Computer Software (2nd. Edition). International Thomson Computer Press (1999).


[86] Beizer B.: Software Testing Techniques. Second Edition, Van Nostrand Reinhold New York (now: Coriolis Press). (1990). [87] Binder, R.V.: Testing Object-oriented Systems Models, Patterns, and Tools. Addison-Wesley (1999) [88] Grady, R.B.: Practical Software Metrics For Project Management and Process Improvement. Hewlett-Packard (1992). [89] IEEE: IEEE 1044-1993 Standard Classification for Software Anomalies. Institute of Electrical and Electronics Engineers (1993). [90] IEEE: IEEE 1044-2009 Standard Classification for Software Anomalies. Institute of Electrical and Electronics Engineers (7 January 2010). [91] Chillarege, R.: Handbook of Software Reliability Engineering. IEEE Computer Society Press, McGraw-Hill Book Company (1996). [92] Orthogonal Defect Classification v 5.2 for Software Design and Code, IBM, September 12, 2013. [93] Freimut, B.: Developing and using defect classification schemes. Technical report, Fraunhofer IESE (2001). [94] Bridge, Norm, and Corinne Miller. "Orthogonal defect classification using defect data to improve software development." Software Quality 3, no. 1 (1997): 1-8. [95] El Emam, Khaled, and Isabella Wieczorek. "The repeatability of code defect classifications." In Software Reliability Engineering, 1998. Proceedings. The Ninth International Symposium on, pp. 322-333. IEEE, 1998. [96] Dalal, Siddhartha, Michael Hamada, Paul Matthews, and Gardner Patton. "Using defect patterns to uncover opportunities for improvement." In Proc. Int’l Conf Applications of Software Measurement. 1999. [97] Butcher, Mark, Hilora Munro, and Theresa Kratschmer. "Improving software testing via ODC: Three case studies." IBM Systems Journal 41, no. 1 (2002): 31- 44. [98] Leszak, Marek, Dewayne E. Perry, and Dieter Stoll. "Classification and evaluation of defects in a project retrospective." Journal of Systems and Software 61, no. 3 (2002): 173-187. [99] R. Lutz and C. Mikulski. “Empirical analysis of safety-critical anomalies during operations”, IEEE TSE, vol. 30, no. 3, March, 2004. [100] Seaman, Carolyn B., Forrest Shull, Myrna Regardie, Denis Elbert, Raimund L. Feldmann, Yuepu Guo, and Sally Godfrey. "Defect categorization: making use of a decade of widely varying historical data." In Proceedings of the Second ACM- IEEE international symposium on Empirical software engineering and measurement, pp. 149-157. ACM, 2008.


[101] Wagner, Stefan. “Defect Classification and Defect Types Revisited.” In Proceedings of the 2008 Workshop on Defects in Large Software Systems, 39–40. ACM, 2008. http://dl.acm.org/citation.cfm?id=1390829. [102] Walia, Gursimran Singh, and Jeffrey C. Carver. "A systematic literature review to identify and classify software requirement errors." Information and Software Technology 51, no. 7 (2009): 1087-1109. [103] Kristiansen, J. "Using orthogonal defect classification in a Norwegian software company." Master project thesis, Norwegian University of Science and Technology (2009). [104] Kristiansen, Jan Maximilian Winther. "Software Defect Analysis: An Empirical Study of Causes and Costs in the Information Technology Industry." (2010). [105] J. Lemaitre and J. Hainaut, "Quality Evaluation and Improvement Framework for Database Schemas - Using Defect Taxonomies"; in Proc. CAiSE, 2011, pp. 536-550. [106] LiGuo Huang; Ng, V.; Persing, I.; Ruili Geng; Xu Bai; Tian, J., "AutoODC: Automated generation of Orthogonal Defect Classifications," Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on, pp. 412-415, 6-10 Nov. 2011. [107] Lopes Margarido, I., João Pascoal Faria, Raul Moreira Vidal, and Marco Vieira. "Classification of defect types in requirements specifications: Literature review, proposal and assessment." In Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on, pp. 1-6. IEEE, 2011. [108] N. Li, Z. Li and X. Sun, "Classification of Software Defect Detected by Black-Box Testing: An Empirical Study," Software Engineering (WCSE), 2010 Second World Congress on, Wuhan, 2010, pp. 234-240. doi: 10.1109/WCSE.2010.28. [109] The Institute of Internal Auditors, Practice Advisory 2320-2: Root Cause Analysis, Analysis and Evaluation, 2003, http://iia.no/wp-content/uploads/2016/02/PA_2320-2-Root-cause-Analysis.pdf. Visited: 22 October 2016. [110] J.J. Rooney, L.N. Vanden Heuvel, Root cause analysis for beginners, Qual. Prog., 37 (7) (2004), pp. 45–53. [111] T.O.A. Lehtinen, M.V. Mäntylä, J. Vanhanen, Development and evaluation of a lightweight root cause analysis method (ARCA method) – field studies at four software companies, Inf. Softw. Technol., 53 (10) (2011), pp. 1045–1061. [112] F.O. Bjørnson, A.I. Wang, E. Arisholm, Improving the effectiveness of root cause analysis in post mortem analysis: a controlled experiment, Inf. Softw. Technol., 51 (1) (2009), pp. 150–161. [113] Timo O. A. Lehtinen, Risto Virtanen, Juha O. Viljanen, Mika V. Mäntylä, and Casper Lassenius. 2014. A tool supporting root cause analysis for synchronous retrospectives in distributed software teams. Inf. Softw. Technol. 56, 4 (April 2014), 408-437.


[114] Rao, Ramaa, “Root Cause Defect Classification (RCDC) for Documentation Defects”, 2014, http://www.stc- india.org/conferences/2014/presentations/Root%20Cause%20and%20Defect%20 Classification%20for%20Documentation%20Bugs%20-%20Ramaa%20Rao.pdf. Visited: 22 October 2016. [115] Kumaresh, S, Baskaran, R, “Defect Analysis and Prevention for Software Process Quality Improvement”, International Journal of Computer Applications (0975 – 8887), Volume 8– No.7, October 2010. [116] Leszak, M., Perry, D.E., Stoll, D.: A case study in root cause defect analysis. Proceedings of the 22nd international conference on Software engineering (2000). [117] Fenton, N.E.; Ohlsson, N., "Quantitative analysis of faults and failures in a complex software system," Software Engineering, IEEE Transactions on, vol.26, no.8, pp.797.814, Aug 2000. [118] P. Jalote, N. Agrawal, Using defect analysis feedback for improving quality and productivity in iterative software development, in: Proceedings of the Information Science and Communications Technology (ICICT 2005), 2005, pp. 701–714. [119] B. Andersen, T. Fagerhaug (Eds.), Root Cause Analysis: Simplified Tools and Techniques, Tony A. William American Society for Quality, Quality Press, United States, Milwaukee 53203, 2006. [120] Latino, Robert J., Latino Kenneth, C. and Latino, Mark A., Root Cause Analysis: Improving Performance for Bottom Line Results. 4th Ed., 2011, c. 280 pp., ISBN: 9781439850923, Taylor & Francis. Boca Raton. [121] Shrouti, C., Franciosa, P., and Ceglarek, D., 2013,“Root Cause Analysis of Product Service Failure Using Computer Experimentation Technique”, The 2nd Through-lifecycle Engineering Services Conference, Procedia CIRP, 11: 44-49. [122] Kaushal Amin, Applying Root Cause Analysis to Software Defects, Software Testing Magazine, 25 June 2013, http://www.softwaretestingmagazine.com/knowledge/applying-root-cause- analysis-to-software-defects/. Visited: 22 October 2016. [123] Robyn R. Lutz. 1993. Targeting safety-related errors during software requirements analysis. In Proceedings of the 1st ACM SIGSOFT symposium on Foundations of software engineering (SIGSOFT '93), David Notkin (Ed.). ACM, New York, NY, USA, 99-106. [124] I. Bhandari et al., "In-process improvement through defect data interpretation", IBM Systems Journal, Vol. 33, No. 1, 1994. [125] Allen Peter Nikora, "Software System Defect Content Prediction from Development Process and Product Characteristics," PhD Dissertation, Department of Computer Science, University of Southern California, May 1998. [126] Raninen, Anu, Tanja Toroi, Hannu Vainio, and Jarmo J. Ahonen. "Defect data analysis as input for software process improvement." In Product-Focused Software Process Improvement, pp. 3-16. Springer Berlin Heidelberg, 2012.


[127] Otto Vinter, Using Defect Analysis as an Approach to Software Process Improvement, http://ottovinter.dk/defana.doc, 2008. [128] Katz, Richard B., Digital Engineering Institute: Lessons Learned - klabs.org, http://klabs.org/DEI/lessons_learned/. [129] Neufelder, A. M., The Top Ten Things that have been Proven to Affect Software Reliability, 2012 IEEE 23rd International Symposium on, Industrial Track invited talk, http://www.softrel.com/downloads/TopTen.pdf, 27-30 Nov. 2012. [130] S. Wagner. A Literature Survey of the Quality Economics of Defect-Detection Techniques. In Proc. 5th ACM-IEEE International Symposium on Empirical Software Engineering (ISESE '06). ACM Press, 2006. [131] S. Wagner. A Model and Sensitivity Analysis of the Quality Economics of Defect-Detection Techniques. In Proc. International Symposium on Software Testing and Analysis (ISSTA '06), pages 73-83. ACM Press, 2006. [132] Lutz, R.R.; Mikulski, I.C., Requirements discovery during the testing of safety- critical software, Software Engineering, 2003. Proceedings. 25th International Conference on, vol., no., pp.578, 583, 3-10 May 2003. [133] R. Lutz, Analyzing Software Requirements Errors in Safety-Critical, Embedded Systems, Proc IEEE Intl Symp Req Eng, IEEE CS Press, 1993, pp. 126- 133. [134] R. Lutz and I. C. Mikulski, Operational Anomalies as a Cause of Safety-Critical Requirements Evolution, The Journal of Systems and Software, vol. 65 (2003), pp. 155–161. [135] Software Engineering: Are we getting better at it?, Michael Jones, ESA Bulletin 121, February 2005, pp. 52-57. [136] K. A. Weiss, N. Leveson, K. Lundqvist, N. Farid, and M. Stringfellow, “An Analysis of Causation in Aerospace Accidents,” Space, 2001, Aug., 2001. [137] Jäntti, Marko, Tanja Toroi, and Anne Eerola. 2006. “Difficulties in Establishing a Defect Management Process: A Case Study.” In Proceedings of the 7th International Conference on Product-Focused Software Process Improvement, 142–150. PROFES’06. Berlin, Heidelberg: Springer-Verlag. doi:10.1007/11767718_14. [138] LinkedIn, https://www.linkedin.com/, visited: 28 January 2017. [139] CRITICAL Software Technology for an Evolutionary Partnership (CRITICAL STEP), Marie-Curie Industry-Academia Partnerships and Pathways (IAPP), FP7- PEOPLE-2008-IAPP, http://www.critical-step.eu/, visited: 28 January 2017. [140] CECRIS (CErtification of CRItical Systems), Marie-Curie Industry-Academia Partnerships and Pathways (IAPP), FP7-PEOPLE-2012-IAPP, http://www.cecris- project.eu/, visited: 28 January 2017.


[141] VAL-COTS-RT - Validation of Real-Time COTS products, https://www.cisuc.uc.pt/projects/show/56, visited: 28 January 2017. [142] AMBER - Assessing, Measuring, and Benchmarking Resilience, https://www.cisuc.uc.pt/projects/show/98, visited: 28 January 2017. [143] ISO/IEC 25010:2011: Systems and software engineering -- Systems and software Quality Requirements and Evaluation (SQuaRE) -- System and software quality models.


Annex A. Defects Assessment Questionnaire


Introduction

Dear respondent,

This survey arises in a very particular context. It intends to gather the views of industry and academia concerning defect analysis of critical systems, in particular the occurrence of failures in systems and software development and V&V processes. It is essential to keep in mind that we are considering systems currently developed with utmost care, under very strict and mature processes and methods, and that usually require independent assessment in order to be fit for purpose (qualified, certified, homologated, …). These systems also follow well-defined quality assurance rules and produce a great deal of evidence that can be consulted by teams and independent assessors.

In this context, previous work [1] on the analysis of the defects identified at a late development stage (either at the end of a development lifecycle phase or at the end of the validation) has shown that some defects are still not caught by the traditional V&V and QA activities. These defects have been classified with ODC [2] and some improvement suggestions have been identified, both to improve the development and avoid the defects, and to improve the efficacy of the V&V activities and thus identify the defects within the project's internal quality assurance tasks. This survey builds on some of these results, but also intends to gather additional expertise to improve the classification and the root-cause identification tasks.

Note when answering that we are not looking for basic development methods, nor basic V&V processes, but to go beyond them, although, sometimes, the simplest solutions are known and wrongly applied. Consider the defects as defects found close to deployment or after deployment.


Acronyms and Definitions
Relevant list of acronyms and definitions:
ISVV: Independent Software Verification and Validation
ODC: Orthogonal Defect Classification. ODC is a methodology that extracts information and provides feedback about a development process from the defects that occur during the development lifecycle. (Developed at IBM Research circa 1991)
PA: Product Assurance
QA: Quality Assurance
RCA: Root Cause Analysis
SDP: Software Development Process
SW: Software
V&V: Verification and Validation


Anonymity
In order to protect individual and organizational privacy, the answers to this questionnaire will remain anonymous and will not in any way be used to identify the respondents. The survey data will not be used, either alone or together with other information, to identify survey participants.


Instructions
Please respond to the following 22 questions by editing this MS Word document or the PDF version, and deliver your responses preferably before April 15th, 2016 to the following email: [email protected]
Please feel free to request clarifications. Thank you very much for supporting me in this analysis.

Coimbra, March 24th, 2016 Nuno Silva Critical Software SA / University of Coimbra Coimbra, Portugal


Questions

A. General Questions

Q1: Years of Academic/Research Experience or Years of Industry experience. [If both, please indicate both separately, e.g.: Academic: 5y; Industry: 10y] R1: Academic/Research: ______Industry: ______

Q2: Industries where you (the expert) have been involved? Aeronautics, Space, Defense, Automotive, Railway, Energy, Others. [Indicate in which industries/domains you have been involved for more than one year. Academic researchers might not have this distinction, thus use “Academia”] R2: ______

Q3: Level of importance that you give to standards utilization (according to impact in software/system development). [Leave blank if you have no opinion]
☐ 1 - Extremely Important
☐ 2 - Very Important
☐ 3 - Somehow Important
☐ 4 - Not relevant

Q4: Level of importance of budget/financial restrictions. [Indicate how important budgetary restrictions are when you have to develop a critical system. Leave blank if you have no opinion or don’t have experience in critical systems development]
☐ 1 - Extremely Important
☐ 2 - Very Important
☐ 3 - Somehow Important
☐ 4 - Not relevant

Q5: Level of importance of schedule/timing restrictions. [Indicate how important schedule restrictions are when you have to develop a critical system. Leave blank if you have no opinion or don’t have experience in critical systems development]
☐ 1 - Extremely Important
☐ 2 - Very Important
☐ 3 - Somehow Important
☐ 4 - Not relevant

Q6: Level of importance of External Assessment of the developed systems or software. [When a critical system is developed, how important do you consider independent assessments and the certification/qualification processes that are commonly imposed in certain domains (e.g. railway, aerospace, nuclear)? Leave blank if you have no opinion]
☐ 1 - Extremely Important
☐ 2 - Very Important
☐ 3 - Somehow Important
☐ 4 - Not relevant

Q7: Level of importance of analysing software/system errors or failures and identifying their root causes. [When developing or updating a critical system and issues, bugs or failures are detected, how important do you consider it that industry goes beyond just correcting the issues, for example by analysing and understanding what led to these problems (root cause)?]
☐ 1 - Extremely Important
☐ 2 - Very Important
☐ 3 - Somehow Important
☐ 4 - Not relevant

Q8: Have you ever used Orthogonal Defect Classification (ODC)? ☐ Yes ☐ No

Q9: Have you ever used defect analysis approaches? ☐ Yes ☐ No


B. Defect Analysis Process
Consider the following process, composed of four main phases:
 Prerequisites: Defects data collection and preparation, aggregation of other data if necessary, such as complexity metrics, lifecycle data, etc. A (the issues) and B (the phase where the issue was found, the phase where it was corrected, the type of project, etc.) represent process inputs.
 Defects Classification: Classification of individual defects according to ODC in order to identify the defect types, defect triggers and defect impacts. Note that ODC can be adapted for specific domain and technology purposes.
 Defects Root Cause Analysis: Based on the three perspectives (defect types, triggers and impacts), identify the root causes of the defect groups (e.g. per type, per trigger or even per impact). Steps 4 to 6 might be considered “optional”, i.e. we can apply one, two or all three root cause analyses.
 Improvements and Validation: Act upon the identified root causes, at a process, organizational or resources (human and techniques/tools) level, and measure the effects of the implemented actions. Step 10 represents the actual improvements to the systems under analysis (both environment/organization and processes). Step 11 represents adaptations and improvements to the classification process of the issues.

Note: For a more detailed description of each process step refer to Annex 1 at the end of this questionnaire.


Figure 21: Defect Assessment Process6

Q10: What is the main strength you can point in such a defect analysis process? [Point out and comment the main strength in the proposed tasks or the overall process] R10:______

Q11: What is the main weakness you identify in the proposed defect analysis process? [Point out an important weakness in the proposed tasks or the overall process] R11:______

Q12: According to your experience, highlight up to three missing/important activities/steps that should be part of such a defect analysis process. [You might highlight activities that you already perform and that bring added value to the process, or you might propose the replacement of some activities in the defect analysis process]

6 This figure is slightly different from the final process presented in Figure 10 since, after the survey results were collected, we made modifications to the process presented in this figure.


R12.1:______
R12.2:______
R12.3:______

Q13: Would you recommend such a defect analysis process to be used in your organization? [Leave blank if you have no opinion]
☐ 1 - Strongly Recommend
☐ 2 - Recommend
☐ 3 - Maybe Recommend
☐ 4 - Would Not Recommend


C. Defect Development Causes
Q14: From your experience and expert judgement, classify the following causes for defects introduced during “development”, according to the frequency of occurrence. [Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
Defect “Development” Cause (Classification: 1 / 2 / 3 / 4)
- Inefficient/insufficient reviews
- Ambiguous/missing/incorrect artefacts (documentation, requirements, design, tests)
- Insufficient/wrong tests (unit, integration, system, fault injection)
- Limitations of the tools or toolsets that deal with documentation
- Lack of completeness and consistency of system level (or previous phases) documentation
- Oversimplified documentation planning procedures
- Lack of time to produce, review and accept documentation artefacts
- Lack of importance given to some documentation artefacts
- Simplification of the product assurance processes related to documentation artefacts
- Limited engineers domain knowledge – lack of appropriate skills
- Incomplete specifications in what concerns FDIR and erroneous situations
- Lack of reliability and safety culture
- Incomplete specifications in what concerns interfaces, environment and communications
- Limited definition of the operation, usability, maintainability requirements
- Lack of tools knowledge, programming languages, design languages
- Version and configuration management procedures inappropriately implemented

Q15: Provide up to 3 examples of additional causes for defects introduced during “development”, according to your experience and knowledge of critical systems. [Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
Additional Defect “Development” Cause (Classification: 1 / 2 / 3 / 4)
R15.1:

R15.2:

R15.3:


D. Defect Detection Causes
Q16: From your experience and expert judgement, classify the following causes for failing the detection of defects during development, according to the frequency of occurrence. [Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
Defect “Detection” Cause (Classification: 1 / 2 / 3 / 4)
- Lack of traceability verification culture
- Lack or inefficient usage of tools that support traceability across lifecycle phases
- Lack of appropriate test planning and test strategy definition
- Lack or inefficient testing tool and testing environment support
- Incomplete tests specification and execution
- Review process related root causes
- Documentation related root causes
- Deficient usage of tools and applicable processes
- Unclear or missing/incomplete specifications
- Ambiguous or unclear architecture definition
- Lack of usage of tools that support data and control flow analysis
- Inappropriate architecture support tools or tool usage
- Deficient specification or design artefacts

Q17: Provide up to 3 examples of additional defect “detection” causes according to your experience and knowledge of critical systems. [The examples can include techniques applied by you or simply known. Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
Additional Defect “Detection” Cause (Classification: 1 / 2 / 3 / 4)
R17.1:

R17.2:

R17.3:


E. Defect Avoidance Measures
Q18: From a development perspective, classify the following measures that could avoid the introduction of defects, according to the perceived importance. [Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
“Development” Measure (Classification: 1 / 2 / 3 / 4)
- Define/redefine appropriate review methods, processes and tools and enforce their application at every stage of the SDP;
- Implement automated documentation generation processes and tools to avoid inconsistencies between artefacts/lifecycle phases;
- Use tools that integrate and manage all the phases of the lifecycle, such as concept specifications, requirements, architecture, source code, tests, etc.;
- Introduce/use tools with automatic validations (documentation completeness, design consistency, code analysis, control and data flow analysis);
- Provide training to the engineering teams, to improve the domain knowledge, the system or interfacing systems knowledge, standards knowledge and techniques and tools practice;
- Promote workshops or meetings to present the specifications/requirements, to discuss and clarify them before advancing to the following phase;
- Introduce additional guidelines or even specific requirements (e.g. by defining and specifying the reasoning behind the standards requirements and how to achieve them in full conformance) in the applicable standards (PA/QA, version and configuration control and development).

Q19: Provide up to 3 examples of additional defect avoidance measures according to your experience and knowledge of critical systems. [The examples can include measures applied by you or simply known. Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
Additional “Development” Measures (Classification: 1 / 2 / 3 / 4)
R19.1:

R19.2:

R19.3:


F. V&V Measures
Q20: From the V&V perspective, classify the following measures according to the perceived importance in detecting defects. [Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
“V&V” Measure (Classification: 1 / 2 / 3 / 4)
- Define appropriate test plans and strategies, especially unit and integration tests. The soundness of the test plans and strategies will reflect in the success of the validation;
- Ensure appropriate (or automated) traceability analysis at every stage of the development lifecycle;
- Improve the testing completeness, coverage and reviews;
- Implement non-functional tests (fault detection, fault injection, redundancy, etc.);
- Apply or develop tools to verify and validate the implementation and design compliance.

Q21: Provide up to 3 examples of additional effective V&V measures according to your experience and knowledge of critical systems. [The examples can include measures applied by you or simply known. Please place an X in your classification: 1 - Extremely Frequent; 2 - Very Frequent; 3 - Somehow Frequent; 4 - Not Frequent. If you don’t understand the cause or have no opinion, leave the line blank]
Additional “V&V” Measures (Classification: 1 / 2 / 3 / 4)
R21.1:

R21.2:

R21.3:

Q22: Thank you for your time answering this questionnaire. If you have any additional suggestions or any relevant observations, please feel free to share them. If you would like to receive the results of this questionnaire by email when they become available, please indicate so. R22:______


References
[1] Silva, N.; Lopes, R., Overview of 10 Years of ISVV Findings in Safety-Critical Systems, Software Reliability Engineering Workshops (ISSREW), 2012 IEEE 23rd International Symposium on, pp. 83-83, 27-30 Nov. 2012.
[2] Orthogonal Defect Classification v 5.2 for Software Design and Code, IBM, September 12, 2013.


Acknowledgements
This work is being partially supported by the European Project FP7-2012-324334-CECRIS (CErtification of CRItical Systems): http://www.cecris-project.eu/.



Annex 1. Defects Analysis Process Description
This annex provides a general approach for root cause analysis of critical software issues, enabling the continuous improvement of implementation and V&V at all levels (processes, techniques, tools, personnel, application of standards, organization, and so on). Figure 21 shows the general approach of a defects assessment procedure, which includes a root cause analysis and a continuous improvement procedure, described hereafter.

1. Procedure Prerequisites
The approach is based on data analysis and software engineering knowledge that require some prerequisites to be fulfilled for the correct application of the process:

0. Start: In order to successfully perform the defects analysis it is necessary that the collected data (A. Defects Data and B. Other Project Data) contain the necessary information. This includes basic requirements such as: a) detailed information about each defect and its fix; b) knowledge of the defect environment conditions, such as tools, personnel, constraints; c) the engineers' assessment of the defect causes; and d) the phase where the defect was introduced and where it was detected.
1. Data preparation and clean-up: Once we have the necessary data it is important to organize it and perform some anonymization if required. Data organization is essential for the next steps, since it is important to have the data in a searchable and manageable form.

2. Defects Classification
In order to efficiently and concretely tackle the important problems of critical software engineering, the first set of activities shall focus on an orthogonal classification of the sets of defects:

2. ODC: Perform the ODC classification on the organized dataset. Enhancements and adaptations to the ODC taxonomy can be useful depending on the nature of the defects and the domain, however, these enhancements should be quite precise.
3. ODC Analysis: Provide a summary of the ODC analysis (results analysis and distribution). This information gives the first hints about the quality of the dataset, which can provide some feedback to the implementation and V&V teams.

3. Defects Root Cause Analysis
The root cause analysis is composed of several steps that include analysis of the defect types, the triggers allowing defect detection, and the defects that could have been detected earlier, followed by prioritization and consolidation of these root causes leading to concrete proposed improvements:

4. Defect Type RCA: Identify what caused the specific defect types, and try to aggregate them.
5. Defect Trigger RCA: Identify the causes and V&V techniques or triggers that allowed the defects to be detected at the defect detection stage.
6. Late Detection RCA: Identify the causes of the failures in the V&V and ISVV techniques that allowed the defects to propagate until a later stage in the development lifecycle.
7. Defects Prioritization: If required (for example to tackle the defects with the highest impact on the system, or due to the large amount of defects and respective causes), prioritize the list of defect types and triggers.
8. RCA Consolidation: From the list of defects and the corresponding root cause analysis obtained in the previous steps, consolidate the root causes into a prioritized list (a minimal illustrative sketch of steps 7 and 8 follows this list).
9. Improvements Suggestions: For all the root causes, define solutions or modifications to the processes, techniques, tools, training, resources, environment or application of standards.
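As a purely illustrative aid to steps 7 and 8 above, the sketch below (in Python) shows one possible way to prioritize consolidated root causes by the number of defects they explain, keeping the causes that cover a chosen share of the classified defects (a simple Pareto-style cut). The sample data, the 80% threshold and the field name root_cause are assumptions made for illustration, not values taken from this work.

# Illustrative sketch only: data, threshold and field names are assumptions.
from collections import Counter

def prioritize_root_causes(defects, coverage=0.80):
    """Return (root cause, count) pairs, most frequent first, up to the requested coverage."""
    counts = Counter(d["root_cause"] for d in defects)
    total = sum(counts.values())
    selected, covered = [], 0
    for cause, count in counts.most_common():
        selected.append((cause, count))
        covered += count
        if covered / total >= coverage:
            break
    return selected

if __name__ == "__main__":
    sample = ([{"root_cause": "incomplete requirements"}] * 5
              + [{"root_cause": "insufficient unit testing"}] * 3
              + [{"root_cause": "tool misuse"}])
    print(prioritize_root_causes(sample))
    # -> [('incomplete requirements', 5), ('insufficient unit testing', 3)]

The cut-off is a pragmatic choice: it keeps the consolidated list short enough to act upon while still covering most of the observed defects; the causes left out can be revisited in a later iteration of the process.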

4. Improvements and Validation
The suggested improvements might be difficult to implement, and their efficacy can vary from team to team. They shall contribute to improving software quality and reducing the number of defects; different defects can then surface, which is why this process shall include a consistent process improvement loop:

10. Improvements Implementation: The development and V&V teams must be informed about the required changes or adjustments (9. Improvements Suggestions), and the organization, management and quality planning shall decide on the improvements to implement for future projects.
11. Process Validation and Improvements: At every step, it is possible to derive improvements to the process. Such improvements can be set to adjust to the company culture, to the project environment, to the customer requirements, etc. However, it is essential to measure the effectiveness of the implementation of the results (9. Improvements Suggestions and 10. Improvements Implementation) once the suggestions have been implemented and new defects (or no defects) have been collected.



Annex B. Defects Analysis Textual Responses

Collection of the textual answers and recommendations provided by the experts as a response to the Defects Assessment Questionnaire. The responses are presented as provided by the experts, with no modifications to their text apart from the correction of some typos.

Process Strength, Weakness and Additional Expert Recommendations.

Q2: Technical Domain Q10: Process Strength Q11: Process Weakness Q12.1: Suggestion 1 Q12.2: Suggestion 2 Q12.3: Suggestion 3 Helps improve Grouping of defects could organizational processes be cumbersome, especially based on actual feedback in if not all relevant Space a structured way. information is logged. N/A N/A N/A Fault Tolerant/ High Availabili ty/Resilie nt Perhaps implicit but commerci Process promotes unclear how process feeds current defects vs past design process change al identification of weak areas back to the point of defect defects (root cause, recommended and computin of design. verification, and initiation ( e.g., design, location, escape from implemented as a result of g test. documentation, etc.) process, etc) process N/A It is a very thorough process that could help identifying the root causes The process relies on the of the defects in order to availability of good quality Discuss findings with correct them and improve data about the defects Ensure management / management / project team the overall quality of the which is not available very project team commitment before presenting the Space developed software. often. with the activity improvement suggestions N/A Automoti Steps 10 resp. 11 are most The improvement process Analysis of impacts of Selection process of Monitoring the ve, important (and difficult to may need more attention: recommended participants and managing improvement telecomm implement): current are there classifications improvements on the defects analysis implementation (again

164 Defects Analysis Textual Responses unications operations/processes/softwa possible, any relationship project/schedule/cost; operational aspects (sort of somewhat orthogonal): , IT re must be reviewed and between defect and type of optimization of different orthogonal to described how to assure that the software changed resp. updated. This improvement – most improvements process) learnings are effectively takes time, which may not important: I am missing a applied to development be available in the course feedback from the defects and analysis processes. of a project. However, only analysis to the development through improvements can process (the arrows only go the number of defect be back to the defects analysis reduced (ODC) In step 9, I suggest you to The use of the ODC (an classify the list of causes existing methodology) according to the person associated to It is not clear how historical who will receive it). Each improvements; the data may be used in the stakeholder have their own correlation among the process: to define interest or position (V&V defects classification (type, improvements to classify team, development team, Space trigger, detection) defects, etc. manager, etc..) N/A N/A Within the context of high quality high performance organizations producing high quality low-volume products, using high quality processes, a statistical approach with aggregation of de-fect data is not as useful as a forensic analysis (In the context of a product of each defect: family where successive Manufact • Each incident is a products are similar) uring learning opportunity. Why Information about what automatio wait for more data? changed when a new defect (In the context of one of a n, • Too much In the case of field data, as or change in defect kind products or very low Automoti calendar-time would be lost mentioned in R10, the occurred. Forensics focused volume products) a ve, waiting for statistically quality and completeness of on the change and its forensic interview with the Energy – significant data to ac- the information collected is relationship to the defect people involved in the nuclear cumulate. Meanwhile the Same as cell before a very significant factor. would production processes.

165 Annex B root cause would remain hidden and is not addressed. • In an application domain of low volume systems or devices and rapid innovation, meaningful aggregation of historic data is very difficult. • Even in organizations with high volume products, e.g., automobiles, data from the field is not collected well – take a look at their warranty claim records. Typically, a module, called an electronic control unit (ECU), is replaced. The servicing record does not include much context information, e.g., the operating history or profile of the vehicle, the specific conditions when the malfunction was first noticed. The diagnostic test codes (DTCs) are primitive. First, the quality of the Think about a potential It seems that this process is process depends on the extension of ODC in order rather complete. It is able to quality of defect Data. You to maximize the number of analyse, to classify, and to should clarify how defect Quantification of the effort defects covered identify root-causes. Thus data should be (cost) needed to apply this Definition of Validation (classified) by your Academia the completeness seems to characterized. Second, it process. metric. process.

166 Defects Analysis Textual Responses be the main strength of the seems that the process you proposed pro-cess. propose is rather complex. I don’t know how the overall analysis will be effective. You are trying to identify defect types and trigger for all the defect data. It is a hard task, especially for the trigger identification. Third, how to validate this process. In other words, can we trust this process? Your should define (or adopt) some metrics to measure the quality of the results. The defects are systematically considered. The experiences from Identifying new defect previous projects are re- types in the case of new Railway used. N/A technologies are used. N/A N/A Ensure that consolidation is not impacted by timing or budget constraints. Ensuring consistency in Often, the classification is defect classification. Find well done but, for cost that usually even though a reasons directives such has definition is provided for “no more than 50 defects Define rules within team the meaning of urgency, it can be raised” impact the for defect types Being able to characterise is difficult to come up with classification quality. As a main issues in sw supplied common understandings consequence, defects are Space and by a particular development across projects. Different collated together and one Air team sup-plier. This teams use specific naming looses the traceability with Traffic enables focusing the system conventions e.g. vlaunch testing due to an increase Control testing prior to operational anomaly means it has to be Define rules within the in complexity regarding Systems usage of particular sw. fixed prior to launch. team for defects trigger the defect description.

167 Annex B Academia , Space, Too much steps to achieve Defence, Detailed classification of the final result, namely in Impact analysis of the (Safety) Classification of Defect correction Railway defects the RCA defect the defect validation The mind-set when we are developing a new system or modifying and existing one, or even when fixing a problem in a long duration project is not the same. The The process describes team or person doing the generic top level tasks such modification may not be as “Data Preparation” but the same that has originally provides no guidance developed the system. neither guidelines on how Team dynamics, to implement those tasks. organisation culture and Without further guidance I am not sure about what is psychology have huge and/or guidelines different done in “defect type”, impact in the introduction users may implement the “defect trigger” and “late of errors and the ability to same process in very detection” RCAs. To detect them. I am not sure different ways resulting in properly assess the this is addressed within the Aeronauti The process has a well- all from a very effective to proposed method, more activities of the pro-posed cs, Space define a rather poor one. information is required. process N/A Defense, Telecom classification group could municatio Prerequisites steps, about be omitted and still have a ns quality of data positive outcome N/A N/A N/A Aeronauti cs, it is a process, i.e. if carried Automoti out properly, it provides a it is only a process, hence ve, structured path to the real result will depend Railway improvements mostly on the people N/A N/A N/A Automoti ve, Others N/A possible feedback to design N/A N/A N/A

168 Defects Analysis Textual Responses I think the process and tasks are very dependent on having a large system where there are a considerable number of defects (whatever that means) and a lot of experience with analysing and correcting defects. In the last process, Improvements and validation, there is not mentioned explicitly the topic of retesting, i.e. how much shall be retested to Maybe also the ITIL- make sure the defect has processes change The main strengths are that been really corrected (not management, release Space, there really is a process and the symptom), and that the management and Railway, that there are criteria for correction has not led to As mentioned above, configuration management Energy, classification of defects unexpected problems strategy for retesting should should be addressed (for a Finance (ODC). somewhere else. be addressed system in live operation) N/A Comprehensive and In order to start the third detailed process. Low effort phase, one needs to gather a associated with the considerable amount of “Prerequisites” and information on the second “Defects Classification” Requires all team members phase. Given that when the phases (supported by the to master the different ODC second phase finishes, projects). Possibility to classifications. May require project team members may break-down the process by one full time resource to no longer be available for its phases and assign them periodically perform a clarifications of the raised to different teams sanity check to the defects, an additional Space, (“Prerequisites” and submitted defects and their optional activity may be Aeronauti “Defects Classification” to classifications in order to added to the second phase cs, the project team, remaining ensure the correctness of of the process, Railway to an R&D/Process the defects classification. encompassing a periodic N/A N/A

169 Annex B Improvement team). Usage sanity check of the of a well-known classifications of the classification system. submitted defects in order to ensure the quality of the data to be used for the subsequent phases. The mean strength is the application of root cause Railway, analysis itself. It helps It is a long process, while Analysing whether multiple Academia understand the functionality maybe important defects could have a , Others of the system even better. information could get lost different effect N/A N/A Space N/A N/A N/A N/A N/A Perhaps it misses remedial training of the person Check for Unwarranted Nice approach – the responsible for the defect Changes, e.g. additional feedback to truly analyze a then assessing the adequacy changes made during the Avionics, defect is very important, so of that improvement. Same defect correction process Medical the Improvement process is for the process which were not warranted Missing the process Devices, important (as usually this is improvement: how we manual re-review to assess and in fact caused improvement stage, or Automoti missing in most assess/know the adequacy of correction, by unintended side effects assessing the adequacy of ve organizations) improvement worked? an independent person (problems) the process improvement. In any domain it is always difficult to collect previous defect data base when Defects data collection performing RAMS process could be further A preliminary risk analysis analysis. A process for detailed, indicating the could be performed early Aeronauti classifying the defects, common or potential source in the process to identify cs, Space, identification of their of information, for what are the most common Defense, causes and then taking example: input data from An activity for creation and types of defects expected Automoti action to improve the previous RAMS analysis updating of a defects for that type of system and ve, development process can Do to the budget and performed in the same type database could be included, to prepare a set of Railway, be an enormous added schedule pressure, most or similar systems, existing i.e., each project should mitigation actions that Real-time value, in terms of organizations are not test data from similar benefit and contribute for a could be taken to prevent Embedde reliability, cost and willing to invest in systems, existing defects domain specific defect data those defects from d Systems schedule. improvement processes. data-bases per domain. base. occurring

170 Defects Analysis Textual Responses A and B (although at first sounds easy, the data collection can be one of the in some part of the process Aerospace items difficult to carry out a model/snapshot view of , Defense ODC Analysis (4 or 5 or 6) completely the system could be help N/A N/A Commerci al Software. Consultin g Clients have been in Networki ng, Takes time and effort. It is Operating much easier to do a shoddy Systems, Real data from the process job after spending a day Retail, yield measurements. So, we talking to people, and In our process, we also Aerospace see what it is as opposed to forming an opinion that can have a technology , Nuclear, an opinion by someone be biased or influenced by assessment that goes with A skills inventory is also Insurance. without actual insight. the people at task. this. useful. N/A Academia N/A N/A N/A N/A N/A Understanding the root Currently in our project, causes are very important since it is the first version to be able not only to of our product, we have to improve our pro-cesses, but face with really a lot of to understand why are these bugs. This procedure does processes so important. My not talk about how to experience is that define the relevant bugs, if development projects apply we would like to cut the standards and processes effort required for this only because it is process, before we start the mandatory, and not because ODC classification. I feel Some kind of pre-analysis they understood its that it is not possible to of bugs to decrease the usefulness. This process lower somehow the bugs effort needed for the Railway could help the recognition. taken into account for process N/A N/A

171 Annex B And also based on this ODC, no one will give process the best methods green light for doing this could be used and process, since it seems to implemented which fit the be really a lot of work. best to the given team, since this can help also to analyse the weak points of a given team. Length and amount of activities you need to Aeronauti The structured approach perform before you have cs, Space facilitates its usage useful data N/A N/A N/A The main strength is to combine ODC and RCA analysis; indeed they have different purposes, advantages and drawbacks, I think that what is missing as pointed out by is a step for validation of Chillarege itself in his The problem in defect the classification process An initial tuning could help original ODC paper. This analysis is the manual itself, in terms of reliability tailoring the ODC allows combining a more classification it is required. of the classification (e.g., classification for the “quantitative” analysis, as ODC tends to minimize this different people classifies purpose of a company (see enabled by ODC, and a issue, by a more in the same way), effective- for instance this paper that more qualitative one, like “systematic” ness (i.e., the classification applies a lightweight RCA, thus allowing characterization, but it actually serves the purpose classification: “An capturing both more remains challenging to get of detecting process flaws Industrial Case Study of general process-level flaws to fast and efficient and identify potential Implementing and Railway, and trends, and issues more classification without an improvements), efficiency Validating Defect Air related to a specific product automatic or semi- (i.e., classification is Classification for Process Traffic development/verification automatic classification effective but also requires Improvement and Quality Control team. procedure. an acceptable time) Management” N/A The improvement that you The process is time gain by this analysis, in consuming, since it requires theory it should help you to a proper classification of Railway build a better software the issue when it came out. N/A N/A N/A

172 Defects Analysis Textual Responses (e.g., avoid inflating the You can still classify the same type of bugs, improve issue “later on”, let’s say, test detection efficiency) when you have some more time to devote to it, but it would not be effective. Need for a feedback from step 11 (the feedback from improvements comes only from step 10, but steps from 4 to 10 are not always Academia Systematic defect performed in companies – Defect prioritization (Step . management; Use of a process improvements may 7) may be required for Aeronauti defect classification; be decided even without process improvements even cs Inclusion of RCA RCA); if RCA is not performed N/A N/A Some sub-processes might be lost or forgotten If not Defect Data Interface stated explicitly (some Check (is it the root or Space, Logic + Functional extra information on these secondary defect under Defect Data related Defect data related Academia Organisation steps needs to be provided) study) algorithm check configuration check It seems a strongly structured approach to perform defect analysis, that cover all the activities that are considered as necessary. Using a It seems missing a clear structured approach like trace of the defects with IDENTIFICATION OF this one will give you a respect to the components COMPONENT / guideline and greater and versions affected by the SUBSYSTEM AND Railway, evidence that no important defect. Clarifying this part VERSION THAT IS Automoti information are lost in the will enhance the AFFECTED BY THE ve analysis. capabilities of the process. DEFECT N/A N/A Enhancements and adaptations to the ODC Bugs that could not have taxonomy should be quite been found through static Prevent defects from Select defects for further Determine if defect Academia precise and dynamic analysis recurring analysis analysis is necessary

173 Annex B getting relevant measurement, getting Space, measurement early enough, Defense, implementing the Automoti Continuous improvement, improvements (overcoming How do your formulate How do you gain ve feedback resistance) how do you prioritize? improvement suggestions? commitment? intermediate decisions involving several stake holders (e.g. financial, technical, operational, presucion safety, customer focus). measuring Most of the time a defects systems, impact and/or it’s railways resolution is different infrastruct depending on the main ure and goals set. E.g. delivery railways product quickly, make it rolling very user friendly, make it stock. N/A N/A very cheap, N/A N/A Part of the impact assessment is the Safety assessment (mandatory for Unless causes and effects processes with safe-ty are so obvious so the issue): It is necessary to 1/ No safety assessment process can be executed check the impact of the (when relevant) very fast, it is to me very defect on safety (of the 2/ I do not see any explicit important to do impact development team). preliminary analysis of the assessment at early stage Important for type of effects of the defect (e.g. because you never know process such assembly and its a looped system (should impact on the system being how long will the analysis integration, fuelling, Short term lead to improvement of the developed such as last. Then eventually, physical testing with countermeasures (work- product and the process) functional errors, defect prioritization may be dangerous material or around solution) when the AND it con-siders also performance etc… and on changed at step 7 once both physical conditions defect has a significant other projects data for the the development operations effects and root causes are (vacuum, vibrations)etc impact on safety or project Space analysis. such as delay and cost). known. etc… development

174 Defects Analysis Textual Responses Railway and Space structure and completeness how to enforce it for real? N/A N/A N/A TeleGeoI nformatic s, Statistical Processin g (Applicati ODC is good for defect ons classification for procedural including The approach is trying to development with a This approach may not Data solidify on disciplined waterfall process. This align well to bring holistic Mining, process on a particular method does not bode well picture for program or Machine / methodology rather than on for object/ service/ aspect Network a process discipline. oriented development and unless the organization is Intelligen Defect Analysis process is Flexibility of approaches also for agile methods. It is developing just one ce), built on the foundations of (and methods) that can be important to realize that product. If you have a Software ODC methodology with a obtained through process ODC is a classification product line, it is difficult Engineeri strong emphasis on discipline is a key to approach to group defects to get a holistic grasp ng, disciplined systematic building reliable software and has limitations in considering the multiple Academia process. systems. quantification of quality. business priorities. N/A

Q15: Additional Development Root Causes

The following text also contains the frequency of each proposed root cause as evaluated by the expert: 1 – Extremely Frequent; 2 – Very Frequent; 3 – Somewhat Frequent; 4 – Not Frequent; 0 – No Opinion.

Q2: Technical Domain Q15.1: Suggestion 1 Q15.2: Suggestion 2 Q15.3: Suggestion 3 Unmanageable software complexity (due to Software design poorly documented, affecting Space standards not correctly applied) (2) software test effectiveness (3) - Fault Tolerant/High Availability/Resilie nt commercial interaction with elements beyond system boundaries computing (2) multiple failures (3) - Space - - - Automotive, too high focus on telecommunication insufficient staffing – limited resources, leading to insufficient (management) planning to have functionality instead of s, IT software insufficient backup know-how, review peers (2) resources available when required (3) overall system thinking (1) overconfidence from the developer team (mainly in Space case of re-use) (2) manager's pressure (3) - The way development work and affecting information is divided or scattered across organizations. Manufacturing Weak or lacking so-called “non-functional” Organizational division of automation, requirements, i.e., requirements for quality attributes Weak architectural design – it is typically a result of work is not aligned with a Automotive, (see ISO 25000 family of standards) and their R13.1. Often, it is also a result of organizational sound architecture of the Energy – nuclear transformation into explicit system constraints. (1) culture. (1) system (1) Academia - - - Changing of the (railway) environment in which the Railway Human carelessness (1) system is installed. (Especially SIL0 systems) (3) -

176 Defects Analysis Textual Responses Teams with biased backgrounds. People tend to Space and Air Team with mixed backgrounds e.g. C developer and import practices from previous working Traffic Control a Java developer programming in ADA. (1) environment, and this might lead to defect Excess of interruptions at Systems introduction. (3) work (1) Academia, Space, Defence, Railway - - - Lack of knowledge of common defects and their causes, not only the technical causes but also the Lack of dissemination of the FDIR approach – what organisational and even psychological causes that is the approach and why that particular approach was Aeronautics, Space lead developers to inject defects. (2) selected. (1) - Defense, Telecommunication s - - - Aeronautics, Automotive, Railway Ambiguous requirements and specifications - - I believe one of the, if not the, main cause(s) for bad software (and consequently bad embedded systems) to be that companies and their (software) engineers are developing increasingly complex products whose workings they have increasing difficulty to Automotive, Others understand. (1) - - Too tight schedule in combination with too many No lean programming approach or wrongly Space, Railway, developers (“The Chinese Army approach to implemented lean approach, i.e. the product released Energy, Finance programming”…) (1) after a sprint can’t be used for anything (1) - Space, Aeronautics, Lack of focus on SW aspects when defining the Railway Unstable baselines (1) System Software Specification (2) - R13.1:Not well defined documentation framework Permanently changing Railway, (i.e, it is not clear, which document should contain instructions from the Academia, Others certain information) (2) Not well defined development responsibilities (1) management (1) Space - - - code update by person less Avionics, Medical Requirements change process whereby requirements Incorrect interface documentation, due to interface experienced with complex Devices, changed after implementation and incomplete change and versioning problem between code and the original code Automotive regression analysis leading to incorrect operation (2) organizations (1) was too complex, e.g.

177 Annex B McCabe above 20 and complex C++ constructs (1) Schedule issues related to the development process (due to the fact that V&V phase occurs more to the end of the process it gets “squeezed” in terms of schedule, which may lead to specific or more complex scenarios (e.g. FDIR scenarios) not being tested enough to achieve a reasonable level of risk. Limitations of the tools For example, in the case of FDIR testing, it is and processes that deal impossible to test all the combinations of events that with system configuration Aeronautics, Space, may trigger a reaction, but currently only basic FDIR data. Defects originated in Defense, scenarios are tested on host machines. Validation the misconfigured system Automotive, Lack or unefficient communication between FDIR scenarios in more representative hardware configuration data can be a Railway, Real-time development and V&V teams (e.g. when frequent scenarios is usually not possible. (1) source of many problems. Embedded Systems changes of requirements exist) (1) (1) Waiver solutions in the middle of the project (mainly Aerospace, Defense hardware to software) (3) - - Commercial Software. Consulting Clients have been in Networking, Operating Systems, Retail, Aerospace, Changed operating conditions – platform, network Calibration changes that arise from changing Nuclear, Insurance. traffic, change of backend DB, etc. suppliers, parts, manufacturing processes, etc. - Academia - - - Lack of working Lack of definition of responsibilities, and according to the chosen Lack of project management knowledge, culture: responsibility scope: who is responsible for what, development model/wrong tasks are not done in the right order, unclear tasks, who will decide go/no-go for each artefacts, who model chosen for Railway unclear deadlines for the subtasks (2) have the right to say “no”. (2) development (2) Aeronautics, Space Schedule pressure (2) Budget constraints (1) -

178 Defects Analysis Textual Responses Railway, Air Traffic Control Programmers mistakes (3) - - Insufficient or incomplete requirement elicitation. Not to confuse with requirements documentation. Here in the pure sense of elicitation, I mean that customers and software developers have few chance to meet and discuss software functionality. In many cases the customer leaves the choice to the software developers which might miss a proper vision of the Customer with confused ideas on the mission of the Railway software product or of the domain (4) software or its use (3) - Academia. Aeronautics Pressure for release Limited budget / Engineers overload Lack of interest on a given technology required to be Space, Academia Lack of staff motivation (1) used (2) Short term planning (2) Railway, Automotive - - - Requirements Flaws: The requirements provided to Coding flaws: lack of time to address the problem The design documentation could lead to incorrect the developer were Academia (2) source code (3) incorrect (3) Space, Defense, V&V of models hard to Automotive Documents not reflecting “As build” (2) Document not suited as means of communication (2) assess (2) presucion measuring systems, railways infrastructure and Late availability of testing railways rolling Reduction of test on real system in order to reduce equipment for developers stock. Changes in scope after specification freeze (1) time/cost for developers (1) (2) Missing detailed Too much confidence of management in “reuse” part justification of of an already flying system (e.g. an equipment off requirements (from system the shelve) conducting to deletion of some tests level standpoint) together Too much management pressure on delay although operational conditions are different with lack of (inter-teams) disregarding the actual excessive workload on (rare but with catastrophic effect as for example first communication resulting Space development teams (1) launch of Ariane 5) (4) in a poor understanding of

179 Annex B the actual system by developers. (2) Railway and Space - - - TeleGeoInformatic s, Statistical Processing (Applications including Data Mining, Machine / Network Intelligence), Software Engineering, Academia - - -

Q17: Additional Defect Detection Root Causes

The following text also contains the frequency of each proposed root cause as evaluated by the expert: 1 – Extremely Frequent; 2 – Very Frequent; 3 – Somewhat Frequent; 4 – Not Frequent; 0 – No Opinion.

Q2: Technical Domain | Q17.1: Suggestion 1 | Q17.2: Suggestion 2 | Q17.3: Suggestion 3
Space | - | - | -
Fault Tolerant/High Availability/Resilient commercial computing | - | - | -
Space | Lack of awareness and experienced test team (2) | - | -
Automotive, telecommunications, IT software | Lack of integrated specification – development – test tool system (2) | - | -
Space | - | - | -
Manufacturing automation, Automotive, Energy – nuclear | Ambiguity in requirements, including quality requirements (see ISO 25000 family of standards) (1) | Lack of information about what changed when (2) | -
Academia | - | - | -
Railway | Human carelessness (1) | - | -
Space and Air Traffic Control Systems | Lack of communication (1) | Lack of domain knowledge in teams (1) | Management pressure to keep in budget (3)
Academia, Space, Defence, Railway | - | - | -
Aeronautics, Space | Insufficient incremental testing, e.g. jumping from unit tests to system tests without sufficient test campaigns in between (2) | Long execution time of test procedures, which increases the costs of non-regression verification and therefore limits the frequency at which non-regression verification is performed (2) | Lack of effective validation facilities for system testing – either lack of automation, lack of numeric (i.e. simulated) benches, bogus validation facilities, complex test languages or test libraries, etc. (2)
Defense, Telecommunications | - | - | -
Aeronautics, Automotive, Railway | - | - | -
Automotive, Others | - | - | -
Space, Railway, Energy, Finance | Little knowledge of OS constraints / inherent faults (maybe this has to do with the testing environment?) (1) | - | -
Space, Aeronautics, Railway | Defects on the Validation Environment that hide defects on the system (3) | - | -
Railway, Academia, Others | Not well defined and not clearly understood verification processes (2) | Lack of verification culture overall (2) | Lack of support from the management (3)
Space | - | - | -
Avionics, Medical Devices, Automotive | Usually weak LLRs implemented by a person different than the System or HLR writer (1) | Safety assessment missing, hence missing derived requirements (2) | -
Aeronautics, Space, Defense, Automotive, Railway, Real-time Embedded Systems | Lack of a standard for performing independent verification and validation (most domains don't have it) (1) | Excessive number of "untestable" requirements due to the complexity of some system features, which indicates architectural problems, i.e., the system was not conceived with a testability mindset (1) | Lack of participation of V&V experts during the early design phase of the system (e.g. very few systems have been designed from scratch with built-in fault-injection capabilities) (1)
Aerospace, Defense | - | - | -
Commercial Software. Consulting clients have been in Networking, Operating Systems, Retail, Aerospace, Nuclear, Insurance | Lack of a reference model – especially for incremental releases | - | -
Academia | - | - | -
Railway | Lack of project management knowledge, culture: tasks are not done in the right order, unclear tasks, unclear deadlines for the subtasks (2) | Lack of working according to the chosen development model / wrong model chosen for development (2) | Evaluation of verification results too late / communication problems between development and verification teams (2)
Aeronautics, Space | - | - | -
Railway, Air Traffic Control | Inappropriate choice of (the mix of) testing and (automated) analysis techniques (1) | Unclear separation of the role of tester with respect to developers/designers/analysts (lack of independence) (1) | -
Railway | - | - | -
Academia. Aeronautics | - | - | -
Space, Academia | Inexperienced staff (1) | Tight deadlines (schedule) (1) | Unfamiliar project (1)
Railway, Automotive | - | - | -
Academia | Requirement-related root causes (3) | PM may not be analysing all possible risks (3) | Project control not exercised properly / monitoring milestones not done (2)
Space, Defense, Automotive | Configuration management of run-time data (2) | - | -
Precision measuring systems, railways infrastructure and railways rolling stock | Test in real environment (2) | Test with end-user (2) | -
Space | - | - | -
Railway and Space | - | - | -
TeleGeoInformatics, Statistical Processing (Applications including Data Mining, Machine / Network Intelligence), Software Engineering, Academia | - | - | -

Q19: Additional Development Measures

The following text also contains the relevance of each proposed measure as evaluated by the expert: 1 – Extremely Relevant; 2 – Very Relevant; 3 – Somewhat Relevant; 4 – Not Relevant; 0 – No Opinion.

Q2: Technical Domain Q19.1: Suggestion 1 Q19.2: Suggestion 2 Q19.3: Suggestion 3 Allocate more effort (money) to early Space verification! (1) - - Fault Tolerant/High Availability/Resilient commercial computing - - - Space - - - Get Management involvement and commitment work with customers and regulator for strict review processes (as required in to enforce strict review Automotive, “Define/redefine appropriate review methods, processes/development work with relevant industry to enforce telecommunications, IT processes and tools and enforce their processes/documentation processes tool/documentation/review/development/s software application at every stage of the SDP;” (1) (2) pecification standards (3) Space - - - Design the development process such that Manufacturing requirements-related questions are automation, Get the right requirements, esp. quality answered early in the development cycle Automotive, Energy – requirements. (1) Transform quality requirements into (e.g., iterative evolutionary development nuclear system architectural constraints. (1) process) (1) Academia - - - Cross code reviews between two programmers, Railway or strong code inspection. (1) - - Lack of leadership to communicate uniform verification approach (people with different Space and Air Traffic backgrounds are not willing to accept other Simplification of traceability Control Systems peoples approaches...) (1) processes (3) -

184 Defects Analysis Textual Responses Academia, Space, Defence, Railway - - - Develop a modular architectural concept that clearly maps the concepts of Clearly define and disseminate the Detection, Isolation and Recovery – often Provide practical training on common error FDIR approach not only describing FDIR is, misleadingly, taken as a “magic” causes and on the mechanisms available to it but explaining why that particular component that one adds to an Aeronautics, Space avoid or compensate them (2) approach has been selected. (1) architecture. (2) Defense, Telecommunications - - - Aeronautics, Automotive, Railway - - - The next-important measure would be to make full formal requirements specification mandatory for critical systems. This would throw the vast The most important measure would probably be majority of companies out of to aim at producing a system that is simple business, eventually leading to better Automotive, Others enough to be well understood. (1) products. (1) - Space, Railway, Energy, Test based approach to software development “Buddy” programming / unit testing Finance (3) (1) - Space, Aeronautics, Review/contribute to the definition of the Railway System Software requirements (2) - - Promote meetings not just to present the requirements, but for the members of different development teams / developers and testers, on different development stages, in order to ensure that the original goals are achieved / or if not, Railway, Academia, could they perhaps be modified (iterative Others development) (3) - - Space - - - Mandating MISRA C/CC++ automated Reducing code complexity, using static analysis test before independent code Avionics, Medical Linking all code constructs to tests, e.g. DO- automated tools like LDRA, PRQA, peer review. Having ONE reviewer, not Devices, Automotive 178C DAL A, B, C (1) (2) one hundred. “One great reviewer is

185 Annex B better than 100 good reviewers” – Vance Hilderman quote (1) Lack of harmonization between the Most of the standards have gaps. It is not standards of the several safety- always clear for the organizations the way related domains (each domain to apply certain process and rules that are Aeronautics, Space, defines its own processes), which not defined in details by the standards, Defense, Automotive, makes it difficult to re-use process which requires support from certification Railway, Real-time Most safety-related industrial standards are and tools across the different authority representatives or certification Embedded Systems typically not freely available. (1) domains. (1) agency. (1) People in charge of this subject with correct Aerospace, Defense (and better) skills (2) - - Commercial Software. Consulting Clients have been in Networking, Operating Systems, Create a clear plan on what tests Retail, Aerospace, Regularly review the ODC defect profiles with needs to be automated, versus kept Nuclear, Insurance. the teams (1) manual (1) - Academia - - - Railway - - - Aeronautics, Space - - - Railway, Air Traffic Improve requirements specification and Enforce design partitioning, Training about basic software engineering Control validation (2) modularity and reuse (2) principles (3) Railway Analysis of the post- delivery issues (4) - - Academia. Aeronautics - - - Small groups of workshops or Tailor and/or develop project specific Space, Academia Groups of small engineers shall be trained (1) meeting shall be preferred. (1) measures whenever feasible. (1) Railway, Automotive - - - Skilled programmers are to be PM / PL should always have some buffer PM should have the overall control of the employed for tasks in the critical while planning for external dependencies Academia project (3) path (3) (4) Space, Defense, Order feature development according to risk Automotive analysis (2) Iterative development (short cycles) -

186 Defects Analysis Textual Responses presucion measuring systems, railways infrastructure and railways rolling stock. Introduce and keep quality milestones (2) - - Space - - - Railway and Space - - - TeleGeoInformatics, Statistical Processing (Applications including Data Mining, Machine / Network Intelligence), Software Engineering, Academia - - -

Q21: Additional Verification and Validation Measures

The following text also contains the relevance of each proposed measure as evaluated by the expert: 1 – Extremely Relevant; 2 – Very Relevant; 3 – Somewhat Relevant; 4 – Not Relevant; 0 – No Opinion.

Q2: Technical Domain Q21.1: Suggestion 1 Q21.2: Suggestion 2 Q21.3: Suggestion 3 Thorough unit testing, specified against an actually documented detailed design (not against Space the code itself) (2) - - Fault Tolerant/High Availability/Resilient commercial computing - - - Space - - - Automotive, Improve completeness of reviews (extension of telecommunications, IT above, as I feel reviews are more valuable than involve customer/requirements team in review and software testing completeness or coverage) (1) testing efforts (2) - apply ISVV - independence is quite important have a trained and motivated team with skilled for Space (1) finding errors (1) - Review, inspection and analysis for unwanted Validate the decomposition and derivation of behaviour, e.g., through requirements and their allocation to various the application of Manufacturing elements in the architecture. Apply rules of advanced hazard analysis automation, Automotive, Validate requirements through interaction with composition to ensure nothing is lost in the flow- techniques such as Energy – nuclear experts (1) down. (1) STAMP/STPA (1) Academia - - - Cross code reviews between two programmers, or Railway strong code inspection. (1) - - Space and Air Traffic Reviews where the developer asks the reviewer to Use of code linting tools Simple but strict configuration control policies (4) Control Systems explain the design (4) (2)

188 Defects Analysis Textual Responses Academia, Space, Defence, Railway - - - Aeronautics, Space - - - Defense, Telecommunications - - - Aeronautics, Automotive, Railway - - - The entire development process of safety-critical systems, including all certification-relevant artefacts, documentation, source code etc, should by law be required to be public and accessible to Automotive, Others scrutiny by anyone via internet. (1) - - Informal “rainy day” testing in addition to the Space, Railway, Energy, formal tests (in case something has been over- Finance looked). (2) - - Use tools that integrate and manage all the phases of the lifecycle, such as concept specifications, Clear definition of the of Unit/Integration tests requirements, Space, Aeronautics, and Functional tests, including their goals and Introduce formal or semi-formal verification of the architecture, source code, Railway place in the overall SDP (2) SW specification in the Verification process (2) tests, etc.; (2) Consult regularly (but not too deeply, in order to Use “creative”, informal methods (i.e., analyse ensure the independence) with the validator and / Railway, Academia, what could go wrong in the system) beside of or assessor about requirements coming from Others formal ones. (3) standards. (3) - Space - - - Avionics, Medical Devices, Automotive Check LLR to code robustness (2) Decision Condition coverage tracing to LLR’s (1) - Evaluation of the testability of the architecture and Lack of planning and definition of processes for Aeronautics, Space, requirements early in the process through the performing non-functional tests. Most standards Defense, Automotive, involvement of the V&V experts. During the simply mention that these tests must be performed, Railway, Real-time V&V phase some requirements are simply not but do not detail them, e.g. process for performing Embedded Systems possible to test and verified through code robustness testing. These tests are normally -

189 Annex B inspections (this is especially applicable to the performed ad-hoc in parallel to the functional space domain). (1) tests. (1) the whole system team must be involved in the V&V process, at least to testimony of its part of Aerospace, Defense the system "passed" in the tests (2) - - Commercial Software. Consulting Clients have been in Networking, Operating Systems, Retail, Aerospace, Compare release to release ODC metrics to Establish with ODC analysis that current release is Nuclear, Insurance. identify trends (1) better than previous release!!!! Very Important (1) - Academia - - - Trainings for the test team about the defined Appropriate trainings for the test team about the verification and validation methods and the related Develop a good system under test and the related domain. My standards, justification of the used methods. My communication with the experience is if they clearly understand the experience is if they clearly understand what is the development team, system, they can use the verification techniques goal, they can use the techniques more accurate considering independency Railway more accurate and they have more motivation. (1) and they have more motivation. (1) (2) Aeronautics, Space - - - Improve testing accounting for operational phase Usage of ASA Define “quantitative” test planning strategies to expected usage (i.e., operational/reliability testing) (automated static Railway, Air Traffic best allocate efforts (i.e., prioritize – Exploit historical data to assess usage profiles analysis) for code Control functions/components) (4) and corresponding tests (3) sanitization (2) Railway - - - Apply model-driven techniques for early defect Academia. Aeronautics detection - - PA review of V&V Space, Academia Implement functional tests for validation (1) Record and save the results automatically (1) Measures (1) Railway, Automotive - - - Preventing the occurrence of an individual defect or Academia Do adequate testing (2) Eliminating escaping defects (3) group of defects (3) Space, Defense, Automotive Continuous integration and test (1) Tests driven development (1) -

190 Defects Analysis Textual Responses presucion measuring systems, railways infrastructure and Clarify roles in order to have one person at least railways rolling stock. focussing on V&V (1) - - Space - - - Railway and Space - - - TeleGeoInformatics, Statistical Processing (Applications including Data Mining, Machine / Network Intelligence), Software Engineering, Academia - - -



Annex C. Summary of the Results of the Survey


This annex presents the summary of the survey results as provided by the experts. The data have been simplified and harmonized for data processing and analysis. Data on the experts' experience is presented separately for anonymity reasons. The data presented in the table can be understood as follows:

• For questions Q2, Q8 and Q9, a 1 represents "Yes" and a 0 represents "No";
• For questions Q3 to Q7:
  o 1 – Extremely Important
  o 2 – Very Important
  o 3 – Somewhat Important
  o 4 – Not Relevant
  o Empty – No Opinion
• For the remaining questions:
  o 1 – Extremely Frequent/Relevant
  o 2 – Very Frequent/Relevant
  o 3 – Somewhat Frequent/Relevant
  o 4 – Not Frequent/Relevant
  o Empty – No Opinion
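For readers who want to process the summary table programmatically, the sketch below shows one possible way of decoding answers stored with this coding convention. The file name and the short column names used here are illustrative assumptions made for the example only, not artefacts of the survey.

```python
# Minimal sketch (assumed file and column names) of how answers coded with the
# convention above could be decoded and summarised; not the tooling used in the study.
import pandas as pd

YES_NO = {1: "Yes", 0: "No"}                                                # Q2, Q8, Q9
IMPORTANCE = {1: "Extremely", 2: "Very", 3: "Somewhat", 4: "Not relevant"}  # Q3-Q7
FREQ_REL = {1: "Extremely", 2: "Very", 3: "Somewhat", 4: "Not"}             # remaining questions

def decode(column: pd.Series, mapping: dict) -> pd.Series:
    """Translate numeric codes into labels; empty cells become 'No Opinion'."""
    return column.map(mapping).fillna("No Opinion")

if __name__ == "__main__":
    df = pd.read_csv("survey_summary.csv")      # hypothetical export of the table below
    print(decode(df["Q3"], IMPORTANCE).value_counts())
    print(decode(df["Q8"], YES_NO).value_counts())
    print(decode(df["Q14a"], FREQ_REL).value_counts())
```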


Columns of the summary table (left to right): Response ID; Q2: Transportation Domain; Q2: Academia; Q2: Aerospace Domain; Q2: Other Domain; Q3: Standards Importance; Q4: Budget Importance; Q5: Schedule Importance; Q6: External Assessment Importance; Q7: RCA Importance; Q8: Used ODC?; Q9: Used Defect Analysis?; Q13: Recommend Process; Q14a: Problems in Artefacts; Q14b: Incomplete Test Spec; Q14c: Incorrect Tests; Q14d: Docu Tools Limitations; Q14e: Docu Inconsistencies; Q14f: Overlooked Sys Docu Processes; Q14g: Limited Domain Knowledge; Q14h: Missing FDIR Specs; Q14i: Lack Reliab. Safety Culture; Q14j: Incomplete Interface Specs; Q14k: Insufficient Oper, Maint, Usab Specs; Q14l: Lack Tools and Languages Knowl; Q14m: CM and Versioning Problems; Q16a: Traceability Culture; Q16b: No Traceability Tools; Q16c: No Test Planning; Q16d: No Testing Tool/Env.; Q16e: Inefficient Reviews; Q16f: Defective Review Process; Q16g: Defective Docu; Q16h: Bad Tools Usage; Q16i: Incomplete/Bad Specs; Q16j: Defective Architecture; Q16k: No Tools for Data Flow Analysis; Q16l: Inappropriate Arch Supp Tools; Q16m: Deficient Design Specs; Q18a: Better Processes/Review; Q18b: Auto Docu Generation; Q18c: Tools for Full Lifecycle; Q18d: Auto Validation Tools; Q18e: Engineering Training; Q18f: Specifications Meetings; Q18g: Standards Guidelines; Q20a: Test Plans; Q20b: Traceability; Q20c: Test Coverage; Q20d: Non-functional Tests; Q20e: Tools for Design Compliance.

1 1 0 1 0 1 3 3 2 1 0 0 4 4 3 2 4 2 3 3 2 3 1 2 4 2 4 4 2 3 2 4 3 4 2 4 4 4 2 2 3 4 4 2 4 3 2 3 2 3 4 2 0 0 0 1 1 1 2 1 0 3 4 1 2 2 4 4 4 2 3 4 1 4 3 1 2 2 3 3 3 2 2 3 3 2 3 1 1 1 3 3 4 1 3 3 3 3 3 0 0 1 0 1 1 2 1 2 0 0 2 2 1 2 3 2 2 2 2 2 3 4 4 4 2 3 2 3 2 2 3 2 3 3 3 2 1 1 2 2 2 2 3 2 1 2 3 2 4 0 1 0 1 2 2 1 2 1 0 0 1 1 2 3 2 3 3 3 2 2 3 2 4 4 3 3 3 3 3 2 2 4 3 3 2 2 3 1 2 2 2 3 3 3 2 2 3 3 2 5 1 0 1 0 2 1 1 1 1 0 0 2 1 2 4 2 1 3 2 2 1 4 2 2 2 3 3 1 3 3 3 1 3 4 2 1 2 3 3 2 2 3 3 3 1 1 2 6 1 1 0 1 3 1 1 1 1 1 1 4 2 1 2 3 1 4 1 1 1 1 1 4 3 2 1 1 2 1 4 4 4 4 2 1 3 7 1 0 0 0 2 2 1 2 1 1 3 2 1 2 1 2 2 3 4 3 3 2 1 3 2 2 3 4 4 1 1 1 2 2 8 0 1 0 0 1 2 2 1 1 0 0 1 3 2 3 1 3 4 4 3 2 2 4 4 1 4 4 4 4 4 2 4 4 4 4 4 4 4 1 4 3 3 1 1 1 3 4 4 3 4 9 0 0 1 1 2 1 2 1 1 0 1 2 2 1 3 3 1 2 3 3 2 1 4 3 1 2 3 4 4 1 2 4 2 2 2 4 2 1 3 3 3 1 1 3 1 1 2 1 3 10 1 1 1 1 1 2 2 1 3 0 1 3 2 2 3 1 2 3 4 2 4 2 3 4 4 2 3 3 3 3 2 2 2 3 3 2 2 3 3 1 3 3 2 3 2 2 2 3 2 2 11 0 0 1 0 1 2 2 3 1 1 0 3 2 2 2 4 3 2 2 3 1 3 3 4 2 2 2 2 3 3 2 3 2 2 2 4 4 2 2 4 3 2 1 2 2 2 2 2 2 3 12 0 0 0 1 1 2 3 3 1 0 1 3 2 2 3 3 1 2 1 1 4 3 3 4 4 3 4 3 3 3 3 3 4 1 2 3 3 2 3 4 2 2 2 1 2 1 1 2 1 2 13 1 1 1 0 3 2 2 1 1 0 1 3 2 1 3 1 3 3 2 2 1 2 3 4 2 3 1 2 3 3 2 4 1 3 2 4 2 2 2 2 2 2 1 3 2 3 1 2 1 14 1 1 0 1 3 1 1 2 1 0 0 2 1 2 3 2 3 2 1 1 2 3 3 1 1 1 2 15 0 1 1 1 2 2 1 2 2 0 1 3 1 1 1 2 1 1 1 1 2 1 1 2 2 2 2 1 2 1 2 2 2 1 1 2 2 2 2 2 2 1 1 1 2 1 2 1 2 2


16 0 1 1 0 2 2 3 1 2 0 0 1 3 1 3 2 2 4 4 3 4 3 3 4 3 4 2 4 3 4 3 4 3 3 3 3 3 3 3 2 1 2 3 2 4 1 1 1 2 2 17 1 1 0 1 2 2 1 2 2 0 0 2 1 1 2 3 1 2 3 2 2 1 2 3 2 2 3 2 3 2 2 2 3 1 1 3 3 2 2 3 3 3 3 3 2 3 2 2 3 3 18 1 0 1 0 2 2 1 3 2 0 0 1 1 2 1 2 3 3 2 1 1 3 3 1 2 2 2 1 2 3 3 3 1 3 3 2 2 1 2 2 1 1 1 1 1 1 1 2 19 0 1 1 1 1 2 2 1 1 0 1 1 2 3 3 4 2 2 4 2 3 1 2 4 4 1 2 2 3 1 1 2 3 1 2 3 2 1 1 3 3 4 1 2 2 2 1 1 3 3 20 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 4 4 1 4 4 1 2 4 4 1 1 1 4 3 1 1 1 3 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 21 1 0 1 0 1 1 1 2 1 0 0 2 2 1 2 2 1 2 3 3 2 3 2 3 2 2 1 3 3 3 2 2 3 2 2 3 3 3 1 2 2 3 3 3 2 2 2 2 2 3 22 0 0 1 1 2 2 1 2 1 1 1 2 2 3 1 1 1 2 1 3 3 3 2 3 2 3 3 3 2 1 1 1 1 1 1 2 3 23 1 0 0 0 2 3 2 0 0 3 3 2 2 2 2 2 3 3 3 4 3 4 4 2 3 3 4 1 3 3 2 3 3 3 2 2 2 2 4 3 3 2 2 2 4 3 24 0 1 0 0 1 2 1 1 1 0 0 1 2 1 3 4 2 1 3 2 2 2 3 3 2 1 2 2 3 2 3 1 2 1 1 2 2 1 2 2 2 1 1 1 1 1 2 1 2 25 0 0 1 0 1 2 2 1 2 1 1 3 2 2 2 3 3 2 3 2 2 3 3 3 2 3 3 2 3 2 3 3 3 2 3 3 3 2 3 3 3 3 2 2 2 2 2 2 2 2 26 1 1 0 1 1 2 2 2 1 1 1 2 3 1 2 4 3 3 4 1 1 2 2 3 4 1 2 1 3 1 2 3 3 2 3 2 4 2 2 2 3 2 4 4 3 3 2 3 2 27 0 1 0 0 2 2 2 1 2 0 0 3 3 4 3 2 4 3 4 4 4 1 2 3 2 4 3 4 4 4 4 4 3 2 3 3 3 3 2 4 3 3 3 3 4 3 28 1 0 1 0 1 2 3 1 1 1 1 3 2 2 3 3 3 2 1 4 1 1 1 3 1 3 1 2 1 1 1 1 29 1 0 1 0 1 3 2 1 1 0 0 2 1 2 2 3 3 1 1 2 1 1 2 3 2 2 3 1 3 3 1 2 2 1 1 3 3 2 1 3 3 3 1 1 1 1 1 1 1 3 30 0 1 0 0 1 3 2 2 1 0 1 2 2 3 2 4 2 1 3 4 2 3 3 4 3 2 4 2 3 2 4 3 3 1 4 3 3 2 1 4 3 3 2 2 2 1 3 2 2 3 31 1 0 0 0 2 1 2 2 2 0 0 1 1 3 2 2 3 2 1 2 3 2 3 1 2 3 2 1 3 2 2 3 2 1 2 2 3 1 3 3 2 2 3 3 1 2 2 1 32 1 1 1 1 1 1 1 2 2 0 0 2 1 3 3 4 1 2 2 1 3 3 3 3 3 3 1 3 2 2 3 4 4 3 1 4 2 4 4 4 3 3 2 3 3 1 1 3 33 1 1 0 0 1 3 2 1 1 0 1 1 3 2 2 2 2 2 4 1 3 2 1 4 2 3 3 4 3 2 3 2 4 2 3 4 4 4 1 2 2 2 2 3 3 1 2 3 2 3 34 0 0 1 0 1 3 2 1 1 0 1 3 3 2 2 3 2 3 3 2 4 2 2 4 4 4 4 3 4 4 4 3 4 2 2 2 3 3 4 3 2 3 1 1 1 2 3 1 3 35 1 1 1 0 1 2 1 1 1 1 1 2 2 1 1 2 1 2 2 1 2 1 1 4 3 2 2 2 2 2 2 2 3 1 1 2 3 2 1 2 2 2 1 1 2 1 1 1 2 3

36 1 0 0 1 3 1 1 1 2 1 1 3 2 2 1 1 1 3 2 2 2 2 2 2 1 1 2 2 3 3 3 1 1 3 1 2 2 3 3 3 1 1 2 3 3 1 1 3


These charts show the number of experts per experience range.

[Chart: Academic Experience – number of experts per experience range (Up to 5 years, ]5-10], ]10-15], ]15-20], More than 20 years)]

[Chart: Industry Experience – number of experts per experience range (Up to 5 years, ]5-10], ]10-15], ]15-20], ]20-25], ]25-30], More than 30 years)]



Annex D. Example of Data Collection Template


This annex presents the data collection template used in this work and explains each of the data fields considered in the template. Each entry below lists the item, its description, and an example value.

Number – Unique identifier for the issue. Example: 001
Project – Identifier of the project where the issue comes from (Mission 1, Mission 2, etc.). Example: SYS02
Subsystem – Subsystem or component within the project (star tracker, GPS, Laser system, etc.). Example: SS-02
Domain – Business domain applicable to the project of the issue (space, automotive, aeronautics, etc.). Example: Space
System Type – Definition of a system type applicable to the domain (Data Processing, User Interface, Database, Communications System, etc.). Example: OB ASW
Issue Title – Short issue title. Example: Conflict between ASW and SA
Desc – Detailed description of the issue. Must contain enough details to be able to be analyzed or, eventually, to be processed in an automated way in the future. Example: "The Application Software (ASW) contains an implementation (source code) that differs from the defined Software Architecture (SA). The differences are the following: [listing of the differences and inconsistencies]."
Classification – Classification of the severity of the issue (Minor, Major, Catastrophic, Comment, etc.). Example: Minor
Problem Type – Originally classified issue type. This field has been used to contain the ISVV classification made by the team that raised the issue. Example: External Consistency
Phase Detected – Lifecycle phase where the issue has been detected (Requirements, Design, Implementation, Testing, Operations, etc.). Example: Design
Phase Applicable – Lifecycle phase where the issue has actually been introduced (Requirements, Design, Implementation, Testing, Operations, etc.). Example: Requirements
Defect Type – ODC classification according to the defect type taxonomy defined in section 5.3.2. Example: Function/Class/Object
Defect Trigger – ODC classification according to the defect trigger taxonomy defined in section 5.3.3. Example: Concurrency
Defect Impact – ODC classification according to the defect impact taxonomy defined in section 5.3.4. Example: Reliability
Comment – Field used to add comments related to the defect. This might be useful to complement the defect information or to justify some of the classification of the other fields. Example: The architecture should be updated to reflect the source code (or vice-versa).
Notes – Notes about the defects classification. This field has been used to enhance the defects classification taxonomy, to document the classification doubts and to propose new taxonomy items or taxonomy item merges. Example: ODC classification reviewed and confirmed.
Activity – V&V activity that led to uncovering the issue. In our case this was the ISVV task that was applied and that led to the raising of the issue (requirements verification, code inspections, testing, etc.). Example: Requirements verification
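Since the Desc field anticipates that issues may eventually be processed in an automated way, the template also maps naturally onto a machine-readable record. The sketch below is one possible rendering of the template as such a record; the class name and field comments are illustrative assumptions, and the enumerated values mentioned in the comments are only the examples listed in this annex.

```python
# Minimal sketch of the data collection template as a machine-readable record.
# Field names follow the template above; allowed values would need to be
# completed from the full taxonomies (sections 5.3.2-5.3.4), not just the examples.
from dataclasses import dataclass
from typing import Optional

@dataclass
class IssueRecord:
    number: str            # Unique identifier, e.g. "001"
    project: str           # e.g. "SYS02"
    subsystem: str         # e.g. "SS-02"
    domain: str            # e.g. "Space"
    system_type: str       # e.g. "OB ASW"
    issue_title: str       # short issue title
    desc: str              # detailed description of the issue
    classification: str    # severity: Minor, Major, Catastrophic, Comment, ...
    problem_type: str      # original ISVV classification, e.g. "External Consistency"
    phase_detected: str    # Requirements, Design, Implementation, Testing, Operations, ...
    phase_applicable: str  # phase where the issue was actually introduced
    defect_type: str       # ODC defect type (section 5.3.2)
    defect_trigger: str    # ODC defect trigger (section 5.3.3)
    defect_impact: str     # ODC defect impact (section 5.3.4)
    activity: str          # V&V activity that uncovered the issue
    comment: Optional[str] = None  # free-text complement or justification
    notes: Optional[str] = None    # classification doubts, taxonomy proposals
```

A record like this can be filled directly from the rows collected with the template and then fed to the classification and root cause analysis steps of the proposed process.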


Annex E. Definitions and Abbreviations

A. Definitions

Availability – The probability that a system is operating satisfactorily at any point in time under stated conditions. It is a measure of how often an item fails (reliability) and how quickly it can be restored to operation (maintainability); an illustrative quantitative relation is given after this list.
Compensating Provision – Actions available or that can be taken to negate or reduce the effect of a failure on a system.
Critical Failure – Failure of a system/equipment unit that causes an immediate cessation of the ability to perform a required function.
Defect – Non-conformity with the system specifications or with any procedural requirements and standards.
Dependability – Measure of a system's availability, reliability, maintainability, maintenance support performance, and, in some cases, other characteristics such as durability, safety and security.
Failure – The incapacity of a system to conduct its required functions within specified performance requirements.
Fault Detection – Ability to detect malfunctions in real time, as soon and as surely as possible.
Fault Isolation – Ability to identify the root cause of a fault by isolating the components whose operation mode is not nominal.
Gap – Difference between the specification and the actual implementation, or missing steps or procedures to fulfil a specific task according to a reference (standard).
Interface – A shared boundary between two functional units, defined by functional characteristics, common physical interconnection characteristics, signal characteristics, or other characteristics, as appropriate. (ISO 2382-1)
Issue – Found when the application does not conform to the requirement specification. Used in this document with the same meaning as "Defect".
Maintainability – The ability of an item, under stated conditions of use, to be retained in, or restored to, a state in which it can perform its required functions, when maintenance is performed under stated conditions and using prescribed procedures and resources. (BS 5760-0:1986)
Maintenance – All actions necessary for retaining an item in, or restoring it to, a specified serviceable condition. It includes inspection, testing, classification in terms of serviceability, repair, modifications and recovery.
On-line Root Cause Analysis – Root Cause Analysis performed by distributed teams with the help of tools that allow on-line collaboration and communication.
Orthogonality – Orthogonal means "involving right angles" (from Greek ortho, meaning right, and gon, meaning angled). Orthogonality, extended to general use, is the characteristic of being independent (relative to something else); it also means non-redundant and non-overlapping.
Quality – The degree to which a component, system or process meets specified or implied requirements and/or user/customer needs and expectations.
Reliability – The ability of a system or component to perform its required functions under stated conditions for a specified period of time.
Robustness – The degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.
System – An interdependent group of people, objects, and procedures constituted to achieve defined objectives or some operational role by performing specified functions. A complete system includes all of the associated equipment, facilities, material, computer programs, firmware, technical documentation, services, and personnel required for operations and support to the degree necessary for self-sufficient use in its intended environment.
Testability – A characteristic of an item's design which allows the status (operable, inoperable, or degraded) of that item to be confidently determined in a timely manner. (MIL-HDBK-2084)
Traceability – The degree to which a relationship can be established between two or more products of the development process, especially products having a predecessor-successor or master-subordinate relationship to one another; e.g., the degree to which the requirements and design of a given system element match. (IEEE Std 610.12-1990)
Validation – The process whereby requirements are shown to lead to the expected functionality and performance.
Verifiability – The quality or state of being capable of being verified, confirmed, or substantiated.
Verification – The process whereby the design or the manufacture/construction of a product is shown to be compliant with its requirements.
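As a complement to the Availability entry above, and only as an illustrative relation rather than a definition drawn from the cited standards, steady-state availability is frequently approximated from the mean time between failures (MTBF, a reliability measure) and the mean time to repair (MTTR, a maintainability measure):

```latex
% Illustrative steady-state relation between availability (A),
% reliability (via MTBF) and maintainability (via MTTR).
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
```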

B. Abbreviations

ARP – Aerospace Recommended Practice
ASIC – Application-Specific Integrated Circuit
BME – Budapest University of Technology and Economics
CCA – Common Cause Analysis
CENELEC – European Committee for Electrotechnical Standardization
CEO – Chief Executive Officer
CINI – Consorzio Interuniversitario Nazionale per l'Informatica
CISUC – Center for Informatics and Systems of the University of Coimbra
CMMI – Capability Maturity Model Integration
CVS – Concurrent Versions System
DASIA – Data Systems in Aerospace
DEI – Department of Informatics Engineering
DMAIC – Define, Measure, Analyze, Improve, Control
DoE – Design of Experiments
DSN – Dependable Systems and Networks
ECSS – European Cooperation for Space Standardization
EDA – Efficient Defect Analyser
ESA – European Space Agency
ETCS – Electronic Throttle Control System
FAA – Federal Aviation Administration
FDIR – Fault Detection, Isolation and Recovery
FFPA – Functional Failure Path Analysis
FMEA – Failure Modes and Effects Analysis
FMECA – Failure Modes, Effects and Criticality Analysis
FMES – Failure Modes and Effects Summary
FPGA – Field-Programmable Gate Array
GSM – Global System for Mobile Communications
HW – Hardware
IBM – International Business Machines
IEC – International Electrotechnical Commission
IEEE – Institute of Electrical and Electronics Engineers
ISO – International Organization for Standardization
ISSRE – International Symposium on Software Reliability Engineering
ISVV – Independent Software Verification and Validation
IVV – Independent Verification and Validation
IV&V – Independent Verification and Validation
KLOC – Thousand Lines of Code
LOC – Lines of Code
MoD – Ministry of Defense
NASA – National Aeronautics and Space Administration
ODC – Orthogonal Defect Classification
PA – Product Assurance
PDF – Portable Document Format
QA – Quality Assurance
RAMS – Reliability, Availability, Maintainability and Safety
RCA – Root Cause Analysis
RCDC – Root Cause Defect Classification
RID – Review Item Discrepancy
RQ – Research Question
SDP – Software Development Process
SFMECA – Software Failure Modes, Effects and Criticality Analysis
SIPOC – Suppliers, Inputs, Processes, Outputs, Customers
SOHO – SOlar Heliospheric Observatory
SVF – Software Validation Facility
SVN – Apache Subversion
SW – Software
TC – Telecommand
TM – Telemetry
UK – United Kingdom
UML – Unified Modelling Language
US – United States
UT – Unitary Tests
V&V – Verification and Validation
