
Toward Improved Traceability of Safety Requirements and State-Based Design Models

A Dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

DOCTOR OF PHILOSOPHY

at the

UNIVERSITY OF CINCINNATI

COLLEGE OF ENGINEERING AND APPLIED SCIENCE

March 2021

by Mounifah Alenazi

M.Sc. Kennesaw State University, May 2016

Thesis advisor and Committee chair:

Nan Niu, Ph.D.

Abstract

Traceability has long been recognized as important in building safety-critical systems and is therefore required by many government regulations. For example, the Federal Aviation Administration's (FAA) standard DO-178B specifies that developers must be able to demonstrate traceability of designs against requirements. In systems engineering projects, the development of complex and dependable systems like autonomous vehicles relies increasingly on the use of the Systems Modeling Language (SysML). In fact, SysML has become a de facto standard for systems engineering. Effective traceability in such systems can be very costly and difficult. Researchers have therefore proposed many techniques to automatically establish and evolve trace links for high-assurance projects. Various research approaches use information retrieval-based tracing methods to automatically recover trace links between modeling artifacts. For example, to verify a safety requirement, a query is used to retrieve the related elements in the design models. Our ability to trace is therefore anchored to the ability to retrieve. While trace retrieval has been the predominant way of automatically creating links, its performance is yet to be satisfactory for broad industrial adoption, and many false positives remain a significant challenge. In this thesis, we present a novel approach that overcomes this challenge. In particular, the work in this thesis has three main objectives. The first is to identify and address the research challenges of identifying trace links in the context of SysML models. For this objective, we empirically investigate whether traditional traceability approaches using textual information could yield promising results in our context. We also conduct a comprehensive investigation of traceability features within state-of-the-practice SysML modeling tools to understand how the traceability information is supported and managed in these tools.
The second objective is to leverage mutation analysis and process mining to verify safety requirements. For this objective, we first carry out a systematic mapping study to identify the common modeling mistakes in SysML. Our goal is to understand the scope of these mistakes (the incorrect links), their types, and their implications in model-driven requirements engineering, and then to use these mistakes as a basis to identify mutation operators. Once the mutants are created, they undergo model checking so as to automatically verify the safety requirements. Building this foundation is a necessary step that facilitates the third objective, which is to tackle the false positives that have plagued automated requirements traceability. Rather than striving to define an accurate tracing mechanism, which often ends up with many imperfect links, our core idea is to exploit the mutants (imperfect tracing targets) and then take full advantage of them to discover the traceability links. Checking the requirements over the mutants leads to the distinction between killed and survived mutants. We leverage the underlying killed-survived distinction and develop a correlation analysis procedure to identify the traceability links. The results show considerable precision improvements compared with the state of the art.

Acknowledgements

First and foremost, I would like to express my deepest appreciation to my advisor Dr. Nan Niu for his support and enthusiastic encouragement throughout my graduate studies. I could not have finished this dissertation without his continuous guidance. Working under his supervision has been an unforgettable learning experience for me. I am grateful for the tremendous amount of time and effort he devoted to not only discussing my ideas, providing feedback and suggestions, collaborating with me, and celebrating our achievements, but also allowing me to present our work at top-tier conferences and meet very well-known researchers in our field. Despite his busy schedule, he was always available and generous in sharing his experiences on academic life and beyond. Dr. Niu set a great example for me as a mentor and research supervisor. I am very grateful to my committee members Raj Bhatnagar, Chia Yung Han, Carla Purdy, as well as my external examiner Gunter Mussbacher for serving on my committee and giving valuable and constructive comments. I thank Professors Dan Lo, Michael Franklin, Frank Tsui from KSU and George Purdy from UC for their impact on my academic life. It was an honor to be one of their students. Their impact will last forever. I would like to thank all the members of our lab, especially Wentao, Hemanth, Rue, Zedong, and Xuanyi, for collaborations, discussions, and friendships. I have always enjoyed our conversations. Special thanks to Abhijith for the good discussions and feedback. I also thank my best friends Asma, Khitam, Mona, and Fatma for the great times we spent together. I thank my country, the Kingdom of Saudi Arabia, for supporting me and my family throughout my graduate studies. I would also like to thank the University of Cincinnati for the UGS Award and for providing me the opportunity to pursue my doctoral studies. Finally, I would like to thank my parents, my brothers and sisters for their support and love and for always praying for me.
My special thanks go to my dear husband Fahad for all his support and encouragement throughout these years. Thanks for your understanding and sacrifice. Thanks for helping me achieve my dream. Thanks to my kids, Faisal, Osama, and Raed. You have been my motivation, my inspiration and drive. This dissertation is dedicated to you.

Contents

Abstract
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Scope
1.3 Thesis Contribution
1.4 Thesis Organization

2 Background and Related Work
2.1 Systems Modeling Language (SysML)
2.2 Traceability
2.3 Mutation Analysis
2.4 Summary

3 Assuring Safety Requirements Using Textual Information
3.1 Introduction
3.2 Experimental Setup
3.3 Subject System
3.4 Results and Analysis
3.5 Discussion
3.6 Summary

4 SysML Modeling Mistakes: A Systematic Literature Mapping
4.1 Introduction
4.2 Related Work
4.3 Mapping Study Design
4.4 Results and Analysis
4.5 Concluding Remarks
4.6 Summary

5 Tracing Safety Requirements and State-Based Design Models
5.1 Introduction
5.2 Running Example
5.3 Traceability Information Model
5.4 Mutation-Driven Traceability
5.5 Experimental Evaluation
5.6 Summary

6 Conclusions and Future Work
6.1 Thesis Summary
6.2 Limitations
6.3 Future Directions

Bibliography

List of Figures

2.1 SysML diagrams and their relationships with UML 2 (adapted from [52])
2.2 Example of a simple traceability tree
2.3 Integration analysis of SysML and model checking (adapted from [149])

3.1 Transmission Control Module (TCM) [138]
3.2 Fault Tree Analysis
3.3 User Interface of the V-PLC [138]
3.4 F2 metric for similarity measures
3.5 Integrating a virtual PLC in SysML models (adapted from [21])
3.6 Fault Tree Analysis Example
3.7 Tree-based coverage for similarity measure S1 (left) and S2 (right)

4.1 SysML reviewed in our study (Figure 13 in PS2)
4.2 Distribution of the 42 SysML mistake types
4.3 Observability of the 42 mistakes in SysML models
4.4 SysML diagrams and mistake types
4.5 SysML diagrams and mistake observability
4.6 SysML mistakes’ impacts on requirements
4.7 Illustration of SysML mistakes’ impacts on requirements

5.1 State machine diagram (SMD) of the water distiller example (adapted from [51])
5.2 Traceability information contextualizing the artifacts and relations relevant to our approach
5.3 Overview of our mutation-driven traceability approach where mutants are created by modifying the tracing target in small ways to mimic typical modeling errors (1); mutants are then model checked (2) to identify the slice-trace (3)
5.4 Event log snippet showing: (1) the SMD of Figure 5.1 (top records whose case ID=“original”), (2) the mutant resulting from flipping t5 (shaded records whose case ID=“mo4 t5”), and (3) the syntactic change of t5 flipping (dotted box)
5.5 Correlation analysis of the running example’s SMD mutants (black cell shows the correlation is unknown)
5.6 One SMD design of the adaptive cruise control (ACC) under our study
5.7 Calibrating threshold K of our approach
5.8 Ablation results of removing one and only one category of mutation operators

List of Tables

2.1 Traceability Support within SysML Development Environments: Trace link types (source → target). (1) Requirements → use cases. (2) Use cases → implementation models. (3) Functional requirements → SysML models. (4) Nonfunctional requirements → SysML models. (5) Requirements → SysML models. (6) Requirements → test cases. (7) High-level SysML designs → low-level SysML models. Tasks: (a) simulating, (b) navigation, (c) reporting, and (d) traceability editing.

3.1 NLP Similarity Measures Results before Query Expansion
3.2 NLP Similarity Measures Results after Query Expansion

4.1 Software defect classifications
4.2 Primary studies listed chronologically (year of publication) and then within the same year alphabetically (first author)
4.3 Mistakes mentioned in more than one primary study
4.4 Evidence levels of the primary studies
4.5 Mistakes reported in industrially relevant primary studies

5.1 State Machine Diagram (SMD) Mutation Operators
5.2 Mutation Operators of Table 5.1 Grounded in the Literature of SMD Modeling
5.3 Subject System Characteristics (integers represent total numbers whereas decimal numbers represent the averages)
5.4 Tracing Accuracy (BS refers to backward slicing [81], FS refers to forward slicing [103], and MD refers to our mutation-driven approach)

Chapter 1

Introduction

1.1 Motivation

Requirements traceability refers to the ability to describe and follow the life of a requirement [59]. In systems engineering projects, the development of complex and dependable systems like autonomous vehicles relies increasingly on the use of the Systems Modeling Language (SysML). In fact, SysML has become a de facto standard for systems engineering [132]. Effective traceability in such systems can be very costly and difficult, as inspectors may have to browse through the models and manually analyze large numbers of links between safety requirements and design models. Various research approaches use information retrieval (IR)-based tracing to automatically recover trace links between modeling artifacts (e.g., a safety requirement and design elements). For example, to verify a safety requirement, a query is used to retrieve the related elements in the design models. Our ability to trace is therefore anchored to the ability to retrieve. While trace retrieval has been the predominant way of automatically creating links [20, 33], its performance is yet to be satisfactory for broad industrial adoption, and many false positives remain a significant challenge. In most cases, automated trace retrieval methods can achieve 100% recall, but very low precision rates of 10-30% [31].

In this thesis, we present a novel approach to tackling the false positives that have plagued automated traceability research for decades [15, 20, 32, 68, 69, 71, 109, 116]. The idea is to intentionally generate many false positives from a state-based model (i.e., the tracing target) and then to check whether a safety requirement (i.e., the tracing source) is met in them in order to find the actual trace. Our key insight is that false positives are “close” to the model elements in the real trace, but that “closeness” turns out to be quite faulty after some tracing is done. This faulty closeness is a main reason why false positives are mingled with real elements, hurting precision.
If we can exploit faulty closeness before tracing, then this proactive approach will provide new capabilities of addressing the low precision challenge of automated traceability. In this dissertation, we introduce new techniques to automatically verify and trace safety requirements at the modeling level. These techniques are based on mutation analysis and process mining.
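
To make the recall and precision trade-off above concrete, the following is a minimal sketch of IR-based trace retrieval using plain bag-of-words cosine similarity. It is an illustration under stated assumptions, not the retrieval method evaluated in this thesis: the requirement text, the element identifiers (`s_heating`, `t_level_low`, and so on), the answer set, and the threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs only (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, elements, threshold):
    """Return the ids of elements whose similarity to the query meets the threshold."""
    q = Counter(tokenize(query))
    return {eid for eid, text in elements.items()
            if cosine(q, Counter(tokenize(text))) >= threshold}


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a known answer set."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical safety requirement (tracing source) and model element labels (tracing target).
requirement = "The heater shall turn off when the water level is low"
elements = {
    "s_heating": "heating water",
    "t_level_low": "water level low, turn heater off",
    "s_draining": "draining residue water",
    "s_idle": "idle",
}
relevant = {"t_level_low"}  # hypothetical answer set

retrieved = retrieve(requirement, elements, threshold=0.1)
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), round(precision, 2), recall)
# ['s_draining', 's_heating', 't_level_low'] 0.33 1.0
```

With a permissive threshold, every element sharing a word such as "water" is returned, so the single correct link is found (full recall) but two of the three candidates are false positives.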

1.2 Scope

Traceability is a key activity in requirements engineering that enables reasoning about the fulfillment of a particular requirement or a set of requirements and is required for certification in most safety-critical systems (e.g., DO-178B). Recent effort has been made in researching specific areas of the traceability challenges, laid out in a traceability roadmap on “software traceability: trends and future directions” [32], such as trace link creation, visualization, and maintenance. Traditional approaches focus mainly on textual software artifacts, e.g., identifying a trace link between a source artifact (a requirement in its textual form) and a target artifact (source code). While the textual information is helpful in identifying the candidate traceability links, the scope of applicability of IR-based trace recovery techniques is the set of systems which are “textually rich”. However, systems engineering projects have become “model rich”. System modeling reduces ambiguity, misunderstanding, and misinterpretation of system specifications. In fact, the main objective of the International Council on Systems Engineering Vision 2020 is to replace a document-centric approach with a model-centric one, i.e., shifting the records of authority from documents to digital models, which enables engineers from multiple domains to easily capture requirements, understand design change impacts, and analyze system design before it is built [66]. Recently, Holtmann et al. [72] claim that researchers and practitioners lack a concise terminology to discuss aspects of requirements traceability in which engineers rely on models.

In this thesis, we take an incremental approach to identify and address the research challenges of identifying trace links in the context of SysML models. We first conduct a comprehensive investigation of traceability features within state-of-the-practice SysML modeling tools to understand how the traceability information is supported in these tools.
Second, since textual information has been recognized as an important factor for automatically recovering trace links, we empirically investigate whether traditional traceability approaches using textual information could yield promising results in our context, i.e., where model-driven engineering practices such as SysML models are adopted. The use of textual information to identify traceability links has been attempted in several studies [16, 50, 126, 131]. The reason is that most of the models and diagrams contain textual information. Therefore, artifacts having high textual similarity probably share several concepts, so they are likely good candidates to be traced from one another [93]. Nejati et al. [103] proposed an approach to automatically identifying the impact of requirements changes on system design when the requirements and design elements are expressed in SysML models. Two main steps are involved in their approach: extracting a set of impacted elements by computing reachability analysis and ranking the resulting set using NLP techniques. Motivated by their work, we adapted NLP techniques to empirically investigate whether the trace links using textual information can assist in verifying safety requirements under test. We then integrate the concept of obstacle analysis to recover situations in which a safety requirement will not be satisfied. We use a virtual Programmable Logic Controller (PLC) in the context of SysML as our subject system. The results show that textual information gives only 59% assurance using the F2-measure (precision of 31%). From the obtained results, we observed the following: (1) existing tools lack the ability to trace a safety requirement to specific modeling elements (e.g., a state).
That is, the scope of tracing within a modeling diagram (e.g., state machine) is generally missing, (2) a cutoff point in the ranked list must be provided to filter out irrelevant elements that are not required for inspection, (3) as precision is very low, the inspector must provide additional queries or keywords to retrieve the actually impacted elements, and (4) the ranked list of the impacted elements does not take into consideration the relationships between these elements. The proposed techniques using textual cues support “after-the-fact” tracing and strive to retrieve all the correct links and only the correct ones, i.e., to achieve both recall and precision at a 100% level. Therefore, our ability to trace is anchored to the ability to retrieve. To overcome this challenge, we look at the problem from a different angle; rather than striving to define an accurate tracing mechanism, which often ends up with many imperfect links, our core idea is to create many imperfect tracing targets and then take full advantage of them to discover the links. Our proposed solution contains three main components: creating mutants, verifying model mutants, and identifying trace links.

Creating mutants. Our work leverages mutation analysis. Mutation analysis is commonly used as a fault-based technique. Given a program, mutants are created by simple changes that are intended to represent the mistakes often made by programmers. Extending mutation analysis, we carry out a systematic mapping study to identify the common modeling mistakes in SysML. Our objective is to collect evidence to not only understand the defects in SysML models presented in the contemporary literature, but to do so with an explicit emphasis on practice, real-world relevance, and industrial readiness. We also discuss the impacts of the mistakes in model-driven requirements engineering. We use these mistakes as a basis to identify mutation operators.
Similar to mutating a program, the identified mutation operators are syntactic modifications of the design model (addition, change, or deletion). Different from mutating a program that is textual, we automatically mutate the graphical state machine diagram by first exporting the model into an XMI file. As a result, our approach takes as input state machines in XMI format, which are exported from existing SysML tools. We perform this step in the Cameo Systems Modeler tool [110].

Verifying model mutants. Once the mutants are created, they undergo model checking so as to automatically verify the safety requirements. An innovative aspect of our approach is to leverage Linear Temporal Logic (LTL) model checking within process mining (i.e., the ProM tool [123, 147]). Process mining employs data mining algorithms to extract operational knowledge from logs [146]. These event logs record instances (or cases) of some underlying process, but automatically extracting that process is difficult when there is a lot of flexibility [147]. Event logs can also be used for verification. Van der Aalst et al. [147] develop a verification technique based on event logs for business processes. Specifically, their approach uses an extension of LTL and combines this with a standard XML format to store event logs. The language is developed to formulate properties in the context of event logs. Given an event log, this LTL extension can assist in verifying certain properties. Our model mutants are a good fit for process mining in that the flexibility of each mutant is restricted to a single, simple, and syntactic change over the original design model. Once the safety properties are met, it becomes valuable to identify the trace links that lead to their satisfaction. Maintaining these traces can be helpful in several tasks.
For example, inspectors or design reviewers can use these traces to perform impact analysis for change requests and to conduct different types of coverage analysis, e.g., test case implementation. This is reflected in CoEST’s definition of traceability: “the ability to interrelate any uniquely identifiable software artifact to any other, maintain required links over time, and use the resulting network to answer questions of both the software product and its development process” [33]. It is impractical to maintain trace links manually during software development and maintenance [115]. Developers tend not to perform this activity to an appropriate level of detail. Therefore, traceability information becomes out of date or is completely missing during software development.

Identifying trace links. Checking the requirements over the mutants leads to the distinction between the killed and survived mutants. We leverage the underlying killed-survived distinction, and develop a correlation analysis procedure to identify the traceability links. The results show considerable precision improvements compared with the state of the art.
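
The three components above (creating mutants, verifying them, and using the killed-survived distinction) can be sketched on a toy state machine. This is a deliberately simplified illustration and not the procedure developed in Chapter 5: the transition table, the two mutation operators, and the requirement are hypothetical; a plain Python predicate stands in for LTL model checking over event logs; and the correlation analysis is reduced to collecting the transitions touched by killed mutants.

```python
# Hypothetical state machine: transition id -> (source state, event, target state).
ORIGINAL = {
    "t1": ("Off", "power_on", "Filling"),
    "t2": ("Filling", "level_high", "Heating"),
    "t3": ("Heating", "level_low", "Filling"),
    "t4": ("Heating", "power_off", "Off"),
}


def delete(transitions, tid):
    """Mutation operator: remove one transition."""
    return {k: v for k, v in transitions.items() if k != tid}


def flip(transitions, tid):
    """Mutation operator: swap the source and target of one transition."""
    src, event, dst = transitions[tid]
    mutant = dict(transitions)
    mutant[tid] = (dst, event, src)
    return mutant


OPERATORS = {"delete": delete, "flip": flip}


def holds(transitions):
    """Hypothetical safety requirement: a low water level always takes the model out of Heating."""
    exits = [dst for src, event, dst in transitions.values()
             if src == "Heating" and event == "level_low"]
    return bool(exits) and all(dst != "Heating" for dst in exits)


def classify(original, prop):
    """Apply every operator to every transition; a mutant is killed if the verdict changes."""
    baseline = prop(original)
    killed, survived = [], []
    for tid in original:
        for name, op in OPERATORS.items():
            mutant = op(original, tid)
            (killed if prop(mutant) != baseline else survived).append((name, tid))
    return killed, survived


def candidate_trace(killed):
    """Simplification: elements whose mutation changes the verdict are trace candidates."""
    return {tid for _name, tid in killed}


killed, survived = classify(ORIGINAL, holds)
print(killed)                   # [('delete', 't3'), ('flip', 't3')]
print(len(survived))            # 6
print(candidate_trace(killed))  # {'t3'}
```

Only mutants that disturb `t3` change the verdict, so `t3` is the sole candidate, whereas a textual query for "water" or "heating" would likely have returned several states.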

1.3 Thesis Contribution

Our ultimate goal is to design a framework to improve the tracing results of safety requirements in the context of SysML modeling. The main contributions of this dissertation can be summarized as follows:

• We present a new method using the simulation log of an executed SysML model, and empirically investigate whether the trace links using the textual information can assist in verifying safety requirements.

• We integrate Fault Tree Analysis (FTA) for assessing the completeness of the retrieved trace results.

• We conduct a comprehensive investigation of traceability features within state-of-the-practice tools to understand how the traceability information is supported in these tools.

• We conduct a systematic literature mapping of SysML common modeling defects and use them as a basis to define mutation operators.

• We create a tool that automatically simulates these mistakes and integrate mutation analysis and process mining-based model checking to verify safety requirements.

• We present a novel algorithm based on the correlation analysis derived from mutation analysis and process mining to identify the traceability links.

The overall contribution of the thesis is the development of a new approach to automatically verify and improve tracing of safety requirements in the context of SysML modeling.

1.4 Thesis Organization

The remainder of this thesis is organized as follows:

Chapter 2: We review the necessary background information on SysML, traceability, and mutation analysis. A comprehensive investigation of traceability features within state-of-the-practice tools is also presented in this chapter.

Chapter 3: We take an incremental approach to identify and address the research challenges of identifying trace links in the context of SysML models by empirically investigating whether traditional traceability approaches, e.g., NLP methods, could yield promising results. In particular, we empirically investigate whether the trace links using the textual information in the simulation logs from SysML models can assist in verifying safety requirements.

Chapter 4: We present a systematic literature mapping of SysML modeling defects. Our objective is to collect evidence to not only understand the defects in SysML models presented in the contemporary literature, but to do so with an explicit emphasis on practice, real-world relevance, and industrial readiness.

Chapter 5: Building upon the results obtained from our mapping study, we create a tool that automatically simulates these defects and then develop a novel approach to trace safety requirements by leveraging mutation analysis and process mining.

Chapter 6: Finally, we provide a summary of the thesis and discuss the findings. Moreover, directions for future research are outlined.

Chapter 2

Background and Related Work

In this chapter, we review the necessary background information on SysML modeling (Section 2.1), traceability (Section 2.2), and mutation analysis (Section 2.3).

2.1 Systems Modeling Language (SysML)

Model-Based Systems Engineering (MBSE), outlined in the International Council on Systems Engineering (INCOSE) Vision 2020, is a methodology of modeling to support systems requirements, design, analysis, and verification and validation activities starting from the conceptual design phase and continuing through development and later lifecycle phases [36]. The overall objective of the vision is to replace a document-centric approach with a model-centric one, i.e., shifting the records of authority from documents to digital models such as UML and SysML, which enables engineers from multiple domains to easily capture requirements, understand design change impacts, define traceability paths, and analyze system design before it is built [66].

[Figure: taxonomy of SysML diagram types grouped under behavior, requirement, and structure diagrams, with a legend marking each type as the same as UML 2, modified from UML 2, or a new diagram type.]

Figure 2.1: SysML diagrams and their relationships with UML 2 (adapted from [52]).

SysML, first adopted by the Object Management Group (OMG) in 2006, is a visual modeling language designed to provide simple but powerful constructs for modeling a wide range of systems engineering problems [113]. It supports the specification, analysis, design, verification and validation of complex systems that include components for hardware, software, data, personnel, procedures, and facilities. SysML extends UML 2, which tends to be software-centric. It reuses seven of UML 2’s fourteen diagrams, and adds two new diagrams (requirement and parametric diagrams) for a total of nine diagram types. Figure 2.1 shows these diagrams, which cover four main perspectives of systems modeling:

• Behavior: The behavior diagrams include the use case, activity, sequence, and state machine diagrams. A use case diagram provides a high-level description of functionality that is achieved through user interactions with systems or system parts. The activity diagram represents the flow of data and control between activities. A sequence diagram represents the interaction between collaborating constituents of a system. The state machine diagram describes the state transitions and actions that a system or its parts perform in response to events.

• Structure: The system structure can be represented by a block definition diagram, which describes the system, its subsystems, and all the system elements. The structure can also be shown in an internal block diagram, which depicts system parts, ports, and connectors. Finally, the package diagram is used to organize the dependencies between the components that make up the system.

• Parametric: The parametric diagram represents constraints on system property values such as performance, reliability, and mass properties. It serves as a means to integrate the specification and design models with engineering analysis models.

• Requirements: The requirement diagram captures requirements and the derivation, satisfaction, verification, and refinement relationships. The relationships provide the capability to relate requirements to one another and to trace requirements to system design models and test cases.
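
As a rough illustration of how the requirement relationships listed above support tracing, the sketch below stores relationships as plain records and collects the design elements and test cases linked to a requirement, following derived requirements along the way. The class, the function names, and the example identifiers (`R1`, `HeaterController`, `TC_LowLevel`) are hypothetical, and the kind labels are an assumed encoding rather than any modeling tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    kind: str    # assumed labels: "deriveReqt", "satisfy", "verify", "refine"
    source: str  # the derived requirement, design element, or test case
    target: str  # the requirement being derived from, satisfied, verified, or refined


def related(requirement, relationships, kind):
    """Sources of all relationships of the given kind that point at the requirement."""
    return [r.source for r in relationships if r.kind == kind and r.target == requirement]


def trace(requirement, relationships):
    """Collect satisfying and verifying elements, following derived requirements."""
    result = {"satisfy": [], "verify": []}
    pending, visited = [requirement], set()
    while pending:
        req = pending.pop()
        if req in visited:
            continue
        visited.add(req)
        for kind in result:
            result[kind].extend(related(req, relationships, kind))
        pending.extend(related(req, relationships, "deriveReqt"))
    return result


# Hypothetical model fragment.
RELATIONSHIPS = [
    Relationship("deriveReqt", "R1.1", "R1"),
    Relationship("satisfy", "HeaterController", "R1.1"),
    Relationship("verify", "TC_LowLevel", "R1.1"),
    Relationship("refine", "UC_BoilWater", "R1"),
]

print(trace("R1", RELATIONSHIPS))
# {'satisfy': ['HeaterController'], 'verify': ['TC_LowLevel']}
```

Note that such relationships link a requirement to whole model elements; they do not by themselves narrow the trace to individual states or transitions inside a diagram.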

Compared to UML 2, a couple of SysML features offer advantages for systems engineers [106]: (1) SysML reduces the bias of UML toward software since SysML’s semantics are more flexible and expressive. In particular, UML classes are replaced with blocks in SysML. A block is a modular unit of structure in SysML that is used to define physical entities (e.g., system, system component part, external systems, or items that flow through the system), as well as conceptual entities or logical abstractions. (2) The built-in requirement diagram allows for natural language requirements to be modeled and traced throughout the system’s life cycle.

2.2 Traceability

In their seminal work, Gotel and Finkelstein [60] defined traceability as “the ability to describe and follow the life of a requirement, in both a forwards and backwards direction (i.e., from its origins, through its development and specification, to its subsequent deployment and use, and through all periods of on-going refinement and iteration in any of these phases).” Figure 2.2 shows what artifacts are traced in systems development. For example, the requirement RQ1 in Figure 2.2 is realized by the design model D1.1, implemented by the source code SC1.1.1 and SC1.1.2, and tested by the test cases TC1.1.1.1 and TC1.1.1.2.

[Figure: tree linking requirements (RQ1 to RQ3) to design models (D1.1 to D3.4), source code (SC1.1.1 to SC3.4.5), and test cases (TC1.1.1.1 to TC3.4.5.6).]

Figure 2.2: Example of a simple traceability tree.

Since there are many artifacts to trace, it is important to clearly define a traceability information model (TIM). Having a well-defined TIM is crucial to the development of safety-critical systems. A TIM explicitly records which artifacts are important and which are not under the current traceability consideration. In other words, a TIM defines the relationships between the different artifacts created during system development [95]. As claimed by Maro et al. [95], practitioners lack concrete guidelines when creating TIMs, especially in large systems engineering companies, e.g., the TIM is not formally documented.

Traceability is essential for assuring that software and systems are safe to use. However, gaps exist between what is prescribed by the safety regulations/guidelines and how traceability is implemented in practice [125]. To reduce manual effort, various approaches have been proposed based on different techniques such as information retrieval, machine learning, code analysis, and probabilistic models to generate candidate trace links. The use of IR methods to automatically recover the traceability information within a software project has received much attention [41, 71, 118, 152]. The idea of framing requirement traceability as an IR problem was first introduced by Hayes et al. [70]. However, in most cases, automated trace retrieval methods can achieve 100% recall, but very low precision rates of 10-30%, and significant human effort is needed to evaluate and filter the results. Thus, researchers have proposed different approaches to improve IR techniques. For example, Wang et al. [151] develop a novel algorithm by integrating term-based relevance feedback into the generated candidate trace links. The results show that the algorithm achieves high performance when tracing dependability requirements [150]. De Lucia et al. [40] complement IR methods with a smoothing filter to remove false positives from the textual corpus of artifacts to be traced. The authors apply three different IR methods, namely Vector Space Models, Latent Semantic Indexing, and the Jensen-Shannon similarity model. The results show that the proposed approach improves the tracing performance.

Machine learning algorithms have also been used to generate trace links. For example, Saini et al. [130] recently developed a method to extract domain models from requirements expressed in natural text by combining NLP and machine learning. The generated trace links can be used to execute queries on the extracted domain models. Unterkalmsteiner [141] used domain-specific taxonomies to establish early trace links so that they can be available at later stages. The author developed a recommender system that suggests trace links from requirements to a domain-specific taxonomy based on a series of heuristics. However, the results show that both the control and treatment groups report low confidence in correctness and completeness [141]. Guo and her colleagues [65] created a deep learning tracing architecture by leveraging Word Embedding and Recurrent Neural Network to better capture the requirements semantics in safety-critical domains.
In these domains, the interviews by Goodrum et al. [57] with 14 experienced developers and by Chen et al. [26] with nine safety experts advanced our understanding of practitioners’ views and needs of traceability. To address some of these needs, a family of reusable traceability queries was codified [34], a visual language hiding complex details in querying traceability was proposed [89, 90], and new ways of managing safety stories in agile projects were developed [30, 35].

Traditional approaches focus mainly on textual software artifacts, e.g., identifying a trace link between a source artifact (a requirement in its textual form) and a target artifact (source code). While the textual information is helpful in identifying the candidate traceability links, the scope of applicability of IR-based trace recovery techniques is the set of systems which are “textually rich”. However, systems engineering projects have become “model rich”. Recently, Holtmann et al. [72] claim that researchers and practitioners lack a concise terminology to discuss aspects of requirements traceability in which engineers rely on models. Thus, the authors develop a terminology for model-based traceability that allows requirements engineers and engineers working with models to clear this ambiguity. Czauderna et al. [38] extended the probabilistic network IR method to generate candidate trace links of product or contractual requirements, regulatory codes, and mechatronic models, as long as those models include meaningful textual annotations. To coordinate traces between various mechatronic models, the authors proposed two centralized architectures (push and pull) and one distributed architecture. In the push architecture, an engineer working with an individual model pushes data from that model to a shared repository, and then the data becomes accessible for tracing in other models.
In the pull architecture, a single trace engine pulls artifacts from the models via the support of a scheduler defining the events and/or time intervals at which each model’s data is pulled. In the distributed architecture, an individual trace engine is located alongside each model’s tool/environment, and a centralized registry provides minimalistic logic coordination between those engines. Feldmann et al. [49] proposed a rule-based approach where the correspondences between mechatronic model elements were built heuristically and stored in XML Metadata Interchange (XMI) files on top of the generic Eclipse Modeling Framework. The stored traceability information could then be used for consistency checking.

Model slicing has also been introduced to identify trace links. Model slicing, like program slicing, is proposed to reduce the complexity of large model designs and to facilitate understanding and inspection of models. It has many potential applications such as model checking, model comprehension, and model testing. Various model slicing techniques have been proposed. These techniques are used to extract the data and control dependencies of an element of interest [81], model features [77], or model elements with respect to a given requirement [22]. SysML slicing [22] has been introduced to support verification and validation by yielding smaller models, which are less expensive to inspect. The authors express the relationship between a safety requirement and design models through the use of traceability information for extracting model slices. Empirical evidence of the usefulness of this technique has been investigated, where results show a significant decrease in effort and an increase in decision correctness for inspectors [22]. Korel et al. [81] present a technique for slicing state-based models using dependence analysis. The authors show that a significant reduction of state-based models could be achieved.
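Dependence-based slicing of this kind can be pictured as a reachability computation over a dependency graph. The following is a minimal sketch under stated assumptions: the graph, the element names, and the function `forward_slice` are hypothetical illustrations and are not taken from [22] or [81].

```python
from collections import deque


def forward_slice(dependencies, start):
    """Return every element reachable from `start` along dependency edges."""
    reached = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for successor in dependencies.get(current, ()):
            if successor not in reached:
                reached.add(successor)
                queue.append(successor)
    return reached


# Hypothetical dependency graph: an edge u -> v means that v depends on u.
dependencies = {
    "Idle": ["Accelerating"],
    "Accelerating": ["Cruising", "Braking"],
    "Cruising": ["Braking"],
    "Braking": ["Idle"],
    "Diagnostics": ["Idle"],
}

slice_from_cruising = forward_slice(dependencies, "Cruising")
```

In this toy graph the slice starting at `Cruising` contains every element except `Diagnostics`, which illustrates why a reachability-based slice tends to include most of the model and therefore favors recall over precision.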
Nejati and her colleagues [103] propose a new SysML slicing algorithm based on dependency graphs. To improve automated traceability, Nejati et al. used reachability analysis to perform forward slicing. While the resulting slice (trace) typically has a recall close to 100%, its precision is very low. For example, forward slicing combined with natural language processing achieved the best performance in tracing 16 requirement changes to the design models, and even this best performance had an average precision of only 29.4% [103], meaning that a large number of false positives were generated. The last two approaches, i.e., [81, 103], serve as baselines in our approach. Next, we conduct a comprehensive investigation of traceability features within state-of-the-practice SysML modeling tools to understand how traceability information is supported in these tools.

2.2.1 Traceability within SysML IDEs

In order to understand how the traceability information is supported in SysML tools1, we surveyed six SysML IDEs [153]. Our tool selection is driven by industry practices and includes both proprietary and open-source tools. We relied on product descriptions, market reviews, white papers, case studies, and other online resources for tool selection and the traceability survey. We adapt and refine the four lifecycle areas, namely strategizing, creating, maintaining, and using traces [33], to structure our survey of the six SysML IDEs (Enterprise Architect, MagicDraw, Rational Rhapsody Architect, Astah SysML, Modelio, and Papyrus).

1 Seven PLC tools have also been surveyed [153]; however, they are not included in this thesis.

• Enterprise Architect is a popular tool for industrial automation. It was rated to have best values among the SysML tools [121] and has been used by over 650,000 users. For example, Pantec Automation, a leading house for control solutions, developed a modular design framework based on Enterprise Architect [120].

• Cameo Systems Modeler is a model-based systems engineering tool used widely in the transportation, healthcare, and aerospace industries, as well as in naval defense [111]. As an example, Cameo Systems Modeler assisted Bombardier, the world’s largest manufacturer of both planes and trains, in engineering SysML and other systems models [24].

• Rational Rhapsody Architect is IBM’s integrated systems engineering environment that uses UML and SysML for visual, model-based design. Its industrial users span financial services, communications, distribution, and other domains [73].

• Astah SysML, formerly known as Java and UML Developers Environment (JUDE), was created by Change Vision [25]. It is a lightweight tool for modeling SysML diagrams; for example, the package diagram is not supported, and the tool also lacks model-based simulation capabilities.

Table 2.1: Traceability Support within SysML Development Environments. Trace link types (source → target): (1) Requirements → use cases. (2) Use cases → implementation models. (3) Functional requirements → SysML models. (4) Nonfunctional requirements → SysML models. (5) Requirements → SysML models. (6) Requirements → test cases. (7) High-level SysML designs → low-level SysML models. Tasks: (a) simulating, (b) navigation, (c) reporting, and (d) traceability editing.

| Tool | Link creation: mechanism | Link creation: form | Link creation: type | Link maintenance: persistency | Link maintenance: mechanism | Link usage: tasks |
|---|---|---|---|---|---|---|
| Enterprise Architect | Manual creation | Traceability window; trace/relationship matrix | (1) (2) | | Offline change | (a) (b) |
| Cameo Systems Modeler | Manual creation | Tabular view; dependency matrix/list; relation map | (3) (4) | | Offline change | (a) (b) (c) |
| Rational Rhapsody Architect | Manual creation | Trace matrix | (5) (6) | | Offline change | (a) |
| Astah SysML | Manual creation | Traceability map | (7) | xml file | Offline change | |
| Modelio | Automated generation | Dependency diagram | (7) | Initiate storage | Automated updating | (a) (d) |
| Papyrus | Manual creation | Trace matrix | (1) (5) (6) | | Automated updating | (c) |

Tracing strategy (other): comprehensive trace links stored in some relational form to ensure model consistency.

• Modelio is an open-source modeling tool that supports UML, SysML, and Java code generation [99]. The INCOSE 2012 Tool Vendor Challenge was solved using the Modelio environment, and the proposed solution was based on SysML models.

• Papyrus is another open-source tool. In 2015, Papyrus became an Eclipse project aiming to achieve industrial grade. For example, Papyrus was selected by Sherpa as the underlying model-driven engineering platform because of its full coverage of SysML [45].

As mentioned earlier, we concentrate on link creation, maintenance, usage, and tracing strategy. Table 2.1 lists the in-place traceability support of each of the six SysML tools, organized by the four lifecycle areas. As for link creation, we identified seven types of trace links (source → target), e.g., requirements to use cases and requirements to SysML models. The majority of our surveyed SysML tools store the links as native elements in a trace matrix, relation map, or other relational form. An example is the lightweight Astah SysML tool, which uses a traceability map to place a model at the center of the visualization and presents coarse-grained trace links between the central model and all of the related models and diagrams. For link usage and strategy, we observed four tasks: simulating, navigating, reporting, and trace editing. Simulation plays an important role in SysML development environments. More than half of our surveyed SysML IDEs offer such support, e.g., simulation of activity diagrams. Ports in SysML facilitate the simulation feature. SysML supports two kinds of ports: a full port represents a physical access point (e.g., a mechanical interface), while a proxy port exposes the parts visible to external connectors. The traceability strategy of our surveyed SysML IDEs is to store links in some relational form (e.g., a matrix) to ensure model completeness and consistency. From the results of our survey, we observed that existing tools lack the ability to trace a safety requirement to specific modeling elements (e.g., a state). That is, tracing within a modeling diagram (e.g., a state machine) is generally missing.

2.3 Mutation Analysis

In mutation analysis, faults are automatically seeded into the software artifacts (in most cases, the source code), and the survey by Jia and Harman [76] provides evidence of the applicability and maturity of the technique in software testing. Two main applications exist: measuring a test set’s ability to detect faults and generating additional test cases. In both cases, the distinction between killed and survived mutants is important. If the result of testing a mutant is different from the result of testing the original program, then the mutant is classified as killed; otherwise, it has survived. A test set’s effectiveness can then be scored by the proportion of killed mutants, and additional test cases can be generated to kill the survived mutants. Thus, mutation testing plays an important role in the verification and validation of systems because of its ability to demonstrate the absence of certain faults.

At the model level, researchers have also applied mutation analysis to reveal faults and to guide the generation of new test cases [2, 12, 64, 139]. Aichernig et al. [82] have made the first attempt at generating test cases from mutated UML models. The authors develop a model-based mutation testing approach called MoMuT::UML for generating mutants based on a test model of a UML state machine. The main goal is to generate test cases that reveal the injected faults. The tool compares the behavior of the original model with each of the mutants, and once a difference is found, a test case is created. Sun et al. [140] apply mutation testing to UML activity diagrams and also generate test cases that reveal defects in the model. Granda et al. [62] develop a UML-based mutation tool. The generated mutants are determined by a set of mutation operators. This set of operators is determined by the type of models being tested (e.g., UML state machine, UML activity diagram, or UML class diagram).
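The killed/survived distinction and the resulting score described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the mutant names and test outcomes are hypothetical, and the helper functions are not part of any of the cited tools.

```python
def classify_mutants(original_results, mutant_results):
    """Split mutants into killed and survived by comparing test outcomes."""
    killed, survived = [], []
    for name, results in mutant_results.items():
        # A mutant is killed when any test outcome differs from the original.
        if results != original_results:
            killed.append(name)
        else:
            survived.append(name)
    return killed, survived


def mutation_score(killed, survived):
    """Proportion of killed mutants among all mutants."""
    total = len(killed) + len(survived)
    return len(killed) / total if total else 0.0


# Hypothetical outcomes of three tests on the original model and four mutants.
original_results = ["pass", "pass", "pass"]
mutant_results = {
    "m1_negate_guard": ["pass", "fail", "pass"],
    "m2_remove_transition": ["pass", "pass", "pass"],
    "m3_swap_target_state": ["fail", "fail", "pass"],
    "m4_change_constant": ["pass", "pass", "pass"],
}

killed, survived = classify_mutants(original_results, mutant_results)
score = mutation_score(killed, survived)
```

Here two of the four mutants are killed, so the test set scores 0.5, and the two survivors indicate where additional test cases would be needed.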
Researchers have also integrated model checking into mutation testing to generate a test case from each failed mutant. The idea of integrating model checking with mutation testing was initially introduced by Ammann et al. [13]. The tenet is to automatically generate test cases from killed/detected mutants. From the perspective of model-driven engineering, translating a state machine diagram into a formal model (i.e., the input language) of some model checker is a prerequisite step before applying model checking to the mutated models [48, 83, 155]. Several tools support model checking of state machines, such as vUML [86], USMMC [87], HUGO [80], and TABU [18]. These tools translate a state machine diagram into the input language of a model checker such as SPIN or UPPAAL. The verification can then be accomplished by relying on verification tools for the target languages. Lorber et al. [88] used model checking on mutants of timed automata to generate test cases from these mutants. The approach works by checking safety properties first and then generating test cases. In the context of SysML models, Wang et al. [149] develop an automated approach for translating a SysML state machine into a formal model in NuSMV, which is used to perform safety analysis and verification. Figure 2.3 describes the main steps involved in this translation. The method starts with SysML modeling along with the system specification. This model can be either a functional or an architectural model of the system, depending on the stage of systems development. The model is then translated into a formal model according to some predetermined rules. A model checker is then used to verify whether this formal model satisfies the safety properties. If verification fails, a counterexample is produced showing a diagnostic trace of how the model fails to fulfill certain specifications.
Our work differs fundamentally from these prior approaches in that our intention is not to kill as many mutants as possible, but to use the killed-survived distinction to automatically trace safety requirements. Also, our approach leverages process mining techniques to identify faulty mutants by checking whether mutants satisfy safety properties. Next, we present our empirical investigation on the trace links using the textual information in our context, i.e., where model-driven engineering practices such as SysML models are adopted.

Figure 2.3: Integration analysis of SysML and model checking (adapted from [149]).
[Flowchart elements: system specification and requirements generation; system requirement; safety requirement; SysML model (functional model/architectural model); translation of SysML into formal model; safety analysis based on model checking; verification through model checker; properties verified? Yes: end; No: counterexample.]

2.4 Summary

In this chapter, we reviewed background knowledge and related work relevant to our study. We also introduced our comprehensive investigation of traceability support within SysML tools. In particular, six SysML tools are surveyed, including commercial and open-source tools. Our goal is to understand how the traceability information is supported and managed in these tools.

Chapter 3

Assuring Safety Requirements Using Textual Information

In complex industrial projects, textual information has been recognized as an important factor for automatically recovering trace links in software development. The goal of this chapter is to empirically investigate if the trace links using textual information can assist in verifying safety requirements under test. We integrate the concept of obstacle analysis to recover situations in which a safety requirement will not be satisfied. Therefore, we use fault tree analysis to validate the safety requirements, and further use the elements of the fault tree to evaluate the quality of the automatically recovered trace links. We use a virtual Programmable Logic Controller (PLC) in the context of SysML as our subject system [9].

3.1 Introduction

The availability of traceability has proven vital to several engineering activities such as verification and validation (V&V) and software reuse [7][108][136]. Traceability refers to the ability to interrelate any uniquely identifiable artifact to any other [33]. Several authors have applied IR techniques and NLP approaches to recover trace links between different components. The reason is that most models and diagrams contain textual information. Artifacts with high textual similarity probably share several concepts, so they are likely good candidates to be traced from one another [93]. Nejati et al. [104] proposed an approach to automatically identify the impact of requirements changes on system design when the requirements and design elements are expressed in SysML models. Based on model slicing, this approach uses natural language processing (NLP) methods to rank the elements resulting from slicing.

Although significant efforts have been devoted to retrieving traceability links, little is known about how the retrieved links are used to assure a virtual PLC (V-PLC) when integrated with safety analysis techniques. Programmable logic controllers (PLCs) are often used to implement safety-critical systems [114]. PLCs based on IEC 61131 represent the state of the art in industrial automation systems. However, these PLCs have limited capabilities for describing heterogeneous disciplines [21]. To integrate the specific knowledge of each discipline, MBSE was introduced [66]. Brecher et al. [21] proposed an approach for integrating a PLC into a model-based development system. The aim of the proposed approach is to validate a V-PLC before connecting it to the real system by using executable SysML models. The challenge is that the simulation model of any system is only an approximation of the actual system, no matter the amount of time spent on building the model [96].
However, in requirements engineering, a technique such as obstacle analysis is used to recover situations in which a safety requirement will not be satisfied. Dealing with obstacles is very important for a safety critical system [8, 148]. In safety critical systems, safety analysis is performed using safety analysis techniques such as Fault Tree Analysis (FTA) and Failure Modes and Effects Analysis (FMEA) [127]. System validation is performed in a model-based paradigm using simulation. In this chapter, we integrate the simulation details of the V-PLC with the safety assessment technique FTA. FTA is one of the most prominent techniques in safety analysis and is used by a wide range of industries [127]. It was developed extensively by the nuclear and aerospace industries and can be viewed as a systematic technique for acquiring information about a system [17]. A fault tree is a graphical method that models how failures propagate through the system. In other words, it models how component failures lead to the undesired event of system failures [127]. The main goal of this chapter is to answer the following research question: To what extent can the trace links of textual information assist engineers in validating a virtual PLC in the context of SysML models? In other words, how much confidence can engineers claim about the V-PLC when relying only on the textual information?
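To illustrate how a fault tree propagates component failures to a top undesired event, the following minimal sketch evaluates an AND/OR tree against a set of failed basic events. The tree and its event names are hypothetical and chosen only for illustration; they do not reproduce the fault tree built later in this chapter.

```python
def evaluate(node, failed):
    """Return True if the event described by `node` occurs.

    A node is either a basic event name (str) or a (gate, children) pair,
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return node in failed
    gate, children = node
    results = [evaluate(child, failed) for child in children]
    return all(results) if gate == "AND" else any(results)


# Hypothetical fault tree: the top event occurs if the controller fails,
# or if a sensor fails while no fallback is available.
top_event = ("OR", [
    ("AND", ["sensor_fault", "no_fallback"]),
    "controller_fault",
])

only_sensor = evaluate(top_event, {"sensor_fault"})
sensor_without_fallback = evaluate(top_event, {"sensor_fault", "no_fallback"})
```

The basic events of such a tree are the kind of elements that can serve as an answer set when judging automatically recovered trace links.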

3.2 Experimental Setup

Our objective is to validate a V-PLC when it is integrated into a model-based development process. The V-PLC with its behavior is modeled as a part of the overall system (e.g., SysML models). In our approach, we use the simulation model that represents the behavior of the V-PLC (e.g., a safety requirement). In most cases, SysML models are exported in the XML format. However, in our case, we use the simulation results of the executed model. In particular, the simulation model consists of a static user interface (UI) and a dynamic simulation log file. The static UI contains elements that we utilize as a trace query. The simulation log file records all the details of the engineers’ actions during model execution. The capability of recording all the details is a key advantage of using simulations to perform safety analysis. The steps of our approach are as follows:

1. Manually generating FTA for a safety requirement in the V-PLC.

2. Representing the behavior of the V-PLC in the system models (e.g., SysML).

3. In addition to V-PLC representation, a query set of elements (i.e., textual content) based on the targeted requirement is required from the engineers. This query is based on the textual information presented in the user interface of the V-PLC.

4. Applying NLP similarity measures on the simulation file to identify the set of elements that impact the targeted safety requirement of the V-PLC.

5. Assessing the retrieved results by comparing them to the answer set derived from FTA.

The main purpose of our approach is to assess how much assurance we can claim about the tracing results of the virtual PLC when engineers rely only on the textual information. Our work can further inform how these tracing results can be reused to assure other V-PLCs or requirements in the system model.

3.3 Subject System

The subject of our case study in this chapter is the Transmission Control Module (TCM) described in [138] (Fig. 3.1). The SysML models for this subject system are also available in [138]. The TCM consists of not only mechanical parts but also electronics. It controls the gearbox and switches between gears based on input from several sensors as well as data provided by the engine control module (ECM). It then processes this input to calculate how and when to shift gears in the transmission and generates the signals that drive actuators to complete this shifting. Electronic sensors monitor the selection of gear position, vehicle speed, throttle position, and many other attributes. This information helps the control module to adjust the current supplied to solenoids in the transmission that control the position of various valves and gears. For example, the gear position selector switch communicates to the TCM which gear has been selected by the operator. The crankshaft position sensor provides information to the TCM to determine the current rotational speed of the engine. This information is used by the TCM to determine when to change gears. The brake pedal position sensor helps to assure that the driver has applied the brake before shifting into park or reverse. To describe the transmission controller and demonstrate how gears in the gearbox are changed, SysML models are used, i.e., structure diagrams, behavior diagrams, and parametric diagrams. A user interface (UI) is then added to simulate the model.

Our goal is to assure a safety requirement of the TCM. The safety requirement we have considered is: R = the revolutions per minute (RPM) value shall not exceed 3900. An obstacle to this goal is O = the RPM value exceeds 3900 (the top undesired event shown in Fig. 3.2). This obstacle is called a “hazard” obstacle since it obstructs the satisfaction of a safety goal [148]. Obstacles recover situations in which a goal or a requirement is violated. The obstacle O is refined using an AND/OR structure (a directed acyclic graph), which shows how the occurrence of other events, and which combination of them, can lead to the top event. To help identify obstacles to the high-level requirement R, we selected FTA because it enables the analysis of individual risks. Moreover, according to Lee et al. [84], as systems become more complex and the consequences of accidents become catastrophic, a safety technique such as FTA should be applied. Fig. 3.2 shows the FTA of obstacle O, which corresponds to our case study (Fig. 3.1). Our construction of the FTA was manual and inspired by the autonomous car described in [19].

Figure 3.1: Transmission Control Module (TCM) [138].

As mentioned above, a UI is created to execute the SysML models. Fig. 3.3 shows the UI of the Transmission System. From this UI, engineers can use the textual elements as trace queries. We represent the trace queries as follows: TQ = {parking, reverse, neutral, drive, RPM, carSpeed, gear}. The next step after executing the model is to represent the behavior of this V-PLC as an XML or a text file. We use the Cameo Simulation Toolkit plugin by MagicDraw1. It provides a simulation log file that records all the simulation details during execution.

Figure 3.2: Fault Tree Analysis

1 https://www.nomagic.com/products/cameo-systems-modeler

A preprocessing technique for the simulation file is needed. The main objective of preprocessing is to obtain the key terms from the simulation file, which can be used for NLP similarity measures. Several methods are used for preprocessing, such as tokenization, stemming, and stop word removal. We denote the retrieved results as TraceSet.
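The preprocessing steps named above (tokenization, stop word removal, and stemming) can be sketched in a few lines. This is a simplified illustration and not the tool pipeline used in this study: the stop word list is a small illustrative subset, `crude_stem` is a naive stand-in for a real stemmer, and the log line is hypothetical.

```python
import re

# Illustrative subset of English stop words (assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "at"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize on letters, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


# Hypothetical line from a simulation log.
log_line = "The driver is shifting the gears to reverse at 3000 RPM"
key_terms = preprocess(log_line)
```

For the hypothetical line, the key terms are `driver`, `shift`, `gear`, `reverse`, and `rpm`; numeric values are dropped because the tokenizer keeps letters only.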

Figure 3.3: User Interface of the V-PLC [138]

We use the RapidMiner tool for data preprocessing2. RapidMiner is a data science software platform that provides an integrated environment for data preprocessing, machine learning, text mining, and predictive analysis [43]. RapidMiner is used for both research and real-world data mining tasks. The data mining process can be made up of various nestable operators, described in XML files and created in RapidMiner’s graphical user interface. It provides more than 500 operators for all main machine learning procedures. It integrates learning schemes of the Weka machine learning environment and also statistical schemes of the R project [43].

After preprocessing the data, we apply NLP similarity measures using TQ and TraceSet obtained from the UI and the simulation file, respectively. We use the concept of query expansion and run multiple iterations to assess if the retrieved results are improved. Query expansion is defined as “a method for improving retrieval performance by supplementing an original query with additional terms” [46]. The main purpose of query expansion is to improve the overall recall of the related elements. Once the first candidate trace links are obtained, different query expansion techniques can be applied. We added to the TQ additional links that were retrieved after computing the similarity measures, i.e., the top links previously retrieved. This process is known as relevance feedback. Next, we briefly review the NLP similarity measures (syntactic and semantic) applied to the textual information in our study.
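The relevance feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity function is Python's `difflib.SequenceMatcher` ratio, used here only as a stand-in for the measures of this study, and the threshold, `top_k`, number of rounds, and candidate terms are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in string similarity in [0, 1] (not one of the study's measures)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve(query_terms, candidates, threshold):
    """Return (candidate, best score) pairs at or above the threshold, best first."""
    scored = []
    for cand in candidates:
        best = max(similarity(q, cand) for q in query_terms)
        if best >= threshold:
            scored.append((cand, best))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def expand_query(query_terms, candidates, threshold=0.8, top_k=2, rounds=2):
    """Add the top newly retrieved terms to the query for a few rounds."""
    query = list(query_terms)
    for _ in range(rounds):
        ranked = retrieve(query, candidates, threshold)
        new_terms = [c for c, _ in ranked if c not in query][:top_k]
        if not new_terms:
            break  # nothing new was retrieved, so expansion has converged
        query.extend(new_terms)
    return query


# Hypothetical key terms from a simulation log.
candidates = ["driver", "drivershaft", "gearbox", "throttle"]
expanded = expand_query(["drive", "gear"], candidates, threshold=0.7)
```

In this toy run, `drivershaft` is not similar enough to `drive` to be retrieved in the first round, but it is retrieved in the second round once `driver` has been added to the query, which is the effect on recall that query expansion aims for.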

3.3.1 Natural language processing (NLP)

Since engineers have the textual information presented in the user interface of the V-PLC, our approach explicitly integrates syntactic and semantic information. The simulation file of the virtual PLC, together with the textual information displayed on the UI, enables us to use NLP similarity measures, since the file contains rich textual information that can be used for retrieving more traces than the ones displayed in the UI, e.g., TQ. NLP similarity measures can be syntactic or semantic. Syntactic measures are based on the syntactical representation of terms (their string format), whereas semantic measures use general-purpose dictionaries such as WordNet. WordNet3 started in 1990 as a language project by Miller and Christian at the Cognitive Science Laboratory, Princeton University [119]. It is defined as a large lexical database of the English language. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. These sets are linked by different types of relations (e.g., the is-a relationship). As a result, WordNet produces a combination of a thesaurus and a dictionary which can be used for many applications such as text analysis. The recent online version of WordNet is v.3.1, announced in June 2011, containing around 117,659 synsets and 206,941 word-sense pairs [135]. For the choice of similarity measures, we follow the same strategy as in [104], i.e., three syntactic measures (SoftTFIDF, Levenshtein, and Monge-Elkan) and four semantic measures (Resnik, JCN, Path, and Lin). While the goal in [104] was to rank already computed impacted elements of SysML after a change of a requirements statement, our goal is to identify the impacted elements of the V-PLC modeled in SysML diagrams relying only on the textual information displayed on the UI. Next is a brief description of the similarity measures used in this study.

2 https://rapidminer.com/
3 https://wordnet.princeton.edu/

• Resnik: The RES similarity measure depends on the amount of information two concepts (synsets) have in common, that is, the information content of their most specific common subsumer. If there is no common concept, then the similarity between two concepts will be 0 [47].

• Jiang and Conrath: While RES depends only on the information content shared by two concepts, JCN is based on a combination of edge counts in the WordNet is-a hierarchy and the amount of information shared by two concepts. In other words, this measure uses the sum of individual distances between the nodes in the shortest path and the information content as a decision factor [47].

• PATH measure: This method computes similarity between two concepts by counting the number of nodes along the shortest path in is-a hierarchy in WordNet taxonomy.

• Lin: Lin extended RES measure of the information content by considering the information required to describe what the concepts are. Lin is like JCN method but with a small modification, i.e., the similarity between two concepts is stated as the ratio of the amount of information two concepts have in common and the information required to describe these concepts [47].

• SoftTFIDF is a variation of TFIDF. It combines two measures, TF-IDF and Jaro-Winkler. SoftTFIDF first applies Jaro-Winkler to all pairs of tokens between the two concepts and then applies the TF-IDF measure to tokens that have a similarity score above the threshold [56].

• Levenshtein defines the distance between two strings by counting the minimum number of operations needed to transform one string into the other. Operations include insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters [56].

• Monge-Elkan measure is a hybrid similarity measure that combines internal character-based (e.g., edit distance) and token-based (i.e., word level) methods. Monge-Elkan takes the average score of the best matching tokens from the secondary measure such as Levenshtein, Jaro, and Smith-Waterman [56].
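To make the two edit-based syntactic measures above concrete, the following is a minimal sketch. It is an illustration rather than the implementation used in this study: `levenshtein` is the classic three-operation variant (insertion, deletion, substitution, without transposition), the normalization to a similarity in [0, 1] is an assumption, and `monge_elkan` uses that normalized similarity as its secondary measure.

```python
def levenshtein(a, b):
    """Classic edit distance with insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Map edit distance to a similarity in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def monge_elkan(tokens_a, tokens_b, inner=normalized_similarity):
    """Average, over tokens_a, of the best inner-similarity match in tokens_b."""
    if not tokens_a or not tokens_b:
        return 0.0
    best_scores = [max(inner(ta, tb) for tb in tokens_b) for ta in tokens_a]
    return sum(best_scores) / len(tokens_a)


distance = levenshtein("drive", "driver")
token_score = monge_elkan(["car", "speed"], ["car", "speeds"])
```

Because every pair of terms receives some score, measures of this kind tend to retrieve many candidates, and a threshold is then needed to limit the false positives.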

3.3.2 Evaluation Metrics

In this study, the retrieved links are evaluated using the recall, precision, and F2 metrics. In the field of IR, recall and precision are the standard measures often used to assess the quality of the retrieved trace links. Recall is the fraction of relevant links that are successfully retrieved, while precision is the fraction of retrieved links that are relevant. More specifically, recall and precision can be calculated as follows:

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3.1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{3.2} \]

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. For example, suppose that 70 links were retrieved against an answer set of 100 links and that 30 of those 70 links were correct. Then TP = 30, FP = 40, and FN = 70, so the recall is 30/(30+70) = 0.30, while the precision is 30/(30+40) = 0.43.

To minimize noise, in line with common practice [104], a threshold score is established such that all links at or above the threshold are retrieved, and all links below the threshold are rejected. Because it is not feasible to achieve identical recall values across every trace set, it can be challenging to compare recall and precision results across experiments [70]. Therefore, we use another metric known as the F-measure, which computes the harmonic mean of recall and precision. In this chapter, we used a variant of the F-measure, known as the F2-measure, which weights recall more highly than precision (by placing more emphasis on false negatives). This weighting is appropriate in the traceability domain, where it is essential to recall as many of the correct links as possible [122]. F2 can be calculated as follows:

\[
F_2 = \frac{5 \cdot \text{Precision} \cdot \text{Recall}}{(4 \cdot \text{Precision}) + \text{Recall}} \tag{3.3}
\]
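As a worked illustration of Equations 3.1 to 3.3, the following Python sketch recomputes the example above (30 correct links among 70 retrieved, against an answer set of 100). The function names are hypothetical, and the general F-beta form is used so that beta = 2 reproduces Equation 3.3.

```python
def recall(tp: int, fn: int) -> float:
    """Equation 3.1: share of the answer set that was retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Equation 3.2: share of the retrieved links that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_beta(p: float, r: float, beta: float = 2.0) -> float:
    """F-beta measure; beta = 2 gives 5PR / (4P + R), i.e., Equation 3.3."""
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


# Counts from the example in the text: TP = 30, FP = 40, FN = 70.
tp, fp, fn = 30, 40, 70
r = recall(tp, fn)
p = precision(tp, fp)
f2 = f_beta(p, r)
print(f"recall={r:.2f} precision={p:.2f} F2={f2:.2f}")
```

For these counts, F2 is about 0.32: it lies closer to the recall (0.30) than to the precision (0.43), which reflects the heavier weight placed on recall.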

3.4 Results and Analysis

3.4.1 Results

We assess the usefulness of the similarity measures by comparing the retrieved links with the answer set derived from our FTA. Table 3.1 shows the results of applying the similarity measures. After applying the similarity measures, we postprocess the retrieved results by removing all duplicates, since two trace queries may retrieve the same set of traces. Our results show that Levenshtein and Monge-Elkan achieve 70% recall of set-based coverage with 29% precision (54% F2), while softTFIDF achieves a higher precision of 50% with a low recall of 25% (27% F2). The reason is that Levenshtein and Monge-Elkan assume all tokens have equal weight and assign a similarity score between every two key terms (high recall), while softTFIDF considers only similarity between tokens that are above the threshold, so it assigns 0 for non-closely matching terms (high precision). For example, the trace set retrieved using the term drive as a query is: drive, driver, drivershaft, because softTFIDF allows partial matching of words instead of only allowing exact matching as in TFIDF; this is why TFIDF yielded the lowest recall, only 15%.

Table 3.1: NLP Similarity Measures Results before Query Expansion

              Levenshtein  softTFIDF  RES   JCN   PATH  Lin
Recall        0.70         0.25       0.40  0.25  0.25  0.30
Precision     0.29         0.50       0.25  0.19  0.13  0.15
F2 measure    0.54         0.27       0.35  0.22  0.21  0.25

Table 3.2: NLP Similarity Measures Results after Query Expansion

              Levenshtein  softTFIDF  RES   JCN   PATH  Lin
Recall        0.75         0.25       0.40  0.25  0.25  0.30
Precision     0.31         0.50       0.25  0.19  0.13  0.15
F2 measure    0.59         0.27       0.35  0.22  0.21  0.25

Table 3.2 shows our results after applying query expansion. The results show that Levenshtein and Monge-Elkan perform better with the additional terms, yielding a recall of 75% with 31% precision. However, the rest of the methods perform the same even after query expansion, especially the semantic measures (considering that we removed the duplicate traces). Semantic measures such as RES and JCN retrieved most of the trace queries using only a subset of the TQ. However, the overall performance is still low even with query expansion. The reasons are the small size of the V-PLC dataset and the fact that WordNet does not include much domain-specific terminology. From this result, we can see that textual information gives only 59% assurance using the set-based measure. The F2 statistics are summarized in Fig. 3.4.
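The threshold cut and the duplicate removal described in this section can be sketched in a few lines of Python. This is an illustration only: the scores, the extra element name brake, and the threshold of 0.4 are hypothetical values, not data from our experiment, and the function names are assumptions.

```python
from typing import Dict, List, Set, Tuple

Scored = List[Tuple[str, float]]  # (model element, similarity score)


def retrieve(scored: Scored, threshold: float) -> Set[str]:
    """Keep every element whose score is at or above the threshold."""
    return {element for element, score in scored if score >= threshold}


def merge_queries(results_per_query: Dict[str, Scored], threshold: float) -> Set[str]:
    """Union the per-query results so that duplicate traces are counted once."""
    merged: Set[str] = set()
    for scored in results_per_query.values():
        merged |= retrieve(scored, threshold)
    return merged


# Hypothetical scores for two trace queries that partly retrieve the same element.
results = {
    "drive": [("drive", 1.0), ("driver", 0.83), ("drivershaft", 0.45), ("brake", 0.2)],
    "shaft": [("drivershaft", 0.45), ("shaft", 1.0)],
}
links = merge_queries(results, threshold=0.4)
```

Using a set makes the duplicate removal implicit: drivershaft is returned by both queries but appears once in the merged result.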

Figure 3.4: F2 metric for similarity measures

3.4.2 Threats to Validity

Several factors can affect the validity of our study. Construct validity is the degree to which the variables accurately measure the concepts they claim to measure [154]. To mitigate this threat, we adopt standard IR metrics (recall and precision), which are used extensively in requirements traceability research. Threats to internal validity are influences that can affect the independent variable with respect to causality [154]. A potential threat to our study’s internal validity is our manual construction of the FTA, which is error-prone and time-consuming. Threats to external validity [154] affect the generalizability of results. One such threat is the answer set of the constructed fault tree, in particular the elements of the fault tree that we used to evaluate the similarity measures. However, these elements were chosen according to the components of the V-PLC in the transmission system. Another threat to external validity is that our chosen tool provides the simulation details for any executed model; this may not apply to other SysML tools.

3.5 Discussion

Studies show that the performance of the similarity measures is affected by characteristics such as text length, spelling accuracy, and presence of abbreviations [54]. Another common observation is that measures that demonstrate good performance and robustness for one data set can perform poorly on another [54]. In our study, we used the simulation details of the V-PLC built in SysML models as our data set. We assessed the usefulness of the textual information displayed in the UI and the simulation results by applying NLP similarity measures, and compared the automatically recovered trace links against the elements of the fault tree. Our results show that textual information gives only 59% assurance.

Figure 3.5: Integrating a virtual PLC in SysML models adapted from [21].

As we mentioned earlier, our requirement R may need multiple PLCs to check its fulfillment, and the same V-PLC can be used to verify more than one requirement. In our approach, we considered the safety requirement to be that the RPM value should not exceed 3900 and analyzed how different components of a V-PLC are responsible for the success or failure of this requirement. The trace links recovered using the set-based measures on the FTA help us identify the components of the TCM that can be reused to verify other V-PLCs (Fig. 3.5). For example, when we consider the braking system, some of its retrieved results overlap with the impacted elements of the TCM.

Figure 3.6: Fault Tree Analysis Example

Set-based evaluation versus tree-based evaluation is an important comparison because it could explain which similarity measure is more effective. To explain this, consider any two similarity methods, syntactic or semantic (S1 and S2). As mentioned earlier in our approach, we use these methods to retrieve trace links from the simulation file, based on our query requirement. Suppose that the two methods return the same recall and precision values, i.e., the numbers of relevant and retrieved trace links are the same for both S1 and S2. Now consider the example fault tree in Fig. 3.6. Let the results obtained from S1 and S2 be {1, 6, 8} and {1, 3, 5} respectively, and let the relevant nodes for this fault tree be {1, 2, 3, 5, 6, 8}. Both S1 and S2 then have the same recall (3/6 = 0.50) and the same precision (3/3 = 1.00). However, when we analyze the coverage of these results on the fault tree, it is very different. Fig. 3.7 shows the tree-based coverage of the results obtained from S1 (left) and S2 (right). Assessing which similarity measures are more effective using tree-based or set-based evaluation may reveal new insights.

Figure 3.7: Tree-based coverage for similarity measure S1 (left) and S2 (right)
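The contrast between the two views can be sketched in Python. The set-based part uses the node sets given above. The tree-based part is only a hypothetical illustration: the parent map below is an assumed tree shape, not the actual fault tree of Fig. 3.6, and reporting node depths is one possible way (an assumption, not a defined metric of this study) to show that equal set-based scores can hide different positions in the tree.

```python
relevant = {1, 2, 3, 5, 6, 8}
s1 = {1, 6, 8}
s2 = {1, 3, 5}


def set_metrics(retrieved: set, answer: set) -> tuple:
    """Return (recall, precision) for a non-empty retrieved set and answer set."""
    tp = len(retrieved & answer)
    return tp / len(answer), tp / len(retrieved)


# Hypothetical fault tree given as child -> parent; node 1 is the root.
# The real shape of Fig. 3.6 is not reproduced here.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 6}


def depth(node: int) -> int:
    """Number of edges between the node and the root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def depths(retrieved: set) -> list:
    """Sorted depths of the retrieved nodes in the assumed tree."""
    return sorted(depth(n) for n in retrieved)


print(set_metrics(s1, relevant), set_metrics(s2, relevant))
print(depths(s1), depths(s2))
```

Under this assumed shape, both methods score identically on recall and precision, yet S1 reaches deeper nodes while S2 stays closer to the root, which is the kind of difference a tree-based evaluation would expose.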

3.6 Summary

In this chapter, we empirically investigate whether the trace links recovered using textual information can assist in verifying safety requirements under test. We integrate the concept of obstacle analysis to recover situations in which a safety requirement will not be satisfied. Therefore, we use fault tree analysis to validate the safety requirements, and further use the elements of the fault tree to evaluate the quality of the automatically recovered trace links. Our results show that textual information using the Monge-Elkan and Levenshtein methods gives only 59% assurance using the F2-measure (precision of 31%). As a result, engineers must provide additional queries or keywords to retrieve the actually impacted elements. Also, a cutoff point in the ranked list must be provided to filter out irrelevant elements that are not required for inspection. The ranked list of the impacted elements also does not take into consideration the relationships between these elements. In the next chapter, we present the first step toward our proposed solution, i.e., we carry out a systematic mapping study to identify the common modeling mistakes in SysML.

Chapter 4

SysML Modeling Mistakes: A Systematic Literature Mapping

In this chapter, we carry out a systematic mapping study to identify the common modeling mistakes in SysML. Our objective is to collect evidence to not only understand the defects in SysML models presented in the contemporary literature, but to do so with an explicit emphasis on practice, real-world relevance, and industrial readiness. From 19 primary studies, 42 SysML modeling mistakes are identified. We adopted a hierarchy from our earlier literature review to assess the evidence level of all the selected primary studies. We also discuss the impacts of the mistakes in model-driven requirements engineering [4].

4.1 Introduction

Unlike in traditional software development where the software (or more exclusively, the working code) is the main artifact, in model-driven development (MDD) the main artifact is a model or a set of models. These models encapsulate the modeler’s knowledge and views of the subject system, so that the stakeholder concerns can be managed throughout the development life cycle. For systems engineering applications involving interdisciplinary teams to design, build, and evolve complex systems like railway controls and autonomous vehicles, SysML [113] has become a de facto choice. Such a choice allows for structural and behavioral representations of the system, and for reasoning about the extent to which the requirements are met [132].

Because modeling is a human activity, mistakes are unavoidable. SysML modeling mistakes can occur for many reasons: human errors during the modeling process, lack of language support at the meta-model level, insufficient or overly constrained tooling, and so on. Orthogonal to the mistake sources, the consequences are typically defects manifested in the models themselves. Since these models are the main artifacts of MDD, understanding the defects is imperative.

One kind of understanding is to classify the defects. To this end, prior work has contributed several classification schemes to help distinguish the defect types, characterize the inherent attributes, and inform the resolution strategies. These schemes include the IEEE 1044-2009 standard for software anomalies [74], Chillarege’s classification for in-process measurements [27], and Grady’s software failure for root-cause analysis [61]. However, existing classifications are not concerned with models in the MDD context, let alone SysML models.
In the absence of this knowledge, Briand and his colleagues [22] proposed a defect seeding strategy specific to SysML and further seeded four defect types in their study: incorrect association navigation of a block definition diagram, incorrect association multiplicity of a block definition diagram, incorrect operation ordering of an activity diagram, and incorrect effect on transitions of a state machine diagram. Admittedly, the seeding strategy and the actual defects were based on the researchers’ subjective opinions and experience [22].

A different kind of understanding, influenced by the evidence-based paradigm [79], focuses on systematic literature review or mapping. The goal is not necessarily creating new classifications but collecting evidence of the state-of-the-art so that the trends of a given field can be depicted and the knowledge gaps can be identified. To that end, Granda et al. [63] reported a closely related study in MDD by concentrating on the defects in UML-based conceptual models. A set of 28 articles was selected to serve as the primary studies of their systematic literature mapping. The mapping results indicated a tendency of reporting only “incorrect” defects (80%) rather than “missing” (8%) or “unnecessary” (12%) ones. The work of Granda et al. [63] also pointed out the need to develop more mature defect detection mechanisms beyond static methods (e.g., manual or automated inspections, checking consistency rules, and checking OCL constraints).

In this chapter, we present a systematic literature mapping of SysML modeling defects to fill the gap elucidated in [22]. Our objective is to collect evidence to not only understand the defects in SysML models presented in the contemporary literature, but to do so with an explicit emphasis on practice, real-world relevance, and industrial readiness. For example, we adopt a hierarchy from our earlier literature review [11] to assess the evidence level of all the selected primary studies.
This hierarchy ranges from “no evidence” and “evidence obtained from working out toy examples” on the weaker end to “evidence obtained from industrial case studies” and “evidence obtained from industrial practice” on the stronger end. It is on the basis of these practitioner-oriented criteria that we conduct our survey and analyze the results. Moreover, we discuss the implications of our findings for model-driven requirements engineering, shedding light on the way ahead.

The remainder of this chapter is organized as follows. Section 4.2 reviews related work on defect classifications. Section 4.3 explains our literature mapping’s study design. Section 4.4 analyzes the results in terms of the SysML modeling mistakes, and further discusses how those mistakes link to requirements. Section 4.5 presents concluding remarks of the mapping study.

Table 4.1: Software defect classifications

                        IEEE 1044             Orthogonal Defect         HP Scheme [61]          Conceptual                SysML
                        STD [74]              Classification [27]                               Models [63]               Designs [22]

main artifact           source code           product & process         process                 UML diagrams              SysML diagrams
(life cycle phase)      (implementation)      (software change)         (early phases)          (conceptual modeling)     (systems design)

main categories         unnecessary,          extra,                    incorrect, missing,     unnecessary,              incorrect (4)
(total # of leaf-level  missing, and          omission, and             unclear, changed,       missing, and
categories)             incorrect (5)         commission (8)            and better way (23)     incorrect (6)

known usage             testing               vulnerability             process                 teaching                  safety
                        reporting [67]        discovery [101]           improvement [124]       UML [134]                 compliance [39]

4.2 Related Work

Across the life span of a system, especially a software-intensive system, defects can appear at different stages and in different forms. Several defect classifications have been proposed in the software engineering literature and are summarized in Table 4.1. While the IEEE 1044-2009 standard [74] focuses on the source code and the orthogonal defect classification [27] focuses on the code change, the other schemes listed in Table 4.1 are mainly concerned with early phases of defect detection and defect prevention (e.g., in requirements engineering and design).

The main artifacts and life cycle phases also shape how the defect classifications are used. Centered around the code, the IEEE 1044-2009 standard is instrumental in software testing (e.g., being an inspiration to get more structure into the incident reporting [67]) and the orthogonal defect classification is applied to understand what might be special about the code defects that compromise the security of a system [101]. Although detecting the erroneous code and code change is important, defects that occur in requirements would significantly cripple the resulting system [23]. For this reason, improving software processes cannot afford to overlook the business and requirements angles [124]. Learning conceptual modeling [134] and practicing safety inspections should also pay attention to the various defects [22, 63]. Note that the “known usage” of Table 4.1 is based on our knowledge and is meant to illustrate the subtleties of the classification schemes.

Despite the subtleties, the classifications themselves share certain similarities. In most cases, “unnecessary”, “missing”, and “incorrect” requirements, model elements, code, etc. are considered to be defects and are further differentiated, though the terminologies are by no means unanimous. One can probably map “extra”, “omission”, and “commission” of [27] to “unnecessary”, “missing”, and “incorrect” of [74] respectively.
To mitigate ambiguity, a hierarchy of sub-categories is often formed. In the work done by Granda and her colleagues [63], a two-level hierarchy is presented for the UML-based conceptual modeling defects, e.g., “unnecessary” is decomposed into “redundant” and “extraneous”. This results in 6 leaf-level categories as shown in Table 4.1. As far as the SysML defects are concerned, the strategy proposed by Briand et al. [22] leads to the 4 types of seeds mentioned earlier, all of which are “incorrect” operations injected into existing designs. Questions remain about whether other types of defects like “unnecessary” occur in SysML models, how frequent and severe the defects are, what kinds of models are susceptible to which mistakes, how believable the reported evidence is, etc. These motivate us to search the literature more systematically in order to map the state-of-the-art.

4.3 Mapping Study Design

Before teasing out our research questions, we clarify the terminology that appears in the relevant literature. According to IEEE, an anomaly is: “Any condition that deviates from expectation based on requirements specifications, design documents, user documents, standards, etc. or from someone’s perception or experience”, whereas a defect is: “An imperfection or deficiency in a work product where that work product does not meet its requirements or specifications and needs to be either repaired or replaced” [74]. Instead of gearing toward work products and even repair or replacement actions, we choose the term mistake¹ to incorporate the human and social aspects in SysML modeling. For example, our intention is to cover mistakes like error, defined as “a human action that produces an incorrect result” [74], so that the cause of a defect, and not just the defect manifested in the work product, could be understood. The implications of the SysML modeling mistakes are surveyed mainly from the MDD requirements engineering perspective in our work.

Due to the broad contextual considerations such as cause and implication, we carry out a systematic mapping study on SysML modeling mistakes. Compared with a systematic literature review, a mapping study follows the same process of formulating research questions, defining literature search criteria, determining primary studies, extracting data, and reporting [79]. Differences include that a systematic mapping deals with a broader research topic and that its data extraction and reporting tend to use summaries rather than techniques like meta-analysis or narrative synthesis. Napoleão et al. [102] highlighted quality assessment as being the only practical difference between systematic literature reviews and systematic mapping studies. In our work, quality assessment of primary studies emphasizes evidence strength as it relates to MDD practitioners.

¹ Merriam-Webster (http://www.m-w.com/) defines mistake as: “a wrong action or statement proceeding from faulty judgment, inadequate knowledge, or inattention”. This explanation fits the purpose of our study.

4.3.1 Research Questions

We set out to answer four research questions:

RQ1: What are the SysML modeling mistakes, their types, and their causes?

RQ2: Which SysML diagrams are subject to the modeling mistakes?

RQ3: What are the evidence levels of the reported SysML modeling mistakes?

RQ4: How do the SysML modeling mistakes impact requirements engineering in MDD practice?

It is important to note that our goal is not to devise a new classification scheme. We therefore use the 6 leaf-level categories presented by Granda et al. [63] as a baseline: “missing”, “inconsistent”, “incorrect”, “ambiguous”, “redundant”, and “extraneous”. Meanwhile, we are open to emerging categories or facets. It is also worth noting that RQ4 has a direct relevance to MDD practice, and for that reason, we choose only those studies with industrial-strength evidence (as opposed to the weaker levels of evidence) to discuss the influences of SysML modeling mistakes on requirements engineering.

Table 4.2: Primary studies listed chronologically (year of publication) and then within the same year alphabetically (first author)

ID    DOI or Grey Literature        Source
PS1   10.1145/1529282.1529365       C. Choppy and G. Reggio, “A Method for Developing UML State Machines”, in SAC, 2009
PS2   10.1109/ECBS.2009.25          Y. Jarraya, et al. “On the Meaning of SysML Activity Diagrams”, in ECBS, 2009
PS3   10.1002/inst.200912424        R. Karban, et al. “MBSE in Telescope Modeling”, International Systems Engineering Newsletter, 2009
PS4   grey1                         C. L. Delp, “FireSAT: Model vs Documents Alone”, 2010
PS5   10.1016/j.proeng.2011.08.023  L. Mi and K. Ben, “A Method of Software Specification Mutation Testing Based on UML State Diagram for Consistency Checking”, in CEIS, 2011
PS6   10.1109/EmpiRE.2011.6046257   G. Reggio, et al. ““Precise is Better Than Light” a Document Analysis Study about Quality of Business Process Models”, in EmpiRE, 2011
PS7   10.1109/SysCon.2013.6549906   Z. Andrews, et al. “Model-Based Development of Fault Tolerant Systems of Systems”, in SysCon, 2013
PS8   grey2                         R. Steiner, “Common SysML Conceptual Stumbling Blocks”, in San Diego INCOSE Mini-Conference, 2013
PS9   10.1007/978-3-319-09099-3_1   B. K. Aichernig, et al. “Model-Based Mutation Testing of an Industrial Measurement Device”, in TAP, 2014
PS10  10.1007/s10270-012-0293-5     S. Ali, et al. “Does Aspect-Oriented Modeling Help Improve the Readability of UML State Machines?”, Software & Systems Modeling, 2014
PS11  10.1007/978-3-319-00948-3_13  É. André, et al. “Activity Diagrams Patterns for Modeling Business Processes”, Software Engineering Research, Management and Applications, 2009
PS12  10.5220/0004887302330240      E. A. Antonio, et al. “Verification and Validation Activities for Embedded Systems–A Feasibility Study on a Reading Technique for SysML Models”, in ICEIS, 2014
PS13  10.1145/2559978               L. Briand, et al. “Traceability and SysML Design Slices to Support Safety Inspections: A Controlled Experiment”, ACM Transactions on Software Engineering and Methodology, 2014
PS14  10.1109/EAEEIE.2014.6879380   H. Kruus, et al. “Teaching Modeling in SysML/UML and Problems Encountered”, in EAEEIE, 2014
PS15  grey3                         Shannon (GenMyModel Community Manager), “5 Common UML Mistakes”, 2014
PS16  10.1016/j.ifacol.2015.06.200  S. Feldmann, et al. “Towards Effective Management of Inconsistencies in Model-Based Engineering of Automated Production Systems”, in INCOM, 2015
PS17  10.1016/j.procs.2015.03.054   K. Hampson, “Technical Evaluation of the Systems Modeling Language (SysML)”, in CSER, 2015
PS18  grey4                         S. Pavalkis, “MBSE in Telescope Modeling: European Extremely Large Telescope – World’s Biggest Eye on the Sky: Tool Vendor Perspective”, in Space Symposium, 2015
PS19  10.1109/COMPSAC.2016.136      H. Sun, et al. “Improving Defect Detection Ability of Derived Test Cases Based on Mutated UML Activity Diagrams”, in COMPSAC, 2016

grey1: https://mbse.gfse.de/documents/SpaceSystemsIW10.pdf
grey2: https://sdincose.org/wp-content/uploads/2013/11/11-Rick-Steiner-SysML-Conceptual-Stumbling-Blocks.r.04.pdf
grey3: http://blog.genmymodel.com/5-common-uml-mistakes.html
grey4: https://www.spacesymposium.org/wp-content/uploads/2017/10/S.Pavalkis 31st Space Symposium Tech Track paper.pdf

4.3.2 Search Criteria

Our search for the primary studies was carried out in June 2019 and involved two stages: an automatic one over Elsevier’s Scopus and a manual one including the grey literature. We relied on Scopus due to its structured and advanced ways to specify queries. Our first attempt issued the search string:

TITLE-ABS-KEY ( ( “SysML” OR “SysML diagram” OR “SysML design” OR “SysML model” ) AND ( “mistakes” OR “design mistakes” OR “design error” OR “defect” ) ) AND PUBYEAR > 2006 AND PUBYEAR < 2020 AND ( LIMIT-TO ( DOCTYPE , “cp” ) OR LIMIT-TO ( DOCTYPE , “ar” ) OR LIMIT-TO ( DOCTYPE , “ch” ) ) AND ( LIMIT-TO ( LANGUAGE , “English” ) )

Although Scopus returned 14 papers, only 1 was regarded as relevant by us. We reconsidered the query by removing the year restrictions even though SysML was first adopted by the OMG in 2006. We further expanded the query with the UML diagrams reused by SysML, as well as an additional mistake possibility of “modeling error”. The refined search string was as follows:

TITLE-ABS-KEY ( ( “SysML” OR “SysML diagram” OR “SysML design” OR “SysML model” OR “UML state machine” OR “UML activity diagram” ) AND ( “mistakes” OR “design mistakes” OR “design error” OR “modeling error” OR “defect” ) ) AND ( LIMIT-TO ( DOCTYPE , “cp” ) OR LIMIT-TO ( DOCTYPE , “ar” ) OR LIMIT-TO ( DOCTYPE , “ch” ) ) AND ( LIMIT-TO ( LANGUAGE , “English” ) )

This search resulted in 41 papers. Two researchers collaboratively judged relevance by going through the abstract and the content of each paper. In the end, only 5 papers were relevant: PS1, PS10, PS11, PS12, and PS19 of Table 4.2. We then engaged in a manual search via Google Scholar to check recursively the references and the citations of those 5 relevant papers. Unlike the first stage, we included both peer-reviewed publications and grey literature such as presentations, white papers, and blogs. The final list consisted of 19 primary studies as shown in Table 4.2.
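For replication, the refined search string can be assembled from its term lists, which makes later changes to the terms easier to track. The Python sketch below only builds the query text and does not call Scopus; the helper names any_of and limit_to are hypothetical, and straight quotes are assumed for the quoted phrases.

```python
def any_of(terms):
    """Join quoted phrases with OR inside one parenthesized group."""
    return "( " + " OR ".join(f'"{t}"' for t in terms) + " )"


def limit_to(field, values):
    """Build a group of LIMIT-TO clauses for one field."""
    return "( " + " OR ".join(f'LIMIT-TO ( {field} , "{v}" )' for v in values) + " )"


# Term lists taken from the refined search string in the text.
subject_terms = ["SysML", "SysML diagram", "SysML design", "SysML model",
                 "UML state machine", "UML activity diagram"]
mistake_terms = ["mistakes", "design mistakes", "design error",
                 "modeling error", "defect"]

query = (
    f"TITLE-ABS-KEY ( {any_of(subject_terms)} AND {any_of(mistake_terms)} )"
    f" AND {limit_to('DOCTYPE', ['cp', 'ar', 'ch'])}"
    f" AND {limit_to('LANGUAGE', ['English'])}"
)
print(query)
```

Keeping the subject terms and the mistake terms in separate lists mirrors how the query was refined between the first and second attempts: the year restriction was dropped and new terms were appended to each list.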

Figure 4.1: SysML activity diagram reviewed in our study (Figure 13 in PS2).

4.3.3 Data Extraction

While the actual extracted data² are presented in the next section to answer our research questions, we describe here the process of how we extracted data from the selected primary studies. We used a two-phase process. First, two researchers individually reviewed five randomly selected papers (PS13, PS8, PS5, PS2, and PS12). The researchers followed a pre-defined data extraction form, and then compared their results in a two-hour meeting. The observations were that their agreement levels were high, and consensus was established after the meeting.

As an example of the first phase, the researchers independently reviewed PS2 on SysML activity diagrams. In both data extraction results, the mistake of “a join placed after a decision node” was recorded. Figure 4.1 illustrates this mistake with PS2’s hypothetical design of the behavior corresponding to banking operations on an automated teller machine (ATM). The guards [g1] and [g2] denote the probability of triggering new operations or looping back to re-perform some earlier operations. In Figure 4.1, if [g1] is evaluated twice, i.e., “Choose account” is performed twice, then a deadlock may occur depending on how [g2] is evaluated. The researchers classified this mistake as “incorrect” by following the scheme presented by Granda et al. [63]. The other data extraction fields of this mistake were also consistent between the two researchers.

Building on the first phase, we randomly assigned the remaining 14 primary studies to those researchers: 7 per person. The final data extraction results were aggregated and consolidated. We report these results next to answer the four research questions of our mapping study.

² The entire data of our study are shared in an institution-wide repository, Scholar@UC [6], for replication and cross-validation purposes.

4.4 Results and Analysis

4.4.1 Forty-Two Mistakes

Our literature mapping identifies 42 distinct mistakes. While 6 mistakes are mentioned in two primary studies (we list them in Table 4.3), the majority (86%) of the mistakes are reported only once in the remaining primary studies. Although PS5 and PS10 examine consistency and readability respectively, the state machine diagram is what both studies focus on. As a result, all four mistakes discussed in PS5 are also mentioned in PS10. Additionally, PS10 covers “a state is subsumed by another state”, which is also presented in PS1, another study on state machines.

Table 4.3: Mistakes mentioned in more than one primary study

Mistake Description (diagram type)                        Mentioned
a state is subsumed by another state (state machine)      PS1, PS10
a transition that comes from or leads to a wrong state    PS5, PS10
  or moves with wrong conditions (state machine)
a transition is missing (state machine)                   PS5, PS10
a transition is subsumed by another (state machine)       PS5, PS10
a state is missing (state machine)                        PS5, PS10
replacing a fork/join node with a control (activity)      PS15, PS19

Table 4.3 shows the positive contributions of the grey literature. Specifically, “replacing a fork/join node with a control” in the activity diagram is not only identified as an incorrectness by the most recent work (PS19), but also recognized as a common mistake in an earlier blog post (PS15). For the four pieces of grey literature that we have surveyed, 6 mistakes are found only in them. In other words, these 6 SysML modeling mistakes would be hidden if the grey literature were not explicitly searched or considered.

Adopting the scheme by Granda et al. [63] allows us to classify the 42 mistakes identified. Figure 4.2 shows the classification distribution. The largest proportion (“incorrect”) accounts for 45% of the mistakes, including those listed in Table 4.3.

Figure 4.2: Distribution of the 42 SysML mistake types.

In line with the findings in [63], our results confirm that “incorrect” remains the most frequently reported mistake type, though our proportion here (45%) is based on the mistakes (N=42) while that of Granda et al. [63] (80%) is based on the primary studies (N=28). Different from the trend of 8% revealed in [63], “missing” makes up 40% of the mistakes in our data. A closer look shows that many instances are about what is desired but currently “missing”. For example, PS3 points out that SysML does not differentiate intrinsics between various interfaces like logical and mechanical. Therefore, “missing” in our results includes not only model defects (e.g., “state machine transitions are modeled without triggers” in PS1), but also modeling deficiencies (or even feature requests) like the one pointed out in PS3. Four mistakes are classified as “redundant”, implying the precision of using the leaf-level category (“redundant” or “extraneous”) rather than the higher-level one (“unnecessary”). For instance, in SysML activity diagram modeling, PS19 shows that it would be redundant to add a new pair of fork and join within an existing pair of fork and join. Our study uncovers two “inconsistent” mistakes. Both are discussed in PS16 and are syntactic in nature: one is about using improper type conversions or castings, and the other refers to using incompatible value, data, or primitive types.

To explore the cause(s) of each identified mistake, we apply open-ended coding without adopting any pre-existing schemes. To our surprise, many primary studies lack information on this. Referring back to Figure 4.1, though the mistake is clearly presented in PS2, we could not locate relevant information about why such a mistake happened, who made it, under what circumstances it occurred, etc. The two plausible causes that we have identified from the primary studies are: “meta-model limitation” and “tool limitation”. An example of the former is SysML’s lack of support for time being a first-class element (PS4), and an example of the latter is that Cameo’s MDD tool may corrupt the model if extensive links are made (PS18).

Given that most causes remain unknown, we turn our attention to how the mistakes manifest themselves in the SysML models. We perform this analysis with three degrees of observability: “directly observable” (e.g., cyclic associations in a block definition diagram discussed in PS15), “indirectly observable” (e.g., shifting down the fork node while lifting up the join node in PS19’s study of activity diagrams), and “not observable” (e.g., no timing element in block definition diagram according to PS4). Figure 4.3 shows the results of our observability analysis, where 60% of the mistakes are directly observable, and hence can be syntactically checked, in the SysML models. The 14% indirectly observable ones would require semantic interpretations that are oftentimes needed from the modelers. Unfortunately, 26% of the mistakes are not observable in the resulting models themselves, many of which are “meta-model limitation” and “missing” mistakes. In other words, if there is currently no way of expressing a certain construct in SysML (the meta-model is limited), then that construct will be missing and impossible to observe in existing models.

Figure 4.3: Observability of the 42 mistakes in SysML models.

In summary, 42 mistakes are identified in our mapping study, a majority of which represents incorrectness in SysML modeling. Broadening the mapping study to incorporate causes of the mistakes allows us to show a nontrivial proportion of insufficiencies reported in the literature. These insufficiencies can further lead to new and improved features of the meta-model or the modeling tool. Despite our explicit consideration of causes, they are difficult to extract.
Our updated analysis shows that, independent of the causes, a majority of the mistakes can be readily detected in the SysML models syntactically.

4.4.2 Five Diagrams

Our study shows that, out of the 9 model types of SysML (cf. Figure 2.1), mistakes have appeared in 5 diagrams. From the most discussed to the least discussed, these five are: activity diagram, block definition diagram, state machine diagram, requirement diagram, and internal block diagram. The numbers of mistakes specifically applicable to these diagrams are 15, 11, 10, 3, and 1, respectively. The total here is 40, leaving the following 2 mistakes unaccounted for:

• “extensive linking by modelers has side effects (introduced by changes) as these changes can go unnoticed and corrupt the model” (PS18), which we believe can affect a set of interrelated SysML diagrams; and

• “an open issue about navigation to the different views of a block (mechanical, optical, . . . )” (PS3), which is a tool limitation affecting multiple diagram types, e.g., block definition and internal block diagrams.

Figure 4.4: SysML diagrams and mistake types

Because they are crosscutting, we exclude the above two mistakes from our current RQ2 analysis. Figures 4.4 and 4.5 help visualize the modeling mistakes that the SysML diagrams are susceptible to. In Figure 4.4, “missing” appears in all the 5 diagrams. If “missing” suggests new modeling capabilities, then it seems all SysML diagram types are open to improvements. Even though “incorrect” accounts for 45% of the identified mistakes (cf. Figure 4.2), such mistakes are reported to appear in only state machine, activity, and block definition diagrams. One reason may be that these diagrams are used more often; another might be that they are difficult to practice correctly.

An interesting pattern of Figure 4.5 is that, in behavioral diagrams (state machine and activity diagrams), more than half of the mistakes are directly observable. In state machines, this ratio is as high as 90%. Such a pattern does not hold for structural diagrams (block definition and internal block diagrams) or the requirement diagram. The results depicted in Figures 4.4 and 4.5 indicate that, in SysML, the way in which the system works (or is intended to work) can be more readily checked than the way in which the system components are arranged.

Figure 4.5: SysML diagrams and mistake observability

In summary, 5 SysML diagrams are shown to be susceptible to mistakes, and our literature mapping fails to recognize any mistakes in sequence, use case, package, or parametric diagrams. Although improvements can be made to all diagrams (e.g., filling in the “missing”), “incorrect” practices tend to happen when state machine, activity, and block definition diagrams are built. Once the SysML models are built, the mistakes in behavioral diagrams are easier to spot than those in structural and requirement diagrams. To better understand in what contexts (e.g., research or industry) and application domains SysML models are developed, we next assess the evidence level of the primary studies.

4.4.3 Seven Industrially Relevant Pieces of Evidence

Different from answering RQ1 and RQ2 where mistakes serve as the units of analysis, we address RQ3 by treating each primary study as our analysis unit.

RQ3 is our main effort to emphasize industrial relevance because the evidence level reported in the literature is critical for practitioners to believe the research findings. In our context, the strength of evidence would directly inform the practitioners about how likely the SysML modeling mistakes are and how much attention they shall pay to specific mistakes. Building on our experience in conducting systematic mappings and reviews [11, 143, 144], we use a hierarchy of evidence levels to make our assessment more practical. In Table 4.4, we present the evidence levels from weakest (top of the table) to strongest (bottom of the table), along with the number of primary studies at each level.

Table 4.4: Evidence levels of the primary studies

Evidence Level | # of Primary Studies
No evidence | 1
Evidence obtained from demonstration or working out toy examples | 3
Evidence obtained from expert opinions or observations | 5
Evidence obtained from academic studies, e.g., controlled lab experiments | 3
Evidence obtained from industrial studies, e.g., causal case studies | 4
Evidence obtained from industrial practice | 3

Table 4.4 shows that all the primary studies, except for one, provide evidence to contextualize the mistakes, though some are demonstrations explaining the mistakes or word-of-mouth opinions from individual experts. Although academic studies may encompass the rigor of executing well-controlled experiments, we favor the evidence obtained from studying contemporary real-world problems or systems in their industrial contexts. Seven primary studies present the SysML modeling mistakes grounded in industrial-strength applications or drawn from the actual practices. The mistakes reported in these studies are rooted in much stronger evidence levels and are thus more believable. The extent to which these mistakes are applicable to other systems engineering projects depends on many factors, including the application domains as well as the size and complexity of the models. We therefore list the relevant information in Table 4.5.

Table 4.5: Mistakes reported in industrially relevant primary studies

ID (evidence level) | Mistakes | Domain (model size)
PS3 (industrial practice) | hiding internal blocks also hides the nested connector; no intrinsic differentiation between various interfaces; relationship of the ports is hardly shown; difficult to relate the different parts into the associated context; an open issue about navigation to the different views of a block | Optical Telescope (13000 model elements, 700 symbols, 150 diagrams, 50 high level requirements, 50 control systems requirements, refined by 150 use cases)
PS4 (industrial practice) | putting activity on internal block diagram creates separate usage; impossible for time to be a first-class element | Space Systems; FireSat Mission (10 stakeholder requirements, 12 systems requirements, 4 test requirements, 13 blocks)
PS5 (industrial study) | a transition to/from a wrong state or with wrong conditions; a transition is missing; a transition is subsumed by another; a state is missing | Control Sub-System (32 states, 45 transitions)
PS7 (industrial study) | no external send signal action to an event or no data handling can lead to a deadlock | Radio System; Mobile Phone System; Emergency Response (not provided)
PS9 (industrial study) | incorrect time/signal triggers in a state machine diagram | Automotive (19 states, 39 transitions)
PS18 (industrial practice) | extensive linking can corrupt the model | [same as PS3]
PS19 (industrial study) | incorrect guard condition of activity diagram’s decision node; input pin or output pin is missing in an activity diagram | Aircrafts (23 model elements)

The domains shown in Table 4.5 are truly interdisciplinary, requiring expertise in not only software engineering but also automotive, aeronautics, aerospace, and so forth. Mistakes occur in SysML models of various sizes, ranging from a couple of dozen elements to tens of thousands of elements. This finding suggests that mistakes do not necessarily correlate with size or even complexity; rather, they may be due to human errors or collaboration breakdowns. Another observation of Table 4.5 is the crucial role that requirements play in industrial SysML projects. In PS3 and PS4, for example, the requirements information is explicitly presented. Next, we discuss how the SysML modeling mistakes, especially the industrially relevant ones, influence requirements engineering in MDD.

4.4.4 Three Impacts on Model-Driven Requirements Engineering

Building on the answers to RQ3, we address RQ4 with an interdisciplinary system (namely an emergency response system) by focusing primarily on the mistakes listed in Table 4.5. In addition, the primary studies of Table 4.5 provide the goals (e.g., safety, security, and reliability) and tasks (e.g., change impact analysis, consistency checking, and compliance assurance) that a requirements engineer considers and performs in the MDD context. In Figure 4.6, we thus adopt the i∗ graphical notations [156] to represent these softgoals and tasks. It is not surprising to us that one of the impacts of SysML mistakes is to lead to requirements being violated (A of Figure 4.6). To illustrate the impact, we show some SysML modeling fragments in Figure 4.7, and consider a high-level requirement of the emergency response system: “For every call received, send an emergency response unit (ERU) with correct equipment to the correct target”.

Figure 4.6: SysML mistakes’ impacts on requirements. The diagram relates SysML modeling mistakes to requirements through three relations: (A) violating, (B) refining, and (C) having no impact on. The requirements side shows the softgoals safety, security, and reliability and the tasks change impact analysis, consistency checking, and compliance assurance.

The mistakes of mixing aggregation and composition in a block definition diagram (1 and 2 of Figure 4.7a) would violate the requirement demanding the phone system to be operated and managed by external communication systems and not by the emergency response system. In addition, the multiplicity mistake (3 of Figure 4.7a) would violate the requirement of allowing the emergency response system to communicate with many ERUs each providing equipment needed for the aid, and not with one and only one ERU.

Figure 4.7: Illustration of SysML mistakes’ impacts on requirements. Panels (a), (b), and (c) carry the callouts 1 to 7 referenced in the text.

The SysML modeling mistakes can sometimes be valuable to requirements engineers. In Figure 4.7b, the mistakes of adding control flows instead of a fork node (4) and adding a join node after a decision node (5) could assist in requirements refinement (B of Figure 4.6). In particular, a synchronization from “Divert ERU” to “Log diversion” and to the merge node must be performed. In other words, when there is no ERU available at the time of the rescue event and the case is critical, the diverted ERU shall receive the rescue information immediately. To satisfy this requirement, a fork node should be modeled instead of the control flows. To avoid the deadlock, the join node (5 of Figure 4.7b) shall be removed since it will be waiting for an input that will never be delivered. One refinement option is to relax the timing constraint for the diverted ERU to receive the rescue information, e.g., requiring that there exists a next state, rather than requiring all the next states, with tolerable rescue receiving delay. The “Start rescue” event (6 of Figure 4.7b) could contribute to the deadlock if there is no external signal sent. Recognizing such a mistake helps uncover new requirements expressing the desire of having some external-signal-sent action in the SysML design. To our surprise, a third relation suggests that some mistakes have no impact on requirements (C of Figure 4.6). The state machine diagram of Figure 4.7 illustrates this. The mistake of modeling transitions without triggers (7 of Figure 4.7c) would neither violate the requirement “call center shall generate and manage rescue events” nor suggest new or improved conditions/constraints related to this requirement. One main reason is that the modeling mistake appears outside the design slice of the targeted requirement [22].
This is an important issue given the increasing size and complexity of SysML models and the fact that only a set of critical requirements (e.g., safety and security) shall be reasoned about thoroughly to support tasks such as compliance assurance. Understanding the scope of the SysML modeling mistakes, combined with developing better methods to trace critical requirements in the MDD context, will be valuable for practitioners to resolve crucial mistakes while tolerating or delaying the resolution of others.

4.5 Concluding Remarks

This chapter reports our systematic mapping study of SysML modeling mistakes and the impacts of the mistakes in model-driven requirements engineering. Based on the 19 primary studies, we summarize our mapping results as follows.

• Forty-two SysML modeling mistakes fall into incorrect, missing, redundant, and inconsistent categories. While the causes of the mistakes have not been explicitly reported in the literature, most mistakes can be directly observed and thus syntactically identified in the SysML models.

• Five out of nine SysML diagram types are subject to the modeling mistakes, spanning from structure (block definition and internal block diagrams) through requirement to behavior (state machine and activity diagrams). This could indicate that these diagrams are practiced more often and/or are difficult to be practiced correctly. In line with the work of Granda et al. [63], more mature defect detection mechanisms beyond static methods (e.g., manual or automated inspections, checking consistency rules, and checking OCL constraints) should be considered for uncovering behavioral mistakes.

• Seven primary studies show higher-level evidence rooted in industrial studies and practices. Unlike UML, SysML mistakes are made in truly interdisciplinary systems such as space systems and emergency response systems. These industrially relevant studies suggest that modeling mistakes appear no matter how large or complex the system is, and due to the interdisciplinary nature of systems engineering, identifying the mistakes and performing root cause analysis of the mistakes would likely involve subject domain experts and engineers with diverse backgrounds in software, electrical, mechanical, etc.

• Three impacts on model-driven requirements engineering come from SysML modeling mistakes, emphasizing the recognition of requirements engineers’ critical concerns like safety and security as well as the tasks that they must accomplish. While certain mistakes, especially those in the “incorrect” category, violate the requirements, not all mistakes should be considered harmful. It turns out that some mistakes (e.g., the “missing” ones) could lead to new requirements being discovered to alleviate the omissions; yet some other mistakes (e.g., the “inconsistent” ones) could help overcome requirements over-specification or under-specification. Finally, our work calls for better ways of understanding the scope of the modeling mistakes so that their impacts on requirements can be properly reasoned about, resolved, or tolerated.

Like all systematic mapping studies, ours is limited by the literature search strategies implemented. A threat to construct validity is our formulation of search queries and the sources of our search. We combined both automatic and manual search in our work, and included terms such as “defect” and “error” in the queries. Nevertheless, others may have different views about the key construct of “SysML modeling mistakes”, and therefore our mapping results shall be interpreted only within the 19 primary studies that we identified. We believe that the threats to internal validity are minimal due to the descriptive nature of our data extraction effort and the fact that we were not interested in creating any new classification schemes. As far as external and conclusion validities are concerned, we have shared the entire data of our systematic mapping study in an institution-wide repository, Scholar@UC [6], and would welcome replications, cross-validations, evolutions, and expansions.

4.6 Summary

The main contribution of this chapter is a systematic literature mapping of SysML modeling mistakes. We identified 42 SysML modeling mistakes from 19 primary studies. With an emphasis on the evidence of industrial relevance, we further uncover that, although some mistakes hurt requirements satisfaction, others help make the requirements more complete and the specifications more precise. Our work sheds light on understanding the scope of the SysML mistakes and checking requirements fulfillment in the face of the mistakes. In the next chapter, we propose an approach to tracing safety requirements in which these mistakes serve as the basis for defining mutation operators. These mistakes are simulated and then verified with an LTL checker.

Chapter 5

Tracing Safety Requirements and State-Based Design Models

In this chapter, we present our mutation-driven approach to tracing safety requirements to state machine diagrams in SysML modeling. Our approach contains three main components: creating model mutants, verifying the mutants, and identifying trace links [3, 5].

5.1 Introduction

When engineering safety-critical systems like medical devices, it is of vital importance to ensure the system design meets the safety requirements. For example, one such requirement for a therapeutic robotic arm [53] concerns: “Automatic stoppage of the robotic arm if arm velocity sensors disagree on current velocity by more than x mps [92]”. Design reviews, sometimes carried out by independent inspectors, are one of the main methods for ascertaining the satisfaction of safety requirements. In fact, one of the most widely used industrial standards for embedded systems, IEC 61508 [75], rates design reviews as Highly Recommended (the highest importance rating) for all systems at all criticality levels. The design of a software-intensive system often involves modeling. Modeling techniques have received wide industry acceptance, especially in critical domains like avionics and telecommunications, where state-based models are prevalent for describing system behaviors. These models consist of a finite set of states including the start state(s), a set of events (or “inputs”), and a transition function that determines the next state based on the current state and event [14]. Many variants exist, e.g., Statecharts specifying “Remote Identification” and other features at AT&T [105], RSML (requirements state machine language) adopted by the FAA to regulate collision avoidance installed on commercial aircraft [85], etc. With the use of model-driven engineering being on the rise, state-based models become larger and more complex. This presents a significant challenge for design reviews, where the inspector may have to browse through the models and manually analyze large numbers of links between safety requirements and design models [22]. Automated requirements traceability [32, 71] can alleviate this challenge, e.g., information retrieval algorithms rely on the textual information of requirements and that of model elements to establish plausible traceability [20].
In model-rich but not necessarily text-rich situations, researchers have developed slicing techniques to automatically identify those model elements related to a given interest (e.g., an event or a changing requirement) [14]. For instance, the seminal work by Korel et al. [81] analyzed data and control dependencies for backward slicing, and more recently, Nejati et al. [103] used reachability analysis to perform forward slicing. While the resulting slice (trace) typically bears a recall value close to 100%, the precision level is very low. For example, forward slicing combined with natural language processing achieved the best performance in tracing 16 requirements changes to the design models, and even this best performance had an average precision of only 29.4% [103], meaning that a large number of false positives were generated. In this chapter, we present our novel approach to tackling the false positives that have plagued automated traceability research for decades [20, 32, 68, 71, 108, 116, 151]. The idea is to intentionally generate many “false positives” from a state-based model (i.e., the tracing target) and then to check whether a safety requirement (i.e., the tracing source) is met in them in order to find the actual slice-trace. Our key insight is that false positives are “close” to the model elements in the real trace, but that “closeness” turns out to be quite faulty after some tracing is done, e.g., measuring textual similarity or performing dependence analysis. This faulty closeness is a main reason that false positives are mingled with real elements, hurting precision. If we can exploit faulty closeness before tracing, then this proactive approach will provide new capabilities of addressing the low precision challenge of automated traceability. To investigate the notion of “faulty closeness”, our work leverages mutation analysis. Mutation analysis is commonly used as a fault-based software testing technique [76].
Given a program, mutants are created by simple changes that are intended to represent the mistakes often made by programmers. In our work, we define the mutants of a state-based design model based on the common modeling mistakes surveyed in the literature. Each mutant thus encapsulates the “faulty closeness” in some form. We then trace the safety requirement by analyzing how the mutants satisfy it. This step is carried out by manually formulating temporal logic formulas to capture the safety properties and employing an LTL model checker for automated verification. A mutant is killed if its model checking fails; otherwise, it survives. We determine the final trace by examining the co-occurrence patterns of model elements (i.e., states and their transitions) in the killed mutants versus their patterns in the surviving ones. The contributions of this chapter are threefold: (1) an innovative approach to exposing a great number of model mutants in support of safety requirements tracing, (2) an automated implementation of our approach based on model checking within process mining, and (3) an experimental evaluation on two subject systems with 27 requirements showing considerable precision improvements. The rest of the chapter is structured as follows. We introduce the background of our work via a running example in Section 5.2. Section 5.3 describes our traceability information model. Section 5.4 details our mutation-inspired approach and process-mining-based implementation. We present the experimental results in Section 5.5, and conclude the chapter in Section 5.6.
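To make the killed/survived terminology concrete, the following is a minimal sketch and not the implementation evaluated in this chapter. It assumes a hypothetical four-transition toy model, a single illustrative mutation operator (deleting one transition), and a simple stand-in check (the state "low" still has a way out) in place of an LTL model checker. All names are invented for illustration.

```python
from typing import List, Tuple

Transition = Tuple[str, str, str]  # (transition id, source state, target state)

# Hypothetical toy model; it is not the SMD of any subject system.
MODEL: List[Transition] = [
    ("t1", "off", "low"),
    ("t2", "low", "ok"),
    ("t3", "ok", "high"),
    ("t4", "high", "ok"),
]


def delete_transition_mutants(model: List[Transition]):
    """Create one mutant per transition, each with that transition removed."""
    return [(t[0], [u for u in model if u is not t]) for t in model]


def can_leave(model: List[Transition], state: str) -> bool:
    """Stand-in safety check: `state` has a transition to a different state."""
    return any(src == state and dst != state for _, src, dst in model)


def classify(model: List[Transition], state: str = "low"):
    """Split mutants into killed (check fails) and survived (check passes)."""
    killed, survived = [], []
    for mutated_id, mutant in delete_transition_mutants(model):
        (survived if can_leave(mutant, state) else killed).append(mutated_id)
    return killed, survived


killed, survived = classify(MODEL)
print("killed:", killed, "survived:", survived)
```

In this toy setting only the mutant that removes t2 is killed, which hints that t2 and the states it connects matter to the checked property. How such evidence is aggregated into a trace is left to Section 5.4.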

5.2 Running Example

This section uses a water distiller example adapted from [51] to illustrate the functional safety requirements that are to be traced. We also describe the syntax of the tracing target, which is rooted in the state machine diagram (SMD) of Systems Modeling Language (SysML) [113]. Finally, we show a couple of state-of-the-art methods [81, 103] in establishing the candidate traceability links of the running example, motivating our new tracing approach.

Consider a water distiller intended for use in remote, undeveloped regions where water is generally available but seldom safe to drink, possibly because of viral and bacterial contamination. A distiller unit purifies water via heating; however, an actual solution must consider broad issues like environmental protection, energy conservation, installation cost, and functional safety [51]. IEC 61508 [75] defines functional safety as part of the overall safety relating to the equipment under control (e.g., the water distiller). The goal is to ensure that any safety-related system must work correctly or fail in a predictable, safe way. In our running example, a fault of heated water running low may cause the hazard of leakage or explosion, and a functional safety requirement mitigating the fault can implement the proper safeguards to prevent the water level from staying low. Tracing functional safety requirements supports critical needs such as inspections, assurance, and certification [117].

Figure 5.1: State machine diagram (SMD) of the water distiller example (adapted from [51]). States and their do-activities: s1 Off (power off), s2 Filling (open feed), s3 Warming up (heater on), s4 Level low (open feed), s5 Level ok (shut valves), s6 Level high (open drain), s7 Purging residue (open drain), s8 Building up residue (close drain), and s9 Shutdown (cool down). Transitions t1 to t15 carry labels such as power on, water temp = 100, water level low, water level high, !water level low, !water level high, !sludge ok, drain command, residue command, and shutdown command.

For the tracing target, we concentrate on the SMD modeled in SysML. SysML represents a significant and increasing segment of industrial support for building critical systems [129, 133]. SMD is one of SysML’s behavioral models, and the SMD considered in our work follows the syntax of an extended finite state machine (EFSM) [14]. Specifically, an EFSM consists of states (including an initial state and an exit state) and transitions between states. A transition is triggered when an event occurs and the guard (condition predicate) associated with the transition is evaluated to be true. During a transition, some action (input/output operation, variable manipulation, etc.) may be performed. An answer set can be defined as “a known set of trace links derived prior to tracing experiment, usually prepared by experts” [58]. Deriving these answer sets can be subjective and thus different answer sets might be constructed.
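The EFSM ingredients described above (states, events, guards, and actions) can be illustrated with a minimal sketch. The class and field names are assumptions made for illustration, and the event name `level_changed`, the variable `level`, and the threshold 10 are hypothetical and not taken from Figure 5.1.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Transition:
    name: str
    source: str
    target: str
    event: Optional[str] = None                      # trigger; None means no trigger
    guard: Callable[[Dict], bool] = lambda v: True   # condition predicate over variables
    action: Callable[[Dict], None] = lambda v: None  # side effect performed when firing


@dataclass
class EFSM:
    initial: str
    transitions: List[Transition]
    variables: Dict = field(default_factory=dict)
    current: str = ""

    def __post_init__(self):
        self.current = self.initial

    def fire(self, event: str) -> bool:
        """Take the first transition enabled by `event`; report whether one fired."""
        for t in self.transitions:
            if t.source == self.current and t.event == event and t.guard(self.variables):
                t.action(self.variables)
                self.current = t.target
                return True
        return False


# Hypothetical fragment: leave "Level low" only once the level is high enough.
m = EFSM(
    initial="Level low",
    transitions=[
        Transition("tA", "Level low", "Level ok", event="level_changed",
                   guard=lambda v: v["level"] >= 10,
                   action=lambda v: v.update(feed_open=False)),
    ],
    variables={"level": 3, "feed_open": True},
)
fired_while_low = m.fire("level_changed")  # guard is false, so nothing fires
m.variables["level"] = 12
fired_when_ok = m.fire("level_changed")    # guard is true, so the transition fires
```

Keeping the guard and the action as plain functions over a variable dictionary mirrors the separation between the condition that enables a transition and the effect that the transition performs.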
Figure 5.1 shows a SysML SMD design depicting the intended behavior of the water distiller. In this particular design with 9 states and 15 transitions, one may consider the answer set as a={s4, s5, s7, t5, t9}. Others may consider the answer set as b={s4, s5, t5, t6}. In other words, an inspector who is tasked with assuring the requirement preventing the water level from staying low would want to focus on these model elements (i.e., a or b) because they provide the measures in the design to safely guard against the distiller’s low water level. The set of model elements in the real trace (also known as the answer set¹) is typically determined manually and shall be driven by the safety requirements rather than by the design. For example, an independent inspector not involved in the construction of the SysML models would be interested solely in “water level being low or not” without concerning (or knowing) whether any check on sludge is performed or if there is a shutdown state in the SMD design. In the running example, therefore, we designate “water level” to be the only point of interest and explain how backward slicing [81] and forward slicing [103] work based on this point of interest (slicing criterion).

¹ We define the answer set to be the set of relevant states by excluding the transitions, e.g., {s4, s5, s7}, rather than {t5, t9} or {s4, s5, s7, t5, t9}, is the answer set of our running example shown in Figure 5.1 with respect to the safety requirement: “preventing the water level from staying low.” We discuss the threats to construct validity of this choice in Section 5.5.

• Backward slicing (BS) identifies those model elements that affect “water level” by analyzing the define-use dependencies of this interested data variable. Beginning with the exit state of Figure 5.1, the BS algorithm [81] traverses the SMD backward and selects {s6, s5, s4, s3, s2} to be the model slice.

• Forward slicing (FS) identifies those model elements that are being affected by “water level” via reachability analysis. Beginning with the initial state, the FS algorithm [103] traverses the SMD in a forward manner and returns the reachable subset of {s3, s4, s5, s6, s7, s8, s9} as the model slice.
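The traversal idea behind the two bullets above can be sketched as plain graph reachability. This is a deliberate simplification: the BS algorithm of [81] analyzes define-use dependencies and the FS algorithm of [103] does more than reachability, neither of which is reproduced here. The edge list is hypothetical and does not encode the transitions of Figure 5.1.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source state, target state)


def reachable(edges: Iterable[Edge], start: Iterable[str]) -> Set[str]:
    """Return the states reachable from `start` (inclusive) along the edges."""
    succ: Dict[str, List[str]] = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    start = list(start)
    seen, queue = set(start), deque(start)
    while queue:
        for nxt in succ.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def forward_slice(edges: List[Edge], criterion: List[str]) -> Set[str]:
    """States that can be affected by the criterion states."""
    return reachable(edges, criterion)


def backward_slice(edges: List[Edge], criterion: List[str]) -> Set[str]:
    """States that can affect the criterion states (reachability on reversed edges)."""
    return reachable([(dst, src) for src, dst in edges], criterion)


# Hypothetical five-state graph used only to exercise the functions.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
fwd = forward_slice(EDGES, ["c"])
bwd = backward_slice(EDGES, ["c"])
```

Because both slices are closed under reachability, they tend to include every element that could matter, which is consistent with the high recall and low precision discussed in this chapter.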

It is important to point out that, given the safety requirement of “preventing the water level from staying low”, we follow the essences of the state-of-the-art [81, 103] to generate the candidate traceability links. Consider the two answer sets we derived earlier for the running example, restricted to states: a={s4, s5, s7} and b={s4, s5}. With a, BS achieves recall=67% and precision=40%, whereas FS achieves recall=100% and precision=43%. Applying our approach to the running example returns {s4, s5, s6, s7}, resulting in recall=100% and precision=75%. With b, BS achieves recall=100% and precision=40%, whereas FS achieves recall=100% and precision=28.57%; our approach achieves recall=100% and precision=50%. While the details of our approach will be presented in Section 5.4, we next define the context and scope of the traceable artifact types and their relations.
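The recall and precision values quoted above follow directly from set overlap, as the short sketch below recomputes. The slices and answer sets are the ones stated in the text; only the function and variable names are introduced here for illustration.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], answer: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved size; recall = hits / answer-set size."""
    retrieved, answer = set(retrieved), set(answer)
    hits = len(retrieved & answer)
    return hits / len(retrieved), hits / len(answer)


BS = {"s2", "s3", "s4", "s5", "s6"}                # backward slice
FS = {"s3", "s4", "s5", "s6", "s7", "s8", "s9"}    # forward slice
OURS = {"s4", "s5", "s6", "s7"}                    # result of the mutation-driven approach
ANSWER_A = {"s4", "s5", "s7"}
ANSWER_B = {"s4", "s5"}

for label, answer in (("a", ANSWER_A), ("b", ANSWER_B)):
    for name, slice_ in (("BS", BS), ("FS", FS), ("ours", OURS)):
        p, r = precision_recall(slice_, answer)
        print(f"answer set {label}, {name}: precision={p:.2%}, recall={r:.2%}")
```

For answer set a, for example, BS retrieves 5 states of which 2 are relevant (precision 2/5, recall 2/3), while FS retrieves 7 states of which 3 are relevant (precision 3/7, recall 3/3).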

Figure 5.2: Traceability information contextualizing the artifacts and relations relevant to our approach. Elements shown: under the tracing source, Hazard, Fault, Requirement, and Safety property, linked by “contributes to”, “mitigates”, and “specifies”; under the tracing target, State machine, Transition (incoming and outgoing), Trigger (Signal event, Time event, Change event), Guard, and Effect; “verifies” links the tracing target to the safety property.

5.3 Traceability Information Model

Strategically, defining a traceability information model (TIM) is key to the development of safety-critical systems [92]. A TIM explicitly records what artifacts are important and what others are not under the current traceability consideration. In addition, the traceable artifacts’ relations are expressed in the TIM. In practice, depicting the planned, permitted trace paths in a TIM offers at least two benefits [91]:

• As tracing is a complex task, a TIM provides a guideline to ease its setup and allows for the validation of changes; and

• As traceability is also used by people who did not create it, these people need to know how it has been defined and what to expect from it.

Figure 5.2 presents the TIM underlying our work. While “requirement” is a central artifact type, the “tracing source” of Figure 5.2 shows it is the mitigation of “hazard”-contributing “fault” that gives rise to this specific type of functional safety “requirement”. Referring to the example mentioned earlier, “automatic robotic arm stoppage” (requirement) is needed to mitigate the “velocity sensor failure” (fault), which in turn contributes to the danger of “moving the patient’s arm at an excessive velocity” (hazard) [53, 92]. This shows the human-centric nature surrounding the “tracing source” of Figure 5.2, as the causal chain of reasoning involved in this therapeutic robotic arm case requires domain knowledge and relevant expertise. Methods like fault tree analysis (FTA) [128] and failure modes and effects analysis (FMEA) [142] can facilitate but cannot replace the manual work in safety requirements engineering. Since requirements engineering must span the gap between the informal world of stakeholder needs and the formal world of software systems behavior, the key question over the use of formal methods is not whether to formalize, but when to formalize [112]. SysML’s SMD, practiced in the context of model-driven engineering, embraces the formalization of systems behavior in the design. We thus use one such formal method (namely, model checking [29]) to link the “tracing source” and “tracing target” of Figure 5.2. In our TIM, each SMD design “verifies” one or more “safety properties”. These properties are derived from the functional safety requirements and are formulated into temporal logic formulas amenable to model checking. Our TIM places the derivation and formulation of the safety properties inside “tracing source”, emphasizing the human-centric nature of these activities.
On the “tracing target” side of Figure 5.2 is the SMD, which is typically used in SysML to model the behavior of critical components, such as hardware, software, data, personnel, procedures, and facilities [113]. When inspecting functional safety does not require an entire SMD design, it becomes valuable to identify the specific model elements (i.e., the subset of states and their transitions). This subset represents a model slice of a whole SMD. In this chapter, we refer to the subset as the trace and the elements of the subset as the traceability links with respect to a given safety requirement. Although tracing is aimed at identifying the specific states and transitions, Figure 5.2 shows that the SMD semantics are defined also by “trigger”, “guard”, and “effect”. In our running example of Figure 5.1, water temp = 100 is a “change event” and shutdown command is a “signal event”, both triggering the water distiller to alter its behavior. A “time event”, for instance, could trigger the scheduled maintenance at noon, September 1, 2019 (not shown in Figure 5.1), and on many occasions, a “guard” like !sludge ok specifies the condition that must be true for a transition to happen. Finally, “effect” in Figure 5.2 represents an action invoked directly on the object that owns the state machine as a result of transitioning into a state [113], e.g., open drain or shut valves. The TIM of Figure 5.2 delineates our focuses, e.g., tracing a single safety property over multiple SMDs or over different types of SysML models like internal block and activity diagrams is beyond our current scope. With manual effort in specifying safety properties, our objective is to automatically and accurately slice the SMD design to find the traceability links for the critical requirements.

Figure 5.3: Overview of our mutation-driven traceability approach where mutants are created by modifying the tracing target in small ways to mimic typical modeling errors (1); mutants are then model checked (2) to identify the slice-trace (3).

5.4 Mutation-Driven Traceability

We tackle the low precision challenge faced by contemporary tracing algorithms from a new angle: Rather than striving to define an accurate tracing mechanism, which often ends up with many imperfect links, our core idea is to create many imperfect tracing targets and then take full advantage of them to discover the links. These imperfect tracing targets are mutants of the SMD design, and our entire approach shown in Figure 5.3 is driven by them. An important check is performed before the mutants are generated. This is represented by the decision node (diamond) in Figure 5.3. For a safety property P, making sure that the to-be-traced SMD M satisfies P is of great practical value. In our running example of the water distiller, if the human analyst writes the following LTL formula:

[] ((state == “Level low” → !(<> (state == “Level low”)))) (5.1)

trying to express that, “it is always ([]) the case once the water level is low it will eventually (<>) not be low,” then the SMD of Figure 5.1 fails to satisfy this property. A counterexample, . . . s4 → s5 → s4 . . . , shows the looping structure in the SMD design, and thus formula (5.1) fails the LTL model checking. In these situations, the human analyst shall refine P or M, as shown in Figure 5.3. To capture the safety requirement of our running example, a new property is written:

[] ((state == “Level low” → (<> (state != “Level low”) /\ (state == “Level low”) U (state != “Level low”)))) (5.2)

to assert: “once the water level is low, it always becomes not low eventually after being low for some time.” This property is now met by the SMD of Figure 5.1, leading our approach to the creation of SMD mutants (step 1 of Figure 5.3). The main reason for requiring M to satisfy P is that, if M fails P, then M already does not implement the requirement, so no tracing should be performed. Our automated implementation is built with the help of the ProM tool [123], especially its LTL model checker operated on the logged events in a .csv file. In Figure 5.3, model checking P on both M and M’ is therefore performed with ProM. Our own implementations include a Python script to mutate M in its xmi form and a diff procedure to generate the candidate traceability links. Next, we discuss in more detail the three major steps of our approach shown in Figure 5.3.
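The role of the decision node can be pictured with a small finite-trace check. The sketch below is an illustration under stated assumptions, not our ProM-based implementation: it reads the intent of formula (5.2) over finite state sequences in plain Python, and every state name other than “Level low” is hypothetical.

```python
from typing import List

LOW = "Level low"  # the only state name taken from the running example


def eventually_leaves_low(trace: List[str]) -> bool:
    """Finite-trace reading of formula (5.2): every visit to LOW is
    followed, later in the same trace, by some state other than LOW."""
    for i, state in enumerate(trace):
        if state == LOW and all(s == LOW for s in trace[i + 1:]):
            return False  # the trace ends while the level is still low
    return True


def should_mutate(traces_of_m: List[List[str]]) -> bool:
    """Decision node of Figure 5.3: create mutants only if every
    logged trace of the original model M satisfies the property."""
    return all(eventually_leaves_low(t) for t in traces_of_m)
```

With this reading, a trace that ends in “Level low” is rejected, so the analyst would refine P or M before any mutant is created.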

5.4.1 Creating SMD Model Mutants

In software testing, mutants are the results of deliberately seeding faults into the original program. The mutants can then be used to assess the quality of a test set: the more faults detected (or the more mutants killed), the more effective the test set. In mutation testing, only faults constructed from several simple syntactic changes are applied. A key tenet here is that [42]: “Test data that distinguishes all programs differing from a correct one by only simple errors is so sensitive that it also implicitly distinguishes more complex errors.” One of the first sets of mutation operators was implemented in the Mothra system [78] and contained 22 operators, ranging from logical connector replacement to statement deletion. At any rate, mutants of a program are created based on a few simple faults representing the mistakes that programmers often make [76]. Extending mutation analysis, we survey the literature to identify the mistakes commonly made in SMD modeling [4, 6]. Our survey focuses on practices over sizable models relevant to critical domains. We also favor the mistakes reported in common by different studies. We define 15 mutation operators and list them in Table 5.1. These operators are drawn from the SMD modeling mistakes discussed in five papers. Table 5.2 maps the sources to the operators.

Table 5.1: State Machine Diagram (SMD) Mutation Operators

| Category | ID | Mistake Description | Sample Mutation Operation on the SMD of Figure 5.1 |
|---|---|---|---|
| state | mo1 | a state is subsumed by another state | adding an “ensuring off” state (s0) right after the initial state causes s0 to be subsumed by s1 |
| state | mo2 | a state that should be modeled is missing | removing s5 |
| state | mo3 | a state has incorrect transition(s) | adding a self-looping transition to s1 |
| transition | mo4 | a transition comes from or goes into a wrong place | changing the direction of t5, i.e., flipping t5 |
| transition | mo5 | a transition that should be modeled is missing | removing t9 |
| transition | mo6 | a transition is subsumed by another transition | adding a “!water level low” transition (t16) from s6 to s5 causes t16 to be subsumed by t8 |
| transition | mo7 | a transition is modeled without trigger | removing “shutdown command” on t13 |
| guard | mo8 | a guard has incorrect condition | changing “!sludge ok” to “sludge ok” on t9 |
| guard | mo9 | a guard refers to an undefined variable | adding “humidity ok” to t9 with “humidity” undefined in SysML’s block definition diagram |
| trigger | mo10 | expression is incorrect | changing “water temp = 100” to “water temp < 100” on t4 |
| trigger | mo11 | time event is incorrect | changing scheduled maintenance from “noon, September 1, 2019” to “every 30 minutes” (not shown in Figure 5.1) |
| trigger | mo12 | signal event is incorrect | changing “power on command” to “power off command” on t2 |
| trigger | mo13 | change event is incorrect | changing “water temp = 100” to “water temp != 100” on t4 |
| effect | mo14 | effect refers to an undefined variable | adding “open humidifier” to s6 with “humidifier” undefined in SysML’s block definition diagram |
| effect | mo15 | state invariant, do, entry and/or exit are incorrect | changing “open drain” to “close drain” on s7 |

We group the 15 mutation operators into five categories according to the “tracing target” of our TIM in Figure 5.2. For “state”, “transition”, “guard”, “trigger”, and “effect”, there exist three, four, two, four, and two operators respectively. These categories show where to mutate, whereas the “mistake description” column of Table 5.1 explains how to mutate. The rightmost column of Table 5.1 illustrates each mutation operator with a sample operation performed on the SMD design of the water distiller running example. Similar to mutating a program, the SMD mutation operators of Table 5.1 are syntactic modifications of insertion (adding), replacement (changing), or deletion (removing). Different from mutating a program that is textual, we automatically mutate the graphical SMD by first exporting the model into an xmi file. We perform this step in the Cameo MagicDraw tool [110].
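To make the idea of mutating an exported model concrete, the following sketch applies mo4 (flipping a transition) to an XML fragment with Python's standard `xml.etree.ElementTree`. It is a simplified illustration and not our actual script: the element and attribute names (`transition`, `id`, `source`, `target`), the state name “Filling”, and the assumption that t5 runs from s4 to s5 are hypothetical stand-ins, and a real MagicDraw export uses namespaced XMI attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for an exported M.xmi.
SAMPLE_XMI = """<model>
  <region>
    <subvertex id="s4" name="Level low"/>
    <subvertex id="s5" name="Filling"/>
    <transition id="t5" source="s4" target="s5"/>
  </region>
</model>"""


def flip_transition(xmi_text: str, transition_id: str) -> str:
    """Return a mutant in which the given transition has its source
    and target swapped (one syntactic change in one location)."""
    root = ET.fromstring(xmi_text)
    for tr in root.iter("transition"):
        if tr.get("id") == transition_id:
            src, tgt = tr.get("source"), tr.get("target")
            tr.set("source", tgt)
            tr.set("target", src)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no transition with id {transition_id!r}")


mutant_xmi = flip_transition(SAMPLE_XMI, "t5")
```

Because the original text is parsed afresh on every call, each call yields an independent mutant that differs from M by exactly one change.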

Table 5.2: Mutation Operators of Table 5.1 Grounded in the Literature of SMD Modeling

| Source | Domain (model size) | Mutation Operators |
|---|---|---|
| Aichernig et al. [1] | Automotive (19 states, 39 transitions) | mo8, mo10, mo11, mo15 |
| Ali et al. [10] | Video Conferencing (13 states, 18 transitions) & Elevator Control (10 states, 14 transitions) | mo1–mo6, mo8, mo10, mo12, mo13, mo15 |
| Briand et al. [22] | Production Cell System (8 states, 10 transitions) | mo2–mo5, mo7, mo8, mo11–mo13, mo15 |
| Choppy and Reggio [28] | Library System (6 states, 9 transitions) | mo1, mo2, mo7, mo9, mo10, mo15 |
| Mi and Ben [98] | Control System (32 states, 45 transitions) | mo1, mo3–mo6, mo8, mo13, mo15 |

Figure 5.4: Event log snippet showing: (1) the SMD of Figure 5.1 (top records whose case ID=“original”), (2) the mutant resulting from flipping t5 (shaded records whose case ID=“mo4 t5”), and (3) the syntactic change of t5 flipping (dotted box).

We developed a Python script to modify M.xmi by removing an existing model element or changing its syntactic property. Due to the rather complex subsumption relations involved in mo1 and mo6, they are not implemented in our current Python script. Each resulting M’.xmi corresponds to one single syntactic change in one location (i.e., applying only one mutation operator), though the same operator at one location may generate more than one mutant, e.g., mo10 applied to t4 (“water temp = 100”) of Figure 5.1 outputs five mutants by replacing “=” with “<”, “≤”, “!=”, “≥”, and “>”. Our current SMD mutation implementation aims to be comprehensive, as our goal is to use M’ for tracing; selective mutation is investigated experimentally in Section 5.5. Our tool building is informed by our state-of-the-practice survey on in-place traceability for engineering automated production systems [153]. Moreover, our tool [3] has successfully converted the mutated xmi into an event log fully compatible with the ProM process mining tool.
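The one-operator, many-mutants case of mo10 can be sketched as follows. This is a minimal illustration, assuming trigger expressions have the simple form “left operator right”; the function name and the ASCII spellings `<=` and `>=` for “≤” and “≥” are choices made for this example only.

```python
import re
from typing import List

# Relational operators considered by this sketch (ASCII spellings).
RELATIONAL_OPS = ["=", "<", "<=", "!=", ">=", ">"]

# Two-character operators are listed first so that "<=" is not read as "<".
_EXPR = re.compile(r"(.+?)\s*(<=|>=|!=|=|<|>)\s*(.+)")


def mutate_relational(expr: str) -> List[str]:
    """Return one mutant per alternative relational operator."""
    match = _EXPR.fullmatch(expr)
    if match is None:
        raise ValueError(f"not a relational expression: {expr!r}")
    left, op, right = match.groups()
    return [f"{left} {alt} {right}" for alt in RELATIONAL_OPS if alt != op]


mutants = mutate_relational("water temp = 100")
```

Applied to the trigger of t4, the call produces five expressions, one for each of the five replacement operators named above.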

5.4.2 Verifying Model Mutants

Once the SMD mutants are created, they undergo model checking so as to automatically verify the safety property P. An innovative aspect of our implementation is to leverage LTL model checking within process mining (i.e., the ProM tool [123, 147]). Process mining employs data mining algorithms to extract operational knowledge from event logs [146]. These event logs record instances (or “cases”) of some underlying process (e.g., that of granting sabbatical), but automatically extracting that process is difficult when there is a lot of flexibility [147]. Our model mutants are a good fit for process mining in that the flexibility of each mutant is restricted to a single, simple, and syntactic change over the original SMD M. Figure 5.4 shows a sample event log recording our running example’s SMD, and for comparison purposes, one mutant’s records are also displayed (namely, mo4 applied to t5). Figure 5.4 highlights the way that our implementation uses “case ID” to group all the activities belonging to the same SMD. With this “case ID” mechanism, we convert M’.xmi into an event log file M’.csv without including the original SMD model M in it. Although process mining techniques such as the alpha algorithm can extract one model directly from M’.csv in forms like a Petri net [146], we are interested in obtaining two models based on ProM’s LTL model checking of P on M’.csv [147]: one underlying all the killed mutants (K) and the other for all the survived ones (S). We mark a mutant as killed if its model checking fails (i.e., the injected fault causes P to no longer be satisfied²); otherwise, the mutant survives. One of the killed mutants in the running example is “flipping t5”, shown at the bottom of Figure 5.4. Compared to M, which allows the water level to be low for some time before not being low eventually, the “flipping t5” mutant fails to meet the safety property P expressed in formula (5.2). For instance, a traversal containing “. . . s4 → s5 → s4 → s7 . . . ”, which is permitted in M, is no longer valid due to the injected fault. Thus, model checking P on M’ effectively distinguishes the faults directly violating the safety property from the remaining faults whose negative effects are not observed via automated verification.
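The “case ID” grouping and the killed/survived split can be pictured with the sketch below. It rests on several assumptions that are not part of our implementation: the column names `case_id`, `order`, and `activity` are invented for illustration, each mutant is reduced to a single trace, and a plain Python predicate stands in for ProM's LTL model checker.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def write_event_log(mutants: Dict[str, List[str]]) -> str:
    """Serialise one trace per mutant into CSV rows keyed by case ID."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["case_id", "order", "activity"])
    for case_id, trace in mutants.items():
        for order, activity in enumerate(trace):
            writer.writerow([case_id, order, activity])
    return buf.getvalue()


def read_cases(csv_text: str) -> Dict[str, List[str]]:
    """Group the logged activities by case ID, preserving row order."""
    cases: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cases[row["case_id"]].append(row["activity"])
    return dict(cases)


def partition(
    cases: Dict[str, List[str]],
    satisfies: Callable[[List[str]], bool],
) -> Tuple[Set[str], Set[str]]:
    """Split case IDs into killed (property violated) and survived."""
    killed = {cid for cid, trace in cases.items() if not satisfies(trace)}
    return killed, set(cases) - killed
```

A usage example would pass the traces of all mutants through `write_event_log`, read them back with `read_cases`, and hand `partition` whichever property check is available; the two returned sets correspond to the inputs from which K and S are mined.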

5.4.3 Identifying Slice-Trace

Recognizing killed versus survived mutants allows our approach’s final step to identify the candidate traceability links. For a given requirements specification expressed in LTL, we refer to its links as the set of corresponding states in the SMD. Algorithm 1 presents our steps to slice M by contrasting K and S. We rely on ProM’s correlation analysis over a mined process model³: For a pair of states ⟨si, sj⟩, a correlation score between −1 and 1 is produced, and if the score is greater than 0, less than 0, or equal to 0, then it indicates the co-occurrence of si and sj is strong, weak, or unknown respectively. Figure 5.5 visualizes the correlation analysis results over our running example’s K and S.

² Recall that the original SMD M satisfies P with the same ProM-based LTL model checking mechanism; otherwise, no mutant will be created. Such a control is elaborated by the decision node of Figure 5.3.

³ ProM’s correlation calculation automatically decides four relations between each state pair ⟨si, sj⟩: (i) sj directly follows si, (ii) sj sometimes follows si but never the other way around, (iii) sj sometimes follows si and sometimes the other way around, and (iv) si and sj do not follow each other [145].

Algorithm 1 Identifying model elements from the SMD design to be the candidate traceability links.
Input: original SMD M, process model of killed mutants K, process model of survived mutants S
Output: set of candidate traceability links L
Procedure
1. L ← all the states s of M
2. For each pair of states ⟨si, sj⟩ ∈ M, with correlation_K(⟨si, sj⟩) > 0
3.     If correlation_K(⟨si, sj⟩) > threshold_K AND correlation_K(⟨si, sj⟩) − correlation_S(⟨si, sj⟩) > 1
4.         Mark both states {si, sj} with “remove”
5.     Else
6.         Mark both states {si, sj} with “do-not-remove”
7. L ← L \ states being marked AND being marked only with “remove”
8. Return L

Figure 5.5: Correlation analysis of the running example’s SMD mutants: (a) killed mutants K and (b) survived mutants S (black cell shows the correlation is unknown).

We illustrate our slice-trace identification with the correlation analysis results of Figure 5.5. To maintain a high recall value, our tracing algorithm initializes L with all the states of M, i.e., after line #1 of Algorithm 1, L = {s1, s2, . . . , s9}. We then check the pairs of states having only positive correlation in the killed mutants (K) and ignore all the other pairs. The rationale is to focus only on the commonly occurring state pairs causing the property to fail. For Figure 5.5a, line #2 selects the following pairs to examine: , , , , , , , , , , and . For each of the above selected state pairs, we mark both states with “remove” if the two conditions shown in line #3 of Algorithm 1 are met:

• The correlation score of ⟨si, sj⟩ in K is greater than a positive degree (i.e., threshold_K), implying that si and sj tend to co-occur in the killed mutants; and

• Such co-occurrence is significantly weaker among the survived mutants S.

We operationalize the latter condition by checking if the difference between the correlation score of ⟨si, sj⟩ in K and that in S is larger than 1, as this difference is sufficient to reverse the correlation strength in the [−1, 1] scale. Note that if ⟨si, sj⟩ has a correlation score of 0 (unknown co-occurrence) in either K or S, then line #3 of Algorithm 1 is evaluated to be false, i.e., {si, sj} are marked with “do-not-remove”. For Figure 5.5, the results of adopting threshold_K = 0.5 as in ProM are:

• s1 is marked with “remove” from and with “remove” from ;

• s2 is marked with “remove” from and with “remove” from ;

• s3 is marked with “remove” from and with “remove” from ;

• s4 is marked with “remove” from and with “remove” from , and with “do-not-remove” from ;

• s5 is marked with “do-not-remove” from , with “remove” from , and with “do-not-remove” from ;

• s6 is marked with “do-not-remove” from , with “do-not-remove” from , with “do-not-remove” from , and with “do-not-remove” from ;

• s7 is marked with “do-not-remove” from , and with “remove” from ;

• s8 is marked with “remove” from , with “remove” from , and with “remove” from ; and

• s9 is marked with “remove” from .

Having made the marks on the examined states, we remove from L those states that received marks and received only “remove” marks. Line #7 of Algorithm 1 therefore removes s1, s2, s3, s8, and s9, and line #8 returns L = {s4, s5, s6, s7}. The candidate traceability links returned by our algorithm lead to recall=100% and precision=75%; however, this performance is achieved only for tracing the property of formula (5.2) to the SMD design of Figure 5.1. The next section evaluates our approach quantitatively.
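The marking logic of Algorithm 1 can be sketched in Python as follows. This is one possible reading under stated assumptions rather than our actual diff procedure: correlation scores are passed in as plain dictionaries keyed by state pairs (in our approach they come from ProM), a pair missing from the S dictionary is treated as having a score of 0, and the scores in the usage example are invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def slice_trace(
    states: Iterable[str],
    corr_k: Dict[Pair, float],
    corr_s: Dict[Pair, float],
    threshold_k: float = 0.5,
) -> Set[str]:
    """Return the candidate links L: all states of M minus those
    that were marked, and marked only with 'remove'."""
    marks: Dict[str, Set[str]] = {}
    for pair, k in corr_k.items():
        if k <= 0:  # line 2: examine only pairs positively correlated in K
            continue
        s = corr_s.get(pair, 0.0)  # assumption: missing pair means unknown (0)
        # line 3; a score of 0 in S makes the condition false
        remove = s != 0 and k > threshold_k and (k - s) > 1
        for state in pair:
            marks.setdefault(state, set()).add(
                "remove" if remove else "do-not-remove"
            )
    removed = {st for st, m in marks.items() if m == {"remove"}}  # line 7
    return set(states) - removed


# Hypothetical scores for a four-state model.
K = {("s1", "s2"): 0.9, ("s2", "s3"): 0.8, ("s3", "s4"): 0.3, ("s1", "s4"): -0.2}
S = {("s1", "s2"): -0.4, ("s2", "s3"): 0.5, ("s3", "s4"): -0.9}
links = slice_trace(["s1", "s2", "s3", "s4"], K, S)
```

In this invented example only s1 carries nothing but “remove” marks, so the returned slice is {s2, s3, s4}; s2 is kept because one of its pairs earned it a “do-not-remove” mark.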

5.5 Experimental Evaluation

5.5.1 Research Questions

We set out to answer three research questions.

RQ1: How accurate is our proposed mutation-driven traceability approach?

While overcoming the low precision challenge is our primary goal, we do not want our approach to hurt recall. Our measures for RQ1 also include F1, which is the harmonic mean of recall and precision, defined as:

$$F_1 = 2 \times \frac{\mathit{Precision} \times \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}} \qquad (5.3)$$
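As a small worked illustration of these measures, the sketch below computes precision, recall, and F1 from a set of candidate links and an answer set. The candidate set is the slice returned for the running example in Section 5.4.3; which three of its four states form the answer set is an assumption made here only so that the numbers match the reported recall of 100% and precision of 75%.

```python
from typing import Set, Tuple


def precision_recall_f1(
    candidates: Set[str], answer_set: Set[str]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and their harmonic mean (F1)."""
    true_links = len(candidates & answer_set)
    precision = true_links / len(candidates) if candidates else 0.0
    recall = true_links / len(answer_set) if answer_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Candidate links from the running example; the answer set is hypothetical.
p, r, f1 = precision_recall_f1({"s4", "s5", "s6", "s7"}, {"s4", "s5", "s7"})
```

Under this assumption the precision is 0.75, the recall is 1.0, and F1 is 2 × 0.75 × 1.0 / 1.75, i.e., about 0.857.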

We use the state-of-the-art BS [81] and FS [103] algorithms introduced in Section 5.2 as baselines for accuracy comparisons.

Table 5.3: Subject System Characteristics (integers represent total numbers whereas decimal numbers represent the averages)

| | Tracing Source | | | | | Tracing Target | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Subject System | hazards | faults | req.s | properties per req. | variables per property | SMD | states per SMD | transitions per SMD | properties per SMD | states per answer set |
| Adaptive Cruise Control (ACC) | 3 | 6 | 10 | 2.90 | 2.09 | 11 | 13.63 | 21.81 | 2.72 | 4.40 |
| Power System (PS) | 5 | 11 | 17 | 1.88 | 2.28 | 15 | 14.25 | 17.33 | 2.07 | 4.85 |

Figure 5.6: One SMD design of the adaptive cruise control (ACC) under our study.

RQ2: How to best operate our approach in practical settings?

As our automated implementation is built upon the ProM tool, we investigate here the influence of the threshold_K value on tracing accuracy. Discovering an optimal threshold range can readily transfer our approach into ProM tooling, especially in terms of delivering a new traceability service with a calibrated correlation analysis.

RQ3: How can selective mutation be instrumented in our approach?

Because mutation’s cost is not negligible, reducing effort is of practical value. RQ3 thus examines one way toward selective mutation informed by feature ablation [55]: by removing the mutation operators in a specific category (e.g., “state”), we are interested in how the tracing accuracy changes.

5.5.2 Subject Systems

Our experiments are carried out in the context of two subject systems from the automotive domain. We choose these systems due to the relevant discussions of the safety requirements and the availability of the SMD design models.

• Adaptive Cruise Control (ACC) [100] of a vehicle consists of several components that interact in real time. A critical component of the ACC is the speed controller whose function is to take over the task of maintaining a constant speed at the driver’s request. Once the speed controller is adjusted, the ACC is activated and supports the throttle control. After the ACC is activated, it can be suspended and restarted by the driver through pressing the suspend/resume button, or the brake pedal. While suspended, the system must memorize the desired speed. In ACC, the user data and the sensor data are read at the input. These data are used to set the new value of the desired speed, which will be compared to the current speed. The result of this comparison is used to define the adjustment value of actuator output.

• Power System (PS) [137] of a vehicle consists of mechanical and electronic components. Among the critical parts under PS’s control are the gearbox and the switch between gears based on input from multiple sensors as well as data provided by the control module of the engine. The control module of the PS then processes these inputs to calculate how and when to shift gears in the transmission and generates the signals that drive actuators to perform this shift.

The traceability-related characteristics of the two subject systems are shown in Table 5.3. On the “tracing source” side, we manually performed an FMEA and identified a few hazards and several contributing faults of those hazards. We further derived functional requirements to mitigate the faults and formulated LTL properties based on the safety requirements. In ACC, for instance, []!(speed>160 ∧ state==“set speed”) assures the cruise value cannot be set while the vehicle’s speed is high, which mitigates the risk of accident. The “tracing target” parts of Table 5.3 show that the average size of the SMD designs of ACC and PS is about 14 states and 17–22 transitions. Compared to the SMD sizes listed in Table 5.2, our subject systems’ models are similar to the models studied in Ali et al. [10], and notably the video conferencing models of [10] were successfully applied at Cisco for the purpose of robustness testing. One of the representative SMD designs of ACC is shown in Figure 5.6. For each SMD in our study, Table 5.3 shows that around 2–3 safety properties are verified via model checking. Finally, a software and systems expert working in the safety engineering domain, who has more than 15 years of industrial experience, manually constructed the answer set for each of the checked properties. This was done not as a vetting task [37, 94] but as an independent design review task without input from any automated traceability tools. The rightmost column of Table 5.3 suggests that, on average, only about one third of the states are the real traces. The main objective of automated tracing methods is therefore to identify all the real traces and only the real traces.

Table 5.4: Tracing Accuracy (BS refers to backward slicing [81], FS refers to forward slicing [103], and MD refers to our mutation-driven approach)

| | ACC recall | ACC precision | ACC F1 | PS recall | PS precision | PS F1 |
|---|---|---|---|---|---|---|
| BS | 92.3% | 39.1% | 54.90% | 74.5% | 45.8% | 56.70% |
| FS | 100% | 37.6% | 54.65% | 100% | 34.1% | 50.85% |
| MD | 100% | 48.9% | 65.68% | 100% | 50.6% | 67.19% |

5.5.3 Results

The answers to RQ1 on tracing accuracy are summarized in Table 5.4 where the average recall, precision, and F1 values are reported. The BS algorithm [81] misses a few true links in ACC and quite a few in PS. One reason is that, when a state has multiple outgoing transitions, slicing backward from the exit state often fails to cover those branches that require forward tracing. For example, in PS’s tracing of a liveness property ensuring braking is applied when RPM is close to 4000, BS is unable to reach the braking paths of the SMD and covers only the paths by following RPM’s define-use dependencies. In contrast, the FS algorithm [103] achieves 100% recall based on its reachability analysis; however, its results are noisy because the forward propagation is not halted early enough. Our approach does not rely on structural dependencies and therefore does not face the challenges of which direction to slice and when to stop slicing. As shown by the comparison with FS in Table 5.4, using model checking to dynamically verify the SMD and its mutants improves the average precision by over 10 percentage points (from 37.6% to 48.9% for ACC and from 34.1% to 50.6% for PS) without compromising the tracing coverage at recall=100%.

The key question RQ2 addresses is which value threshold_K of Algorithm 1 should take in practical settings when no answer set is available. To this end, we calibrate threshold_K and measure how it impacts the tracing accuracy. As recall is maintained at the 100% level throughout the calibration, Figure 5.7 plots the impacts only on precision. Using ProM’s default value of 0.5 to identify the positively correlated states from the killed mutants K turns out not to be threshold_K’s optimal value. Figure 5.7 suggests that threshold_K’s range of [0.6, 0.8] further improves the average precision to 75.4% for ACC and 72.8% for PS. An explanation is that higher values of threshold_K facilitate the condition, correlation_K(⟨si, sj⟩) − correlation_S(⟨si, sj⟩) > 1, to be met; however, too high a value will cause few states to satisfy correlation_K(⟨si, sj⟩) > threshold_K. Therefore, to fully capitalize on our mutation-driven traceability approach, one can try to set threshold_K ∈ [0.6, 0.8], especially when practiced in the ProM tool chain.

Figure 5.7: Calibrating threshold_K of our approach (x-axis: threshold_K; y-axis: precision in %; series: ACC and PS).

RQ3 is aimed at exploring ways to reduce the cost of our approach. As our current implementation favors comprehensiveness, close to one million mutants are generated for the 26 SMD models in our study. In our experiments, creating a million mutants with our Python script took about 9 days, and model checking them within ProM took roughly 2 hours. If fewer mutants could deliver comparable tracing accuracies, then the findings will have both theoretical and practical implications. To investigate selective mutation empirically, we adopt the idea of feature ablation. In machine learning, feature ablation is designed to assess the informativeness of a feature group by quantifying the change in predictive power when comparing the performance of an approach trained with all the feature groups versus the performance without a particular feature group [55]. Following the SMD mutation operators defined in Table 5.1, we remove one category (feature group) at a time, and the ablation results are shown in Figure 5.8. When all the operators are applied in ACC and PS, the average precision is at the 74% level with the optimal threshold_K for each subject system. Using the same optimal values of threshold_K, Figure 5.8 depicts the precision drop as well as the drop in the number of mutants. Unsurprisingly, these two drops are proportional: the fewer mutants generated, the lower the precision of the tracing results. This reinforces the cost-benefit tradeoff: the more savings in cost, the worse the performance becomes. Surprisingly, even though the performance is measured on finding the subset of states, removing “state” operators results in the least precision decrease. This shows the importance of tracing context, i.e., in order to find the model elements (e.g., subset of the states), it could be more cost-effective to mutate other elements (especially “transition” and/or “trigger” according to Figure 5.8) which provide the semantics of the elements of interest (“states”).
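The ablation setup can be sketched as a simple enumeration of operator sets, one per experimental condition of Figure 5.8. The grouping follows the categories of Table 5.1; the function name and the condition labels are illustrative choices, and the sketch only lists which operators would remain enabled without generating or checking any mutant.

```python
from typing import Dict, Iterator, List, Tuple

# Operator categories as grouped in Table 5.1.
CATEGORIES: Dict[str, List[str]] = {
    "state": ["mo1", "mo2", "mo3"],
    "transition": ["mo4", "mo5", "mo6", "mo7"],
    "guard": ["mo8", "mo9"],
    "trigger": ["mo10", "mo11", "mo12", "mo13"],
    "effect": ["mo14", "mo15"],
}


def ablation_conditions() -> Iterator[Tuple[str, List[str]]]:
    """Yield the full operator set, then one set per removed category."""
    all_ops = [op for ops in CATEGORIES.values() for op in ops]
    yield "all", all_ops
    for category, ops in CATEGORIES.items():
        yield f"{category} removed", [op for op in all_ops if op not in ops]


conditions = dict(ablation_conditions())
```

Each of the six conditions would then drive a separate mutation and tracing run, with precision and mutant counts compared against the “all” condition.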

5.5.4 Threats to Validity

We discuss some of the most important factors that must be considered when interpreting our experimental results. A threat to construct validity is our choice of measuring the tracing accuracy based only on the relevant states; in particular, we exclude the transitions in the answer set definition. The states and their transitions are clearly related, and our rationale is not to over-penalize an automated and scalable method for missing correct model elements or returning incorrect ones. The feedback from the expert devising the answer set recommends using transitions as anchoring constructs because “they inject the state machine with life stories.” A threat to internal validity concerns the quality of the safety properties expressed in LTL formulas. As we manually performed this task, we relied on the decision node of “model checking P on M” in Figure 5.3 to make sure the SMD design models [100, 137] met the LTL properties. A confounding factor is the number of variables (e.g., “speed”, “RPM”, etc.) expressed in the properties, as the variables play a key role in our operations of the BS [81] and FS [103] algorithms. Table 5.3 shows that each LTL property contains an average of two variables; however, the tracing accuracy of BS, FS, and our own approach is likely to be influenced by this number and our results must be interpreted with this in mind. As far as the external validity is concerned, our experimental subjects are both drawn from the automotive domain, and therefore applying our approach to systems in other safety-critical domains will be valuable.

Figure 5.8: Ablation results of removing one and only one category of mutation operators (conditions: All, State removed, Transition removed, Guard removed, Trigger removed, Effect removed; axes: precision (%) and number of mutants).

5.6 Summary

In this chapter, we have presented a mutation-driven approach to tracing safety requirements and SMD in SysML modeling. Not only is the conceptual framework depicted, but an automated implementation based on model checking within process mining is developed. Experimental results show the precision improvements. The novelty of our work lies in our creation of many imperfect tracing targets, leading to new ways of establishing traceability links. The next chapter will conclude the thesis, describe the limitations, and discuss future work.

Chapter 6

Conclusions and Future Work

In this chapter, we summarize our main contributions, describe current limitations, and provide directions for future work.

6.1 Thesis Summary

After reviewing the background and related work in Chapter 2, and presenting our empirical investigation of using textual cues in the context of SysML modeling and the limitations of such approaches in Chapter 3, we present the following contributions:

• In Chapter 4, we conducted a systematic literature mapping of SysML common modeling mistakes. From 19 primary studies, we identified 42 mistakes and showed which SysML diagrams are subject to the modeling mistakes. Since our mapping study emphasizes industrial relevance, we adopted a hierarchy from our earlier literature review to assess the evidence level of all the selected primary studies. This hierarchy ranges from “no evidence” and “evidence obtained from working out toy examples” on the weaker end to “evidence obtained from industrial case studies” and “evidence obtained from industrial practice” on the stronger end. Moreover, we discussed the implications of our findings to model-driven requirements engineering.

• In Chapter 5, we presented our mutation-driven approach to tracing safety requirements and SMD in SysML modeling. Since there are many artifacts to trace, we first defined a traceability information model (TIM) that underlies our work. The TIM explicitly records which artifacts are important and which are not under the current traceability consideration. Mäder et al. [92] provided an actionable checklist to best practice requirements traceability in safety-critical projects. Among the checklist’s ten items, our approach explicitly considers six by clearly defining the TIM, offering tool support, and generating traces as slices. Our approach has three main components: creating mutants, verifying model mutants via model checking, and identifying trace links. The identified mistakes of SMD in Chapter 4 are translated into mutation operators and then used to create mutants. We developed a tool that automatically creates mutants of an SMD in its xmi form. We then converted all these mutants to an event log where each mutant is represented by a case ID. We then leveraged LTL model checking within process mining to verify safety requirements. Checking the requirements over the mutants leads to the distinction between the killed and survived mutants. We leveraged the underlying killed-survived distinction and developed a correlation analysis procedure to identify the traceability links. Finally, we evaluated our approach using two subject systems from the automotive domain.

6.2 Limitations

In this section, we highlight the limitations of our work.

• Our approach focuses on identifying the trace links between a safety requirement and SMD. Tracing requirements may span different SysML diagrams, e.g., activity diagrams. We have not yet attempted to include other behavioral (e.g., activity diagrams) or structural diagrams (e.g., block definition diagrams). This limitation can be addressed by translating an activity diagram to an event log. As for structural diagrams, some effort is needed to translate them to a format acceptable by process mining tools. One could include the blocks in the event logs during translation. For example, a block in the block definition diagrams that executes the SMD can be included in the event log by extracting the package element (i.e., block name) in the xmi file which owns the SMD under test.

• Currently, the scope of our approach is to trace a safety requirement to a single SMD. We believe the scope of work can be readily extended in future work to include multiple SMDs.

• Some requirements cannot be translated to LTL. Our intention is to focus only on the functional safety requirements that can be converted to LTL formulas.

• Formalizing safety properties is often a non-trivial task. Although the use of model checking approaches has been shown to improve the dependability of high assurance systems, practitioners have yet to adopt them. One reason is that LTL formulas expressing software properties can be difficult to write and understand. However, Dwyer et al. [44] addressed this challenge concerning practitioners unfamiliar with temporal logic notations. To facilitate this process, the authors developed a pattern-based approach [97] to the presentation, codification and reuse of property specifications. Moreover, a method for analyzing the semantic roles of safety patterns, derived from the OMG SysML specification, has been proposed to support practitioners in formulating safety properties [107].

6.3 Future Directions

In this section, we outline some of the research threads that we intend to follow in our future work.

Supporting multiple SMDs and different types of SysML diagrams. One of the shorter-term focuses of our future research is to extend our approach to include multiple SMDs and different SysML diagrams. This would help safety inspectors trace requirements not only to state-based design models but also to other types of SysML diagrams.

Supporting visualization. We plan to improve the visualization of trace links in the form of a model fragment. Process mining based conformance checking approaches could be performed on both the survived and killed process models. For example, process discovery could be applied to the survived process. The survived model is then “repaired” such that the model becomes more precise for the “killed” process. Such algorithms could help remove from the model the elements that are not impacted.

Improving usability. Tracing is a means, not an end. We believe the theories encapsulated in the survived and killed mutants will help discover relevant environmental assumptions: both positive assumptions from the requirements satisfaction cases and negative assumptions from the requirements violation cases. Our aim is to develop data mining algorithms, such as decision trees, in which many decision points could be discovered for both positive and negative assumptions.
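
As a rough illustration of the decision-tree idea, the sketch below picks the single most informative decision point (a one-level decision tree) from a table of mutants labeled as killed or survived, using information gain. It is only a sketch under stated assumptions: the feature names (`operator`, `state`), their values, and the labels are hypothetical and do not come from the subject systems, and a full decision tree learner would apply the same split recursively.

```python
from collections import Counter
from math import log2
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], labels: List[str], feature: str) -> float:
    """Reduction in label entropy obtained by splitting on `feature`."""
    groups: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows: List[Dict[str, str]], labels: List[str]) -> str:
    """Feature with the highest information gain (first one wins ties)."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical mutant descriptions: which operator was applied, and where.
rows = [
    {"operator": "remove_transition", "state": "Braking"},
    {"operator": "remove_transition", "state": "Cruising"},
    {"operator": "change_guard", "state": "Braking"},
    {"operator": "change_guard", "state": "Cruising"},
]
# Hypothetical outcome of checking one safety requirement over each mutant.
labels = ["killed", "survived", "killed", "survived"]

decision_point = best_split(rows, labels)
```

Here the hypothetical `state` feature separates killed from survived mutants perfectly, so it would be reported as the first decision point.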

Bibliography

[1] B. K. Aichernig, J. Auer, E. Jöbstl, R. Korosec, W. Krenn, R. Schlick, and B. V. Schmidt. Model-based mutation testing of an industrial measurement device. In International Conference on Tests and Proofs (TAP), pages 1–19, York, UK, July 2014.

[2] B. K. Aichernig, H. Brandl, E. Jöbstl, W. Krenn, R. Schlick, and S. Tiran. Killing strategies for model-based mutation testing. Software Testing, Verification & Reliability, 25(8):716–748, December 2015.

[3] M. Alenazi, N. Niu, and J. Savolainen. A process mining approach to improving defect detection of SysML models. In International Conference on Automated Software Engineering (ASE): Late Breaking Results (LBR) Track, San Diego, CA, USA, November 2019.

[4] M. Alenazi, N. Niu, and J. Savolainen. SysML modeling mistakes and their impacts on requirements. In 27th IEEE International Requirements Engineering Conference Workshops, RE 2019 Workshops, Jeju Island, Korea (South), September 23-27, 2019, pages 14–23. IEEE, 2019.

[5] M. Alenazi, N. Niu, and J. Savolainen. A novel approach to tracing safety requirements and state-based design models. In International Conference on Software Engineering (ICSE’20), pages 848–860, Seoul, South Korea, July 2020.

[6] M. Alenazi, N. Niu, and J. Savolainen. Data of SysML modeling mistakes. http://dx.doi.org/10.7945/sz4r-zx36, Last accessed: Jan 2021.

[7] M. Alenazi, N. Niu, W. Wang, and A. Gupta. Traceability for automated production systems: A position paper. In 2017 IEEE 25th International Requirements Engineering Conference Workshops (REW), pages 51–55. IEEE, 2017.

[8] M. Alenazi, N. Niu, W. Wang, and J. Savolainen. Using obstacle analysis to support SysML-based model testing for cyber physical systems. In A. Moreira, G. Mussbacher, J. Araújo, and P. Sánchez, editors, 8th IEEE International Model-Driven Requirements Engineering Workshop, MoDRE@RE 2018, Banff, AB, Canada, August 20, 2018, pages 46–55. IEEE Computer Society, 2018.

[9] M. Alenazi, D. Reddy, and N. Niu. Assuring virtual-PLC in the context of SysML models. In International Conference on Software Reuse (ICSR), pages 121–136, Madrid, Spain, May 2018.

[10] S. Ali, T. Yue, and L. C. Briand. Does aspect-oriented modeling help improve the readability of UML state machines? Software and System Modeling, 13(3):1189–1221, July 2014.

[11] V. Alves, N. Niu, C. F. Alves, and G. Valença. Requirements engineering for software product lines: A systematic literature review. Information & Software Technology, 52(8):806–820, August 2010.

[12] P. E. Ammann, P. E. Black, and W. Majurski. Using model checking to generate tests from specifications. In International Conference on Formal Engineering Methods (ICFEM), pages 46–55, Brisbane, Australia, December 1998.

[13] P. E. Ammann, P. E. Black, and W. Majurski. Using model checking to generate tests from specifications. In Proceedings Second International Conference on Formal Engineering Methods, pages 46–54. IEEE, Dec 1998.

[14] K. Androutsopoulos, D. Clark, M. Harman, J. Krinke, and L. Tratt. State-based model slicing: A survey. ACM Computing Surveys, 45(4):53:1–53:36, August 2013.

[15] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering traceability links between code and documentation. IEEE Transactions on Software Engineering, 28(10):970–983, October 2002.

[16] O. B. Badreddin, A. Sturm, and T. C. Lethbridge. Requirement traceability: A model-based approach. In A. Moreira, P. Sánchez, G. Mussbacher, and J. Araújo, editors, IEEE 4th International Model-Driven Requirements Engineering Workshop, MoDRE 2014, 25 August, 2014, Karlskrona, Sweden, pages 87–91. IEEE Computer Society, 2014.

[17] N. Balakrishnan. An overview of system safety assessment. In Dependability in Medicine and Neurology, pages 33–81. Springer, 2015.

[18] M. E. Beato, M. Barrio-Solórzano, C. E. Cuesta, and P. de la Fuente. UML automatic verification tool with formal methods. Electronic Notes in Theoretical Computer Science, 127(4):3–16, 2005.

[19] P. Bhavsar, P. Das, M. Paugh, K. Dey, and M. Chowdhury. Risk analysis of autonomous vehicles in mixed traffic streams. Transportation Research Record: Journal of the Transportation Research Board, (2625):51–61, 2017.

[20] M. Borg, P. Runeson, and A. Ardö. Recovering from a decade: A systematic mapping of information retrieval approaches to software traceability. Empirical Software Engineering, 19(6):1565–1616, December 2014.

[21] C. Brecher, J. A. Nittinger, and A. Karlberger. Model-based control of a handling system with SysML. Procedia Computer Science, 16:197–205, 2013.

[22] L. C. Briand, D. Falessi, S. Nejati, M. Sabetzadeh, and T. Yue. Traceability and SysML design slices to support safety inspections: A controlled experiment. ACM Transactions on Software Engineering and Methodology, 23(1):9:1–9:43, February 2014.

[23] F. P. Brooks. The Mythical Man-Month. Addison-Wesley, 1975.

[24] M. Chami, P. Oggier, O. Naas, and M. Heinz. MBSE at Bombardier Transportation. https://www.nomagic.com/mbse/images/images, 2015. Last accessed: Jan 2021.

[25] Change Vision. Astah SysML. http://astah.net/editions/sysml, 2018. Last accessed: Jan 2021.

[26] J. Chen, M. Goodrum, R. A. Metoyer, and J. Cleland-Huang. How do practitioners perceive assurance cases in safety-critical software systems? In International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE), pages 57–60, Gothenburg, Sweden, May 2018.

[27] R. Chillarege. Orthogonal Defect Classification. McGraw-Hill, 1996.

[28] C. Choppy and G. Reggio. A method for developing UML state machines. In ACM Symposium on Applied Computing (SAC), pages 382–388, Honolulu, HI, USA, March 2009.

[29] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 2001.

[30] J. Cleland-Huang. Safety stories in agile development. IEEE Software, 34(4):16–19, July/August 2017.

[31] J. Cleland-Huang, A. Czauderna, M. Gibiec, and J. Emenecker. A machine learning approach for tracing regulatory codes to product specific requirements. In International Conference on Software Engineering (ICSE), pages 155–164, Cape Town, South Africa, May 2010.

[32] J. Cleland-Huang, O. Gotel, J. H. Hayes, P. Mäder, and A. Zisman. Software traceability: Trends and future directions. In Future of Software Engineering (FOSE), pages 55–69, Hyderabad, India, May-June 2014.

[33] J. Cleland-Huang, O. C. Gotel, J. Huffman Hayes, P. Mäder, and A. Zisman. Software traceability: trends and future directions. In Proceedings of the on Future of Software Engineering, pages 55–69. ACM, 2014.

[34] J. Cleland-Huang, M. Heimdahl, J. Hayes, R. R. Lutz, and P. Mäder. Trace queries for safety requirements in high assurance systems. In International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ), pages 179–193, Essen, Germany, March 2012.

[35] J. Cleland-Huang and M. Vierhauser. Discovering, analyzing, and managing safety stories in agile projects. In International Requirements Engineering Conference (RE), pages 262–273, Banff, Canada, August 2018.

[36] H. Crisp. INCOSE systems engineering vision 2020. Technical report, INCOSE-TP-2004-004-02, September, 2007.

[37] D. Cuddeback, A. Dekhtyar, and J. H. Hayes. Automated requirements traceability: the study of human analysts. In International Requirements Engineering Conference (RE), pages 231–240, Sydney, Australia, September-October 2010.

[38] A. Czauderna, J. Cleland-Huang, M. Çinar, and B. Berenbach. Just-in-time traceability for mechatronics systems. In Second IEEE International Workshop on Requirements Engineering for Systems, Services, and Systems-of-Systems, RESS 2012, Chicago, IL, USA, September 25, 2012, pages 1–9. IEEE Computer Society, 2012.

[39] J. L. de la Vara, A. Ruiz, K. Attwood, H. Espinoza, R. K. Panesar-Walawege, Á. López, I. del Río, and T. Kelly. Model-based specification of safety compliance needs for critical systems: A holistic generic metamodel. Information & Software Technology, 72:16–30, April 2016.

[40] A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, and S. Panichella. Applying a smoothing filter to improve IR-based traceability recovery processes: an empirical investigation. Information & Software Technology, 55(4):741–754, April 2013.

[41] A. De Lucia, R. Oliveto, and G. Tortora. IR-based traceability recovery processes: an empirical comparison of “one-shot” and incremental processes. In ASE, pages 39–48, L’Aquila, Italy, September 2008.

[42] R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Hints on test data selection: Help for the practicing programmer. IEEE Computer, 11(4):34–41, April 1978.

[43] Z. Dong and P. Zhang. Emerging techniques in power system analysis. Springer, 2010.

[44] M. B. Dwyer, G. S. Avrunin, and J. C. Corbett. Patterns in property specifications for finite-state verification. In B. W. Boehm, D. Garlan, and J. Kramer, editors, Proceedings of the 1999 International Conference on Software Engineering, ICSE ’99, Los Angeles, CA, USA, May 16-22, 1999, pages 411–420. ACM, 1999.

[45] Eclipse. Case study. https://eclipse.org/papyrus/resources/sherpa-usecasestory.pdf, 2016. Last accessed: Jan 2021.

[46] E. N. Efthimiadis. Query expansion. Annual Review of Information Science and Technology (ARIST), 31:121–87, 1996.

[47] S. A. Elavarasi, J. Akilandeswari, and K. Menaga. A survey on semantic similarity measure. International Journal of Research in Advent Technology, 2(3):389–398, 2014.

[48] E. R. Eras, L. B. R. dos Santos, V. A. de Santiago Júnior, and N. L. Vijaykumar. Towards a wide acceptance of formal methods to the design of safety critical software: an approach based on UML and model checking. In International Conference on Computational Science and Its Applications, pages 612–627. Springer, 2015.

[49] S. Feldmann, M. Wimmer, K. Kernschmidt, and B. Vogel-Heuser. A comprehensive approach for managing inter-model inconsistencies in automated production systems engineering. In IEEE International Conference on Automation Science and Engineering, CASE 2016, Fort Worth, TX, USA, August 21-25, 2016, pages 1120–1127. IEEE, 2016.

[50] M. Fockel and J. Holtmann. A requirements engineering methodology combining models and controlled natural language. In A. Moreira, P. Sánchez, G. Mussbacher, and J. Araújo, editors, IEEE 4th International Model-Driven Requirements Engineering Workshop, MoDRE 2014, 25 August, 2014, Karlskrona, Sweden, pages 67–76. IEEE Computer Society, 2014.

[51] S. Friedenthal, A. Moore, and R. Steiner. Water distiller example using functional analysis. In S. Friedenthal, A. Moore, and R. Steiner, editors, A Practical Guide to SysML (Second Edition), pages 393–429. The MK/OMG Press, 2012.

[52] S. Friedenthal, A. Moore, and R. Steiner. OMG SysML tutorial. https://user.eng.umd.edu/~austin/enes489p/lecture-resources/SysML-Friedenthal-Tutorial-INCOSE2006.pdf, Last accessed: Jan 2021.

[53] A. Frisoli, L. Borelli, A. Montagner, S. Marcheschi, C. Procopio, F. Salsedo, M. Bergamasco, M. C. Carboncini, M. Tolaini, and B. Rossi. Arm rehabilitation with a robotic exoskeleleton in virtual reality. In International Conference on Rehabilitation Robotics (ICORR), pages 631–642, Noordwijk, The Netherlands, June 2007.

[54] N. Gali, R. Mariescu-Istodor, and P. Fränti. Similarity measures for title matching. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 1548–1553. IEEE, 2016.

[55] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580–587, Columbus, OH, USA, June 2014.

[56] W. H. Gomaa and A. A. Fahmy. A survey of text similarity approaches. International Journal of Computer Applications, 68(13), 2013.

[57] M. Goodrum, J. Cleland-Huang, R. R. Lutz, J. Cheng, and R. A. Metoyer. What requirements knowledge do developers need to manage change in safety-critical systems? In International Requirements Engineering Conference (RE), pages 90–99, Lisbon, Portugal, September 2017.

[58] O. Gotel, J. Cleland-Huang, J. H. Hayes, A. Zisman, A. Egyed, P. Grünbacher, A. Dekhtyar, G. Antoniol, J. I. Maletic, and P. Mäder. Traceability fundamentals. In J. Cleland-Huang, O. Gotel, and A. Zisman, editors, Software and Systems Traceability, pages 3–22. Springer, 2012.

[59] O. Gotel and A. Finkelstein. An analysis of the requirements traceability problem. In ICRE, pages 94–101, Colorado Springs, CO, USA, April 1994.

[60] O. Gotel and A. Finkelstein. An analysis of the requirements traceability problem. In International Conference on Requirements Engineering (ICRE), pages 94–101, Colorado Springs, CO, USA, April 1994.

[61] R. B. Grady. Software failure analysis for high-return process improvement decisions. Hewlett-Packard Journal, pages 2:1–2:12, August 1996.

[62] M. F. Granda, N. Condori-Fernández, T. E. Vos, and O. Pastor. A model-level mutation tool to support the assessment of the test case quality. In Complexity in Information Systems Development, pages 17–37. Springer, 2017.

[63] M. F. Granda, N. Condori-Fernández, T. E. J. Vos, and O. Pastor. What do we know about the defect types detected in conceptual models? In International Conference on Research Challenges in Information Science (RCIS), pages 88–99, Athens, Greece, May 2015.

[64] M. F. Granda, N. Condori-Fernández, T. E. J. Vos, and O. Pastor. Using model checking to generate tests from specifications. In International Conference on Information Systems Development (ISD), pages 17–37, Katowice, Poland, August 2016.

[65] J. Guo, J. Cheng, and J. Cleland-Huang. Semantically enhanced software traceability using deep learning techniques. In International Conference on Software Engineering (ICSE), pages 3–14, Buenos Aires, Argentina, May 2017.

[66] L. E. Hart. Introduction to model-based system engineering (MBSE) and SysML. In Delaware Valley INCOSE Chapter Meeting, volume 30. Ramblewood Country Club Mount Laurel, New Jersey, 2015.

[67] A. M. Hass. Guide to Advanced Software Testing. Artech House, 2014.

[68] J. Hayes, A. Dekhtyar, and J. Osborne. Improving requirements tracing via information retrieval. In International Requirements Engineering Conference (RE), pages 138–147, Monterey Bay, CA, USA, September 2003.

[69] J. H. Hayes and A. Dekhtyar. A framework for comparing requirements tracing experiments. International Journal of Software Engineering and Knowledge Engineering, 15(5):751–782, October 2005.

[70] J. H. Hayes, A. Dekhtyar, and J. Osborne. Improving requirements tracing via information retrieval. In Requirements Engineering Conference, 2003. Proceedings. 11th IEEE International, pages 138–147. IEEE, 2003.

[71] J. H. Hayes, A. Dekhtyar, and S. K. Sundaram. Advancing candidate link generation for requirements tracing: the study of methods. IEEE Transactions on Software Engineering, 32(1):4–19, January 2006.

[72] J. Holtmann, J. Steghöfer, M. Rath, and D. Schmelter. Cutting through the jungle: Disambiguating model-based traceability terminology. In T. D. Breaux, A. Zisman, S. Fricker, and M. Glinz, editors, 28th IEEE International Requirements Engineering Conference, RE 2020, Zurich, Switzerland, August 31 - September 4, 2020, pages 8–19. IEEE, 2020.

[73] IBM. Rational Rhapsody Architect for systems engineers. http://www-03.ibm.com/software/products/en/ratirhaparchforsystengi, 2018. Last accessed: Jan 2021.

[74] IEEE Standard Board. IEEE standard classification for software anomalies. https://standards.ieee.org/standard/1044-2009.html, Last accessed: Jan 2021.

[75] International Electrotechnical Commission. Functional safety of electrical / electronic / programmable electronic safety-related system (IEC 61508). https://www.iec.ch/functionalsafety/, 2010. Last accessed: Jan 2021.

[76] Y. Jia and M. Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5):649–678, September/October 2011.

[77] H. H. Kagdi, J. I. Maletic, and A. M. Sutton. Context-free slicing of UML class models. In 21st IEEE International Conference on Software Maintenance (ICSM 2005), 25-30 September 2005, Budapest, Hungary, pages 635–638. IEEE Computer Society, 2005.

[78] K. N. King and J. Offutt. A Fortran language system for mutation-based software testing. Software: Practice and Experience, 21(7):685–718, July 1991.

[79] B. A. Kitchenham, T. Dybå, and M. Jørgensen. Evidence-based software engineering. In International Conference on Software Engineering (ICSE), pages 273–281, Edinburgh, UK, May 2004.

[80] A. Knapp and S. Merz. Model checking and code generation for UML state machines and collaborations. Proc. 5th Wsh. Tools for System Design and Verification, pages 59–64, 2002.

[81] B. Korel, I. Singh, L. H. Tahat, and B. Vaysburg. Slicing of state-based models. In International Conference on Software Maintenance (ICSM), pages 34–43, Amsterdam, The Netherlands, September 2003.

[82] W. Krenn, R. Schlick, S. Tiran, B. Aichernig, E. Jobstl, and H. Brandl. MoMut:: UML model-based mutation testing for UML. In 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1–8. IEEE, 2015.

[83] D. Latella, I. Majzik, and M. Massink. Automatic verification of a behavioural subset of UML statechart diagrams using the SPIN model-checker. Formal Aspects of Computing, 11(6):637–664, 1999.

[84] W.-S. Lee, D. L. Grosh, F. A. Tillman, and C. H. Lie. Fault tree analysis, methods, and applications a review. IEEE Transactions on Reliability, 34(3):194–203, 1985.

[85] N. G. Leveson, M. Heimdahl, H. Hildreth, and J. D. Reese. Requirements specification for process-control systems. IEEE Transactions on Software Engineering, 20(9):684–707, September 1994.

[86] J. Lilius and I. P. Paltor. vUML: A tool for verifying UML models. In 14th IEEE International Conference on Automated Software Engineering (ASE), pages 255–258. IEEE, 1999.

[87] S. Liu, Y. Liu, J. Sun, M. Zheng, B. Wadhwa, and J. S. Dong. USMMC: a self-contained model checker for UML state machines. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 623–626. ACM, 2013.

[88] F. Lorber, K. G. Larsen, and B. Nielsen. Model-based mutation testing of real-time systems via model checking. In 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), pages 59–68. IEEE, 2018.

[89] P. Mäder and J. Cleland-Huang. A visual traceability modeling language. In International Conference on Model Driven Engineering Languages and Systems (MoDELS), pages 226–240, Oslo, Norway, October 2010.

[90] P. Mäder and J. Cleland-Huang. A visual language for modeling and executing traceability queries. Software and System Modeling, 12(3):537–553, July 2013.

[91] P. Mäder, O. Gotel, and I. Philippow. Getting back to basics: Promoting the use of a traceability information model in practice. In International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE), pages 21–25, Vancouver, Canada, May 2009.

[92] P. Mäder, P. L. Jones, Y. Zhang, and J. Cleland-Huang. Strategic traceability for safety-critical projects. IEEE Software, 30(3):58–66, May/June 2013.

[93] A. Mahmoud and N. Niu. On the role of semantics in automated requirements tracing. Requirements Engineering, 20(3):281–300, 2015.

[94] S. Maro, J.-P. Steghöfer, J. Hayes, J. Cleland-Huang, and M. Staron. Vetting automatically generated trace links: What information is useful to human analysts? In International Requirements Engineering Conference (RE), pages 52–63, Banff, Canada, August 2018.

[95] S. Maro, J.-P. Steghofer, E. Knauss, J. Horkof, R. Kasauli, R. Wohlrab, J. L. Korsgaard, F. Wartenberg, N. J. Strøm, and R. Alexandersson. Managing traceability information models: Not such a simple task after all? IEEE Software, 2020.

[96] M. S. Martis. Validation of simulation based models: a theoretical outlook. The electronic journal of business research methods, 4(1):39–46, 2006.

[97] Matthew Dwyer. Temporal Specification Patterns. https://matthewbdwyer.github.io/psp/, Last accessed: Jan 2021.

[98] L. Mi and K. Ben. A method of software specification mutation testing based on UML state diagram for consistency checking. Procedia Engineering, 15:110–114, 2011.

[99] Modeliosoft. Modelio open source modeling environment. https://www.modelio.org/categories/about-modelio2.htm, 2018. Last accessed: Jan 2021.

[100] V. M. Monthe, L. Nana, G. E. Kouamou, and C. Tangha. A decision support framework for the choice of languages and methods for the design of real time embedded systems. Journal of Software Engineering and Applications, 9:353–397, 2016.

[101] P. Morrison, R. Pandita, X. Xiao, R. Chillarege, and L. Williams. Are vulnerabilities discovered and resolved like other defects? Empirical Software Engineering, 23(3):1381–1421, June 2018.

[102] B. Napoleão, K. R. Felizardo, É. F. de Souza, and N. L. Vijaykumar. Practical similarities and differences between systematic literature reviews and systematic mappings: A tertiary study. In International Conference on Software Engineering and Knowledge Engineering (SEKE), pages 85–90, Pittsburgh, PA, USA, July 2017.

[103] S. Nejati, M. Sabetzadeh, C. Arora, L. C. Briand, and F. Mandoux. Automated change impact analysis between SysML models of requirements and design. In International Symposium on Foundations of Software Engineering (FSE), pages 242–253, Seattle, WA, USA, November 2016.

[104] S. Nejati, M. Sabetzadeh, C. Arora, L. C. Briand, and F. Mandoux. Automated change impact analysis between SysML models of requirements and design. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 242–253. ACM, 2016.

[105] S. Nejati, M. Sabetzadeh, M. Chechik, S. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In International Conference on Software Engineering (ICSE), pages 54–64, Minneapolis, MN, USA, May 2007.

[106] S. Nejati, M. Sabetzadeh, D. Falessi, L. C. Briand, and T. Coq. A SysML-based approach to traceability management and design slicing in support of safety certification: Framework, tool support, and case studies. Information & Software Technology, 54(6):569–590, June 2012.

[107] N. Niu, L. Johnson, and C. Diltz. Safety patterns for SysML: What does OMG specify? In Reuse in Emerging Software Engineering Practices - 19th International Conference on Software and Systems Reuse, ICSR 2020, Hammamet, Tunisia, December 2-4, 2020, Proceedings, volume 12541 of Lecture Notes in Computer Science, pages 19–34. Springer, 2020.

[108] N. Niu, W. Wang, and A. Gupta. Gray links in the use of requirements traceability. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 384–395. ACM, 2016.

[109] N. Niu, W. Wang, A. Gupta, M. Assarandarban, L. D. Xu, J. Savolainen, and J. C. Cheng. Requirements socio-technical graphs for managing practitioners’ traceability questions. IEEE Trans. Comput. Soc. Syst., 5(4):1152–1162, 2018.

[110] No Magic. Cameo Systems Modeler MagicDraw. https://www.nomagic.com/products/cameo-systems-modeler, Last accessed: Jan 2021.

[111] No Magic. Case studies. https://www.nomagic.com/mbse/resources/case-studies.html, Last accessed: Jan 2021.

[112] B. Nuseibeh and S. Easterbrook. Requirements engineering: A roadmap. In Future of Software Engineering (FOSE), pages 35–46, Limerick, Ireland, June 2000.

[113] Object Management Group. Systems Modeling Language (SysML). http://www.omgsysml.org, Last accessed: Jan 2021.

[114] Y. Oh, J. Yoo, S. Cha, and H. S. Son. Software safety analysis of function block diagrams using fault trees. Reliability Engineering & System Safety, 88(3):215–228, 2005.

[115] R. Oliveto, G. Antoniol, A. Marcus, and J. H. Hayes. Software artefact traceability: the never-ending challenge. In 23rd IEEE International Conference on Software Maintenance (ICSM 2007), October 2-5, 2007, Paris, France, pages 485–488. IEEE Computer Society, 2007.

[116] R. Oliveto, M. Gethers, D. Poshyvanyk, and A. De Lucia. On the equivalence of information retrieval methods for automated traceability link recovery. In International Conference on Program Comprehension (ICPC), pages 68–71, Braga, Portugal, June 2010.

[117] R. K. Panesar-Walawege, M. Sabetzadeh, and L. C. Briand. Using model-driven engineering for managing safety evidence: Challenges, vision and experience. In International Workshop on Software Certification (WoSoCER), pages 7–12, Hiroshima, Japan, November-December 2011.

[118] A. Panichella, A. De Lucia, and A. Zaidman. Adaptive user feedback for IR-based traceability recovery. In SST, pages 15–21, Florence, Italy, May 2015.

[119] T. Pedersen, S. Patwardhan, and J. Michelizzi. Wordnet:: Similarity: measuring the relatedness of concepts. In Demonstration papers at HLT-NAACL 2004, pages 38–41. Association for Computational Linguistics, 2004.

[120] Peter Lieber. SparxSystems CE: Modular design with Enterprise Architect. https://community.sparxsystems.com/case-studies/1099-pantec, Last accessed: Jan 2021.

[121] PivotPoint Technology. SysML Modeling Tool Reviews. https://sysml.tools/review-sparx-ea, Last accessed: Jan 2021.

[122] D. M. Powers. What the F-measure doesn’t measure. Technical report, Beijing University of Technology, China & Flinders University, Australia, 2014.

[123] Process Mining Group, Eindhoven University of Technology. ProM Tools. http://www.promtools.org, 2016. Last accessed: Jan 2021.

[124] A. Raninen, T. Toroi, H. Vainio, and J. J. Ahonen. Defect data analysis as input for software process improvement. In International Conference on Product-Focused Software Process Improvement (PROFES), pages 3–16, Madrid, Spain, June 2012.

[125] P. Rempel, P. Mäder, T. Kuschke, and J. Cleland-Huang. Mind the gap: Assessing the conformance of software traceability to relevant guidelines. In International Conference on Software Engineering (ICSE), pages 943–954, Hyderabad, India, May-June 2014.

[126] C. Robinson-Mallett. An approach on integrating models and textual specifications. In Second IEEE International Workshop on Model-Driven Requirements Engineering, MoDRE 2012, Chicago, IL, USA, September 24, 2012, pages 92–96. IEEE Computer Society, 2012.

[127] E. Ruijters and M. Stoelinga. Fault tree analysis: A survey of the state-of-the-art in modeling, analysis and tools. Computer Science Review, 15:29–62, 2015.

[128] E. Ruijters and M. Stoelinga. Fault tree analysis: A survey of the state-of-the-art in modeling, analysis and tools. Computer Science Review, 15-16(3):29–62, February-May 2015.

[129] M. Sabetzadeh, S. Nejati, L. C. Briand, and A.-H. E. Mills. Using SysML for modeling of safety-critical software-hardware interfaces: guidelines and industry experience. In International Symposium on High-Assurance Systems Engineering (HASE), pages 193–201, Boca Raton, FL, USA, November 2011.

[130] R. Saini, G. Mussbacher, J. L. C. Guo, and J. Kienzle. Towards queryable and traceable domain models. In T. D. Breaux, A. Zisman, S. Fricker, and M. Glinz, editors, 28th IEEE International Requirements Engineering Conference, RE 2020, Zurich, Switzerland, August 31 - September 4, 2020, pages 334–339. IEEE, 2020.

[131] N. Sannier and B. Baudry. Toward multilevel textual requirements traceability using model-driven engineering and information retrieval. In Second IEEE International Workshop on Model-Driven Requirements Engineering, MoDRE 2012, Chicago, IL, USA, September 24, 2012, pages 29–38. IEEE Computer Society, 2012.

[132] W. Schäfer and H. Wehrheim. The challenges of building advanced mechatronic systems. In International Conference on the Future of Software Engineering (FOSE), pages 72–84, Minneapolis, MN, USA, May 2007.

[133] W. Schäfer and H. Wehrheim. The challenges of building advanced mechatronic systems. In Future of Software Engineering (FOSE), pages 72–84, Minneapolis, MN, USA, May 2007.

[134] W. A. F. Silva, I. F. Steinmacher, and T. U. Conte. Is it better to learn from problems or erroneous examples? In International Conference on Software Engineering Education and Training (CSEE&T), pages 222–231, Savannah, GA, USA, November 2017.

[135] T. Slimani. Description and evaluation of semantic similarity measures approaches. arXiv preprint arXiv:1310.8059, 2013.

[136] G. Spanoudakis and A. Zisman. Software traceability: a roadmap. Handbook of Software Engineering and Knowledge Engineering, 3:395–428, 2005.

[137] Z. Strolia and S. Pavalkis. Building executable SysML model. https://blog.nomagic.com/building-executable-sysml-model-automatic-transmission-system-part-1/, 2017. Last accessed: Jan 2021.

[138] Z. Strolia and S. Pavalkis. Building executable SysML model automatic transmission system (part 1), 2017. https://blog.nomagic.com/building-executable-sysml-model-automatic-transmission-system-part-1/.

[139] H. Sun, M. Chen, M. Zhang, J. Liu, and Y. Zhang. Improving defect detection ability of derived test cases based on mutated UML activity diagrams. In IEEE Annual Computer Software and Applications Conference (COMPSAC), pages 275–280, Atlanta, GA, USA, June 2014.

[140] H. Sun, M. Chen, M. Zhang, J. Liu, and Y. Zhang. Improving defect detection ability of derived test cases based on mutated UML activity diagrams. In 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), volume 1, pages 275–280. IEEE, 2016.

[141] M. Unterkalmsteiner. Early requirements traceability with domain-specific taxonomies - A pilot experiment. In T. D. Breaux, A. Zisman, S. Fricker, and M. Glinz, editors, 28th IEEE International Requirements Engineering Conference, RE 2020, Zurich, Switzerland, August 31 - September 4, 2020, pages 322–327. IEEE, 2020.

[142] U.S. Department of Defense. Procedures for performing a failure mode effect and criticality analysis (MIL-STD-1629A). http://www.fmea-fmeca.com/milstd1629.pdf, 1980. Last accessed: Jan 2021.

[143] T. Vale, E. S. de Almeida, V. Alves, U. Kulesza, N. Niu, and R. de Lima. Software product lines traceability: A systematic mapping study. Information & Software Technology, 84:1–18, April 2017.

[144] G. Valença, C. F. Alves, V. Alves, and N. Niu. A systematic mapping study on business process variability. International Journal of Computer Science & Information Technology, 5(1):1–21, February 2013.

[145] W. van der Aalst and K. M. van Hee. Workflow Management: Models, Methods, and Systems. MIT Press, 2004.

[146] W. M. van der Aalst and K. van Hee. Workflow Management: Models, Methods, and Systems. MIT Press, 2004.

[147] W. M. P. van der Aalst, H. T. de Beer, and B. F. van Dongen. Process mining and verification of properties: An approach based on temporal logic. In International Conferences “On the Move to Meaningful Internet Systems” (OTM), pages 130–147, Agia Napa, Cyprus, October-November 2005.

[148] A. Van Lamsweerde and E. Letier. Handling obstacles in goal-oriented requirements engineering. IEEE Transactions on Software Engineering, 26(10):978–1005, 2000.

[149] H. Wang, D. Zhong, T. Zhao, and F. Ren. Integrating model checking with SysML in complex system safety analysis. IEEE Access, 7:16561–16571, 2019.

[150] W. Wang, F. Dumont, N. Niu, and G. Horton. Detecting software security vulnerabilities via requirements dependency analysis. IEEE Transactions on Software Engineering, 2020.

[151] W. Wang, A. Gupta, N. Niu, L. Xu, J.-R. C. Cheng, and Z. Niu. Automatically tracing dependability requirements via term-based relevance feedback. IEEE Transactions on Industrial Informatics, 14(1):342–349, 2018.

[152] W. Wang, N. Niu, M. Alenazi, J. Savolainen, Z. Niu, J. C. Cheng, and L. D. Xu. Complementarity in requirements tracing. IEEE Trans. Cybern., 50(4):1395–1404, 2020.

[153] W. Wang, N. Niu, M. Alenazi, and L. D. Xu. In-place traceability for automated production systems: A survey of PLC and SysML tools. IEEE Trans. Industrial Informatics, 15(6):3155–3162, 2019.

[154] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén. Experimentation in Software Engineering. Springer Science & Business Media, 2012.

[155] F. Xie, V. Levin, and J. C. Browne. Model checking for an executable subset of UML. In Proceedings 16th Annual International Conference on Automated Software Engineering (ASE), pages 333–336. IEEE, 2001.

[156] E. Yu. Towards modeling and reasoning support for early-phase requirements engineering. In International Symposium on Requirements Engineering (RE), pages 226–235, Annapolis, MD, USA, January 1997.