Building Confidence in Scientific Computing Software Via Assurance Cases

Spencer Smith, Mojdeh Sayari Nejad, Alan Wassyng Computing and Software Department, McMaster University. 1280 Main Street West, Hamilton, Ontario, Canada, L8S 4K1

Abstract Assurance cases provide an organized and explicit argument for correctness. They can dramatically improve the certifi- cation of Scientific Computing Software (SCS). Assurance cases have already been effectively used for safety cases for real time systems. Their advantages for SCS include engaging domain experts, producing only necessary documentation, and providing evidence that can be verified/replicated. This paper illustrates assurance cases for SCS through the correctness case for 3dfim+, an existing Medical Imaging Application (MIA) for analyzing activity in the brain. This example was partly chosen because of recent concerns about the validity of fMRI (Functional Magnetic Resonance Imaging) studies. The example justifies the value of assurance cases for SCS, since the existing documentation is shown to have ambi- guities and omissions, such as an incompletely defined ranking function and missing details on the coordinate system. A serious concern for 3dfim+ is identified: running the software does not produce any warning about the necessity of using data that matches the parametric statistical model employed for the correlation calculations. Raising the bar for SCS in general, and MIA in particular, is both feasible and necessary when software impacts safety, an assurance case methodology (or an equivalently rigorous confidence building methodology) should be employed. Keywords: assurance cases, software quality, software requirements specification, scientific computing, medical imaging software

1. Introduction 1. The external body cannot rely on all their staff hav- ing deep expertise in the physical problem the soft- Are we currently putting too much trust in the quality ware simulates or analyses, or in the numerical tech- of Scientific Computing Software (SCS)? For instance, for niques employed. This tends to lead the external Functional Magnetic Resonance Imaging (fMRI), concerns body to request a large quantity of documentation, exist with respect to the statistical analysis models com- as shown in standards for medical software [5]. monly employed [28, 9]. Since medical professionals use 2. SCS developers tend to dislike documentation. Sci- the output of fMRI, and other Medical Imaging Appli- entists do not view rigid, process-heavy approaches, cations (MIA), for diagnosis and treatment planning, we favourably [3]. Moreover, they often consider reports need to have confidence in the software. Often the devel- for each stage of software development as counter- opers of SCS, such as MIA, are medical physicists, scien- productive [30, p. 373]. Although documentation is tists and engineers, not software engineers. Although SCS generated when required, the normal work flow has developers do excellent work, are there currently enough the documentation as a “necessary evil” at the end checks and balances, from a software development per- of the process. spective, for confidence in correctness? The usual ap- 3. Historically when software engineers work with sci- proach employed when correctness is critical is to impose arXiv:1912.13308v1 [cs.SE] 31 Dec 2019 entists there are challenges for communication and requirements for official software certification, where the collaboration [19, 32]. goal for certification is to: “...systematically determine, 4. Conventional documentation (requirements, design, based on the principles of science, engineering and mea- etc.) relies on an implicit argument for correctness; surement theory, whether a software product satisfies ac- a tacit assumption is made that completing the doc- cepted, well-defined and measurable criteria” [14, p. 12]. umentation will improve quality, but no explicit ar- Unfortunately, five significant problems exist for SCS in gument is given to show that correctness will be a general, and MIA in particular, in completing a conven- consequence of the documentation. Perhaps more tional certification exercise through an external body, like important, there is no explicit recognition that ad- the Food and Drug Administration (FDA): equate and effective documentation is necessary for correctness of complex software applications. Email addresses: [email protected] (Spencer Smith), 5. Verification of SCS is challenging because of the or- [email protected] (Alan Wassyng) acle problem [20] – testing is difficult because we do

Preprint submitted to Reliability Engineering & Safety Systems January 1, 2020 not always know the expected correct output for a ware, since some of the common fMRI statistical analyses given set of inputs. data have not yet been validated [28] and because a recent study [9] has shown a potentially serious flaw in software A potential solution to these problems is to have the commonly used to analyze fMRI data. More detail on SCS developers create, or partially create, an assurance 3dfim+ can be found in Section 3. case before, or while they develop their software. Assur- The scope of our work does not include redeveloping ance case techniques have been developed and successfully or reimplementing 3dfim+. Our goal is to build an assur- applied for real time safety critical systems [29, 31, 50]. ance case for the existing software by treating it as black An assurance case presents an organized and explicit ar- box. We consider only the executable for 3dfim+ and the gument for correctness (or whatever other software quality existing documentation. We produce new documentation is deemed important) through a series of sub-arguments and testing results, but not new code. Excluding the code and evidence. Putting the argument in the hands of the makes the case study more realistic, since, if an assurance experts means that they will work to convince themselves, case exercise were to be conducted in industry, there would along with the regulators. They will use the expertise be little appetite for reimplementation. Considerable ef- that the regulators may not have; they will be engaged. fort has already gone into writing medical image analysis This engagement will hopefully help bridge the current (and other scientific software); it is not feasible for the chasm between and scientific com- community to start over. puting [19, 46], by motivating scientists toward documen- To argue for the correctness of 3dfim+, we developed tation and correcting the problem of software engineers an assurance case with the top claim of “ Program 3dfim+ failing to meet scientists’ expectations [33]. Significant delivers correct outputs when used for its intended pur- documentation will still likely be necessary, but through pose in its intended environment, and within its assumed assurance cases the developers now decide the content of operating assumptions.” Part of the explicit argument in- the documentation. What is created will be relevant and volved developing a Software Requirements Specification necessary. More details on the current literature on assur- (SRS) document that contains all the necessary informa- ance cases is given in Section 2. tion and mathematical background needed to understand Arguing in favour of assurance cases for SCS does not 3dfim+. This document can be used for validation and imply that SCS developers have not, or do not currently verification activities; sections of it appear many times as treat correctness seriously. They have developed many evidence in our assurance case. The SRS was reviewed by successful theories, techniques, testing procedures and re- a domain expert as part of the assurance case evidence. view processes. In fact, an assurance case will likely use We also developed a test case to illustrate how the results much of the same evidence that SCS developers currently from 3dfim+ can be checked to provide additional evidence use to convince themselves of the correctness of their soft- of correctness. An early version of the full assurance case ware. The difference is that the argument will no longer be for 3dfim+ can be found in Nejad 2017 [24, Appendix B]. ad hoc, or incompletely documented. The argument will Excerpts from the full case are given in Section 4. A brief now be explicitly presented for review by third parties; overview of this work is provided in Smith et al 2018 [42]. we will no longer be implicitly asked to trust the devel- Besides providing a means to illustrate assurance cases oper. The act of creating the assurance case may also lead for MIA, the 3dfim+ example provides an opportunity to the developer to discover subtle edge cases, which would justify the value of assurance cases for certification. Al- not have been noticed with a less rigorous and systematic though no errors were found in the output of the existing approach. This is particularly true when testing is compli- software, the rigour of the proposed approach did lead to cated by the lack of a test oracle. The developer needs to discovering ambiguities and omissions in the existing doc- overcome this challenge and their solution to the problem umentation. Moreover, the assurance case highlighted a should be open to external scrutiny. potential safety concern when running the software itself. While the eventual goal is to develop a template for as- Most importantly, the explicit arguments and artifacts in- surance cases for any SCS, our initial approach is to learn cluded in the assurance case provide evidence that can by first building an assurance case for one particular ex- be independently judged for sufficiency. The specific evi- ample. We ask ourselves what an assurance case would dence for the validation of assurance cases for MIA is given look like for an MIA example, and then assess the poten- in Section 5, while the generalization of the approach for tial value of this assurance case. Our case study focuses other SCS applications is presented in Section 6. on 3dfim+, an MIA software package that supports Func- tional Magnetic Resonance Imaging (fMRI). 3dfim+ was selected because it is a reasonably (approximately 2. Overview of Assurance Cases 1700 lines of code) and easy to understand example of An assurance case is “[a] documented body of evidence medical image analysis software. 3dfim+ also has the ad- that provides a convincing and valid argument that a spec- vantage that testing is straightforward, because indepen- ified set of critical claims about a system’s properties are dent implementations of the calculations exist that provide adequately justified for a given application in a given en- a pseudo-oracle. We targeted medical image analysis soft- vironment” [29, p. 5]. 2

ISSN(Online): 2320-9801 ISSN (Print): 2320-9798

International Journal of Innovative Research in Computer and Communication Engineering

(An ISO 3297: 2007 Certified Organization)

Vol. 3, Issue 10, October 2015

The idea of assurance cases (or safety cases) began af- G: System Goal / Sub-Goal G1 (System top level C: Context Information ter a number of serious accidents, starting with the Wind- Assurance Goal) A: Assumption ST: Strategy to meet goal scale Nuclear Accident in the late 1950s. This incident S: Solution to support goal C1: System Requirement : Remain to be supported was the United Kingdom’s most serious nuclear power ac- Specifications cident [26] and was instrumental in the government setting ST1 (Strategy for G4 (System Sub-Goal to up new safety regulations incorporating assurance cases. Meeting Goal) be addressed later) Although there had not previously been an ignorance of safety concerns, and safety standards and regulatory ap- G2 (System Sub-Goal G3 (System Sub-Goal A1: Assumption proaches had been applied as the norm, the previous ap- supported by Evidence) supported by Evidence) made proaches proved to be insufficient. They lacked interaction between regulators and developers, especially since the de- S1 (Test S2 velopers were often more knowledgeable than the regula- Results) (Simulation tors about the safety of their products. Assurance cases Results)

(safety cases) do not just focus on verifying and validating Fig. 1 Assurance Case with its basic elements the parts, but also on the interaction between the parts Figure 1: A basic GSN structure [11] that may cause something unexpected to emerge. An Assurance Case presents an argument that a system is acceptably safe, secure, reliable, etc. in a given context. Where, a system could be physical or a combination of hardware and software. Based on the system goals identified in Assurance cases have been widely used in the Euro-an Assurance Case, Assurance Case can also be referred as security case, dependability case, and safety case or by pean safety community for over 20 years to ensureother sys- relevant andname as regulatory per goals applicability. approval. While applying such approvals For better clarity, uses, critical engineering decisions and to ensure consistency, it is required to meet some tem safety [21]. They have been applied in industriesminimum requirementsand standards for the contents has and hadstructure a of beneficial an Assurance Case. effect These on minimum system requirements qual- are specified such as aerospace, transportation, nuclear power, andby de- an Internationality, it Standard does ISO/IEC not provide 15026-2:2011. good To present tracking an Assurance of the Case development in a way to make it easy for visualization, understanding and reviewing purpose, following Graphical notation tools are used fence [1]. Other examples include the energy sector, avia- stages, as the compliance with the standards are mostly tion infrastructure, aerospace vehicles, railways, automo- Goal checkedStructuring Notation after (GSN) the systemand development. Once a system is Claims-Arguments-Evidence (CAE) biles, and medical devices, such as pacemakers, and infu- implemented, its documentations must be approved by the sion pumps [29]. Attempts have also been made to developCAE defines regulators. nodes for Claims, This Arguments process and Evidence is lengthy whereasand GSN expensive. uses goal oriented In presentation con- style and defines nodes for Goals (claims), Strategy (arguments) and Solutions (evidence). Both these graphics notations are assurance cases in the security sectors [51]. mostly similartrast,, with some assurance difference of case progression development approach. GSN should follows Top progress –Down approach at least while creating the In North America, the medical domain is showingAssurance an Casein starting parallel with with top level the goal system of the system construction where as CAE (assupports recommended Bottom-UP view starting with evidence to determine the possible claim, while preparing Assurance Case [10]. There is no thumb rule as such to increased interest in assurance cases. Safety cases aredecide con- which by approach the should FDA be [47]),followed, resulting it can be decided in a by traceable, developers based detailed on their choice argu- and information sidered to have “the potential to support healthcare orga-available in handment before for proceeding the desired ahead with propertycreating of Assurance (or properties). Case. Arguments presented Moreover, using GSN can help provide assurance of critical properties of systems, services or organizations (such as safety or security properties). nizations in the implementation of structured and trans-Such argumentsassurance can form a key cases part of takean overall a assurance more direct,Case [11]. flexible Refer figure and 1, which explicit is showing the typical parent systems for patient safety management” [6].structure This of anapproach. Assurance Case Theyrepresented are with flexible Goal Structuring enough Notations. to incorporate all ex- potential is reflected in the Food and Drug Administra-Assurance Caseisting in its simple assurance form basically activities consists of following and artifacts main components. in any step of the tion’s (FDA’s) strong recommendation that manufactur- procedure. With the aid of a template and with experi- Claim or Goal: This is generally some functionality, characteristics, requirement or behavior of the system ers submit a safety assurance case for any new infusion thatence, needs to we be fulfilled. believe Thisthat can include developing all the essential an requirements, assurance functionalities case does and behavior of the pumps [47]. systemnot which necessarily is supposed requireto be met toas ensure much that system additional is fit for use. effort All the as goals/claims people are required to be supported by valid arguments based on valid evidences. The higher level goal/claim can be further Safety cases, and in general assurance cases, require a fear, and it potentially reduces costs, saves time and gives greater freedom in accommodating different standards. clearly articulated argument, supported by evidence.Copyright An to IJIRCCE DOI: 10.15680/IJIRCCE.2015. 0310003 9122 assurance case consists of a claim that we make about the properties of a product, that is then supported by sub- 3. Overview of 3dfim+ claims that are eventually grounded in evidence derived from the product itself and its development. 3dfim+ [49] is a tool in the Analysis of Functional For our work we have chosen the popular Goal Struc- NeuroImages (AFNI) package (https://afni.nimh.nih. turing Notation (GSN), developed by Kelly [22] to make gov/). 3dfim+ analyzes the activity of the brain by com- our arguments clear, easy to read and, hence, easy to puting the correlation between an ideal signal and the mea- challenge. To develop the assurance case, we used Astah sured brain signal. The ideal signal is defined by the user. (http://astah.net/) to create and edit our GSN assur- For instance, the ideal signal could be a square wave, as ance cases. GSN starts with a Top Goal (Claim) that shown in Figure 2. For ease of comparison, the correspond- is then decomposed into Sub-Goals, and terminal Sub- ing value of the measured activity in Figure 2 is scaled be- Goals are supported by Solutions (Evidence). Strategies tween 0 and 1. This figures shows high correlation between describe the rationale for decomposing a Goal or Sub-Goal ideal and measured signals. into more detailed Sub-Goals. There are other constructs Figure 2 shows the ideal signal versus the brain activ- in GSN; a full overview, with examples, can be found in ity for one voxel in the full 3D image of the brain. This Sprigg’s book [44]. Figure 1 shows what an assurance case analysis is completed for every voxel. The results can be might look like, using Goal Structuring Notation (GSN) visualized using the tools in the AFNI. Figure 3 shows the for goals, context and assumptions. AFNI environment, in which we can see the brain from Focusing on the assurance case from the start of a different perspectives with areas of high (negative and pos- project can improve the efficiency of the development pro- itive) correlation highlighted. cess. SCS, such as MIA, is often subject to standardization 3 4. Assurance Case for 3dfim+

We have used the guidance provided in “General Prin- ciples of Software Validation; Final Guidance for Indus- try and FDA Staff” [4] to develop our assurance case. This guide outlines generally recognized validation prin- ciples that are FDA acceptable for the medical software validation. It was prepared by the International Medi- cal Device Regulators Forum (IMDRF) to provide globally harmonized principles concerning medical device software. The general principles document includes software, like 3dfim+, that is itself considered a medical device. The presentation of the assurance case starts with an overview of assurance arguments using GSN. This is fol- lowed by summaries and excerpts from the evidence used to support the argument. The evidence includes the Soft- ware Requirements Specification (SRS), Test Cases, and Figure 2: Ideal signal versus activity of the voxel at position Domain Expert Review. (23,27,22) over time 4.1. Assurance Case Our assurance case consists of many sub-claims, which means it cannot be represented legibly on a single page. Therefore, we will only include a representative subset of the argument, which has been split to separately show the sub-structures. We have to label all parts of the assurance case struc- ture, i.e. all goals, evidence, and contexts, so that our arguments can be discussed and reviewed unambiguously. A number of strategies exist to do this [44, p. 32–33]. For the ease of navigation, we prefer a hierarchical scheme; top goals in each sub-structure are labeled with a word or a letter but without a number (for example G) and then their sub-goals are labeled as G.1, G.2, ... and the subgoals of G.1 and G.2 are labeled, respectively, as G.1.1, G.1.2, ... and G.2.1 and G.2.2, ... and so on. The evidence is la- beled in a similar way. Contexts, strategies, evidence and justifications are labeled alphabetically if more than one context, strategy or justification is used for an argument; Figure 3: AFNI environment and visualizing the active parts for example, C Ga, C Gb, C Gc, ... for contexts and S Ga, of the brain (from https:// commons.wikimedia.org/ wiki/ File: S Gb, S Gc for strategies of the Goal G and so on. AFNI screenshot.png) When splitting a goal into its sub-goals, the rationale behind the choice of sub-goals is explained using strate- As mentioned in Section 2, assurance cases are usually gies. In cases where the rationale is straightforward, it is developed in parallel with the system construction. Given excluded for space consideration. We have confined our- that 3dfim+ already exists, this was not an option for our selves to traditional GSN, although we believe that GSN current case study. This means that we had to be partic- should be augmented to include reasoning, not just a strat- ularly vigilant to avoid problems with confirmation bias. egy for decomposition. The original intent for assurance We did not want to prove correctness of 3dfim+ with a cases was to make the argument explicit. Without the rea- flawed argument. Since we do not have a vested interest soning that demonstrates that sub-goals act as premises in the correctness of 3dfim+, this is less likely to be a prob- for their parent goal, the argument remains implicit. lem than it might generally be. Our initial ignorance of We have defined our top goal as “Program 3dfim+ de- the domain area also helps, since we acquired the domain livers correct outputs when used for its intended use/purpose knowledge in parallel with constructing the assurance case. in its intended environment, and within its assumed op- erating assumptions.” The truth of a claim depends on its context; therefore, we must be explicit about what we mean by each term in our goal statement. We could in- clude the details with the goal statement itself, but then it 4 would be too long and would lose its focus. The solution rectness were combined together in goal G 3C. These qual- is to declare the context separately. We have defined each ities were grouped because, according to some publica- term in the top goal in several contexts. We have also tions, such as “ The Three Cs of Requirements: con- made an assumption that the 3dfim+ will only be used sistency, completeness, and correctness” [52], there is an for its intended purpose in its intended environment. The important relationship between completeness, consistency assumption and contexts are shown in Figure 4. and correctness for software requirements. Improving one As previously done for medical device assurance cases [50], of these three qualities may diminish the others. From we have divided the top goal into four sub-goals, as shown another perspective, correctness is a combination of con- in Figure 5. The first sub-goal (GR) argues for the quality sistency and completeness. So it is important to consider of the documentation of the requirements. To make an these 3 qualities together. The argument for completeness overall argument for correctness, we need a specification is partially based on the argument for the readiness of a to judge correctness against. The second sub-goal (GI) business plan from Spriggs [44, p. 30]. Due to space limi- says that the implementation complies with the require- tations, the full details of this argument are not included ments and the third sub-goal (GBA) states that, to the here. They can be found in [24]. extent possible, the relevant operational assumptions have A sample expansion for the sub-goal of modifiability been identified. The fourth sub-goal GA also relates to as- from GR (Figure 6) is shown in Figure 7. Modifiability is sumptions, claiming that the inputs to 3dfim+ will satisfy a quality attribute of the software architecture that relates the operational assumptions; we need valid input to make to “the cost of change and refers to the ease with which a an argument for the correctness of the output. The strat- software system can accommodate changes” [25]. Modifi- egy section of the GSN diagram presents the reasoning ability generally requires the requirements documentation for decomposing the top-goal in this way: “If the require- to have a coherent and easy-to-use organization with a ments correctly and adequately describe the application table of contents, an index, and explicit cross-referencing. to be built, and the implementation faithfully implements Moreover, requirements should not be redundant and they the requirements, then the only possibility that the ap- must be expressed separately. As for the other qualities, plication does not deliver correct outputs is if known or the argument for modifiability makes use of the generic unknown assumptions are not met. These assumptions evidence template (Figure 8) discussed below. may relate to environmental conditions or usage. We thus The content of the documentation of the requirements have to consider whether we have adequately defined all must be reviewed and verified by domain experts. This relevant assumptions, and whether those assumptions are is particularly important in SCS because of the special satisfied.” role of assumptions. We cannot hope to develop models The top level of the assurance case in Figure 5 does that include all of the complexities of the real world, so the not imply that up-front requirements are needed. This is adopted simplifying assumptions need to be judged for ap- fortunate, since scientists have the view that requirements propriateness. Spriggs [44, p. 37] gives a decomposition for are impossible to determine up-front, since they believe arguments that end with domain expert review. We have that details can only emerge as the work progresses [3, 34]. developed a similar decomposition in our template mod- The assurance case needs requirements, but they can come ules, called GenericEvidence, as shown Figure 8. Gener- out of the development process in any way appropriate for icEvidence is a generic argument. The generic argument the developers. That is, the documentation can be “faked” is often called a “pattern”. “A pattern in this context like it is part of a rational design process [27]. is an argument that applies to a class of things, which The main focus in our assurance case is arguing for you can use as the basis of an argument for a specific in- GR (quality requirements). The decomposition of GR into stance” [44, p. 103]. We have developed this module to its sub-goals is shown in Figure 6. This decomposition is re-use it for several arguments in our assurance case. We based on the IEEE standard 830-1993 [2]. This standard have an argument that a particular quality of the require- states that good documentation of requirements should be ments documentation has been met; the main evidence correct, unambiguous, complete, consistent, ranked for im- items are the acceptance report and the addressed com- portance and/or stability, verifiable, modifiable and trace- ments submitted by the reviewers. If we want to ensure able (J GRa). Using the IEEE resource increases confi- that another quality has been met, we do not want to start dence in the argument and makes it more compelling. Our our argument again from scratch. We prefer to use the sub-goals address correctness, unambiguity, completeness, same module (sub-structure), but bring in a new evalua- consistency, verifiability, modifiability and traceability of tion, comments and sections in the report as evidence. In the requirements documentation. “Ranked for importance that case, we can have the name of the quality in the mod- and/or stability” is excluded from the sub-goals in Figure 6 ule, but publish the argument stating exactly which qual- (as shown in J GRb) because our domain is MIA, where ity is reviewed. For instance, for the sake of completeness, for the software to function properly, all requirements are we verified that all statements made in the original doc- considered of equal importance. This is shown as justifi- umentation are reflected in the new documentation. This cation J GRb in Figure 6. comparison is mentioned as GenericEvidence.3 in Figure 8. The arguments for consistency, completeness, and cor- E GenericEvidence.1 in Figure 8 mentions the accep- 5 Figure 4: Contexts and Assumption in Top Goal tance criteria for reviewers’ resumes. This information is do automated checks, like verify that the measured activi- included in the assurance case to mitigate against the bias ties are positive, but the software can never tell if 3dfim+ problem mentioned in Section 2. Reviewers need to be is the right tool for the job. For instance, the statistical qualified, and we should say what qualified means before model for 3dfim+ is parametric, if a non-parametric model we start looking for a reviewer. When we document the would be more appropriate, the user will have to select assurance case, we verify that the experts satisfy these another tool. Although not currently part of 3dfim+, we criteria. If not, we have to make an argument why they added a warning message, as part of the assurance case, should still be considered experts. This draws attention to that users be explicitly reminded of their responsibilities the fact that there is something “unusual” here that may while running the software, similar to how movies or video typically be overlooked. games with flashing lights warn of the possibility of trig- In Figure 5 we defined GA as “Inputs to 3dfim+ satis- gering seizures. Argument GA demonstrates the value of fies the defined operational assumptions.” Achieving this assurance cases requiring a complete argument. The re- goal partially relies on the software to check if the input is sponsibility of the user can easily be forgotten without an valid, but not all inputs can be validated by the software. explicit coverage requirement to check that no cases are The user of 3dfim+ also has responsibility, in the same missing. sense that an automobile driver has responsibility to oper- ate their vehicle safely. This argument for GA is shown in 4.2. Software Requirements Specification Figure 9. This argument explicitly states that the user has Having a Software Requirements Specification (SRS) responsibility for validating the input. The software can is critical for software validation [4]. The requirements

6 Figure 5: Top Goal of the assurance case and its sub-goals mentioned in the assurance case for goal GR (Figure 6) in [24, Appendix A]. are documented in the SRS. This document is necessary to verify software correctness, since it provides a specifi- 4.2.1. SRS Template cation against which correctness can be judged. As a con- Writing an SRS generally starts with a template, which sequence, the SRS is mentioned in the sub-goals and evi- provides guidelines and rules for documenting the require- dence for several goals. For instance, in Figure 7, for mod- ments. The assurance case supports the need for a tem- ifiability, the SRS is mentioned in goal Modifiable.1. Fig- plate through the modifiability goal (Figure 7) Modifi- ure 8 for generic evidence references the SRS in E Generic able.1.1: “A standard/correct well-structured template has Evidence.4, by calling for a requirements acceptance re- been followed.” Several existing templates contain sugges- port. Goal GA (Figure 9) for the operational assump- tions on how to avoid complications and how to achieve tions imposes several requirements on the SRS, such as in qualities such as verifiability, maintainability and reusabil- E GA.2, which mentions SRS content related to assump- ity [10, 16, 23]. However, no template is universally ac- tions and data constraints. cepted. For the MIA example, the choice was a template Due to space limitations, only some representative ex- specifically designed for scientific software [39, 40], as il- cerpts from the SRS for 3dfim+ can be reproduced here. lustrated via the table of contents shown below. The rec- The excerpts selected are intended to give an overall feel ommended template is suitable for science, because of its for the document and to highlight some areas where the hierarchical structure, which decomposes abstract goals to rigour of the assurance case provides benefits for the doc- concrete instance models, through the support of data def- umentation and software quality. The full SRS is available initions, assumptions and terminology. The document’s

7 Figure 6: GR decomposition structure facilitates its maintenance and reuse [39], by us- iv. Physical System Description ing separation of concerns, abstraction and traceability. v. Goal Statements (b) Solution Characteristics Specification 1. Reference Material i. Assumptions (a) Table of Units ii. Theoretical Models (b) Table of Notations iii. Data Definitions (c) Table of Symbols iv. Instance Models (d) Abbreviations and Acronyms v. Data Constraints 2. Introduction vi. Properties of a Correct Solution (a) Purpose of Document 5. Requirements (b) Scope of Requirements (a) Functional Requirements (c) Organization of Document (b) Non-functional Requirements 3. General System Description 6. Other System Issues (a) System Context 7. Traceability Matrix (b) User Characteristics (c) System Constraints 8. Likely Changes 4. Specific System Description 4.2.2. Goals (a) Problem Description The high level objectives of the software are docu- i. Background mented in the goals (Section 4.a.v of the SRS template ii. Terminology Definition shown in Section 4.2.1). A sample goal for 3dfim+ is: iii. Coordinate Systems 8 9

Figure 7: Argument for modifiability of documentation of requirements Figure 8: Generic evidence module used as a pattern in our assurance case

G1: Estimate the Pearson correlation coefficients between the (best) ideal time series and the fMRI time series Number T1 at each voxel over time. Label Calculating Pearson Correlation Coef- ficient 4.2.3. Assumptions n P ¯ Assumptions (Section 4.b.i of the SRS template) high- (ai−a¯)(bi−b) i=1 light a simplification made for the purpose of the mathe- Equation ρ(A, B) = n n 1 P 2 P 2 [ (ai−a¯) (bi−¯b) ] 2 matical modelling. A significant responsibility of the SRS i=1 i=1 is to document the assumptions. As mentioned above, Description The equation calculates Pearson correlation making the assumptions explicit facilitates expert review. coefficients ρ applied to two datasets A : n Sample assumptions for 3dfim+ include: R and B : Rn both of size n.a ¯ and ¯b are sam- A1: The variables should be either of type interval or ple means (DD1) of A and B, respectively. ratio. ρ is the Pearson correlation coefficient be- tween A and B. Assumptions A1–A5 must A2: There is a linear relationship between the two vari- hold when calculating this correlation. ables. A3: The variables are bivariately normally distributed. 4.2.5. Coordinate Convention Section 4.a.iii of the SRS documents describes the co- 4.2.4. Theoretical Models ordinate system for the fMRI images. This information The theoretical models are sets of governing equations is necessary to make the requirements unambiguous, since or axioms that are used to model the problem described in there are several choices for coordinate system for medical the problem definition section (SRS Section 4.b.ii). Trace- images. As an example, the SRS defines the Anatomical ability exists between the theoretical model and the other Coordinate System, which describes the standard anatom- components of the documentation. For instance, the de- ical position of a human being using 3 orthogonal planes: scription for the T1 (Pearson Correlation Coefficient), which axial/transverse (plane parallel to the ground that sepa- is given below, references the definition for mean (DD1) rates the body into head (superior) and tail (inferior) posi- and several assumptions, including the three listed above. tions), coronal/frontal (plane perpendicular to the ground 10 Figure 9: Argument for inputs satisfying the defined operational assumptions that divides the body into front (anterior) and back (pos- umentation of the rank function. The rank function is terior) positions), and sagittal/median (plane that divides defined in the SRS as a data definition. the body into right and left positions. 3dfim+ uses NIfTI The rank of data points is determined by sorting them data files (https://nifti.nimh.nih.gov/) that store vox- in an ascending order and assigning a value according to els from right to left to create rows, rows from anterior to their position in the sorted list. If ties exist, the average posterior to create slices and slices from superior to infe- of all of the tied positions is calculated as the rank. Math- rior to create volumes. This information was only partially ematically, the rank of element a in dataset A is defined provided in the original documentation for 3dfim+. as follows:

4.2.6. Rank Function rank(a, A): R × Rn → R Calculating the Spearman and Quadrant correlation rank(a, A) ≡ avg(indexSet(a, sort(A))) coefficients [49] requires the use of the rank function. The original documentation for 3dfim+ had incomplete doc- indexSet(a, B): R × Rn → set of N

11 indexSet(a, B) ≡ {j : N|j ∈ [1..|B|] ∧ Bj = a : j} took a considerable amount of time to achieve, because the coordinate systems conventions for Matlab and AFNI sort(A): Rn → Rn are different. Since this information was not documented sort(A) ≡ B : Rn, such that in the original 3dfim+ manual, we were unaware of this ∀(a : R|a ∈ A : ∃(b : R|b ∈ B : b = a) ∧ count(a, A) = subtly. The coordinate system description for 3dfim+ was count(b, B)) ∧ ∀(i : N|i ∈ [1..|A| − 1] : Bi ≤ Bi+1) added to the SRS (as described in Section 4.2.5) after our struggles with achieving test case agreement. count(a, A): R × Rn → N In the case of 3dfim+ a pseudo oracle was available. count(a, A) : +(x : N|x ∈ A ∧ x = a : 1) For other MIA and SCS software, other techniques may be needed for the verification that the implementation avg(C) : set of N → R matches the requirements [36]. Where appropriate, use avg(C) ≡ +(x : N|x ∈ C : x)/|C| can be made of the Method of Manufactured Solutions [30] and metamorphic testing [17]. For testing purposes, the The above equations use the Gries and Schneider no- slower, but guaranteed correct, interval arithmetic [15] can tation [13, p. 143] for set building and evaluation of an be used to ensure that calculated answers lie within the operator applied over a set of values. Specifically, the ex- guaranteed bounds. Verification tests can also include pression (∗x : X|R : P ) means application of the operator plans for convergence studies. The discretization used ∗ to the values P for all x of type X for which range R is in the numerical algorithm should be decreased (usually true. In the above equations, the ∗ operators ∀, ∃ and + halved) and the change in the solution assessed. Although are used. Using this formal notation, we can ensure that not used in the current example, verification can also use cases are not left out in the documentation. non-testing techniques, such as code walkthroughs, code inspections, and correctness proofs etc. [12, 48]. 4.3. Test Cases To verify the implementation of 3dfim+, we developed 4.4. Domain Expert Review test cases based on the functional requirements documented An important piece of evidence for an assurance case is in the SRS. The results of the test cases are used as ev- the domain expert review. Review of the SRS is important idence for goal GI (Figure 5), which argues that the im- to reach a common understanding between the software plementation matches the SRS. Since our case study is engineers and scientists. As mentioned in the introduction for SCS, verification through testing is challenging. The (Section 1), building an assurance case facilitates bridging source of the challenge is that SCS differs from most other the gap between software engineers and scientists. The do- software because the quantities of interest are continuous, main expert also addresses the oracle problem outlined in as opposed to discrete. As shown for the calculation of the the introduction. Since the correct answer is not known in Pearson correlation coefficient (Section 4.2.4), the inputs general, an expert is needed to determine whether the test- and outputs are continuously valued real variables. Val- ing and other evidence is sufficient for building sufficient idating the requirements is difficult because there are an confidence in the software. infinite number of potential input values, many of which Domain expert review appears in our assurance case cannot be represented as floating point numbers. In gen- as “Domain experts/customers approve the of eral for SCS, the correct value for the output variable is the documentation of the requirements.” This corresponds unknown. That is, SCS problems typically lack a test or- to GenericEvidence.2.3 in Figure 8. To ensure our SRS acle [20]. Fortunately for this MIA example, 3dfim+, the is of high quality, a task-based inspection approach was correlation calculations are based on finite sets of real num- used [18, 22]. For the review process we assigned a set bers, so constructing a pseudo oracle using Matlab was of tasks asking questions about each section of the SRS. relatively straightforward. We used Github (https://github.com/) issue tracking We developed one Matlab test case per each functional for assigning the tasks and for discussion. Two sample requirement, to compare their results with the results of review questions are reproduced below. 3dfim+. As an example, we had a test case to check the correctness of the Pearson correlation coefficient, which is Q7: Please let us know if all symbols in Theoretical Model one of the main functionalities of 3dfim+. We used our T1 (from Section 4.2.4) are defined. Is enough infor- Matlab pseudo oracle and AFNI to visualize the results mation provided that you could calculate the Pear- and obtain the indices of voxels. Our input consisted of son correlation coefficient if you are given datasets 180 frames of 64×64×28 images. In this test case, we A and B. found the minimum and the maximum Pearson correlation coefficients and their locations. For this test and others, Q10: Please let us know if Data Definition DD4 (Rank we achieved the same results for both 3dfim+ and our Function) (from Section 4.2.6) is explained clearly independently developed Matlab script. or needs any additional information. Please let us However, the testing was not without its challenges. know if the notation we are using for this function is Agreement between the Matlab pseudo oracle and 3dfim+ clear and understandable. 12 A domain expert that completed the review for 3dfim+ written for researchers, not for clinicians. The users for has a degree in engineering and over 10 years experience 3dfim+ and readers of its documentation are likely to be in medical imaging. He therefore meets the acceptance domain experts. However, even for the existing audience criteria given in E GenericEvidence.1 in Figure 5. The re- for 3dfim+, there will likely be some novices. The improve- viewer went through all the assigned tasks and provided ments noted below would likely interest new users, since answers/suggestions. For the most part, the SRS did not the new documentation is more complete and less ambigu- need to be modified as a result of the expert review. How- ous than the original. This benefit of improving software ever, some of the symbols in the SRS, such as N for the and its documentation is also observed when retroactively set of natural numbers, were clarified as a result of the writing an SRS for nuclear safety analysis software [38]. discussion with the expert reviewer. Given the different audience that was envisioned, the The value of expert review is known in SCS. For in- original documentation would not satisfy the GR goal (Fig- stance, the High Energy Physics (HEP) software for the ure 6) for high quality requirements. The existing doc- Compact Muon Solenoid (CMS) detector, which is part of umentation is not fully complete, unambiguous, correct, the Large Hadron Collider, has a GitHub pull request pro- consistent, verifiable, modifiable or traceable. One of the cess that involves automated testing, code quality checks main ambiguities is through the absence of documentation and code review (http://cms-sw.github.io/PRWorkflow. on the coordinate system (Section 4.2.5). As mentioned html. This example is not unique in SCS. Applying such previously, the absence of any specification related to the best practices contributes to building confidence. How- coordinate system meant that comparing the 3dfim+ re- ever, the reviews for the CMS software and for other SCS sults to an independent calculation of the correlation was software are still often ad hoc. Questions like the follow- difficult. Additional investigation was necessary to find ing do not seem to be explicitly answered: What are the the necessary details so that both results were expressed reviewer’s qualifications? What software artifacts (code, in the same coordinate system. Another ambiguity, due documentation and test cases) are they reviewing? What to incomplete documentation, was for the definition of issues/concerns are the reviewers checking? A review of the rank function (Section 4.2.6). In the original docu- the HEP community roadmap for future software and com- mentation the specific definition of the rank function is puting research and development is completely silent on not given. This creates ambiguity when there are ties the idea of formalizing the review process [45]. Informal in the data because there are multiple ways to deal with reviews are certainly better than no reviews, but formal ties. For instance, all ties could be given the same rank, reviews have been demonstrated to be more effective. For and then a gap could be left in the ranking numbers. instance, systematic code inspections of embedded soft- Alternatively, ties could get the same rank, but no gap ware have a defect removal effectiveness of 85% [8]. As- could be included before listing the next ranking num- surance cases answer the questions listed above; they make ber. Five different ranking algorithms can be found on the review requirements rigorous and defendable. More- https://en.wikipedia.org/wiki/Ranking. The one ac- over, building an assurance case, like the one for 3dfim+, tually used by 3dfim+ gives the same ranking number to shows explicitly that the code is not the only work prod- all ties, with the rank being equal to the mean of what uct. The design, test plan, requirements etc., should also they would have under ordinal ranking. This fact was not be reviewed. As the domain expert review highlights, the determined from the documentation, but by investigating techniques for building confidence are already employed in the C code implementation of 3dfim+. The C code is the SCS; the shift to using assurance cases just means telling basis for the specification given in Section 4.2.6. a more complete and compelling story. Although the software for 3dfim+ did not show any errors in its output, some subtle concerns were raised by 5. Validation of Assurance Case Approach considering the assurance case GA for satisfying the opera- tional assumptions (Figure 9). As shown in GA.1, 3dfim+ The 3dfim+ case study provides evidence of the suit- should not proceed if the input does not match the neces- ability of assurance cases for SCS. Although the original sary assumptions. However, the actual software does not software was certainly built with care, problems were still check the input data. GA.2 (“User is aware of what inputs uncovered as a consequence of the systematic and rigorous are valid”) is also not considered for 3dfim+. The input process of building a complete and defendable argument assumptions are not made explicit in the documentation for correctness. The need for documentation, review and and the user is not warned that it is their responsibility to testing not only provides a means to improve the software, provide valid data. As mentioned previously, the user has it provides as a byproduct an explicit argument for correct- the responsibility of determining whether the statistical ness that can be verified/replicated by third parties. model used by 3dfim+ provides the right tool for them. Before summarizing the assurance case improvements The need for an explicit warning highlights a significant to 3dfim+, we should note that we are not criticizing the benefit of the assurance case methodology - building an original 3dfim+ software and its documentation. The goal assurance case forces the developers to ask questions that of the assurance case is to provide certifiable software, they might not otherwise ask. We would likely not have but the original software did not have this goal. It was addressed “how does a user know the inputs are valid?” if 13 the methodology had not forced us to build an argument to apply to all software, not just medical imaging, or that the inputs satisfy the defined operational assumptions SCS. (Figure 9). Further evidence for the validity of applying assurance • The arguments for modifiability, generic evidence cases to SCS is the success they have found for real time and operational assumptions (Figures 7, 8 and 9, safety critical systems [29, 31, 50]. From the perspective of respectively) could be used as a starting point for assurance cases, SCS and real-time systems are not that other examples. The main difference will be in the different. Both domains require the qualities of correct- required evidence. ness and reliability. For qualities where specific examples • The SRS template adopted for the requirements is may differ in importance, such as the quality of portabil- not specific to MIA; it was developed for SCS in ity, the general approach of building an assurance case is general and has been applied to cases such as ther- the same, no matter the quality. The specific differences mal analysis of a nuclear fuel pin [38], mesh genera- for different qualities will be in terms of the arguments tion [43], and others [37]. and the evidence, not in the overall approach. In terms of the relationship between developers, domain experts and The need for building confidence in scientific software regulators, the motivating argument described at the be- is not unique to medical imaging analysis software, there ginning of this paper will usually apply equally to SCS and are many cases where we need assurance, such as for nu- real time software. That is, in both cases, the regulators clear safety, computational medicine, climate modelling, will likely have less expert knowledge than the developers, etc. The verification of other scientific software, that pro- which implies that the case for building an argument for vides the evidence at the bottom of the assurance case, quality should be in the hands of the developers. If as- will likely vary from one problem to the next. However, surance cases are suitable for real-time systems, then they the tools and techniques for verification on scientific soft- are at least as suitable for SCS, since SCS is arguably sim- ware already exist; they do not need to be invented. What pler than real-time software. SCS does not generally have is needed is a push to the scientific software developers to the same complexities in the external environment, nor use the existing techniques and document their results so the same number of hazards, or concerns with emergent that others can build confidence in the software. An ex- behaviour from the interaction of multiple systems. pectation of supplying an assurance case could provide this push. Based on the current work and our review of past work 6. Generalization of Approach and Future Work on assurance cases, we have identified a number of direc- Our example has focused on MIA, specifically 3dfim+, tions for the future development of assurance cases, as fol- but generalizing the assurance case approach to other SCS lows: applications is straightforward. Most of the developed as- • Additional Examples to Create a Template: As men- surance case is not medical imaging specific, as illustrated tioned above, there are significant commonalities be- by the following points: tween SCS problems and predictable variabilities. • The top level goal (Figure 5) can be viewed as generic Recording this information would make the creation if it is parameterized by the software name. That is, of new assurance cases easier. Since testing was rel- the name 3dfim+ could be replaced with any other atively easy for 3dfim+, further exploration will be SCS software application and the argument at this necessary for testing options, like the strategies men- level would be unchanged. This is not surprising as tioned in Section 4.3. this top level itself was borrowed from an assurance • Build an Assurance Case for a Family of SCS Pro- case for medical devices [50]. grams: Implicit in the previous discussion is that • The context and assumptions for the top goal (Fig- we have a single SCS program to build an assurance ure 4) would change for different SCS applications, case, but in many situations, we are interested in a but the type of questions to answer for the context family of related SCS programs [41], such as a fam- would be similar, such as “what is the intended func- ily of linear solvers or ODE solvers. In this case, tionality of the software?”, “what is the intended we would investigate whether we should build an as- software environment?” etc. surance case for the family, or a family of assurance cases? • The argument for the required qualities of the re- quirements is based on the IEEE standard, which • Work on the Assurance Case from the Start: The allows decomposition of the quality concerns into ar- current work produced the assurance case, SRS and guments for correctness, completeness, consistency, test plan a posteriori. As mentioned in Section 2, as- unambiguity, verifiability, modifiability and trace- surance cases work best when they are used from the start of a project. A particular benefit for research ability (Figure 6). The IEEE standard is intended purposes will be a likely increase in the workload 14 that can be assigned to domain reviewers, since they partly chosen because of recent concerns about the va- will likely have a greater vested interest in the suc- lidity of fMRI (Functional Magnetic Resonance Imaging) cess of the project than when it is a purely academic studies. The concerns centre around whether a parametric exercise. model is appropriate for fMRI data. Although the soft- ware itself cannot determine whether it is the appropriate • Tool Support Improvement: Currently, there is no model, the user should ask themselves this question. The tool that provides an abstraction of goals and sub- assurance case highlighted how the user can be informed goals to handle the complexity of the assurance case of the mathematical assumptions of the model through structure. For instance, it would be nice to hide the the SRS, and the details of their responsibility through details of a goal (or a context, justification, evidence in-program warnings. or assumption) and only show the title. This would The value of assurance cases for MIA was justified. Al- improve readability. Details could be revealed via though no errors were found in the software outputs from clicking on the goals, or context etc. 3dfim+, the exercise did highlight problems with the orig- • Publishing Examples of Practical Assurance Cases: inal documentation and software. The existing documen- Currently, many existing assurance cases are not re- tation was shown to have ambiguities and omissions, such leased due to proprietary rights. The more presenta- as an incompletely defined ranking function and missing tions on adoption of assurance cases and case studies, details on the coordinate system convention adopted. In the better resources we have to learn about assurance addition, a potential concern for the software itself was cases. identified. As mentioned above, running the software does not produce any warning about the obligation of the user • Adding Formality to Assurance Cases: The means of to provide data that matches the parametric statistical expressing confidence in assurance cases and the top- model employed for the correlation calculations. Further level claims may benefit from further formality and evidence for the validity of applying assurance cases to rigour, as presented in [7]. Adding formality could SCS is the success they have found for real time safety justify the completeness and consistency of claim de- critical systems. From the perspective of assurance cases, composition and the credibility of the evidence. A SCS and real-time systems have much in common, with formal model will help, even if not all evidence will SCS generally having the advantage of a operating within be mathematical. The formal model will show the a simpler environment. ideal situation and will clarify all of the requirements Our example has focused on MIA, but generalizing for a complete assurance case, even if some of the ev- the assurance case approach to other SCS applications idence itself has to be informal. is straightforward. Each of the assurance case diagrams shown had mostly generic content. The main place where the examples become specific are at the bottom of the ar- 7. Concluding Remarks gument, where the evidence is presented. Although the This work has motivated assurance cases for SCS. As- evidence will be problem specific, the type of evidence surance cases have already been effectively used for safety needed, like test reports, domain expert resumes, etc, will cases for real time systems. For SCS their advantages in- be similar between problems. clude engaging domain experts, producing only necessary Although a concerted effort was made to make the as- documentation, and providing evidence that can poten- surance case for 3dfim+ convincing and complete, the spe- tially be verified/replicated by a third party. The engage- cific argument for 3dfim+ is not the point of this paper. ment of the domain experts is noteworthy because scientist The important revelation about assurance cases is that end user developers have historically shown a distrust of they are for communication, between experts, especially software engineering techniques and principles. In particu- experts in different domains. They make what was previ- lar, SCS developers tend not to favour full documentation ously implicit, explicit. They force developers to ask ques- of requirements. However, their motivation should im- tions that they might not otherwise ask. Is the evidence prove because an assurance case shows the necessity and complete? Do we have an explicit argument that nothing value of an SRS. As more examples and tools become avail- important has been missed? Much of the needed evidence able, adoption of assurance cases in SCS in general, and for (test cases, expert reviews, etc) for an assurance case, with the specific example of MIA, should become more preva- the usual exception of requirements documentation, is al- lent. The FDA already strongly advises assurance cases ready produced when developing SCS. The evidence is gen- for newly developed infusion pumps [47]. erally presented in an ad hoc way. With assurance cases, How to document an assurance case for SCS was illus- a third party does not have to use incomplete evidence to trated via the MIA example of the medical image analysis form an opinion on the quality of a given work, they can software, 3dfim+. The 3dfim+ software analyzes activ- use the full story, and judge whether the story is convinc- ity in the brain by computing the correlation between the ing. If a reviewer disagrees, they can point to the portion measured and an ideal brain signal. This example was of the argument that they feel is weak, and the developers will have an opportunity to strengthen their argument. 15 The developed assurance case relies on the presence of [9] Anders Eklunda, Thomas Nichols, and Hans Knutssona. 2016. requirements documentation. However, many in the SCS A methodology for safety case development. Proceedings of the community believe that upfront requirements are impos- National Academy of Sciences of the United States of America (PNAS) 113, 28 (2016), 7900–7905. sible, or at least infeasible [35]. As a consequence of this [10] ESA. February 1991. ESA Software Engineering Standards, view, requirements are rarely explicitly recorded for SCS. PSS-05-0 Issue 2. Technical Report. European Space Agency. Can a convincing assurance case for correctness be pro- [11] Dipak Gade and Santosh Deshpande. 2015. Assurance Driven Software Design using Assurance Case Based Approach. In- duced without requirements documentation as part of the ternational Journal of Innovative Research in Computer and evidence? We do not believe so, since verification exercises, Communication Engineering 3, 10 (October 2015), 9121–9127. like testing, require an explicit objective against which the [12] Carlo Ghezzi, Mehdi Jazayeri, and Dino Mandrioli. 2003. Fun- results can be judged. However, we leave the challenge damentals of Software Engineering (2nd ed.). Prentice Hall, Upper Saddle River, NJ, USA. of producing a requirements document free assurance case [13] and Fred B. Schneider. 1993. A logical approach to open to the SCS community. discrete math. Springer-Verlag Inc., New York. [14] John Hatcliff, Mats Heimdahl, Mark Lawford, Tom Maibaum, Alan Wassyng, and Fred Wurden. 2009. A Software Certifica- 8. Acknowledgments tion Consortium and its Top 9 Hurdles. Electronic Notes in Theoretical 238, 4 (2009), 11–17. https: The assistance of Dr. Michael Noseworthy, from St. //doi.org/10.1016/j.entcs.2009.09.002 Joseph’s hospital and McMaster University, is gratefully [15] Timothy Hickey, Qun Ju, and Maarten H. Van Emden. 2001. acknowledged. The 3dfim+ case study came directly from Interval Arithmetic: From Principles to Implementation. J. ACM 48, 5 (Sept. 2001), 1038–1068. https://doi.org/10. his advice and fMRI demonstration. Thanks also to Dr. 1145/502102.502106 Dean Inglis for fulfilling the role of expert reviewer. [16] IEEE. 1998. Recommended Practice for Software Requirements Specifications. Technical Report IEEE Std 830-1998. The in- stitute of Electrical and Electronics Engineers, Inc. 1–40 pages. References https://doi.org/10.1109/IEEESTD.1998.88286 [17] Upulee Kanewala and Anders Lundgren. 2016. Automated [1] Peter G. Bishop and Robin E. Bloofield. 1989. A methodol- Metamorphic Testing of Scientific Software. In Software En- ogy for safety case development. In Industrial Perspectives of gineering for Science, Jeffrey C. Carver, Neil Chue Hong, and Safety-critical Systems: Proceedings of the Sixth Safety-critical George Thiruvathukal (Eds.). Taylor & Francis, Boca Raton, Systems Symposium, Birmingham 1998. Springer-Verlag, Lon- FL, Chapter Examples of the Application of Traditional Soft- don, UK, 1–10. ware Engineering Practices to Science, 151–174. [2] Fletcher J. Buckley, A.M. Davis, and J.W. Horch. 1993. IEEE [18] Diane Kelly and Terry Shepard. 2000. Task-directed software Recommended Practice for Software Requirements Specifica- inspection technique: an experiment and case study. In CAS- tions. Technical Report. The institute of Electrical and Elec- CON ’00: Proceedings of the 2000 conference of the Centre tronics Engineers, Inc., New York, USA. for Advanced Studies on Collaborative research. IBM Press, 6. [3] Jeffrey C. Carver, Richard P. Kendall, Susan E. Squires, and http://portal.acm.org/citation.cfm?id=782040# Douglass E. Post. 2007. Software Development Environments [19] Diane F. Kelly. 2007. A Software Chasm: Software Engineering for Scientific and Engineering Software: A Series of Case Stud- and Scientific Computing. IEEE Software 24, 6 (2007), 120– ies. In ICSE ’07: Proceedings of the 29th International Confer- 119. https://doi.org/10.1109/MS.2007.155 ence on Software Engineering. IEEE Computer Society, Wash- [20] Diane F. Kelly, W. Spencer Smith, and Nicholas Meng. 2011. ington, DC, USA, 550–559. https://doi.org/10.1109/ICSE. Software Engineering for Scientists. Computing in Science & 2007.77 Engineering 13, 5 (October 2011), 7–11. [4] CDRH. 2002. General Principles of Software Validation; Final [21] T.P. Kelly. 1999. Arguing Safety – A Systematic Approach to Guidance for Industry and FDA Staff. Technical Report. US Safety Case Management. Ph.D. Dissertation. York University, Department Of Health and Human Services Food and Drug Ad- Department of Computer Science Report YCST. ministration FDA, Center for Devices and Radiological CDRH, [22] Tim Kelly. 2003. A Systematic Approach to Safety Case Man- Health Center for Biologics Evaluation and Research, Rockville, agement. Technical Report 04AE-149. SAE International. MD. [23] NASA. 1989. Software requirements DID, SMAP-DID-P200- [5] Center for Devices and Radiological Health, CDRH. 2002. Gen- SW, Release 4.3. Technical Report. National Aeronautics and eral Principles of Software Validation; Final Guidance for In- Space Agency. dustry and FDA Staff. Technical Report. US Department Of [24] Mojdeh Sayari Nejad. 2017. A Case Study in Assurance Case Health and Human Services Food and Drug Administration Development for Scientific Software. Master’s thesis. McMaster Center for Devices and Radiological Health Center for Biologics University, Hamilton, ON, Canada. http://hdl.handle.net/ Evaluation and Research, York, . 11375/23075. [6] G.M. Cleland, M.A. Sujan, I. Habli, and J. Medhurst. 2012. Ev- [25] L. Northrop. 2004. Achieving Product Qualities Through idence: Using Safety Cases in Industry and Healthcare. Health Software Architecture Practices. http://www.sei.cmu.edu/ Foundation, London. https://books.google.ca/books?id= architecture/cseet04.pdf o8z-Ms9o3DMC [26] Office for Nuclear Regulation. 2016. A guide to Nuclear Reg- [7] Zinovy Diskin, Tom Maibaum, Alan Wassyng, Stephen Wynn- ulation in the UK. http://www.onr.org.uk/documents/ Williams, and Mark Lawford. 2018. Assurance via model a-guide-to-nuclear-regulation-in-the-uk.pdf transformations and their hierarchical refinement. In Proceed- [27] David L. Parnas and P.C. Clements. 1986. A Rational Design ings of the 21st International Conference on Models Driven Process: How and Why to Fake it. IEEE Transactions on Soft- Engineering Languages and Systems, MODELS 2018, Copen- ware Engineering 12, 2 (February 1986), 251–257. hagen, Denmark, October 14-19, 2018 (2018-11-21). ACM, 426 [28] Cyril Pernet and Tom Nichols. 2016. Has a software – 436. https://www.mcscert.ca/wp-content/uploads/2018/ bug really called decades of brain imaging research into 12/p426-diskin-1.pdf question? https://www.theguardian.com/science/ [8] C. Ebert and C. Jones. 2009. Embedded Software: Facts, head-quarters/2016/sep/30/ Figures, and Future. Computer 42, 4 (April 2009), 42–52. [29] David J. Rinehart, John C. Knight, and Jonathan Rowanhill. https://doi.org/10.1109/MC.2009.118 16 2015. Current Practices in Constructing and Evaluating As- Methodology for Improving the Quality of a Parallel Mesh surance Cases with Applications to Aviation. Technical Report Generation Toolbox. Advances in Engineering Software 40, CR-2014-218678. National Aeronautics and Space Administra- 11 (Nov. 2009), 1155–1167. https://doi.org/10.1016/j. tion (NASA), Langley Research Centre, Hampton, Virginia. advengsoft.2009.05.003 [30] Patrick J. Roache. 1998. Verification and Validation in Com- [44] John Spriggs. 2012. GSN - The Goal Structuring No- putational Science and Engineering. Hermosa Publishers, Al- tation, A Structured Approach to Presenting Arguments. buquerque, New Mexico. Springer, Hayling Island, UK. https://doi.org/10.1007/ [31] John Rushby. 2015. The Interpretation and Evaluation of 978-1-4471-2312-5 Assurance Cases. Technical Report SRI-CSL-15-01. Com- [45] Graeme Stewart et al. 2017. A Roadmap for HEP Soft- puter Science Laboratory, SRI International, Menlo Park, CA. ware and Computing R&D for the 2020s. arXiv (2017). Available at http://www.csl.sri.com/users/rushby/papers/ arXiv:physics.comp-ph/1712.06982 sri-csl-15-1-assurance-cases.pdf. [46] Tim Storer. 2017. Bridging the Chasm: A Survey of Software [32] Judith Segal. 2005. When Software Engineers Met Research Engineering Practice in Scientific Programming. ACM Comput. Scientists: A Case Study. Empirical Software Engineering Surv. 50, 4, Article 47 (Aug. 2017), 32 pages. https://doi. 10, 4 (October 2005), 517–536. https://doi.org/10.1007/ org/10.1145/3084225 s10664-005-3865-y [47] U.S. Food and Drug Administration. 2014. Infusion Pumps [33] Judith Segal. 2008. Models of Scientific Software Development. Total Product Life Cycle: Guidance for Industry and FDA Staff. In Proceedings of the First International Workshop on Soft- on-line. ware Engineering for Computational Science and Engineer- [48] Hans van Vliet. 2000. Software Engineering (2nd ed.): Prin- ing (SECSE 2008). In conjunction with the 30th International ciples and Practice. John Wiley & Sons, Inc., New York, NY, Conference on Software Engineering (ICSE), ACM, Leipzig, USA. Germany, 1–6. http://www.cse.msstate.edu/~SECSE08/ [49] B. Douglas Ward. 2000. Program 3dfim+. Biophysics Research schedule.htm Institute, Medical College of Wisconsin. [34] Judith Segal and Chris Morris. 2008. Developing Scientific Soft- [50] Alan Wassyng, Neeraj Kumar Singh, Mischa Geven, Nicholas ware. IEEE Software 25, 4 (July/August 2008), 18–20. Proscia, Hao Wang, Mark Lawford, and Tom Maibaum. 2015. [35] Spencer Smith, Malavika Srinivasan, and Sumanth Shankar. Can Product-Specific Assurance Case Templates Be Used as 2019. Debunking the Myth that Upfront Requirements are Medical Device Standards? IEEE Design & Test 32, 5 (2015), Infeasible for Scientific Computing Software. In 2019 Interna- 45–55. https://doi.org/10.1109/MDAT.2015.2462720 tional Workshop on Software Engineering for Science (held in [51] Charles B. Weinstock, Howard F. Lipson, and John Good- conjunction with ICSE’19). 1–8. enough. 2007. Arguing Security – Creating Security Assur- [36] W. Spencer Smith. 2016. A Rational Document Driven Design ance Cases. Technical Report. Software Engineering Insti- Process for Scientific Computing Software. In Software Engi- tute Carnegie Mellon University, 4500 Fifth Avenue, Pitts- neering for Science, Jeffrey C. Carver, Neil Chue Hong, and burgh, PA. http://resources.sei.cmu.edu/asset_files/ George Thiruvathukal (Eds.). Taylor & Francis, Boca Raton, WhitePaper/2013_019_001_293637.pdf FL, Chapter Examples of the Application of Traditional Soft- [52] Didar Zowghi and Vincenzo Gervasi. 2013. The Three Cs of ware Engineering Practices to Science, 33–63. Requirements: Consistency, Completeness, and Correctness. [37] W. Spencer Smith, Thulasi Jegatheesan, and Diane F. Kelly. Technical Report. Faculty of Information Technology Univer- 2016. Advantages, Disadvantages and Misunderstandings sity of Technology, Sydney , Australia. About Document Driven Design for Scientific Software. In Pro- ceedings of the Fourth International Workshop on Software En- gineering for High Performance Computing in Computational Science and Engineering (SE-HPCCE). 8 pp. [38] W. Spencer Smith and Nirmitha Koothoor. 2016. A Document- Driven Method for Certifying Scientific Computing Software for Use in Nuclear Safety Analysis. Nuclear Engineering and Technology 48, 2 (April 2016), 404–418. https://doi.org/10. 1016/j.net.2015.11.008 [39] W. Spencer Smith and Lei Lai. 2005. A New Requirements Template for Scientific Computing. In Proceedings of the First International Workshop on Situational Requirements Engi- neering Processes – Methods, Techniques and Tools to Sup- port Situation-Specific Requirements Engineering Processes, SREP’05, J. Ralyt´e, P. Agerfalk,˙ and N. Kraiem (Eds.). In conjunction with 13th IEEE International Requirements En- gineering Conference, IEEE, Paris, France, 107–121. [40] W. Spencer Smith, Lei Lai, and Ridha Khedri. 2007. Require- ments Analysis for Engineering Computation: A Systematic Approach for Improving Software Reliability. Reliable Com- puting, Special Issue on Reliable Engineering Computation 13, 1 (February 2007), 83–107. [41] W. Spencer Smith, John McCutchan, and Fang Cao. 2007. Pro- gram Families in Scientific Computing. In 7th OOPSLA Work- shop on Domain Specific Modelling (DSM’07), Jonathan Sprin- kle, Jeff Gray, Matti Rossi, and Juha-Pekka Tolvanen (Eds.). Montr´eal,Qu´ebec, 39–47. [42] W. Spencer Smith, Mojdeh Sayari Nejad, and Alan Wassyng. 2018. Assurance Cases for Scientific Computing Software (poster). In ICSE 2018 Proceedings of the 40th International Conference on Software Engineering. Association for Comput- ing Machinery (ACM), Gothenburg, Sweden, 1–2. 2 pp. [43] W. Spencer Smith and Wen Yu. 2009. A Document Driven

17