information

Article A Safety Analysis Method for Control Software in Coordination with FMEA and FTA

Masakazu Takahashi 1,* , Yunarso Anang 2 and Yoshimichi Watanabe 1

1 Department of and , University of Yamanashi, 400-8511 Kofu, Japan; [email protected] 2 Department of Computational Statistics, Politeknik Statistika STIS, 13330 Jakarta, Indonesia; [email protected] * Correspondence: [email protected]; Tel.: +81-55-220-8585

Abstract: In this study, we propose a method to improve the safety of control software (CSW) by managing the CSW’s design information and safety analysis results together, and by combining failure mode and effects analysis (FMEA) with fault tree analysis (FTA). Here, the CSW is developed using structured analysis and design methodology. In the upper stage of the CSW’s development process, the preliminary design information (data flow diagrams (DFDs) and control flow diagrams (CFDs)) is used as input, the causes of undesirable events of the CSW are clarified by FMEA, and the countermeasures are reflected in the preliminary design information. In the lower stage of the CSW’s development process, the detailed design information (DFDs and CFDs in the lower level) and programs are used as inputs, the causes of a specific undesirable event are clarified by FTA, and the countermeasures are reflected in the detailed design specifications and programs. These processes are repeated until the impact of undesirable events reaches an acceptable safety level. By applying the proposed method to the CSW installed in communication control equipment on the system, we clarified several undesirable events and adopted adequate countermeasures. Consequently, a safer CSW can be developed by applying the proposed method.

Keywords: failure mode and effects analysis (FMEA); fault tree analysis (FTA); safety analysis; control software; structured analysis and design

Citation: Takahashi, M.; Anang, Y.; Watanabe, Y. A Safety Analysis Method for Control Software in Coordination with FMEA and FTA. Information 2021, 12, 79. https://doi.org/10.3390/info12020079

Academic Editor: Enrico Denti

Received: 29 December 2020
Accepted: 8 February 2021
Published: 12 February 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Industrial products are controlled by computerized systems. The software installed in the control system is called control software (CSW). Consequently, accidents caused by CSW now occur unexpectedly. In this paper, we propose a method for developing safer CSW. The characteristic of the proposed method is as follows: by maintaining the design information and the safety analysis results in a unified manner throughout the CSW’s development process, one can develop safer CSW by conducting two kinds of safety analysis. The proposed method contributes to realizing safer industrial products through the development of safer CSW.

First, we describe the background against which the proposed method is required. Undesirable events of CSW have recently caused many troubles in industrial products, such as an accident caused by the incorrect steering operation of an autonomous car [1], a loss of human lives caused by the malfunction of radiation therapy equipment [2], and the destruction of a satellite caused by abnormal high-speed spin [3], resulting in enormous damage to human life and industrial products. Thus, a method for making CSW safe is required. As the functional configuration of CSW becomes more complex with its increasing functionality, it becomes difficult to realize sufficient safety by conducting only an individual safety analysis.

The proposed method adopts multiple safety analysis methods at adequate steps in the CSW development process, and the results of the safety analysis are reflected in the design information in a timely manner. We propose a safer CSW development method by conducting


those tasks. In the upper stages of the CSW’s development process (upper development process), we apply failure mode and effects analysis (FMEA) [4] to the CSW’s design information, clarify the causes of undesirable events comprehensively, and adopt countermeasures for realizing safety. In the lower stages of the CSW’s development process (lower development process), we apply fault tree analysis (FTA) [5] to the CSW’s design information and code, clarify the causes of the specific undesirable events that could not be treated in the upper development process but were found in the use of the CSW, and modify the design information and the CSW. By repeating the design, development, FMEA, and FTA, we can clarify the undesirable events injected by the modifications and then repair them. By executing the processes mentioned above, we can develop a safer CSW. In this way, the proposed method enables seamless sharing of design information and safety analysis results throughout the development process, and improves the safety of CSW by repeating design and safety analysis. By developing safer CSWs, safer industrial products can be developed, and a safer society can be realized.

Here, we describe the CSW that is the target of this study. The target CSW has the following characteristics. The CSW is developed by applying structured analysis and design methodology (SADM) [6] and is written in the C language. As discussed in Section 3, SADM is applied because SADM and the V model of the software development process (explained in Section 3.1) have a high affinity. The C language is assumed because the C language and SADM have a high affinity, and 70% of CSWs are written in the C language. The CSW works on a single-chip, single-core Central Processing Unit (CPU). These assumptions are based on the white paper on embedded system development [7].

Large-scale CSW is realized by connecting small-scale pieces of control equipment via a network and data bus. If small-scale CSW with low coupling and high cohesion is developed, a safety analysis report for large-scale CSW can be obtained by gathering and combining the safety analysis reports of the small-scale CSW. Thus, the proposed method can be applied to large-scale CSW.

The rest of the paper is organized as follows. Section 2 describes related works on the CSW’s safety. Section 3 describes the outline of the proposed method and the support environment. Section 4 describes the application and evaluation of the proposed method for actual CSW. Section 5 describes the conclusion and future works.

2. Related Works

We classified the related works into established standards related to software safety, FMEA for software, FTA for software, other safety analysis methods, and cooperation among multiple safety analysis methods.

First, we describe the established standards related to software safety. International Electrotechnical Commission (IEC) 61508 defines the requirements for the functional safety of programmable electronic systems [8], recommending the V model of the software development process and requiring high-coverage testing when checking the functions. In the automotive domain, International Organization for Standardization (ISO) 26262 provides a standard for a car’s functional safety [9], which proposes a safety analysis method that considers undesirable events and improves the CSW’s safety by applying hazard and operability study (HAZOP) [10], FMEA, and FTA. IEC 82304-1 (Health software, Part 1: General requirements for product safety) was established as the standard of CSW’s safety for medical equipment [11]. Good automated manufacturing practice ver. 5 (GAMP5) was established as the standard of CSW’s safety for pharmaceutical production facilities [12]. These standards describe the safety requirements for industrial products, analysis methods, and validation methods, but do not describe specific implementation methods. Therefore, we have to define concrete procedures that meet these standards.

Second, we explain related works on FMEA. Takahashi et al. [13] proposed an FMEA method for CSW for pharmaceutical production. This research made it possible to conduct FMEA on various kinds of CSWs by defining the failure modes (failure modes will be explained in Section 3.2) per function. Morita [14] proposed a method for

finding the bugs in the CSW by dividing the CSW into multiple blocks, listing the failure modes, and predicting the causes. Niwa [15] conducted countermeasures to improve the CSW’s reliability by listing the failure modes of the function units in the preliminary design specification. Goddard et al. [16] conducted FMEA by defining the failure modes at the CSW’s instruction level. Snooke et al. [17] conducted FMEA by translating the CSW into an equivalent circuit. Lazarus et al. [18] proposed a failure analysis method that applies FMEA to class diagrams developed with an object-oriented design methodology. Kim et al. [19] proposed a supporting method for detecting failure modes of various software installed in car equipment. Batbayar et al. [20] proposed a risk evaluation method for the CSW of medical equipment by combining a fuzzy model and FMEA. Yang et al. [21] developed a hybrid expert system for failure diagnosis of embedded software by combining case-based reasoning and diagnosis methods based on a Bayesian network. These studies proposed FMEA methods that each focused on a narrow (specific) domain, so their failure modes are not versatile. To resolve this problem, we have to develop failure modes applicable to many kinds of CSWs.

Third, we explain related works on FTA. Weber et al. [22] analyzed the causes of faults in avionics CSW written in assembly language by conducting FTA. Friedman et al. [23] proposed an automatic fault tree (FT) development method for software written in the Pascal language. Leveson et al. [24] prepared FT templates corresponding to programming language instructions and developed an FT by combining those templates. Takahashi et al. [25] proposed an automatic FT development method by expanding Leveson’s idea. Kumar et al. [26] showed that, in the development of safety-critical ball position control systems, an adequate CSW design could be achieved when FTA was conducted at the proper time in the software development life cycle. Oveisi et al. [27] proposed a safety evaluation method that applies FTA to the software’s sequence diagrams developed with an object-oriented development methodology. Junga et al. [28] proposed an automatic FT development method from software specifications written in a formal specification language called NuSCR. From the results of [24] and [25], we found that the causes of undesirable events can be detected at the program level by using FTs. By improving this capability, FTA can be conducted on various types of CSWs.

Fourth, we explain other safety analysis methods. Hansen et al. [29] showed a method that detects hazards comprehensively by applying HAZOP to software specifications written in the unified modeling language (UML). Guiochet et al. [30] proposed a method that clarifies the hazards of a robot’s behavior by applying HAZOP to the robot’s CSW specification written in UML. Kaleeswaran et al. [31] proposed a method to detect hazards by applying HAZOP to descriptions written in a domain-specific language when critical software is developed using model-based development. Abdulkhaleq et al. [32] clarified the system hazards of an adaptive cruise control system in a car by applying system-theoretic process analysis (STPA) and pointed out the constraints and problems of applying STPA. Yang [33] proposed a method that derives the safety validation testing requirements of an avionics system using STPA. Nakano et al. [34] showed that a safer system could be developed by clarifying the hazards of crew return vehicles using STPA and reflecting the countermeasures corresponding to the hazards in the CSW in the concept development phase. Because these methods have a high degree of freedom, the completeness of the analytic result depends on the skills of the analyst. Therefore, these methods are not suitable for use in the CSW’s development.

Finally, we explain cooperation among multiple safety analysis methods. Hong et al. [35] proposed the following safety improvement method: calculating the minimum cut-set of the FT by conducting FTA, identifying the fault with the highest risk, and planning the countermeasure by conducting FMEA. Oveisi [36] proposed an approach to improve the CSW’s safety throughout the software development lifecycle using hazard analysis in the upper development process, FTA and FMEA in the middle development process, and detailed

FTA and FMEA in the lower development process. From those cases, a safety analysis method that coordinates FMEA and FTA is valid because the two analyses have a high affinity.

Many related works on CSW have been conducted; however, no method for developing safer CSW across the whole development process has been reported. Our proposed method will be described in Section 3.

3. Outline of the Proposed Method

Here, we explain the outline of the proposed development method for a safer CSW. Section 3.1 explains the outline of the proposed method. Section 3.2 explains the FMEA for a CSW, which analyzes the causes of undesirable events comprehensively in the upper development process. Section 3.3 explains the FTA for a CSW, which analyzes the causes of a specific undesirable event in the lower development process. Section 3.4 explains the development environment for a safer CSW that combines FMEA and FTA.

The technical term “fault” is defined as “the state that causes the degeneration or loss of ability for conducting a required function,” and the technical term “failure” is defined as “the loss of ability for conducting a required function” (ISO/IEC 2382-14) [37]. Therefore, a failure occurs when the fault occurs under specific conditions, resulting in undesirable events.

3.1. Outline of the Development Method for a Safer CSW

This subsection explains an analysis method for developing a safer CSW across the whole development process. The proposed method manages the CSW’s design information and safety analysis results in a unified manner and uses that information in cooperation with FMEA and FTA. The safer CSW is developed by adequately reflecting the CSW’s design information and safety analysis results.

First, we explain the development processes and the outputs. Figure 1 shows the CSW’s development processes and the outputs of each step. The CSW’s development process shown in Figure 1 is called the V model of the software development process. The squares in Figure 1 show the individual processes. The left side of Figure 1 shows the development process, and the right side shows the verification process. The dotted lines between the development process and the verification process show the correspondence.

We apply SADM to the development process of the CSW. The upper development process consists of the planning step, the requirement definition step, and the preliminary design step. In the planning step, we define the outline of the CSW’s functions and data interfaces (data and control signals sent to and/or received from external entities such as hardware, software, or persons). The development plan is created as a document describing those data and control signals. In applying SADM, the documents corresponding to the development plan are the data context diagram (DCD) and the control context diagram (CCD). In the requirement definition step, we define the requirements for the CSW and create the requirement specification. Each requirement corresponds to a process in the data flow diagrams (DFDs) of the second and/or third layer. (The DCD is considered as the first-layer DFD. DFDs are considered as the divided design of the DCD. The layer numbers are a guide.) We also define the large-grained functions.
The preliminary design specifications are created as the document that describes the large-grained functions. The large-grained functions correspond to the processes in the DFDs of the third to fifth layer. After developing the DFDs, the control flow diagrams (CFDs) corresponding to those DFDs and the control specifications (CSPECs), which consist of state transition diagrams, decision tables, and activation tables, are developed. CFDs define the control signals used for invoking the processes and the newly created control signals. The state transition diagrams define the inner states of the CSW, the transition conditions between the states, and the group of processes activated when the transitions occur. The decision tables define the control signals issued by each ON/OFF combination of control signals. The activation tables define the groups of processes activated by the control signals. In the lower development process, the large-grained functions are divided into several modules, and the algorithm of each module is defined. The detailed design specifications are created as a list of the modules and define each module’s concrete processing. The modules correspond to the processes in the DFDs of the fourth to sixth layer. After developing the DFDs, the CFDs and CSPECs corresponding to the DFDs are developed. Finally, in the programming step, CSWs are developed according to the detailed design specifications. Those items, such as DFDs, CFDs, and CSPECs, are registered in the database (DB), as explained in Section 3.4.
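To make the roles of the decision table and the activation table concrete, the following minimal Python sketch may help. It is only an illustration under stated assumptions: the control signals (door_closed, start_button), the issued signals (RUN, STANDBY, ALARM), the process names, and the helper evaluate_cspec are all hypothetical and are not taken from the target CSW. It assumes only what the text states, namely that a decision table maps ON/OFF combinations of control signals to an issued control signal, and that an activation table maps an issued control signal to a group of processes.

```python
from typing import Dict, List, Tuple

# Decision table (hypothetical): ON/OFF combination of the input control
# signals (door_closed, start_button) -> control signal that is issued.
DECISION_TABLE: Dict[Tuple[bool, bool], str] = {
    (True, True): "RUN",
    (True, False): "STANDBY",
    (False, True): "ALARM",
    (False, False): "STANDBY",
}

# Activation table (hypothetical): issued control signal -> group of
# DFD processes activated by that signal.
ACTIVATION_TABLE: Dict[str, List[str]] = {
    "RUN": ["read_sensor", "compute_output", "drive_actuator"],
    "STANDBY": ["read_sensor"],
    "ALARM": ["stop_actuator", "notify_operator"],
}


def evaluate_cspec(door_closed: bool, start_button: bool) -> List[str]:
    """Return the processes activated for one combination of control signals."""
    signal = DECISION_TABLE[(door_closed, start_button)]
    return ACTIVATION_TABLE[signal]


if __name__ == "__main__":
    # Pressing start while the door is open issues ALARM in this toy table.
    print(evaluate_cspec(door_closed=False, start_button=True))
```

Keeping both tables as plain data mirrors the way CSPECs are registered as design information, so that a later analysis step can enumerate every signal combination rather than read it out of program logic.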

[Figure 1 placeholder: V model of the CSW’s development process. Upper stage of the CSW’s development process (analyze all points with FMEA; check FMEA results): Define Development Plan, Develop Requirement Spec., Develop Preliminary Design Spec. Lower stage of the CSW’s development process (analyze priority points with FTA; check FTA results): Develop Detailed Design Spec., Coding. Verification side: Test Result, Requirement Test, Functional Test, Combined Test, Module Test. The results of countermeasures are fed back.]

Figure 1. The control software (CSW)’s development process.

Next, we explain the safety analysis methods used in the CSW’s development process. In the upper development process, the failures that may occur are examined comprehensively, and the necessary countermeasures are reflected in the designs. FMEA is used for the analysis of the failures and their causes. The details of the FMEA will be explained in Section 3.2. As the result of conducting FMEA, the development plan, the requirement specifications, and the preliminary design specifications shown on the left side of the area enclosed by the upper dotted bold line in Figure 1 are modified in relation to the failures considered to be critical. As a result of the modifications, the negative impacts of the failures are reduced to an acceptable level. Because complex functions and countermeasures at the program level cannot be added in this step, such countermeasures are conducted in the lower development process. Additionally, the necessary countermeasures are conducted on the newly found faults. FTA is used for the analysis of the causes of the faults. FTA will be explained in Section 3.3. As shown in the lower left part of Figure 1, the modification of the detailed design specifications, the modification of the programs, and the addition of programs are conducted according to the results of FTA. When the modification of the detailed design specifications, the modification of the programs, or the addition of programs negatively affects the documents in the upper development process, the documents in the upper development process on the left side of the area enclosed by the lower dotted bold line in Figure 1 are modified.

By conducting the processes mentioned above, we complete the first investigation of the CSW’s safety. However, applying those countermeasures to the CSW increases the possibility that new failure modes and/or faults are injected. Therefore, after completing the first investigation of the CSW’s safety, FMEA and FTA are conducted on the CSW again, and additional countermeasures are applied to the CSW. This investigation is repeated until the safety level of all failures and faults becomes acceptable. A safer CSW is realized by this work.

3.2. Outline of FMEA

In this subsection, we explain the FMEA for investigating the CSW’s safety in the upper development process. First, we explain FMEA for CSWs. Here, we discuss the CSW’s failure mode. Generally, a failure mode is defined as the change and/or alteration (violation) of a component from its adequate (healthy) state. Because bugs are injected when developing the software, a component that contains bugs is not in an adequate state in the first place. Therefore, bugs are not considered as failure modes. Bugs should be removed by conducting tests. The bug-removing task is not regarded as the target of FMEA in the proposed method. Accordingly, the failure modes dealt with in the proposed method are as follows: the deviation (violation) of the usage of CSW functions (modules, functions, etc.) and the deviation (violation) of the operation method.

The difference between a bug and a failure mode is explained using the example of a stack overflow when interruptions occur. A stack overflow occurs either when many interruptions occur while the values of the registers and variables pushed onto the stack are not popped adequately, or when more interruptions occur than the number defined in the requirement specification. In the former case, the cause is garbage (the values of registers and variables) left in the stack by inadequate pop operations, so this is regarded as a bug. The cause of the latter case is too many push operations. In this case, the software is designed adequately, but more interruptions than expected occur. Because this is a violation of the requirement, this is regarded as a failure mode (violation of software usage). The benefit of regarding violations of usage and operation as failure modes is that these failure modes can be applied to various types of functions, and the number of failure modes can be kept small. If bugs were regarded as failure modes, the number of failure modes would be enormous because bugs are of many types.

In the proposed method, the common failure modes used for all CSWs and the standard countermeasures for each common failure mode are derived by analyzing more than twenty existing CSWs [10]. Table 1 shows a list of common failure modes and standard countermeasures. Here, one of the common failure modes is explained as an example. The common failure mode regarding the start-up (execution) function is that the conditions for executing the function are not satisfied; when this situation occurs, the function cannot be executed. The countermeasures for this problem are as follows: add the execution-condition check to the standard operation procedure (SOP), and add a function that sets the conditions.
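The distinction between a bug and a failure mode in the stack overflow example can be sketched with a toy simulation. This is a minimal illustration, not the target CSW: the class ContextStack, the helpers nest_interrupts and overflows, and the limit SPEC_MAX_NESTING are hypothetical names introduced here, and the stack capacity is assumed to equal the nesting depth allowed by the requirement specification.

```python
class ContextStack:
    """Toy model of the stack that saves registers/variables on interruption."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames = []

    def push(self, frame: str) -> None:
        if len(self.frames) >= self.capacity:
            raise OverflowError("stack overflow")
        self.frames.append(frame)

    def pop(self) -> str:
        return self.frames.pop()


def nest_interrupts(stack: ContextStack, depth: int, buggy: bool = False) -> None:
    """Enter `depth` nested interrupt handlers, then leave them.

    With buggy=True the handlers never pop their saved context, so garbage
    stays on the stack (this models the bug case).
    """
    for level in range(depth):
        stack.push(f"context-{level}")
    if not buggy:
        for _ in range(depth):
            stack.pop()


def overflows(depth_per_burst: int, bursts: int, capacity: int,
              buggy: bool = False) -> bool:
    """Return True if the stack overflows during the given interrupt bursts."""
    stack = ContextStack(capacity)
    try:
        for _ in range(bursts):
            nest_interrupts(stack, depth_per_burst, buggy)
    except OverflowError:
        return True
    return False


SPEC_MAX_NESTING = 4  # hypothetical limit from the requirement specification

if __name__ == "__main__":
    # Correct handlers, usage within the specification.
    print(overflows(SPEC_MAX_NESTING, 10, SPEC_MAX_NESTING))
    # Bug: usage within the specification, but contexts are never popped.
    print(overflows(2, 10, SPEC_MAX_NESTING, buggy=True))
    # Failure mode: correct handlers, but more interruptions than specified.
    print(overflows(SPEC_MAX_NESTING + 1, 1, SPEC_MAX_NESTING))
```

In this sketch the second case would be removed by testing and repairing the handler, whereas the third case needs a countermeasure on usage, such as restricting interruption tasks as listed in Table 1.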

Table 1. Common failure modes and standard countermeasures.

Group | Common Failure Mode | Failure Example | Countermeasure Policy | Standard Countermeasures
Startup | The startup conditions for functions are not prepared | Related operations cannot be conducted; an improper system status exists | Review the startup conditions; conduct multiple checks at startup; conduct the startup check | Add the confirmation procedure for the startup conditions to the Standard Operation Procedure (SOP); set the conditions as to whether or not to start; display the startup status
Termination | The termination conditions for functions are not prepared | Related operations cannot be conducted; an improper system status exists | Review the termination conditions for functions; conduct multiple checks upon termination; conduct the termination check; transit to the safe status for top priority | Add the confirmation procedure for termination conditions to the SOP; set the conditions whether or not to terminate; multiplex the termination confirmation procedure; display the termination status; add the emergency stop function
Input/Output | Instructions on SOP misread | Improper results are calculated; an improper system status exists | Conduct multiple checks on SOP; integrate the SOP format | Conduct double checks on SOP; improve the visibility of SOP indications
Input/Output | Indications on Human Machine Interface (HMI) misread | Improper results are calculated; an improper system status exists | Conduct multiple checks on HMI; integrate the HMI format; check the content of HMI | Conduct double checks on HMI; improve the visibility of HMI; add the reconfirmation function
Input/Output | Mistake in checking products | Improper results are calculated | Conduct multiple checks on products | Conduct double checks on products
Input/Output | Data related to quality are lost | Past data are lost | Notify when data are lost | Add a warning function for past data loss
Input/Output | Data related to quality are lost | Latest data are lost | Notify if there is a data loss risk | Add a warning function for the latest data loss
Input/Output | An inputting error | Improper results are calculated; an improper system status exists | Multiple checks on input data | Conduct double checks on setting data
Calibration | Long time intervals for function calibration | A wrong measurement is done; improper results are calculated | Conduct periodic reviews | Shorten time intervals for function calibration
Qualification | Wrong operation authority | Proper operations cannot be done; improper results are calculated | Confirm the qualification before operation; do not set improper authority | Confirm operation authority before operation; review authority periodically
Backup | Insufficient backup | Data disappear; data related to quality are lost | Conduct proper backup operations; shorten backup time intervals | Organize the backup procedure in the SOP; shorten backup intervals
Unexpected CPU Load | Unexpected data update occurs | Data cannot be updated | Realize faster update processing; develop faster devices | Realize faster processing; install faster memory devices
Unexpected CPU Load | The upper limit of calculation precision is confirmed | Improper results are calculated | Increase significant digits | Utilize double-precision variables
Unexpected CPU Load | The lower limit of calculation precision is confirmed | Improper results are calculated | Increase significant digits | Utilize double-precision variables
Unexpected CPU Load | Divided by zero | Operation is suspended | Give a warning of division by zero | Add a warning function for a small divisor
Unexpected CPU Load | Unexpected amount of data is accepted | Abnormal program shutdown | Refuse data; do not input data | Add a restriction function for available data; add the number of available data to the SOP
Unexpected CPU Load | Unexpected interruption tasks occur | Abnormal program shutdown | Restrict interruption tasks; prohibit interruption tasks | Restrict interruption tasks; add the restriction function for interruption tasks to the SOP
Unexpected CPU Load | Unexpected CPU load occurs | Program does not respond; a slow response | Unexpected execution requests are not sent; refuse unexpected execution requests | Add the function of displaying CPU usage; add the restriction function for accepting execution requests under CPU overload
Malicious operations or attacks | No identification for important data | Data are removed | Take measures so that data are not removed | Introduce Data Loss Prevention (DLP) tools
Malicious operations or attacks | No access control for data | Data are removed | Take measures so that data are not accessed | Add access control for data according to each user
Malicious operations or attacks | Data could be rewritten | Data are falsified | Take measures so that data are not falsified | Add e-signature; add time stamp
Malicious operations or attacks | Vast amounts of data sent | Related operations cannot be conducted | Data acceptance is blocked | Disconnect from the external network
Malicious operations or attacks | Vast amounts of requests sent | Related operations cannot be conducted | Data are selected | Install firewalls
Malicious operations or attacks | Illegally accessed from the outside | System is invaded | Disconnect from the external network; discover illegal access | Disconnect external network; introduce Intrusion Detection System (IDS); introduce Intrusion Prevention System (IPS)
Malicious operations or attacks | Data with virus attached are received | System malfunctions; improper results are calculated | Remove computer virus; take measures so that virus does not invade | Introduce antivirus software; introduce virus protection software; conduct virus check on USB memory devices connected

Figure 2 shows the process of conducting the proposed FMEA. First, the CSW is designed, and the requirement specifications and preliminary design specifications are registered in the Design Safety Information DB, as described in Section 3.4. Second, the CSW's functions are extracted from the DB, and each common failure mode shown in Table 1 is examined to determine whether it applies to the extracted functions. Third, if a common failure mode applies to a function, the function–failure mode correspondence table is developed. Fourth, the function, the common failure mode, and the negative impact on the CSW are decided using the FMEA sheet (Table 2). The FMEA sheet is used as follows: the function is entered in the first column, the common failure mode applicable to the function is clarified and entered in the second column, and the impact on the system when the common failure mode occurs is clarified and entered in the third column. Then, the method described below is used to determine whether the effect is acceptable. Fifth, the severity, incidence, risk class, and detection rate are decided for each failure mode, and the risk priority is calculated using the risk evaluation matrix shown in Figure 3. On the left side of Figure 3, the risk class is obtained from the severity and incidence. On the right side of Figure 3, the risk priority is obtained from the risk class and detection rate. Sixth, based on the risk priority, it is judged whether the CSW can accept the failure. If the risk cannot be accepted, the application of the standard countermeasures described in Table 1 is investigated. Finally, the severity, incidence, risk class, and detection rate after applying the standard countermeasures are evaluated, and the risk priority is calculated again. As a result, if the CSW can accept the failure, the safety analysis tasks using FMEA are complete. If the CSW cannot accept the failure, the same tasks are repeated until the CSW can accept the failure.
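The third step, building the function–failure mode correspondence table, can be sketched roughly as follows. This is only an illustration in Python, not the authors' FMEA support tool: the dictionary names and the helper `build_correspondence_table` are hypothetical, the sample entries are modeled on the examples in Figure 2, and applicability is reduced to simple group matching, whereas in the method the analyst judges it.

```python
# Common failure modes keyed by group (sample entries modeled on Figure 2).
COMMON_FAILURE_MODES = {
    "startup": ["The startup conditions for the function are not prepared."],
    "termination": ["The termination conditions for the function are not prepared."],
    "input/output": ["Instruction on SOP missed.", "Mistake in checking product."],
}

# Hypothetical functions, each tagged with the groups its sub-functions fall into.
FUNCTIONS = {
    "Function A": ["startup", "termination", "input/output"],
    "Function B": ["startup", "input/output"],
}


def build_correspondence_table(functions, failure_modes):
    """Return (function, group, failure mode) rows for every matching pair.

    Assumption: a failure mode applies whenever its group matches one of the
    function's groups; a real analysis would review each pair by hand.
    """
    rows = []
    for name, groups in functions.items():
        for group in groups:
            for mode in failure_modes.get(group, []):
                rows.append((name, group, mode))
    return rows


table = build_correspondence_table(FUNCTIONS, COMMON_FAILURE_MODES)
for row in table:
    print(" | ".join(row))
```

Each row produced this way would then become one candidate line of the FMEA sheet, to be filled in with the impact and the risk evaluation.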


[Figure 2: flow diagram. The requirement and preliminary design specifications in the Design Safety Information DB and the list of common failure modes are analyzed for correspondence, considering similarity, to develop the function–failure mode correspondence table, from which the FMEA sheet is developed.]

Figure 2. The procedure of the proposed failure mode and effects analysis (FMEA).

Table 2. Sample of failure mode and effects analysis (FMEA) sheet.

Function | Common Failure Mode | Impact to System | Accept/Reject | Severity | Incidence | Risk Class | Detection Rate | Priority | Countermeasures
Function A | Startup condition X is not prepared. | The machine does not work. | Accept | Middle | Low | 3 | High | Low | Add Standard Operation Procedure (SOP) for checking startup conditions.
 | | | | Middle | Low | 3 | High | Low |
Function A | Termination condition Y is not prepared. | The machine uses electric power continuously. | Accept | Middle | Low | 3 | High | Low | Add SOP for checking termination condition.
 | | | | Middle | Low | 3 | High | Low |

[Figure 3: two matrices. Left: the risk class (1, 2, or 3) is determined from the severity of impact (high, middle, low) and the incidence (low, middle, high). Right: the risk priority (high, middle, low) is determined from the risk class and the detection rate (high, middle, low).]

Severity of impact = effect on the system
Incidence = probability of occurrence of accidents
Detection rate = probability that the failure can be detected before the occurrence of an accident
Risk class = Severity of impact × Incidence
Risk priority = Risk class × Detection rate

Figure 3. Risk evaluation matrix.
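The two-stage lookup behind the risk evaluation matrix can be illustrated with a small Python sketch. The cell values below are assumptions: the individual cells of Figure 3 are not reproduced in the text, so the tables are placeholders chosen only to agree with the sample row of Table 2 (severity middle and incidence low give risk class 3; risk class 3 and detection rate high give priority low). The acceptance rule in `is_acceptable` is likewise hypothetical.

```python
# Hypothetical cell values (placeholders consistent with the Table 2 sample row).
RISK_CLASS = {  # (severity, incidence) -> risk class
    ("high", "low"): 2, ("high", "middle"): 1, ("high", "high"): 1,
    ("middle", "low"): 3, ("middle", "middle"): 2, ("middle", "high"): 1,
    ("low", "low"): 3, ("low", "middle"): 3, ("low", "high"): 2,
}
RISK_PRIORITY = {  # (risk class, detection rate) -> risk priority
    (1, "high"): "middle", (1, "middle"): "high", (1, "low"): "high",
    (2, "high"): "low", (2, "middle"): "middle", (2, "low"): "high",
    (3, "high"): "low", (3, "middle"): "low", (3, "low"): "middle",
}


def evaluate(severity, incidence, detection_rate):
    """Look up the risk class first, then the risk priority."""
    risk_class = RISK_CLASS[(severity, incidence)]
    priority = RISK_PRIORITY[(risk_class, detection_rate)]
    return risk_class, priority


def is_acceptable(priority):
    """Assumed acceptance rule: only a low risk priority is accepted."""
    return priority == "low"


risk_class, priority = evaluate("middle", "low", "high")
print(risk_class, priority, is_acceptable(priority))
```

If a failure mode is rejected, the same `evaluate` call would be repeated with the values estimated after a standard countermeasure is applied, mirroring the re-evaluation step of the procedure.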

3.3. Outline of FTA

This subsection describes the FTA used for clarifying the specific causes of the CSW's fault in the lower development processes.


First, FTA for CSW is explained [38]. The components of a CSW are instructions, and the interfaces between components are the execution sequence of the instructions and the data they exchange. A fault propagates through the data sent and received between instructions, following the execution sequence of the instructions. Therefore, to identify the causes of a fault, we need to trace the execution sequence of the instructions and the calculation process in the opposite direction. We illustrate this using the sample program shown in Figure 4. The function of this program is to input the initial value and add the value of the index in the for-loop to the initial value five times. We consider the case in which the value of variable res1 in line 07 becomes six (this is the focused result). The instruction in line 05 is executed before the instruction in line 07, and the value of res1 is five before the execution of the instruction in line 05. As the instruction in line 05 lies inside the loop instruction in line 04, it is executed five times. The value of res1 is one before the loop instruction in line 04 is executed. The instruction in line 02 is executed before line 04, and the value one is assigned to res1. Thus, because the value one is assigned to res1 in line 02, res1 becomes six in line 07. To clarify the calculation process between consecutive instructions, the relationship (calculation process) between the state before and the state after the execution of an instruction is defined as the Fault Tree template (FT template; explained later). Here, because it is inefficient to develop FT templates for all instructions of the C language, FT templates are developed for the frequently used instructions, selected according to the instruction appearance frequency in existing CSWs. The FT is developed by clarifying the instruction that is executed before the focused instruction and combining the FT templates corresponding to that instruction. This procedure is defined as the FT development rules (explained later).

Figure 5 shows the proposed FTA procedure. In the proposed method, first, the CSW's design information (processes in the fourth and/or fifth layer's DFDs, CSPECs, codes, etc.), the fault, and the instruction that causes the fault are obtained from the DB, as described in Section 3.4. Second, the FT template corresponding to the instruction that causes the fault is extracted from the FT template library. Third, the FT template calculation is modified according to the CSW's execution status, and the FT template is regarded as the temporary FT. Fourth, the instruction executed immediately before is clarified according to the FT development rules; the corresponding FT template is extracted from the FT template library and added to (combined with) the temporary FT; and the contents of the temporary FT are modified according to the actual status of the CSW. By conducting these tasks, we can develop the FT for the fault.

         #include <stdio.h>

         int main(void) {
    01       int i, res1, res2;
    02       scanf("%d", &res1);
    03       scanf("%d", &res2);
    04       for (i = 1; i <= 5; i++) {
    05           res1 = res1 + i;
    06           res2 = res2 + i;
             }
    07       printf("res1 = %d\n", res1);
    08       printf("res2 = %d\n", res2);
         }

(1) The value of res1 is six in line 07. res1 is calculated in line 05.
(2) res1 is calculated in line 05. The calculation is repeated five times by the loop instruction in line 04.
(3) res1 is one before entering the loop instruction. res1 is assigned in line 02.
Result: the value of res1 becomes six when one is assigned into res1 in line 02.

Figure 4. Sample program for explanation of the proposed method.
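As a rough illustration of tracing the calculation in the opposite direction, the following Python sketch mirrors the loop as printed in Figure 4 and then inverts line 05 step by step. It is not part of the proposed tool, and the function names are hypothetical. Note that with the loop body `res1 = res1 + i`, an input of one yields 16; the value six quoted in the walkthrough would correspond to adding a constant one per iteration, so the numbers below follow the listing.

```python
def run_forward(initial, n=5):
    """Mirror lines 02-07 of Figure 4: add the loop index i = 1..n to the input."""
    res1 = initial
    for i in range(1, n + 1):
        res1 = res1 + i              # line 05
    return res1                      # value printed in line 07


def trace_backward(observed, n=5):
    """Undo line 05 from the last iteration to the first.

    Returns (i, value of res1 before that execution of line 05) pairs;
    the last pair holds the value that was assigned in line 02.
    """
    history = []
    res1 = observed
    for i in range(n, 0, -1):
        res1 = res1 - i              # inverse of res1 = res1 + i
        history.append((i, res1))
    return history


observed = run_forward(1)
steps = trace_backward(observed)
print("value in line 07:", observed)
for i, before in steps:
    print(f"before line 05 with i = {i}: res1 = {before}")
```

The last entry of `steps` recovers the value read in line 02, which is the kind of backward reasoning the FT templates and FT development rules are meant to systematize.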

[Figure 5: the FTA support tool takes the control software as input, uses the Design Safety Information DB, the FT template library (for example, templates for branch (if-then-else), loop (for), and sequence), and the FT development rules, and outputs the FT.]

Figure 5. The procedure of the proposed fault tree analysis (FTA).

Second, we explain the FT templates. The FT templates are developed based on the appearance frequency of the instructions in twenty existing CSWs. Figure 6a–i shows the FT templates, which are described using logical symbols. Each FT template shows only the relationship between the event after the execution of an instruction (TEvent) and the event before the execution of that instruction (BEvent).

Figure 6a shows the FT template for the assignment statement. This template explains that the cause of the undesirable event is the case that "the input value is incorrect" or the case that "the operator is incorrect" when this instruction is executed. Figure 6b shows the FT template for the if-then-else statement. This template explains that the cause of the undesirable event is the case that "the n-th clause is executed and the instruction in the clause produces the event" or the case that "the else clause is executed and the instruction in the clause produces the event" when this instruction is executed. Figure 6c shows the FT template for the while statement. This template explains that the cause of the undesirable event is the case that "as a result of not executing this instruction, the undesirable event occurs" or the case that "as a result of executing this instruction n times, the undesirable event occurs" when this instruction is executed. Figure 6d shows the FT template for the function call. This template explains that the cause of the undesirable event is the case that "the arguments are incorrect" or the case that "the function cannot be executed" when this instruction is executed. Figure 6e shows the FT template for the interrupt. This template explains that the cause of the undesirable event is the case that "the interrupt occurs and the interrupt module produces the undesirable event", the case that "the interrupt does not occur, and the non-execution of the interrupt module produces the undesirable event", or the case that "the inhibition of the interrupt produces the undesirable event". Figure 6f shows the FT template for the global variables. This template explains that the cause of the undesirable event is the case that "one or more incorrect values of the global variables in the whole program produce the undesirable event". Figure 6g shows the FT template for the local variables. This template explains that the cause of the undesirable event is the case that "one or more incorrect values of the local variables in the focused scope produce the undesirable event". Figure 6h shows the FT template for the array. This template explains that the cause of the undesirable event is the case that "the index of the array is under zero", the case that "the element that the index points to does not exist", or the case that "the value of the element that is pointed to by the index is incorrect". Figure 6i shows the FT template for the pointer. This template explains that the cause of the undesirable event is the case that "the address does not exist", the case that "the address is incorrect", or the case that "the stored value is incorrect".

Here, an instruction that has a hierarchical instruction structure is called a hierarchical instruction. The FT template for a hierarchical instruction is developed by combining the existing FT templates. When a new frequently used instruction appears, its FT template is developed and added to the FT template library. Since an FT template shows only the relationship between the states before and after the execution of an instruction, adding a new FT template has no negative impact on the other FT templates or on the FT development rules.
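One possible way to hold such a template in memory is a small tree of events joined by logic gates. The Python sketch below encodes the while-statement template under stated assumptions: the class `Event` and the helpers `leaves` and `render` are hypothetical names, the OR gate at the top follows the "or" in the description, and the AND gates beneath it are assumed by analogy with the loop template drawn in Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    label: str
    gate: str = ""                       # "OR", "AND", or "" for a leaf event
    children: list = field(default_factory=list)


def while_template():
    """FT template for the while statement (gate types partly assumed)."""
    return Event("while statement causes event", "OR", [
        Event("statement does not execute, causing event", "AND", [
            Event("condition is false prior to while"),
            Event("event exists prior to while statement"),
        ]),
        Event("statement executes n times, causing event", "AND", [
            Event("condition is true prior to while"),
            Event("n-th iteration causes event"),
        ]),
    ])


def leaves(event):
    """Collect the undeveloped events at the bottom of a template."""
    if not event.children:
        return [event.label]
    found = []
    for child in event.children:
        found.extend(leaves(child))
    return found


def render(event, depth=0):
    """Return the tree as indented text lines."""
    suffix = f" [{event.gate}]" if event.gate else ""
    lines = ["  " * depth + event.label + suffix]
    for child in event.children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(while_template())))
```

In this representation the root plays the role of the TEvent and the leaves play the role of the BEvents, so a new template can be added to a library without touching the existing ones.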

[Figure 6: nine tree diagrams, one per FT template. In each panel the top event ("... causes event") is decomposed into the causes described in the text.]

Figure 6. (a) Fault tree (FT) template for the assignment statement. (b) FT template for the if-then-else statement. (c) FT template for the while statement. (d) FT template for the function call. (e) FT template for the interrupt. (f) FT template for the global variables. (g) FT template for the local variables. (h) FT template for the array. (i) FT template for the pointer.

Third, we explain the FT development rules. Figure 7a–d shows the FT development rules, and Figure 7a shows their outline. Here, dFT means the developing FT (i.e., the FT to be targeted), and wFT means the temporary (working) FT used in the "Develop FT" module. As shown in Figure 7a, the FT development rules set the fault as the top event in dFT, and the "Develop FT" module develops the remaining part of dFT. The "Develop FT" module clarifies the instruction executed immediately before the focused instruction and combines the corresponding FT template. These tasks are repeated until it is no longer possible to go back in the execution order of the instructions. Finally, the instruction that exists at the forefront of the FT is regarded as the fault. Next, the contents of the "Develop FT" module are explained. Figure 7b shows the modules that make up the "Develop FT" module. The "Develop FT" module develops the dFT corresponding to the instruction Ig contained in the BEvent of dFT. Additionally, the "develop wFT corresponding to instruction (Ix)" module shows the wFT development procedure for the instruction Ix; the "operation for global variables" module shows the procedure for detecting global variables in the BEvent and adding an FT template for global variables to the dFT; and the "operation for local variables" module shows the procedure for detecting local variables in the BEvent and adding an FT template for local variables to the dFT. Figure 7c shows the outline of "2.1 develop dFT when Ig is a hierarchical instruction", whereas Figure 7d shows the outline of "2.2 develop dFT when Ig is not a hierarchical instruction". By combining the FT templates in the reverse execution sequence, from the instruction at which the fault is observed back to the instruction that makes the fault occur, we clarify the causes of the fault.
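The backward chaining at the heart of these rules can be sketched for the simplest case, a straight-line sequence of assignments. The Python code below is a deliberately reduced illustration and not the authors' rule set: it omits gates, hierarchical instructions, and the operations for global and local variables, and the program, the `Node` class, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


# Hypothetical straight-line program: (line, assigned variable, variables read).
PROGRAM = [
    (1, "a", []),            # a = input
    (2, "b", []),            # b = input
    (3, "c", ["a", "b"]),    # c = a + b
    (4, "d", ["c"]),         # d = c * 2
]


def develop_ft(program, variable, before_line):
    """Find the latest instruction before before_line that assigns variable,
    then develop the tree further for every variable that instruction reads."""
    for line, target, sources in reversed(program):
        if line < before_line and target == variable:
            node = Node(f"line {line}: value assigned to {variable} is incorrect")
            for src in sources:
                node.children.append(develop_ft(program, src, line))
            return node
    # No earlier instruction assigns the variable: the backward trace stops.
    return Node(f"{variable} is incorrect on entry")


def show(node, depth=0):
    """Return the tree as indented text lines."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


# Top event: d holds a wrong value after the last instruction (before "line 5").
ft = Node("top event: d is incorrect", [develop_ft(PROGRAM, "d", 5)])
print("\n".join(show(ft)))
```

Each recursive call corresponds loosely to one pass of the "Develop FT" module: the current bottom event is matched to the instruction executed before it, and that instruction's own inputs become the new bottom events until none can be traced further.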

Information 2021, 12, x FOR PEER REVIEW 15 of 31 Information 2021 , 12, 79 15 of 29

START

Φ -> dFT

Decide top event Event1 : operation and instruction I1 : module

Top event Event1 -> BEvent in dFT : judgment

Yes. All BEvents in dFT are null ? No. dFT -> FT

Develop FT (dFT) STOP

(a) Develop FT(dFT) 1. initialize wFT.

2. develop dFT corresponding to Ig included in all BEvents. 2.1. develop dFT when Ig is hierarchical instruction. 2.2. develop dFT when Ig is not hierarchical instruction. 3.return dFT

develop wFT corresponding to instruction (Ix) : develop wFT for Ix (1) regard FT template corresponding to Ix as wFT. (2) set BEvent in dFT to TEvent in wFT. (3) calculate all BEvents in wFT.

operation for global variables: development of dFT when global variables are included in BEvents (1) FT that is added FTT for global variables to BEvent in wFT regards new wFT (2) write all BEvent in wFT (3) combine dFT to wFT

operation for local variables: development of dFT when local variables are included in BEvents (1) FT that is added FTT for local variables to BEvent in wFT regards new wFT (2) write all BEvent in wFT (3) combine dFT to wFT

(b)

Figure 7. Cont.

Information 2021, 12, x FOR PEER REVIEW 16 of 31

Information 2021, 12, 79 16 of 29

2.1. develop dFT when Ig is hierarchical instruction.
(1) obtain the hierarchical instruction to which Ig belongs, and get the maximum layer Lmax and the layer L to which Ig belongs
(2) k = Lmax
(3) develop dFT at the layer of the hierarchical instruction Ig
   (3-1) develop wFT corresponding to instruction (Ig)
   (3-2) following tasks are conducted for all BEvents in wFT
      (3-2-1) when global variables are contained in BEvent: conduct operation of global variables, and k = k - 1
      (3-2-2) when local variables are contained in BEvent: conduct operation of local variables, and k = k - 1
      (3-2-3) when global variables and local variables are contained in BEvent: conduct operation of global variables and local variables, and k = k - 1
(4) develop dFT except at the hierarchical instruction to which Ig belongs
   (4-1) when L < k < Lmax
      (4-1-1) develop wFT corresponding to instruction (instruction Ii at k-th layer)
      (4-1-2) following tasks are conducted for BEvent in wFT
         (4-1-2-1) when global variables are contained in BEvent: conduct operation of global variables, k = k - 1, and go to (4)
         (4-1-2-2) when local variables are contained in BEvent: conduct operation of local variables, k = k - 1, and go to (4)
         (4-1-2-3) when global variables and local variables are contained in BEvent: conduct operation of global variables and local variables, k = k - 1, and go to (4)
   (4-2) when k = L
      (4-2-1) develop wFT corresponding to instruction (instruction Ii at k-th layer)
      (4-2-2) following tasks are conducted for BEvent in wFT
         (4-2-2-1) when global variables are contained in BEvent: conduct operation of global variables, and k = 0
         (4-2-2-2) when local variables are contained in BEvent: conduct operation of local variables, and k = 0
         (4-2-2-3) when global variables and local variables are contained in BEvent: conduct operation of global variables and local variables, and k = 0

(c)

2.2. develop dFT when Ig is not hierarchical instruction.
(1) develop wFT corresponding to instruction (Ig)
(2) conduct following tasks for all BEvents in wFT
   (2-1) when global variables are contained in BEvent: conduct operation of global variables
   (2-2) when local variables are contained in BEvent: conduct operation of local variables
   (2-3) when global and local variables are contained in BEvent: conduct operation of global variables and local variables

(d)

Figure 7. (a) FT development rules: Outline. (b) FT development rules: Detail of Develop FT module. (c) FT development rules: Detail of “Develop dFT” when Ig is hierarchical. (d) FT development rules: Details of “Develop dFT” when Ig is not hierarchical.

3.4. Safety Analysis Support Environment

This subsection explains the safety analysis support environment, which combines the contents described in Sections 3.1–3.3.

Figure 8 shows an outline of the proposed safety analysis support environment, consisting of the FMEA support tool, the FTA support tool, and the design and safety information database (DSDB).


[Figure 8: block diagram. Safety analysis procedure: FMEA preparation (analysis of existing FMEA results; derive common failure modes and standard measures; develop common failure mode list and standard measure list); FTA preparation (analysis of existing control software; investigate standard instructions; develop FT templates for each instruction); FMEA procedure using lists; FTA procedure using FT templates; risk management procedure. Safety analyzing environment: FMEA support tool; FTA support tool; design and safety information database. Arrow label: feedback of assessment result.]

Figure 8. Outline of the proposed safety analysis support environment.

The FMEA support tool is used in developing the requirement specifications and the preliminary design specifications in Figure 1. The inputs of the FMEA support tool are all functions extracted from DSDB. The outputs of the FMEA support tool are the results of FMEA.

The FTA support tool is used in the step for developing the detailed design specifications and program in Figure 1. The FTA support tool’s inputs are the fault and the CSW’s design information extracted from DSDB. The faults are items that could not be modified when FMEA was conducted and items clarified after completing the CSW development. The outputs of the FTA support tool are the FTs for the fault.

DSDB manages the data necessary for conducting FMEA and FTA. Figure 9 shows the outline of the DSDB data structure. DSDB consists of the following tables: requirement specification, preliminary design specification, detailed design specification, module, common failure mode and countermeasure, function–common failure mode correspondence, actual failure and impact, actual countermeasure for failure, fault, module–fault correspondence, fault tree, and actual countermeasure for fault. These tables are created for each version of the CSW. The tables for each specification include information about the CSW’s functions. The module table maintains the source code of the CSW. The common failure mode and countermeasure table includes all common failure modes and all countermeasures for the failures. The function–common failure mode correspondence table manages all common failure modes applied to each function. The actual failure and impact table includes each concrete failure and its negative impact. The actual countermeasure for failure table manages the actual countermeasures applied to the CSW. The fault table registers all investigated faults. The module–fault correspondence table manages the information related to the actual conduct of FTA. The fault tree table manages the results of FTA (FTs) calculated by the FTA support tool. The actual countermeasure for fault table manages the concrete countermeasures for the fault.
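As an illustration only, the per-version table layout of DSDB could be sketched with an in-memory SQLite database as follows. The table names are transliterated from the list above; the column layout (`id`, `content`), the version-suffix naming scheme, and the helper names `create_dsdb` and `list_tables` are assumptions made for this sketch and are not taken from the actual DSDB implementation.

```python
import sqlite3

# Table names transliterated from the DSDB table list in the text.
# The columns below are illustrative assumptions, not the authors' schema.
DSDB_TABLES = [
    "requirement_specification",
    "preliminary_design_specification",
    "detailed_design_specification",
    "module",
    "common_failure_mode_and_countermeasure",
    "function_common_failure_mode_correspondence",
    "actual_failure_and_impact",
    "actual_countermeasure_for_failure",
    "fault",
    "module_fault_correspondence",
    "fault_tree",
    "actual_countermeasure_for_fault",
]


def create_dsdb(conn, csw_version):
    """Create one full set of DSDB tables for a given CSW version."""
    suffix = csw_version.replace(".", "_")  # hypothetical naming scheme
    for name in DSDB_TABLES:
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{name}_v{suffix}" '
            "(id INTEGER PRIMARY KEY, content TEXT NOT NULL)"
        )
    conn.commit()


def list_tables(conn):
    """Return the names of all tables currently in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_dsdb(conn, "1.0")
create_dsdb(conn, "1.1")
```

Creating a separate table set per version keeps the design information and the safety analysis results of one CSW version together, which mirrors the statement that the tables are created in the unit of the CSW’s version.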


[Figure 9: diagram of the DSDB tables. Annotation: develop tables for each version unit.]

Figure 9. Outline of design and safety information database (DSDB) data structure.

4. Application and Evaluation

This section describes the application and evaluation of the proposed method. Section 4.1 describes the outline of the target system. Section 4.2 describes the design information of the CSW. Section 4.3 describes the result of the safety analysis. Section 4.4 describes the benefits of the proposed method. Section 4.5 describes the limitations of the proposed method.

4.1. Outline of the Target System

We applied the proposed method to the CSW that conducts communication between the control equipment installed in the space station according to the MIL-STD-1553B protocol. The communication between the control equipment is conducted via the data bus. The communication control is managed by BUS-61553B (Data Device Corporation Inc., Bohemia, NY, USA), and the data setting and data obtaining tasks are managed by a CPU made by Intel Inc. (Santa Clara, CA, USA) via the memory shared between the BUS-61553B and the CPU. Most tasks conducted by BUS-61553B are executed automatically, whereas the tasks conducted by the CPU are executed by a program written in the C language. The CSW comprises approximately 800 lines of code (LOC). BUS-61553B conducts the communication between the controller and the remote terminal (RT). The controller has the master role, and the RT has the slave role. The communication starts after the controller turns on the power switch of the RT (power-on interruption). The control signals that the RT receives are as follows: interruption of broadcast command (BC) reception, interruption of RT response reception, and interruption of the control cycle (data set, command judgment, and Health and Status (H&S) data creation). When there is an interruption of BC reception, BUS-61553B writes the contents of the received command to the stack and revises the data reference pointer. When there is an interruption of RT response reception, the CSW creates and sends the RT data. When there is an interruption of the control cycle, the RT investigates the communication status stored in the memory and conducts an adequate task corresponding to the kind of command received. Commands are received approximately once per 125 ms (8 Hz), and the control cycle interruption also has a period of 125 ms; the two interruptions occur asynchronously.

4.2. Design Information on the CSW

Figure 10a–c shows the design result of the communication CSW. Here, we focus on and discuss the contents of process 3 in Figure 10b,c. Generally, SADM describes DFDs, CFDs, and CSPECs in separate sheets, but we describe this information in the same sheet to save space. Figure 10a explains the DCD and CCD. Figure 10b shows the DFD, CFD, and CSPEC in the second layer. The interruption of BC reception and the interruption of the control cycle (command judgment) activate process 3, and the interruption of RT response reception and the interruption of the control cycle activate process 2. Figure 10c shows the DFD, CFD, and CSPEC related to process 3. When the interruption of BC reception occurs, the processes (3.1, 3.2, 3.3, and 3.7) are executed according to the activation table. Those processes’ concrete operations are as follows: increment the stack pointer, set the necessary data in the descriptor stack, and register the data blocks in the BC command to the memory. Those processes are conducted by BUS-61553B automatically. When the control cycle (command judgment) interruption occurs, the processes (3.4, 3.5, and 3.6) are activated according to the activation table. The concrete operations are as follows: extract the descriptor referred to by the stack pointer from the stack, investigate the status of the received command, and extract the data blocks referred to by the address written in the descriptor from memory.

[Figure 10a: DCD and CCD for process 0, "communicate between Controller and Remote Terminal". External entities: Controller Device, Scheduler, Telemetry editing. Data flows include: GHF command, data transfer incomplete, 1553B message, status word, H&S data, experiment data, crew showing data, GMT data, file transfer group error, file transfer block error. Control flows include: PW. ON, BC reception, RT response reception, control cycle (data set), control cycle (command judgment), control cycle (H&S data creation), emergency stop.]

(a)

[Figure 10b: DFD, CFD, and CSPEC in the second layer. Processes: 1 set RT mode; 2 conduct RT response; 3 receive command; 4 create Health & Status data. Stores and flows include: look up table, current area, file transfer status, 1553B message, command response, send data block number, data block not set, address position, file length, status word.]

(b)

[Figure 10c: DFD, CFD, and CSPEC of process 3. Processes: 3.1 increment stack pointer for command count; 3.2 store necessary info to descriptor stack; 3.3 store data blocks; 3.4 confirm received command status; 3.5 extract data blocks; 3.6 conduct action according to received command; 3.7 conduct command response. Activation table: BC reception activates 3.1, 3.2, 3.3, and 3.7 in this order; control cycle (command judgment) activates 3.4, 3.5, and 3.6 in this order.]

(c)

Figure 10. (a) Design Result of Communication CSW–Data Context Diagram (DCD) and Control Context Diagram (CCD). (b) Design Result of Communication CSW–Data Flow Diagram (DFD), Control Flow Diagram (CFD), and Control Specifications (CSPEC) in the Second Layer. (c) Design Result of Communication CSW–DFD, CFD, and CSPEC of Process 3.


4.3. Safety Analysis Results Related to CSW

This subsection describes the results of the safety analysis of the CSW of the communication control equipment. The safety analysis of the existing CSW of the communication control equipment was conducted by the . The engineers who designed this CSW had sufficient experience, having developed more than 10 CSWs for aerospace systems, and had sufficient knowledge of SADM and the safety analysis method. The developed the CSW obeying the guideline [39], while the engineers who conducted the safety analysis using the proposed method had the same experience and knowledge and had additionally mastered the proposed method.

First, we explain the results of FMEA, which was conducted using the information of the DFDs, CFDs, and CSPECs in the second and/or third layer from DSDB. Table 3 shows the results of FMEA conducted on the functions of process 3.X. There are no countermeasures for the processes (3.1, 3.2, 3.3, and 3.7) because BUS-61553B activates those processes automatically (by hardware). In process 3.4, no countermeasures can be added because the format of the data blocks sent and/or received is defined by the hardware and protocol of BUS-61553B. In process 3.5, the reliability of the received data blocks can be improved by adding an error-recovering bit. In process 3.6, the reliability of the operation corresponding to the received command can be improved by registering the obtained data in multiplexed form. As a result, we added the following countermeasures: add an error-recovering bit to the data block and register the obtained data in multiplexed form. The total man-hours for conducting FMEA and planning the countermeasures were approximately 20 h. (The man-hours for developing the DFD, CFD, and CSPEC, as well as the man-hours for modifying the CSW, are not included.)

Table 3. Result of FMEA.

Common Process ID Function Causes Impact to System Severity Probability Detection Rate Risk Countermeasures Failure Modes increment stack pointer for Not Applicable malfunction of malfunction of cannot receive 3.1 command count High Low Low High (NA) (use high hardware BUS61553B commands (conducted by reliability parts) BUS61553B) store necessary info to malfunction of malfunction of cannot receive NA (use high 3.2 descriptor stack High Low Low High hardware BUS61553B commands reliability parts) (conducted by BUS61553B) store data blocks malfunction of malfunction of cannot receive NA (use high 3.3 High Low Low High (conducted by hardware BUS61553B commands reliability parts) BUS61553B) activation cannot receive conditions no sync data High Low Low High commands failure multiplexing timer completion cannot access interruption, conditions descriptor multiplexing failure stack descriptor error of pointer, inadequate descriptor multiplexing input data stack descriptor confirm pointer (Spec. of command 3.4 received inadequate BUS61553B) reception command status output data error on inadequate NA NA - - - - - algorithm multiplexing program malfunction of cannot receive error bit for High Low Low High destruction memory commands command reception back up error NA NA - - - - security error NA NA - - - - - Information 2021, 12, 79 22 of 29

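Process ID | Function | Common Failure Modes | Causes | Impact to System | Severity | Probability | Detection Rate | Risk | Countermeasures
3.4 | confirm received command status | operation procedure miss | NA | NA | - | - | - | - | -
3.4 | confirm received command status | malfunction of hardware | malfunction of Electronic Control Unit (ECU) | NA | - | - | - | - | -
3.5 | extract data blocks | activation conditions failure | cannot finish process 3.4 | cannot receive commands | High | Low | Low | High | countermeasure in process 3.4
3.5 | extract data blocks | completion conditions failure | cannot access data blocks | cannot receive commands | High | Low | Low | High | multiplexing data block pointer (Spec. of BUS61553B)
3.5 | extract data blocks | inadequate input data | data block error | cannot conduct action corresponding to received command | Middle | Low | High | Low | increase retry number of sending command (Spec. of 1553B protocol)
3.5 | extract data blocks | inadequate output data | command reception error | request to send command | Middle | Low | High | Low | increase retry number of sending command (Spec. of 1553B protocol)
3.5 | extract data blocks | inadequate algorithm | NA | NA | - | - | - | - | -
3.5 | extract data blocks | program destruction | malfunction of memory | cannot extract data blocks | High | Low | High | Low | add check bit to data block
3.5 | extract data blocks | back up error | NA | NA | - | - | - | - | -
3.5 | extract data blocks | security error | NA | NA | - | - | - | - | -
3.5 | extract data blocks | operation procedure miss | NA | NA | - | - | - | - | -
3.5 | extract data blocks | malfunction of hardware | malfunction of ECU | NA | - | - | - | - | -
3.6 | conduct action according to received command | activation conditions failure | cannot finish process 3.5 | cannot response for command | High | Low | Low | High | countermeasure in process 3.5
3.6 | conduct action according to received command | completion conditions failure | NA | NA | - | - | - | - | -
3.6 | conduct action according to received command | inadequate input data | cannot conduct adequate command response | cannot response for command | High | Low | Low | High | add command send request function
3.6 | conduct action according to received command | inadequate output data | cannot conduct adequate command response | cannot response for command | High | Low | Low | High | add command send request function
3.6 | conduct action according to received command | inadequate algorithm | NA | NA | - | - | - | - | -
3.6 | conduct action according to received command | program destruction | NA | NA | - | - | - | - | -
3.6 | conduct action according to received command | back up error | NA | NA | - | - | - | - | -
3.6 | conduct action according to received command | security error | NA | NA | - | - | - | - | -
3.6 | conduct action according to received command | operation procedure miss | NA | NA | - | - | - | - | -
3.6 | conduct action according to received command | malfunction of hardware | malfunction of ECU | cannot response for command | High | Low | High | Low | use redundant system
3.7 | conduct command response (conducted by BUS61553B) | malfunction of hardware | malfunction of BUS61553B | cannot response for command | - | - | - | - | -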
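The row structure of Table 3, in which each software function is examined against the same list of common failure modes, can be sketched as the generation of a blank FMEA worksheet. The ten failure modes are those listed in Table 3; the function name `fmea_worksheet` and the dictionary field names are assumptions introduced for this sketch, and the cells are left empty because filling them in is the analyst's task.

```python
from itertools import product

# The ten common failure modes that appear in Table 3.
COMMON_FAILURE_MODES = [
    "activation conditions failure",
    "completion conditions failure",
    "inadequate input data",
    "inadequate output data",
    "inadequate algorithm",
    "program destruction",
    "back up error",
    "security error",
    "operation procedure miss",
    "malfunction of hardware",
]


def fmea_worksheet(functions):
    """Return one blank worksheet row per (function, common failure mode) pair."""
    return [
        {
            "process_id": process_id,
            "function": name,
            "failure_mode": mode,
            "cause": None,
            "impact": None,
            "severity": None,
            "probability": None,
            "detection_rate": None,
            "risk": None,
            "countermeasure": None,
        }
        for (process_id, name), mode in product(functions.items(), COMMON_FAILURE_MODES)
    ]


# The software-executed functions of process 3, as named in Table 3.
functions = {
    "3.4": "confirm received command status",
    "3.5": "extract data blocks",
    "3.6": "conduct action according to received command",
}
rows = fmea_worksheet(functions)
```

Because every function receives the same failure-mode checklist, two analysts start from an identical worksheet, which is the property the common failure mode list is intended to provide.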

Next, we describe the result of the FTA. Here, we discuss the FT related to the fault in which, when the CSW receives plural BC commands, all BC commands are lost except for the one received immediately before. Figure 11a–d shows the FTs. Figure 11a is the FT corresponding to the operation that pops the descriptors from the stack. We found that the causes of this fault are as follows: (A) the algorithm of process 3.5 is inadequate, (B) the value of the stack pointer is not the value of the lost BC command, (C) the value of the descriptor is not the value of the lost BC command, and (D) the obtained data block is not the data block of the lost BC command. Figure 11a shows the FT related to cause (A), and Figure 11d, b, and c show the detailed FTs related to causes (B), (C), and (D), respectively. We then explain the preliminary event obtained by conducting FTA: “the algorithm of process 3.5 is inadequate”. The analysis of Figure 11b–d showed that the items in causes (B–D) are revised by BUS-61553B simultaneously and that they refer to the command received immediately before. The investigation of process 3.6 showed that the algorithm obtains (pops) only the data block referred to by the descriptor at the top of the stack. This algorithm was adopted because the frequency of the interruption of BC reception and the cycle of the interruption of the control cycle are the same. However, in practice those interruptions occur asynchronously. Additionally, the frequency of occurrence of the interruption of completion of BC reception deviates. Therefore, we found that when plural BC receptions occur within one interval of the control cycle interruption, the data blocks are not obtained (popped), except for the data block of the BC received immediately before. As a result, we clarified the causes (preliminary events) related to the loss of BCs other than the BC received immediately before. We also found a possibility of stack overflow and memory (data area) destruction when the CSW is used for a long term because the non-extracted (non-popped) data blocks remain as garbage in the stack, the descriptor stack, and the data block area. As a result, we proposed the following countermeasures for when the interruption of the control cycle occurs: (I) add an operation that extracts (pops) the data blocks until the stack pointers and descriptor stack become empty, (II) add a data escape area that temporarily saves the read data block until it can be processed, and (III) add a flag that shows that there exist data blocks waiting to be processed. After considering the CSW’s resources and the modification costs, only countermeasure (III) was applied to the CSW because this fault occurs rarely. The total man-hours for creating this FT and planning the countermeasures were approximately 10 h (the man-hours for developing the DFD, CFD, and CSPEC, as well as the man-hours for modifying the CSW, are not included).
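The fault and countermeasure (I) can be illustrated with a deliberately simplified model. The sketch below is not the CSW (which is written in C and relies on BUS-61553B hardware); it assumes that each BC reception pushes one descriptor onto a Python list used as a stack and that each control cycle interruption either pops only the top descriptor (the original algorithm) or pops until the stack is empty (countermeasure (I)). The function name `run` and the command labels are hypothetical.

```python
def run(cycles, drain):
    """Simulate control cycles; `cycles` lists the BC commands received before each cycle.

    drain=False models the original algorithm (pop only the top descriptor).
    drain=True models countermeasure (I) (pop until the stack is empty).
    Returns the processed commands and the descriptors left on the stack.
    """
    stack, processed = [], []
    for arrivals in cycles:
        stack.extend(arrivals)  # each BC reception interruption pushes a descriptor
        if drain:
            while stack:  # countermeasure (I): empty the stack every cycle
                processed.append(stack.pop())
        elif stack:  # original algorithm: only the most recent command is taken
            processed.append(stack.pop())
    return processed, stack


# Two BC commands ("B" and "C") arrive within a single control cycle.
cycles = [["A"], ["B", "C"], ["D"]]
original_processed, original_garbage = run(cycles, drain=False)
fixed_processed, fixed_garbage = run(cycles, drain=True)
```

In this model the original algorithm never processes command "B", which stays on the stack as garbage, whereas the draining variant processes every command (in last-in, first-out order within a cycle) and leaves the stack empty.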

[Figure 11a: fault tree with top event "loss BC cmd". Events shown: cycle (cmd judgment) cause event; cycle (cmd judgment) INT occur; cycle (cmd judgment) INT does not occur; timer malfunction; there is not received cmd in extracted cmd in 3.6; (A) inadequate algorithm process 3.5; inadequate data block of cmd (incorrect cmd), set data blocks in process 3.5; inadequate descriptor (incorrect descriptor), set descriptor in process 3.2; (B) stack pointer value is not stack pointer of lost cmd (*3); (D) data block value is not descriptor of lost cmd (*2); (C) descriptor value is not descriptor of lost cmd (*1).]

(a)

[Figure 11b: sub-tree *1. Events shown: inadequate descriptor set in process 3.2; receive BC cmd INT; BC cmd INT occur; BC cmd INT does not occur; INT controller malfunction; BC cmd receive INT occurs; process 3.2 causes event; algorithm of process 3.2 inadequate; descriptor value not = descriptor value of lost cmd; descriptor value = descriptor value of lost cmd (*4).]

(b)

[Figure 11c: sub-tree *2. Events shown: inadequate data block set in process 3.3; receive BC cmd INT; BC cmd INT occur; BC cmd INT does not occur; INT controller malfunction; BC cmd receive INT occurs; process 3.3 causes event; algorithm of process 3.3 inadequate; data block value not = data block value of lost cmd; data block value = data block value of lost cmd (*5).]

(c)

[Figure 11d: sub-tree *3. Events shown: inadequate stack pointer set in process 3.1; receive BC cmd INT; BC cmd INT occur; BC cmd INT does not occur; INT controller malfunction; BC cmd receive INT occurs; process 3.1 causes event; algorithm of process 3.1 inadequate; stack pointer value not = stack pointer value of lost cmd; stack pointer value = stack pointer value of lost cmd (*6).]

(d)

Figure 11. (a) Developed FT: Outline of FT. (b) FT: Detail of (C) Descriptor Value Is Incorrect. (c) Developed FT: Detail of (D) Data Block Value Is Incorrect. (d) Developed FT: Detail of (B) Stack Pointer Value Is Incorrect.

4.4. Benefits of the Proposed Method

First, we describe the benefits of the proposed FMEA method. In conventional FMEA, the definition of a failure mode differs depending on the analyst. Therefore, the result of FMEA depends on the analyst. In the proposed FMEA, common failure modes are defined as deviations from the original usage of the function. This makes it easy to map the processes in the DFD to the functions and to conduct FMEA regardless of the granularity of the function. As a result, any analyst with a certain level of experience and knowledge can obtain the same adequate FMEA results.

Second, we describe the benefits of the proposed FTA method. In conventional FTA, the completeness of the FTA results differs depending on the experience and knowledge of the analyst. Additionally, a well-experienced analyst sometimes omits the obvious parts of the developed FT. Therefore, when another analyst investigates an FT developed by a well-experienced analyst, that analyst sometimes cannot understand how the conclusion was reached. In the proposed FTA, the CSW’s instruction is defined as the minimum component of the FT, and the FT development rules are defined. As a result, any analyst with a certain level of experience and knowledge can obtain the same adequate FTA results.

Third, we describe the benefits of the proposed method as a whole. In the proposed method, the design information and safety analysis results are managed in a unified manner and are shared across the whole CSW development process. As a result, we could conduct design and safety analysis in pairs at the adequate stage of the CSW’s development and feed the safety analysis results back to the design information. This reduced the additional work related to implementing safety countermeasures after the completion of the CSW.

Finally, we could develop the CSW with an adequate program structure (a CSW without conflicts between the normal functions and the safety functions). By rigorously applying the proposed method, the CSW’s design and safety analysis are carried out appropriately. As a result, engineers with a certain level of experience and ability can develop high-quality CSWs. Furthermore, a reduction of development cost is expected from the reduction of backtracking work.

From the above-mentioned results, we can conduct a reproducible safety analysis and develop an adequate CSW that reflects the safety analysis results. In the future, we will enrich the variety of the common failure modes and the FT templates by feeding back the results of other developments. We consider that the benefits of using the proposed method will increase accordingly.

4.5. List of Limitations

The proposed method lets FMEA and FTA work together by managing the CSW’s design information and safety analysis results in a unified manner. In the upper stage of the CSW’s development process, comprehensive failure countermeasures can be built into the CSW by conducting FMEA, whereas in the lower stage, the safety analysis and countermeasures for a specific fault can be applied by conducting FTA. Additionally, the safety analysis can be done within a reasonable time. Therefore, applying the proposed method improves the reliability of the CSW. By applying and evaluating the proposed method, we also identified the following problems.

4.5.1. The Issue of the CSW’s Size

Recently, CSWs have become larger (the avionics software of the newest aircraft has reached 20 million LOC). Because such a CSW has many functions, it is difficult to conduct FMEA for all of them within the given development time. When conducting FTA, it is difficult to reverse-trace the instructions because a function consists of code spread over many modules (classes and methods). For FMEA, we will investigate a method that analyzes safety step by step from the upper to the lower layers by classifying and dividing the functions hierarchically. For FTA, we will investigate an analysis method that treats a validated module, such as a module that has passed single-unit testing and combined testing, as a black box. Additionally, we investigated adequate design guidelines, because realizing the methods mentioned above requires adequate function division (adequate functional granularity) and good design (low coupling and high cohesion).
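One possible reading of the hierarchical, step-by-step analysis described above can be sketched as a breadth-first walk over a function hierarchy that descends only into functions judged risky at the current layer and stops at validated modules. This is an illustrative assumption, not the method the authors plan to develop; the class `Function`, its `risk` score, the `validated` flag, the threshold, and the function names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Function:
    """A node in a hypothetical functional hierarchy of the CSW."""
    name: str
    risk: int                 # assumed score assigned by the analyst at this layer
    validated: bool = False   # True if tested and treated as a black box
    children: List["Function"] = field(default_factory=list)


def select_for_analysis(root: Function, threshold: int) -> List[str]:
    """List functions to analyze, layer by layer, from the top down.

    A function's sub-functions are visited only if the function itself is
    at or above the risk threshold and has not been validated.
    """
    selected: List[str] = []
    queue = deque([root])
    while queue:
        func = queue.popleft()
        selected.append(func.name)
        if func.risk >= threshold and not func.validated:
            queue.extend(func.children)
    return selected


# Hypothetical hierarchy for a communication control CSW.
root = Function("communication control", 8, children=[
    Function("receive command", 7, children=[
        Function("parse frame", 6),
        Function("check CRC", 2),
    ]),
    Function("send telemetry", 3, children=[Function("format packet", 2)]),
    Function("timer service", 9, validated=True,
             children=[Function("tick handler", 9)]),
])

plan = select_for_analysis(root, threshold=5)
print(plan)
```

In this sketch, "format packet" is skipped because its parent is below the threshold, and "tick handler" is skipped because its parent is a validated black box, which is how the analysis effort would be bounded for a large CSW.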

4.5.2. The Issue Related to the Conflicts between Countermeasures and the Addition of New Risk

Generally, repeating safety analysis improves the safety of the CSW, because the CSW’s problems are clarified and countermeasures are applied. However, if additional countermeasures are applied to a portion of the CSW to which countermeasures have already been applied, conflicts between the old and new countermeasures can occur. Adding a new countermeasure to one function may also negatively affect other functions. In such cases, one of the following is required: applying only the countermeasure for the function with the greater negative impact, or redesigning the CSW so that it has no conflicts. We investigated a method for judging whether redesign is needed based on the risk evaluation results using Figure 3.
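The choice between keeping one countermeasure and redesigning can be sketched as follows. The sketch detects conflicts as countermeasures that modify the same function and then applies a simple threshold rule. The rule, the numeric `risk` scores, and all names are assumptions for illustration; they do not reproduce the risk evaluation of Figure 3.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Countermeasure:
    name: str
    functions: FrozenSet[str]  # functions the countermeasure modifies
    risk: int                  # assumed risk score of the hazard it mitigates


def find_conflicts(existing: List[Countermeasure],
                   new: Countermeasure) -> List[Countermeasure]:
    """Existing countermeasures that touch a function the new one also touches."""
    return [c for c in existing if c.functions & new.functions]


def decide(existing: List[Countermeasure], new: Countermeasure,
           redesign_threshold: int) -> str:
    """Hypothetical decision rule for a newly proposed countermeasure."""
    conflicts = find_conflicts(existing, new)
    if not conflicts:
        return "apply new"
    highest_old = max(c.risk for c in conflicts)
    # If neither side can be dropped safely, the conflict must be designed out.
    if new.risk >= redesign_threshold and highest_old >= redesign_threshold:
        return "redesign"
    # Otherwise keep only the countermeasure that addresses the greater risk.
    return "apply new only" if new.risk > highest_old else "keep existing only"


applied = [
    Countermeasure("retry on timeout", frozenset({"send command"}), 4),
    Countermeasure("range check", frozenset({"parse frame"}), 8),
]
watchdog = Countermeasure("abort on timeout", frozenset({"send command"}), 6)
sanitize = Countermeasure("reject frame", frozenset({"parse frame"}), 9)
logging = Countermeasure("log anomaly", frozenset({"record event"}), 2)

decisions = [decide(applied, cm, redesign_threshold=7)
             for cm in (watchdog, sanitize, logging)]
print(decisions)
```

A real judgment would use the organization's own risk evaluation criteria in place of the single integer score and threshold used here.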

4.5.3. The Issue of Attacks

The proposed method improves safety by detecting problems in the CSW’s functions on the premise of good design; however, sufficient countermeasures against attacks from outside the CSW have not been considered. Because CSWs are expected to become part of the Internet of things (IoT) in the future, we have to investigate security in addition to safety.

4.5.4. The Issue of Using an Object-Oriented Programming Language

Recently, CSWs are increasingly developed using object-oriented programming languages. However, it is not easy to apply the proposed method to such CSWs. FMEA can be applied to them by mapping the unit of the function to the methods of the classes. Applying FTA to them requires developing new FT templates that correspond to the object-oriented programming language. Additionally, the current FT development rules cannot deal with inheritance and polymorphism; thus, we have to investigate how to apply the proposed method to object-oriented languages.

4.5.5. The Issue of Other Safety Analysis Methods

The proposed method adopts FMEA and FTA as its safety analysis methods. Recently, accidents of kinds that had not been considered before have occurred because of the increasing complexity of the CSW’s functions. In the future, the proposed method should adopt other safety analysis methods, such as HAZOP, STPA, and the Functional Resonance Analysis Method (FRAM) [40], to realize a safer CSW.

5. Conclusions and Future Works

This paper proposes a method for developing a safer CSW, together with a safety analysis environment, by managing the CSW’s design information and safety analysis results in a unified manner and by combining multiple safety analysis methods (FMEA and FTA). The proposed method and the environment were applied to the development and safety analysis of the communication CSW installed in control equipment on the space system. We found that adequate countermeasures for realizing a safe CSW can be planned within an adequate analysis time.

The points to be noted when using the proposed method are as follows. The proposed method becomes effective when the design information and safety analysis results are managed in a unified manner throughout the development process. Organizations need to define CSW development standards that fit the proposed method and supervise their engineers so that the standards are followed. Additionally, because the development process differs from organization to organization, the proposed method must be adapted to each organization’s development standards.

We now detail future works. First, we will apply the proposed method and the environment to more CSWs and improve them by reflecting the application results. Next, the restrictions on using the proposed method described in Section 4.5 will have to be resolved. It is necessary to propose a safety analysis method for attacks through the Internet because CSWs will always be connected to the Internet. Additionally, it is necessary to propose a safety analysis method for AI modules (machine learning modules) because AI modules will be added to conventional CSWs to improve their functions. Finally, because many new analysis methods, such as HAZOP, STPA, and FRAM, have been proposed, we will investigate extending the proposed method with them.

Author Contributions: Conceptualization, M.T.; Discussion, M.T., Y.A. and Y.W.; writing—original draft preparation, M.T.; writing—review and editing, M.T., Y.A. and Y.W.; supervision, M.T.; funding acquisition, M.T. All authors have read and agreed to the published version of the manuscript.

Funding: This research was supported by a Grant-in-Aid for Scientific Research (C) of the Japan Society for the Promotion of Science, grant number 19K04920, title, “Integrated analysis method for hazard caused by software interaction cooperating with multiple safety analysis methods”.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Part of the data presented in this study is available on request from the corresponding author. The data are not publicly available due to security restrictions on the product.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. WIRED. Google’s Self-Driving Car Caused Its First Crash. 2016. Available online: https://www.wired.com/2016/02/googles-self-driving-car-may-caused-first-crash (accessed on 19 January 2021).
2. Leveson, N.; Turner, C. An Investigation of the Therac-25 Accidents. IEEE Comput. 1993, 26, 18–41.
3. Japan Aerospace Exploration Agency (JAXA). Operation Plan of X-ray Astronomy Satellite ASTRO-H (Hitomi). 2016. Available online: https://global.jaxa.jp/press/2016/04/20160428_hitomi.html; https://www.jaxa.jp/press/2016/05/files/20160524_hitomi_01_j.pdf (detailed report, in Japanese) (accessed on 19 January 2021).
4. International Electrotechnical Commission. International Standard IEC 60812:2006, Analysis Techniques for System Reliability–Procedure for Failure Mode and Effects Analysis (FMEA); International Electrotechnical Commission: Geneva, Switzerland, 2006.
5. International Electrotechnical Commission. International Standard IEC 61025:2006, Fault Tree Analysis; International Electrotechnical Commission: Geneva, Switzerland, 2006.
6. Hatley, D.; Pirbhai, I. Strategies for Real-Time System Specification; Dorset House Publication: New York, NY, USA, 1988.
7. Information-Technology Promotion Agency, Japan. Software Reliability Enhancement Center, White Paper of Embedded Software Development 2017; Information-Technology Promotion Agency Japan: Tokyo, Japan, 2017. (In Japanese)
8. International Electrotechnical Commission. International Standard IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems; International Electrotechnical Commission: Geneva, Switzerland, 2010.
9. International Organization for Standardization. ISO26262 Road Vehicles–Functional Safety; International Organization for Standardization: Geneva, Switzerland, 2011.
10. International Electrotechnical Commission. International Standard IEC 61882:2001 Hazard and Operability Studies; International Electrotechnical Commission: Geneva, Switzerland, 2016.
11. International Electrotechnical Commission. International Standard IEC 82304-1 Health Software—Part 1: General Requirements for Product Safety; International Electrotechnical Commission: Geneva, Switzerland, 2016.
12. International Society for Pharmaceutical Engineering. GAMP5 A Risk-Based Approach to Compliant GxP Computerized Systems; International Society for Pharmaceutical Engineering: North Bethesda, MD, USA, 2008.
13. Takahashi, M.; Nanba, R.; Fukue, Y. A proposal of operational risk management method using FMEA for drug manufacturing computerized system. Trans. Soc. Instrum. Control Eng. 2012, 48, 285–294. (In Japanese)
14. Morita, M. Reduction of Software Bugs using FMEA, Japanese Software Quality Control -Case Studies-. Union Jpn. Sci. Eng. 1990, 1990, 461–486. (In Japanese)
15. Niwa, M. Improvement of Reliability for System Design using FMEA, Japanese Software Quality Control -Case Studies-. Union Jpn. Sci. Eng. 1990, 1990, 467–475. (In Japanese)
16. Goddard, L. Validating the Safety of Embedded Real-Time Control Systems Using FMEA. In Proceedings of the Annual Reliability and Maintainability Symposium, Atlanta, GA, USA, 26–28 January 1993; pp. 227–230.
17. Snooke, N.; Price, C. Model-driven automated software FMEA. In Proceedings of the Reliability and Maintainability Symposium, Lake Buena Vista, FL, USA, 24–27 January 2011; pp. 1–7.
18. Lazarus, K.; Nggada, S. Software Failure Analysis using FMEA. Int. J. Softw. Eng. Appl. 2018, 12, 19–28.
19. Kim, H. SW FMEA for ISO-26262 Software Development. In Proceedings of the APSEC2014, 2014 21st Asia-Pacific Conference (APSEC), Santiago, Chile, 1–5 November 2014; pp. 1–5.
20. Batbayar, K. Medical device software risk assessment using FMEA and fuzzy linguistic approach: Case study. In Proceedings of the SACI, 2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 12–14 May 2016; pp. 1–5.
21. Yang, S.; Bian, C.; Li, X.; Tan, L.; Tang, D. Optimized fault diagnosis based on FMEA-style CBR and BN for embedded software system. Int. J. Adv. Manuf. Technol. 2018, 94, 3441–3453.
22. Weber, W.; Tondok, H.; Bachmayer, M. Enhancing Software Safety by Fault Trees: Experiences from an Application to Flight Critical SW. In Proceedings of the SAFECOMP2003, SAFECOMP, Edinburgh, UK, 24–26 September 2003; pp. 289–302.
23. Friedman, M. Automated Software Fault Tree Analysis for Pascal Program. In Proceedings of the Annual Reliability and Maintainability Symposium, Atlanta, GA, USA, 26–28 January 1993; pp. 458–461.
24. Leveson, N.; Harvey, R. Analyzing Software Safety. IEEE Trans. Softw. Eng. 1983, 9, 569–579.
25. Takahashi, M.; Nanba, R. A Proposal of Fault Tree Analysis for Control Programs. In Proceedings of the SICE 2014, Hokkaido, Japan, 9–12 September 2014; The Society of Instrument and Control Engineers: Tokyo, Japan, 2014; pp. 1719–1724.
26. Kumar, K.; Ramaiah, P. An Experimental Safety Analysis using FTA for A Ball Position Control System. J. Netw. Inf. Secur. 2016, 4, 1–7. Available online: http://www.publishingindia.com (accessed on 28 December 2020).
27. Oveisi, S.; Farsi, M.; Kamandi, A. Design Safe Software via UML-based SFTA in Cyber Physical Systems. J. Appl. Intell. Syst. Inf. Sci. 2020, 1, 11–23.

28. Junga, S.; Yoo, J.; Lee, Y. A Software Fault Tree Analysis Technique for Formal Requirement Specifications of Nuclear Reactor Protection Systems. Reliab. Eng. Syst. Saf. 2020, 203, 107064.
29. Hansen, K.; Wells, L.; Maier, T. HAZOP Analysis of UML-Based Software Descriptions of Safety-Critical. In Proceedings of the NWUML 2004, 2nd Nordic Workshop on the Unified Modeling Language, Turku, Finland, 19–20 August 2004; pp. 59–78.
30. Guiochet, J. Hazard Analysis of Human–Robot Interactions with HAZOP–UML; Safety Science; Elsevier: Amsterdam, The Netherlands, 2016; Volume 84, pp. 225–237.
31. Kaleeswaran, A.; Munk, P.; Sarkic, S.; Vogel, T.; Nordmann, A. A Domain Specific Language to Support HAZOP Studies of SysML Models. In Proceedings of the International Symposium on Model-Based Safety and Assessment, IMBSA 2019: Model-Based Safety and Assessment, Thessaloniki, Greece, 16–18 October 2019; pp. 47–62.
32. Abdulkhaleq, A.; Wagner, S. Experiences with Applying STPA to Software-Intensive Systems in the Automotive Domain. In Proceedings of the 2013 STAMP Workshop, Cambridge, MA, USA, 26–28 March 2013; pp. 1–3.
33. Yang, C. Software Safety Testing Based on STPA. Procedia Eng. 2014, 80, 399–406.
34. Nakao, H.; Katahira, M.; Miyamoto, Y.; Leveson, N. Safety Guided Design of Crew Return Vehicle in Concept Design Phase Using STAMP/STPA. In Proceedings of the 5th IAASS Conference ‘A Safer Space for a Safer World’, Versailles, France, 17–19 October 2011; pp. 1–5.
35. Hong, Z.; Binbin, L. Integrated Analysis of Software FMEA and FTA. In Proceedings of the 2009 International Conference on Information Technology and Computer Science, Kiev, Ukraine, 25–26 July 2009; pp. 184–187.
36. Oveisi, S.; Ravanmehr, R. Analysis of software safety and reliability methods in cyber physical systems. Int. J. Crit. Infrastruct. 2017, 13, 1–15.
37. International Organization for Standardization. ISO/IEC 2382-14: 1997, Information Technology–Vocabulary–Part 14: Reliability, Maintainability and Availability; International Organization for Standardization: Geneva, Switzerland, 1997.
38. Takahashi, M.; Anang, Y.; Watanabe, Y. A Proposal of Fault Tree Analysis for Embedded Control Software. Information 2020, 11, 402.
39. Space System Safety and Mission Guarantee Committee. Space System Safety and Mission Guarantee; Chapter 5 Software Safety and Mission Guarantee; Union of Japanese Scientists and Engineers: Tokyo, Japan, 2016; pp. 203–214. (In Japanese)
40. Hollnagel, E. Barriers and Accident Prevention; Taylor & Francis: New York, NY, USA, 2004.