Formal Methods in Dependable Systems Engineering: A Survey of Professionals from Europe and North America

Preprint, compiled September 23, 2020

Mario Gleirscher 1,* and Diego Marmsoler 2

1 Department of Computer Science, University of York, Deramore Lane, Heslington, York YO10 5GH, United Kingdom
2 Institut für Informatik, Technical University of Munich, Boltzmannstraße 3, 85748 Garching, Germany
* correspondence: [email protected]

arXiv:1812.08815v3 [cs.SE] 22 Sep 2020

Abstract

Context: Formal methods (FMs) have been around for a while, still being unclear how to leverage their benefits, overcome their challenges, and set new directions for their improvement towards a more successful transfer into practice.
Objective: We study the use of formal methods in mission-critical software domains, examining industrial and academic views.
Method: We perform a cross-sectional on-line survey.
Results: Our results indicate an increased intent to apply FMs in industry, suggesting a positively perceived usefulness. But the results also indicate a negatively perceived ease of use. Scalability, skills, and education seem to be among the key challenges to support this intent.
Conclusions: We present the largest study of this kind so far (N = 216), and our observations provide valuable insights, highlighting directions for future theoretical and empirical research of formal methods. Our findings are strongly coherent with earlier observations by Austin and Parkin (1993).

Keywords: formal methods · empirical research · on-line survey · usage intent · usefulness · practical challenges · research transfer · software engineering education & training

Funding: Partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Grant no. 381212925.
Conflict of Interest: The authors declare that they have no conflict of interest.
This is a post-peer-review, pre-copyedit version of an article published in Empirical Software Engineering. The final authenticated version is available online at: doi:10.1007/s10664-020-09836-5

Acronyms

CMMI  Capability Maturity Model Integration
DI    respondents with decreased usage intent
EOU   ease of use
FM    formal method
GQM   goal-question-metric
HQ    headquarter
ICT   information and communication technology
II    respondents with increased usage intent
IS    information system
LE    less experienced respondents
M     respondents with some motivations to use FMs
MbE   model-based engineering
ME    more experienced respondents
NM    respondents without any motivations to use FMs
NP    non-practitioners
P     practitioners
PEOU  perceived ease of use
PU    perceived usefulness
RQ    research question
SE    software engineering
SMT   satisfiability modulo theory
TAM   technology acceptance model
TLD   top-level domain
U     usefulness
UFM   Use of FMs in mission-critical SE


1 Motivation and Challenges

Over the past decades, many software errors have been deployed in the field and some of these errors had a clearly intolerable impact.² Cost savings from reducing such impact have been the motivation of formal methods (FMs) as a first-class approach to error prevention, detection, and removal (Holloway, 1997). In university courses, we learned that FMs are among the best we have to design and assure correct systems. The question “Why are FMs not used more widely?” (Knight et al., 1997) is hence more than justified.

With a Twitter poll,³ which emerged from our coffee spot discussions, we solicited opinions on a timely paraphrase of a statement argued by Holloway (1997): “FMs should be a cornerstone of dependability and security of highly distributed and adaptive automation.” What can a tiny opportunity sample of 22 respondents from our social network tell? Not much. Still, (i) 55% agree, i.e., seem to attribute importance to this role of FMs, (ii) 14% disagree, i.e., oppose that view, and (iii) 32% just don’t know.

Why should and how could FMs be a cornerstone? Since the beginning of software engineering (SE) there has been a debate on the usefulness of FMs to improve SE. In the 1970s and 1980s, several SE and FM researchers had started to examine this usefulness and to identify error possibilities despite the rigour in FMs (Gerhart and Yelowitz, 1976), with the aim of responding to critical observations of practitioners (Jackson, 1987). Hall (1990) and Bowen and Hinchey (1995a) illuminate 14 myths (e.g. “formal methods are unnecessary”), providing their insights on when FMs are best used and highlighting that FMs can be overkill in some cases but are highly recommended in others.

The transfer of FMs into SE practice is far from straightforward. Knight et al. (1997) examine reasons for the low adoption of FMs in practice. Barroca and McDermid (1992) ask: “To what extent should FMs form part of the [safety-critical SE] method?” Glass (2002, pp. 148–149, 165–166) and Parnas (2010) observe that “many [SE] researchers advocate rather than investigate” by assuming the need for more methodologies. Glass summarises that FMs were supposed to help represent firm requirements concisely and support rigorous inspections⁴ and testing. He observes that changing requirements has become an established practice even in critical domains, and inspections, even if based on FMs, are insufficient for complete error removal. In line with Barroca and McDermid (1992, p. 591), he notes that FMs have occasionally been sold as making error removal complete, but there is no silver bullet (Glass, 2002, pp. 108–109). Bad communication between theorists and practitioners sustains the issue that FMs are taught but rarely applied (Glass, 2002; Holloway and Butler, 1996, pp. 68–70).

Parnas (2010) compares alternative paradigms in FM research (e.g. axiomatic vs. relational calculi) and points to challenges of FM adoption (e.g. valid simple abstractions). In contrast, Miller et al. (2010) draw positive conclusions from recent applications of model checking and highlight lessons learned. In his keynote, O’Hearn (2018) conveys positive experiences in scaling FMs through adequate tool support for continuous reasoning in agile projects (see, e.g. Chudnov et al., 2018). Many researchers (see, e.g. Aichernig and Maibaum, 2003) have been working on the improvement of FMs towards their successful transfer. Boulanger (2012) and Gnesi and Margaria (2013) summarise promising industry-ready FMs and present larger case studies.

Have software errors been overlooked because of hidden inconsistencies that can be detected when properly formalised? Are such errors compelling arguments for the wider use of FMs? Strong evidence for the ease of use of FMs and their efficacy and usefulness is scarce and largely anecdotal, rarely drawn from comparative studies (e.g. Pfleeger and Hatton, 1997; Sobel and Clarkson, 2002), often primarily conducted in research labs (e.g. Chudnov et al., 2018; Galloway et al., 1998 and many others). In late response to Holloway and Butler’s request for empirical data (Holloway and Butler, 1996), Graydon (2015) still observes a lack of evidence for the effectiveness of FMs in assurance argumentation for safety-critical systems, suggesting empirical studies to examine hypotheses and collect evidence.

FMs have many potentials but SE research has reached a stage of maturity where strong empirical evidence is crucial for research progress and transfer. Jeffery et al. (2015) identify questions and metrics for FM productivity assessment, supporting FM research transfer.

² See anecdotal evidence (grey literature, press articles) on software-related incidents, for example, by Kaner and Pels (1998, 2018), Charette (2018) and Neumann (2018).
³ See https://twitter.com/MarioGleirscher/status/889737625178976256.
⁴ For example, walking through development artefacts in a structured and moderated discussion group and with bug pattern checklists (Fagan, 1976).

Contributions. We contribute to SE and FM research (1) by presenting results of the largest cross-sectional survey of FM use among SE researchers and practitioners to date, (2) by answering research questions about the past and intended use of FMs and the perception of systematically mapped FM challenges, (3) by relating our findings to the perceived ease of use and usefulness of FMs using a simplified variant of the technology acceptance model for evaluating engineering methods and techniques, and (4) by providing a research design for repetitive (e.g. longitudinal) FM studies.

Overview. The next section introduces important terms. Section 3 relates our work to existing research. In Section 4, we explain our research design. We describe our data and answer our research questions in Section 5. In Section 6, we summarise and interpret our findings in the light of existing evidence and with respect to threats to validity. Section 7 highlights our conclusions and potential follow-up work.

2 Background and Terminology

By formal methods, we refer to explicit mathematical models and sound logical reasoning about critical properties (Rushby, 1994), such as reliability, safety, security, or, more generally, dependability and performance, of electrical, electronic, and programmable electronic or software systems in mission- or property-critical application domains. Model checking, theorem proving, abstract interpretation, assertion checking, and formal contracts are examples of FMs. By use of FMs, we refer to their application in the development and analysis of critical systems and to substantially integrating FMs with the used programming methodologies (e.g. structured development, model-based engineering (MbE), assertion-based programming, test-driven development), notations (e.g. UML, SysML), and tools.

Tool and Method Evaluation. In the following, we give an overview of several evaluation approaches and explain in Section 4.2 which approach we take. The widely used technology acceptance model (TAM; Davis, 1989) is a psychological test that allows the assessment of end-user IT based on the two constructs perceived ease of use (PEOU, i.e., positive and negative experiences while using an IT system) and perceived usefulness (PU, i.e., positive experiences of accomplishing a task using an IT system compared to not using this system for accomplishing the same task). Complementary to TAM, Basili (1985) proposes the goal-question-metric (GQM) approach to method and tool evaluation. While GQM serves as a good basis for quantitative follow-up studies, we follow the user-focused TAM. Maturity models according to the Capability Maturity Model Integration (SEI, 2010) do not fit our purposes because they focus on engineering process improvement beyond particular development techniques. Poston and Sexton (1992) present tool survey guidelines based on technology-focused classification and selection criteria with a very limited view on tool usefulness and usability. Miyoshi and Azuma (1993) evaluate ease of use of development environments (i.e., specification and modelling tools) using metrics from the ISO/IEC 9126 quality model.

From comparing two models of predicting an individual’s intention to use a tool, Mathieson (1991) supports TAM’s validity and convenience but indicates its limits in providing enough information on users’ opinions. For software methods and programming techniques, Murphy et al. (1999) show how surveys, case studies, and experiments can be used to compensate for this lack of information about usefulness and usability.

Because FMs are by definition based on a formal language and usually supported by tools, it is reasonable to adopt the TAM for the assessment of FMs. Unfortunately, the body of literature on the evaluation of FMs in TAM style is very small. However, Riemenschneider et al. (2002) apply TAM to methods (e.g. UML-based architecture design), concluding that “if a methodology is not regarded as useful by developers, its prospects for successful deployment may be severely undermined.” According to their approach, FM usage intentions would be driven by (1) an organisational mandate to use FMs, (2) the compatibility of FMs with how engineers perform their work, and (3) the opinions of developers’ coworkers and supervisors toward using FMs. Overall, the application of TAM to FMs allows causal reasoning from FM user acceptance towards intention of FM use.

Specialising the approach in Riemenschneider et al. (2002), ease of use (EOU) of a FM characterises the type and amount of effort a user is likely to spend to learn, adopt, and apply this FM. Usefulness (U) determines how fit a FM is for its purpose, that is, how well it supports the engineer to accomplish an appropriate task. If EOU and U are measured by a survey whose data points are user perceptions then we talk of perceived ease of use (PEOU) and perceived usefulness (PU). Together, PEOU and PU form the user acceptance of a FM and, by support of Mathieson (1991) and Riemenschneider et al. (2002), can predict the intention to use this FM.

Whereas TAM is a model based on the two user-focused constructs PEOU and PU, Kitchenham et al. (1997) propose a meta-evaluation approach called DESMET for tools and methods based on multiple performance indicators (e.g. with TAM as one of the indicators).

3 Related Work

Table 1 shows a systematic map (Petersen et al., 2008) of 36 studies on FM research evaluation and transfer. For each study, we estimate⁵ the authors’ attitude against or in favour of FMs, the motivation of the study, the approach followed, and the type of result obtained. Most of these works present personal experiences, opinions, case studies, or literature summaries. In contrast, the work presented in this paper focuses on the analysis of experience from a wide range of practitioners and experts.

However, we found four similar studies. Austin and Parkin (1993) sought to explain the low acceptance of FMs in industry around 1992. Using a questionnaire similar to ours with only open questions, they evaluated 111 responses from a sample of size 444, using a sampling method similar to ours (then using different channels). Responses from FM users are distinguished from general responses. Their questions examine benefits, limitations, barriers, suggestions to overcome those barriers, personal reasons for or against the use of FMs, and ways of assessing FMs. In a second study, Snook and Harrison (2001) conduct interviews with representatives of five companies to discover the main issues involved in FM use, in particular, issues of understandability and the difficulty of creating and utilising formal specifications. A similar, though more comprehensive, interview study was performed by Woodcock et al. (2009). They assess the state of the art of the application of FMs, using questionnaires to collect data on 62 industrial projects.

Liebel et al. (2016, pp. 102–103) assess effects on and shortcomings of the adoption of MbE in embedded SE including a discussion of FM adoption. The authors observe a lack of tool support, bad reputation, and rigid development processes as obstacles to FM adoption. Their data suggest a need for FM adoption: 30% of the responses from industry declare the need for FMs as a reason to adopt MbE. Moreover, responses indicate that MbE adoption has a positive effect on FM adoption. One limitation of their study is the small number of responses from FM users.

⁵ This estimate is based on opinions and attitudes expressed by the original authors and, where unavailable, on our own interpretation when reading the studies.

Table 1: Overview of related work on FM use and adoption, grouped by primary focus and motivation

Study | A | Motivation | Support | E C R

Surveys
Austin and Parkin (1993) | = | LoEv | Interviews | ••
Snook and Harrison (2001) | = | LoEv | Interviews | •
Oliveira (2004) | = | Edu./Train. | Course websites | •
Woodcock et al. (2009) (a) | = | LoEv | Interviews | •
Davis et al. (2013) | + | TechTx | Interviews | ••
Liebel et al. (2016) | + | LoEv | Online questionnaire | •
Ferrari et al. (2019) | + | TechTx | Literature study | •

Literature Studies and Summaries
Wing (1990) | + | SotA | O/E | ••
Bloomfield et al. (1991) | = | SotA | | •
Fraser et al. (1994) | = | TechTx | | •
Heitmeyer (1998) | = | TechTx | | •
Gleirscher et al. (2019) | + | TechTx | SWOT analysis | ••

Expert Opinions and Experience Reports
Jackson (1987) | = | TechTx | | •
Bjorner (1987) | = | TechTx | | •
Barroca and McDermid (1992) | = | SotA | Multiple cases | •
Bowen and Hinchey (1995a) | + | Hyp. Testing | | •
Bowen and Hinchey (1995b) | + | TechTx | | •
Hinchey and Bowen (1996) | – | TechTx | | •
Heisel (1996) | + | TechTx | | •
Holloway and Butler (1996) | + | LoEv | | •
Lai (1996) | + | TechTx | | •
Bowen and Hinchey (2005) | + | Hyp. Testing | Literature study | •
Parnas (2010) | = | TechTx | | ••

Case Studies and Experiments
Gerhart and Yelowitz (1976) | = | LoEv | Multiple cases, O/E | •••
Hall (1990) | + | Hyp. Testing | O/E | •
Craigen et al. (1995) (b) | + | SotA | Multiple cases, O/E | •
Knight et al. (1997) | = | TechTx | Field experiment | •
Pfleeger and Hatton (1997) | = | Hyp. Testing | Effect analysis | •
Galloway et al. (1998) | + | TechTx | Single case in lab | ••
Sobel and Clarkson (2002) | = | Hyp. Testing | Lab experiment | •
Miller et al. (2010) | = | TechTx | Multiple cases, O/E | •
Klein et al. (2018) | + | TechTx | | •
Chudnov et al. (2018) | = | TechTx | | •

(a) See also Bicarregui et al. (2009). (b) See also Craigen (1995) and Craigen et al. (1993).
(A)ttitude, (E)valuation/analysis, (C)hallenges, (R)ecommendations; +/=/– ... positive/neutral/negative; LoEv ... lack of empirical evidence; Hyp. Testing ... hypotheses testing; Edu./Train. ... education/training; O/E ... opinion/experience report; SotA ... state of the art; SWOT ... strengths, weaknesses, opportunities, and threats; TechTx ... technology transfer

While these studies focus on the elicitation of the state of the art and the state of practice, the main focus of our study is to compare the current FM adoption or use with the intention to adopt and use FMs in the future. To the best of our knowledge, our study offers the largest set of data points investigating the use of FMs in SE, so far. In Section 6.3, we provide a further discussion of how our findings relate to the findings of these studies, particularly to the works of Austin and Parkin (1993) and Woodcock et al. (2009).

Table 2: Concepts and scales for the construct “Use of FMs in mission-critical SE” (UFM)

Concept | Id. | Description [Scale] | Point [Question]

Measured twice ...
Domain | C1 | Application domains of FMs [MC among domains] | Past [Q1], Intent [Q8]
Role | C2 | Role in using FMs [MC among roles] | Past [Q4], Intent [Q9]
Use | C3 | Use of FMs [experience level/relative frequency per FM class] | Past [Q5/Q6], Intent [Q10/Q11]
Purpose | C4 | Purpose of using FMs [absolute/relative frequency per purpose] | Past [Q7], Intent [Q12]

Measured once ...
Experience | C5 | Level of FM experience [duration ranges in years] | Single [Q2]
Motivation | C6 | Motivation to use FMs [degree per motivational factor] | Single [Q3]
Obstacles | C7 | Difficulty of obstacles to using FMs [degree per challenge] | Single [Q13]

MC ... multiple-choice

4 Research Method

In this section, we describe our research design, our survey instrument, and our procedure for data collection and analysis. For this research, we follow the guidelines of Kitchenham and Pfleeger (2008) for self-administered surveys and use our experience from a previous more general survey (Gleirscher and Nyokabi, 2018).

4.1 Research Goal and Questions

The questions in Section 1 have led to this survey on the use, usage intent, and challenges of FMs. Our interest is devoted to the following research questions (RQs):

RQ1 In which typical domains, for which purposes, in which roles, and to what extent have FMs been used?
RQ2 Which relationships can we observe between past experience in using FMs and the intent to use FMs?
RQ3 How difficult do study participants perceive widely known FM challenges to be?
RQ4 What can we say about the perceived ease of use and the perceived usefulness of FMs?

4.2 Construct and Link to Research Questions

Table 2 lists the (C)oncepts that constitute the construct Use of FMs in mission-critical SE (UFM), the corresponding scales, the points of measurement, and references to (Q)uestions from the questionnaire.

Measuring Past and Intended Use. For RQ1 (UFM), we examine potential application domains for FMs (C1), roles when using FMs (C2), motivations and purposes of using FMs (C6, C4), and the extent of UFM at the general (C5) and specific (C3) experience level of our study participants when using FMs.

For RQ2, we compare the past (UFMp) and intended use (UFMi) of FMs regarding the domain (C1), role (C2), FM class (C3), and purpose (C4). We measure UFMi by relative frequency (Table 4) with respect to a participant’s current situation, FM class, and purpose of use. Using a relative instead of an absolute frequency scale slightly reduces the burden on respondents to make detailed and, hence, uncertain predictions of UFMi. For RQ3, we measure the perception of difficulty of several obstacles (C7) known from the literature and from our experience.

Method Evaluation and TAM-style Interpretation. We follow DESMET (Kitchenham et al., 1997) and Murphy et al. (1999) insofar as we combine a qualitative survey (i.e., FM evaluation by SE practitioners and researchers) and a qualitative effects analysis based on the past and intent measurements for C4 (i.e., subjective assessment of effects of FMs by asking SE practitioners and researchers).

We assume UFM is, nowadays, to a large extent implying the use of the tools automating the corresponding FMs. This assumption is justified inasmuch as for all FMs referred to in this survey, tools are available. In fact, in the past two decades (the period most survey respondents could have possibly used FMs), the development of a FM has mostly gone hand in hand with the development of its supporting tools.

For RQ4, we associate our findings from RQ2 and RQ3 with PEOU and PU. Whereas TAM predicts UFMi of a specific tool by measuring PEOU and PU, we directly interrogate past (like in Mohagheghi et al., 2012, Figure 2) and intended use of classes of FMs. We measure UFMi (C1, C2, C3, C4) in more detail than TAM. Our approach relates to TAM for methods (Riemenschneider et al., 2002, Table 2) inasmuch as we collect data for PEOU through asking about potential obstacles to the further use of FMs (C7) based on experience with past FM use (UFMp). For this, respondents are asked to rate the difficulty of several known challenges to be tackled in typical FM applications. Furthermore, UFMi is known to be correlated with PU. We then interpret the answers to RQ3 to examine the PEOU and, furthermore, interpret the answers to RQ2 to reason about PU. In Section 4.4, we discuss our questionnaire including the questions for measuring the sub-constructs.

4.3 Study Participants and Population

Our target group for this survey includes persons with (1) an educational background in engineering and the sciences related to critical computer-based or software-intensive systems, preferably having gained their first degree, or (2) a practical engineering background in a reasonably critical system or product domain involving software practice. We use (study or survey) participant and respondent as synonyms. We talk of FM users to refer to the part of the population that has already used FMs in one or another way. See Appendix A.1 and Table 8 for a more fine-grained analysis of the population.

4.4 Survey Instrument: On-line Questionnaire

Table 3 summarises the questionnaire we use to measure UFM (Table 2). The scales used for encoding the answers are described in Table 4. Although we do not collect personal data, respondents could leave us their email address if they want to receive our results. We expect participants to spend about 8 to 15 minutes to complete the questionnaire. However, we considered it unnecessary in our case to instrument the questionnaire or our tooling to determine the time spent on submitting complete data points.

Face and Content Validity. We derived answer options from the literature, our own experience with FMs, SE research training, discussions with other SE researchers and colleagues from industry, pilot responses, and coding of open answers. Particularly, the classification of FMs (C3; Q5, Q6, Q10, Q11) and the list of obstacles or challenges (C7; Q13) were derived from our own training, literature knowledge prior to this study, and experience as well as from occasional personal discussions with SE experts from academia and industry. Most questions are half-open, allowing respondents to go beyond given answer options. We treat degree and relative frequency as 3-level Likert-type scales. For each question, we provide “do not know” (dnk)-options to include participants without previous knowledge of FMs in any academic or practical context. If participants are not able to provide an answer they can choose, e.g. “do not know”, “not yet used”, “no experience”, or “not at all”, and proceed. This way, we reduce bias by forced response. We indicate dnk-answers whenever we exclude them. Our questionnaire tool (Section 4.6) helps us obtain complete data points, reducing the effort to deal with missing answers.
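The treatment of 3-level Likert-type answers and dnk-options can be illustrated with a minimal sketch. It is not the authors' analysis code (which was written in GNU R); the function names are hypothetical, and the answer labels are taken from the "degree of difficulty" scale of the questionnaire, written here with a plain apostrophe.

```python
from collections import Counter

# Answer options of the "degree of difficulty" scale (3-level Likert-type).
DIFFICULTY_LEVELS = {
    "not as an issue.": 1,
    "as a moderate challenge.": 2,
    "as a tough challenge.": 3,
}
DNK = "I don't know."  # dnk-option, kept out of the ordinal scale


def encode_difficulty(answers):
    """Map raw answers to ordinal codes 1..3 and count dnk-answers separately."""
    codes = [DIFFICULTY_LEVELS[a] for a in answers if a != DNK]
    dnk_count = sum(1 for a in answers if a == DNK)
    return codes, dnk_count


def level_shares(codes):
    """Relative share of each Likert level among the non-dnk answers."""
    if not codes:
        return {}
    counts = Counter(codes)
    return {level: counts.get(level, 0) / len(codes) for level in (1, 2, 3)}


# Hypothetical answers of four respondents for one obstacle.
answers = ["as a tough challenge.", DNK, "not as an issue.", "as a tough challenge."]
codes, dnk_count = encode_difficulty(answers)
shares = level_shares(codes)
```

Reporting `dnk_count` next to the shares mirrors the statement that dnk-answers are indicated whenever they are excluded.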

Table 3: Summary of questions from the questionnaire

Id. | Question or Question Template | Scale (see Table 4) | Sec. | Fig.
Q1 | In which application domains (C1) in industry or academia have you mainly used FMs? | MC among domains | 5.2 | 2
Q2 | How many years of FM experience (including the study of FMs, C5) have you gained? | Duration range in years | 5.2 | 3
Q3 | Which have been your motivations (C6) to use FMs? | Degree per motivational factor | 5.2 | 4
Q4 | In which roles (C2) have you used FMs? | MC among roles | 5.3 | 5
Q5 | Describe your level of experience (C3) for ⟨class of formal description techniques⟩. | Level of experience per class | 5.3 | 6
Q6 | Describe your level of experience (C3) for ⟨class of formal reasoning techniques⟩. | Level of experience per class | 5.3 | 7
Q7 | I have mainly used FMs for (C4) ... | Absolute frequency per purpose | 5.3 | 8
Q8 | In which domains (C1) in industry or academia do you intend to use FMs? | MC among domains | 5.4 | 9
Q9 | In which roles (C2) would (or do) you intend to use FMs? | MC among roles | 5.4 | 10
Q10 | I (would) intend to use (C3) ⟨class of formal description techniques⟩ ⟨this⟩ often. | Relative frequency per class | 5.4 | 11
Q11 | I (would) intend to use (C3) ⟨class of formal reasoning techniques⟩ ⟨this⟩ often. | Relative frequency per class | 5.4 | 12
Q12 | I (would) intend to use FMs for (C4) ⟨purpose⟩. | Relative frequency per purpose | 5.4 | 13
Q13 | For any use of FMs in my future activities, I consider ⟨obstacle⟩ (C7) as ⟨that⟩ difficult. | Degree of difficulty per obstacle | 5.5 | 16

MC ... multiple-choice

Table 4: Scales used in the questionnaire

Name | Values | Type
degree of motivation | “no motivation”, “moderate motivation”, “strong motivation (or requirement)” | L3
degree of difficulty | “not as an issue.”, “as a moderate challenge.”, “as a tough challenge.”, “I don’t know.” | L3
experience level (duration-based) | “I do not have any knowledge of or experience in FMs.”, “less than 3 years”, “3 to 7 years”, “8 to 15 years”, “16 to 25 years”, “more than 25 years” | O
experience level (task-based) | “no experience or no knowledge”, “studied in (university) course”, “applied in lab, experiments, case studies”, “applied once in engineering practice”, “applied several times in engineering practice” | O
frequency (absolute) | “not at all.”, “once.”, “in 2 to 5 separate tasks.”, “in more than 5 separate tasks.” | O
frequency (relative) | “no more or not at all.”, “less often than in the past.”, “as often as in the past.”, “more often than in the past.”, “I don’t know.” | L3
choice | single/multiple: (ch)ecked, (un)checked | N

bold ... express lack of knowledge or indecision; (N)ominal, (O)rdinal, Ln ... Likert-type scale with n values

Table 5: Channels used for sampling

Channel Type | Examples & References
General panels | SurveyCircle, www.surveycircle.com
LinkedIn groups | E.g. on ARP 4754, DO-178, FME, ISO 26262
Mailing lists | E.g. system safety (U Bielefeld, formerly U York)
Newsletters | BCS FACS; GI RE, SWT, TAV
Personal pages | E.g. Facebook, Twitter, LinkedIn, Xing
ResearchGate | Q&A forums on www.researchgate.net
Xing groups | E.g. Safety Engineering, RE

4.5 Data Collection: Sampling Procedure

We could not find an open, non-commercial panel of engineers. Large-scale panel services are either commercial (e.g. Decision Analyst, 2018) or they do not allow the sampling of software engineers (e.g. Leiner, 2014). Hence, we opt for a mixture of opportunity, volunteer, and cluster-based sampling. To draw a diverse sample of potential FM users, we

1. advertise our survey on various on-line discussion channels,
2. invite software practitioners and researchers from our social networks, and
3. ask these people to disseminate our survey.

We examine C5, C1, C2, and C3 from Table 2 to check how well our sample covers the given concept categories. The better the coverage of these categories, the wider the range of analyses possible from our data set. Less covered categories might indicate inappropriate concepts as well as the case that our sample just does not touch this fraction of the target population. Under the assumption that the sample is drawn from the target population in a uniformly random fashion, we would be able to draw conclusions about the constitution of the target population. However, as noted, this assumption was in our case not controllable.
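The coverage check described above can be sketched as a simple count per category. This is a minimal illustration under assumptions: the function name and the domain labels are hypothetical and not taken from the questionnaire.

```python
def category_coverage(categories, responses):
    """Count how often each predefined category was checked across responses.

    categories: list of category labels offered by a multiple-choice question.
    responses:  list of lists, one list of checked labels per respondent.
    Returns (counts, uncovered), where uncovered lists categories never checked.
    """
    counts = {category: 0 for category in categories}
    for checked in responses:
        for label in set(checked):  # count each respondent at most once per label
            if label in counts:
                counts[label] += 1
    uncovered = [category for category, n in counts.items() if n == 0]
    return counts, uncovered


# Hypothetical domain labels, for illustration only.
domains = ["automotive", "aerospace", "railway"]
sample = [["automotive"], ["automotive", "aerospace"], []]
counts, uncovered = category_coverage(domains, sample)
print(counts)     # {'automotive': 2, 'aerospace': 1, 'railway': 0}
print(uncovered)  # ['railway']
```

An uncovered category by itself does not say whether the concept is inappropriate or the sample simply missed that part of the population; the count only flags where to look.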

4.6 Data Evaluation and Analysis

For RQ1, we summarise the data and apply descriptive statistics for categorical and ordinal variables in Section 5.3. We answer RQ2 by comparison of the data for the past and future views regarding the domain (C1), role (C2), FM class (C3), and purpose (C4) in Section 5.4. Then, in Section 5.5, we answer RQ3 by

• describing the challenge difficulty ratings after associating one of (1) domain, (2) motivational factor, (3) role, (4) purpose, and (5) FM class with challenge (C7) and
• distinguishing
  (1) more experienced (ME, > 3 years) from less experienced respondents (LE, ≤ 3 years),
  (2) practitioners (P, practised at least once) from non-practitioners (NP, not used or only in course or lab),
  (3) motivated (M, moderately or strongly motivated by at least one specified factor) from unmotivated respondents (NM, no motivating factor specified),
  (4) respondents’ past and future views, and
  (5) respondents with increased usage intent (II) from ones with decreased usage intent (DI).
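The first three distinctions in the list above can be sketched as a small classifier. This is an illustration under assumptions, not the authors' procedure: the record layout and the function name are hypothetical, the labels follow the scales of the questionnaire, and mapping the option "less than 3 years" to LE is one reading of the ≤ 3 years threshold. The II/DI distinction is left out because the rule for deriving it is not spelled out in this section.

```python
# Task-based experience levels that count as engineering practice (P).
PRACTICE_LEVELS = {
    "applied once in engineering practice",
    "applied several times in engineering practice",
}
# Duration-based options mapped to "less experienced" (LE); an assumption.
LESS_EXPERIENCED = {
    "I do not have any knowledge of or experience in FMs.",
    "less than 3 years",
}
# Motivation degrees that count as "motivated" (M).
MOTIVATED = {"moderate motivation", "strong motivation (or requirement)"}


def classify(respondent):
    """Assign ME/LE, P/NP, and M/NM to one respondent record.

    respondent: dict with the hypothetical keys
      'experience'  - duration-based experience answer (str),
      'task_levels' - task-based experience answers, one per FM class (list),
      'motivations' - motivation degrees, one per factor (list).
    """
    return {
        "experience": "LE" if respondent["experience"] in LESS_EXPERIENCED else "ME",
        "practice": "P" if any(t in PRACTICE_LEVELS for t in respondent["task_levels"]) else "NP",
        "motivation": "M" if any(m in MOTIVATED for m in respondent["motivations"]) else "NM",
    }


example = {
    "experience": "3 to 7 years",
    "task_levels": ["studied in (university) course", "applied once in engineering practice"],
    "motivations": ["no motivation", "moderate motivation"],
}
groups = classify(example)
```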

We apply association analysis between these categorical and ordinal variables, using pairs of matrices (e.g. Figure 17). We answer RQ4 by arguing from results for RQ1, 2, and 3.
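One plausible building block of such an association analysis is a cross-tabulation of a grouping variable against an ordinal rating. The sketch below only illustrates that idea; it is not the matrix construction used for the figures, and the function names and example labels are hypothetical.

```python
from collections import defaultdict


def cross_tabulate(pairs):
    """Build a contingency table from (row_label, column_label) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for row, col in pairs:
        table[row][col] += 1
    return {row: dict(cols) for row, cols in table.items()}


def row_shares(table):
    """Convert counts to per-row relative frequencies."""
    shares = {}
    for row, cols in table.items():
        total = sum(cols.values())
        shares[row] = {col: n / total for col, n in cols.items()}
    return shares


# Hypothetical (group, difficulty rating) observations for one obstacle.
pairs = [("ME", "tough"), ("ME", "moderate"), ("LE", "tough"), ("ME", "tough")]
table = cross_tabulate(pairs)
shares = row_shares(table)
```

Comparing the row shares of two groups (for example ME against LE) side by side is one way to read an association from such a table.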

Figure 1: Distribution of responses over time (number of responses per week, from W30/07/17 to W21/05/19)

Half-Open and Open Questions. We code open answers in additional text fields as follows: If we can subsume an open answer into one of the given options, we add a corresponding rating (if necessary). If we cannot do this, then we introduce a new category “Other” and estimate the rating. Finally, we cluster the added answers and split the “Other” category (if necessary). For Q13, we performed the latter step combined with independent coding (Neuendorf, 2016) to confirm that the understanding of the challenge categories is consistent among the authors of the present study. For MC questions, we eliminate the choice of “I do/have not. . . ” options from the data if ordinary answer options were also checked.
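The cleaning rule for MC questions can be sketched as follows. This is an illustrative Python fragment under the assumption that each MC answer is available as a list of checked option labels; the function name and the example labels are hypothetical.

```python
def clean_mc_answer(checked, negative_option):
    """Drop the negative ("I do/have not ...") option if any ordinary option is also checked.

    `checked` is the list of option labels a respondent ticked (assumed encoding).
    """
    ordinary = [option for option in checked if option != negative_option]
    # Keep the negative option only when it is the sole choice.
    return ordinary if ordinary else list(checked)
```

For example, an answer containing both an application domain and the negative option reduces to the domain alone, whereas an answer containing only the negative option is kept unchanged.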

Tooling. We use Google Forms (Google, 2018) for implementing our questionnaire (Appendix A.12) and for data collection (Section 4.5) and storage. For statistical analysis and data visualisation (Section 4.6), we use GNU R (The R Project, 2018) (with the packages likert, gplots, and ggplot2 and some helpers from the “Cookbook for R” and the “Stack Exchange Stats” community; see footnote 6). Content analysis and coding takes place in a spreadsheet application. A draft of Appendix A has been archived in Gleirscher and Marmsoler (2018).

5 Execution, Results, and Analysis

In this section, we summarise the responses to the questions in Table 3 and answer the RQs 1, 2, and 3 as explained in Section 4.1. To answer RQ1, we describe the sample in Section 5.2 and discuss some facets of FM use in Section 5.3. For RQ2, we summarise data about past use and usage intent in Section 5.4. For RQ3, we analyse further data in Section 5.5.

5.1 Survey Execution

For data collection, we (1) advertised our survey on the channels in Table 5 and (2) personally invited > 30 persons. The sampling period lasted from August 2017 until March 2019. In this period, we repeated step 1 up to three times to increase the number of participants. Figure 1 summarises the distribution of responses. The channels in Table 5 particularly cover the European and North American areas.

5.2 Description of the Sample (answering RQ 1)

A size estimation of the channels in Table 5 yields around 65K channel memberships (for some channels we make a best guess but, e.g. for LinkedIn, the counts are given). Assuming participants are, on average, members of at least three of the channels, we could have reached up to 20K real persons. Given a recent estimate of 23 million SE practitioners worldwide (Evans Data, 2018) and assuming that at least 1% are mission-critical SE practitioners, our population might comprise at least 230K persons, possibly around 38K in the US and 61K in Europe (see footnote 7). We received N = 216 responses, resulting in an estimated response rate between 1 and 2% and a population coverage of at most 0.1% globally and 0.2% in the US and in Europe. About 40% of our respondents provided their email addresses, the majority from the US, UK, Germany, France, and a sixth from other EU and non-EU countries. In the following, we summarise the responses to the questions about the application domain (Q1), the level of experience (Q2), and the motivations (Q3) of an FM user.

Figure 2: (Q1) In which application domains in industry or academia have you mainly used FMs? (MC, N = 216; number of respondents per application domain)

6 See http://www.cookbook-r.com and https://stats.stackexchange.com.
7 An estimation in Gleirscher et al. (2019) suggests that about 5% of the overall ICT/IS developer population are embedded systems practitioners in critical and non-critical domains. Moreover, Evans Data (2018) and Wikipedia contributors (2018) describe data from 2016 and 2017, suggesting that 3.87 million (19%) SE practitioners live in the US and about 13.3 million (39%) in Europe, the Middle East, and Africa. According to an analysis of data from Stack Overflow by ATOMICO (2019), there is a “software engineering talent pool” of about 6.1 million in Europe.
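The orders of magnitude in the size estimation of Section 5.2 can be re-derived with a few lines of arithmetic. The following Python sketch uses only figures quoted in the text and in its footnote on practitioner counts; the variable names are hypothetical. It does not attempt the US and European coverage, because the regional split of the respondents is not given.

```python
# Figures quoted in Section 5.2 and its footnote on practitioner counts.
channel_memberships = 65_000
channels_per_person = 3            # assumed average number of channels per participant
responses = 216                    # N

worldwide_se_practitioners = 23_000_000
us_se_practitioners = 3_870_000
europe_talent_pool = 6_100_000
mission_critical_share = 0.01      # "at least 1%" assumption from the text

# Persons reached and the resulting response rate (lower end of the quoted range).
persons_reached = channel_memberships / channels_per_person
response_rate = responses / persons_reached

# Estimated population sizes of mission-critical SE practitioners.
population_global = round(worldwide_se_practitioners * mission_critical_share)
population_us = round(us_se_practitioners * mission_critical_share)
population_europe = round(europe_talent_pool * mission_critical_share)

# Share of the estimated global population covered by the sample.
coverage_global = responses / population_global

print(f"persons reached:   about {persons_reached:,.0f}")
print(f"response rate:     about {response_rate:.1%}")
print(f"population:        {population_global:,} global, "
      f"{population_us:,} US, {population_europe:,} Europe")
print(f"global coverage:   about {coverage_global:.2%}")
```

Under these assumptions, the sketch yields a response rate of roughly 1% and a global coverage just below 0.1%, in line with the bounds stated in the text.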

Guide to the Figures. For Likert-type ordered scales, we use centred diverging stacked bar charts (see, e.g. Figure 4) as recommended by Robbins and Heiberger (2011). The horizontal bars in each line show the answer fractions according to the legend at the bottom and are annotated with the percentages of the left-most, middle, and right-most answer options. These bars are aligned by the midpoint of the middle group (for 3- and 5-level scales) or by the boundary between the two central groups (for 4-level scales). Bar labels often abbreviate the corresponding answer options in the questionnaire. The questionnaire copy in Appendix A.12 contains short definitions, explanations, and examples to clarify the answer options. For the sake of brevity, we do not repeat this information here. “M” denotes the median, “CI” the 95% confidence interval for the median calculated according to Campbell and Gardner (1988), “X” the number of excluded data points per answer option, and “NA” the number of invalid data points.
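To illustrate how a median and its confidence interval can be obtained for an ordinal answer scale, the following Python sketch uses the rank-based approximation commonly attributed to Campbell and Gardner (1988): the interval is bounded by the r-th and s-th ranked observations, with r = n/2 − 1.96·√n/2 and s = 1 + n/2 + 1.96·√n/2, each rounded to the nearest integer. Whether the study used exactly this variant is an assumption; the function names and the choice of the lower median for even n are hypothetical.

```python
import math


def median_ci_ranks(n, z=1.96):
    """Ranks (1-based) of the observations bounding the approximate 95% CI of the median."""
    half_width = z * math.sqrt(n) / 2
    r = round(n / 2 - half_width)
    s = round(1 + n / 2 + half_width)
    # Clamp to valid ranks for small samples.
    return max(r, 1), min(s, n)


def ordinal_median_ci(values, levels):
    """Return (median, CI lower, CI upper) for ordinal `values`.

    `levels` lists the answer options in ascending order (assumed encoding).
    For even n, the lower of the two middle observations is taken as median.
    """
    ranked = sorted(values, key=levels.index)
    n = len(ranked)
    r, s = median_ci_ranks(n)
    median = ranked[(n - 1) // 2]
    return median, ranked[r - 1], ranked[s - 1]
```

For n = 216 valid answers, the bounding ranks are 94 and 123, so the interval spans the answer options found at these positions in the ranked data.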

Q1: Application Domain. For each domain, Figure 2 shows the number of participants having experience in that domain (see footnote 8). Note that 180 of the respondents do have experience with applying FMs in different industrial contexts, while only 36 have not applied FMs to any application domain. Medical healthcare is an example where participants could have checked more than one answer category because medical devices would belong to “device industry” and emergency management IT would belong to “critical infrastructures”. See Appendix A.12 for more information about the answer categories.

Q2: FM Experience. Figure 3 depicts participants’ years of experience in using FMs, showing that the sample covers all experience levels. However, the fraction of respondents with no experience (i.e., category “0”) is comparatively low. According to Section 4.6, one third of the participants can be considered LEs with up to three years of experience, and two thirds can be considered MEs with more than three years of experience (29 of those with even more than 25 years). A further analysis of the study participants’ experience profile is available from Table 8 in Appendix A.1 on page 35.

Q3: Motivation. Figure 4 suggests that regulatory authorities play a subordinate role in triggering the use of FMs. In contrast, intrinsic motivation (in terms of private interest) seems to be the major factor for using FMs. For 9 respondents, none of the given factors was motivating at all. The 88 open responses for this question could either be subsumed in at least one of the given categories (65 in “Own (private) interest”, 11 in other categories) or be declared as a comment (3) or not a further motivation (9). Hence, coding did not require an additional answer category for Q3.

8 MC entails that the sum of answers can exceed N.

Figure 3: (Q2) How many years of FM experience (including the study of FMs) have you gained? (Frequency per category 0, 1–3, 3–7, 8–15, 16–25, and > 25 years; N = 216)

Figure 4: (Q3) Which have been your motivations to use FMs? (Degree of motivation: no, moderate, strong motivation; percentages in this order)
• Study or research program: 31%, 20%, 50% (X=0, M=moderate, CI[moderate, strong motivation], NA=0)
• Own interest: 23%, 29%, 48% (X=0, M=moderate, CI[moderate, strong motivation], NA=0)
• Employer / research collaborators: 34%, 31%, 35% (X=0, M=moderate, CI[moderate, moderate], NA=0)
• Customers / scientific community: 27%, 40%, 32% (X=0, M=moderate, CI[moderate, moderate], NA=0)
• Superior / principal investigator: 54%, 26%, 19% (X=0, M=no, CI[no, moderate], NA=0)
• Regulatory authorities: 52%, 31%, 17% (X=0, M=no, CI[no, moderate], NA=0)
• On behalf of an FM: 52%, 37%, 12% (X=0, M=no, CI[no, moderate], NA=0)

5.3 Facets of Formal Methods Use (answering RQ 1)

In the following, we summarise the responses to the questions about the role of a user (Q4), use in specification (Q5), use in analysis (Q6), and the underlying purpose (Q7) of such use.

Q4: Role. Figure 5 shows in which roles the respondents applied FMs. An analysis of the MC answers shows that 72% of the participants used FMs in an academic environment, as a researcher, lecturer, or student. 50% of the participants applied FMs in practice, as an engineer or consultant (see also Gleirscher and Marmsoler, 2018).

Q5: Use in Specification. The degree of usage of FMs for specification is depicted in Figure 6. There is an almost balanced proportion between theoretical and practical experience with the use of various specification techniques. Only the use of FMs for the description of dynamical systems seems to be remarkably low.

Q6: Use in Analysis. The use of FMs for analysis is depicted in Figure 7. Similar to specification techniques, we observe an almost balanced proportion between theoretical and practical experience with the usage of various analysis techniques. The use of assertion checking techniques, such as contracts, stands out. As expected from the observations for Q5, the use of FMs in computational engineering, such as algebraic reasoning about differential equations, is again exceptionally low.

Figure 5: (Q4) In which roles have you used FMs? (MC, N = 216; number of respondents per role)

Figure 6: (Q5) Describe your level of experience with each of the following classes of formal description techniques. (Experience: none, course, lab, practiced, practiced > 1; annotated percentages from left to right)
• Process models: 46%, 19%, 34% (X=0, M=lab, CI[course, lab], NA=0)
• Predicative, relational, or algebraic: 43%, 23%, 34% (X=0, M=lab, CI[course, lab], NA=0)
• Modal and temporal logic specification: 46%, 28%, 26% (X=0, M=lab, CI[course, lab], NA=0)
• Dynamical systems: 72%, 13%, 15% (X=0, M=course, CI[course, course], NA=0)

Figure 7: (Q6) Describe your level of experience with each of the following classes of formal reasoning techniques. (Experience: none, course, lab, practiced, practiced > 1; annotated percentages from left to right)
• Assertion checking: 36%, 22%, 43% (X=0, M=lab, CI[lab, lab], NA=0)
• Consistency checking: 50%, 15%, 36% (X=0, M=lab, CI[course, lab], NA=47)
• Model checking, SMV: 45%, 25%, 30% (X=0, M=lab, CI[course, lab], NA=0)
• Symbolic execution: 50%, 20%, 29% (X=0, M=course, CI[course, lab], NA=0)
• Abstract interpretation: 56%, 18%, 26% (X=0, M=course, CI[course, lab], NA=0)
• Constraint solving: 51%, 25%, 24% (X=0, M=course, CI[course, lab], NA=0)
• Generic theorem proving: 56%, 21%, 22% (X=0, M=course, CI[course, lab], NA=0)
• Process calculi: 60%, 19%, 21% (X=0, M=course, CI[course, course], NA=0)
• Computational engineering, simulation: 66%, 13%, 20% (X=0, M=course, CI[course, course], NA=0)

Figure 8: (Q7) I have mainly used FMs for ... (Applied in 0, 1, 2–5, or > 5 separate tasks; annotated percentages from left to right)
• Specification: 37%, 63% (X=0, M=2–5, CI[2–5, 2–5], NA=0)
• Assurance: 37%, 63% (X=0, M=2–5, CI[2–5, 2–5], NA=0)
• Inspection: 43%, 57% (X=0, M=2–5, CI[1, 2–5], NA=0)
• Clarification: 45%, 55% (X=0, M=2–5, CI[1, 2–5], NA=0)
• Synthesis: 69%, 31% (X=0, M=0, CI[0, 1], NA=0)

Figure 9: Number of respondents using FMs by domain (past vs. intent)

Q7: Purpose. Figure 8 depicts the participants’ purposes for applying FMs. It seems that the respondents employ FMs mainly for assurance, specification, and inspection. Synthesis, on the other hand, seems to be only a subordinate purpose for them.

5.4 Past Use versus Usage Intent (answering RQ 2)

We investigate the usage intent of FMs across various domains and roles as well as the participants’ intent to use various FMs and their intended purpose to use FMs.

Application Domain. Figure 9 compares the respondents’ past domains of FM application with their intended domains (see Q8). This figure reveals two insights into the participants’ intentions to use FMs: (i) Fewer participants do not want to apply FMs in the future (19) than participants that have not used FMs (36, see yellow bars). Ten participants fall into both categories: they have not used FMs and do not intend to use FMs. (ii) The intended application of FMs exceeds the current application of FMs across all domains. Hence, there is a tendency to increase the use of FMs across all application domains.

Role. Figure 10 compares the participants’ roles in which they applied FMs in the past with their intended role to apply FMs in the future (see Q9). Similar to the results for the application domain, we observe that some participants, who have not applied FMs in any role so far, intend to apply such methods in the future. However, the comparison reveals that academic disciplines (i.e., researcher and lecturer) seem to be stable. There is only a small difference between the number of participants who applied FMs in academic domains in the past and the number of participants who want to apply such methods to these domains in the future. In contrast, there is a significant increase in the number of participants aiming to apply FMs, across all industrial roles.

Furthermore, the diagram shows a strong contrast between past and intended use in the category “Bachelor, master, or PhD student.” We can see several reasons for this difference. From the respondents who “used FMs as a student,” many (i) might not be able to “use FMs as a student” anymore because of having graduated, (ii) did not find FMs or the way FMs were taught helpful, or (iii) moved into a business domain with no foreseeable demand for the application of FMs.

Figure 10: Number of respondents applying FMs by role (past vs. intent)

Figure 11: (Q10) I (would) intend to use ... (Relative frequency: no more, less, equally, more often; annotated percentages from left to right)
• Modal or temporal logic specification: 28%, 72% (X=27, M=equally, CI[equally, equally], NA=0)
• Predicative, relational, or algebraic specification: 33%, 67% (X=26, M=equally, CI[equally, equally], NA=0)
• Process models: 35%, 65% (X=27, M=equally, CI[equally, equally], NA=0)
• Dynamical system models: 44%, 56% (X=39, M=equally, CI[equally, equally], NA=0)

Q10: Intended Use for Specification. Figure 11 depicts the respondents’ intended future use of various FMs for system specification (i.e., formal description techniques). The figure shows an almost equal number of participants aiming to decrease (i.e., “no more” and “less”) and increase (i.e., “more often”) their use of FMs for specification. Only dynamical system models again seem to be an exception: more participants want to decrease their use of this technology, compared to participants who want to increase it.

Q11: Intended Use for Analysis. The respondents’ intended use of FMs for the analysis of specifications (i.e., formal reasoning techniques) is depicted in Figure 12. Except for process calculi, we observe a general tendency of the participants to increase their future FM use.

Q12: Intended Purpose. Figure 13 indicates why respondents intend to apply FMs. Again, there is a tendency of the participants to increase FM use across all listed purposes.

Figure 12: (Q11) I (would) intend to use ... (Relative frequency: no more, less, equally, more often; annotated percentages from left to right)
• Assertion checking: 19%, 81% (X=17, M=equally, CI[equally, more often], NA=0)
• Model checking, SMV: 25%, 75% (X=21, M=equally, CI[equally, more often], NA=0)
• Symbolic execution: 26%, 74% (X=26, M=equally, CI[equally, more often], NA=0)
• Consistency checking: 27%, 73% (X=23, M=equally, CI[equally, more often], NA=47)
• Constraint solving: 28%, 72% (X=23, M=equally, CI[equally, more often], NA=0)
• Simulation: 31%, 69% (X=28, M=equally, CI[equally, equally], NA=0)
• Abstract interpretation: 32%, 68% (X=37, M=equally, CI[equally, equally], NA=0)
• Generic theorem proving: 38%, 62% (X=38, M=equally, CI[equally, more often], NA=0)
• Process calculi: 42%, 58% (X=46, M=equally, CI[equally, equally], NA=0)

Figure 13: (Q12) I (would) intend to use FMs for ... (Relative frequency: no more, less, equally, more often; annotated percentages from left to right)
• Specification: 18%, 82% (X=12, M=equally, CI[equally, more often], NA=0)
• Inspection: 18%, 82% (X=20, M=equally, CI[equally, more often], NA=0)
• Assurance: 20%, 80% (X=18, M=equally, CI[equally, more often], NA=0)
• Clarification: 22%, 78% (X=22, M=equally, CI[equally, more often], NA=0)
• Synthesis: 30%, 70% (X=34, M=equally, CI[equally, more often], NA=0)

Q7 and Q12: Comparison of Code- and Model-based FMs. In the following, we regard practitioners with experience level “applied several times in engineering practice” or “applied once in engineering practice” and frequency “applied in 2 to 5 separate tasks” or “applied in more than 5 separate tasks” (see Table 4). We compare users of code-based FMs (CBs; including “abstract interpretation”, “assertion checking”, “symbolic execution”, and “consistency checking”; with N=128) with users of model-based FMs (MBs; including “process calculi”, “model checking”, “theorem proving”, and “simulation”; with N=114). While some of the FM classes can be seen as both code- and model-based, we made a choice based on our experience but left out “constraint solving” because it is a fundamental technique intensively applied in both. The comparison of past and future use for code-based (top half of Figure 21 in Appendix A.4) and model-based FMs (bottom half of Figure 21), for example, in inspection (e.g. error detection, bug finding) shows the following:

• CBs slightly more frequently show an increased intent (the “more often” group) than MBs, for both sub-groups (respondents with 2 to 5 and with more than 5 past uses).
• MBs slightly more frequently show a decreased intent (the “no more” group) than CBs.

Looking at assurance (e.g. proof, error removal) shows the following:

• MBs slightly more frequently show an increased intent than CBs when looking at respondents who have used FMs more than 5 times. However, MBs slightly less frequently indicate an increased intent than CBs when looking at respondents with 2 to 5 uses.
• CBs indicate more dnk answers after 2 to 5 uses and slightly more frequently a decreased intent after more than 5 uses in comparison with MBs.

Figure 14: Approximation (likelihood) of practised use (UFMp) by FM class and application domain (matrix of respondent counts; rows: application domain, columns: FM class)

Q1, Q5, and Q6: Practised FM Classes by Application Domain. We asked respondents about their use of each FM class independent of the application domain and about their general use of FMs in each such domain. Hence, we can only approximate past usage per FM class and application domain, assuming that the overall usage per respondent is uniformly distributed among the specified FM classes and domains. For that, we interpret (and count) each respondent who specifies a domain in combination with “applied once in engineering practice” or “applied several times in engineering practice” for an FM class as a practitioner who has used (UFMp) or, respectively, wants to use (UFMi) FMs of that class in that domain. More generally, we count a respondent who specifies n domains, say d1 to dn, in combination with “applied once in engineering practice” or “applied several times in engineering practice” for m FM classes, say c1 to cm, as a practitioner who has used (UFMp) or, respectively, wants to use (UFMi) FMs of the classes c1 to cm in the domains d1 to dn. Figures 14 and 15 show these approximations for UFMp and UFMi.
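The counting scheme for UFMp can be sketched in Python as a cross product of each respondent's domains and practised FM classes. This is an illustration only: the data layout (a pair of a domain list and a mapping from FM class to experience level) and the function name are hypothetical, and the analogous count for UFMi, which would rely on the intent answers, is not shown.

```python
from collections import Counter
from itertools import product

# Experience levels that count as practised use (answer options quoted in the text).
PRACTISED = {
    "applied once in engineering practice",
    "applied several times in engineering practice",
}


def ufmp_counts(respondents):
    """Count respondents per (domain, FM class) pair.

    Each respondent is a pair (domains, class_experience), where `domains` lists the
    specified application domains and `class_experience` maps an FM class to the
    stated experience level (assumed data layout).
    """
    counts = Counter()
    for domains, class_experience in respondents:
        practised_classes = [
            fm_class for fm_class, level in class_experience.items() if level in PRACTISED
        ]
        # A respondent with n domains and m practised classes contributes to n * m cells.
        for domain, fm_class in product(set(domains), practised_classes):
            counts[(domain, fm_class)] += 1
    return counts
```

Because every respondent contributes to all combinations of their domains and practised classes, the cell counts are an upper approximation of actual use in each combination, which matches the uniformity assumption stated above.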

5.5 Perception of Challenges (answering RQ 3)

Table 6 lists the FM challenges subject to discussion, their background, and literature referring to them. We apply the procedure described in Section 4.6.

Table 6: Feedback on given and additional challenges (see Appendix A.8 for a full list of references)

| Challenge Name & Description | Src. | Supported by (oldest, newest) | Findings for RQ3 (Section 5.5) |
|---|---|---|---|
| Scalability: Useful in handling large and technologically heterogeneous systems | Q | 7 studies, e.g. Hall (1990) and Miller et al. (2010) | Toughest in Figure 16; by Ps more than by NPs; when using FMs for assurance and clarification; independent of FM class |
| Skills & Education: Methods known (little misconception); trained and experienced users available | Q | 13 studies, e.g. Bjorner (1987), Bicarregui et al. (2009) | 2nd toughest; agreed by LEs and MEs; largely independent of FM class; comparatively small tough-proportions by Ms |
| Transfer of Proofs: Relation between models and reality (e.g. code), handling incomplete specifications | Q | 8 studies, e.g. Jackson (1987) and Parnas (2010) | Agreed by LEs and MEs; top-rated by DIs and NMs; largely independent of FM class |
| Reusability: Parametric proofs, reusable specifications and verification results | Q | Barroca and McDermid (1992) and Bowen and Hinchey (1995b) | Top-rated by tool provider stakeholders and lecturers |
| Abstraction: Useful and correct (automated) abstractions from irrelevant detail (for comprehension and validation) | Q | 12 studies, e.g. Jackson (1987), Miller et al. (2010), and Parnas (2010) | Varies notably across FM classes |
| Tools & Automation: Useful notations and trustworthy tools (for manipulation, checking, collaboration, documentation) | Q | 16 studies, e.g. Bjorner (1987) and O’Hearn (2018) | Top-rated by DIs; but comparatively small tough-proportions from practitioners |
| Maintainability: Stable proofs, easily modifiable specifications, and adaptable verification results | Q | Barroca and McDermid (1992), Knight et al. (1997), and Parnas (2010) | Comparatively small tough-proportions from practitioners |
| Resources: Sufficient resources, good cost-benefit ratio (despite adoption, training, licenses) | 4R | 11 studies, e.g. Hall (1990) and Woodcock et al. (2009) | No detailed data was collected. Because these challenges were mentioned several times each, we classify them to be at least of moderate difficulty. |
| Process Compatibility: Integration into existing process, method culture, standards, and regulations | 6R | 12 studies, e.g. Bjorner (1987) and O’Hearn (2018) | As for Resources |
| Practicality & Reputation: Benefit awareness and sufficient empirical evidence for benefits | 7R | 6 studies, e.g. Lai and Leung (1995) and Parnas (2010) | As for Resources |

Src. . . . source, Q . . . in questionnaire, nR . . . additionally raised by n respondents

Figure 15: Approximation (likelihood) of increased usage intent (UFMi) by FM class and application domain (matrix of respondent counts; rows: application domain, columns: FM class)

Figure 16: (Q13) For any use of FMs in my future activities, I consider ⟨obstacle⟩ as [not an|a moderate|a tough] issue. (Degree of difficulty: not an issue, moderate, tough; percentages in this order)
• Scalability: 9%, 23%, 67% (X=26, M=tough, CI[tough, tough], NA=0)
• Skills and education: 11%, 37%, 52% (X=23, M=tough, CI[tough, tough], NA=0)
• Transfer of verification results from: 12%, 39%, 49% (X=31, M=moderate, CI[moderate, tough], NA=0)
• Proper abstractions from irrelevant details: 13%, 43%, 44% (X=36, M=moderate, CI[moderate, tough], NA=0)
• Reusability of verification results: 18%, 39%, 42% (X=37, M=moderate, CI[moderate, tough], NA=0)
• Automation or tool support: 12%, 50%, 38% (X=27, M=moderate, CI[moderate, tough], NA=0)
• Maintainability of verification results: 17%, 46%, 37% (X=36, M=moderate, CI[moderate, tough], NA=0)

General Ranking (Q13). Figure 16 shows the respondents’ ratings of all challenges. Most respondents believe that scalability will be the toughest challenge, while maintainability is considered the least difficult of all rated obstacles. For reuse of proof results, proper abstractions, and tool support, the ratings are distributed more uniformly across moderate and high difficulty.

In the following, we compare specific groups of respondents by how they perceive the difficulty of the various challenges. We group respondents according to the criteria in Section 4.6 and according to the role, motivating factor, FM class, and purpose they specified. Appendix A.6 provides some background material for the following association analyses.

Less Experienced (LE) versus More Experienced (ME) Respondents (Q2). The comparison of the difficulty ratings of LEs with the ratings of MEs shows that (i) LEs less often perceive the given challenges as tough, (ii) MEs significantly more often rate scalability as tough, (iii) both groups show the closest agreement on transfer of verification results and skills and education.

Non-Practitioners (NP) versus Practitioners (P) by Past Purpose (Q7). The perception of skills and education and scalability as the most difficult challenges is largely independent of the purpose, with Ps again attributing more significance to scalability. Scalability, the forerunner in Figure 16, exhibits the most tough-ratings from NPs in synthesis and from Ps in assurance and clarification (see the top half of Figure 22 in Appendix A.6).

Decreased Intent (DI) versus Increased Intent (II) by Purpose (Q12). The comparison of the difficulty ratings of respondents with no or decreased intent to use FMs for a specific purpose and of respondents with equal or increased intent shows: (i) Scalability and skills and education, both forerunners in Figure 16, show the most tough-ratings from IIs for assurance (67%) and inspection (66%) and from DIs for synthesis (53%). (ii) The trend in Figure 16 is more clearly observable from IIs than from DIs, where transfer of verification results and automation and tool support seem to be tougher than skills and education.

Non-Practitioners (NP) versus Practitioners (P) by FM Class (Q5,Q6). The top half of Figure 17 shows that, for NPs, the trend in Figure 16 is largely independent of the FM class, except for consistency checking and logic leading with tough proportions of 49%. The bottom half of Figure 17 shows that, for Ps, difficulty ratings vary more across FM classes: The foremost challenges in Figure 16 received the most tough-ratings from users of process models, dynamical systems, process calculi, model checking, and theorem proving. Difficulty ratings of users are often centred on moderate or tough, while proper abstraction and skills and education show a comparatively wide variety across FM classes. The histograms in the lower right corners in Figure 17 indicate that (i) NPs’ difficulty ratings vary less than Ps’ ratings, (ii) NPs’ ratings are more independent of the FM classes, and (iii) NPs’ difficulty ratings are lower on average than Ps’ ratings. Appendix A.6 contains several such association matrices with more detailed data in the matrix cells.

Decreased Intent (DI) versus Increased Intent (II) by FM Class (Q10,Q11). The trend in Figure 16 is supported by many tough ratings (48%) for transfer of verification results from DIs in consistency checking. However, DIs in process calculi provide comparatively many tough-ratings (39%) for the generally low-ranked automation and tool support. Assertion checking exhibits comparatively low tough-proportions across all challenges whereas process calculi exhibit comparatively high tough-ratings. Mirroring the trend in Figure 16, IIs show less variance than DIs across all FM classes.

Unmotivated (NM) versus Motivated (M) Respondents by Motivating Factor (Q3). Respondents with moderate to strong motivation to use FMs are more likely to identify the given challenges as moderate to tough, regardless of the motivating factor. The trend in Figure 16 seems explainable by many tough ratings from respondents motivated by regulatory authorities (69%), not motivated by tool providers (56%), and not motivated by superiors/principal investigators (56%, see Figure 24 in Appendix A.6). NMs’ tough-ratings are notably lower than Ms’ tough-ratings.

Past and Future Views by Role (Q4,Q9). Although participants show role-based discrepancies between their past and intended use of FMs (Figure 10), the perception of difficulty of the rated challenges seems to be largely similar, following the trend in Figure 16. The high ranking of scalability (and reusability of verification results) is supported by many tough-ratings from tool provider stakeholders for the past view and many from lecturers for the future view. Respondents not having used FMs or not planning to use FMs exhibit the lowest tough-ratings but also the highest fractions of dnk-answers.

[Figure 17: two association matrices, “Comparison of Challenge Difficulty across FMs (users not practicing FMs, past)” (top) and “Comparison of Challenge Difficulty across FMs (users practicing FMs, past)” (bottom). Rows: predicative, relational, or algebraic specification; modal and temporal logic specification; process models; dynamical systems; abstract interpretation; assertion checking; process calculi; model checking, SMV; constraint solving; generic theorem proving; computational engineering, simulation; symbolic execution; consistency checking. Columns: scalability; transfer of verification results; reusability of verification results; maintainability of verification results; automation or tool support; proper abstractions; skills and education. Cell scale: tough-proportion of all ratings per cell (0 to 1). Key: histogram of toughs (number of cells).]

Figure 17: Difficulty of challenges (cols): NPs (top) compared to Ps (bottom) by class of used FM (rows). Legend: In each cell of an association matrix, both the solid vertical line and the colour (gradient from red to white) represent the tough proportions (from 0 to 100%), with the dotted vertical line marking the 50% margin. The histogram (to the lower right corner of each matrix) counts the combinations (cells) in each 5%-band of tough ratings. E.g. ∼70% of “process calculi” users perceive “scalability” as a tough challenge.
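The cell values and the 5%-band histogram described in the legend of Figure 17 can be illustrated with a minimal sketch. The function names, the rating labels, and the toy data below are hypothetical and not taken from the survey; whether dnk-answers enter the denominator is left to the caller, since the figure only states "tough-proportion of all ratings per cell".

```python
from collections import Counter
from fractions import Fraction


def tough_proportions(ratings):
    """Map each (fm_class, challenge) cell to its exact tough proportion.

    `ratings` is an iterable of (fm_class, challenge, rating) triples;
    the rating label "tough" is an assumed encoding.
    """
    totals, toughs = Counter(), Counter()
    for fm_class, challenge, rating in ratings:
        cell = (fm_class, challenge)
        totals[cell] += 1
        if rating == "tough":
            toughs[cell] += 1
    # Fractions avoid floating-point errors at the band boundaries.
    return {cell: Fraction(toughs[cell], totals[cell]) for cell in totals}


def band_histogram(proportions, n_bands=20):
    """Count cells per band; 20 bands correspond to 5%-wide bands."""
    hist = Counter()
    for p in proportions.values():
        band = min(int(p * n_bands), n_bands - 1)  # 100% falls into the last band
        hist[band] += 1
    return hist


# Made-up toy data, not survey data.
toy = (
    [("process calculi", "scalability", "tough")] * 7
    + [("process calculi", "scalability", "moderate")] * 3
    + [("model checking", "scalability", "tough")] * 9
    + [("model checking", "scalability", "not an issue")] * 1
)
props = tough_proportions(toy)
hist = band_histogram(props)
```

In this toy example, the "process calculi"/"scalability" cell has a tough proportion of 7/10 and therefore falls into the band starting at 70%.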

Past and Future Views by Domain (Q1,Q8). The trend in Figure 16 is underpinned by the highest tough-proportions for respondents from the transportation, military systems, industrial machinery, and supportive domains.

6 Discussion

In this section, we discuss and interpret our findings, relate them to existing evidence, outline general feedback on the questionnaire, and critically assess the validity of our study.

6.1 Findings and their Interpretation

The following (F)indings are based on the data summarised and analysed in Sections 5.2 to 5.4. All findings are then collected in Table 7.

Findings for RQ 1. (F1) Regulatory authorities with their norms, codes, or policies represent only a minor motivating factor to use FMs. Intrinsic motivation (maybe market-triggered) seems to be stronger. This finding is consistent with what we know from the literature survey in Gleirscher et al. (2019): FMs are not formally required by corresponding standards today, not even for the highest safety integrity levels. If regulatory authorities change their recommendations to requirements, then this might spike as a motivating factor.

(F2) The low fraction of respondents with no experience in Figure 3 may have been caused (1) by our choice of expert channels in Table 5, where the likelihood of encountering FM users is probably higher than in more generic SE channels (e.g. Stack Overflow), and (2) by the fact that SE students will usually have an FM course or some lectures about FMs such that they would choose “1–3 years” in Q2 and “studied in course” in Q5.

(F3) We observe the least use of FMs in computational engineering and for reasoning about dynamical systems, for example, reasoning about the correctness of algorithms, and their implementation in embedded software, controlling such systems. One explanation for this is that our sample mainly comprises software and systems engineers who will work less intensively with such FMs than, for example, mechanical or control engineers. Another explanation is that such FMs are still less widely known, less well developed, or less well supported by tools than FMs focusing on the reasoning about pure software.

Findings for RQ 2. (F4) It seems that in all given domains (Figure 9, except for other) respondents intend to increase their future use of FMs. Moreover, we observe that this tendency is independent of the particular FM class (except process calculi) or purpose. The data also suggest that the use of FMs by teachers and researchers is saturated. This saturation indicates a stable intent to teach FMs, to perform research in FMs, or to otherwise use FMs in teaching or research. However, there is an increased intent to apply FMs in industrial contexts in the future. One explanation could be that engineers have already wanted to use FMs but have not had the opportunity or were not told or permitted to do so. Another explanation for an increased intent of FM non-users could be some bias when answering questions about whether someone would do (e.g. try out) something.

(F5) Our data suggest that experience in using a certain FM class is positively associated with the intent to use this FM class in the future. To investigate this suspicion, we analysed the intended use of an FM class based on the experience of participants in using this class (also by association analysis as described in Section 4.6). We observe that the more experience one has with using a specific FM class, the more likely they will apply it in the future (see the two charts in the Appendices A.3 and A.5). No experience with a specific FM class correlates with a low intent to use that class. Participants not having used FMs and, hence, unfamiliar with them might not have had the need in the first place. Even little experience with a certain FM class significantly increases the intent to apply it again in the future. Similar observations can be made for the use of FMs in general for a specific purpose.

(F6) The differences in past and intended use between code- and model-based FMs (Section 5.4), for example, when looking at inspection and assurance, are marginal. Moreover, we cannot find a significant difference or a trend between these two categories of FMs when considering different purposes, experience levels, and usage frequencies.

(F7) The approximation in Figures 14 and 15 allows the, albeit vague, interpretation of the numbers as the likelihood that respondents have used (Figure 14) or want to use (Figure 15) a particular FM in a particular domain. Assuming this model, Figure 15 indicates the highest likelihoods of an increased UFMi for methods such as “assertion checking”, “constraint solving”, “model checking”, and “symbolic execution” in domains such as “transportation”, “critical infrastructures”, and the “device industry”.

[Figure 18: bar chart of the number of non-practitioners (MC, axis from 0 to 25) per past role: I have not used FMs; Engineering practitioner in industry; Stakeholder of an FM tool; Lecturer, teacher, trainer, or coach; Researcher in industry; Researcher in academia; External consultant; Consulting or managing practitioner in [label truncated]; Bachelor, master, or PhD student.]

Figure 18: The past role profile of the 46 non-practitioners (out of 216 respondents) helps to explain finding F8

Findings for RQ 3. (F8) Scalability and skills and education lead the challenge ranking, independent of the domain, FM class, motivating factor, and purpose. Practitioners see scalability as more problematic than non-practitioners, whereas non-practitioners perceive skills and education as more problematic than practitioners. Figure 18 may explain the latter by showing a high fraction of students among the 46 non-practitioners.

(F9) Maintainability of proof results or other verification artefacts was found to be the least difficult challenge. However, in the lower half of Figure 17, the challenge column “maintainability” shows relatively low frequencies for “modal and temporal logic” and “model checking” (possibly because of the high level of automation) whereas “theorem proving” (possibly because of a low level of automation) and “constraint solving” (possibly because of being too versatile or generic for the present purpose) show the highest frequencies of tough ratings. See Figure 26 in Appendix A.6 for more details.

(F10) Reusability of proof results was rated as tough by several practitioner groups.

(F11) FM users with decreased usage intent rate tool deficiencies as their top obstacle to FM adoption.

(F12) Furthermore, our respondents raised three additional challenges (i.e., resources, process compatibility, and practicability & reputation), which we cross-validated with the literature (see highlighted rows in Table 6). The fact that these obstacles were mentioned several times in addition to the given obstacles suggests that they are highly relevant and at least moderately difficult. However, our data does not allow us to rank them more precisely.

(F13) Challenges are perceived as moderate or tough, largely similarly between the pairs of groups we distinguish in Section 4.6.

(F14) With 72% of tough ratings for scalability, process calculi (e.g. ACP, CCS, CSP) perform in the midfield despite their high reputation as compositional methods. Scalability of process models (e.g. Petri nets, Mealy machines, labelled transition systems, Markov models) is also ranked in the middle field of tough challenges. The ranking of these models, however, is unsurprising in the light of the difficult scalability of model checking, a frequently used verification technique for process models and the leader in this ranking (cf. Figure 17). One explanation for the high number of tough-ratings from NPs in synthesis could be that NPs might not associate FMs with synthesis in general, or that automated synthesis of sophisticated artefacts is known to be an unsolved problem in many cases, independent of the use of FMs.

6.2 Relationship to TAM for Methods (answering RQ 4)

In analogy to the reasoning in Davis (1989), an increased positive experience with practically applying FMs forms a high degree of PU (Section 2). Davis (1989, pp. 329, 331) observed that current and intended usage are significantly correlated with PU, less with PEOU. In fact, F4 suggests an increased intent to use FMs in the future. Moreover, F5 suggests a positive association of the degree of experience with UFMi, that is, more experience increases the intent. (F15) Because the use of FMs is not mandatory for most respondents, a likely explanation for an increased intent (UFMi) is that our respondents perceive the usefulness of FMs to be more positive than negative.

Inspired by Riemenschneider et al. (2002), in the last paragraph of Section 4.2, we justify the use of challenge scales to collect data for PEOU and PU. We justify the validity of the FM-specific challenge scale using the studies in Table 1. The column “supported by” in Table 6 indicates studies discussing the corresponding challenges. From these discussions, we infer that tackling these challenges contributes to an increased EOU and U. First, the studies suggest that FMs are easier to use if users have sufficient skills and education, if the methods scale to large systems, if mature tools and automation are available, and if proofs are easily maintainable and reusable. Second, the studies suggest that FMs are more useful if they are compatible with the process, if their cost-benefit ratio is low, if their abstractions are correct and expressive, and if proofs can be correctly transferred to reality. Hence, these challenges represent FM-specific substrata (Davis, 1989, p. 325) of EOU and U for FMs.

Moreover, a high degree of PEOU corresponds to an increased positive user experience with FMs, which translates to a low proportion of tough ratings for the obstacles measured in Q13. However, from F13, we observe that respondents rate most challenges as moderate to tough, largely independent of other variables (F8). (F16) Overall, it thus seems that our respondents perceive the ease of use of FMs to be more negative than positive.

According to Table 6, many of the surveyed studies discuss skills & education (12 studies) and tools & automation (16) as important challenges. Moreover, Figure 16 suggests that conceptual difficulties (possibly, from a lack of education and training, from difficulties in FM teaching, from a lack of FM students) seem to be at least as responsible for the negative ease of use as the lower ranked tool deficiencies. Indeed, in a recent discussion of “push-button verifiers”, O’Hearn (2018) highlights that both conceptual expertise and tool deficiencies are still significant bottlenecks. However, an investigation of respondents’ experiences with FM tools in comparison to their experiences with FM concepts goes beyond the possibilities of the data collected for this study.

6.3 Relationship to Existing Evidence

Our systematic map shows that our list of challenges is completely backed by substantial literature (see Table 6) raising and discussing these challenges. (F17) However, the fact that maintainability and reusability were least covered by our literature is, on the one hand, in line with F9 but, on the other hand, not with F10 and typical cultures of reuse in practice.

Table 7: Summary of findings per research question

RQ1: In which typical domains, for which purposes, in which roles, and to what extent have FMs been used?
  F1   Intrinsic motivation to use FMs is stronger than norms or codes of regulatory authorities.
  F2   The fraction of respondents with no experience at all is comparatively low.
  F3   Respondents use FMs the least in computational engineering and for dynamical systems.

RQ2: Which relationships can we observe between past experience in using FMs and intent to use FMs?
  F4   Increased intent to use FMs observable across all application domains.
  F5   Amount of experience is positively associated with the strength of usage intent.
  F6   The responses do not show any significant differences between code- and model-based FMs.
  F7   Respondents show high likelihoods of an increased intent to use FMs such as “model checking” or “assertion checking” in areas such as “transportation” or “critical infrastructures”.

RQ3: How difficult do study participants perceive widely known FM challenges?
  F8   Scalability and skills & education lead the challenge difficulty ranking.
  F9   Maintainability of proof results is found to be the least worrying challenge.
  F10  Reusability of proof results is rated as tough by several practitioner groups.
  F11  FM users with decreased usage intent rate tool deficiencies as their top obstacle.
  F12  Respondents identified resources, process compatibility, and reputation as further obstacles.
  F13  All considered challenges are generally perceived as moderate or tough.
  F14  Among the FM classes, process models are most positively associated with tough scalability.

RQ4: What can we say about the perceived ease of use and the perceived usefulness of FMs?
  F15  Respondents perceive the usefulness of FMs as mainly positive and intend to increase their use.
  F16  Respondents perceive the ease of use of FMs as mainly negative.

Relationship to Existing Evidence (from the literature):
  F17  Proof maintainability and reusability are least covered by the literature.
  F18  We repeat Austin and Parkin (1993), excluding benefit analysis but with a broader sample and more detailed questions.

Beyond the general findings about FM benefits in Austin and Parkin (1993), we steered our half-open questionnaire towards a refined classification of responses, comparing past with intended use, and interrogating recently perceived obstacles among a methodologically and geographically more diverse sample. Their sample mainly covers Z and VDM users in the UK. Our questionnaire has less focus on representation and methodology and excludes both questions on benefits and on suggestions to overcome obstacles. Regarding the latter, Austin and Parkin (1993) mention the improvement of education and standardisation, the preparation of case studies, and the definition of FM effectiveness metrics.

(F18) The report of Austin and Parkin (1993) from the National Physical Laboratory archive was unfortunately no longer available to us. We finally managed to get access to a paper copy provided by a friendly colleague. This, however, only happened after conducting this survey. Nevertheless, we found that our conclusions are nearly identical to Austin and Parkin’s. The data from Figure 16 and Table 6 confirms that many of the obstacles (i.e., limitations and barriers) they identified back in 1991/2 remain (e.g. understanding the notation and the underlying mathematics, resistance to process changes), some have been lightly addressed (e.g. lack of cost/benefit evidence), and some have been more strongly addressed (e.g. lack of expressiveness, lack of appropriate tools). Not mentioned in Austin and Parkin (1993) is scalability, rated by our respondents to be the toughest obstacle.

F5 is in line with other observations in Bicarregui et al. (2009) and Woodcock et al. (2009) that the repeated use of an FM results in lower overheads (i.e., an experienced effort or cost reduction and improved error removal), up to an order of magnitude less than its first use (Miller et al., 2010). Finally, our study generalises the main findings about barriers in Davis et al. (2013) to several geographies and application domains, however, using an on-line questionnaire instead of interviews and not asking for barrier mitigations.

6.4 Threats to Validity

We assess our research design with regard to four common criteria (Shull et al., 2008; Wohlin et al., 2012). Per threat, we estimate its criticality (minor or major), describe it, and discuss its partial (◦) or full (X) mitigation.

6.4.1 Construct Validity

Why would the construct (Section 4.2) appropriately represent the phenomenon?

maj: Inappropriate questions and conceptual misalignment / To support face validity, we applied our own experience from FM use to develop a core set of questions. For the design of our questionnaire, we use feedback from colleagues, from respondents we personally know, and from the general feedback on the survey to improve and support content validity. A positive comparison with the questionnaire in Austin and Parkin (1993) finally confirms the appropriateness of our questions. However, we might have needed additional questions to check for conceptual alignment, for example, to more precisely determine whether the respondents’ understanding of FMs and of the use or application of FMs closely matches ours. Still, of the 18 respondents giving feedback on our questionnaire, only one commented on the definition and one on the classification of FMs. That suggests that many respondents did not have or were not aware of misunderstandings worth mentioning. ◦

min: Questionnaire limited for measurement of PEOU (e.g. per FM class) and PU / We avoid deriving conclusions specific to an FM or a corresponding tool from our data. X

min: Bias by omitted scale values (e.g. FM class, domain, purpose) / Respondents are encouraged to provide open answers to all questions, helping us to check scale completeness. Between 8% and 40% of the respondents made use of the text field “Other.” Our systematic map confirms that we have not listed unknown challenges in Q13. We identified three additional challenges via open answers and the literature. We believe we have achieved good criterion validity through questions and scales for distinguishing important sub-groups (see Section 4.6) of our population. X

min: Educational background asked indirectly / We approximate what we need to know by using data from Q1, Q3, Q4, and Q5. X

6.4.2 Internal Validity

Why would the procedure in Section 4 lead to reasonable and justified results?

min: Incomplete data points / After the 47th response, feedback from colleagues and respondents resulted in an extension of Q3 with the option “on behalf of FM tool provider” (Figure 4) and of Q6 and Q11 with the option “consistency checking” (Figure 7). The enhancement of 169 complete data points to 216 maintained all trends. X

min: Duplicate & invalid answers / To identify intentional misconduct, we checked for timestamp anomalies and for duplicate or meaningless phrases in open answers. Voluntarily provided email addresses (90/220) indicate only 4 duplicate participants. We removed these 4 data points from our data set. Google Forms includes data points only if all mandatory questions are answered and the submit button is pressed. We also performed a consistency check of MC questions and corrected 5 data points where “I do/have not. . . ” was combined with other checked options. X

min: Inter- vs. intra-UFM inference / Our study design is not suitable for “inter-UFM predictions”, for example, to predict that (dis)satisfied model checking practitioners have an increased (a decreased) intent to use theorem proving. However, the argumentation in Sections 4.2 and 6.2 aims at “intra-UFM predictions”, that is, inferring an increased or decreased intent to use model checking from the quantity and quality of past experience in using model checking. Such predictions may inherit possible limitations of TAM studies. ◦

6.4.3 External Validity

Why would the procedure in Section 4 lead to similar results with more general populations?

maj: Low response rate / We believe our estimates in Section 5.2 to be sensible. We tried to (i) improve targeting by repetitively advertising on multiple appropriate channels, (ii) spot unreliable contact information, (iii) provide incentive (study results via email), (iv) keep the questionnaire short and comprehensible, (v) avoid forced answers, and (vi) allow lack of topic knowledge. Some uncertainties remain, for example, lack of sympathy, personal motivation, and interest, or strong loyalty, and high expectations in the outcome, or intentional bias. However, from an estimated population of around 100K (i.e., the rounded sum of 38K and 61K), the minimum sample size for 95% confidence intervals with continuous scale error margins of less than 7% is 196, consistent with the ballpark figure in Gleirscher et al. (2019, p. 117:29). Our sample (N=216) exceeds this number. The 95% confidence intervals for the Likert items show that the margin of error for the median sometimes deviates by one category (e.g. Figure 4). In this first study, we aim at understanding common perceptions, such as “FMs are not practically useful” or “FMs are difficult to apply”. Because these statements address FMs as a whole, we believe such local errors do not affect our general conclusions. However, the response rate (1 to 2%) and population coverage (0.1%, cf. Section 5.1) were too low to avoid such errors and refute specific null-hypotheses, such as “FM m is effective for role r and purpose p in domain d” (by the FM community) or “FM m is difficult to apply for role r and purpose p in domain d” (by SE practitioners), with satisfactory statistical power. X

maj: Bias towards specific groups (Shull et al., 2008, p. 181) / We distributed our questionnaire over general SE channels. We mix opportunity (only 5 to 10% chain referral), volunteer, and cluster-based sampling. Selection bias, a problem in snowball sampling (Biernacki and Waldorf, 1981), is limited by good visibility and accessibility of the target population in these channels (Section 5.2) as well as little use and control of referral chains among respondents. Our sample includes 50% practitioners according to Section 4.6, ≈ 21% NPs (incl. laypersons), and ≈ 31% pure academics. A bias towards FM experts (Figure 3) does not harm our PEOU discussion led by practitioners but shapes our PU discussion. Regarding application domains, our conclusions cannot be generalised to, e.g. critical IT systems in the finance and e-voting sectors. ◦

min: Non-response / We decided not to enforce responses or provide incentives. Still, our data suggests that our advertisement stimulated responses from FM-critical minds. ◦

min: Lack of FM knowledge / 11 to 18% of our respondents did not know specific challenges (Figure 16). For RQ1 (Figures 2 to 16), dnk-data points have no influence because the findings of RQ1 directly describe and interpret the status quo of UFMp. For test purposes, we included dnk-data points in the analyses of RQ2 and RQ3 (Figures 11 to 16), with no relevant influence. X

min: Geographical background missing / Respondents were not required to own a Google account to avoid tracking and to increase anonymity and the response rate. The limited geographical knowledge about our sample constrains the generalisability of our conclusions, e.g. to geographies such as China, India, or Brazil. ◦
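The minimum sample size of 196 quoted under the low-response-rate threat can be reproduced with a minimal sketch. It assumes, as an illustration only, Cochran's formula for proportions with the worst-case proportion p = 0.5 and a finite population correction; the section does not state which formula was actually used, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(population, margin, confidence=0.95, p=0.5):
    """Smallest sample size whose margin of error stays within `margin`.

    Assumed formula: n0 = z^2 * p * (1 - p) / margin^2, followed by the
    finite population correction n = n0 / (1 + (n0 - 1) / population).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n0 = z * z * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def margin_of_error(sample, population, confidence=0.95, p=0.5):
    """Margin of error of a proportion estimate under the same assumptions."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    fpc = math.sqrt((population - sample) / (population - 1))
    return z * math.sqrt(p * (1 - p) / sample) * fpc


needed = min_sample_size(population=100_000, margin=0.07)
achieved = margin_of_error(sample=216, population=100_000)
```

Under these assumptions, `needed` evaluates to 196 for a population of 100K, and the margin of error for N=216 stays slightly below 7%.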

6.4.4 Reliability

Why would a repetition of the procedure in Section 4 with different samples from the same population lead to the same results?

maj: Internal consistency / All 7 items for the concept “obstacle to FM effectiveness” (C7) show good internal consistency for our sample with a Cronbach α = 0.84; the PEOU-part of C7, consisting of 5 items, shows an α = 0.79 (Shull et al., 2008). The other concepts are not measured with multiple items. ◦

maj: Change of proportions / The limited sample and the low response rate make it hard to mitigate this risk. However, we compared the first (until 4.8.2018, N1 = 114) and second (from 5.8.2018, N2 = 102) half of our sample to simulate a repetition of our survey with the same questionnaire. A two-sided Mann-Whitney U test for difference does not show a significant difference between these two groups (e.g. for Q13 and Q4). Only for the Q3 item “On behalf of FM tool provider,” a p = 0.07 indicates a potential difference. The addition of that item only after the 47th respondent might explain this difference. ◦
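The two reliability checks named above, Cronbach's α and the two-sided Mann-Whitney U test, can be sketched in plain Python. This is an illustrative sketch rather than the analysis scripts of this study: the function names are hypothetical, the p-value uses the normal approximation with tie correction and without continuity correction, and the data layout (one list of scores per item, aligned by respondent) is an assumption.

```python
import math
from statistics import NormalDist, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per questionnaire item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def average_ranks(values):
    """Return 1-based ranks (ties get their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes = [0.0] * len(values), []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test; returns (U of sample a, approximate p)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up toy data, not survey data.
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
u_same, p_same = mann_whitney_u([1, 2, 3], [1, 2, 3])
u_diff, p_diff = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

The tie correction matters for data such as 3-level Likert responses, where most observations share a rank; for small samples, an exact test would be preferable to the normal approximation used here.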

7 Conclusions

We conducted an on-line survey of mission-critical software engineering practitioners and researchers to examine how formal methods have been used, how these professionals intend to use them, and how they perceive challenges in using them. This study aims to contribute to the body of knowledge of the software engineering and formal methods communities.

Overall Findings. From the evidence we gathered for the use of formal methods, we make the following observations:

• Intrinsic motivation is stronger than the regulatory one.
• Despite the challenges, our respondents show an increased intent to use FMs in industrial contexts.
• Past experience is correlated with usage intent.
• All challenges were rated either moderately or highly difficult, with scalability, skills, and education leading. Experienced respondents rate challenges as highly difficult more often than less experienced respondents.
• From the literature and the responses, we identified three additional challenges: sufficient resources, process compatibility, good practicality/reputation.
• The negative responses to the questions about obstacles to FM effectiveness suggest that the ease of use of FMs is perceived more negatively than positively.
• Gaining experience and confidence in the application of an FM seems to play a role in developing a positive perception of usefulness of that FM.

Barroca and McDermid (1992) present evidence to show that FMs can be used in industry effectively and more widely. Their observation from 1992 is that FM use had been limited, benefits were clear but limitations were subtle. In response to Barroca and McDermid’s finding “FMs are both oversold and under-used”, our insights from the analysis of RQ 2 and 3 lead us to conclude that today FMs are probably more underused than oversold. However, our data also suggests that these methods still need substantial improvement and support in several areas in order for their benefits to be better utilised.

General Feedback on the Survey. The questionnaire seems to be well-received by the participants. One of them found it an “interesting set of questions.” This impression is confirmed by another participant:

“Well chosen questions which do not leave me guessing. Relevant to future FM research and practice.”

Another respondent noted:

“Thank you very much for this survey. It is very constructive and important. It handles most of the issues encountered by any practitioner and user of FMs.”

Only one participant found the questionnaire difficult for FM beginners.

Implications Towards a Research Agenda. In the spirit of Jeffery et al. (2015) and complementing the suggestions from the SWOT analysis in Gleirscher et al. (2019), we want to make another step in setting out an agenda for future FM research.

To address scalability, we need more research on how compositional methods (e.g. automated assume-guarantee reasoning, Cofer et al., 2012; automated assertion checking, Leino, 2017) can be better leveraged in practical settings.

To address skills and education, we need an enhanced and up-to-date FM body of knowledge (FMBoK; Oliveira et al., 2018). From his survey of “FMs courses in European higher education”, Oliveira (2004) observes that (i) “model-oriented specification”, “formalising distribution, concurrency and mobility”, and “logical foundations of formal methods” proved to be the topic areas most frequently taught by FM lecturers, and (ii) Z, B, SML, CSP, and Haskell proved to be the most popular formal notations and languages taught in these courses. A comparison of the current state with Oliveira’s observations can help to evaluate and revise current FM curricula (e.g. for undergraduate SE as suggested in Davis et al., 2013) and to derive recommendations for improved FM courses fostering good modelling, composition, and refinement skills in SE practice.

To address controllable abstractions, we need semantics workbenches for underpinning domain-specific languages with formal semantics. We believe that further steps in theory integration and unification (Gleirscher et al., 2019) can help establish proof hierarchies and, hence, reusability and proof transfer.

To address process compatibility, we need more research in continuous reasoning (e.g. Chudnov et al., 2018; O’Hearn, 2018), a revival of activities, possibly even regulations, in tool integration and model data interchange, and guidance on how to update engineering development processes.

To address reputation, we need to provide more incentives for practitioners to use FMs and take recent progress in FM research into account when changing current software processes, policies, regulations, and standards. This includes convincing practitioners to invest in the support of large-scale studies for monitoring FM use in industry. Cost-savings analyses of FM applications (e.g. Jeffery et al., 2015) supported by strong empirical designs (i.e., controlled field experiments) can help to collect the necessary evidence for decision making, successful knowledge transfer, and for implementing this vision.

This survey underpins and enhances the analysis of strengths and weaknesses of FMs in Gleirscher et al. (2019) and can be a guide (1) for consulting and managing practitioners when considering the introduction of FMs into an engineering organisation, (2) for research managers when shaping a grant programme for FM experimentation and transfer, and (3) for associate editors when organising a journal special section on applied FM research.

Future Work. Our survey is another important step in the research of effectively applying FM-based technologies in practice. To put it in the words of one of our participants: “[A] closed questionnaire is just a start.” Hence, we aim at a follow-up study (i) to find out which particular FM (and tool) is used in which domain for which particular purpose and role (e.g. was SMT solving used for model checking in certification or for task scheduling at run-time?), (ii) to measure where particular techniques work well (e.g. which types of formal contracts work well in control software requirements management in a DO-178C context?), (iii) to measure key indicators for successful use of FMs, (iv) to identify management techniques needed to accommodate the changes in working practices, and, finally, (v) to provide guidance to future projects wishing to adopt FMs. In a next survey, we would like to ask about typical FM benefits and about suggestions for barrier mitigation (Davis et al., 2013), and to pose more specific questions on scalability and useful abstraction, on the geographical (see footnote 9) and educational background, and for conceptual alignment. Further analysis of obstacles, benefits, and usage intent could also benefit from a more fine-grained distinction between FMs directly applied to program code and FMs focusing on more abstract models. We would also like to change from 3-level to 5-level Likert-type scales to receive more fine-grained responses. Our research design accounts for repeatability and hence allows us to conduct a longitudinal study.

The research design, and even our current data set, allows the derivation of the usage intent (UFMi) for each FM class, application domain, and obstacle. These UFMi values could be used to analyse whether a particular FM might be (1) underused (i.e., domains with an increased usage intent indicate a potential for more applicability) or (2) oversold (i.e., domains with a lower usage intent and where obstacles are perceived as being particularly tough and, hence, FMs as being less effective).

⁹ According to https://en.wikipedia.org/wiki/United_Nations_geoscheme.
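The underused/oversold distinction described above could be operationalised roughly as follows. This is only a minimal sketch: the class `DomainSummary`, the function `classify`, and both thresholds are hypothetical and not part of the survey's analysis, and the example values are illustrative rather than survey data.

```python
from dataclasses import dataclass


@dataclass
class DomainSummary:
    """Per-domain aggregates; field names are hypothetical."""
    name: str
    increased_intent: float  # share of respondents intending to use FMs more often (0..1)
    tough_share: float       # share of "tough" obstacle ratings in this domain (0..1)


def classify(d: DomainSummary,
             intent_threshold: float = 0.5,
             tough_threshold: float = 0.5) -> str:
    """Label a domain following the two cases named in the text.

    Both thresholds are assumptions chosen for illustration only.
    """
    if d.increased_intent >= intent_threshold:
        # Increased usage intent suggests potential for more applicability.
        return "potentially underused"
    if d.tough_share >= tough_threshold:
        # Lower intent combined with obstacles perceived as particularly tough.
        return "potentially oversold"
    return "inconclusive"


# Illustrative values only, not survey results.
examples = [
    DomainSummary("domain A", increased_intent=0.6, tough_share=0.3),
    DomainSummary("domain B", increased_intent=0.2, tough_share=0.7),
    DomainSummary("domain C", increased_intent=0.3, tough_share=0.2),
]
labels = {d.name: classify(d) for d in examples}
```

Any real analysis would need thresholds justified by the data, for instance relative to the overall sample rather than fixed at one half.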

Acknowledgements. It is our pleasure to thank all survey participants for their time and their valuable responses, and all channel moderators for forwarding our postings. We are much obliged to Jim Woodcock, who has led previous studies in this direction and helped us to critically reflect on our work and relate it to existing evidence. He connected us with John Fitzgerald, who made his paper copy of Austin and Parkin (1993) available to us so that we were able to complete our investigation. We are grateful to John Fitzgerald and also to John McDermid for helpful feedback and for encouraging us to do further research in this direction. We would like to express our sincere gratitude to Krzysztof Brzezinski, Louis Brabant, and Emmanuel Eze for pointing us to several related works.

References

Aichernig, Bernhard K. and Tom Maibaum, eds. (Nov. 18, 2003). Formal Methods at the Crossroads. From Panacea to Foundational Support. Springer Berlin Heidelberg. isbn: 3-540-20527-6.
ATOMICO (2019). The State of European Tech 2019. Section 6.4. url: https://web.archive.org/web/20191220234928/http://2019.stateofeuropeantech.com/chapter/people/article/strong-talent-base/.
Austin, Stephen and Graeme Parkin (Mar. 1993). Formal methods: A survey. Tech. rep. Teddington, Middlesex, UK: National Physical Laboratory.
Barroca, Leonor M. and John A. McDermid (1992). “Formal methods: Use and relevance for the development of safety-critical systems”. In: Comp. J. 35.6, pp. 579–99. doi: 10.1093/comjnl/35.6.579.
Basili, Victor R. (July 1, 1985). Quantitative evaluation of software methodology. Tech. rep. TR-1519. University of Maryland. url: https://drum.lib.umd.edu/bitstream/handle/1903/7520/Quantitative+Evaluation.pdf?sequence=1 (visited on 05/30/2019).
Bicarregui, J. C. et al. (2009). “Industrial Practice in Formal Methods: A Review”. In: FM 2009: Formal Methods. Ed. by Ana Cavalcanti and Dennis R. Dams. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 810–813. isbn: 978-3-642-05089-3.
Biernacki, Patrick and Dan Waldorf (1981). “Snowball Sampling: Problems and Techniques of Chain Referral Sampling”. In: Sociological Methods & Research 10.2, pp. 141–163. doi: 10.1177/004912418101000205.
Bjorner, D. (1987). “On the Use of Formal Methods in Software Development”. In: Proceedings of the 9th International Conference on Software Engineering. ICSE ’87. Monterey, California, USA: IEEE Computer Society Press, pp. 17–29. isbn: 0-89791-216-0. doi: 10.5555/41765.41768. url: http://dl.acm.org/citation.cfm?id=41765.41768.
Bloomfield, RE, PKD Froome, and BQ Monahan (1991). “Formal methods in the production and assessment of safety critical software”. In: Reliability Engineering & System Safety 32.1-2, pp. 51–66.
Boulanger, Jean-Louis (July 11, 2012). Industrial Use of Formal Methods: Formal Verification. Wiley-ISTE. 298 pp. isbn: 9781848213630.
Bowen, J. P. and M. G. Hinchey (July 1995a). “Seven more myths of formal methods”. In: IEEE Software 12.4, pp. 34–41. issn: 0740-7459. doi: 10.1109/52.391826.
Bowen, J. P. and M. G. Hinchey (Apr. 1995b). “Ten commandments of formal methods”. In: Computer 28.4, pp. 56–63. issn: 0018-9162. doi: 10.1109/2.375178.
Bowen, Jonathan P. and Michael G. Hinchey (2005). “Ten Commandments Revisited: A Ten-year Perspective on the Industrial Application of Formal Methods”. In: Proceedings of the 10th International Workshop on Formal Methods for Industrial Critical Systems. FMICS ’05. Lisbon, Portugal: ACM, pp. 8–16. isbn: 1-59593-148-1. doi: 10.1145/1081180.1081183.
Campbell, Michael J. and Martin J. Gardner (1988). “Calculating confidence intervals for some non-parametric analyses”. In: British Medical Journal 296, p. 1454.
Charette, Robert N. (June 2018). Fiat Chrysler Is Being Sued Over a Software Flaw. IEEE. url: https://web.archive.org/web/20180629231601/https://spectrum.ieee.org/riskfactor/computing/software/court-allows-lawsuit-to-proceed-against-fiat-chrysler-over-software-flaw.
Chudnov, Andrey et al. (2018). “Continuous Formal Verification of Amazon s2n”. In: Computer Aided Verification. Springer International Publishing, pp. 430–446. doi: 10.1007/978-3-319-96142-2_26.
Cofer, Darren D. et al. (2012). “Compositional Verification of Architectural Models”. In: NASA Formal Methods - 4th International Symposium, NFM 2012, Norfolk, VA, USA, April 3-5, 2012. Proceedings, pp. 126–140. doi: 10.1007/978-3-642-28891-3_13.

Craigen, D., S. Gerhart, and T. Ralston (Feb. 1995). “Formal methods reality check: industrial usage”. In: IEEE Transactions on Software Engineering 21.2, pp. 90–98. issn: 0098-5589. doi: 10.1109/32.345825.
Craigen, Dan (1995). “Formal methods technology transfer: Impediments and innovation (abstract)”. In: CONCUR ’95: Concurrency Theory: 6th International Conference Philadelphia, PA, USA, August 21–24, 1995 Proceedings. Ed. by Insup Lee and Scott A. Smolka. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 328–332. isbn: 978-3-540-44738-2. doi: 10.1007/3-540-60218-6_24.
Craigen, Dan, Susan Gerhart, and Ted Ralston (1993). “An International Survey of Industrial Applications of Formal Methods”. In: Z User Workshop, London 1992: Proceedings of the Seventh Annual Z User Meeting, London 14–15 December 1992. Ed. by J. P. Bowen and J. E. Nicholls. London: Springer London, pp. 1–5. isbn: 978-1-4471-3556-2. doi: 10.1007/978-1-4471-3556-2_1.
Davis, Fred D. (Sept. 1989). “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology”. In: MIS Quarterly 13.3, pp. 319–40.
Davis, Jennifer A. et al. (2013). “Study on the Barriers to the Industrial Adoption of Formal Methods”. In: Formal Methods for Industrial Critical Systems. Springer Berlin Heidelberg, pp. 63–77. doi: 10.1007/978-3-642-41010-9_5.
Decision Analyst (Aug. 2018). Technology Advisory Board. Decision Analyst, Inc. url: https://web.archive.org/web/20191214142906/https://www.decisionanalyst.com/online/acop/.
Evans Data (2018). Global Developer Population and Demographic Study. Tech. rep. Volume 1. Evans Data Corporation. url: https://web.archive.org/web/20191015060004/https://evansdata.com/reports/viewRelease.php?reportID=9.
Fagan, M. E. (1976). “Design and Code Inspections to reduce Errors in Program Development”. In: IBM Systems Journal 15.3, pp. 182–211. doi: 10.1147/sj.153.0182.
Ferrari, Alessio et al. (Jan. 2019). Survey on Formal Methods and Tools in Railways Technical Report on the activities performed within ASTRail, Deliverable D4.1. doi: 10.5281/zenodo.2535023.
Fraser, Martin D., Kuldeep Kumar, and Vijay K. Vaishnavi (1994). “Strategies for incorporating formal specifications in software development”. In: Communications of the ACM 37.10, pp. 74–86. doi: 10.1145/194313.194399.
Galloway, Andy J., Trevor J. Cockram, and John A. McDermid (Sept. 1998). “Experiences with the Application of Discrete Formal Methods to the Development of Engine Control Software”. In: IFAC Proceedings Volumes 31.32, pp. 49–56. doi: 10.1016/S1474-6670(17)36335-8.
Gerhart, Susan L. and Lawrence Yelowitz (1976). “Observations of Fallibility in Applications of Modern Programming Methodologies”. In: IEEE Trans. Software Eng. 2.3, pp. 195–207. doi: 10.1109/TSE.1976.233815.
Glass, Robert L. (Oct. 28, 2002). Facts and Fallacies of Software Engineering. Pearson Education (US). isbn: 978-0321117427.
Gleirscher, Mario and Diego Marmsoler (Nov. 14, 2018). Electronic Supplementary Material for “Formal Methods: Oversold? Underused? A Survey”. Zenodo. doi: 10.5281/zenodo.1487596.
Gleirscher, Mario and Anne Nyokabi (2018). System Safety Practice: An Interrogation of Practitioners about Their Activities, Challenges, and Views with a Focus on the European Region. Tech. rep. York, UK: Department of , , UK. arXiv: 1812.08452 [cs.SE].
Gleirscher, Mario, Simon Foster, and Jim Woodcock (2019). “New Opportunities for Integrated Formal Methods”. In: ACM Computing Surveys 52 (6), 117:1–117:36. issn: 0360-0300. doi: 10.1145/3357231. arXiv: 1812.10103 [cs.SE].
Gnesi, Stefania and Tiziana Margaria (2013). Formal Methods for Industrial Critical Systems: A Survey of Applications. Wiley-IEEE Press. isbn: 9781118459898.
Google (Aug. 2018). Google Forms Service. Google, Inc. url: http://forms.google.com.
Graydon, P.J. (June 2015). “Formal Assurance Arguments: A Solution in Search of a Problem?” In: Dependable Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on, pp. 517–528. doi: 10.1109/DSN.2015.28.
Hall, Anthony (1990). “Seven Myths of Formal Methods”. In: IEEE Software 7.5, pp. 11–19. doi: 10.1109/52.57887.
Heisel, Maritta (Jan. 1, 1996). “A Pragmatic Approach to Formal Specification”. In: Object-Oriented Behavioral Specifications. Springer. isbn: 978-0-7923-9778-6. doi: 10.1007/978-0-585-27524-6_4.
Heitmeyer, Constance L. (1998). “On the Need for ’Practical’ Formal Methods”. In: Proceedings of the 5th International Symposium on Formal Techniques in Real-Time Fault Tolerant Systems (FTRTFT). Vol. LICS 1486. Lyngby, Denmark, pp. 18–26.
Hinchey, Michael G. and Jonathan P. Bowen (Apr. 1996). “To formalize or not to formalize?” In: IEEE Computer 29.4, pp. 18–19.
Holloway, C. M. (Oct. 1997). “Why engineers should consider formal methods”. In: 16th DASC. AIAA/IEEE Digital Avionics Systems Conference. Reflections to the Future. Proceedings. Vol. 1, pp. 16–22. doi: 10.1109/DASC.1997.635021.

Holloway, C. M. and R. W. Butler (1996). “Impediments to industrial use of formal methods”. In: Computer 29.4, pp. 25–26. doi: 10.1109/MC.1996.488298.
Jackson, Michael (1987). “Power and Limitations of Formal Methods for Software Fabrication”. In: Journal of Information Technology 2.2, pp. 72–76. doi: 10.1177/026839628700200204.
Jeffery, Ross et al. (Apr. 2015). “An empirical research agenda for understanding formal methods productivity”. In: Information and Software Technology 60, pp. 102–112. doi: 10.1016/j.infsof.2014.11.005.
Kaner, Cem and David Pels (Aug. 1998). Bad Software. Wiley. isbn: 978-0471318262.
Kaner, Cem and David Pels (Aug. 2018). Bad Software: Website. url: https://web.archive.org/web/20191210042547/http://badsoftware.com/.
Kitchenham, B., S. Linkman, and D. Law (1997). “DESMET: a methodology for evaluating software engineering methods and tools”. In: Computing & Control Engineering Journal 8.3, pp. 120–126. doi: 10.1049/cce:19970304.
Kitchenham, Barbara A. and Shari L. Pfleeger (2008). “Guide to Advanced Empirical Software Engineering”. In: Springer. Chap. Personal Opinion Surveys, pp. 63–92.
Klein, Gerwin et al. (Sept. 2018). “Formally verified software in the real world”. In: Communications of the ACM 61.10, pp. 68–77. doi: 10.1145/3230627.
Knight, John C. et al. (1997). “Why Are Formal Methods Not Used More Widely?” In: Fourth NASA Formal Methods Workshop, pp. 1–12.
Lai, R. (Jan. 1, 1996). “How could research on testing of communicating systems become more industrially relevant?” In: Springer, pp. 3–13. doi: 10.1007/978-0-387-35062-2_1.
Lai, Richard and Wilfred Leung (1995). “Industrial and Academic Protocol Testing: The Gap and the Means of Convergence”. In: Computer Networks and ISDN Systems 27.4, pp. 537–547. doi: 10.1016/0169-7552(93)E0110-Z.
Leiner, D. J. (2014). SoSci Survey. Tech. rep. url: https://web.archive.org/web/20191202015133/https://www.soscisurvey.de/.
Leino, K. Rustan M. (2017). “Accessible Software Verification with Dafny”. In: IEEE Software 34.6, pp. 94–97. doi: 10.1109/ms.2017.4121212.
Liebel, Grischa et al. (2016). “Model-based engineering in the embedded systems domain: an industrial survey on the state-of-practice”. In: Software & Systems Modeling 17.1, pp. 91–113. doi: 10.1007/s10270-016-0523-3.
Mathieson, Kieran (1991). “Predicting User Intentions: Comparing the Technology Acceptance Model with the Theory of Planned Behavior”. In: Information Systems Research 2.3, pp. 173–191. doi: 10.1287/isre.2.3.173.
Miller, Steven P., Michael W. Whalen, and Darren D. Cofer (Feb. 2010). “Software model checking takes off”. In: Communications of the ACM 53.2, pp. 58–64. doi: 10.1145/1646353.1646372.
Miyoshi, T. and M. Azuma (1993). “An empirical study of evaluating software development environment quality”. In: IEEE Transactions on Software Engineering 19.5, pp. 425–435. doi: 10.1109/32.232010.
Mohagheghi, Parastoo et al. (Jan. 2012). “An empirical study of the state of the practice and acceptance of model-driven engineering in four industrial cases”. In: Empirical Software Engineering 18.1, pp. 89–116. doi: 10.1007/s10664-012-9196-x.
Murphy, G. C., R. J. Walker, and E. L. A. Baniassad (1999). “Evaluating emerging software development technologies: lessons learned from assessing aspect-oriented programming”. In: IEEE Transactions on Software Engineering 25.4, pp. 438–455. doi: 10.1109/32.799936.
Neuendorf, Kimberly A. (Aug. 2016). The Content Analysis Guidebook. 2nd. Sage. isbn: 9781412979474.
Neumann, Peter G. (May 2018). “Risks to the Public”. In: ACM SIGSOFT Software Engineering Notes 43.2, pp. 8–11. doi: 10.1145/3203094.3203102.
O’Hearn, Peter W. (2018). “Continuous Reasoning”. In: Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science - LICS’18. ACM Press. doi: 10.1145/3209108.3209109.
Oliveira, José Nuno (2004). “A Survey of Formal Methods Courses in European Higher Education”. In: Teaching Formal Methods. Springer Berlin Heidelberg, pp. 235–248. doi: 10.1007/978-3-540-30472-2_16.
Oliveira, José Nuno et al. (Aug. 2018). Formal Methods Body of Knowledge (FMBoK). url: https://web.archive.org/web/20200109111534/https://formalmethods.wikia.org/wiki/FMBoK.
Parnas, David Lorge (2010). “Really Rethinking ’Formal Methods’”. In: IEEE Computer 43.1, pp. 28–34. doi: 10.1109/mc.2010.22.
Petersen, Kai et al. (2008). “Systematic Mapping Studies in Software Engineering”. In: 12th International Conference on Evaluation and Assessment in Software Engineering, EASE 2008, University of Bari, Italy, 26-27 June 2008. doi: 10.14236/ewic/ease2008.8.
Pfleeger, S. L. and L. Hatton (1997). “Investigating the influence of formal methods”. In: Computer 30.2, pp. 33–43. doi: 10.1109/2.566148.
Poston, R. M. and M. P. Sexton (1992). “Evaluating and selecting testing tools”. In: IEEE Software 9.3, pp. 33–42. doi: 10.1109/52.136165.

Riemenschneider, C. K., B. C. Hardgrave, and F. D. Davis (2002). “Explaining software developer acceptance of methodologies: a comparison of five theoretical models”. In: IEEE Transactions on Software Engineering 28.12, pp. 1135–1145. doi: 10.1109/tse.2002.1158287.
Robbins, Naomi B. and Richard M. Heiberger (2011). “Plotting Likert and Other Rating Scales”. In: Joint Statistical Meeting, pp. 1058–66.
Rushby, John (1994). “Critical system properties: Survey and taxonomy”. In: Reliability Engineering & System Safety 43.2, pp. 189–219. doi: 10.1016/0951-8320(94)90065-5.
SEI (2010). CMMI for Development. Tech. rep. CMU/SEI-2010-TR-033. CMU.
Shull, Forrest, Janice Singer, and Dag I. K. Sjøberg, eds. (Oct. 2008). Guide to Advanced Empirical Software Engineering. London: Springer.
Snook, C and R Harrison (2001). “Practitioners’ views on the use of formal methods: an industrial survey by structured interview”. In: Information and Software Technology 43.4, pp. 275–283. issn: 0950-5849. doi: 10.1016/S0950-5849(00)00166-X.
Sobel, A.E.K. and M.R. Clarkson (Mar. 2002). “Formal methods application: an empirical tale of software development”. In: IEEE Transactions on Software Engineering 28.3, pp. 308–320. doi: 10.1109/32.991322.
The R Project (Aug. 2018). R. The R Project. url: https://web.archive.org/web/20200109063512/http://www.r-project.org/.
Wikipedia contributors (2018). Software engineering demographics — Wikipedia, The Free Encyclopedia. url: https://en.wikipedia.org/w/index.php?title=Software_engineering_demographics&oldid=823840899 (visited on 01/09/2020).
Wing, J. M. (1990). “A specifier’s introduction to formal methods”. In: Computer 23.9, pp. 8–22. doi: 10.1109/2.58215.
Wohlin, Claes et al. (June 2012). Experimentation in Software Engineering. Springer. isbn: 9783642290435.
Woodcock, Jim et al. (Oct. 2009). “Formal Methods: Practice and Experience”. In: ACM Comput. Surv. 41.4, 19:1–19:36. issn: 0360-0300. doi: 10.1145/1592434.1592436.

A Supplementary Material for “Formal Methods in Dependable Software Engineering: A Survey”

In the following, we provide additional material to the survey, including

1. a more detailed analysis of responses to certain questions (Appendix A.1),
2. further visualizations of the collected data (Appendices A.2 to A.6),
3. more details on our analysis of related work (Table 9 in Appendix A.7),
4. more details on the mapping from studies to challenges (Appendix A.8),
5. a comprehensive table of open answers (Appendix A.9),
6. a copy of the advertisement flyer (Appendix A.10),
7. a screenshot of the Twitter poll (Appendix A.11), and
8. a copy of the whole questionnaire (Appendix A.12).

A.1 Data for Analysis of RQ1 and Estimation of External Validity

Based on the responses to question Q1, Table 8 provides an overview of the categories of respondents referred to in our analysis (particularly, in Section 5.2 and Figure 10), along with the corresponding counts based on the sample from 31.3.2019 with N = 220. For question Q1, by practitioner, we mean “practitioner in dependable or mission-critical software engineering.” To include respondents from all areas of the population or at any study stage, we generalize “practitioner” by the term “user”. Below, for the questions Q5 and Q6, we then refer to “formal method user” and “FM-non-user”.

Table 8: Overview of categories of respondents for Q1, Q3, Q4, Q5, and Q6

| Category of respondents to ... | Description | Count | Fraction |
|---|---|---|---|
| ... Question Q4: | | | |
| Respondents with academic educational background (AEB) | Researchers in academia; bachelor, master, or PhD students; lecturers, teachers, trainers, coaches | 156 | 72% |
| Academics with pure transfer experience | AEB cut with researchers in industry, consulting and managing practitioners, and external consultants; without engineering practitioners in industry and without tool provider stakeholders | 35 | 16% |
| Academics with practical experience | AEB cut with engineering practitioners in industry | 41 | 21% |
| Academics with experience in transfer and practice | AEB cut with researchers in industry, consulting and managing practitioners, and external consultants; cut with engineering practitioners in industry and without tool provider stakeholders | 31 | 14% |
| Practitioners incl. transfer practitioners and industrial consultants, all with academic background | AEB cut with researchers in industry, consulting and managing practitioners, external consultants, and engineering practitioners in industry | 86 | 40% |
| Pure academics | AEB without respondents specifying additional roles | 66 | 31% |
| Respondents not specifying an educational background (NEB) | The complement of AEB | 60 | 27% |
| Respondents not specifying an educational background and being researchers in industry | NEB intersected with researchers in industry | 13 | 6% |
| Consultants | NEB intersected with consulting or managing practitioners and external consultants | 23 | 11% |
| Pure practitioners | NEB cut with engineering practitioners in industry | 23 | 11% |
| Tool provider stakeholders not specifying an educational background | NEB cut with stakeholder of an FM tool or service provider | 5 | 2.3% |
| Non-academic FM non-users | NEB cut with “I have not used FMs in any specific role.” | 19 | 9% |
| Practitioners incl. industrial consultants | Consulting and managing practitioners, external consultants, and engineering practitioners in industry | 108 | 50% |
| FM users (all) | Respondents having used FMs in one or another way and context in the past | 212 | 98% |
| FM users (beyond students) | Excl. “only-in-course” respondents | 202 | 93.5% |
| ... Question Q1: | | | |
| FM non-users | Respondents who chose “I have not used FMs in any academic or industrial domain.” | 36 | 17% |
| ... Question Q3: | | | |
| Respondents with no motivation | Respondents who selected “no” for all motivating factors | 9 | 4% |
| ... Questions Q5 and Q6: | | | |
| Non-practitioners including FM non-users | Respondents who chose “no experience or no knowledge”, “studied in (university) course” or “applied in lab, experiments, case studies” for all FM techniques. This group includes laypersons. | 46 | 21% |
| Sample (N) | All valid responses | 216 | 100% |
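The category descriptions in Table 8 are phrased as set operations (“cut with” read as intersection, “without” as difference, “complement”). The following minimal sketch illustrates how such categories could be derived from multiple-choice role answers. The respondent IDs, the role labels, and all variable names are hypothetical and not taken from the survey data.

```python
# Hypothetical respondents mapped to the roles they ticked (multiple choice).
roles = {
    1: {"researcher in academia"},
    2: {"PhD student", "engineering practitioner in industry"},
    3: {"engineering practitioner in industry"},
    4: {"lecturer", "consulting or managing practitioner"},
    5: {"FM tool provider stakeholder"},
}

# Assumed set of answers that count as an academic educational background.
ACADEMIC = {"researcher in academia", "PhD student", "lecturer"}

all_ids = set(roles)

# AEB: respondents who ticked at least one academic role.
aeb = {rid for rid, selected in roles.items() if selected & ACADEMIC}

# NEB: the complement of AEB within the sample.
neb = all_ids - aeb

# Engineering practitioners in industry, regardless of background.
engineers = {
    rid for rid, selected in roles.items()
    if "engineering practitioner in industry" in selected
}

# "AEB cut with engineering practitioners in industry".
academics_with_practice = aeb & engineers

# "NEB cut with engineering practitioners in industry".
pure_practitioners = neb & engineers

# "AEB without respondents specifying additional roles".
pure_academics = {rid for rid in aeb if roles[rid] <= ACADEMIC}
```

Because respondents can tick several roles, such categories overlap, which is why the counts in the table do not sum to the sample size.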

[Figure 19 consists of two bar charts. Top: estimated geographical reachability of the population via survey channels (categories: DE, DE/IT, DE/US, EU, EU/US, EU/US/AUS, FR, UK). Bottom: geographical distribution of respondents by email address (TLD, company HQ, if provided; categories: AR, AT, BE, DE, DK, ES, FI, FR, IE, IT, JP, NG, NL, SE, UK, US, ZA).]

Figure 19: Geographical analysis of the sample. Legend: top-level domain (TLD), headquarters (HQ)

A.2 Geographical Analysis of the Sample

Figure 19 shows geographical aspects of the sample for this study.

A.3 Usage Intent (UFMi) by Purpose (for Analysis of RQ2)

The comparison in Figure 20 (and in the figures of Appendices A.4 and A.5) contains two columns. The left column describes for each purpose (e.g. specification) how often (e.g. in 2 to 5 separate tasks) respondents have used FMs in the past (UFMp).

The right column describes for each purpose the usage intent (UFMi) depending on how often respondents have used FMs in the past (UFMp). The horizontal bars representing the UFMp frequency categories are listed in descending order by the combined size of the two UFMi groups “more often” and “dnk”. We chose to keep the dnk-answers visible even though they influence the ordering and thus reduce readability. However, in the majority of cases, the largest group of respondents intending to increase FM use in the future appears first or near the top.
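The ordering rule for the bars can be sketched as follows. The three percentages per bar are the values printed in the Clarification panel of Figure 20. Reading the third value as the combined share of “more often” and “dnk” is our assumption, although it is consistent with the order in which the bars appear. The names `clarification` and `order_bars` are hypothetical.

```python
# Percentages per past-usage category (UFMp) for the purpose "Clarification".
# Assumed tuple layout: (no more or less, equally, more often or dnk).
clarification = {
    "1": (7, 34, 59),
    "2-5": (11, 35, 54),
    "> 5": (6, 47, 47),
    "0": (45, 13, 42),
}


def order_bars(panel):
    """Return the UFMp categories sorted in descending order by the
    combined share of the 'more often' and 'dnk' groups (third value)."""
    return sorted(panel, key=lambda category: panel[category][2], reverse=True)


bar_order = order_bars(clarification)
```

Since Python's sort is stable, categories with equal combined shares keep their input order.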

A.4 Code-based vs. Model-based FMs for Assurance vs. Inspection

The data for this comparison is summarised in Figure 21.

[Figure 20 shows one pair of charts per purpose: Clarification, Specification, Inspection, Synthesis, and Assurance (N=216 each). Left (past, UFMp): number of respondents per usage frequency (0, 1, 2–5, > 5 separate tasks). Right (future, UFMi): stacked bars per usage frequency with the shares of the answers “no more”, “less”, “equally”, “more often”, and “dnk”.]

Figure 20: Comparison of past and future usage intent by purpose

[Figure 21 repeats the layout of Figure 20 for two subgroups. Top half: code-based FMs (N=128). Bottom half: model-based FMs (N=114). Each half shows one pair of charts per purpose (Clarification, Specification, Inspection, Synthesis, Assurance), with past usage frequency (0, 1, 2–5, > 5 separate tasks) on the left and the shares of the future-intent answers “no more”, “less”, “equally”, “more often”, and “dnk” on the right.]

Figure 21: Comparison of past and future usage for code-based (top half) and model-based FMs (bottom half) by purpose

A.5 Usage Intent (UFMi) by FM Class (for Analysis of RQ2)

[Figure: one pair of charts per FM class, in the layout of Figure 20. FM classes (N=216 unless noted): predicative, relational, or algebraic specification; modal and temporal logic specification; process models; dynamical systems; abstract interpretation; assertion checking; process calculi; model checking, SMV; constraint solving; generic theorem proving; computational engineering, simulation; symbolic execution; consistency checking (N=169). Left (past): number of respondents per level of past usage (none, course, lab, practiced, practiced > 1). Right (future): shares of the answers “no more”, “less”, “equally”, “more often”, and “dnk” per level of past usage.]

A.6 Data for the Analysis of RQ3

Figure 22 and the following figures in this section show pairs of matrices, so-called “heatmaps”, useful for association analysis between categorical and ordinal variables. The cells in the matrices represent combinations of the scales. Each cell contains the mode and median of the “degree of difficulty” ratings, the proportion of tough ratings, and the actual numbers of data points. Both the colour gradient (red to white) and the solid vertical lines in the cells represent the tough proportions (left = 0 to right = 100%), with the dotted vertical line signifying the 50% margin.
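The content of a single heatmap cell can be illustrated with a small sketch. The ordering of the rating scale (in particular, placing “dnk” last), the use of the lower median, the tie-breaking rule for the mode, and the function name `summarise_cell` are assumptions made for illustration. The ratings in the example are invented, not survey data.

```python
from collections import Counter

# Assumed ordinal scale of the "degree of difficulty" ratings.
SCALE = ["not an issue", "moderate", "tough", "dnk"]


def summarise_cell(ratings):
    """Summarise the ratings in one cell: tough proportion, counts,
    mode, and (lower) median on the assumed ordinal scale."""
    if not ratings:
        raise ValueError("a cell needs at least one rating")
    counts = Counter(ratings)
    # Mode: most frequent level; ties resolve to the lower level in SCALE.
    mode = max(SCALE, key=lambda level: counts[level])
    # Median: middle element after sorting by position on the scale.
    ordered = sorted(ratings, key=SCALE.index)
    median = ordered[(len(ordered) - 1) // 2]
    tough = counts["tough"]
    return {
        "tough_share": round(tough / len(ratings), 2),
        "tough": tough,
        "n": len(ratings),
        "mode": mode,
        "median": median,
    }


# Invented example: 10 ratings for one purpose/challenge combination.
example = ["tough"] * 5 + ["moderate"] * 3 + ["not an issue"] + ["dnk"]
cell = summarise_cell(example)
```

A cell printed as “0.42, 29/69, mod:tough, med:tough” would then correspond to a tough proportion of 0.42 from 29 tough ratings among 69 data points.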

[Figure 22 shows two heatmaps. Top: comparison of challenge difficulty across purposes (users not practicing FMs, past). Bottom: comparison of challenge difficulty across purposes (users practicing FMs, past). Rows: Clarification, Specification, Inspection, Synthesis, Assurance. Columns (challenges, as far as legible): Scalability, Skills and education, Transfer of verification results, Reusability of verification results, Maintainability of verification results, Proper abstractions, Automation or tool support. Each cell states the tough proportion, the number of tough ratings out of all ratings, the mode, and the median. Key: histogram of the tough proportion of all ratings per cell (0 to 1).]

Figure 22: Comparison of challenge difficulty across purposes (UFMp)

Comparison of Challenge Difficulty across Purposes (respondents with no or decreased intent, future)

0.47, 20/43, 0.21, 9/43, 0.19, 8/43, 0.12, 5/43, 0.37, 16/43, 0.3, 13/43, 0.37, 16/43, mod:tough, mod:dnk, mod:dnk, mod:dnk, mod:tough, mod:tough, mod:tough, med:tough med:tough med:tough med:tough med:tough med:tough med:tough Clarification

0.46, 17/37, 0.19, 7/37, 0.19, 7/37, 0.19, 7/37, 0.41, 15/37, 0.35, 13/37, 0.32, 12/37, mod:tough, mod:moderate, mod:dnk, mod:dnk, mod:tough, mod:tough, mod:tough, med:tough med:moderate med:tough med:tough med:tough med:tough med:tough Specification

0.42, 15/36, 0.22, 8/36, 0.14, 5/36, 0.17, 6/36, 0.31, 11/36, 0.22, 8/36, 0.33, 12/36, mod:tough, mod:not an issue, mod:not an issue, mod:dnk, mod:tough, mod:dnk, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:tough med:tough Inspection

0.52, 28/54, 0.2, 11/54, 0.2, 11/54, 0.24, 13/54, 0.33, 18/54, 0.3, 16/54, 0.44, 24/54, mod:tough, mod:moderate, mod:not an issue, mod:dnk, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Synthesis

0.41, 16/39, 0.28, 11/39, 0.18, 7/39, 0.15, 6/39, 0.31, 12/39, 0.28, 11/39, 0.26, 10/39, mod:tough, mod:dnk, mod:dnk, mod:dnk, mod:dnk, mod:dnk, mod:not an issue, med:tough med:tough med:tough med:tough med:tough med:tough med:tough Assurance

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Comparison of Challenge Difficulty across Purposes (respondents with same or increased intent, future)

0.64, 97/151, 0.43, 65/151, 0.35, 53/151, 0.42, 63/151, 0.45, 68/151, 0.34, 52/151, 0.51, 77/151, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Clarification

0.63, 106/167, 0.42, 70/167, 0.34, 57/167, 0.39, 65/167, 0.43, 72/167, 0.33, 55/167, 0.5, 84/167, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Specification

0.66, 105/160, 0.42, 67/160, 0.35, 56/160, 0.4, 64/160, 0.44, 71/160, 0.37, 59/160, 0.51, 81/160, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Inspection

0.63, 81/128, 0.45, 58/128, 0.35, 45/128, 0.4, 51/128, 0.45, 58/128, 0.37, 47/128, 0.46, 59/128, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Synthesis

0.67, 107/159, 0.4, 64/159, 0.33, 53/159, 0.4, 63/159, 0.45, 72/159, 0.34, 54/159, 0.51, 81/159, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:moderate med:tough med:moderate med:tough Assurance

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Figure 23: Comparison of challenge difficulty across purposes (UFMi)

Comparison of Challenge Difficulty across Motivations (users without motivation)

0.5, 56/112, 0.33, 37/112, 0.23, 26/112, 0.29, 32/112, 0.41, 46/112, 0.27, 30/112, 0.43, 48/112, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Regulatory authorities

0.36, 21/59, 0.24, 14/59, 0.24, 14/59, 0.2, 12/59, 0.25, 15/59, 0.22, 13/59, 0.41, 24/59, mod:tough, mod:dnk, mod:dnk, mod:dnk, mod:dnk, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:tough med:tough Customers / scientific community

0.45, 33/73, 0.26, 19/73, 0.27, 20/73, 0.29, 21/73, 0.3, 22/73, 0.3, 22/73, 0.44, 32/73, mod:tough, mod:dnk, mod:dnk, mod:dnk, mod:tough, mod:dnk, mod:tough, med:tough med:tough med:tough med:tough med:tough med:tough med:tough Employer / research collaborators

0.56, 66/117, 0.32, 37/117, 0.26, 30/117, 0.29, 34/117, 0.4, 47/117, 0.34, 40/117, 0.47, 55/117, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:tough med:tough Superior / principal investigator

0.42, 28/66, 0.27, 18/66, 0.27, 18/66, 0.23, 15/66, 0.29, 19/66, 0.27, 18/66, 0.45, 30/66, mod:tough, mod:dnk, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Study or research program

0.42, 21/50, 0.34, 17/50, 0.32, 16/50, 0.18, 9/50, 0.3, 15/50, 0.32, 16/50, 0.44, 22/50, mod:tough, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:tough, med:tough med:tough med:tough med:moderate med:tough med:tough med:tough Own interest

0.56, 63/112, 0.37, 41/112, 0.29, 33/112, 0.35, 39/112, 0.43, 48/112, 0.32, 36/112, 0.54, 60/112, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough On behalf of an FM tool

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Comparison of Challenge Difficulty across Motivations (users with at least moderate motivation)

0.69, 72/104, 0.41, 43/104, 0.39, 41/104, 0.42, 44/104, 0.42, 44/104, 0.4, 42/104, 0.51, 53/104, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:moderate med:moderate med:tough Regulatory authorities

0.68, 107/157, 0.42, 66/157, 0.34, 53/157, 0.41, 64/157, 0.48, 75/157, 0.38, 59/157, 0.49, 77/157, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Customers / scientific community

0.66, 95/143, 0.43, 61/143, 0.33, 47/143, 0.38, 55/143, 0.48, 68/143, 0.35, 50/143, 0.48, 69/143, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Employer / research collaborators

0.63, 62/99, 0.43, 43/99, 0.37, 37/99, 0.42, 42/99, 0.43, 43/99, 0.32, 32/99, 0.46, 46/99, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:moderate med:moderate med:moderate Superior / principal investigator

0.67, 100/150, 0.41, 62/150, 0.33, 49/150, 0.41, 61/150, 0.47, 71/150, 0.36, 54/150, 0.47, 71/150, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Study or research program

0.64, 107/166, 0.38, 63/166, 0.31, 51/166, 0.4, 67/166, 0.45, 75/166, 0.34, 56/166, 0.48, 79/166, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Own interest

0.62, 65/104, 0.38, 39/104, 0.33, 34/104, 0.36, 37/104, 0.4, 42/104, 0.35, 36/104, 0.39, 41/104, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:moderate, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:moderate On behalf of an FM tool

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Figure 24: Comparison of challenge difficulty across motivations

Comparison of Challenge Difficulty across FMs (users not practicing FMs, past)

0.58, 82/142, 0.32, 45/142, 0.28, 40/142, 0.3, 43/142, 0.39, 55/142, 0.34, 48/142, 0.46, 65/142, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:tough med:tough Predicative, relational, or algebraic specification

0.55, 87/159, 0.39, 62/159, 0.33, 52/159, 0.31, 49/159, 0.4, 63/159, 0.37, 59/159, 0.49, 78/159, mod:tough, mod:tough, mod:tough, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:tough med:tough Modal and temporal logic specification

0.53, 75/142, 0.38, 54/142, 0.34, 48/142, 0.35, 49/142, 0.42, 59/142, 0.33, 47/142, 0.42, 59/142, mod:tough, mod:tough, mod:tough, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Process models

0.57, 104/184, 0.37, 68/184, 0.31, 57/184, 0.34, 62/184, 0.4, 74/184, 0.33, 61/184, 0.48, 88/184, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Dynamical systems

0.57, 91/160, 0.34, 54/160, 0.31, 49/160, 0.34, 55/160, 0.44, 71/160, 0.33, 53/160, 0.43, 69/160, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Abstract interpretation

0.59, 73/124, 0.34, 42/124, 0.31, 38/124, 0.34, 42/124, 0.42, 52/124, 0.34, 42/124, 0.44, 54/124, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Assertion checking

0.56, 96/171, 0.36, 61/171, 0.3, 52/171, 0.35, 59/171, 0.4, 68/171, 0.33, 56/171, 0.5, 85/171, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Process calculi

0.53, 80/151, 0.32, 48/151, 0.29, 44/151, 0.3, 46/151, 0.4, 60/151, 0.33, 50/151, 0.46, 70/151, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Model checking, SMV

0.57, 94/164, 0.34, 55/164, 0.27, 45/164, 0.32, 52/164, 0.42, 69/164, 0.35, 57/164, 0.48, 79/164, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Constraint solving

0.55, 93/168, 0.35, 58/168, 0.28, 47/168, 0.3, 51/168, 0.41, 69/168, 0.33, 55/168, 0.45, 75/168, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Generic theorem proving

0.58, 100/172, 0.38, 65/172, 0.29, 50/172, 0.33, 56/172, 0.38, 66/172, 0.32, 55/172, 0.45, 77/172, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Computational engineering, simulation

0.58, 88/153, 0.32, 49/153, 0.28, 43/153, 0.31, 48/153, 0.39, 59/153, 0.32, 49/153, 0.42, 64/153, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Symbolic execution

0.61, 67/109, 0.39, 42/109, 0.29, 32/109, 0.31, 34/109, 0.39, 43/109, 0.31, 34/109, 0.49, 53/109, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Consistency checking

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Comparison of Challenge Difficulty across FMs (users practicing FMs, past)

0.62, 46/74, 0.47, 35/74, 0.36, 27/74, 0.45, 33/74, 0.47, 35/74, 0.32, 24/74, 0.49, 36/74, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Predicative, relational, or algebraic specification

0.72, 41/57, 0.32, 18/57, 0.26, 15/57, 0.47, 27/57, 0.47, 27/57, 0.23, 13/57, 0.4, 23/57, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:moderate, med:tough med:moderate med:moderate med:moderate med:moderate med:moderate med:moderate Modal and temporal logic specification

0.72, 53/74, 0.35, 26/74, 0.26, 19/74, 0.36, 27/74, 0.42, 31/74, 0.34, 25/74, 0.57, 42/74, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:moderate med:moderate med:tough Process models

0.75, 24/32, 0.38, 12/32, 0.31, 10/32, 0.44, 14/32, 0.5, 16/32, 0.34, 11/32, 0.41, 13/32, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:moderate, med:tough med:moderate med:moderate med:moderate med:tough med:moderate med:moderate Dynamical systems

0.66, 37/56, 0.46, 26/56, 0.32, 18/56, 0.38, 21/56, 0.34, 19/56, 0.34, 19/56, 0.57, 32/56, mod:tough, mod:tough, mod:moderate, mod:tough, mod:moderate, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:moderate med:moderate med:moderate med:tough Abstract interpretation

0.6, 55/92, 0.41, 38/92, 0.32, 29/92, 0.37, 34/92, 0.41, 38/92, 0.33, 30/92, 0.51, 47/92, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Assertion checking

0.71, 32/45, 0.42, 19/45, 0.33, 15/45, 0.38, 17/45, 0.49, 22/45, 0.36, 16/45, 0.36, 16/45, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:moderate, med:tough med:moderate med:moderate med:moderate med:tough med:moderate med:moderate Process calculi

0.74, 48/65, 0.49, 32/65, 0.35, 23/65, 0.46, 30/65, 0.46, 30/65, 0.34, 22/65, 0.48, 31/65, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Model checking, SMV

0.65, 34/52, 0.48, 25/52, 0.42, 22/52, 0.46, 24/52, 0.4, 21/52, 0.29, 15/52, 0.42, 22/52, mod:tough, mod:tough, mod:tough, mod:tough, mod:tough, mod:moderate, mod:moderate, med:tough med:tough med:moderate med:tough med:moderate med:moderate med:moderate Constraint solving

0.73, 35/48, 0.46, 22/48, 0.42, 20/48, 0.52, 25/48, 0.44, 21/48, 0.35, 17/48, 0.54, 26/48, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:moderate med:moderate med:tough Generic theorem proving

0.64, 28/44, 0.34, 15/44, 0.39, 17/44, 0.45, 20/44, 0.55, 24/44, 0.39, 17/44, 0.55, 24/44, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Computational engineering, simulation

0.63, 40/63, 0.49, 31/63, 0.38, 24/63, 0.44, 28/63, 0.49, 31/63, 0.37, 23/63, 0.59, 37/63, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Symbolic execution

0.6, 36/60, 0.38, 23/60, 0.32, 19/60, 0.43, 26/60, 0.5, 30/60, 0.3, 18/60, 0.52, 31/60, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Consistency checking

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Figure 25: Comparison of challenge difficulty across FM classes (UFMp)

Comparison of Challenge Difficulty across FMs (users with no or decreased intent, future)

0.48, 30/63, 0.29, 18/63, 0.22, 14/63, 0.29, 18/63, 0.35, 22/63, 0.27, 17/63, 0.33, 21/63, mod:tough, mod:tough, mod:not an issue, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Predicative, relational, or algebraic specification

0.52, 27/52, 0.21, 11/52, 0.27, 14/52, 0.31, 16/52, 0.35, 18/52, 0.31, 16/52, 0.44, 23/52, mod:tough, mod:dnk, mod:tough, mod:dnk, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:tough med:tough Modal or temporal logic specification

0.53, 35/66, 0.27, 18/66, 0.24, 16/66, 0.33, 22/66, 0.41, 27/66, 0.29, 19/66, 0.39, 26/66, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Process models

0.49, 38/78, 0.31, 24/78, 0.24, 19/78, 0.33, 26/78, 0.4, 31/78, 0.27, 21/78, 0.41, 32/78, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Dynamical system models

0.44, 25/57, 0.26, 15/57, 0.23, 13/57, 0.23, 13/57, 0.39, 22/57, 0.25, 14/57, 0.39, 22/57, mod:tough, mod:moderate, mod:dnk, mod:dnk, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Abstract interpretation

0.37, 14/38, 0.29, 11/38, 0.24, 9/38, 0.18, 7/38, 0.29, 11/38, 0.24, 9/38, 0.26, 10/38, mod:tough, mod:tough, mod:not an issue, mod:dnk, mod:dnk, mod:not an issue, mod:dnk, med:tough med:tough med:tough med:tough med:tough med:moderate med:tough Assertion checking

0.53, 38/72, 0.29, 21/72, 0.26, 19/72, 0.26, 19/72, 0.38, 27/72, 0.39, 28/72, 0.46, 33/72, mod:tough, mod:moderate, mod:moderate, mod:dnk, mod:tough, mod:tough, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:tough med:tough Process calculi

0.47, 23/49, 0.18, 9/49, 0.2, 10/49, 0.24, 12/49, 0.33, 16/49, 0.31, 15/49, 0.39, 19/49, mod:tough, mod:moderate, mod:dnk, mod:dnk, mod:tough, mod:tough, mod:tough, med:tough med:moderate med:tough med:tough med:tough med:tough med:tough Model checking, SMV

0.44, 24/55, 0.24, 13/55, 0.18, 10/55, 0.25, 14/55, 0.31, 17/55, 0.31, 17/55, 0.4, 22/55, mod:tough, mod:moderate, mod:moderate, mod:dnk, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Constraint solving

0.57, 38/67, 0.25, 17/67, 0.21, 14/67, 0.21, 14/67, 0.37, 25/67, 0.31, 21/67, 0.42, 28/67, mod:tough, mod:moderate, mod:not an issue, mod:dnk, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Generic theorem proving

0.5, 29/58, 0.26, 15/58, 0.16, 9/58, 0.31, 18/58, 0.38, 22/58, 0.24, 14/58, 0.4, 23/58, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Simulation

0.52, 26/50, 0.2, 10/50, 0.22, 11/50, 0.24, 12/50, 0.4, 20/50, 0.28, 14/50, 0.36, 18/50, mod:tough, mod:moderate, mod:dnk, mod:dnk, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:tough med:tough med:tough med:tough med:tough Symbolic execution

0.48, 19/40, 0.2, 8/40, 0.18, 7/40, 0.28, 11/40, 0.48, 19/40, 0.3, 12/40, 0.4, 16/40, mod:tough, mod:moderate, mod:dnk, mod:dnk, mod:tough, mod:tough, mod:tough, med:tough med:moderate med:tough med:tough med:tough med:tough med:tough Consistency checking

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Comparison of Challenge Difficulty across FMs (users with same or increased intent, future)

0.68, 86/127, 0.42, 53/127, 0.34, 43/127, 0.38, 48/127, 0.46, 58/127, 0.34, 43/127, 0.53, 67/127, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Predicative, relational, or algebraic specification

0.63, 86/137, 0.42, 58/137, 0.3, 41/137, 0.38, 52/137, 0.45, 62/137, 0.34, 46/137, 0.5, 68/137, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Modal or temporal logic specification

0.63, 78/123, 0.41, 50/123, 0.33, 41/123, 0.37, 46/123, 0.44, 54/123, 0.35, 43/123, 0.51, 63/123, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Process models

0.69, 68/99, 0.37, 37/99, 0.33, 33/99, 0.35, 35/99, 0.4, 40/99, 0.39, 39/99, 0.46, 46/99, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:moderate, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Dynamical system models

0.68, 83/122, 0.44, 54/122, 0.32, 39/122, 0.39, 48/122, 0.4, 49/122, 0.36, 44/122, 0.5, 61/122, mod:tough, mod:tough, mod:moderate, mod:tough, mod:moderate, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Abstract interpretation

0.65, 105/161, 0.39, 63/161, 0.3, 49/161, 0.39, 62/161, 0.46, 74/161, 0.35, 57/161, 0.52, 84/161, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Assertion checking

0.68, 67/98, 0.43, 42/98, 0.32, 31/98, 0.41, 40/98, 0.46, 45/98, 0.31, 30/98, 0.5, 49/98, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:moderate med:tough med:moderate med:tough Process calculi

0.66, 97/146, 0.45, 65/146, 0.35, 51/146, 0.41, 60/146, 0.46, 67/146, 0.36, 53/146, 0.51, 74/146, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Model checking, SMV

0.68, 94/138, 0.44, 61/138, 0.33, 46/138, 0.41, 56/138, 0.44, 61/138, 0.34, 47/138, 0.49, 68/138, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Constraint solving

0.59, 66/111, 0.44, 49/111, 0.35, 39/111, 0.45, 50/111, 0.45, 50/111, 0.32, 36/111, 0.51, 57/111, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Generic theorem proving

0.64, 83/130, 0.4, 52/130, 0.37, 48/130, 0.36, 47/130, 0.43, 56/130, 0.37, 48/130, 0.46, 60/130, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Simulation

0.63, 88/140, 0.44, 61/140, 0.33, 46/140, 0.39, 54/140, 0.44, 61/140, 0.34, 48/140, 0.49, 68/140, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Symbolic execution

0.66, 70/106, 0.42, 45/106, 0.33, 35/106, 0.37, 39/106, 0.42, 45/106, 0.29, 31/106, 0.52, 55/106, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Consistency checking

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Figure 26: Comparison of challenge difficulty across FM classes (UFMi)

Comparison of Challenge Difficulty across Roles (Past)

0.63, 66/105, 0.37, 39/105, 0.3, 32/105, 0.38, 40/105, 0.41, 43/105, 0.31, 33/105, 0.48, 50/105, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Bachelor, master, or PhD student

0.57, 28/49, 0.39, 19/49, 0.33, 16/49, 0.39, 19/49, 0.39, 19/49, 0.33, 16/49, 0.47, 23/49, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:moderate med:moderate med:moderate med:tough Consulting or managing practitioner in industry

0.65, 32/49, 0.49, 24/49, 0.35, 17/49, 0.47, 23/49, 0.53, 26/49, 0.39, 19/49, 0.45, 22/49, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:moderate, med:tough med:tough med:moderate med:tough med:tough med:moderate med:moderate External consultant

0.76, 81/107, 0.49, 52/107, 0.41, 44/107, 0.47, 50/107, 0.54, 58/107, 0.38, 41/107, 0.5, 54/107, mod:tough, mod:tough, mod:tough, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Researcher in academia

0.71, 40/56, 0.46, 26/56, 0.38, 21/56, 0.41, 23/56, 0.39, 22/56, 0.34, 19/56, 0.46, 26/56, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:tough med:moderate med:tough Researcher in industry

0.71, 63/89, 0.43, 38/89, 0.38, 34/89, 0.51, 45/89, 0.47, 42/89, 0.39, 35/89, 0.53, 47/89, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:tough med:moderate med:tough Lecturer, teacher, trainer, or coach

0.75, 18/24, 0.38, 9/24, 0.33, 8/24, 0.54, 13/24, 0.38, 9/24, 0.21, 5/24, 0.46, 11/24, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:moderate, mod:tough, med:tough med:moderate med:moderate med:tough med:moderate med:moderate med:moderate Stakeholder of an FM tool or

0.58, 38/66, 0.44, 29/66, 0.26, 17/66, 0.32, 21/66, 0.33, 22/66, 0.29, 19/66, 0.45, 30/66, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Engineering practitioner in industry

0.21, 4/19, 0.11, 2/19, 0.26, 5/19, 0.16, 3/19, 0.32, 6/19, 0.32, 6/19, 0.42, 8/19, mod:dnk, mod:dnk, mod:dnk, mod:dnk, mod:dnk, mod:dnk, mod:tough, med:tough med:dnk med:tough med:tough med:tough med:tough med:tough I have not used FMs in

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Comparison of Challenge Difficulty across Roles (Future)

0.54, 26/48, 0.44, 21/48, 0.35, 17/48, 0.38, 18/48, 0.44, 21/48, 0.33, 16/48, 0.52, 25/48, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Bachelor, master, or PhD student

0.56, 49/87, 0.43, 37/87, 0.33, 29/87, 0.36, 31/87, 0.36, 31/87, 0.3, 26/87, 0.55, 48/87, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Consulting or managing practitioner in industry

0.69, 61/89, 0.46, 41/89, 0.37, 33/89, 0.43, 38/89, 0.46, 41/89, 0.34, 30/89, 0.51, 45/89, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough External consultant

0.72, 64/89, 0.52, 46/89, 0.4, 36/89, 0.45, 40/89, 0.48, 43/89, 0.4, 36/89, 0.52, 46/89, mod:tough, mod:tough, mod:tough, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Lecturer, teacher, trainer, or coach

0.7, 79/113, 0.48, 54/113, 0.35, 39/113, 0.42, 47/113, 0.49, 55/113, 0.36, 41/113, 0.47, 53/113, mod:tough, mod:tough, mod:moderate, mod:tough, mod:tough, mod:moderate, mod:tough, med:tough med:tough med:moderate med:tough med:tough med:moderate med:tough Researcher in academia

0.72, 28/39, 0.54, 21/39, 0.36, 14/39, 0.44, 17/39, 0.38, 15/39, 0.26, 10/39, 0.41, 16/39, mod:tough, mod:tough, mod:moderate, mod:tough, mod:moderate, mod:moderate, mod:moderate, med:tough med:tough med:moderate med:tough med:moderate med:moderate med:moderate Stakeholder of an FM tool or

0.69, 62/90, 0.46, 41/90, 0.36, 32/90, 0.36, 32/90, 0.39, 35/90, 0.36, 32/90, 0.46, 41/90, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Researcher in industry

0.58, 55/95, 0.36, 34/95, 0.32, 30/95, 0.33, 31/95, 0.35, 33/95, 0.35, 33/95, 0.51, 48/95, mod:tough, mod:tough, mod:moderate, mod:moderate, mod:moderate, mod:moderate, mod:tough, med:tough med:tough med:moderate med:moderate med:moderate med:moderate med:tough Engineering practitioner in industry

0.4, 8/20, 0.2, 4/20, 0.2, 4/20, 0.1, 2/20, 0.3, 6/20, 0.3, 6/20, 0.3, 6/20, mod:tough, mod:dnk, mod:not an issue, mod:dnk, mod:dnk, mod:dnk, mod:not an issue, med:tough med:tough med:tough med:tough med:tough med:tough med:tough I do not or would not

Key: histogram of the tough-proportions (share of "tough" among all ratings per cell, scale 0 to 1). Challenges: scalability; transfer, reusability, and maintainability of verification results; automation or tool support; proper abstractions; skills and education.

Figure 27: Comparison of challenge difficulty across roles

[Heat maps, two panels: "Comparison of Challenge Difficulty across Domains (Past)" and "Comparison of Challenge Difficulty across Domains (Future)". Rows: application domains as labelled in the figure (Business information, Platforms, Supportive, Critical infrastructures, Device industry, Military systems, Industrial machinery, Transportation, Process automation, and Other (not in the above)), plus the rows "I have not used FMs in ..." (Past) and "I would not or do not ..." (Future). Columns: the challenges Scalability, Skills and education, Transfer of verification results, Reusability of verification results, Proper abstractions from irrelevant details, Automation or tool support, and Maintainability of verification results. Each cell gives the proportion of "tough" ratings among all ratings in the cell, with the underlying counts, and the mod and med rating (tough, moderate, not an issue, or dnk). Key/Histogram of Toughs: number of cells (0 to 10) over the tough-proportion of all ratings per cell (0 to 1).]

Figure 28: Comparison of challenge difficulty across domains
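Each cell in Figure 28 carries a proportion of "tough" ratings together with a "mod" and a "med" entry, presumably the mode and the median rating. The following Python sketch shows one way such a cell summary could be computed; it is not taken from the authors' analysis scripts. The ordering of the rating scale, the exclusion of "dnk" answers from the median, and the example ratings are assumptions made for illustration; only the 25 "tough" out of 42 ratings is taken from the figure.

```python
from collections import Counter

# Assumed ordinal scale for the median; "dnk" (do not know) is left out of the ordering.
SCALE = ["not an issue", "moderate", "tough"]

def summarise_cell(ratings):
    """Return (tough proportion, mode, median) for one domain/challenge cell."""
    counts = Counter(ratings)
    # Proportion of "tough" among all ratings in the cell, including "dnk".
    proportion = counts["tough"] / len(ratings)
    # Most frequent answer over all ratings.
    mode = counts.most_common(1)[0][0]
    # Median over the ordered answers only (lower median for even counts).
    ranked = sorted(SCALE.index(r) for r in ratings if r in SCALE)
    median = SCALE[ranked[(len(ranked) - 1) // 2]]
    return round(proportion, 2), mode, median

# Hypothetical cell: the split of the 17 non-tough ratings is invented.
cell = ["tough"] * 25 + ["moderate"] * 10 + ["not an issue"] * 4 + ["dnk"] * 3
print(summarise_cell(cell))
```

Whether "dnk" answers should enter the median is a design choice that the figure does not settle; changing the filter in `ranked` is enough to try the alternative.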

[Bar charts, two panels: "Comparison of Challenge Difficulty as rated by Expert Users (> 3 years of experience)" and "Comparison of Challenge Difficulty as rated by non-Expert Users (<= 3 years of experience)". Vertical axis: share of users in the group rating a challenge as 'tough' (0.0 to 1.0). Horizontal axis: the challenges Scalability, Skills and education, Transfer of verification results from models to reality, Reusability of verification results, Proper abstractions from irrelevant details, Automation or tool support, and Maintainability of verification results.]

Figure 29: Comparison of expert and non-expert users by their perception of challenge difficulty
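The split behind Figure 29 can be sketched in a few lines of Python: respondents with more than 3 years of experience count as experts, all others as non-experts, and for each group the share of "tough" ratings of a challenge is computed. The respondent records below are hypothetical and the function name is an assumption; only the 3-year threshold comes from the figure.

```python
# Hypothetical respondents: (years of FM experience, rating of one challenge).
respondents = [
    (0.5, "not an issue"), (1, "moderate"), (2, "tough"), (3, "tough"),
    (5, "tough"), (8, "moderate"), (12, "tough"),
]

def tough_share(rows):
    """Fraction of the given respondents who rated the challenge 'tough'."""
    return sum(rating == "tough" for _, rating in rows) / len(rows)

# Threshold as in Figure 29: experts have more than 3 years of experience.
experts = [r for r in respondents if r[0] > 3]
non_experts = [r for r in respondents if r[0] <= 3]

print(f"experts: {tough_share(experts):.2f}, non-experts: {tough_share(non_experts):.2f}")
```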

A.7 Details on the Systematic Map

Table 9 contains the data we collected from the literature for the systematic map.

Inherent informality of formalisation, difficult to communicate to customers, lack of expressiveness/freedom of expression, lack of methodology
Lack of skills and education, lack of tools, changeability/compatibility with existing process (method culture)
Myth 1: improper abstraction; transfer of v.results; (myth 2-4: skills and directed education;) myth 5: time-budget restrictions; myth 6: improper abstraction, tool support (usability); myth 7: scalability
Proper abstraction (formality gap, neglected environmental assumptions)
Handling of incomplete specs, lack of verification tools, costly training, changing management style
Formality gap, appropriate abstraction and structuring, lack of skills and education
Challenges could be considered obstacles
Rejection of common Hypothesis about FMs. Evaluation of a single FM by means of a case study.
Justifies the FM classification used in our questionnaire
Investigation of state of the art, no comparison with future.
design of applicable FMs
Observations contain obstacles, recommendations their alleviation
taxonomy of FMs; / education, hiring, and tool support.
Formal methods are powerful tools which must be better understood by developers at large.
Overview analysis of limitations
At the present time, formal methods are good for the description of sequential properties of systems, and for communication protocols, although they do not yet address temporal properties and concurrency particularly well.
Identification of three classes of errors: specification errors, systematic construction errors, proved program errors. Discussion of potential causes of errors of each class.
Table 9: Details for the classification of related work Personal opinion, experience Identify 3 main challenges: They evaluate FMs on one larger case study ( 50.000 lines of Objective C Code) where they use Z (550schemas Z to define 280 operations) to develop a CASE tool. and analysis Reference to technology transfer, existing case studies, and tool support Evaluation of methodologies, error classification and analysis, using several examples of fundamental algorithms Development method based on FM myths. Introduction to formal methods Pinpoint fallibility in the use of FMs Bjorner ( 1987 ) Proposes a software Hall ( 1990 ) Present and test FM Wing ( 1990 ) FM adoptionBloomfield et al. ( 1991 ) Literature review, summary, ReferenceGerhart and Yelowitz Motivation( 1976 ) ApproachJackson ( 1987 ) Generic evaluation Expert opinion Result FMs have inherent limitations Aim of method transfer, Relation List of Obstacles Preprint –Formal Methods Use 50 to / benefit evidence / results from / Continued on next page education, transfer of training, not suitable for / / benefit evidence, change / Math, tools, lack of cost resistance Stated as recommendations: scalability, lack of tool support, lack of skills verif. obl. code; resource constraints Lack of method and tool support, lack of skills requirements prototyping, lack of cost Time-budget-restrictions, lack of tool support, lack of integration in current process, scalability (multi-tech) Stated as commandments: lack of tool support (documentation guidelines), proper abstraction, budget restrictions (bad cost-benefit ratio), lack of experts (skills and education); compatibility with current process (lack of quality culture); lack of reuseability VDM users in / Our questionnaire has less focus on representation and methodology and excludes questions on benefits, suggestions. Their sample mainly covers Z the UK. Our analysis of past use is more elaborate. 
Study of FMs in industrial practice Add suggestions on FM introduction Rejection of common Hypothesis about FMs. Evaluation with reference to case studies. Investigation of FM usage; Argumentation and Examples. Identify obstacles and suggest to improve education and standardisation and to perform case studies and define metrics. seriously and successfully by industry Present a two-dimensional framework for assessing strategies for incorporating formal specifications in software development More real links between industry and academia are required, and the successful use of formal methods must be better publicized. 10 Hypotheses on how to improve the success of FM usage Literature survey and questionnaire Analysis of 12 case studies FMs are beginning to be used Discuss benefits and problems of FM adoption, literature study Based on observations (by ourselves and others) on a number of recently completed and in-progress projects Argumentation and mentioning of case studies (although no reference to the studies are given). Lack of acceptance in industry Determine state of the art about the use of FM in practice Lack of FM adoption, improvement of RE Identify maxims that may help in the application of formal methods in an industrial setting. Re-examine Hall’s myths, introduce 7 new Myths. ReferenceAustin and Parkin ( 1993 ) Motivation ApproachCraigen et al. ( 1993 ) ResultFraser et al. 
( 1994 ) RelationBowen and Hinchey ( 1995b ) List of Obstacles Bowen and Hinchey ( 1995a ) Preprint –Formal Methods Use 51 regulation; / education, lack / education, / misconception); / comprehensibility); / Continued on next page proof maintainability; / ectiveness) ff Practicality (being too academic): scalability, skills and education, resistance to change (compat with existing process); budget restrictions Lack of skills of experts, improper abstraction (low useability, low correctness); budget restrictions; compatibility with existing process Wrong skills and education, lack of clarification (bad reputation lack of standards lack of tools; Inadequate tools and examples, inadequate transfer Lack of empirical evidence, lack of skills scalability, proper abstractions; compatibility with existing process, lack of tool support Cost-benefit (fault removal e Integration in existing process (tools, methods, environments, people); proper and useful abstractions (incl. usability tool support (group development, collaborative eng.); evolution and spec budget constraints orts ff ectiveness shown ff Method tries to overcome obstacles The reasons can be considered as obstacles They identify obstacles. However, they to not provide evidence against or in favor of them. Do not measure usage intent but highlight lack of transfer e Reasons can be considered obstacles Empirical evidence for FM e for FMs in design phase but not in more general, however only one system Evaluation of FMs in practice ect on ff culty to re-invest ffi approach to FM practicality, too academic, Education, Resistance to change, di Misconception of Myths, Standards, Tools, and Education are obstacles to use FM in industry. 
List of impediments to FM adoption 8 Reasons why academia needs to do industry research, 7 catalysts, 12 industry relevant factors FM have a positive e code quality state-charts ect Observation Proposes a pragmatic / ff Personal experiencePersonal Opinion Provide 5 reasons: Experience in editing a collection of essays on the industrial application of FM. report, expert opinion Personal opinions; critize research transfer and suggest improvements that might also be helpful for FM transferpractice to Case study and e analysis: Comparison of change requests for FM-based and non-FM-based code fragments as a result of postdelivery problems caused by these fragments. Elementary Field Experiment Evaluation of Z, PVS, and ectiveness of ff Investigate reasons why academic methods are not adapted by industry not widely used Identify reasons for industry’s reluctance to take formal methods to heart. FM adoption(FM) not Position used statement, in experience communication industry Investigate e FMs FMs are not accepted by industry ReferenceLai and Leung ( 1995 ) MotivationHeisel ( 1996 ) Approach In practice FMs are Hinchey and Bowen ( 1996 ) ResultHolloway and Butler ( 1996 ) Lai ( 1996 ) Academic methods RelationPfleeger and Hatton ( 1997 ) List of Obstacles Knight et al. 
( 1997 ) Preprint –Formal Methods Use 52 education), / education, compatibility / / resource constraints / Continued on next page cult to integrate discrete ffi Lack of skills improper abstractions (understandability); transfer of verif.results (models to reality) Tool support still an issue despite some case studies Lack of tool support, lackempirical of evidence; lack of experience (skills budget restrictions Budget (high entry costs, cost-benefit), lack of tool support (automation) Lessons from case studies: scalability and proper abstraction (useability) and continuous abstractions); lack of training; lack of interest from engineers; FMs are perceived as too expensive to apply abstraction, process changeability Inadequate abstraction (di Empirical investigation of the benefits of FMindustry in They investigate the use of FM in industry identifying obstacles identifying obstacles They investigate the applicability of one FM in industry Usability as obstacle Tool support, proper Interesting discussion of FM weaknesses in a very relevant application context ectively used to find errors ff with little or no additional lifecycle costs empirically validate commandments with little conclusion 4 challenges where identified Empirical Study Identify several challenges Empirical Study Model Checking can be e early in the development process for many classes of models Provide a set of guidelines how to make FM more usable PVS, and state-charts) for specifying and aggregating requirements of aircraft engine control systems Personal experience, reference to literature Structured Questionnaire on 62 industrial projects Structured Questionnaire on 62 industrial projects (claimed to be most comprehensive review ever made of formal methods application in industry) 3 Case studies about theof use Model Checking in Industry 5 Structured Interviews Improved quality of software Reference to a few case studies Single case study in the lab Applied discrete FMs (e.g. 
Z, Re-examination of their 1995 commandments 10 Years later No recent study on the industrial use of FMs. State of the art of industrial use of FM (extension of Bicarregui et al., 2009 ) FMs are not widely used in Industry Lack of empirical investigation on the use of FM FM tools are useful but not used in industry since people are not skilled enough Lack of FM adoption, improvement of requirements Bowen and Hinchey ( 2005 ) Bicarregui et al. ( 2009 ) Woodcock et al. ( 2009 ) Miller et al. ( 2010 ) Snook and Harrison ( 2001 ) Heitmeyer ( 1998 ) ReferenceGalloway et al. ( 1998 ) Motivation Approach Result Relation List of Obstacles Preprint –Formal Methods Use 53 transfer / Continued on next page culty to learn, lack of tool ffi qualifications, lack of expressiveness Di empirical evidence, lack of tool support; maintainability (refinement,step-by-step) Lack of training, lack of maturity, broken tool chains, high cost of adoption (tool integration) Education, tools, environment, engineering, certification, misconceptions, scalability, evidence of benefits, cost Lack of tool support, bad reputation, rigid development processes maturity of / quality / FM and FM tool features from subjective assessment of survey respondents; Provides Obstacles Proper abstraction, lack of Also use IVs current and future use; little focus on FM; but MDE is sometimes based on FM, MDE adoption can improve FM adoption; Similar research questions, open-ended interview questions; restricted to one domain and geography Few FM users as participants Analyse maturity of FMs from literature review; evaluate rele- vance skills selection matrix / / ect on FM adoption; ff Provides reasons why FMs are not used and suggestions for improvement Evaluation of PEOU, PU, current, and future use Top barriers: education, tools, work environment; top mitigations: education, tool integration, evidence of FM benefits; occasional non-barriers: evidence on savings, FM complexity, training SotA and challenge 
assessment: FMs not used widely; their data suggests a need of FM adoption; 30%the of responses from industry declare the need for FMsreason as to a adopt MDE; median of responses suggests that MbE adoption has a positive e UML dominates as the MbE language, many FMs and FM-based tools are used, B dominates as the FM; tool ranking ects, and ff Personal / negative e / Argumentation Experience Tool evaluations based on TAM (Riemenschneider et al., 2002 ), interviews, survey Interviews with 31 practitioners from the US aerospace domain Online survey on needs, positive shortcomings of MDE adoption Review of FM literature, FM projects, and FM tools according to DESMET (Kitchenham et al., 1997 ); survey among practitioners used in industry Adoption of MbE in SE Identify barriers to FM adoption and suggestions for barrier mitigation Adoption of MbE (incl. FM) in embedded SE Lack of FM adoption in railway domain ReferenceParnas ( 2010 ) Motivation FMs are not widely Mohagheghi Approachet al. ( 2012 ) Davis et al. ( 2013 ) ResultLiebel et al. ( 2016 ) RelationFerrari et al. ( 2019 ) List of Obstacles Preprint –Formal Methods Use 54 Incompleteness of theorems for abstracting from all hardware features Evidence for scalability contradicting the belief of our responses FMs can scale to realmixed systems, assurance levels are possible ort ff measurement of proof e FM adoption Large case study, ReferenceKlein et al. ( 2018 ) Motivation Approach Result Relation List of Obstacles Preprint –Formal Methods Use 55

A.8 Mapping of Studies to Challenges for RQ3

In addition to Table 6 in Section 5.5, Table 10 provides the complete lists of surveyed studies mapped to the corresponding challenges.

Table 10: Mapping of studies to challenge names (with the number of studies in parentheses)

Challenge Name: Supported by

Scalability (7): Bowen and Hinchey (1995a), Craigen et al. (1995, 1993), Hall (1990), Lai (1996), Lai and Leung (1995), and Miller et al. (2010)

Skills & Education (13): Barroca and McDermid (1992), Bicarregui et al. (2009), Bjorner (1987), Bowen and Hinchey (1995b), Craigen et al. (1995, 1993), Galloway et al. (1998), Hall (1990), Heisel (1996), Hinchey and Bowen (1996), Lai (1996), Lai and Leung (1995), and Snook and Harrison (2001)

Transfer of Proofs (8): Barroca and McDermid (1992), Bloomfield et al. (1991), Craigen et al. (1995, 1993), Hall (1990), Jackson (1987), Parnas (2010), and Snook and Harrison (2001)

Reusability (2): Barroca and McDermid (1992) and Bowen and Hinchey (1995b)

Abstraction (12): Barroca and McDermid (1992), Bowen and Hinchey (1995b), Galloway et al. (1998), Hall (1990), Heisel (1996), Heitmeyer (1998), Jackson (1987), Knight et al. (1997), Lai (1996), Miller et al. (2010), Parnas (2010), and Snook and Harrison (2001)

Tools & Automation (16): Bicarregui et al. (2009), Bjorner (1987), Bloomfield et al. (1991), Bowen and Hinchey (1995a,b), Bowen and Hinchey (2005), Craigen et al. (1995, 1993), Hall (1990), Heitmeyer (1998), Hinchey and Bowen (1996), Knight et al. (1997), Lai (1996), O’Hearn (2018), Parnas (2010), and Woodcock et al. (2009)

Maintainability (3): Barroca and McDermid (1992), Knight et al. (1997), and Parnas (2010)

Resources (11): Bicarregui et al. (2009), Bloomfield et al. (1991), Bowen and Hinchey (1995a,b), Craigen et al. (1995, 1993), Hall (1990), Heisel (1996), Knight et al. (1997), Lai and Leung (1995), and Woodcock et al. (2009)

Process Compatibility (12): Bjorner (1987), Bloomfield et al. (1991), Bowen and Hinchey (1995a,b), Craigen et al. (1995), Heisel (1996), Heitmeyer (1998), Hinchey and Bowen (1996), Knight et al. (1997), Lai (1996), Lai and Leung (1995), and O’Hearn (2018)

Practicality & Reputation (6): Bicarregui et al. (2009), Galloway et al. (1998), Glass (2002), Lai (1996), Lai and Leung (1995), and Parnas (2010)
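The numbers in parentheses in Table 10 are simple counts of supporting studies. As a minimal sketch, the mapping can be held in a Python dictionary from which the counts and the inverse view (study to challenges) follow directly. Only four of the ten challenges are encoded here to keep the example short, combined citations such as "Craigen et al. (1995, 1993)" are split into one entry per study, and the variable names are assumptions.

```python
# Subset of Table 10: challenge name -> supporting studies (one entry per study).
support = {
    "Scalability": [
        "Bowen and Hinchey (1995a)", "Craigen et al. (1993)", "Craigen et al. (1995)",
        "Hall (1990)", "Lai (1996)", "Lai and Leung (1995)", "Miller et al. (2010)",
    ],
    "Reusability": ["Barroca and McDermid (1992)", "Bowen and Hinchey (1995b)"],
    "Maintainability": [
        "Barroca and McDermid (1992)", "Knight et al. (1997)", "Parnas (2010)",
    ],
    "Practicality & Reputation": [
        "Bicarregui et al. (2009)", "Galloway et al. (1998)", "Glass (2002)",
        "Lai (1996)", "Lai and Leung (1995)", "Parnas (2010)",
    ],
}

# Number of supporting studies per challenge (the figures in parentheses).
counts = {challenge: len(studies) for challenge, studies in support.items()}

# Inverse view: which of the encoded challenges does each study support?
by_study = {}
for challenge, studies in support.items():
    for study in studies:
        by_study.setdefault(study, []).append(challenge)

print(counts)
print(by_study["Parnas (2010)"])
```

Keeping one entry per study makes the counts reproducible with `len` and lets the inverse view show which studies support several challenges at once.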

A.9 All Answers to Open Questions

Table 11 provides all answers to the open questions of our questionnaire. We fixed a small number of spelling mistakes in the responses during the preparation of this table.

Table 11: Open answers to the questions Q3, Q6, Q7, Q11, Q12, and Q13; R ... respondent

Survey not suitable for beginners (like me) with the FMs. Abstraction, reduction, exemplification, representation, knowledge, experience, skill and the art of analysing the selection under taxonomies are necessary or in need. General Feedback Further obstacles to FM Budgetary restrictions environment The cost-benefit relationship is not slanted in FM’s favour: substantial improvements in software engineering will not come from application of FM but through process improvement, better training and education, and better control of requirements. Formal methods have uses for verifying critical components but they will not solve the big problems in software engineering. In real operative situations are always hidden obstacles as well as those listed already. Imagine real and nominal definitions were really first order logic. Add operational definition and stay sound. high license costs goes completely against any existing software development process: FM require that you write specifications before implementing. I don’t believe anymore that anybody would do this any time. O1o use Future use for F5o other purp. Future use of F4o other FMs Used FMs for That is pretty all. No No P4o other purp. Other FMs No, that appear approximately complete to me. P3o used Further motivations Actually never cared about applying it, but research positions are in FM so well :-) great. problems, like MCDC coverage to use FMs 16 20 12 28 Completeness, accuracy 25 I just liked logic.
28 Error elimination 14 Science shall be best and 18 Test-Generation for small 1418 Scientific curiosity none none none none none cool / / / / / / / / / / 10 10 09 08 08 08 08 08 08 08 / / / / / / / / / / 3234 2017 2017 28 2017 22 2017 19 2017 21 2017 4 2017 9 2017 R Date2 2017 D3o 8 2017 Preprint –Formal Methods Use 57 Continued on next page I make a clear distinctionthe between mathematically Formal Meth- ods and the proceduremalised based Methods. For- General Feedback culty of ffi market-for- / No customer demand. (apart from notpractice having with them) needed apart fromfind I my Formalised Methods (to strict procedures) getsdone the well task enough. Brokenlemons in software quality market use requiring specialistedge knowl- Q 13 Further obstaclesuse to FM by industry is the ”use”neers of not engi- able to handle abstrac- tion, leading to poor results. Q 12 Future useother for purp. ectively ff I worktriggered’ with ’time systems. These can beelled mod- (semi-formally) e withoutplete a mathematical model. com- our customers Many think of this isadvanced than far they require ... more Q 11 Future useother FMs of Fault andanalysis failure other purp. reverse engineering The main obstacle to adoption ,Z + SPARK Ada programming language, Z, CSP Specification: TLA None None None None Perceived cost and di Q 6 Other FMs used Q 7 Many dialectsUsed of FMsPetri nets for an Automata problem and correctness finished product guarantee certain properties of software and its documentation demonstrate the highest possible integrity of systems confidence versus more traditional means Q 3 Further motivations to use FMs construction of reliable and dependable systems 25 Clearly thinking25 about the 25 Cost and quality of 25 none none no No 2525 They are the only way to B method 25 No. 
None None There are no specific barriers 25 The need to achieve and 25 Improved level of 25 Supporting the design and / / / / / / / / / / 10 10 10 10 10 10 10 10 10 10 / / / / / / / / / / 4547 2017 2017 40 2017 44 2017 39 2017 43 2017 38 2017 37 2017 36 2017 No. Date 35 2017 Preprint –Formal Methods Use 58 cult to use in indus- ffi Continued on next page Interesting set of questions General Feedback FM haveare strong also di potentialstry but practices.good academia Transferring results very intotry indus- practices is challenging. erent FMs, productiv- ff ignorant persons lack of time Not very user friendly,stract too and ab- tween diverse di syntaxity be- Q 13 Further obstaclesuse to FM The software industry’s cultural predisposition to informalproaches ap- (”Butway that’s we’ve always not done it!”) the Most of practitioners don’tdon’t and wantmethods. to So knowneed there for is ”hidden” formal use a ofmethods, formal strong like compilers. Q 12 Future useother for purp. None thatthink of now I can rapidcalculation prototyping tools SPARK-ada Q 11 Future useother FMs of I don’tthis understand question. to make exam ques- tions :-) Reduce costs. other purp. Drive construction of good code. Prop- erties ofproof are a also prop- erties good of good code: minimallogical progression of steps, steps, ... Writing code toable be (even prove- ifexplicitly never yields proven) better code. Unifying Theories of Programming mostly as a researcher and teacher, something for which the above does not allow me to tick a box Q 6 Other FMs used Q 7 Currently Used FMsintegrating a for degree of formal methods into software requirements and design modelling method reliability foundation of principled software engineering systems with much less errors development. things, because of enforcement of a systematic approach mathematics and Formal methods based Software V & V Q 3 Further motivations to use FMs are semantic inconsistencies in code. 
Formal methods acknowledge the necessary semantics and help prevent defects. 2323 Elegance and precision Maintenance cost23 and I VDM, consider Z, it Event-B the 23 Higher level of trust pseudo-occam, 04 Belief it helps construct 23 To reduce23 costs in system simplifies sometimes 23 The beauty23 of research program 26 Combining traditional 26 I love logic and maths. my experience is 06 no 25 ”Bugs”, software defects, / / / / / / / / / / / / / 11 11 11 11 11 11 11 11 11 10 10 11 10 / / / / / / / / / / / / / 6465 2017 66 2017 2017 67 2017 54 2017 5859 2017 2017 6061 2017 2017 49 2017 50 2017 56 2017 No. Date 48 2017 Preprint –Formal Methods Use 59 Continued on next page General Feedback Well chosen questionsnot which leave me guessing. do Relevant to future FM research and practice. Q 13 Further obstaclesuse to FM Engineers will not be able toply ap- formal methods. Aligning academicwith industrial research need.tool developers to invest in Getting FM. Lack of tools and widemethods range in of existence. Q 12 Future useother for purp. Inteaching lectures FMs. and Q 11 Future useother FMs of UMLmal withdependent for- tailoring. application- other purp. Q 6 Other FMs used Q 7 The TRIO Used FMsspecification for language erentiation ff *whole* process automation engineering should have the same mathematical underpinning as regular engineering (I majored in engineering) companies using ”traditional” means fault-free software most systems best practice that I have solid evidence perform correctly under all scenarios. 
Table (continued): open answers of respondents No. 68 to 221. Table columns: No.; Date; Q 3 Further motivations to use FMs; Q 6 Other FMs used; Q 7 Used FMs for other purp.; Q 11 Future use of other FMs; Q 12 Future use for other purp.; Q 13 Further obstacles to FM use; General Feedback. Answers are listed per table page, without row or column assignment.

Respondents No. 68, 71, 72, 73, 76, 77, 79, 80, 83, 84, 85, 87, 90, 91, 93, 94, 96, 97 (2017 to 2018), answer fragments:
- high-quality software
- reliability
- productivity of the
- Soundness checking of
- I believe that software
- Thesis - tool verification
- Niche market, outperform
- verify protocols
- Achieving correct,
- no
- None
- impractical for
- Identified as industrial
- Desire to develop systems
- business and research
- I don't know (three times)
- I haven't (twice)
- Academia
- Teaching how to build
- Increasing engineering

Respondents No. 98, 100, 102, 105, 107, 108, 109, 111, 113, 114, 115 (2018):
- Formal Methods are perceived to be expensive to use, and they can be, but this can be offset by the benefits; there does not seem to be much work published on this that could be used to persuade the Customer and the budget holder...
- Looks really interesting and I will do some research from my side to learn a little bit more
- I need to provide assurance that is understandable to those who need to be assured
- Current corporations do not consider FMs valuable during product development, as a consequence, Engineers will not even try to use it
- future technological design methodologies
- Protective Analysis (PAN)
- None
- specification of hybrid communicating systems
- Requirements engineering
- reviewing consistent set of requirements expressed using Z, OBJ, etc
- Independent reviewer of FM specifications
- FermaT Program Transformations
- Z, VDM, AXES
- A candidate supplier offered FMs in a tender response to address an assurance requirement
- See Andy Galloway, Trevor Cockram, and John McDermid. Experiences with the application of discrete formal methods to the development of engine control ...
- Pushing back the boundaries of knowledge. Making programs which are correct by construction.
- critical infrastructures and technology requirements
- No further motivations to use FMs
- Formal is the only way to obtain rigorous verification.
- Strategic facilitation in large enterprises, where formal modelling can reveal gaps in the clients understanding of their own ecosystems.
- As per my interpretation, FM is a consistent a logical way to describe a System
- no (three times)
- none
- Fragments: To increase quality and; reliability; needed; Product /

Respondents No. 117, 118, 122 (2018):
- Closed questionnaire is just a start
- Awareness of commercial potential
- See answer above. We have to publicise some successes in order to lift the veil of ignorance about FM.
- analysis of Systems
- Research on innovative formalisms, foundational research
- I have great difficulty in persuading anyone to take an interest in FM. I represented the UK on a European committee concerned with safety related gas systems. The ignorance and lack of interest in FM was startling - the German representatives were particularly hostile. So I only use FM in special cases with specific customers. There is much prejudice out there against FM.
- No
- Safety and security
- Security and safety analysis
- Requirement engineering
- TRIO, timed Automata, B
- Improve general system reliability
- I have developed various safety-related gas sensors and other measurement devices. I was invited to become an assessor of these systems by a certifying body. I was shocked by what I discovered when I scrutinised the methods and approaches used by the embedded software developers whose work was being certified. I resolved to improve my own knowledge and practice so that I could deal authoritatively with some of the unsatisfactory situations that arose during various certifications.
- To change the way the world produces software

Respondents No. 123, 126, 129, 131, 133, 134, 136, 137, 138, 141 (2018):
- Most tools have many limitations and do not by themselves support wide domains
- Complexity
- System design. Software analysis.
- System-level analysis. Software design.
- Formalising international standards
- The lists essentially contains formal verification approaches. Formal methods are mathematically grounded approaches to design artefacts, e.g. the B method is a formal method to devise software components. I use the B method in my current job.
- We provide customers an FM tool
- Formal methods are the backbone of the science in "Computing Science".
- To model complex phenomena in social sciences
- Improving confidence in software
- Protocol analysis (twice)
- No
- Quality requirements
- Scientific curiosity.
- It is a hard problem.
- Fragment: analyse /

Respondents No. 145, 146, 147, 150, 151, 153, 154, 157, 160, 161, 167, 168 (2018):
- Thank you very much for this survey. It is very constructive and important. It handles most of the issues encountered by any practitioner and user of formal methods.
- I only encountered FMs as a topic in my studies once. They do not play any role in my profession at the moment.
- Sorry for not disclosing our Future Extent of Formal Methods Use (twice)
- I think FM is misplaced in our SE study program, especially it being mandatory. It is very rarely used in industry (for a reason) and there are so many more important things to learn in order to become the Software Engineers of tomorrow (eg. Entrepreneurship).
- None. I am convinced that FMs can only help us from shitty and buggy software.
- Hypes (eg AI) that direct decision makers in other directions
- Time constraints, financial constraints, refactoring efforts
- I am a PhD student using FM for formal verification. I plan on continuing to do this...
- Abstract state Machines, Petri Nets, LTL, CTL, SAT and SMT solvers
- Research
- As testing tools.
- They are fun and solve a "real" problem
- Main part of business model of our startup
- gain confidence in the algorithms we were reasoning about.
- The robustness and preciseness that the formal methods discipline can provide via its specifications and verifications techniques
- Quality
- Interest
- Curiosity
- None
- Fragment: Industry mind set /

Respondents No. 169, 171, 174, 176, 178, 179, 182, 183, 186, 187, 193, 195, 197 (2018):
- Nice questions overall. In those with relative answers (e.g., more / less frequent than before), it would be nice to have my previous answers on the same page (to check what I answered exactly).
- very strange you put Petri nets to Section P2 as a description technique, and process calculi to Section P3 as a reasoning technique. In fact, Petri nets offer much more reasoning techniques than process calculi do
- I've missed questions towards why people do not use formal methods in a given domain. What are the obstacles that they are currently facing?
- Lack of appreciation and understanding from other (CS and engineering) communities
- Software Projects often have a large existing code base. The assumptions that hold in that code base need to be thoroughly analyzed first. Moreover, I am missing automatic mapping from source code to formal logic models / descriptions.
- Changed focus of interest at my work
- The way academia communicates FMs to engineers in practice.
- Requirement falsification (through simulation), dynamic / online assurance (monitoring and reasoning)
- education
- formal and semi-formal methods
- Rely / Guarantee
- Finding synchronization bugs for circuit design
- CSP
- VDM, Rely / Guarantee, Separation Logic
- code generation based on allegories
- mcrl2
- Z
- Karnaugh Maps ?
- Petri Nets
- FMs have been in existence since 1967. they have never made it into the mainstream.
- business in FM
- Strong (potential) guarantees unobtainable by other methods
- FM = only method able to solve the problem
- Teaching FM
- I have developed both VDM and Rely / Guarantee concepts
- model-based testing
- Interest in reliable critical systems
- I believe in doing things right or doing it not at all.
- Interest
- Fragment: Combinations of

Respondents No. 198, 199, 204, 207, 208, 213, 215, 217, 218, 221 (2018 to 2020):
- ok
- Move to management level
- customers ability to make use of them as well in the specification and development process of critical systems
- 1. The tools don't even work. 2. The tools aren't cost effective. 3. We would have to train all the programmers, the QA team, the systems engineering team, and (importantly) the company executives.
- support test case generation and validation
- rather semi-FMs, are more practical and easy to apply
- UML
- Static analyzer e.g. PVS Studio
- To clearly define concepts
- Modelling languages
- Only reasonable way to get my results
- I never felt tempted to use formal methods
- Regulatory authority standards and their application
- Functional safety
- increase confidence in system security
- need for rigorous specification and verification
- State of the art in control engineering and automation

A.10 Copy of the Advertisement Flyer

A.11 Screenshot of the Twitter Poll

A.12 Copy of the Questionnaire

The PDF export of our on-line questionnaire on the next page corresponds to the questionnaire we used for the sample taken until 31.3.2019 with N = 220. Since 26.5.2019, an extended version of the questionnaire had been available online at

https://goo.gl/forms/FnKNQtTmI3A6BekM2.

We crafted this questionnaire using Google Forms (Google, 2018). We use numbered identifiers within each question category: demographic questions are prefixed with a “D”, questions about past FM use (UFMp) with a “P”, questions about future or intended FM use (UFMi) with an “F”, and questions about obstacles with an “O”. Open questions are suffixed with an “o”.
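The identifier scheme can be made concrete with a small parser. The following is only an illustrative sketch in Python, not part of the study's tooling; the names `CATEGORIES`, `QuestionId`, and `parse_question_id` are hypothetical, and the pattern assumes identifiers of the form letter, number, optional "o" (e.g. D3o, P4, O1o).

```python
import re
from typing import NamedTuple

# Prefix-to-category mapping, taken from the naming scheme described above.
CATEGORIES = {
    "D": "demographics",
    "P": "past FM use (UFMp)",
    "F": "future or intended FM use (UFMi)",
    "O": "obstacles",
}


class QuestionId(NamedTuple):
    """Decoded form of a questionnaire identifier (hypothetical helper type)."""
    category: str
    number: int
    is_open: bool


# One category letter, a question number, and an optional "o" for open questions.
_ID_PATTERN = re.compile(r"^([DPFO])(\d+)(o?)$")


def parse_question_id(identifier: str) -> QuestionId:
    """Split an identifier such as 'D3o' into category, number, and open flag."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a questionnaire identifier: {identifier!r}")
    prefix, number, suffix = match.groups()
    return QuestionId(CATEGORIES[prefix], int(number), suffix == "o")
```

For example, `parse_question_id("D3o")` would describe the open follow-up to the third demographic question, and `parse_question_id("O1")` the first closed question about obstacles.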

Use of Formal Methods

Dear participant, thank you for your interest in this 8-10 min survey on the use of formal methods (FMs).

This survey does NOT require previous knowledge in FMs or in their actual application in a practical context. However, this survey targets persons with an educational background in engineering and sciences OR with a practical engineering background in a reasonably critical systems or product domain.

By "FMs", we refer to explicit mathematical models and sound formal logical reasoning about critical properties (such as reliability, safety, availability, data privacy or, more generally, dependability and security) of electrical, electronic, and programmable electronic or software systems in critical application domains. FMs include, for example, theorem proving, model checking, formal contracts, SMT solving, process algebras.

By "use of FMs", we refer to the application of FMs to engineered systems in the context of education, research, and, particularly, the field of industrial practice and by using formal languages together with manual or automated tool-based techniques.

This survey is anonymous. However, you can provide your email address if you are interested in receiving our final results afterwards.

The underlying study is conducted by Mario Gleirscher at University of York and Diego Marmsoler at Technical University of Munich.

* Required

Demographic Questions

1. D1. In which application domain(s) in industry or academia (if any) have you mainly used FMs? * Select all that apply.

- I have not used FMs in any academic or industrial domain.
- Critical infrastructures (e.g. telecom, energy, road/air/naval/rail traffic, smart buildings or cities)
- Process automation (e.g. chemical process plants, power plants, warehouse logistics, production lines)
- Industrial machinery (e.g. stationary robotics, production machines)
- Transportation (e.g. automotive, utility vehicles, naval, aeronautics, train systems, freight logistics, cable cars, mobile robotics, drones/UAVs)
- Device industry (e.g. medical, health-care, semi-conductors, consumer electronics)
- Military systems not in the above domains (e.g. for command, control, surveillance)
- Business information (e.g. database applications, banking, finance, ERP, PLM, web services, cloud apps)
- Platforms (e.g. operating systems, middle-ware, firmware, drivers, database systems, libraries)
- Supportive (e.g. CASE tools, checking or verification tools, CAD/CAM systems)
- Other:

2. D2. How many years of FM experience (including the study of FMs) have you gained? * Mark only one oval.

- I do not have any knowledge of or experience in FMs.
- less than 3 years
- 3 to 7 years
- 8 to 15 years
- 16 to 25 years
- more than 25 years

3. D3. Which have been your motivations (if any) to use FMs? * Mark only one oval per row.

Answer options per row: no motivation; moderate motivation; strong motivation (or requirement)

- Regulatory authorities
- Customers / scientific community
- Employer / research collaborators
- Superior(s) / principal investigator(s)
- Study or research program
- Own (private) interest
- On behalf of an FM tool or service provider

4. D3o. Which have been your further motivations to use FMs (if any)?

Past and Current Use of Formal Methods

The following questions aim at your EXPERIENCE with the use of FMs in your PAST and CURRENT activities and projects.

NOTE: If you are not able to say anything about past or current use, please, choose the corresponding "not yet used...", "no experience...", or "not at all" options and proceed to the next page!

5. P1. In which role(s) have you used FMs? * Select all that apply.

- I have not used FMs in any specific role.
- Engineering practitioner in industry (e.g. programmer)
- Consulting or managing practitioner in industry (e.g. architect, requirements or systems engineer)
- External consultant (e.g. external requirements or systems engineer)
- Researcher in industry
- Researcher in academia
- Lecturer, teacher, trainer, or coach
- Bachelor, master, or PhD student
- Stakeholder of an FM tool or service provider
- Other:

Experience in Formal Methods Use

6. P2. Describe your level of experience with each of the following classes of formal description techniques? * Mark only one oval per row.

Answer options per row: no experience or no knowledge; studied in (university) course; applied in lab, experiments, case studies; applied once in engineering practice; applied several times in engineering practice

- Predicative, relational, or algebraic specification
- Modal and temporal logic specification
- Process models (e.g. Petri nets, Mealy machines, LTS, Markov processes)
- Dynamical systems (i.e. differential equations)

7. P3. Describe your level of experience with each of the following classes of formal reasoning techniques? * Mark only one oval per row.

Answer options per row: no experience or no knowledge; studied in (university) lectures; applied in lab, experiments, case studies; applied once in engineering practice; applied several times in engineering practice

- Abstract interpretation
- Assertion checking (e.g. for pre/post specification, contracts)
- Process calculi (e.g. CSP, CCS, pi, mu, hybrid)
- Model checking, SMV (of e.g. temporal or probabilistic properties)
- Constraint (SAT, SMT) solving (e.g. for static code analysis), optimisation techniques
- Generic (first-order, HOL) theorem proving (using e.g. term rewriting, functional programming)
- Computational engineering, simulation (using e.g. differential calculus, numerical methods)
- Symbolic execution (e.g. scenario testing, model animation)
- Consistency checking (e.g. syntax or bug pattern checking)

8. P3o. List other FMs you have experience with (if any):

Purposes of Formal Methods Use

9. P4. I have mainly used FMs for ... * Mark only one oval per row.

Answer options per row: ... not at all.; ... once.; ... in 2 to 5 separate tasks.; ... in more than 5 separate tasks.

- ... clarification (i.e. explicit description for analyzing a problem)
- ... specification (e.g. contracts, documentation and communication of requirements and design)
- ... inspection (i.e. error detection, e.g. non-conformance checking, model-based testing)
- ... synthesis (e.g. transformation, compilation)
- ... assurance (e.g. error removal, property verification, refinement or equivalence proofs, argumentation)

10. P4o. I have used FMs for other purposes (if any):

Intended Future Use of Formal Methods

The following questions aim at your INTENT to use FMs in your FUTURE activities and projects.

NOTE: Your intent to use FMs will also be interpreted as the "mere possibility of FM usage in the corresponding ways" according to and based on your responses.

Future Extent of Formal Methods Use

11. F1. In which application domain(s) in industry or academia (if any) would (or do) you intend or recommend to use FMs? * Select all that apply.

- I would not or do not intend (or recommend) to use FMs in any academic or industrial domain.
- Critical infrastructures (e.g. telecom, energy, road/air/naval/rail traffic, smart buildings or cities)
- Process automation (e.g. chemical process plants, power plants, warehouse logistics, production lines)
- Industrial machinery (e.g. stationary robotics, production machines)
- Transportation (e.g. automotive, utility vehicles, naval, aeronautics, train systems, freight logistics, cable cars, mobile robotics, drones/UAVs)
- Device industry (e.g. medical, health-care, semi-conductors, consumer electronics)
- Military systems not in the above domains (e.g. for command, control, surveillance)
- Business information (e.g. database applications, banking, finance, ERP, PLM, web services, cloud apps)
- Platforms (e.g. operating systems, middle-ware, firmware, drivers, database systems, libraries)
- Supportive (e.g. CASE tools, checking or verification tools, CAD/CAM systems)
- Other:

12. F2. In which role(s) would (or do) you intend to use FMs? * Select all that apply.

- I do not or would not intend to use FMs in any specific role.
- Engineering practitioner in industry (e.g. programmer, test or verification engineer)
- Consulting or managing practitioner in industry (e.g. architect, requirements or systems engineer)
- External consultant (e.g. external requirements or systems engineer)
- Researcher in industry
- Researcher in academia
- Lecturer, teacher, trainer, or coach
- Bachelor, master, or PhD student
- Stakeholder of an FM tool or service provider
- Other:

13. F3. I (would) intend to use ... * Mark only one oval per row.

Answer options per row: ... no more or not at all.; ... less often than in the past.; ... as often as in the past.; ... more often than in the past.; I don't know.

- ... predicative, relational, or algebraic specification
- ... modal or temporal logic specification
- ... process models (e.g. Petri nets, Mealy machines, LTS, Markov processes)
- ... dynamical system models (i.e. differential equations)

14. F4. I (would) intend to use ... * Mark only one oval per row.

Answer options per row: ... no more or not at all.; ... less often than in the past.; ... as often as in the past.; ... more often than in the past.; I don't know.

- ... abstract interpretation
- ... assertion checking (e.g. for pre/post specification, contracts)
- ... process calculi (e.g. CSP, CCS, pi, mu, hybrid)
- ... model checking, SMV (of e.g. temporal or probabilistic properties)
- ... constraint (SAT, SMT) solving (e.g. for static code analysis), optimisation techniques
- ... generic (first-order, HOL) theorem proving (using e.g. term rewriting, functional programming)
- ... simulation (i.e. computational engineering using e.g. differential calculus, numerical methods)
- ... symbolic execution (e.g. scenario testing, model animation)
- ... consistency checking (e.g. syntax or bug pattern checking)

15. F4o. I (would) intend to use other FMs, semi-FMs (i.e. without formal semantics and proof system), or highly systematic procedure (if any, please, provide some details):

Future Purposes of Formal Methods Use

16. F5. I (would) intend to use FMs for ... * Mark only one oval per row.

Answer options per row: ... no more or not at all.; ... less often than in the past.; ... as often as in the past.; ... more often than in the past.; I don't know.

- ... clarification (i.e. explicit description for analyzing a problem)
- ... specification (e.g. contracts, documentation and communication of requirements and design)
- ... inspection (i.e. error detection, e.g. non-conformance checking, model-based testing)
- ... synthesis (e.g. transformation, compilation)
- ... assurance (e.g. error removal, property verification, refinement or equivalence proofs, argumentation)

17. F5o. I (would) intend to use FMs for other purposes (if any):

Potential Obstacles to the Intended Use of Formal Methods

18. O1. For any potential use of FMs in my future activities and projects, I consider ... * Mark only one oval per row.

Answer options per row: ... not as an issue.; ... as a moderate challenge.; ... as a tough challenge.; I don't know.

- ... scalability (e.g. towards large or heterogeneous systems)
- ... proper (automated) abstractions from irrelevant details
- ... maintainability of verification results (e.g. stable proofs)
- ... reusability of verification results (e.g. parametric proofs)
- ... transfer of verification results from models to reality
- ... automation or tool support (incl. notations, DSLs, IDEs)
- ... skills and education (e.g. methods known and ready to use)

19. O1o. Which further obstacles (if any) would potentially hinder you to use FMs as intended?

Thank you for your participation!

For SurveyCircle users (www.surveycircle.com): The Survey Code is: XXXX-XXXX-XXXX-XXXX

20. Please, feel free to provide us any feedback on the questionnaire or on its topic:

21. If you are interested in our results you can provide us your email address below:
