<<

Thesis no: MSSE-2016-09

Static Code Analysis: A Systematic Literature Review and an Industrial Survey

Islam Elkhalifa & Bilal Ilyas

Faculty of Computing
Blekinge Institute of Technology
SE – 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information:
Authors:
Islam Elkhalifa
E-mail: [email protected]

Bilal Ilyas E-mail: [email protected]

University advisor:
Kai Petersen
Faculty of Computing

Faculty of Computing
Blekinge Institute of Technology
SE – 371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

1

ABSTRACT

Context: Static code analysis is a software verification technique that refers to the process of examining code without executing it in order to capture defects in the code early, avoiding costly fixes later. The lack of realistic empirical evaluations in software engineering has been identified as a major issue limiting the ability of research to impact industry and, in turn, preventing feedback from industry that could improve, guide and orient research. Studies have emphasized rigor and relevance as important criteria for assessing the quality and realism of research: rigor defines how adequately a study has been carried out and reported, while relevance defines the potential impact of the study on industry. Despite the importance of static code analysis techniques and their existence for more than three decades, the empirical evaluations (surveys, systematic literature reviews) in this field are few in number and do not take rigor and relevance into consideration.

Objectives: The aim of this study is to contribute toward bridging the gap between static code analysis research and industry by improving the ability of research to impact industry and vice versa. This study has two main objectives. The first is to develop guidelines for researchers, which will explore static code analysis research to identify the current status, shortcomings, rigor and industrial relevance of research and the reported benefits/limitations of static code analysis techniques, and finally give recommendations to researchers to help make future research more industry oriented. The second is to develop guidelines for practitioners, which will investigate the adoption of static code analysis techniques in industry and identify the benefits/limitations of static code analysis techniques as perceived by industrial professionals. The findings of the survey and the SLR are then cross-analyzed to draw the study conclusions and, finally, to give recommendations to professionals to help them decide which techniques to adopt.

Methods: A sequential exploratory strategy, characterized by the collection and analysis of qualitative data (systematic literature review) followed by the collection and analysis of quantitative data (survey), is used in this research work. To achieve the first study objective, a thorough systematic literature review was conducted following the Kitchenham guidelines. To achieve the second study objective, a questionnaire-based online survey was conducted targeting practitioners working in the software field to collect their responses about the usage of different static code analysis techniques, as well as their benefits and limitations as perceived by these industrial professionals. The quantitative data obtained were subjected to statistical analysis to interpret the data and draw results from it.

Results: In static code analysis research: 1) static analysis tools and inspections have received significantly more attention than other techniques; 2) the benefits and limitations of static code analysis techniques were extracted, and seven recurrent variables used to report them were identified; 3) static code analysis research significantly lacks rigor and relevance, and the reasons behind this have been identified; 4) recommendations were developed outlining how to improve static code analysis research and make it more industry oriented. In static code analysis industry: 1) static analysis tools are widely used, followed by informal reviews, while inspections and walkthroughs are rarely used; 2) the benefits/limitations as perceived by industrial professionals have been identified, along with the influential factors.

Conclusions: The SLR concluded that: 1) techniques which have a formal, well-defined process and process elements receive more research attention and scrutiny, because studies can evaluate their process and process elements; however, this does not necessarily mean that such a technique is better; 2) experiments are widely used as a research method in static code analysis research, but the outcome variables in the majority of the experiments are inconsistent; 3) the use of experiments with student subjects in an academic context contributed significantly to degraded relevance, while the inadequate reporting of validity threats and their mitigation strategies contributed significantly to poor rigor; 4) the benefits/limitations identified by the SLR could not complement the survey findings on benefits/limitations because the rigor and relevance of most of the studies reporting them are weak. The survey concluded that: 1) the adoption of static code analysis techniques in industry is influenced by the software life cycle model, while software product type and company size do not have an influence; 2) the amount of attention a static code analysis technique has received in research does not necessarily influence its adoption in industry, indicating a gap between research and industry; 3) company size, product type, and life cycle model do influence professionals' perceptions of benefits/limitations.

Keywords: Static code analysis, systematic literature review, empirical evaluation, industrial survey.

2

CONTENTS

STATIC CODE ANALYSIS: A SYSTEMATIC LITERATURE REVIEW AND AN INDUSTRIAL SURVEY ...... I
ABSTRACT ...... 2
CONTENTS ...... 3
1 INTRODUCTION ...... 9
2 STATIC CODE ANALYSIS ...... 11
3 RELATED WORK ...... 12
4 RESEARCH METHODOLOGY ...... 14
4.1 RESEARCH MOTIVATION ...... 14
4.2 AIMS AND OBJECTIVES ...... 14
4.3 RESEARCH QUESTIONS ...... 14
4.4 RESEARCH DESIGN ...... 16
4.4.1 Systematic literature review ...... 16
4.4.2 Survey ...... 17
5 SYSTEMATIC LITERATURE REVIEW ...... 19
5.1 THEORY AND METHODOLOGY ...... 19
5.1.1 Objective ...... 19
5.1.2 Inclusion/exclusion criteria ...... 19
5.1.2.1 Inclusion criteria ...... 19
5.1.2.2 Exclusion criteria ...... 19
5.1.3 Search strategy ...... 20
5.1.4 Studies inclusion/exclusion process ...... 24
5.1.5 Kappa analysis ...... 26
5.1.6 Quality assessment criteria ...... 29
5.1.7 Data extraction ...... 35
5.1.8 Data synthesis strategy ...... 35
5.1.9 Validity threats ...... 36
5.1.9.1 Bias ...... 36
5.1.9.2 Internal validity ...... 36
5.1.9.3 External validity ...... 37
5.2 RESULTS ...... 37
5.2.1 State of research in static code analysis techniques ...... 37
5.2.1.1 Static code analysis techniques which received most attention in research ...... 38
5.2.1.2 Type of research practices ...... 38
5.2.1.3 Identifying variables used to investigate benefits and limitations of static code analysis techniques ...... 39
5.2.2 State of rigor and relevance in static code analysis research ...... 41
5.2.2.1 Influence of time on rigor and relevance ...... 43
5.2.3 Benefits and limitations of static analysis techniques reported by researchers ...... 44
5.2.3.1 Inspection ...... 45
5.2.3.2 Static analysis tools ...... 48
5.2.3.3 Informal reviews ...... 53
5.2.3.4 Walkthroughs ...... 53
5.2.4 Benefits and limitations related with different variations in inspection ...... 54
5.2.4.1 Factors that influence inspection process ...... 54
5.2.4.2 Changes to inspection structure ...... 55
5.2.4.3 Support to inspection structure ...... 57
5.2.4.4 Support for re-inspection ...... 57
6 SURVEY ...... 58
6.1 THEORY AND METHODOLOGY ...... 58
6.1.1 Objective ...... 58
6.1.2 Data collection method ...... 58
6.1.3 Sample and population ...... 58
6.1.4 Questionnaire development ...... 59
6.1.5 Questionnaire distribution ...... 59

3

6.1.6 Validity threats ...... 59
6.1.6.1 Internal Validity ...... 60
6.1.6.2 External Validity ...... 60
6.1.6.3 Construct Validity ...... 61
6.1.6.4 Conclusions Validity ...... 61
6.2 RESULTS ...... 61
6.2.1 Demographics ...... 61
6.2.1.1 Information about the respondents ...... 61
6.2.1.2 Information about the organization ...... 63
6.2.2 Static code analysis techniques in practice ...... 65
6.2.2.1 Static code analysis techniques in practice – Global view ...... 65
6.2.2.2 Static code analysis techniques in practice – Company size view ...... 66
6.2.2.3 Static code analysis technologies in practice – Product type view ...... 69
6.2.2.4 Static code analysis techniques in practice – Software life cycle model view ...... 71
6.2.3 Benefits and limitations of different static analysis techniques from industry professionals ...... 75
6.2.3.1 Effectiveness ...... 78
6.2.3.2 Number of false positives ...... 88
6.2.3.3 Fault content ...... 99
6.2.3.4 Cost efficiency ...... 110
6.2.3.5 Ease of use ...... 120
6.2.3.6 Internal code quality ...... 131
6.2.3.7 Product quality ...... 142
7 DISCUSSION AND GUIDELINES ...... 154
7.1 GUIDELINES FOR RESEARCHERS ...... 154
7.1.1 RQ1 – State of static code analysis research ...... 154
7.1.2 RQ2 – Rigor and relevance of static code analysis research ...... 155
7.1.3 RQ3 & RQ4 – Benefits and limitations reported by researchers ...... 156
7.1.3.1 Inspection ...... 157
7.1.3.2 Informal reviews ...... 158
7.1.3.3 Walkthroughs ...... 158
7.1.3.4 Static analysis tools ...... 158
7.1.4 Recommendations for researchers ...... 158
7.2 GUIDELINES FOR PRACTITIONERS ...... 159
7.2.1 RQ5 – Techniques frequently used in industry ...... 160
7.2.2 RQ5.1 – Influential factors on the usage of static code analysis ...... 162
7.2.3 RQ5.2 – Attention in research vs. usage in industry ...... 162
7.2.4 RQ6 – Benefits and limitations as perceived by industry professionals ...... 162
7.2.4.1 Conclusions on effectiveness ...... 162
7.2.4.2 Conclusions on number of false positives ...... 165
7.2.4.3 Conclusions on fault content ...... 168
7.2.4.4 Conclusions on cost efficiency ...... 170
7.2.4.5 Conclusions on ease of use ...... 173
7.2.4.6 Conclusions on internal code quality ...... 176
7.2.4.7 Conclusions on product quality ...... 178
8 CONCLUSIONS ...... 181
9 FUTURE WORK ...... 183
10 REFERENCES ...... 184
11 APPENDIX ...... 191

4

Figure 4.1 Overview of the research design ...... 18
Figure 5.1 Five steps search process proposed by Zhang et al [98] ...... 20
Figure 5.2 Study inclusion/exclusion process ...... 25
Figure 5.3 Number of studies evaluating different static code analysis techniques ...... 38
Figure 5.4 Bubble chart showing the number of studies based on their rigor and relevance score ...... 41
Figure 5.5 Bubble chart showing the number of studies based on their rigor and relevance score ...... 42
Figure 5.6 Average rigor and relevance over time ...... 44
Figure 6.1 Overview of the survey participants ...... 62
Figure 6.2 Distribution of organizational size ...... 63
Figure 6.3 Distribution of product types among the companies of survey respondents ...... 63
Figure 6.4 Distribution of industries among responses ...... 64
Figure 6.5 Distribution of software life cycle models among responses ...... 64
Figure 6.6 Static code analysis techniques in practice – Global view ...... 65
Figure 6.7 Static code analysis techniques abandoned by practitioners ...... 66
Figure 6.8 Usage of static code analysis techniques in companies with less than 50 employees ...... 66
Figure 6.9 Usage of static code analysis techniques in companies with 50–249 employees ...... 67
Figure 6.10 Usage of static analysis techniques in companies with 250–4,449 employees ...... 68
Figure 6.11 Usage of static analysis techniques in companies with more than 4,500 employees ...... 68
Figure 6.12 Usage of static code analysis techniques in companies producing data-dominant software ...... 69
Figure 6.13 Usage of static analysis techniques in companies producing control-domain software ...... 70
Figure 6.14 Usage of static code analysis techniques in companies producing system software ...... 70
Figure 6.15 Usage of static code analysis techniques in companies producing computation-dominant software ...... 71
Figure 6.16 Usage of static code analysis techniques in companies using the ...... 72
Figure 6.17 Usage of static code analysis techniques in companies using the incremental model ...... 72
Figure 6.18 Usage of static code analysis techniques in companies using the incremental model ...... 73
Figure 6.19 Usage of static code analysis techniques in companies using the agile model ...... 74
Figure 6.20 Usage of static analysis techniques in companies using hybrid models dominated by agile practices ...... 74
Figure 6.21 Usage of static analysis techniques in companies using hybrid models dominated by plan-driven practices ...... 75
Figure 6.22 Effectiveness – Global view ...... 80
Figure 6.23 Effectiveness Likert chart – Company size view ...... 82
Figure 6.24 Effectiveness Likert chart – Product type view ...... 84
Figure 6.25 Effectiveness Likert chart – Software life cycle model view ...... 86
Figure 6.26 False positives generated by different static analysis techniques – Global view ...... 90
Figure 6.27 Number of false positives Likert chart – Company size view ...... 92
Figure 6.28 Number of false positives Likert chart – Product type view ...... 94
Figure 6.29 Number of false positives Likert chart – Software life cycle model view ...... 97
Figure 6.30 Fault content – Global view ...... 101
Figure 6.31 Fault content Likert chart – Company size view ...... 102
Figure 6.32 Fault content Likert chart – Product type view ...... 105
Figure 6.33 Fault content Likert chart – Software life cycle model view ...... 107
Figure 6.34 Cost efficiency – Global view ...... 112
Figure 6.35 Cost efficiency Likert chart – Company size view ...... 113
Figure 6.36 Cost efficiency Likert chart – Product type view ...... 115
Figure 6.37 Cost efficiency Likert chart – Software life cycle model view ...... 118
Figure 6.38 Ease of use of static analysis techniques – Global view ...... 122
Figure 6.39 Ease of use Likert chart – Company size view ...... 124
Figure 6.40 Ease of use Likert chart – Product type view ...... 126
Figure 6.41 Ease of use Likert chart – Software life cycle model view ...... 129
Figure 6.42 Internal code quality – Global view ...... 133
Figure 6.43 Internal code quality Likert chart – Company size view ...... 135
Figure 6.44 Internal code quality Likert chart – Product type view ...... 137

5

Figure 6.45 Internal code quality Likert chart – Software life cycle model view ...... 140
Figure 6.46 Product quality – Global view ...... 144
Figure 6.47 Product quality Likert chart – Company size view ...... 146
Figure 6.48 Product quality Likert chart – Product type view ...... 148
Figure 6.49 Product quality Likert chart – Software life cycle model view ...... 151

6

Table 5.1 Publication venues with their respective hosting libraries ...... 21
Table 5.2 Search strings and the number of retrieved studies ...... 23
Table 5.3 Kappa agreement levels ...... 26
Table 5.4 Kappa criteria for the research methods used in studies ...... 26
Table 5.5 Kappa criteria for the static code analysis techniques evaluated in studies ...... 27
Table 5.6 Scoring rubrics for evaluating rigor [74] ...... 29
Table 5.7 Rigor scoring of the studies retrieved from the SLR ...... 30
Table 5.8 Scoring rubrics for evaluating relevance [74] ...... 32
Table 5.9 Relevance scoring of the studies retrieved from the SLR ...... 33
Table 5.10 Design of data extraction forms, attributes and their mapping to the research questions ...... 35
Table 5.11 Research practices used in research for evaluating different static analysis techniques ...... 38
Table 5.12 Variables & measuring criteria used to investigate benefits & limitations of static analysis techniques ...... 40
Table 5.13 Distribution of studies evaluating inspection technique ...... 45
Table 5.14 Outcome of studies (Category A) with respect to variables in inspection ...... 47
Table 5.15 Outcome of studies (Category B1) with respect to variables in inspection ...... 47
Table 5.16 Outcome of studies (Category B2) with respect to variables in inspection ...... 48
Table 5.17 Outcome of studies (Category C) with respect to variables in inspection ...... 48
Table 5.18 Distribution of studies evaluating static analysis tools among themes ...... 49
Table 5.19 Outcome of studies (Category A) with respect to variables in static analysis tools ...... 51
Table 5.20 Outcome of studies (Category B1) with respect to variables in static analysis tools ...... 52
Table 5.21 Outcome of studies (Category B2) with respect to variables in static analysis tools ...... 52
Table 5.22 Outcome of studies (Category C) with respect to variables in static analysis tools ...... 53
Table 6.1 Overview of the survey participants' experience ...... 62
Table 6.2 Overview of participants' experience in ...... 62
Table 6.3 Overview of participants' experience in static code analysis techniques ...... 63
Table 6.4 Definition of the seven variables evaluated in the SLR and in the Survey ...... 76
Table 6.5 Effectiveness - Friedman test statistics ...... 78
Table 6.6 Effectiveness - Wilcoxon signed rank test statistics ...... 79
Table 6.7 Number of false positives - Friedman test statistics ...... 88
Table 6.8 Number of false positives - Wilcoxon signed rank test statistics ...... 89
Table 6.9 Fault content - Friedman test statistics ...... 99
Table 6.10 Fault content - Wilcoxon signed rank test statistics ...... 100
Table 6.11 Cost efficiency - Friedman test statistics ...... 110
Table 6.12 Cost efficiency - Wilcoxon signed rank test statistics ...... 110
Table 6.13 Ease of use - Friedman test statistics ...... 121
Table 6.14 Ease of use - Wilcoxon signed rank test statistics ...... 121
Table 6.15 Internal code quality - Friedman test statistics ...... 132
Table 6.16 Internal code quality - Wilcoxon signed rank test statistics ...... 132
Table 6.17 Product quality - Friedman test statistics ...... 143
Table 6.18 Product quality - Wilcoxon signed rank test statistics ...... 143
Table 7.1 Summary of inspection benefits and limitations based on the rigor and relevance categories ...... 157
Table 7.2 Summary of static analysis tools benefits and limitations based on the rigor and relevance categories ...... 158
Table 7.3 Summary of the usage of static code analysis techniques in industry ...... 161
Table 7.4 Summary of survey findings on the effectiveness of different static code analysis techniques ...... 163
Table 7.5 Similarities & differences between the categories representing the different views on effectiveness ...... 164
Table 7.6 Summary of survey findings on the number of false positives produced by different static code analysis techniques ...... 166
Table 7.7 Similarities & differences between the categories representing the different views on number of false positives ...... 167

7

Table 7.8 Summary of survey findings on the perceived fault content of different static code analysis techniques ...... 168
Table 7.9 Summary of survey findings on the cost efficiency of different static code analysis techniques ...... 170
Table 7.10 Similarities & differences between the categories representing the different views on cost efficiency ...... 171
Table 7.11 Summary of survey findings on the ease of use of different static code analysis techniques ...... 173
Table 7.12 Similarities & differences between the categories representing the different views on ease of use ...... 174
Table 7.13 Summary of survey findings on the internal code quality of different static code analysis techniques ...... 176
Table 7.14 Similarities & differences between the categories representing the different views on internal code quality ...... 177
Table 7.15 Summary of survey findings on the perceived product quality of different static code analysis techniques ...... 178

8

1 INTRODUCTION

Software verification and validation is a vital activity in a software development process. Static code analysis is a software verification technique, which refers to the process of examining the code without executing it in order to capture the defects in the code early, avoiding costly fixes later [94]. Static code analysis has two main approaches: manual and automated. In the manual approach, different techniques are used by personnel to manually inspect the software artefacts [94]. These techniques include inspections, walkthroughs and informal reviews [94]. Several techniques also exist to support the inspection process in the manual approach: for example, different reading techniques are used to read the code document [84], and fault content estimation techniques are used to support re-inspection decisions [33]. In the automated approach, tools are used to automate the code verification process [85].

The lack of realistic empirical evaluations of different technologies in software engineering has been identified as a major issue, limiting the ability of research to impact industry and, hence, resulting in problems for practitioners and researchers [71]. Practitioners looking to adopt technologies from academia lack decision support when it comes to adopting new technologies from research [72]. Researchers looking for an empirical basis on which to refine or build new technologies face difficulties in getting hold of well described, evaluated, and validated studies [74]. To help transfer technologies from academia to industry, research needs to prove that research results, or the technologies resulting from the research, are beneficial for industry [74]. In this regard, empirical evaluations of academic research can provide evidence to motivate practitioners to adopt new software technologies proposed in research [74]. A number of studies [85, 94, 95, 96, 97] present empirical evaluations in the field of static code analysis. Although software inspection has existed since 1976 [112], the studies presenting empirical evaluations in the field are few in number. There is a lack of systematic literature reviews [85] and industrial surveys [96] performed in the area. None of the evaluations focused on comparing all the static code analysis techniques together; rather, the evaluations focused on specific techniques such as static analysis tools [85, 96], reading techniques [95] and manual approaches [94, 97]. In addition, none of the studies took rigor and relevance into account as quality assessment criteria in their empirical evaluations.

The aim of this study is to address the gap in research resulting from the lack of realistic empirical evaluations and to contribute towards bridging the gap between static code analysis research and its industry. This study provides a thorough empirical evaluation considering the various techniques in the field of static code analysis reported in literature, taking the rigor and relevance of the research into consideration.

This study has two main contributions. First, it provides guidelines for researchers, which help them in different ways, as stated below:

• Identify which static code analysis techniques have been fairly evaluated. This will help direct research effort to fill the gap in research, and also tell whether the static code analysis techniques which are fairly evaluated in research are actually used in industry, identifying whether there is a gap between research and its industry.
• Identify the reported benefits/limitations of static code analysis techniques and which variables/measuring criteria were used to report them. Identifying these variables will make future research consistent, ease synthesizing the benefits/limitations of the techniques, and allow us to investigate these variables in industry.
• Score rigor and industrial relevance, revealing the quality of existing static code analysis research. This will also determine whether research results are strong enough to be presented as evidence for industrial practitioners looking to adopt new techniques. In addition, the study will determine the main reasons limiting the rigor and relevance of existing research and help in conducting high-rigor and high-relevance research in the future.

9

• Provide recommendations for researchers outlining how to conduct high-quality, consistent, easy-to-synthesize and industry-oriented research.

Second, it provides guidelines for industry practitioners, which will help them in different ways, such as the following:

• Investigate the adoption of static code analysis techniques in industry and what it depends on. This will provide an overview of industrial practices and tell us whether the techniques which are fairly evaluated in research are actually used in industry.
• Identify the benefits and limitations of different static code analysis techniques from an industrial perspective, and then compare them to the benefits and limitations of a technique reported by researchers, to make facts available for practitioners and facilitate decision making when they decide which technique to adopt.

The remaining part of this report is organized as follows: Chapter 2 discusses the different static code analysis techniques that have been considered for this research. Chapter 3 presents the work related to our research. Chapter 4 presents the aims and objectives, research questions, and research methods and procedures that have been followed to conduct the research. Chapter 5 presents the systematic literature review conducted to identify research relevant to the subject matter. Chapter 6 presents the survey conducted to collect data from industrial practitioners. Chapter 7 analyzes the findings from the systematic literature review and the survey, giving recommendations for researchers and practitioners. Chapter 8 contains the conclusions, and Chapter 9 presents some suggestions for future work.

10

2 STATIC CODE ANALYSIS

Static code analysis is a software verification technique that refers to the process of examining the code without executing it in order to capture the defects in the code early, avoiding costly fixes later [94]. Static code analysis has two main approaches: manual and automated. Manual approaches involve human subjects performing the process of reviewing the code and capturing defects, while in the automated approach, computer-based tools are used to detect defects.

Manual approaches are conducted in both formal and informal manners. Formal reviews follow a formal process that is well defined, structured and regulated. Informal reviews refer to examining software artefacts to detect defects without a prescribed process; they are normally applied during the early stages of the life cycle of the code and to other artefacts, and a two-person team can perform an informal review. Static code analysis tools are computer programs that perform code reviews. These terms are defined more precisely as follows [117]:

• Inspection: Inspection is a well-defined and structured process in which a team of experts inspects a work product using a systematic reading technique in order to detect defects. An inspection consists of six main steps: planning, kick-off, preparation, inspection meeting, rework, and follow-up.
• Informal review: Informal review is a process in which software artefacts are examined by any team member without any prescribed process. Informal reviews are also referred to as peer reviews.
• Walkthrough: A walkthrough is a formal process in which a developer leads members of the development team through a segment of the code and lets the participants raise possible defects in the code.
• Static analysis tools: Static analysis tools are computer programs used for code review purposes. There are many open-source and commercial tools available in the market.
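To illustrate the kind of defects an automated static analysis tool can capture without running the program, consider the following minimal sketch. The code and the tool names (e.g., pylint or flake8, two widely used open-source Python linters) are illustrative assumptions and are not taken from the studies reviewed in this thesis; any comparable tool for another language would serve the same purpose.

    # report_stats.py -- a small, hypothetical module used only for illustration.
    # A static analysis tool inspects this file without executing it and can
    # already report the deliberate defects marked below.

    def average(values, cache=[]):          # mutable default argument: the same list
        cache.append(sum(values))           # is shared across calls (pylint: W0102)
        return sum(values) / len(values)    # a reviewer in an inspection might also
                                            # ask what happens for an empty list

    def print_report(samples):
        avg = average(samples)
        print("average =", avg)
        print("maximum =", maximum)         # 'maximum' is never defined
                                            # (pyflakes/flake8: F821 undefined name)

A manual technique such as an inspection or walkthrough would rely on human readers to spot the same issues, which is why the two approaches are often treated as complementary rather than interchangeable.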

11

3 RELATED WORK

In this section, we present a short overview of the most relevant studies focused on the state-of-the-art and the state-of-the-practice in the field of static code analysis. These studies were captured during the manual search and during the systematic literature review process. In the text below, we highlight the main findings of those studies and how this study differs from them.

In 2002, Aurum et al. [94] published a review paper focusing on the software inspection technique as first proposed by Michael E. Fagan in 1976 in his famous article [112]. In addition, they reviewed several variations of the Fagan inspection process, which have been developed over 25 years to improve the performance of inspection methods. The study overviewed and summarized the structural differences between the Fagan inspection process and the variations of the technique. These structural differences mainly include changes of activities in the preparation or inspection meeting, differences of goals in each stage, team size, and differences in the coordination strategy between team members. The paper also discussed several methods and tools that support the structure of different inspection techniques, e.g., various reading techniques, electronic support tools, support for re-inspection, and different defect estimation techniques.

The study [94] recognized the contributions of research to the general body of knowledge in software inspections, especially in the areas of inspection process, reading techniques and defect estimation techniques. However, it also pointed out a limitation of research in the area: the authors were unable to find any single study or proposal providing a picture of how software inspections are being researched or practiced. The study only focuses on the Fagan inspection and some variations of its process, and on no other static code analysis techniques.

In 2002, the International Software Engineering Research Network (ISERN) and the Fraunhofer Institute for Experimental Software Engineering (IESE) initiated a large online survey in order to investigate the state-of-the-practice in software reviews and to examine review adoption in industry [113]. The survey was conducted in two parts: the first part was conducted in organizations in Germany only [97], and the second part was conducted in organizations around the globe [114]. The main purpose of the survey was to investigate how reviews are carried out in industry and what approaches are used. The survey results provide an insight into how different companies design and conduct reviews. Software reviews come in different shapes depending on the environment in which they are carried out, including the review goals and the development process; the review goals range from early defect detection to better team communication. The results also show that the review approaches are inclined towards non-systematic methods and techniques.

Study [85] (Heckman and Williams, 2011) performed a systematic literature review to evaluate the alert identification techniques utilized in tools used for automated static code analysis. The authors aimed at providing an evidence-based summary to enable selecting alert identification techniques that can provide actionable alerts rather than false ones, increasing the use of tools for code analysis. The limitation of this study is that it only evaluates one technology, namely static analysis tools, and not the other technologies.

Study [95] (Ciolkowski, 2009) evaluated the perspective-based reading technique by aggregating the evidence from different studies evaluating the effectiveness of the technique. The limitation of this study is that it only evaluates one reading technique and no other techniques for static code analysis.

Study [96] (D’Silva et al., 2008) conducted a survey of algorithms that perform automatic static analysis of software to detect programming errors or prove their absence. The three techniques considered are static analysis with abstract domains, model checking, and bounded model checking. The limitation of this study is that it only evaluates one technology, namely static analysis tools, and not the other technologies.

12

Study [97] (Laitenberger et al., 2002) conducted an online survey to investigate the state of the practice in peer reviews, walkthroughs and inspections in German software organizations. The survey reflected which static code analysis techniques are used in industry from different perspectives, such as company size, quality standards used, etc. The limitation of this survey is that the evaluation is performed in industry without referring to and analyzing the literature.

Study [94] focused on Fagan inspection and the changes to the process structure which resulted in new techniques such as N-fold inspection and phased inspection; the study also covered the techniques that support the inspection structure, such as reading techniques and fault content estimation techniques. However, the study does not consider automated static analysis tools. Studies [85, 96] focused on automated techniques and the algorithms employed in the tools to detect defects; however, the manual techniques are not discussed in these studies. Study [95] focused on evaluating two reading techniques, perspective-based and checklist-based reading; however, other manual and automated techniques are not considered. Study [97] covered all the static code analysis technologies. This study is performed in the same field as the related studies and focuses on all the techniques in static code analysis, both manual and automated.

Study [94] performed a literature search; however, the study design is not explained well enough to extract further information. Study [96] performed a survey of the tools available on the market. Studies [85, 95] performed systematic literature reviews. Study [97] used an online survey. This study uses both a systematic literature review and a survey to better realize the research aims from both academic and industrial perspectives.

Study [94] did not reveal what assessment criteria it used to judge and select the primary studies. Study [96] considered the scalability and the ability to automate of the selected tools. Study [95] did not reveal its quality assessment criteria. Study [85] used a quality checklist comprising a set of questions. Study [97] performed a survey in which the opinions of all participants were considered. This study uses rigor and relevance as quality assessment criteria, selected after surveying several quality assessment criteria in the literature.

Study [94] did not reveal what sources have been searched. Study [96] did not perform a database search; rather, it surveyed tools on the market. Study [95] searched ACM Digital Library, IEEE Xplore, Kluwer Online, ScienceDirect, Elsevier, SpringerLink, and Wiley InterScience. Study [85] performed a database search on the following databases: ACM Digital Library, Compendex/Inspec, Computers and Applied Sciences Complete, ISI Web of Knowledge, IEEE Xplore, ScienceDirect, and Springer Link. Study [97] collected opinions from an industrial population. This study complements the automated search by using a hybrid strategy consisting of automated and manual search and backward snowballing.

Studies [85, 94, 97] used descriptive synthesis. Study [96] did not perform data synthesis. Study [95] performed a meta-analysis. This study uses thematic analysis and descriptive synthesis to effectively synthesize the different results.

Comparing the previous studies to our study, none of the previous studies considered the rigor and relevance criteria. Most of the previous studies focus on a single static code analysis technology, while this study evaluates all the technologies in the field of static code analysis and captures the rigor and relevance of the individual studies using the rigor and relevance model [74].

13

4 RESEARCH METHODOLOGY

4.1 Research motivation

Despite the importance of static code analysis techniques and their existence for more than three decades, the empirical evaluations in this field, such as systematic literature reviews and surveys, are few in number and do not take rigor and industrial relevance into account. The limitations of the existing evaluations are explained in detail in Chapter 3.

4.2 Aims and objectives

The aim of this study is to perform a thorough empirical evaluation in the field of static code analysis to improve existing research and make it industry oriented, as well as to improve the ability of industry to impact static code analysis research, contributing towards bridging the gap between static code analysis research and its industry. The evaluation takes rigor and industrial relevance as quality assessment criteria to measure the quality of research. The two main objectives of the study are as follows:

• The first objective is to provide guidelines for researchers. The researchers' guidelines will first explore static code analysis research to identify which static code analysis techniques have been fairly evaluated, their reported benefits/limitations, and which variables/measuring criteria were used to report them; identifying these variables will also allow us to investigate them in industry. Second, the rigor and industrial relevance of existing static code analysis research are measured using the scoring rubrics introduced in study [74]. Finally, the guidelines will provide recommendations on how to conduct high-quality, industry-oriented research.
• The second objective is to provide guidelines for practitioners. The guidelines will first investigate the adoption of static code analysis techniques in industry and the factors that support practitioners in their adoption process. Second, the practitioners' guidelines will identify the benefits and limitations of static code analysis techniques as perceived by industrial professionals and compare them to the benefits/limitations reported by researchers, to possibly generalize and aggregate the findings on benefits and limitations. This will facilitate decision making for practitioners when adopting techniques and ultimately allow feedback from industry to research.

4.3 Research questions

The research questions have been formulated in a way to adequately address the aims and objectives of the study:

• RQ1: What is the state of research in static code analysis?
  o RQ1.1: Which static code analysis techniques received fair attention in research?
  o RQ1.2: What kind of research practices are taking place?
  o RQ1.3: What variables and measuring criteria are used by researchers to report the benefits and limitations of different static code analysis techniques?

The first research question aims at exploring static code analysis research to identify which static code analysis techniques have been fairly evaluated, what kind of research practices are taking place, and what variables/measuring criteria are used to report the benefits/limitations of different static code analysis techniques; identifying these variables will help us investigate them in industry, as we will see in RQ6.2. The first research question partially fulfills the first study objective.

• RQ2: What is the state of rigor and relevance of static code analysis research? And why?

14

The second research question aims at diagnosing the rigor and industrial relevance of existing static code analysis research to identify its current status and whether improvement is needed. If the status of rigor and relevance is good (high rigor and high relevance), the outcome of the studies (reported benefits/limitations) will be used in conjunction with the survey findings on benefits/limitations to provide evidence for practitioners looking to adopt new techniques, facilitating decision making for them. The second research question partially fulfills the first study objective.

• RQ3: What are the benefits and limitations of different static code analysis techniques reported in literature?
• RQ4: What is the strength of evidence (in terms of rigor and relevance) supporting the claimed benefits and limitations of static code analysis?

RQ3 and RQ4 aim at identifying whether the reported benefits and limitations are of a kind that will appeal to industrial professionals looking to adopt new techniques. In other words, do the reported benefits/limitations of different static code analysis techniques originate from studies conducted with a high degree of rigor and in a setting close to industry? In addition, can we compare their findings (reported benefits/limitations) to the benefits/limitations reported by industrial professionals? Do researchers and practitioners agree or disagree regarding the benefits and limitations of different static code analysis techniques? RQ3 and RQ4 partially fulfill the first study objective. They also provide the input needed to be cross-analyzed with RQ6.2 to partially fulfill the second study objective. See Figure 4.1.

• RQ5: Which static code analysis technologies are most frequently used in industry?
  o RQ5.1: Does company size, software product type or software life cycle model influence the usage of static code analysis techniques?
  o RQ5.2: Does the usage of static code analysis techniques relate to the amount of attention they received in research?

RQ5 aims at investigating the adoption of static code analysis techniques in industry by identifying which static code analysis techniques are actually used in industry, and what that depends on. Specifically, we want to know whether the adoption depends on company size, software product type or life cycle model. In addition, we want to know whether the static code analysis techniques that received fair attention in research, as identified by RQ1.1, are actually used in industry. Together, this will give us an idea of whether there is a gap between static code analysis research and its industry. RQ5 has two sub research questions. RQ5 partially fulfills the second study objective.

• RQ6: What are the benefits and limitations of static code analysis techniques as perceived by industrial professionals?
  o RQ6.1: Does the company size, software product type or software life cycle model influence the perceived effectiveness of static code analysis techniques? How do these views compare to each other and to the global view?
  o RQ6.2: How do the benefits and limitations relate to the benefits/limitations and rigor/relevance identified in the SLR?

RQ6 first aims at identifying the benefits/limitations as perceived by industrial professionals and how they relate to the benefits/limitations reported by researchers (RQ3 and RQ4). Second, it aims at identifying whether the practitioners' opinions on benefits/limitations are influenced by company size, software product type or life cycle model. Having these different views on benefits/limitations will allow us to analyze them, see whether they agree or differ, and provide different perspectives on them. Based on these views we can give recommendations for industrial practitioners to facilitate decision making when they are looking to adopt new static code analysis techniques. RQ6 has two sub research questions.

Cross analyzing, i.e., comparing the findings of the SLR research questions (RQ3, RQ4) with the survey research question (RQ6), will also partially fulfill the second study objective.

4.4 Research Design

In Sections 4.2 and 4.3, the authors explained how the research questions adequately address the study objectives. In this section, the research methodology is carefully selected and designed to adequately answer the research questions and fulfill the study objectives.

In order to fulfill the study objectives, the research design is based on a sequential exploratory strategy, i.e., a mixed-methods research design [86] characterized by the collection and analysis of qualitative data followed by the collection and analysis of quantitative data [86].

Our research design is based on multiple empirical research methods: a systematic literature review and a questionnaire-based online survey. The systematic literature review is conducted to collect relevant studies evaluating static code analysis techniques, followed by an evaluation of the state of research according to the rigor and relevance of the studies. After that, a survey is conducted to collect practitioners' opinions, which are subjected to statistical analysis for interpretation; the survey also takes some input from the systematic literature review to achieve the second study objective and provide guidelines for practitioners.

4.4.1 Systematic literature review

To answer research questions RQ1, RQ2, RQ3 and RQ4, to completely achieve the first study objective, and to partially achieve the second study objective by providing the needed input, a systematic literature review will be conducted to collect individual studies evaluating static code analysis techniques. The systematic literature review is chosen over a traditional review because the systematic review is more methodical and thorough; it allows us to systematically integrate the evidence from research in an efficient way [93]. In the SLR, rigor and relevance will be used as quality assessment criteria. After conducting the systematic literature review, the rigor and relevance of the research will be scored according to the rigor and relevance model suggested by Ivarsson and Gorschek [74]. This model will help us measure the quality of the studies evaluating static code analysis techniques: studies will be classified in terms of rigor and relevance using scoring rubrics, and quantification and visualization techniques will be used to analyse the data. This quality assessment of the studies will help us assess the strength of evidence and, hence, reveal the gaps in research in the field of static code analysis.
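As a sketch of how such a score can be computed, and under the assumption that the rubrics in Tables 5.6 and 5.8 follow the original Ivarsson and Gorschek model [74], the rigor of a study is the sum of three aspects (context, study design and validity, each rated weak = 0, medium = 0.5 or strong = 1), while the relevance is the sum of four aspects (subjects, context, scale and research method, each rated as contributing = 1 or not contributing = 0):

    \text{Rigor} = C + D + V, \qquad C, D, V \in \{0, 0.5, 1\}, \qquad 0 \le \text{Rigor} \le 3
    \text{Relevance} = S + Cx + Sc + RM, \qquad S, Cx, Sc, RM \in \{0, 1\}, \qquad 0 \le \text{Relevance} \le 4

For example, a hypothetical controlled experiment with student subjects on a toy program that describes its context and design well but does not discuss validity threats would score a rigor of 1 + 1 + 0 = 2 and a relevance of 0; the bubble charts in Figures 5.4 and 5.5 plot the studies over these two scales.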

The guidelines for researchers, which are the outcome of the first study objective, will be developed directly and completely from the data resulting from the systematic literature review. Also, research questions RQ1, RQ2, RQ3 and RQ4 and their sub research questions will be completely answered through the data resulting from the SLR. Further, the systematic literature review will partially achieve the second study objective, which is developing the guidelines for practitioners, because answering RQ1.1, RQ1.3, RQ2, RQ3, and RQ4 provides the input needed to supplement the survey findings (Sections 7.2.3 and 7.2.4) and finally develop the guidelines for practitioners. See Figure 4.1 for research design details, and see Table 4.1 for the mapping of study objectives to research questions and research methods.

The Kitchenham guidelines will be followed to perform the systematic literature review [87]. Additional guidelines for performing the search [88], for study selection strategies [90], and for analyzing and synthesizing the evidence from studies [89] will also be followed. The main goal of performing the systematic literature review is to evaluate and interpret the research most relevant to our topic area, and to ensure that the review is methodical, repeatable, and thorough. Moreover, it will help us minimize the level of bias that can be prevalent in a traditional literature review [93]. The details of the SLR can be found in Chapter 5.

16

4.4.2 Survey

As mentioned earlier, our research design consists of two main research methods, a systematic literature review and a survey, along with suitable data gathering and analysis approaches. Survey research provides quantitative data about the trends, attitudes, or opinions of a population by studying a sample of that population, and the results from the sample help in generalizing and making claims about the population [86]. It can employ one or a combination of several data gathering techniques, from self-administered questionnaires to interviews, among others [91]. The quantitative data gathered from the survey will be subjected to statistical analysis to acquire its interpretation and answer the corresponding research questions.

An online survey will be performed focusing on practitioners working in the software verification and validation field, in order to collect the data necessary to answer RQ5 and RQ6. The benefits and limitations of different static code analysis technologies identified in the survey (RQ5, RQ6) will be validated against the benefits and limitations reported in the literature (RQ3, RQ4), and the results will be used, if possible, to generalize the findings (see Section 7.2.4). In addition, the techniques used in industry will be identified by RQ5 and then compared to RQ1.1 to see if the techniques that are most evaluated are actually used in industry; this will allow us to identify whether there is a gap between static code analysis research and its industry (see Section 7.2.3 for results). Using the input from the SLR and conducting the survey will completely develop the practitioner guidelines, fulfilling the second study objective; see Figure 4.1.

We selected a survey for this research study because it is most appropriate for gathering data from a suitable number of respondents, which is larger in the case of a questionnaire-based online survey than in the case of interviews [92]. Also, it is easy to apply different statistical approaches to the collected data for analysis purposes, as sketched below. A questionnaire-based online survey will be used as the data collection approach. The questionnaire will be developed by following Kitchenham and Pfleeger's guidelines for survey research in software engineering [92]. The target population will be practitioners from industry who are closely involved with software verification and validation. The questionnaire was published online and remained open for four weeks. The details of the survey can be found in Chapter 6.
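As an illustration of the kind of statistical analysis applied to the Likert-scale responses (the Friedman and Wilcoxon signed-rank tests reported in the tables of Chapter 6), the sketch below compares hypothetical ratings that one group of respondents gave to the four techniques. The data values, variable names and significance level are assumptions made for the example only; SciPy's friedmanchisquare and wilcoxon functions are used.

    # Hypothetical Likert ratings (1 = very low ... 5 = very high) that the same
    # respondents gave to each technique; real data come from the questionnaire.
    from scipy.stats import friedmanchisquare, wilcoxon

    inspection      = [4, 3, 4, 5, 3, 4, 2, 4]
    informal_review = [3, 3, 4, 4, 3, 3, 3, 4]
    walkthrough     = [2, 3, 3, 3, 2, 3, 2, 3]
    static_tools    = [4, 4, 5, 4, 4, 5, 3, 4]

    # Friedman test: do the related samples differ anywhere at all?
    stat, p = friedmanchisquare(inspection, informal_review, walkthrough, static_tools)
    print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")

    # If the Friedman test is significant, pairwise Wilcoxon signed-rank tests
    # locate which pairs of techniques differ.
    if p < 0.05:
        w_stat, w_p = wilcoxon(static_tools, walkthrough)
        print(f"Wilcoxon (tools vs. walkthroughs): W = {w_stat:.1f}, p = {w_p:.3f}")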

17

Figure 4.1 Overview of the research design

18

5 SYSTEMATIC LITERATURE REVIEW

5.1 Theory and Methodology

5.1.1 Objective

Kitchenham et al. [87] emphasized the importance of adopting an evidence-based approach in software engineering research. A systematic literature review is a way of synthesizing data and deploying an evidence-based approach in software engineering. The objectives of performing the SLR are to:

• Collect the evidence in the literature reporting benefits and limitations of different technologies in the field of static code analysis. Evidence here is defined as any peer-reviewed research, including conference articles, workshop proceedings and journal articles.
• Aggregate the evidence to support evidence-based guidelines for researchers and practitioners in the field.

The reason for performing a systematic literature review instead of a traditional literature review is to ensure that the review is methodical, repeatable, and thorough. Moreover, it helps to minimize the level of bias that can be prevalent in a traditional literature review [93]. The Kitchenham guidelines [87] will be followed to perform the systematic literature review. Since the guidelines were first published in 2004, improvements to the preliminary guidelines have been provided; additional guidelines for performing the search [88, 98], for study selection strategies [90], and for analyzing and synthesizing the evidence from studies [89] will also be followed.

5.1.2 Inclusion/exclusion criteria

The following inclusion/exclusion criteria are considered to include/exclude relevant studies resulting from the search process.

5.1.2.1 Inclusion criteria
• The study should present an evaluation; studies presenting a technology without evaluating it are excluded. The evaluation could be any sort of empirical evaluation.
• The evaluation must have a clear focus on evaluating a static analysis technology, meaning the evaluation must be part of the objectives of the study.
• The study must be published after the year 1976. The motivation for this is that the inspection technique was introduced by Michael Fagan in 1976 [94].
• The study must evaluate a static code analysis technology. In this context, a technology is defined as a technique, method, model, etc.
• Journals, conferences and workshops are included.
• Studies from academia and industry that involve professionals as well as students are included.
• The study must be accessible in full text.
• The study must be documented in English.

5.1.2.2 Exclusion criteria
• Secondary studies in the field are excluded.
• Studies in languages other than English are excluded.
• All studies that are not yet published and are in the editorial phase are excluded.
• Studies not accessible in full text are excluded.
• All duplicate studies are excluded.
• Studies presenting a technology without evaluating it are excluded.

19

The protocol for applying the inclusion/exclusion criteria considers the strategies for study selection described in study [90]. The inclusion/exclusion decision is made by both researchers. If both researchers agree on a study, the study is included. If one researcher agrees and the other is not sure, the study is included. If one researcher disagrees and the other is not sure, the study is excluded. If one researcher agrees and the other disagrees, further discussion is performed and the study is screened in full text; if the conflict is not resolved, an expert opinion is taken. This limits the bias in including/excluding studies. A sketch of this decision rule is given below.
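The following minimal sketch, with hypothetical function and label names, merely encodes the selection protocol described above as a decision table; it is an illustration, not a tool used in the thesis.

    # Each reviewer's verdict on a candidate study: "include", "unsure" or "exclude".
    def selection_decision(reviewer_a: str, reviewer_b: str) -> str:
        votes = {reviewer_a, reviewer_b}
        if votes == {"include"}:                 # both agree to include
            return "include"
        if votes == {"exclude"}:                 # both agree to exclude
            return "exclude"
        if votes == {"include", "unsure"}:       # one agrees, the other is not sure
            return "include"
        if votes == {"exclude", "unsure"}:       # one disagrees, the other is not sure
            return "exclude"
        # remaining cases (include vs. exclude, or both unsure): discuss, screen the
        # full text, and escalate to an expert if the conflict is not resolved
        return "discuss / full-text screening / expert opinion"

    print(selection_decision("include", "unsure"))    # -> include
    print(selection_decision("include", "exclude"))   # -> discuss / full-text screening / expert opinion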

5.1.3 Search strategy

The search strategy utilized in this study is a hybrid strategy comprising two different search techniques, manual search and automated search, complemented by backward snowballing; the reason for using this hybrid approach is to overcome the limitations of each technique. In a manual search, a researcher searches different publication venues (where studies can be published and retrieved), for example conferences, journals and workshops, year by year and issue by issue to identify relevant studies. This process is rigorous but consumes a lot of time and effort in eliminating irrelevant studies, and a researcher may not know about all venues, risking missing a venue [98]. In an automated search, a search string is applied to search engines (digital libraries) such as IEEE, ACM, etc., to identify the relevant studies; compared to manual search, automated search is less time consuming, but it lacks rigor and depends solely on the quality of the search string [98]. In the existing guidelines for performing manual or automated search, no mechanism is proposed to evaluate the performance of the search strategy [98]. The search strategy in this study follows the search strategy proposed by Zhang et al. [98]; it combines manual and automated search, and it allows an evaluation of the search strategy. Further, to enhance the quality of the search, the authors performed backward snowballing, which involves browsing through the reference lists to identify further relevant studies [88]. The search method presented in study [98] consists of five steps, described below: identifying venues and search engines, establishing the Quasi Gold Standard through a manual search, defining the search strings, conducting the automated search, and evaluating the search performance (the automated search is refined until the quasi-sensitivity exceeds 80%).

Figure 5.1 Five steps search process proposed by Zhang et al. [98]
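As a sketch of the evaluation criterion indicated in Figure 5.1, and assuming the quasi-sensitivity is defined as in Zhang et al. [98], the performance of the automated search is estimated by the fraction of the Quasi Gold Standard that the search string retrieves; the automated search is refined until this value exceeds the 80% threshold shown in the figure:

    \text{quasi-sensitivity} = \frac{|\text{QGS studies retrieved by the automated search}|}{|\text{QGS}|} \times 100\%

For example, if the search string retrieved 21 of the 25 QGS studies established in Step 2, the quasi-sensitivity would be 84% and the search string would be considered acceptable.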

Step 1: Identifying relevant venues and search engines (digital libraries)

A venue is where studies can be published and retrieved, e.g., conferences and journals. Manual searches are performed on venues; venues are located at search engines or digital libraries, while the automated search is performed on the digital libraries themselves. Venues and search engines can have a many-to-many relationship: one venue can be located across several search engines, while one search engine can host many venues. The objective of this step is to identify as many publication venues as possible located in as few search engines as possible [98].

The following venues were identified with the help of an expert in the field; the authors also conducted a check to ensure their relevance to the research subject area. The manual search is applied to the venues in Table 5.1.

The search engines listed below are selected for the automated search; all the venues identified earlier are hosted in one or more of them. Although the venues in step 1 were identified with the help of an expert, there is still a risk that a venue was missed. To mitigate this risk the authors carefully selected the search engines that host most of the software engineering venues, including the venues identified in the previous step. Accessibility of the search engines to the authors was also considered. The selected search engines are:

 ACM Digital Library.
 IEEE Xplore.
 Engineering Village (Inspec/Compendex).
 Science Direct (Elsevier).
 ISI Web of Science.

Some software engineering search engines, such as Kluwer Online, SpringerLink, Scopus and Wiley InterScience, are excluded since previously conducted studies revealed that the research papers extracted from these databases are also returned by either Engineering Village or ISI Web of Science [99, 100]; the authors verified this manually.

Table 5.1 Publication venues with their respective hosting libraries

Name | Abbreviation | Type | Hosting Library
International Symposium on Empirical Software Engineering and Measurement | ESEM | Conference | IEEE Xplore, ACM
Empirical Software Engineering | EMSE | Journal | Springer
International Conference on Software and System Process | ICSSP | Conference | IEEE Xplore, ACM, SpringerLink
Euromicro Conference on Software Engineering and Advanced Applications | Euromicro SEAA | Conference | IEEE Xplore
International Conference on Software Engineering | ICSE | Conference | IEEE Xplore, ACM
IEEE Transactions on Software Engineering | TSE | Journal | IEEE Xplore
Software Testing, Verification & Reliability | STVR | Journal | Wiley
Journal of Systems and Software | JSS | Journal | Science Direct
International Conference on Software Testing, Verification and Validation | ICST | Conference | IEEE Xplore
Software Quality Journal | SQJ | Journal | SpringerLink
ACM Transactions on Software Engineering and Methodology | TOSEM | Journal | ACM


Step 2: Performing the manual search and establishing the Quasi Gold Standard (QGS)

The QGS is a set of well-known, high-quality studies in the related venues [98]. To form the QGS a manual search is performed on the venues selected in step 1; the venues are searched year by year and paper by paper to retrieve these high-quality studies. To ensure the reliability of the inclusion decision the studies are checked in full text, and an expert reviewed the selected studies and ensured their quality. No time span was defined for searching the venues; rather, the authors aimed at reaching a diverse set of articles, at which point the search process was stopped. Diversity here is measured in terms of author, year of publication, static analysis technology being evaluated, and publication venue. Achieving diversity in the articles allows a good evaluation of the automated search, as the automated search string is checked against the QGS as explained later in step 5. The authors retrieved 25 studies in this step.

Step 3: Definition of search strings

By screening the QGS studies, the keywords forming the search string are derived subjectively with the help of an expert; the authors also used their domain knowledge and past experience. Different search strings are derived for different search engines, as search engines vary in their search syntax. The search string consists of the four parts shown below, and logical operators are used to link the keywords.

 Keyword related to the study domain (software).
 Keywords related to the sub-area in the domain (code).
 Keywords related to the different technologies of static code analysis (reading technique OR inspection OR recapture OR {static analysis tool} OR review OR walkthrough).
 Keywords related to research methods (empirical OR case study OR experiment OR {action research} OR interview OR survey).

Software AND (reading technique OR inspection OR recapture OR {static analysis tool} OR review OR walkthrough) AND (code OR program) AND (empirical OR case study OR experiment OR {action research} OR interview OR survey)

Step 4: Performing the automated search

In this step the search strings defined in step 3 are applied on the search engines selected in step 1. Table 5.2 below shows the different search strings, their search engines and the number of studies retrieved; a total of 2093 studies were retrieved by the automated search. The search strings were coded to fit the syntax requirements and capabilities of each search engine.

The authors utilized Zotero and JabRef as reference management tools to assist with reference management, categorization of the studies and finalizing the study selection process; both tools are available for free.

Step 5: Evaluating the automated search performance

In this step, to evaluate the performance of the automated search, the studies retrieved in step 4 are checked against the QGS established in step 2. The rule is that the automated search should capture more than 80% of the QGS; this ratio is called the quasi-sensitivity. In other words, the quasi-sensitivity of the automated search should be more than 80%, otherwise the process is repeated from step 3 until a quasi-sensitivity of more than 80% is achieved. The search string derived in this study captured 21 out of the 25 studies forming the QGS and thus achieved a sensitivity of more than 80%.
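The check performed in this step reduces to a simple ratio; the sketch below (Python, with invented study identifiers) illustrates it against the 80% threshold.

# Minimal sketch of the step-5 evaluation: quasi-sensitivity of the automated search
# against the Quasi Gold Standard (QGS); study identifiers are illustrative.
def quasi_sensitivity(automated_hits, qgs):
    return len(automated_hits & qgs) / len(qgs)

qgs = {f"qgs-{i}" for i in range(1, 26)}                    # 25 manually retrieved QGS studies
automated = {f"qgs-{i}" for i in range(1, 22)} | {"other"}  # automated search captured 21 of them

sensitivity = quasi_sensitivity(automated, qgs)
print(f"quasi-sensitivity = {sensitivity:.0%}")             # 84%
if sensitivity <= 0.80:
    print("refine the search string and repeat from step 3")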


Table 5.2 Search strings and the number of retrieved studies

Engineering Village (1011 studies retrieved):
(((software AND ({reading technique} OR {reading techniques} OR recapture OR inspection* OR static analysis OR review OR walkthrough) AND code AND (empirical OR case stud* OR experiment OR survey OR interview OR {action research}))) AND ({english} WN LA))

ISI Web of Science (385 studies retrieved):
(TS=(review) OR TS=(walkthrough) OR TS=(static analysis) OR TS=(inspection) OR TS=("reading technique") OR TS=(recapture)) AND Language=(English) AND Topic=(code) AND Topic=(software) AND Topic=(empirical) OR Topic=(case study) OR Topic=(experiment) OR Topic=(survey) OR Topic=(interview) OR Topic=("action research")
Refined by: Web of Science Categories=(COMPUTER SCIENCE SOFTWARE ENGINEERING OR COMPUTER SCIENCE THEORY METHODS OR ENGINEERING ELECTRICAL ELECTRONIC OR COMPUTER SCIENCE INFORMATION SYSTEMS OR COMPUTER SCIENCE HARDWARE ARCHITECTURE OR COMPUTER SCIENCE ARTIFICIAL INTELLIGENCE OR COMPUTER SCIENCE INTERDISCIPLINARY APPLICATIONS) AND Research Areas=(COMPUTER SCIENCE OR ENGINEERING OR MATHEMATICS OR EDUCATION EDUCATIONAL RESEARCH)

Science Direct (83 studies retrieved):
(pub-date > 1975 and TITLE-ABSTR-KEY(code)) AND (pub-date > 1975 and TITLE-ABSTR-KEY(software)) AND ((pub-date > 1975 and TITLE-ABSTR-KEY(empirical) or TITLE-ABSTR-KEY(case study)) OR (pub-date > 1975 and TITLE-ABSTR-KEY(interview) or TITLE-ABSTR-KEY("action research")) OR (pub-date > 1975 and TITLE-ABSTR-KEY(experiment) or TITLE-ABSTR-KEY(survey))) AND ((pub-date > 1975 and TITLE-ABSTR-KEY(inspection) or TITLE-ABSTR-KEY(review)) OR (pub-date > 1975 and TITLE-ABSTR-KEY(walkthrough) or TITLE-ABSTR-KEY(static analysis)) OR (pub-date > 1975 and TITLE-ABSTR-KEY(recapture) or TITLE-ABSTR-KEY("reading technique")))

IEEE Xplore (482 studies retrieved):
(software AND ("reading technique" OR "reading techniques" OR recapture OR inspection* OR static analysis OR review OR walkthrough) AND code AND (empirical OR case stud* OR experiment OR survey OR interview OR "action research"))

ACM Digital Library (141 studies retrieved):
((Abstract:software) AND (Abstract:code) AND (Abstract:"reading technique" OR Abstract:"reading techniques" OR Abstract:static analysis OR Abstract:inspection* OR Abstract:review OR Abstract:walkthrough) AND (Abstract:empirical OR Abstract:case stud* OR Abstract:experiment OR Abstract:interview OR Abstract:survey OR Abstract:"action research"))


5.1.4 Studies inclusion/exclusion process

To reach a final list of studies, the inclusion/exclusion criteria are applied on the studies retrieved by the automated search; the aim is to exclude the irrelevant and duplicated studies. Following this, the studies retrieved by the manual search and not captured by the automated search are merged in, as well as the studies retrieved by backward snowballing from the reference lists. The details and sequence of the process are outlined below:

1. Studies retrieved by the automated search.
2. Duplicates removed at database level.
3. Duplicates removed based on names and titles.
4. Irrelevant studies removed based on names, titles and abstracts.
5. Irrelevant studies removed upon full-text screening.
6. Studies not accessible in full text removed.
7. Including studies retrieved by manual search and not captured by automated search.
8. Including studies from backward snowballing.

In the first step the automated search retrieved 2093 studies from the search engines; among them 190 were found to be duplicates at the database level and were removed in step 2. In step 3 the studies are sorted alphabetically and duplicates are removed based on name and title. In step 4 irrelevant studies were excluded after screening names, titles and abstracts, and in step 5 irrelevant studies were excluded upon full-text screening. In step 6, 4 studies were excluded because they are inaccessible in full text. When the manual search was performed earlier, 25 studies were retrieved and formed the QGS; out of these 25 studies, 21 were captured by the automated search and 4 were not, and these 4 studies are included in step 7. In step 8 the studies resulting from backward snowballing were included. A total of 70 studies were finalized from the search process at the end. The details of the process are shown in Figure 5.2.


1. Studies retrieved by the search string: IEEE Xplore (1012), ACM (141), Engineering Village (472), ISI Web of Science (385), Science Direct (83); total 2093.
2. Studies after removing duplicates at database level (190 removed): 1903.
3. Studies after removing title duplicates (751 removed): 1152.
4. Studies after removing irrelevant studies upon name, title and abstract (981 removed): 171.
5. Studies after removing irrelevant studies upon full-text screening (105 removed): 66.
6. Studies after excluding studies not accessible in full text (4 removed): 62.
7. Studies after including studies retrieved by manual search and not captured by the automated search (4 added): 66.
8. Studies after including studies retrieved by backward snowballing (4 added): 70 (final list).

Figure 5.2 Study inclusion/exclusion process
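As a quick consistency check, the funnel in Figure 5.2 can be re-traced in a few lines; the sketch below simply replays the counts shown in the figure.

# Minimal sketch re-tracing the selection funnel of Figure 5.2 (numbers copied from the figure).
retrieved = 1012 + 141 + 472 + 83 + 385  # per-engine hits, step 1 -> 2093
pool = retrieved
pool -= 190   # step 2: duplicates at database level -> 1903
pool -= 751   # step 3: title duplicates -> 1152
pool -= 981   # step 4: irrelevant by name, title and abstract -> 171
pool -= 105   # step 5: irrelevant after full-text screening -> 66
pool -= 4     # step 6: full text not accessible -> 62
pool += 4     # step 7: manual-search studies missed by the automated search -> 66
pool += 4     # step 8: backward snowballing -> 70
print(retrieved, pool)  # 2093 70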


5.1.5 Kappa analysis

Kappa analysis is a mechanism to measure the agreement level between researchers; in systematic literature reviews it is used to mitigate the validity threats related to bias in the interpretation of criteria [101]. In this study Kappa analysis is used to measure the agreement level between the authors and to identify and resolve any disagreement before starting the data extraction process; this limits the bias in the interpretation of the assessment criteria used in the data extraction and synthesis process. The guidelines for performing Kappa analysis in study [101] are followed, and the exit criterion is to reach a substantial level of agreement between the authors. The agreement levels are defined in the table below:

Table 5.3 Kappa agreement levels

Kappa Value | Strength of Agreement
<0.00 | Poor
0.00-0.20 | Slight
0.21-0.40 | Fair
0.41-0.60 | Moderate
0.61-0.80 | Substantial
0.81-1.00 | Almost Perfect

Kappa analysis is performed on criteria whose values can be categorized. The assessment criteria subjected to the Kappa analysis are the criteria used in the data extraction and synthesis process, namely the research method and the type of static analysis technology being evaluated. 20 sample studies were carefully selected, and each author individually and independently assigned each study one of the values in the tables below for each criterion.

Table 5.4 Kappa criteria for the research methods used in studies

Value | Category
1 | Action research
2 | Lesson learned
3 | Field study
4 | Case study
5 | Interview
6 | Survey
7 | Conceptual analysis
8 | Experiment
9 | SLR
10 | Other
11 | Multiple research methods
12 | N/A


Table 5.5 Kappa criteria for the static code analysis techniques evaluated in studies

Value | Category
1 | Inspections
2 | Walkthroughs
3 | Reviews
4 | Fault estimation
5 | Automated analysis
6 | Reading techniques
7 | Multiple technologies

Finally the authors calculated the Kappa agreement level using IBM SPSS, a statistical tool capable of calculating Kappa values. The following results were obtained for each criterion:

Case Processing Summary:
raterB * raterA | Valid: N = 20 (100.0%) | Missing: N = 0 (0.0%) | Total: N = 20 (100.0%)

raterB * raterA crosstabulation, counts with expected counts in parentheses:
raterB \ raterA | Case Study | Experiments | SLR
Case Study | 4 (1.0) | 0 (2.8) | 0 (0.3)
Experiments | 0 (2.2) | 11 (6.1) | 0 (0.6)
SLR | 0 (0.2) | 0 (0.6) | 1 (0.1)
Others | 0 (0.2) | 0 (0.6) | 0 (0.1)
Multiple Research Methods | 0 (0.4) | 0 (1.1) | 0 (0.1)
Total | 4 (4.0) | 11 (11.0) | 1 (1.0)

Symmetric Measures:
Measure of Agreement Kappa = 0.920 | Asymp. Std. Error = 0.076 | Approx. T = 6.630 | Approx. Sig. = 0.000 | N of valid cases = 20

The Kappa value reached for the research method is 0.920, which corresponds to an "Almost Perfect" level of agreement. The exit criterion was to reach a substantial level of agreement; therefore the Kappa analysis is complete for the research method criterion.


For the second category, the following results were obtained:

Case Processing Summary:
raterB * raterA | Valid: N = 20 (100.0%) | Missing: N = 0 (0.0%) | Total: N = 20 (100.0%)

raterB * raterA crosstabulation, counts with expected counts in parentheses:
raterB \ raterA | Inspections | Reviews | Fault Estimation Techniques
Inspections | 3 (1.0) | 2 (0.8) | 0 (0.8)
Reviews | 0 (0.2) | 1 (0.2) | 0 (0.2)
Fault Estimation Techniques | 0 (0.6) | 0 (0.5) | 3 (0.5)
Automated Code Analysis | 0 (0.8) | 0 (0.6) | 0 (0.6)
Reading Techniques | 0 (0.8) | 0 (0.6) | 0 (0.6)
Multiple Technologies | 1 (0.6) | 0 (0.5) | 0 (0.5)
Total | 4 (4.0) | 3 (3.0) | 3 (3.0)

Symmetric Measures:
Measure of Agreement Kappa = 0.695 | Asymp. Std. Error = 0.112 | Approx. T = 6.921 | Approx. Sig. = 0.000 | N of valid cases = 20

The agreement level reached was 0.695, which corresponds to a substantial level of agreement and matches the exit criterion; therefore the Kappa analysis is complete for the second criterion.
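SPSS is not essential for this statistic; Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), can be reproduced in a few lines of Python, as in the minimal sketch below (the rating vectors are invented, not the actual study data).

# Minimal sketch of the agreement calculation; scikit-learn's cohen_kappa_score
# implements Cohen's kappa used above.
from sklearn.metrics import cohen_kappa_score

rater_a = ["experiment", "experiment", "case study", "SLR", "experiment", "case study"]
rater_b = ["experiment", "experiment", "case study", "SLR", "case study", "case study"]

print(cohen_kappa_score(rater_a, rater_b))  # chance-corrected agreement between the two raters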


5.1.6 Quality assessment criteria

A number of studies [72, 76, 77, 78] emphasized two important factors for assessing the quality of studies: the rigor and the relevance of the studies. The rigor of a study is defined as how a study is carried out and how it is reported [74]. It consists of two aspects: first, the preciseness or exactness of the research method utilized in the study, and second, the way the study has been reported. If the rigor aspects are not reported adequately, it is difficult to verify whether a study has been performed rigorously. Relevance is defined as the potential value of a study for practitioners looking to adopt new technologies [74]. Contributing to the improvement of empirical evaluations of different technologies in software engineering, Ivarsson and Gorschek [74] developed a model that captures the rigor and relevance of technology evaluations. The model defines different aspects of the studies which characterize their rigor and industrial relevance.

The quality assessment criteria used in this study are the rigor and relevance criteria. They are used to score the quality of the studies in terms of rigor and relevance, but not to exclude any study; even studies with low quality are considered. The model developed by Ivarsson and Gorschek [74], which provides a mechanism for scoring rigor and relevance, is used to score the studies. In study [74] three aspects define the rigor of a study: the context, the study design and the validity discussion. Each rigor aspect is scored on three levels (weak, medium and strong presentation), and these aspect scores are used to calculate the final rigor value. Table 5.6 below gives the details for scoring rigor as proposed by Ivarsson and Gorschek [74].

Table 5.6 Scoring rubrics for evaluating rigor [74]

Context described
Strong description (1): The context is described to the degree where a reader can understand and compare it to another context.
Medium description (0.5): The context in which the study is performed is mentioned or presented in brief but not described to the degree to which a reader can understand and compare it to another context.
Weak description (0): There appears to be no description of the context in which the evaluation is performed.

Study design
Strong description (1): The study design is described to the degree where a reader can understand, e.g., the variables measured, the control used, the treatments, the selection/sampling used, etc.
Medium description (0.5): The study design is briefly described.
Weak description (0): There appears to be no description of the design of the presented evaluation.

Validity discussed
Strong description (1): The validity of the evaluation is discussed in detail where threats are described and measures to limit them are detailed.
Medium description (0.5): The validity of the study is mentioned but not described in detail.
Weak description (0): There appears to be no description of any threats to validity of the evaluation.

Context elements described in study [104], such as product, tools, techniques, experience of subjects, size of the object, product and duration of the observation, are considered to evaluate the context. The context of a study is scored as strong presentation (1) if most of the context-related factors mentioned above are described. If more than one context-related factor is missing, the study is classified as medium presentation (0.5). The study is classified as weak presentation (0) if there is no description of the context in which the evaluation is performed.

The elements considered to score the study design are variables, subjects, treatments, sampling technique and measuring criteria. The study design is scored as strong presentation (1) if most of these design elements are described. If most of the study design related factors are missing or unexplained, the study is classified as medium presentation (0.5). The study is classified as weak presentation (0) if no description of the design is presented.

To score study validity, a study is classified as strong presentation (1) if at least the internal and external validity threats are discussed along with their mitigation strategies. If some of the validity threats are missing or presented only briefly, or no mitigation strategy is described, the study is classified as medium presentation (0.5). The study is classified as weak presentation (0) if no description of the validity threats is presented. After assigning scores to context, study design and validity, the rigor is calculated by summing these scores. Table 5.7 below shows the rigor scores for the studies retrieved by the search process.

Table 5.7 Rigor scoring of the studies retrieved from the SLR

Study | Citation Key | Context Description | Study Design Description | Validity Discussion | Rigor Score (Total)
[1] | Kelly and Shepard, 2004 | 1 | 1 | 0.5 | 2.5
[2] | Abdelnabi et al., 2004 | 1 | 1 | 1 | 3
[3] | Wedyan et al., 2009 | 0.5 | 0.5 | 1 | 2
[4] | Host and Johansson, 2000 | 0.5 | 0.5 | 1 | 2
[5] | Porter et al., 1997 | 1 | 1 | 1 | 3
[6] | Laitenberger et al., 2001 | 1 | 1 | 1 | 3
[7] | Wilkerson et al., 2012 | 0.5 | 0.5 | 1 | 2
[8] | Nagappan and Ball, 2005 | 1 | 1 | 0.5 | 2.5
[9] | Dunsmore et al., 2000 | 0.5 | 0.5 | 0 | 1
[10] | Dunsmore et al., 2003 | 1 | 1 | 1 | 3
[11] | Rigby et al., 2008 | 0.5 | 0.5 | 0 | 1
[12] | Seaman and Basili, 1997 | 1 | 1 | 0 | 2
[13] | Liu et al., 2012 | 1 | 1 | 1 | 3
[14] | Hayes et al., 2011 | 1 | 1 | 1 | 3
[15] | Perry et al., 2002 | 0.5 | 0.5 | 0 | 1
[16] | Johnson et al., 2013 | 0.5 | 1 | 1 | 2.5
[17] | Sol et al., 2002 | 0.5 | 1 | 0 | 1.5
[18] | Dunsmore et al., 2002 | 1 | 1 | 1 | 3
[19] | Jalote and Haragopal, 1998 | 1 | 0.5 | 0 | 1.5
[20] | Wojcicki and Strooper, 2007 | 0.5 | 1 | 1 | 2.5
[21] | Kienle et al., 2012 | 0.5 | 0.5 | 0 | 1
[22] | Wagner et al., 2008 | 0.5 | 0.5 | 0.5 | 1.5
[23] | Zheng et al., 2006 | 1 | 1 | 0.5 | 2.5
[24] | Chimdyalwar, 2012 | 0.5 | 0.5 | 0 | 1
[25] | Macdonald and Miller, 1998 | 0.5 | 1 | 1 | 2.5
[26] | Porter et al., 1997 | 1 | 1 | 0 | 2
[27] | Li, 1995 | 0.5 | 0.5 | 0 | 1
[28] | Koneri et al., 2005 | 0.5 | 1 | 0.5 | 2
[29] | Hirayama et al., 2006 | 0.5 | 0.5 | 0 | 1
[30] | Nagoya et al., 2006 | 0.5 | 0.5 | 0 | 1
[31] | Vetro et al., 2011 | 0.5 | 0.5 | 1 | 2
[32] | Laitenberger and DeBaud, 1997 | 1 | 1 | 1 | 3
[33] | Runeson and Wohlin, 1998 | 1 | 0.5 | 1 | 2.5
[34] | Fehnker and Huuck, 2013 | 0.5 | 0.5 | 0 | 1
[35] | Harel and Kantorowitz, 2005 | 1 | 0.5 | 0 | 1.5
[36] | Wagner et al., 2005 | 1 | 1 | 0.5 | 2.5
[37] | Grbac et al., 2012 | 1 | 1 | 1 | 3
[38] | Manzoor et al., 2012 | 0.5 | 0.5 | 0.5 | 1.5
[39] | Siy and Votta, 2001 | 1 | 1 | 1 | 3
[40] | Belli and Crisan, 1997 | 0.5 | 0.5 | 0 | 1
[41] | De Lucia et al., 2009 | 1 | 1 | 1 | 3
[42] | Baca et al., 2008 | 0.5 | 0.5 | 0 | 1
[43] | Knight and Myers, 1993 | 1 | 0.5 | 0 | 1.5
[44] | Baca et al., 2009 | 0.5 | 0.5 | 0.5 | 1.5
[45] | McMeekin et al., 2009 | 1 | 0.5 | 1 | 2.5
[46] | Kester et al., 2010 | 0.5 | 0.5 | 0.5 | 1.5
[47] | Land et al., 2000 | 0.5 | 1 | 0.5 | 2
[48] | Baca et al., 2013 | 0.5 | 1 | 1 | 2.5
[49] | Stalhane and Awan, 2005 | 0.5 | 0.5 | 1 | 2
[50] | Land et al., 1997 | 0.5 | 1 | 1 | 2.5
[51] | Nelson and Schumann, 2004 | 1 | 0.5 | 0 | 1.5
[52] | Russell, 1991 | 0.5 | 0 | 0 | 0.5
[53] | Emanuelsson and Nilsson, 2008 | 0.5 | 0.5 | 0 | 1
[54] | Austin et al., 2013 | 0.5 | 0.5 | 1 | 2
[55] | Petersson and Wohlin | 0.5 | 0.5 | 0 | 1
[56] | Marchenko and Abrahamsson, 2007 | 0.5 | 0.5 | 0 | 1
[57] | Berling and Thelin, 2003 | 1 | 0.5 | 0 | 1.5
[58] | Lanubile and Mallardo, 2007 | 0.5 | 0 | 0.5 | 1
[59] | Nadeem et al., 2012 | 0 | 0 | 0 | 0
[60] | Laitenberger, 1998 | 1 | 1 | 1 | 3
[61] | Hemeury, 1999 | 0.5 | 0.5 | 0 | 1
[62] | Gattis and Cheatham, 1999 | 0.5 | 0.5 | 0 | 1
[63] | Denger and Kolb, 2006 | 1 | 1 | 1 | 3
[64] | Nakamura et al., 2006 | 0.5 | 0.5 | 0 | 1
[65] | Mantyla and Lassenius, 2009 | 1 | 1 | 1 | 3
[66] | Johnson and Tjahjono, 1998 | 1 | 1 | 1 | 3
[67] | Carlsson and Baca, 2005 | 0.5 | 0.5 | 0 | 1
[68] | Briand et al., 2004 | 0.5 | 1 | 0 | 1.5
[69] | Weller, 1993 | 1 | 0.5 | 0 | 1.5
[70] | Rodgers and Dean, 1999 | 0.5 | 0.5 | 0 | 1

Four aspects define relevance: subjects, context, scale and research method. Two values are used to score each relevance aspect: 1 if the aspect contributes to relevance and 0 if it does not. Table 5.8 below gives the details for scoring relevance as proposed by Ivarsson and Gorschek [74].

Table 5.8 Scoring rubrics for evaluating relevance [74]

Subjects
Contribute to relevance (1): The subjects used in the evaluation are representative of the intended users of the technology, i.e., industry professionals.
Do not contribute to relevance (0): The subjects used in the evaluation are not representative of the envisioned users of the technology (practitioners). Subjects included on this level: students, researchers, subjects not mentioned.

Context
Contribute to relevance (1): The evaluation is performed in a setting representative of the intended usage setting, i.e., an industrial setting.
Do not contribute to relevance (0): The evaluation is performed in a laboratory situation or other setting not representative of a real usage situation.

Scale
Contribute to relevance (1): The scale of the applications used in the evaluation is of realistic size, i.e., the applications are of industrial scale.
Do not contribute to relevance (0): The evaluation is performed using applications of unrealistic size. Applications considered on this level: down-scaled industrial, toy example.

Research method
Contribute to relevance (1): The research method used in the evaluation is one that facilitates investigating real situations and that is relevant for practitioners. Research methods classified as contributing to relevance: action research, lessons learned, case study, field study, interview, descriptive/exploratory survey.
Do not contribute to relevance (0): The research method used in the evaluation does not lend itself to investigating real situations. Research methods classified as not contributing to relevance: conceptual analysis, conceptual analysis/mathematical, laboratory experiment (human subject), laboratory experiment (software), other, N/A.


The relevance score is calculated by summing the subjects, context, scale and research method scores. Table 5.9 below shows the relevance scores for the studies retrieved by the search process.

Table 5.9 Relevance scoring of the studies retrieved from the SLR

Study | Citation Key | Subjects | Context | Scale | Research Method | Relevance Score (Total)
[1] | Kelly and Shepard, 2004 | 1 | 1 | 1 | 1 | 4
[2] | Abdelnabi et al., 2004 | 0 | 0 | 0 | 0 | 0
[3] | Wedyan et al., 2009 | 0 | 0 | 1 | 1 | 2
[4] | Host and Johansson, 2000 | 1 | 0 | 0 | 1 | 2
[5] | Porter et al., 1997 | 1 | 1 | 1 | 0 | 3
[6] | Laitenberger et al., 2001 | 1 | 0 | 0 | 0 | 1
[7] | Wilkerson et al., 2012 | 0 | 0 | 0 | 0 | 0
[8] | Nagappan and Ball, 2005 | 1 | 1 | 1 | 1 | 4
[9] | Dunsmore et al., 2000 | 0 | 0 | 0 | 0 | 0
[10] | Dunsmore et al., 2003 | 0 | 0 | 0 | 0 | 0
[11] | Rigby et al., 2008 | 1 | 1 | 1 | 0 | 3
[12] | Seaman and Basili, 1997 | 1 | 1 | 1 | 1 | 4
[13] | Liu et al., 2012 | 0 | 0 | 0 | 0 | 0
[14] | Hayes et al., 2011 | 1 | 0 | 0 | 0 | 1
[15] | Perry et al., 2002 | 0 | 0 | 0 | 0 | 0
[16] | Johnson et al., 2013 | 1 | 0 | 0 | 1 | 2
[17] | Sol et al., 2002 | 0 | 0 | 0 | 0 | 0
[18] | Dunsmore et al., 2002 | 0 | 0 | 0 | 0 | 0
[19] | Jalote and Haragopal, 1998 | 1 | 1 | 1 | 0 | 3
[20] | Wojcicki and Strooper, 2007 | 0 | 0 | 1 | 0 | 1
[21] | Kienle et al., 2012 | 0 | 1 | 1 | 1 | 3
[22] | Wagner et al., 2008 | 1 | 1 | 1 | 1 | 4
[23] | Zheng et al., 2006 | 1 | 1 | 1 | 1 | 4
[24] | Chimdyalwar, 2012 | 1 | 1 | 1 | 0 | 3
[25] | Macdonald and Miller, 1998 | 0 | 0 | 0 | 0 | 0
[26] | Porter et al., 1997 | 1 | 1 | 1 | 0 | 3
[27] | Li, 1995 | 0 | 0 | 0 | 1 | 1
[28] | Koneri et al., 2005 | 1 | 0 | 0 | 1 | 2
[29] | Hirayama et al., 2006 | 0 | 0 | 0 | 1 | 1
[30] | Nagoya et al., 2006 | 0 | 0 | 0 | 1 | 1
[31] | Vetro et al., 2011 | 0 | 0 | 0 | 0 | 0
[32] | Laitenberger and DeBaud, 1997 | 1 | 0 | 0 | 0 | 1
[33] | Runeson and Wohlin, 1998 | 0 | 0 | 0 | 0 | 0
[34] | Fehnker and Huuck, 2013 | 0 | 1 | 1 | 0 | 2
[35] | Harel and Kantorowitz, 2005 | 0 | 1 | 1 | 1 | 3
[36] | Wagner et al., 2005 | 0 | 1 | 1 | 1 | 3
[37] | Grbac et al., 2012 | 1 | 1 | 1 | 1 | 4
[38] | Manzoor et al., 2012 | 0 | 0 | 0 | 0 | 0
[39] | Siy and Votta, 2001 | 1 | 1 | 1 | 1 | 4
[40] | Belli and Crisan, 1997 | 0 | 1 | 0 | 1 | 2
[41] | De Lucia et al., 2009 | 0 | 0 | 0 | 0 | 0
[42] | Baca et al., 2008 | 0 | 1 | 1 | 1 | 3
[43] | Knight and Myers, 1993 | 1 | 0 | 0 | 0 | 1
[44] | Baca et al., 2009 | 1 | 1 | 0 | 1 | 3
[45] | McMeekin et al., 2009 | 1 | 1 | 0 | 0 | 2
[46] | Kester et al., 2010 | 0 | 0 | 0 | 0 | 0
[47] | Land et al., 2000 | 0 | 0 | 0 | 0 | 0
[48] | Baca et al., 2013 | 1 | 1 | 1 | 1 | 4
[49] | Stalhane and Awan, 2005 | 0 | 0 | 0 | 0 | 0
[50] | Land et al., 1997 | 0 | 0 | 0 | 0 | 0
[51] | Nelson and Schumann, 2004 | 1 | 1 | 1 | 1 | 4
[52] | Russell, 1991 | 1 | 1 | 1 | 0 | 3
[53] | Emanuelsson and Nilsson, 2008 | 1 | 1 | 1 | 0 | 3
[54] | Austin et al., 2013 | 0 | 0 | 0 | 1 | 1
[55] | Petersson and Wohlin | 0 | 0 | 0 | 0 | 0
[56] | Marchenko and Abrahamsson, 2007 | 0 | 0 | 0 | 1 | 1
[57] | Berling and Thelin, 2003 | 1 | 1 | 1 | 1 | 4
[58] | Lanubile and Mallardo, 2007 | 0 | 0 | 0 | 0 | 0
[59] | Nadeem et al., 2012 | 0 | 1 | 1 | 1 | 3
[60] | Laitenberger, 1998 | 0 | 0 | 0 | 0 | 0
[61] | Hemeury, 1999 | 1 | 1 | 0 | 0 | 2
[62] | Gattis and Cheatham, 1999 | 0 | 0 | 0 | 0 | 0
[63] | Denger and Kolb, 2006 | 0 | 0 | 0 | 0 | 0
[64] | Nakamura et al., 2006 | 0 | 0 | 0 | 1 | 1
[65] | Mantyla and Lassenius, 2009 | 1 | 1 | 1 | 1 | 4
[66] | Johnson and Tjahjono, 1998 | 0 | 0 | 0 | 0 | 0
[67] | Carlsson and Baca, 2005 | 0 | 1 | 1 | 0 | 2
[68] | Briand et al., 2004 | 0 | 1 | 1 | 1 | 3
[69] | Weller, 1993 | 1 | 1 | 1 | 1 | 4
[70] | Rodgers and Dean, 1999 | 1 | 1 | 1 | 1 | 4
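To make the two scoring schemes concrete, the minimal sketch below shows how the per-study totals in Tables 5.7 and 5.9 follow from the aspect scores (the example values are taken from study [1]).

# Minimal sketch of the rigor and relevance totals used in Tables 5.7 and 5.9.
def rigor_score(context, design, validity):
    # each aspect is scored 0 (weak), 0.5 (medium) or 1 (strong); total ranges from 0 to 3
    assert all(v in (0, 0.5, 1) for v in (context, design, validity))
    return context + design + validity

def relevance_score(subjects, context, scale, method):
    # each aspect either contributes (1) or does not contribute (0); total ranges from 0 to 4
    assert all(v in (0, 1) for v in (subjects, context, scale, method))
    return subjects + context + scale + method

print(rigor_score(1, 1, 0.5))       # 2.5, as for study [1] in Table 5.7
print(relevance_score(1, 1, 1, 1))  # 4, as for study [1] in Table 5.9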


5.1.7 Data extraction

The objective of data extraction is to record the information obtained from the primary studies. The data extraction forms are designed to adequately address the research questions. To ensure the quality of the process the two authors first read and discussed the extraction procedure together; for the first ten studies the authors performed the extraction together to increase and unify their objectivity and to resolve disagreements. The remaining load was then divided between the authors and each author performed the process individually; finally each author peer reviewed the other author's work. A data extraction sheet was designed to accurately extract the relevant information necessary to answer the research questions; it is shown in Table 5.10 below.

Table 5.10 Design of data extraction forms, attributes and their mapping to the research questions

Attributes and sub-attributes to be extracted, with their corresponding research questions:
 Meta information (RQ1): reviewer, checker, unique identifier, title, author(s), publication year, publication venue, citations, research method.
 Quality assessment, aspects characterizing rigor and industrial relevance of the studies (RQ2, RQ4): context description, study design description, validity discussion, subjects, context/setting, scale, research method.
 Static code analysis techniques evaluated (RQ1): inspections, reviews, walkthroughs, reading techniques, capture-recapture techniques, static analysis tools.
 Results (RQ3): reported benefits, reported limitations, other significant results.

5.1.8 Data synthesis strategy

In data synthesis the evidence extracted from the studies is aggregated to draw the study conclusions. Two techniques are used to perform data synthesis and aggregate the results: thematic analysis and descriptive synthesis. Thematic analysis is used to identify recurrent and common patterns and themes in the data and to summarize the findings under these thematic headings [89, 102]. Descriptive synthesis provides a narrative description and ordering of the primary evidence with commentary and interpretation; it is suitable for a large evidence base comprising diverse evidence types [89, 103]. Meta-analysis cannot be used because the studies identified by the systematic review contain both experimental and non-experimental studies (case studies, surveys, interviews) whose outcome variables are heterogeneous, and no statistical data is available to calculate effect sizes.


5.1.9 Validity threats

We identified some potential threats to the validity of the systematic literature review and the results drawn from it. In this section we present these threats together with the mitigation strategies adopted to counter them. The threats are structured following the Kitchenham guidelines for performing systematic literature reviews [87].

5.1.9.1 Bias

The results of the systematic literature review may be affected by the researchers' bias [87]. We adopted Kappa analysis [101] to mitigate the validity threats related to bias in the interpretation of the review criteria. A total of 70 primary studies were selected and categorized based on the research method used and the static analysis technique evaluated in these studies. We sampled 20 studies from the primary studies and both researchers individually categorized them based on the research methods used and the static analysis techniques evaluated. The Kappa agreement level between the researchers was calculated statistically and found to be "Almost Perfect" for the research method used in the selected studies and "Substantial" for the static analysis techniques evaluated (Section 5.1.5). In case of a disagreement, we resolved it by mutual discussion and by discussing it with an expert researcher.

There was another threat of publication bias. In some cases, influential organizations sponsor certain techniques, methods or tools to promote them and to suppress negative research results against them [87]. In order to mitigate this threat, we did not limit our information sources to a certain journal, conference or workshop. We expanded our search to all the available journals, conferences and workshops related to our topic area in five major databases: ACM Digital Library, IEEE Xplore, Engineering Village (Inspec/Compendex), Science Direct (Elsevier), and ISI Web of Science. Some other databases, such as Kluwer Online, SpringerLink, Scopus and Wiley InterScience, were not considered since studies [99, 100] reported that the research papers returned by these databases are also returned by either Engineering Village or ISI Web of Science (Section 5.1.3). Also, in order to maintain quality, we decided not to consider any grey literature, such as technical reports, unpublished research, work in progress or non-peer-reviewed publications.

5.1.9.2 Internal validity

The internal threats to validity deal with the systematic errors in design of the study and the way it was conducted [87]. In order to deal with the threats to internal validity of the systematic literature review, a rigorous review protocol was established in advance and it was reviewed for its completeness and soundness by an expert researcher.

The keywords for the search string were carefully derived from the pool of 25 Quasi Gold Standard (QGS) studies. The threat related to the performance of the search strategy and the threat of missing relevant articles pushed us to perform both manual and automated searches in order to capture as many research papers relevant to our topic area as possible. Using the manual search, both researchers searched through relevant journals and conferences issue by issue and year by year. The process is rigorous but consumes a lot of time, and one may not know about all the publication venues, risking missing some important journals and conferences [98]. The automated search is fast but lacks rigor and depends solely on the quality of the search string [98]. The Kitchenham guidelines [87] provide no guidance on evaluating the performance of the search strategy [98]. Hence, we decided to follow the search strategy proposed by Zhang et al. [98], as it combines both manual and automated searches and also allows an evaluation of the search performance.

The process starts by identifying the most relevant journals and conferences related to our topic area, and the databases hosting them, as shown in Table 5.1. In the second step, these journals and conferences were manually searched to find high-quality articles and establish a Quasi Gold Standard (QGS); we identified 25 studies in this step. In the third step, keywords were derived by screening the QGS studies to construct the search string. In step 4, the automated search is performed in the five selected databases using the search string. In step 5, the performance of the automated search is evaluated against the QGS established in step 2. The goal is a quasi-sensitivity above 80%, i.e., the automated search should capture more than 80% of the QGS studies; otherwise the process is repeated from step 3 until this is achieved. While performing the automated search we captured 21 out of the 25 studies forming the QGS, which mitigated the threat of a non-performing search strategy. We also included the 4 uncaptured studies in the final pool of primary studies for data extraction.

There was a threat associated with a shared understanding of the data between the two researchers. It was decided to extract data from a few studies together. After developing a mutual understanding of what exactly needed to be extracted from the studies, the work was divided between the two researchers and carried out individually, and later each study was reviewed together by both researchers. In case of any confusion, expert advice was sought.

In order to further enhance the quality of the search and to minimize the threat of missing relevant articles, backward snowballing was also performed, which included browsing through the reference lists of the selected primary studies to identify further relevant studies [88]. It helped us find 4 more studies.

The full text of 4 studies could not be obtained within the time allocated to the systematic literature review; this threat is considered minor since the number of missing studies is small.

5.1.9.3 External validity

The threats to the external validity of the research deal with the generalization of the conclusions of the research performed [87]. It cannot be guaranteed that we have captured all the material in our research area, but we have taken concrete steps to minimize the effect of this threat.

In order to further enhance the quality of the search and to minimize the threat of missing relevant articles, backward snowballing was utilized, which includes browsing through the reference lists of the selected primary studies to identify further relevant studies [88]. In total, 4 more studies were found with the help of the backward snowballing technique.

In order to ensure the generalizability of the results of the systematic review, we validated them against the findings of the survey.

5.2 Results

5.2.1 State of research in static code analysis techniques

This section gives an insight into static code analysis research and shows what kind of research practices have been conducted. Specifically, it reflects three findings: first, which static code analysis techniques have received most of the researchers' attention; second, for each static analysis technique, what kind of research practices have been performed; and third, which variables have been used to evaluate static code analysis techniques.


5.2.1.1 Static code analysis techniques which received most attention in research

Figure 5.3 depicts the different static code analysis techniques identified through the systematic literature review. The figure shows that inspection is the technique that received the most attention in research, with 50 studies evaluating it. Static code analysis tools are the second most investigated technology, with 21 studies. Informal reviews come third with 4 studies. Walkthroughs received the least attention, with only 1 study. Note that, out of the total of 70 studies identified from the systematic literature review, some studies evaluate more than one technique (as shown in Table 5.11), thus the total number of studies in Figure 5.3 and Table 5.11 is greater than 70.


Figure 5.3 Number of studies evaluating different static code analysis techniques

5.2.1.2 Type of research practices

This part presents the different research practices among the techniques shown in Figure 5.3. This SLR included only empirical evaluations of static code analysis techniques (this is part of the inclusion criteria of the SLR); therefore, by research practice we mean the purpose of the empirical evaluation, i.e., whether the evaluation is performed to evaluate a specific technique, compare one technique to another, propose a new technique, or evaluate a factor that influences a technique.

The research practices are shown in Table 5.11 below. For one technique many types of practices can exist, as in the case of inspection, and one study can evaluate, for example, both inspection and static analysis tools. The total number of studies for each practice type is shown. Some studies span more than one practice type, for example studies [23, 61].

Table 5.11 Research practices used in research for evaluating different static analysis techniques

Technology | Practice type | Corresponding studies | Total studies
Inspection | Evaluation of inspection | [7, 11, 17, 19, 20, 23, 25, 37, 39, 41, 51, 52, 57, 58, 60, 61, 62, 63, 65, 68, 69] | 21
Inspection | Factors that influence inspection: team size | [5, 49, 68] | 3
Inspection | Factors that influence inspection: multiple sessions | [5] | 1
Inspection | Factors that influence inspection: procedural roles | [47] | 1
Inspection | Factors that influence inspection: experience of participants | [49, 57, 61, 70] | 4
Inspection | Factors that influence inspection: group design | [50] | 1
Inspection | Factors that influence inspection: communication in inspection meetings | [12] | 1
Inspection | Factors that influence inspection: process maturity | [70] | 1
Inspection | Factors that influence inspection: process environment | [26] | 1
Inspection | Change in inspection structure: phased inspection | [43] | 1
Inspection | Change in inspection structure: inspection without a meeting | [66] | 1
Inspection | Change in inspection structure: other inspection techniques | [1, 20, 27, 28, 29, 30, 41, 64] | 8
Inspection | Support to inspection structure: code reading techniques | [2, 6, 9, 10, 13, 14, 18, 32, 45] | 9
Inspection | Support to inspection structure: computer support to the inspection process | [15, 25, 40] | 3
Inspection | Support to inspection structure: support to re-inspection | [33, 35, 55] | 3
Informal reviews | Evaluation of informal reviews | [4, 41, 61, 11] | 4
Walkthroughs | Evaluation of walkthroughs | [20] | 1
Automated static code analysis tools | Evaluation of automated static code analysis tools | [3, 8, 16, 21, 22, 23, 24, 31, 34, 36, 38, 42, 44, 46, 48, 53, 54, 56, 59, 61, 67] | 21

For the studies evaluating inspection, four practice types are identified. The first practice type evaluated inspection solely as a code analysis technique. The second practice type evaluated the factors that influence the inspection process, such as team size [5, 49, 68], number of sessions [5], use of procedural roles [5], experience of participants [49, 57, 61, 70], group design [50], communication in inspection meetings [12], process maturity [70] and process environment [26]. The third practice type evaluated changes to the inspection structure; it includes phased inspection [43], inspection without a meeting [66] and other inspection techniques [1, 20, 27, 28, 29, 30, 41, 64]. The fourth practice type evaluated the techniques that support the inspection structure; this includes reading techniques [2, 6, 9, 10, 13, 14, 18, 32, 45], computer support to the inspection process [15, 25, 40] and support for re-inspection [33, 35, 55]. Figure 4.4 depicts the inspection practice types and Figure 4.5 shows the number of studies for the different practice types.

For the studies evaluating informal reviews, walkthroughs and static code analysis tools one practice type is identified, which is solely evaluating them as code analysis techniques.

5.2.1.3 Identifying variables used to investigate benefits and limitations of static code analysis techniques

The authors analyzed the variables across the studies to identify common variables (themes) describing benefits and limitations in the data. The authors identified eight recurrent variables (patterns) or themes; they are shown in Table 5.12, and for each theme the measuring criteria used to measure it in the studies are listed. The benefits and limitations of each technique are extracted with respect to these variables.

It has also been observed that 44% of the studies used experiments as their research method, and the outcome variables in the majority of the experiments are inconsistent; in some experiments the variables are consistent but their measuring criteria are not consistent enough to generalize the findings (see the details in Appendix A). The same was found for the non-experimental studies: the variables investigated were heterogeneous and the metrics used to measure them were not consistent enough. Some non-experimental studies did not reveal their variables and some did not reveal the metrics used to investigate the variables.

Table 5.12 Variables & measuring criteria used to investigate benefits & limitations of static analysis techniques

Effectiveness
Definition: The degree to which a static code analysis technique is successful in producing a desired result.
Measuring criteria: number of defects detected; defect detection capability; number of false positives; number of false negatives; number of true positives vs. number of false positives; ratio of defects detected to the total number of (known) defects; defect reduction rate.

Fault content
Definition: Ability of a static code analysis technique to detect different types of faults.
Measuring criteria: defects detected specific to testing; defects detected by reviews; domain-specific defects; logical bugs; concurrency bugs; duplicate defects; security vulnerabilities; delocalized defects; implementation bugs; defects related to missing lines of code; critical defects (more severe defects); implementation-related defects.

Time efficiency
Definition: Ability of a static code analysis technique to accomplish a job with a minimum expenditure of time.
Measuring criteria: time saving as a result of early inspection; time saving as a result of early tool usage; elapsed time; response time; number of inspection meetings reduced.

Effort efficiency
Definition: Ability of a static code analysis technique to accomplish a job with a minimum expenditure of effort.
Measuring criteria: number of man-hours; effort saving as a result of early inspection; effort saving as a result of early tool usage; human dependency; number of people required; number of defects detected per invested effort; effort spent on training.

Cost effectiveness
Definition: Economical in terms of results received for the money spent on a static code analysis technique.
Measuring criteria: cost saving as a result of early inspection; cost saving as a result of early tool usage; implementation cost; cost per detected defect.

Internal code quality
Definition: The quality of the source code under analysis in terms of readability, maintainability and participants' understandability.
Measuring criteria: code maintainability; code readability; code understandability.

External quality
Definition: The quality of the final product in terms of defects reduced because of early verification.
Measuring criteria: number of defects reduced as a result of Early Verification (EV).

5.2.2 State of rigor and relevance in static code analysis research

Having scored the rigor and relevance of the studies in Section 5.1.6 using the scoring rubrics from study [74], this section reflects the state of rigor and relevance in static code analysis research. Knowing this state indicates to what extent the benefits and limitations reported in the studies are rigorous and relevant to industrial practitioners.

Figure 5.4 illustrates the rigor and relevance scores of the studies presented in a bubble chart. The rigor is plotted in the x-axis and the relevance is plotted in the y-axis. The highest relevance score is 4 and the highest rigor score is 3. The bubble size is proportional to the number of studies corresponding to the rigor and relevance on x and y-axis respectively.

Figure 5.4 Bubble chart showing the number of studies based on their rigor and relevance score



Looking at Figure 5.4 we can tell whether studies with high rigor usually have high relevance scores and vice versa. A total of 15 studies have the highest rigor score of three; 11 of these 15 studies have a low relevance score between zero and one. On the other hand, 13 studies have the highest relevance score of four, and 8 of these 13 studies have high rigor scores between two and three.

We can conclude that in static code analysis research, most of the technology evaluations conducted with a high degree of rigor do not necessarily have a good ability to impact industry, while most of the technology evaluations conducted in an industrial setting (high relevance) usually have a high degree of rigor.

Figure 5.5 presents rigor and relevance with respect to four categories: high-rigor high-relevance (category A), high-rigor low-relevance (category B1), low-rigor high-relevance (category B2) and low-rigor low-relevance (category C). The rigor is plotted on the X-axis and the relevance on the Y-axis. The rigor score is considered high if it is greater than 2, and the relevance score is considered high if it is greater than 3.

Figure 5.5 Bubble chart showing the number of studies based on their rigor and relevance score

Figure 5.5 shows that 11 studies out of 70 (15.71% of the total) fall in category A, 16 studies (22.86%) fall in category B1, 25 studies (35.71%) fall in category B2, and 18 studies (25.71%) fall in category C.
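Using the thresholds stated above, the categorisation can be written down directly; the sketch below is a minimal illustration with invented example scores.

# Minimal sketch of the rigor/relevance categorisation behind Figure 5.5
# (high rigor: score > 2; high relevance: score > 3, as stated above).
def category(rigor, relevance):
    high_rigor, high_relevance = rigor > 2, relevance > 3
    if high_rigor and high_relevance:
        return "A"   # high rigor, high relevance
    if high_rigor:
        return "B1"  # high rigor, low relevance
    if high_relevance:
        return "B2"  # low rigor, high relevance
    return "C"       # low rigor, low relevance

print(category(2.5, 4))  # A
print(category(3, 1))    # B1
print(category(1, 4))    # B2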


Only 11 studies (15.71%) fall in category A, and only these studies can act as a solid empirical basis for researchers. This is disappointing from a technology transfer point of view, as the rest of the studies have less potential for actually influencing practice.

Looking at Figure 5.5 we can see that more than 61% of the studies fall in the lower part of the figure with low relevance scores; we conclude that more than half of the studies are poorly relevant to the static code analysis industry when evaluated using the Ivarsson and Gorschek criteria [74]. This is disappointing from a technology transfer point of view, as these evaluations have less potential for actually influencing practice [74].

We can also see that more than 48% of the studies fall in the left part of the figure; almost half of the studies are thus poorly rigorous, which is disappointing from a research point of view, as such studies hinder the progress of research.

Given these results, there is a clear need to improve both the rigor and the relevance of research. The rigor of studies needs to be improved so they can act as a solid empirical basis, while the relevance of studies needs to be improved to increase the ability of research to impact industry. To do this we need to look at the individual aspects that define rigor and relevance in study [74], summarized in Table 5.6 and Table 5.8, against the actual rigor and relevance scores for the individual studies presented in Section 5.1.6.

Four aspects define relevance: research method, context, subjects and scale; Table 5.9 presents the relevance scores for the individual studies. We analyze each relevance aspect to identify which aspects need to be improved.

First, we look into the research method and context aspects. Experiments have been used as the research method in 31 studies, comprising 44% of the research methods used, and 38 studies, including most of the experiments, were performed in an academic context using students as subjects. Since Ivarsson and Gorschek [74] give experiments and academic contexts a relevance score of zero, we identified this as the major factor degrading the relevance of research.

Second, we look into subjects and scale. Regarding subjects, 40 studies out of 70 (57%) used researchers or students as subjects; regarding scale, 40 studies out of 70 (57%) used down-scaled or toy examples. Since Ivarsson and Gorschek [74] give student and researcher subjects, as well as down-scaled applications, a relevance score of zero, these aspects also contributed to degrading relevance.

Three aspects define rigor: context, study design and validity threats. Referring to Table 5.7, the average context score is 0.7 out of a maximum of 1, and the same holds for the study design, while the average score for the validity threats is 0.5, which contributed significantly to degrading rigor. Summing the average aspect scores gives a total average rigor of 1.9, which is still considered low as high rigor scores start from two [74]. This indicates a need for improving rigor.

5.2.2.1 Influence of time on rigor and relevance

Figure 5.6 presents the rigor and relevance variables over time. The average rigor and relevance value per year is used; the highest possible score for rigor is 3 and for relevance 4. The total number of studies per year is shown as well. The average rigor, average relevance and number of papers are plotted on the Y-axis, while the years are plotted on the X-axis.

For rigor, before 2000 the values fluctuate between 0.5 and 2.5, and the three lowest scores occurred in this period (1991, 1995 and 1999). After 2000 the rigor never falls below 1.5 except in 2008, and two peaks are reached, in 2001 and 2011. This indicates an improvement in rigor after 2000.

For relevance, the value keeps fluctuating over the whole time period between roughly 0 and 3.5. The peak is reached in 2008, while values of 0 occur twice, in 2002 and 2010. This indicates no noticeable improvement of relevance over time; across the whole period certain years have high relevance scores and certain years do not.


Figure 5.6 Average rigor and relevance over time

5.2.3 Benefits and limitations of static analysis techniques reported by researchers

This section aggregates the benefits and limitations of the different static code analysis techniques. The rigor and relevance of the studies reporting the claimed benefits and limitations is taken into consideration. As motivated in Section 5.1.8, thematic analysis is used to identify common variables across the studies to help synthesize and aggregate the benefits and limitations of the four static code analysis techniques.

In Section 5.2.1.3 the authors analyzed the data and identified common variables across the studies along with their measuring criteria; they are shown in Table 5.12. The benefits and limitations of each technique are identified with respect to these variables.

After identifying the common variables, the conclusions about the benefits and limitations are drawn for each technique using vote counting: the number of positive studies (studies showing a benefit) is compared with the number of negative studies (studies showing a limitation) for each variable and for each of the four static code analysis techniques. Vote counting has been used because it is the simplest method for synthesizing evidence from multiple evaluations [99].

First the vote counting is done on the total number of studies, disregarding their rigor and relevance; then the studies are divided into the four rigor and relevance categories and the vote counting is done for each category individually.
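The vote-counting step itself is a simple tally; the sketch below illustrates it for one variable of one technique with invented votes.

# Minimal sketch of vote counting: benefits ("+") versus limitations ("-"), with "0" for no difference.
from collections import Counter

votes = ["+", "+", "+", "-", "0", "+", "-"]  # illustrative votes for one variable of one technique
tally = Counter(votes)
if tally["+"] > tally["-"]:
    verdict = "benefit"
elif tally["-"] > tally["+"]:
    verdict = "limitation"
else:
    verdict = "inconclusive"
print(dict(tally), "->", verdict)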


5.2.3.1 Inspection

A total of 21 studies were found in the inspection category, as shown in Table 5.11. Table 5.13 shows the results of the studies in this category with respect to the outcome variables they considered. For each study, it shows one of three outcomes: the study reports a benefit, a limitation, or "no difference" (which is considered neither a benefit nor a limitation).

Table 5.13 Distribution of studies evaluating inspection technique

Variable | Benefits | Limitations | No difference
Absolute effectiveness | [51] | [69] | -
Relative effectiveness | [7, 19, 52, 60, 61, 62, 63] | [17] | [23, 25, 41]
Time efficiency | [17] | [11, 19, 20, 41] | -
Effort efficiency | [52] | [19, 68] | -
Cost effectiveness | [17, 37, 52] | [7, 19] | [23]
Internal code quality | [39, 58, 61] | - | -
External quality | [37] | - | -
Fault content | [19, 52, 65] | [17, 51, 57, 60, 63] | -
Total | 20 | 15 | 4

In Table 5.13, with respect to effectiveness, there are a total of 13 studies: eight of them [51, 7, 19, 52, 60, 61, 62, 63] are in favor of inspection's effectiveness, two [17, 69] are not, and three studies [23, 25, 41] report no difference. The findings of these studies are reported in the next three paragraphs.

Study [51] reports that code reviews are good at catching defects, and studies [19, 52] report that inspection detects more defects than testing. Study [7] reports that inspection is more effective at defect reduction than test-driven development (TDD). Studies [60, 63] report that inspection is significantly more effective than testing. Studies [61, 62] investigated the defect detection of inspection relative to less formal code review and other testing techniques, and found its performance to be positive.

Study [69] reports that code inspections catch only a small number of defects when performed after unit testing. Study [17] reports that inspection detects fewer defects compared to testing.

Study [23] reports that the defect removal efficiency of inspections and static analysis tools is not significantly different. Studies [25, 41] report that there is no significant difference between defect detection of formal inspection and other inspection techniques.

In Table 5.13, with respect to time efficiency, there are a total of five studies: four of them [11, 19, 20, 41] report limitations regarding inspection's time efficiency, while study [17] is in favor of it. The findings of these studies are reported in the next paragraph.

Studies [11, 19] report that inspections are not time efficient; they consume more time compared to peer review techniques. Studies [20, 41] also report that inspection is not time efficient and consumes more time. Study [17], in contrast, reports that inspection is time efficient in terms of the number of man hours required and thus also cost effective.

In Table 5.13, with respect to effort efficiency, we have a total of three studies: one study [52] is in favor of inspection's effort efficiency, while two studies [19, 68] are not. The findings of these studies are reported in the next paragraph.


Study [52] reports that inspection saves effort by detecting defects early in the development lifecycle; on the other hand, studies [19, 68] report that inspection is not efficient in terms of effort consumption.

In Table 5.13, with respect to cost effectiveness, we have a total of six studies: three of them [17, 37, 52] are in favor of inspection's cost effectiveness, two of them [7, 19] are not in favor, and one study [23] reports no difference in inspection's cost effectiveness compared to peer techniques. The findings of these studies are reported in the following paragraphs.

Study [37] reports that inspection increased the cost-benefit when introduced early in the verification process, though the increase was not significant. Study [52] reports that inspection helps to reduce the overall development cost by detecting defects early in the development lifecycle. Study [17] reports that inspection is time efficient in terms of the number of man hours required and thus also cost effective.

Study [19] reports that inspections are costly compared to unit testing. Study [7] investigated the cost effectiveness of inspection compared to TDD and found it to be more expensive to implement. Study [23] reports no difference between the costs per detected defect for inspections and static analysis tools.

In Table 5.13, with respect to internal code quality, we have a total of three studies [39, 58, 61], all of them in favor of the argument that inspection improves internal code quality. Study [39] reports that inspections improved the readability and maintainability of the code. Studies [58, 61] report that inspection delivers improved code quality.

In Table 5.13, with respect to external code quality, we have only one study [37] in favor of the argument that inspection improves external code quality. Study [37] reports a significant increase in product quality when using inspection to review the code.

In Table 5.13, with respect to fault content, we have a total of eight studies: three studies [19, 52, 65] are in favor of the argument that inspection is capable of detecting various kinds of defects, while five studies [17, 51, 57, 60, 63] are not in favor of that argument. The findings of these studies are reported in the next two paragraphs.

Study [65] reports that inspection is good at detecting code evolvability defects, which cannot be found in later testing phases because they do not affect the software's visible functionality. Studies [19, 52] report that inspection can capture more types of defects and can also detect defects that are harder to catch in testing.

Studies [17, 57] report that inspection cannot capture those defects which can be captured by testing. Study [51] reports that race conditions and other concurrency defects are hard to catch using manual code reviews such as inspection. Studies [60, 63] report that inspection cannot catch the defects which can be caught by other techniques.

Conclusions regardless of rigor and relevance: Based on vote counting, the studies showed positive findings for inspection's effectiveness, internal code quality and cost effectiveness. On the other hand, the studies showed negative findings for time efficiency, effort efficiency and fault content. For external code quality the number of studies is very low, with only one study showing a positive finding, which means the conclusion for external code quality stands on a weak foundation and more studies are needed to reach sounder conclusions.


After drawing the conclusions disregarding rigor and relevance in the previous section, conclusions will now be drawn for each rigor and relevance category.

Category A – High Rigor and High Relevance Studies

In category A, only four studies have been identified. Three of them used case studies and one used a survey as the research method. Table 5.14 below shows the results of the vote counting for studies in category A with respect to the outcome variables considered by the studies in this category. The detailed findings of the studies have already been reported earlier in this section.

Table 5.14 Outcome of studies (Category A) with respect to variables in inspection

Variables | Benefits | Limitations | No difference
Relative effectiveness | – | – | [23]
Cost effectiveness | [37] | – | [23]
Internal code quality | [39] | – | –
External quality | [37] | – | –
Fault content | [65] | – | –
Total | 5 | 0 | 2

Conclusions for category A studies: Based on vote counting, the results showed positive findings for cost effectiveness, internal code quality, external code quality and fault content. However, the number of studies is very low (mostly one per variable in this category), which means the overall conclusion stands on a weak foundation; many more studies in category A are needed to reach sounder conclusions. This is also a disappointing finding because the studies in this category are of the highest quality and the best to draw conclusions from.

Category B1 – Low Rigor and High Relevance Studies

In category B1, only seven studies have been found. Three of them used case studies, one used an experiment, and one used a survey as the research method, while two of the studies did not mention their research methods. Table 5.15 below shows the results of the vote counting.

Table 5.15 Outcome of studies (Category B1) with respect to variables in inspection

Variables | Benefits | Limitations | No difference
Absolute effectiveness | [51] | [69] | –
Relative effectiveness | [19, 52] | – | –
Time efficiency | – | [11, 19] | –
Effort efficiency | [52] | [19, 68] | –
Cost effectiveness | [52] | [19] | –
Fault content | [19, 52] | [51, 57] | –
Total | 7 | 8 | 0

Conclusion for category B1 studies: Based on vote counting, the studies showed positive findings for effectiveness. On the other hand, they showed negative findings for time efficiency and effort efficiency. For fault content and cost effectiveness the results are inconclusive.


Category B2 – High Rigor and Low Relevance Studies

This category consists of only six studies, all of which used experiments as the research method. Table 5.16 below shows the results of the vote counting for studies in this category.

Table 5.16 Outcome of studies (Category B2) with respect to variables in inspection

Variables | Benefits | Limitations | No difference
Relative effectiveness | [7, 60, 63] | – | [25, 41]
Time efficiency | – | [20, 41] | –
Cost effectiveness | – | [7] | –
Fault content | – | [60, 63] | –
Total | 3 | 5 | 2

Conclusion for category B2 studies: Based on vote counting, the studies showed positive results for effectiveness, and negative results for time efficiency, fault content and cost effectiveness. For cost effectiveness, however, we have only one study, so that conclusion rests on a weak foundation.

Category C – Low Rigor and Low Relevance Studies

Only four studies have been found in this category: three experimental and one that did not mention the research method used. Table 5.17 shows the vote counting for category C.

Table 5.17 Outcome of studies (Category C) with respect to variables in inspection

Variables | Benefits | Limitations | No difference
Relative effectiveness | [61, 62] | [17] | –
Time efficiency | [17] | – | –
Cost effectiveness | [17] | – | –
Internal code quality | [58, 61] | – | –
Fault content | – | [17] | –
Total | 6 | 2 | 0

Conclusions for category C studies: Based on vote counting, the studies showed positive results for effectiveness, time efficiency, cost effectiveness and internal code quality, and negative findings for fault content.

5.2.3.2 Static analysis tools

A total of 21 studies have been found in the static analysis tools category, as shown in Table 5.11. Table 5.18 shows the results of the studies in the static analysis tools category with respect to the outcome variables considered by the studies in this category. For each study, the table indicates whether it reports a benefit, a limitation, or "no difference" (which is counted as neither a benefit nor a limitation).

In Table 5.18, with respect to effectiveness, we have a total of 17 studies. Only five of them [42, 53, 38, 56, 61] are in favor of static analysis tools' effectiveness. The majority, eleven studies [3, 16, 21, 22, 31, 46, 48, 59, 67, 36, 54], are not in favor of the effectiveness of static analysis tools, while one study [23] reports no difference between the effectiveness of static analysis tools and other peer techniques. The findings of these studies are reported in the following paragraphs.


Table 5.18 Distribution of studies evaluating static analysis tools among themes

Variables | Benefits | Limitations | No difference
Absolute effectiveness | [42, 53] | [3, 16, 21, 22, 31, 46, 48, 59, 67] | –
Relative effectiveness | [38, 56, 61] | [36, 54] | [23]
Time efficiency | [16, 38, 53, 61, 67] | [3, 36, 48, 54] | –
Effort efficiency | [16, 22, 36, 38] | – | –
Cost effectiveness | [38, 42, 67] | [3, 36, 48] | [23]
Internal code quality | [3] | – | –
Tool quality | [53] | [3, 16, 36] | –
Fault content | [23, 36, 38, 48, 54, 61] | [22, 24, 46, 53, 67] | –
Total | 25 | 26 | 2

Studies [42, 53] report that static analysis tools are effective at detecting hard-to-find defects and faults that can propagate into full vulnerabilities. Study [38] evaluated three different static analysis tools for finding concurrency bugs and found that CheckThread detects a high percentage of defects with fewer false positives compared to the other two tools. Study [56] reports that CodeScanner is good in terms of defect detection rate. Study [61] reports that static analysis tools are more effective than code reviews.

On the other hand, studies [21, 31, 36, 22, 48, 59] report that static analysis tools generate a high number of false positives. Studies [3, 16, 54] report that the ratio of false positives to true defects is too high for different static analysis tools, and the same holds for the identification of refactoring issues. Study [67] found that static analysis tools also produce false negatives (issues that are actual defects but are not reported). Study [46] concluded that individual tools produce a large number of false positives and suggested combining multiple static analysis techniques in order to reduce that number.
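To illustrate how such false positives can arise (this generic example is not taken from the cited studies and does not refer to any specific tool's behavior), a purely static checker often cannot follow dynamic behavior and may therefore warn about code that runs correctly:

# Illustrative sketch of a typical static-analysis false positive (hypothetical example).
# Attributes created dynamically in __init__ are often invisible to a static checker,
# so a linter may warn about a "missing" attribute even though the code is correct at run time.
class Config:
    def __init__(self, settings):
        for key, value in settings.items():
            setattr(self, key, value)  # attributes are added only at run time

cfg = Config({"timeout": 30, "retries": 3})
print(cfg.timeout)  # works correctly, but some tools may flag this access as an unknown member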

Study [23] reports that the defect removal efficiency of SATs is not significantly different from that of inspections.

In Table 5.18, with respect to time efficiency, we have a total of nine studies: five of them [16, 38, 53, 61, 67] are in favor of the time efficiency of static analysis tools, while four [3, 36, 48, 54] are not. The findings of these studies are reported in the following two paragraphs.

Study [16] states that static analysis tools are time efficient compared to the time involved in manually searching for defects. Study [38] reports that static analysis tools can save effort in terms of time and cost if introduced early in the code analysis process. Study [61] reports that SATs are faster than code reviews and thus can save time. Study [67] reports that the tools can save time and thus the cost of later work. Study [53] reports that it is possible to analyze millions of lines of code using static analysis tools in a short time.

Study [36] reports that SATs generate a high number of false positives, which demand more time to fix, so the cost to resolve them becomes higher than the cost saved by the automation. Something similar is reported by study [48]: static analysis tools often incorrectly report a fault, and fixing that fault can introduce more faults which need more time to fix and thus increase the cost. Studies [3, 54] state that the large number of false positives generated by the tools needs to be pruned, which is a very time-consuming process and certainly not cost effective.


In Table 5.18, with respect to effort efficiency, we have a total of four studies [16, 22, 36, 38], all of them in favor of the effort efficiency of static analysis tools. Study [22] reports that static analysis tools can find error-prone portions of the code and thus help save effort later in inspection and testing. Study [36] reports that it is beneficial to first use a static analysis tool before inspecting the code, so that the tool can remove some of the defects earlier and hence save some effort. Study [16] states that static analysis tools are effort efficient compared to the effort involved in manually searching for defects. Study [38] reports that static analysis tools can save effort if introduced early in the process.

In Table 5.18, with respect to cost effectiveness, we have a total of seven studies: three of them [38, 42, 67] are in favor of the cost effectiveness of static analysis tools, three [3, 36, 48] are not in favor, and one study [23] reports no difference in the cost effectiveness of static analysis tools compared to peer techniques. The findings of these studies are reported in the following three paragraphs.

Studies [38, 67] report that the time saved by the use of static analysis tools saves cost as well. Study [42] reports that static analysis tools can help achieve cost savings of 17% on average.

As noted under time efficiency, study [36] reports that the high number of false positives generated by SATs demands more time to fix, making the cost to resolve them higher than the cost saved by the automation; study [48] similarly reports that incorrectly reported faults, once fixed, can introduce further faults and thus increase the cost. Study [3] states that the large number of false positives generated by the tools needs to be pruned, which is a very time-consuming process and certainly not cost effective.

Study [23] reports that the cost per detected defect is the same for SATs and inspections.

In Table 5.18, with respect to internal code quality, we have only one study. Study [3] reports that static analysis tools can help to identify unused data and code, and can also help find program structures that are candidates for refactoring.

In Table 5.18, with respect to tool quality, we have a total of four studies. One of them [53] is in favor of the high quality of static analysis tools, while three studies [3, 16, 36] are not in favor of that argument. The findings of these studies are reported in the following paragraph.

Study [16] reports a negative finding on static analysis tools' usability: they can easily be used and integrated into development environments, but many tools do not present results in a way that gives the developer enough information to assess the problem, its cause, and what steps should be taken to resolve it. Study [3] reports that most of the tools lack quick fixes or code suggestions. Study [36] reports that tool accuracy (in terms of real defects vs. false positives) and efficiency varied among tools and from project to project, and that the sensitivity of the tools varies greatly depending on the bug patterns they possess. Study [53] reports a positive finding on static code analysis tools' usability.

In Table 5.18, with respect to fault content, we have a total of eleven studies. Six of them [23, 36, 38, 48, 54, 61] are in favor of the argument that static analysis tools have the ability to detect various types of defects, while five studies [22, 24, 46, 53, 67] are not in favor of that argument. The findings of these studies are reported in the following two paragraphs.


Study [23] reports that static analysis tools are good at finding security vulnerabilities and are also capable of identifying defect-prone modules of the code prior to testing. Studies [36, 48] report that static analysis tools are good at catching more severe defects. Study [54] reported that static analysis tools found the greatest variety of vulnerabilities and the greatest number of implementation bugs when compared with other testing techniques. Study [38] reports that static analysis tools are a great help in finding concurrency bugs, which is extremely hard to do manually. Study [61] finds that SATs are good at detecting more major defects than code reviews.

Study [46] reports that the tools' ability to catch concurrency bugs is limited. Study [67] found that tools are not effective in finding security vulnerabilities. Study [22] reports that static analysis tools are not good at capturing field defects. Studies [24, 53] report that static analysis tools are not precise when it comes to large-scale applications and that defect detection depends entirely on the bug patterns of the tools.

Conclusions not considering rigor and relevance: Based on vote counting, the studies showed positive findings for static analysis tools' time efficiency, effort efficiency and fault content. For cost effectiveness the results are inconclusive. For internal code quality there is only one study reporting a positive finding, so that conclusion stands on a weak foundation. For effectiveness the studies showed a negative finding; the large majority of the studies state that the high number of false positives produced by static analysis tools is the major factor limiting their effectiveness. The studies also showed a negative finding for static analysis tools' quality in terms of usability. It is also worth mentioning that no study evaluated the effect of static analysis tools on external code quality.

Category A – High Rigor and High Relevance Studies

In category A, only three studies have been found, all of which used case studies as the research method. Table 5.19 shows the results of the vote counting for studies in category A with respect to the outcome variables considered by the studies in this category. The outcome variables are classified into benefits, limitations, and no difference.

Table 5.19 Outcome of studies (Category A) with respect to variables in static analysis tools

Variables | Benefits | Limitations | No difference
Absolute effectiveness | – | [48] | –
Relative effectiveness | – | [36] | [23]
Time efficiency | – | [36, 48] | –
Effort efficiency | [36] | – | –
Cost effectiveness | – | [36, 48] | [23]
Tool quality | – | [36] | –
Fault content | [23, 36, 48] | – | –
Total | 4 | 7 | 2

Conclusion for category A studies: The studies showed negative findings for effectiveness, time efficiency and cost effectiveness. For fault content the studies showed positive findings. For effort efficiency only one study showed a positive finding, which means that conclusion stands on a weak foundation; the same goes for tool quality, where only one study reports a limitation. Overall, the number of studies in this category is low, which means the overall conclusions stand on a weak foundation; this is disappointing from a technology transfer point of view [74].


Category B1 – Low Rigor and High Relevance Studies

In this category, six studies have been identified: four of them used case studies and two did not mention the research methods used. Table 5.20 below shows the results of the vote counting for studies belonging to category B1.

Table 5.20 Outcome of studies (Category B1) with respect to variables in static analysis tools

Variables | Benefits | Limitations | No difference
Absolute effectiveness | [42, 53] | [21, 22, 59] | –
Time efficiency | [53] | – | –
Effort efficiency | [22] | – | –
Cost effectiveness | [42] | – | –
Tool quality | [53] | – | –
Fault content | – | [22, 24, 53] | –
Total | 6 | 6 | 0

Conclusion for category B1 studies: The studies showed negative findings for effectiveness and fault content. For all the other variables there are very few studies (one for each variable), which means no solid conclusions can be drawn for these variables in this category.

Category B2 – High Rigor and Low Relevance Studies

Only four studies fall into category B2. Two used case studies, one used interviews and one used an experiment as the research method. Table 5.21 below shows the results of the vote counting for studies belonging to this category.

Table 5.21 Outcome of studies (Category B2) with respect to variables in static analysis tools

Variables | Benefits | Limitations | No difference
Absolute effectiveness | – | [3, 16, 31] | –
Relative effectiveness | – | [54] | –
Time efficiency | [16] | [3, 54] | –
Effort efficiency | [16] | – | –
Cost effectiveness | – | [3] | –
Internal code quality | [3] | – | –
Tool quality | – | [3, 16] | –
Fault content | [54] | – | –
Total | 4 | 9 | 0

Conclusion for category B2 studies: The studies showed negative findings for effectiveness, tool quality and time efficiency. For effort efficiency, internal code quality, fault content and cost effectiveness the number of studies per variable is too low; although the studies showed positive findings for effort efficiency, internal code quality and fault content, and a negative finding for cost effectiveness, no solid conclusions can be drawn for these variables in this category and the conclusions for them stand on a weak foundation.


Category C – Low Rigor and Low Relevance Studies

A total of six studies have been found in this category: three used experiments, one used a case study and two did not mention the research methods used. Table 5.22 below shows the results of the vote counting for studies belonging to this category.

Table 5.22 Outcome of studies (Category C) with respect to variables in static analysis tools

Variables | Benefits | Limitations | No difference
Absolute effectiveness | – | [46, 67] | –
Relative effectiveness | [38, 56, 61] | – | –
Time efficiency | [38, 61, 67] | – | –
Effort efficiency | [38] | – | –
Cost effectiveness | [38, 67] | – | –
Fault content | [38, 61] | [46, 67] | –
Total | 11 | 4 | 0

Conclusions for category C studies: The studies showed positive findings for effectiveness, time efficiency and cost effectiveness. For fault content the results are inconclusive. For effort efficiency only one study showed a positive finding, so no solid conclusions can be drawn for this variable in this category and the conclusion for it stands on a weak foundation.

5.2.3.3 Informal reviews

As shown in Table 5.11 we have only four studies evaluating informal reviews [4, 11, 41, 61].

With respect to time efficiency, studies [11, 41] reported that informal code reviews are time efficient compared to inspection, while study [61] reported that they are not time efficient compared to static code analysis tools.

With respect to fault content, study [4] reported that informal code reviews can detect defects that are detected by testing, while study [61] reports that informal code reviews are not capable of detecting different types of defects.

With respect to cost effectiveness, study [4] reported that informal code reviews save cost by detecting defects earlier.

Studies [4, 41] fall in category B2, study [11] falls in category B1, and study [61] falls in category C.

Conclusions: The studies in categories B1 and B2 showed positive findings for time efficiency and cost effectiveness. For fault content the results are inconclusive. It is also worth mentioning that the number of studies is very low, which means the conclusions stand on a weak foundation, and more studies evaluating informal reviews are needed to reach sounder conclusions.

5.2.3.4 Walkthroughs

We have only one study [20] evaluating walkthroughs against static analysis tools and inspection; the study reported that walkthroughs are not effective compared to inspections and static code analysis tools. Study [20] falls in category B1.


5.2.4 Benefits and limitations related to different variations in inspection

Since inspection is a process, some studies investigated process elements such as the factors that influence the inspection process, changes to the inspection structure and support for the inspection structure, as shown in Table 5.11. In this section their benefits and limitations will be reported.

5.2.4.1 Factors that influence inspection process

5.2.4.1.1 Team size

There are three studies [5, 49, 68] evaluating the effect of increasing the team size on inspection effectiveness and effort. Studies [49, 68] reported that effectiveness increases when team size is increased, whereas study [5] reports that increasing the team size does not have an effect on inspection effectiveness. Study [5] further investigated the impact of reducing team size on the effort and reported that reducing team size significantly reduces the effort without reducing the effectiveness. Studies [49, 68] used the number of defects detected to measure effectiveness, while study [5] used the ratio of defects found to the total number of defects. Study [5] falls in the high rigor high relevance category with both rigor and relevance scores of 3. Study [49] falls in the high rigor low relevance category with a rigor score of 2 and a relevance score of 0. Study [68] falls in the low rigor high relevance category with a rigor score of 1.5 and a relevance score of 3. In conclusion, there is not enough evidence to argue that increasing the number of participants improves inspection effectiveness: two studies [49, 68] are in favor of this argument while one study [5] with high rigor and relevance scores contradicts it. The same goes for the argument that reducing the number of participants significantly reduces the effort spent on inspection, as only one study [5] supports it.

5.2.4.1.2 Number of sessions

One study [5] evaluates the effect of using multiple sessions in inspections on inspection effectiveness. The study reports that the inspection interval and the effectiveness of defect detection were not significantly affected by the number of sessions (single versus multiple), and that multiple-session inspections may not be worth their extra effort. Although study [5] has rigor and relevance scores of 3 and falls in the high rigor high relevance category, the finding of a single study is not strong enough evidence to draw a solid conclusion.

5.2.4.1.3 Use of procedural roles

One study [47] evaluates the use of procedural roles on inspection effectiveness; the study reports that the use of procedural roles has a positive impact on inspection effectiveness. The total number of detected defects was used to measure the effectiveness. Study [47] falls in the high rigor low relevance category with a rigor score of 2 and a relevance score of 0. The evidence is not strong enough to generalize the findings of the study.

5.2.4.1.4 Experience of participants

Studies [49, 57, 61, 70] evaluate the impact of participants' experience on inspection effectiveness. All of them reported that the inspectors' experience improves inspection effectiveness. Study [49] falls in the high rigor low relevance category with a rigor score of 2 and a relevance score of 0, studies [57, 70] fall in the low rigor high relevance category with rigor scores of 1 and 1.5 and relevance scores of 4 respectively, and study [61] falls in the low rigor low relevance category with a rigor score of 1 and a relevance score of 2. We can conclude that having experienced participants in the inspection process improves inspection effectiveness.


5.2.4.1.5 Group design

One study [50] evaluates the impact of group design on inspection effectiveness, specifically the impact of using interacting groups in inspection sessions. The study used the number of false positives to measure effectiveness and reports that using interacting groups in inspection sessions improves inspection effectiveness. The study falls in the high rigor low relevance category with a rigor score of 2.5 and a relevance score of zero. However, the findings of one study are not enough to draw solid conclusions.

5.2.4.1.6 Communication in inspection meetings

One study [12] evaluates the effect of communication during inspection meetings on the inspection interval. The study reports that the time spent discussing defects and the time spent discussing global issues lengthen the meeting and the inspection interval. Study [12] falls in the high rigor high relevance category with a rigor score of 2 and a relevance score of 4. One study is not strong enough evidence to draw solid conclusions.

5.2.4.1.7 Process environment

One study [26] evaluates the impact of the process environment on the inspection interval and reports that the process environment lengthens the inspection interval: since some inspection tasks have very low priority when a developer's workload is high, these low priority tasks are deferred. The study falls in the high rigor high relevance category with a rigor score of 2 and a relevance score of 3. More studies are needed to substantiate the findings.

5.2.4.1.8 Process maturity

One study [70] evaluates the impact of inspection process maturity on inspection effectiveness and reports that process maturity improves inspection effectiveness. Study [70] falls in the low rigor high relevance category with a rigor score of 1.5 and a relevance score of 4. However, it is hard to draw solid conclusions from one study.

5.2.4.2 Changes to inspection structure

5.2.4.2.1 Phased inspection

Study [43] evaluated a new inspection technique called phased inspection, which adopts ideas from active design reviews, Fagan inspection and N-fold inspection. In this approach the software product is reviewed in a series of partial inspections called phases. The study reported that the new inspection technique is effective at finding defects and improves the individual inspector's confidence. Study [43] falls in the low rigor low relevance category with a rigor score of 1.5 and a relevance score of 1. Given that only one study evaluated this technique and the study's trustworthiness is low in terms of rigor and relevance, the evidence is not enough to draw solid conclusions and generalize the benefits reported in this category.

5.2.4.2.2 Inspection without a meeting

This category evaluates the effect of meeting-based versus non-meeting-based inspections. One study [66] falls in this category and reported that the meeting-based review required more total effort and more effort per defect and did not find significantly more defects than the non-meeting-based method. On the other hand, the meeting-based review method is significantly better at reducing the level of false positives, and the subjects subjectively preferred meeting-based review over non-meeting-based review. Study [66] falls in the low rigor high relevance category with a rigor score of 1.5 and a relevance score of 3.


Although the study has a high relevance score, the evidence is not strong enough to generalize its findings; more evaluation is needed.

5.2.4.2.3 Other inspection techniques

This category evaluates new inspection techniques other than Fagan inspection, phased inspection and inspection without a meeting. Eight studies [1, 20, 27, 28, 29, 30, 41, 64] fall in this category.

Study [1] evaluates an inspection process called task-directed inspection (TDI), which is based on combining inspection with other software development tasks. It is a lighter-weight process and varies from a Fagan-style inspection by increasing the emphasis on the work of individual inspectors, suppressing Fagan-style inspection meetings, introducing a coordinator role, and modifying well-established processes as little as possible when introducing inspection. The study reports that this new technique improves process quality, effort efficiency and time efficiency compared to Fagan inspection. Study [1] falls in the high rigor high relevance category with a rigor score of 2.5 and a relevance score of 4.

Study [20] evaluates a new inspection technique that combines manual inspection with two static code analysis tools, namely Findbugs and PClint. The study reports that combining inspection and the tools improves inspection effectiveness compared to when inspection and the tools are used separately; however, the defect detection rate of the combined technique is not different from that of inspection and the tools used separately. Study [20] falls in the high rigor low relevance category with a rigor score of 2.5 and a relevance score of 1.

Study [27] evaluated a new approach to inspection, named the comparison-based approach, which has been designed to effectively catch certain types of program faults that are often domain-specific and difficult to catch by existing inspection techniques. The study reports that this approach improves inspection effectiveness and reduces inspection effort in terms of man hours. Study [27] falls in the low rigor low relevance category with a rigor score of 1 and a relevance score of 1.

Study [28] evaluates a repeatable collaborative code inspection process that was designed using Collaboration Engineering principles and techniques. The study reported that this inspection process improves inspection effectiveness compared to Fagan-style inspection, improves time efficiency and process quality, and is able to capture different types of defects. Study [28] falls in the high rigor low relevance category with a rigor score of 2 and a relevance score of 2.

Study [29] evaluates a revised source code review process for embedded software, namely the Selective Review Process (SRP), and reports that this process improves the defect detection rate and the source code quality. On the other hand, the study reported that some reviewers using this process found problems with its usability. Study [29] falls in the low rigor low relevance category with a rigor score of 1 and a relevance score of 1.

Study [30] evaluated a review approach (based on formal specifications) and a review tool to detect potential defects in the implemented programs. The study reports that this approach improves effectiveness and time efficiency. Study [30] falls in the low rigor low relevance category with a rigor score of 1 and a relevance score of 1.

Study [41] evaluates a geographically dispersed inspection process, which has also been implemented in WAIT (web-based artifact inspection tool), a web-based software system. The process extends Fagan's method and encourages inspection team members to perform a preliminary asynchronous discussion after a preparation phase and before an optional meeting. The study reports that the process improves time efficiency. Study [41] falls in the high rigor low relevance category with a rigor score of 3 and a relevance score of 0.


Study [64] evaluates a new inspection technique using inspection and change history. The methodology uses existing software with source code history. The study reports that the technique is capable of detecting domain-specific defects. Study [64] falls in the low rigor low relevance category with a rigor score of 1 and a relevance score of 1.

The studies in this category have different rigor and relevance scores; some of them are trustworthy as they have high rigor and relevance. However, the fact that each technique is evaluated by only one study makes it very hard to generalize the findings. More evaluation of the individual new inspection techniques is needed to draw solid conclusions and generalize the findings.

5.2.4.3 Support to inspection structure

5.2.4.3.1 Computer support to inspection process

This category evaluates the effect of computer support on the paper-based manual inspection process. Three studies [15, 25, 40] fall in this category. Study [15] reported improved effectiveness when using automated tools to support the inspection process, while study [25] reported that using automated tools does not affect inspection effectiveness. Studies [15, 40] reported improved time efficiency, in terms of the inspection interval, when using automated tools. Study [15] also reported improved code quality when using automated tools to support inspections. Studies [15, 40] fall in the low rigor low relevance category with rigor scores of 1 and relevance scores of 0 and 2 respectively. Study [25] falls in the high rigor low relevance category with a rigor score of 2.5 and a relevance score of 0. For all the reported benefits the studies are few in number, thus it is very hard to draw solid conclusions.

5.2.4.4 Support for re-inspection

This category evaluates techniques for estimating the fault content remaining after inspection sessions, facilitating re-inspection decision making. Three studies [33, 35, 55] fall in this category. Study [33] evaluated the experience-based capture-recapture estimator against the maximum-likelihood estimator; the study reported that the experience-based estimator gives significantly better estimates than the maximum-likelihood method and that the estimates are not very sensitive to changes in the inspection data. Study [35] evaluated a fault estimator for the Iterative Code Review (ICR) process and reported that the estimator was sufficiently accurate for determining when to stop the ICR process. Study [55] evaluated three capture-recapture models, namely jackknife, DPM, and EDPM; the study reported that no model is superior or generally better than the others, and all models seem to be very dependent on the actual data set being studied, such as the code document and the number of reviewers involved. Study [33] falls in the high rigor low relevance category with a rigor score of 2.5 and a relevance score of 0. Study [35] falls in the low rigor high relevance category with a rigor score of 1.5 and a relevance score of 3. Study [55] falls in the low rigor low relevance category with a rigor score of 1 and a relevance score of 0. All three studies evaluate different estimators, so there is not enough evidence on any single estimator, and it is hard to draw solid conclusions and generalize the results.
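To illustrate the general idea behind such estimators, the sketch below applies the basic two-sample Lincoln-Petersen capture-recapture estimate to hypothetical inspection data; it is not one of the specific models evaluated in [33, 35, 55], only a minimal example of how overlap between reviewers can be used to estimate total fault content and support a re-inspection decision.

def lincoln_petersen(found_by_a, found_by_b):
    """Basic two-sample capture-recapture estimate of the total defect count.

    found_by_a, found_by_b: sets of defect identifiers found by two independent reviewers.
    """
    overlap = len(found_by_a & found_by_b)
    if overlap == 0:
        raise ValueError("No overlap between reviewers; the estimate is undefined.")
    return len(found_by_a) * len(found_by_b) / overlap

# Hypothetical inspection data: reviewer A found 8 defects, reviewer B found 6, 4 in common.
reviewer_a = {f"d{i}" for i in range(1, 9)}
reviewer_b = {"d1", "d2", "d3", "d4", "d9", "d10"}

estimated_total = lincoln_petersen(reviewer_a, reviewer_b)   # 8 * 6 / 4 = 12
found_so_far = len(reviewer_a | reviewer_b)                   # 10 distinct defects found
print(f"Estimated total defects: {estimated_total:.0f}, found so far: {found_so_far}")
# A large estimated remainder would suggest that re-inspection is worthwhile.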


6 SURVEY

6.1 Theory and Methodology

6.1.1 Objective

The aim of conducting the survey is to explore the state of practice in static code analysis techniques in order to achieve the second study objective and to collect the quantitative data necessary to validate and generalize the findings of the systematic literature review. The data collected from the survey represent the opinions of real-world practitioners in a population. Exploring the state of practice will give information regarding which static code analysis techniques are used in industry and what benefits and limitations are reported for them, helping to answer research questions RQ6 and RQ7 and ultimately allowing guidelines to be given for practitioners (see Chapter 7). As explained in section 4.4.2, the survey is the most suitable method to contribute valuable knowledge to the existing literature because it is more effective and suitable for generalizing findings than case studies and experiments [107].

6.1.2 Data collection method

Interviews and questionnaires can be used to collect data in surveys. Interviews are more flexible, interactive, and provide better discussion management compared to questionnaires [108]. However, an online questionnaire was selected as the data collection method for the following reasons:

• Suitable for a larger number of responses.
• Requires less time and preparation effort.
• Easier to analyze, manage and validate data for a large number of responses.
• Allows modifiability, repeatability and extendibility of the survey to a larger population.
• Allows confidentiality; some respondents prefer confidentiality, and an online questionnaire allows respondents to optionally reveal their personal contact and company information [108].

6.1.3 Sample and population

The population of the survey consists of the people whose opinion is going to be collected. Sampling (selecting a subset of the population) is needed when it is impossible to study the complete population. The target population in our case is the software development community, especially professionals working in software quality assurance. This population is very large, it is not possible to identify and include all its members, and there is no single institution that maintains a database of software development organizations around the globe. Confronted with this situation, it was necessary to select a sample of the population. We chose convenience sampling for this study, where the most reachable subjects, such as communities and direct contacts, are selected to answer the survey questions [92]. Convenience sampling was selected because it is fast, easy, cost effective and time efficient [92]. In addition, the authors took advantage of easy access to:

• Personal, supervisor's, and friends' contacts in the industry.
• Professionals on different social media groups (especially on LinkedIn) associated with the topic area.

In this case convenience sampling does not affect validity, as each respondent has to define his or her context, and the survey mainly targeted companies from Sweden and the US. The survey also aimed to limit the population to organizations that explicitly claim to have used code analysis techniques heavily over the past years.


6.1.4 Questionnaire development

The questionnaire was developed to collect the information necessary to answer research questions RQ5 and RQ6. The questionnaire is divided into three main parts:

The first part captures demographic information. This part does not specifically collect data to answer any research question; rather, demographic information is important when drawing conclusions, and it will also reveal whether the results are biased or drifted toward a specific demographic element [104]. In this part the respondents are asked to provide the following demographics about themselves and their company:

• Current position.
• Experience in the current position.
• Experience with software development.
• Experience with static code analysis techniques.

Company information:

• Company location.
• Company size.
• Industrial sector.
• Software product type.
• Life cycle model used.

The second part of the questionnaire captures the practice information necessary to answer research question RQ5. In this part respondents are asked to provide information on which static code analysis techniques they frequently use; further, they are asked which techniques they have abandoned and for what reasons.

The third part captures the benefits and limitations of static code analysis techniques necessary to answer RQ6. The findings on benefits and limitations from the SLR are used as input for these questions; the reason is to ensure data consistency and allow comparing the data from the SLR and the survey. Finally, the respondents are optionally asked to enter their email address and company information, so confidentiality is preserved; respondents who enter their email will be given the results.

The complete survey questionnaire can be found in Appendix-B.

6.1.5 Questionnaire distribution

A first version of the questionnaire was piloted with six industrial professionals to make sure that the scientific terms and definitions used in the survey questions would be correctly interpreted by practitioners; their feedback was taken into consideration and the questionnaire was updated accordingly. Later the questionnaire was advertised in software engineering communities and LinkedIn groups, and through personal contacts in different companies. The online tool Survey Monkey (https://www.surveymonkey.com) was used to distribute the questionnaire and to collect the data. The survey remained open for four weeks during November 2013. In total, 183 respondents started the survey, 97 of whom completed it, and their responses were subject to analysis.

6.1.6 Validity threats

In the validity analysis of the survey research, we identified some possible threats, discussed them, and decided how to deal with them. We structured these threats and their mitigation strategies based on the work of Wohlin et al. [79] on experimentation in software engineering.


6.1.6.1 Internal Validity

Internal validity is concerned with the relationship between the treatment and the outcome of the research [79]; it tells us how sure we can be that the treatment actually caused the outcome. There could be factors that affect the outcome which we have no control over or have not measured.

There was a threat related to finding a representative group of participants with practical experience of different static analysis techniques. Relying solely on our personal contacts would have induced bias in the results, so in order to cover a wide population of subjects we decided to perform an online survey. In online surveys there is always a threat of getting responses from irrelevant subjects, but this risk was limited in our case because we focused on practitioners in the field of software engineering, particularly those interested in static code analysis. We advertised our survey across software testing and quality assurance communities and related groups on social media such as LinkedIn. In order to further mitigate the threat, we also advertised the survey link through personal contacts. This kind of advertising reduces the risk of irrelevant responses and helps to collect responses from subjects who are active and maintain a good interest in the field [115].

There is another threat associated with surveys: interaction between respondents can influence their responses. In our case this was very unlikely because most of our respondents were geographically distributed.

6.1.6.2 External Validity

External validity is concerned with the generalization of the results [79], that is, how representative our results are. We received a relatively high number of 183 responses; the incomplete responses were discarded, and in total 97 responses that fully answered the survey questionnaire were selected. The survey questionnaire was designed to get reliable responses from actual practitioners of static analysis techniques and to filter out irrelevant respondents.

We were able to attract respondents from quite mixed demographics. Organizations of all sizes participated, from very small (fewer than 50 employees) to very large (more than 4,500 employees). About 48% of respondents were from small and medium-sized enterprises with fewer than 250 employees, and about 52% were from larger enterprises with 250 to more than 4,500 employees. The largest share (about 35%) of respondents came from large enterprises with more than 4,500 employees.

Further, the respondents covered different roles in a software development project: project manager and product manager (4.8% each), development manager (6.7%), quality assurance (9.6%), process engineer (4.8%), verification and validation (2.9%), software architect (15.4%), and software developer (42.3%). This also indicates that many developers do the testing work themselves.

Also, we got responses from organizations involved in different kinds of product types: data-dominant software (e.g., web browsers, implementation tools, applications for displaying information, online booking, etc.) with 58%, control-domain software (e.g., hardware control, embedded software, real-time software, etc.) and system software (e.g., operating systems, support utilities, middleware, etc.) with an almost equal distribution of 27.9% and 24% respectively, and computation-dominant software (e.g., hardware control, information processing, etc.) with 15.4% of the respondents.


The responses came from different kinds of industries; the biggest portion, 29.8%, came from companies working in the computer hardware and software industry, followed by 23% from companies working in the internet industry.

Based on our experience, we believe that our respondents cover a wide range of different organizations. Hence, the results can be generalized.

6.1.6.3 Construct Validity

Construct validity is concerned with the measurements and instruments used to collect data, and whether we were measuring the right things [79]. There was a threat related to the influence of the researchers on the subjects. In order to eliminate this threat, it was decided to conduct a questionnaire-based survey. Further, to reach a larger number of possible participants, we decided to conduct it online, approaching practitioners in industry.

There was a threat related to the lists of practices as well as benefits and limitations of different static analysis techniques. These lists were developed from the findings of the systematic literature review, but there was still a risk associated with their completeness and relevance. We consulted an expert researcher and found no missing information.

Another threat was associated with the ambiguity of the questions. Simple language was used for the questions in the survey, and terms were properly defined where necessary in order to avoid confusion. In order to assess the understandability of the questions and the structure of the questionnaire, and to find any ambiguity in it, a pretest was performed in which a few researchers and practitioners were asked to give feedback on the survey questionnaire. Their feedback helped us to further refine the questionnaire into its final format.

6.1.6.4 Conclusion Validity

Conclusion validity is concerned with the ability to draw correct conclusions about the relationship between the treatment and the outcome [79]. There was a potential threat related to the misinterpretation of the results by the researchers. To overcome this threat, data analysis methods were discussed and results were reviewed by an experienced researcher, which partly reduced the risk of misinterpreting the results.

There was another threat associated with the misinterpretation of the data. This threat is already reduced because of the quantitative nature of the survey data; it would otherwise be high in the case of qualitative data.

6.2 Results

6.2.1 Demographics

6.2.1.1 Information about the respondents

People with different interests, roles and experience participate in the software development process [111]. Figure 6.1 shows the distribution of roles among the participants who answered the survey; out of the 183 participants who started the survey, 97 completed it, and only their responses are subject to analysis.

The respondents were asked to give their main role in a single-choice question. Software developers gave the most responses with 42.3%, followed by software architects with 15.4%. Project managers, product managers and process engineers gave an equal number of responses with 4.8% each, and no requirements engineer took the survey. Even though nearly half of the responses come from developers, having answers from almost all roles limits the bias of the answers toward a specific role.

Table 6.1 Overview of the survey participants experience

Minimum (Years) | Maximum (Years) | Mean | Standard Deviation
1 | 37 | 6.51 | 7.572

Table 6.1 shows an overview of the participants' experience in their current role: the minimum and maximum years of experience reported by participants, the mean and the standard deviation. The mean value of 6.51 years indicates a reasonable level of experience behind the opinions provided. The standard deviation is almost as large as the mean; however, at least 10% of the respondents have experience close to the mean value, and due to the large number of respondents, the participants' experience in their current role is considered trustworthy enough to provide valuable opinions.
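The descriptive statistics reported in Tables 6.1-6.3 are of the following form; the sketch below shows how they can be computed, using hypothetical years-of-experience values rather than the actual survey data, which are not reproduced here.

import statistics

# Hypothetical years-of-experience responses (placeholder data only).
experience_years = [1, 2, 3, 5, 5, 6, 8, 12, 20, 37]

summary = {
    "min": min(experience_years),
    "max": max(experience_years),
    "mean": round(statistics.mean(experience_years), 2),
    "stdev": round(statistics.stdev(experience_years), 3),  # sample standard deviation
}
print(summary)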

Figure 6.1 Overview of the survey participants (pie chart of roles: software developer 42.3%, software architect 15.4%, software quality assurance 9.6%, other 8.7%, software development manager 6.7%, project manager 4.8%, product manager 4.8%, software process engineer 4.8%, software verification & validation 2.9%, system analyst 0.0%)

Table 6.2 shows an overview of the participants' experience in software development: the minimum and maximum years of experience reported by participants, the mean and the standard deviation. The mean value of 17.90 years indicates a substantial level of experience behind the opinions provided. The standard deviation is almost two thirds of the mean; however, at least 30% of the respondents have experience close to the mean value, which is already high, and due to the large number of respondents the participants' experience in software development is trustworthy enough to provide valuable opinions.

Table 6.2 Overview of participants experience in software development

Minimum (Years) | Maximum (Years) | Mean | Standard Deviation
1 | 49 | 17.90 | 11.960

Table 6.3 shows an overview of the participants' experience with static code analysis techniques: the minimum and maximum years of experience reported by participants, the mean and the standard deviation. The mean value of 10.45 years indicates a substantial level of experience behind the opinions provided. The standard deviation is close to the mean; however, at least 10% of the respondents have experience close to the mean value, which is already high, and due to the large number of respondents the participants' experience with static code analysis is trustworthy enough to provide valuable opinions.


Table 6.3 Overview of participants experience in static code analysis techniques

Minimum (Years) | Maximum (Years) | Mean | Standard Deviation
1 | 35 | 10.45 | 8.962

6.2.1.2 Information about the organization

Organizational culture differs in terms of rigidity in decision-making depending on organizational size [109], and organizational size can influence which policies, techniques and tools are used. Figure 6.2 shows the distribution of responses with respect to organizational size. The largest portion of the responses, 34.6%, comes from large organizations with more than 4,500 employees, followed by 26% from organizations with fewer than 50 employees. The distribution of the responses is not significantly different among the four categories, which is a good indicator that the results are not biased toward a specific category and that respondents from all organization sizes answered the survey.

Figure 6.2 Distribution of organizational size (pie chart: more than 4,500 employees 34.6%, fewer than 50 employees 26.0%, 50-249 employees 22.1%, 250-4,499 employees 17.3%)

Figure 6.3 shows the distribution of responses with respect to a taxonomy of four product types, as proposed in study [110]. The figure reflects what software products are produced in the companies the survey respondents work for. 58% of the responses come from companies producing data-dominant software (e.g. web browsers, implementation tools, applications for displaying information, online booking, etc.). Control-domain software (e.g. hardware control, embedded software, real-time software) and system software (e.g. operating systems, support utilities, middleware) have an almost equal distribution with 27.9% and 24% respectively, and finally computation-dominant software (e.g. hardware control, information processing) and other have an equal distribution with 15.4% each. The other category includes companies that produce different software products such as virtualization, program analysis tools, and meta tools. The distribution shows diversity in the product types, even though a significant share of the responses came from companies producing data-dominant software.

Figure 6.3 Distribution of product types among the companies of survey respondents (bar chart: percentage of organizations per product development type — data-dominant, control-domain, systems, computation-dominant, and other software)


Figure 6.4 shows the industries investigated. The biggest portion of the responses, 29.8%, comes from companies working in the computer industry, either hardware or software, followed by 23% from companies working in the internet industry. A good observation here is that most of the industries provided responses to the survey, and it is encouraging that the biggest portion comes from the computer industry.


Figure 6.4 Distribution of industries among responses

Figure 6.5 represents the distribution of software development life cycle models among the responses. This is important, as the life cycle model can influence which code analysis technique is used. Companies using agile development models or hybrid models dominated by agile practices provided more than 50% of the responses, which indicates that static code analysis is used in agile practices. Respondents from companies using the spiral and waterfall models provided the fewest answers, with 1.9% and 4.8% respectively. Nevertheless, respondents from companies using all of the various life cycle models answered the survey.

[Pie chart data: Agile – 29.8%; Incremental – 13.5%; Other – 10.6%; Waterfall – 4.8%; Spiral – 1.9%; remainder split between the two hybrid process categories (agile-dominant and plan-driven-dominant)]

Figure 6.5 Distribution of software life cycle models among responses


6.2.2 Static code analysis techniques in practice

This part of the survey results represents the usage of static code analysis techniques in industry. The main objective of this part is to identify which static code analysis techniques are currently used in industry and how frequently. Collecting this information allows us to answer RQ5; further, it allows us to compare the survey findings against the SLR findings and see whether the static code analysis techniques proposed in research are actually used in industry, and to what extent.

6.2.2.1 Static code analysis techniques in practice – Global view

Survey respondents were asked to state which static code analysis techniques they use in their industrial practice, and to what extent. The respondents were asked to provide answers on the four main static code analysis techniques identified in the SLR, namely inspections, informal reviews, walkthroughs and static code analysis tools. For each technique the respondents were asked to state the frequency of usage in a Likert-style representation with six levels of usage: always, very frequently, occasionally, rarely, very rarely, and never (see question 9 in Appendix-C). Using a Likert-style representation to capture the frequency of usage is suitable here because it gives the respondent the flexibility to choose from a range of options indicating the usage frequency, and it allows the authors to capture the usage frequency precisely.

Figure 6.6 below represents the static code analysis techniques and their frequency of usage considering all 97 survey responses; it gives a global view. The X-axis represents the four static code analysis techniques and the Y-axis represents the number of responses in terms of percentage. The bars represent the number of responses for each usage category, expressed as a percentage of the total number of responses.
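The breakdown behind such a chart can be computed directly from the raw responses. The following is a minimal sketch (not the script actually used for this thesis), assuming the responses are exported to a table with one row per respondent and one column per technique holding the selected frequency label; the file name, column names and level labels are illustrative assumptions.

```python
# Minimal sketch: per-technique usage-frequency percentages as in Figure 6.6.
import pandas as pd

FREQUENCY_LEVELS = ["Never", "Very Rarely", "Rarely",
                    "Occasionally", "Very Frequently", "Always"]
TECHNIQUES = ["Inspections", "Informal Reviews",
              "Walkthroughs", "Static Analysis Tools"]

def usage_breakdown(responses: pd.DataFrame) -> pd.DataFrame:
    """Return, for each technique, the percentage of responses per usage level."""
    breakdown = {}
    for technique in TECHNIQUES:
        counts = responses[technique].value_counts()
        # Re-index so every level appears (missing levels count as 0),
        # then convert counts to percentages of the responses for that technique.
        counts = counts.reindex(FREQUENCY_LEVELS, fill_value=0)
        breakdown[technique] = 100 * counts / counts.sum()
    return pd.DataFrame(breakdown)

# Example usage with a hypothetical CSV export of the survey:
# responses = pd.read_csv("responses.csv")
# print(usage_breakdown(responses).round(1))
```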


Figure 6.6 Static code analysis techniques in practice - Global view

Figure 6.6 shows that the technique most frequently used by industrial professionals to analyze code and capture defects is static code analysis tools: 67% of the total respondents indicated that they use static analysis tools very frequently or always. Informal reviews come in second place, as 47% of the total respondents indicated they use them very frequently or always. On the other hand, 64% of the total respondents indicated they use inspections rarely, very rarely or have never used them, and 45% of respondents indicated the same for walkthroughs. These findings are supported by Figure 6.7 below, which reflects the techniques respondents had previously used but have since abandoned. Inspections come first, with 55.6% of respondents indicating that they stopped using them.


[Bar chart data: inspections – 55.6%; the remaining techniques range from 33.3% to 41.7%]

Figure 6.7 Static code analysis techniques abandoned by practitioners

We can see that, overall, industrial professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. In comparison, inspections and walkthroughs are rarely used.

6.2.2.2 Static code analysis techniques in practice – Company size view

In this section the survey results are filtered with respect to company size to see whether the company size has an influence on the use of the different static code analysis techniques, and whether the company-size view differs from the global view on practices. During the survey the respondents were given four options to enter their company size (see section 5.2.1 & question 5 in Appendix C); these four company size categories, shown in Figure 6.2, will be used to provide the different views on the usage of static code analysis techniques with respect to company size.

6.2.2.2.1 Companies with less than 50 employees

Companies in this category are usually very small companies. 26% of the total responses came from respondents working in companies with fewer than 50 employees.


Figure 6.8 Usage of static code analysis techniques in companies with less than 50 employees

Figure 6.8 shows that the static code analysis technique most used in small companies to analyze code and capture defects is static code analysis tools: in total, 56% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews came in second place, as 48% of the respondents indicated they use them very frequently or always. On the other hand, 55% of the respondents indicated they use inspections rarely, very rarely or have never used them, and 46% of respondents indicated the same for walkthroughs.

We can say that in small companies with fewer than 50 employees, industrial professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. On the other hand, inspections are rarely used, followed by walkthroughs.

6.2.2.2.2 Companies with 50 – 249 employees

A total of 22.1% of the survey respondents work in companies with 50 – 249 employees. Figure 6.9 shows that the static code analysis technique most used in these companies to analyze code and capture defects is static code analysis tools: 72% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews came in second place, as 47% of the respondents indicated they use them very frequently or always. On the other hand, 62% of the respondents indicated they use inspections rarely, very rarely or have never used them, and 29% of respondents indicated the same for walkthroughs.

We can say that in companies with 50 – 249 employees, professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. On the other hand, inspections and walkthroughs are rarely used.


Figure 6.9 Usage of static code analysis techniques in companies with 50 – 249 employees

6.2.2.2.3 Companies with 250 – 4,499 employees

17.3 % of the total survey respondents work in companies with 250 – 4,499 employees. The companies in this category are relatively large organizations.

Figure 6.10 shows that the static code analysis technique most used in companies with 250 – 4,499 employees to analyze code and capture defects is static code analysis tools: 67% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews came in second place, as 50% of the respondents indicated they use them very frequently or always. On the other hand, 62% of the respondents indicated they use inspections rarely, very rarely or have never used them, and 60% of respondents indicated the same for walkthroughs.


From these results we can conclude that professionals in this category use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. On the other hand, inspections are rarely used, followed by walkthroughs.


Figure 6.10 Usage of static analysis techniques in companies with 250 – 4,499 employees

6.2.2.2.4 Companies with more than 4,500 employees

In total, 34.6% of the survey respondents work in companies with more than 4,500 employees. The companies in this category are very large organizations.

Figure 6.11 shows that the static code analysis technique most used in companies with more than 4,500 employees to analyze code and capture defects is static code analysis tools: 72% of the respondents in this category indicated that they use static analysis tools very frequently or always. Inspections came in second place, as 51% of the respondents indicated they use them very frequently or always. Informal reviews come in third place, as 45% of the respondents indicated they use them very frequently or always. On the other hand, walkthroughs are used rarely in large companies: 42% of the respondents indicated they use them rarely, very rarely or have never used them.


Figure 6.11 Usage of static analysis techniques in companies with more than 4,500 employees


We conclude that, in companies with more than 4,500 employees, professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by inspections and then informal reviews. On the other hand, walkthroughs are rarely used.

6.2.2.3 Static code analysis techniques in practice – Product type view

In this section the survey results on practices are filtered with respect to software product type to see whether the product type has an influence on the use of the different static code analysis techniques, and whether the product-type view differs from the global and other views on practices. During the survey the respondents were given four options to enter the product type produced by their company (see section 5.2.1 & question 7 in Appendix C); these four product type categories will be used to provide the different views on the usage of static code analysis techniques with respect to product type.

6.2.2.3.1 Companies producing data-dominant software products

The biggest portion of respondents, 57 respondents comprising 58% of the 97 total respondents, work in companies that produce data-dominant software products.

Figure 6.12 shows that the static code analysis technique most used in companies producing data-dominant software to analyze code and capture defects is static code analysis tools: 59% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews came in second place, as 40% of the respondents indicated they use them very frequently or always. On the other hand, 53% of the respondents indicated they use inspections rarely, very rarely or have never used them, and 49% of respondents indicated the same for walkthroughs.

We conclude that, in companies producing data-dominant software, professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. On the other hand, walkthroughs and inspections are rarely used.


Figure 6.12 Usage of static code analysis techniques in companies producing data-dominant software

6.2.2.3.2 Companies producing control-domain software products

In total, 25 survey respondents out of 97, comprising 27.9%, work in companies that produce control-domain software.


Figure 6.13 shows that the static code analysis technique most used in companies producing control-domain software to analyze code and capture defects is static code analysis tools: 80% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews and inspections came next, with 60% and 56% respectively. On the other hand, 44% of the respondents indicated they use walkthroughs rarely, very rarely or have never used them.

We conclude that, in companies producing control-domain software, professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews and inspections. On the other hand, walkthroughs are rarely used.


Figure 6.13 Usage of static analysis techniques in companies producing control-domain software

6.2.2.3.3 Companies producing system software products

In total, 26 survey respondents out of the 97 total respondents, comprising 24%, work in companies that produce system software.


Figure 6.14 Usage of static code analysis techniques in companies producing system software

Figure 6.14 shows that the static code analysis technique most used in companies producing system software to analyze code and capture defects is static code analysis tools: 66% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews came in second place, as 60% of the respondents indicated they use them very frequently or always. On the other hand, 27% of the respondents indicated they use inspections and walkthroughs rarely, very rarely or have never used them. We conclude that, in companies producing system software, professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. On the other hand, walkthroughs and inspections are rarely used.

6.2.2.3.4 Companies producing computation-dominant software products

In total, 14 respondents out of the 97 total respondents, comprising 15.4%, work in companies that produce computation-dominant software products.


Figure 6.15 Usage of Static code analysis techniques in companies producing computation-dominant software

Figure 6.15 shows that the static code analysis technique most used in companies producing computation-dominant software to analyze code and capture defects is static code analysis tools: 72% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews and inspections came in second and third place, with 71% and 57% respectively. On the other hand, 41% of the respondents indicated they use walkthroughs rarely, very rarely or have never used them.

We can conclude that, in companies producing computation-dominant software, professionals use static code analysis tools widely and very frequently to capture defects in the code, followed by informal reviews and inspections. On the other hand, walkthroughs are rarely used.

6.2.2.4 Static code analysis techniques in practice – Software life cycle model view

In this section the survey results on practices are filtered with respect to the software life cycle model used at the company, to see whether the life cycle model has an influence on the use of the different static code analysis techniques and whether the life-cycle-model view differs from the global and other views on practices. During the survey the respondents were given six options to enter the life cycle model used by their company (see section 6.2.1 and question 8 in Appendix C); these six life cycle model categories will be used to provide the different views on the usage of static code analysis techniques with respect to the software development life cycle model.


6.2.2.4.1 Companies using waterfall model

Only 6 respondents out of the 97 total respondents, comprising 4.8%, work in companies which use the waterfall lifecycle model to develop their software products.

Figure 6.16 shows the usage of the different static code analysis techniques in companies using the waterfall model. However, with only 6 respondents in this category, the sample is too small to draw solid conclusions.


Figure 6.16 Usage of Static code analysis techniques in companies using the waterfall model

6.2.2.4.2 Companies using incremental model

Only 15 respondents out of the 97 total respondents, comprising 13.5%, work in companies which use the incremental model to develop their software products.


Figure 6.17 Usage of Static code analysis techniques in companies using the incremental model

Figure 6.17 shows that the static code analysis technique most used in companies which use the incremental model as a software life cycle model is informal reviews: 72% of the respondents in this category indicated that they use informal reviews very frequently or always. Static analysis tools come in second place with 67%. On the other hand, 53% of the respondents indicated they use walkthroughs rarely, very rarely or have never used them, and 47% of respondents indicated the same for inspections.

We conclude that, in companies using the incremental model to develop their software, professionals use informal reviews widely and very frequently to capture defects in the code, followed by static analysis tools. On the other hand, walkthroughs and inspections are rarely used.

6.2.2.4.3 Companies using spiral model

Only 2 survey respondents out of 97 total respondents work for companies that use the spiral lifecycle model to develop their software products.


Figure 6.18 Usage of static code analysis techniques in companies using the spiral model

Figure 6.18 represents the usage of the different static code analysis techniques in companies using the spiral model. However, with only 2 respondents in this category, the sample is too small to draw solid conclusions.

6.2.2.4.4 Companies using agile model

In total, 28 survey respondents out of the 97 total respondents, comprising 29.8%, work in companies using agile methods as a lifecycle model to develop their software products.

Figure 6.19 shows that the static code analysis technique most used in companies using agile methods as a software life cycle model is static analysis tools: 78% of the respondents in this category indicated that they use static analysis tools very frequently or always. Informal reviews come in second place with 67%, followed by walkthroughs with 37%. On the other hand, 56% of the respondents indicated they use inspections rarely, very rarely or have never used them.

We conclude that in companies using agile methods to develop their software, professionals use static analysis tools widely and very frequently to capture defects in the code, followed by informal reviews and walkthroughs. On the other hand, inspections are rarely used.



Figure 6.19 Usage of static code analysis techniques in companies using the agile model

6.2.2.4.5 Companies using hybrid model (dominated by agile practices)

In total, 24 survey respondents out of the 97 total respondents, comprising 23.1%, work in companies that use hybrid models dominated by agile practices as a lifecycle model to develop their software products.

Figure 6.20 shows that the static code analysis technique most used in companies using hybrid models dominated by agile practices as a software life cycle model is static analysis tools. In total, 54% of the respondents in this category indicated that they use static analysis tools very frequently or always, followed by informal reviews with 34%. On the other hand, 71% of the respondents indicated they use walkthroughs rarely, very rarely or have never used them.


Figure 6.20 Usage of static analysis techniques in companies using hybrid models dominated by agile practices

For inspections, 42% of respondents indicated they use them very frequently or always, but 50% also indicated they use them very rarely or have never used them. For informal reviews, 43% of respondents indicated they use them very frequently or always, but the same percentage also indicated they use them rarely, very rarely or have never used them.


We conclude that, in companies using hybrid models dominated by agile practices to develop their software, professionals use static analysis tools widely and very frequently to capture defects in the code, followed by informal reviews. On the other hand, walkthroughs are rarely used. For inspections the results do not show a clear conclusion.

6.2.2.4.6 Companies using hybrid model (dominated by plan-driven practices)

13 survey respondents out of the 97 total respondents, comprising 13.6%, work in companies using hybrid models dominated by plan-driven practices as a lifecycle model to develop their software products.

Figure 6.21 shows that the static code analysis technique most used in companies which use hybrid models dominated by plan-driven practices as a software life cycle model is static analysis tools. In total, 69% of the respondents in this category indicated that they use static analysis tools very frequently or always. Inspections and informal reviews come in second place with 50% each; however, 41% also indicated they use informal reviews rarely, very rarely or have never used them, while for inspections 43% indicated that. On the other hand, 49% of the respondents indicated they use walkthroughs rarely, very rarely or have never used them.

We conclude that, in companies using hybrid models dominated by plan-driven practices to develop their software, professionals use static analysis tools widely and very frequently to capture defects in the code, followed by inspections and then informal reviews. On the other hand, walkthroughs are rarely used.


Figure 6.21 Usage of static analysis techniques in companies using hybrid models dominated by plan-driven practices

6.2.3 Benefits and limitations of different static analysis techniques from industry professionals

This section presents the benefits and limitations of static code analysis techniques obtained through the opinions of industrial professionals who have experience in software development and in using the different static code analysis techniques. The professionals provided their opinions on the benefits and limitations of the same four static code analysis techniques evaluated in the SLR, namely:

• Inspections
• Informal Reviews
• Walkthroughs
• Static code analysis tools


The survey respondents were asked to provide their agreement level on the benefits and limitations of the four techniques above. The benefits and limitations are represented in terms of seven variables. These seven variables match the variables used by researchers to investigate the benefits and limitations of the techniques summarized in Table 5.12; the matching of variables is done to ensure consistency and to allow comparing the benefits and limitations obtained from the SLR with those obtained from the survey. This matching is shown in Table 6.4 below.

After matching the variables, the survey respondents were asked to evaluate each technique with respect to each variable by providing their agreement/disagreement level. The agreement level is measured on a 5-point Likert-style scale ranging from strongly agree (5) to strongly disagree (1) (see questions 12 – 15 in Appendix-B).

However, a Likert-style presentation on its own is not sufficient to measure the significance of the variance between the techniques with respect to the seven perceived variables representing the benefits and limitations [116]; for example, it cannot tell whether there is a significant difference in the effectiveness of inspections compared to static code analysis tools. Capturing the opinions on a 5-point scale does, however, allow us to perform statistical analysis and obtain more precise results on the benefits and limitations.
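To prepare these ratings for such analysis, the agreement labels can be mapped onto the numerical codes 5 – 1. The sketch below is a hypothetical illustration of that coding step; the column-naming scheme (one column per variable/technique combination) is an assumption, not the actual survey export format.

```python
# Minimal sketch: coding the 5-point agreement scale numerically.
import pandas as pd

AGREEMENT_CODES = {"Strongly agree": 5, "Agree": 4, "Undecided": 3,
                   "Disagree": 2, "Strongly disagree": 1}
TECHNIQUES = ["Inspection", "Informal Reviews",
              "Walkthroughs", "Static Analysis Tools"]

def code_agreement(responses: pd.DataFrame, variable: str) -> pd.DataFrame:
    """Return a respondents x techniques matrix of numeric ratings for one
    outcome variable (e.g. effectiveness), dropping incomplete answers."""
    columns = [f"{variable}_{t}" for t in TECHNIQUES]   # assumed naming scheme
    coded = responses[columns].replace(AGREEMENT_CODES)
    coded.columns = TECHNIQUES
    return coded.dropna()

# Example usage with a hypothetical responses table:
# effectiveness = code_agreement(responses, "effectiveness")
```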

Table 6.4 Definition of the seven variables evaluated in the SLR and in the Survey

1. Effectiveness – Definition: the degree to which a static code analysis technique is successful in producing a desired result. Term in SLR: Effectiveness. Term in survey: Effectiveness.
2. False positives – Definition: the ability of a technique to produce a lower number of false positives. Term in SLR: Effectiveness (the number of false positives is reported in the SLR as a measuring criterion for the effectiveness variable). Term in survey: False positive.
3. Ease of use – Definition: the quality of the process, technique or tool in terms of understandability, ease of use, and users' satisfaction. Term in SLR: Technique/tool quality. Term in survey: Ease of use.
4. Fault content – Definition: the ability of a static code analysis technique to detect different types of faults. Term in SLR: Fault content. Term in survey: Fault content.
5. Cost efficiency – Definition: the economical ability of a static code analysis technique to accomplish a job with a minimum expenditure of cost, time and effort. Terms in SLR: Cost effectiveness, Effort efficiency, Time efficiency. Term in survey: Cost efficiency.
6. Internal code quality – Definition: the quality of the source code under analysis in terms of readability, maintainability and participants' understandability. Term in SLR: Internal code quality. Term in survey: Internal product quality.
7. Product quality – Definition: the quality of the final product in terms of defects reduced because of early verification. Term in SLR: External code quality. Term in survey: Product quality.

To allow precise testing of which technique is superior to the others with respect to each variable, statistical analysis [116] will be performed to test if the different techniques (inspection, informal reviews, walkthroughs, and static code analysis tools) significantly vary with respect to each outcome variable (e.g. effectiveness, ease of use, etc.). For this purpose, a Null and Alternate hypothesis are formulated as follows:

Null hypothesis: There is no significant difference in the perception of the outcome variable between the different static code analysis techniques.

Alternate hypothesis: There is a significant difference in the perception of the outcome variable between the different static code analysis techniques.

After formulating the hypotheses, a statistical test needs to be selected to test them; the Friedman test is used to test the two hypotheses. The Friedman test is the non-parametric alternative to the one-way Analysis of Variance (ANOVA) with repeated measures; it is used to test for differences between groups when the dependent variable being measured is ordinal [116].
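As an illustration of this step, the following sketch shows how the Friedman test could be applied to one outcome variable using a standard statistics library; it assumes the ratings matrix built earlier (one column of numeric Likert codes per technique) and is not the tool actually used for the analysis.

```python
# Minimal sketch: Friedman test across the four related samples (one per technique).
from scipy.stats import friedmanchisquare

def friedman_on(ratings):
    """Return the chi-square statistic and asymptotic significance (p-value)."""
    stat, p_value = friedmanchisquare(ratings["Inspection"],
                                      ratings["Informal Reviews"],
                                      ratings["Walkthroughs"],
                                      ratings["Static Analysis Tools"])
    return stat, p_value

# chi2, p = friedman_on(effectiveness)
# if p < 0.05:
#     ...  # reject the null hypothesis and follow up with pairwise tests
```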

In this study, the independent variable is the static code analysis technique, which has four groups: inspections, informal reviews, walkthroughs, and static code analysis tools. The dependent variables are the seven variables representing the benefits and limitations (effectiveness, false positives, ease of use, etc.).

The reasons, which apply to our data set, for choosing Friedman test are:

• We have one within-subjects independent variable with 4 or more levels.
• The independent variable has more than 2 groups; this is fulfilled in our study since our independent variable has 4 groups: inspections, static analysis tools, walkthroughs, and informal reviews.
• The data is collected from the same participants.
• Our dependent variable is ordinal and not normally distributed.
• One group is measured on more than three different occasions; this is fulfilled in our study since each of our groups is measured on five occasions using the Likert style.
• Our dependent variable is measured at the ordinal level (Likert style).
• Our samples do not need to be normally distributed.

Depending on the outcome of the Friedman test, if the significance value (Asymp. Sig.) is below the threshold (0.05), this indicates a significant difference [116], and thus the null hypothesis will be rejected. In this case the alternate hypothesis needs to be tested, but the Friedman test is not capable of performing pairwise comparisons [116]; for example, the Friedman test can tell us whether there is a significant difference between the effectiveness of the different static code analysis techniques, but it cannot compare two techniques and tell which one is significantly more effective than the other.

To test the alternate hypothesis, a post hoc test is needed to follow up the results of the Friedman test. The Friedman test will therefore be complemented by the Wilcoxon signed-rank test to determine where the difference occurred; the Wilcoxon test compares the techniques in a pairwise fashion. At the end of the test the four techniques are ranked according to their impact on the variable, e.g. starting with the most effective one.

The Wilcoxon signed-rank test is a nonparametric test which does not assume normality in the data; it can be used when this assumption has been violated and the use of the dependent t-test is inappropriate. It is used to compare two sets of scores that come from the same participants [116].


When performing the Wilcoxon tests, a Bonferroni correction will be made to adjust the significance value (Asymp. Sig. = .05) used in the Friedman test, because, unlike the Friedman test, more than one test is performed when carrying out the pairwise comparisons with the Wilcoxon test, as shown below.

The Bonferroni adjustment accepts a result as significant only if its significance value is less than (Asymp. Sig. = .05)/number of comparisons. In our case we have 4 groups, so if we compare all of the groups we get six comparisons:

• Test 1: Inspections vs informal reviews
• Test 2: Inspections vs static analysis tools
• Test 3: Inspections vs walkthroughs
• Test 4: Static analysis tools vs informal reviews
• Test 5: Static analysis tools vs walkthroughs
• Test 6: Informal reviews vs walkthroughs

Thus the significance threshold for the Wilcoxon tests will be 0.05/6 = 0.0083, and this value will be used to judge the significance of the differences between the techniques.
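The post hoc step can be sketched in the same way: all six pairwise Wilcoxon signed-rank tests are run and each p-value is compared against the corrected threshold. The function and variable names below are illustrative assumptions, not the tool actually used.

```python
# Minimal sketch: pairwise Wilcoxon signed-rank tests with Bonferroni correction.
from itertools import combinations
from scipy.stats import wilcoxon

TECHNIQUES = ["Inspection", "Informal Reviews",
              "Walkthroughs", "Static Analysis Tools"]
ALPHA = 0.05 / 6  # Bonferroni-adjusted threshold for six comparisons (~0.0083)

def pairwise_wilcoxon(ratings):
    """Run a Wilcoxon signed-rank test for every pair of techniques and flag
    the pairs whose difference is significant at the corrected level."""
    results = []
    for a, b in combinations(TECHNIQUES, 2):
        stat, p_value = wilcoxon(ratings[a], ratings[b])
        results.append((a, b, p_value, p_value < ALPHA))
    return results

# for a, b, p, significant in pairwise_wilcoxon(effectiveness):
#     print(f"{a} vs {b}: p = {p:.4f}  significant: {significant}")
```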

The application of these two tests (Friedman followed by Wilcoxon) is demonstrated in the following sections for each dependent variable. The full statistical test results are available in Appendix C.

6.2.3.1 Effectiveness

In this section the effectiveness of the four static code analysis techniques is examined by applying the statistical tests to the data sets representing the survey respondents' opinions on effectiveness. As a result of the statistical tests, the techniques will be ranked starting with the most effective technique according to the practitioners' opinions. The statistical tests will be performed first on the data set representing all survey respondents; then the data set will be filtered with respect to company size, software product type and software development model used. This will give us different views on effectiveness for different company sizes, software product types and development models. Later the same procedure will be applied to each of the seven variables representing the benefits and limitations.
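The same test can be repeated per demographic subgroup to obtain the rows of a table like Table 6.5. The sketch below illustrates this filtering step under the assumption that the ratings table also carries a demographic column (e.g. company size); the column names are hypothetical.

```python
# Minimal sketch: Friedman test on the full sample and then per subgroup.
import pandas as pd
from scipy.stats import friedmanchisquare

TECHNIQUES = ["Inspection", "Informal Reviews",
              "Walkthroughs", "Static Analysis Tools"]

def friedman_by_group(ratings, group_column=None):
    """Yield (group label, N, chi-square, p) rows, one per subgroup; with
    group_column=None a single global row is produced."""
    if group_column is None:
        groups = [("Global view", ratings)]
    else:
        groups = list(ratings.groupby(group_column))
    for label, subset in groups:
        samples = [subset[t] for t in TECHNIQUES]
        stat, p = friedmanchisquare(*samples)
        yield label, len(subset), round(stat, 3), round(p, 3)

# Example usage with a hypothetical table that also has a "company_size" column:
# for row in friedman_by_group(effectiveness_with_demographics, "company_size"):
#     print(row)
```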

Table 6.5 Effectiveness – Friedman test statistics

                                                   N    Chi-Square   df   Asymp. Sig.
Global view                                        97   8.257        3    0.041
Company size view
  Less than 50 employees                           27   3.994        3    0.262
  50 – 249 employees                               21   14.660       3    0.002
  250 – 4,499 employees                            16   0.468        3    0.926
  More than 4,500 employees                        33   0.195        3    0.978
Product type view
  Data-dominant software                           57   1.624        3    0.654
  Control-domain software                          26   3.771        3    0.287
  System software                                  26   3.049        3    0.384
  Computation-dominant software                    14   2.072        3    0.558
Software life cycle model view
  Waterfall                                        6    4.500        3    0.212
  Incremental                                      15   4.402        3    0.221
  Spiral                                           2    1.400        3    0.706
  Agile                                            28   11.818       3    0.008
  Hybrid (agile with few plan-driven practices)    24   3.877        3    0.275
  Hybrid (plan-driven with few agile practices)    13   1.467        3    0.690


Table 6.6 Effectiveness – Wilcoxon signed-rank test statistics (Z with two-tailed Asymp. Sig. in parentheses; Insp = inspections, Rev = informal reviews, Walk = walkthroughs, SAT = static analysis tools)

                                                   Rev–Insp          Walk–Insp         SAT–Insp          Walk–Rev          SAT–Rev           SAT–Walk
Global view                                        -1.370b (0.171)   -0.416b (0.677)   -1.141c (0.254)   -1.103c (0.270)   -1.913c (0.056)   -1.412c (0.158)
Company size view
  Less than 50 employees                           -0.875b (0.382)   -0.918c (0.359)   -0.444c (0.657)   -2.230c (0.026)   -1.054c (0.292)   -0.456b (0.648)
  50 – 249 employees                               -1.466b (0.143)   -1.807b (0.071)   -1.613c (0.107)   -0.351b (0.725)   -2.195c (0.028)   -2.913c (0.004)
  250 – 4,499 employees                            -0.265b (0.791)   -0.137c (0.891)   -0.604c (0.546)   -0.575c (0.565)   -0.575c (0.566)   -0.485c (0.628)
  More than 4,500 employees                        -0.269b (0.788)   -0.484b (0.629)   -0.119c (0.905)   -0.121b (0.903)   -0.400c (0.689)   -0.484c (0.628)
Product type view
  Data-dominant software                           -0.582b (0.561)   -0.444b (0.657)   -0.120c (0.904)   -0.029c (0.976)   -0.302c (0.763)   -0.411c (0.681)
  Control-domain software                          -1.625b (0.104)   -1.308b (0.191)   -0.033c (0.974)   -0.294c (0.768)   -1.254c (0.210)   -0.962c (0.336)
  System software                                  -0.595b (0.552)   -1.201b (0.230)   -0.275c (0.783)   -0.720b (0.472)   -0.605c (0.545)   -1.430c (0.153)
  Computation-dominant software                    -0.647b (0.518)   -1.192b (0.233)   -1.043c (0.297)   -0.965b (0.335)   -1.192c (0.233)   -1.724c (0.085)
Software life cycle model view
  Waterfall                                        0.000b (1.000)    -1.342c (0.180)   -1.342c (0.180)   -1.342c (0.180)   -1.342c (0.180)   0.000b (1.000)
  Incremental                                      -1.922b (0.055)   -0.471b (0.638)   -1.134b (0.257)   -1.897c (0.058)   -1.268c (0.205)   -0.279b (0.780)
  Spiral                                           -1.000b (0.317)   -1.000b (0.317)   -1.000b (0.317)   -1.000b (0.317)   -0.447b (0.655)   0.000c (1.000)
  Agile                                            -2.372b (0.018)   -0.863b (0.388)   -0.521c (0.602)   -2.112c (0.035)   -2.167c (0.030)   -1.326c (0.185)
  Hybrid (agile with few plan-driven practices)    -0.940b (0.347)   -0.103c (0.918)   -1.152b (0.249)   -0.829c (0.407)   -0.383b (0.702)   -1.762b (0.078)
  Hybrid (plan-driven with few agile practices)    -1.098b (0.272)   -0.486b (0.627)   -0.811b (0.417)   -1.000c (0.317)   -0.289c (0.773)   -0.426b (0.670)

b. Based on positive ranks.
c. Based on negative ranks.

6.2.3.1.1 Global view

In this section the effectiveness of static code analysis techniques is reflected through a global view representing all survey respondents working for companies with different sizes, producing different products and using different development models. Here the survey data on effectiveness will not be filtered with respect to any specific demographic element mentioned earlier.


Figure 6.22 Effectiveness – Global view

Figure 6.22 shows the agreement level for all techniques with respect to effectiveness, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side; the different techniques are plotted on the Y-axis. All survey respondents were asked to rate the techniques on 5 agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The 5 agreement levels are masked with the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.
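The values plotted in such a diverging chart can be derived from the coded ratings by collapsing the agreement levels. The sketch below is an illustrative assumption of how this could be computed (agreeing = codes 4 – 5, disagreeing = codes 1 – 2); it is not the charting script used for the thesis.

```python
# Minimal sketch: agree/disagree percentages for a diverging Likert chart.
import pandas as pd

def diverging_percentages(ratings: pd.DataFrame) -> pd.DataFrame:
    """For each technique column of numeric Likert codes (1..5), return the
    percentage agreeing (4 or 5) and the negated percentage disagreeing (1 or 2)."""
    total = len(ratings)
    agree = 100 * (ratings >= 4).sum() / total
    disagree = -100 * (ratings <= 2).sum() / total   # negative => left of axis
    return pd.DataFrame({"agree": agree, "disagree": disagree})

# print(diverging_percentages(effectiveness).round(1))
```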


The output of the Friedman test is shown in Table 6.5; the mean ranks for all techniques as well as the test statistics are reported in the full output. In the test statistics, the chi-square value is shown along with the degrees of freedom (df) and the significance level (Asymp. Sig.), which is used to judge the null hypothesis.


Looking at the test statistics, the significance value is 0.041, which is less than the threshold value set at p = 0.05; this indicates that there is a significant difference in the effectiveness of the four static code analysis techniques.

Wilcoxon tests were used to follow up this finding and perform the pairwise comparisons between each pair of techniques with respect to effectiveness; six pairwise tests were performed, as shown in Table 6.6.

All effects are reported at a 0.0083 level of significance. The output of the Wilcoxon tests is shown in Table 6.6, and the full statistical results can be found in Appendix B.

Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -1.370, p = 0.171) or between walkthroughs and inspection (Z = -.416, p = 0.667) or between static analysis tools and inspection (Z = -1.41, p = 0.254) or between walkthroughs and informal reviews (Z = -1.103, p = 0.270) or between static analysis tools and informal reviews (Z = -1.913, p = 0.056) or between static analysis tools and walkthroughs (Z = -1.412, p = 0.158).

Conclusion for the global view: For perceived effectiveness, overall industrial practitioners perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.2 Company size view

In this section the perceived effectiveness is filtered with respect to company size, to see whether the company size has an influence on the practitioners' opinion of the effectiveness of the different static code analysis techniques, and whether the company-size view differs from the global view.

6.2.3.1.2.1 Companies with less than 50 employees

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = 3.994, p = 0.262). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.875, p = 0.382) or between walkthroughs and inspection (Z = - .918, p = .359) or between walkthroughs and informal reviews (Z = -2.230, p = 0.026) or between static analysis tools and inspection (Z = -.444, p = .657) or between static analysis tools and informal reviews (Z = -1.054, p = .292) or between static analysis tools and walkthroughs (Z = -.456, p = .648).

Conclusions for companies with less than 50 employees: Industrial practitioners working for companies with less than 50 employees perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.2.2 Companies with 50 – 249 employees

Friedman test indicated a significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = 14.660, p = 0.002). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -1.466, p = 0.143) or between walkthroughs and inspection (Z = -1.807, p = .071) or between walkthroughs and informal reviews (Z = -.351, p = 0.725) or between static analysis tools and inspection (Z = -1.613, p = .107) or between static analysis tools and informal reviews (Z = -2.195, p = .028), but a significant difference for static analysis tools over walkthroughs (Z = -2.913, p = .004).


Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that static analysis tools are significantly more effective than walkthroughs, but not significantly more effective than inspections and informal reviews. The techniques can be ranked as follows starting with the most effective:

1. Static analysis tools, inspections, informal reviews
2. Walkthroughs

[Diverging Likert charts of perceived effectiveness for each company size category: less than 50 employees, 50 – 249 employees, 250 – 4,499 employees, and more than 4,500 employees]

Figure 6.23 Effectiveness Likert chart – Company size view


6.2.3.1.2.3 Companies with 250-4,499 employees

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = .468, p = 0.926). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.265, p = 0.791) or between walkthroughs and inspection (Z = - .137, p = .891) or between walkthroughs and informal reviews (Z = -.575, p = 0.565) or between static analysis tools and inspection (Z = -.604, p = .546) or between static analysis tools and informal reviews (Z = -.575, p = .566), or between static analysis tools over walkthroughs (Z = -.458, p = .628).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.2.4 Companies with more than 4,500 employees

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = .195, p = 0.978). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.269, p = 0.788) or between walkthroughs and inspection (Z = - .484, p = .629) or between walkthroughs and informal reviews (Z = -.121, p = 0.903) or between static analysis tools and inspection (Z = -.119, p = .905) or between static analysis tools and informal reviews (Z = -.400, p = .689), or between static analysis tools over walkthroughs (Z = -.484, p = .628).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.3 Software product type view

In this section the perceived effectiveness is filtered with respect to the software product type produced by the company, to see whether the product type has an influence on the perceived effectiveness of the different static code analysis techniques.

6.2.3.1.3.1 Companies producing data-dominant software

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = 1.624, p = 0.654). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.582, p = 0.561) or between walkthroughs and inspection (Z = - .444, p = .657) or between walkthroughs and informal reviews (Z = -.029, p = 0.967) or between static analysis tools and inspection (Z = -.120, p = .904) or between static analysis tools and informal reviews (Z = -.302, p = .763), or between static analysis tools over walkthroughs (Z = -.411, p = .681).

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that none of the four static code analysis techniques is significantly more effective than the others.


[Diverging Likert charts of perceived effectiveness for each product type: data-dominant, control-domain, system, and computation-dominant software]

Figure 6.24 Effectiveness Likert chart - Product type view

6.2.3.1.3.2 Companies producing control-domain software

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = 3.771, p = 0.287). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -1.625, p = 0.104) or between walkthroughs and inspection (Z = -1.308, p = .191) or between walkthroughs and informal reviews (Z = -.294, p = 0.768) or between static analysis tools and inspection (Z = -.033, p = .974) or between static analysis tools and informal reviews (Z = -1.254, p = .210), or between static analysis tools and walkthroughs (Z = -.962, p = .336).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.3.3 Companies producing system software

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = 3.049, p = 0.384). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.595, p = 0.552) or between walkthroughs and inspection (Z = - 1.201, p = .230) or between walkthroughs and informal reviews (Z = -.720, p = 0.472) or between static analysis tools and inspection (Z = -.275, p = .783) or between static analysis tools and informal reviews (Z = -.605, p = .545), or between static analysis tools and walkthroughs (Z = -1.430, p = .153).

Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.3.4 Companies producing computation-dominant software

Friedman test indicated no significant difference in perceived effectiveness of different static code analysis techniques (χ2(2) = 2.072, p = 0.558). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.674, p = 0.518) or between walkthroughs and inspection (Z = - 1.192, p = .233) or between walkthroughs and informal reviews (Z = -.965, p = 0.335) or between static analysis tools and inspection (Z = -1.043, p = .297) or between static analysis tools and informal reviews (Z = -1.192, p = .233), or between static analysis tools and walkthroughs (Z = -1.724, p = .085).

Conclusions for companies producing computation-dominant software: industrial practitioners working for companies producing computation-dominant software perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.4 Software life cycle model view

In this section the perceived effectiveness is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model used has an influence on the perceived effectiveness of the different static code analysis techniques.


[Diverging Likert charts of perceived effectiveness for each software life cycle model: waterfall, incremental, agile, agile-dominant hybrid, and plan-driven hybrid]

Figure 6.25 Effectiveness Likert chart – Software life cycle model view


6.2.3.1.4.1 Waterfall

Friedman test indicated no significant difference in the perceived effectiveness of the different static code analysis techniques (χ2(3) = 4.500, p = 0.212). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.000, p = 1.000), between walkthroughs and inspection (Z = -1.342, p = .180), between walkthroughs and informal reviews (Z = -1.342, p = .180), between static analysis tools and inspection (Z = -1.342, p = .180), between static analysis tools and informal reviews (Z = -1.342, p = .180), or between static analysis tools and walkthroughs (Z = -.000, p = 1.000).

Conclusions for companies using waterfall as a life cycle model: industrial practitioners working for companies using the waterfall model perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.4.2 Incremental model

Friedman test indicated no significant difference in the perceived effectiveness of the different static code analysis techniques (χ2(3) = 4.402, p = 0.221). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -1.922, p = .055), between walkthroughs and inspection (Z = -.471, p = .638), between walkthroughs and informal reviews (Z = -1.897, p = .058), between static analysis tools and inspection (Z = -1.134, p = .257), between static analysis tools and informal reviews (Z = -1.268, p = .205), or between static analysis tools and walkthroughs (Z = -.279, p = .780).

Conclusions for companies using incremental model as a life cycle model: industrial practitioners working for companies using the incremental model perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.4.3 Spiral

Friedman test indicated no significant difference in the perceived effectiveness of the different static code analysis techniques (χ2(3) = 1.400, p = 0.706). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -1.000, p = .317), between walkthroughs and inspection (Z = -1.000, p = .317), between walkthroughs and informal reviews (Z = -1.000, p = .317), between static analysis tools and inspection (Z = -1.000, p = .317), between static analysis tools and informal reviews (Z = -.447, p = .655), or between static analysis tools and walkthroughs (Z = -.000, p = 1.000).

Conclusions for companies using spiral model as a life cycle model: industrial practitioners working for companies using the spiral model perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.4.4 Agile

Friedman test indicated a significant difference in the perceived effectiveness of the different static code analysis techniques (χ2(3) = 11.818, p = 0.008). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. At this corrected level, the Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -2.372, p = .018), between walkthroughs and inspection (Z = -.863, p = .388), between walkthroughs and informal reviews (Z = -2.112, p = .035), between static analysis tools and inspection (Z = -.521, p = .602), between static analysis tools and informal reviews (Z = -2.167, p = .030), or between static analysis tools and walkthroughs (Z = -1.326, p = .185).

Conclusions for companies using Agile models: industrial practitioners working for companies using Agile models perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

Friedman test indicated no significant difference in the perceived effectiveness of the different static code analysis techniques (χ2(3) = 3.877, p = 0.275). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -.940, p = .347), between walkthroughs and inspection (Z = -.103, p = .918), between walkthroughs and informal reviews (Z = -.829, p = .407), between static analysis tools and inspection (Z = -1.152, p = .249), between static analysis tools and informal reviews (Z = -.383, p = .702), or between static analysis tools and walkthroughs (Z = -1.762, p = .078).

Conclusions for companies using a hybrid process (dominated by agile practices, with few plan-driven practices): industrial practitioners working for companies using such a hybrid process perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.1.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

Friedman test indicated no significant difference in the perceived effectiveness of the different static code analysis techniques (χ2(3) = 1.467, p = 0.690). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived effectiveness between informal reviews and inspection (Z = -1.098, p = .272), between walkthroughs and inspection (Z = -.486, p = .627), between walkthroughs and informal reviews (Z = -1.000, p = .317), between static analysis tools and inspection (Z = -.811, p = .417), between static analysis tools and informal reviews (Z = -.289, p = .773), or between static analysis tools and walkthroughs (Z = -.426, p = .670).

Conclusions for companies using a hybrid process (dominated by plan-driven practices, with few agile practices): industrial practitioners working for companies using such a hybrid process perceive that none of the four static code analysis techniques is significantly more effective than the others.

6.2.3.2 Number of false positives

In this section, different views on the number of false positives produced by the different static code analysis techniques will be presented. Practitioners' opinions on the number of false positives are filtered in the same way as the effectiveness data in the previous section (section 6.2.3.1).

Table 6.7 Number of false positives – Friedman test statistics

View                                                              N    Chi-Square   df   Asymp. Sig.
Global view                                                       97   30.859       3    0.000
Company size: < 50 employees                                      27   20.424       3    0.000
Company size: 50 – 249 employees                                  21   7.468        3    0.058
Company size: 250 – 4,499 employees                               16   1.330        3    0.722
Company size: > 4,500 employees                                   33   14.033       3    0.003
Product type: Data-dominant software                              57   11.189       3    0.011
Product type: Control-domain software                             26   18.789       3    0.000
Product type: System software                                     26   10.603       3    0.014
Product type: Computation-dominant software                       14   5.755        3    0.124
Life cycle model: Waterfall                                       6    6.429        3    0.093
Life cycle model: Incremental                                     15   11.660       3    0.009
Life cycle model: Spiral                                          2    4.000        3    0.261
Life cycle model: Agile                                           28   5.899        3    0.117
Life cycle model: Hybrid (agile with few plan-driven practices)   24   2.790        3    0.425
Life cycle model: Hybrid (plan-driven with few agile practices)   13   13.011       3    0.005

Table 6.8 Number of false positives – Wilcoxon signed-rank test statistics
(Z and Asymp. Sig. (2-tailed); Insp = inspection, Rev = informal reviews, Walk = walkthroughs, SAT = static analysis tools)

                                        Rev-Insp   Walk-Insp  SAT-Insp   Walk-Rev   SAT-Rev    SAT-Walk
Global view                     Z       -.167b     .000c      -4.255b    -.476d     -3.603b    -4.092b
                                Sig.    0.867      1.000      0.000      0.634      0.000      0.000
Company size view
< 50 employees                  Z       -1.327b    -.632b     -3.232b    -.849c     -2.329b    -3.351b
                                Sig.    0.185      0.527      0.001      0.396      0.020      0.001
50 – 249 employees              Z       -1.941b    -.884b     -2.364b    -1.069c    -.775b     -1.238b
                                Sig.    0.052      0.377      0.018      0.285      0.438      0.216
250 – 4,499 employees           Z       -.921b     -.106b     -.366c     -.689c     -.955c     -.184c
                                Sig.    0.357      0.916      0.715      0.491      0.340      0.854
> 4,500 employees               Z       -1.530b    -1.072b    -2.310c    -.225c     -2.787c    -2.772c
                                Sig.    0.126      0.284      0.021      0.822      0.005      0.006
Product type view
Data-dominant software          Z       -.126b     -.971b     -2.941b    -.731b     -2.549b    -2.032b
                                Sig.    0.900      0.331      0.003      0.465      0.011      0.042
Control-domain software         Z       -.423b     .000c      -3.463b    -.553d     -2.719b    -3.356b
                                Sig.    0.672      1.000      0.001      0.580      0.007      0.001
System software                 Z       -1.467b    -.723b     -2.998b    -1.069c    -1.581b    -1.837b
                                Sig.    0.142      0.470      0.003      0.285      0.114      0.066
Computation-dominant software   Z       -.184b     -.359c     -1.941b    -.905c     -1.112b    -1.674b
                                Sig.    0.854      0.720      0.052      0.366      0.266      0.094
Software life cycle model view
Waterfall                       Z       -1.000b    -1.342b    -1.414c    -1.000b    -1.342c    -1.633c
                                Sig.    0.317      0.180      0.157      0.317      0.180      0.102
Incremental                     Z       -1.000b    -1.100b    -2.405c    -.333b     -2.631c    -2.504c
                                Sig.    0.317      0.271      0.016      0.739      0.009      0.012
Spiral                          Z       .000b      -1.000c    -1.000d    -1.000c    -1.000d    -1.414d
                                Sig.    1.000      0.317      0.317      0.317      0.317      0.157
Agile                           Z       -.690b     -1.039b    -1.567b    -.091c     -.562b     -.912b
                                Sig.    0.490      0.299      0.117      0.928      0.574      0.362
Hybrid (agile with few
plan-driven practices)          Z       -.782b     -.612b     -1.753b    -.247c     -.794b     -.956b
                                Sig.    0.434      0.540      0.080      0.805      0.427      0.339
Hybrid (plan-driven with few
agile practices)                Z       -.333b     -.632b     -2.289c    -.302b     -2.507c    -2.714c
                                Sig.    0.739      0.527      0.022      0.763      0.012      0.007

b. Based on positive ranks. c. Based on negative ranks.
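The pairwise comparisons reported in this table can be reproduced for any view with six Wilcoxon signed-rank tests and the Bonferroni-corrected level of 0.05/6 ≈ .0083. A minimal Python sketch of the idea (the Likert scores below are made up, and SciPy reports the test statistic and p-value rather than the SPSS Z value):

from itertools import combinations
from scipy.stats import wilcoxon

# Hypothetical Likert scores (1-5) from the same respondents for each technique.
scores = {
    "Inspection":       [4, 5, 3, 4, 4, 2, 5, 3],
    "Informal reviews": [3, 4, 3, 4, 5, 2, 4, 3],
    "Walkthroughs":     [3, 4, 2, 4, 4, 3, 4, 3],
    "Static tools":     [4, 3, 3, 5, 4, 2, 3, 4],
}

alpha = 0.05 / 6  # Bonferroni correction for six pairwise comparisons (~0.0083)

for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
    stat, p = wilcoxon(a, b)  # paired, two-sided by default
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name_a} vs {name_b}: p = {p:.3f} ({verdict})")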

6.2.3.2.1 Global view

[Likert chart: for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.26 False positives generated by different static analysis techniques – Global view

Figure 6.26 shows the agreement level for all techniques with respect to the number of false positives produced, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side of the X-axis, while the different techniques are plotted on the Y-axis. Survey respondents were asked to rate the techniques on five agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The five agreement levels are mapped to the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.
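A minimal sketch of this mapping, assuming a hypothetical pandas Series of raw answer strings (the variable name and answer spellings are ours, not the survey instrument's):

import pandas as pd

likert_map = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Undecided": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

answers = pd.Series(["Agree", "Strongly agree", "Undecided", "Disagree"])
scores = answers.map(likert_map)  # numeric values used in the statistical tests
print(scores.tolist())            # [4, 5, 3, 2]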

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 30.859, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -.167, p = .867), between walkthroughs and inspection (Z = -.000, p = 1.000), or between walkthroughs and informal reviews (Z = -.476, p = .634), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -4.255, p = .000), with informal reviews (Z = -3.603, p = .000), and with walkthroughs (Z = -4.092, p = .000).

Conclusions for the global view: Overall, industrial practitioners perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools
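Rankings like this follow the mean ranks that underlie the Friedman test. A minimal sketch of how such mean ranks could be computed, with made-up scores and column names of our own:

import pandas as pd

# Hypothetical Likert scores, one row per respondent, one column per technique.
scores = pd.DataFrame({
    "Inspection":       [2, 3, 2, 4],
    "Informal reviews": [2, 2, 3, 3],
    "Walkthroughs":     [3, 2, 2, 3],
    "Static tools":     [4, 4, 5, 5],
})

# Rank the techniques within each respondent (ties get average ranks),
# then average over respondents; a higher mean rank means stronger agreement.
mean_ranks = scores.rank(axis=1).mean().sort_values()
print(mean_ranks)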

6.2.3.2.2 Company size view

6.2.3.2.2.1 Companies with less than 50 employees

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 20.424, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -1.327, p = .185), between walkthroughs and inspection (Z = -.632, p = .527), or between walkthroughs and informal reviews (Z = -.849, p = .396), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -3.232, p = .001), with informal reviews (Z = -2.329, p = .020), and with walkthroughs (Z = -3.351, p = .001).

Conclusions for companies with less than 50 employees: industrial practitioners working for companies with less than 50 employees perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools

6.2.3.2.2.2 Companies with 50 – 249 employees

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 7.468, p = 0.058). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -1.941, p = .052), between walkthroughs and inspection (Z = -.884, p = .377), between walkthroughs and informal reviews (Z = -1.069, p = .285), between static analysis tools and inspection (Z = -2.364, p = .018), between static analysis tools and informal reviews (Z = -.775, p = .438), or between static analysis tools and walkthroughs (Z = -1.238, p = .216).

Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that none of the static code analysis techniques produces a significantly higher number of false positives than the others. The techniques can be ranked in the following way.

1. Static analysis tools, inspection, informal reviews, walkthroughs

[Likert chart: four panels – Less than 50 employees, 50 – 249 employees, 250 – 4,499 employees, and More than 4,500 employees – each plotting, for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.27 Number of false positives Likert chart – Company size view
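The Likert charts in this chapter are diverging stacked bar charts. A minimal matplotlib sketch of the idea, using made-up percentages rather than the survey data:

import matplotlib.pyplot as plt

techniques = ["Inspection", "Informal Reviews", "Walkthroughs", "Static Analysis Tools"]
# Made-up percentages of respondents per answer category.
strongly_disagree = [5, 4, 6, 10]
disagree          = [15, 18, 20, 30]
agree             = [55, 50, 48, 40]
strongly_agree    = [20, 22, 18, 10]

fig, ax = plt.subplots()
# Disagreement extends to the left (negative), agreement to the right (positive).
ax.barh(techniques, [-d for d in disagree], color="orange", label="Disagree")
ax.barh(techniques, [-sd for sd in strongly_disagree],
        left=[-d for d in disagree], color="red", label="Strongly disagree")
ax.barh(techniques, agree, color="lightgreen", label="Agree")
ax.barh(techniques, strongly_agree, left=agree, color="green", label="Strongly agree")
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("Percentage of practitioners")
ax.legend(loc="lower right")
plt.tight_layout()
plt.show()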


6.2.3.2.2.3 Companies with 250-4,499 employees

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 1.330, p = 0.722). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -.921, p = .357), between walkthroughs and inspection (Z = -.106, p = .916), between walkthroughs and informal reviews (Z = -.689, p = .491), between static analysis tools and inspection (Z = -.366, p = .715), between static analysis tools and informal reviews (Z = -.955, p = .340), or between static analysis tools and walkthroughs (Z = -.184, p = .854).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the static code analysis techniques produces a significantly higher number of false positives than the others. The techniques can be ranked in the following way.

1. Static analysis tools, inspection, informal reviews, Walkthroughs

6.2.3.2.2.4 Companies with more than 4,500 employees

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 14.033, p = 0.003). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -1.530, p = .126), between walkthroughs and inspection (Z = -1.072, p = .284), or between walkthroughs and informal reviews (Z = -.225, p = .822), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -2.310, p = .021), with informal reviews (Z = -2.787, p = .005), and with walkthroughs (Z = -2.772, p = .006).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools

6.2.3.2.3 Software product type view

In this section, the data set of the perceived number of false positives is filtered with respect to the software product type produced by the company, to see whether the product type has an influence on the perceived number of false positives of the different static code analysis techniques.

6.2.3.2.3.1 Companies producing data-dominant software

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 11.189, p = 0.011). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -.126, p = .900), between walkthroughs and inspection (Z = -.971, p = .331), or between walkthroughs and informal reviews (Z = -.731, p = .465), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -2.941, p = .003), with informal reviews (Z = -2.549, p = .011), and with walkthroughs (Z = -2.032, p = .042).

[Likert chart: four panels – Data-dominant software, Control-domain software, System software, and Computation-dominant software – each plotting, for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.28 Number of false positives Likert chart – Product type view

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools


6.2.3.2.3.2 Companies producing control-domain software

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 18.789, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -.423, p = .672), between walkthroughs and inspection (Z = -.000, p = 1.000), or between walkthroughs and informal reviews (Z = -.553, p = .580), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -3.463, p = .001), with informal reviews (Z = -2.719, p = .007), and with walkthroughs (Z = -3.356, p = .001).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools

6.2.3.2.3.3 Companies producing system software

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 10.603, p = 0.014). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -1.467, p = .142), between walkthroughs and inspection (Z = -.723, p = .470), between walkthroughs and informal reviews (Z = -1.069, p = .285), between static analysis tools and informal reviews (Z = -1.581, p = .114), or between static analysis tools and walkthroughs (Z = -1.837, p = .066), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -2.998, p = .003).

Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools

6.2.3.2.3.4 Companies producing computation-dominant software

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 5.775, p = 0.124). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -.184, p = .854), between walkthroughs and inspection (Z = -.359, p = .720), between walkthroughs and informal reviews (Z = -.905, p = .366), between static analysis tools and inspection (Z = -1.941, p = .052), between static analysis tools and informal reviews (Z = -1.112, p = .266), or between static analysis tools and walkthroughs (Z = -1.674, p = .094).


Conclusions for companies producing computation-dominant software: Industrial practitioners working for companies producing computation-dominant software perceive that none of the static code analysis techniques produces a significantly higher number of false positives than the others. The techniques can be ranked in the following way.

1. Static analysis tools, inspection, informal reviews, Walkthroughs

6.2.3.2.4 Software life cycle model view

In this section, the data set on the perceived number of false positives is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model in use has an influence on the perceived number of false positives of the different static code analysis techniques.

6.2.3.2.4.1 Waterfall

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 6.429, p = 0.093). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -1.000, p = .317), between walkthroughs and inspection (Z = -1.342, p = .180), between walkthroughs and informal reviews (Z = -1.000, p = .317), between static analysis tools and inspection (Z = -1.414, p = .157), between static analysis tools and informal reviews (Z = -1.342, p = .180), or between static analysis tools and walkthroughs (Z = -1.633, p = .102).

Conclusions for companies using waterfall model as a life cycle model: industrial practitioners working for companies using the waterfall model perceive that none of the static code analysis techniques significantly produce a higher number of false positives than the others. The techniques can be ranked in the following way.

1. Static analysis tools, inspection, informal reviews, Walkthroughs

6.2.3.2.4.2 Incremental model

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 11.660, p = 0.009). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -1.000, p = .317), between walkthroughs and inspection (Z = -1.100, p = .271), or between walkthroughs and informal reviews (Z = -.333, p = .739), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -2.998, p = .003), with informal reviews (Z = -2.631, p = .009), and with walkthroughs (Z = -2.504, p = .012).

Conclusions for companies using incremental model as a life cycle model: industrial practitioners working for companies using the incremental model perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools


[Likert chart: five panels – Waterfall model, Incremental model, Agile model, Agile-dominated hybrid model, and Plan-driven hybrid model – each plotting, for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.29 Number of false positives Likert chart – Software life cycle model view


6.2.3.2.4.3 Spiral model

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 4.000, p = 0.261). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -.000, p = 1.000), between walkthroughs and inspection (Z = -1.000, p = .317), between walkthroughs and informal reviews (Z = -1.000, p = .317), between static analysis tools and inspection (Z = -1.000, p = .317), between static analysis tools and informal reviews (Z = -1.000, p = .317), or between static analysis tools and walkthroughs (Z = -1.414, p = .157).

Conclusions for companies using spiral model as a life cycle model: industrial practitioners working for companies using the spiral model perceive that none of the static code analysis techniques produces a significantly higher number of false positives than the others. The techniques can be ranked in the following way.

1. Static analysis tools, inspection, informal reviews, Walkthroughs

6.2.3.2.4.4 Agile

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 5.899, p = 0.117). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -.690, p = .490), between walkthroughs and inspection (Z = -1.039, p = .299), between walkthroughs and informal reviews (Z = -.091, p = .928), between static analysis tools and inspection (Z = -1.567, p = .117), between static analysis tools and informal reviews (Z = -.562, p = .574), or between static analysis tools and walkthroughs (Z = -.912, p = .362).

Conclusions for companies using Agile models: industrial practitioners working for companies using Agile models perceive that none of the static code analysis techniques significantly produce a higher number of false positives than the others. The techniques can be ranked in the following way.

1. Static analysis tools, inspection, informal reviews, Walkthroughs

6.2.3.2.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

Friedman test indicated no significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 2.790, p = 0.425). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the produced number of false positives between informal reviews and inspection (Z = -.782, p = .434), between walkthroughs and inspection (Z = -.612, p = .540), between walkthroughs and informal reviews (Z = -.247, p = .805), between static analysis tools and inspection (Z = -1.753, p = .080), between static analysis tools and informal reviews (Z = -.794, p = .427), or between static analysis tools and walkthroughs (Z = -.956, p = .339).

Conclusions for companies using a hybrid process (dominated by agile practices, with few plan-driven practices): Industrial practitioners working for companies using such a hybrid process perceive that none of the static code analysis techniques produces a significantly higher number of false positives than the others. The techniques can be ranked in the following way:


1. Static analysis tools, inspection, informal reviews, walkthroughs

6.2.3.2.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

Friedman test indicated a significant difference in the perceived number of false positives produced by the different static code analysis techniques (χ2(3) = 13.011, p = 0.005). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in the perceived number of false positives between informal reviews and inspection (Z = -.333, p = .739), between walkthroughs and inspection (Z = -.632, p = .527), or between walkthroughs and informal reviews (Z = -.302, p = .763), but a statistically significant difference in the number of false positives produced by static analysis tools compared with inspection (Z = -2.289, p = .022), with informal reviews (Z = -2.507, p = .012), and with walkthroughs (Z = -2.714, p = .007).

Conclusions for companies using a hybrid process (dominated by plan-driven practices, with few agile practices): Industrial practitioners working for companies using such a hybrid process perceive that static analysis tools produce a significantly higher number of false positives than inspections, informal reviews and walkthroughs. The techniques can be ranked as follows, starting with the ones producing the fewest false positives:

1. Inspections, informal reviews, walkthroughs
2. Static analysis tools

6.2.3.3 Fault content

In this section, different views on the fault content which can be detected by different static code analysis techniques will be presented.

Table 6.9 Fault content – Friedman test statistics

View                                                              N    Chi-Square   df   Asymp. Sig.
Global view                                                       97   13.328       3    0.004
Company size: < 50 employees                                      27   6.822        3    0.078
Company size: 50 – 249 employees                                  21   10.296       3    0.016
Company size: 250 – 4,499 employees                               16   0.568        3    0.904
Company size: > 4,500 employees                                   33   3.871        3    0.276
Product type: Data-dominant software                              57   1.765        3    0.623
Product type: Control-domain software                             26   6.695        3    0.082
Product type: System software                                     26   4.803        3    0.187
Product type: Computation-dominant software                       14   4.618        3    0.202
Life cycle model: Waterfall                                       6    7.615        3    0.055
Life cycle model: Incremental                                     15   6.989        3    0.072
Life cycle model: Spiral                                          2    1.286        3    0.733
Life cycle model: Agile                                           28   7.652        3    0.054
Life cycle model: Hybrid (agile with few plan-driven practices)   24   1.538        3    0.674
Life cycle model: Hybrid (plan-driven with few agile practices)   13   5.964        3    0.113


Table 6.10 Fault content – Wilcoxon signed-rank test statistics
(Z and Asymp. Sig. (2-tailed); Insp = inspection, Rev = informal reviews, Walk = walkthroughs, SAT = static analysis tools)

                                        Rev-Insp   Walk-Insp  SAT-Insp   Walk-Rev   SAT-Rev    SAT-Walk
Global view                     Z       -.874b     -.062c     -2.121c    -1.024c    -2.524c    -1.797c
                                Sig.    0.382      0.950      0.034      0.306      0.012      0.072
Company size view
< 50 employees                  Z       -.758b     -.565c     -1.602c    -1.941c    -1.719c    -.953c
                                Sig.    0.449      0.572      0.109      0.052      0.086      0.341
50 – 249 employees              Z       -1.826b    -2.179b    -1.221c    -.265b     -2.310c    -2.184c
                                Sig.    0.068      0.029      0.222      0.791      0.021      0.029
250 – 4,499 employees           Z       -.439b     -.471b     -.275b     -.491b     -.240b     -.122c
                                Sig.    0.660      0.638      0.783      0.623      0.810      0.903
> 4,500 employees               Z       -.626b     -.737b     -1.246b    -.244b     -.920b     -.537b
                                Sig.    0.531      0.461      0.213      0.807      0.358      0.591
Product type view
Data-dominant software          Z       -.217b     -.413b     -.758b     -.178b     -.708b     -.310b
                                Sig.    0.828      0.679      0.449      0.859      0.479      0.756
Control-domain software         Z       -1.513b    -.645b     -1.324c    -.566c     -2.024c    -1.353c
                                Sig.    0.130      0.519      0.185      0.572      0.043      0.176
System software                 Z       -.595b     -.816b     -1.342c    -.291b     -1.469c    -1.556c
                                Sig.    0.552      0.415      0.180      0.771      0.142      0.120
Computation-dominant software   Z       -.879b     -.977b     -1.513c    -.351b     -1.897c    -1.768c
                                Sig.    0.380      0.329      0.130      0.725      0.058      0.077
Software life cycle model view
Waterfall                       Z       -1.414b    -1.414c    -1.342c    -2.000c    -1.890c    -.577c
                                Sig.    0.157      0.157      0.180      0.046      0.059      0.564
Incremental                     Z       -1.725b    -.277c     -1.100c    -2.251c    -2.326c    -.551c
                                Sig.    0.084      0.782      0.271      0.024      0.020      0.582
Spiral                          Z       -1.000b    -1.000b    -1.000b    .000c      -.447b     -.447b
                                Sig.    0.317      0.317      0.317      1.000      0.655      0.655
Agile                           Z       -1.728b    -1.383b    -.741c     -.250c     -2.172c    -2.287c
                                Sig.    0.084      0.167      0.458      0.803      0.030      0.022
Hybrid (agile with few
plan-driven practices)          Z       -.728b     -.553c     -.406b     -1.417c    -.365c     -.669b
                                Sig.    0.467      0.580      0.685      0.156      0.715      0.503
Hybrid (plan-driven with few
agile practices)                Z       -.264b     -1.190b    -2.126b    -1.134b    -1.408b    -.703b
                                Sig.    0.792      0.234      0.033      0.257      0.159      0.482

b. Based on positive ranks. c. Based on negative ranks.

6.2.3.3.1 Global view

[Likert chart: for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.30 Fault content – Global view

Figure 6.30 shows the agreement level for all techniques with respect to the types of faults which can be detected by the different static code analysis techniques, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side of the X-axis, while the different techniques are plotted on the Y-axis. Survey respondents were asked to rate the techniques on five agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The five agreement levels are mapped to the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.

Friedman test indicated a statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 13.328, p = 0.004). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.874, p = .382), between walkthroughs and inspection (Z = -.062, p = .950), between walkthroughs and informal reviews (Z = -1.024, p = .306), between static analysis tools and inspection (Z = -2.121, p = .034), between static analysis tools and informal reviews (Z = -2.524, p = .012), or between static analysis tools and walkthroughs (Z = -1.797, p = .072).

Conclusions for the global view: Overall, industrial practitioners perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, Static analysis tools.


6.2.3.3.2 Company size view

[Likert chart: four panels – Less than 50 employees, 50 – 249 employees, 250 – 4,499 employees, and More than 4,500 employees – each plotting, for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.31 Fault content Likert chart – Company size view

6.2.3.3.2.1 Companies with less than 50 employees

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 6.822, p = 0.078). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.758, p = .449), between walkthroughs and inspection (Z = -.565, p = .572), between walkthroughs and informal reviews (Z = -1.941, p = .052), between static analysis tools and inspection (Z = -1.602, p = .109), between static analysis tools and informal reviews (Z = -1.719, p = .086), or between static analysis tools and walkthroughs (Z = -.953, p = .341).

Conclusions for companies with less than 50 employees: industrial practitioners working for companies with less than 50 employees perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.2.2 Companies with 50 – 249 employees

Friedman test indicated a statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 10.296, p = 0.016). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -1.826, p = .068), between walkthroughs and inspection (Z = -2.179, p = .029), between walkthroughs and informal reviews (Z = -.265, p = .791), between static analysis tools and inspection (Z = -1.221, p = .222), between static analysis tools and informal reviews (Z = -2.310, p = .021), or between static analysis tools and walkthroughs (Z = -2.184, p = .029).

Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.2.3 Companies with 250-4,499 employees

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = .568, p = 0.904). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.439, p = .660), between walkthroughs and inspection (Z = -.471, p = .638), between walkthroughs and informal reviews (Z = -.491, p = .623), between static analysis tools and inspection (Z = -.275, p = .783), between static analysis tools and informal reviews (Z = -.240, p = .810), or between static analysis tools and walkthroughs (Z = -.122, p = .903).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.2.4 Companies with more than 4,500 employees

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 3.871, p = 0.276). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.626, p = .531), between walkthroughs and inspection (Z = -.737, p = .461), between walkthroughs and informal reviews (Z = -.244, p = .807), between static analysis tools and inspection (Z = -1.246, p = .213), between static analysis tools and informal reviews (Z = -.920, p = .358), or between static analysis tools and walkthroughs (Z = -.537, p = .591).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, Static analysis tools.

6.2.3.3.3 Software product type view

In this section, the data set on perceived fault content is filtered with respect to the software product type produced by the company, to see whether the product type has an influence on the perceived fault content captured by the different static code analysis techniques.

6.2.3.3.3.1 Companies producing data-dominant software

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 1.765, p = 0.623). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.217, p = .828), between walkthroughs and inspection (Z = -.413, p = .679), between walkthroughs and informal reviews (Z = -.178, p = .859), between static analysis tools and inspection (Z = -.758, p = .449), between static analysis tools and informal reviews (Z = -.708, p = .479), or between static analysis tools and walkthroughs (Z = -.310, p = .756).

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, Static analysis tools.

6.2.3.3.3.2 Companies producing control-domain software

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 6.695, p = 0.082). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -1.513, p = .130), between walkthroughs and inspection (Z = -.654, p = .519), between walkthroughs and informal reviews (Z = -.566, p = .572), between static analysis tools and inspection (Z = -1.324, p = .185), between static analysis tools and informal reviews (Z = -2.024, p = .043), or between static analysis tools and walkthroughs (Z = -1.353, p = .176).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, Static analysis tools.

[Likert chart: four panels – Data-dominant software, Control-domain software, System software, and Computation-dominant software – each plotting, for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.32 Fault content Likert chart – Product type view

6.2.3.3.3.3 Companies producing system software

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 4.803, p = 0.187). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.595, p = .552), between walkthroughs and inspection (Z = -.816, p = .415), between walkthroughs and informal reviews (Z = -.291, p = .771), between static analysis tools and inspection (Z = -1.342, p = .180), between static analysis tools and informal reviews (Z = -1.469, p = .142), or between static analysis tools and walkthroughs (Z = -1.556, p = .120).

Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, Static analysis tools.

6.2.3.3.3.4 Companies producing computation-dominant software

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 4.618, p = 0.202). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.879, p = .380), between walkthroughs and inspection (Z = -.977, p = .329), between walkthroughs and informal reviews (Z = -.351, p = .725), between static analysis tools and inspection (Z = -1.513, p = .130), between static analysis tools and informal reviews (Z = -1.897, p = .058), or between static analysis tools and walkthroughs (Z = -1.768, p = .077).

Conclusions for companies producing computation-dominant software: Industrial practitioners working for companies producing computation-dominant software perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.4 Software life cycle model view

In this section, the data set on perceived fault content is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model in use has an influence on the perceived fault content captured by the different static code analysis techniques.

6.2.3.3.4.1 Waterfall model

Friedman test indicated no statistically significant difference in the perceived fault content which can be detected by the different static code analysis techniques (χ2(3) = 7.615, p = 0.055). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -1.414, p = .157), between walkthroughs and inspection (Z = -1.414, p = .157), between walkthroughs and informal reviews (Z = -2.000, p = .046), between static analysis tools and inspection (Z = -1.342, p = .180), between static analysis tools and informal reviews (Z = -1.890, p = .059), or between static analysis tools and walkthroughs (Z = -.577, p = .564).


[Likert chart: five panels – Waterfall, Incremental, Agile, Plan-driven hybrid process, and Agile-dominant hybrid process – each plotting, for Inspection, Informal Reviews, Walkthroughs and Static Analysis Tools, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree.]

Figure 6.33 Fault content Likert chart – Software life cycle model view


Conclusions for companies using waterfall model as a life cycle model: industrial practitioners working for companies using the waterfall model perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, Static analysis tools.

6.2.3.3.4.2 Incremental model

The Friedman test indicated no statistically significant difference in the perceived fault content that can be detected by the different static code analysis techniques (χ2(3) = 6.989, p = 0.072). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -1.725, p = .084), between walkthroughs and inspection (Z = -.277, p = .782), between walkthroughs and informal reviews (Z = -2.251, p = .024), between static analysis tools and inspection (Z = -1.100, p = .271), between static analysis tools and informal reviews (Z = -2.326, p = .020), or between static analysis tools and walkthroughs (Z = -.551, p = .582).

Conclusions for companies using the incremental model as a life cycle model: industrial practitioners working for companies using the incremental model perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.4.3 Spiral model

The Friedman test indicated no statistically significant difference in the perceived fault content that can be detected by the different static code analysis techniques (χ2(3) = 1.286, p = 0.733). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -1.000, p = .317), between walkthroughs and inspection (Z = -1.000, p = .317), between walkthroughs and informal reviews (Z = -.000, p = 1.000), between static analysis tools and inspection (Z = -1.000, p = .317), between static analysis tools and informal reviews (Z = -.447, p = .655), or between static analysis tools and walkthroughs (Z = -.447, p = .655).

Conclusions for companies using the spiral model as a life cycle model: industrial practitioners working for companies using the spiral model perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.4.4 Agile model

The Friedman test indicated no statistically significant difference in the perceived fault content that can be detected by the different static code analysis techniques (χ2(3) = 7.652, p = 0.054). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -1.728, p = .084), between walkthroughs and inspection (Z = -1.383, p = .167), between walkthroughs and informal reviews (Z = -.250, p = .803), between static analysis tools and inspection (Z = -.741, p = .458), between static analysis tools and informal reviews (Z = -2.172, p = .030), or between static analysis tools and walkthroughs (Z = -2.287, p = .022).

Conclusions for companies using Agile models: industrial practitioners working for companies using Agile models perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

The Friedman test indicated no statistically significant difference in the perceived fault content that can be detected by the different static code analysis techniques (χ2(3) = 1.538, p = 0.674). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.728, p = .467), between walkthroughs and inspection (Z = -.553, p = .580), between walkthroughs and informal reviews (Z = -1.417, p = .156), between static analysis tools and inspection (Z = -.406, p = .685), between static analysis tools and informal reviews (Z = -.365, p = .715), or between static analysis tools and walkthroughs (Z = -.669, p = .503).

Conclusions for companies using a hybrid process (dominated by agile practices, with few plan-driven practices): Industrial practitioners working for companies using a hybrid process dominated by agile practices, with few plan-driven practices, perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.3.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

The Friedman test indicated no statistically significant difference in the perceived fault content that can be detected by the different static code analysis techniques (χ2(3) = 5.964, p = 0.113). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived fault content between informal reviews and inspection (Z = -.264, p = .792), between walkthroughs and inspection (Z = -1.190, p = .234), between walkthroughs and informal reviews (Z = -1.134, p = .257), between static analysis tools and inspection (Z = -2.126, p = .033), between static analysis tools and informal reviews (Z = -1.408, p = .159), or between static analysis tools and walkthroughs (Z = -.703, p = .482).

Conclusions for companies using a hybrid process (dominated by plan-driven practices, with few agile practices): Industrial practitioners working for companies using a hybrid process dominated by plan-driven practices, with few agile practices, perceive that none of the static code analysis techniques is significantly capable of detecting more types of defects than the others. The techniques can be ranked in the following way, starting with the technique most capable of detecting various types of defects:

1. Inspections, informal reviews, walkthroughs, static analysis tools.


6.2.3.4 Cost efficiency

In this section, different views on the cost efficiency of different static code analysis techniques will be presented.

Table 6.11 Cost efficiency - Friedman test statistics

                                                       N    Chi-Square   df   Asymp. Sig.
Global view                                            97   65.287       3    0
Company size view
  < 50 employees                                       27   17.314       3    0.001
  50 – 249 employees                                   21   11.635       3    0.009
  250 – 4,499 employees                                16   7.19         3    0.066
  > 4,500 employees                                    33   35.498       3    0
Product type view
  Data-dominant software                               57   32.783       3    0
  Control-domain software                              26   27.362       3    0
  System software                                      26   8.302        3    0.04
  Computation-dominant software                        14   7.722        3    0.052
Software life cycle model view
  Waterfall                                            6    4.174        3    0.243
  Incremental                                          15   2.333        3    0.506
  Spiral                                               2    3            3    0.392
  Agile                                                28   31.684       3    0
  Hybrid (agile with few plan-driven practices)        24   12.271       3    0.007
  Hybrid (plan-driven with few agile practices)        13   21.255       3    0

Table 6.12 Cost efficiency - Wilcoxon signed-rank test statistics
(Z statistic / Asymp. Sig. (2-tailed) for each pairwise comparison)

                                                Rev-Insp         Walk-Insp        SAT-Insp         Walk-Rev         SAT-Rev          SAT-Walk
Global view                                     -2.566b / 0.01   -.784b / 0.433   -6.097b / 0      -2.481c / 0.013  -5.012b / 0      -6.416b / 0
Company size view
  < 50 employees                                -.664b / 0.506   -1.080b / 0.28   -3.124b / 0.002  -.966b / 0.334   -3.118b / 0.002  -3.019b / 0.003
  50 – 249 employees                            -.642b / 0.521   -.263c / 0.793   -2.729b / 0.006  -1.208c / 0.227  -1.750b / 0.08   -2.954b / 0.003
  250 – 4,499 employees                         -1.292b / 0.196  -1.027b / 0.305  -2.386b / 0.017  -.275c / 0.783   -2.145b / 0.032  -2.019b / 0.043
  > 4,500 employees                             -2.847b / 0.004  -.294c / 0.769   -3.814b / 0      -3.439c / 0.001  -3.039b / 0.002  -4.484b / 0
Product type view
  Data-dominant software                        -2.847b / 0.004  -.923b / 0.356   -4.583b / 0      -2.272c / 0.023  -3.055b / 0.002  -4.499b / 0
  Control-domain software                       -1.478b / 0.14   -.570c / 0.569   -3.540b / 0      -2.423c / 0.015  -3.211b / 0.001  -3.981b / 0
  System software                               -.220b / 0.826   -.395c / 0.693   -2.300b / 0.021  -.894c / 0.371   -2.225b / 0.026  -2.709b / 0.007
  Computation-dominant software                 -1.285b / 0.199  -.425c / 0.671   -2.274b / 0.023  -1.642c / 0.101  -1.357b / 0.175  -2.431b / 0.015
Software life cycle model view
  Waterfall                                     -1.000b / 0.317  -1.342b / 0.18   -1.730b / 0.084  .000c / 1        -1.342b / 0.18   -1.134b / 0.257
  Incremental                                   -1.098b / 0.272  -.277b / 0.782   -1.818b / 0.069  -.666c / 0.506   -1.084b / 0.279  -1.405b / 0.16
  Spiral                                        -.447b / 0.655   -.447b / 0.655   -1.000b / 0.317  .000c / 1        -1.414b / 0.157  -1.414b / 0.157
  Agile                                         -1.018b / 0.309  -.512c / 0.609   -3.746b / 0      -1.908c / 0.056  -3.736b / 0      -4.400b / 0
  Hybrid (agile with few plan-driven practices) -1.534b / 0.125  -.551b / 0.582   -2.931b / 0.003  -1.563c / 0.118  -1.814b / 0.07   -3.098b / 0.002
  Hybrid (plan-driven with few agile practices) -1.582b / 0.114  -.973b / 0.331   -3.095b / 0.002  -1.027c / 0.305  -2.724b / 0.006  -2.848b / 0.004

Rev = informal reviews, Walk = walkthroughs, SAT = static analysis tools, Insp = inspection.
b. Based on positive ranks. c. Based on negative ranks.

6.2.3.4.1 Global view

Figure 6.34 shows the agreement level for all techniques with respect to the cost efficiency of the different static code analysis techniques, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side, and the different techniques are plotted on the Y-axis. All survey respondents were asked to rate the techniques on five agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The five agreement levels are masked with the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.
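As an illustration of this masking step, the sketch below shows how such an encoding could be applied; it is a minimal example assuming the responses are held in a pandas DataFrame with one hypothetical column per technique (the actual survey data and tooling are not reproduced here):

import pandas as pd

# Hypothetical raw survey answers, one column per static code analysis technique.
responses = pd.DataFrame({
    "inspection":       ["Agree", "Strongly agree", "Undecided"],
    "informal_reviews": ["Agree", "Agree", "Disagree"],
    "walkthroughs":     ["Undecided", "Agree", "Disagree"],
    "static_tools":     ["Strongly agree", "Agree", "Agree"],
})

# Mask the five agreement levels with the numerical values used in the statistical tests.
likert_scale = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Undecided": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

encoded = responses.replace(likert_scale)
print(encoded)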


Figure 6.34 Cost efficiency – Global view. (Diverging Likert chart plotting agreement with inspection, informal reviews, walkthroughs, and static analysis tools against the percentage of practitioners.)

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 65.287, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -2.566, p = .010), between walkthroughs and inspection (Z = -.784, p = .433), or between walkthroughs and informal reviews (Z = -2.481, p = .013), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -6.097, p = .000), of static analysis tools over informal reviews (Z = -5.012, p = .000), and of static analysis tools over walkthroughs (Z = -6.416, p = .000).
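The following sketch illustrates the test procedure described above using SciPy; the encoded DataFrame is the hypothetical one from the previous sketch (or any table of Likert scores with one column per technique), so this is an illustration of the method rather than a reproduction of the SPSS analysis reported here:

from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

techniques = ["inspection", "informal_reviews", "walkthroughs", "static_tools"]

# Friedman test across the four related samples (df = k - 1 = 3).
chi2, p = friedmanchisquare(*(encoded[t] for t in techniques))
print(f"Friedman: chi2(3) = {chi2:.3f}, p = {p:.3f}")

# Post hoc Wilcoxon signed-rank tests with a Bonferroni-corrected threshold.
alpha = 0.05 / 6  # six pairwise comparisons -> p < .0083
for a, b in combinations(techniques, 2):
    _, p_pair = wilcoxon(encoded[a], encoded[b])
    verdict = "significant" if p_pair < alpha else "not significant"
    print(f"{a} vs {b}: p = {p_pair:.3f} ({verdict})")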

Conclusions for the global view: Overall, industrial practitioners perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.2 Company size view

6.2.3.4.2.1 Companies with less than 50 employees

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 17.314, p = 0.001). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -.664, p = .506), between walkthroughs and inspection (Z = -1.080, p = .280), or between walkthroughs and informal reviews (Z = -.966, p = .334), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -3.124, p = .002), of static analysis tools over informal reviews (Z = -3.118, p = .002), and of static analysis tools over walkthroughs (Z = -3.019, p = .003).

Conclusions for companies with less than 50 employees: industrial practitioners working for companies with less than 50 employees perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.


Figure 6.35 Cost efficiency Likert chart - Company size view. (Panels for companies with less than 50, 50–249, 250–4,499, and more than 4,500 employees, each plotting agreement with the four techniques against the percentage of practitioners.)

6.2.3.4.2.2 Companies with 50 – 249 employees

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 11.635, p = 0.009). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -.642, p = .521), between walkthroughs and inspection (Z = -.263, p = .793), between walkthroughs and informal reviews (Z = -1.208, p = .227), or between static analysis tools and informal reviews (Z = -1.750, p = .080), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -2.729, p = .006) and of static analysis tools over walkthroughs (Z = -2.954, p = .003).

Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.2.3 Companies with 250 – 4,499 employees

The Friedman test indicated no statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 7.190, p = 0.066). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.292, p = .196), between walkthroughs and inspection (Z = -1.027, p = .305), between walkthroughs and informal reviews (Z = -.275, p = .783), between static analysis tools and informal reviews (Z = -2.145, p = .032), between static analysis tools and inspection (Z = -2.386, p = .017), or between static analysis tools and walkthroughs (Z = -2.019, p = .043).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the static code analysis techniques is significantly more cost efficient than the others. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.4.2.4 Companies with more than 4,500 employees

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 35.498, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between walkthroughs and inspection (Z = -.294, p = .769), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -3.814, p = .000), of static analysis tools over walkthroughs (Z = -4.484, p = .000), of informal reviews over inspection (Z = -2.847, p = .004), of walkthroughs over informal reviews (Z = -3.439, p = .001), and of static analysis tools over informal reviews (Z = -3.039, p = .002).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.3 Software product type view

In this section, the data set on cost efficiency is filtered with respect to the software product type produced by the company, to see whether the product type has an influence on the perceived cost efficiency of the different static code analysis techniques.


Figure 6.36 Cost efficiency Likert chart - Product type view. (Panels for data-dominant, control-domain, system, and computation-dominant software, each plotting agreement with the four techniques against the percentage of practitioners.)

6.2.3.4.3.1 Companies producing data-dominant software

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 32.783, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between walkthroughs and inspection (Z = -.923, p = .356), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -4.583, p = .000), of static analysis tools over walkthroughs (Z = -4.499, p = .000), of informal reviews over inspection (Z = -2.847, p = .004), of walkthroughs over informal reviews (Z = -2.272, p = .023), and of static analysis tools over informal reviews (Z = -3.055, p = .002).

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.3.2 Companies producing control-domain software

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 27.362, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.478, p = .140), between walkthroughs and inspection (Z = -.570, p = .569), or between walkthroughs and informal reviews (Z = -2.423, p = .015), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -3.540, p = .000), of static analysis tools over informal reviews (Z = -3.211, p = .001), and of static analysis tools over walkthroughs (Z = -3.981, p = .000).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.3.3 Companies producing system software

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 8.302, p = 0.040). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -.220, p = .826), between walkthroughs and inspection (Z = -.395, p = .693), or between walkthroughs and informal reviews (Z = -.894, p = .371), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -2.300, p = .021), of static analysis tools over informal reviews (Z = -2.225, p = .026), and of static analysis tools over walkthroughs (Z = -2.709, p = .007).

Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.


6.2.3.4.3.4 Companies producing computation-dominant software

The Friedman test indicated no statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 7.722, p = 0.052). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.285, p = .199), between walkthroughs and inspection (Z = -.425, p = .671), between walkthroughs and informal reviews (Z = -1.642, p = .101), between static analysis tools and inspection (Z = -2.274, p = .023), between static analysis tools and informal reviews (Z = -1.357, p = .175), or between static analysis tools and walkthroughs (Z = -2.431, p = .015).

Conclusions for companies producing computation-dominant software: Industrial practitioners working for companies producing computation-dominant software perceive that none of the static code analysis techniques is significantly more cost efficient than the others. The techniques can be ranked in the following way starting with the most cost efficient:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.4.4 Software life cycle model view

In this section, the data set on perceived cost efficiency is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model in use has an influence on the perceived cost efficiency of the different static code analysis techniques.

6.2.3.4.4.1 Waterfall

The Friedman test indicated no statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 4.174, p = 0.243). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.000, p = .317), between walkthroughs and inspection (Z = -1.342, p = .180), between walkthroughs and informal reviews (Z = -.000, p = 1.000), between static analysis tools and inspection (Z = -1.730, p = .084), between static analysis tools and informal reviews (Z = -1.342, p = .180), or between static analysis tools and walkthroughs (Z = -1.134, p = .257).

Conclusions for companies using waterfall model as a life cycle model: industrial practitioners working for companies using the waterfall model perceive that none of the static code analysis techniques is significantly more cost efficient than the others. The techniques can be ranked in the following way starting with the most cost efficient:

1. Inspections, informal reviews, walkthroughs, static analysis tools.


Figure 6.37 Cost efficiency Likert chart - Software life cycle model view. (Panels for the waterfall, incremental, Agile, agile-dominant hybrid, and plan-driven hybrid processes, each plotting agreement with the four techniques against the percentage of practitioners.)


6.2.3.4.4.2 Incremental model

The Friedman test indicated no statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 2.333, p = 0.506). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.098, p = .272), between walkthroughs and inspection (Z = -.277, p = .782), between walkthroughs and informal reviews (Z = -.666, p = .506), between static analysis tools and inspection (Z = -1.818, p = .069), between static analysis tools and informal reviews (Z = -1.084, p = .279), or between static analysis tools and walkthroughs (Z = -1.405, p = .160).

Conclusions for companies using the incremental model as a life cycle model: industrial practitioners working for companies using the incremental model perceive that none of the static code analysis techniques is significantly more cost efficient than the others. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.4.4.3 Spiral model

The Friedman test indicated no statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 3.000, p = 0.392). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -.447, p = .655), between walkthroughs and inspection (Z = -.447, p = .655), between walkthroughs and informal reviews (Z = -.000, p = 1.000), between static analysis tools and inspection (Z = -1.000, p = .317), between static analysis tools and informal reviews (Z = -1.414, p = .157), or between static analysis tools and walkthroughs (Z = -1.414, p = .157).

Conclusions for companies using the spiral model as a life cycle model: industrial practitioners working for companies using the spiral model perceive that none of the static code analysis techniques is significantly more cost efficient than the others. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Inspections, informal reviews, walkthroughs, static analysis tools.

6.2.3.4.4.4 Agile

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 31.684, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.018, p = .309), between walkthroughs and inspection (Z = -.512, p = .609), or between walkthroughs and informal reviews (Z = -1.908, p = .056), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -3.746, p = .000), of static analysis tools over informal reviews (Z = -3.736, p = .000), and of static analysis tools over walkthroughs (Z = -4.400, p = .000).

Conclusions for companies using Agile models: industrial practitioners working for companies using Agile models perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 12.271, p = 0.007). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.534, p = .125), between walkthroughs and inspection (Z = -.551, p = .582), between walkthroughs and informal reviews (Z = -1.563, p = .118), or between static analysis tools and informal reviews (Z = -1.814, p = .070), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -2.931, p = .003) and of static analysis tools over walkthroughs (Z = -3.098, p = .002).

Conclusions for companies using a hybrid process (dominated by agile practices, with few plan-driven practices): Industrial practitioners working for companies using a hybrid process dominated by agile practices, with few plan-driven practices, perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.4.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

The Friedman test indicated a statistically significant difference in the perceived cost efficiency of the different static code analysis techniques (χ2(3) = 21.255, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived cost efficiency between informal reviews and inspection (Z = -1.582, p = .114), between walkthroughs and inspection (Z = -.973, p = .331), or between walkthroughs and informal reviews (Z = -1.027, p = .305), but they indicated a statistically significant difference in the cost efficiency of static analysis tools over inspection (Z = -3.095, p = .002), of static analysis tools over informal reviews (Z = -2.724, p = .006), and of static analysis tools over walkthroughs (Z = -2.848, p = .004).

Conclusions for companies using a hybrid process (dominated by plan-driven practices, with few agile practices): Industrial practitioners working for companies using a hybrid process dominated by plan-driven practices, with few agile practices, perceive that static analysis tools are significantly more cost efficient than the other techniques. The techniques can be ranked in the following way, starting with the most cost efficient:

1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

6.2.3.5 Ease of use

In this section, different views on the ease of use of different static code analysis techniques will be presented.


Table 6.13 Ease of use - Friedman test statistics

                                                       N    Chi-Square   df   Asymp. Sig.
Global view                                            97   43.339       3    0
Company size view
  < 50 employees                                       27   18.327       3    0
  50 – 249 employees                                   21   2.957        3    0.398
  250 – 4,499 employees                                16   6.364        3    0.095
  > 4,500 employees                                    33   24.222       3    0
Product type view
  Data-dominant software                               57   20.38        3    0
  Control-domain software                              26   16.662       3    0.001
  System software                                      26   6.517        3    0.089
  Computation-dominant software                        14   3.121        3    0.373
Software life cycle model view
  Waterfall                                            6    3.341        3    0.342
  Incremental                                          15   4.19         3    0.242
  Spiral                                               2    3            3    0.392
  Agile                                                28   18.493       3    0
  Hybrid (agile with few plan-driven practices)        24   3.651        3    0.302
  Hybrid (plan-driven with few agile practices)        13   12.091       3    0.007

Table 6.14 Ease of use - Wilcoxon signed-rank test statistics
(Z statistic / Asymp. Sig. (2-tailed) for each pairwise comparison)

                                                Rev-Insp         Walk-Insp        SAT-Insp         Walk-Rev         SAT-Rev          SAT-Walk
Global view                                     -4.118b / 0      -2.150b / 0.032  -5.150b / 0      -2.982c / 0.003  -1.599b / 0.11   -4.075b / 0
Company size view
  < 50 employees                                -1.874b / 0.061  -2.867b / 0.004  -3.633b / 0      -.194b / 0.846   -1.828b / 0.068  -2.024b / 0.043
  50 – 249 employees                            -1.051b / 0.293  -.484b / 0.628   -1.206b / 0.228  -.884c / 0.377   -.360b / 0.719   -1.187b / 0.235
  250 – 4,499 employees                         -1.035b / 0.301  -.787c / 0.431   -1.381b / 0.167  -2.428c / 0.015  -.426b / 0.67    -1.933b / 0.053
  > 4,500 employees                             -3.780b / 0      -1.381b / 0.167  -3.452b / 0.001  -3.334c / 0.001  -.502b / 0.616   -2.882b / 0.004
Product type view
  Data-dominant software                        -3.403b / 0.001  -1.040b / 0.298  -2.964b / 0.003  -3.049c / 0.002  -.356c / 0.722   -1.996b / 0.046
  Control-domain software                       -2.853b / 0.004  -1.320b / 0.187  -2.849b / 0.004  -2.874c / 0.004  -.676b / 0.499   -2.740b / 0.006
  System software                               -1.570b / 0.116  -1.867b / 0.062  -2.399b / 0.016  -.092c / 0.927   -1.129b / 0.259  -1.414b / 0.157
  Computation-dominant software                 -.586b / 0.558   -.289b / 0.773   -1.642b / 0.101  -.577c / 0.564   -1.026b / 0.305  -1.611b / 0.107
Software life cycle model view
  Waterfall                                     -1.414b / 0.157  -.816b / 0.414   -1.633b / 0.102  -.816c / 0.414   -.447b / 0.655   -1.134b / 0.257
  Incremental                                   -2.095b / 0.036  -1.643b / 0.1    -1.585b / 0.113  -1.350c / 0.177  -.045c / 0.964   -.902b / 0.367
  Spiral                                        .000b / 1        .000b / 1        -1.000c / 0.317  .000b / 1        -1.414c / 0.157  -1.414c / 0.157
  Agile                                         -1.734b / 0.083  -.371c / 0.71    -2.803b / 0.005  -2.631c / 0.009  -.907b / 0.365   -3.244b / 0.001
  Hybrid (agile with few plan-driven practices) -1.082b / 0.279  -.680b / 0.497   -1.693b / 0.09   -.744c / 0.457   -.642b / 0.521   -1.214b / 0.225
  Hybrid (plan-driven with few agile practices) -2.251b / 0.024  -2.309b / 0.021  -2.719b / 0.007  -.791c / 0.429   -.973b / 0.331   -1.443b / 0.149

Rev = informal reviews, Walk = walkthroughs, SAT = static analysis tools, Insp = inspection.
b. Based on positive ranks. c. Based on negative ranks.

6.2.3.5.1 Global view

Figure 6.38 Ease of use of static analysis techniques – Global view. (Diverging Likert chart plotting agreement with inspection, informal reviews, walkthroughs, and static analysis tools against the percentage of practitioners.)

Figure 6.38 shows the agreement level for all techniques with respect to the ease of use of the different static code analysis techniques, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side, and the different techniques are plotted on the Y-axis. All survey respondents were asked to rate the techniques on five agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The five agreement levels are masked with the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.
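As a rough illustration of how the percentages plotted on these diverging Likert charts could be derived from the encoded scores, the sketch below computes, per technique, the share of practitioners who agree (plotted to the right) and disagree (plotted to the left); it continues the hypothetical encoded DataFrame from the earlier sketches rather than the actual survey data:

import pandas as pd

# Share of practitioners agreeing (4-5) and disagreeing (1-2) per technique,
# expressed as percentages; disagreement is negated so it plots to the left.
agree = encoded.isin([4, 5]).mean() * 100
disagree = -(encoded.isin([1, 2]).mean() * 100)

summary = pd.DataFrame({"disagree_pct": disagree, "agree_pct": agree})
print(summary.round(1))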

The Friedman test indicated a statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 43.339, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between walkthroughs and inspection (Z = -2.150, p = .032) or between static analysis tools and informal reviews (Z = -1.599, p = .110), but they indicated a statistically significant difference in the perceived ease of use of static analysis tools over inspection (Z = -5.150, p = .000), of static analysis tools over walkthroughs (Z = -4.075, p = .000), of informal reviews over inspection (Z = -4.118, p = .000), and of walkthroughs over informal reviews (Z = -2.982, p = .003).

Conclusions for the global view: Overall, industrial practitioners perceive that static analysis tools are the easiest technique to use, followed by informal reviews and walkthroughs, while inspection is the most difficult technique to use. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

6.2.3.5.2 Company size view

6.2.3.5.2.1 Companies with less than 50 employees

The Friedman test indicated a statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 18.327, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = -1.874, p = .061), between walkthroughs and informal reviews (Z = -.194, p = .846), between static analysis tools and informal reviews (Z = -1.828, p = .068), or between static analysis tools and walkthroughs (Z = -2.024, p = .043), but they indicated a statistically significant difference in the perceived ease of use of static analysis tools over inspection (Z = -3.663, p = .000) and of walkthroughs over inspection (Z = -2.867, p = .004).

Conclusions for companies with less than 50 employees: industrial practitioners working for companies with less than 50 employees perceive that static analysis tools are the easiest technique to use, followed by informal reviews and walkthroughs, while inspection is the most difficult technique to use. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.


Figure 6.39 Ease of use Likert chart - Company size view. (Panels for companies with less than 50, 50–249, 250–4,499, and more than 4,500 employees, each plotting agreement with the four techniques against the percentage of practitioners.)

6.2.3.5.2.2 Companies with 50 – 249 employees

The Friedman test indicated no statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 2.957, p = 0.398). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = -1.051, p = .293), between walkthroughs and informal reviews (Z = -.884, p = .377), between static analysis tools and informal reviews (Z = -.360, p = .719), between static analysis tools and walkthroughs (Z = -1.187, p = .235), between static analysis tools and inspection (Z = -1.206, p = .228), or between walkthroughs and inspection (Z = -.484, p = .628).

Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, informal reviews, walkthroughs, inspections.

6.2.3.5.2.3 Companies with 250 – 4,499 employees

The Friedman test indicated no statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 6.364, p = 0.095). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = -1.035, p = .301), between walkthroughs and informal reviews (Z = -2.428, p = .015), between static analysis tools and informal reviews (Z = -.426, p = .670), between static analysis tools and walkthroughs (Z = -1.933, p = .053), between static analysis tools and inspection (Z = -1.381, p = .167), or between walkthroughs and inspection (Z = -.787, p = .431).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, informal reviews, walkthroughs, inspections.

6.2.3.5.2.4 Companies with more than 4,500 employees

The Friedman test indicated a statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 24.222, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between walkthroughs and inspection (Z = -1.381, p = .167) or between static analysis tools and informal reviews (Z = -.502, p = .616), but they indicated a statistically significant difference in the perceived ease of use of static analysis tools over inspection (Z = -3.452, p = .001), of walkthroughs over informal reviews (Z = -3.334, p = .001), of informal reviews over inspection (Z = -3.780, p = .000), and of static analysis tools over walkthroughs (Z = -2.882, p = .004).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that static analysis tools are the easiest technique to use, followed by informal reviews and walkthroughs, while inspection is the most difficult technique to use. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.


6.2.3.5.3 Software product type view

In this section, the data set on ease of use is filtered with respect to the software product type developed by the company, to see whether the product type has an influence on the perceived ease of use of the different static code analysis techniques.

Figure 6.40 Ease of use Likert chart - Product type view. (Panels for data-dominant, control-domain, system, and computation-dominant software, each plotting agreement with the four techniques against the percentage of practitioners.)


6.2.3.5.3.1 Companies producing data-dominant software

The Friedman test indicated a statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 20.380, p = 0.000). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between walkthroughs and inspection (Z = -1.040, p = .298), between static analysis tools and informal reviews (Z = -.356, p = .722), or between static analysis tools and walkthroughs (Z = -1.996, p = .046), but they indicated a statistically significant difference in the perceived ease of use of static analysis tools over inspection (Z = -2.964, p = .003), of walkthroughs over informal reviews (Z = -3.049, p = .002), and of informal reviews over inspection (Z = -3.403, p = .001).

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that static analysis tools are the easiest technique to use, followed by informal reviews and walkthroughs, while inspection is the most difficult technique to use. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

6.2.3.5.3.2 Companies producing control-domain software

The Friedman test indicated a statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 16.662, p = 0.001). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between walkthroughs and inspection (Z = -1.320, p = .187) or between static analysis tools and informal reviews (Z = -.676, p = .499), but they indicated a statistically significant difference in the perceived ease of use of static analysis tools over inspection (Z = -2.849, p = .004), of walkthroughs over informal reviews (Z = -2.874, p = .004), of informal reviews over inspection (Z = -2.853, p = .004), and of static analysis tools over walkthroughs (Z = -2.740, p = .006).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that static analysis tools are the easiest technique to use, followed by informal reviews and walkthroughs, while inspection is the most difficult technique to use. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

6.2.3.5.3.3 Companies producing system software

The Friedman test indicated no statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 6.517, p = 0.089). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = -1.570, p = .116), between walkthroughs and informal reviews (Z = -.092, p = .927), between static analysis tools and informal reviews (Z = -1.129, p = .259), between static analysis tools and walkthroughs (Z = -1.414, p = .157), between static analysis tools and inspection (Z = -2.399, p = .016), or between walkthroughs and inspection (Z = -1.867, p = .062).


Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, informal reviews, walkthroughs, inspections.

6.2.3.5.3.4 Companies producing computation-dominant software

The Friedman test indicated no statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 3.121, p = 0.373). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = -.586, p = .558), between walkthroughs and informal reviews (Z = -.577, p = .564), between static analysis tools and informal reviews (Z = -1.026, p = .305), between static analysis tools and walkthroughs (Z = -1.611, p = .107), between static analysis tools and inspection (Z = -1.642, p = .101), or between walkthroughs and inspection (Z = -.289, p = .773).

Conclusions for companies producing computation-dominant software: Industrial practitioners working for companies producing computation-dominant software perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, informal reviews, walkthroughs, inspections.

6.2.3.5.4 Software life cycle model view

In this section, the data set on perceived ease of use is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model in use has an influence on the perceived ease of use of the different static code analysis techniques.

6.2.3.5.4.1 Waterfall

The Friedman test indicated no statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 3.341, p = 0.342). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding, with a Bonferroni correction applied and the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = -1.414, p = .157), between walkthroughs and informal reviews (Z = -.816, p = .414), between static analysis tools and informal reviews (Z = -.447, p = .655), between static analysis tools and walkthroughs (Z = -1.134, p = .257), between static analysis tools and inspection (Z = -1.633, p = .102), or between walkthroughs and inspection (Z = -.816, p = .414).

Conclusions for companies using the waterfall model as a life cycle model: industrial practitioners working for companies using the waterfall model perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, informal reviews, walkthroughs, inspections.


Figure 6.41 Ease of use Likert chart - Software life cycle model view. (Panels for the waterfall, incremental, Agile, agile-dominant hybrid, and plan-driven hybrid processes, each plotting agreement with the four techniques against the percentage of practitioners.)


6.2.3.5.4.2 Incremental model

Friedman test indicated no statistically significant difference in the perceived ease of use of different static code analysis techniques (χ2(2) = 4.190, p = 0.242). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = 2.095, p = .036) or between walkthroughs and informal reviews (Z = -1.350, p = 0.177) or between static analysis tools and informal reviews (Z = -.045, p = .964) or between static analysis tools and walkthroughs (Z = -.902, p = .367), or between static analysis tools and inspection (Z = -1.585, p = .113), or between walkthroughs and inspection (Z = -1.643, p = .100).

Conclusions for companies using the incremental model as a life cycle model: industrial practitioners working for companies using the incremental model perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.5.4.3 Spiral model

Friedman test indicated no statistically significant difference in the perceived ease of use of different static code analysis techniques (χ2(2) = 3.000, p = 0.392). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = .000, p = 1.000) or between walkthroughs and informal reviews (Z = -.000, p = 1.000) or between static analysis tools and informal reviews (Z = -1.414, p = .157) or between static analysis tools and walkthroughs (Z = -1.414, p = .157), or between static analysis tools and inspection (Z = -1.000, p = .317), or between walkthroughs and inspection (Z = -.000, p = 1.000).

Conclusions for companies using spiral model as a life cycle model: industrial practitioners working for companies using spiral model perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order starting with the easier to use:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.5.4.4 Agile

The Friedman test indicated a statistically significant difference in the perceived ease of use of the different static code analysis techniques (χ2(3) = 18.493, p < .001). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived ease of use between walkthroughs and inspection (Z = -.371, p = .710) or between static analysis tools and informal reviews (Z = -.907, p = .365), or between informal reviews and inspection (Z = -2.853, p = .004), but the test indicated a statistically significant difference in the perceived ease of use for static analysis tools over inspection (Z = -2.803, p = .005), for walkthroughs over informal reviews (Z = -2.631, p = .009), and for static analysis tools over walkthroughs (Z = -2.740, p = .006).

Conclusions for companies using Agile models: industrial practitioners working for companies using Agile models perceive that static analysis tools are the easiest technique to use, followed by walkthroughs, while inspections and informal reviews are the most difficult techniques to use. The techniques can be ranked in the following order, starting with the easiest to use:


1. Static analysis tools. 2. Walkthroughs. 3. Inspections, informal reviews.

6.2.3.5.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

Friedman test indicated no statistically significant difference in the perceived ease of use of different static code analysis techniques (χ2(2) = 3.651, p = 0.302). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = 1.082, p = .279) or between walkthroughs and informal reviews (Z = -.744, p = .457) or between static analysis tools and informal reviews (Z = -.642, p = .521) or between static analysis tools and walkthroughs (Z = -1.214, p = .225), or between static analysis tools and inspection (Z = -1.693, p = .090), or between walkthroughs and inspection (Z = -.680, p = .497).

Conclusions for companies using Hybrid process (dominated by agile practices, with few plan-driven practices): Industrial practitioners working for companies using a Hybrid process (dominated by agile practices, with few plan-driven practices) perceive that none of the static code analysis techniques is significantly easier to use than the others. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.5.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

Friedman test indicated a statistically significant difference in the perceived ease of use of different static code analysis techniques (χ2(2) = 12.091, p = 0.007). Post hoc analysis with Wilcoxon signed- rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived ease of use between informal reviews and inspection (Z = 2.251, p = .024) or between walkthroughs and informal reviews (Z = -.791, p = .429) or between static analysis tools and informal reviews (Z = - .973, p = .331) or between static analysis tools and walkthroughs (Z = -1.443, p = .149), or between static analysis tools and inspection (Z = -2.719, p = .007), or between walkthroughs and inspection (Z = -2.309, p = .021).

Conclusions for companies using Hybrid process (dominated by plan-driven practices, with few agile practices): Industrial practitioners working for companies using a Hybrid process (dominated by plan-driven practices, with few agile practices) perceive that static analysis tools, informal reviews and walkthroughs are significantly easier to use than inspections. The techniques can be ranked in the following order, starting with the easiest to use:

1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

6.2.3.6 Internal code quality

In this section, different views on the internal code quality of different static code analysis techniques will be presented.


Table 6.15 Internal code quality - Friedman test statistics

View                                              N    Chi-Square   df   Asymp. Sig.
Global view                                       97   14.540       3    0.002
Company size view
  < 50 employees                                  27    5.503       3    0.138
  50 – 249 employees                              21   11.782       3    0.008
  250 – 4,499 employees                           16    3.633       3    0.304
  > 4,500 employees                               33    5.347       3    0.148
Product type view
  Data-dominant software                          57    7.496       3    0.058
  Control-domain software                         26    5.612       3    0.132
  System software                                 26    3.965       3    0.265
  Computation-dominant software                   14    5.753       3    0.124
Software life cycle model view
  Waterfall                                        6    3.600       3    0.308
  Incremental                                     15    7.472       3    0.058
  Spiral                                           2    3.000       3    0.392
  Agile                                           28   16.379       3    0.001
  Hybrid (agile with few plan-driven practices)   24    4.967       3    0.174
  Hybrid (plan-driven with few agile practices)   13    1.659       3    0.646

Table 6.16 Internal code quality - Wilcoxon signed-rank test statistics

                                                        Rev-Insp   Walk-Insp   SAT-Insp   Walk-Rev   SAT-Rev    SAT-Walk
Global view                                     Z       -2.867b    -3.771b     -.916b     -.984b     -1.636c    -2.402c
                                                Sig.     0.004      0.000       0.360      0.325      0.102      0.016
Company size view
  < 50 employees                                Z       -1.386b    -1.498b     -1.416b    -.632b     -.440b     -.037b
                                                Sig.     0.166      0.134       0.157      0.527      0.660      0.971
  50 – 249 employees                            Z       -2.414b    -3.071b     -2.352b    -.711b      .000c     -1.043d
                                                Sig.     0.016      0.002       0.019      0.477      1.000      0.297
  250 – 4,499 employees                         Z       -1.115b    -1.311b     -.250c      .000d     -1.050c    -1.051c
                                                Sig.     0.265      0.190       0.803      1.000      0.294      0.293
  > 4,500 employees                             Z       -1.079b    -1.724b     -1.519c    -.693b     -1.927c    -2.284c
                                                Sig.     0.280      0.085       0.129      0.488      0.054      0.022
Product type view
  Data-dominant software                        Z       -2.493b    -3.030b     -.926b     -.379b     -1.137c    -1.595c
                                                Sig.     0.013      0.002       0.355      0.705      0.255      0.111
  Control-domain software                       Z       -1.781b    -2.579b     -1.342b    -1.129b    -.300c     -1.277c
                                                Sig.     0.075      0.010       0.180      0.259      0.764      0.201
  System software                               Z       -.852b     -1.617b     -.268c     -.849b     -.843c     -1.354c
                                                Sig.     0.394      0.106       0.788      0.396      0.399      0.176
  Computation-dominant software                 Z       -1.725b    -2.226b     -.378b     -.879b     -1.100c    -1.628c
                                                Sig.     0.084      0.026       0.705      0.380      0.271      0.103
Software life cycle model view
  Waterfall                                     Z        .000b      .000b      -1.414c     .000b     -1.414c    -1.414c
                                                Sig.     1.000      1.000       0.157      1.000      0.157      0.157
  Incremental                                   Z       -2.070b    -1.155b     -1.848b    -1.406c    -.122c     -1.006b
                                                Sig.     0.038      0.248       0.065      0.160      0.903      0.314
  Spiral                                        Z       -1.000b    -1.000b      .000c      .000c     -1.000d    -1.000d
                                                Sig.     0.317      0.317       1.000      1.000      0.317      0.317
  Agile                                         Z       -3.046b    -3.255b     -1.342b    -.216c     -1.993c    -2.364c
                                                Sig.     0.002      0.001       0.180      0.829      0.046      0.018
  Hybrid (agile with few plan-driven practices) Z       -.850b     -1.942b     -.247b     -1.811b    -.329c     -1.925c
                                                Sig.     0.395      0.052       0.805      0.070      0.742      0.054
  Hybrid (plan-driven with few agile practices) Z       -.707b     -.632c      -.491b     -1.265c     .000d     -.686b
                                                Sig.     0.480      0.527       0.623      0.206      1.000      0.493

Rev = Informal reviews, Walk = Walkthroughs, SAT = Static analysis tools, Insp = Inspections; Sig. = Asymp. Sig. (2-tailed).
b. Based on positive ranks. c. Based on negative ranks.

6.2.3.6.1 Global view

[Diverging Likert chart for all respondents, plotting the percentage of practitioners who agree or disagree that Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools improve internal code quality.]

Figure 6.42 Internal code quality – Global view

Figure 6.42 shows the agreement level with respect to the internal code quality of the different static code analysis techniques, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side of the X-axis, and the different techniques are plotted on the Y-axis. All survey respondents were asked to rate the techniques on five agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The five agreement levels are mapped to the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.

The Friedman test indicated a statistically significant difference in the perceived internal code quality of the different static code analysis techniques (χ2(3) = 14.540, p = 0.002). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived internal code quality between static analysis tools and inspection (Z = -.916, p = .360) or between static analysis tools and informal reviews (Z = -1.636, p = .102), or between static analysis tools and walkthroughs (Z = -2.402, p = .016), or between walkthroughs and informal reviews (Z = -.984, p = .325), but the test indicated a statistically significant difference in the perceived internal code quality for informal reviews over inspections (Z = -2.867, p = .004) and for walkthroughs over inspections (Z = -3.771, p < .001).
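As a minimal sketch of the test procedure used throughout this section (hypothetical data; SciPy is assumed here purely for illustration and is not necessarily the tool used for the original analysis), the Friedman omnibus test and the Bonferroni-corrected Wilcoxon signed-rank post hoc tests could be computed as follows:

```python
from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical Likert scores (1 = strongly disagree .. 5 = strongly agree),
# one list per technique, one entry per respondent (same respondents in each list).
scores = {
    "inspection":       [2, 3, 3, 4, 2, 3, 4, 2],
    "informal_reviews": [4, 4, 3, 4, 3, 4, 4, 3],
    "walkthroughs":     [4, 3, 4, 4, 3, 4, 4, 4],
    "static_analysis":  [5, 4, 4, 5, 4, 3, 5, 4],
}

# Omnibus test across the four related samples (df = number of techniques - 1 = 3).
chi2, p = friedmanchisquare(*scores.values())
print(f"Friedman chi2(3) = {chi2:.3f}, p = {p:.3f}")

# Post hoc: pairwise Wilcoxon signed-rank tests with a Bonferroni-corrected
# significance level of 0.05 / 6 pairwise comparisons, i.e. roughly .0083.
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)
for a, b in pairs:
    _, p_pair = wilcoxon(scores[a], scores[b])
    verdict = "significant" if p_pair < alpha else "not significant"
    print(f"{a} vs {b}: p = {p_pair:.3f} ({verdict} at alpha = {alpha:.4f})")
```

The chi-square degrees of freedom equal the number of techniques minus one, which is why df = 3 is reported in Table 6.15, and the corrected level of .0083 used in the text corresponds to 0.05 divided by the six pairwise comparisons among the four techniques.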

Conclusions for the global view: Overall, industrial practitioners perceive that static analysis tools, informal reviews and walkthroughs significantly improve internal code quality compared with inspections. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

6.2.3.6.2 Company size view

6.2.3.6.2.1 Companies with less than 50 employees

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 5.503, p = 0.138). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -1.386, p = .166), or between walkthroughs and inspections (Z = -1.498, p = .134), or between static analysis tools and inspection (Z = -1.416, p = .157) or between static analysis tools and informal reviews (Z = -0.440, p = .660), or between walkthroughs and informal reviews (Z = -.632, p = .527), or between static analysis tools and walkthroughs (Z = -.037, p = .971).

Conclusions for companies with less than 50 employees: industrial practitioners working for companies with less than 50 employees perceive that none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality with the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.


[Four diverging Likert charts on internal code quality, one per company size class (less than 50 employees, 50 – 249 employees, 250 – 4,499 employees, more than 4,500 employees), each plotting the percentage of practitioners who agree or disagree for Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools.]

Figure 6.43 Internal code quality Likert chart - Company size view

6.2.3.6.2.2 Companies with 50 – 249 employees

The Friedman test indicated a statistically significant difference in the perceived internal code quality of the different static code analysis techniques (χ2(3) = 11.782, p = 0.008). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -2.414, p = .016), or between static analysis tools and inspection (Z = -2.352, p = .019), or between static analysis tools and informal reviews (Z = .000, p = 1.000), or between walkthroughs and informal reviews (Z = -.711, p = .477), or between static analysis tools and walkthroughs (Z = -1.043, p = .297), but the test indicated a statistically significant difference in perceived internal code quality for walkthroughs over inspections (Z = -3.071, p = .002).

Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that walkthroughs, informal reviews and static analysis tools significantly improve internal code quality compared with inspections. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

6.2.3.6.2.3 Companies with 250 – 4,499 employees

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 3.633, p = 0.304). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -1.115, p = .265), or between walkthroughs and inspections (Z = -1.311, p = .190), or between static analysis tools and inspection (Z = -.250, p = .803) or between static analysis tools and informal reviews (Z = -1.050, p = .294), or between walkthroughs and informal reviews (Z = -.000, p = 1.000), or between static analysis tools and walkthroughs (Z = -1.051, p = .293).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the static code analysis techniques improves internal code quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.2.4 Companies with more than 4,500 employees

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 5.347, p = 0.148). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -1.079, p = .280), or between walkthroughs and inspections (Z = -1.724, p = .085), or between static analysis tools and inspection (Z = -1.519, p = .129) or between static analysis tools and informal reviews (Z = -1.927, p = .054), or between walkthroughs and informal reviews (Z = -.693, p = .488), or between static analysis tools and walkthroughs (Z = -2.284, p = .022).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality with the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.


6.2.3.6.3 Software product type view

In this section the data set on internal code quality is filtered with respect to the software product type developed by the company, to see whether the software product type has an influence on the perceived internal code quality of the different static code analysis techniques.

[Four diverging Likert charts on internal code quality, one per product type (data-dominant, control-domain, system, and computation-dominant software), each plotting the percentage of practitioners who agree or disagree for Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools.]

Figure 6.44 Internal code quality Likert chart - Product type view


6.2.3.6.3.1 Companies producing data-dominant software

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 7.496, p = 0.058). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -2.493, p = .013), or between static analysis tools and inspection (Z = -.926, p = .355) or between static analysis tools and informal reviews (Z = -1.137, p = .255), or between walkthroughs and informal reviews (Z = -.379, p = .705), or between static analysis tools and walkthroughs (Z = -1.595, p = .111), but the test indicated a statistically significant difference in perceived internal code quality for walkthroughs over inspections (Z = -3.030, p = .002).

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that walkthroughs and informal reviews significantly improve internal code quality compared with inspections and static analysis tools. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Informal reviews, walkthroughs. 2. Static analysis tools, Inspections.

6.2.3.6.3.2 Companies producing control-domain software

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 5.612, p = 0.132). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -1.781, p = .075), or between walkthroughs and inspections (Z = -2.579, p = .010), or between static analysis tools and inspection (Z = -1.342, p = .180) or between static analysis tools and informal reviews (Z = -.300, p = .764), or between walkthroughs and informal reviews (Z = -1.129, p = .259), or between static analysis tools and walkthroughs (Z = -1.277, p = .201).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality with the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.3.3 Companies producing system software

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 3.965, p = 0.265). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -.852, p = .394), or between walkthroughs and inspections (Z = -1.617, p = .106), or between static analysis tools and inspection (Z = -.268, p = .788) or between static analysis tools and informal reviews (Z = -.843, p = .399), or between walkthroughs and informal reviews (Z = -.849, p = .396), or between static analysis tools and walkthroughs (Z = -1.354, p = .176).


Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality with the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.3.4 Companies producing computation-dominant software

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 5.753, p = 0.124). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -1.725, p = .084), or between walkthroughs and inspections (Z = -2.226, p = .026), or between static analysis tools and inspection (Z = -.378, p = .705) or between static analysis tools and informal reviews (Z = -1.100, p = .271), or between walkthroughs and informal reviews (Z = -.879, p = .380), or between static analysis tools and walkthroughs (Z = -1.628, p = .103).

Conclusions for companies producing computation-dominant software: Industrial practitioners working for companies producing computation-dominant software perceive that none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.4 Software life cycle model view

In this section the data set on perceived internal code quality is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model has an influence on the perceived internal code quality of the different static code analysis techniques.

6.2.3.6.4.1 Waterfall

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 3.600, p = 0.308). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -.000, p = 1.000), or between walkthroughs and inspections (Z = -.000, p = 1.000), or between static analysis tools and inspection (Z = -1.414, p = .157) or between static analysis tools and informal reviews (Z = -1.414, p = .157), or between walkthroughs and informal reviews (Z = -.000, p = 1.000), or between static analysis tools and walkthroughs (Z = -1.414, p = .157).

Conclusions for companies using waterfall model as a life cycle model: industrial practitioners working for companies using the waterfall model perceive none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.


[Five diverging Likert charts on internal code quality, one per software life cycle model (Waterfall, Incremental, Agile, agile-dominant hybrid, plan-driven hybrid), each plotting the percentage of practitioners who agree or disagree for Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools.]

Figure 6.45 Internal code quality Likert chart - Software life cycle model view


6.2.3.6.4.2 Incremental model

The Friedman test indicated no statistically significant difference in the perceived internal code quality of the different static code analysis techniques (χ2(3) = 7.472, p = 0.058). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -2.070, p = .038), or between walkthroughs and inspections (Z = -1.155, p = .248), or between static analysis tools and inspection (Z = -1.848, p = .065), or between static analysis tools and informal reviews (Z = -.122, p = .903), or between walkthroughs and informal reviews (Z = -1.406, p = .160), or between static analysis tools and walkthroughs (Z = -1.006, p = .314).

Conclusions for companies using the incremental model as a life cycle model: industrial practitioners working for companies using the incremental model perceive that none of the static code analysis techniques improves internal code quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.4.3 Spiral model

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 3.000, p = 0.392). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -1.000, p = .317), or between walkthroughs and inspections (Z = -1.000, p = .317), or between static analysis tools and inspection (Z = -.000, p = 1.000) or between static analysis tools and informal reviews (Z = -1.000, p = .317), or between walkthroughs and informal reviews (Z = -.000, p = 1.000), or between static analysis tools and walkthroughs (Z = -1.000, p = .317).

Conclusions for companies using spiral model as a life cycle model: industrial practitioners working for companies using spiral model perceive that none of the static code analysis techniques significantly improve internal code quality than the others. The techniques can be ranked in the following order starting with the ones that improve internal quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.4.4 Agile

The Friedman test indicated a statistically significant difference in the perceived internal code quality of the different static code analysis techniques (χ2(3) = 16.379, p = 0.001). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived internal code quality between static analysis tools and inspection (Z = -1.342, p = .180) or between static analysis tools and informal reviews (Z = -1.993, p = .046), or between walkthroughs and informal reviews (Z = -.216, p = .829), or between static analysis tools and walkthroughs (Z = -2.364, p = .018), but the test indicated a statistically significant difference in perceived internal code quality for walkthroughs over inspections (Z = -3.255, p = .001) and for informal reviews over inspections (Z = -3.046, p = .002).

Conclusions for companies using Agile models: industrial practitioners working for companies using Agile models perceive that static analysis tools significantly improve internal code quality compared with walkthroughs and informal reviews, which in turn significantly improve internal code quality compared with inspections. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

6.2.3.6.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 4.967, p = 0.174). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -.850, p = .395), or between walkthroughs and inspections (Z = -1.942, p = .052), or between static analysis tools and inspection (Z = -.247, p = .805) or between static analysis tools and informal reviews (Z = -.329, p = .742), or between walkthroughs and informal reviews (Z = -1.811, p = .070), or between static analysis tools and walkthroughs (Z = -1.925, p = .054).

Conclusions for companies using Hybrid process (dominated by agile practices, with few plan-driven practices): Industrial practitioners working for companies using a Hybrid process (dominated by agile practices, with few plan-driven practices) perceive that none of the static code analysis techniques improves internal code quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.6.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

Friedman test indicated no statistically significant difference in the perceived internal code quality of different static code analysis techniques (χ2(2) = 1.659, p = 0.646). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived internal code quality between informal reviews and inspections (Z = -.707, p = .480), or between walkthroughs and inspections (Z = -.632, p = .527), or between static analysis tools and inspection (Z = -.491, p = .623) or between static analysis tools and informal reviews (Z = -.000, p = 1.000), or between walkthroughs and informal reviews (Z = -1.265, p = .206), or between static analysis tools and walkthroughs (Z = -.686, p = .493).

Conclusions for companies using Hybrid process (dominated by plan-driven practices, with few agile practices): Industrial practitioners working for companies using a Hybrid process (dominated by plan-driven practices, with few agile practices) perceive that none of the static code analysis techniques improves internal code quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve internal code quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7 Product quality

In this section, different views on the product quality of different static code analysis techniques will be presented.


Table 6.17 Product quality - Friedman test statistics

View                                              N    Chi-Square   df   Asymp. Sig.
Global view                                       97    8.238       3    0.041
Company size view
  < 50 employees                                  27    3.020       3    0.389
  50 – 249 employees                              21    8.279       3    0.041
  250 – 4,499 employees                           16    2.684       3    0.443
  > 4,500 employees                               33    2.673       3    0.445
Product type view
  Data-dominant software                          57    5.160       3    0.160
  Control-domain software                         26    3.576       3    0.311
  System software                                 26    2.508       3    0.474
  Computation-dominant software                   14    4.607       3    0.203
Software life cycle model view
  Waterfall                                        6    6.231       3    0.101
  Incremental                                     15    8.320       3    0.040
  Spiral                                           2    1.286       3    0.733
  Agile                                           28   12.101       3    0.007
  Hybrid (agile with few plan-driven practices)   24    2.681       3    0.443
  Hybrid (plan-driven with few agile practices)   13    6.805       3    0.078

Table 6.18 Product quality - Wilcoxon signed-rank test statistics

                                                        Rev-Insp   Walk-Insp   SAT-Insp   Walk-Rev   SAT-Rev    SAT-Walk
Global view                                     Z       -2.282b    -1.978b     -.150b     -.436c     -1.924c    -1.616c
                                                Sig.     0.023      0.048       0.880      0.662      0.054      0.106
Company size view
  < 50 employees                                Z       -1.255b    -1.209b     -1.310b     .000c     -.471b     -.489b
                                                Sig.     0.210      0.227       0.190      1.000      0.637      0.625
  50 – 249 employees                            Z       -1.813b    -2.495b     -.973b     -.061c     -1.347c    -1.645c
                                                Sig.     0.070      0.013       0.330      0.951      0.178      0.100
  250 – 4,499 employees                         Z       -.632b     -.318c      -1.155c    -.857c     -1.382c    -.735c
                                                Sig.     0.527      0.751       0.248      0.391      0.167      0.462
  > 4,500 employees                             Z       -.665b     -.635b      -1.108c    -.037b     -1.377c    -1.304c
                                                Sig.     0.506      0.526       0.268      0.971      0.168      0.192
Product type view
  Data-dominant software                        Z       -1.340b    -1.266b     -.266c     -.067c     -1.471c    -1.571c
                                                Sig.     0.180      0.205       0.790      0.946      0.141      0.116
  Control-domain software                       Z       -1.581b    -1.887b     -.259b     -.517b     -1.268c    -1.437c
                                                Sig.     0.114      0.059       0.795      0.605      0.205      0.151
  System software                               Z       -1.530b    -1.340b     -.707b     -.243c     -1.028c    -.819c
                                                Sig.     0.126      0.180       0.479      0.808      0.304      0.413
  Computation-dominant software                 Z       -.879b     -1.496b     -1.265c    -.649b     -1.725c    -2.209c
                                                Sig.     0.380      0.135       0.206      0.516      0.084      0.027
Software life cycle model view
  Waterfall                                     Z        .000b      .000b      -1.732c     .000b     -1.732c    -1.732c
                                                Sig.     1.000      1.000       0.083      1.000      0.083      0.083
  Incremental                                   Z       -1.777b    -.061b      -1.438b    -2.165c    -.811c     -1.224b
                                                Sig.     0.076      0.951       0.150      0.030      0.417      0.221
  Spiral                                        Z       -1.000b    -1.000b     -1.000b     .000c     -.447b     -.447b
                                                Sig.     0.317      0.317       0.317      1.000      0.655      0.655
  Agile                                         Z       -2.266b    -2.224b     -.346b     -.372b     -1.522c    -2.001c
                                                Sig.     0.023      0.026       0.729      0.710      0.128      0.045
  Hybrid (agile with few plan-driven practices) Z       -.791b     -1.567b     -.073b     -.749b     -.471c     -1.222c
                                                Sig.     0.429      0.117       0.942      0.454      0.638      0.222
  Hybrid (plan-driven with few agile practices) Z       -.302b     -.632b      -2.111b    -.264b     -1.540b    -1.089b
                                                Sig.     0.763      0.527       0.035      0.792      0.124      0.276

Rev = Informal reviews, Walk = Walkthroughs, SAT = Static analysis tools, Insp = Inspections; Sig. = Asymp. Sig. (2-tailed).
b. Based on positive ranks. c. Based on negative ranks.

6.2.3.7.1 Global view

[Diverging Likert chart for all respondents, plotting the percentage of practitioners who agree or disagree that Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools improve product quality.]

Figure 6.46 Product quality – Global view


Figure 6.46 shows the agreement level with respect to the product quality of the different static code analysis techniques, considering all survey respondents. The agreement level is shown on the right side of the X-axis and the disagreement level on the left side of the X-axis, and the different techniques are plotted on the Y-axis. All survey respondents were asked to rate the techniques on five agreement/disagreement levels: strongly agree (5), agree (4), undecided (3), disagree (2), and strongly disagree (1). The five agreement levels are mapped to the numerical values shown in brackets, and these numerical values are used to perform the statistical tests.
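The percentages plotted in these diverging charts can be derived directly from the coded responses. Below is a minimal sketch with hypothetical data; treating the undecided category as belonging to neither side is one common convention and an assumption here, not necessarily the one used for the original figures.

```python
# Hypothetical coded responses for one technique
# (1 = strongly disagree .. 5 = strongly agree).
ratings = [5, 4, 4, 3, 2, 4, 5, 2, 4, 1]

n = len(ratings)
agree = 100 * sum(r >= 4 for r in ratings) / n      # plotted to the right of 0%
disagree = 100 * sum(r <= 2 for r in ratings) / n   # plotted to the left of 0%

print(f"agree: {agree:.0f}%  disagree: {disagree:.0f}%")
# The diverging bar for this technique would then span from -disagree to +agree.
```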

The Friedman test indicated a statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 8.238, p = 0.041). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived product quality between static analysis tools and inspection (Z = -.150, p = .880) or between static analysis tools and informal reviews (Z = -1.924, p = .054), or between static analysis tools and walkthroughs (Z = -1.616, p = .106), or between walkthroughs and informal reviews (Z = -.436, p = .662), or between informal reviews and inspections (Z = -2.282, p = .023), or between walkthroughs and inspections (Z = -1.978, p = .048).

Conclusions for the global view: Overall, industrial practitioners perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.2 Company size view

6.2.3.7.2.1 Companies with less than 50 employees

The Friedman test indicated no statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 3.020, p = 0.389). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.255, p = .210), or between walkthroughs and inspections (Z = -1.209, p = .227), or between static analysis tools and inspection (Z = -1.310, p = .190), or between static analysis tools and informal reviews (Z = -.471, p = .637), or between walkthroughs and informal reviews (Z = .000, p = 1.000), or between static analysis tools and walkthroughs (Z = -.489, p = .625).

Conclusions for companies with less than 50 employees: industrial practitioners working for companies with less than 50 employees perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.2.2 Companies with 50 – 249 employees

The Friedman test indicated a statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 8.279, p = 0.041). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.813, p = .070), or between walkthroughs and inspections (Z = -2.495, p = .013), or between static analysis tools and inspection (Z = -.973, p = .330), or between static analysis tools and informal reviews (Z = -1.347, p = .178), or between walkthroughs and informal reviews (Z = -.061, p = .951), or between static analysis tools and walkthroughs (Z = -1.645, p = .100).

Conclusions for companies with 50 – 249 employees: industrial practitioners working for companies with 50 – 249 employees perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

[Four diverging Likert charts on product quality, one per company size class (less than 50 employees, 50 – 249 employees, 250 – 4,499 employees, more than 4,500 employees), each plotting the percentage of practitioners who agree or disagree for Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools.]

Figure 6.47 Product quality Likert chart - Company size view


6.2.3.7.2.3 Companies with 250 – 4,499 employees

Friedman test indicated no statistically significant difference in the perceived product quality of different static code analysis techniques (χ2(2) = 2.684, p = 0.443). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -.632, p = .527), or between walkthroughs and inspections (Z = -.318, p = .751), or between static analysis tools and inspection (Z = -1.155, p = .248) or between static analysis tools and informal reviews (Z = -1.382, p = .167), or between walkthroughs and informal reviews (Z = -.857, p = .391), or between static analysis tools and walkthroughs (Z = -.735, p = .462).

Conclusions for companies with 250 – 4,499 employees: industrial practitioners working for companies with 250 – 4,499 employees perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.2.4 Companies with more than 4,500 employees

Friedman test indicated no statistically significant difference in the perceived product quality of different static code analysis techniques (χ2(2) = 2.673, p = 0.445). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -.665, p = .506), or between walkthroughs and inspections (Z = -.635, p = .526), or between static analysis tools and inspection (Z = -1.108, p = .268) or between static analysis tools and informal reviews (Z = -1.377, p = .168), or between walkthroughs and informal reviews (Z = -.037, p = .971), or between static analysis tools and walkthroughs (Z = -1.304, p = .192).

Conclusions for companies with more than 4,500 employees: industrial practitioners working for companies with more than 4,500 employees perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.3 Software product type view

In this section the data set on product quality is filtered with respect to the software product type developed by the company, to see whether the software product type has an influence on the perceived product quality of the different static code analysis techniques.

6.2.3.7.3.1 Companies producing data-dominant software

The Friedman test indicated no statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 5.160, p = 0.160). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.340, p = .180), or between walkthroughs and inspections (Z = -1.266, p = .205), or between static analysis tools and inspection (Z = -.266, p = .790), or between static analysis tools and informal reviews (Z = -1.471, p = .141), or between walkthroughs and informal reviews (Z = -.067, p = .946), or between static analysis tools and walkthroughs (Z = -1.571, p = .116).

Conclusions for companies producing data-dominant software: industrial practitioners working for companies producing data-dominant software perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

[Four diverging Likert charts on product quality, one per product type (data-dominant, control-domain, system, and computation-dominant software), each plotting the percentage of practitioners who agree or disagree for Inspections, Informal Reviews, Walkthroughs and Static Analysis Tools.]

Figure 6.48 Product quality Likert chart - Product type view


6.2.3.7.3.2 Companies producing control-domain software

Friedman test indicated no statistically significant difference in the perceived product quality of different static code analysis techniques (χ2(2) = 3.576, p = 0.311). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.581, p = .114), or between walkthroughs and inspections (Z = -1.887, p = .059), or between static analysis tools and inspection (Z = -.259, p = .795) or between static analysis tools and informal reviews (Z = -1.268, p = .205), or between walkthroughs and informal reviews (Z = -.517, p = .605), or between static analysis tools and walkthroughs (Z = -1.437, p = .151).

Conclusions for companies producing control-domain software: industrial practitioners working for companies producing control-domain software perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.3.3 Companies producing system software

Friedman test indicated no statistically significant difference in the perceived product quality of different static code analysis techniques (χ2(2) = 2.508, p = 0.474). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.530, p = .126), or between walkthroughs and inspections (Z = -1.340, p = .180), or between static analysis tools and inspection (Z = -.707, p = .479) or between static analysis tools and informal reviews (Z = -1.028, p = .304), or between walkthroughs and informal reviews (Z = -.243, p = .808), or between static analysis tools and walkthroughs (Z = -.819, p = .413).

Conclusions for companies producing system software: industrial practitioners working for companies producing system software perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.3.4 Companies producing computation-dominant software

Friedman test indicated no statistically significant difference in the perceived product quality of different static code analysis techniques (χ2(2) = 4.607, p = 0.203). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -.879, p = .380), or between walkthroughs and inspections (Z = -1.496, p = .135), or between static analysis tools and inspection (Z = -1.265, p = .206) or between static analysis tools and informal reviews (Z = -1.725, p = .084), or between walkthroughs and informal reviews (Z = -.649, p = .516), or between static analysis tools and walkthroughs (Z = -2.209, p = .027).

Conclusions for companies producing computation-dominant software: Industrial practitioners working for companies producing computation-dominant software perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.4 Software life cycle model view

In this section the data set on perceived product quality is filtered with respect to the software life cycle model used by the company, to see whether the life cycle model has an influence on the perceived product quality of the different static code analysis techniques.

6.2.3.7.4.1 Waterfall

The Friedman test indicated no statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 6.231, p = 0.101). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. The Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = .000, p = 1.000), or between walkthroughs and inspections (Z = .000, p = 1.000), or between static analysis tools and inspection (Z = -1.732, p = .083), or between static analysis tools and informal reviews (Z = -1.732, p = .083), or between walkthroughs and informal reviews (Z = .000, p = 1.000), or between static analysis tools and walkthroughs (Z = -1.732, p = .083).

Conclusions for companies using waterfall model as a life cycle model: industrial practitioners working for companies using the waterfall model perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the ones that improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.4.2 Incremental model

Friedman test indicated a statistically significant difference in the perceived product quality of different static code analysis techniques (χ2(2) = 8.320, p = 0.040). Post hoc analysis with Wilcoxon signed-rank tests were used to follow up this finding. A Bonferroni correction was applied with a significance level set at p < .0083. Wilcoxon test indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.777, p = .076), or between walkthroughs and inspections (Z = -.061, p =.951), or between static analysis tools and inspection (Z = -1.438, p = .150) or between static analysis tools and informal reviews (Z = -.811, p = .417), or between walkthroughs and informal reviews (Z = -2.165, p = .030), or between static analysis tools and walkthroughs (Z = - 1.224, p = .221).

Conclusions for companies using the incremental model as a life cycle model: Industrial practitioners working for companies using the incremental model perceive that none of the static code analysis techniques improves product quality significantly more than the others; although the Friedman test indicated a significant overall difference, none of the pairwise comparisons remained significant after the Bonferroni correction. The techniques can be ranked in the following order, starting with the one perceived to improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.


[Figure: diverging stacked bar (Likert) charts showing, for each of the Waterfall, Incremental, Agile, agile-dominant hybrid and plan-driven hybrid life cycle models, the percentage of practitioners who strongly disagree, disagree, agree or strongly agree that inspections, informal reviews, walkthroughs and static analysis tools improve product quality.]

Figure 6.49 Product quality Likert chart - Software life cycle model view


6.2.3.7.4.3 Spiral model

The Friedman test indicated no statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 1.286, p = 0.733). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied, with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -1.000, p = .317), between walkthroughs and inspections (Z = -1.000, p = .317), between static analysis tools and inspections (Z = -1.000, p = .317), between static analysis tools and informal reviews (Z = -.447, p = .655), between walkthroughs and informal reviews (Z = -.000, p = 1.000), or between static analysis tools and walkthroughs (Z = -.447, p = .655).

Conclusions for companies using the spiral model as a life cycle model: Industrial practitioners working for companies using the spiral model perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the one perceived to improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.4.4 Agile

The Friedman test indicated a statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 12.101, p = 0.007). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied, with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -2.266, p = .023), between walkthroughs and inspections (Z = -2.224, p = .026), between static analysis tools and inspections (Z = -.346, p = .729), between static analysis tools and informal reviews (Z = -1.522, p = .128), between walkthroughs and informal reviews (Z = -.372, p = .710), or between static analysis tools and walkthroughs (Z = -2.001, p = .045).

Conclusions for companies using Agile models: Industrial practitioners working for companies using Agile models perceive that none of the static code analysis techniques improves product quality significantly more than the others; although the Friedman test indicated a significant overall difference, none of the pairwise comparisons remained significant after the Bonferroni correction. The techniques can be ranked in the following order, starting with the one perceived to improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.4.5 Hybrid process (dominated by agile practices, with few plan-driven practices)

The Friedman test indicated no statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 2.681, p = 0.443). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied, with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -2.266, p = .023), between walkthroughs and inspections (Z = -.791, p = .429), between static analysis tools and inspections (Z = -.073, p = .942), between static analysis tools and informal reviews (Z = -.471, p = .638), between walkthroughs and informal reviews (Z = -.749, p = .454), or between static analysis tools and walkthroughs (Z = -1.222, p = .222).

Conclusions for companies using a hybrid process (dominated by agile practices, with few plan-driven practices): Industrial practitioners working for companies using a hybrid process dominated by agile practices, with few plan-driven practices, perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the one perceived to improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

6.2.3.7.4.6 Hybrid process (dominated by plan-driven practices, with few agile practices)

The Friedman test indicated no statistically significant difference in the perceived product quality of the different static code analysis techniques (χ2(3) = 6.805, p = 0.078). Post hoc analysis with Wilcoxon signed-rank tests was used to follow up this finding. A Bonferroni correction was applied, with the significance level set at p < .0083. The Wilcoxon tests indicated no statistically significant difference in perceived product quality between informal reviews and inspections (Z = -.302, p = .763), between walkthroughs and inspections (Z = -.632, p = .527), between static analysis tools and inspections (Z = -2.111, p = .035), between static analysis tools and informal reviews (Z = -1.540, p = .124), between walkthroughs and informal reviews (Z = -.264, p = .792), or between static analysis tools and walkthroughs (Z = -1.089, p = .276).

Conclusions for companies using a hybrid process (dominated by plan-driven practices, with few agile practices): Industrial practitioners working for companies using a hybrid process dominated by plan-driven practices, with few agile practices, perceive that none of the static code analysis techniques improves product quality significantly more than the others. The techniques can be ranked in the following order, starting with the one perceived to improve product quality the most:

1. Static analysis tools, Informal reviews, walkthroughs, Inspections.


7 DISCUSSION AND GUIDELINES

In this chapter, the findings of the SLR and the survey are analyzed, aggregated and summarized to answer our research questions, draw conclusions and develop guidelines for researchers and for industrial practitioners.

7.1 Guidelines for Researchers

The guidelines for researchers will be developed in the light of the systematic literature review findings and will answer research questions RQ1, RQ2, RQ3 and RQ4. For researchers, the guidelines will do the following:

 Reveal the current state of research in terms of:
a. Which static code analysis techniques received most of the researchers' attention (RQ1.1).
b. What kind of research practices have been carried out (RQ1.2).
c. What variables have been used by researchers to investigate and report the benefits and limitations of static code analysis techniques (RQ1.3).
 Reveal the state of rigor and industrial relevance in static code analysis research (RQ2) and explain why.
 List the benefits and limitations of different static code analysis techniques reported by researchers, and show the strength of evidence (in terms of rigor and relevance) supporting these reported benefits and limitations (RQ3 and RQ4).
 Give recommendations on how to improve future static code analysis research.
 Give recommendations on how to improve the rigor and relevance of future static code analysis research.

Researchers can use these guidelines to:

 Fill the gap in research by redirecting attention towards less evaluated techniques. This will be possible because researchers will know which static code analysis techniques have been fairly evaluated and which have not, and what research practices have been carried out and which have not.
 Have an overview of the state of rigor and relevance in existing static code analysis research.
 Have an overview of the benefits and limitations of static code analysis techniques reported by researchers.
 Use consistent variables and measuring criteria in future evaluations of static code analysis techniques. This will help in conducting reproducible, consistent, easy to build on, and easy to aggregate research.
 Improve the rigor of future static code analysis research by following the recommendations on how to conduct rigorous research.
 Improve the industrial relevance of future static code analysis research by following the recommendations on how to conduct industrially relevant research.

7.1.1 RQ1 – State of static code analysis research

RQ1.1 For the techniques that received most of the researchers' attention, the SLR revealed that inspection is the technique that received the most attention in research: a total of 53 studies evaluated inspection. Static code analysis tools come second, with 21 studies evaluating them. Informal reviews have been evaluated in 4 studies, and walkthroughs have been evaluated in only one study. These findings are shown in detail in Table 5.11.

RQ1.2 For the types of research practices, referring to Table 5.11 we can see that for the studies evaluating informal reviews, walkthroughs and static code analysis tools, only one practice type is identified, namely evaluating them solely as a code analysis technique. The case is different for inspection, since it is the only code analysis technique that has a well-defined formal process. For the studies evaluating inspection, four practice types are identified. The first practice type evaluated inspection solely as a code analysis technique. The second practice type evaluated the factors that influence the inspection process, such as team size, number of sessions, use of procedural roles, experience of participants, process maturity and process environment. The third practice type evaluated changes to the inspection structure. The fourth practice type evaluated the techniques that support the inspection structure, such as reading techniques, computer support for the inspection process and support for re-inspection. All these studies are shown in detail in Table 5.11 and in sections 5.2.3.1 and 5.2.4.

We would also like to emphasize that none of the other static code analysis techniques received the same attention as inspection, either in terms of the total number of studies that evaluated them or in terms of the diversity of research practices among those studies. For inspection we can see that not only did it receive most of the attention in research, but studies also evaluated elements of the inspection process such as team size. The reason behind this, as we observed, is the age of inspection, which was introduced by Michael Fagan in 1976 [94].

RQ1.3 For the variables used by the studies, the authors analyzed the variables across the studies using thematic analysis to identify the variables (themes) most frequently used to report benefits and limitations in the studies evaluating static code analysis techniques. The authors identified eight recurrent variables (patterns) or themes; they are shown in Table 5.12, and for each theme/variable the measuring criteria used to measure it are listed. The benefits and limitations of each technique are reported with respect to these eight variables.

It has also been observed that 44% of the studies used experiments as their research method, and the outcome variables in the majority of the experiments are inconsistent; in some experiments the variables are consistent, but their measuring criteria are not consistent enough to generalize the findings (see the details in Appendix A). The same was found for the non-experimental studies: the variables investigated were heterogeneous and the metrics used to measure them were not consistent enough. Some non-experimental studies did not reveal their variables, and some did not reveal the metrics used to investigate the variables.

7.1.2 RQ2 – Rigor and relevance of static code analysis research

After measuring the rigor and industrial relevance of the studies using the scoring rubrics of study [74], with details explained in section 5.2.2, we conclude the following regarding the state of rigor and relevance in static code analysis research:

 Only 11 studies (15.71%) fall into category A (high rigor and high relevance). Only this number of studies can act as a solid empirical basis for researchers and can help facilitate decision making for industrial practitioners looking to adopt static code analysis techniques. This is disappointing from a technology transfer point of view, as the rest of the studies have less potential for actually influencing practice.


 Looking at Figure 5.7, we can see that more than 61% of the studies fall in the lower part of the figure with low relevance scores. This means that more than half of the studies are poorly relevant to the static code analysis industry when evaluated using the Ivarsson and Gorschek criteria [74]. This is disappointing from a technology transfer point of view, as these evaluations have less potential for actually influencing practice [74].

 Looking at Figure 5.7, we can also see that more than 48% of the studies fall in the left part of the figure with low rigor scores. We conclude that almost half of the studies are poorly rigorous. This is disappointing from a research point of view, as these studies cannot act as a solid empirical basis for researchers, hindering the progress of research.

 It has also been observed that most of the studies (technology evaluations) conducted with a high degree of rigor do not necessarily have a good ability to impact industry (high relevance), while most of the technology evaluations conducted in an industrial setting (high relevance) usually have a high degree of rigor (see Figure 5.6 and section 5.2.2 for details).

 There is an apparent need to improve the rigor and relevance of static code analysis research. The rigor of studies needs to be improved so they can act as a solid empirical basis, while the relevance of the studies needs to be improved to increase their ability to impact industry. To do this, we need to look at the individual aspects that characterize rigor and relevance identified by study [74] and compare them against the actual rigor and relevance scores for individual studies presented in Table 5.7 and Table 5.9 in section 5.1.6, to identify the reasons limiting the rigor and relevance scores of individual studies.

 Why do 61% of the SLR studies have low relevance scores? Four aspects define relevance: 1) research method, 2) context, 3) subjects and 4) scale [74]. Looking at the research method and context aspects of the SLR studies, we find that experiments have been used as the research method in 31 studies; in other words, 44% of the studies used experiments as their research method. Also, 38 studies, including those that used experiments, were performed in an academic context using students as subjects. Since Ivarsson and Gorschek [74] give experiments and academic contexts a relevance score of zero, we identified this as the major factor degrading the relevance of research. Second, we look into the subjects and scale: regarding subjects, 40 studies out of 70 (57%) used researchers or students as subjects; regarding scale, 40 studies out of 70 (57%) used down-sized or toy examples. Since Ivarsson and Gorschek [74] give student and researcher subjects, as well as down-sized scales, a relevance score of zero, the use of student subjects and toy examples contributed significantly to degrading the relevance scores of static code analysis research. This indicates a need for improving the relevance of research (a small scoring sketch follows this list).

 Why do 48% of the SLR studies have low rigor scores? Three aspects define rigor: context, study design and validity threats. Referring to Table 5.7, the average context score is 0.7 out of a maximum of 1, and the same goes for the study design, while the average score for the validity threats is 0.5 out of a maximum of 1. The poor reporting of validity threats and their mitigation strategies contributed significantly to degrading rigor. If the average rigor scores are summed, the result is a total average rigor of 1.9, which is still considered low, because high rigor starts from a score of two or higher according to study [74]. This indicates a need for improving the rigor of research (see the scoring sketch after this list).
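To make the scoring concrete, the sketch below shows how the average rigor total of 1.9 quoted above is obtained and why a typical academic experiment ends up with a low relevance score. It is illustrative only and assumes, following the rubric of [74], that each rigor aspect is scored between 0 and 1 and that each relevance aspect contributes 1 when it resembles industrial practice and 0 otherwise; the example study is hypothetical.

# Illustrative sketch of the rigor and relevance scoring discussed above (assumptions noted in the text).
avg_rigor = {"context": 0.7, "study design": 0.7, "validity threats": 0.5}  # average aspect scores
total_rigor = sum(avg_rigor.values())                                       # 0.7 + 0.7 + 0.5 = 1.9
print(f"average total rigor = {total_rigor:.1f} -> {'high' if total_rigor >= 2 else 'low'}")

# A typical SLR study: an experiment, in an academic context, with student subjects,
# on a down-sized example. Every relevance aspect scores zero.
example_study   = {"research method": "experiment", "context": "academic",
                   "subjects": "students", "scale": "down-sized"}
industrial_like = {"research method": "case study", "context": "industrial",
                   "subjects": "practitioners", "scale": "industrial scale"}
relevance = sum(1 for aspect, value in example_study.items() if value == industrial_like[aspect])
print(f"relevance score = {relevance} out of {len(example_study)}")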

7.1.3 RQ3 & RQ4 – Benefits and limitations reported by researchers

In section 5.2.3, the benefits/limitations of different static code analysis techniques were extracted from the SLR studies using the variables identified in Table 5.12. Conclusions on benefits/limitations were drawn for each rigor/relevance category (A, B1, B2, C) using vote counting. In this section, these conclusions are summarized and interpreted for each technique to answer RQ3 and RQ4.

7.1.3.1 Inspection

Table 7.1 Summary of inspection benefits and limitations based on the rigor and relevance categories

For each variable, the conclusion is listed per rigor/relevance category in the order: Category A / Category B1 / Category B2 / Category C / All studies.

Effectiveness: No evidence / Positive / Positive / Positive / Positive
Lower number of false positives: No evidence / Positive / Positive / Positive / Positive
Fault content: Positive (weak evidence) / Inconclusive / Negative / Negative / Negative
Time efficiency: No evidence / Negative / Negative / Positive / Negative
Effort efficiency: No evidence / Negative / No evidence / (not reported) / Negative
Cost effectiveness: Positive (weak evidence) / Inconclusive / Weak evidence / Positive / Positive
Internal code quality: Positive (weak evidence) / No evidence / No evidence / No evidence / Positive
External code quality: Positive (weak evidence) / No evidence / No evidence / No evidence / Weak evidence

Table 7.1 summarizes the conclusions on inspection's benefits/limitations considering all rigor/relevance categories. For each variable representing a benefit/limitation, the conclusion is given for each rigor/relevance category. In Table 7.1 the conclusions are expressed using one of the following four terms (resulting from the vote counting in section 5.2.3.1); a minimal sketch of this classification rule follows the list:

 Positive: The number of studies in favor of the particular variable is greater than the number of studies not in favor of it.
 Negative: The number of studies not in favor of the particular variable is greater than the number of studies in favor of it.
 No evidence: No studies have evaluated the particular variable.
 Weak evidence: Only one study evaluated the particular variable.
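As a minimal illustration of the rule above (the function name and the handling of tied votes are our own; ties are labelled Inconclusive to match the label that also appears in Tables 7.1 and 7.2):

# Minimal sketch of the vote-counting rule described above; illustrative only.
def vote_count(in_favor: int, against: int) -> str:
    """Classify the conclusion for one variable within one rigor/relevance category."""
    total = in_favor + against
    if total == 0:
        return "No evidence"
    if total == 1:
        return "Weak evidence"
    if in_favor > against:
        return "Positive"
    if against > in_favor:
        return "Negative"
    return "Inconclusive"   # equal numbers of studies for and against

print(vote_count(3, 1))  # Positive
print(vote_count(0, 1))  # Weak evidence
print(vote_count(0, 0))  # No evidence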

Category A studies are the high-quality, trustworthy studies with high rigor and high relevance scores [74]. Thus only Category A studies will be used to answer RQ3 and RQ4, and to complement the survey findings on inspection's benefits and limitations later when answering RQ6.

Conclusions on Inspections benefits and limitations

In Table 7.1 and Section 5.2.3.1, Category A studies showed positive results on inspection's cost effectiveness, improvement of internal code quality, improvement of external code quality and ability to capture various types of defects. For inspection's time efficiency, effort efficiency, effectiveness and number of false positives produced, there is no evidence presented in Category A studies. However, the number of Category A studies evaluating inspection is very low, which makes the overall conclusions of Category A studies stand on a weak foundation.


7.1.3.2 Informal reviews

As shown in Table 5.12 and Section 5.2.3.3, only four studies evaluated informal reviews [4] [11] [41] [61], and none of them falls into Category A; thus no solid conclusions on the benefits/limitations of informal reviews can be drawn from the SLR.

7.1.3.3 Walkthroughs

As shown in Table 5.12 and Section 5.2.3.4, only study [20] evaluated walkthroughs, and it does not fall into Category A; thus no solid conclusions on the benefits/limitations of walkthroughs can be drawn from the SLR.

7.1.3.4 Static analysis tools

Table 7.2 Summary of static analysis tools benefits and limitations based on the rigor and relevance categories

For each variable, the conclusion is listed per rigor/relevance category in the order: Category A / Category B1 / Category B2 / Category C / All studies.

Effectiveness: Positive (weak evidence) / Negative / Negative / Positive / Negative
Lower number of false positives: Negative (weak evidence) / Negative / Negative / No evidence / Negative
Fault content: Positive (weak evidence) / Negative / Weak evidence / Inconclusive / Positive
Time efficiency: Negative (weak evidence) / No evidence / Negative / Positive / Positive
Effort efficiency: Weak evidence / No evidence / No evidence / Weak evidence / Positive
Cost effectiveness: Negative (weak evidence) / No evidence / Weak evidence / Positive / Inconclusive
Internal code quality: No evidence / No evidence / Weak evidence / No evidence / Weak evidence
External code quality: No evidence / No evidence / No evidence / No evidence / No evidence

Conclusions on static analysis tools benefits and limitations

In Table 7.2 and Section 5.2.3.2, Category A studies showed positive results on static analysis tools' effectiveness and ability to capture various types of defects. Category A studies showed negative results on their cost effectiveness and ability to produce a lower number of false positives. For effort efficiency, internal code quality and external code quality, there is weak or no evidence presented in Category A studies. However, the number of Category A studies evaluating static analysis tools is very low, which makes the overall conclusions of Category A studies stand on a weak foundation.

7.1.4 Recommendations for researchers

After answering the first research question, we recommend the following to improve the state of research:


 More empirical evaluations are needed. Looking at Table 5.11, there is a very small number of studies evaluating informal reviews and walkthroughs as opposed to inspection and static analysis tools. There is also a shortage of studies evaluating the factors that influence the inspection process, specifically the effect of team size, multiple sessions, use of procedural roles, experience of participants, group design, communication in inspection meetings, process maturity and process environment. Research effort can be redirected toward these areas.

 Use consistent variables in future research. The authors defined eight variables or themes that were frequently used to evaluate static code analysis techniques; they are shown in Table 5.12, and for each variable the measuring criteria used among the studies are listed. We recommend that researchers consider these variables and their measuring criteria when conducting future evaluations, to produce more consistent evaluations and to facilitate evidence aggregation.

After answering the second research question, we recommend the following to improve the state of rigor and relevance:

 Promote industry-oriented research in the future. A significant way to improve the relevance of research is to promote industry-oriented research; this implies using subjects, scale and realistic settings close to the ones used in industry [74]. Since we identified a need for improving rigor and relevance in static code analysis research, we recommend that researchers follow the researcher guidelines in study [74] on how to conduct industrially relevant research. Also, since experiments are widely used in static code analysis research (44%), we recommend that researchers use realistic scales and involve industrial subjects if they decide to use experiments as their research method.

 Adequately report rigor aspects in future research. We recommend that researchers adequately report rigor aspects such as context and study design; studies [104] [105] provide guidelines for that. Since inadequate discussion of validity threats was identified as the major factor limiting the rigor of the studies, we recommend that researchers adequately discuss validity threats along with their mitigation strategies. Studies [79] [106] can be followed to discuss validity threats.

7.2 Guidelines for Practitioners

The guidelines for practitioners will be developed in the light of the survey findings and by using some input from the SLR, and will answer research questions RQ5 and RQ6. For practitioners, the guidelines will do the following:

 Reflect which static code analysis techniques are used most frequently in industry among companies of different sizes, producing different software products and using different software life cycle models. This will be done by answering RQ5.
 Reveal whether the usage of techniques in industry depends on company size, product type or life cycle model. This will be done by answering RQ5.1.
 Reveal whether the usage of techniques in industry depends on the amount of attention a technique has received in research. This will be done by answering RQ5.2 with input from RQ1.1.
 Reveal the benefits/limitations of static code analysis techniques as perceived by industrial professionals working for companies of different sizes, producing different software products and using different software life cycle models. This will be done by answering RQ6.


 Reveal whether industrial practitioners' perception of the benefits/limitations of static code analysis techniques is influenced by company size, software life cycle model or software product type. This will be done by answering RQ6.1.
 If high-quality evidence on benefits/limitations is available from the SLR, the practitioner guidelines will reveal whether industrial practitioners' perception of benefits/limitations agrees or disagrees with the SLR evidence. This will be done by answering RQ6.2 with input from RQ3 and RQ4.

Industrial practitioners can use these guidelines to:

 Have an overview of the trend in industry with respect to the use of different static code analysis techniques (Section 7.2.1).
 Have an overview of the influential factors on the usage of static code analysis techniques in industry (Section 7.2.2).
 Understand the gap (attention in research vs. usage in industry) between static code analysis research and industry (Section 7.2.3).
 Facilitate decision making when adopting static code analysis techniques evaluated and proposed by researchers. A practitioner can refer to section 7.2.4 and do the following:
o Decide which variable(s) representing benefits/limitations he wants to look at.
o For each selected variable, look at the corresponding implications/recommendations to select a technique that better fits his needs.
o The implications/recommendations for each variable will be presented from three different industrial views: 1) company size, 2) software product type and 3) software life cycle model, plus a fourth view from researchers if high-quality SLR evidence is available.
o Overall, a practitioner can see what the survey tells him, what the SLR tells him, and whether the survey agrees or disagrees with the SLR findings. Then he can decide.

7.2.1 RQ5 – Techniques frequently used in industry

In Section 6.2.2, the usage of different static code analysis techniques among companies of different sizes, producing different software products and using different software life cycle models was presented (different views). For each view, the most frequently or always used techniques were identified and ranked, as well as the rarely and never used techniques. In this section, the findings of section 6.2.2 are analyzed to answer RQ5 and its sub research questions.

Table 7.3 summarizes the conclusions on the usage of static code analysis techniques from section 6.2.2. For each category in each view, the very frequently used techniques are listed first and the rarely used techniques second; within each group the techniques are ranked by the order of preference identified in section 6.2.2.


Table 7.3 Summary of the usage of static code analysis techniques in industry

For each category, the always or frequently used techniques are listed first (in order of preference), followed by the rarely or never used techniques.

Global view: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.

Company size:
Less than 50 employees: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
50 – 249 employees: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
250 – 4,499 employees: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
More than 4,500 employees: frequently used: 1. static analysis tools, 2. inspections; rarely used: 1. informal reviews, 2. walkthroughs.

Company product type:
Data-dominant software: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
Control-dominant software: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
System software: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
Computation-dominant software: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.

Company software life cycle model:
Waterfall: frequently used: 1. static analysis tools, 2. informal reviews; rarely used: inspections, walkthroughs.
Incremental: frequently used: 1. informal reviews, 2. static analysis tools; rarely used: inspections, walkthroughs.
Spiral: no evidence.
Agile: frequently used: 1. static analysis tools, 2. informal reviews, walkthroughs; rarely used: inspections.
Hybrid process (dominated by agile practices, with few plan-driven practices): frequently used: 1. static analysis tools, 2. informal reviews, walkthroughs; rarely used: inspections.
Hybrid process (dominated by plan-driven practices, with few agile practices): frequently used: 1. static analysis tools, 2. inspections, 3. informal reviews; rarely used: walkthroughs.

Looking at Table 7.3, we can draw the following conclusions:

 Static analysis tools are the most frequently used technique in industry. Companies of all sizes, producing different products and using different life cycle models indicated that they use static analysis tools very frequently or always as their first preference to analyze code; the only exception is companies using incremental models.
 Informal reviews come as the second preference for industrial practitioners after static analysis tools; they are also used very frequently or always by most categories in Table 7.3.
 Inspections and walkthroughs are rarely used in industry; this is apparent in almost every category.


7.2.2 RQ5.1 – Influential factors on the usage of static code analysis

 Looking at Table 7.3, we conclude that product type and company size do not have an influence on the usage of static code analysis techniques. All company size categories, as well as all software product type categories, lead to the same conclusions.
 For companies using agile-related practices, the conclusions on usage differ from the rest of the categories, indicating that the software life cycle model does influence the usage of static code analysis techniques in industry.

7.2.3 RQ5.2 – Attention in research vs. usage in industry

When the survey findings on usage, summarized in Table 7.3, are compared with the SLR findings on the state of research summarized in RQ1.1 in section 7.1.1, we conclude the following:

 Static analysis tools and inspections received most of the researchers' attention; static analysis tools are very frequently used in industry, while inspections are rarely used.
 Informal reviews and walkthroughs received much less attention in research compared to static analysis tools and inspections; informal reviews are very frequently used in industry, while walkthroughs are rarely used.
 This indicates that the amount of attention a static code analysis technique has received in research does not necessarily influence its adoption in industry. We have seen that inspection is the technique most thoroughly evaluated by researchers, yet it is rarely used in industry; on the other hand, informal reviews were rarely evaluated by researchers but are very frequently used in industry. This indicates a gap between static code analysis research and industrial practice, as the ideal situation would be that practitioners adopt an evidence-based approach to select which technique to use, relying on high-quality research findings.

7.2.4 RQ6 – Benefits and limitations as perceived by industry professionals

In section 6.2.3, the survey data on the benefits and limitations of different static code analysis techniques were statistically analyzed, and conclusions on benefits/limitations were drawn for companies of different sizes, producing different software products and using different software life cycle models. In this section, the conclusions of section 6.2.3 are analyzed, summarized and compared to the SLR findings, where applicable, to answer RQ6 and its sub research questions.

RQ6 and its sub research questions will be answered based on the seven variables representing benefits/limitations defined in Table 6.4. For each variable, first the findings of the survey are analyzed to answer RQ6.1; second, the survey findings on the particular variable are cross-analyzed with the SLR findings on that variable to answer RQ6.2; and finally, conclusions are drawn for each variable considering both sets of findings.

7.2.4.1 Conclusions on effectiveness

7.2.4.1.1 Summary of survey findings on effectiveness – Answering RQ 6.1

Table 7.4 summarizes the survey conclusions on the effectiveness of all static code analysis techniques drawn in section 6.2.3.1. For each category representing the different company sizes, software product types and life cycle models, the ranking of the static code analysis techniques is given, starting with the most effective. As explained in section 6.2.3, the rankings resulted from statistical analysis and the differences in effectiveness between the ranks are significant.


Table 7.4 Summary of survey findings on the effectiveness of different static code analysis techniques.

Static code analysis techniques ranking (most effective first):

Global view: 1. Inspection, static analysis tools, informal reviews, walkthroughs.

Company size:
Less than 50 employees: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
50 – 249 employees: 1. Static analysis tools, inspection, informal reviews. 2. Walkthroughs.
250 – 4,499 employees: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
More than 4,500 employees: 1. Inspection, static analysis tools, informal reviews, walkthroughs.

Company product type:
Data-dominant software: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Control-dominant software: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
System software: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Computation-dominant software: 1. Inspection, static analysis tools, informal reviews, walkthroughs.

Company software life cycle model:
Waterfall: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Incremental: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Spiral: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Agile: 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Inspection, static analysis tools, informal reviews, walkthroughs.
Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Inspection, static analysis tools, informal reviews, walkthroughs.


Does the company size, software product type or software life cycle model influence industrial practitioners' perception of the effectiveness of different static code analysis techniques?

 Table 7.4 shows that the software product type and the software life cycle model do not influence the perceived effectiveness of static code analysis techniques: all categories representing different product types and software life cycle models rank the techniques in the same order, and they match the ranking order of the global view.
 Table 7.4 shows that the company size slightly influences the perceived effectiveness of the different static code analysis techniques: only companies with 50 – 249 employees rank the techniques in a different order. The rest of the categories in the company size view match the other views and the global view.
 To see how categories in one view relate to the categories in the other views, please see Table 7.5.

To help aggregate the findings summarized in Table 7.4 and give recommendations to practitioners, Table 7.5 groups the categories that rank the techniques in the same order across the different views. This will help practitioners see the preference on effectiveness matching their own context in terms of company size, software product type and software life cycle model.

In Table 7.5, categories that share the same ranking are grouped together, each group followed by its ranking on effectiveness. A practitioner can see which context applies to him in terms of company size, software product type and software life cycle model, and then see the corresponding preference on effectiveness.
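As a small illustration of how Table 7.5 can be consulted programmatically (the data structure and names are ours; the rankings are the ones listed in the table), a practitioner could encode the grouped effectiveness rankings as a lookup:

# Illustrative lookup over the effectiveness rankings of Table 7.5; structure and names are ours.
DEFAULT_RANKING = ["inspection, static analysis tools, informal reviews, walkthroughs"]
RANKINGS = {
    # Only companies with 50-249 employees deviate from the shared ranking.
    ("company size", "50-249 employees"):
        ["inspection, static analysis tools, informal reviews", "walkthroughs"],
}

def effectiveness_ranking(view: str, category: str) -> list[str]:
    """Return the ranked groups of techniques (most effective first) for a given context."""
    return RANKINGS.get((view, category), DEFAULT_RANKING)

print(effectiveness_ranking("company size", "50-249 employees"))
print(effectiveness_ranking("software product type", "data-dominant software"))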

Table 7.5 Similarities & differences between the categories representing the different views on effectiveness

Static code analysis techniques ranking (most effective first), with categories that share the same ranking grouped together:

Global view; company size: less than 50 employees, 250 – 4,499 employees, more than 4,500 employees; company product type: data-dominant, control-dominant, system and computation-dominant software; company software life cycle model: waterfall, incremental, spiral, Agile, hybrid process (dominated by agile practices, with few plan-driven practices) and hybrid process (dominated by plan-driven practices, with few agile practices): 1. Inspection, static analysis tools, informal reviews, walkthroughs.

Company size: 50 – 249 employees: 1. Inspection, static analysis tools, informal reviews. 2. Walkthroughs.


7.2.4.1.2 Survey findings vs. SLR findings on effectiveness - Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including effectiveness. Category A studies in Table 7.1 showed no evidence on the effectiveness of inspections, informal reviews and walkthroughs; for static analysis tools, Category A studies showed a positive finding, but the evidence is weak.

As we can see in section 7.1.3, there is no solid evidence on effectiveness reported by Category A studies in the SLR; the evidence reported is weak and the number of Category A studies evaluating effectiveness is small. As such:

 The SLR findings on effectiveness cannot complement or be compared with the survey findings on effectiveness to reach more solid conclusions on the effectiveness of different static code analysis techniques.
 There is no solid evidence to see whether industrial professionals' perception of effectiveness matches the Category A findings on effectiveness. This means that we cannot see whether industrial professionals seeking effectiveness select static code analysis techniques based on the effectiveness findings reported by Category A studies (high rigor and high relevance studies); it also means we cannot see whether rigor and relevance influence industrial professionals' opinion when selecting techniques.

7.2.4.1.3 Implications if you are seeking to be highly effective

If you are an industrial practitioner seeking to be highly effective, we conclude the following:

 We cannot consider the SLR findings on effectiveness to recommend which technique to use. Only the survey findings will be considered.

The survey findings in Tables 7.4 and 7.5 tell you the following:

 The software product type and life cycle model views do not influence perceived effectiveness, and they match the global view findings on effectiveness. Thus, from the global, software product type and software life cycle model views, we conclude that you can use any of the static code analysis techniques and still be highly effective. This conclusion also applies to all the company size categories except companies with 50 – 249 employees.
 The company size slightly influences perceived effectiveness. Only companies with 50 – 249 employees rank the techniques in a different order than the rest of the categories. Please refer to Table 7.5 to see the conclusion for companies with 50 – 249 employees.

7.2.4.2 Conclusions on number of false positives

7.2.4.2.1 Summary of survey findings on number of false positives – Answering RQ 6.1

Table 7.6 summarizes the survey conclusions on the number of false positives produced by different static code analysis techniques, drawn in section 6.2.3.2. For each category representing the different company sizes, software product types and life cycle models, the ranking of the static code analysis techniques is given, starting with the techniques producing the lowest number of false positives. As explained in section 6.2.3, the rankings were obtained using statistical analysis and the differences in the number of produced false positives between the ranks are significant.


Table 7.6 Summary of survey findings on the number of false positives produced by different static code analysis techniques

Static code analysis techniques ranking (lowest number of false positives first):

Global view: 1. Inspections, informal reviews, walkthroughs, static analysis tools.

Company size:
Less than 50 employees: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
50 – 249 employees: 1. Static analysis tools, inspection, informal reviews, walkthroughs.
250 – 4,499 employees: 1. Static analysis tools, inspection, informal reviews, walkthroughs.
More than 4,500 employees: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.

Company product type:
Data-dominant software: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Control-dominant software: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
System software: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Computation-dominant software: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.

Company software life cycle model:
Waterfall: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Incremental: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Spiral: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Agile: 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.
Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.

Does the company size, software product type or software life cycle influence the number of false positives produced by static code analysis techniques? How do these views compare to each other and the global view?

Table 7.6 shows that the company size, software product type and software life cycle model influence the perceived number of false positives produced by different static code analysis techniques. This is because, for each view, not all the categories representing it rank the techniques in the same order.

The different views differ from each other and from the global view as well; no two views rank the techniques in the same order. To see how categories in one view are similar to or different from the categories in the other views, please see Table 7.7.

To help aggregate the findings summarized in Table 7.6 and give recommendations to practitioners, Table 7.7 groups the categories that rank the techniques in the same order across the different views. This will help practitioners see the preference on the number of false positives matching their own context in terms of company size, software product type and software life cycle model.

In Table 7.7, categories that share the same ranking are grouped together, each group followed by its ranking. A practitioner can see which context applies to him in terms of company size, software product type and software life cycle model, and then see the corresponding preference on the number of false positives.

Table 7.7 Similarities & differences between the categories representing the different views on number of false positives

Static code analysis techniques ranking (lowest number of false positives first), with categories that share the same ranking grouped together:

Global view: 1. Inspections, informal reviews, walkthroughs, static analysis tools.

Company size: less than 50 employees and 50 – 249 employees: 1. Static analysis tools, inspection, informal reviews, walkthroughs.

Company size: 250 – 4,499 employees and more than 4,500 employees; company product type: data-dominant, control-dominant, system and computation-dominant software; company software life cycle model: waterfall, incremental, spiral, Agile, hybrid process (dominated by agile practices, with few plan-driven practices) and hybrid process (dominated by plan-driven practices, with few agile practices): 1. Inspections, informal reviews, walkthroughs. 2. Static analysis tools.


7.2.4.2.2 Survey findings vs. SLR findings on number of false positives – Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including the number of false positives. As seen in section 7.1.3, Category A studies showed no evidence on the number of false positives for inspections, informal reviews and walkthroughs; for static analysis tools, Category A studies showed a negative finding, but the evidence is weak.

As we can see in section 7.1.3, there is no solid evidence on the number of false positives reported by Category A studies in the SLR; the evidence reported is weak and the number of Category A studies evaluating the number of false positives is low. As such:

 The SLR findings on the number of false positives cannot complement or be compared with the survey findings to reach more solid conclusions on the number of false positives produced by different static code analysis techniques.
 There is no solid evidence to see whether industrial professionals' perception of the number of false positives matches the Category A findings. This means that we cannot see whether industrial professionals seeking a low number of false positives select static code analysis techniques based on the findings reported by Category A studies (high rigor and high relevance studies); it also means we cannot see whether rigor and relevance influence industrial professionals' opinion when adopting techniques.

7.2.4.2.3 Implications if you are seeking a low number of false positives

If you are an industrial practitioner seeking a low number of false positives, we conclude the following:

 We cannot consider the SLR findings on the number of false positives to recommend which technique to use. Only the survey findings will be considered.

The survey findings in Tables 7.6 and 7.7 tell you the following:

 The company size, software product type and life cycle model influence the perceived number of false positives produced by different static code analysis techniques, and they also differ from the global view on the perceived number of false positives. Thus, from the company size, software product type and software life cycle model views, we cannot recommend a single technique to use if you want a low number of false positives.
 Please refer to Table 7.7 to see the preference of static code analysis techniques with respect to the number of false positives produced: find the context that applies to you in terms of company size, software product type and software life cycle model and see the corresponding preference.

7.2.4.3 Conclusions on fault content

7.2.4.3.1 Summary of survey findings on fault content – Answering RQ 6.1

Table 7.8 summarizes the survey conclusions on the perceived ability of static code analysis techniques to capture various types of defects, drawn in section 6.2.3.3. For each category representing the different company sizes, software product types and life cycle models, the ranking of the static code analysis techniques is given, starting with the techniques perceived to capture the widest variety of defects. As explained in section 6.2.3, the differences in perceived fault content between the ranks are significant.

Table 7.8 Summary of survey findings on the perceived fault content of different static code analysis techniques.


Static code analysis techniques ranking (techniques perceived to capture the widest variety of defects first):

Global view, every company size category, every company product type category and every company software life cycle model category: 1. Inspections, informal reviews, walkthroughs, static analysis tools.

Does the company size, software product type or software life cycle influence the perceived fault content of static code analysis techniques? How do these views compare to each other and the global view?

 Table 7.8 shows that the company size, software product type and software life cycle model do not influence the perceived fault content of static code analysis techniques: all categories representing the different company sizes, product types and software life cycle models rank the techniques in the same order, and they match the ranking order of the global view as well.


7.2.4.3.2 Survey findings vs. SLR findings on fault content – Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including perceived fault content. Category A studies showed no evidence on the perceived fault content of informal reviews and walkthroughs; for static analysis tools and inspections, Category A studies showed a positive finding, but the evidence is weak.

As we can see in section 7.1.3, there is no solid evidence on perceived fault content reported by Category A studies in the SLR; the evidence reported is weak and the number of Category A studies evaluating perceived fault content is low. As such:

 The SLR findings on perceived fault content cannot complement or be compared with the survey findings to reach more solid conclusions on the perceived fault content of static code analysis techniques.
 There is no solid evidence to see whether industrial professionals' perception of fault content matches the Category A findings on perceived fault content. This means that we cannot see whether industrial professionals seeking to capture a wide variety of defects select static code analysis techniques based on the fault content findings reported by Category A studies (high rigor and high relevance studies); it also means we cannot see whether rigor and relevance influence industrial professionals' opinion when selecting techniques.

7.2.4.3.3 Implications if you are seeking to capture a wide variety of defects

If you are an industrial practitioner seeking to capture a wide variety of defects, we conclude the following:

 We cannot consider the SLR findings on perceived fault content to recommend which technique to use. Only the survey findings will be considered.
 The survey findings in Table 7.8 tell you that the company size, software product type and life cycle model do not influence perceived fault content, and they match the global view on perceived fault content. Thus, from the global, company size, software product type and software life cycle model views, we conclude that you can use any of the static code analysis techniques and still be able to capture various types of defects.

7.2.4.4 Conclusions on cost efficiency

7.2.4.4.1 Summary of survey findings on cost efficiency – Answering RQ 6.1

Table 7.9 summarizes the survey conclusions on the perceived cost efficiency of different static code analysis techniques, drawn in section 6.2.3.4. For each category representing the different company sizes, software product types and life cycle models, the ranking of the static code analysis techniques is given, starting with the most cost efficient techniques. As explained in section 6.2.3, the differences in perceived cost efficiency between the ranks are significant.

Table 7.9 Summary of survey findings on the cost efficiency of different static code analysis techniques

Static code analysis techniques ranking (most cost efficient first):

Global view: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

Company size:
Less than 50 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
50 – 249 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
250 – 4,499 employees: 1. Static analysis tools, inspections, informal reviews, walkthroughs.
More than 4,500 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

Company product type:
Data-dominant software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
Control-dominant software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
System software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
Computation-dominant software: 1. Static analysis tools, inspections, informal reviews, walkthroughs.

Company software life cycle model:
Waterfall: 1. Static analysis tools, inspections, informal reviews, walkthroughs.
Incremental: 1. Static analysis tools, inspections, informal reviews, walkthroughs.
Spiral: 1. Static analysis tools, inspections, informal reviews, walkthroughs.
Agile: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

Does the company size, software product type or software life cycle model influence the perceived cost efficiency of static code analysis techniques? How do these views compare to each other and to the global view?

Table 7.9 shows that the company size, software product type and software life cycle model influence the perceived cost efficiency of different static code analysis techniques. This is because, for each view, not all the categories representing it rank the techniques in the same order.

The different views differ from each other and from the global view as well. There are no two views whose categories all rank the techniques in the same order. To see how categories in one view are similar to or different from the categories in the other views, please see table 7.10.

To help aggregate the findings summarized in Table 7.9 and give recommendations to practitioners, table 7.10 groups the categories which rank the techniques in the same order across different views. This will help practitioners to see the preference on the perceived cost efficiency suitable to their own context in terms of company size, software product type and software life cycle model.

The left column contains the categories of the different views and the right column contains the corresponding rankings. A practitioner can identify the context that applies to them in terms of company size, software product type and software life cycle model and then read off the corresponding preference on cost efficiency.

Table 7.10 Similarities & differences between the categories representing the different views on cost efficiency

Global view: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

Company size
  Less than 50 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  50 – 249 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  250 – 4,499 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  More than 4,500 employees: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

Company product type
  Data-dominant software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  Control-dominant software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  System software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  Computation-dominant software: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.

Company using software life cycle model
  Waterfall: 1. Static analysis tools. 2. Inspections, informal reviews, walkthroughs.
  Incremental: 1. Static analysis tools, Inspections, informal reviews, walkthroughs.
  Spiral: 1. Static analysis tools, Inspections, informal reviews, walkthroughs.
  Agile: 1. Static analysis tools, Inspections, informal reviews, walkthroughs.
  Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Static analysis tools, Inspections, informal reviews, walkthroughs.
  Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Static analysis tools, Inspections, informal reviews, walkthroughs.

7.2.4.4.2 Survey findings vs. SLR findings on cost efficiency – Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including cost efficiency. As seen in section 7.1.3, category A studies showed only weak evidence on the perceived cost efficiency of inspections and static analysis tools, and for informal reviews and walkthroughs there are no category A studies evaluating their cost efficiency.

As we can see in section 7.1.3, there is no solid evidence on cost efficiency reported in the SLR by category A studies; rather, the evidence reported is weak and the number of category A studies evaluating cost efficiency is low. As such:

- The SLR findings on cost efficiency cannot complement or be compared to the survey findings on cost efficiency to reach more solid conclusions on the cost efficiency of different static code analysis techniques.
- There is no solid evidence to check whether industrial professionals' perception of cost efficiency matches the category A findings on cost efficiency. This means we cannot see whether industrial professionals seeking higher cost efficiency select static code analysis techniques based on the cost efficiency findings reported by category A studies (high rigor and high relevance studies), and we also cannot see whether rigor and relevance influence industrial professionals' opinion when adopting cost-efficient techniques.

7.2.4.4.3 Implications if you are seeking to be cost efficient

If you are an industrial practitioner seeking to be cost efficient, we conclude the following:


- We cannot consider the SLR findings on cost efficiency to recommend which technique to use. Only the survey findings will be considered.

The survey findings in tables 7.9 and 7.10 tell you the following:

- The company size, software product type and life cycle model influence the perceived cost efficiency of different static code analysis techniques, and they differ from the global view on perceived cost efficiency. Thus, from the company size, software product type and software life cycle model views, we cannot recommend a single technique to use if you want to be highly cost efficient.
- Please refer to table 7.10 to see the preference of static code analysis techniques with respect to cost efficiency: find the context that applies to you in terms of company size, software product type and software life cycle model and read the corresponding preference on cost efficiency.

7.2.4.5 Conclusions on ease of use

7.2.4.5.1 Summary of survey findings on ease of use – Answering RQ 6.1

Table 7.11 summarizes the survey conclusions on the perceived ease of use of different static code analysis techniques drawn in section 6.2.3.5. The left column contains the categories representing different company sizes, software product types and life cycle models; the right column contains the ranking of the different static code analysis techniques, starting with the easiest to use. As explained in section 6.2.3, the difference in perceived ease of use between the ranks is significant.

Table 7.11 Summary of survey findings on the ease of use of different static code analysis techniques

Global view: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

Company size
  Less than 50 employees: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  50 – 249 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  250 – 4,499 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  More than 4,500 employees: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

Company product type
  Data-dominant software: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  Control-dominant software: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  System software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Computation-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

Company using software life cycle model
  Waterfall: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Incremental: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Spiral: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Agile: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

Does the company size, software product type or software life cycle model influence the perceived ease of use of static code analysis techniques? How do these views compare to each other and to the global view?

Table 7.11 shows that the company size, software product type and software life cycle model influence the perceived ease of use of different static code analysis techniques. This is because, for each view, not all the categories representing it rank the techniques in the same order.

The different views differ from each other and from the global view as well. There are no two views whose categories all rank the techniques in the same order. To see how categories in one view are similar to or different from the categories in the other views, please see table 7.12.

To help aggregate the findings summarized in table 7.11 and give recommendations to practitioners, table 7.12 groups the categories which rank the techniques in the same order across different views. This will help practitioners see the preference on perceived ease of use suitable to their own context in terms of company size, software product type and software life cycle model.

The left column contains the categories of the different views and the right column contains the corresponding rankings. A practitioner can identify the context that applies to them in terms of company size, software product type and software life cycle model and then read off the corresponding preference on ease of use.

Table 7.12 Similarities & differences between the categories representing the different views on ease of use.

Global view: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

Company size
  Less than 50 employees: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  50 – 249 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  250 – 4,499 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  More than 4,500 employees: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.

Company product type
  Data-dominant software: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  Control-dominant software: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  System software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Computation-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

Company using software life cycle model
  Waterfall: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Incremental: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Spiral: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Agile: 1. Static analysis tools. 2. Informal reviews, walkthroughs. 3. Inspections.
  Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

7.2.4.5.2 Survey findings vs. SLR findings on ease of use – Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including ease of use. As seen in section 7.1.3, there is no evidence reported by category A studies on the perceived ease of use of any of the techniques. As such:

- The SLR findings on ease of use cannot complement or be compared to the survey findings on ease of use to reach more solid conclusions on the perceived ease of use of different static code analysis techniques.
- There is no solid evidence to check whether industrial professionals' perception of ease of use matches the category A findings on ease of use. This means we cannot see whether industrial professionals seeking easier to use techniques select static code analysis techniques based on the ease of use findings reported by category A studies (high rigor and high relevance studies), and we also cannot see whether rigor and relevance influence industrial professionals' opinion when adopting easier to use techniques.

7.2.4.5.3 Implications if you are seeking easier to use techniques

If you are an industrial practitioner seeking easier to use techniques, we conclude the following:

- We cannot consider the SLR findings on ease of use to recommend which technique to use. Only the survey findings will be considered.

The survey findings in tables 7.11 and 7.12 tell you the following:

- The company size, software product type and life cycle model influence the perceived ease of use of different static code analysis techniques, and they differ from the global view on perceived ease of use. Thus, from the company size, software product type and software life cycle model views, we cannot recommend a single technique to use if you are seeking easier to use techniques.
- Please refer to table 7.12 to see the preference of static code analysis techniques with respect to their ease of use: find the context that applies to you in terms of company size, software product type and software life cycle model and read the corresponding preference on ease of use.


7.2.4.6 Conclusions on internal code quality

7.2.4.6.1 Summary of survey findings on internal code quality – Answering RQ 6.1

Table 7.13 summarizes the survey conclusions on the perceived internal code quality of different static code analysis techniques drawn in section 6.2.3.6. The left column contains the categories representing different company sizes, software product types and life cycle models; the right column contains the ranking of the different static code analysis techniques, starting with the techniques perceived to improve internal code quality the most. As explained in section 6.2.3.6, the difference in perceived internal code quality between the ranks is significant.

Table 7.13 Summary of survey findings on the internal code quality of different static code analysis techniques.

Global view: 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

Company size
  Less than 50 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  50 – 249 employees: 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.
  250 – 4,499 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  More than 4,500 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

Company product type
  Data-dominant software: 1. Informal reviews, walkthroughs. 2. Inspections, Static analysis tools.
  Control-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  System software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Computation-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

Company using software life cycle model
  Waterfall: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Incremental: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Spiral: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Agile: 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.
  Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

Does the company size, software product type or software life cycle model influence the perceived internal code quality of static code analysis techniques? How do these views compare to each other and to the global view?


Table 7.13 shows that the company size, software product type and software life cycle model influence the perceived internal code quality of different static code analysis techniques. This is because, for each view, not all the categories representing it rank the techniques in the same order.

The different views differ from each other and from the global view as well. There are no two views whose categories all rank the techniques in the same order. To see how categories in one view are similar to or different from the categories in the other views, please see table 7.14.

To help aggregate the findings summarized in table 7.13 and give recommendations to practitioners, table 7.14 groups the categories which rank the techniques in the same order across different views. This will help practitioners see the preference on perceived internal code quality suitable for their own context in terms of company size, software product type and software life cycle model.

The left column contains the categories of the different views and the right column contains the corresponding rankings. A practitioner can identify the context that applies to them in terms of company size, software product type and software life cycle model and then read off the corresponding preference on internal code quality.

Table 7.14 Similarities & differences between the categories representing the different views on internal code quality.

Global view: 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

Company size
  Less than 50 employees: 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.
  50 – 249 employees: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  250 – 4,499 employees: 1. Informal reviews, walkthroughs. 2. Inspections, Static analysis tools.
  More than 4,500 employees: 1. Static analysis tools, Informal reviews, walkthroughs. 2. Inspections.

Company product type
  Data-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Control-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  System software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Computation-dominant software: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.

Company using software life cycle model
  Waterfall: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Incremental: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Spiral: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Agile: 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.
  Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Static analysis tools, Informal reviews, walkthroughs, Inspections.


7.2.4.6.2 Survey findings Vs. SLR findings on internal code quality – Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including internal code quality. As seen in section 7.1.3, there is no evidence reported by category A studies on the perceived internal code quality of static analysis tools, informal reviews and walkthroughs; for inspections, category A studies show a positive but weak finding. As such:

- The SLR findings on internal code quality cannot complement or be compared to the survey findings on internal code quality to reach more solid conclusions on the perceived internal code quality of different static code analysis techniques.
- There is no solid evidence to check whether industrial professionals' perception of internal code quality matches the category A findings on internal code quality. This means we cannot see whether industrial professionals seeking techniques that improve internal code quality select static code analysis techniques based on the internal code quality findings reported by category A studies (high rigor and high relevance studies), and we also cannot see whether rigor and relevance influence industrial professionals' opinion when adopting such techniques.

7.2.4.6.3 Implications if you are seeking to improve internal code quality

If you are an industrial practitioner seeking techniques with high internal code quality, we conclude the following:

- We cannot consider the SLR findings on internal code quality to recommend which technique to use. Only the survey findings will be considered.

The survey findings in tables 7.13 and 7.14 tell you the following:

- The company size, software product type and life cycle model influence the perceived internal code quality of different static code analysis techniques, and they differ from the global view on perceived internal code quality. Thus, from the company size, software product type and software life cycle model views, we cannot recommend a single technique to use if you want to significantly improve internal code quality.
- Please refer to table 7.14 to see the preference of static code analysis techniques with respect to internal code quality: find the context that applies to you in terms of company size, software product type and software life cycle model and read the corresponding preference on internal code quality.

7.2.4.7 Conclusions on product quality

7.2.4.7.1 Summary of survey findings on product quality – Answering RQ 6.1

Table 7.15 summarizes survey conclusions on the perceived product quality of static code analysis techniques drawn in section 6.2.3.7. The left column contains the categories representing different company sizes, software product types and life cycle models, the right column contains the ranking of different static code analysis techniques starting with the technique that achieves the highest product quality. As explained in section 6.2.3.7 the difference in perceived product quality between the ranks is significant.

Table 7.15 Summary of survey findings on the perceived product quality of different static code analysis techniques


Global view: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.

Company size
  Less than 50 employees: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  50 – 249 employees: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  250 – 4,499 employees: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  More than 4,500 employees: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.

Company product type
  Data-dominant software: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Control-dominant software: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  System software: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Computation-dominant software: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.

Company using software life cycle model
  Waterfall: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Incremental: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Spiral: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Agile: 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Hybrid process (dominated by agile practices, with few plan-driven practices): 1. Inspections, informal reviews, walkthroughs, Static analysis tools.
  Hybrid process (dominated by plan-driven practices, with few agile practices): 1. Inspections, informal reviews, walkthroughs, Static analysis tools.

Does the company size, software product type or software life cycle model influence the perceived product quality of static code analysis techniques? How do these views compare to each other and to the global view?

Table 7.15 shows that the company size, software product type and software life cycle model do not influence the perceived product quality of static code analysis techniques. This is because all the categories representing the different company sizes, product types and software life cycle models rank the techniques in the same order, and they match the ranking order in the global view as well.

7.2.4.7.2 Survey findings Vs. SLR findings on product quality – Answering RQ 6.2

Section 7.1.3 summarized the SLR findings on all the benefits and limitations, including perceived product quality. Category A studies showed no evidence on the perceived product quality of informal reviews, walkthroughs and static analysis tools; for inspections, category A studies showed a positive finding, but the evidence is weak.

As we can see in section 7.1.3, there is no solid evidence on perceived product quality reported in the SLR by category A studies; rather, the evidence reported is weak and the number of category A studies evaluating perceived product quality is low. As such:


- The SLR findings on perceived product quality cannot complement or be compared to the survey findings on perceived product quality to reach more solid conclusions on the perceived product quality of static code analysis techniques.
- There is no solid evidence to check whether industrial professionals' perception of product quality matches the category A findings on product quality. This means we cannot see whether industrial professionals seeking higher product quality select static code analysis techniques based on the product quality findings reported by category A studies (high rigor and high relevance studies), and we also cannot see whether rigor and relevance influence industrial professionals' opinion when selecting techniques.

7.2.4.7.3 Implications if you are seeking to improve product quality

If you are an industrial practitioner seeking to improve product quality, we conclude the following:

- We cannot consider the SLR findings on perceived product quality to recommend which technique to use. Only the survey findings will be considered.
- The survey findings in table 7.15 tell you that company size, software product type and life cycle model do not influence perceived product quality, and they match the global view on perceived product quality. Thus, from the global, company size, software product type and software life cycle model views, we conclude that you can use any of the static code analysis techniques and still improve product quality significantly.


8 CONCLUSIONS

The aim of this study is to contribute towards bridging the gap between static code analysis research and industry. This aim is broken into two main objectives: first, evaluating static code analysis research and improving its ability to impact industry; second, facilitating decision making for industrial practitioners when adopting static code analysis techniques, to improve the ability of industry to impact and orient research.

The contribution of this study is twofold. First, it provided guidelines for researchers. These guidelines 1) identified which static code analysis techniques have been fairly evaluated, 2) identified the reported benefits/limitations of static code analysis techniques and which variables/measuring criteria were used to report them, 3) scored the rigor and industrial relevance of existing static code analysis research and traced the reasons behind the scores, and 4) provided recommendations for researchers outlining how to conduct high quality, industry oriented research. Four research questions were formulated to address and develop the researchers' guidelines and achieve the first study objective: RQ1, RQ2, RQ3 and RQ4. They are answered in the light of a comprehensive systematic literature review and are presented fully in section 7.1.

Second, this study provided guidelines for practitioners. These guidelines 1) investigated the adoption of static code analysis techniques in industry and what that adoption depends on (e.g. company size, software product type, life cycle model), and investigated the gap (attention in research vs. usage in industry) between static code analysis research and industry, and 2) identified benefits/limitations of static code analysis techniques from an industrial perspective, compared them to the benefits and limitations reported by researchers, and made these facts available for practitioners, facilitating their decision making when deciding which technique to adopt. Two research questions, RQ5 and RQ6, were formulated to develop the practitioners' guidelines, which, together with input from RQ1, RQ3 and RQ4, achieved the second study objective. RQ5 and RQ6 were answered fully in the light of the online questionnaire-based survey.

RQ1: What is the state of research in static code analysis?

The SLR concluded that some static code analysis techniques have received significantly more attention than others: inspections and static analysis tools received fair attention, while informal reviews and walkthroughs were almost neglected in comparison. Inspections received the most attention in research because they appeared early and are the only static code analysis technique with a formal, well-defined process and process elements; consequently, there are also studies evaluating the inspection process and its process elements. The other techniques were evaluated only as a whole, because they do not have such a well-structured and defined process and process elements. This does not necessarily mean that inspection is a better technique than the others simply because of its defined process, but it tells us that when a technique has a well-defined process and process elements there is a good chance that researchers will evaluate that process, and the technique will end up receiving more attention and more research scrutiny. With respect to the variables used by researchers to report benefits/limitations of static code analysis techniques, seven variables were found to be the most frequently used. We found that experiments are widely used as a research method in static code analysis research, but the outcome variables in the majority of the experiments are inconsistent; in some experiments the variables were consistent, but their measuring criteria are not consistent enough to generalize their findings.

RQ2: What is the state of rigor and relevance of static code analysis research? And why?

The SLR concluded that only 15% of the studies have a high rigor and a high relevance score, making their results highly capable of influencing industry. The authors further traced the root causes contributing to degraded rigor and relevance by tracing the aspects that define rigor/relevance as described in study [74]. The use of experiments as the research method in approximately 50% of the studies was identified as the major factor limiting the industrial relevance of static code analysis research, because most of them use students as subjects and are performed in an academic rather than an industrial context (experiments and student subjects are given a relevance score of zero in the scoring rubrics of study [74]). On the other hand, the inadequate reporting of validity threats and their mitigation strategies was identified as the major factor contributing to the poor rigor of static code analysis research.
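To make this scoring concrete, the sketch below outlines a simplified rigor/relevance scoring in the spirit of the rubric in [74]. The aspect names, weights and example values used here are simplifying assumptions for illustration only; the exact rubric is the one defined in [74].

    # Hypothetical, simplified scoring sketch; the exact aspects and scales
    # are defined in the rubric of study [74].

    def rigor_score(context_described, design_described, validity_discussed):
        # Each rigor aspect scored 0 (not described), 0.5 (partially) or 1 (fully).
        return context_described + design_described + validity_discussed

    def relevance_score(industrial_subjects, industrial_context,
                        realistic_scale, relevant_research_method):
        # Each relevance aspect scored 0 or 1; student subjects in an academic
        # setting and lab experiments contribute 0, as noted above.
        return (industrial_subjects + industrial_context +
                realistic_scale + relevant_research_method)

    # A typical student experiment: rigor can still be decent, but relevance is 0.
    print(rigor_score(1, 1, 0.5))       # 2.5 of 3
    print(relevance_score(0, 0, 0, 0))  # 0 of 4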

RQ3: What are the benefits and limitations of different static code analysis techniques reported in literature? RQ4: What is the strength of evidence (in terms of rigor and relevance) supporting the claimed benefits and limitations of static code analysis?

The SLR concluded that rigor and relevance could not strengthen the reported benefits/limitations, because only 15% of the studies have a high rigor/high relevance score and can act as solid empirical evidence for practitioners looking to adopt techniques proposed in research. Further, when we tried to aggregate the benefits/limitations of the techniques using the common variables identified by RQ1.3, the number of studies per variable was too low as well, making it even harder to draw solid conclusions. This is disappointing from a technology transfer point of view, as there is no solid evidence (in terms of rigor/relevance as well as the number of studies per variable) from the SLR to facilitate decision making for practitioners looking to adopt new techniques.

RQ5: Which static code analysis technologies are most frequently used in industry?

The survey concluded that static analysis tools are widely used in industry, followed by informal reviews, while inspections and walkthroughs are rarely used. This was found to be the case in the vast majority of the industrial contexts we investigated. The survey also concluded that the adoption of static code analysis techniques in industry is influenced by the software life cycle model used: companies using agile-related approaches adopt static code analysis techniques with a different preference. On the other hand, the survey revealed that the software product type and company size do not influence the adoption of static code analysis techniques; companies of different sizes and companies producing different software products have the same preference for using static code analysis techniques. The survey also revealed that the amount of attention a static code analysis technique has received in research does not necessarily influence its adoption in industry. We have seen that inspection is the technique most thoroughly evaluated by researchers, yet it is rarely used in industry, while informal reviews were rarely evaluated by researchers but are very frequently used in industry. This indicates a gap between static code analysis research and industrial practice, as the ideal situation would be that practitioners adopt an evidence-based approach to selecting which technique to use, relying on high quality research findings.

RQ6: What are the benefits and limitations of static code analysis techniques as perceived by industrial professionals?

The survey concluded that when looking at the benefits and limitations of static code analysis techniques, you need to look at seven recurrent variables frequently used to report them, and for each of these variables the company size, product type and life cycle model do influence the perception of benefits/limitations. For example, if we look at the effectiveness variable, companies of different sizes have different perceptions of the effectiveness of static code analysis techniques, and the same goes for companies producing different software products and using different life cycle models. There is no common perception; in other words, the perception of the benefits/limitations of different static code analysis techniques depends on the variables representing them as well as on the company size, product type and life cycle model used. Further, the benefits/limitations identified by the SLR could not complement the survey findings on benefits/limitations, because the rigor and relevance of most of the studies reporting them is weak and the number of category A studies per variable is low.


9 FUTURE WORK

For future work we recommend that researchers conduct similar studies on technology evaluations in other fields within software engineering, the same way we did for the static code analysis field in this study. The aim should be to bridge the gap between research and industry in the selected field. First, researchers should diagnose the state of research to see whether it is conducted with a high degree of rigor and whether it is industry oriented. Second, researchers should investigate the adoption in industry of the techniques proposed by researchers in the selected field and identify whether a gap exists between research and industry. If a gap is found to exist and research is found to lack rigor and relevance, researchers should find out the reason and then give recommendations on how to improve rigor and relevance in future research. If research is found to have high rigor and high relevance, researchers can use the evidence in research and present it to industry; for example, researchers can look at benefits and limitations.

To improve the state of research, more empirical evaluations are needed. There are very few studies evaluating informal reviews and walkthroughs. Informal reviews are very frequently used in industry, thus we recommend that research effort be redirected toward evaluating them to substantiate their use in industry. Adequate evaluation of walkthroughs could lead to their adoption in industry as well.


10 REFERENCES

[1] D. Kelly and T. Shepard, “Task-directed software inspection,” Journal of Systems and Software, vol. 73, no. 2, pp. 361–368, Oct. 2004.

[2] Z. Abdelnabi, G. Cantone, M. Ciolkowski, and D. Rombach, “Comparing code reading techniques applied to object-oriented software frameworks with regard to effectiveness and defect detection rate,” in 2004 International Symposium on Empirical Software Engineering, 2004. ISESE ’04. Proceedings, 2004, pp. 239–248.

[3] F. Wedyan, D. Alrmuny, and J. M. Bieman, “The Effectiveness of Automated Static Analysis Tools for Fault Detection and Refactoring Prediction,” in International Conference on Software Testing Verification and Validation, 2009. ICST ’09, 2009, pp. 141–150.

[4] M. Höst and C. Johansson, “Evaluation of code review methods through interviews and experimentation,” Journal of Systems and Software, vol. 52, no. 2–3, pp. 113–120, Jun. 2000.

[5] A. A. Porter, H. P. Siy, C. A. Toman, and L. G. Votta, “An Experiment to Assess the Cost-Benefits of Code Inspections in Large Scale Software Development,” IEEE Transactions on Software Engineering, vol. 23, no. 6, pp. 329–346, 1997.

[6] O. Laitenberger, K. El Emam, and T. G. Harbich, “An internally replicated quasi-experimental comparison of checklist and perspective based reading of code documents,” IEEE Transactions on Software Engineering, vol. 27, no. 5, pp. 387–421, 2001.

[7] J. W. Wilkerson, J. Nunamaker, J.F., and R. Mercer, “Comparing the Defect Reduction Benefits of Code Inspection and Test-Driven Development,” IEEE Transactions on Software Engineering, vol. 38, no. 3, pp. 547– 560, 2012.

[8] N. Nagappan and T. Ball, “Static Analysis Tools As Early Indicators of Pre-release Defect Density,” in Proceedings of the 27th International Conference on Software Engineering, New York, NY, USA, 2005, pp. 580– 586.

[9] A. Dunsmore, M. Roper, and M. Wood, “Object-oriented inspection in the face of delocalisation,” in Proceedings of the 2000 International Conference on Software Engineering, 2000, 2000, pp. 467–476.

[10] A. Dunsmore, M. Roper, and M. Wood, “The Development and Evaluation of Three Diverse Techniques for Object-Oriented Code Inspection,” IEEE Transactions of Software Engineering, vol. 29, no. 8, pp. 677–686, Aug. 2003.

[11] P. C. Rigby, D. M. German, and M.-A. Storey, “Open Source Practices: A Case Study of the Apache Server,” in Proceedings of the 30th International Conference on Software Engineering, New York, NY, USA, 2008, pp. 541–550.

[12] C. B. Seaman and V. R. Basili, “An Empirical Study of Communication in Code Inspections,” in Proceedings of the 19th International Conference on Software Engineering, New York, NY, USA, 1997, pp. 96–106.

[13] S. Liu, Y. Chen, F. Nagoya, and J. A. McDermid, “Formal Specification-Based Inspection for Verification of Programs,” IEEE Transactions on Software Engineering, vol. 38, no. 5, pp. 1100–1122, 2012.

[14] J. H. Hayes, I. R. Chemannoor, and E. A. Holbrook, “Improved code defect detection with fault links,” Software Testing, Verification and Reliability, vol. 21, no. 4, pp. 299–325, 2011.

[15] D. E. Perry, A. Porter, M. W. Wade, L. G. Votta, and J. Perpich, “Reducing Inspection Interval in Large- scale Software Development,” IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 695–705, Jul. 2002.

[16] B. Johnson, Y. Song, E. Murphy-Hill, and R. Bowdidge, “Why Don’t Software Developers Use Static Analysis Tools to Find Bugs?,” in Proceedings of the 2013 International Conference on Software Engineering, Piscataway, NJ, USA, 2013, pp. 672–681.


[17] S. S. So, S. D. Cha, T. J. Shimeall, and Y. R. Kwon, “An empirical evaluation of six methods to detect faults in software,” Software Testing, Verification and Reliability, vol. 12, no. 3, pp. 155–171, 2002.

[18] A. Dunsmore, M. Roper, and M. Wood, “Further investigations into the development and evaluation of reading techniques for object-oriented code inspection,” in Proceedings of the 24th International Conference on Software Engineering, 2002. ICSE 2002, 2002, pp. 47–57.

[19] P. Jalote and M. Haragopal, “Overcoming the NAH syndrome for inspection deployment,” in Proceedings of the 1998 International Conference on Software Engineering, 1998, 1998, pp. 371–378.

[20] M. A. Wojcicki and P. Strooper, “Maximising the information gained from a study of static analysis technologies for concurrent software,” Empirical Software Engineering, vol. 12, no. 6, pp. 617–645, Dec. 2007.

[21] H. M. Kienle, J. Kraft, and T. Nolte, “System-specific static code analyses: a case study in the complex embedded systems domain,” Software Quality Journal, vol. 20, no. 2, pp. 337–367, Jun. 2012.

[22] S. Wagner, F. Deissenboeck, M. Aichner, J. Wimmer, and M. Schwalb, “An Evaluation of Two Bug Pattern Tools for Java,” in 2008 1st International Conference on Software Testing, Verification, and Validation, 2008, pp. 248–257.

[23] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J. P. Hudepohl, and M. A. Vouk, “On the value of static analysis for fault detection in software,” IEEE Transactions on Software Engineering, vol. 32, no. 4, pp. 240– 253, 2006.

[24] B. Chimdyalwar, “Survey of Array out of Bound Access Checkers for C Code,” in Proceedings of the 5th India Software Engineering Conference, New York, NY, USA, 2012, pp. 45–48.

[25] F. MacDonald and J. Miller, “A Comparison of Tool-Based and Paper-Based Software Inspection,” Empirical Software Engineering, vol. 3, no. 3, pp. 233–253, Sep. 1998.

[26] A. A. Porter, H. P. Siy, and L. G. Votta,Jr., “Understanding the Effects of Developer Activities on Inspection Interval,” in Proceedings of the 19th International Conference on Software Engineering, New York, NY, USA, 1997, pp. 128–138.

[27] X. Li, “A Comparison-based Approach for Software Inspection,” in Proceedings of the 1995 Conference of the Centre for Advanced Studies on Collaborative Research, Toronto, Ontario, Canada, 1995, p. 41.

[28] P. G. Koneri, G.-J. de Vreede, D. L. Dean, A. L. Fruhling, and P. Wolcott, “The Design and Field Evaluation of a Repeatable Collaborative Software Code Inspection Process,” in Groupware: Design, Implementation, and Use, H. Fukś, S. Lukosch, and A. C. Salgado, Eds. Springer Berlin Heidelberg, 2005, pp. 325–340.

[29] M. Hirayama, K. Ohno, N. Kawai, K. Tamaru, and H. Monden, “An Effective Source Code Review Process for Embedded Software,” in Product-Focused Software Process Improvement, J. Münch and M. Vierimaa, Eds. Springer Berlin Heidelberg, 2006, pp. 47–60.

[30] F. Nagoya, Y. Chen, and S. Liu, “An Empirical Study on a Specification-Based Program Review Approach,” in International Conference on Dependability of Computer Systems, 2006. DepCos-RELCOMEX ’06, 2006, pp. 199–206.

[31] A. Vetro, M. Morisio, and M. Torchiano, “An empirical validation of FindBugs issues related to defects,” in 15th Annual Conference on Evaluation Assessment in Software Engineering (EASE 2011), 2011, pp. 144–153.

[32] O. Laitenberger and J.-M. DeBaud, “Perspective-based reading of code documents at Robert Bosch GmbH,” Information and Software Technology, vol. 39, no. 11, pp. 781–791, 1997.

[33] P. Runeson and C. Wohlin, “An Experimental Evaluation of an Experience-Based Capture-Recapture Method in Software Code Inspections,” Empirical Software Engineering, vol. 3, no. 4, pp. 381–406, Dec. 1998.


[34] A. Fehnker and R. Huuck, “Model checking driven static analysis for the real world: designing and tuning large scale bug detection,” Innovations in Systems and Software Engineering, vol. 9, no. 1, pp. 45–56, Mar. 2013.

[35] A. Harel and E. Kantorowitz, “Estimating the number of faults remaining in software code documents inspected with iterative code reviews,” in IEEE International Conference on Software - Science, Technology and Engineering, 2005. Proceedings, 2005, pp. 151–160.

[36] S. Wagner, J. Jürjens, C. Koller, and P. Trischberger, “Comparing Bug Finding Tools with Reviews and Tests,” in Testing of Communicating Systems, F. Khendek and R. Dssouli, Eds. Springer Berlin Heidelberg, 2005, pp. 40–55.

[37] T. G. Grbac, Z. Car, and D. Huljenić, “Quantifying value of adding inspection effort early in the development process: A case study,” IET Software, vol. 6, no. 3, pp. 249–259, 2012.

[38] N. Manzoor, H. Munir, and M. Moayyed, “Comparison of Static Analysis Tools for Finding Concurrency Bugs,” in 2012 IEEE 23rd International Symposium on Software Reliability Engineering Workshops (ISSREW), 2012, pp. 129–133.

[39] H. Siy and L. Votta, “Does the modern code inspection have value?,” in IEEE International Conference on , 2001. Proceedings, 2001, pp. 281–289.

[40] F. Belli and R. Crisan, “Empirical performance analysis of computer-supported code-reviews,” in The Eighth International Symposium on Software Reliability Engineering, 1997. Proceedings, 1997, pp. 245–255.

[41] A. De Lucia, F. Fasano, G. Scanniello, and G. Tortora, “Evaluating distributed inspection through controlled experiments,” IET Software, vol. 3, no. 5, pp. 381–394, 2009.

[42] D. Baca, B. Carlsson, and L. Lundberg, “Evaluating the Cost Reduction of Static Code Analysis for Software Security,” in Proceedings of the Third ACM SIGPLAN Workshop on Programming Languages and Analysis for Security, New York, NY, USA, 2008, pp. 79–88.

[43] J. C. Knight and E. A. Myers, “An Improved Inspection Technique,” Communications of the ACM, vol. 36, no. 11, pp. 51–61, Nov. 1993.

[44] D. Baca, K. Petersen, B. Carlsson, and L. Lundberg, “Static Code Analysis to Detect Software Security Vulnerabilities - Does Experience Matter?,” in International Conference on Availability, Reliability and Security, 2009. ARES ’09, 2009, pp. 804–810.

[45] D. A. McMeekin, B. R. von Konsky, M. Robey, and D. J. A. Cooper, “The Significance of Participant Experience when Evaluating Software Inspection Techniques,” in Software Engineering Conference, 2009. ASWEC ’09. Australian, 2009, pp. 200–209.

[46] D. Kester, M. Mwebesa, and J. S. Bradbury, “How Good is Static Analysis at Finding Concurrency Bugs?,” in 2010 10th IEEE Working Conference on Source Code Analysis and Manipulation (SCAM), 2010, pp. 115–124.

[47] L. P. W. Land, C. Sauer, and R. Jeffery, “The Use of Procedural Roles in Code Inspections: An Experimental Study,” Empirical Software Engineering, vol. 5, no. 1, pp. 11–34, Mar. 2000.

[48] D. Baca, B. Carlsson, K. Petersen, and L. Lundberg, “Improving software security with static automated code analysis in an industry setting,” Software: Practice and Experience, vol. 43, no. 3, pp. 259–279, 2013.

[49] T. Stålhane and T. H. Awan, “Improving the Software Inspection Process,” in Software Process Improvement, I. Richardson, P. Abrahamsson, and R. Messnarz, Eds. Springer Berlin Heidelberg, 2005, pp. 163– 174.

[50] L. P. W. Land, C. Sauer, and R. Jeffery, “Validating the Defect Detection Performance Advantage of Group Designs for Software Reviews: Report of a Laboratory Experiment Using Program Code,” in Proceedings of the 6th European SOFTWARE ENGINEERING Conference Held Jointly with the 5th ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, NY, USA, 1997, pp. 294–309.


[51] S. Nelson and J. Schumann, “What makes a code review trustworthy?,” in Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004, 2004, p. 10.

[52] G. W. Russell, “Experience with inspection in ultralarge-scale development,” IEEE Software, vol. 8, no. 1, pp. 25–31, 1991.

[53] P. Emanuelsson and U. Nilsson, “A Comparative Study of Industrial Static Analysis Tools,” Electronic Notes in Theoretical Computer Science, vol. 217, pp. 5–21, Jul. 2008.

[54] A. Austin, C. Holmgreen, and L. Williams, “A comparison of the efficiency and effectiveness of vulnerability discovery techniques,” Information and Software Technology, vol. 55, no. 7, pp. 1279–1288, Jul. 2013.

[55] K. El Emam and O. Laitenberger, “Evaluating capture-recapture models with two inspectors,” IEEE Transactions on Software Engineering, vol. 27, no. 9, pp. 851–864, Sep. 2001.

[56] A. Marchenko and P. Abrahamsson, “Predicting Software Defect Density: A Case Study on Automated Static Code Analysis,” in Agile Processes in Software Engineering and , G. Concas, E. Damiani, M. Scotto, and G. Succi, Eds. Springer Berlin Heidelberg, 2007, pp. 137–140.

[57] T. Berling and T. Thelin, “An industrial case study of the verification and validation activities,” in Software Metrics Symposium, 2003. Proceedings. Ninth International, 2003, pp. 226–238.

[58] F. Lanubile and T. Mallardo, “Inspecting Automated Test Code: A Preliminary Study,” in Proceedings of the 8th International Conference on Agile Processes in Software Engineering and Extreme Programming, Berlin, Heidelberg, 2007, pp. 115–122.

[59] M. Nadeem, B. J. Williams, and E. B. Allen, “High False Positive Detection of Security Vulnerabilities: A Case Study,” in Proceedings of the 50th Annual Southeast Regional Conference, New York, NY, USA, 2012, pp. 359–360.

[60] O. Laitenberger, “Studying the effects of code inspection and structural testing on software quality,” in The Ninth International Symposium on Software Reliability Engineering, 1998. Proceedings, 1998, pp. 237–246.

[61] B. Hémeury, “Report on the VERA Experiment,” in Reliable Software Technologies — Ada-Europe’ 99, M. G. Harbour and J. A. de la Puente, Eds. Springer Berlin Heidelberg, 1999, pp. 103–113.

[62] G. F. Gattis and T. J. Cheatham, “Testing Object-oriented Software,” in Proceedings of the 33rd Annual Southeast Regional Conference, New York, NY, USA, 1995, pp. 285–286.

[63] C. Denger and R. Kolb, “Testing and Inspecting Reusable Product Line Components: First Empirical Results,” in Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, New York, NY, USA, 2006, pp. 184–193.

[64] T. Nakamura, L. Hochstein, and V. R. Basili, “Identifying Domain-specific Defect Classes Using Inspections and Change History,” in Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, New York, NY, USA, 2006, pp. 346–355.

[65] M. V. Mantyla and C. Lassenius, “What Types of Defects Are Really Discovered in Code Reviews?,” IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 430–448, 2009.

[66] L. G. Votta Jr., “Does Every Inspection Need a Meeting?,” in Proceedings of the 1st ACM SIGSOFT Symposium on Foundations of Software Engineering, New York, NY, USA, 1993, pp. 107–114.

[67] B. Carlsson and D. Baca, “Software security analysis – execution phase audit,” in 31st EUROMICRO Conference on Software Engineering and Advanced Applications, 2005, 2005, pp. 240–247.

[68] L. C. Briand, B. Freimut, and F. Vollei, “Using multiple adaptive regression splines to support decision making in code inspections,” Journal of Systems and Software, vol. 73, no. 2, pp. 205–217, Oct. 2004.


[69] E. F. Weller, “Lessons from three years of inspection data (software development),” IEEE Software, vol. 10, no. 5, pp. 38–45, 1993.

[70] T. L. Rodgers and D. L. Dean, “Process maturity and inspector proficiency: feedback mechanisms for software inspections,” in Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences, 1999. HICSS-32, 1999, vol. Track3, p. 13.

[71] D. I. K. Sjoberg, B. Anda, E. Arisholm, T. Dyba, M. Jorgensen, A. Karahasanovic, E. F. Koren, and M. Vokac, “Conducting realistic experiments in software engineering,” in Empirical Software Engineering, 2002. Proceedings. 2002 International Symposium on, 2002, pp. 17–26.

[72] M. Ivarsson and T. Gorschek, “Technology transfer decision support in requirements engineering research: a systematic review of REj,” Requirements Eng, vol. 14, no. 3, pp. 155–175, Mar. 2009.

[73] M. V. Zelkowitz, D. R. Wallace, and D. W. Binkley, “Culture Conflicts in Software Engineering Technology Transfer,” in NASA Goddard Software Engineering Workshop, 1998.

[74] M. Ivarsson and T. Gorschek, “A method for evaluating rigor and industrial relevance of technology evaluations,” Empirical Software Engineering, vol. 16, no. 3, pp. 365–395, Oct. 2010.

[75] T. Gorschek, P. Garre, S. Larsson, and C. Wohlin, “A Model for Technology Transfer in Practice,” IEEE Software, vol. 23, no. 6, pp. 88–95, 2006.

[76] M. Zelkowitz, “An update to experimental models for validating computer technology,” Journal of Systems and Software, vol. 82, no. 3, pp. 373–376, 2009.

[77] D. I. K. Sjoeberg, J. E. Hannay, O. Hansen, V. B. Kampenes, A. Karahasanovic, N.-K. Liborg, and A. C. Rekdal, “A survey of controlled experiments in software engineering,” IEEE Transactions on Software Engineering, vol. 31, no. 9, pp. 733–753, Sep. 2005.

[78] A. Höfer and W. F. Tichy, “Status of Empirical Research in Software Engineering,” in Empirical Software Engineering Issues. Critical Assessment and Future Directions, V. R. Basili, D. Rombach, K. Schneider, B. Kitchenham, D. Pfahl, and R. W. Selby, Eds. Springer Berlin Heidelberg, 2007, pp. 10–19.

[79] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén, Experimentation in Software Engineering, 2012 edition. New York: Springer, 2012.

[80] S. Mahdavi-Hezavehi, M. Galster, and P. Avgeriou, “Variability in quality attributes of service-based software systems: A systematic literature review,” Information and Software Technology, vol. 55, no. 2, pp. 320– 343, Feb. 2013.

[81] S. Schneider, R. Torkar, and T. Gorschek, “Solutions in global software engineering: A systematic literature review,” International Journal of Information Management, vol. 33, no. 1, pp. 119–132, Feb. 2013.

[82] F. Elberzhager, A. Rosbach, J. Münch, and R. Eschbach, “Reducing test effort: A systematic mapping study on existing approaches,” Information and Software Technology, vol. 54, no. 10, pp. 1092–1106, Oct. 2012.

[83] D. Graham, E. V. Veenendaal, I. Evans, and R. Black, Foundations of Software Testing: ISTQB Certification, Revised edition. Australia: Cengage Learning EMEA, 2008.

[84] O. Laitenberger and J.-M. DeBaud, “Perspective-based reading of code documents at Robert Bosch GmbH,” Information and Software Technology, vol. 39, no. 11, pp. 781–791, 1997.

[85] S. Heckman and L. Williams, “A systematic literature review of actionable alert identification techniques for automated static code analysis,” Information and Software Technology, vol. 53, no. 4, pp. 363–387, Apr. 2011.

[86] J. W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition, 4th edition. Thousand Oaks: SAGE Publications, Inc, 2013.


[87] B. Kitchenham and S. Charters, “Guidelines for performing Systematic Literature Reviews in Software Engineering,” Keele University and Durham University Joint Report, UK, EBSE 2007-001, 2007.

[88] S. Jalali and C. Wohlin, “Systematic literature studies: Database searches vs. backward snowballing,” in 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2012, pp. 29–38.

[89] D. S. Cruzes and T. Dybå, “Synthesizing Evidence in Software Engineering Research,” in Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, New York, NY, USA, 2010, pp. 1:1–1:10.

[90] K. Petersen and N. B. Ali, “Identifying Strategies for Study Selection in Systematic Reviews and Maps,” in 2011 International Symposium on Empirical Software Engineering and Measurement (ESEM), 2011, pp. 351– 354.

[91] T. C. Lethbridge, S. E. Sim, and J. Singer, “Studying Software Engineers: Data Collection Techniques for Software Field Studies,” Empirical Software Engineering, vol. 10, no. 3, pp. 311–341, Jul. 2005.

[92] B. A. Kitchenham and S. L. Pfleeger, “Principles of Survey Research Part 2: Designing a Survey,” SIGSOFT Software Engineering Notes, vol. 27, no. 1, pp. 18–20, Jan. 2002.

[93] H. Zhang and M. A. Babar, “An Empirical Investigation of Systematic Reviews in Software Engineering,” in 2011 International Symposium on Empirical Software Engineering and Measurement (ESEM), 2011, pp. 87– 96.

[94] A. Aurum, H. Petersson, and C. Wohlin, “State-of-the-art: software inspections after 25 years,” Software Testing, Verification and Reliability, vol. 12, no. 3, pp. 133–154, 2002.

[95] M. Ciolkowski, “What Do We Know About Perspective-based Reading? An Approach for Quantitative Aggregation in Software Engineering,” in Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, Washington, DC, USA, 2009, pp. 133–144.

[96] V. D’silva, D. Kroening, and G. Weissenbacher, “A Survey of Automated Techniques for Formal Software Verification,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 7, pp. 1165–1178, 2008.

[97] O. Laitenberger, S. Vegas, and M. Ciolkowski, “The State of the Practice of Review and Inspection Technologies in Germany,” Technical Report ViSEK/010/E, ViSEK, 2002.

[98] H. Zhang, M. A. Babar, and P. Tell, “Identifying relevant studies in software engineering,” Information and Software Technology, vol. 53, no. 6, pp. 625–637, Jun. 2011.

[99] J. E. Hannay, T. Dybå, E. Arisholm, and D. I. K. Sjøberg, “The effectiveness of pair programming: A meta-analysis,” Information and Software Technology, vol. 51, no. 7, pp. 1110–1122, Jul. 2009.

[100] T. Dybå, T. Dingsøyr, and G. K. Hanssen, “Applying systematic reviews to diverse study types: An experience report,” in 1st International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), Madrid, Spain, 2007, pp. 225–234.

[101] J. Cohen, “Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit,” Psychological Bulletin, vol. 70, no. 4, pp. 213–220, Oct. 1968.

[102] V. Braun and V. Clarke, “Using thematic analysis in psychology,” Qualitative Research in Psychology, vol. 3, no. 2, pp. 77–101, Jan. 2006.

[103] M. Rodgers, A. Sowden, M. Petticrew, L. Arai, H. Roberts, N. Britten, and J. Popay, “Testing Methodological Guidance on the Conduct of Narrative Synthesis in Systematic Reviews: Effectiveness of Interventions to Promote Smoke Alarm Ownership and Function,” Evaluation, vol. 15, no. 1, pp. 49–73, Jan. 2009.


[104] K. Petersen and C. Wohlin, “Context in industrial software engineering research,” in 3rd International Symposium on Empirical Software Engineering and Measurement, 2009. ESEM 2009, 2009, pp. 401–404.

[105] B. A. Kitchenham, S. L. Pfleeger, L. M. Pickard, P. W. Jones, D. C. Hoaglin, K. El Emam, and J. Rosenberg, “Preliminary guidelines for empirical research in software engineering,” IEEE Transactions on Software Engineering, vol. 28, no. 8, pp. 721–734, 2002.

[106] K. Petersen and C. Gencel, “Worldviews, Research Methods, and their Relationship to Validity in Empirical Software Engineering Research,” in 2013 Joint Conference of the 23rd International Workshop on Software Measurement and the 2013 Eighth International Conference on Software Process and Product Measurement (IWSM-MENSURA), 2013, pp. 81–89.

[107] C. Wohlin, M. Höst, and K. Henningsson, “Empirical Research Methods in Software Engineering,” in Empirical Methods and Studies in Software Engineering, R. Conradi and A. I. Wang, Eds. Springer Berlin Heidelberg, 2003, pp. 7–23.

[108] J. Fox, C. Murray, and A. Warm, “Conducting research using web-based questionnaires: Practical, methodological, and ethical considerations,” International Journal of Social Research Methodology, vol. 6, no. 2, pp. 167–180, Jan. 2003.

[109] J. H. Gray, I. L. Densten, and J. C. Sarros, “Size Matters: Organisational Culture in Small, Medium, and Large Australian Organisations,” Journal of Small Business & Entrepreneurship, vol. 17, no. 1, pp. 31–46, Sep. 2003.

[110] A. Forward and T. C. Lethbridge, “A Taxonomy of Software Types to Facilitate Search and Evidence-based Software Engineering,” in Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, New York, NY, USA, 2008, pp. 14:179–14:191.

[111] H. Saiedian and R. Dale, “Requirements engineering: making the connection between the software developer and customer,” Information and Software Technology, vol. 42, no. 6, pp. 419–428, Apr. 2000.

[112] M. E. Fagan, “Design and Code Inspections to Reduce Errors in Program Development,” IBM Systems Journal, vol. 15, no. 3, pp. 182–211, Sep. 1976.

[113] M. Ciolkowski, O. Laitenberger, and S. Biffl, “Software Reviews: The State of the Practice,” IEEE Software, vol. 20, no. 6, pp. 46–51, Nov. 2003.

[114] M. Ciolkowski, F. Shull, and S. Biffl, “A Family of Experiments to Investigate the Influence of Context on the Effect of Inspection Techniques”, Empirical Assessment of Software Engineering (EASE), Keele, UK, 2002.

[115] T. Punter, M. Ciolkowski, B. Freimut, and I. John, “Conducting on-line surveys in software engineering,” in 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings, 2003, pp. 80–88.

[116] A. Field, Discovering Statistics Using IBM SPSS Statistics, 4th edition. Los Angeles: SAGE Publications Ltd, 2013.

[117] “IEEE Standard for Software Reviews and Audits,” IEEE STD 1028-2008, pp. 1–52, Aug. 2008.


11 APPENDIX

Due to their large size, the appendices are not included in this report. They are available online at the following link: https://drive.google.com/file/d/0BxCMqDKJt_1UVVpUX2hxYXVxXzA/view?usp=sharing
