SATE V Report: Ten Years of Static Analysis Tool Expositions
NIST Special Publication 500-326

SATE V Report: Ten Years of Static Analysis Tool Expositions

Aurelien Delaitre
Prometheus Computing LLC

Bertrand Stivalet
Paul E. Black
Vadim Okun
Athos Ribeiro
Terry S. Cohen
Information Technology Laboratory
Software and Systems Division

This publication is available free of charge from:
https://doi.org/10.6028/NIST.SP.500-326

October 2018

U.S. Department of Commerce
Wilbur L. Ross, Jr., Secretary

National Institute of Standards and Technology
Walter Copan, NIST Director and Under Secretary of Commerce for Standards and Technology

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.

National Institute of Standards and Technology Special Publication 500-326
Natl. Inst. Stand. Technol. Spec. Publ. 500-326, 180 pages (October 2018)
CODEN: NSPUE2

Abstract

Software assurance has been the focus of the National Institute of Standards and Technology (NIST) Software Assurance Metrics and Tool Evaluation (SAMATE) team for many years. The Static Analysis Tool Exposition (SATE) is one of the team's prominent projects to advance research in and adoption of static analysis, one of several software assurance methods. This report describes our approach and methodology. It then presents and discusses the results collected from the fifth edition of SATE.

Overall, the goal of SATE was not to rank static analysis tools, but rather to propose a methodology to assess tool effectiveness. Others can use this methodology to determine which tools fit their requirements. The results in this report are presented as examples and used as a basis for further discussion.

Our methodology relies on metrics, such as recall and precision, to determine tool effectiveness. To calculate these metrics, we designed test cases that exhibit certain characteristics. Most of the test cases were large pieces of software with cybersecurity implications. Fourteen participants ran their tools on these test cases and sent us a report of their findings. We analyzed these reports and calculated the metrics to assess the tools' effectiveness.

Although a few results remained inconclusive, many key elements could be inferred based on our methodology, test cases, and analysis. In particular, we were able to estimate the propensity of tools to find critical vulnerabilities in real software, the degree of noise they produced, and the types of weaknesses they were able to find. Some shortcomings in the methodology and test cases were also identified, and solutions were proposed for the next edition of SATE.

Key words

Security Weaknesses; Software Assurance; Static Analysis Tools; Vulnerability.
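For reference, the following is a minimal sketch of the standard definitions of the recall and precision metrics named in the abstract above, under the assumption that they match the report's usage (the report's own metric definitions appear in Sec. 1.5):

\[
  \text{precision} = \frac{TP}{TP + FP}, \qquad
  \text{recall} = \frac{TP}{TP + FN}
\]

where TP, FP, and FN denote true positives (correct tool warnings), false positives (incorrect warnings), and false negatives (weaknesses the tool missed), respectively.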
Caution on Interpreting and Using the SATE Data

SATE V, as well as its predecessors, taught us many valuable lessons. Most importantly, our analysis should NOT be used as a basis for rating or choosing tools; this was never the goal.

There is no single metric or set of metrics that is considered by the research community to indicate or quantify all aspects of tool performance. We caution readers not to apply unjustified metrics based on the SATE data.

Due to the nature and variety of security weaknesses, defining clear and comprehensive analysis criteria is difficult. While the analysis criteria have been much improved since the first SATE, further refinements are necessary.

The test data and analysis procedure employed have limitations and might not indicate how these tools perform in practice. The results may not generalize to other software because the choice of test cases, as well as the size of test cases, can greatly influence tool performance. Also, we analyzed only a small subset of tool warnings. The procedure that we used for finding CVE locations in the CVE-selected test cases and selecting related tool warnings has limitations, so the results may not indicate tools' actual abilities to find important security weaknesses.

Synthetic test cases are much smaller and less complex than production software. Weaknesses may not occur with the same frequency in production software. Additionally, for every synthetic test case with a weakness, there is one test case without a weakness, whereas, in practice, sites with weaknesses appear much less frequently than sites without weaknesses. Due to these limitations, tool results, including false positive rates, on synthetic test cases may differ from results on production software.

The tools were used differently in this exposition from their typical use. We analyzed tool warnings for correctness and looked for related warnings from other tools. Developers, on the other hand, use tools to determine what changes need to be made to software. Auditors look for evidence of assurance. Also, in practice, users write special rules, suppress false positives, and write code in certain ways to minimize tool warnings.

We did not consider the tools' user interfaces, integration with the development environment, and many other aspects of the tools, which are important for a user to efficiently and correctly understand a weakness report.

Teams ran their tools against the test sets in June through September 2013. The tools continue to progress rapidly, so some observations from the SATE data may already be out of date.

Because of the stated limitations, SATE should not be interpreted as a tool testing exercise. The results should not be used to make conclusions regarding which tools are best for a given application or the general benefit of using static analysis tools.

Table of Contents

1. Introduction
  1.1. Goals
  1.2. Scope
  1.3. Target Audience
  1.4. Terminology
  1.5. Metrics
  1.6. Types of Test Cases
  1.7. Related Work
  1.8. Evolution of SATE
2. Overall Procedure
  2.1. Changes Since SATE IV
    2.1.1. Confidentiality
    2.1.2. Environment
    2.1.3. Fairness
    2.1.4. Soundness
  2.2. Steps / Organization
  2.3. Participation
  2.4. Data Anonymization
3. Procedure and Results for Classic Tracks
  3.1. Production Software
    3.1.1. Test Sets
    3.1.2. Procedure
    3.1.3. Results