Using the Juliet Test Suite to Compare Static Security Scanners

Andreas Wagner (1) and Johannes Sametinger (2)
(1) GAM Project, IT Solutions, Schwertberg, Austria
(2) Dept. of Information Systems – Software Engineering, Johannes Kepler University Linz, Linz, Austria

Keywords: Juliet Test Suite, Security Scanner, Scanner Comparison, Static Analysis.

Abstract: Security issues constantly arise in different software products. Making software secure is a challenging endeavour. Static analysis of the source code can help eliminate various security bugs. The better a scanner is, the more bugs can be found and eliminated. The quality of security scanners can be determined by letting them scan code with known vulnerabilities. Thus, it is easy to see how many of these vulnerabilities they have (not) found. We have used the Juliet Test Suite to test various scanners. This test suite contains test cases with a set of security bugs that should be found by security scanners. We have automated the process of scanning the test suite and of comparing the generated results. With one exception, we have only used freely available source code scanners. These scanners were not primarily targeted at security, yielding disappointing results at first sight. We will report on the findings, on the barriers for automatic scanning and comparing, as well as on the detailed results.

1 INTRODUCTION

Software is ubiquitous these days. We constantly get in touch with software in different situations. For example, we use software for online banking; we use smartphones, we drive cars, etc. Security of software is crucial in many, if not most, situations. But news about security problems of software systems continues to appear in the media. So the question arises: how can security errors be avoided or at least minimized? Security is complex and difficult to achieve. It is commonly agreed that security has to be designed into software from the very start. Developers can follow Microsoft's secure software life-cycle (Howard, Lipner 2006) or adhere to the security touch points (McGraw 2009). Source code reviews are an important piece of the puzzle on the way to secure software. Source code scanners provide a means to automatically review source code and to detect problems in the code. These scanners typically have built-in, but mostly extensible, sets of errors to look for in the code. The better the scanner and its rule set, the better the results of its scan. We have used a test suite that contains source code with security weaknesses to make a point about the quality of such scanners. We have analyzed several scanners and compared their results with each other.

The paper is structured as follows: Section 2 gives an overview of the Juliet Test Suite. In Section 3, we introduce security scanners. The process model for the analysis and comparison is shown in Section 4. Section 5 contains the results of our study. Related work is discussed in Section 6.

2 JULIET TEST SUITE

The Juliet Test Suite was developed by the Center for Assured Software (CAS) of the US American National Security Agency (NSA) (Center for Assured Software 2011). Its test cases have been created in order to test scanners or other software. There are two parts of the test suite. One part covers security errors for the programming languages C and C++. The other one covers security errors for the language Java. Code examples with security vulnerabilities are given in simple form as well as embedded in variations of different control-flow and data-flow patterns. The suite contains around 57,000 test cases in C/C++ and around 24,000 test cases in Java (Boland, Black 2012). A test suite can only cover a subset of possible errors. The Juliet Test Suite covers the top 25 security errors defined by SANS/MITRE (MITRE 2011).


MITRE is a non-profit organization operating research and development centers funded by the US government. The SANS Institute is a cooperative research and education organization and a trusted source for computer security training, certification, and research (http://www.sans.org/). CWE is a community-developed dictionary of software weakness types (http://cwe.mitre.org/). These types have been used for the classification of the security errors in the Juliet Test Suite. Each CWE entry describes a class of security errors. For example, CWE-89 describes "Improper Neutralization of Special Elements used in an SQL Command (SQL Injection)". This happens to be the number one security error according to SANS/MITRE. Table 1 shows the first 10 of the top 25 security errors by SANS/MITRE (MITRE 2011).

Table 1: Top 10 Security Errors (MITRE 2011).

No  Score  ID       Name
1   93.8   CWE-89   Improper Neutralization of Special Elements used in an SQL Command (SQL Injection)
2   83.3   CWE-78   Improper Neutralization of Special Elements used in an OS Command (OS Command Injection)
3   79.0   CWE-120  Buffer Copy without Checking Size of Input (Classic Buffer Overflow)
4   77.7   CWE-79   Improper Neutralization of Input During Web Page Generation (Cross-site Scripting)
5   76.9   CWE-306  Missing Authentication for Critical Function
6   76.8   CWE-862  Missing Authorization
7   75.0   CWE-798  Use of Hardcoded Credentials
8   75.0   CWE-311  Missing Encryption of Sensitive Data
9   74.0   CWE-434  Unrestricted Upload of File with Dangerous Type
10  73.8   CWE-807  Reliance on Untrusted Inputs in a Security Decision

2.1 Structure of the Test Suite

The Juliet Test Suite contains source code files that are structured in different folders. Each folder covers one CWE entry. Therefore, there are several source code files in every folder that contain a collection of errors for the specific CWE entry. Every source code file targets one error. In most cases, this error is located in a function called "Bad-Function". But there are also cases where the error is contained in some helper functions called "Bad-Helper". Additionally, "Class-based" errors arise from class inheritance. Besides bad functions, there are also good functions and good helper functions. These functions contain nearly the same logic as the bad functions, but without the security errors. The good functions can be used to check the quality of security scanners: they should find the errors in the bad functions and their helpers, but not in the good functions (National Institute of Standards and Technology 2012). In version 1.1, the Juliet Test Suite covers 181 different kinds of flaws, including authentication and access control, buffer handling, code quality, control-flow management, encryption and randomness, error handling, file handling, information leaks, initialization and shutdown, injection, and pointer and reference handling (Boland, Black 2012).

2.2 Natural Code vs. Artificial Code

We can distinguish two types of source code, i.e., artificial code and natural code. Natural code is used in real software like, for example, the Apache Webserver or Microsoft Word. Artificial code has been generated for some specific purpose, for example, to test security scanners. The Juliet Test Suite contains only artificial code, because such code simplifies the evaluation and the comparison of security scanners. In order to determine whether an error reported by a security scanner is correct, it is necessary to know where exactly there are any security bugs in the source code. This is a difficult task for natural code, because the complete source code would have to be subject to close scrutiny. For artificial code this is a much easier task, because the code had been generated and documented with specific errors in mind anyway.

Any security scanner may not find specific security errors in natural code. A manual code review is necessary to find such problems, provided the availability of personnel with sufficient security knowledge. Otherwise, the existence of these security errors in the code may remain unknown. In contrast, the number of errors in artificial code is known. Only if the errors in the test suite are known can we check whether scanners find all of them. But even for artificial code, it is hard to determine the exact source code line of a specific security error. Different scanners typically report these errors at different source code lines. Thus, authors of artificial code have to pay close attention in order to define the exact locations of any errors they include in their code.
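As an illustration of the structure described in Section 2.1, the following simplified Java example pairs a flawed "bad" method with a corresponding "good" method, in the spirit of the suite's test cases. It is only a sketch; the class and method names are ours and it is not an actual Juliet test case.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SqlInjectionSketch {

    // Bad function: user-controlled data is concatenated into the SQL string (CWE-89 style).
    public ResultSet bad(Connection connection, String userName) throws SQLException {
        Statement statement = connection.createStatement();
        return statement.executeQuery(
                "SELECT * FROM users WHERE name = '" + userName + "'");
    }

    // Good function: nearly the same logic, but the input is bound as a parameter.
    public ResultSet good(Connection connection, String userName) throws SQLException {
        PreparedStatement statement =
                connection.prepareStatement("SELECT * FROM users WHERE name = ?");
        statement.setString(1, userName);
        return statement.executeQuery();
    }
}

A scanner that flags the query in the bad method but stays silent on the good method scores a true positive without producing a false positive for this test case.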


Control flow and data flow can appear in many different combinations. Natural code does not contain all of these combinations. With artificial code, security scanners can be tested on whether they find all these combinations (National Institute of Standards and Technology 2012). Artificial code has advantages, but it also has its limitations. Artificial test cases are typically simpler than what can be found in natural code. In fact, test cases in the Juliet Test Suite are much simpler than natural code. Therefore, security scanners may find something in these test cases but fail at real programs that are much more complex. In addition, the frequency of flaws in the test suite may not reflect the frequency of the bugs of real programs. Again, a security scanner may find many errors in the test suite but not in natural code. And, even if less likely, it can also be just the opposite (National Institute of Standards and Technology 2012).

3 SECURITY SCANNERS

Source code scanners are programs that statically analyze the source code of other programs to identify flaws. They typically check the source code, but some may also scan byte code or binaries. Every scanner has a built-in set of weaknesses to look for. Most also have some means of adding custom rules (Black 2009). The rules can target specific weaknesses, e.g., security, but may also check for programming style.

3.1 Software Security

Security is about protecting information and information systems from unauthorized access and use. The core goals are to retain confidentiality, integrity, and availability of information. Besides IT security, other security terms that are often used include network security, computer security, web security, mobile security, and software security. Software security is "the idea of engineering software so that it continues to function correctly under malicious attack" (McGraw 2004).

Prominent software security bugs include buffer overflows, SQL injections, and cross-site scripting. There are many examples where these bugs have occurred and caused damage. While security bugs are problems at the implementation level, security flaws are located at the architecture or design level. Security flaws are much harder to detect and typically need more detailed expert knowledge. Security scanners target security bugs. Thus, they can help to greatly enhance the security of software, but they are not capable of finding all security problems that may exist in the software they are scanning.

3.2 Security Problems

Software security problems have to be uniquely identified and classified. For example, when different scanners use their own classification scheme, comparison is difficult for customers who use more than one tool from different vendors. The databases CVE and CWE have been created for that purpose. CVE stands for Common Vulnerabilities and Exposures (http://cve.mitre.org/). It provides a standard for security vulnerability names and is a dictionary of publicly known information security vulnerabilities and exposures. CWE stands for Common Weakness Enumeration (http://cwe.mitre.org/). It provides a unified, measurable set of software weaknesses for effective discussion, description, selection, and use of software security tools. Both CWE and CVE are free for public use. They contain roughly 1,000 and 60,000 entries, respectively.

If a specific security problem becomes known, it is assigned its own CVE number for identification and description. The vulnerability relates to a specific weakness, i.e., to a CWE number. CWE entries are more abstract than CVE entries and identify a class of problems. For example, if a heap overflow is found in a software product, this problem gets a CVE number. The CVE number relates to a CWE entry that describes buffer overflows in general. Additionally, security vulnerabilities get scores that classify the severity of the problems. For that purpose, CVSS is used. CVSS stands for the Common Vulnerability Scoring System. It is a "vendor agnostic, industry open standard designed to convey vulnerability severity and help determine urgency and priority of response" (http://www.first.org/cvss).

3.3 Selected Scanners

As mentioned above, the Juliet Test Suite contains a Java and a C/C++ part. Thus, it can be used to test either Java or C/C++ scanners. A list of security scanners is given at http://samate.nist.gov/index.php/Source_Code_Security_Analyzers.html. We have chosen scanners that are available for free and that can be started via command line. Free scanners have been chosen because we had done this project only for demonstration purposes.

Commercial scanners may hopefully provide better results than the scanners we have chosen. However, they can be used with the same approach that we will present here. Additionally, we have only chosen scanners that can be started via command line. This requirement was necessary for the automation process. However, with some programming or scripting, tools without a command line interface can be integrated, too.
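Because every selected tool offers a command line interface, a scan of the whole test suite can be scripted. The following minimal Java sketch shows the general idea of launching a scanner and collecting its report; the tool, flags, and paths are placeholders and do not reproduce the exact invocations used for this study.

import java.io.File;
import java.io.IOException;

public class ScannerRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Illustrative invocation; "testcases" stands for a local copy of the Juliet sources.
        ProcessBuilder builder = new ProcessBuilder("cppcheck", "--enable=all", "testcases");
        builder.redirectErrorStream(true);                        // merge stderr into stdout
        builder.redirectOutput(new File("cppcheck-report.txt"));  // collect the raw report
        Process process = builder.start();
        int exitCode = process.waitFor();
        System.out.println("Scanner finished with exit code " + exitCode);
    }
}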


As a result, we have used the following source code scanners for our case study:

- PMD (Java) – version 4.0.3. PMD is an open source static analysis tool. We have used the GDS PMD Secure Coding Ruleset, a custom set of security rules intended to identify security violations that map to the 2010 OWASP Top 10 application security risks (https://github.com/GDSSecurity/GDS-PMD-Security-Rules). More information on PMD can be found at http://pmd.sourceforge.net.
- FindBugs (Java) – version 2. FindBugs is also an open source static analysis tool for the Java programming language. It is interesting that FindBugs processes the compiled class files rather than the source code files. More information can be found at http://findbugs.sourceforge.net.
- Jlint (Java) – version 3.0. Jlint is an open source static analysis tool like PMD and FindBugs. It mainly detects inconsistent software routines and synchronization problems. Version 3.0 was released in 2011, indicating that its development may have been discontinued. For more details see http://jlint.sourceforge.net.
- Cppcheck (C/C++) – version 1.58. Cppcheck is a static analysis scanner for the programming languages C and C++. It detects several different security errors, which are described on the tool's website. See http://cppcheck.sourceforge.net for more information.
- Visual Studio (C/C++) – version 2010. The Visual Studio compiler is shipped with Microsoft Visual Studio, Microsoft's development environment. The compiler has an option "/analyze" that lets it analyze the source code for errors. A list of all detected problems is given in the documentation of the compiler. See http://msdn.microsoft.com/vstudio for more information.

We have used these scanners because they were freely available. The only exception is the Microsoft Visual Studio Compiler, for which a license had been available. It is important to note that all these scanners are not dedicated security scanners. They detect non-security issues like bad programming habits or violations of programming guidelines. Security is not their primary target. Naturally, the Juliet Test Suite is best suited to compare dedicated security scanners. Our goal was to demonstrate the usefulness of the test suite for that purpose, and also to provide a methodology for an automatic assessment of such scanners.

4 APPROACH

In order to compare different scanners, we simply have to let them scan the test suite and then compare their results. As simple as it sounds, this can be a tedious and cumbersome process for several reasons.

4.1 Error Types

First of all, different security scanners report different error types, making it hard to compare the results. For example, one tool may report an SQL injection as an error with type "SQL Injection", while another tool may report an error of "type=2". Therefore, we have a transformation tool that transforms the scanner results into a uniform format that can later be used for comparison. For that purpose, we have mapped the scanner-specific error types to CWE numbers, which are already used in the test suite. The transformation is split into two steps. First, all the scanner result files are converted to CSV files. For that purpose, a tool called Metric Output Transformer (MOT) is used. This tool is part of a research project called Software Project Quality Reporter (SPQR) (Ploesch et al. 2008). Every line in the CSV files represents one detected error. Each line contains several columns with the name of the scanner, an abbreviation of the error set by the transformer, the path to the file with the error, and the line number where the error was detected. The line ends with the scanner's error message. Listing 1 shows one such line, i.e., the description of one error. Due to space limitations, each column entry is presented in a separate line or two with a comment on the right side.

Listing 1: SPQR output (one error description; the recoverable column entries are the scanner name "JLint", the error abbreviation "CCNP", the test case file CWE113_HTTP_Response_Splitting__connect_tcp_addCookieServlet_15.java, the line number, and the error message "Switch case constant 6 can't be produced by switch expression").

In the second step, the output in the CSV files is used in combination with CWE mapping files. CWE mapping files map scanner-specific results to CWE numbers. Listing 2 shows a small part of the CWE mapping file for the scanner Jlint. As can be seen, CWE mapping files contain tags for the scanner codes. These codes are assigned to CWE numbers. Multiple CWE numbers can be assigned to a single scanner code, because scanner results are sometimes more abstract than the CWE numbers. For example, a scanner result with code "CPSR" can either be of type CWE 570 or CWE 571.

Listing 2: CWE mapping file for Jlint (excerpt; it contains entries with a desc attribute such as "invocation of synchronized method" together with the CWE numbers assigned to the corresponding scanner code).
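The following minimal Java sketch illustrates these two normalization steps; the class and method names are ours, only the CPSR example from the text is used as mapping data, and the file, line number, and message shown are placeholders.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CweMappingSketch {

    // Scanner-specific codes mapped to candidate CWE numbers,
    // e.g., the Jlint code "CPSR" is assigned to CWE-570 and CWE-571 (see text).
    private static final Map<String, List<Integer>> JLINT_TO_CWE = new HashMap<>();
    static {
        JLINT_TO_CWE.put("CPSR", Arrays.asList(570, 571));
    }

    // Builds one line of the uniform result format:
    // scanner, error abbreviation, file, line number, error message.
    static String toCsvLine(String scanner, String code, String file, int line, String message) {
        return String.join(";", scanner, code, file, Integer.toString(line), message);
    }

    public static void main(String[] args) {
        String line = toCsvLine("JLint", "CPSR", "SomeTestCase.java", 42, "placeholder message");
        List<Integer> candidates = JLINT_TO_CWE.getOrDefault("CPSR", Collections.emptyList());
        System.out.println(line + " -> candidate CWEs: " + candidates);
    }
}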


4.2 Error Locations

Another problem for the comparison is the fact that security errors are indeed documented in the test suite, but their exact location, i.e., the specific source code line, is not always known. We have therefore developed a small utility program that identifies the appropriate lines in the source code files. For some of the test suite's source code files, we have been able to find additional descriptions that specify the line of the error in an XML file. These additional files had been used whenever they were available; see http://samate.nist.gov for more information. Without these files we were only able to define a range of lines in the source code in which the error was located. With the additional files we were able to determine the exact location where the error occurred. These files have led to roughly 2,800 (out of 24,000) exactly located errors in the Java version of the test suite and about 23,000 (out of 57,000) errors in the C/C++ version.

4.3 Automated Analysis and Comparison

For automated comparison, we have developed a simple utility program that is used to start the security scanners and to compare their results. The process is structured into four steps:
1. We scan the Juliet Test Suite and locate the lines of errors as a reference. This needs to be done only once.
2. We start the security scanners to analyze the test suite. The scanners report their results in specific formats.
3. We convert the result files of the scanners into a uniform format (as shown in Listing 1) for comparison.
4. We compare the results and generate a report that shows the results in human-readable form.

Details including the configuration of the individual scanners can be found at https://github.com/devandi/AnalyzeTool.

4.4 Scanner Results

When the scanners analyze the test suite, they report the errors they have found, cf. step 2. These errors do not always match the errors that are in fact in the test suite, cf. step 1. We get a false positive when a scanner reports a problem where in fact there is none. A false negative occurs when a scanner does not report a problem that in fact exists. Scanners may also report a wrong location for a problem. Thus, we distinguish the following types of reported errors:
- True positive: The scanner reports an existing error.
- False positive: The scanner reports a non-existing error.
- Wrong positive: The scanner reports a specific error at the location of an error with a different type.
- True negative: The scanner does not report an error where in fact there is none.
- False negative: There is an error in the source code, but the scanner does not find and report it.

In fact, wrong positives are a combination of a false positive and a false negative. We consider them separately, because the reported errors may come close to the real errors. Of course, false positives and false negatives are the most important categories when evaluating a scanner.
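A minimal Java sketch of this classification, under the assumption that the documented errors of the test suite are available as (file, line, CWE) triples; the names are illustrative and the code does not reproduce our utility program.

import java.util.List;

public class ClassificationSketch {

    enum Verdict { TRUE_POSITIVE, WRONG_POSITIVE, FALSE_POSITIVE }

    static class DocumentedError {
        final String file;
        final int line;
        final int cwe;
        DocumentedError(String file, int line, int cwe) {
            this.file = file;
            this.line = line;
            this.cwe = cwe;
        }
    }

    // Classifies one reported error against the documented errors of the test suite.
    // Documented errors that no scanner reports are counted separately as false negatives.
    static Verdict classify(String file, int line, int cwe, List<DocumentedError> documented) {
        for (DocumentedError doc : documented) {
            boolean sameLocation = doc.file.equals(file) && doc.line == line;
            if (sameLocation && doc.cwe == cwe) {
                return Verdict.TRUE_POSITIVE;   // existing error, matching type
            }
            if (sameLocation) {
                return Verdict.WRONG_POSITIVE;  // an error exists here, but of a different type
            }
        }
        return Verdict.FALSE_POSITIVE;          // no documented error at this location
    }
}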


True positive means that a scanner did report an existing error. Security errors are typically spread over several lines. The question is whether a scanner reports the right location. We distinguish the following cases:
- True positive+ (correct location): The scanner has reported an error at the same source code line where this error had been documented, assuming that we know the exact location of the error.
- True positive (unspecified location): The scanner has correctly reported a specific error. The exact location is unknown in the source code or has not been specified. We typically know a line range if the error is contained in a (bad) function.
- True positive– (incorrect location): The scanner has correctly reported an error, but given a wrong location, which is close enough to be counted as a true positive.

In the Juliet Test Suite, we have errors where we know either the exact source code line or a range of source code lines, i.e., we have a list of True Positives+ and True Positives. A scanner should report as many True Positives as possible. It is better to have a report of an error with an incorrect location than no report of that error at all, i.e., a True Positive– reported by a scanner is much better than a True Negative. We can only count a true positive as a True Positive+ when we ourselves know the exact location in the test suite.

4.5 Security Model

We have used a security model in which CWE entries are combined into more abstract categories. For example, a category "Buffer Overflow" represents different CWE entries that describe types of buffer overflows. We have used these categories, according to (Center for Assured Software 2011), to generate a security model as part of a more general software quality model (Ploesch et al. 2008). This security model helps to interpret the scanner results. For example, with the model we can determine a scanner's weakness or strength in one or more areas. This information can be used to choose a particular scanner for a software project. Thus, if a scanner is weak in identifying authentication problems but is strong in other areas, this scanner can be used for projects that do not have to deal with authentication. Alternatively, an additional scanner with strong authentication results can be used. In general, the security model helps to analyze found errors in more detail. Another advantage of having a security model is to provide a connection between generic descriptions of software security attributes and specific software analysis approaches (Wagner et al. 2012). This allows us to automatically detect security differences between different systems or subsystems as well as security improvements over time in a software system.
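A minimal Java sketch of this grouping, assuming a hand-written table that folds individual CWE numbers into the model's categories; the concrete assignments shown below are illustrative, and the real model covers far more entries.

import java.util.HashMap;
import java.util.Map;

public class SecurityModelSketch {

    // Illustrative CWE-to-category table.
    private static final Map<Integer, String> CWE_TO_CATEGORY = new HashMap<>();
    static {
        CWE_TO_CATEGORY.put(121, "Buffer Handling");                    // stack-based buffer overflow
        CWE_TO_CATEGORY.put(122, "Buffer Handling");                    // heap-based buffer overflow
        CWE_TO_CATEGORY.put(89,  "Injection");                          // SQL injection
        CWE_TO_CATEGORY.put(78,  "Injection");                          // OS command injection
        CWE_TO_CATEGORY.put(306, "Authentication and Access Control");  // missing authentication
    }

    // Aggregating scanner results per category reveals a scanner's strengths and weaknesses.
    static String categoryOf(int cwe) {
        return CWE_TO_CATEGORY.getOrDefault(cwe, "Uncategorized");
    }
}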


5 RESULTS

The Juliet Test Suite consists of a Java and a C/C++ test suite. We will start with Java. Subsequently, the results for C/C++ will be presented. Finally, an overall discussion of the findings will follow.

5.1 Java

In the Java test suite we had far more true positives with an unspecified location (True Positives) than with a conclusive location (True Positives+), which were determined with the additional 'SAMATE' files. Consequently, the scanners can only deliver a few conclusive locations. Figure 1 contrasts the percentage of errors with a conclusive location, i.e., True Positives+, to errors with an unspecified location, i.e., True Positives. Figure 2 shows the distribution of the test cases by the security model. We can see that the numbers of test cases are not balanced for every category. As a result, scanners which find many issues in categories with many test cases are better in this comparison than other scanners. Apparently, the category "Buffer Overflow" has no test cases at all. This should not come as a surprise, as the Java Runtime Environment prevents buffer overflows in Java programs. Figure 3 shows an overview of all errors that were detected by the different scanners. The entry Juliet on top shows the actual number of documented errors in the test suite. We can see that for a small percentage the exact location of the error is known, but for most errors this is not the case. Apparently, FindBugs has detected the most errors, followed by Jlint and PMD. However, PMD has found more exact locations than Jlint and FindBugs. As Figure 3 shows, its numbers of True Positives+, True Positives, True Positives–, and Wrong Positives are higher than those of Jlint and FindBugs. Thus, it can be said that PMD has found the most accurate errors with respect to the test suite. A deeper analysis of the results has shown that the GDS rule set used within PMD was responsible for the results of PMD. Without it, PMD would not have found any errors in the test suite. Nevertheless, the overall results were poor. The scanners have detected less than half of the errors of the test suite. The figures do not explicitly show the false negatives of the scanners. They can be determined by comparing a scanner's true positives to the test suite's true positives.

5.2 C/C++

In the C/C++ version there were more errors with a conclusive location (True Positives+) than in the Java version, but more than 60% had an unspecified location (True Positives). Figure 4 shows the distribution of the True Positives+ and the True Positives of the C/C++ version. Figure 5 shows the distribution of the test cases by the security model. As we can see, the number of test cases per category is not balanced either. The category Buffer Handling has the most test cases, because problems in this area are common security issues in C/C++ programs. Figure 6 shows an overview of all errors detected by the tested scanners. The Visual Studio Compiler found by far the most errors. But most of them were not covered by the test suite and were non-security issues. A deeper analysis of the results showed that the scanner had flagged many places where the test cases use a memory allocation function that should not be used. As such functions were used many times within the test cases, this has led to the high number of found errors. Despite that, the Microsoft scanner found more errors than Cppcheck. However, very few errors were found in the test suite. It should be mentioned that this scanner had the longest run time. We have not taken exact run-time measurements, but most of the scanners took approximately one hour to scan the test suite. The Visual Studio Compiler took about 11 hours. We had used a virtual machine on Windows 7, 32-bit, with one CPU on an Intel i5-2500 @ 3.3 GHz and 2 GB of assigned main memory.

5.3 Discussion

The results of the analysis may lead to the conclusion that such security scanners should not be used or that they are of only little value. We have to keep in mind that these scanners are not dedicated security scanners. Also, the example of PMD shows that if special rule sets for security are used, the results improve considerably. Furthermore, these scanners are not a replacement for security reviews by security experts. The scanners cost less and are much quicker than such reviews. Moreover, the scanners can be integrated in an automated build process for software development. The Microsoft Visual Studio Compiler can be used in a way that the source code is analyzed every time the file gets compiled during development. Even though the results are behind expectations, every found and corrected error reduces the attack surface of the software under construction. For example, a third-party source code analyzer could easily have found Apple's recent 'goto fail' bug in its encryption library (McCullagh 2014). In the results, we have seen a large number of false positives, i.e., scanners report errors that are not documented in the test suite. In fact, the scanners often did not report real false positives, but rather they reported issues that are not unusual in artificial code, like 'Local declaration hides declaration of same name in outer scope'. Such issues are not documented in the test suite, because its focus is on security. Therefore, many of these false positives are only false positives in relation to the documented security issues of the test suite.

6 RELATED WORK

Rutar et al. have tested several source code scanners including PMD and FindBugs (Rutar et al. 2004). For that purpose, they wrote a meta program to automatically start the scanners. For their study, they used open source programs containing natural code. Therefore, they additionally had to perform security reviews to determine errors in the tested programs. Due to the effort it takes, they were only able to review part of the scanned software. The reviewed parts were then used to extrapolate an overall result. The findings showed that the scanners found different results that did not overlap.

The Center for Assured Software also performed a study in which they tested different security source code scanners (Center for Assured Software 2011). For their analysis they also used the Juliet Test Suite. The names of the used security scanners were not published. For the comparison they also transformed the scanner-specific results into a uniform format. Furthermore, they generated some key figures, which were used to compare the scanners. They used "weakness classes" to classify the different errors. These weakness classes have been used as the foundation for the security model in our approach.


Figure 1: Percentage of Java errors with conclusive location.
Figure 4: Percentage of C/C++ errors with conclusive location.

Figure 2: Java test case distribution. Figure 5: C/C++ test case distribution.

Figure 6: C/C++ scanner results.

Figure 3: Java scanner results.


A student project had been performed at CERN, the European Organization for Nuclear Research (Hofer 2010). Different scanners for several programming languages like Java, Python, and C/C++ were analyzed. The goal of the analysis was to make recommendations for scanner usage and to increase software security considerations at CERN. But locating the errors in software was only one of five classes which were used for the classification of the scanners. They also used classes like installation and configuration to evaluate the scanners. The problem was that they did not analyze how accurately the scanners found some errors. It was only measured how many errors were found, but not whether the found errors were correct.

7 CONCLUSIONS

We have tested static source code analyzers with the Juliet Test Suite that contains security bugs in Java and C/C++ code. We have used scanners that were not primarily targeted at security, because dedicated security scanners require expensive licenses. The general results of the scanners were disappointing, with the exception of the open source static analysis tool PMD that we had used with secure coding rules. We draw the following conclusions from our simple test. First, static source code analysis is a simple and efficient means to detect bugs in general and security bugs in particular. Second, it is advisable to use more than one tool and, thus, combine the strengths of different analyzers. And third, dedicated test suites like the Juliet Test Suite provide a powerful means of revealing the security coverage of static scanners, which in turn enables us to pick the right set of scanners for improved software coding. We have also shown that scanner results can be used to assess the security quality of software systems. Scanner results can automatically be used to generate a security model and, for example, indicate security improvements or degradation over time.

REFERENCES

Black, P.E., 2009. Static Analyzers in Software Engineering. CROSSTALK – The Journal of Defense Software Engineering. https://buildsecurityin.us-cert.gov/resources/crosstalk-series/static-analyzers-in-software-engineering

Boland, T., Black, P.E., 2012. Juliet 1.1 C/C++ and Java Test Suite. Computer, vol. 45, no. 10, pp. 88-90. doi:10.1109/MC.2012.345

Center for Assured Software, 2011. CAS Static Analysis Tool Study – Methodology, December 2011. http://samate.nist.gov/docs/CAS%202011%20Static%20Analysis%20Tool%20Study%20Methodology.pdf

Hofer, T., 2010. Evaluating Static Source Code Analysis Tools. Master Thesis, InfoScience. http://infoscience.epfl.ch/record/153107

Howard, M., Lipner, S., 2006. The Security Development Life-Cycle. Microsoft Press.

McCullagh, D., 2014. Klocwork: Our source code analyzer caught Apple's 'gotofail' bug. c|net, February 28. http://news.cnet.com/8301-1009_3-57619754-83/klocwork-our-source-code-analyzer-caught-apples-gotofail-bug/

McGraw, G., 2004. Software Security. IEEE Security & Privacy, vol. 2, no. 2, pp. 80-83. doi:10.1109/MSECP.2004.1281254

McGraw, G., 2009. Software Security: Building Security In. 5th edition, Addison-Wesley.

MITRE, 2011. CWE/SANS Top 25 Most Dangerous Software Errors, Version 1.0.3. http://cwe.mitre.org/top25/

National Institute of Standards and Technology, 2012. SAMATE Reference Dataset. http://samate.nist.gov/SRD/testsuite.php

Plösch, R., et al., 2008. Tool Support for a Method to Evaluate Internal Software Product Quality by Static Code Analysis. Software Quality Professional Journal, American Society for Quality, Volume 10, Issue 4.

Rutar, N., Almazan, C.B., Foster, J.S., 2004. A Comparison of Bug Finding Tools for Java. IEEE, ISBN 0-7695-2215-7, pp. 245-256.

Wagner, S., et al., 2012. The Quamoco Product Quality Modelling and Assessment Approach. Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich.
