Experience Report: an Empirical Study of PHP Security Mechanism Usage
Total Page:16
File Type:pdf, Size:1020Kb
Experience Report: An Empirical Study of PHP Security Mechanism Usage Johannes Dahse and Thorsten Holz Horst Görtz Institute for IT-Security (HGI) Ruhr-University Bochum, Germany {firstname.lastname}@rub.de ABSTRACT 1. INTRODUCTION The World Wide Web mainly consists of web applications Empirical studies of security vulnerabilities found in the written in weakly typed scripting languages, with PHP being last years [3, 7, 28, 31] indicate that web-related vulnera- the most popular language in practice. Empirical evidence bilities such as cross-site scripting (XSS) and SQL injection based on the analysis of vulnerabilities suggests that security (SQLi) are among the most common software flaws. In fact, is often added as an ad-hoc solution, rather than planning such vulnerabilities are more frequently detected in the re- a web application with security in mind during the design cent years than memory corruption vulnerabilities. This ob- phase. Although some best-practice guidelines emerged, no servation suggests that security is only added as an ad-hoc comprehensive security standards are available for develop- solution in web applications, rather than planning such ap- ers. Thus, developers often apply their own favorite security plications with security in mind during the design phase. mechanisms for data sanitization or validation to prohibit Although some best-practice guidelines on secure web pro- malicious input to a web application. gramming emerged (e.g., recommendations by OWASP [17]), In the context of our development of a new static code no comprehensive security standards are available for devel- analysis tool for vulnerability detection, we studied com- opers. This leads to the observation that each developer monly used input sanitization or validation mechanisms in applies his own favorite security mechanisms for data sani- 25 popular PHP applications. Our analysis of 2.5 million tization or validation. As a result, many programming pat- lines of code and over 26 thousand secured data flows pro- terns emerged for input sanitization (e.g., type casting, data vides a comprehensive overview of how developers utilize encoding, converting data to HTML entities, or prepared security mechanisms in practice regarding different markup statements) and input validation (e.g., type validation, for- contexts. In this paper, we discuss these security mecha- mat validation, or whitelisting). Each of these patterns has nisms in detail and reveal common pitfalls. For example, its own advantages and drawbacks, and programming mis- we found certain markup contexts and security mechanisms takes due to common pitfalls can still lead to vulnerabilities. more frequently vulnerable than others. Our empirical study As a side effect, this variety in security mechanisms leads to helps researchers, web developers, and tool developers to many false reports in static code analysis tools, which have focus on error-prone markup contexts and security mecha- to detect and analyze these security mechanisms precisely. nisms in order to detect and mitigate vulnerabilities. PHP is the most popular server-side scripting language on the Web, running on 82% of all websites [30]. It is a weakly Categories and Subject Descriptors typed language, meaning that a memory location can hold a value of every possible data type. In contrast to a strongly D.2.0 [Software Engineering]: General|protection mech- typed language, the weak typing in PHP implies that more anisms; D.2.8 [Software Engineering]: Metrics|complex- security checks have to be implemented by the developer to ity measures, product metrics ensure a safe usage in scenarios where only a specific data type is allowed. Furthermore, weak typing introduces oddi- General Terms ties when comparing data of different data types, leading to even more (potential) bugs or insecurities [18]. Security, Measurement Based on our experience in the development of a static code analysis tool for finding security vulnerabilities in mod- Keywords ern PHP applications [4{6], we collected typical program- Static analysis, input sanitization, input validation, PHP ming patterns used by developers to sanitize and validate data. In this paper, we provide a comprehensive empiri- Permission to make digital or hard copies of all or part of this work for cal analysis of the 20 different data sanitization and valida- personal or classroom use is granted without fee provided that copies are not tion mechanisms we found being used in web applications. made or distributed for profit or commercial advantage and that copies bear Furthermore, we also study common implementation pitfalls this notice and the full citation on the first page. Copyrights for components that lead to vulnerabilities in practice. We extended our pro- of this work owned by others than ACM must be honored. Abstracting with totype in order to enumerate the security mechanisms used credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request in PHP applications and applied it to 25 of the most pop- permissions from [email protected]. ular web applications. We analyzed more than 2.5 million ISSTA’15 , July 12–17, 2015, Baltimore, MD, USA lines of code and present an analysis of the detected security Copyright 2015 ACM 978-1-4503-3620-8/15/07 ...$15.00. patterns in more than 26 thousand data flows. Our analysis of the data stay intact, while only certain characters are re- helps to answer the following essential questions for PHP moved or replaced. This way, the application can proceed developers, code auditors, and static analysis engineers to with the sanitized data without a request for resubmission. refine the focus during vulnerability detection: In the following, we present several ways to sanitize data generically against all type of injection flaws, such as cross- • Q1: Which security mechanisms are available? site scripting [13] and SQL injection [8]. • Q2: Which pitfalls might these mechanisms imply? 2.1.1 Explicit Type Casting • Q3: Which security mechanisms are used how often in modern (web) applications? Numeric characters can be safely used in security sensi- tive string operations. To ensure only numeric characters, a • Q4: Which security mechanism is used to prevent string can be explicitly typecasted to a number by using the which vulnerability type in which markup context? typecast operator or built-in functions. • Q5: Which pitfalls occur in practice? 1 $var=(int)$var; // safe We believe that our work is of great value for designers of 2 $var= intval($var);// safe static code analysis tools, especially since the basic insights 3 settype($var, 'int');// safe can also be applied to programs implemented in other lan- Listing 1: Examples for explicit type casting. guages. Developers can understand how to use the correct sanitization and validation mechanism in which situation. All three operations in Listing 1 ensure a secure use of the variable regarding injection flaws. PHP uses duck typ- In summary, we make the following three contributions: $var ing to determine the integer value of a string. An empty and • We survey common mechanisms to sanitize and vali- non-empty string is typecasted to the number 0. However, date data in PHP applications and highlight the pit- if the string starts with a number (\123abc"), the number falls that can occur to answer questions Q1 and Q2. is used as result of the typecast (123 ). We will introduce pitfalls associated with duck typing later on. • We present our approach to enumerate these mecha- nisms via static code analysis techniques. 2.1.2 Implicit Type Casting • We evaluate our approach with 25 popular real-world Similar to an explicit typecast, an implicit type cast auto- applications. To the best of our knowledge, this is the matically occurs if data is used in mathematical operations. largest study on the usage of security mechanisms in Listing 2 shows an addition in line 1 in which $var is safely modern PHP applications. Based on this analysis, we typecasted to integer before a number is added. can answer research questions Q3 { Q5. 1 $var= $var + 1;// safe 2. TYPES OF SECURITY MECHANISMS 2 $var= $var++;// unsafe Web applications receive remotely supplied (and poten- Listing 2: Examples for implicit type casting. tially malicious) user data via HTTP request parameters. In contrast, the increment operator in line 2 performs no In PHP applications, user input is always received as data typecast and also works on strings. For example, the last of type string. If this input is processed in a security sen- character in the string aaa will be incremented to aab. Thus, sitive operation of the web application, a vulnerability can $var can still contain malicious characters. occur. For example, a dynamically built SQL query with embedded user input may lead to a SQL injection vulnera- 2.1.3 Formatting bility [8]. In order to prevent such a critical vulnerability, the Type casting is also performed by PHP's built-in format user input has to be sanitized or validated beforehand. For string functions. Different specifiers (beginning with a per- this purpose, a security mechanism is applied between the centage sign) can be used in a format string that determine user input (source) and the sensitive operation (sink) such the data type of the data they will be replaced with. An ex- that malicious data cannot reach the sensitive operation. ample is given in Listing 3 that uses the identifier %s (string) To study the variety of security mechanisms used by devel- and %d (numeric). opers in practice (Q1), we collected common types of mech- anisms we experienced in real-world applications during the 1 $var= sprintf("%s%d", $var1, $var2);// unsafe/ safe development of our static analysis tool [4{6]. Although the Listing 3: Sanitization with a format string function. possible ways of implementation are endless, our list cov- ers all different mechanisms we encountered.