SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills
Total Page:16
File Type:pdf, Size:1020Kb
2017 IEEE/ACM 39th International Conference on Software Engineering SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills Zhengzi Xu∗, Bihuan Chen∗‡, Mahinthan Chandramohan∗, Yang Liu∗ and Fu Song† ∗School of Computer Science and Engineering, Nanyang Technological University, Singapore †School of Information Science and Technology, ShanghaiTech University, China ‡Corresponding Author Abstract—Software vulnerability is one of the major threats to analysis has been proposed to discover n-day vulnerabilities, software security. Once discovered, vulnerabilities are often fixed whose patches have been released but not deployed to every by applying security patches. In that sense, security patches carry instance of the software in the world. Since patch analysis valuable information about vulnerabilities, which could be used to discover, understand and fix (similar) vulnerabilities. However, is relatively accurate to find vulnerabilities, it has gained the most existing patch analysis approaches work at the source code popularity in both industry and academia. Also, security patches level, while binary-level patch analysis often heavily relies on a lot are good entry points to understanding the program weaknesses of human efforts and expertise. Even worse, some vulnerabilities and how the vulnerabilities work inside the program, especially may be secretly patched without applying CVE numbers, or only for security participants who do not have access to the source the patched binary programs are available while the patches are not publicly released. These practices greatly hinder patch anal- code. Moreover, the underlying information of patches has ysis and vulnerability analysis. been used to build automatic bug-fixing tools [8, 9, 10], In this paper, we propose a scalable binary-level patch analysis generating valid patches for similar vulnerabilities. However, framework, named SPAIN, which can automatically identify se- up till today, most existing researches on patch analysis work curity patches and summarize patch patterns and their corre- at the source code level [11, 12], but very few works have sponding vulnerability patterns. Specifically, given the original and patched versions of a binary program, we locate the patched been done to tackle this problem at the binary-level. functions and identify the changed traces (i.e., a sequence of basic Binary-level patch analysis can only be performed on ma- blocks) that may contain security or non-security patches. Then chine instructions for closed source programs without symbol we identify security patches through a semantic analysis of these tables, which often requires a significant amount of human traces and summarize the patterns through a taint analysis on efforts and expertise to understand the semantics of instructions. the patched functions. The summarized patterns can be used to search similar patches or vulnerabilities in binary programs. Due to this complex nature, the existing techniques [13, 14] of- Our experimental results on several real-world projects have ten heavily rely on manual or heavy program analysis, which be- shown that: i) SPAIN identified security patches with high accu- comes infeasible for real-world programs. racy and high scalability; ii) SPAIN summarized 5 patch patterns On the other hand, software companies may tend to patch the and their corresponding vulnerability patterns for 5 vulnerability vulnerabilities they find themselves in a secret way instead of types; and iii) SPAIN discovered security patches that were not documented, and discovered 3 zero-day vulnerabilities. making them public and applying Common Vulnerabilities and Exposures (CVE) numbers due to their security regulations or policies. As a result, security analysts cannot know the existence I. INTRODUCTION of particular vulnerabilities, which hinders the understanding Program vulnerability is one of the major threats to software and analysis of vulnerabilities. Even worse, in some cases, security. However, it is almost impossible to avoid vulnerabili- only the patched binary programs are available; while the ties at the development stage; and it is even difficult to discover patches themselves may not be publicly released, which hinders vulnerabilities at the production stage. Security experts usually the existing patch analysis techniques that often rely on the leverage dynamic fuzzing (e.g., [1, 2]), symbolic execution (e.g., availability of patches. Moreover, due to the application of patch [3, 4]) or static code auditing (e.g., [5, 6]) to find vulnerabilities. obfuscation and patch modification techniques such as honey- However, none of these techniques can provide a complete patching [15], the patch patterns in open source binaries may solution to win the war against vulnerabilities. Dynamic fuzzing differ greatly from close source ones. suffers from the code coverage problem and the initial seeds To address these problems, we propose a scalable binary- problem [7]. Symbolic execution cannot scale well to real- level patch analysis framework, named SPAIN, to automatically world programs, due to path explosion and constraint solving identify security patches and summarize patch patterns and problems. Static code auditing often requires human expertise, their corresponding vulnerability patterns. In particular, given and cannot scale well when the program complexity increases. the original and patched versions of a binary program, SPAIN Vulnerabilities, once discovered, are often fixed by applying locates the functions that have been changed from the original security patches. In that sense, security patches carry important binary to the patched binary. Then, it detects the changed traces information about vulnerabilities. By focusing on the remedy of (i.e., a sequence of basic blocks) for each patched function to vulnerabilities instead of the vulnerabilities themselves, patch capture the function-level changes. These traces may contain 1558-1225/17 $31.00 © 2017 IEEE 460462 DOI 10.1109/ICSE.2017.49 security or non-security patches. Finally, it identifies security BB1 BB1 patches through a semantic analysis of these traces and sum- call BN new call BN new 1: mov edi, eax a: test eax, eax BB3 error: ... mov edi, eax marizes the patterns through a taint analysis on the patched ... 2: cmp eax, [edi+8] b: jz error functions. As we run more binary programs, we can maintain ... BB2 a database of patterns that will be continuously enriched to ... c: cmp eax, [edi+8] embrace the evolving and emerging of various vulnerabilities. ... There are many impactful applications of the learned patterns Original version Patched version in binary security like patch detection, vulnerability detection, Fig. 1: Running Example: NULL Pointer Dereference Vulner- automatic patch synthesis, and even patch/vulnerability trend ability and its Patch from OpenSSL 1.0.1l. analysis with the help of data analytic techniques. A more detailed discussion can be found in Section III-E. B. Running Example We evaluated SPAIN on several real-world projects. The ex- Figure 1 lists the partial X86 assembly code of one NULL perimental results demonstrated that SPAIN can identify pointer dereference vulnerability abstracted from the OpenSSL security patches with high accuracy and high scalability, and 1.0.1l that consists of the original version and patched version. summarize 5 security patch patterns and their corresponding In the original version, the program first calls the function vulnerability patterns for 5 types of vulnerabilities. Moreover, NB_new by which the return value is stored in the register we discovered undocumented security patches and found 3 eax. Then, it assigns the return value to the register edi at the zero-day vulnerabilities in Adobe PDF Reader. location 1. Later, the return value is directly used to dereference In summary, our work makes the following contributions. the memory [edi+8] at the location 2. This vulnerability is • We proposed a scalable binary-level patch analysis frame- patched in OpenSSL 1.0.1m by checking whether the return work SPAIN, which can identify security patches and value of the function call NB_new is NULL or not. This summarize patch patterns and their corresponding vulner- checking is implemented in the patched version by adding ability patterns. two instructions: test eax, eax and jz error. It first • We implemented SPAIN in a prototype and conducted tests the return value at the location a by test eax, eax experiments to demonstrate the accuracy, scalability, and which performs a bitwise AND operation on the operand eax. application of SPAIN. The test operation sets the carry flag cf and overflow flag • We discovered undocumented security patches and found of to 0. The sign flag sf is set to the most significant bit of 3 zero-day vulnerabilities in Adobe PDF Reader. the result of the bitwise AND operation. The zero flag zf is set to 1 if the result of AND operation is 0, 0 otherwise. The II. PRELIMINARIES AND OVERVIEW parity flag pf is set to 1 if the number of ones in that byte is A. Preliminaries even, 0 otherwise. The value of the adjust flag af is undefined. In this work, we assume that binary programs are in the After assigning the return value to the register edi, it checks X86 32bit format. Each binary program contains a number of whether the zero flag zf is 1 or not (i.e., the value of eax is functions with an entry point (starting function). This section Null or not) at the location b.Ifitis1 (i.e., eax is NULL), introduces several basic concepts