Disco: Combining Disassemblers for Improved Performance
Total Page:16
File Type:pdf, Size:1020Kb
DisCo: Combining Disassemblers for Improved Performance Sri Shaila G Ahmad Darki Michalis Faloutsos University of California, Riverside University of California, Riverside University of California, Riverside USA USA USA [email protected] [email protected] [email protected] Nael Abu-Ghazaleh Manu Sridharan University of California, Riverside University of California, Riverside USA USA [email protected] [email protected] ABSTRACT KEYWORDS Malware infects thousands of systems globally each day causing Ensemble, Disassembly Tools, ARM and MIPS architecture, Ghidra millions of dollars in damages. Which disassembler should a mal- Bug Discovery, Improving Disassembly ware analyst choose in order to get the most accurate disassembly ACM Reference Format: and be able to detect, analyze and defuse malware quickly? There Sri Shaila G, Ahmad Darki, Michalis Faloutsos, Nael Abu-Ghazaleh, and Manu is no clear answer to this question: (a) the performance of disas- Sridharan. 2021. DisCo: Combining Disassemblers for Improved Perfor- semblers varies across configurations, and (b) most prior work on mance . In ACM Publications RAID ’21. ACM, New York, NY, USA, 15 pages. disassemblers focuses on benign software and the x86 CPU archi- https://doi.org/10.1145/nnnnnnn.nnnnnnn tecture. In this work, we take a different approach and ask: why 1 INTRODUCTION not use all the disassemblers instead of picking one? We present DisCo, a novel and effective approach to harness the collective ca- pability of a group of disassemblers combining their output into an ensemble consensus. We develop and evaluate our approach using 1760 IoT malware binaries compiled with different compilers and compiler options for the ARM and MIPS architectures. First, we show that DisCo can combine the collective wisdom of disassem- blers effectively. For example, our approach outperforms the best contributing disassembler by as much as 17.8% in the F1 score for Figure 1: DisCo effectively combines disassemblers for su- function start identification for MIPS binaries compiled using GCC perior performance: The gain in the performance over the with O3 option. Second, the collective wisdom of the disassemblers best input can be as high as +17.8% with a combined perfor- can be brought back to improve each disassembler. As a proof of mance 99.8% in F1 score combining five disassemblers. The concept, we show that byte-level signatures identified by DisCo can results shown are for malware binaries compiled with GCC improve the performance of Ghidra by as much as 13.6% in terms for MIPS with the O3 compilation level. The approach can of the F1 score. Third, we quantify the effect of the architecture, work with different sets of disassemblers. We show the im- the compiler, and the compiler options on the performance of dis- provement using only freely available disassemblers (+12.4% assemblers. Finally, the systematic evaluation within our approach with a total of 94.4%). led to a bug discovery in Ghidra v9.1, which was acknowledged by the Ghidra team. Binary disassemblers are essential front-line tools in malware defense: timely and efficient reverse engineering of the malware CCS CONCEPTS binary is critical. An incident in 2017 highlights the importance • Software and its engineering ! Software organization and of this issue: two ransomwares, WannaCry and Petya infected properties; • Computing methodologies ! Ensemble methods; over 230 000 Windows PCs across 150 countries by exploiting a • Security and privacy ! Software and application security. vulnerability in Microsoft’s implementation of the Server Message Block protocol in a span of one day. These attacks cost over $4 billion dollars in damages [11, 32]. The rapid spread of these malwares had malware analysts racing against time to understand their mode of Permission to make digital or hard copies of part or all of this work for personal or propagation and operation in order to contain them. classroom use is granted without fee provided that copies are not made or distributed Which disassembler should a malware analyst choose for a for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. rapidly-spreading malware binary to get the most accurate results? For all other uses, contact the owner/author(s). This is the question that motivates our work. Here we focus on RAID ’21, October 6-8, 2021, Donostia / San Sebastian, Spain malware that targets MIPS and ARM architectures, given that such © 2021 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-3916-2. malware has received significantly less attention. In addition, these https://doi.org/10.1145/nnnnnnn.nnnnnnn architectures are widely used in IoT devices, which are increasingly 1 RAID ’21, October 6-8, 2021, Donostia / San Sebastian, Spain Sri Shaila G, Ahmad Darki, Michalis Faloutsos, Nael Abu-Ghazaleh, and Manu Sridharan becoming targets of choice for malware [4, 56]. Currently, there is Specifically, we consider the following configuration options: (a) no clear answer to the above question, for the ARM and, and even two architectures, MIPS and ARM, (b) two different compilers, more so, the MIPS architecture. We elaborate on two contributing GCC and Clang, and (c) five compiler optimization levels. Note factors to this problem. that we focus on the function start identification metric, which First, there is a plethora of disassemblers both free and commer- is a key disassembly metric [25]. To quantify the variability, in cial, but research so far has not determined a clear and consistent addition to the average performance, we consider the 5-percentile winner. The performance of a disassembler can vary based on the of the worst case performance (5PWC), which we define later. type of the malware binary, which can be created by using various: Finally, we introduce the Relative Performance Improvement (a) compilers, (b) compiler optimization flags, and (c) target CPU (RPI) metric as the difference between the performance of DisCo for architectures. These variations can lead to significant differences a group of disassemblers and that of the best performing individual in the assembly code found in the resultant binary [12, 18]. Note disassembler in the group. that determining the compilation parameters for a given stripped In summary, the contribution of our work can be summarized in binary is a challenging task. Despite some recent studies [2, 25], the following key observations: the effect of these variations on the accuracy of disassembly are a. DisCo is effective in combining disassemblers. DisCo can not well understood, especially for the MIPS architecture. combine the capabilities of different groups of disassemblers to Second, the average performance of a disassembler may not be achieve relative performance improvement, RPI. In table 1, we show sufficient to inform the correct answer: we need to assess its worst that DisCo achieves an RPI of up to 17.8% across various configu- case performance as well. This reliability aspect of a disassembler is ration options. Furthermore, there is an even larger improvement lost when we only report average performance, and even standard of up to 27.5% in the worst case 5PWC metric. We showcase the deviation does not fully capture it. The ideal disassembler should effectiveness of our approach visually in figure 1. Considering only offer accurate disassembly consistently for each binary. Going back our four non-commercial disassemblers, from table 1, we see that to our motivating example, the binary at hand may belong to the DisCo(Free) has an RPI of up to 12.4% for MIPS. minority group of binaries for which the overall-best disassem- b. DisCo can be used to improve other disassemblers. We bler performs poorly. This poor performance at "crunch-time" can show that the collective power of the disassemblers, which DisCo translate into massive financial and societal damage. synthesizes, can be brought back to improve each disassembler. There is limited prior work for the our problem here as it fo- As a proof of concept, we create, Ghidra+, an improved version of cuses on: (a) malware, and (b) the MIPS and ARM architectures. In Ghidra, which can achieve up to 13.6% better F1 score compared to contrast, most previous work seems to focus on benign binaries Ghidra. Our systematic evaluation of disassemblers also reveals a and the x86, and most recently, the ARM architecture. We highlight bug in Ghidra v9.1, which was acknowledged by the Ghidra team. the two most relevant studies. The most recent work [25] evaluates c. Configuration options affect disassembly performance several disassemblers using only benign binaries and for the ARM significantly including their relative ranking. We find that the architecture only. Another work [2] evaluates disassemblers for the ranking of the best performing disassemblers varies for different x86-64 architecture. Both studies [2, 25] endorse IDA Pro, a popular configurations. For example, Angr is the best among thegroup commercial disassembler, in terms of overall performance and find for MIPS with O3 for both GCC and Clang, but it performs the that accurate function start identification remains a challenge. We lowest for the ARM architecture (see table 2). Furthermore, we revisit previous work in section 6. find that the compiler optimization levels impact the performance In this work, we take a different approach and pose the question: significantly: most disassemblers do fairly well with O0 andO1, Why don’t we benefit from the wisdom of all the disassemblers in- but do worse with O3 option. These observations and our results stead of picking one? To this end, we present DisCo (Disassembler in general strongly argue in favor of the promise of a combined Combination), a systematic approach to harness the collective ca- solution. pabilities of a group of disassemblers to obtain superior results. Open-sourcing and data sharing.