Inspector Gadget: Automated Extraction of Proprietary Gadgets from Malware Binaries

Inspector Gadget: Automated Extraction of Proprietary Gadgets from Malware Binaries Clemens Kolbitsch Thorsten Holz Christopher Kruegel Engin Kirda Secure Systems Lab Secure Systems Lab University of California Institute Eurecom Vienna University of Technology Vienna University of Technology Santa Barbara, USA Sophia-Antipolis, France Vienna, Austria Vienna, Austria [email protected] [email protected] [email protected] [email protected] Abstract—Unfortunately, malicious software is still an un- via spam e-mails, new binary files that should be executed solved problem and a major threat on the Internet. An impor- on the compromised host, or a list of targets for logging tant component in the fight against malicious software is the keystrokes. This remote configuration mechanism gives an analysis of malware samples: Only if an analyst understands the behavior of a given sample, she can design appropriate attacker flexible control over the infected machine. Hence, countermeasures. Manual approaches are frequently used to she can arbitrarily configure the compromised host to carry analyze certain key algorithms, such as downloading of encoded out her malicious deeds. updates, or generating new DNS domains for command and Understanding what actions a given sample performs is control purposes. important to be able to design corresponding countermea- In this paper, we present a novel approach to automatically extract, from a given binary executable, the algorithm related sures and mitigation techniques. For a security analyst, un- to a certain activity of the sample. We isolate and extract these derstanding the remote control mechanisms is especially in- instructions and generate a so-called gadget, i.e., a stand-alone teresting as these provide valuable clues about the malware. component that encapsulates a specific behavior. We make sure Unfortunately, analyzing the configuration mechanisms (and that a gadget can autonomously perform a specific task by also all the other activities of a malware binary) is a including all relevant code and data into the gadget such that it can be executed in a self-contained fashion. challenging and complex task. Typically, the analyst does not Gadgets are useful entities in analyzing malicious software: have access to the source code of the malware sample. As a In particular, they are valuable for practitioners, as under- result, the analysis needs to operate on the binary executable. standing a certain activity that is embedded in a binary Furthermore, the analysis is complicated by the fact that sample (e.g., the update function) is still largely a manual and the adversary can arm the binary with different kinds of complex task. Our evaluation with several real-world samples demonstrates that our approach is versatile and useful in obfuscation and evasion techniques (e.g., [1], [2]) to hamper practice. and resist analysis. Thus, there is general consensus among practitioners that the static analysis of malware is generally a difficult task [3]. I. INTRODUCTION Because of the shortcomings of static techniques, dynamic Malicious software (malware) is the driving force behind analysis techniques are often used in practice. However, dy- many of the attacks on the Internet today. For example, spam namic analysis also has some limitations (e.g., execution of e-mails are commonly sent via spambots, denial-of-service a single path, identification of virtual environments, etc.) [4], attacks caused by botnets threaten the availability of hosts [5]. Furthermore, such systems do not provide support for on the Internet, and keyloggers steal confidential information automatically extracting the configuration mechanism or from infected machines. other aspects of a sample under analysis. Although malware has been around for a long time, it In practice, a human analyst often needs to spend a consid- has been significantly evolving in its nature. For exam- erable amount of time manually decoding and analyzing the ple, whereas malware was largely distributed as individual, malware sample in order to understand the key algorithms stand-alone programs ten years ago (e.g., viruses, worms), it embedded in the sample. An example for such a key is now being increasingly deployed as software that can be algorithm is the domain generation algorithm of malware remotely controlled by its creators. Most malware instances samples that use domain flux [6]. With domain flux, each bot implement some kind of communication channel between periodically generates a list of domains that are then used the running instance and the attacker. Typically, this channel to contact the attacker. As the attacker knows the domain is used to update, control, and communicate with malicious generation algorithm, she can set up an infrastructure and software. For example, the attacker can use the channel to register these domains in advance. During the analysis, the send a malware instance new URLs that should be advertised analyst is interested in extracting these embedded algorithms such that she can also precompute the domains that will be The gadgets we generate can perform all necessary actions used in the future [7]. that the original function embedded in the malware sample Another example of a key algorithm that needs to be man- is to perform. That is, we do not need additional helper ually analyzed is the decoding function that is embedded in a applications to relay the traffic between the extracted code sample. The malware uses this function to decode obfuscated and the network (e.g., such as network proxies as in [10]). configuration files [8]. With the decoding function at hand, The case studies we used in our evaluation demonstrate the analyst can decode and analyze spam templates that are that the gadgets we automatically generate provide the same sent to the malware. malicious functionalities that were originally embedded into In this paper, we aim at improving the state of the art by the malware samples. For example, we show that we can presenting a novel approach to automatically extract from a generate a gadget that autonomously downloads data from given malware binary the instructions that are responsible for the network, and decodes it using a proprietary algorithm to a certain activity of the sample. We term these instructions obtain an executable. Another gadget we extracted enables a gadget since they encapsulate a specific behavior that us to decode encrypted network traffic. Furthermore, our can autonomously perform a particular task. The key idea transformation enables an analyst to influence the behavior behind our approach is that the malware binary itself has to of a given gadget by manipulating the function calls invoked contain all necessary instructions to perform the malicious by the extracted code. Using this feature, the analyst can operations that we are interested in. Hence, if we are able to perform a deeper analysis of the malicious functionality isolate and extract these instructions (i.e., gadgets) in such a provided by the gadget. For example, she can intercept date way that we can reuse them again in another application, we checks, and return arbitrary values to the gadget to determine can perform a specific task of the malware (e.g., download the effect on the execution. the current set of URLs that should be advertised in spam In practice, executing extracted gadgets instead of the mails) in a self-contained way, without the need of executing original malware has the following important advantages: the whole malware binary. Note that we do not need to • Since we are dealing with malicious software, the understand the behavior of the malware. We can simply sample is potentially harmful. If we can extract only reuse the code extracted from the sample. the parts relevant to a certain computation and execute To achieve this goal, we have implemented a tool called them in a stand-alone fashion, we reduce our exposure INSPECTOR (abbreviation for Inspector Gadget) that au- to the malicious code. tomatically extracts gadgets from a given malware binary. • We can immediately carry out a certain operation the In a first phase, INSPECTOR performs dynamic program malware performs, instead of requiring to wait for time- slicing [9] on the malware binary to extract a slice (i.e., outs, sleep operations, or commands that are sent over an algorithm) with “interesting” behavior. This could be, the command and control server. for example, a slice that downloads a piece of binary data • We can identify in-memory buffers that hold decrypted from the Internet, deobfuscates this data to obtain a binary data. These can be extracted easily with the help of the executable, and then writes this file to the hard disk. gadget compared to running the sample in a debugging Clearly, applying program slicing to malicious input is environment, and manually inspecting memory. a difficult task. However, we show in several case studies Further, we also show how some gadgets can be inverted. that INSPECTOR can indeed handle common obfuscation That is, we can use a gadget as a black box to compute what techniques such as binary packing or self-modifying code specific input causes a given output. Inverting gadgets is use- found in real-world malware. Note that we extract com- ful in many real-world scenarios. For example, inversion can plete algorithms from the binary. This is more complex be invaluable for automatically decoding a network trace that and difficult than only extracting specific functions (such was encoded by a specific malware sample under analysis. as in [10]) since we need to consider all dependencies In this work, we show how INSPECTOR can use optimized between functions, their side-effects, and relevant auxiliary brute-forcing techniques to compute these inverse gadgets, instructions (e.g., stack manipulation, or loops). and demonstrate with the help of a practical example the In a second phase, INSPECTOR generates a stand-alone usefulness of this technique. gadget based on the extracted algorithm.

Load more