Formalization and Detection of Host-Based Code Injection Attacks in the Context of Malware

Formalization and Detection of Host-Based Code Injection Attacks in the Context of Malware Dissertation zur Erlangung des Doktorgrades (Dr. rer. nat.) der Mathematisch-Naturwissenschaftlichen Fakultat¨ der Rheinischen Friedrich-Wilhelms-Universitat¨ Bonn vorgelegt von Thomas Felix Barabosch aus Andernach Bonn 2018 Angefertigt mit Genehmigung der Mathematisch-Naturwissenschaftlichen Fakultat¨ der Rheinischen Friedrich-Wilhelms-Universitat¨ Bonn 1. Gutachter: Prof. Dr. Peter Martini, Rheinische Friedrich-Wilhelms-Universitat¨ Bonn 2. Gutachter: Prof. Dr. Wim Mees, Konigliche¨ Militarakademie¨ Brussel¨ Tag der Promotion: 04.09.2018 Erscheinungsjahr: 2018 ii Summary The Internet faces an ever increasing flood of malicious software (malware). Threat actors distribute millions of new malware variants every year. They do so for a variety of reasons such as financial gain or political power. The sophistication of malware as well as their target platforms steadily increase. Since a couple of years malware has often utilizes a platform-independent technique called Host-Based Code Injection Attack (HBCIA). This attack denotes the local injection of code from an attacker entity into a victim entity. Both entities are usually operating system processes. The malicious code runs within the context of another process, which is contrary to the common belief that each program possesses its own process space. HBCIAs allow the attacker the intercep- tion of critical information, escalation of privileges, and covert operation. This trend of conducting local code injections creates a challenge in detecting malware since it blends into the behavior of benign processes. Therefore, this thesis addresses this challenge and elaborates on new ways to detect code injections. So far, no basic research on HBCIAs in the context of malware has been carried out. There is a lack of understanding in terms of problem definition and problem size. There- fore, we built a model and formally defined HBCIAs. Based on this model, we introduced a taxonomy that allowed us to classify malware according to their algorithms. Then, we showed that almost two thirds of malicious samples of our representative corpus lever- aged HBCIAs. This finding implies that local code injections are a relevant problem for security researchers since the detection of HBCIAs implies the detection of a huge share of today’s malware. Especially due to the fact that leading operating systems, such as Microsoft Windows, Linux, macOS and Android, are all prone to this attack. After the problem formalization and problem size estimation, we present two approaches: one static and one dynamic method to detect HBCIAs. Since HBCIAs are a behavior exhibited during execution, it is important to detect its occurrence to prevent further damages. Therefore, our first system Bee Master dynamically detects HBCIAs at runtime. It transfers the honeypot paradigm to processes. Its main component the Queen Bee observes several child processes called Worker Bees. Each Worker Bee mocks a pos- sible victim process like Explorer.exe. The behavior of each Worker Bee is a priori known so that new threads or new memory regions imply an HBCIA. Our approach differs from related work due to its platform-independence, high abstraction level and focus on malware. We implemented and evaluated Bee Master for several Windows and Ubuntu Linux versions. The evaluation with several prevalent representatives of HBCIA-employing malware families as well as many benign programs shows that Bee Master reliably detects HBCIAs without false positives. One major source of information is the memory of victim machines. It reflects the machine’s state and in case of an attack it allows us to derive valuable insights about the attacker as well as their hacking tools. However, it is difficult to pinpoint an attack in memory. If malware conducts HBCIAs, it is well hidden in benign processes. Hence, our second system Quincy addresses the challenge to statically detect HBCIAs in memory dumps. Its detection heuristic is based on machine learning. We constructed 36 features based on domain knowledge and selected the most appropriate ones. At its core, the detection heuristic leverages a tree-based machine learning algorithm. We evaluated Quincy with seven algorithms of which Extremely Randomized Trees performed best. Subsequently, we evaluated it on three Windows version with an high quality corpus comprising more than a thousand benign and malicious programs. The results show that Quincy improves upon the current state of the art by more than eight percent, when comparing both systems using the ROC AUC score. iv Publications This thesis is mainly based on the following three peer-reviewed publications: • Thomas Barabosch and Elmar Gerhards-Padilla, Host-Based Code Injection Attacks: A Popular Technique Used by Malware, 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE), IEEE, 2014 • Thomas Barabosch, Sebastian Eschweiler and Elmar Gerhards-Padilla, Bee Master: Detecting Host-Based Code Injection Attacks, Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), Springer International Publishing, 2014 • Thomas Barabosch, Niklas Bergmann, Adrian Dombeck and Elmar Padilla, Quincy: Detecting Host-Based Code Injection Attacks in Memory Dumps, Detection of Intru- sions and Malware, and Vulnerability Assessment (DIMVA), Springer International Publishing, 2017 The following peer-reviewed publications evolved (directly or indirectly) from our research for this thesis: • Thomas Barabosch, Andre Wichmann, Felix Leder and Elmar Gerhards-Padilla, Au- tomatic Extraction of Domain Name Generation Algorithms from Current Malware, NATO Symposium on Information Assurance and Cyber Defense (IST-111), 2012 • Thomas Barabosch, Adrian Dombeck and Elmar Gerhards-Padilla, ParasiteEx: Dis- infecting Parasitic Malware Platform-Independently, Future Security (FuSec), Fraun- hofer Verlag, 2015 • Thomas Barabosch, Adrian Dombeck, Khaled Yakdan and Elmar Gerhards-Padilla, BotWatcher: Transparent and Generic Botnet Tracking, Research in Attacks, Intru- sions, and Defenses (RAID), Springer International Publishing, 2015 • Thomas Barabosch and Elmar Gerhards-Padilla, Behavior-Driven Development in Malware Analysis, The Journal on Cybercrime & Digital Investigations, Volume 1, Number 1, CECyF, 2016 v Contents 1 Introduction1 1.1 Problem Statement...............................1 1.2 Research Questions...............................3 1.3 Main Contributions...............................4 1.3.1 Basic Research on HBCIAs.......................5 1.3.2 Detection of HBCIAs at Runtime...................5 1.3.3 Detection of HBCIAs in Memory Dumps...............5 1.4 Roadmap....................................6 2 Basics 9 2.1 Malware.....................................9 2.2 Malware Analysis................................ 10 2.2.1 Dynamic Analysis............................ 10 2.2.2 Static Analysis............................. 12 2.2.3 Memory Forensic Analysis....................... 13 2.3 Honeypots.................................... 15 2.4 Machine Learning................................ 15 2.4.1 Samples and Features......................... 16 2.4.2 The Classification Problem and Supervised Learning........ 16 2.4.3 Decision Trees............................. 17 2.4.4 Ensemble Learning........................... 18 2.4.5 Forests of Randomized Trees..................... 19 2.5 Conclusion................................... 19 3 Related Work 21 3.1 Formalization and Study of Code Injection Attacks............. 22 3.2 Prevention of Code Injection Attacks..................... 22 3.2.1 Complication and Prevention of Code Execution........... 22 3.2.2 Randomization............................. 23 3.2.3 Integrity................................. 27 3.3 Detection of Code Injection Attacks...................... 29 3.3.1 Host-based Approaches........................ 29 3.3.2 Network-based Approaches...................... 33 3.4 Non-Scientific Work.............................. 34 3.4.1 Patents Related to Code Injections.................. 34 3.4.2 Security Products Related to Code Injections............. 35 3.5 Conclusion................................... 36 4 Defining Host-Based Code Injection Attacks 37 vii 4.1 Defining Code Injections............................ 38 4.1.1 Attacker Model............................. 39 4.1.2 Code Injections............................. 39 4.1.3 Host-Based and Remote Code Injections............... 40 4.1.4 HBCI/RCI vs. HBCIA/RCIA...................... 40 4.2 Advantages and Disadvantages of HBCIAs.................. 41 4.2.1 Advantages of Employing HBCIAs................... 42 4.2.2 Disadvantages of Employing HBCIAs................. 44 4.3 HBCIA Algorithms............................... 46 4.3.1 Victim Process Selection Strategy................... 46 4.3.2 Code Copying.............................. 49 4.3.3 Code Execution Strategy........................ 49 4.3.4 An HBCIA Algorithm Taxonomy.................... 53 4.4 The Future of HBCIA-employing Malware.................. 54 4.5 Conclusion................................... 55 5 Measuring Host-Based Code Injection Attacks 57 5.1 Prevalence of HBCIAs in Malware....................... 58 5.1.1 Data Set................................. 59 5.1.2 Methodology.............................. 65 5.1.3 Results.................................. 66 5.2 Preferred Victim Processes........................... 66 5.2.1 Data Set & Methodology........................ 67 5.2.2

Load more